Reacher Continuous Control using DDPG
This project uses Unity ML-Agents version 0.4.0, so the following setup steps are needed.
Install Anaconda from the link below:
https://www.anaconda.com/products/individual
Create the virtual environment:
```bash
# Create the virtual env DQN
conda create -n DQN_navigation python=3.6
# activate environment
source activate DQN_navigation
# clone the udacity repo
git clone https://github.com/udacity/deep-reinforcement-learning.git
# go to the python folder of the repo
cd deep-reinforcement-learning/python
# install the unityagents package from this folder
pip install -e .
# clone this repository (Reacher_Continuous_Control)
cd ..
git clone https://github.com/TriKnight/Reacher_Continuous_Control.git
# install the requirements from our package
cd Reacher_Continuous_Control
pip install -r requirements.txt
conda install -c anaconda ipykernel
python -m ipykernel install --user --name=DQN_navigation
```
- Open Jupyter Notebook:
jupyter notebook
- Open Reacher_Continuous_Control
- Change the kernel: Kernel > Change Kernel > DQN_navigation
Note that your project submission need only solve one of the two versions of the environment: one (1) agent or twenty (20) agents.

Download the environment from one of the links below. You need only select the environment that matches your operating system:
Version 1: One (1) Agent
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here
Version 2: Twenty (20) Agents
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here
The task is episodic, and in order to solve the environment, your agent must get an average score of +30 over 100 consecutive episodes.
- Observations: Each agent receives a 33-dimensional observation vector with measurements such as the relative positions and orientations of the arm's links, and the relative position and speed of the target.
- Actions: Each agent moves its arm by applying an action vector of 4 torque values, two for each of the 2 actuated joints.
- Rewards: Each agent receives a reward of +0.1 for every step its end effector is within the goal location. The environment is considered solved once the agent achieves an average reward of +30 over 100 episodes (a short environment-loading sketch follows this list).
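The snippet below is a minimal sketch of how this environment can be loaded and inspected with the `unityagents` package installed above. The binary path `Reacher_Linux/Reacher.x86_64` is an assumption; replace it with the file you downloaded for your operating system and chosen version.

```python
from unityagents import UnityEnvironment
import numpy as np

# Path to the downloaded Reacher binary -- adjust to your OS and version (assumed path).
env = UnityEnvironment(file_name="Reacher_Linux/Reacher.x86_64")

# The default brain controls the agent(s).
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

# Reset in training mode and inspect the observation/action spaces.
env_info = env.reset(train_mode=True)[brain_name]
num_agents = len(env_info.agents)                    # 1 or 20, depending on the version
state_size = env_info.vector_observations.shape[1]   # 33
action_size = brain.vector_action_space_size         # 4
print(num_agents, state_size, action_size)

# One random interaction step: each action entry must lie in [-1, 1].
actions = np.clip(np.random.randn(num_agents, action_size), -1, 1)
env_info = env.step(actions)[brain_name]
print(env_info.rewards, env_info.local_done)
env.close()
```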
The barrier for solving the second version of the environment is slightly different, to take into account the presence of many agents. In particular, your agents must get an average score of +30 (over 100 consecutive episodes, and over all agents). Specifically,
- After each episode, we add up the rewards that each agent received (without discounting), to get a score for each agent. This yields 20 (potentially different) scores. We then take the average of these 20 scores.
- This yields an average score for each episode (where the average is over all 20 agents).
The environment is considered solved when the average (over 100 episodes) of those average scores is at least +30 (see the scoring sketch below).
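As a small illustration of this scoring rule, the sketch below averages the 20 per-agent returns after each episode and tracks a 100-episode moving average. The names `record_episode` and `agent_returns` are hypothetical helpers, not part of the repository code.

```python
import numpy as np
from collections import deque

scores_window = deque(maxlen=100)   # averages of the last 100 episodes
episode_averages = []               # one average score per episode

def record_episode(agent_returns):
    """agent_returns: array of shape (20,) holding each agent's undiscounted return."""
    episode_score = np.mean(agent_returns)      # average over all 20 agents
    episode_averages.append(episode_score)
    scores_window.append(episode_score)
    # Solved once the mean of the last 100 episode averages reaches +30.
    return len(scores_window) == 100 and np.mean(scores_window) >= 30.0
```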
In this project, we use Deep Deterministic Policy Gradient (DDPG) to solve the environment; a sketch of its core update step follows the list below.
- DDPG is an off-policy algorithm.
- DDPG can only be used for environments with continuous action spaces.
- DDPG can be thought of as being deep Q-learning for continuous action spaces.
- The Spinning Up implementation of DDPG does not support parallelization.
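The sketch below illustrates that core update: the critic is regressed toward a bootstrapped target computed with the target networks, the actor follows the deterministic policy gradient, and both target networks are soft-updated. The network sizes and the `GAMMA`/`TAU` values are illustrative assumptions, not necessarily the hyperparameters used in this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

GAMMA, TAU = 0.99, 1e-3   # assumed values, for illustration only

class Actor(nn.Module):
    """Deterministic policy: state -> action in [-1, 1]^action_size."""
    def __init__(self, state_size=33, action_size=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, action_size), nn.Tanh())

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Q-function: (state, action) -> scalar value."""
    def __init__(self, state_size=33, action_size=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size + action_size, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def ddpg_update(actor, actor_target, critic, critic_target,
                actor_opt, critic_opt, batch):
    # batch tensors are shaped (batch_size, ...), with rewards/dones as (batch_size, 1).
    states, actions, rewards, next_states, dones = batch

    # Critic: regress Q(s, a) toward r + gamma * (1 - done) * Q'(s', mu'(s')).
    with torch.no_grad():
        next_actions = actor_target(next_states)
        q_targets = rewards + GAMMA * (1 - dones) * critic_target(next_states, next_actions)
    critic_loss = F.mse_loss(critic(states, actions), q_targets)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: maximize Q(s, mu(s)) by minimizing its negative.
    actor_loss = -critic(states, actor(states)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft-update the target networks toward the local networks.
    for target, local in ((actor_target, actor), (critic_target, critic)):
        for t_param, l_param in zip(target.parameters(), local.parameters()):
            t_param.data.copy_(TAU * l_param.data + (1 - TAU) * t_param.data)
```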





