Reinforcement Learning

A little reinforcement learning project. Learning by doing.

1) Usage:

$ python train.py --help
usage: train.py [-h] [--episodes EPISODES] --agent {q_learning,} --environment
                ENVIRONMENT [--hyperparams HYPERPARAMS] [--console_log]
                [--neptune_log]

Reinforcement learning trainer.

optional arguments:
  -h, --help            show this help message and exit
  --episodes EPISODES   (default=5000) The amount of episodes to train for.
  --agent {q_learning,}
                        The name of the agent to be used.
  --environment ENVIRONMENT
                        The environment to run the agent against.
  --hyperparams HYPERPARAMS
                        (default={}) JSON encoded hyperparameters, passed to
                        the agent.
  --console_log         Log training stats to console.
  --neptune_log         Log training stats to neptune.ai.

$ python train.py --agent q_learning --environment FrozenLake8x8-v0
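
Since --hyperparams expects a JSON string, one way to avoid shell-quoting mistakes is to build the command from a Python dictionary. The sketch below only uses flags from the help text above; the hyperparameter values and the episode count are illustrative assumptions, not project defaults.

```python
import json
import shlex

# Illustrative values only (assumptions); the keys are Q-Learning
# hyperparameter names documented in this README.
hyperparams = {"alpha": 0.1, "gamma": 0.99, "epsilon": 1.0}

command = [
    "python", "train.py",
    "--agent", "q_learning",
    "--environment", "FrozenLake8x8-v0",
    "--episodes", "10000",
    "--hyperparams", json.dumps(hyperparams),
    "--console_log",
]

# shlex.join quotes the JSON argument so it can be pasted into a shell.
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be handed to subprocess.run unchanged.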

2) Agents:

All agents must extend the Agent class. The Agent class constructor accepts the following parameters:

observation_space: Environment observation space (Discrete or Box).
action_space: Environment action space (Discrete or Box).
hyperparams: Dictionary containing the agent hyperparameters.
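
As a rough sketch of what extending Agent could look like: the base class below is a stand-in that takes only the three constructor parameters listed above, because the real class is not shown here. The act method name, the RandomAgent class, and the use of plain integers in place of gym spaces are all assumptions made for illustration.

```python
import random


class Agent:
    """Stand-in for the project's Agent base class (assumed shape)."""

    def __init__(self, observation_space, action_space, hyperparams):
        self.observation_space = observation_space
        self.action_space = action_space
        self.hyperparams = hyperparams


class RandomAgent(Agent):
    """Hypothetical agent that ignores observations and acts randomly."""

    def act(self, observation):
        # Assumption: action_space is the number of discrete actions,
        # given as a plain int rather than a gym space object.
        return random.randrange(self.action_space)


agent = RandomAgent(observation_space=64, action_space=4, hyperparams={})
print(agent.act(observation=0))
```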

2.1) Q-Learning

Tabular Q-Learning agent. Works for discrete state and action spaces.

Formulas:

The Q-Learning update function.

$Q^{new}(s_{t}, a_{t}) = Q(s_{t}, a_{t}) + \alpha \cdot [ r_{t} + \gamma \cdot \max_{a} Q(s_{t+1}, a) - Q(s_{t}, a_{t}) ]$
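
The update can be sketched in a few lines of plain Python. This is a minimal illustration of the formula, not the project's implementation: the q_update function, the list-of-lists table, and the sample numbers are assumptions, and terminal-state handling is left out.

```python
def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Apply one tabular Q-Learning update in place and return the new value."""
    # max over a of Q(s_{t+1}, a)
    best_next = max(q_table[next_state])
    # r_t + gamma * max_a Q(s_{t+1}, a)
    td_target = reward + gamma * best_next
    # Bracketed term of the formula: target minus current estimate.
    td_error = td_target - q_table[state][action]
    q_table[state][action] += alpha * td_error
    return q_table[state][action]


# Two states, two actions; only Q(1, 1) starts with a non-zero value.
q_table = [[0.0, 0.0], [0.0, 1.0]]
new_value = q_update(
    q_table, state=0, action=1, reward=1.0, next_state=1, alpha=0.5, gamma=0.9
)
print(new_value)  # 0.5 * (1.0 + 0.9 * 1.0 - 0.0) = 0.95
```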

Hyperparams:

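alpha: The learning rate. Controls how far each update moves the current estimate toward the new target, which prevents the algorithm from putting too much weight on random missteps.
gamma: Discount factor applied to future rewards.
epsilon: Exploration rate. The probability of taking a random action instead of the best known one.
epsilon_decay: Decay factor for epsilon: epsilon = epsilon * epsilon_decay.
epsilon_min: Lower bound for epsilon. Keeps the agent exploring throughout training.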
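
The three epsilon parameters can be sketched together as follows. This assumes an epsilon-greedy policy, that the decay is applied once per episode, and that epsilon_min is enforced by clamping; the function names and sample values are illustrative assumptions rather than the project's code.

```python
import random


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(epsilon, epsilon_decay, epsilon_min):
    """Multiply epsilon by the decay factor, never going below epsilon_min."""
    return max(epsilon_min, epsilon * epsilon_decay)


epsilon = 1.0
for _ in range(1000):  # assumed: one decay step per episode
    epsilon = decay_epsilon(epsilon, epsilon_decay=0.99, epsilon_min=0.05)

print(epsilon)  # 0.05, because 0.99 ** 1000 is far below epsilon_min
print(choose_action([0.1, 0.9, 0.3], epsilon=0.0))  # 1, the greedy action
```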

2.2) Vanilla Policy Gradient

3) Environments

Currently, the value of --environment is passed directly to OpenAI Gym's gym.make function.
