modular-dqn

This is an attempt to modularize many of the most effective extensions of the DQN algorithm so they can easily be mixed together. The code is written in TensorFlow. The goal is to make it easy to test mixtures of extensions/algorithms, as in the Rainbow paper, and to provide simple implementations of these algorithms for learning purposes. See the examples folder for how the module is used. If you are looking for a more production-ready library with many more features, try TensorForce.

Features

DQN
Double Q-Learning
Prioritized Experience Replay
- Proportional variant (tends to be favored in most papers)
- Rank-based variant (not yet optimized with precomputed segments)
Dueling Networks
Multi-Step Learning
Distributional RL (C51)
Noisy Nets
Quantile Regression (QR-DQN)
NAF (Algorithm 1 only)
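
To make the difference between the two Prioritized Experience Replay variants listed above concrete, here is a minimal pure-Python sketch of how each one turns TD errors into sampling probabilities. It follows the formulas from the prioritized replay paper as I understand them and is not taken from this module's code. The function names and the default values for `alpha` and `eps` are assumptions chosen for illustration.

```python
def proportional_probs(td_errors, alpha=0.6, eps=1e-6):
    """Proportional variant: priority p_i = |delta_i| + eps.

    Sampling probability is P(i) = p_i**alpha / sum_k p_k**alpha.
    `alpha` and `eps` defaults are illustrative assumptions.
    """
    scaled = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(scaled)
    return [s / total for s in scaled]


def rank_based_probs(td_errors, alpha=0.6):
    """Rank-based variant: priority p_i = 1 / rank(i).

    Rank 1 is the transition with the largest |delta|.
    """
    # Indices sorted from largest to smallest absolute TD error.
    order = sorted(range(len(td_errors)),
                   key=lambda i: abs(td_errors[i]), reverse=True)
    scaled = [0.0] * len(td_errors)
    for rank, i in enumerate(order, start=1):
        scaled[i] = (1.0 / rank) ** alpha
    total = sum(scaled)
    return [s / total for s in scaled]


# Example: one transition has a much larger TD error than the rest.
errors = [0.1, -2.0, 0.5, 10.0]
print(proportional_probs(errors))
print(rank_based_probs(errors))
```

The rank-based probabilities depend only on the ordering of the errors, so rescaling every error by the same factor leaves them unchanged, while the proportional probabilities follow the magnitudes directly.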

What's missing

Frame skipping
- Because of this, I have not yet added easy support for convolutional layers. These can be added by using a custom network or by updating the code
- This can probably be supported easily by using the frame skipping code from OpenAI Baselines
A few algorithms still need support for gradient clipping and reward clipping
A better way to save/load a model
A logging system/TensorBoard support
Documentation
Probably a lot of things I'm not even aware of!
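
For readers who need frame skipping before it is supported here, the idea can be sketched as a small environment wrapper: repeat the chosen action for several steps and sum the rewards. This is a minimal sketch, not part of this module. The `FrameSkip` class and the `CountingEnv` toy environment are hypothetical names, and the sketch assumes the classic Gym-style interface where `step(action)` returns `(obs, reward, done, info)`.

```python
class FrameSkip:
    """Repeat each action `skip` times and sum the rewards.

    Hypothetical helper; assumes `env.step` returns a 4-tuple
    (obs, reward, done, info) as in the classic Gym API.
    """

    def __init__(self, env, skip=4):
        if skip < 1:
            raise ValueError("skip must be >= 1")
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                # Stop repeating once the episode has ended.
                break
        return obs, total_reward, done, info


class CountingEnv:
    """Toy environment for the example: reward 1 per step,
    episode ends after `horizon` steps."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon, {}


env = FrameSkip(CountingEnv(horizon=10), skip=4)
env.reset()
print(env.step(0))  # four underlying steps are taken for one agent step
```

The frame skipping wrapper in the OpenAI Baselines Atari wrappers additionally takes a pixel-wise maximum over the last two frames, which this sketch leaves out.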

In the future

I hope to add the following:
- Hindsight Experience Replay (HER)
- Random Network Distillation (RND)
Add Horovod support for distributed training
If possible to modularize, I would like to extend this module to include policy gradient methods such as PPO, rather than just DQN
I would like to add support for PyTorch

Disclaimer

I don't have proof that any algorithm's implementation is correct. In implementing an algorithm I do the following:

Read the paper and any blogs on the paper and/or existing implementations I can find
Write my own implementation of the algorithm
Do basic math checks. For example, making sure the algorithm gives a proper probability distribution
Ensure the algorithm can achieve a very high score on a basic OpenAI Gym environment like CartPole or MountainCar
- This sometimes requires a few tries, as these algorithms seem to be highly sensitive to initial conditions (as many working in the field seem to report)
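
As an illustration of the kind of basic math check mentioned in the steps above, here is a minimal pure-Python sketch of the categorical (C51) projection for a single transition, followed by a check that the result is still a proper probability distribution. It is written for this README and is not this module's TensorFlow implementation. The function names and the example values are assumptions.

```python
import math


def project_distribution(probs, reward, done, gamma, v_min, v_max):
    """Project the shifted target distribution (r + gamma * z) back
    onto the fixed support of len(probs) atoms in [v_min, v_max]."""
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two atoms")
    delta_z = (v_max - v_min) / (n - 1)
    projected = [0.0] * n
    for j, p in enumerate(probs):
        z_j = v_min + j * delta_z
        tz = reward if done else reward + gamma * z_j
        tz = min(v_max, max(v_min, tz))          # clip to the support
        b = (tz - v_min) / delta_z               # fractional atom index
        b = min(float(n - 1), max(0.0, b))       # guard against rounding
        lower, upper = math.floor(b), math.ceil(b)
        if lower == upper:
            # b falls exactly on an atom. Without this branch both
            # weights below would be zero and the mass would be lost.
            projected[lower] += p
        else:
            projected[lower] += p * (upper - b)
            projected[upper] += p * (b - lower)
    return projected


def is_probability_distribution(probs, tol=1e-6):
    """Basic math check: non-negative entries that sum to one."""
    return all(p >= -tol for p in probs) and abs(sum(probs) - 1.0) <= tol


# Example where every shifted atom lands exactly on an existing atom.
probs = [0.1, 0.2, 0.3, 0.2, 0.2]
target = project_distribution(probs, reward=1.0, done=False,
                              gamma=1.0, v_min=0.0, v_max=4.0)
assert is_probability_distribution(target)
```

Running a check like this on a few hand-picked inputs, including terminal transitions and rewards that push the target past `v_max`, is a cheap way to catch lost probability mass before training.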

In addition, the API will most likely change. As more algorithms are added, a new level of abstraction is sometimes needed to keep the code modular, or a better approach is simply found.

To the learner

If you are trying to learn how to implement these algorithms and are having a hard time, you are not alone. Many of those who implement the main RL libraries seem to agree (See Section 2). Many implementations have glaring errors due to ambiguous language like 'error clipping' or to misunderstandings (this library itself may have many; see the disclaimer). There is not yet a standard textbook in the field. Papers sometimes leave out or skip around important details like frame skipping, or an edge case which causes loss of probability mass. Sometimes the math seems to make no sense or skips way too many steps. Vectorizing something in multiple dimensions can have your head smashing into things in multiple dimensions. And once you are finally convinced that your implementation works, it may not replicate the results of a paper, may take many tries to work, or may work and then suddenly start doing poorly again (these results are normal). Trying to understand everything at once is overwhelming. Don't be discouraged. Even OpenAI recognizes the learning curve and has just released Spinning Up in an attempt to help. Here are some resources that have helped me along the way:

OpenAI Baselines (This is where I started from)
TensorForce (Also check out their blog)
Reinforcement Learning (Sutton, Barto 2017)
OpenAI Spinning Up (I wish I had this in the beginning!)
Blog posts. These can be extremely helpful for understanding a paper
Other people's implementations are of course wonderful, but don't assume they are correct or complete (including this module!)
