An implementation of Distributed Prioritized Experience Replay (Ape-X; Horgan et al., 2018) in PyTorch.
The paper proposes a distributed architecture for deep reinforcement learning in which many actors generate experience in parallel and feed a shared prioritized replay buffer. This enables fast and broad exploration, which helps prevent the model from converging to a suboptimal policy.
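To make the idea concrete, here is a minimal sketch of proportional prioritized sampling, the replay scheme Ape-X builds on. The class and parameter names (`PrioritizedReplay`, `alpha`, `beta`) are illustrative assumptions, not this repo's actual API.

```python
import numpy as np

class PrioritizedReplay:
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha                      # how strongly priorities bias sampling
        self.data = [None] * capacity
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0
        self.size = 0

    def add(self, transition, priority):
        # In Ape-X, actors supply initial priorities (e.g. absolute TD error),
        # so new transitions don't need a forced max-priority first pass.
        self.data[self.pos] = priority ** self.alpha and transition or transition
        self.data[self.pos] = transition
        self.priorities[self.pos] = priority ** self.alpha
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, beta=0.4):
        p = self.priorities[:self.size]
        probs = p / p.sum()
        idx = np.random.choice(self.size, batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = (self.size * probs[idx]) ** (-beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # The learner feeds fresh TD errors back after each training step.
        self.priorities[idx] = (np.abs(td_errors) + eps) ** self.alpha
```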
There are a few implementations optimized for a single powerful machine with many cores, but I tried to implement Ape-X in a multi-node setting across AWS EC2 instances. ZeroMQ, asyncio, and multiprocessing were really helpful tools for this; a sketch of the actor-to-replay transport is shown below.
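The following is a rough sketch of how actors can stream transitions to a replay server with pyzmq PUSH/PULL sockets. The address, port, and pickled message format are assumptions for illustration, not the exact wiring used in this repo.

```python
import pickle
import zmq

def actor_send_loop(batches, addr="tcp://replay-host:5557"):
    # Each actor connects out to the replay server and pushes
    # (transition, initial_priority) batches as they are produced.
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.PUSH)
    sock.connect(addr)
    for batch in batches:
        sock.send(pickle.dumps(batch))

def replay_recv_loop(buffer, addr="tcp://*:5557"):
    # The replay server binds once and fans in messages from all actors.
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.PULL)
    sock.bind(addr)
    while True:
        batch = pickle.loads(sock.recv())
        for transition, priority in batch:
            buffer.add(transition, priority)
```

PUSH/PULL fits this direction of traffic because many actors can connect to one bound socket and ZeroMQ load-balances the receive side automatically; parameter requests in the other direction would typically use a separate socket pair.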
There are still performance issues with the replay server, caused by shared-memory locking and by hyperparameter tuning, but it works nonetheless. Some parts are also hard-coded for convenience. I'm working on improving many of these and would really appreciate your help.
python 3.7
numpy==1.18.1
torch==1.4.0
pyzmq==19.0.0
opencv-python==4.2.0.32
tensorboard==2.1.0
gym==0.17.0
gym[atari]
See run.sh for details. This forked repo mainly targets single-node training.
Not tested.

