This repository contains many Reinforcement Learning algorithms that I've implemented over the years out of curiosity.

CommanderCero/RL_Algorithms


This repository contains many Reinforcement Learning algorithms that I've implemented over the years out of curiosity. Most of them were not written with other users in mind. However, some of the more recent implementations are more user-friendly, such as the algorithms in the A2C folder.

Advantage Actor Critic

Implementation of the Actor-Critic algorithm that uses advantage estimation to reduce the variance of the policy gradient. This implementation also includes entropy regularization to improve exploration.

Code can be found in ./A2C/simpleActorCritic.py
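To make the two ideas above concrete, here is a minimal sketch of how an advantage estimate and an entropy bonus can enter the actor and critic losses. It is not taken from simpleActorCritic.py: the function names, the Monte Carlo return as the advantage target, and the `entropy_coef` default are all assumptions made for illustration, and it uses plain Python lists instead of tensors so it runs without any dependencies.

```python
import math


def discounted_returns(rewards, gamma=0.99, bootstrap_value=0.0):
    """Compute G_t = r_t + gamma * G_{t+1}, working backwards through the episode."""
    returns = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def entropy(probs):
    """Shannon entropy of a discrete action distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def actor_critic_losses(action_probs, actions, values, returns, entropy_coef=0.01):
    """Return (actor_loss, critic_loss, advantages) for one batch of steps.

    action_probs: per-step probability distributions over actions
    actions:      index of the action taken at each step
    values:       critic estimates V(s_t)
    returns:      discounted returns G_t used as the value target
    entropy_coef: hypothetical weight of the entropy bonus
    """
    n = len(actions)
    # Advantage A_t = G_t - V(s_t): subtracting the baseline reduces variance.
    advantages = [g - v for g, v in zip(returns, values)]
    # Policy gradient term; advantages are treated as constants here.
    policy_loss = -sum(
        math.log(p[a]) * adv for p, a, adv in zip(action_probs, actions, advantages)
    ) / n
    # The critic regresses V(s_t) towards G_t.
    critic_loss = sum(adv ** 2 for adv in advantages) / n
    # Subtracting the mean entropy rewards more uniform (exploratory) policies.
    mean_entropy = sum(entropy(p) for p in action_probs) / n
    actor_loss = policy_loss - entropy_coef * mean_entropy
    return actor_loss, critic_loss, advantages
```

In a real implementation the same expressions would be written with an autodiff framework, with the advantages detached from the graph before they multiply the log-probabilities.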

LunarLander

PyBullet Hopper

Parallel Advantage Actor Critic (A2C)

A parallelized version of the Advantage Actor-Critic algorithm. Instead of exploring only one environment, we run N environments in parallel. This reduces the computational bottleneck created by complex environments and makes much more efficient use of the GPU.
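The batching idea can be sketched as follows. This is an illustration only, not the repository's code: `ToyEnv`, `BatchedEnvs`, and `collect_rollout` are hypothetical names, and the environments are stepped in lockstep inside one process for simplicity, whereas a real setup would typically step them in worker processes. What it shows is that the policy is queried once per step for all N observations, which is what lets a GPU process them as a single batch.

```python
class ToyEnv:
    """Hypothetical stand-in environment: an episode ends after `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]

    def step(self, action):
        self.t += 1
        obs = [float(self.t)]
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.horizon
        return obs, reward, done


class BatchedEnvs:
    """Steps N environments together and returns batched results."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs_batch, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            obs, reward, done = env.step(action)
            if done:
                # Auto-reset so the batch always holds N live environments.
                obs = env.reset()
            obs_batch.append(obs)
            rewards.append(reward)
            dones.append(done)
        return obs_batch, rewards, dones


def collect_rollout(envs, policy, n_steps):
    """Collect n_steps of experience from every environment at once."""
    obs = envs.reset()
    transitions = []
    for _ in range(n_steps):
        # One policy call for the whole batch instead of N separate calls.
        actions = policy(obs)
        next_obs, rewards, dones = envs.step(actions)
        transitions.append((obs, actions, rewards, dones))
        obs = next_obs
    return transitions
```

For example, `collect_rollout(BatchedEnvs([ToyEnv] * 8), policy, 16)` would gather 16 steps from 8 environments, where `policy` is any callable that maps a batch of observations to a list of actions.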

I've tested the algorithm on LunarLander and PyBullet Hopper and saw a significant reduction in computation time per step. I've also tested the algorithm on the NES Mario environment.

Note that the agent still likes to jump into holes and enemies. The reason for this is difficult to pinpoint: it could be caused by a poorly chosen convolutional architecture, too little training time, or both. The agent also fails to learn the second level after completing the first. The main reason for this is that the agent overfits to the first level, making it difficult to adapt to the second level without degrading performance on the first.
