AEP-algorithm (grcai/AEP-algorithm)

Code and data for "Adaptive Exploration Policy for Exploration-Exploitation Tradeoff in Continuous Action Control Optimization".

PyTorch implementation of TD3_AEP. If you use our code or data, please cite "Adaptive Exploration Policy for Exploration-Exploitation Tradeoff in Continuous Action Control Optimization".

The method is tested on MuJoCo continuous control tasks in OpenAI Gym. Networks are trained with PyTorch 1.0 and Python 3.7.

Results

The code is no longer exactly representative of the code used in the paper: hyperparameters and a few other details have been slightly adjusted to improve performance. The learning curves, however, are still the original results reported in the paper.

Each learning curve is stored as a NumPy array of 201 evaluations (shape (201,)), where each evaluation is the average total reward from running the policy for 10 episodes with no exploration. Evaluations are performed every 5000 time steps over a total of 1 million time steps, and the first evaluation is of the randomly initialized policy network (unused in the paper), which gives 1 + 1,000,000 / 5000 = 201 entries.
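As a rough illustration of the layout described above, the sketch below rebuilds the time step at which each entry was recorded and shows an evaluation loop of the kind described (average return over 10 episodes with a deterministic policy). It is a hedged sketch, not code from this repository: the names `eval_timesteps`, `evaluate_policy`, and `summarize_curve` are hypothetical, and the environment interface assumes the older gym API of the PyTorch 1.0 era (`reset()` returns an observation, `step()` returns a 4-tuple).

```python
import numpy as np

# Evaluation schedule stated in the README: one evaluation every 5000 time
# steps over 1 million steps, plus the initial (untrained) policy at step 0.
EVAL_FREQ = 5_000
MAX_TIMESTEPS = 1_000_000
EVAL_EPISODES = 10


def eval_timesteps(eval_freq=EVAL_FREQ, max_timesteps=MAX_TIMESTEPS):
    """Return the time step at which each learning-curve entry was recorded."""
    return np.arange(0, max_timesteps + 1, eval_freq)


def evaluate_policy(select_action, env, episodes=EVAL_EPISODES):
    """Average undiscounted return over `episodes` runs.

    `select_action` should be deterministic (no exploration noise).
    Assumption: gym-style env where reset() -> obs and
    step(action) -> (obs, reward, done, info).
    """
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            obs, reward, done, _ = env.step(select_action(obs))
            total += reward
    return total / episodes


def summarize_curve(curve):
    """Check the (201,) layout and drop the random-initialization entry."""
    curve = np.asarray(curve, dtype=float)
    steps = eval_timesteps()
    if curve.shape != steps.shape:
        raise ValueError(f"expected shape {steps.shape}, got {curve.shape}")
    # Entry 0 is the randomly initialized policy, which the paper does not use.
    return steps[1:], curve[1:]
```

To use it with a saved curve, load the array with `np.load` (the result file names are not specified here) and pass it to `summarize_curve` to get matching x and y values for plotting.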

Numerical results can be found in the paper.
