HyperPPO: A scalable method for finding small policies for robotic control

This repository contains the code used in the paper https://arxiv.org/abs/2309.16663. We present an algorithm that learns architecture-agnostic policies for reinforcement learning (RL) on robotic tasks. The method makes it possible to find tiny neural networks that still model performant policies.
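To give a feel for what "tiny" means here, the sketch below counts the trainable parameters of a plain fully connected policy for a given list of hidden layer sizes. The observation and action dimensions and the layer sizes are illustrative assumptions, not values taken from the paper, and the actual policy heads in this repo may add further parameters.

```python
from typing import Sequence


def mlp_param_count(obs_dim: int, hidden_sizes: Sequence[int], act_dim: int) -> int:
    """Count weights and biases of a fully connected network.

    Each layer contributes n_in * n_out weights plus n_out biases.
    """
    sizes = [obs_dim, *hidden_sizes, act_dim]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


if __name__ == "__main__":
    # Hypothetical dimensions chosen only for illustration.
    for hidden in ([4, 4], [32, 32], [256, 256]):
        print(hidden, mlp_param_count(obs_dim=8, hidden_sizes=hidden, act_dim=2))
```

With these assumed dimensions, two hidden layers of 4 units need 66 parameters, while two hidden layers of 256 units need 68610.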

This work was accepted for oral presentation at the 2024 IEEE International Conference on Robotics and Automation (Yokohama). The paper website and video results can be found at https://sites.google.com/usc.edu/hyperppo

This repo is based on Sample Factory: https://github.com/alex-petrenko/sample-factory

To create the environment:

conda create -n hyper python==3.9

conda activate hyper

git clone git@github.com:alex-petrenko/sample-factory.git

cd sample-factory

pip install -e .

pip install brax==0.1.1

pip install chex==0.1.6

pip install flax==0.6.4

pip install orbax==0.1.1

pip install jax==0.3.25

Download the jaxlib wheel file (CUDA 11, cuDNN 8.2, Python 3.9) from https://drive.google.com/file/d/1dBwmHhFUe5bhBN3Zw48MzhXGhhDVL0sc/view?usp=sharing, or fetch it with gdown and install it:

pip install gdown

gdown https://drive.google.com/uc?id=1dBwmHhFUe5bhBN3Zw48MzhXGhhDVL0sc

pip install jaxlib-0.3.25+cuda11.cudnn82-cp39-cp39-manylinux2014_x86_64.whl
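Because the setup relies on exact version pins, it can help to confirm what actually ended up in the environment. The following is a minimal sketch that uses only the standard library; the helper name `find_mismatches` is hypothetical, and the pinned versions are copied from the install commands above. Local version suffixes such as `+cuda11.cudnn82` are ignored when comparing.

```python
from importlib import metadata

# Versions copied from the pip install commands in this README.
PINNED = {
    "brax": "0.1.1",
    "chex": "0.1.6",
    "flax": "0.6.4",
    "orbax": "0.1.1",
    "jax": "0.3.25",
    "jaxlib": "0.3.25",
}


def find_mismatches(pinned):
    """Return {package: (wanted, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Drop a local suffix such as "+cuda11.cudnn82" before comparing.
        base = installed.split("+")[0] if installed else None
        if base != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    for name, (wanted, installed) in problems.items():
        print(f"{name}: expected {wanted}, found {installed}")
    if not problems:
        print("All pinned packages match.")
```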

Add the following line to .bashrc to avoid running into GPU memory issues:

echo "export XLA_PYTHON_CLIENT_PREALLOCATE=false" >> ~/.bashrc
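If editing .bashrc is not an option (for example on a shared machine), the same variable can be set from Python for a single process. This is a sketch rather than part of the repo; the helper name `disable_xla_preallocation` is hypothetical. The variable should be set before JAX is first imported so that it is in place when the GPU backend initializes.

```python
import os


def disable_xla_preallocation() -> str:
    """Set XLA_PYTHON_CLIENT_PREALLOCATE=false for the current process only."""
    os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
    return os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"]


if __name__ == "__main__":
    disable_xla_preallocation()
    # Import JAX only after the variable has been set, e.g.:
    # import jax
    print(os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"])
```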

To install stable-baselines3 (used only for the vec env wrapper; RL training uses Sample Factory) and the drone environment:

git clone git@github.com:DLR-RM/stable-baselines3.git
cd stable-baselines3
pip install -e .

git clone git@github.com:Zhehui-Huang/quad-swarm-rl.git
cd quad-swarm-rl
pip install -e .
BEZIER_NO_EXTENSION=true python -m pip install bezier==2020.5.19

Remember to initialize wandb (for example, by running wandb login) before launching training.

Run the Brax experiments:

python -m sample_factory.launcher.run --run=sf_examples.brax.experiments.brax_hyper_envs --backend=processes --max_parallel=4 --experiments_per_gpu=1 --num_gpus=4

Drone experiment:

python -m sf_examples.swarm.train_swarm --env quadrotor_multi --experiment hyper_test2 --train_dir dummy --train_for_env_steps 1_000_000_000 --dual_critic False --multi_stddev True --arch_sampling_mode biased --hyper True --with_wandb True --wandb_tags debug --meta_batch_size 16 --continuous_tanh_scale 15
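The drone command above carries many flags, so it can be easier to keep them in one place and assemble the command programmatically. The sketch below does that with the standard library only; `DRONE_FLAGS` and `build_command` are hypothetical names, and the flag names and values are copied verbatim from the command above rather than from any documented API.

```python
import shlex
import sys

# Flags copied from the drone experiment command in this README.
DRONE_FLAGS = {
    "env": "quadrotor_multi",
    "experiment": "hyper_test2",
    "train_dir": "dummy",
    "train_for_env_steps": "1_000_000_000",
    "dual_critic": "False",
    "multi_stddev": "True",
    "arch_sampling_mode": "biased",
    "hyper": "True",
    "with_wandb": "True",
    "wandb_tags": "debug",
    "meta_batch_size": "16",
    "continuous_tanh_scale": "15",
}


def build_command(flags, module="sf_examples.swarm.train_swarm"):
    """Build an argv list equivalent to `python -m <module> --key value ...`."""
    cmd = [sys.executable, "-m", module]
    for key, value in flags.items():
        cmd += [f"--{key}", str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_command(DRONE_FLAGS)
    # Print the command for inspection; pass `cmd` to subprocess.run to launch it.
    print(shlex.join(cmd))
```

A variant run can then be described by overriding single entries, for example `build_command({**DRONE_FLAGS, "with_wandb": "False"})`.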

Citation

@article{hegde2023hyperppo,
  title={HyperPPO: A scalable method for finding small policies for robotic control},
  author={Hegde, Shashank and Huang, Zhehui and Sukhatme, Gaurav S},
  journal={arXiv preprint arXiv:2309.16663},
  year={2023}
}
