RL training for SO-101 robot manipulation (grasp, lift, place).
```bash
git clone git@github.com:ggand0/pick-101.git
cd pick-101
git submodule update --init --recursive
uv sync
```

Assets (STL meshes) are stored via Git LFS and pulled automatically. Alternatively, copy them from SO-ARM100:

```bash
cp -r SO-ARM100/Simulation/SO101/assets models/so101/
```

Run the top-down pick demo:

```bash
PYTHONPATH=. uv run python tests/test_topdown_pick.py --viewer
```

Repository layout:

```
models/so101/                    # Robot models and scenes
├── so101_new_calib.xml          # Current robot with finger pads
├── lift_cube.xml                # Scene with elliptic friction
└── assets/                      # STL mesh files (Git LFS)
src/
├── controllers/
│   └── ik_controller.py         # Damped least-squares IK
├── envs/
│   └── lift_cube.py             # Gym environment with reward versions
└── training/
    ├── train_image_rl.py        # DrQ-v2 training script
    ├── eval_checkpoint.py       # Evaluation with video generation
    └── workspace.py             # SO101Workspace for RoboBase
configs/                         # Training configs
├── drqv2_lift_s3_v19.yaml       # Image-based RL (working)
└── curriculum_stage3.yaml       # State-based RL
tests/                           # Test scripts
```
| Script | Description |
|---|---|
| `test_topdown_pick.py` | Current best: top-down pick with finger pads |
| `test_ik_grasp.py` | Legacy IK grasp (original model) |
| `test_horizontal_grasp.py` | Experimental horizontal approach |
`tests/test_topdown_pick.py` is the current best pick-and-place implementation. It uses a clean 4-step sequence (a hypothetical sketch follows the list):

1. **Move above block** - Position fingertips 30mm above cube, gripper open
2. **Descend to block** - Lower to grasp height, gripper open
3. **Close gripper** - Gradual close with contact detection, then tighten
4. **Lift** - Raise cube to target height
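A minimal sketch of that sequence, assuming hypothetical helpers: `goto()`, `solve_ik()`, and `detect_contact()` stand in for the repo's real functions, and the numeric values are illustrative, not taken from the script.

```python
import numpy as np

# Hypothetical sketch of the 4-step sequence in tests/test_topdown_pick.py.
# goto(), solve_ik(), and detect_contact() are placeholders for the repo's
# real helpers; grasp height and step sizes are illustrative.
OPEN, CLOSE_STEP = 1.0, 0.02

def topdown_pick(env, cube_pos, grasp_height=0.015, lift_height=0.08):
    # 1. Move above block: fingertips 30 mm above the cube, gripper open
    env.goto(solve_ik(cube_pos + np.array([0.0, 0.0, 0.03])), gripper=OPEN)
    # 2. Descend to block: lower to grasp height, gripper still open
    env.goto(solve_ik(cube_pos + np.array([0.0, 0.0, grasp_height])), gripper=OPEN)
    # 3. Close gripper: gradual close until contact is detected, then tighten
    grip = OPEN
    while grip > 0.0 and not detect_contact(env):
        grip -= CLOSE_STEP
        env.step(gripper=grip)
    grip = max(grip - 0.1, 0.0)   # tighten slightly past the contact point
    env.step(gripper=grip)
    # 4. Lift: raise the cube to the target height
    env.goto(solve_ik(cube_pos + np.array([0.0, 0.0, lift_height])), gripper=grip)
```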
Key features:

- Uses the `gripperframe` site at the fingertips for precise IK targeting (sketched below)
- Finger pad collision boxes for stable multi-contact grasping
- Elliptic cone friction model to prevent slip
- Contact detection to stop closing at optimal grip force
- Locked wrist joints (`wrist_flex=90°`, `wrist_roll=90°`) for top-down orientation
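Damped least-squares IK (the method behind `src/controllers/ik_controller.py`) computes joint updates as dq = Jᵀ(JJᵀ + λ²I)⁻¹e. A minimal single-step sketch using the official `mujoco` bindings; the actual controller's interface and damping value may differ:

```python
import mujoco
import numpy as np

def dls_ik_step(model, data, target_pos, site_name="gripperframe", damping=1e-2):
    """One damped least-squares IK step toward a Cartesian site target.

    Sketch only: the interface and damping of the real controller in
    src/controllers/ik_controller.py may differ.
    """
    site_id = mujoco.mj_name2id(model, mujoco.mjtObj.mjOBJ_SITE, site_name)
    err = target_pos - data.site_xpos[site_id]   # 3D position error

    jacp = np.zeros((3, model.nv))               # positional Jacobian (3 x nv)
    mujoco.mj_jacSite(model, data, jacp, None, site_id)

    # dq = J^T (J J^T + damping^2 * I)^-1 * err
    jjt = jacp @ jacp.T + (damping ** 2) * np.eye(3)
    return jacp.T @ np.linalg.solve(jjt, err)
```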
```bash
# Headless
PYTHONPATH=. uv run python tests/test_topdown_pick.py

# With viewer
PYTHONPATH=. uv run python tests/test_topdown_pick.py --viewer
```

Train an RL agent using low-dimensional state observations (joint positions, cube pose, etc.):
```bash
PYTHONPATH=. uv run python train_lift.py --config configs/curriculum_stage3.yaml
```

Uses the v11 reward. Achieves a 100% success rate at 1M steps.
The agent learns to:
- Approach the cube from above
- Close gripper to grasp
- Lift to 8cm height
- Hold for 3 seconds
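In code terms, the success condition amounts to the cube staying at or above the 8 cm target for 3 seconds. A minimal check, assuming a 50 Hz control rate (the actual thresholds live in `src/envs/lift_cube.py`):

```python
TARGET_HEIGHT = 0.08          # lift target: 8 cm
HOLD_STEPS = int(3.0 * 50)    # 3 s hold, assuming a 50 Hz control rate

def update_success(cube_z: float, hold_counter: int) -> tuple[int, bool]:
    # Count consecutive steps with the cube at/above the target height;
    # any dip below the target resets the counter.
    hold_counter = hold_counter + 1 if cube_z >= TARGET_HEIGHT else 0
    return hold_counter, hold_counter >= HOLD_STEPS
```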
Training outputs are saved to `runs/lift_curriculum_s3/<timestamp>/`:

- `checkpoints/` - Model checkpoints every 100k steps
- `vec_normalize.pkl` - Observation normalization stats
- `tensorboard/` - Training logs
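The `vec_normalize.pkl` name follows the Stable-Baselines3 convention; assuming that is the library in use here, the saved statistics could be restored at evaluation time roughly like this (the algorithm class, env factory, and checkpoint filename are illustrative placeholders, not taken from this repo):

```python
from stable_baselines3 import SAC
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

# Assumption: the state-based pipeline uses Stable-Baselines3. SAC,
# make_lift_env, and the checkpoint filename are illustrative placeholders.
run_dir = "runs/lift_curriculum_s3/<timestamp>"
venv = DummyVecEnv([make_lift_env])              # env factory placeholder
venv = VecNormalize.load(f"{run_dir}/vec_normalize.pkl", venv)
venv.training = False                            # freeze running obs statistics
venv.norm_reward = False                         # report unnormalized rewards
model = SAC.load(f"{run_dir}/checkpoints/<checkpoint>.zip", env=venv)
```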
Evaluate a trained model and generate videos:
```bash
PYTHONPATH=. uv run python eval_cartesian.py \
    --run runs/lift_curriculum_s3/<timestamp> \
    --checkpoint 1000000
```

This runs 10 deterministic episodes and saves videos to the run directory.
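A hedged sketch of what such an evaluation loop does internally, reusing the (assumed) SB3 objects from the loading sketch above and `imageio` for the video file:

```python
import imageio.v2 as imageio

# Assumes `model` and `venv` from the loading sketch above, with the base
# env created with render_mode="rgb_array". Filename and fps are illustrative.
frames = []
obs = venv.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, rewards, dones, infos = venv.step(action)
    done = bool(dones[0])
    frames.append(venv.envs[0].render())   # RGB frame from the base env
imageio.mimsave("eval_episode.mp4", frames, fps=30)
```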
To continue training from a checkpoint:
```bash
PYTHONPATH=. uv run python train_lift.py \
    --config configs/curriculum_stage3.yaml \
    --resume runs/lift_curriculum_s3/<timestamp> \
    --timesteps 500000  # Additional steps
```

Train an RL agent using wrist camera observations (84x84 RGB images):
```bash
MUJOCO_GL=egl uv run python src/training/train_image_rl.py \
    --config configs/drqv2_lift_s3_v19.yaml
```

Training takes ~8 hours for 2M steps.
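For reference, an 84x84 wrist-camera frame can be rendered headlessly with the official `mujoco` bindings as sketched below; the camera name `"wrist_cam"` is a guess, not taken from the scene XML:

```python
import mujoco

# Offscreen 84x84 render (run with MUJOCO_GL=egl on a headless machine).
# The camera name "wrist_cam" is an assumption; check the scene XML.
model = mujoco.MjModel.from_xml_path("models/so101/lift_cube.xml")
data = mujoco.MjData(model)
renderer = mujoco.Renderer(model, height=84, width=84)

mujoco.mj_forward(model, data)
renderer.update_scene(data, camera="wrist_cam")
rgb = renderer.render()   # (84, 84, 3) uint8 image observation
```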
```bash
MUJOCO_GL=egl uv run python src/training/eval_checkpoint.py \
    runs/image_rl/<timestamp>/snapshots/2000000_snapshot.pt \
    --num_episodes 10 \
    --reward_version v19 \
    --output_dir runs/image_rl/<timestamp>/eval
```

```bash
MUJOCO_GL=egl uv run python src/training/eval_checkpoint.py \
    runs/image_rl/<timestamp>/snapshots/2000000_snapshot.pt \
    --num_episodes 5 \
    --reward_version v19 \
    --x-format \
    --output_dir runs/image_rl/<timestamp>/eval_x_post
```

Uses the v19 reward with per-finger reach and a hold count bonus. Achieves a 100% success rate at 2M steps.
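The per-finger reach term presumably rewards each fingertip for closing the distance to the cube; an illustrative shape of such a term (the fingertip site names and the tanh scale are assumptions; the actual v19 definition lives in `src/envs/lift_cube.py`):

```python
import mujoco
import numpy as np

# Illustrative per-finger reach term; fingertip site names and the tanh
# scale are assumptions, not read from this repo.
def per_finger_reach(model, data, cube_pos,
                     finger_sites=("fingertip_left", "fingertip_right")):
    reward = 0.0
    for name in finger_sites:
        sid = mujoco.mj_name2id(model, mujoco.mjtObj.mjOBJ_SITE, name)
        dist = np.linalg.norm(data.site_xpos[sid] - cube_pos)
        reward += 1.0 - np.tanh(10.0 * dist)  # ~1 at contact, decays with distance
    return reward
```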
Available robot models:

| Model | Features | Use Case |
|---|---|---|
| `so101_new_calib.xml` | Finger pads, fingertip sites | Current development |
| `so101_ik_grasp.xml` | Original model (no pads) | Legacy compatibility |
| `so101_horizontal_grasp.xml` | Horizontal approach config | Experimental |