A modular reinforcement learning system for options trading that integrates a differentiable Black-Scholes pricing engine with market regime detection.
```
options_rl/
├── bsm.py        # Black-Scholes-Merton pricing engine with autograd Greeks
├── env.py        # Gymnasium environment for options trading
├── train.py      # PPO training script with experiment tracking
├── visualize.py  # Visualization and analysis tools
├── requirements.txt
└── README.md
```
- Black-Scholes option pricing for calls/puts
- Automatic Greek calculation via PyTorch autograd (Delta, Gamma, Vega, Theta, Rho)
- Newton-Raphson solver for implied volatility
- No hardcoded Greek formulas — all derivatives computed automatically
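The idea can be sketched as follows: build the Black-Scholes price from differentiable torch operations, then let autograd produce the Greeks as gradients. This is an illustrative sketch (the function name `bs_call_price` and its signature are assumptions, not necessarily the API in `bsm.py`):

```python
import torch
from torch.distributions import Normal

def bs_call_price(S, K, T, r, sigma):
    """Black-Scholes call price built entirely from differentiable torch ops."""
    N = Normal(0.0, 1.0).cdf  # standard normal CDF
    d1 = (torch.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * torch.sqrt(T))
    d2 = d1 - sigma * torch.sqrt(T)
    return S * N(d1) - K * torch.exp(-r * T) * N(d2)

# Mark the inputs we want sensitivities to:
S = torch.tensor(100.0, requires_grad=True)      # spot
sigma = torch.tensor(0.2, requires_grad=True)    # volatility
K, T, r = torch.tensor(100.0), torch.tensor(1.0), torch.tensor(0.05)

price = bs_call_price(S, K, T, r, sigma)
# Delta = dPrice/dS and Vega = dPrice/dsigma fall out of autograd,
# with no hand-derived Greek formulas anywhere.
delta, vega = torch.autograd.grad(price, [S, sigma])
```

For this at-the-money call the autograd Delta matches the closed-form N(d1) to numerical precision, which is the sanity check that makes the "no hardcoded Greeks" design trustworthy.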
- Custom Gymnasium environment for options trading
- Market regimes: Bull/Bear phases with configurable drift
- GBM stock price simulation
- Configurable episode length (30-120+ trading days)
- Transaction costs and position management
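The price dynamics above boil down to a geometric Brownian motion step with a regime-dependent drift. A minimal sketch of that step, assuming an exact-discretization update (the function name and parameters are illustrative, not the actual `env.py` internals):

```python
import numpy as np

def gbm_step(price, mu, sigma, dt, rng):
    """One GBM step: S_{t+dt} = S_t * exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z)."""
    z = rng.standard_normal()
    return price * np.exp((mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z)

rng = np.random.default_rng(0)
dt = 1 / 252          # one trading day in years
price = 100.0
for _ in range(60):   # a 60-day episode in the bull regime (+30% drift)
    price = gbm_step(price, mu=0.30, sigma=0.20, dt=dt, rng=rng)
```

The exact-exponential form keeps simulated prices strictly positive, which a naive Euler step does not guarantee.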
- PPO algorithm via Stable-Baselines3
- Experiment tracking with datetime+UUID directories
- Configurable hyperparameters
- Automatic model saving and metadata logging
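One way to implement datetime+UUID run directories is sketched below; the helper name `make_run_dir` and the exact naming scheme are assumptions, not necessarily what `train.py` does:

```python
import uuid
from datetime import datetime
from pathlib import Path

def make_run_dir(base="runs"):
    """Create a collision-free experiment directory, e.g. runs/20240101_120000_a1b2c3d4."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    run_dir = Path(base) / f"{stamp}_{uuid.uuid4().hex[:8]}"
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir
```

The timestamp keeps runs sortable by start time, while the UUID suffix avoids collisions when several runs launch within the same second.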
- Episode trajectory plots
- Action distribution analysis
- Reward comparison (trained vs random)
- Greeks vs actions scatter plots
```bash
pip install -r requirements.txt
```

```bash
# Default training (20k steps, 60-day episodes)
python train.py

# Custom training
python train.py --timesteps 50000 --episode-length 90

# Visualization
python visualize.py --episodes 20

# Run modules directly
python env.py
python bsm.py
```

Observation features (per step):
- spot_normalized: Stock price / initial price
- time_to_expiry: Years remaining
- delta: Option delta
- gamma: Option gamma
- vega: Option vega (scaled)
- theta: Option theta (scaled)
- position: Current position (-1, 0, +1)
- pnl_normalized: Unrealized PnL / initial cash
- regime: Market regime (-1 = bear, +1 = bull)

Actions:
- 0: BUY (go long)
- 1: HOLD (do nothing)
- 2: SELL (go short)
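Putting the observation features together, the agent sees a flat 9-element vector each step. The feature order and values below are purely illustrative (see `env.py` for the authoritative layout):

```python
import numpy as np

obs = np.array([
    1.02,    # spot_normalized: stock price / initial price
    0.20,    # time_to_expiry: years remaining
    0.64,    # delta
    0.03,    # gamma
    0.37,    # vega (scaled)
    -0.02,   # theta (scaled)
    1.0,     # position: -1 short, 0 flat, +1 long
    0.015,   # pnl_normalized: unrealized PnL / initial cash
    1.0,     # regime: -1 = bear, +1 = bull
], dtype=np.float32)
```

Normalizing the spot and PnL keeps all features on comparable scales, which helps PPO's value and policy networks train stably.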
- Bull market: +30% annual drift
- Bear market: -30% annual drift
- 3% daily probability of regime switch
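The regime process above is a simple two-state Markov chain. A sketch under those stated parameters (function names are illustrative):

```python
import numpy as np

def step_regime(regime, rng, p_switch=0.03):
    """Each day, flip bull (+1) <-> bear (-1) with 3% probability."""
    return -regime if rng.random() < p_switch else regime

def annual_drift(regime):
    """+30% annual drift in a bull regime, -30% in a bear regime."""
    return 0.30 * regime

rng = np.random.default_rng(42)
regime = 1                 # start in a bull market
path = [regime]
for _ in range(252):       # one trading year
    regime = step_regime(regime, rng)
    path.append(regime)
```

With a 3% daily switch probability, regimes last about 33 trading days on average, long enough for the agent to detect and exploit the prevailing drift within a 60-day episode.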
The trained agent learns regime-conditional behavior:
- Bull market: 100% BUY actions
- Bear market: 100% SELL actions
Typical improvement over random agent: +10-25% in average reward.
- Python 3.10+
- PyTorch (differentiable pricing)
- Gymnasium (RL environment)
- Stable-Baselines3 (PPO algorithm)
- NumPy, Matplotlib
- Stochastic volatility (IV dynamics)
- Multi-leg strategies (spreads, straddles)
- Delta hedging with underlying
- Multi-asset portfolios
- Real market data integration
MIT License