A complete multi-armed bandit experimentation platform with three policies (Thompson Sampling, ε-greedy, UCB1), offline replay on the MovieLens 1M dataset, and a production-grade dashboard for monitoring CTR, fairness, and guardrails.
- Thompson Sampling: Bayesian approach with Beta-Bernoulli conjugate priors (sketched after this list)
- ε-greedy: Configurable exploration rate with propensity scoring
- UCB1: Upper Confidence Bound with cold-start handling
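A minimal sketch of the Beta-Bernoulli machinery behind the Thompson Sampling policy (illustrative only; the project's policies additionally take a request context and persist their state):

import random

class ThompsonSamplingSketch:
    """Beta-Bernoulli Thompson Sampling: one Beta(alpha, beta) posterior per arm."""

    def __init__(self, arms):
        # Beta(1, 1) is a uniform prior over each arm's unknown click probability
        self.state = {arm: {"alpha": 1.0, "beta": 1.0} for arm in arms}

    def select(self):
        # Sample a plausible CTR from each posterior and play the best sample
        draws = {arm: random.betavariate(s["alpha"], s["beta"])
                 for arm, s in self.state.items()}
        return max(draws, key=draws.get)

    def update(self, arm, reward):
        # A Bernoulli reward (1 = click, 0 = no click) updates the conjugate posterior
        self.state[arm]["alpha"] += reward
        self.state[arm]["beta"] += 1 - reward

Exploration falls out automatically: arms with little data have wide posteriors and occasionally produce the winning sample.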
- MovieLens 1M dataset ingestion and processing
- Optimal 14-day replay window selection
- Policy performance comparison with IPS/DR estimates (see the IPS sketch after this list)
- Regret analysis and temporal stability metrics
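The IPS estimate reweights logged rewards by how the evaluated policy differs from the logging policy. A minimal sketch, assuming each logged event carries the chosen arm, its reward, and the logging propensity (the field names and target_prob callable are assumptions, not the project's schema):

def ips_estimate(events, target_prob):
    """Inverse propensity scoring over logged bandit events.

    target_prob(context, arm) is a hypothetical callable returning the
    probability that the evaluated policy picks `arm` in `context`.
    """
    if not events:
        return 0.0
    total = 0.0
    for e in events:
        # Upweight events the target policy would choose more often than the logger did
        weight = target_prob(e["context"], e["arm"]) / e["propensity"]
        total += weight * e["reward"]
    return total / len(events)

The doubly robust (DR) estimate additionally subtracts a learned reward model as a baseline, which reduces the variance of these importance weights.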
- Real-time guardrails monitoring
- Automatic rollback on failures
- Statistical decision engine (ship/iterate/kill)
- Traffic allocation and user assignment (see the bucketing sketch after this list)
- Reward calculation with 24h window
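Stable user assignment is typically done by hashing the user and experiment IDs into a uniform bucket, so a user keeps the same side of the traffic split across sessions. A minimal sketch (function and bucket names are assumptions):

import hashlib

def assign_bucket(user_id: str, experiment_id: str, traffic: float = 0.8) -> str:
    """Deterministically map a user into the experiment or the holdout."""
    # Salting with the experiment ID keeps assignments independent across experiments
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform in [0, 1]
    return "experiment" if bucket < traffic else "holdout"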
- Real-time experiment monitoring
- Policy performance comparison
- Cohort analysis and fairness metrics
- Latency distribution and SLA monitoring
- Event logs with CSV export
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Frontend │ │ Backend │ │ Database │
│ (React) │ │ (FastAPI) │ │ (PostgreSQL) │
├─────────────────┤ ├─────────────────┤ ├─────────────────┤
│ • Dashboard │ │ • Policies │ │ • Experiments │
│ • Charts │ │ • APIs │ │ • Events │
│ • Tables │ │ • Workers │ │ • States │
│ • Export │ │ • Scheduler │ │ • Assignments │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
└───────────────────────┼───────────────────────┘
│
┌─────────────────┐
│ Redis Cache │
│ (Optional) │
└─────────────────┘
# Run migrations
python backend/migrate_add_bandit_experiment.py
# Verify tables created
psql -d movierecs -c "\dt"

# Load MovieLens 1M data
python tools/load_movielens.py
# Select optimal replay window
python tools/select_replay_window.py
# Run offline simulation
python tools/offline_replay.py
# Evaluate results
python tools/offline_evaluator.py

# Launch experiment
python scripts/launch_online_experiment.py \
--name "Bandit Test v1" \
--duration 14 \
--traffic 0.8
# Monitor in dashboard
open http://localhost:3000/experiments/{experiment_id}

POST /api/experiments - Create experiment
GET /api/experiments/{id} - Get experiment details
PATCH /api/experiments/{id} - Update experiment
POST /api/experiments/{id}/stop - Stop experiment
GET /api/experiments/{id}/summary - Experiment summary
GET /api/experiments/{id}/timeseries - Time series data
GET /api/experiments/{id}/arms - Arm performance
GET /api/experiments/{id}/cohorts - Cohort analysis
GET /api/experiments/{id}/events - Event logs
GET /api/experiments/{id}/export - CSV export
GET /api/movies/recommendations?experiment_id={id} - Get recommendations
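For example, creating and then polling an experiment from Python might look like this (a sketch using the requests library; port 8000 and the payload fields are assumptions, not the actual schema):

import requests

BASE = "http://localhost:8000"

# Create an experiment (field names here are illustrative)
resp = requests.post(f"{BASE}/api/experiments", json={
    "name": "Bandit Test v1",
    "duration_days": 14,
    "traffic": 0.8,
})
resp.raise_for_status()
experiment_id = resp.json()["id"]

# Poll the summary endpoint
summary = requests.get(f"{BASE}/api/experiments/{experiment_id}/summary").json()
print(summary)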
guardrails = {
'error_rate': 0.01, # 1%
'latency_p95': 120, # 120ms
'arm_concentration': 0.50, # 50%
'reward_drop': 0.05 # 5%
}

criteria = {
'min_uplift': 0.03, # 3%
'min_confidence': 0.95, # 95%
'min_window_days': 7, # 7 days
'max_experiment_days': 14, # 14 days
'min_events_per_policy': 1000, # 1000 events
'significance_level': 0.05 # p < 0.05
}
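A minimal sketch of how a ship/iterate/kill rule might consume these thresholds (the stats fields are assumed inputs, not the real engine's interface):

def decide(stats, criteria):
    """Sketch of a ship/iterate/kill rule over experiment statistics."""
    enough_data = (stats["days_running"] >= criteria["min_window_days"]
                   and stats["min_events"] >= criteria["min_events_per_policy"])
    significant = stats["p_value"] < criteria["significance_level"]

    if enough_data and significant and stats["uplift"] >= criteria["min_uplift"]:
        return "ship"
    if stats["days_running"] >= criteria["max_experiment_days"]:
        # Out of time without a clear winner: kill rather than run indefinitely
        return "kill" if stats["uplift"] <= 0 else "iterate"
    return "iterate"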
pytest backend/tests/test_policies.py -v
pytest backend/tests/test_reward_calculator.py -v
pytest backend/tests/test_experiment_manager.py -v

pytest backend/tests/property_tests.py -v
pytest backend/tests/property_tests.py --hypothesis-show-statistics

pytest backend/tests/test_integration.py -v
pytest backend/tests/test_offline_replay.py -v
pytest backend/tests/test_api_experiments.py -v

- SummaryCards: Traffic split, active users, serves, rewards
- RewardChart: Cumulative reward curves per policy
- CohortBreakdown: CTR by user type and fairness
- ArmPerformance: Top arms with anomaly detection
- LatencyDistribution: P95 latency and SLA compliance
- EventLog: Paginated events with CSV export
- GuardrailStatus: Real-time monitoring and rollback
- CTR: Click-through rate on recommendations
- Reward Rate: Percentage of serves with positive reward
- Regret: Gap vs. best policy performance
- Latency: P95 response time < 120ms
- Fairness: Coefficient of variation of CTR across cohorts (see the sketch below)
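A low coefficient of variation means no cohort's CTR lags far behind the others. A minimal sketch of the computation (cohort names are placeholders):

from statistics import mean, pstdev

def fairness_cv(cohort_ctrs: dict) -> float:
    """Coefficient of variation of CTR across cohorts; lower is fairer."""
    values = list(cohort_ctrs.values())
    return pstdev(values) / mean(values)

# e.g. fairness_cv({"new_users": 0.31, "casual": 0.33, "power_users": 0.35})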
- Dataset: 1M ratings, ~4K movies, ~6K users
- Window: 14-day optimal period selected
- Policies: Thompson Sampling, ε-greedy, UCB1
- Metrics: CTR, regret, IPS/DR estimates
| Policy | CTR | Regret | IPS Estimate | DR Estimate |
|---|---|---|---|---|
| Thompson | 0.342 | 0.023 | 0.338 | 0.341 |
| ε-greedy | 0.318 | 0.045 | 0.315 | 0.318 |
| UCB1 | 0.329 | 0.034 | 0.326 | 0.329 |
- Thompson Sampling achieved the highest CTR (34.2%)
- ε-greedy showed the highest regret due to its fixed exploration rate
- UCB1 provided a good balance of exploration and exploitation
- All policies converged to the optimal arms over time
- PostgreSQL 12+
- Redis 6+ (optional, for caching)
- Python 3.8+
- Node.js 16+ (for frontend)
DATABASE_URL=postgresql://user:pass@localhost/movierecs
REDIS_URL=redis://localhost:6379/0
BANDIT_DASHBOARD_ENABLED=true

- Database: Run migrations and verify tables
- Backend: Deploy FastAPI application
- Frontend: Build and deploy React dashboard
- Workers: Start background workers for rewards
- Scheduler: Start guardrails and decision jobs
- Monitoring: Set up alerts and dashboards
- Database: Use read replicas for analytics queries
- Cache: Redis for policy assignments and state (see the sketch after this list)
- Workers: Scale horizontally for reward processing
- API: Use load balancer for high availability
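As a sketch of the caching pattern referenced above (key naming, TTL, and the DB loader are assumptions):

import json
import redis

r = redis.Redis.from_url("redis://localhost:6379/0")

def load_policy_state(experiment_id: str, fetch_from_db):
    """Read policy state from Redis, falling back to the database on a miss."""
    key = f"bandit:state:{experiment_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    state = fetch_from_db(experiment_id)   # hypothetical DB loader
    r.set(key, json.dumps(state), ex=60)   # short TTL keeps worker updates visible
    return state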
- Check reward calculation logic
- Verify policy state updates
- Ensure sufficient exploration time
- Check database query performance
- Verify Redis cache hit rates
- Monitor policy selection time
- Review threshold configurations
- Check for data quality issues
- Verify experiment traffic allocation
- Check API endpoint availability
- Verify authentication and permissions
- Review browser console for errors
# Check experiment status
# (these one-liners assume SessionLocal is importable from the project's database module)
python -c "from backend.ml.experiment_manager import ExperimentManager; print(ExperimentManager(SessionLocal()).get_experiment_status('exp-id'))"
# Test policy selection
python -c "from backend.ml.policies import get_policy; policy = get_policy('thompson', SessionLocal()); print(policy.select({}, ['arm1', 'arm2']))"
# Check guardrails
python -c "from backend.ml.guardrails import check_experiment_guardrails; print(check_experiment_guardrails(SessionLocal(), 'exp-id'))"# Install dependencies
pip install -r requirements.txt
npm install
# Run tests
pytest backend/tests/
npm test
# Run linting
flake8 backend/
eslint frontend/src/

- Python: PEP 8, Black formatting
- JavaScript: ESLint, Prettier
- Tests: pytest, Jest
- Documentation: Markdown, docstrings
MIT License - see LICENSE file for details.
For questions and support:
- Create an issue in the repository
- Check the troubleshooting section
- Review the API documentation
- Contact the development team
Built with ❤️ for data-driven experimentation