Multi-Armed Bandit A/B Testing Platform

Overview

A complete multi-armed bandit experimentation platform with three policies (Thompson Sampling, ε-greedy, UCB1), offline replay on MovieLens 1M data, and a production-grade dashboard for monitoring CTR, fairness, and guardrails.

Features

🎯 Bandit Policies

Thompson Sampling: Bayesian approach with Beta-Bernoulli conjugate priors
ε-greedy: Configurable exploration rate with propensity scoring
UCB1: Upper Confidence Bound with cold-start handling
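
As a rough illustration of how the three policies pick an arm, the sketch below implements each selection rule over simple per-arm success/failure counts. It is not the code in backend/ml/policies.py: the `ArmStats` class, the function names, the uniform Beta(1, 1) prior, and the default `epsilon` are all assumptions made for this example.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ArmStats:
    """Hypothetical per-arm counters (not the project's schema)."""
    successes: int = 0
    failures: int = 0

    @property
    def pulls(self) -> int:
        return self.successes + self.failures

    @property
    def mean(self) -> float:
        return self.successes / self.pulls if self.pulls else 0.0


def thompson_select(stats: dict, rng: random.Random) -> str:
    """Sample each arm's Beta(1 + successes, 1 + failures) posterior; pick the max."""
    samples = {
        arm: rng.betavariate(1 + s.successes, 1 + s.failures)
        for arm, s in stats.items()
    }
    return max(samples, key=samples.get)


def epsilon_greedy_select(stats: dict, rng: random.Random, epsilon: float = 0.1):
    """Return (arm, propensity), where propensity is P(arm chosen) under this policy."""
    arms = list(stats)
    greedy = max(arms, key=lambda a: stats[a].mean)
    arm = rng.choice(arms) if rng.random() < epsilon else greedy
    # Every arm gets epsilon / K from exploration; the greedy arm also gets 1 - epsilon.
    propensity = epsilon / len(arms) + (1 - epsilon if arm == greedy else 0.0)
    return arm, propensity


def ucb1_select(stats: dict) -> str:
    """Play each unpulled arm once (cold start), then maximize mean + confidence bonus."""
    for arm, s in stats.items():
        if s.pulls == 0:
            return arm
    total = sum(s.pulls for s in stats.values())
    return max(
        stats,
        key=lambda a: stats[a].mean + math.sqrt(2 * math.log(total) / stats[a].pulls),
    )
```

The propensity returned by the ε-greedy rule is what an IPS-style offline estimator needs later, which is why it is computed at selection time rather than reconstructed afterwards.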

📊 Offline Replay

MovieLens 1M dataset ingestion and processing
14-day optimal window selection
Policy performance comparison with IPS/DR estimates
Regret analysis and temporal stability metrics
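
For readers unfamiliar with the IPS/DR estimates mentioned above, here is a minimal sketch of both estimators over logged events. The `LoggedEvent` schema, the `target_prob` and `reward_model` callables, and the function names are hypothetical; the project's offline evaluator may be organized differently.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class LoggedEvent:
    """One replayed interaction (hypothetical schema for this sketch)."""
    context: dict
    arm: str           # arm the logging policy actually served
    reward: float      # observed reward, e.g. 1.0 for a click
    propensity: float  # P(arm | context) under the logging policy


def ips_estimate(
    events: Sequence[LoggedEvent],
    target_prob: Callable[[dict, str], float],
) -> float:
    """Inverse propensity scoring: mean of reward * pi_target / pi_logging."""
    total = sum(
        e.reward * target_prob(e.context, e.arm) / e.propensity for e in events
    )
    return total / len(events)


def dr_estimate(
    events: Sequence[LoggedEvent],
    target_prob: Callable[[dict, str], float],
    reward_model: Callable[[dict, str], float],
    arms: Sequence[str],
) -> float:
    """Doubly robust: model-based value plus an IPS-weighted correction of its residual."""
    total = 0.0
    for e in events:
        # Direct-method term: modelled reward averaged over the target policy.
        dm = sum(target_prob(e.context, a) * reward_model(e.context, a) for a in arms)
        weight = target_prob(e.context, e.arm) / e.propensity
        total += dm + weight * (e.reward - reward_model(e.context, e.arm))
    return total / len(events)
```

The DR estimate stays consistent if either the logged propensities or the reward model is accurate, which is the usual reason to report it next to plain IPS.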

🚀 Production Features

Real-time guardrails monitoring
Automatic rollback on failures
Statistical decision engine (ship/iterate/kill)
Traffic allocation and user assignment
Reward calculation with 24h window
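
Traffic allocation and the 24h reward window can be pictured with the sketch below. It assumes hash-based bucketing and a binary click reward; neither is confirmed by this README, and `assign_user` and `reward_for` are hypothetical names rather than the platform's API.

```python
import hashlib
from datetime import datetime, timedelta
from typing import Optional, Sequence

REWARD_WINDOW = timedelta(hours=24)


def assign_user(
    user_id: str, experiment_id: str, traffic: float, policies: Sequence[str]
) -> Optional[str]:
    """Deterministically map a user to a policy, or None if outside the experiment."""
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform in [0, 1)
    if bucket >= traffic:
        return None  # user is not enrolled
    # Split the enrolled range [0, traffic) evenly across policies.
    index = min(int(bucket / traffic * len(policies)), len(policies) - 1)
    return policies[index]


def reward_for(served_at: datetime, clicked_at: Optional[datetime]) -> float:
    """Return 1.0 if a click landed within 24 hours of the serve, else 0.0."""
    if clicked_at is None:
        return 0.0
    elapsed = clicked_at - served_at
    return 1.0 if timedelta(0) <= elapsed <= REWARD_WINDOW else 0.0
```

Hashing the experiment ID together with the user ID keeps a user's assignment stable within one experiment while letting different experiments bucket the same user independently.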

📈 Dashboard

Real-time experiment monitoring
Policy performance comparison
Cohort analysis and fairness metrics
Latency distribution and SLA monitoring
Event logs with CSV export

Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │   Backend       │    │   Database      │
│   (React)       │    │   (FastAPI)     │    │   (PostgreSQL)  │
├─────────────────┤    ├─────────────────┤    ├─────────────────┤
│ • Dashboard     │    │ • Policies      │    │ • Experiments   │
│ • Charts        │    │ • APIs          │    │ • Events        │
│ • Tables        │    │ • Workers       │    │ • States        │
│ • Export        │    │ • Scheduler     │    │ • Assignments   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                    ┌─────────────────┐
                    │   Redis Cache   │
                    │   (Optional)    │
                    └─────────────────┘

Quick Start

1. Database Setup

# Run migrations
python backend/migrate_add_bandit_experiment.py

# Verify tables created
psql -d movierecs -c "\dt"

2. Offline Replay

# Load MovieLens 1M data
python tools/load_movielens.py

# Select optimal replay window
python tools/select_replay_window.py

# Run offline simulation
python tools/offline_replay.py

# Evaluate results
python tools/offline_evaluator.py

3. Launch Online Experiment

# Launch experiment
python scripts/launch_online_experiment.py \
  --name "Bandit Test v1" \
  --duration 14 \
  --traffic 0.8

# Monitor in dashboard (replace {experiment_id} with the launched experiment's ID; on Linux use xdg-open instead of open)
open http://localhost:3000/experiments/{experiment_id}

API Endpoints

Experiments

POST /api/experiments - Create experiment
GET /api/experiments/{id} - Get experiment details
PATCH /api/experiments/{id} - Update experiment
POST /api/experiments/{id}/stop - Stop experiment

Analytics

GET /api/experiments/{id}/summary - Experiment summary
GET /api/experiments/{id}/timeseries - Time series data
GET /api/experiments/{id}/arms - Arm performance
GET /api/experiments/{id}/cohorts - Cohort analysis
GET /api/experiments/{id}/events - Event logs
GET /api/experiments/{id}/export - CSV export

Recommendations

GET /api/movies/recommendations?experiment_id={id} - Get recommendations
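
A small client sketch for the read-only endpoints listed above, using only the standard library. The base URL is an assumption (the README does not state where the backend listens), the helper names are hypothetical, and the response is assumed to be JSON.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed backend address; adjust as needed


def experiment_url(experiment_id: str, resource: str = "") -> str:
    """Build /api/experiments/{id}[/resource] with the ID safely quoted."""
    path = f"/api/experiments/{urllib.parse.quote(experiment_id, safe='')}"
    if resource:
        path += f"/{resource}"
    return BASE_URL + path


def recommendations_url(experiment_id: str) -> str:
    """Build the recommendations URL with experiment_id as a query parameter."""
    query = urllib.parse.urlencode({"experiment_id": experiment_id})
    return f"{BASE_URL}/api/movies/recommendations?{query}"


def get_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the JSON body (assumes the API returns JSON)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With a backend running, `get_json(experiment_url("exp-id", "summary"))` would fetch the experiment summary; the shape of the returned payload is not documented here.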

Configuration

Guardrails

guardrails = {
    'error_rate': 0.01,      # 1%
    'latency_p95': 120,      # 120ms
    'arm_concentration': 0.50,  # 50%
    'reward_drop': 0.05      # 5%
}
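
One way such thresholds might be applied is sketched below. It assumes every guardrail is an upper bound on an observed metric with the same key and unit; the `violated_guardrails` helper is hypothetical and is not the project's `check_experiment_guardrails`.

```python
GUARDRAILS = {
    "error_rate": 0.01,         # fraction of failed requests
    "latency_p95": 120,         # milliseconds
    "arm_concentration": 0.50,  # fraction (interpretation assumed)
    "reward_drop": 0.05,        # fraction (interpretation assumed)
}


def violated_guardrails(metrics: dict, thresholds: dict = GUARDRAILS) -> list:
    """Return the names of guardrails whose observed value exceeds its threshold.

    A metric missing from `metrics` is skipped rather than treated as a failure.
    """
    return [
        name
        for name, limit in thresholds.items()
        if name in metrics and metrics[name] > limit
    ]
```

A non-empty result would be the natural trigger for the automatic rollback described under Production Features, although the actual rollback logic is not shown in this README.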

Decision Criteria

criteria = {
    'min_uplift': 0.03,      # 3%
    'min_confidence': 0.95,  # 95%
    'min_window_days': 7,    # 7 days
    'max_experiment_days': 14,  # 14 days
    'min_events_per_policy': 1000,  # 1000 events
    'significance_level': 0.05  # p < 0.05
}
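
The sketch below shows one plausible mapping from these criteria to a ship/iterate/kill outcome. The README does not spell out the decision rules, so the branching here is an assumption, as are the `decide` function and its inputs. Only `significance_level` is used for the statistical test, since `min_confidence` of 0.95 expresses the same bar.

```python
CRITERIA = {
    "min_uplift": 0.03,
    "min_confidence": 0.95,
    "min_window_days": 7,
    "max_experiment_days": 14,
    "min_events_per_policy": 1000,
    "significance_level": 0.05,
}


def decide(
    uplift: float,
    p_value: float,
    days_running: int,
    events_per_policy: dict,
    criteria: dict = CRITERIA,
) -> str:
    """Return 'ship', 'iterate', or 'kill' (assumed rules, for illustration only)."""
    enough_data = (
        days_running >= criteria["min_window_days"]
        and min(events_per_policy.values()) >= criteria["min_events_per_policy"]
    )
    significant = p_value < criteria["significance_level"]
    if enough_data and significant and uplift >= criteria["min_uplift"]:
        return "ship"
    if enough_data and significant and uplift < 0:
        return "kill"  # significantly worse than baseline
    if days_running >= criteria["max_experiment_days"]:
        return "kill"  # time budget exhausted without a shippable result
    return "iterate"   # keep collecting data
```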

Testing

Unit Tests

pytest backend/tests/test_policies.py -v
pytest backend/tests/test_reward_calculator.py -v
pytest backend/tests/test_experiment_manager.py -v

Property Tests

pytest backend/tests/property_tests.py -v
pytest backend/tests/property_tests.py --hypothesis-show-statistics

Integration Tests

pytest backend/tests/test_integration.py -v
pytest backend/tests/test_offline_replay.py -v
pytest backend/tests/test_api_experiments.py -v

Monitoring

Dashboard Components

SummaryCards: Traffic split, active users, serves, rewards
RewardChart: Cumulative reward curves per policy
CohortBreakdown: CTR by user type and fairness
ArmPerformance: Top arms with anomaly detection
LatencyDistribution: P95 latency and SLA compliance
EventLog: Paginated events with CSV export
GuardrailStatus: Real-time monitoring and rollback

Key Metrics

CTR: Click-through rate on recommendations
Reward Rate: Percentage of serves with positive reward
Regret: Gap vs. best policy performance
Latency: P95 response time (target: under 120 ms)
Fairness: Coefficient of variation across cohorts
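
Two of these metrics are easy to pin down in code. The sketch below computes the coefficient of variation across cohort CTRs and a nearest-rank P95; the function names and the choice of population standard deviation and nearest-rank percentile are assumptions, since the README does not say which variants the dashboard uses.

```python
import statistics


def coefficient_of_variation(ctr_by_cohort: dict) -> float:
    """Population standard deviation of cohort CTRs divided by their mean."""
    values = list(ctr_by_cohort.values())
    mean = statistics.fmean(values)
    if mean == 0:
        return 0.0
    return statistics.pstdev(values) / mean


def p95(latencies_ms: list) -> float:
    """Nearest-rank 95th percentile: smallest sample with >= 95% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]
```

A coefficient of variation of 0 means every cohort sees the same CTR; larger values indicate that some cohorts are served noticeably better than others.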

Offline Replay Results

MovieLens 1M Analysis

Dataset: 1M ratings, 4K movies, 6K users
Window: 14-day optimal period selected
Policies: Thompson Sampling, ε-greedy, UCB1
Metrics: CTR, regret, IPS/DR estimates

Performance Comparison

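| Policy   | CTR   | Regret | IPS Estimate | DR Estimate |
|----------|-------|--------|--------------|-------------|
| Thompson | 0.342 | 0.023  | 0.338        | 0.341       |
| ε-greedy | 0.318 | 0.045  | 0.315        | 0.318       |
| UCB1     | 0.329 | 0.034  | 0.326        | 0.329       |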

Key Insights

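Thompson Sampling achieved the highest CTR (34.2%) and the lowest regret (0.023)
ε-greedy showed the highest regret (0.045), attributed to its continued random exploration
UCB1 landed between the two on both CTR (32.9%) and regret (0.034), a middle ground between exploration and exploitation
All policies converged to the optimal arms over time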
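
To put the CTR gaps in relative terms, the short calculation below takes the CTR column from the comparison table and computes the best policy's relative uplift over each of the others (about 7.5% over ε-greedy and about 4.0% over UCB1). These are offline replay numbers, so the result is only illustrative and is not a substitute for the online decision criteria.

```python
# CTR values copied from the Performance Comparison table.
ctr = {"Thompson": 0.342, "ε-greedy": 0.318, "UCB1": 0.329}


def relative_uplift(treatment: float, baseline: float) -> float:
    """Relative improvement of treatment over baseline, e.g. 0.05 means +5%."""
    return (treatment - baseline) / baseline


best = max(ctr, key=ctr.get)
uplifts = {
    name: relative_uplift(ctr[best], value)
    for name, value in ctr.items()
    if name != best
}

for name, value in uplifts.items():
    print(f"{best} vs {name}: {value:+.1%}")
```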

Production Deployment

Prerequisites

PostgreSQL 12+
Redis 6+ (optional, for caching)
Python 3.8+
Node.js 16+ (for frontend)

Environment Variables

DATABASE_URL=postgresql://user:pass@localhost/movierecs
REDIS_URL=redis://localhost:6379/0
BANDIT_DASHBOARD_ENABLED=true
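
A minimal sketch of reading these variables at startup is shown below. The `Settings` class and `load_settings` function are hypothetical, and two behaviours are assumptions: only DATABASE_URL is treated as required (Redis is listed as optional above), and the dashboard flag defaults to off when unset.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: Optional[str]
    dashboard_enabled: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the three variables; only DATABASE_URL is treated as required."""
    if "DATABASE_URL" not in env:
        raise RuntimeError("DATABASE_URL must be set")
    flag = env.get("BANDIT_DASHBOARD_ENABLED", "false").strip().lower()
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env.get("REDIS_URL") or None,  # empty string counts as unset
        dashboard_enabled=flag in {"1", "true", "yes"},
    )
```

Passing the environment as a parameter keeps the function easy to exercise in tests without mutating `os.environ`.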

Deployment Steps

Database: Run migrations and verify tables
Backend: Deploy FastAPI application
Frontend: Build and deploy React dashboard
Workers: Start background workers for rewards
Scheduler: Start guardrails and decision jobs
Monitoring: Set up alerts and dashboards

Scaling Considerations

Database: Use read replicas for analytics queries
Cache: Redis for policy assignments and state
Workers: Scale horizontally for reward processing
API: Use load balancer for high availability

Troubleshooting

Common Issues

Policy Not Learning

Check reward calculation logic
Verify policy state updates
Ensure sufficient exploration time

High Latency

Check database query performance
Verify Redis cache hit rates
Monitor policy selection time

Guardrail Failures

Review threshold configurations
Check for data quality issues
Verify experiment traffic allocation

Dashboard Not Loading

Check API endpoint availability
Verify authentication and permissions
Review browser console for errors

Debug Commands

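# NOTE: each one-liner needs SessionLocal in scope. The import path backend.database
# is an assumption; adjust it to wherever the project defines SessionLocal.

# Check experiment status
python -c "from backend.database import SessionLocal; from backend.ml.experiment_manager import ExperimentManager; print(ExperimentManager(SessionLocal()).get_experiment_status('exp-id'))"

# Test policy selection
python -c "from backend.database import SessionLocal; from backend.ml.policies import get_policy; policy = get_policy('thompson', SessionLocal()); print(policy.select({}, ['arm1', 'arm2']))"

# Check guardrails
python -c "from backend.database import SessionLocal; from backend.ml.guardrails import check_experiment_guardrails; print(check_experiment_guardrails(SessionLocal(), 'exp-id'))"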

Contributing

Development Setup

# Install dependencies
pip install -r requirements.txt
npm install

# Run tests
pytest backend/tests/
npm test

# Run linting
flake8 backend/
eslint frontend/src/

Code Style

Python: PEP 8, Black formatting
JavaScript: ESLint, Prettier
Tests: pytest, Jest
Documentation: Markdown, docstrings

License

MIT License - see LICENSE file for details.

Support

For questions and support:

Create an issue in the repository
Check the troubleshooting section
Review the API documentation
Contact the development team

Built with ❤️ for data-driven experimentation
