Skip to content

Teamial/CineaMate

Repository files navigation

Multi-Armed Bandit A/B Testing Platform

Overview

A complete multi-armed bandit experimentation platform with three policies (Thompson Sampling, ε-greedy, UCB1), offline replay using MovieLens 1M data, and production-grade dashboard for monitoring CTR, fairness, and guardrails.

Features

🎯 Bandit Policies

  • Thompson Sampling: Bayesian approach with Beta-Bernoulli conjugate priors
  • ε-greedy: Configurable exploration rate with propensity scoring
  • UCB1: Upper Confidence Bound with cold-start handling

📊 Offline Replay

  • MovieLens 1M dataset ingestion and processing
  • 14-day optimal window selection
  • Policy performance comparison with IPS/DR estimates
  • Regret analysis and temporal stability metrics

🚀 Production Features

  • Real-time guardrails monitoring
  • Automatic rollback on failures
  • Statistical decision engine (ship/iterate/kill)
  • Traffic allocation and user assignment
  • Reward calculation with 24h window

📈 Dashboard

  • Real-time experiment monitoring
  • Policy performance comparison
  • Cohort analysis and fairness metrics
  • Latency distribution and SLA monitoring
  • Event logs with CSV export

Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Frontend      │    │   Backend       │    │   Database      │
│   (React)       │    │   (FastAPI)     │    │   (PostgreSQL)  │
├─────────────────┤    ├─────────────────┤    ├─────────────────┤
│ • Dashboard     │    │ • Policies      │    │ • Experiments  │
│ • Charts        │    │ • APIs          │    │ • Events        │
│ • Tables        │    │ • Workers       │    │ • States        │
│ • Export        │    │ • Scheduler     │    │ • Assignments   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                    ┌─────────────────┐
                    │   Redis Cache   │
                    │   (Optional)    │
                    └─────────────────┘

Quick Start

1. Database Setup

# Run migrations
python backend/migrate_add_bandit_experiment.py

# Verify tables created
psql -d movierecs -c "\dt"

2. Offline Replay

# Load MovieLens 1M data
python tools/load_movielens.py

# Select optimal replay window
python tools/select_replay_window.py

# Run offline simulation
python tools/offline_replay.py

# Evaluate results
python tools/offline_evaluator.py

3. Launch Online Experiment

# Launch experiment
python scripts/launch_online_experiment.py \
  --name "Bandit Test v1" \
  --duration 14 \
  --traffic 0.8

# Monitor in dashboard
open http://localhost:3000/experiments/{experiment_id}

API Endpoints

Experiments

  • POST /api/experiments - Create experiment
  • GET /api/experiments/{id} - Get experiment details
  • PATCH /api/experiments/{id} - Update experiment
  • POST /api/experiments/{id}/stop - Stop experiment

Analytics

  • GET /api/experiments/{id}/summary - Experiment summary
  • GET /api/experiments/{id}/timeseries - Time series data
  • GET /api/experiments/{id}/arms - Arm performance
  • GET /api/experiments/{id}/cohorts - Cohort analysis
  • GET /api/experiments/{id}/events - Event logs
  • GET /api/experiments/{id}/export - CSV export

Recommendations

  • GET /api/movies/recommendations?experiment_id={id} - Get recommendations

Configuration

Guardrails

guardrails = {
    'error_rate': 0.01,      # 1%
    'latency_p95': 120,      # 120ms
    'arm_concentration': 0.50,  # 50%
    'reward_drop': 0.05      # 5%
}

Decision Criteria

criteria = {
    'min_uplift': 0.03,      # 3%
    'min_confidence': 0.95,  # 95%
    'min_window_days': 7,    # 7 days
    'max_experiment_days': 14,  # 14 days
    'min_events_per_policy': 1000,  # 1000 events
    'significance_level': 0.05  # p < 0.05
}

Testing

Unit Tests

pytest backend/tests/test_policies.py -v
pytest backend/tests/test_reward_calculator.py -v
pytest backend/tests/test_experiment_manager.py -v

Property Tests

pytest backend/tests/property_tests.py -v
hypothesis --show-statistics backend/tests/property_tests.py

Integration Tests

pytest backend/tests/test_integration.py -v
pytest backend/tests/test_offline_replay.py -v
pytest backend/tests/test_api_experiments.py -v

Monitoring

Dashboard Components

  • SummaryCards: Traffic split, active users, serves, rewards
  • RewardChart: Cumulative reward curves per policy
  • CohortBreakdown: CTR by user type and fairness
  • ArmPerformance: Top arms with anomaly detection
  • LatencyDistribution: P95 latency and SLA compliance
  • EventLog: Paginated events with CSV export
  • GuardrailStatus: Real-time monitoring and rollback

Key Metrics

  • CTR: Click-through rate on recommendations
  • Reward Rate: Percentage of serves with positive reward
  • Regret: Gap vs. best policy performance
  • Latency: P95 response time < 120ms
  • Fairness: Coefficient of variation across cohorts

Offline Replay Results

MovieLens 1M Analysis

  • Dataset: 1M ratings, 6K movies, 4K users
  • Window: 14-day optimal period selected
  • Policies: Thompson Sampling, ε-greedy, UCB1
  • Metrics: CTR, regret, IPS/DR estimates

Performance Comparison

Policy CTR Regret IPS Estimate DR Estimate
Thompson 0.342 0.023 0.338 0.341
ε-greedy 0.318 0.045 0.315 0.318
UCB1 0.329 0.034 0.326 0.329

Key Insights

  • Thompson Sampling achieved highest CTR (34.2%)
  • ε-greedy showed highest regret due to exploration
  • UCB1 provided good balance of exploration/exploitation
  • All policies converged to optimal arms over time

Production Deployment

Prerequisites

  • PostgreSQL 12+
  • Redis 6+ (optional, for caching)
  • Python 3.8+
  • Node.js 16+ (for frontend)

Environment Variables

DATABASE_URL=postgresql://user:pass@localhost/movierecs
REDIS_URL=redis://localhost:6379/0
BANDIT_DASHBOARD_ENABLED=true

Deployment Steps

  1. Database: Run migrations and verify tables
  2. Backend: Deploy FastAPI application
  3. Frontend: Build and deploy React dashboard
  4. Workers: Start background workers for rewards
  5. Scheduler: Start guardrails and decision jobs
  6. Monitoring: Set up alerts and dashboards

Scaling Considerations

  • Database: Use read replicas for analytics queries
  • Cache: Redis for policy assignments and state
  • Workers: Scale horizontally for reward processing
  • API: Use load balancer for high availability

Troubleshooting

Common Issues

Policy Not Learning

  • Check reward calculation logic
  • Verify policy state updates
  • Ensure sufficient exploration time

High Latency

  • Check database query performance
  • Verify Redis cache hit rates
  • Monitor policy selection time

Guardrail Failures

  • Review threshold configurations
  • Check for data quality issues
  • Verify experiment traffic allocation

Dashboard Not Loading

  • Check API endpoint availability
  • Verify authentication and permissions
  • Review browser console for errors

Debug Commands

# Check experiment status
python -c "from backend.ml.experiment_manager import ExperimentManager; print(ExperimentManager(SessionLocal()).get_experiment_status('exp-id'))"

# Test policy selection
python -c "from backend.ml.policies import get_policy; policy = get_policy('thompson', SessionLocal()); print(policy.select({}, ['arm1', 'arm2']))"

# Check guardrails
python -c "from backend.ml.guardrails import check_experiment_guardrails; print(check_experiment_guardrails(SessionLocal(), 'exp-id'))"

Contributing

Development Setup

# Install dependencies
pip install -r requirements.txt
npm install

# Run tests
pytest backend/tests/
npm test

# Run linting
flake8 backend/
eslint frontend/src/

Code Style

  • Python: PEP 8, Black formatting
  • JavaScript: ESLint, Prettier
  • Tests: pytest, Jest
  • Documentation: Markdown, docstrings

License

MIT License - see LICENSE file for details.

Support

For questions and support:

  • Create an issue in the repository
  • Check the troubleshooting section
  • Review the API documentation
  • Contact the development team

Built with ❤️ for data-driven experimentation

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published