Build complete applications using Claude across multiple autonomous sessions. Production-ready API-first architecture with modern Next.js web UI, database abstraction, and agent orchestration.
Architecture:
- 🏗️ API-First Platform: FastAPI REST API with WebSocket support
- 🎨 Next.js Web UI: Modern TypeScript/React interface with real-time updates
- 💾 Database Abstraction: Clean separation, PostgreSQL-ready
- 🎭 Agent Orchestrator: Decoupled session lifecycle management
- 🔌 Dual Access: Use via Web UI or CLI tools
Key Features:
- 🤖 Autonomous multi-session development
- 📊 Real-time progress monitoring via WebSocket
- 🔐 Secure blocklist approach for containerized deployment
- 📁 Hierarchical task management (epics → tasks → tests)
- 🎛️ Dual model strategy (Opus for planning, Sonnet for coding)
- 📝 Comprehensive session logging with human-readable durations
- ⚙️ YAML configuration file support
- 🛑 Graceful shutdown handling (Ctrl+C properly finalizes sessions)
Originally forked from Anthropic's autonomous coding demo, now evolved into YokeFlow with significant enhancements including API-first architecture, PostgreSQL database, agent orchestration, quality review system, and production-ready web interface.
Important: Version 1.1.0 includes database schema changes that are not backward compatible. If you are upgrading from v1.0.0:
- Export any projects you want to keep (the generated code in generations/ directory)
- Back up your database if you want to preserve v1.0.0 data for reference
- Drop and recreate the database:
docker-compose down -v # Remove volumes
docker-compose up -d # Start fresh PostgreSQL
python scripts/init_database.py --docker # Initialize schema
- Start fresh with v1.1.0
Why fresh install: Several tables were modified or removed to improve the platform. Migration scripts have been removed as most users will start fresh with this wider release.
Current Status: v1.1.0 - Production Ready (December 2025)
- ✅ PostgreSQL Migration: 100% complete, production-ready async architecture
- ✅ Docker Sandbox: Full integration with 90+ sessions validated
- ✅ API Foundation: REST endpoints, WebSocket support, orchestrator, JWT authentication
- ✅ Web UI v2.0: Production ready - Complete and polished interface
- ✅ Project creation with validation, initialization, and coding session control
- ✅ Real-time session monitoring with WebSocket updates
- ✅ Session logs viewer (Human/Events/Errors tabs) with download
- ✅ Task detail views with epic/task/test hierarchy and drill-down
- ✅ Quality dashboard with deep review recommendations
- ✅ Project completion banner and celebration UI
- ✅ JWT authentication with development mode
- ✅ Toast notifications and confirmation dialogs (no more alert boxes)
- ✅ Enhanced metrics (token breakdown, quality trends)
- ✅ CLI Tools: Fully functional for all operations
- ✅ Review System (4 Phases):
- ✅ Phase 1: Quick quality checks (zero-cost, every session)
- ✅ Phase 2: Automated deep reviews (every 5 sessions or quality < 7)
- ✅ Phase 3: Quality dashboard with collapsible reviews and download
- ✅ Phase 4: Prompt improvement analysis with single-project analysis
- 🎯 Next Steps: Complete pre-release testing, finalize documentation, make repository public
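The Phase 2 trigger listed above (every 5 sessions or quality < 7) can be sketched as a small predicate. This is an illustration only, not the code in review/: the function name, the constants, and the assumption that coding sessions are numbered from 1 are all hypothetical.

```python
from typing import Optional

# Values taken from the Phase 2 description above.
DEEP_REVIEW_INTERVAL = 5   # run a scheduled deep review every 5 sessions
QUALITY_THRESHOLD = 7      # a score below this triggers an early deep review


def needs_deep_review(session_number: int, quality_score: Optional[float]) -> bool:
    """Return True when a deep review should run after this session.

    Hypothetical helper: session_number counts coding sessions from 1, and
    quality_score is the Phase 1 quick-check score (None if unavailable).
    """
    if quality_score is not None and quality_score < QUALITY_THRESHOLD:
        return True
    return session_number > 0 and session_number % DEEP_REVIEW_INTERVAL == 0
```

A low score triggers a review immediately, so a degrading run does not have to wait for the next scheduled review.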
Note: This platform is production-ready. The Web UI provides full functionality for project management, monitoring, and quality analysis. Authentication, validation, and comprehensive testing ensure deployment readiness.
See TODO-FUTURE.md for post-release enhancements and CLAUDE.md for a comprehensive guide.
System Requirements:
- Node.js: Version 20 LTS or newer
- Python: Version 3.9 or newer
- Docker: For PostgreSQL database and sandboxing
- Git: For version control
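If you want to check the Node.js and Python requirements above from a script, a minimal sketch could look like the following. The helper names are hypothetical and not part of YokeFlow. The functions take the version text as input, so you would pass in the output of node --version yourself.

```python
import re
import sys


def major_version(version_text: str) -> int:
    """Extract the major version from strings such as 'v20.11.1' or '3.9.18'."""
    match = re.match(r"v?(\d+)", version_text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version_text!r}")
    return int(match.group(1))


def node_is_supported(node_version_output: str, minimum_major: int = 20) -> bool:
    """Check the text printed by `node --version` against the minimum (20)."""
    return major_version(node_version_output) >= minimum_major


def python_is_supported(version_info=sys.version_info) -> bool:
    """Check a (major, minor, ...) tuple against the Python 3.9 minimum."""
    return tuple(version_info[:2]) >= (3, 9)
```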
# Verify Node.js version (must be 20+)
node --version # Should show v20.x.x or newer
# Install Claude Code CLI
npm install -g @anthropic-ai/claude-code
# Install Python dependencies
pip install -r requirements.txt
# Install Next.js web UI dependencies
cd web-ui
npm install
cp .env.local.example .env.local # Configure web UI environment
cd ..
# Build MCP task manager server
cd mcp-task-manager
npm install
npm run build
cd ..
# Setup database
docker-compose up -d # Start PostgreSQL
python scripts/init_database.py --docker # Initialize schema
# Authenticate with Claude Code
claude setup-token
# Configure environment variables
cp .env.example .env
# Edit .env and set CLAUDE_CODE_OAUTH_TOKEN to your token from 'claude setup-token'
Use the production-ready web interface:
# Terminal 1: Start the API server
python api/start_api.py
# Or: uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
# Runs on http://localhost:8000
# Terminal 2: Start the Next.js web UI
cd web-ui
npm run dev
# Open http://localhost:3000
Authentication:
- Development Mode (default): No password required, auto-bypasses login
- Production Mode: Set UI_PASSWORD in .env file to enable JWT authentication
- See docs/authentication.md for details
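The mode switch described above can be summarized as a one-line check. This is a sketch under assumptions, not the API server's actual logic: the function name is hypothetical, and treating an empty or whitespace-only UI_PASSWORD as "not set" is a guess.

```python
from typing import Mapping


def authentication_enabled(env: Mapping[str, str]) -> bool:
    """Return True when login should be enforced (production mode).

    Assumption: development mode applies whenever UI_PASSWORD is absent
    or blank; see docs/authentication.md for the real behavior.
    """
    return bool(env.get("UI_PASSWORD", "").strip())
```

In practice you would pass os.environ, or the values loaded from your .env file.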
Features:
- ✅ Create projects by uploading spec files with real-time validation
- ✅ Initialize projects (Session 0 - planning phase with Opus)
- ✅ Start/stop coding sessions with real-time monitoring
- ✅ View session logs (Human/Events/Errors tabs) with download
- ✅ WebSocket live updates for session progress
- ✅ Progress counters (epics/tasks/tests) with drill-down
- ✅ Task detail views with epic/task/test hierarchy
- ✅ Quality dashboard with collapsible deep reviews and markdown downloads
- ✅ Screenshots gallery - View all browser verification screenshots organized by task ID
- ✅ Project completion celebration banner
- ✅ JWT authentication (development mode enabled by default)
- ✅ Environment variable editor (inline .env editing)
- ✅ Enhanced metrics (token breakdown, quality trends)
The Web UI is production-ready with comprehensive features for project management and monitoring.
macOS Sleep Prevention & Docker Stability (Important for Multi-Session Runs):
When running autonomous sessions overnight or unattended, you need to prevent sleep AND ensure Docker stays running:
Step 1: Prevent macOS Sleep (Complete Settings)
# Mac Mini / iMac (Desktop): Disable ALL sleep-related features
sudo pmset -a disablesleep 1 # Disable system sleep
sudo pmset -a displaysleep 0 # Disable display sleep (CRITICAL for Docker!)
sudo pmset -a powernap 0 # Disable Power Nap
sudo pmset schedule cancelall # Cancel scheduled sleep/wake events
# ALSO disable screen lock (prevents Docker throttling):
# System Settings → Lock Screen → "Require password after..." → Never
# Or via command line:
sysadminctl -screenLock off
# To re-enable all sleep features when done:
sudo pmset -a disablesleep 0
sudo pmset -a displaysleep 10
sudo pmset -a powernap 1
sysadminctl -screenLock on
# MacBook (Laptop): Use caffeinate (keep the lid open)
caffeinate -s uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
Why display sleep matters:
- Docker Desktop runs as a GUI app
- When display sleeps + screen locks, macOS throttles/suspends user-space processes
- This can suspend Docker's hypervisor/VM → PostgreSQL becomes unreachable
- Disabling display sleep AND screen lock prevents Docker throttling
Step 2: Run Docker Watchdog (Recommended)
Docker Desktop can crash even when the Mac doesn't sleep. The watchdog auto-restarts it:
# Terminal 3: Start Docker watchdog in background
./scripts/docker-watchdog.sh &
# It will:
# - Check Docker every 30 seconds
# - Auto-restart if Docker crashes
# - Restart PostgreSQL container
# - Log all events to docker-watchdog.log
Why this matters:
- Mac sleep stops Docker → PostgreSQL goes offline → sessions fail
- Docker Desktop can crash independently of sleep (especially on long runs)
- Watchdog ensures Docker recovers automatically without human intervention
- pmset is more reliable than System Preferences for desktop Macs
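To make the watchdog idea concrete, here is a minimal Python sketch of the same check-and-restart loop. The shipped tool is the shell script scripts/docker-watchdog.sh, and this is not a port of it. The restart step is deliberately left as a callback you supply, because the exact restart commands are not described here. Only the `docker info` health probe is a real CLI call.

```python
import subprocess
import time
from typing import Callable, Optional


def docker_is_running() -> bool:
    """Return True if `docker info` succeeds, i.e. the daemon is reachable."""
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=20,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # docker binary missing, or the daemon is hanging
    return result.returncode == 0


def watchdog(
    is_running: Callable[[], bool],
    restart: Callable[[], None],
    interval_seconds: float = 30.0,      # matches the 30-second check above
    max_checks: Optional[int] = None,    # None means run until interrupted
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll is_running(); call restart() whenever it fails. Returns restart count."""
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        if not is_running():
            restart()  # hypothetical hook: relaunch Docker, then PostgreSQL
            restarts += 1
        checks += 1
        sleep(interval_seconds)
    return restarts
```

Passing the probe and the sleep function as parameters keeps the loop testable without a Docker installation.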
Alternative to pmset: Set System Preferences → Energy Saver → "Prevent computer from sleeping" (permanent setting)
See docs/PREVENTING_MAC_SLEEP.md for complete guide.
Features:
- ✅ Create Projects: Upload specification files (any .txt, .md format)
- ✅ Start Sessions: One-click session start with real-time progress
- ✅ Monitor Progress: Live updates via WebSocket (epics, tasks, tests)
- ✅ Configure Environment: Edit .env files directly in browser
- ✅ View Logs: Human-readable session logs with filtering
- ✅ Manage Projects: Browse, search, and delete projects
The agent will:
- Read specification file(s)
- Create complete roadmap (all epics, tasks, tests)
- Generate .env.example with required environment variables
- Stop after initialization for human review
- Resume coding sessions with "Start Session" button
Several utility scripts are available for development and debugging:
# View project progress (quick command-line check)
python scripts/task_status.py generations/my_project
# Reset stuck sessions (automatic cleanup also runs on session start)
python scripts/cleanup_sessions.py [--project my_project] [--force]
# Reset project to post-initialization state (for prompt iteration)
python scripts/reset_project.py --project my_project [--yes]
# Clean up Docker containers
python scripts/cleanup_containers.py
Notes:
- All project management is done via the Web UI (port 3000)
- Projects are stored in generations/ directory
- Models are selected in the Web UI when creating/initializing projects
- Use Web UI for all normal operations (create, initialize, run sessions)
Session 0 - Initialization (Opus):
- Reads app_spec.txt specification
- Creates project in PostgreSQL database with hierarchical structure
- Generates ALL epics (15-25 high-level features)
- Expands ALL epics into tasks (100-300 tasks)
- Adds tests for all tasks (200-1000 tests)
- Creates project structure and init.sh
- Stops automatically - complete roadmap ready
Sessions 1+ - Coding (Sonnet):
- Gets next task from database
- Implements feature
- Verifies with browser automation
- Updates database (marks tests pass/fail)
- Commits to git
- Auto-continues to next session (3s delay)
Press Ctrl+C to pause. Run the same command to resume.
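The coding-session cycle above can be sketched as a simple loop. This is a simplified in-memory illustration, not the code in core/orchestrator.py or core/agent.py: it treats one task per session, replaces the database with a list, omits the git commit step, and all names are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    name: str
    done: bool = False


def next_task(tasks: List[Task]) -> Optional[Task]:
    """Return the first unfinished task, or None when the roadmap is complete."""
    for task in tasks:
        if not task.done:
            return task
    return None


def run_sessions(
    tasks: List[Task],
    implement: Callable[[Task], None],
    verify: Callable[[Task], bool],
    delay_seconds: float = 3.0,            # the 3s auto-continue delay
    max_sessions: Optional[int] = None,    # None means run until done
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run sessions until no task remains or max_sessions is reached."""
    sessions = 0
    while max_sessions is None or sessions < max_sessions:
        task = next_task(tasks)
        if task is None:
            break
        implement(task)            # stand-in for the agent writing code
        task.done = verify(task)   # stand-in for browser verification
        sessions += 1
        sleep(delay_seconds)       # pause before auto-continuing
    return sessions
```

Because state lives in the task list rather than in the loop, stopping and calling run_sessions again picks up at the first unfinished task, which mirrors the pause-and-resume behavior described above.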
📦 Epics (15-25) "Core Chat Interface"
└─ 📋 Tasks (8-15 each) "Create message input component"
└─ ✅ Tests (1-3 each) "Verify textarea auto-resizes"
Why?
- Prevents token limit errors (was 60K, now ~5K per session)
- Complete visibility from day 1
- Accurate progress tracking
- MCP protocol-based (not shell scripts)
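As an illustration of the epic → task → test hierarchy and the progress counters derived from it, here is a small in-memory model. The real data lives in PostgreSQL and is accessed through the MCP task manager. The class names, field names, and the rule that a task counts as done when all of its tests pass are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AcceptanceTest:
    description: str
    passed: bool = False


@dataclass
class Task:
    description: str
    tests: List[AcceptanceTest] = field(default_factory=list)

    @property
    def done(self) -> bool:
        # Assumption: a task is complete once every one of its tests passes.
        return bool(self.tests) and all(t.passed for t in self.tests)


@dataclass
class Epic:
    name: str
    tasks: List[Task] = field(default_factory=list)


def progress(epics: List[Epic]) -> Dict[str, int]:
    """Roll the hierarchy up into flat counters for display."""
    tasks = [task for epic in epics for task in epic.tasks]
    tests = [test for task in tasks for test in task.tests]
    return {
        "epics": len(epics),
        "tasks": len(tasks),
        "tasks_done": sum(task.done for task in tasks),
        "tests": len(tests),
        "tests_passed": sum(test.passed for test in tests),
    }
```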
For complex projects, you can upload multiple specification files:
Best practices:
- Name your main file main.md or spec.md
- Reference other files in your main spec:
## API Design
See `api-design.md` for detailed endpoint specifications.
## Database Schema
See `database-schema.sql` for the complete schema.
## Code Examples
See `example-auth.py` for authentication implementation patterns.
- Include supporting files: API docs, schemas, code examples, wireframes, etc.
Supported file types:
- Spec files: .txt, .md (primary specification files)
- Code examples: .py, .ts, .js, .tsx, .jsx (reference implementations)
- Config files: .json, .yaml, .yml, .sql, .sh (schemas, scripts)
- Styling: .css, .html (design references)
Example structure:
main.md # Main specification (read first)
api-design.md # API endpoint definitions
database-schema.sql # Database design
example-auth.py # Authentication code example
example-api.ts # API endpoint example
config-example.json # Configuration template
wireframes.md # UI mockups description
How it works:
- Files are saved to a spec/ directory in your project
- The agent auto-detects the primary file (main.md, spec.md, or largest file)
- The agent reads the primary file first, then lazy-loads other files as needed
- This saves tokens and improves performance for large specifications
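The primary-file rule above (main.md, then spec.md, then the largest file) might be implemented along these lines. This is a sketch, not YokeFlow's actual detection code: the function name is hypothetical, and both the case-insensitive name match and the preference for main.md over spec.md when both exist are assumptions.

```python
from pathlib import Path
from typing import Optional

# Checked in this order (assumed precedence).
PREFERRED_NAMES = ("main.md", "spec.md")


def pick_primary_spec(spec_dir: Path) -> Optional[Path]:
    """Choose the file the agent should read first, or None if spec_dir is empty."""
    files = [p for p in spec_dir.iterdir() if p.is_file()]
    if not files:
        return None
    by_name = {p.name.lower(): p for p in files}
    for name in PREFERRED_NAMES:
        if name in by_name:
            return by_name[name]
    # Fall back to the largest file; the name breaks ties deterministically.
    return max(files, key=lambda p: (p.stat().st_size, p.name))
```

Only the chosen file needs to be read up front. The remaining files can be opened when the primary spec refers to them.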
See docs/example-specs.md for detailed examples and example-specs/multi-file-spec/ for a complete working example.
Designed for containerized deployment with blocklist approach:
- ✅ Allows: All development tools (npm, git, curl, etc.)
- ❌ Blocks: Dangerous system commands (rm, sudo, package managers)
Philosophy: Enable autonomous operation while maintaining safety.
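A blocklist check of this kind can be sketched as follows. This is not the logic in security.py. The set below is an illustrative subset: rm and sudo come from the description above, while apt, apt-get, and brew are assumed examples of blocked package managers. The approach looks only at the program name that starts each segment of a command line, so it is a first filter rather than a sandbox.

```python
import os
import shlex
from typing import List

# Illustrative subset only; the real list is BLOCKED_COMMANDS in security.py.
BLOCKED_COMMANDS = {"rm", "sudo", "apt", "apt-get", "brew"}

# Shell operators after which a new command name is expected.
SEPARATORS = {";", "&&", "||", "|", "&", "("}


def blocked_in(command_line: str) -> List[str]:
    """Return the blocked program names that start any segment of command_line."""
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    found = []
    expect_command = True
    for token in lexer:
        if token in SEPARATORS:
            expect_command = True
            continue
        if expect_command:
            name = os.path.basename(token)  # so /bin/rm is treated as rm
            if name in BLOCKED_COMMANDS:
                found.append(name)
            expect_command = False
    return found
```

Quoted arguments are kept as single tokens, so a command such as echo 'rm -rf /' is not flagged, while git status && rm -rf build is.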
Create .yokeflow.yaml in your project directory or ~/.yokeflow.yaml for global defaults:
models:
  initializer: claude-opus-4-5-20251101
  coding: claude-sonnet-4-5-20250929
timing:
  auto_continue_delay: 3
  web_ui_poll_interval: 5
  web_ui_port: 5001
project:
  default_generations_dir: generations
  max_iterations: null # unlimited
See docs/configuration.md for complete guide and .yokeflow.yaml.example for all options.
The system uses a .env file for sensitive configuration. Copy .env.example to .env and update values:
# Required: Claude API Token (get from 'claude setup-token')
CLAUDE_CODE_OAUTH_TOKEN=your_actual_token_here
# Required: PostgreSQL Database URL
DATABASE_URL=postgresql://agent:agent_dev_password@localhost:5432/yokeflow
# Optional: Default models (can also set in .yokeflow.yaml)
DEFAULT_INITIALIZER_MODEL=claude-opus-4-5-20251101
DEFAULT_CODING_MODEL=claude-sonnet-4-5-20250929
DEFAULT_REVIEW_MODEL=claude-opus-4-5-20251101
DEFAULT_PROMPT_IMPROVEMENT_MODEL=claude-opus-4-5-20251101
# Optional: API Server settings
API_HOST=0.0.0.0
API_PORT=8000
CORS_ORIGINS=http://localhost:3000,http://localhost:5173
Important:
- The .env file is loaded automatically by the API server and CLI
- Never commit .env to git (it's in .gitignore)
- Use .env.example as a template for required variables
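Before starting the API server, it can help to confirm that the two required variables are present. The sketch below is a standalone check, not part of YokeFlow. It uses a deliberately simple KEY=VALUE parser that ignores quoting and export prefixes, and the function names are hypothetical.

```python
from typing import Dict, List, Mapping
from urllib.parse import urlparse

# The two variables marked "Required" in the .env example.
REQUIRED = ("CLAUDE_CODE_OAUTH_TOKEN", "DATABASE_URL")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip()
    return values


def config_problems(values: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the basics are set."""
    problems = [f"{name} is missing" for name in REQUIRED if not values.get(name)]
    url = values.get("DATABASE_URL", "")
    if url and urlparse(url).scheme not in ("postgresql", "postgres"):
        problems.append("DATABASE_URL should use the postgresql:// scheme")
    return problems
```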
For Web UI: Select models when creating/initializing projects via the UI
For CLI: Configure models in .yokeflow.yaml:
models:
  initializer: claude-opus-4-5-20251101 # For Session 0 (planning)
  coding: claude-sonnet-4-5-20250929 # For Sessions 1+ (coding)
Models can also be set via environment variables in .env:
DEFAULT_INITIALIZER_MODEL=claude-opus-4-5-20251101
DEFAULT_CODING_MODEL=claude-sonnet-4-5-20250929
DEFAULT_REVIEW_MODEL=claude-opus-4-5-20251101
DEFAULT_PROMPT_IMPROVEMENT_MODEL=claude-opus-4-5-20251101
Priority: Web UI selection > .yokeflow.yaml > .env > Built-in defaults
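The priority order can be read as a first-match lookup. The following sketch shows that reading for the coding model. It is not the code in core/config.py: the function name and parameters are hypothetical, and the built-in default shown is an assumption based on the example values in this README.

```python
from typing import Any, Mapping, Optional

# Assumed built-in default; taken from the example configuration above.
BUILT_IN_CODING_MODEL = "claude-sonnet-4-5-20250929"


def resolve_coding_model(
    ui_selection: Optional[str],
    yaml_config: Mapping[str, Any],
    env: Mapping[str, str],
    built_in_default: str = BUILT_IN_CODING_MODEL,
) -> str:
    """Return the first configured value: Web UI > .yokeflow.yaml > .env > default."""
    candidates = (
        ui_selection,
        (yaml_config.get("models") or {}).get("coding"),
        env.get("DEFAULT_CODING_MODEL"),
    )
    for candidate in candidates:
        if candidate:  # skip None and empty strings
            return candidate
    return built_in_default
```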
yokeflow/
├── api/ # FastAPI REST API
│ ├── main.py # API server with WebSocket
│ ├── start_api.py # API server launcher
│ └── README.md # API documentation
├── web-ui/ # Next.js Web UI (TypeScript/React)
│ ├── src/ # Application source
│ │ ├── app/ # Next.js pages
│ │ ├── components/ # React components
│ │ └── lib/ # API client, types, utils
│ └── package.json # Dependencies
├── core/ # Core platform modules
│ ├── orchestrator.py # Session lifecycle management
│ ├── agent.py # Agent loop and session logic
│ ├── database.py # PostgreSQL abstraction (async)
│ ├── database_connection.py # Connection pooling
│ ├── client.py # Claude SDK client setup
│ ├── config.py # Configuration management
│ ├── observability.py # Session logging (JSONL + TXT)
│ ├── security.py # Blocklist validation
│ ├── progress.py # Progress tracking
│ ├── prompts.py # Prompt loading
│ ├── reset.py # Project reset logic
│ ├── sandbox_manager.py # Docker sandbox management
│ └── sandbox_hooks.py # Sandbox hooks
├── review/ # Review system modules
│ ├── review_client.py # Automated deep reviews (Phase 2)
│ ├── review_metrics.py # Quality metrics (Phase 1)
│ └── prompt_improvement_analyzer.py # Prompt optimization (Phase 4)
├── scripts/ # Utility tools (debugging/development)
│ ├── task_status.py # View task status and progress
│ ├── reset_project.py # Reset project to post-init state
│ ├── analyze_sessions.py # Analyze session metrics
│ ├── cleanup_sessions.py # Clean up stuck sessions
│ ├── cleanup_containers.py # Clean up Docker containers
│ ├── init_database.py # Initialize PostgreSQL schema
│ ├── check_deep_reviews.py # Inspect review data
│ ├── show_review_recommendations.py # Show review suggestions
│ ├── docker-watchdog.sh # Auto-restart Docker daemon
│ ├── setup-macos-for-long-runs.sh # Prevent sleep on macOS
│ └── README.md # Scripts documentation
├── prompts/ # Agent instructions
│ ├── initializer_prompt.md # Session 0 instructions (Opus)
│ ├── coding_prompt.md # Sessions 1+ instructions (Sonnet)
│ └── review_prompt.md # Deep review instructions
├── schema/
│ └── postgresql/ # PostgreSQL database schema
│ ├── 001_initial_schema.sql
│ └── 002_session_quality.sql
├── mcp-task-manager/ # MCP server (TypeScript)
│ ├── src/index.ts # Server implementation
│ └── dist/ # Compiled JavaScript
├── tests/ # Test scripts
├── docs/ # Documentation
│ ├── developer-guide.md # Technical deep-dive
│ ├── mcp-usage.md # MCP integration details
│ ├── configuration.md # Config file guide
│ └── review-system.md # Complete review system documentation
└── generations/ # Generated projects (created at runtime)
generations/my_project/
├── app_spec.txt # Your specification
├── init.sh # Generated setup script
├── claude-progress.md # Session notes
├── logs/ # Session logs (JSONL + TXT)
└── [application files] # Generated code
cd generations/my_project
# Use the agent-generated setup script
./init.sh
# Or manually
npm install
npm run dev
Check init.sh or the agent's output for the exact URL (typically http://localhost:3000).
Problem: Initialization takes 10-20 minutes. If coding sessions have issues, you don't want to re-run full initialization.
Solution: Reset to post-initialization state while preserving the complete roadmap:
# Preview what will be reset (dry run)
python scripts/reset_project.py --project-dir my_project --dry-run
# Reset with confirmation prompt
python scripts/reset_project.py --project-dir my_project
# Reset without confirmation
python scripts/reset_project.py --project-dir my_project --yes
What gets reset:
- Database: All task/test completion status (keeps roadmap intact)
- Git: Resets to commit after initialization session
- Logs: Archives coding session logs to logs/old_attempts/TIMESTAMP/
- Progress: Backs up and resets claude-progress.md
What is preserved:
- Complete project roadmap (all epics, tasks, tests)
- Initialization session (commit and log)
- Project structure and init.sh
- Configuration files (.env.example, etc.)
Use cases:
- Testing prompt improvements (v3 → v4 → v5)
- Debugging agent behavior during coding sessions
- A/B testing different models on same initialization
- Recovering from early-stage issues without full restart
Benefits: Saves 10-20 minutes per iteration, enabling faster prompt engineering and testing.
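To illustrate the log-archiving part of a reset, the sketch below moves coding-session logs into logs/old_attempts/TIMESTAMP/ while leaving named files in place. It is not the implementation in core/reset.py: the function name, the keep parameter, and the timestamp format are assumptions.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import FrozenSet, Optional


def archive_session_logs(
    logs_dir: Path,
    keep: FrozenSet[str] = frozenset(),   # file names to leave in place
    now: Optional[datetime] = None,
) -> Path:
    """Move log files into logs_dir/old_attempts/<timestamp>/ and return that path."""
    # Assumed timestamp format; the real tool may use a different one.
    timestamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    archive_dir = logs_dir / "old_attempts" / timestamp
    archive_dir.mkdir(parents=True, exist_ok=True)
    # sorted() snapshots the listing before any file is moved.
    for entry in sorted(logs_dir.iterdir()):
        if entry.is_file() and entry.name not in keep:
            shutil.move(str(entry), str(archive_dir / entry.name))
    return archive_dir
```

Passing the initialization session's log names in keep preserves them, which matches the "What is preserved" list above.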
Change the application: Upload your specification files via the Web UI when creating a new project.
Modify security rules:
Edit core/security.py - add/remove commands from BLOCKED_COMMANDS.
Customize prompts:
Edit files in prompts/ directory.
- This README - Quick start and basic usage
- CLAUDE.md - Comprehensive quick reference guide
- docs/configuration.md - Config file documentation
- docs/example-specs.md - Example specification files and best practices
- docs/developer-guide.md - Technical deep-dive
- docs/mcp-usage.md - MCP integration
- docs/review-system.md - Complete review system documentation (4 phases)
- TODO-FUTURE.md - Post-release enhancements
Initialization takes a long time
- Creating complete roadmap takes 3-5 minutes
- Agent stops automatically when done
- Then set environment variables and start the coding session manually
Command blocked
- Security system working as intended
- Check security.py for blocked commands
- Modify blocklist if needed (use caution)
Database errors
- Ensure PostgreSQL is running: docker-compose up -d
- Check DATABASE_URL in .env file
- Initialize schema: python scripts/init_database.py
Web UI shows no projects
- Ensure PostgreSQL database is running
- Check projects in database: psql $DATABASE_URL -c "SELECT * FROM projects;"
- Run initialization session to create first project
Generated applications don't work on different operating systems
- Projects built in Docker sandbox use the Linux environment specified in Dockerfile.agent-sandbox
- Applications may require changes when moved to Windows or macOS
- Node native modules, system dependencies, and OS-specific code may need adjustment
- For production deployment, rebuild or test in target environment
- Consider using Docker for consistent cross-platform deployment
YokeFlow Community License (YCL) v1.0 - See LICENSE for details.