Behavioral validation forged in, not bolted on.
Forge is an autonomous behavioral validation swarm skill for Claude Code that combines BDD behavioral verification, 7 behavioral validation gates, confidence-tiered learning, topological governance, and self-healing fix loops. It spawns 8 specialized agents that work in parallel to verify, test, fix, and commit - continuously - until every Gherkin scenario passes and every behavioral validation gate clears.
- 8 specialized agents working in parallel with cost-optimized model routing
- Gherkin behavioral specifications as the single source of truth
- 7 behavioral validation gates: Functional, Behavioral, Coverage, Security, Accessibility, Resilience, Contract
- 12 topological governance specifications (§1.1–§1.12) - mathematical foundations for autonomous behavioral validation
- Confidence-tiered fix patterns (Platinum/Gold/Silver/Bronze) with Nash Equilibrium convergence
- Defect prediction based on historical failure data and file changes
- Chaos/resilience testing with controlled failure injection
- Cross-context dependency awareness with cascade re-testing and sheaf cohomology consistency
- Shared types and cross-cutting validation across bounded contexts
- Agent-optimized ADRs with MUST/MUST NOT constraints and verification commands
- Visual regression testing with pixel-by-pixel comparison
- Architecture-agnostic - monolith, microservices, monorepo, mobile+backend
- Optional Agentic QE integration for enhanced pattern search, security scanning, and more
- External-only mocking - mock third-party services, never internal code (production-validated policy)
- Spec drift detection - detects when Gherkin specs and implementation diverge
- LLM-as-Judge meta-review - second-model evaluation with Anti-Echo-Chamber guarantee
- Self-reflection gate - Bug Fixer asks "What could go wrong?" before committing
- Hallucination Gate - deterministic pre-LLM boundary (AST resolution, contract hash, mocking detection)
- Agent criticality scoring - bottleneck detection via Dirichlet energy and automatic optimization
- Narya-proofs - counterfactual verification proving fix necessity and sufficiency
- Property-based testing - generate 1000+ test cases from invariants
- Mutation testing - inject bugs to verify test effectiveness
- Blake3 witness chain - cryptographic tamper-evident audit trail for gate verdicts
- Infrastructure readiness markers - specify formally, implement pragmatically, upgrade transparently
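To make the mutation-testing item above concrete, here is a minimal Python sketch of the general technique. It is not Forge's implementation: the function `is_adult`, the `FlipGtE` mutation operator, and both test suites are hypothetical names invented for illustration. The sketch injects a single bug (`>=` becomes `>`) and checks whether a test suite notices.

```python
import ast

# Hypothetical code under test.
SOURCE = """
def is_adult(age):
    return age >= 18
"""


class FlipGtE(ast.NodeTransformer):
    """Mutation operator: replace every `>=` comparison with `>`."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.Gt() if isinstance(op, ast.GtE) else op for op in node.ops]
        return node


def load(tree):
    """Compile a module AST and return the `is_adult` function it defines."""
    namespace = {}
    exec(compile(tree, "<generated>", "exec"), namespace)
    return namespace["is_adult"]


def weak_suite(fn):
    # Never probes the boundary value, so the mutant goes unnoticed.
    return fn(30) is True and fn(5) is False


def strong_suite(fn):
    # Adds the boundary case age == 18.
    return weak_suite(fn) and fn(18) is True


original = load(ast.parse(SOURCE))
mutant = load(ast.fix_missing_locations(FlipGtE().visit(ast.parse(SOURCE))))


def killed(suite):
    """A mutant is killed when the suite passes on the original but fails on the mutant."""
    return suite(original) and not suite(mutant)
```

A suite that never exercises the boundary lets the mutant survive, which is the signal mutation testing looks for: `killed(weak_suite)` is false while `killed(strong_suite)` is true.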
| Pillar | Source | What It Does |
|---|---|---|
| Build | DDD+ADR+TDD methodology | Structured development with behavioral validation gates, defect prediction, confidence-tiered fixes |
| Verify | BDD/Gherkin behavioral specs | Continuous behavioral verification - the PRODUCT works, not just the CODE |
| Heal | Autonomous E2E fix loop | Test → Analyze → Fix → Commit → Learn → Repeat |
"DONE DONE" means: the code compiles AND the product behaves as specified. Every Gherkin scenario passes. Every behavioral validation gate clears. Every dependency graph is satisfied.
# Copy SKILL.md to your Claude Code skills directory
cp SKILL.md ~/.claude/skills/forge.md
# Run on your project
/forge --autonomous --context payments

| Command | Description |
|---|---|
| /forge --autonomous --all | Full autonomous run - all contexts, all gates |
| /forge --autonomous --context [name] | Single context autonomous run |
| /forge --verify-only | Behavioral verification only (no fixes) |
| /forge --verify-only --context [name] | Verify single context |
| /forge --fix-only --context [name] | Fix failures, don't generate new tests |
| /forge --learn | Analyze patterns, update confidence tiers |
| /forge --add-coverage --screens [names] | Add coverage for new screens/pages/components |
| /forge --spec-gen --context [name] | Generate Gherkin specs for a context |
| /forge --spec-gen --all | Generate Gherkin specs for all contexts |
| /forge --gates-only | Run behavioral validation gates without test execution |
| /forge --gates-only --context [name] | Run behavioral validation gates for single context |
| /forge --predict | Defect prediction only |
| /forge --predict --context [name] | Predict defects for single context |
| /forge --chaos --context [name] | Chaos/resilience testing for a context |
| /forge --chaos --all | Chaos testing for all contexts |
| /forge --drift-check | Spec drift detection |
| /forge --drift-check --context [name] | Drift check for single context |
| /forge --regressions | Behavioral regression analysis |
| /forge --regressions --context [name] | Regressions for single context |
| /forge --meta-review | LLM-as-Judge meta-evaluation |
| /forge --meta-review --context [name] | Meta-review for single context |
| /forge --mutation --context [name] | Mutation testing for a context |
| /forge --mutation --critical-only | Mutation testing for critical paths only |

Specify → Test → Analyze → Fix → Audit → Gate → Commit → Learn → Repeat
┌────────────────────────────────────────────────────────────────────┐
│                       FORGE AUTONOMOUS LOOP                        │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐         │
│  │ Specify  │──▶│   Test   │──▶│ Analyze  │──▶│   Fix    │         │
│  │ (Gherkin)│   │  (Run)   │   │  (Root   │   │ (Tiered) │         │
│  └──────────┘   └──────────┘   │  Cause)  │   └──────────┘         │
│       ▲                        └──────────┘        │               │
│       │                                            ▼               │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐         │
│  │  Learn   │◀──│  Commit  │◀──│   Gate   │◀──│  Audit   │         │
│  │ (Update  │   │  (Auto)  │   │ (7 Gates)│   │  (A11y)  │         │
│  │  Tiers)  │   └──────────┘   └──────────┘   └──────────┘         │
│  └──────────┘                                                      │
│       │                                                            │
│       └──────────────── REPEAT ────────────────────────────────────│
│                                                                    │
│  Loop continues until: ALL 7 VALIDATION GATES PASS or MAX 10       │
└────────────────────────────────────────────────────────────────────┘
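The termination rule in the diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Forge's control code: `run_cycle` is a hypothetical callback standing in for one full pass through the loop, and "MAX 10" is read here as a cap of ten iterations, which the diagram does not spell out.

```python
from typing import Callable, Dict

# Assumption: "MAX 10" in the diagram is read as a cap of 10 loop iterations.
MAX_ITERATIONS = 10


def run_loop(run_cycle: Callable[[int], Dict[str, bool]],
             max_iterations: int = MAX_ITERATIONS) -> Dict[str, object]:
    """Repeat cycles until every gate passes or the iteration cap is reached.

    `run_cycle` is a hypothetical callback for one pass of
    Specify -> Test -> Analyze -> Fix -> Audit -> Gate. It returns a
    mapping of gate name to pass/fail.
    """
    for iteration in range(1, max_iterations + 1):
        gates = run_cycle(iteration)
        if gates and all(gates.values()):
            return {"status": "passed", "iterations": iteration}
    return {"status": "exhausted", "iterations": max_iterations}


def fake_cycle(iteration: int) -> Dict[str, bool]:
    # Stand-in cycle: the Contract gate only clears on the third pass.
    return {"Functional": True, "Contract": iteration >= 3}


result = run_loop(fake_cycle)
```

With the stand-in cycle above, the loop stops on the third iteration. A cycle that never clears its gates returns the `exhausted` status once the cap is hit, so a caller can tell convergence apart from giving up.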
- Phase 0 - Backend setup (build, run, health check, seed data)
- Phase 1 - Behavioral specification & architecture records (Gherkin specs, ADRs)
- Phase 2 - Contract & dependency validation (schemas, shared types, cross-cutting)
- Phase 3 - Swarm initialization (load patterns, predictions, confidence tiers)
- Phase 4 - Spawn 8 autonomous agents in parallel
- Phase 5 - Behavioral validation gates evaluation (7 gates after every fix cycle, BFT consensus ≥5/7)
| Gate | Check | Threshold | Blocking |
|---|---|---|---|
| 1. Functional | All tests pass | 100% pass rate | YES |
| 2. Behavioral | Gherkin scenarios satisfied | 100% of targeted scenarios | YES |
| 3. Coverage | Path coverage | ≥85% overall, ≥95% critical | YES (critical only) |
| 4. Security | No secrets, SAST checks, no injection vectors | 0 critical/high violations | YES |
| 5. Accessibility | Labels, target sizes, contrast | WCAG AA | Warning only |
| 6. Resilience | Offline, timeout, error handling | Tested for target context | Warning only |
| 7. Contract | API response matches schema | 0 mismatches | YES |
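The thresholds and blocking rules in the table above can be expressed as a small evaluation function. The following Python sketch is one possible reading, not Forge's gate enforcer: the metric keys (`tests_passed`, `schema_mismatches`, and so on) are hypothetical, and "YES (critical only)" is assumed to mean that only the critical-path coverage threshold blocks while the overall threshold produces a warning.

```python
from typing import Any, Dict


def status(ok: bool, blocking: bool) -> str:
    """Map a check result to 'pass', 'block' (blocking gate) or 'warn'."""
    if ok:
        return "pass"
    return "block" if blocking else "warn"


def coverage_status(overall: float, critical: float) -> str:
    # Assumption: only the critical-path threshold blocks; missing the
    # overall threshold is reported as a warning.
    if critical < 0.95:
        return "block"
    return "pass" if overall >= 0.85 else "warn"


def evaluate_gates(m: Dict[str, Any]) -> Dict[str, str]:
    """Apply the thresholds from the gate table to a metrics dict (hypothetical keys)."""
    return {
        "Functional": status(m["tests_passed"] == m["tests_total"], True),
        "Behavioral": status(m["scenarios_passed"] == m["scenarios_targeted"], True),
        "Coverage": coverage_status(m["overall_coverage"], m["critical_coverage"]),
        "Security": status(m["critical_high_violations"] == 0, True),
        "Accessibility": status(m["wcag_aa_violations"] == 0, False),
        "Resilience": status(bool(m["resilience_tested"]), False),
        "Contract": status(m["schema_mismatches"] == 0, True),
    }


def can_commit(statuses: Dict[str, str]) -> bool:
    """Warnings are reported but do not stop a commit; any 'block' does."""
    return "block" not in statuses.values()


metrics = {
    "tests_passed": 120, "tests_total": 120,
    "scenarios_passed": 40, "scenarios_targeted": 40,
    "overall_coverage": 0.80, "critical_coverage": 0.97,
    "critical_high_violations": 0,
    "wcag_aa_violations": 2,
    "resilience_tested": True,
    "schema_mismatches": 0,
}
statuses = evaluate_gates(metrics)
```

With these sample numbers, Coverage and Accessibility come back as warnings and the commit is still allowed. A single schema mismatch or a critical coverage figure below 95% would flip `can_commit` to false.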
| Agent | Model | Role | v1.2.0 Enhancement |
|---|---|---|---|
| Specification Verifier | Sonnet | Generates/validates Gherkin specs and ADRs for bounded contexts | - |
| Test Runner | Haiku | Executes E2E test suites, parses results, maps failures to specs | - |
| Failure Analyzer | Sonnet | Root cause analysis, pattern matching, dependency impact assessment | MaTTS - 3 parallel reasoning trajectories with self-contrast |
| Bug Fixer | Opus | Applies confidence-tiered fixes from first principles | Driver-Observer algebraic connectivity (λ₂ monitoring) |
| Behavioral Validation Gate Enforcer | Haiku | Evaluates all 7 gates, arbitrates agent disagreements | BFT consensus model (≥5/7 threshold, VETO for blocking gates) |
| Accessibility Auditor | Sonnet | WCAG AA audit: labels, contrast, targets, focus order | - |
| Auto-Committer | Haiku | Stages fixed files, creates detailed commits with gate statuses | - |
| Learning Optimizer | Sonnet | Updates confidence tiers, defect prediction, coverage metrics | DISTILL phase - LoRA-style abstraction with EWC++ anti-forgetting |
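The Model column above amounts to a default routing table that the `model_routing` block in `forge.config.yaml` can override. A minimal Python sketch of that lookup follows. The keys `bug-fixer`, `failure-analyzer`, and `test-runner` appear in the sample configuration; the remaining kebab-case keys and the `resolve_routing` helper are assumptions made for illustration.

```python
from typing import Dict, Optional

# Defaults taken from the Model column of the agent table.
# Only three of these keys are confirmed by the sample config; the rest
# are assumed to follow the same kebab-case naming.
DEFAULT_ROUTING: Dict[str, str] = {
    "specification-verifier": "sonnet",
    "test-runner": "haiku",
    "failure-analyzer": "sonnet",
    "bug-fixer": "opus",
    "gate-enforcer": "haiku",
    "accessibility-auditor": "sonnet",
    "auto-committer": "haiku",
    "learning-optimizer": "sonnet",
}


def resolve_routing(overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Merge user overrides onto the defaults, rejecting unknown agent names."""
    routing = dict(DEFAULT_ROUTING)
    for agent, model in (overrides or {}).items():
        if agent not in routing:
            raise KeyError(f"unknown agent: {agent}")
        routing[agent] = model
    return routing


routing = resolve_routing({"test-runner": "sonnet"})
```

Rejecting unknown agent names is a design choice in this sketch: a typo in the configuration fails loudly instead of silently leaving the default model in place.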
Forge v1.2.0 introduces 12 formal topological governance specifications (§1.1–§1.12) that provide mathematical foundations for autonomous behavioral validation. Production heuristics from v1.1.0 - criticality scoring, regression tracking, blocking gates - are now anchored to formal mathematical equivalents.
| Cluster | Sections | Purpose |
|---|---|---|
| Consistency & Verification | §1.1–§1.5 | Sheaf cohomology for cross-context consistency, Dirichlet energy for system tension, persistent Laplacian for regression tracking, Hallucination Gate for pre-LLM verification, Blake3 witness chain for tamper-evident audit |
| Swarm Stability | §1.6–§1.7 | Algebraic connectivity (Fiedler value λ₂) for agent coordination monitoring, MinCut isolation for quarantining anomalous agent output |
| Memory & Reasoning | §1.8–§1.11 | Hyperbolic memory (Poincaré ball) for hierarchical code embeddings, GF(3) triadic validation for phase transitions, Narya-proofs for counterfactual fix verification, Johnson-Lindenstrauss for sublinear test coverage |
| Execution Plane | §1.12 | WASM/Rust pure-function tasks for deterministic verification (Blake3 hashing, eigenvalue computation, GF(3) validation, HNSW search, contract hash comparison, JL projection) |
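As a concrete instance of one quantity named in the table, the Dirichlet energy of a signal x on a weighted graph is the sum over edges of w_ij (x_i - x_j)^2, which equals x^T L x for the graph Laplacian L. The Python sketch below computes it directly from that definition. How Forge chooses the graph and the signal is not specified here, so the context names, edge weights, and the use of failure rate as the signal are assumptions for illustration only.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str, float]  # (node_a, node_b, weight), each undirected edge listed once


def dirichlet_energy(signal: Dict[str, float], edges: Iterable[Edge]) -> float:
    """Sum of w * (x_a - x_b)^2 over all edges; zero when the signal is constant."""
    return sum(w * (signal[a] - signal[b]) ** 2 for a, b, w in edges)


# Hypothetical graph: three bounded contexts joined by dependency edges.
edges = [
    ("identity", "payments", 1.0),
    ("identity", "orders", 1.0),
    ("payments", "orders", 1.0),
]

# Hypothetical signal: per-context failure rate.
uneven = {"identity": 0.0, "payments": 0.5, "orders": 0.5}
uniform = {"identity": 0.5, "payments": 0.5, "orders": 0.5}

tension = dirichlet_energy(uneven, edges)    # 0.25 + 0.25 + 0.0 = 0.5
baseline = dirichlet_energy(uniform, edges)  # 0.0
```

The energy is zero when neighbouring nodes agree and grows as they diverge, which is why it is a natural candidate for a "tension" measure across connected contexts.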
Every specification is operational today. Infrastructure readiness markers define the path from "correct" to "correct and fast":
| Specification | Current Implementation | Native Infrastructure |
|---|---|---|
| Blake3 witness chain (§1.5) | SHA-256 hashing | Blake3 native hashing |
| Hyperbolic memory (§1.8) | Flat key-value lookups across 10 namespaces | HNSW-indexed Poincaré ball embeddings |
| JL coverage (§1.11) | Defect prediction with failure probability ranking | Random projection to O(log n) representative tests |
| WASM execution (§1.12) | LLM structured reasoning for pure functions | WASM/Rust compilation with sub-ms latency |
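The first row of the table says the witness chain currently runs on SHA-256. A hash chain of that kind can be sketched with Python's standard library as follows. The entry layout (`prev`, `verdict`, `hash`), the genesis value, and the function names are assumptions for illustration and are not taken from Forge.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # assumed placeholder for the first entry's predecessor


def _digest(prev_hash: str, verdict: Dict[str, Any]) -> str:
    """Hash the previous entry's hash together with a canonical JSON verdict."""
    payload = json.dumps(verdict, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_verdict(chain: List[Dict[str, Any]], verdict: Dict[str, Any]) -> None:
    """Append a gate verdict, linking it to the hash of the previous entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "verdict": verdict, "hash": _digest(prev, verdict)})


def verify(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every link; any edited verdict or broken link fails verification."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["verdict"]):
            return False
        prev = entry["hash"]
    return True


chain: List[Dict[str, Any]] = []
append_verdict(chain, {"gate": "Functional", "passed": True})
append_verdict(chain, {"gate": "Contract", "passed": False})
```

Because each entry's hash covers the previous hash, rewriting an old verdict invalidates every later link. Moving to Blake3 would only change the hash call inside `_digest`, which fits the "upgrade transparently" idea behind the readiness markers.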
# forge.config.yaml - placed at repo root
architecture: microservices
backend:
  services:
    - name: auth-service
      port: 8081
      healthEndpoint: /health
      buildCommand: npm run build
      runCommand: npm start
frontend:
  technology: react
  testCommand: npx cypress run --spec {target}
  testDir: cypress/e2e/
  specDir: cypress/e2e/specs/
# Model routing overrides
model_routing:
  bug-fixer: opus
  failure-analyzer: sonnet
  test-runner: haiku
# Visual regression
visual_regression:
  enabled: true
  threshold: 0.001
# Agentic QE integration
integrations:
  agentic-qe:
    enabled: true
    domains: [defect-intelligence, security-compliance, visual-accessibility, contract-testing]

# forge.contexts.yaml - bounded context definitions
contexts:
  - name: identity
    testFile: identity.cy.ts
    specFile: identity.feature
    paths: 68
    subdomains: [Auth, Profiles, Verification]
  - name: payments
    testFile: payments.cy.ts
    specFile: payments.feature
    paths: 89
    subdomains: [Wallet, Cards, Transactions]
dependencies:
  identity:
    blocks: [payments, orders]
  payments:
    depends_on: [identity]
    blocks: [orders, subscriptions]

If no configuration files are present, Forge auto-discovers the project structure on first run.
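The `dependencies` block in `forge.contexts.yaml` is what makes cascade re-testing possible: when a context changes, every context it blocks, directly or transitively, is a candidate for re-testing. The Python sketch below shows one way to derive that set and order it upstream-first using the standard library's `graphlib` (Python 3.9+). The `cascade` function is a hypothetical helper, and the `BLOCKS` mapping is copied by hand from the sample file instead of being parsed from YAML.

```python
from collections import deque
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# "blocks" edges copied from the sample forge.contexts.yaml.
BLOCKS: Dict[str, List[str]] = {
    "identity": ["payments", "orders"],
    "payments": ["orders", "subscriptions"],
}


def cascade(changed: str, blocks: Dict[str, List[str]] = BLOCKS) -> List[str]:
    """Return the changed context plus everything downstream, upstream first."""
    # Breadth-first walk along "blocks" edges to collect affected contexts.
    affected: Set[str] = {changed}
    queue = deque([changed])
    while queue:
        for downstream in blocks.get(queue.popleft(), []):
            if downstream not in affected:
                affected.add(downstream)
                queue.append(downstream)

    # Build a predecessor map restricted to the affected contexts.
    predecessors: Dict[str, Set[str]] = {name: set() for name in affected}
    for upstream, downstreams in blocks.items():
        for downstream in downstreams:
            if upstream in affected and downstream in affected:
                predecessors[downstream].add(upstream)

    # A topological order guarantees a context is tested after what it depends on.
    return list(TopologicalSorter(predecessors).static_order())


order_after_identity_change = cascade("identity")
order_after_payments_change = cascade("payments")
```

A change to `identity` schedules `identity` first, then `payments`, then `orders` and `subscriptions` in either order. A change to `payments` leaves `identity` out of the run entirely.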
Forge optionally integrates with Agentic QE via MCP for enhanced capabilities:
| Capability | Without AQE | With AQE |
|---|---|---|
| Pattern Storage | claude-flow memory | ReasoningBank (vector-indexed, 150x faster) |
| Defect Prediction | File changes + history | Specialized defect-intelligence agents |
| Security Scanning | Gate 4 static checks | Full SAST/DAST analysis |
| Accessibility | Built-in auditor | visual-tester + accessibility-auditor |
| Contract Testing | Schema validation | contract-validator + graphql-tester |
| Progress | .forge/progress.jsonl | AG-UI real-time streaming |
All AQE features are additive. Forge works identically without AQE installed.
- Continuous Behavioral Verification: Ongoing Path to Done - Ikenna Okpala
- Build with Quality Skill: How I Build Software 10x Faster - Mondweep Chakravorty
- claude-code-v3-qe-skill - V3 QE Skill
- agentic-qe - Agentic QE Framework
- Advanced Topological Governance in Autonomous Software Engineering - Formal mathematical foundations (sheaf theory, spectral analysis, Galois fields) for v1.2.0 specifications
MIT