Forge

Behavioral validation forged in, not bolted on.

Forge is an autonomous behavioral validation swarm skill for Claude Code that combines BDD behavioral verification, 7 behavioral validation gates, confidence-tiered learning, topological governance, and self-healing fix loops. It spawns 8 specialized agents that work in parallel to verify, test, fix, and commit, repeating the cycle until every Gherkin scenario passes and every behavioral validation gate clears.

Key Features

8 specialized agents working in parallel with cost-optimized model routing
Gherkin behavioral specifications as the single source of truth
7 behavioral validation gates: Functional, Behavioral, Coverage, Security, Accessibility, Resilience, Contract
12 topological governance specifications (§1.1–§1.12) - mathematical foundations for autonomous behavioral validation
Confidence-tiered fix patterns (Platinum/Gold/Silver/Bronze) with Nash Equilibrium convergence
Defect prediction based on historical failure data and file changes
Chaos/resilience testing with controlled failure injection
Cross-context dependency awareness with cascade re-testing and sheaf cohomology consistency
Shared types and cross-cutting validation across bounded contexts
Agent-optimized ADRs with MUST/MUST NOT constraints and verification commands
Visual regression testing with pixel-by-pixel comparison
Architecture-agnostic - monolith, microservices, monorepo, mobile+backend
Optional Agentic QE integration for enhanced pattern search, security scanning, and more
External-only mocking - mock third-party services, never internal code (production-validated policy)
Spec drift detection - detects when Gherkin specs and implementation diverge
LLM-as-Judge meta-review - second-model evaluation with Anti-Echo-Chamber guarantee
Self-reflection gate - Bug Fixer asks "What could go wrong?" before committing
Hallucination Gate - deterministic pre-LLM boundary (AST resolution, contract hash, mocking detection)
Agent criticality scoring - bottleneck detection via Dirichlet energy and automatic optimization
Narya-proofs - counterfactual verification proving fix necessity and sufficiency
Property-based testing - generate 1000+ test cases from invariants
Mutation testing - inject bugs to verify test effectiveness
Blake3 witness chain - cryptographic tamper-evident audit trail for gate verdicts
Infrastructure readiness markers - specify formally, implement pragmatically, upgrade transparently
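
As a rough illustration of the property-based testing feature listed above, the sketch below draws 1,000 random inputs and checks one invariant against each. The function `normalize_amount`, the helper `check_invariant`, and the input range are hypothetical and not part of Forge; how Forge derives its cases from invariants is not specified in this README.

```python
import random


def normalize_amount(cents: int) -> int:
    """Hypothetical function under test: clamp an amount to [0, 1_000_000]."""
    return max(0, min(cents, 1_000_000))


def check_invariant(fn, invariant, cases: int = 1000, seed: int = 0):
    """Run `fn` on random integers and return the first input that breaks
    `invariant(input, output)`, or None if every case holds."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(cases):
        value = rng.randint(-10**9, 10**9)
        if not invariant(value, fn(value)):
            return value
    return None


# Invariant: the normalized amount always stays inside the allowed range.
counterexample = check_invariant(
    normalize_amount, lambda _inp, out: 0 <= out <= 1_000_000
)
print(counterexample)  # None
```

A dedicated library such as Hypothesis adds input shrinking and richer generators; this stdlib version only shows the shape of the idea.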

Philosophy

Three Pillars

Pillar	Source	What It Does
Build	DDD+ADR+TDD methodology	Structured development with behavioral validation gates, defect prediction, confidence-tiered fixes
Verify	BDD/Gherkin behavioral specs	Continuous behavioral verification - the PRODUCT works, not just the CODE
Heal	Autonomous E2E fix loop	Test → Analyze → Fix → Commit → Learn → Repeat

"DONE DONE"

"DONE DONE" means: the code compiles AND the product behaves as specified. Every Gherkin scenario passes. Every behavioral validation gate clears. Every dependency graph is satisfied.

Quick Start

# Copy SKILL.md to your Claude Code skills directory
cp SKILL.md ~/.claude/skills/forge.md

# Run on your project
/forge --autonomous --context payments

Invocation Modes

Command	Description
`/forge --autonomous --all`	Full autonomous run - all contexts, all gates
`/forge --autonomous --context [name]`	Single context autonomous run
`/forge --verify-only`	Behavioral verification only (no fixes)
`/forge --verify-only --context [name]`	Verify single context
`/forge --fix-only --context [name]`	Fix failures, don't generate new tests
`/forge --learn`	Analyze patterns, update confidence tiers
`/forge --add-coverage --screens [names]`	Add coverage for new screens/pages/components
`/forge --spec-gen --context [name]`	Generate Gherkin specs for a context
`/forge --spec-gen --all`	Generate Gherkin specs for all contexts
`/forge --gates-only`	Run behavioral validation gates without test execution
`/forge --gates-only --context [name]`	Run behavioral validation gates for single context
`/forge --predict`	Defect prediction only
`/forge --predict --context [name]`	Predict defects for single context
`/forge --chaos --context [name]`	Chaos/resilience testing for a context
`/forge --chaos --all`	Chaos testing for all contexts
`/forge --drift-check`	Spec drift detection
`/forge --drift-check --context [name]`	Drift check for single context
`/forge --regressions`	Behavioral regression analysis
`/forge --regressions --context [name]`	Regressions for single context
`/forge --meta-review`	LLM-as-Judge meta-evaluation
`/forge --meta-review --context [name]`	Meta-review for single context
`/forge --mutation --context [name]`	Mutation testing for a context
`/forge --mutation --critical-only`	Mutation testing for critical paths only

Architecture

Autonomous Loop

Specify → Test → Analyze → Fix → Audit → Gate → Commit → Learn → Repeat

┌────────────────────────────────────────────────────────────────────┐
│                       FORGE AUTONOMOUS LOOP                        │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐         │
│  │ Specify  │──▶│   Test   │──▶│ Analyze  │──▶│   Fix    │         │
│  │ (Gherkin)│   │ (Run)    │   │ (Root    │   │ (Tiered) │         │
│  └──────────┘   └──────────┘   │  Cause)  │   └──────────┘         │
│       ▲                        └──────────┘        │               │
│       │                                            ▼               │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐         │
│  │  Learn   │◀──│  Commit  │◀──│  Gate    │◀──│  Audit   │         │
│  │ (Update  │   │ (Auto)   │   │ (7 Gates)│   │ (A11y)   │         │
│  │  Tiers)  │   └──────────┘   └──────────┘   └──────────┘         │
│  └──────────┘                                                      │
│       │                                                            │
│       └──────────────── REPEAT ────────────────────────────────────│
│                                                                    │
│  Loop continues until: ALL 7 VALIDATION GATES PASS or MAX 10       │
└────────────────────────────────────────────────────────────────────┘
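
The loop's termination rule can be sketched as follows. This is a minimal sketch, assuming "MAX 10" means a cap of ten iterations; `run_cycle` is a hypothetical callback standing in for one full Specify → Test → Analyze → Fix → Audit → Gate → Commit → Learn pass, not a Forge API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

MAX_ITERATIONS = 10  # assumed reading of "MAX 10" in the diagram


@dataclass
class LoopResult:
    iterations: int
    all_gates_passed: bool
    history: List[Dict[str, bool]] = field(default_factory=list)


def run_loop(
    run_cycle: Callable[[int], Dict[str, bool]],
    max_iterations: int = MAX_ITERATIONS,
) -> LoopResult:
    """Repeat cycles until every gate passes or the iteration cap is reached.

    `run_cycle(iteration)` returns a mapping of gate name -> passed.
    """
    history: List[Dict[str, bool]] = []
    for iteration in range(1, max_iterations + 1):
        gates = run_cycle(iteration)
        history.append(gates)
        if all(gates.values()):
            return LoopResult(iteration, True, history)
    return LoopResult(max_iterations, False, history)


def fake_cycle(iteration: int) -> Dict[str, bool]:
    """Toy cycle: the contract gate only clears on the third pass."""
    return {"functional": True, "contract": iteration >= 3}


result = run_loop(fake_cycle)
print(result.iterations, result.all_gates_passed)  # 3 True
```

Keeping the cap as a parameter makes the "give up" path explicit: a run that never converges returns `all_gates_passed=False` along with the per-iteration gate history.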

Execution Phases

Phase 0 - Backend setup (build, run, health check, seed data)
Phase 1 - Behavioral specification & architecture records (Gherkin specs, ADRs)
Phase 2 - Contract & dependency validation (schemas, shared types, cross-cutting)
Phase 3 - Swarm initialization (load patterns, predictions, confidence tiers)
Phase 4 - Spawn 8 autonomous agents in parallel
Phase 5 - Behavioral validation gate evaluation (all 7 gates after every fix cycle, BFT consensus ≥5/7)

Behavioral Validation Gates

Gate	Check	Threshold	Blocking
1. Functional	All tests pass	100% pass rate	YES
2. Behavioral	Gherkin scenarios satisfied	100% of targeted scenarios	YES
3. Coverage	Path coverage	>=85% overall, >=95% critical	YES (critical only)
4. Security	No secrets, SAST checks, no injection vectors	0 critical/high violations	YES
5. Accessibility	Labels, target sizes, contrast	WCAG AA	Warning only
6. Resilience	Offline, timeout, error handling	Tested for target context	Warning only
7. Contract	API response matches schema	0 mismatches	YES
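
The thresholds in the table can be read as a small decision procedure. The sketch below is one possible interpretation, not Forge's implementation: the metric key names are hypothetical, and combining a veto on blocking gates with a "at least 5 of 7 gates pass" quorum is an assumption about how the ≥5/7 consensus and the Blocking column interact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    name: str
    passed: bool
    blocking: bool  # True if a failure of this gate must stop the commit


def evaluate_gates(m: dict) -> list:
    """Apply the table's thresholds to a metrics dict (key names are hypothetical)."""
    critical_ok = m["critical_coverage"] >= 0.95
    return [
        GateResult("functional", m["tests_passed"] == m["tests_total"], True),
        GateResult("behavioral", m["scenarios_passed"] == m["scenarios_targeted"], True),
        # Coverage fails on either threshold but blocks only on the critical one.
        GateResult("coverage", critical_ok and m["overall_coverage"] >= 0.85, not critical_ok),
        GateResult("security", m["critical_high_violations"] == 0, True),
        GateResult("accessibility", m["wcag_aa_violations"] == 0, False),
        GateResult("resilience", m["resilience_tested"], False),
        GateResult("contract", m["contract_mismatches"] == 0, True),
    ]


def verdict(results: list, quorum: int = 5) -> tuple:
    """Blocking failures veto; otherwise at least `quorum` gates must pass."""
    blockers = [r.name for r in results if r.blocking and not r.passed]
    if blockers:
        return ("BLOCKED", blockers)
    passed = sum(1 for r in results if r.passed)
    return ("PASS", []) if passed >= quorum else ("NO_QUORUM", [])


sample = {
    "tests_passed": 120, "tests_total": 120,
    "scenarios_passed": 40, "scenarios_targeted": 40,
    "overall_coverage": 0.80, "critical_coverage": 0.97,
    "critical_high_violations": 0,
    "wcag_aa_violations": 3,
    "resilience_tested": True,
    "contract_mismatches": 0,
}
print(verdict(evaluate_gates(sample)))  # ('PASS', [])
```

In the sample, coverage and accessibility both fail, but neither failure is blocking and five gates still pass, so the cycle is allowed through with warnings.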

Agent Roles

Agent	Model	Role	v1.2.0 Enhancement
Specification Verifier	Sonnet	Generates/validates Gherkin specs and ADRs for bounded contexts	-
Test Runner	Haiku	Executes E2E test suites, parses results, maps failures to specs	-
Failure Analyzer	Sonnet	Root cause analysis, pattern matching, dependency impact assessment	MaTTS - 3 parallel reasoning trajectories with self-contrast
Bug Fixer	Opus	Applies confidence-tiered fixes from first principles	Driver-Observer algebraic connectivity (λ₂ monitoring)
Behavioral Validation Gate Enforcer	Haiku	Evaluates all 7 gates, arbitrates agent disagreements	BFT consensus model (≥5/7 threshold, VETO for blocking gates)
Accessibility Auditor	Sonnet	WCAG AA audit: labels, contrast, targets, focus order	-
Auto-Committer	Haiku	Stages fixed files, creates detailed commits with gate statuses	-
Learning Optimizer	Sonnet	Updates confidence tiers, defect prediction, coverage metrics	DISTILL phase - LoRA-style abstraction with EWC++ anti-forgetting

Topological Governance (v1.2.0)

Forge v1.2.0 introduces 12 formal topological governance specifications (§1.1–§1.12) that provide mathematical foundations for autonomous behavioral validation. Production heuristics from v1.1.0 - criticality scoring, regression tracking, blocking gates - are now anchored to formal mathematical equivalents.

Four Specification Clusters

Cluster	Sections	Purpose
Consistency & Verification	§1.1–§1.5	Sheaf cohomology for cross-context consistency, Dirichlet energy for system tension, persistent Laplacian for regression tracking, Hallucination Gate for pre-LLM verification, Blake3 witness chain for tamper-evident audit
Swarm Stability	§1.6–§1.7	Algebraic connectivity (Fiedler value λ₂) for agent coordination monitoring, MinCut isolation for quarantining anomalous agent output
Memory & Reasoning	§1.8–§1.11	Hyperbolic memory (Poincaré ball) for hierarchical code embeddings, GF(3) triadic validation for phase transitions, Narya-proofs for counterfactual fix verification, Johnson-Lindenstrauss for sublinear test coverage
Execution Plane	§1.12	WASM/Rust pure-function tasks for deterministic verification (Blake3 hashing, eigenvalue computation, GF(3) validation, HNSW search, contract hash comparison, JL projection)
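
To make one of these terms concrete: on a weighted graph, Dirichlet energy is the standard quantity Σ w_uv (x_u − x_v)², summed over edges. It is zero when all connected nodes carry the same value and grows as neighbors diverge. The sketch below is illustrative only; treating bounded contexts as nodes and per-context pass rates as the signal is an assumption, and the README does not state how Forge computes its "system tension".

```python
def dirichlet_energy(edges, signal):
    """Sum of w * (x_u - x_v)**2 over undirected weighted edges.

    edges:  iterable of (u, v, weight) tuples, each edge listed once
    signal: mapping of node -> numeric value
    """
    return sum(w * (signal[u] - signal[v]) ** 2 for u, v, w in edges)


# Hypothetical graph: nodes are bounded contexts, weights are coupling strength.
edges = [
    ("identity", "payments", 1.0),
    ("payments", "orders", 1.0),
    ("identity", "orders", 0.5),
]

agree = {"identity": 1.0, "payments": 1.0, "orders": 1.0}
split = {"identity": 1.0, "payments": 1.0, "orders": 0.0}

print(dirichlet_energy(edges, agree))  # 0.0
print(dirichlet_energy(edges, split))  # 1.5
```

In the second case only the two edges touching `orders` contribute (1.0 + 0.5), which is how the measure localizes disagreement to the part of the graph where it occurs.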

Infrastructure Readiness

Every specification is operational today. Infrastructure readiness markers define the path from "correct" to "correct and fast":

Specification	Current Implementation	Native Infrastructure
Blake3 witness chain (§1.5)	SHA-256 hashing	Blake3 native hashing
Hyperbolic memory (§1.8)	Flat key-value lookups across 10 namespaces	HNSW-indexed Poincaré ball embeddings
JL coverage (§1.11)	Defect prediction with failure probability ranking	Random projection to O(log n) representative tests
WASM execution (§1.12)	LLM structured reasoning for pure functions	WASM/Rust compilation with sub-ms latency
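
The first row of the table, a witness chain currently backed by SHA-256, can be illustrated with a minimal hash chain. This is a sketch under stated assumptions, not Forge's actual record format: the entry fields, `append_verdict`, and `verify_chain` are hypothetical names, and only the use of SHA-256 comes from the table.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(gate: str, passed: bool, prev: str) -> str:
    """Hash one verdict together with the hash of the entry before it."""
    payload = json.dumps({"gate": gate, "passed": passed, "prev": prev}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_verdict(chain: list, gate: str, passed: bool) -> dict:
    """Append a gate verdict, linking it to the current chain tip."""
    prev = chain[-1]["hash"] if chain else GENESIS
    entry = {"gate": gate, "passed": passed, "prev": prev,
             "hash": _digest(gate, passed, prev)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        expected = _digest(entry["gate"], entry["passed"], prev)
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True


chain = []
append_verdict(chain, "functional", True)
append_verdict(chain, "contract", True)
print(verify_chain(chain))   # True

chain[0]["passed"] = False   # simulate tampering with an earlier verdict
print(verify_chain(chain))   # False
```

Because the hash function is isolated in `_digest`, moving from SHA-256 to a Blake3 binding would only touch that one helper, which matches the "upgrade transparently" intent of the readiness markers.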

Configuration

Project Config (optional)

# forge.config.yaml - placed at repo root
architecture: microservices
backend:
  services:
    - name: auth-service
      port: 8081
      healthEndpoint: /health
      buildCommand: npm run build
      runCommand: npm start
frontend:
  technology: react
  testCommand: npx cypress run --spec {target}
  testDir: cypress/e2e/
  specDir: cypress/e2e/specs/

# Model routing overrides
model_routing:
  bug-fixer: opus
  failure-analyzer: sonnet
  test-runner: haiku

# Visual regression
visual_regression:
  enabled: true
  threshold: 0.001

# Agentic QE integration
integrations:
  agentic-qe:
    enabled: true
    domains: [defect-intelligence, security-compliance, visual-accessibility, contract-testing]

Context Config (optional)

# forge.contexts.yaml - bounded context definitions
contexts:
  - name: identity
    testFile: identity.cy.ts
    specFile: identity.feature
    paths: 68
    subdomains: [Auth, Profiles, Verification]
  - name: payments
    testFile: payments.cy.ts
    specFile: payments.feature
    paths: 89
    subdomains: [Wallet, Cards, Transactions]

dependencies:
  identity:
    blocks: [payments, orders]
  payments:
    depends_on: [identity]
    blocks: [orders, subscriptions]

If no configuration files are present, Forge auto-discovers the project structure on first run.
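
The `dependencies` block in the context config suggests how cascade re-testing could be ordered: when one context changes, everything it blocks is re-tested, upstream contexts first. The sketch below is a hedged illustration using Python's standard `graphlib`; the `BLOCKS` mapping mirrors the example YAML, while `downstream` and `retest_order` are hypothetical helpers and not Forge commands.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Mirrors the `blocks` entries from the example forge.contexts.yaml.
BLOCKS = {
    "identity": ["payments", "orders"],
    "payments": ["orders", "subscriptions"],
}


def downstream(changed: str, blocks: dict) -> set:
    """All contexts reachable from `changed` by following `blocks` edges."""
    seen = set()
    stack = [changed]
    while stack:
        for child in blocks.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def retest_order(changed: str, blocks: dict = BLOCKS) -> list:
    """Changed context first, then affected contexts, upstream before downstream."""
    scope = downstream(changed, blocks) | {changed}
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {ctx: set() for ctx in sorted(scope)}
    for upstream, children in blocks.items():
        for child in children:
            if upstream in scope and child in scope:
                graph[child].add(upstream)
    return list(TopologicalSorter(graph).static_order())


order = retest_order("identity")
print(order)
```

A change to `payments` yields a narrower cascade: `identity` is upstream of it and therefore left out, while `orders` and `subscriptions` are still re-tested after `payments`.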

Agentic QE Integration

Forge optionally integrates with Agentic QE via MCP for enhanced capabilities:

Capability	Without AQE	With AQE
Pattern Storage	claude-flow memory	ReasoningBank (vector-indexed, 150x faster)
Defect Prediction	File changes + history	Specialized defect-intelligence agents
Security Scanning	Gate 4 static checks	Full SAST/DAST analysis
Accessibility	Built-in auditor	visual-tester + accessibility-auditor
Contract Testing	Schema validation	contract-validator + graphql-tester
Progress	`.forge/progress.jsonl`	AG-UI real-time streaming

All AQE features are additive. Forge works identically without AQE installed.

References

Continuous Behavioral Verification: Ongoing Path to Done - Ikenna Okpala
Build with Quality Skill: How I Build Software 10x Faster - Mondweep Chakravorty
claude-code-v3-qe-skill - V3 QE Skill
agentic-qe - Agentic QE Framework
Advanced Topological Governance in Autonomous Software Engineering - Formal mathematical foundations (sheaf theory, spectral analysis, Galois fields) for v1.2.0 specifications

License

MIT
