Forge is an autonomous behavioral validation engineering swarm that treats quality as something forged into software continuously, not bolted on at the end.

ikennaokpala/forge

Forge

Behavioral validation forged in, not bolted on.

Forge is an autonomous behavioral validation swarm skill for Claude Code that combines BDD behavioral verification, 7 behavioral validation gates, confidence-tiered learning, topological governance, and self-healing fix loops. It spawns 8 specialized agents that work in parallel to verify, test, fix, and commit - continuously - until every Gherkin scenario passes and every behavioral validation gate clears.


Key Features

  • 8 specialized agents working in parallel with cost-optimized model routing
  • Gherkin behavioral specifications as the single source of truth
  • 7 behavioral validation gates: Functional, Behavioral, Coverage, Security, Accessibility, Resilience, Contract
  • 12 topological governance specifications (§1.1–§1.12) - mathematical foundations for autonomous behavioral validation
  • Confidence-tiered fix patterns (Platinum/Gold/Silver/Bronze) with Nash Equilibrium convergence
  • Defect prediction based on historical failure data and file changes
  • Chaos/resilience testing with controlled failure injection
  • Cross-context dependency awareness with cascade re-testing and sheaf cohomology consistency
  • Shared types and cross-cutting validation across bounded contexts
  • Agent-optimized ADRs with MUST/MUST NOT constraints and verification commands
  • Visual regression testing with pixel-by-pixel comparison
  • Architecture-agnostic - monolith, microservices, monorepo, mobile+backend
  • Optional Agentic QE integration for enhanced pattern search, security scanning, and more
  • External-only mocking - mock third-party services, never internal code (production-validated policy)
  • Spec drift detection - detects when Gherkin specs and implementation diverge
  • LLM-as-Judge meta-review - second-model evaluation with Anti-Echo-Chamber guarantee
  • Self-reflection gate - Bug Fixer asks "What could go wrong?" before committing
  • Hallucination Gate - deterministic pre-LLM boundary (AST resolution, contract hash, mocking detection)
  • Agent criticality scoring - bottleneck detection via Dirichlet energy and automatic optimization
  • Narya-proofs - counterfactual verification proving fix necessity and sufficiency
  • Property-based testing - generate 1000+ test cases from invariants
  • Mutation testing - inject bugs to verify test effectiveness
  • Blake3 witness chain - cryptographic tamper-evident audit trail for gate verdicts
  • Infrastructure readiness markers - specify formally, implement pragmatically, upgrade transparently

Philosophy

Three Pillars

| Pillar | Source | What It Does |
|--------|--------|--------------|
| Build | DDD+ADR+TDD methodology | Structured development with behavioral validation gates, defect prediction, confidence-tiered fixes |
| Verify | BDD/Gherkin behavioral specs | Continuous behavioral verification - the PRODUCT works, not just the CODE |
| Heal | Autonomous E2E fix loop | Test → Analyze → Fix → Commit → Learn → Repeat |

"DONE DONE"

"DONE DONE" means: the code compiles AND the product behaves as specified. Every Gherkin scenario passes. Every behavioral validation gate clears. Every dependency graph is satisfied.


Quick Start

# Copy SKILL.md to your Claude Code skills directory
cp SKILL.md ~/.claude/skills/forge.md

# Run on your project
/forge --autonomous --context payments

Invocation Modes

| Command | Description |
|---------|-------------|
| /forge --autonomous --all | Full autonomous run - all contexts, all gates |
| /forge --autonomous --context [name] | Single context autonomous run |
| /forge --verify-only | Behavioral verification only (no fixes) |
| /forge --verify-only --context [name] | Verify single context |
| /forge --fix-only --context [name] | Fix failures, don't generate new tests |
| /forge --learn | Analyze patterns, update confidence tiers |
| /forge --add-coverage --screens [names] | Add coverage for new screens/pages/components |
| /forge --spec-gen --context [name] | Generate Gherkin specs for a context |
| /forge --spec-gen --all | Generate Gherkin specs for all contexts |
| /forge --gates-only | Run behavioral validation gates without test execution |
| /forge --gates-only --context [name] | Run behavioral validation gates for single context |
| /forge --predict | Defect prediction only |
| /forge --predict --context [name] | Predict defects for single context |
| /forge --chaos --context [name] | Chaos/resilience testing for a context |
| /forge --chaos --all | Chaos testing for all contexts |
| /forge --drift-check | Spec drift detection |
| /forge --drift-check --context [name] | Drift check for single context |
| /forge --regressions | Behavioral regression analysis |
| /forge --regressions --context [name] | Regressions for single context |
| /forge --meta-review | LLM-as-Judge meta-evaluation |
| /forge --meta-review --context [name] | Meta-review for single context |
| /forge --mutation --context [name] | Mutation testing for a context |
| /forge --mutation --critical-only | Mutation testing for critical paths only |

Architecture

Autonomous Loop

Specify → Test → Analyze → Fix → Audit → Gate → Commit → Learn → Repeat
┌────────────────────────────────────────────────────────────────────┐
│                    FORGE AUTONOMOUS LOOP                            │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐      │
│  │ Specify  │──▶│   Test   │──▶│ Analyze  │──▶│   Fix    │      │
│  │ (Gherkin)│   │ (Run)    │   │ (Root    │   │ (Tiered) │      │
│  └──────────┘   └──────────┘   │  Cause)  │   └──────────┘      │
│       ▲                        └──────────┘        │              │
│       │                                            ▼              │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐      │
│  │  Learn   │◀──│  Commit  │◀──│  Gate    │◀──│  Audit   │      │
│  │ (Update  │   │ (Auto)   │   │ (7 Gates)│   │ (A11y)   │      │
│  │  Tiers)  │   └──────────┘   └──────────┘   └──────────┘      │
│  └──────────┘                                                     │
│       │                                                           │
│       └──────────────── REPEAT ──────────────────────────────────│
│                                                                    │
│  Loop continues until: ALL 7 VALIDATION GATES PASS or MAX 10    │
└────────────────────────────────────────────────────────────────────┘
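
The termination rule in the diagram can be sketched as a small driver loop. This is a minimal illustration, not Forge's implementation: `run_cycle` is a hypothetical callback standing in for one full pass through the stages, and reading "MAX 10" as a cap of 10 loop iterations is an assumption.

```python
from typing import Callable, Dict

# Assumption: "MAX 10" in the diagram means at most 10 loop iterations.
MAX_ITERATIONS = 10


def run_loop(run_cycle: Callable[[int], Dict[str, bool]],
             max_iterations: int = MAX_ITERATIONS) -> dict:
    """Repeat cycles until every gate passes or the iteration cap is reached.

    run_cycle is a hypothetical callback for one
    Specify -> Test -> Analyze -> Fix -> Audit -> Gate pass.
    It returns a mapping of gate name to pass/fail.
    """
    gates: Dict[str, bool] = {}
    for iteration in range(1, max_iterations + 1):
        gates = run_cycle(iteration)
        if gates and all(gates.values()):
            return {"status": "passed", "iterations": iteration, "gates": gates}
    return {"status": "max_iterations", "iterations": max_iterations, "gates": gates}


# Illustrative stand-in: the Security gate only clears on the third cycle.
def fake_cycle(iteration: int) -> Dict[str, bool]:
    return {"Functional": True, "Security": iteration >= 3}


result = run_loop(fake_cycle)
print(result["status"], result["iterations"])  # passed 3
```

Returning the last gate map on both exits keeps the final state available for reporting even when the cap is hit.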

Execution Phases

  1. Phase 0 - Backend setup (build, run, health check, seed data)
  2. Phase 1 - Behavioral specification & architecture records (Gherkin specs, ADRs)
  3. Phase 2 - Contract & dependency validation (schemas, shared types, cross-cutting)
  4. Phase 3 - Swarm initialization (load patterns, predictions, confidence tiers)
  5. Phase 4 - Spawn 8 autonomous agents in parallel
  6. Phase 5 - Behavioral validation gates evaluation (7 gates after every fix cycle, BFT consensus ≥5/7)

Behavioral Validation Gates

| Gate | Check | Threshold | Blocking |
|------|-------|-----------|----------|
| 1. Functional | All tests pass | 100% pass rate | YES |
| 2. Behavioral | Gherkin scenarios satisfied | 100% of targeted scenarios | YES |
| 3. Coverage | Path coverage | >=85% overall, >=95% critical | YES (critical only) |
| 4. Security | No secrets, SAST checks, no injection vectors | 0 critical/high violations | YES |
| 5. Accessibility | Labels, target sizes, contrast | WCAG AA | Warning only |
| 6. Resilience | Offline, timeout, error handling | Tested for target context | Warning only |
| 7. Contract | API response matches schema | 0 mismatches | YES |
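
As a rough illustration of how the Blocking column might be applied, the sketch below separates failed gates into blockers and warnings. The `GateResult` type and both function names are hypothetical. Treating "YES (critical only)" as "block only when critical-path coverage is below 95%, warn when only the overall figure is short" is an assumption about the table, not documented behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GateResult:
    name: str
    passed: bool
    blocking: bool  # True for gates marked YES in the table


def coverage_gate(overall_pct: float, critical_pct: float) -> GateResult:
    """Apply the table's thresholds: >=85% overall, >=95% critical."""
    critical_ok = critical_pct >= 95.0
    overall_ok = overall_pct >= 85.0
    # Assumption: only a critical-path shortfall blocks the commit.
    return GateResult("Coverage", critical_ok and overall_ok, blocking=not critical_ok)


def summarize(results: List[GateResult]) -> dict:
    """Split failed gates into blockers and warnings."""
    blockers = [r.name for r in results if not r.passed and r.blocking]
    warnings = [r.name for r in results if not r.passed and not r.blocking]
    return {"can_commit": not blockers, "blockers": blockers, "warnings": warnings}


results = [
    GateResult("Functional", True, True),
    coverage_gate(overall_pct=82.0, critical_pct=97.0),
    GateResult("Accessibility", False, False),
    GateResult("Contract", True, True),
]
summary = summarize(results)
print(summary["can_commit"], summary["warnings"])  # True ['Coverage', 'Accessibility']
```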

Agent Roles

| Agent | Model | Role | v1.2.0 Enhancement |
|-------|-------|------|--------------------|
| Specification Verifier | Sonnet | Generates/validates Gherkin specs and ADRs for bounded contexts | - |
| Test Runner | Haiku | Executes E2E test suites, parses results, maps failures to specs | - |
| Failure Analyzer | Sonnet | Root cause analysis, pattern matching, dependency impact assessment | MaTTS - 3 parallel reasoning trajectories with self-contrast |
| Bug Fixer | Opus | Applies confidence-tiered fixes from first principles | Driver-Observer algebraic connectivity (λ₂ monitoring) |
| Behavioral Validation Gate Enforcer | Haiku | Evaluates all 7 gates, arbitrates agent disagreements | BFT consensus model (≥5/7 threshold, VETO for blocking gates) |
| Accessibility Auditor | Sonnet | WCAG AA audit: labels, contrast, targets, focus order | - |
| Auto-Committer | Haiku | Stages fixed files, creates detailed commits with gate statuses | - |
| Learning Optimizer | Sonnet | Updates confidence tiers, defect prediction, coverage metrics | DISTILL phase - LoRA-style abstraction with EWC++ anti-forgetting |
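
The Gate Enforcer's consensus rule (≥5/7 threshold, VETO for blocking gates) could look roughly like the following. This is a sketch under assumptions: it treats each of the 7 gates as one vote and lets any failed blocking gate veto the result regardless of the count. The `consensus` function and the `BLOCKING` set are illustrative names, not part of Forge.

```python
from typing import Dict, Set

QUORUM = 5  # from the table: at least 5 of 7 approvals

# Gates marked YES in the gates table. Coverage is listed there as
# "critical only"; it is treated as blocking here for simplicity.
BLOCKING: Set[str] = {"Functional", "Behavioral", "Coverage", "Security", "Contract"}


def consensus(votes: Dict[str, bool], blocking: Set[str] = BLOCKING,
              quorum: int = QUORUM) -> dict:
    """Approve when the quorum is met and no blocking gate has failed."""
    approvals = sum(1 for ok in votes.values() if ok)
    vetoes = sorted(g for g, ok in votes.items() if not ok and g in blocking)
    return {
        "approved": approvals >= quorum and not vetoes,
        "approvals": approvals,
        "vetoes": vetoes,
    }


all_pass = {g: True for g in [
    "Functional", "Behavioral", "Coverage", "Security",
    "Accessibility", "Resilience", "Contract",
]}

# Two warning-only gates fail: quorum of 5 is still met, no veto.
soft_fail = {**all_pass, "Accessibility": False, "Resilience": False}
print(consensus(soft_fail)["approved"])  # True

# One blocking gate fails: 6 approvals, but the veto wins.
hard_fail = {**all_pass, "Security": False}
print(consensus(hard_fail)["approved"])  # False
```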

Topological Governance (v1.2.0)

Forge v1.2.0 introduces 12 formal topological governance specifications (§1.1–§1.12) that provide mathematical foundations for autonomous behavioral validation. Production heuristics from v1.1.0 - criticality scoring, regression tracking, blocking gates - are now anchored to formal mathematical equivalents.

Four Specification Clusters

| Cluster | Sections | Purpose |
|---------|----------|---------|
| Consistency & Verification | §1.1–§1.5 | Sheaf cohomology for cross-context consistency, Dirichlet energy for system tension, persistent Laplacian for regression tracking, Hallucination Gate for pre-LLM verification, Blake3 witness chain for tamper-evident audit |
| Swarm Stability | §1.6–§1.7 | Algebraic connectivity (Fiedler value λ₂) for agent coordination monitoring, MinCut isolation for quarantining anomalous agent output |
| Memory & Reasoning | §1.8–§1.11 | Hyperbolic memory (Poincaré ball) for hierarchical code embeddings, GF(3) triadic validation for phase transitions, Narya-proofs for counterfactual fix verification, Johnson-Lindenstrauss for sublinear test coverage |
| Execution Plane | §1.12 | WASM/Rust pure-function tasks for deterministic verification (Blake3 hashing, eigenvalue computation, GF(3) validation, HNSW search, contract hash comparison, JL projection) |
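
For readers unfamiliar with Dirichlet energy, the standard graph definition is the weighted sum of squared differences across edges, E(x) = Σ w_uv (x_u − x_v)², which is zero when all nodes agree and grows as neighbors diverge. The sketch below computes it for a small graph. Which signal Forge attaches to each node is not stated here, so the per-context scores and the edge list are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (node_u, node_v, weight)


def dirichlet_energy(values: Dict[str, float], edges: List[Edge]) -> float:
    """Sum of w * (x_u - x_v)^2 over all edges (equivalently x^T L x)."""
    return sum(w * (values[u] - values[v]) ** 2 for u, v, w in edges)


# Hypothetical graph: three contexts linked by unit-weight edges.
edges: List[Edge] = [
    ("identity", "payments", 1.0),
    ("identity", "orders", 1.0),
    ("payments", "orders", 1.0),
]

# Hypothetical per-context scores (for example, a failure rate).
scores = {"identity": 0.0, "payments": 0.5, "orders": 1.0}
print(dirichlet_energy(scores, edges))  # 1.5

# Identical scores everywhere give zero energy.
uniform = {"identity": 0.5, "payments": 0.5, "orders": 0.5}
print(dirichlet_energy(uniform, edges))  # 0.0
```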

Infrastructure Readiness

Every specification is operational today. Infrastructure readiness markers define the path from "correct" to "correct and fast":

| Specification | Current Implementation | Native Infrastructure |
|---------------|------------------------|-----------------------|
| Blake3 witness chain (§1.5) | SHA-256 hashing | Blake3 native hashing |
| Hyperbolic memory (§1.8) | Flat key-value lookups across 10 namespaces | HNSW-indexed Poincaré ball embeddings |
| JL coverage (§1.11) | Defect prediction with failure probability ranking | Random projection to O(log n) representative tests |
| WASM execution (§1.12) | LLM structured reasoning for pure functions | WASM/Rust compilation with sub-ms latency |
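
To make the witness-chain row concrete, here is a minimal hash-chain sketch using SHA-256 from Python's standard library, matching the "current implementation" column. The record layout (`verdict`, `prev`, `hash`), the all-zero genesis value, and the function names are assumptions for illustration. Forge's actual chain format is not described in this README.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(prev_hash: str, verdict: dict) -> str:
    # Canonical JSON so the same verdict always hashes the same way.
    payload = json.dumps(verdict, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_verdict(chain: List[dict], verdict: dict) -> None:
    """Append a verdict linked to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"verdict": verdict, "prev": prev_hash,
                  "hash": _digest(prev_hash, verdict)})


def verify_chain(chain: List[dict]) -> bool:
    """Recompute every link; any edited verdict breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["verdict"]):
            return False
        prev_hash = entry["hash"]
    return True


chain: List[dict] = []
append_verdict(chain, {"gate": "Functional", "passed": True})
append_verdict(chain, {"gate": "Security", "passed": True})
print(verify_chain(chain))  # True

chain[0]["verdict"]["passed"] = False  # simulate tampering
print(verify_chain(chain))  # False
```

Because each hash covers the previous one, swapping SHA-256 for Blake3 later would only change `_digest`, which fits the "upgrade transparently" idea in the table.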

Configuration

Project Config (optional)

# forge.config.yaml - placed at repo root
architecture: microservices
backend:
  services:
    - name: auth-service
      port: 8081
      healthEndpoint: /health
      buildCommand: npm run build
      runCommand: npm start
frontend:
  technology: react
  testCommand: npx cypress run --spec {target}
  testDir: cypress/e2e/
  specDir: cypress/e2e/specs/

# Model routing overrides
model_routing:
  bug-fixer: opus
  failure-analyzer: sonnet
  test-runner: haiku

# Visual regression
visual_regression:
  enabled: true
  threshold: 0.001

# Agentic QE integration
integrations:
  agentic-qe:
    enabled: true
    domains: [defect-intelligence, security-compliance, visual-accessibility, contract-testing]

Context Config (optional)

# forge.contexts.yaml - bounded context definitions
contexts:
  - name: identity
    testFile: identity.cy.ts
    specFile: identity.feature
    paths: 68
    subdomains: [Auth, Profiles, Verification]
  - name: payments
    testFile: payments.cy.ts
    specFile: payments.feature
    paths: 89
    subdomains: [Wallet, Cards, Transactions]

dependencies:
  identity:
    blocks: [payments, orders]
  payments:
    depends_on: [identity]
    blocks: [orders, subscriptions]

If no configuration files are present, Forge auto-discovers the project structure on first run.
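
The `dependencies` block above suggests how cascade re-testing could be derived: when a context changes, every context it blocks, directly or indirectly, is a candidate for re-testing. The sketch below walks the `blocks` edges breadth-first. Following `blocks` transitively is an assumption, and `cascade` is a hypothetical helper, not a Forge API.

```python
from collections import deque
from typing import Dict, List

# The "blocks" edges from the forge.contexts.yaml example above.
BLOCKS: Dict[str, List[str]] = {
    "identity": ["payments", "orders"],
    "payments": ["orders", "subscriptions"],
}


def cascade(changed: str, blocks: Dict[str, List[str]]) -> List[str]:
    """Return downstream contexts to re-test, in breadth-first order."""
    order: List[str] = []
    seen = {changed}
    queue = deque([changed])
    while queue:
        context = queue.popleft()
        for downstream in blocks.get(context, []):
            if downstream not in seen:
                seen.add(downstream)
                order.append(downstream)
                queue.append(downstream)
    return order


print(cascade("identity", BLOCKS))  # ['payments', 'orders', 'subscriptions']
print(cascade("payments", BLOCKS))  # ['orders', 'subscriptions']
```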


Agentic QE Integration

Forge optionally integrates with Agentic QE via MCP for enhanced capabilities:

| Capability | Without AQE | With AQE |
|------------|-------------|----------|
| Pattern Storage | claude-flow memory | ReasoningBank (vector-indexed, 150x faster) |
| Defect Prediction | File changes + history | Specialized defect-intelligence agents |
| Security Scanning | Gate 4 static checks | Full SAST/DAST analysis |
| Accessibility | Built-in auditor | visual-tester + accessibility-auditor |
| Contract Testing | Schema validation | contract-validator + graphql-tester |
| Progress | .forge/progress.jsonl | AG-UI real-time streaming |

All AQE features are additive. Forge works identically without AQE installed.


License

MIT
