From 3c741cebae05bbb850abd272443b988aa855bde0 Mon Sep 17 00:00:00 2001 From: Claude Date: Sat, 31 Jan 2026 07:18:44 +0000 Subject: [PATCH 1/3] feat: Add reusable anatomize & adversarial review skills MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Three composable slash commands extracted from session methodology: /anatomize — Full two-phase pipeline: Phase 1: Document all instances of target with identity, architecture, dependencies, constraints, and limitations in structured markdown. Phase 2: Launch 6 parallel adversarial subagents with strict rubrics (convex wins, extending capabilities, reducing steps, improvement integration, accuracy fidelity, structural quality), synthesize findings, and apply improvements targeting 4+/5 across all dimensions. /adversarial-review — Standalone Phase 2: 6-dimension parallel adversarial evaluation with scoring, findings, recommendations, and automatic application of improvements. /extract-anatomy — Standalone Phase 1: Comprehensive anatomical extraction with identity, architecture, dependencies, constraints per item. Outputs structured markdown with glossary, quick reference matrix, and cross-references. https://claude.ai/code/session_01CJnBiMXiuhsnvvvcqAMPYS --- .claude/commands/adversarial-review.md | 42 +++++++++ .claude/commands/anatomize.md | 124 +++++++++++++++++++++++++ .claude/commands/extract-anatomy.md | 46 +++++++++ 3 files changed, 212 insertions(+) create mode 100644 .claude/commands/adversarial-review.md create mode 100644 .claude/commands/anatomize.md create mode 100644 .claude/commands/extract-anatomy.md diff --git a/.claude/commands/adversarial-review.md b/.claude/commands/adversarial-review.md new file mode 100644 index 0000000000..7cd54d4c19 --- /dev/null +++ b/.claude/commands/adversarial-review.md @@ -0,0 +1,42 @@ +# Adversarial Review + +Adversarially evaluate the target artifact using parallel subagents with strict rubrics. The target is: **$ARGUMENTS** + +If the argument is a file path, read it. If it's a description, find the relevant files first. + +--- + +## Execution + +Launch **6 parallel subagents**, each assigned ONE rubric dimension. Each must return: +- **Score** (1-5, with 5 being best) +- **Specific findings** (line-level references where possible) +- **Concrete recommendations** (exact changes, not vague advice) + +### R1: Convex Easy Wins +High-impact, low-effort fixes. Missing defaults, incorrect values, simple formatting inconsistencies, missing cross-references. Score 5 = nothing trivial left to fix. + +### R2: Extending Capabilities +Does the artifact cover advanced usage, composition patterns, edge cases, extension points, power-user features? Score 5 = comprehensive advanced coverage. + +### R3: Reducing Steps to Accurate Results +Can a reader go from question to answer efficiently? Quick references, decision tables, workflow patterns, inline error docs. Score 5 = minimal friction to any answer. + +### R4: Improvement-Integration Points +Does the artifact identify where the system can be improved? Limitations as opportunities, automation potential, drift detection. Score 5 = clear improvement roadmap. + +### R5: Accuracy & Source-of-Truth Fidelity +Cross-reference against actual source code, schemas, or runtime behavior. Find documentation lies, stale values, incomplete enums. Score 5 = fully verified. + +### R6: Structural & Navigational Quality +ToC completeness, glossary coverage, heading consistency, metadata template uniformity, bidirectional cross-references. 
Score 5 = professional reference quality. + +## After Evaluation + +1. Present aggregate score table +2. Deduplicate overlapping findings across dimensions +3. Prioritize: convex wins → capability extensions → accuracy → structural +4. Apply all improvements +5. Commit with detailed changelog + +If any dimension scores below 3/5 after application, run a focused follow-up on that dimension. diff --git a/.claude/commands/anatomize.md b/.claude/commands/anatomize.md new file mode 100644 index 0000000000..6024e8b592 --- /dev/null +++ b/.claude/commands/anatomize.md @@ -0,0 +1,124 @@ +# Anatomize & Adversarially Optimize + +You are executing a two-phase methodology: **Anatomize** then **Adversarially Optimize**. The target is: **$ARGUMENTS** + +If no target is provided, ask the user to specify one (e.g., "native tools", "API endpoints", "database models", "CI/CD pipelines", "authentication flows"). + +--- + +## Phase 1: Anatomize + +Produce a comprehensive anatomical document of every instance of the target. For each item, document: + +### 1. Identity +- **Name** (canonical, aliases, historical names) +- **Category** (which logical group it belongs to) +- **Purpose** (one-sentence description of what it does and why it exists) + +### 2. Anatomical Architecture +- **Parameters / Inputs** — every parameter with: name, type, required/optional, default value, constraints (min/max/enum values/regex), and description +- **Outputs / Return Shape** — what it produces, including structure, max size, truncation behavior +- **Internal Mechanics** — how it works (execution model, delegation, transformation pipeline) +- **State Effects** — what it reads from, writes to, or mutates + +### 3. Dependencies +- **Upstream** — what must exist or be configured for this to function (prerequisites, feature flags, servers, environment) +- **Downstream** — what depends on this, what breaks if this changes +- **Peer Relationships** — tools/components it commonly pairs with, mutual exclusions, ordering constraints + +### 4. Constraints & Limitations +- **Hard Constraints** — enforced limits (schema validation, size caps, rate limits, timeout ceilings) +- **Soft Constraints** — guidance and best practices (recommended usage patterns, anti-patterns) +- **Behavioral Properties** — read-only, concurrency-safe, requires permission, sandboxed, auto-allowed +- **Availability Conditions** — feature flags, platform requirements, configuration prerequisites + +### Methodology +1. Use `Explore` subagents to search the codebase, installed packages, and configuration files +2. Cross-reference source of truth (actual runtime schemas/code) against any existing documentation +3. Produce structured markdown with tables for parameters and bullet lists for properties +4. Include a Quick Reference matrix at the top and a Glossary for domain-specific terms + +### Output +Write the anatomical document to a file. Propose a sensible path based on the target (e.g., `docs/-reference.md`). Include: +- Glossary of domain terms +- Quick Reference summary table +- Full per-item anatomy sections +- Cross-reference notes between related items +- Configuration reference (env vars, CLI flags, config files) if applicable + +--- + +## Phase 2: Adversarially Optimize + +Once the anatomical document exists, launch **parallel subagents** to adversarially evaluate it. 
Each subagent gets ONE rubric dimension and must return: +- **Score** (1-5) +- **Specific findings** (what's wrong, missing, or improvable — with line references) +- **Concrete recommendations** (exact changes, not vague suggestions) + +### Rubric Dimensions + +Launch these **6 subagents in parallel**: + +#### R1: Convex Easy Wins +> Score how many high-impact, low-effort improvements exist. +- Are there missing defaults that take 1 line to add? +- Are there incorrect values that are a simple find-and-fix? +- Are there missing cross-references that just need a "See also: X" line? +- Are there inconsistencies in formatting/structure that a single pass fixes? +- **Scoring:** 5 = no easy wins left (already polished), 1 = many trivial fixes remaining + +#### R2: Extending Capabilities +> Score how well the document enables users to push beyond basic usage. +- Does each item document its full parameter space (not just the common ones)? +- Are advanced features, edge cases, and power-user patterns documented? +- Are composition patterns shown (how items combine for emergent capability)? +- Are extension points documented (hooks, plugins, custom configurations)? +- **Scoring:** 5 = comprehensive advanced coverage, 1 = basic-only documentation + +#### R3: Reducing Steps to Accurate Results +> Score how efficiently a reader can go from question to correct answer. +- Is there a Quick Reference for the 80% use case? +- Are "Use this, not that" decision tables present? +- Are workflow patterns documented with step-by-step sequences? +- Can a reader find the right tool/item without reading the entire document? +- Are common errors and their fixes documented inline? +- **Scoring:** 5 = minimal steps to any answer, 1 = requires full document read + +#### R4: Improvement-Integration Points +> Score how well the document identifies where the system can be improved. +- Are limitations framed as improvement opportunities? +- Are gaps between current and ideal behavior identified? +- Are automation opportunities noted (scripts, CI, generators)? +- Are version/drift detection strategies suggested? +- **Scoring:** 5 = clear roadmap for improvement, 1 = static reference only + +#### R5: Accuracy & Source-of-Truth Fidelity +> Score correctness against the actual runtime artifact. +- Cross-reference parameter types, defaults, and constraints against source code/schemas +- Identify any documentation lies (stated behavior != actual behavior) +- Check enum values are exhaustive +- Verify constraints match runtime enforcement +- **Scoring:** 5 = fully verified against source, 1 = unverified or contains errors + +#### R6: Structural & Navigational Quality +> Score the document's architecture as a reference artifact. +- Is the ToC complete and correctly linked? +- Is the Glossary comprehensive (no undefined jargon)? +- Are section headings consistent and scannable? +- Is the metadata template consistent across all items? +- Are related items cross-referenced bidirectionally? +- **Scoring:** 5 = professional reference quality, 1 = inconsistent/hard to navigate + +### Synthesis + +After all 6 subagents return: +1. **Aggregate scores** into a summary table +2. **Deduplicate** overlapping findings +3. **Prioritize** by impact (convex wins first, then capability extensions, then structural) +4. **Apply** all improvements to the anatomical document +5. 
**Commit** with a detailed message listing what changed and why + +### Target Scores +- Phase 1 baseline is expected around 2-3/5 (first drafts always have gaps) +- After Phase 2 application, target 4+/5 across all dimensions +- If any dimension remains below 3/5, run a focused follow-up pass on that dimension diff --git a/.claude/commands/extract-anatomy.md b/.claude/commands/extract-anatomy.md new file mode 100644 index 0000000000..bb983fa927 --- /dev/null +++ b/.claude/commands/extract-anatomy.md @@ -0,0 +1,46 @@ +# Extract Anatomy + +Produce a comprehensive anatomical document of all instances of the target: **$ARGUMENTS** + +If no target is provided, ask the user to specify one. + +--- + +## For Each Item Found, Document: + +### Identity +- Name (canonical, aliases, historical names) +- Category / logical group +- Purpose (one sentence) + +### Architecture +- **Parameters / Inputs** — name, type, required/optional, default, constraints (min/max/enum/regex), description +- **Outputs / Return Shape** — structure, max size, truncation behavior +- **Internal Mechanics** — execution model, delegation, transformation pipeline +- **State Effects** — reads from, writes to, mutates + +### Dependencies +- **Upstream** — prerequisites, feature flags, servers, environment requirements +- **Downstream** — what depends on this, what breaks if it changes +- **Peers** — common pairings, mutual exclusions, ordering constraints + +### Constraints & Limitations +- **Hard** — schema validation, size caps, rate limits, timeouts +- **Soft** — best practices, recommended patterns, anti-patterns +- **Behavioral Properties** — read-only, concurrency-safe, permissions, sandbox, auto-allowed +- **Availability** — feature flags, platform requirements, configuration prerequisites + +## Methodology + +1. Search codebase, installed packages, and config files using `Explore` subagents +2. Cross-reference runtime source of truth against existing documentation +3. Output structured markdown with parameter tables and property lists + +## Output Structure + +Write to `docs/-reference.md` (or propose a better path). Include: +- Glossary of domain terms +- Quick Reference summary matrix +- Per-item anatomy sections +- Cross-references between related items +- Configuration reference (env vars, CLI flags, config files) if applicable From f8e6d63cbe5983e77571541477818928a00ff1bc Mon Sep 17 00:00:00 2001 From: Claude Date: Sat, 31 Jan 2026 07:26:08 +0000 Subject: [PATCH 2/3] feat: Add fan-out pattern, considerations guard-rails, and review skills MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit New slash commands extracted from session methodology: /fan-out-review — The parallel adversarial subagent pattern formalized as a reusable skill. Documents the fan-out → synthesize → apply pipeline with rubric templates for 4 artifact types: documentation, code, tests, and architecture. Includes custom rubric design guide for anything that doesn't fit the templates. /considerations — The 7 key considerations as pre-flight checklist and in-flight guard-rails: 1. Adversarial eval most valuable on first drafts (diminishing returns) 2. Parallel evaluation beats sequential (narrow mandates, no groupthink) 3. Fan-out → synthesize → apply is the core reusable unit 4. Scoring forces accountability (no hand-waving) 5. Source-of-truth extraction catches documentation lies 6. Prioritize convex easy wins (impact/effort ratio) 7. 
Know when to stop (4+/5 or <3 findings per round) Includes calibration table, priority formula, and stopping criteria. /review-docs — Focused documentation review with 6 parallel rubrics tuned for reference material. /review-code — Focused code review with 6 parallel rubrics tuned for implementation (correctness prioritized over cleanup). /audit-completeness — Cross-reference any artifact against its source of truth. Produces structured delta report with severity classification. Includes "common documentation lies" checklist. https://claude.ai/code/session_01CJnBiMXiuhsnvvvcqAMPYS --- .claude/commands/audit-completeness.md | 93 ++++++++++++++ .claude/commands/considerations.md | 162 +++++++++++++++++++++++++ .claude/commands/fan-out-review.md | 140 +++++++++++++++++++++ .claude/commands/review-code.md | 25 ++++ .claude/commands/review-docs.md | 23 ++++ 5 files changed, 443 insertions(+) create mode 100644 .claude/commands/audit-completeness.md create mode 100644 .claude/commands/considerations.md create mode 100644 .claude/commands/fan-out-review.md create mode 100644 .claude/commands/review-code.md create mode 100644 .claude/commands/review-docs.md diff --git a/.claude/commands/audit-completeness.md b/.claude/commands/audit-completeness.md new file mode 100644 index 0000000000..4b3faa1524 --- /dev/null +++ b/.claude/commands/audit-completeness.md @@ -0,0 +1,93 @@ +# Audit Completeness + +Cross-reference **$ARGUMENTS** against its source of truth. This is a focused accuracy and completeness audit — not a structural or stylistic review. + +--- + +## Methodology + +### Step 1: Identify the Source of Truth + +Determine what the documentation/artifact *claims to describe*, then locate the authoritative source: + +| Artifact Type | Source of Truth | +|--------------|-----------------| +| Tool/API documentation | Actual code, schema definitions (Zod, JSON Schema, OpenAPI) | +| Configuration reference | Config file parsers, CLI argument handlers, env var readers | +| Architecture docs | Actual module structure, dependency graph, deployed topology | +| Test documentation | Actual test files, coverage reports, CI configuration | +| Changelog/release notes | Git history, package versions, deployed artifacts | + +### Step 2: Extract from Source + +Using Explore subagents or Bash, extract the **actual** values from the source of truth: +- Parameter names, types, required/optional status +- Default values (from code, not from docs) +- Constraints (min, max, enum values, regex patterns) +- Feature flags and availability conditions +- Aliases and historical names + +### Step 3: Diff Against Documentation + +For every item in the documentation, check: + +| Check | Finding Type | +|-------|-------------| +| Item exists in source but not in doc | **Missing** | +| Item exists in doc but not in source | **Stale/Removed** | +| Item exists in both but values differ | **Incorrect** | +| Item exists in both, doc is vague, source is precise | **Imprecise** | +| Item exists in both, source has constraints doc doesn't mention | **Under-documented** | + +### Step 4: Report + +Produce a structured delta report: + +```markdown +## Completeness Audit: [artifact name] + +### Summary +- Items in source of truth: N +- Items documented: M +- Missing: X +- Stale: Y +- Incorrect: Z +- Imprecise: W + +### Critical (Incorrect) +| Item | Documented Value | Actual Value | Location | +|------|-----------------|--------------|----------| + +### Major (Missing) +| Item | Source Location | What Should Be Documented | 
+|------|----------------|--------------------------| + +### Minor (Imprecise / Under-documented) +| Item | Current Doc | More Precise Value | Source | +|------|-------------|-------------------|--------| + +### Stale (Removed from Source) +| Item | Doc Location | Notes | +|------|-------------|-------| +``` + +### Step 5: Apply + +Fix all Critical and Major findings. Fix Minor findings if low-effort. Remove Stale entries. Commit with the delta report as the commit message body. + +--- + +## Common Documentation Lies to Check + +These are the most frequent divergences between docs and reality: + +- [ ] Default values that changed in code but not in docs +- [ ] Enum values that were added (new options) but not documented +- [ ] Enum values that were removed but still documented +- [ ] Constraints that were loosened (higher max) or tightened (lower max) +- [ ] Parameters that were added but not documented +- [ ] Parameters that were renamed (old name still in docs) +- [ ] Parameters that were removed but still documented +- [ ] Features gated behind flags/conditions that docs don't mention +- [ ] Aliases that exist in code but aren't documented +- [ ] Internal parameters exposed in schema but intentionally undocumented diff --git a/.claude/commands/considerations.md b/.claude/commands/considerations.md new file mode 100644 index 0000000000..85e92f99ce --- /dev/null +++ b/.claude/commands/considerations.md @@ -0,0 +1,162 @@ +# Considerations & Guard-Rails + +You are applying the **7 Key Considerations** as a pre-flight checklist and ongoing guard-rails for the current task. The context is: **$ARGUMENTS** + +If no context is given, apply these considerations to whatever task is currently in progress. + +--- + +## The 7 Considerations + +These are hard-won principles from adversarial evaluation methodology. Apply them as both a **pre-flight checklist** (before starting) and **in-flight guard-rails** (during execution). + +### 1. Adversarial evaluation is most valuable on first drafts + +> The initial version of anything scores ~2-3/5. That's expected, not a failure. One focused round of adversarial review typically yields the largest improvement. Diminishing returns set in after 2-3 rounds. + +**Guard-rail:** Don't polish prematurely. Get the first draft out, then evaluate. Don't run 5 adversarial passes when 2 will capture 90% of the value. + +**Pre-flight check:** +- [ ] Is this a first draft? → Run full adversarial evaluation +- [ ] Is this a second pass? → Run focused evaluation on weak dimensions only +- [ ] Is this a third+ pass? → Only proceed if a specific dimension is still below 3/5 + +--- + +### 2. Parallel evaluation beats sequential + +> Six focused evaluators with narrow mandates find more than one generalist trying to evaluate everything at once. They also run concurrently, so wall-clock time is the same as a single evaluation. + +**Guard-rail:** Always fan-out to multiple subagents. Never ask one subagent to evaluate "everything." A narrow mandate prevents evaluator fatigue and blind spots. + +**Pre-flight check:** +- [ ] Have I defined distinct, non-overlapping rubric dimensions? +- [ ] Am I launching all evaluators in parallel (single message, multiple Task calls)? +- [ ] Does each evaluator have a clear scope boundary? + +--- + +### 3. The fan-out → synthesize → apply pattern is the core reusable unit + +> This three-stage pattern works for any quality gate: doc review, code review, test gap analysis, architecture review, incident postmortem, onboarding material audit. 
+ +**Guard-rail:** If you're doing any kind of review or audit, reach for this pattern first. The stages are: +1. **Fan-out** — parallel independent evaluation with narrow mandates +2. **Synthesize** — aggregate, deduplicate, prioritize +3. **Apply** — implement changes in priority order + +**Applicable to:** +- Documentation and reference material +- Source code and implementations +- Test suites and coverage +- Architecture and design documents +- Configuration and infrastructure +- API designs and schemas +- Onboarding and runbooks +- Incident postmortems and retrospectives + +--- + +### 4. Scoring forces accountability + +> Without numeric scores, evaluations drift toward "looks good" or vague suggestions. A 1-5 scale with mandatory specific findings per score makes hand-waving impossible. + +**Guard-rail:** Every evaluation dimension MUST produce: +- A numeric score (1-5) +- At least one specific finding (with file/line reference where possible) +- At least one concrete recommendation (exact change, not "consider improving X") + +**Scoring calibration:** +| Score | Meaning | Action Required | +|-------|---------|-----------------| +| 1 | Fundamentally broken | Stop and fix before proceeding | +| 2 | Significant gaps | Major rework needed | +| 3 | Functional but incomplete | Targeted improvements | +| 4 | Good with minor gaps | Polish pass | +| 5 | Comprehensive | No action needed | + +--- + +### 5. Source-of-truth extraction catches documentation lies + +> Inspecting the actual runtime artifact (code, schema, config) catches things that reviewing documentation in isolation never will. The stated behavior and the actual behavior diverge more often than expected. + +**Guard-rail:** Always cross-reference documentation claims against the source of truth. Methods: +- Read the actual source code / compiled bundle / installed package +- Run the tool and observe actual behavior +- Check schema definitions (Zod, JSON Schema, OpenAPI) for real constraints +- Compare documented defaults against actual defaults in code + +**Common documentation lies:** +- Default values that changed but the doc wasn't updated +- Enum values that were added but not documented +- Constraints that were loosened/tightened +- Parameters that were added/removed/renamed +- Features gated behind flags that the doc doesn't mention + +--- + +### 6. Prioritize convex easy wins + +> A "convex" improvement is one where the value delivered is disproportionately high relative to the effort required. Fix these first. They compound — each easy fix improves the overall quality, making the next improvement clearer. + +**Guard-rail:** After synthesis, sort by effort-to-impact ratio: + +``` +Priority = Impact / Effort + +High impact + Low effort = DO FIRST (convex wins) +High impact + High effort = DO SECOND (strategic) +Low impact + Low effort = DO THIRD (cleanup) +Low impact + High effort = SKIP (waste) +``` + +**Common convex wins:** +- Adding a missing default value to a parameter table (1 line, prevents user confusion) +- Fixing an incorrect constraint (1 word, prevents runtime errors) +- Adding a cross-reference "See also: X" (1 line, connects isolated knowledge) +- Adding a glossary entry (2 lines, eliminates repeated confusion) +- Fixing inconsistent formatting (find-and-replace, improves scannability) + +--- + +### 7. Know when to stop + +> Adversarial evaluation has diminishing returns. The first round captures ~60% of issues. The second captures ~25% more. The third captures ~10%. Beyond that, you're polishing polish. 
+ +**Guard-rail:** Stop when ANY of these are true: +- All dimensions score 4+/5 +- A round yields fewer than 3 actionable findings +- You've completed 3 rounds +- The remaining findings are all "Enhancement" severity (nice-to-have, not need-to-have) + +**Do NOT stop when:** +- Any dimension is below 3/5 +- Accuracy/correctness findings remain unaddressed (wrong > missing) +- Critical or Major findings exist from any dimension + +--- + +## Using These as a Pre-Flight Checklist + +Before starting any review, evaluation, or documentation task, verify: + +``` +[ ] 1. What draft stage is this? (first → full eval, second → focused, third+ → only if <3/5) +[ ] 2. Am I using parallel subagents with narrow mandates? (not one generalist) +[ ] 3. Am I using fan-out → synthesize → apply? (not ad-hoc review) +[ ] 4. Does every evaluation require scores + findings + recommendations? (not vibes) +[ ] 5. Am I cross-referencing against the source of truth? (not reviewing docs in isolation) +[ ] 6. Am I prioritizing convex easy wins first? (not big-effort items) +[ ] 7. Do I know my stopping criteria? (not infinite polish) +``` + +## Using These as In-Flight Guard-Rails + +During execution, check periodically: + +- **Am I going deep on one dimension at the expense of others?** → Rebalance +- **Am I making changes without scoring first?** → Score, then change +- **Am I trusting the documentation without verifying the source?** → Cross-reference +- **Am I spending significant effort on a low-impact finding?** → Skip, note for future +- **Am I on round 4+ with marginal returns?** → Stop diff --git a/.claude/commands/fan-out-review.md b/.claude/commands/fan-out-review.md new file mode 100644 index 0000000000..3850a5af17 --- /dev/null +++ b/.claude/commands/fan-out-review.md @@ -0,0 +1,140 @@ +# Fan-Out → Synthesize → Apply + +You are executing the **Parallel Adversarial Subagent Pattern**. The target is: **$ARGUMENTS** + +If the argument is a file path, read it. If it's a description, find the relevant artifact first. If no argument is given, ask the user what to review. + +--- + +## The Pattern + +``` +┌─────────────────────────┐ +│ Artifact Under Review │ +└────────┬────────────────┘ + │ + ┌────┴────┐ + │ Fan-Out │ (parallel, independent, narrow mandates) + └────┬────┘ + │ + ┌─────┼─────┬─────┬─────┬─────┐ + ▼ ▼ ▼ ▼ ▼ ▼ + R1 R2 R3 R4 R5 R6 + │ │ │ │ │ │ + └─────┼─────┴─────┴─────┴─────┘ + │ + ┌────┴────┐ + │ Fan-In │ (synthesize, deduplicate, prioritize) + └────┬────┘ + │ + ┌────┴────┐ + │ Apply │ (implement, commit) + └─────────┘ +``` + +## Why This Works + +- Each evaluator has a **narrow mandate** — prevents blind spots from a single reviewer trying to evaluate everything +- Parallel execution means **no serial bottleneck** — all lenses run simultaneously +- Independent rubrics mean evaluators **cannot groupthink** — they don't see each other's findings +- Mandatory scoring with mandatory findings means **no hand-waving** — every score requires evidence +- Fan-in deduplication catches **overlapping concerns** across lenses and merges them + +## Step 1: Fan-Out + +Launch these subagents **in parallel** (all in a single message with multiple Task tool calls). 
Each subagent: +- Receives the full artifact +- Gets ONE rubric dimension (its narrow mandate) +- Must return: Score (1-5), Specific Findings (with line references), Concrete Recommendations + +**Determine your rubric dimensions based on the artifact type.** Here are templates: + +### For Documentation / Reference Material +| Subagent | Rubric | Mandate | +|----------|--------|---------| +| R1 | Convex Easy Wins | High-impact, low-effort fixes: missing defaults, wrong values, formatting gaps | +| R2 | Extending Capabilities | Advanced usage, composition patterns, edge cases, power-user features | +| R3 | Reducing Steps to Results | Quick references, decision tables, workflow patterns, inline error docs | +| R4 | Improvement-Integration Points | Limitations as opportunities, automation potential, drift detection | +| R5 | Accuracy & Source Fidelity | Cross-reference against source code/schemas, find documentation lies | +| R6 | Structural & Navigation | ToC, glossary, heading consistency, cross-references, metadata uniformity | + +### For Code / Implementation +| Subagent | Rubric | Mandate | +|----------|--------|---------| +| R1 | Convex Easy Wins | Dead code, unused imports, trivial type fixes, obvious simplifications | +| R2 | Extending Capabilities | Missing error handling, uncovered edge cases, unexposed configuration | +| R3 | Reducing Steps to Results | Unnecessary abstractions, over-engineering, simpler alternatives | +| R4 | Improvement-Integration Points | Testability gaps, observability hooks, plugin points, API surface | +| R5 | Correctness & Safety | Logic errors, race conditions, injection vectors, resource leaks | +| R6 | Structural Quality | Naming consistency, module boundaries, dependency direction, cohesion | + +### For Tests / Test Suites +| Subagent | Rubric | Mandate | +|----------|--------|---------| +| R1 | Convex Easy Wins | Missing assertions, unchecked return values, trivial coverage gaps | +| R2 | Extending Coverage | Untested branches, missing edge cases, integration gaps | +| R3 | Reducing Noise | Flaky tests, slow tests, redundant tests, unclear failure messages | +| R4 | Improvement-Integration Points | Missing test categories, automation opportunities, CI optimization | +| R5 | Correctness & Confidence | Tests that pass for wrong reasons, mocked-away real behavior | +| R6 | Structural Quality | Test organization, naming, setup/teardown patterns, helper reuse | + +### For Architecture / Design Documents +| Subagent | Rubric | Mandate | +|----------|--------|---------| +| R1 | Convex Easy Wins | Missing diagrams, undefined terms, broken cross-references | +| R2 | Extending Capabilities | Unaddressed scaling scenarios, missing failure modes, evolution paths | +| R3 | Reducing Steps to Results | Decision rationale, trade-off matrices, rejected alternatives | +| R4 | Improvement-Integration Points | Migration paths, deprecation strategies, extension mechanisms | +| R5 | Feasibility & Accuracy | Assumptions vs reality, dependency availability, performance claims | +| R6 | Structural Quality | Consistency, completeness, ADR format compliance, audience clarity | + +### Custom Rubrics +If the artifact doesn't fit the above, design 6 rubric dimensions that: +1. Cover **correctness** (is it right?) +2. Cover **completeness** (is anything missing?) +3. Cover **efficiency** (can it be simpler?) +4. Cover **extensibility** (can it grow?) +5. Cover **usability** (can someone use it?) +6. Cover **maintainability** (will it age well?) 
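
As a concrete illustration of the contract each evaluator returns and how the fan-in step that follows can process it, here is a minimal TypeScript sketch. The type and function names (`Finding`, `RubricResult`, `synthesize`) are hypothetical, not part of Claude Code or any existing library; the dedup key and ordering shown are just one way to encode the "deduplicate, then prioritize" rule described in Step 2.

```typescript
// Illustrative sketch only — the types and function below are hypothetical,
// not an existing API.

type Score = 1 | 2 | 3 | 4 | 5;

interface Finding {
  dimension: string;       // e.g. "R5: Accuracy & Source Fidelity"
  location: string;        // e.g. "docs/tools-reference.md:142"
  issue: string;           // what is wrong, missing, or improvable
  recommendation: string;  // the exact change to make, not vague advice
}

interface RubricResult {
  dimension: string;
  score: Score;
  findings: Finding[];     // must be non-empty for any score below 5
}

// Priority order used during fan-in: convex wins, accuracy, step reduction,
// capability extensions, structural quality, integration points.
const PRIORITY = ["R1", "R5", "R3", "R2", "R6", "R4"];

function synthesize(results: RubricResult[]): Finding[] {
  // Deduplicate: different rubrics often surface the same underlying issue,
  // so key findings by location plus a normalized issue description.
  const deduped = new Map<string, Finding>();
  for (const result of results) {
    for (const finding of result.findings) {
      const key = `${finding.location}::${finding.issue.trim().toLowerCase()}`;
      if (!deduped.has(key)) deduped.set(key, finding);
    }
  }
  // Prioritize by dimension, in the order defined above.
  return [...deduped.values()].sort(
    (a, b) =>
      PRIORITY.indexOf(a.dimension.slice(0, 2)) -
      PRIORITY.indexOf(b.dimension.slice(0, 2))
  );
}
```
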
+ +## Step 2: Fan-In (Synthesize) + +After all subagents return: + +1. **Aggregate** into a summary table: + ``` + | Dimension | Score | Top Finding | Top Recommendation | + ``` + +2. **Deduplicate** — multiple rubrics often surface the same underlying issue from different angles. Merge them. + +3. **Prioritize** using this order: + - **Convex easy wins first** — maximum impact per effort + - **Accuracy fixes** — wrong information is worse than missing information + - **Step reduction** — efficiency improvements compound + - **Capability extensions** — breadth improvements + - **Structural quality** — polish + - **Integration points** — forward-looking improvements last + +4. **Classify** each finding: + - **Critical** (score 1-2 in any dimension): fix immediately + - **Major** (score 3 in accuracy/correctness): fix in this pass + - **Minor** (score 3-4 in other dimensions): fix if time permits + - **Enhancement** (score 4+ suggestions): note for future + +## Step 3: Apply + +1. Implement all Critical and Major fixes +2. Implement Minor fixes +3. Implement Enhancements that are genuinely low-effort +4. Commit with a detailed message listing: + - Pre-evaluation aggregate score + - Post-application expected score + - What changed and why, organized by dimension + +## Iteration + +- If any dimension remains **below 3/5** after application, run a **focused follow-up** on that specific dimension only +- Diminishing returns set in after **2-3 rounds** — stop when all dimensions are 4+/5 or when each additional round yields fewer than 3 actionable findings +- First drafts typically score **2-3/5** — this is expected, not a failure diff --git a/.claude/commands/review-code.md b/.claude/commands/review-code.md new file mode 100644 index 0000000000..995b08cfc8 --- /dev/null +++ b/.claude/commands/review-code.md @@ -0,0 +1,25 @@ +# Review Code + +Adversarially evaluate the code at **$ARGUMENTS** using the fan-out → synthesize → apply pattern. + +Read the target file(s) first. Then launch **6 parallel subagents**, each with one rubric: + +1. **Convex Easy Wins** — Dead code, unused imports, trivial type fixes, obvious simplifications, inconsistent naming, missing early returns, redundant conditions. What takes 1 line to fix but improves real quality? + +2. **Extending Capabilities** — Missing error handling for realistic failure modes, uncovered edge cases, unexposed configuration that should be configurable, missing validation at system boundaries, incomplete API surface. + +3. **Reducing Steps to Accurate Results** — Unnecessary abstractions, over-engineering, indirection that obscures intent, simpler alternatives to current patterns, premature generalization, dead flexibility (extension points nothing uses). + +4. **Improvement-Integration Points** — Testability gaps (untestable code paths, hard-wired dependencies), observability hooks (logging, metrics, tracing), plugin/extension points, API surface improvements, migration paths. + +5. **Correctness & Safety** — Logic errors, race conditions, injection vectors (SQL, XSS, command), resource leaks (connections, file handles, memory), error swallowing, incorrect assumptions, off-by-one, null/undefined access. + +6. **Structural Quality** — Naming consistency, module boundary clarity, dependency direction (no circular deps, layered correctly), cohesion (do modules have single responsibilities?), coupling (are modules appropriately decoupled?). 
+ +Each subagent must return: **Score (1-5)**, **Specific Findings (with file:line references)**, **Concrete Recommendations (with code patches where applicable)**. + +After all return: aggregate scores, deduplicate, prioritize (correctness → convex wins → steps → capabilities → structure → integration), apply improvements, commit. + +Note: For code reviews, **correctness** takes priority over convex wins — a bug fix is always more important than a cleanup. + +Stop when all dimensions are 4+/5 or a round yields fewer than 3 findings. diff --git a/.claude/commands/review-docs.md b/.claude/commands/review-docs.md new file mode 100644 index 0000000000..c4465bc81a --- /dev/null +++ b/.claude/commands/review-docs.md @@ -0,0 +1,23 @@ +# Review Documentation + +Adversarially evaluate the documentation at **$ARGUMENTS** using the fan-out → synthesize → apply pattern. + +Read the target file(s) first. Then launch **6 parallel subagents**, each with one rubric: + +1. **Convex Easy Wins** — Missing defaults, incorrect values, broken links, formatting inconsistencies, missing cross-references. What takes 1 line to fix but saves readers real confusion? + +2. **Extending Capabilities** — Is the full parameter space documented (not just common params)? Are advanced features, composition patterns, edge cases, and power-user workflows covered? Are extension points documented? + +3. **Reducing Steps to Accurate Results** — Is there a Quick Reference for the 80% case? Are "Use this, not that" decision tables present? Are workflow patterns step-by-step? Can a reader find the right answer without reading the full document? Are common errors and fixes inline? + +4. **Improvement-Integration Points** — Are limitations framed as improvement opportunities? Are automation opportunities identified (generators, CI checks, drift detectors)? Does the document suggest how the system could evolve? + +5. **Accuracy & Source-of-Truth Fidelity** — Cross-reference every claim against source code, schemas, or runtime behavior. Find documentation lies: stated vs actual defaults, missing enum values, changed constraints, undocumented parameters. + +6. **Structural & Navigational Quality** — Is the ToC complete and correctly linked? Is the glossary comprehensive? Are headings consistent and scannable? Is the metadata template uniform across all items? Are related items cross-referenced bidirectionally? + +Each subagent must return: **Score (1-5)**, **Specific Findings**, **Concrete Recommendations**. + +After all return: aggregate scores, deduplicate, prioritize (convex wins → accuracy → steps → capabilities → structure → integration), apply improvements, commit. + +Stop when all dimensions are 4+/5 or a round yields fewer than 3 findings. From 0d6d802e0b1a4d928f867fc83ef9e33f91d5772e Mon Sep 17 00:00:00 2001 From: Claude Date: Sat, 31 Jan 2026 07:29:05 +0000 Subject: [PATCH 3/3] feat: Parallelize Phase 1 extraction with 6 concurrent subagents MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Both /extract-anatomy and /anatomize now use parallel subagent orchestration for the extraction phase, not just the evaluation phase. 
Phase 1 now fans out to 6 simultaneous extraction subagents: E1: Identity & Inventory (master checklist) E2: Inputs, Outputs & Schemas (parameter tables from source) E3: Internal Mechanics & State Effects (runtime behavior) E4: Dependencies & Relationships (dependency graph, peer map) E5: Constraints, Limitations & Availability (hard/soft limits) E6: Configuration & Environment (env vars, CLI flags, config files) Key additions: - Phase 0 (Reconnaissance): fast discovery scan before extraction - Phase 1.5 (Merge): conflict resolution rules (schema > observed) - Validation checklist before proceeding to evaluation - Scaling strategy: split E2/E3 by category for >20 instances - Optional E7 (cross-system) and E8 (versioning) subagents - Rationale table: why parallel extraction beats serial Full pipeline is now: Recon (1) → Extract (6‖) → Merge → Evaluate (6‖) → Synthesize → Apply https://claude.ai/code/session_01CJnBiMXiuhsnvvvcqAMPYS --- .claude/commands/anatomize.md | 298 ++++++++++++++++++---------- .claude/commands/extract-anatomy.md | 154 +++++++++++--- 2 files changed, 314 insertions(+), 138 deletions(-) diff --git a/.claude/commands/anatomize.md b/.claude/commands/anatomize.md index 6024e8b592..621c66c6d9 100644 --- a/.claude/commands/anatomize.md +++ b/.claude/commands/anatomize.md @@ -4,121 +4,201 @@ You are executing a two-phase methodology: **Anatomize** then **Adversarially Op If no target is provided, ask the user to specify one (e.g., "native tools", "API endpoints", "database models", "CI/CD pipelines", "authentication flows"). +Both phases use parallel subagent orchestration. The full pipeline is: + +``` +Recon (1 agent) → Extract (6 parallel) → Synthesize → Evaluate (6 parallel) → Synthesize → Apply +``` + +--- + +## Phase 0: Reconnaissance + +Before extraction, run a fast **discovery scan** with an `Explore` subagent to determine: + +1. **Where the target lives** — codebase, installed packages, config files, runtime artifacts +2. **How many instances exist** — rough count to calibrate parallelization +3. **What the source of truth is** — source code, schemas, bundle, config parsers, API specs + +Output: a manifest of locations and estimated instance count. + +--- + +## Phase 1: Parallel Anatomical Extraction + +Fan-out extraction across **6 parallel subagents**, each responsible for one extraction dimension. Launch these **simultaneously in a single message**: + +### E1: Identity & Inventory +> Discover and catalog every instance of the target. +- **Name** (canonical, aliases, historical names, user-facing name) +- **Category** / logical group +- **Purpose** (one sentence: what it does and why it exists) +- Confirm nothing is missing — this is the master checklist + +### E2: Inputs, Outputs & Schemas +> Extract the full parameter and return shape anatomy from actual schema definitions. +- **Parameters / Inputs** — every parameter with: name, type, required/optional, default value, constraints (min/max/enum values/regex), description +- **Outputs / Return Shape** — what it produces: structure, max size, truncation behavior +- Source: Zod schemas, JSON Schema, OpenAPI specs, TypeScript types, function signatures + +### E3: Internal Mechanics & State Effects +> Understand how each item actually works at runtime. 
+- **Execution model** — synchronous/async, sandboxed, delegated, queued +- **Transformation pipeline** — what happens between input and output +- **State effects** — reads from, writes to, mutates (files, env, memory, network) +- **Behavioral properties** — read-only, concurrency-safe, requires permission, auto-allowed +- Source: implementation code, execution path traces + +### E4: Dependencies & Relationships +> Map the dependency graph between items and the broader system. +- **Upstream** — prerequisites, feature flags, servers, environment requirements +- **Downstream** — what depends on this, what breaks if it changes +- **Peer relationships** — common pairings, mutual exclusions, ordering constraints, composition patterns +- **Cross-references** — which items reference or invoke each other +- Source: imports, invocations, configuration wiring + +### E5: Constraints, Limitations & Availability +> Catalog every hard and soft limit. +- **Hard constraints** — enforced by schema/code: validation rules, size caps, rate limits, timeouts +- **Soft constraints** — guidance: best practices, recommended patterns, anti-patterns +- **Availability conditions** — feature flags, platform requirements, mutual exclusivity +- Source: validators, error handlers, feature flag checks, conditional enablement + +### E6: Configuration & Environment +> Extract all configuration surfaces that affect the target. +- **Environment variables** — name, default, what they affect +- **CLI flags** — name, description, interaction with the target +- **Configuration files** — paths, schemas, scopes (project/local/user/global) +- **Runtime settings** — programmatic configuration options +- Source: env var readers, argument parsers, config file loaders + +### Scaling Strategy + +If the target has **>20 instances**, split E2 and E3 further by category: +``` +E2a: Schemas for category A E3a: Mechanics for category A +E2b: Schemas for category B E3b: Mechanics for category B +``` + +If the target has **cross-system** scope, add `E7: Cross-system integration points`. +If the target has **versioning** concerns, add `E8: Version history and migration paths`. + --- -## Phase 1: Anatomize - -Produce a comprehensive anatomical document of every instance of the target. For each item, document: - -### 1. Identity -- **Name** (canonical, aliases, historical names) -- **Category** (which logical group it belongs to) -- **Purpose** (one-sentence description of what it does and why it exists) - -### 2. Anatomical Architecture -- **Parameters / Inputs** — every parameter with: name, type, required/optional, default value, constraints (min/max/enum values/regex), and description -- **Outputs / Return Shape** — what it produces, including structure, max size, truncation behavior -- **Internal Mechanics** — how it works (execution model, delegation, transformation pipeline) -- **State Effects** — what it reads from, writes to, or mutates - -### 3. Dependencies -- **Upstream** — what must exist or be configured for this to function (prerequisites, feature flags, servers, environment) -- **Downstream** — what depends on this, what breaks if this changes -- **Peer Relationships** — tools/components it commonly pairs with, mutual exclusions, ordering constraints - -### 4. 
Constraints & Limitations -- **Hard Constraints** — enforced limits (schema validation, size caps, rate limits, timeout ceilings) -- **Soft Constraints** — guidance and best practices (recommended usage patterns, anti-patterns) -- **Behavioral Properties** — read-only, concurrency-safe, requires permission, sandboxed, auto-allowed -- **Availability Conditions** — feature flags, platform requirements, configuration prerequisites - -### Methodology -1. Use `Explore` subagents to search the codebase, installed packages, and configuration files -2. Cross-reference source of truth (actual runtime schemas/code) against any existing documentation -3. Produce structured markdown with tables for parameters and bullet lists for properties -4. Include a Quick Reference matrix at the top and a Glossary for domain-specific terms - -### Output -Write the anatomical document to a file. Propose a sensible path based on the target (e.g., `docs/-reference.md`). Include: -- Glossary of domain terms -- Quick Reference summary table -- Full per-item anatomy sections -- Cross-reference notes between related items -- Configuration reference (env vars, CLI flags, config files) if applicable +## Phase 1.5: Merge & Assemble + +After all extraction subagents return: + +### Merge +1. Use E1's inventory as the canonical item list — every item here gets a full section +2. For each item, merge in: E2's parameter tables, E3's mechanics, E4's dependencies, E5's constraints, E6's configuration +3. Resolve conflicts: E2's schema-extracted values win over E3's observed values (schema is the source of truth) +4. Flag terms each subagent had to define → aggregate into glossary + +### Assemble the Document + +Write to `docs/-reference.md`. Structure: + +1. **Title and introduction** — one paragraph +2. **Glossary** — aggregated from all subagents +3. **Quick Reference matrix** — one row per item, key properties as columns +4. **Selection Guide** — "If you want to X, use Y" (from E4's relationship data) +5. **Workflow Patterns** — composition sequences (from E4's peer relationships) +6. **Permission & Security Model** — auto-allowed vs permission-required (from E3 + E5) +7. **Per-item anatomy sections** — full detail, in category order +8. **Configuration reference** — consolidated from E6 +9. **Cross-reference index** — bidirectional links from E4 +10. **Summary table** — category → items +11. **Historical names / aliases** — from E1 + +### Validate Before Proceeding + +- [ ] Every item from E1 has a full section +- [ ] Every parameter from E2 appears in its item's table +- [ ] Every behavioral property from E3 is listed +- [ ] Every dependency from E4 has a cross-reference +- [ ] Every constraint from E5 is documented +- [ ] Every config surface from E6 is in the configuration reference --- -## Phase 2: Adversarially Optimize - -Once the anatomical document exists, launch **parallel subagents** to adversarially evaluate it. Each subagent gets ONE rubric dimension and must return: -- **Score** (1-5) -- **Specific findings** (what's wrong, missing, or improvable — with line references) -- **Concrete recommendations** (exact changes, not vague suggestions) - -### Rubric Dimensions - -Launch these **6 subagents in parallel**: - -#### R1: Convex Easy Wins -> Score how many high-impact, low-effort improvements exist. -- Are there missing defaults that take 1 line to add? -- Are there incorrect values that are a simple find-and-fix? -- Are there missing cross-references that just need a "See also: X" line? 
-- Are there inconsistencies in formatting/structure that a single pass fixes? -- **Scoring:** 5 = no easy wins left (already polished), 1 = many trivial fixes remaining - -#### R2: Extending Capabilities -> Score how well the document enables users to push beyond basic usage. -- Does each item document its full parameter space (not just the common ones)? -- Are advanced features, edge cases, and power-user patterns documented? -- Are composition patterns shown (how items combine for emergent capability)? -- Are extension points documented (hooks, plugins, custom configurations)? -- **Scoring:** 5 = comprehensive advanced coverage, 1 = basic-only documentation - -#### R3: Reducing Steps to Accurate Results -> Score how efficiently a reader can go from question to correct answer. -- Is there a Quick Reference for the 80% use case? -- Are "Use this, not that" decision tables present? -- Are workflow patterns documented with step-by-step sequences? -- Can a reader find the right tool/item without reading the entire document? -- Are common errors and their fixes documented inline? -- **Scoring:** 5 = minimal steps to any answer, 1 = requires full document read - -#### R4: Improvement-Integration Points -> Score how well the document identifies where the system can be improved. -- Are limitations framed as improvement opportunities? -- Are gaps between current and ideal behavior identified? -- Are automation opportunities noted (scripts, CI, generators)? -- Are version/drift detection strategies suggested? -- **Scoring:** 5 = clear roadmap for improvement, 1 = static reference only - -#### R5: Accuracy & Source-of-Truth Fidelity -> Score correctness against the actual runtime artifact. -- Cross-reference parameter types, defaults, and constraints against source code/schemas -- Identify any documentation lies (stated behavior != actual behavior) -- Check enum values are exhaustive -- Verify constraints match runtime enforcement -- **Scoring:** 5 = fully verified against source, 1 = unverified or contains errors - -#### R6: Structural & Navigational Quality -> Score the document's architecture as a reference artifact. -- Is the ToC complete and correctly linked? -- Is the Glossary comprehensive (no undefined jargon)? -- Are section headings consistent and scannable? -- Is the metadata template consistent across all items? -- Are related items cross-referenced bidirectionally? +## Phase 2: Parallel Adversarial Evaluation + +Once the anatomical document is assembled, launch **6 evaluation subagents in parallel**: + +### R1: Convex Easy Wins +> High-impact, low-effort improvements remaining. +- Missing defaults that take 1 line to add? +- Incorrect values that are a find-and-fix? +- Missing cross-references that need a "See also: X" line? +- Formatting inconsistencies fixable in a single pass? +- **Scoring:** 5 = fully polished, 1 = many trivial fixes remaining + +### R2: Extending Capabilities +> Does it enable users to push beyond basic usage? +- Full parameter space documented (not just common params)? +- Advanced features, edge cases, power-user patterns? +- Composition patterns (items combining for emergent capability)? +- Extension points (hooks, plugins, custom configurations)? +- **Scoring:** 5 = comprehensive advanced coverage, 1 = basic-only + +### R3: Reducing Steps to Accurate Results +> How efficiently can a reader go from question to answer? +- Quick Reference for the 80% case? +- "Use this, not that" decision tables? +- Workflow patterns with step-by-step sequences? 
+- Common errors and fixes documented inline? +- **Scoring:** 5 = minimal steps to any answer, 1 = requires full read + +### R4: Improvement-Integration Points +> Does it identify where the system can be improved? +- Limitations framed as improvement opportunities? +- Gaps between current and ideal behavior? +- Automation opportunities (scripts, CI, generators)? +- Version/drift detection strategies? +- **Scoring:** 5 = clear improvement roadmap, 1 = static reference only + +### R5: Accuracy & Source-of-Truth Fidelity +> Is it correct against the actual runtime artifact? +- Parameter types, defaults, constraints verified against source? +- Documentation lies identified (stated != actual)? +- Enum values exhaustive? +- Constraints match runtime enforcement? +- **Scoring:** 5 = fully verified, 1 = unverified or contains errors + +### R6: Structural & Navigational Quality +> Is it well-architected as a reference artifact? +- ToC complete and correctly linked? +- Glossary comprehensive (no undefined jargon)? +- Section headings consistent and scannable? +- Metadata template consistent across all items? +- Related items cross-referenced bidirectionally? - **Scoring:** 5 = professional reference quality, 1 = inconsistent/hard to navigate -### Synthesis +--- + +## Phase 2.5: Synthesize & Apply + +After all 6 evaluation subagents return: + +1. **Aggregate** scores into a summary table +2. **Deduplicate** overlapping findings across dimensions +3. **Prioritize** by impact: + - Convex easy wins first (maximum value per effort) + - Accuracy fixes (wrong > missing) + - Step reduction (efficiency compounds) + - Capability extensions (breadth) + - Structural quality (polish) + - Integration points (forward-looking) +4. **Apply** all Critical and Major fixes, then Minor, then low-effort Enhancements +5. **Commit** with detailed message listing pre/post scores and what changed + +--- -After all 6 subagents return: -1. **Aggregate scores** into a summary table -2. **Deduplicate** overlapping findings -3. **Prioritize** by impact (convex wins first, then capability extensions, then structural) -4. **Apply** all improvements to the anatomical document -5. **Commit** with a detailed message listing what changed and why +## Stopping Criteria -### Target Scores -- Phase 1 baseline is expected around 2-3/5 (first drafts always have gaps) -- After Phase 2 application, target 4+/5 across all dimensions -- If any dimension remains below 3/5, run a focused follow-up pass on that dimension +- **Target:** 4+/5 across all 6 dimensions +- **Stop when:** all dimensions 4+/5, OR a round yields <3 actionable findings, OR 3 rounds completed +- **Don't stop when:** any dimension <3/5, OR accuracy findings remain unaddressed +- **First draft baseline:** expect 2-3/5 — this is normal, not a failure +- **Diminishing returns:** round 1 captures ~60%, round 2 ~25%, round 3 ~10% diff --git a/.claude/commands/extract-anatomy.md b/.claude/commands/extract-anatomy.md index bb983fa927..fb4ad9e662 100644 --- a/.claude/commands/extract-anatomy.md +++ b/.claude/commands/extract-anatomy.md @@ -6,41 +6,137 @@ If no target is provided, ask the user to specify one. 
--- -## For Each Item Found, Document: +## Phase 0: Reconnaissance -### Identity -- Name (canonical, aliases, historical names) -- Category / logical group -- Purpose (one sentence) +Before extraction, run a fast **discovery scan** to determine: -### Architecture -- **Parameters / Inputs** — name, type, required/optional, default, constraints (min/max/enum/regex), description -- **Outputs / Return Shape** — structure, max size, truncation behavior -- **Internal Mechanics** — execution model, delegation, transformation pipeline -- **State Effects** — reads from, writes to, mutates +1. **Where the target lives** — codebase, installed packages, config files, runtime artifacts +2. **How many instances exist** — rough count to determine parallelization strategy +3. **What the source of truth is** — source code, schemas, bundle, config parsers, API specs -### Dependencies -- **Upstream** — prerequisites, feature flags, servers, environment requirements +Use an `Explore` subagent for this. Output: a manifest of locations and estimated instance count. + +--- + +## Phase 1: Parallel Extraction + +Fan-out extraction across **parallel subagents**, each responsible for one extraction dimension. Launch these **simultaneously in a single message**: + +### Subagent E1: Identity & Inventory +> Discover and catalog every instance of the target. +- **Name** (canonical, aliases, historical names, user-facing name) +- **Category** / logical group +- **Purpose** (one sentence: what it does and why it exists) +- Output: complete inventory list, confirm nothing is missing + +### Subagent E2: Inputs, Outputs & Schemas +> Extract the full parameter and return shape anatomy. +- **Parameters / Inputs** — every parameter with: name, type, required/optional, default value, constraints (min/max/enum values/regex), description +- **Outputs / Return Shape** — what it produces, including structure, max size, truncation behavior +- Source: read actual schema definitions (Zod, JSON Schema, OpenAPI, TypeScript types, function signatures) +- Output: structured parameter tables per item + +### Subagent E3: Internal Mechanics & State Effects +> Understand how each item actually works at runtime. +- **Execution model** — synchronous/async, sandboxed, delegated, queued +- **Transformation pipeline** — what happens between input and output +- **State effects** — what it reads from, writes to, or mutates (files, env, memory, network) +- **Behavioral properties** — read-only, concurrency-safe, requires permission, auto-allowed +- Source: read implementation code, trace execution paths +- Output: mechanics summary per item + +### Subagent E4: Dependencies & Relationships +> Map the dependency graph between items and the broader system. +- **Upstream** — prerequisites, feature flags, servers, environment requirements, configuration - **Downstream** — what depends on this, what breaks if it changes -- **Peers** — common pairings, mutual exclusions, ordering constraints +- **Peer relationships** — common pairings, mutual exclusions, ordering constraints, composition patterns +- **Cross-references** — which items reference or invoke each other +- Source: trace imports, invocations, configuration wiring +- Output: dependency map and relationship matrix + +### Subagent E5: Constraints, Limitations & Availability +> Catalog every hard and soft limit. 
+- **Hard constraints** — enforced by schema/code: validation rules, size caps, rate limits, timeout ceilings, numeric bounds +- **Soft constraints** — guidance: best practices, recommended patterns, anti-patterns, "do not" rules +- **Availability conditions** — feature flags, platform requirements, configuration prerequisites, mutual exclusivity +- Source: read validators, error handlers, feature flag checks, conditional enablement logic +- Output: constraint table per item + +### Subagent E6: Configuration & Environment +> Extract all configuration surfaces that affect the target. +- **Environment variables** — name, default, what they affect +- **CLI flags** — name, description, interaction with the target +- **Configuration files** — paths, schemas, scopes (project/local/user/global) +- **Runtime settings** — programmatic configuration options +- Source: read env var readers, argument parsers, config file loaders +- Output: configuration reference tables + +--- + +## Phase 2: Synthesis + +After all 6 extraction subagents return: + +### Merge +1. Use E1's inventory as the canonical item list +2. For each item, merge in E2's parameter tables, E3's mechanics, E4's dependencies, E5's constraints, E6's configuration +3. Resolve conflicts (if two subagents report different defaults, E2's schema-extracted value wins over E3's observed value) + +### Structure the Document + +Write to `docs/-reference.md` (or propose a better path). Assemble in this order: + +1. **Title and introduction** — one paragraph describing what this reference covers +2. **Glossary** — domain-specific terms discovered across all subagents (each subagent should flag terms it had to define) +3. **Quick Reference matrix** — summary table with one row per item, columns for category, purpose, key properties +4. **Tool Selection Guide / Use-Case Table** — "If you want to X, use Y" (derived from E4's relationship data) +5. **Common Workflow Patterns** — composition sequences derived from E4's peer relationships +6. **Per-item anatomy sections** — full detail, one section per item, in category order +7. **Configuration reference** — consolidated from E6 +8. **Cross-reference index** — bidirectional links derived from E4 +9. **Summary table** — category → items mapping +10. 
**Historical names / aliases** — from E1's alias data + +### Validate Completeness + +Before finalizing, verify: +- [ ] Every item from E1's inventory has a full section +- [ ] Every parameter from E2 appears in its item's table +- [ ] Every behavioral property from E3 is listed +- [ ] Every dependency from E4 has a cross-reference +- [ ] Every constraint from E5 is documented +- [ ] Every config surface from E6 is in the configuration reference + +--- + +## Why Parallel Extraction + +| Serial Approach | Parallel Approach | +|----------------|-------------------| +| One subagent extracts everything per item, one item at a time | Six subagents each extract one dimension across ALL items simultaneously | +| Bottlenecked on sequential item processing | All dimensions extracted concurrently | +| Evaluator fatigue: later items get less attention | Each subagent has a narrow focus, consistent depth | +| Schema + mechanics + dependencies tangled in one pass | Clean separation: schema expert, dependency expert, constraint expert | +| Conflicts hidden within single output | Conflicts surfaced during merge (different subagents report different values → resolution) | -### Constraints & Limitations -- **Hard** — schema validation, size caps, rate limits, timeouts -- **Soft** — best practices, recommended patterns, anti-patterns -- **Behavioral Properties** — read-only, concurrency-safe, permissions, sandbox, auto-allowed -- **Availability** — feature flags, platform requirements, configuration prerequisites +The key insight: **extraction dimensions are independent**. Knowing an item's parameters doesn't require knowing its dependencies. Knowing its constraints doesn't require knowing its mechanics. So they parallelize cleanly. -## Methodology +### When to Add More Subagents -1. Search codebase, installed packages, and config files using `Explore` subagents -2. Cross-reference runtime source of truth against existing documentation -3. Output structured markdown with parameter tables and property lists +If the target has **>20 instances**, consider splitting E2 (schemas) and E3 (mechanics) further by category: +``` +E2a: Schemas for File I/O tools +E2b: Schemas for Search tools +E2c: Schemas for Agent tools +... +``` -## Output Structure +If the target has **cross-system dependencies** (e.g., frontend + backend + database), add: +``` +E7: Cross-system integration points +``` -Write to `docs/-reference.md` (or propose a better path). Include: -- Glossary of domain terms -- Quick Reference summary matrix -- Per-item anatomy sections -- Cross-references between related items -- Configuration reference (env vars, CLI flags, config files) if applicable +If the target has **versioning concerns**, add: +``` +E8: Version history and migration paths +```
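
To make the merge rules in Phase 2 concrete, here is a minimal sketch of the conflict-resolution step: E1's inventory is canonical, and earlier (higher-priority) slices win, so E2's schema-extracted value beats E3's observed value. All names and shapes here are hypothetical assumptions — the real structure depends on what each extraction subagent actually returns.

```typescript
// Illustrative sketch of the Phase 2 merge — not an existing API.

type ItemSection = Record<string, unknown>;

// One slice per extraction subagent (E2..E6), keyed by item name.
type ExtractionSlice = Record<string, ItemSection>;

function mergeAnatomy(
  inventory: string[],        // E1: the canonical item list
  slices: ExtractionSlice[],  // E2..E6 in priority order; earlier slices win
): Map<string, ItemSection> {
  const merged = new Map<string, ItemSection>();
  for (const item of inventory) {
    const section: ItemSection = {};
    for (const slice of slices) {
      for (const [field, value] of Object.entries(slice[item] ?? {})) {
        if (!(field in section)) {
          section[field] = value; // fill gaps from lower-priority slices
        }
        // else: conflict — keep the higher-priority value (e.g. E2's schema
        // default over E3's observed default) and flag it for manual review
      }
    }
    merged.set(item, section);
  }
  return merged;
}
```
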