From 3c741cebae05bbb850abd272443b988aa855bde0 Mon Sep 17 00:00:00 2001 From: Claude Date: Sat, 31 Jan 2026 07:18:44 +0000 Subject: [PATCH 1/3] feat: Add reusable anatomize & adversarial review skills MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Three composable slash commands extracted from session methodology: /anatomize — Full two-phase pipeline: Phase 1: Document all instances of target with identity, architecture, dependencies, constraints, and limitations in structured markdown. Phase 2: Launch 6 parallel adversarial subagents with strict rubrics (convex wins, extending capabilities, reducing steps, improvement integration, accuracy fidelity, structural quality), synthesize findings, and apply improvements targeting 4+/5 across all dimensions. /adversarial-review — Standalone Phase 2: 6-dimension parallel adversarial evaluation with scoring, findings, recommendations, and automatic application of improvements. /extract-anatomy — Standalone Phase 1: Comprehensive anatomical extraction with identity, architecture, dependencies, constraints per item. Outputs structured markdown with glossary, quick reference matrix, and cross-references. https://claude.ai/code/session_01CJnBiMXiuhsnvvvcqAMPYS --- .claude/commands/adversarial-review.md | 42 +++++++++ .claude/commands/anatomize.md | 124 +++++++++++++++++++++++++ .claude/commands/extract-anatomy.md | 46 +++++++++ 3 files changed, 212 insertions(+) create mode 100644 .claude/commands/adversarial-review.md create mode 100644 .claude/commands/anatomize.md create mode 100644 .claude/commands/extract-anatomy.md diff --git a/.claude/commands/adversarial-review.md b/.claude/commands/adversarial-review.md new file mode 100644 index 0000000000..7cd54d4c19 --- /dev/null +++ b/.claude/commands/adversarial-review.md @@ -0,0 +1,42 @@ +# Adversarial Review + +Adversarially evaluate the target artifact using parallel subagents with strict rubrics. The target is: **$ARGUMENTS** + +If the argument is a file path, read it. If it's a description, find the relevant files first. + +--- + +## Execution + +Launch **6 parallel subagents**, each assigned ONE rubric dimension. Each must return: +- **Score** (1-5, with 5 being best) +- **Specific findings** (line-level references where possible) +- **Concrete recommendations** (exact changes, not vague advice) + +### R1: Convex Easy Wins +High-impact, low-effort fixes. Missing defaults, incorrect values, simple formatting inconsistencies, missing cross-references. Score 5 = nothing trivial left to fix. + +### R2: Extending Capabilities +Does the artifact cover advanced usage, composition patterns, edge cases, extension points, power-user features? Score 5 = comprehensive advanced coverage. + +### R3: Reducing Steps to Accurate Results +Can a reader go from question to answer efficiently? Quick references, decision tables, workflow patterns, inline error docs. Score 5 = minimal friction to any answer. + +### R4: Improvement-Integration Points +Does the artifact identify where the system can be improved? Limitations as opportunities, automation potential, drift detection. Score 5 = clear improvement roadmap. + +### R5: Accuracy & Source-of-Truth Fidelity +Cross-reference against actual source code, schemas, or runtime behavior. Find documentation lies, stale values, incomplete enums. Score 5 = fully verified. + +### R6: Structural & Navigational Quality +ToC completeness, glossary coverage, heading consistency, metadata template uniformity, bidirectional cross-references. 
Score 5 = professional reference quality. + +## After Evaluation + +1. Present aggregate score table +2. Deduplicate overlapping findings across dimensions +3. Prioritize: convex wins → capability extensions → accuracy → structural +4. Apply all improvements +5. Commit with detailed changelog + +If any dimension scores below 3/5 after application, run a focused follow-up on that dimension. diff --git a/.claude/commands/anatomize.md b/.claude/commands/anatomize.md new file mode 100644 index 0000000000..6024e8b592 --- /dev/null +++ b/.claude/commands/anatomize.md @@ -0,0 +1,124 @@ +# Anatomize & Adversarially Optimize + +You are executing a two-phase methodology: **Anatomize** then **Adversarially Optimize**. The target is: **$ARGUMENTS** + +If no target is provided, ask the user to specify one (e.g., "native tools", "API endpoints", "database models", "CI/CD pipelines", "authentication flows"). + +--- + +## Phase 1: Anatomize + +Produce a comprehensive anatomical document of every instance of the target. For each item, document: + +### 1. Identity +- **Name** (canonical, aliases, historical names) +- **Category** (which logical group it belongs to) +- **Purpose** (one-sentence description of what it does and why it exists) + +### 2. Anatomical Architecture +- **Parameters / Inputs** — every parameter with: name, type, required/optional, default value, constraints (min/max/enum values/regex), and description +- **Outputs / Return Shape** — what it produces, including structure, max size, truncation behavior +- **Internal Mechanics** — how it works (execution model, delegation, transformation pipeline) +- **State Effects** — what it reads from, writes to, or mutates + +### 3. Dependencies +- **Upstream** — what must exist or be configured for this to function (prerequisites, feature flags, servers, environment) +- **Downstream** — what depends on this, what breaks if this changes +- **Peer Relationships** — tools/components it commonly pairs with, mutual exclusions, ordering constraints + +### 4. Constraints & Limitations +- **Hard Constraints** — enforced limits (schema validation, size caps, rate limits, timeout ceilings) +- **Soft Constraints** — guidance and best practices (recommended usage patterns, anti-patterns) +- **Behavioral Properties** — read-only, concurrency-safe, requires permission, sandboxed, auto-allowed +- **Availability Conditions** — feature flags, platform requirements, configuration prerequisites + +### Methodology +1. Use `Explore` subagents to search the codebase, installed packages, and configuration files +2. Cross-reference source of truth (actual runtime schemas/code) against any existing documentation +3. Produce structured markdown with tables for parameters and bullet lists for properties +4. Include a Quick Reference matrix at the top and a Glossary for domain-specific terms + +### Output +Write the anatomical document to a file. Propose a sensible path based on the target (e.g., `docs/-reference.md`). Include: +- Glossary of domain terms +- Quick Reference summary table +- Full per-item anatomy sections +- Cross-reference notes between related items +- Configuration reference (env vars, CLI flags, config files) if applicable + +--- + +## Phase 2: Adversarially Optimize + +Once the anatomical document exists, launch **parallel subagents** to adversarially evaluate it. 
Each subagent gets ONE rubric dimension and must return: +- **Score** (1-5) +- **Specific findings** (what's wrong, missing, or improvable — with line references) +- **Concrete recommendations** (exact changes, not vague suggestions) + +### Rubric Dimensions + +Launch these **6 subagents in parallel**: + +#### R1: Convex Easy Wins +> Score how many high-impact, low-effort improvements exist. +- Are there missing defaults that take 1 line to add? +- Are there incorrect values that are a simple find-and-fix? +- Are there missing cross-references that just need a "See also: X" line? +- Are there inconsistencies in formatting/structure that a single pass fixes? +- **Scoring:** 5 = no easy wins left (already polished), 1 = many trivial fixes remaining + +#### R2: Extending Capabilities +> Score how well the document enables users to push beyond basic usage. +- Does each item document its full parameter space (not just the common ones)? +- Are advanced features, edge cases, and power-user patterns documented? +- Are composition patterns shown (how items combine for emergent capability)? +- Are extension points documented (hooks, plugins, custom configurations)? +- **Scoring:** 5 = comprehensive advanced coverage, 1 = basic-only documentation + +#### R3: Reducing Steps to Accurate Results +> Score how efficiently a reader can go from question to correct answer. +- Is there a Quick Reference for the 80% use case? +- Are "Use this, not that" decision tables present? +- Are workflow patterns documented with step-by-step sequences? +- Can a reader find the right tool/item without reading the entire document? +- Are common errors and their fixes documented inline? +- **Scoring:** 5 = minimal steps to any answer, 1 = requires full document read + +#### R4: Improvement-Integration Points +> Score how well the document identifies where the system can be improved. +- Are limitations framed as improvement opportunities? +- Are gaps between current and ideal behavior identified? +- Are automation opportunities noted (scripts, CI, generators)? +- Are version/drift detection strategies suggested? +- **Scoring:** 5 = clear roadmap for improvement, 1 = static reference only + +#### R5: Accuracy & Source-of-Truth Fidelity +> Score correctness against the actual runtime artifact. +- Cross-reference parameter types, defaults, and constraints against source code/schemas +- Identify any documentation lies (stated behavior != actual behavior) +- Check enum values are exhaustive +- Verify constraints match runtime enforcement +- **Scoring:** 5 = fully verified against source, 1 = unverified or contains errors + +#### R6: Structural & Navigational Quality +> Score the document's architecture as a reference artifact. +- Is the ToC complete and correctly linked? +- Is the Glossary comprehensive (no undefined jargon)? +- Are section headings consistent and scannable? +- Is the metadata template consistent across all items? +- Are related items cross-referenced bidirectionally? +- **Scoring:** 5 = professional reference quality, 1 = inconsistent/hard to navigate + +### Synthesis + +After all 6 subagents return: +1. **Aggregate scores** into a summary table +2. **Deduplicate** overlapping findings +3. **Prioritize** by impact (convex wins first, then capability extensions, then structural) +4. **Apply** all improvements to the anatomical document +5. 
**Commit** with a detailed message listing what changed and why + +### Target Scores +- Phase 1 baseline is expected around 2-3/5 (first drafts always have gaps) +- After Phase 2 application, target 4+/5 across all dimensions +- If any dimension remains below 3/5, run a focused follow-up pass on that dimension diff --git a/.claude/commands/extract-anatomy.md b/.claude/commands/extract-anatomy.md new file mode 100644 index 0000000000..bb983fa927 --- /dev/null +++ b/.claude/commands/extract-anatomy.md @@ -0,0 +1,46 @@ +# Extract Anatomy + +Produce a comprehensive anatomical document of all instances of the target: **$ARGUMENTS** + +If no target is provided, ask the user to specify one. + +--- + +## For Each Item Found, Document: + +### Identity +- Name (canonical, aliases, historical names) +- Category / logical group +- Purpose (one sentence) + +### Architecture +- **Parameters / Inputs** — name, type, required/optional, default, constraints (min/max/enum/regex), description +- **Outputs / Return Shape** — structure, max size, truncation behavior +- **Internal Mechanics** — execution model, delegation, transformation pipeline +- **State Effects** — reads from, writes to, mutates + +### Dependencies +- **Upstream** — prerequisites, feature flags, servers, environment requirements +- **Downstream** — what depends on this, what breaks if it changes +- **Peers** — common pairings, mutual exclusions, ordering constraints + +### Constraints & Limitations +- **Hard** — schema validation, size caps, rate limits, timeouts +- **Soft** — best practices, recommended patterns, anti-patterns +- **Behavioral Properties** — read-only, concurrency-safe, permissions, sandbox, auto-allowed +- **Availability** — feature flags, platform requirements, configuration prerequisites + +## Methodology + +1. Search codebase, installed packages, and config files using `Explore` subagents +2. Cross-reference runtime source of truth against existing documentation +3. Output structured markdown with parameter tables and property lists + +## Output Structure + +Write to `docs/-reference.md` (or propose a better path). Include: +- Glossary of domain terms +- Quick Reference summary matrix +- Per-item anatomy sections +- Cross-references between related items +- Configuration reference (env vars, CLI flags, config files) if applicable From f8e6d63cbe5983e77571541477818928a00ff1bc Mon Sep 17 00:00:00 2001 From: Claude Date: Sat, 31 Jan 2026 07:26:08 +0000 Subject: [PATCH 2/3] feat: Add fan-out pattern, considerations guard-rails, and review skills MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit New slash commands extracted from session methodology: /fan-out-review — The parallel adversarial subagent pattern formalized as a reusable skill. Documents the fan-out → synthesize → apply pipeline with rubric templates for 4 artifact types: documentation, code, tests, and architecture. Includes custom rubric design guide for anything that doesn't fit the templates. /considerations — The 7 key considerations as pre-flight checklist and in-flight guard-rails: 1. Adversarial eval most valuable on first drafts (diminishing returns) 2. Parallel evaluation beats sequential (narrow mandates, no groupthink) 3. Fan-out → synthesize → apply is the core reusable unit 4. Scoring forces accountability (no hand-waving) 5. Source-of-truth extraction catches documentation lies 6. Prioritize convex easy wins (impact/effort ratio) 7. 
Know when to stop (4+/5 or <3 findings per round) Includes calibration table, priority formula, and stopping criteria. /review-docs — Focused documentation review with 6 parallel rubrics tuned for reference material. /review-code — Focused code review with 6 parallel rubrics tuned for implementation (correctness prioritized over cleanup). /audit-completeness — Cross-reference any artifact against its source of truth. Produces structured delta report with severity classification. Includes "common documentation lies" checklist. https://claude.ai/code/session_01CJnBiMXiuhsnvvvcqAMPYS --- .claude/commands/audit-completeness.md | 93 ++++++++++++++ .claude/commands/considerations.md | 162 +++++++++++++++++++++++++ .claude/commands/fan-out-review.md | 140 +++++++++++++++++++++ .claude/commands/review-code.md | 25 ++++ .claude/commands/review-docs.md | 23 ++++ 5 files changed, 443 insertions(+) create mode 100644 .claude/commands/audit-completeness.md create mode 100644 .claude/commands/considerations.md create mode 100644 .claude/commands/fan-out-review.md create mode 100644 .claude/commands/review-code.md create mode 100644 .claude/commands/review-docs.md diff --git a/.claude/commands/audit-completeness.md b/.claude/commands/audit-completeness.md new file mode 100644 index 0000000000..4b3faa1524 --- /dev/null +++ b/.claude/commands/audit-completeness.md @@ -0,0 +1,93 @@ +# Audit Completeness + +Cross-reference **$ARGUMENTS** against its source of truth. This is a focused accuracy and completeness audit — not a structural or stylistic review. + +--- + +## Methodology + +### Step 1: Identify the Source of Truth + +Determine what the documentation/artifact *claims to describe*, then locate the authoritative source: + +| Artifact Type | Source of Truth | +|--------------|-----------------| +| Tool/API documentation | Actual code, schema definitions (Zod, JSON Schema, OpenAPI) | +| Configuration reference | Config file parsers, CLI argument handlers, env var readers | +| Architecture docs | Actual module structure, dependency graph, deployed topology | +| Test documentation | Actual test files, coverage reports, CI configuration | +| Changelog/release notes | Git history, package versions, deployed artifacts | + +### Step 2: Extract from Source + +Using Explore subagents or Bash, extract the **actual** values from the source of truth: +- Parameter names, types, required/optional status +- Default values (from code, not from docs) +- Constraints (min, max, enum values, regex patterns) +- Feature flags and availability conditions +- Aliases and historical names + +### Step 3: Diff Against Documentation + +For every item in the documentation, check: + +| Check | Finding Type | +|-------|-------------| +| Item exists in source but not in doc | **Missing** | +| Item exists in doc but not in source | **Stale/Removed** | +| Item exists in both but values differ | **Incorrect** | +| Item exists in both, doc is vague, source is precise | **Imprecise** | +| Item exists in both, source has constraints doc doesn't mention | **Under-documented** | + +### Step 4: Report + +Produce a structured delta report: + +```markdown +## Completeness Audit: [artifact name] + +### Summary +- Items in source of truth: N +- Items documented: M +- Missing: X +- Stale: Y +- Incorrect: Z +- Imprecise: W + +### Critical (Incorrect) +| Item | Documented Value | Actual Value | Location | +|------|-----------------|--------------|----------| + +### Major (Missing) +| Item | Source Location | What Should Be Documented | 
+|------|----------------|--------------------------| + +### Minor (Imprecise / Under-documented) +| Item | Current Doc | More Precise Value | Source | +|------|-------------|-------------------|--------| + +### Stale (Removed from Source) +| Item | Doc Location | Notes | +|------|-------------|-------| +``` + +### Step 5: Apply + +Fix all Critical and Major findings. Fix Minor findings if low-effort. Remove Stale entries. Commit with the delta report as the commit message body. + +--- + +## Common Documentation Lies to Check + +These are the most frequent divergences between docs and reality: + +- [ ] Default values that changed in code but not in docs +- [ ] Enum values that were added (new options) but not documented +- [ ] Enum values that were removed but still documented +- [ ] Constraints that were loosened (higher max) or tightened (lower max) +- [ ] Parameters that were added but not documented +- [ ] Parameters that were renamed (old name still in docs) +- [ ] Parameters that were removed but still documented +- [ ] Features gated behind flags/conditions that docs don't mention +- [ ] Aliases that exist in code but aren't documented +- [ ] Internal parameters exposed in schema but intentionally undocumented diff --git a/.claude/commands/considerations.md b/.claude/commands/considerations.md new file mode 100644 index 0000000000..85e92f99ce --- /dev/null +++ b/.claude/commands/considerations.md @@ -0,0 +1,162 @@ +# Considerations & Guard-Rails + +You are applying the **7 Key Considerations** as a pre-flight checklist and ongoing guard-rails for the current task. The context is: **$ARGUMENTS** + +If no context is given, apply these considerations to whatever task is currently in progress. + +--- + +## The 7 Considerations + +These are hard-won principles from adversarial evaluation methodology. Apply them as both a **pre-flight checklist** (before starting) and **in-flight guard-rails** (during execution). + +### 1. Adversarial evaluation is most valuable on first drafts + +> The initial version of anything scores ~2-3/5. That's expected, not a failure. One focused round of adversarial review typically yields the largest improvement. Diminishing returns set in after 2-3 rounds. + +**Guard-rail:** Don't polish prematurely. Get the first draft out, then evaluate. Don't run 5 adversarial passes when 2 will capture 90% of the value. + +**Pre-flight check:** +- [ ] Is this a first draft? → Run full adversarial evaluation +- [ ] Is this a second pass? → Run focused evaluation on weak dimensions only +- [ ] Is this a third+ pass? → Only proceed if a specific dimension is still below 3/5 + +--- + +### 2. Parallel evaluation beats sequential + +> Six focused evaluators with narrow mandates find more than one generalist trying to evaluate everything at once. They also run concurrently, so wall-clock time is the same as a single evaluation. + +**Guard-rail:** Always fan-out to multiple subagents. Never ask one subagent to evaluate "everything." A narrow mandate prevents evaluator fatigue and blind spots. + +**Pre-flight check:** +- [ ] Have I defined distinct, non-overlapping rubric dimensions? +- [ ] Am I launching all evaluators in parallel (single message, multiple Task calls)? +- [ ] Does each evaluator have a clear scope boundary? + +--- + +### 3. The fan-out → synthesize → apply pattern is the core reusable unit + +> This three-stage pattern works for any quality gate: doc review, code review, test gap analysis, architecture review, incident postmortem, onboarding material audit. 
+ +**Guard-rail:** If you're doing any kind of review or audit, reach for this pattern first. The stages are: +1. **Fan-out** — parallel independent evaluation with narrow mandates +2. **Synthesize** — aggregate, deduplicate, prioritize +3. **Apply** — implement changes in priority order + +**Applicable to:** +- Documentation and reference material +- Source code and implementations +- Test suites and coverage +- Architecture and design documents +- Configuration and infrastructure +- API designs and schemas +- Onboarding and runbooks +- Incident postmortems and retrospectives + +--- + +### 4. Scoring forces accountability + +> Without numeric scores, evaluations drift toward "looks good" or vague suggestions. A 1-5 scale with mandatory specific findings per score makes hand-waving impossible. + +**Guard-rail:** Every evaluation dimension MUST produce: +- A numeric score (1-5) +- At least one specific finding (with file/line reference where possible) +- At least one concrete recommendation (exact change, not "consider improving X") + +**Scoring calibration:** +| Score | Meaning | Action Required | +|-------|---------|-----------------| +| 1 | Fundamentally broken | Stop and fix before proceeding | +| 2 | Significant gaps | Major rework needed | +| 3 | Functional but incomplete | Targeted improvements | +| 4 | Good with minor gaps | Polish pass | +| 5 | Comprehensive | No action needed | + +--- + +### 5. Source-of-truth extraction catches documentation lies + +> Inspecting the actual runtime artifact (code, schema, config) catches things that reviewing documentation in isolation never will. The stated behavior and the actual behavior diverge more often than expected. + +**Guard-rail:** Always cross-reference documentation claims against the source of truth. Methods: +- Read the actual source code / compiled bundle / installed package +- Run the tool and observe actual behavior +- Check schema definitions (Zod, JSON Schema, OpenAPI) for real constraints +- Compare documented defaults against actual defaults in code + +**Common documentation lies:** +- Default values that changed but the doc wasn't updated +- Enum values that were added but not documented +- Constraints that were loosened/tightened +- Parameters that were added/removed/renamed +- Features gated behind flags that the doc doesn't mention + +--- + +### 6. Prioritize convex easy wins + +> A "convex" improvement is one where the value delivered is disproportionately high relative to the effort required. Fix these first. They compound — each easy fix improves the overall quality, making the next improvement clearer. + +**Guard-rail:** After synthesis, sort by effort-to-impact ratio: + +``` +Priority = Impact / Effort + +High impact + Low effort = DO FIRST (convex wins) +High impact + High effort = DO SECOND (strategic) +Low impact + Low effort = DO THIRD (cleanup) +Low impact + High effort = SKIP (waste) +``` + +**Common convex wins:** +- Adding a missing default value to a parameter table (1 line, prevents user confusion) +- Fixing an incorrect constraint (1 word, prevents runtime errors) +- Adding a cross-reference "See also: X" (1 line, connects isolated knowledge) +- Adding a glossary entry (2 lines, eliminates repeated confusion) +- Fixing inconsistent formatting (find-and-replace, improves scannability) + +--- + +### 7. Know when to stop + +> Adversarial evaluation has diminishing returns. The first round captures ~60% of issues. The second captures ~25% more. The third captures ~10%. Beyond that, you're polishing polish. 
+ +**Guard-rail:** Stop when ANY of these are true: +- All dimensions score 4+/5 +- A round yields fewer than 3 actionable findings +- You've completed 3 rounds +- The remaining findings are all "Enhancement" severity (nice-to-have, not need-to-have) + +**Do NOT stop when:** +- Any dimension is below 3/5 +- Accuracy/correctness findings remain unaddressed (wrong > missing) +- Critical or Major findings exist from any dimension + +--- + +## Using These as a Pre-Flight Checklist + +Before starting any review, evaluation, or documentation task, verify: + +``` +[ ] 1. What draft stage is this? (first → full eval, second → focused, third+ → only if <3/5) +[ ] 2. Am I using parallel subagents with narrow mandates? (not one generalist) +[ ] 3. Am I using fan-out → synthesize → apply? (not ad-hoc review) +[ ] 4. Does every evaluation require scores + findings + recommendations? (not vibes) +[ ] 5. Am I cross-referencing against the source of truth? (not reviewing docs in isolation) +[ ] 6. Am I prioritizing convex easy wins first? (not big-effort items) +[ ] 7. Do I know my stopping criteria? (not infinite polish) +``` + +## Using These as In-Flight Guard-Rails + +During execution, check periodically: + +- **Am I going deep on one dimension at the expense of others?** → Rebalance +- **Am I making changes without scoring first?** → Score, then change +- **Am I trusting the documentation without verifying the source?** → Cross-reference +- **Am I spending significant effort on a low-impact finding?** → Skip, note for future +- **Am I on round 4+ with marginal returns?** → Stop diff --git a/.claude/commands/fan-out-review.md b/.claude/commands/fan-out-review.md new file mode 100644 index 0000000000..3850a5af17 --- /dev/null +++ b/.claude/commands/fan-out-review.md @@ -0,0 +1,140 @@ +# Fan-Out → Synthesize → Apply + +You are executing the **Parallel Adversarial Subagent Pattern**. The target is: **$ARGUMENTS** + +If the argument is a file path, read it. If it's a description, find the relevant artifact first. If no argument is given, ask the user what to review. + +--- + +## The Pattern + +``` +┌─────────────────────────┐ +│ Artifact Under Review │ +└────────┬────────────────┘ + │ + ┌────┴────┐ + │ Fan-Out │ (parallel, independent, narrow mandates) + └────┬────┘ + │ + ┌─────┼─────┬─────┬─────┬─────┐ + ▼ ▼ ▼ ▼ ▼ ▼ + R1 R2 R3 R4 R5 R6 + │ │ │ │ │ │ + └─────┼─────┴─────┴─────┴─────┘ + │ + ┌────┴────┐ + │ Fan-In │ (synthesize, deduplicate, prioritize) + └────┬────┘ + │ + ┌────┴────┐ + │ Apply │ (implement, commit) + └─────────┘ +``` + +## Why This Works + +- Each evaluator has a **narrow mandate** — prevents blind spots from a single reviewer trying to evaluate everything +- Parallel execution means **no serial bottleneck** — all lenses run simultaneously +- Independent rubrics mean evaluators **cannot groupthink** — they don't see each other's findings +- Mandatory scoring with mandatory findings means **no hand-waving** — every score requires evidence +- Fan-in deduplication catches **overlapping concerns** across lenses and merges them + +## Step 1: Fan-Out + +Launch these subagents **in parallel** (all in a single message with multiple Task tool calls). 
Each subagent: +- Receives the full artifact +- Gets ONE rubric dimension (its narrow mandate) +- Must return: Score (1-5), Specific Findings (with line references), Concrete Recommendations + +**Determine your rubric dimensions based on the artifact type.** Here are templates: + +### For Documentation / Reference Material +| Subagent | Rubric | Mandate | +|----------|--------|---------| +| R1 | Convex Easy Wins | High-impact, low-effort fixes: missing defaults, wrong values, formatting gaps | +| R2 | Extending Capabilities | Advanced usage, composition patterns, edge cases, power-user features | +| R3 | Reducing Steps to Results | Quick references, decision tables, workflow patterns, inline error docs | +| R4 | Improvement-Integration Points | Limitations as opportunities, automation potential, drift detection | +| R5 | Accuracy & Source Fidelity | Cross-reference against source code/schemas, find documentation lies | +| R6 | Structural & Navigation | ToC, glossary, heading consistency, cross-references, metadata uniformity | + +### For Code / Implementation +| Subagent | Rubric | Mandate | +|----------|--------|---------| +| R1 | Convex Easy Wins | Dead code, unused imports, trivial type fixes, obvious simplifications | +| R2 | Extending Capabilities | Missing error handling, uncovered edge cases, unexposed configuration | +| R3 | Reducing Steps to Results | Unnecessary abstractions, over-engineering, simpler alternatives | +| R4 | Improvement-Integration Points | Testability gaps, observability hooks, plugin points, API surface | +| R5 | Correctness & Safety | Logic errors, race conditions, injection vectors, resource leaks | +| R6 | Structural Quality | Naming consistency, module boundaries, dependency direction, cohesion | + +### For Tests / Test Suites +| Subagent | Rubric | Mandate | +|----------|--------|---------| +| R1 | Convex Easy Wins | Missing assertions, unchecked return values, trivial coverage gaps | +| R2 | Extending Coverage | Untested branches, missing edge cases, integration gaps | +| R3 | Reducing Noise | Flaky tests, slow tests, redundant tests, unclear failure messages | +| R4 | Improvement-Integration Points | Missing test categories, automation opportunities, CI optimization | +| R5 | Correctness & Confidence | Tests that pass for wrong reasons, mocked-away real behavior | +| R6 | Structural Quality | Test organization, naming, setup/teardown patterns, helper reuse | + +### For Architecture / Design Documents +| Subagent | Rubric | Mandate | +|----------|--------|---------| +| R1 | Convex Easy Wins | Missing diagrams, undefined terms, broken cross-references | +| R2 | Extending Capabilities | Unaddressed scaling scenarios, missing failure modes, evolution paths | +| R3 | Reducing Steps to Results | Decision rationale, trade-off matrices, rejected alternatives | +| R4 | Improvement-Integration Points | Migration paths, deprecation strategies, extension mechanisms | +| R5 | Feasibility & Accuracy | Assumptions vs reality, dependency availability, performance claims | +| R6 | Structural Quality | Consistency, completeness, ADR format compliance, audience clarity | + +### Custom Rubrics +If the artifact doesn't fit the above, design 6 rubric dimensions that: +1. Cover **correctness** (is it right?) +2. Cover **completeness** (is anything missing?) +3. Cover **efficiency** (can it be simpler?) +4. Cover **extensibility** (can it grow?) +5. Cover **usability** (can someone use it?) +6. Cover **maintainability** (will it age well?) 
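
As a concrete illustration of the contract each evaluator returns and how the fan-in step that follows can process it, here is a minimal TypeScript sketch. The type and function names (`Finding`, `RubricResult`, `synthesize`) are hypothetical, not part of Claude Code or any existing library; the dedup key and ordering shown are just one way to encode the "deduplicate, then prioritize" rule described in Step 2.

```typescript
// Illustrative sketch only — the types and function below are hypothetical,
// not an existing API.

type Score = 1 | 2 | 3 | 4 | 5;

interface Finding {
  dimension: string;       // e.g. "R5: Accuracy & Source Fidelity"
  location: string;        // e.g. "docs/tools-reference.md:142"
  issue: string;           // what is wrong, missing, or improvable
  recommendation: string;  // the exact change to make, not vague advice
}

interface RubricResult {
  dimension: string;
  score: Score;
  findings: Finding[];     // must be non-empty for any score below 5
}

// Priority order used during fan-in: convex wins, accuracy, step reduction,
// capability extensions, structural quality, integration points.
const PRIORITY = ["R1", "R5", "R3", "R2", "R6", "R4"];

function synthesize(results: RubricResult[]): Finding[] {
  // Deduplicate: different rubrics often surface the same underlying issue,
  // so key findings by location plus a normalized issue description.
  const deduped = new Map<string, Finding>();
  for (const result of results) {
    for (const finding of result.findings) {
      const key = `${finding.location}::${finding.issue.trim().toLowerCase()}`;
      if (!deduped.has(key)) deduped.set(key, finding);
    }
  }
  // Prioritize by dimension, in the order defined above.
  return [...deduped.values()].sort(
    (a, b) =>
      PRIORITY.indexOf(a.dimension.slice(0, 2)) -
      PRIORITY.indexOf(b.dimension.slice(0, 2))
  );
}
```
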
+ +## Step 2: Fan-In (Synthesize) + +After all subagents return: + +1. **Aggregate** into a summary table: + ``` + | Dimension | Score | Top Finding | Top Recommendation | + ``` + +2. **Deduplicate** — multiple rubrics often surface the same underlying issue from different angles. Merge them. + +3. **Prioritize** using this order: + - **Convex easy wins first** — maximum impact per effort + - **Accuracy fixes** — wrong information is worse than missing information + - **Step reduction** — efficiency improvements compound + - **Capability extensions** — breadth improvements + - **Structural quality** — polish + - **Integration points** — forward-looking improvements last + +4. **Classify** each finding: + - **Critical** (score 1-2 in any dimension): fix immediately + - **Major** (score 3 in accuracy/correctness): fix in this pass + - **Minor** (score 3-4 in other dimensions): fix if time permits + - **Enhancement** (score 4+ suggestions): note for future + +## Step 3: Apply + +1. Implement all Critical and Major fixes +2. Implement Minor fixes +3. Implement Enhancements that are genuinely low-effort +4. Commit with a detailed message listing: + - Pre-evaluation aggregate score + - Post-application expected score + - What changed and why, organized by dimension + +## Iteration + +- If any dimension remains **below 3/5** after application, run a **focused follow-up** on that specific dimension only +- Diminishing returns set in after **2-3 rounds** — stop when all dimensions are 4+/5 or when each additional round yields fewer than 3 actionable findings +- First drafts typically score **2-3/5** — this is expected, not a failure diff --git a/.claude/commands/review-code.md b/.claude/commands/review-code.md new file mode 100644 index 0000000000..995b08cfc8 --- /dev/null +++ b/.claude/commands/review-code.md @@ -0,0 +1,25 @@ +# Review Code + +Adversarially evaluate the code at **$ARGUMENTS** using the fan-out → synthesize → apply pattern. + +Read the target file(s) first. Then launch **6 parallel subagents**, each with one rubric: + +1. **Convex Easy Wins** — Dead code, unused imports, trivial type fixes, obvious simplifications, inconsistent naming, missing early returns, redundant conditions. What takes 1 line to fix but improves real quality? + +2. **Extending Capabilities** — Missing error handling for realistic failure modes, uncovered edge cases, unexposed configuration that should be configurable, missing validation at system boundaries, incomplete API surface. + +3. **Reducing Steps to Accurate Results** — Unnecessary abstractions, over-engineering, indirection that obscures intent, simpler alternatives to current patterns, premature generalization, dead flexibility (extension points nothing uses). + +4. **Improvement-Integration Points** — Testability gaps (untestable code paths, hard-wired dependencies), observability hooks (logging, metrics, tracing), plugin/extension points, API surface improvements, migration paths. + +5. **Correctness & Safety** — Logic errors, race conditions, injection vectors (SQL, XSS, command), resource leaks (connections, file handles, memory), error swallowing, incorrect assumptions, off-by-one, null/undefined access. + +6. **Structural Quality** — Naming consistency, module boundary clarity, dependency direction (no circular deps, layered correctly), cohesion (do modules have single responsibilities?), coupling (are modules appropriately decoupled?). 
+ +Each subagent must return: **Score (1-5)**, **Specific Findings (with file:line references)**, **Concrete Recommendations (with code patches where applicable)**. + +After all return: aggregate scores, deduplicate, prioritize (correctness → convex wins → steps → capabilities → structure → integration), apply improvements, commit. + +Note: For code reviews, **correctness** takes priority over convex wins — a bug fix is always more important than a cleanup. + +Stop when all dimensions are 4+/5 or a round yields fewer than 3 findings. diff --git a/.claude/commands/review-docs.md b/.claude/commands/review-docs.md new file mode 100644 index 0000000000..c4465bc81a --- /dev/null +++ b/.claude/commands/review-docs.md @@ -0,0 +1,23 @@ +# Review Documentation + +Adversarially evaluate the documentation at **$ARGUMENTS** using the fan-out → synthesize → apply pattern. + +Read the target file(s) first. Then launch **6 parallel subagents**, each with one rubric: + +1. **Convex Easy Wins** — Missing defaults, incorrect values, broken links, formatting inconsistencies, missing cross-references. What takes 1 line to fix but saves readers real confusion? + +2. **Extending Capabilities** — Is the full parameter space documented (not just common params)? Are advanced features, composition patterns, edge cases, and power-user workflows covered? Are extension points documented? + +3. **Reducing Steps to Accurate Results** — Is there a Quick Reference for the 80% case? Are "Use this, not that" decision tables present? Are workflow patterns step-by-step? Can a reader find the right answer without reading the full document? Are common errors and fixes inline? + +4. **Improvement-Integration Points** — Are limitations framed as improvement opportunities? Are automation opportunities identified (generators, CI checks, drift detectors)? Does the document suggest how the system could evolve? + +5. **Accuracy & Source-of-Truth Fidelity** — Cross-reference every claim against source code, schemas, or runtime behavior. Find documentation lies: stated vs actual defaults, missing enum values, changed constraints, undocumented parameters. + +6. **Structural & Navigational Quality** — Is the ToC complete and correctly linked? Is the glossary comprehensive? Are headings consistent and scannable? Is the metadata template uniform across all items? Are related items cross-referenced bidirectionally? + +Each subagent must return: **Score (1-5)**, **Specific Findings**, **Concrete Recommendations**. + +After all return: aggregate scores, deduplicate, prioritize (convex wins → accuracy → steps → capabilities → structure → integration), apply improvements, commit. + +Stop when all dimensions are 4+/5 or a round yields fewer than 3 findings. From 0d6d802e0b1a4d928f867fc83ef9e33f91d5772e Mon Sep 17 00:00:00 2001 From: Claude Date: Sat, 31 Jan 2026 07:29:05 +0000 Subject: [PATCH 3/3] feat: Parallelize Phase 1 extraction with 6 concurrent subagents MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Both /extract-anatomy and /anatomize now use parallel subagent orchestration for the extraction phase, not just the evaluation phase. 
Phase 1 now fans out to 6 simultaneous extraction subagents: E1: Identity & Inventory (master checklist) E2: Inputs, Outputs & Schemas (parameter tables from source) E3: Internal Mechanics & State Effects (runtime behavior) E4: Dependencies & Relationships (dependency graph, peer map) E5: Constraints, Limitations & Availability (hard/soft limits) E6: Configuration & Environment (env vars, CLI flags, config files) Key additions: - Phase 0 (Reconnaissance): fast discovery scan before extraction - Phase 1.5 (Merge): conflict resolution rules (schema > observed) - Validation checklist before proceeding to evaluation - Scaling strategy: split E2/E3 by category for >20 instances - Optional E7 (cross-system) and E8 (versioning) subagents - Rationale table: why parallel extraction beats serial Full pipeline is now: Recon (1) → Extract (6‖) → Merge → Evaluate (6‖) → Synthesize → Apply https://claude.ai/code/session_01CJnBiMXiuhsnvvvcqAMPYS --- .claude/commands/anatomize.md | 298 ++++++++++++++++++---------- .claude/commands/extract-anatomy.md | 154 +++++++++++--- 2 files changed, 314 insertions(+), 138 deletions(-) diff --git a/.claude/commands/anatomize.md b/.claude/commands/anatomize.md index 6024e8b592..621c66c6d9 100644 --- a/.claude/commands/anatomize.md +++ b/.claude/commands/anatomize.md @@ -4,121 +4,201 @@ You are executing a two-phase methodology: **Anatomize** then **Adversarially Op If no target is provided, ask the user to specify one (e.g., "native tools", "API endpoints", "database models", "CI/CD pipelines", "authentication flows"). +Both phases use parallel subagent orchestration. The full pipeline is: + +``` +Recon (1 agent) → Extract (6 parallel) → Synthesize → Evaluate (6 parallel) → Synthesize → Apply +``` + +--- + +## Phase 0: Reconnaissance + +Before extraction, run a fast **discovery scan** with an `Explore` subagent to determine: + +1. **Where the target lives** — codebase, installed packages, config files, runtime artifacts +2. **How many instances exist** — rough count to calibrate parallelization +3. **What the source of truth is** — source code, schemas, bundle, config parsers, API specs + +Output: a manifest of locations and estimated instance count. + +--- + +## Phase 1: Parallel Anatomical Extraction + +Fan-out extraction across **6 parallel subagents**, each responsible for one extraction dimension. Launch these **simultaneously in a single message**: + +### E1: Identity & Inventory +> Discover and catalog every instance of the target. +- **Name** (canonical, aliases, historical names, user-facing name) +- **Category** / logical group +- **Purpose** (one sentence: what it does and why it exists) +- Confirm nothing is missing — this is the master checklist + +### E2: Inputs, Outputs & Schemas +> Extract the full parameter and return shape anatomy from actual schema definitions. +- **Parameters / Inputs** — every parameter with: name, type, required/optional, default value, constraints (min/max/enum values/regex), description +- **Outputs / Return Shape** — what it produces: structure, max size, truncation behavior +- Source: Zod schemas, JSON Schema, OpenAPI specs, TypeScript types, function signatures + +### E3: Internal Mechanics & State Effects +> Understand how each item actually works at runtime. 
+- **Execution model** — synchronous/async, sandboxed, delegated, queued +- **Transformation pipeline** — what happens between input and output +- **State effects** — reads from, writes to, mutates (files, env, memory, network) +- **Behavioral properties** — read-only, concurrency-safe, requires permission, auto-allowed +- Source: implementation code, execution path traces + +### E4: Dependencies & Relationships +> Map the dependency graph between items and the broader system. +- **Upstream** — prerequisites, feature flags, servers, environment requirements +- **Downstream** — what depends on this, what breaks if it changes +- **Peer relationships** — common pairings, mutual exclusions, ordering constraints, composition patterns +- **Cross-references** — which items reference or invoke each other +- Source: imports, invocations, configuration wiring + +### E5: Constraints, Limitations & Availability +> Catalog every hard and soft limit. +- **Hard constraints** — enforced by schema/code: validation rules, size caps, rate limits, timeouts +- **Soft constraints** — guidance: best practices, recommended patterns, anti-patterns +- **Availability conditions** — feature flags, platform requirements, mutual exclusivity +- Source: validators, error handlers, feature flag checks, conditional enablement + +### E6: Configuration & Environment +> Extract all configuration surfaces that affect the target. +- **Environment variables** — name, default, what they affect +- **CLI flags** — name, description, interaction with the target +- **Configuration files** — paths, schemas, scopes (project/local/user/global) +- **Runtime settings** — programmatic configuration options +- Source: env var readers, argument parsers, config file loaders + +### Scaling Strategy + +If the target has **>20 instances**, split E2 and E3 further by category: +``` +E2a: Schemas for category A E3a: Mechanics for category A +E2b: Schemas for category B E3b: Mechanics for category B +``` + +If the target has **cross-system** scope, add `E7: Cross-system integration points`. +If the target has **versioning** concerns, add `E8: Version history and migration paths`. + --- -## Phase 1: Anatomize - -Produce a comprehensive anatomical document of every instance of the target. For each item, document: - -### 1. Identity -- **Name** (canonical, aliases, historical names) -- **Category** (which logical group it belongs to) -- **Purpose** (one-sentence description of what it does and why it exists) - -### 2. Anatomical Architecture -- **Parameters / Inputs** — every parameter with: name, type, required/optional, default value, constraints (min/max/enum values/regex), and description -- **Outputs / Return Shape** — what it produces, including structure, max size, truncation behavior -- **Internal Mechanics** — how it works (execution model, delegation, transformation pipeline) -- **State Effects** — what it reads from, writes to, or mutates - -### 3. Dependencies -- **Upstream** — what must exist or be configured for this to function (prerequisites, feature flags, servers, environment) -- **Downstream** — what depends on this, what breaks if this changes -- **Peer Relationships** — tools/components it commonly pairs with, mutual exclusions, ordering constraints - -### 4. 
Constraints & Limitations -- **Hard Constraints** — enforced limits (schema validation, size caps, rate limits, timeout ceilings) -- **Soft Constraints** — guidance and best practices (recommended usage patterns, anti-patterns) -- **Behavioral Properties** — read-only, concurrency-safe, requires permission, sandboxed, auto-allowed -- **Availability Conditions** — feature flags, platform requirements, configuration prerequisites - -### Methodology -1. Use `Explore` subagents to search the codebase, installed packages, and configuration files -2. Cross-reference source of truth (actual runtime schemas/code) against any existing documentation -3. Produce structured markdown with tables for parameters and bullet lists for properties -4. Include a Quick Reference matrix at the top and a Glossary for domain-specific terms - -### Output -Write the anatomical document to a file. Propose a sensible path based on the target (e.g., `docs/-reference.md`). Include: -- Glossary of domain terms -- Quick Reference summary table -- Full per-item anatomy sections -- Cross-reference notes between related items -- Configuration reference (env vars, CLI flags, config files) if applicable +## Phase 1.5: Merge & Assemble + +After all extraction subagents return: + +### Merge +1. Use E1's inventory as the canonical item list — every item here gets a full section +2. For each item, merge in: E2's parameter tables, E3's mechanics, E4's dependencies, E5's constraints, E6's configuration +3. Resolve conflicts: E2's schema-extracted values win over E3's observed values (schema is the source of truth) +4. Flag terms each subagent had to define → aggregate into glossary + +### Assemble the Document + +Write to `docs/-reference.md`. Structure: + +1. **Title and introduction** — one paragraph +2. **Glossary** — aggregated from all subagents +3. **Quick Reference matrix** — one row per item, key properties as columns +4. **Selection Guide** — "If you want to X, use Y" (from E4's relationship data) +5. **Workflow Patterns** — composition sequences (from E4's peer relationships) +6. **Permission & Security Model** — auto-allowed vs permission-required (from E3 + E5) +7. **Per-item anatomy sections** — full detail, in category order +8. **Configuration reference** — consolidated from E6 +9. **Cross-reference index** — bidirectional links from E4 +10. **Summary table** — category → items +11. **Historical names / aliases** — from E1 + +### Validate Before Proceeding + +- [ ] Every item from E1 has a full section +- [ ] Every parameter from E2 appears in its item's table +- [ ] Every behavioral property from E3 is listed +- [ ] Every dependency from E4 has a cross-reference +- [ ] Every constraint from E5 is documented +- [ ] Every config surface from E6 is in the configuration reference --- -## Phase 2: Adversarially Optimize - -Once the anatomical document exists, launch **parallel subagents** to adversarially evaluate it. Each subagent gets ONE rubric dimension and must return: -- **Score** (1-5) -- **Specific findings** (what's wrong, missing, or improvable — with line references) -- **Concrete recommendations** (exact changes, not vague suggestions) - -### Rubric Dimensions - -Launch these **6 subagents in parallel**: - -#### R1: Convex Easy Wins -> Score how many high-impact, low-effort improvements exist. -- Are there missing defaults that take 1 line to add? -- Are there incorrect values that are a simple find-and-fix? -- Are there missing cross-references that just need a "See also: X" line? 
-- Are there inconsistencies in formatting/structure that a single pass fixes? -- **Scoring:** 5 = no easy wins left (already polished), 1 = many trivial fixes remaining - -#### R2: Extending Capabilities -> Score how well the document enables users to push beyond basic usage. -- Does each item document its full parameter space (not just the common ones)? -- Are advanced features, edge cases, and power-user patterns documented? -- Are composition patterns shown (how items combine for emergent capability)? -- Are extension points documented (hooks, plugins, custom configurations)? -- **Scoring:** 5 = comprehensive advanced coverage, 1 = basic-only documentation - -#### R3: Reducing Steps to Accurate Results -> Score how efficiently a reader can go from question to correct answer. -- Is there a Quick Reference for the 80% use case? -- Are "Use this, not that" decision tables present? -- Are workflow patterns documented with step-by-step sequences? -- Can a reader find the right tool/item without reading the entire document? -- Are common errors and their fixes documented inline? -- **Scoring:** 5 = minimal steps to any answer, 1 = requires full document read - -#### R4: Improvement-Integration Points -> Score how well the document identifies where the system can be improved. -- Are limitations framed as improvement opportunities? -- Are gaps between current and ideal behavior identified? -- Are automation opportunities noted (scripts, CI, generators)? -- Are version/drift detection strategies suggested? -- **Scoring:** 5 = clear roadmap for improvement, 1 = static reference only - -#### R5: Accuracy & Source-of-Truth Fidelity -> Score correctness against the actual runtime artifact. -- Cross-reference parameter types, defaults, and constraints against source code/schemas -- Identify any documentation lies (stated behavior != actual behavior) -- Check enum values are exhaustive -- Verify constraints match runtime enforcement -- **Scoring:** 5 = fully verified against source, 1 = unverified or contains errors - -#### R6: Structural & Navigational Quality -> Score the document's architecture as a reference artifact. -- Is the ToC complete and correctly linked? -- Is the Glossary comprehensive (no undefined jargon)? -- Are section headings consistent and scannable? -- Is the metadata template consistent across all items? -- Are related items cross-referenced bidirectionally? +## Phase 2: Parallel Adversarial Evaluation + +Once the anatomical document is assembled, launch **6 evaluation subagents in parallel**: + +### R1: Convex Easy Wins +> High-impact, low-effort improvements remaining. +- Missing defaults that take 1 line to add? +- Incorrect values that are a find-and-fix? +- Missing cross-references that need a "See also: X" line? +- Formatting inconsistencies fixable in a single pass? +- **Scoring:** 5 = fully polished, 1 = many trivial fixes remaining + +### R2: Extending Capabilities +> Does it enable users to push beyond basic usage? +- Full parameter space documented (not just common params)? +- Advanced features, edge cases, power-user patterns? +- Composition patterns (items combining for emergent capability)? +- Extension points (hooks, plugins, custom configurations)? +- **Scoring:** 5 = comprehensive advanced coverage, 1 = basic-only + +### R3: Reducing Steps to Accurate Results +> How efficiently can a reader go from question to answer? +- Quick Reference for the 80% case? +- "Use this, not that" decision tables? +- Workflow patterns with step-by-step sequences? 
+- Common errors and fixes documented inline? +- **Scoring:** 5 = minimal steps to any answer, 1 = requires full read + +### R4: Improvement-Integration Points +> Does it identify where the system can be improved? +- Limitations framed as improvement opportunities? +- Gaps between current and ideal behavior? +- Automation opportunities (scripts, CI, generators)? +- Version/drift detection strategies? +- **Scoring:** 5 = clear improvement roadmap, 1 = static reference only + +### R5: Accuracy & Source-of-Truth Fidelity +> Is it correct against the actual runtime artifact? +- Parameter types, defaults, constraints verified against source? +- Documentation lies identified (stated != actual)? +- Enum values exhaustive? +- Constraints match runtime enforcement? +- **Scoring:** 5 = fully verified, 1 = unverified or contains errors + +### R6: Structural & Navigational Quality +> Is it well-architected as a reference artifact? +- ToC complete and correctly linked? +- Glossary comprehensive (no undefined jargon)? +- Section headings consistent and scannable? +- Metadata template consistent across all items? +- Related items cross-referenced bidirectionally? - **Scoring:** 5 = professional reference quality, 1 = inconsistent/hard to navigate -### Synthesis +--- + +## Phase 2.5: Synthesize & Apply + +After all 6 evaluation subagents return: + +1. **Aggregate** scores into a summary table +2. **Deduplicate** overlapping findings across dimensions +3. **Prioritize** by impact: + - Convex easy wins first (maximum value per effort) + - Accuracy fixes (wrong > missing) + - Step reduction (efficiency compounds) + - Capability extensions (breadth) + - Structural quality (polish) + - Integration points (forward-looking) +4. **Apply** all Critical and Major fixes, then Minor, then low-effort Enhancements +5. **Commit** with detailed message listing pre/post scores and what changed + +--- -After all 6 subagents return: -1. **Aggregate scores** into a summary table -2. **Deduplicate** overlapping findings -3. **Prioritize** by impact (convex wins first, then capability extensions, then structural) -4. **Apply** all improvements to the anatomical document -5. **Commit** with a detailed message listing what changed and why +## Stopping Criteria -### Target Scores -- Phase 1 baseline is expected around 2-3/5 (first drafts always have gaps) -- After Phase 2 application, target 4+/5 across all dimensions -- If any dimension remains below 3/5, run a focused follow-up pass on that dimension +- **Target:** 4+/5 across all 6 dimensions +- **Stop when:** all dimensions 4+/5, OR a round yields <3 actionable findings, OR 3 rounds completed +- **Don't stop when:** any dimension <3/5, OR accuracy findings remain unaddressed +- **First draft baseline:** expect 2-3/5 — this is normal, not a failure +- **Diminishing returns:** round 1 captures ~60%, round 2 ~25%, round 3 ~10% diff --git a/.claude/commands/extract-anatomy.md b/.claude/commands/extract-anatomy.md index bb983fa927..fb4ad9e662 100644 --- a/.claude/commands/extract-anatomy.md +++ b/.claude/commands/extract-anatomy.md @@ -6,41 +6,137 @@ If no target is provided, ask the user to specify one. 
--- -## For Each Item Found, Document: +## Phase 0: Reconnaissance -### Identity -- Name (canonical, aliases, historical names) -- Category / logical group -- Purpose (one sentence) +Before extraction, run a fast **discovery scan** to determine: -### Architecture -- **Parameters / Inputs** — name, type, required/optional, default, constraints (min/max/enum/regex), description -- **Outputs / Return Shape** — structure, max size, truncation behavior -- **Internal Mechanics** — execution model, delegation, transformation pipeline -- **State Effects** — reads from, writes to, mutates +1. **Where the target lives** — codebase, installed packages, config files, runtime artifacts +2. **How many instances exist** — rough count to determine parallelization strategy +3. **What the source of truth is** — source code, schemas, bundle, config parsers, API specs -### Dependencies -- **Upstream** — prerequisites, feature flags, servers, environment requirements +Use an `Explore` subagent for this. Output: a manifest of locations and estimated instance count. + +--- + +## Phase 1: Parallel Extraction + +Fan-out extraction across **parallel subagents**, each responsible for one extraction dimension. Launch these **simultaneously in a single message**: + +### Subagent E1: Identity & Inventory +> Discover and catalog every instance of the target. +- **Name** (canonical, aliases, historical names, user-facing name) +- **Category** / logical group +- **Purpose** (one sentence: what it does and why it exists) +- Output: complete inventory list, confirm nothing is missing + +### Subagent E2: Inputs, Outputs & Schemas +> Extract the full parameter and return shape anatomy. +- **Parameters / Inputs** — every parameter with: name, type, required/optional, default value, constraints (min/max/enum values/regex), description +- **Outputs / Return Shape** — what it produces, including structure, max size, truncation behavior +- Source: read actual schema definitions (Zod, JSON Schema, OpenAPI, TypeScript types, function signatures) +- Output: structured parameter tables per item + +### Subagent E3: Internal Mechanics & State Effects +> Understand how each item actually works at runtime. +- **Execution model** — synchronous/async, sandboxed, delegated, queued +- **Transformation pipeline** — what happens between input and output +- **State effects** — what it reads from, writes to, or mutates (files, env, memory, network) +- **Behavioral properties** — read-only, concurrency-safe, requires permission, auto-allowed +- Source: read implementation code, trace execution paths +- Output: mechanics summary per item + +### Subagent E4: Dependencies & Relationships +> Map the dependency graph between items and the broader system. +- **Upstream** — prerequisites, feature flags, servers, environment requirements, configuration - **Downstream** — what depends on this, what breaks if it changes -- **Peers** — common pairings, mutual exclusions, ordering constraints +- **Peer relationships** — common pairings, mutual exclusions, ordering constraints, composition patterns +- **Cross-references** — which items reference or invoke each other +- Source: trace imports, invocations, configuration wiring +- Output: dependency map and relationship matrix + +### Subagent E5: Constraints, Limitations & Availability +> Catalog every hard and soft limit. 
+- **Hard constraints** — enforced by schema/code: validation rules, size caps, rate limits, timeout ceilings, numeric bounds +- **Soft constraints** — guidance: best practices, recommended patterns, anti-patterns, "do not" rules +- **Availability conditions** — feature flags, platform requirements, configuration prerequisites, mutual exclusivity +- Source: read validators, error handlers, feature flag checks, conditional enablement logic +- Output: constraint table per item + +### Subagent E6: Configuration & Environment +> Extract all configuration surfaces that affect the target. +- **Environment variables** — name, default, what they affect +- **CLI flags** — name, description, interaction with the target +- **Configuration files** — paths, schemas, scopes (project/local/user/global) +- **Runtime settings** — programmatic configuration options +- Source: read env var readers, argument parsers, config file loaders +- Output: configuration reference tables + +--- + +## Phase 2: Synthesis + +After all 6 extraction subagents return: + +### Merge +1. Use E1's inventory as the canonical item list +2. For each item, merge in E2's parameter tables, E3's mechanics, E4's dependencies, E5's constraints, E6's configuration +3. Resolve conflicts (if two subagents report different defaults, E2's schema-extracted value wins over E3's observed value) + +### Structure the Document + +Write to `docs/-reference.md` (or propose a better path). Assemble in this order: + +1. **Title and introduction** — one paragraph describing what this reference covers +2. **Glossary** — domain-specific terms discovered across all subagents (each subagent should flag terms it had to define) +3. **Quick Reference matrix** — summary table with one row per item, columns for category, purpose, key properties +4. **Tool Selection Guide / Use-Case Table** — "If you want to X, use Y" (derived from E4's relationship data) +5. **Common Workflow Patterns** — composition sequences derived from E4's peer relationships +6. **Per-item anatomy sections** — full detail, one section per item, in category order +7. **Configuration reference** — consolidated from E6 +8. **Cross-reference index** — bidirectional links derived from E4 +9. **Summary table** — category → items mapping +10. 
**Historical names / aliases** — from E1's alias data + +### Validate Completeness + +Before finalizing, verify: +- [ ] Every item from E1's inventory has a full section +- [ ] Every parameter from E2 appears in its item's table +- [ ] Every behavioral property from E3 is listed +- [ ] Every dependency from E4 has a cross-reference +- [ ] Every constraint from E5 is documented +- [ ] Every config surface from E6 is in the configuration reference + +--- + +## Why Parallel Extraction + +| Serial Approach | Parallel Approach | +|----------------|-------------------| +| One subagent extracts everything per item, one item at a time | Six subagents each extract one dimension across ALL items simultaneously | +| Bottlenecked on sequential item processing | All dimensions extracted concurrently | +| Evaluator fatigue: later items get less attention | Each subagent has a narrow focus, consistent depth | +| Schema + mechanics + dependencies tangled in one pass | Clean separation: schema expert, dependency expert, constraint expert | +| Conflicts hidden within single output | Conflicts surfaced during merge (different subagents report different values → resolution) | -### Constraints & Limitations -- **Hard** — schema validation, size caps, rate limits, timeouts -- **Soft** — best practices, recommended patterns, anti-patterns -- **Behavioral Properties** — read-only, concurrency-safe, permissions, sandbox, auto-allowed -- **Availability** — feature flags, platform requirements, configuration prerequisites +The key insight: **extraction dimensions are independent**. Knowing an item's parameters doesn't require knowing its dependencies. Knowing its constraints doesn't require knowing its mechanics. So they parallelize cleanly. -## Methodology +### When to Add More Subagents -1. Search codebase, installed packages, and config files using `Explore` subagents -2. Cross-reference runtime source of truth against existing documentation -3. Output structured markdown with parameter tables and property lists +If the target has **>20 instances**, consider splitting E2 (schemas) and E3 (mechanics) further by category: +``` +E2a: Schemas for File I/O tools +E2b: Schemas for Search tools +E2c: Schemas for Agent tools +... +``` -## Output Structure +If the target has **cross-system dependencies** (e.g., frontend + backend + database), add: +``` +E7: Cross-system integration points +``` -Write to `docs/-reference.md` (or propose a better path). Include: -- Glossary of domain terms -- Quick Reference summary matrix -- Per-item anatomy sections -- Cross-references between related items -- Configuration reference (env vars, CLI flags, config files) if applicable +If the target has **versioning concerns**, add: +``` +E8: Version history and migration paths +```
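
To make the merge rules in Phase 2 concrete, here is a minimal sketch of the conflict-resolution step: E1's inventory is canonical, and earlier (higher-priority) slices win, so E2's schema-extracted value beats E3's observed value. All names and shapes here are hypothetical assumptions — the real structure depends on what each extraction subagent actually returns.

```typescript
// Illustrative sketch of the Phase 2 merge — not an existing API.

type ItemSection = Record<string, unknown>;

// One slice per extraction subagent (E2..E6), keyed by item name.
type ExtractionSlice = Record<string, ItemSection>;

function mergeAnatomy(
  inventory: string[],        // E1: the canonical item list
  slices: ExtractionSlice[],  // E2..E6 in priority order; earlier slices win
): Map<string, ItemSection> {
  const merged = new Map<string, ItemSection>();
  for (const item of inventory) {
    const section: ItemSection = {};
    for (const slice of slices) {
      for (const [field, value] of Object.entries(slice[item] ?? {})) {
        if (!(field in section)) {
          section[field] = value; // fill gaps from lower-priority slices
        }
        // else: conflict — keep the higher-priority value (e.g. E2's schema
        // default over E3's observed default) and flag it for manual review
      }
    }
    merged.set(item, section);
  }
  return merged;
}
```
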