entireio · alishakawaguchi · Mar 9, 2026
@@ -0,0 +1,18 @@
+{
+  "name": "entire-dev-tools",
+  "owner": {
+    "name": "Entire Team"
+  },
+  "plugins": [
+    {
+      "name": "e2e",
+      "source": "./e2e",
+      "description": "E2E test triage, debugging, and fix implementation toolkit"
+    },
+    {
+      "name": "agent-integration",
+      "source": "./agent-integration",
+      "description": "Multi-phase toolkit for integrating a new AI coding agent with the Entire CLI"
+    }
+  ]
+}
@@ -0,0 +1,5 @@
+{
+  "name": "e2e",
+  "description": "E2E test triage, debugging, and fix implementation toolkit",
+  "version": "1.0.0"
+}
@@ -0,0 +1,15 @@
+# E2E Plugin
+
+Local plugin providing individual commands for E2E test triage and debugging.
+
+## Commands
+
+| Command | Description |
+|---------|-------------|
+| `/e2e:triage-ci` | Run failing tests locally, classify flaky vs real-bug, present findings report |
+| `/e2e:debug` | Deep-dive artifact analysis for root cause diagnosis |
+| `/e2e:implement` | Apply fixes from triage/debug findings, verify with E2E tests |
+
+## Related
+
+- Orchestrator skill: `.claude/skills/e2e/SKILL.md` (`/e2e` — runs triage-ci then implement)
@@ -0,0 +1,7 @@
+---
+description: "Deep-dive artifact analysis for diagnosing E2E test failures"
+---
+
+# Debug Command
+
+Read and follow the full procedure from `.claude/skills/e2e/debug.md`.
@@ -0,0 +1,7 @@
+---
+description: "Apply fixes from triage/debug findings, verify with scoped E2E tests"
+---
+
+# Implement Command
+
+Read and follow the full procedure from `.claude/skills/e2e/implement.md`.
@@ -0,0 +1,7 @@
+---
+description: "Triage E2E failures via local reruns or CI artifacts, classify flaky vs real-bug, present findings report"
+---
+
+# Triage CI Command
+
+Read and follow the full procedure from `.claude/skills/e2e/triage-ci.md`.
@@ -1,4 +1,16 @@
 {
+  "extraKnownMarketplaces": {
+    "entire-dev-tools": {
+      "source": {
+        "source": "directory",
+        "path": "./.claude/plugins"
+      }
+    }
+  },
+  "enabledPlugins": {
+    "e2e@entire-dev-tools": true,
+    "agent-integration@entire-dev-tools": true
+  },
   "hooks": {
     "SessionStart": [
       {

@@ -52,7 +52,7 @@ This skill enforces strict E2E-first test-driven development. The rules:
 3. **Unit tests are written last.** After all E2E tiers pass (Step 14), you write unit tests using real data collected from E2E runs as golden fixtures.
 4. **If you didn't watch it fail, you don't know if it tests the right thing.** Never write a test you haven't seen fail first.
 5. **Minimum viable fix.** At each E2E failure, implement only the code needed to fix that failure. Don't anticipate future tiers.
-6. **`/debug-e2e` is your debugger.** When an E2E test fails, use the artifact directory with `/debug-e2e` before guessing at fixes.
+6. **`/e2e:debug` is your debugger.** When an E2E test fails, use the artifact directory with `/e2e:debug` before guessing at fixes.
 
 ## Pipeline
 

@@ -13,7 +13,7 @@ Build the agent Go package using strict E2E-first TDD. Unit tests are written ON
 1. **E2E tests are the spec.** The existing `ForEachAgent` test scenarios define "working". You implement until they pass.
 2. **Watch it fail first.** Every E2E tier starts by running the test and observing the failure. If you haven't seen the failure, you don't understand what needs fixing.
 3. **Minimum viable fix.** At each failure, implement only the code needed to make that specific assertion pass. Don't anticipate future tiers.
-4. **`/debug-e2e` is your debugger.** When an E2E test fails, use the artifact directory with `/debug-e2e` before guessing at fixes.
+4. **`/e2e:debug` is your debugger.** When an E2E test fails, use the artifact directory with `/e2e:debug` before guessing at fixes.
 5. **No unit tests during Steps 4-13.** Unit tests are written in Step 14 after all E2E tiers pass, using real data from E2E runs as golden fixtures.
 6. **Format and lint, don't unit test.** Between E2E tiers, run `mise run fmt && mise run lint` to keep code clean. Any earlier `mise run test` invocations (e.g., in Step 3) are strictly compile-only sanity checks — no `mise run test` between E2E tiers (Steps 4-13).
 7. **If you didn't watch it fail, you don't know if it tests the right thing.**
@@ -83,7 +83,7 @@ This test requires no agent prompts — it only exercises hooks, so it's the fas
 
 1. Run: `mise run test:e2e --agent $AGENT_SLUG TestHumanOnlyChangesAndCommits`
 2. **Watch it fail** — read the failure output carefully
-3. If there are artifact dirs, use `/debug-e2e {artifact-dir}` to understand what happened
+3. If there are artifact dirs, use `/e2e:debug {artifact-dir}` to understand what happened
 4. Implement the minimum code to fix the first failure
 5. Repeat until the test passes
 
@@ -105,7 +105,7 @@ The foundational test. This exercises the full agent lifecycle: start session
 
 1. Run: `mise run test:e2e --agent $AGENT_SLUG TestSingleSessionManualCommit`
 2. **Watch it fail** — read the failure output carefully
-3. Use `/debug-e2e {artifact-dir}` to understand what happened
+3. Use `/e2e:debug {artifact-dir}` to understand what happened
 4. Implement the minimum code to fix the first failure
 5. Repeat until the test passes
 
@@ -127,7 +127,7 @@ Validates transcript quality: JSONL validity, content hash correctness, prompt e
 
 1. Run: `mise run test:e2e --agent $AGENT_SLUG TestCheckpointMetadataDeepValidation`
 2. **Watch it fail** — this test often exposes subtle transcript formatting bugs
-3. Use `/debug-e2e {artifact-dir}` on any failures
+3. Use `/e2e:debug {artifact-dir}` on any failures
 4. Fix and repeat
 
 Run: `mise run fmt && mise run lint`
@@ -146,7 +146,7 @@ Agent creates files and commits them within a single prompt turn. Tests the in-t
 **Cycle:**
 
 1. Run: `mise run test:e2e --agent $AGENT_SLUG TestSingleSessionAgentCommitInTurn`
-2. **Watch it fail** — use `/debug-e2e {artifact-dir}` on failures
+2. **Watch it fail** — use `/e2e:debug {artifact-dir}` on failures
 3. Fix and repeat — if the agent doesn't support committing, skip this test
 
 Run: `mise run fmt && mise run lint`
@@ -164,7 +164,7 @@ Run these tests to validate multi-session behavior:
 **Cycle (for each test):**
 
 1. Run: `mise run test:e2e --agent $AGENT_SLUG TestMultiSessionManualCommit`
-2. **Watch it fail** — use `/debug-e2e {artifact-dir}` on failures
+2. **Watch it fail** — use `/e2e:debug {artifact-dir}` on failures
 3. Fix and repeat
 4. Move to next test
 
@@ -183,7 +183,7 @@ Run these tests for file operation correctness:
 - `TestDeletedFilesCommitDeletion` — Agent deletes a file, user commits the deletion
 - `TestMixedNewAndModifiedFiles` — Agent both creates and modifies files
 
-**Cycle:** Same as above — run each test, **watch it fail**, use `/debug-e2e` on failures, fix, repeat.
+**Cycle:** Same as above — run each test, **watch it fail**, use `/e2e:debug` on failures, fix, repeat.
 
 Run: `mise run fmt && mise run lint`
 
@@ -215,7 +215,7 @@ Run these if the agent supports interactive multi-step sessions:
 - `TestRewindAfterCommit` — Rewind to a checkpoint after committing
 - `TestRewindMultipleFiles` — Rewind with multiple files changed
 
-**Cycle:** Same pattern — run, **watch it fail**, `/debug-e2e` on failures, fix, repeat.
+**Cycle:** Same pattern — run, **watch it fail**, `/e2e:debug` on failures, fix, repeat.
 
 Run: `mise run fmt && mise run lint`
 
@@ -256,7 +256,7 @@ mise run test:e2e --agent $AGENT_SLUG TestFailingTestName
 
 If a test passes when run individually but fails in the full suite, it's a flaky failure — not a real error. Only investigate failures that reproduce consistently when run in isolation.
 
-Fix any real failures before proceeding — the same cycle applies: read the failure, use `/debug-e2e {artifact-dir}`, implement the minimum fix, re-run.
+Fix any real failures before proceeding — the same cycle applies: read the failure, use `/e2e:debug {artifact-dir}`, implement the minimum fix, re-run.
 
 All E2E tests must pass before writing unit tests.
 
@@ -321,7 +321,7 @@ At every E2E failure, follow this protocol:
 
 1. **Read the test output** — the assertion message often tells you exactly what's wrong
 2. **Find the artifact directory** — E2E tests save artifacts (logs, transcripts, git state) to a temp dir printed in the output
-3. **Run `/debug-e2e {artifact-dir}`** — this skill analyzes artifacts and diagnoses the root cause
+3. **Run `/e2e:debug {artifact-dir}`** — this skill analyzes artifacts and diagnoses the root cause
 4. **Implement the minimum fix** — don't over-engineer; fix only what the test demands
 5. **Re-run the failing test** — not the whole suite, just the one test
 

@@ -199,7 +199,7 @@ Use `/commit` to commit all files.
 - **Interactive tests**: Use `s.StartSession`, `s.Send`, `s.WaitFor` — tmux pane is auto-captured in artifacts
 - **Run commands**: `mise run test:e2e --agent ${slug} TestName` — see `e2e/README.md` for all options
 - **E2E tests are run during the implement phase**: This phase only creates the runner. The implement phase runs E2E tests at each tier to drive development.
-- **Debugging failures**: If tests fail during the implement phase, use `/debug-e2e` with the artifact directory to diagnose CLI-level issues (hooks, checkpoints, session phases, attribution)
+- **Debugging failures**: If tests fail during the implement phase, use `/e2e:debug` with the artifact directory to diagnose CLI-level issues (hooks, checkpoints, session phases, attribution)
 
 ## Output
 

@@ -0,0 +1,32 @@
+---
+name: e2e
+description: >
+  Orchestrate E2E test triage and fix implementation: runs triage-ci then implement sequentially.
+  Accepts test names, --agent, artifact path, or CI run reference.
+  For individual phases, use /e2e:triage-ci, /e2e:debug, or /e2e:implement.
+  Use when the user says "triage e2e", "fix e2e failures", or wants the full triage-to-fix pipeline.
+---
+
+# E2E Triage & Fix — Full Pipeline
+
+Run triage-ci then implement sequentially. Parameters are collected once and reused across both phases.
+
+## Parameters
+
+The user provides one or more of:
+- **Test name(s)** -- e.g., `TestInteractiveMultiStep`
+- **`--agent <agent>`** -- optional, defaults to all agents that previously failed
+- **A local artifact path** -- skip straight to analysis of existing artifacts
+- **CI run reference** -- `latest`, a run ID, or a run URL
+
+## Phase 1: Triage CI
+
+Read and follow the full procedure from `.claude/skills/e2e/triage-ci.md`.
+
+This produces a findings report with classifications (flaky/real-bug/test-bug) for each test+agent pair.
+
+## Phase 2: Implement Fixes
+
+Read and follow the full procedure from `.claude/skills/e2e/implement.md`.
+
+Uses the findings from Phase 1 (already in conversation context) to propose, apply, and verify fixes.
@@ -1,17 +1,12 @@
----
-name: debug-e2e
-description: Use when investigating E2E test failures from artifacts to diagnose bugs in the Entire CLI, or when pointed at an artifact path for root cause analysis
----
-
 # Debug Entire CLI via E2E Artifacts
 
 Diagnose Entire CLI bugs using captured artifacts from the E2E test suite. Artifacts are written to `e2e/artifacts/` locally or downloaded from CI via GitHub Actions.
 
 ## Inputs
 
 The user provides either:
-- **A test run directory:** `e2e/artifacts/{timestamp}/` — triage all failures
-- **A specific test directory:** `e2e/artifacts/{timestamp}/{TestName}-{agent}/` — debug one test
+- **A test run directory:** `e2e/artifacts/{timestamp}/` -- triage all failures
+- **A specific test directory:** `e2e/artifacts/{timestamp}/{TestName}-{agent}/` -- debug one test
 
 ## Artifact Layout
 
@@ -32,7 +27,7 @@ e2e/artifacts/{timestamp}/
 
 ## Preserved Repo
 
-When the test run was executed with `E2E_KEEP_REPOS=1`, each test's artifact directory contains a `repo` symlink pointing to the preserved temporary git repository. This is the actual repo the test operated on — you can inspect it directly.
+When the test run was executed with `E2E_KEEP_REPOS=1`, each test's artifact directory contains a `repo` symlink pointing to the preserved temporary git repository. This is the actual repo the test operated on -- you can inspect it directly.
 
 **Navigate via the symlink** (e.g., `{artifact-dir}/repo/`) rather than resolving the `/tmp/...` path. The symlink lives inside the artifact directory so permissions and paths stay consistent.
 
@@ -42,7 +37,7 @@ The preserved repo contains:
 - The `.claude/` directory (if Claude Code was the agent)
 - All files the agent created or modified, in their final state
 
-This is the most powerful debugging tool — you can run `git log`, `git diff`, `git show`, inspect `.entire/` internals, and see exactly what the CLI left behind.
+This is the most powerful debugging tool -- you can run `git log`, `git diff`, `git show`, inspect `.entire/` internals, and see exactly what the CLI left behind.
 
 ## Debugging Workflow
 
@@ -53,9 +48,9 @@ Read `report.nocolor.txt` to identify failures and their error messages. Each en
 ### 2. Read console.log (most important)
 
 Full transcript of every operation:
-- `> claude -p "..." ...` — agent prompts with stdout/stderr
-- `> git add/commit/...` — git commands
-- `> send: ...` — interactive session inputs
+- `> claude -p "..." ...` -- agent prompts with stdout/stderr
+- `> git add/commit/...` -- git commands
+- `> send: ...` -- interactive session inputs
 
 This tells you what happened chronologically.
 
@@ -71,14 +66,14 @@ Cross-reference console.log (what happened) with the test (what should have happ
 |---------|-------------------|
 | Checkpoint not created / timeout | Check `entire-logs/entire.log` for hook invocations, phase transitions, errors |
 | Wrong checkpoint content | Check `git-tree.txt` for checkpoint branch files, `checkpoint-metadata/` for session info |
-| Hooks didn't fire | Check `entire.log` for missing hook entries (session-start, user-prompt-submit, stop, post-commit) |
-| Stash/unstash problems | Check `entire.log` for stash-related log lines, `git-log.txt` for commit ordering |
+| Hooks didn't fire | Check `entire-logs/entire.log` for missing hook entries (session-start, user-prompt-submit, stop, post-commit) |
+| Stash/unstash problems | Check `entire-logs/entire.log` for stash-related log lines, `git-log.txt` for commit ordering |
 | Attribution issues | Check `checkpoint-metadata/` for `files_touched`, session metadata for attribution data |
-| Strategy mismatch | Check `entire.log` for `strategy` field, verify auto-commit vs manual-commit behavior |
+| Strategy mismatch | Check `entire-logs/entire.log` for `strategy` field, verify auto-commit vs manual-commit behavior |
 
 ### 5. Deep dive files
 
-- **entire-logs/entire.log**: Structured JSON logs — hook lifecycle, session phases (`active` → `idle` → `ended`), warnings, errors. Key fields: `component`, `hook`, `strategy`, `session_id`.
+- **entire-logs/entire.log**: Structured JSON logs -- hook lifecycle, session phases (`active` -> `idle` -> `ended`), warnings, errors. Key fields: `component`, `hook`, `strategy`, `session_id`.
 - **git-log.txt**: Commit graph showing main branch, `entire/checkpoints/v1`, checkpoint initialization.
 - **git-tree.txt**: Files at HEAD vs checkpoint branch (separated by `--- entire/checkpoints/v1 ---`).
 - **checkpoint-metadata/**: `metadata.json` has `checkpoint_id`, `strategy`, `files_touched`, `token_usage`, and `sessions` array. Session subdirs have per-session details.
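
The "Hooks didn't fire" check from the symptom table can be sketched as a small script. This is a minimal illustration, not part of the Entire CLI: it assumes `entire-logs/entire.log` holds one JSON object per line and that the `hook` field carries the hook names exactly as listed in the table (`session-start`, `user-prompt-submit`, `stop`, `post-commit`). The helper name `missing_hooks` is hypothetical.

```python
import json
from typing import Iterable, List, Sequence

# Hook names come from the symptom table. That the log's `hook` field uses
# exactly these strings is an assumption, not confirmed behaviour.
EXPECTED_HOOKS = ("session-start", "user-prompt-submit", "stop", "post-commit")


def missing_hooks(
    log_lines: Iterable[str],
    expected: Sequence[str] = EXPECTED_HOOKS,
) -> List[str]:
    """Return the expected hook names that never appear in the log lines.

    Assumes one JSON object per line; lines that are blank or not valid
    JSON are skipped rather than treated as errors.
    """
    seen = set()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON output in the log
        if isinstance(entry, dict) and isinstance(entry.get("hook"), str):
            seen.add(entry["hook"])
    # Preserve the order of `expected` so the report reads chronologically.
    return [name for name in expected if name not in seen]


if __name__ == "__main__":
    # Hypothetical sample entries; real log lines will carry more fields.
    sample = [
        '{"component": "hooks", "hook": "session-start", "session_id": "s1"}',
        '{"component": "hooks", "hook": "stop", "session_id": "s1"}',
        "not json",
    ]
    print(missing_hooks(sample))
```

To run it against a real artifact directory, open `{artifact-dir}/entire-logs/entire.log` and pass the file object to `missing_hooks`; an empty result means every expected hook logged at least one entry.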