
Add two-tier health check framework#864

Merged
jwbron merged 25 commits into main from egg/issue-850
Feb 22, 2026
Conversation

@james-in-a-box
Contributor

Add two-tier health check framework for pipeline failure detection

The orchestrator's failure detection was a collection of one-off mechanisms
(startup reconciliation, container monitor, iteration caps) with no shared
abstraction and no ability to detect semantic failures — cases where
infrastructure looks healthy but agents didn't accomplish meaningful work.
The issue-835 post-mortem demonstrated this gap: reviewers exited cleanly
but had nothing to review because coder work never reached the remote.

This PR introduces a unified two-tier health check framework:

Tier 1 — Programmatic checks (fast, deterministic, cheap): container
liveness, startup state reconciliation, phase output presence, and state
consistency. These run on every lifecycle event and catch structural failures.

Tier 2 — Agent inspector (LLM-powered semantic analysis): evaluates
whether the pipeline is making meaningful progress by reasoning over agent
outputs, git history, and logs. Produces structured verdicts (HEALTHY /
DEGRADED / FAILED) with reasoning. Runs when Tier 1 flags DEGRADED or at
phase completion.

Both tiers implement a shared HealthCheck protocol and produce HealthResult
values. A HealthCheckRunner dispatches checks by lifecycle trigger
(STARTUP, RUNTIME_TICK, WAVE_COMPLETE, PHASE_COMPLETE, ON_DEMAND) and
handles automatic Tier 1 → Tier 2 escalation. Results are surfaced via the
existing SSE/event system and new REST endpoints.
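The shared protocol described above can be sketched roughly as follows. This is a hypothetical reduction based only on the names in this PR description (HealthCheck, HealthResult, HealthCheckRunner, the HEALTHY/DEGRADED/FAILED statuses, and the CONTINUE/ALERT/FAIL_PIPELINE actions); the real signatures and fields in `orchestrator/health_checks/` may differ:

```python
# Hypothetical sketch of the two-tier protocol; exact fields and method
# signatures beyond the names in the PR description are assumptions.
from dataclasses import dataclass
from enum import Enum
from typing import Protocol


class HealthStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    FAILED = "failed"


class HealthAction(Enum):
    CONTINUE = "continue"
    ALERT = "alert"
    FAIL_PIPELINE = "fail_pipeline"


@dataclass(frozen=True)
class HealthResult:
    check_name: str
    tier: int
    status: HealthStatus
    action: HealthAction
    reasoning: str = ""


class HealthCheck(Protocol):
    name: str
    tier: int

    def run(self, ctx: object) -> HealthResult: ...


class HealthCheckRunner:
    """Dispatches registered checks and escalates Tier 1 -> Tier 2."""

    def __init__(self) -> None:
        self._checks: list[HealthCheck] = []

    def register(self, check: HealthCheck) -> None:
        self._checks.append(check)

    def run_checks(self, ctx: object) -> list[HealthResult]:
        # Tier 1 always runs; Tier 2 runs only when Tier 1 flags DEGRADED.
        results = [c.run(ctx) for c in self._checks if c.tier == 1]
        if any(r.status is HealthStatus.DEGRADED for r in results):
            results += [c.run(ctx) for c in self._checks if c.tier == 2]
        return results
```

The escalation rule keeps the expensive LLM-backed Tier 2 check off the hot path: it only fires when a cheap deterministic check has already raised suspicion.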

Integration points: multi_agent.py runs checks at wave completion,
routes/phases.py runs checks at phase completion, and
routes/health.py exposes on-demand health check endpoints.

Closes #850

Test plan:

  • 305+ new tests across 8 test files covering types, context assembly,
    Tier 1 checks, Tier 2 agent inspector, runner dispatch/escalation,
    lifecycle integration, and edge cases
  • pytest orchestrator/tests/test_health_check_*.py passes
  • Existing orchestrator tests unaffected

Authored-by: egg

egg-orchestrator added 22 commits February 21, 2026 19:57
Container ca452dbd3bfc6ab23bd9acd73a27ba46fff63c87ea61dbde429e68bfbc13ac85 exited with uncommitted changes.
This commit preserves the agent's work-in-progress.

Authored-by: egg
Container d365774501d9ad0bef9c23971a9099327cac2bd36473d9ac2795a355e2c32ffe exited with uncommitted changes.
This commit preserves the agent's work-in-progress.

Authored-by: egg
Implements the core health check framework and all Tier 1 programmatic
checks for holistic pipeline failure detection. This unifies existing
failure detection mechanisms (ContainerMonitor, startup reconciliation)
under a shared HealthCheck protocol and adds two new checks — phase
output presence and state consistency — that would have caught issue-835
where agents exited cleanly but produced no meaningful work.

New files:
- orchestrator/health_checks/ — HealthCheck protocol, HealthResult,
  HealthCheckRunner with trigger dispatch and Tier 1→2 escalation,
  PipelineHealthContext with lazy properties
- orchestrator/health_checks/tier1/ — ContainerLivenessCheck,
  StartupStateCheck (adapters), PhaseOutputPresenceCheck,
  StateConsistencyCheck (new)
- orchestrator/tests/test_health_check_types.py — 27 tests for core
  types, runner, context, and escalation logic
- orchestrator/tests/test_health_check_tier1.py — 26 tests for all
  four Tier 1 checks

Modified files:
- events.py — HEALTH_CHECK_STARTED/COMPLETED/DEGRADED/FAILED events
- models.py — HealthCheckResultModel, health_check_results on PhaseExecution
- cli.py — initialize runner at startup, run STARTUP trigger
- container_monitor.py — RUNTIME_TICK integration on state changes
- multi_agent.py — WAVE_COMPLETE trigger after each wave
- routes/phases.py — PHASE_COMPLETE trigger before advancement
- routes/health.py — GET /api/v1/pipelines/{id}/health endpoint

Issue: #850
Document the health check framework introduced in #850 (phase 1):
- Add health_checks/README.md with full framework reference covering
  types, context, runner, all Tier 1 checks, integration points, and
  instructions for adding new checks
- Update orchestrator/README.md with health check section, new API
  endpoint, file tree entries, and event types
- Update docs/architecture/orchestrator.md with health check framework
  architecture, lifecycle integration, and lazy context design
- Update docs/index.md task-specific guide table with health check entry
- Fix _run_single to return DEGRADED/ALERT on unexpected exception
  instead of HEALTHY/CONTINUE (issue #1: correctness)
- Wrap health checks in 'if not force' in advance_phase so force
  parameter bypasses health checks consistently (issue #2: design)
- Remove unused HealthCheckResultModel and health_check_results field
  from PhaseExecution (issue #3: dead code)
- Replace deprecated datetime.utcnow() with datetime.now(UTC) in
  health check types and context (observation #4)
- Add debug logging to bare except in container_monitor per-pipeline
  health check loop (observation #6)
Add missing docstrings to PipelineHealthContext.__init__() and cheap
accessor properties.  Add docstrings to HealthCheckRunner _emit_*
methods and inline comments explaining aggregate status logic,
graceful degradation strategy, and tier escalation at each lifecycle
integration point (cli.py, container_monitor.py, multi_agent.py,
routes/phases.py, routes/health.py).  Expand events.py inline
comments for new HEALTH_CHECK event types.

Authored-by: egg
Auto-fix 10 ruff lint errors: remove unused imports (datetime,
HealthCheckRunner, worst_action, HealthTrigger, AgentExecution,
AgentExecutionStatus, AgentRole, UTC), sort imports, replace
assert False with raise AssertionError (B011). Reformat 3 files
for consistent code style. Update check results.
Implement the Tier 2 semantic health check system for pipeline
observability. This adds an AgentInspectorCheck that calls the Claude
API to analyze agent state (git log, diff stats, agent outputs,
contract) and returns structured HEALTHY/DEGRADED/FAILED verdicts.

Key changes:
- Enrich PipelineHealthContext with contract lazy property and diff
  truncation (~4000 token cap)
- Implement AgentInspectorCheck with httpx-based Claude API calls,
  JSON verdict parsing, timeout/retry, and graceful degradation
- Wire AgentInspectorCheck into runner via cli.py registration
- Add 61 comprehensive unit tests covering context assembly, prompt
  construction, verdict parsing, API mocking, escalation logic,
  event emission, and edge cases

All 346 health check tests pass. Lint clean.
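The graceful-degradation behavior described in this commit can be sketched as follows. This is a hypothetical reduction of the pattern, not the PR's actual API; the `inspect` function and payload keys are illustrative assumptions:

```python
# Illustrative sketch of Tier 2 graceful degradation: any API or parsing
# failure yields an advisory HEALTHY result instead of blocking the pipeline.
# Function name and payload keys are assumptions, not the PR's actual API.
import json


def inspect(call_api) -> dict:
    try:
        raw = call_api()                       # e.g. httpx POST to the Claude API
        verdict = json.loads(raw)              # structured JSON verdict
        status = verdict.get("status", "HEALTHY")
    except Exception as exc:                   # timeout, HTTP error, bad JSON...
        # Tier 2 never blocks the pipeline: degrade to HEALTHY with a warning.
        return {"status": "HEALTHY", "warning": f"inspector unavailable: {exc}"}
    return {"status": status, "reasoning": verdict.get("reasoning", "")}
```

A verdict that parses cleanly passes through; anything else degrades to an advisory HEALTHY with a warning attached.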
Contributor

@egg-reviewer egg-reviewer bot left a comment


Thorough Code Review — PR #864: Two-Tier Health Check Framework

Overall the framework design is solid: clean protocol types, lazy context, tiered escalation, and good graceful degradation. The test coverage at 305+ tests is substantial. However, there are several correctness and integration issues that should be addressed before merge.


Issue 1 (Bug): docker_client not passed to PipelineHealthContext in three call sites

Severity: High — Causes ContainerLivenessCheck and StartupStateCheck to silently return HEALTHY always.

_run_wave_health_checks() in multi_agent.py:597, the on-demand endpoint in routes/health.py:138, and the phase-advance gate in routes/phases.py:266 all construct PipelineHealthContext without passing docker_client. When docker_client is None, _fetch_live_container_ids() returns an empty set, so:

  • ContainerLivenessCheck always finds "no missing containers" → reports HEALTHY
  • StartupStateCheck always finds "no stale containers" → reports HEALTHY
  • StateConsistencyCheck Check 1 never detects missing containers → misses FAILED state

The CLI startup path (cli.py:187) correctly passes docker_client=dc, and the container monitor path (container_monitor.py:173) correctly passes docker_client=self.docker_client. But the three integration points above do not.

Fix for multi_agent.py:597:

ctx = PipelineHealthContext(
    pipeline=self.pipeline,
    repo_path=self.repo_path,
    trigger=HealthTrigger.WAVE_COMPLETE.value,
    wave_number=wave_number,
    docker_client=self.docker_client,  # Add this
)

Fix for routes/health.py:138:

try:
    from docker_client import get_docker_client
    dc = get_docker_client()
except Exception:
    dc = None
ctx = PipelineHealthContext(
    pipeline=pipeline,
    repo_path=Path(repo_path),
    trigger=HealthTrigger.ON_DEMAND.value,
    state_store=store,
    docker_client=dc,  # Add this
)

Fix for routes/phases.py:266: Same pattern as above.


Issue 2 (Bug): _read_contract truncates JSON before parsing, guaranteeing parse failure for large contracts

context.py:212:

raw = contract_path.read_text(errors="replace")[:_TIER2_CHAR_CAP]
return json.loads(raw)

If the contract JSON is longer than 16000 chars, this slices mid-string, producing invalid JSON (e.g. {"tasks": [{"sta). json.loads will raise JSONDecodeError every time, and the method silently returns {}. The truncation is intended for Tier 2 prompt size control but should happen after parsing, not before.

Fix:

raw = contract_path.read_text(errors="replace")
contract = json.loads(raw)
# Truncation should happen in _build_user_prompt, not here
return contract

Alternatively, truncate the serialized output only when building the Tier 2 prompt, not when reading the data. Tier 1 checks (StateConsistencyCheck._check_contract_consistency) also read this data and need the full structure.


Issue 3 (Bug): _parse_verdict crashes on code fence without newline

agent_inspector.py:134:

if cleaned.startswith("```"):
    first_newline = cleaned.index("\n")

str.index() raises ValueError if no \n is present. If Claude returns ```json{...}``` on a single line, this crashes. The outer except Exception in AgentInspectorCheck.run() catches it and gracefully degrades, so it's not catastrophic, but it means a valid (parseable) response gets discarded as an API failure.

Fix: Use cleaned.find("\n") and handle the -1 case, or use a regex:

if cleaned.startswith("```"):
    nl = cleaned.find("\n")
    if nl != -1:
        cleaned = cleaned[nl + 1:]
    else:
        cleaned = cleaned[3:]  # Strip just the backticks

Issue 4 (Design): Duplicate HEALTH_CHECK_COMPLETED events

runner.py emits HEALTH_CHECK_COMPLETED in two places:

  1. _emit_result() at line 194 emits it for each individual check that returns HEALTHY
  2. _emit_completed() at line 226 emits it as the aggregate completion event

Both use EventType.HEALTH_CHECK_COMPLETED with different payload shapes. Any subscriber filtering on this event type will receive both individual and aggregate events with no way to distinguish them.

Fix: Either use a distinct event type for aggregate completion (e.g. HEALTH_CHECK_RUN_COMPLETED), or add a discriminator field (e.g. "aggregate": true) to the aggregate event payload.
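With the discriminator-field option, subscriber-side filtering becomes trivial. Payload keys here are assumptions for illustration:

```python
# Hypothetical subscriber-side filtering once the aggregate completion event
# carries an "aggregate": true discriminator; payload shapes are assumptions.
def is_aggregate_completion(payload: dict) -> bool:
    return payload.get("aggregate") is True


per_check = {"check_name": "container_liveness", "status": "healthy"}
aggregate = {"aggregate": True, "checks_run": 4, "worst_status": "healthy"}
aggregates = [e for e in (per_check, aggregate) if is_aggregate_completion(e)]
```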


Issue 5 (Design): Tier 2 FAILED verdict cannot trigger FAIL_PIPELINE

AgentInspectorCheck.run() at line 264 maps non-HEALTHY verdicts to HealthAction.ALERT:

action=HealthAction.ALERT if status != HealthStatus.HEALTHY else HealthAction.CONTINUE,

Even when the LLM returns FAILED (clear signs of stuck agent), the action is only ALERT, never FAIL_PIPELINE. Since the phase-advance gate in routes/phases.py:272 and the wave loop in multi_agent.py:604 both check for FAIL_PIPELINE specifically, a Tier 2 FAILED verdict has no mechanical effect — it only emits an event. This seems intentional (Tier 2 is advisory), but it undermines the stated goal of catching issue-835 patterns. Consider mapping HealthStatus.FAILED from Tier 2 to HealthAction.FAIL_PIPELINE.


Issue 6 (Minor): _run_health_checks_on_change in container_monitor runs on every state change per-container

container_monitor.py:200 calls _run_health_checks_on_change() inside _check_container(), which iterates over ALL running pipelines. If 10 containers change state in one poll cycle, health checks run 10 times against all pipelines. This should probably run once at the end of _check_all_containers() instead of per-container.


Issue 7 (Minor): HealthResult.timestamp serialization appends "Z" to already-UTC datetime

types.py:84:

"timestamp": self.timestamp.isoformat() + "Z",

datetime.now(UTC) produces timezone-aware datetimes whose isoformat() already includes +00:00. Appending Z produces 2024-01-15T12:00:00+00:00Z which is not valid ISO 8601. Either use .isoformat() alone (with +00:00), or strip timezone info and append Z:

"timestamp": self.timestamp.replace(tzinfo=None).isoformat() + "Z",

Summary

Issues 1–3 are correctness bugs that should be fixed. Issue 1 in particular renders the container liveness checks non-functional in most code paths. Issue 2 breaks contract parsing for larger projects. Issue 3 is a low-probability crash with graceful fallback but is trivial to fix.

Issues 4–7 are design and minor issues worth addressing but not blockers on their own.

— Authored by egg


Contributor Author

@james-in-a-box james-in-a-box bot left a comment


Contract Verification: Issue #850 — Holistic Pipeline Failure Detection

The contract JSON for this issue has empty acceptance_criteria and phases fields. Verification is performed against the 6 scope items defined in the issue body, which is the authoritative source.

Scope Item Verification

1. Define HealthCheck, PipelineHealthContext, HealthResult interfaces — VERIFIED

  • HealthCheck protocol defined in orchestrator/health_checks/types.py with name, tier, triggers, run() method
  • HealthResult frozen dataclass with status (HEALTHY/DEGRADED/FAILED), check_name, tier, reasoning, action (CONTINUE/FAIL_PIPELINE/ALERT), details, timestamp
  • PipelineHealthContext in context.py with cheap accessors (pipeline_id, branch, current_phase) and lazy-loaded expensive properties (git_log, git_diff_stat, agent_outputs, contract, live_container_ids)
  • Clean protocol-based design — no inheritance required

2. Migrate startup reconciliation and container monitor to interface — VERIFIED

  • StartupStateCheck (tier1/startup_state.py) adapts reconcile_stale_containers logic as a HealthCheck
  • ContainerLivenessCheck (tier1/container_liveness.py) adapts container monitor logic as a HealthCheck
  • Original modules preserved: startup_reconciliation.py untouched, container_monitor.py has minimal non-intrusive hooks only (set_health_check_runner, optional _run_health_checks_on_change)

3. Implement new Tier 1 checks: phase output presence, state consistency — VERIFIED

  • PhaseOutputPresenceCheck (tier1/phase_output.py): Detects issue-835 pattern — agents completed but no commits on remote. Phase-specific logic for IMPLEMENT (checks git commits) and PLAN (checks draft artifacts). Returns DEGRADED/ALERT.
  • StateConsistencyCheck (tier1/state_consistency.py): Cross-references orchestrator state vs Docker reality vs contract. Three checks: RUNNING agents with missing Docker containers, container/agent status mismatch, COMPLETE agents with PENDING tasks.

4. Implement Tier 2 agent inspector — VERIFIED

  • AgentInspectorCheck (tier2/agent_inspector.py): Calls Claude API with assembled context (git log, diff stats, agent outputs, contract)
  • Produces structured JSON verdict (HEALTHY/DEGRADED/FAILED) with reasoning
  • Graceful degradation: API failures return HEALTHY with warning — Tier 2 never blocks pipeline
  • Advisory-only: never returns FAIL_PIPELINE, only ALERT or CONTINUE
  • Token budgeting: per-file cap (4000 chars), contract cap (3000 chars), diff stat cap (~16KB)

5. Wire trigger points into lifecycle — VERIFIED

Trigger matrix matches issue specification exactly:

| Trigger | Location | Tier 1 | Tier 2 | Action on FAIL_PIPELINE |
| --- | --- | --- | --- | --- |
| STARTUP | cli.py | Always | Never | Logged, non-fatal |
| RUNTIME_TICK | container_monitor.py | Always | Never | Logged, non-fatal |
| WAVE_COMPLETE | multi_agent.py | Always | If Tier 1 DEGRADED | Stops wave execution |
| PHASE_COMPLETE | routes/phases.py | Always | Always | Blocks advance (409) |
| ON_DEMAND | routes/health.py | Always | Always | Returns results (200) |

force=true parameter bypasses health checks in phase advance (escape hatch for stuck pipelines).

6. Surface health check results via SSE/event system — VERIFIED

  • 5 new event types in events.py: HEALTH_CHECK, HEALTH_CHECK_STARTED, HEALTH_CHECK_COMPLETED, HEALTH_CHECK_DEGRADED, HEALTH_CHECK_FAILED
  • HealthCheckRunner emits per-check and aggregate events via EventBus
  • Events flow through existing SSE infrastructure for observability
  • New REST endpoint: GET /api/v1/pipelines/<id>/health for on-demand inspection

Test Coverage

  • 419 health check tests across 10 test files, all passing
  • 1146 orchestrator tests passing (3 pre-existing docker_client failures unrelated to this PR)
  • Lint: ruff check and format pass clean
  • Test files cover: types, context, all 4 Tier 1 checks, Tier 2 inspector, runner dispatch/escalation, lifecycle integration, and edge cases

Code Review Feedback Status

  • Phase 1 review feedback fully addressed: exception handling returns DEGRADED (not HEALTHY), force bypass works, dead code removed, datetime.now(UTC) adopted
  • Phase 2 review approved with non-blocking minor suggestions (timestamp ISO format, RUNTIME_TICK debouncing, event type ambiguity)

Non-Blocking Observations

  1. Model config inconsistency: Default model is claude-sonnet-4-20250514 but shared config defines ANTHROPIC_HEALTH_CHECK_MODEL = "claude-3-haiku-20240307". The env var HEALTH_CHECK_MODEL overrides at runtime, but the disconnect between the hard-coded default and shared config could cause cost surprises.

  2. Timestamp ISO format: HealthResult.to_dict() appends 'Z' to a timezone-aware timestamp, producing malformed +00:00Z suffix. Low impact but worth fixing.

  3. RUNTIME_TICK frequency: Health checks fire per-container state change, not per-poll-cycle. With many containers, this could cause redundant checks. Consider debouncing.

Verdict: APPROVE

All 6 scope items from issue #850 are fully implemented and verified. The two-tier health check framework correctly implements the proposed design with proper trigger integration, tier escalation, graceful degradation, and comprehensive test coverage. The issue-835 failure pattern (agents exit cleanly but produce no work) would now be caught by PhaseOutputPresenceCheck at Tier 1 and AgentInspectorCheck at Tier 2.

— Authored by egg

Contributor

@egg-reviewer egg-reviewer bot left a comment


No agent-mode design concerns.

This PR introduces orchestrator-level infrastructure (health check framework), not agent workflows. The Tier 2 agent inspector is a narrow, single-purpose LLM call for pipeline diagnostics — not a general agent with tool access — so the pre-assembled context and structured JSON output are appropriate patterns here. The agent-mode design guidelines target interactive agent workflows where the agent has tools and should explore freely; a diagnostic API call with a focused verdict protocol is a different category entirely.

— Authored by egg


- Pass docker_client to PipelineHealthContext in multi_agent.py,
  routes/health.py, and routes/phases.py so container liveness and
  startup state checks work in all code paths (Issue 1)
- Remove pre-parse truncation in _read_contract so large contracts
  parse correctly; truncation stays in prompt building (Issue 2)
- Use str.find instead of str.index in _parse_verdict to avoid
  ValueError on code fences without newlines (Issue 3)
- Add "aggregate": true discriminator to aggregate
  HEALTH_CHECK_COMPLETED events to distinguish from per-check
  events (Issue 4)
- Move _run_health_checks_on_change from per-container to once per
  poll cycle in _check_all_containers (Issue 6)
- Fix timestamp serialization to strip tzinfo before appending Z,
  producing valid ISO 8601 (Issue 7)

Contributor

@egg-reviewer egg-reviewer bot left a comment


Re-Review — PR #864: Two-Tier Health Check Framework

Reviewed delta: faa962e..b552372 (1 commit addressing 6 of 7 review issues).

Previous Issues — Verification

Issue 1 (docker_client not passed): Fixed correctly in all three call sites. multi_agent.py:602 passes self.docker_client. routes/health.py:140 and routes/phases.py:270 both acquire it via get_docker_client() with try/except for graceful degradation. Tests updated to set docker_client = None on the executor, matching the real initialization fallback path.

Issue 2 (_read_contract truncation): Fixed correctly. The pre-parse [:_TIER2_CHAR_CAP] slice removed from context.py:212. Full JSON is now parsed; downstream prompt-building caps individual fields. Test updated to verify large contracts parse successfully.

Issue 3 (_parse_verdict crash): Fixed correctly. str.index("\n") replaced with str.find("\n") and the -1 case handled by stripping just the backtick prefix. No test added for the single-line code fence path, but it degrades correctly (falls through to json.loads which handles or rejects appropriately).

Issue 4 (duplicate HEALTH_CHECK_COMPLETED): Fixed correctly. Aggregate event now includes "aggregate": True discriminator. Per-check events use result.to_dict() which has no such key, so subscribers can distinguish them.

Issue 5 (Tier 2 FAILED → FAIL_PIPELINE): Author disagreed and kept Tier 2 as advisory-only. The reasoning is sound — LLM inference can produce false positives, and gating pipeline termination on operational confidence before promoting the action severity is the right call for a new framework.

Issue 6 (per-container health checks): Fixed correctly. _run_health_checks_on_change() moved from _check_container() (per-container) to end of _check_all_containers() (once per poll cycle). The method runs every cycle regardless of state changes, which is acceptable for RUNTIME_TICK semantics.

Issue 7 (timestamp serialization): Fixed correctly. .replace(tzinfo=None).isoformat() + "Z" produces valid ISO 8601 like 2024-01-15T12:00:00Z.

New Changes — No Issues Found

The fix commit is clean and minimal. No new logic beyond the targeted fixes. Test updates properly mirror the code changes (adding docker_client = None to test executors, updating the contract truncation test assertion).

Non-Blocking Observations

  1. Method naming: _run_health_checks_on_change() now runs every poll cycle, not just on state changes. The docstring was correctly updated ("at the end of each monitor poll cycle"), but the method name is slightly misleading. Consider renaming to _run_runtime_tick_checks() for clarity.

  2. Missing state_store in routes/phases.py: The PipelineHealthContext in routes/phases.py:274 does not pass state_store, while routes/health.py:148 does. Currently no Tier 1 check uses state_store directly from the context, so this is not a bug, but it's an inconsistency that could bite if a future check expects it.

— Authored by egg


Contributor Author

@james-in-a-box james-in-a-box bot left a comment


Contract Re-Verification: Issue #850 — Incremental Review

Previous review: commit faa962e — all 6 scope items verified.
This review: commit b552372 — 1 new commit since last review.

Delta Analysis

The new commit addresses 6 issues from the code review (egg-reviewer's CHANGES_REQUESTED review). Each fix verified:

| Issue | Description | Fix | Status |
| --- | --- | --- | --- |
| 1 | docker_client not passed to PipelineHealthContext in 3 call sites | Added docker_client param in multi_agent.py, routes/health.py, routes/phases.py with graceful degradation | Correct |
| 2 | _read_contract truncated before JSON parsing, corrupting large contracts | Removed pre-parse [:_TIER2_CHAR_CAP] slice; truncation now happens in prompt building | Correct |
| 3 | _parse_verdict used str.index(), which raises ValueError on missing newline | Replaced with str.find() + explicit -1 check | Correct |
| 4 | Aggregate and per-check HEALTH_CHECK_COMPLETED events indistinguishable | Added "aggregate": True to aggregate event payload | Correct |
| 6 | RUNTIME_TICK health checks fired per-container instead of per-poll-cycle | Moved _run_health_checks_on_change() from _check_container() to _check_all_containers() | Correct |
| 7 | Timezone-aware timestamp produced malformed ISO 8601 (+00:00Z) | timestamp.replace(tzinfo=None).isoformat() + "Z" strips tzinfo before appending Z | Correct |

Tests updated in test_health_check_integration.py, test_health_check_lifecycle_integration.py, and test_health_check_tier2.py to match new behavior (docker_client param, contract parsing assertion).

Regression Check

All 325 health check tests pass (168 core + 157 tester-authored). No scope items regressed.

Scope Item Re-Verification

All 6 scope items from issue #850 remain fully satisfied:

  1. HealthCheck/PipelineHealthContext/HealthResult interfaces — Unchanged, verified.
  2. Startup reconciliation and container monitor migration — Unchanged, verified.
  3. New Tier 1 checks (phase output, state consistency) — Unchanged, verified.
  4. Tier 2 agent inspector — _read_contract and _parse_verdict fixes improve correctness without changing scope coverage. Verified.
  5. Trigger point wiring — docker_client now correctly passed at all 3 call sites (multi_agent, routes/health, routes/phases). Health checks run once per poll cycle. Verified.
  6. SSE/event system surfacing — Aggregate events now distinguishable via "aggregate": True. Verified.

Verdict

Approved. The delta correctly addresses all flagged review issues without introducing regressions or contract violations.

— Authored by egg


Rename _run_health_checks_on_change() to _run_runtime_tick_checks() since
the method now runs every poll cycle, not just on state changes. Add missing
state_store parameter to PipelineHealthContext in routes/phases.py for
consistency with routes/health.py.
@james-in-a-box
Contributor Author

Addressed both non-blocking suggestions in commit 93376fa:

  1. Method naming: Renamed _run_health_checks_on_change() to _run_runtime_tick_checks() in container_monitor.py and all test files. The docstring and set_health_check_runner docstring were already updated in the previous commit; the method name now matches the semantics.

  2. Missing state_store in routes/phases.py: Added state_store=store to the PipelineHealthContext construction, matching the pattern used in routes/health.py.

All 430 health check tests pass. Ruff check and format clean.

— Authored by egg


Contributor

@egg-reviewer egg-reviewer bot left a comment


No agent-mode design concerns.

The delta since my last review (b552372) is a single commit (93376fa) that renames _run_health_checks_on_change() to _run_runtime_tick_checks() and adds a missing state_store parameter in routes/phases.py. These are minor naming and consistency fixes with no impact on agent-mode design. My previous assessment stands: this is orchestrator-level infrastructure with a narrow diagnostic LLM call, not an interactive agent workflow.

— Authored by egg


Contributor

@egg-reviewer egg-reviewer bot left a comment


Re-Review — PR #864: Two-Tier Health Check Framework

Reviewed delta: b552372..93376fa (1 commit addressing 2 non-blocking suggestions).

Previous Non-Blocking Suggestions — Verification

Suggestion 1 (method naming): Fixed correctly. _run_health_checks_on_change() renamed to _run_runtime_tick_checks() in container_monitor.py:148. The call site in _check_all_containers() at line 235 updated. Docstring on set_health_check_runner() at line 137 also updated to reference the new name. All 8 test references across test_health_check_integration.py and test_health_check_lifecycle_integration.py updated consistently. No stale references to the old name remain in code or tests.

Suggestion 2 (missing state_store in routes/phases.py): Fixed correctly. state_store=store added to PipelineHealthContext at routes/phases.py:278, matching the pattern in routes/health.py:149.

New Changes — No Issues Found

The commit is clean and narrowly scoped to the two requested changes. No new logic, no unrelated modifications.

Minor Observation (Non-Blocking)

multi_agent.py:597 (_run_wave_health_checks) is now the only PipelineHealthContext construction site that omits state_store. No check currently reads state_store from the context, so this is not a bug. Noting for completeness — if a future check relies on state_store, this call site will need updating.

— Authored by egg

Contributor Author

@james-in-a-box james-in-a-box bot left a comment


Contract Re-Verification (Incremental)

Previous review: commit b552372f — approved
Current HEAD: commit 93376fa3
Delta: 1 commit ("Address non-blocking review suggestions")

Changes Since Last Review

The delta consists of a single commit with 4 files changed (14 additions, 13 deletions):

  1. orchestrator/container_monitor.py — Renamed _run_health_checks_on_change() to _run_runtime_tick_checks(). Pure rename across method definition, docstring reference, and call site. The method now runs once per poll cycle (fixed in b552372f), and the new name accurately reflects that behavior.

  2. orchestrator/routes/phases.py — Added missing state_store=store parameter to PipelineHealthContext constructor in advance_phase(). This aligns with routes/health.py (line 149) and container_monitor.py which both already pass state_store. The parameter is optional in the constructor (defaults to None), so this is a consistency improvement, not a bug fix.

  3. orchestrator/tests/test_health_check_integration.py — Updated 5 test references from _run_health_checks_on_change to _run_runtime_tick_checks. No logic changes.

  4. orchestrator/tests/test_health_check_lifecycle_integration.py — Updated 4 test references from _run_health_checks_on_change to _run_runtime_tick_checks. No logic changes.

Contract Compliance

The contract for issue #850 has no formal acceptance_criteria entries (empty array). The prior contract review verified against the issue's 6 scope items:

| # | Scope Item | Status |
| --- | --- | --- |
| 1 | HealthCheck, PipelineHealthContext, HealthResult interfaces | Still verified — no changes to types.py or context.py interfaces |
| 2 | Migrate startup reconciliation + container monitor | Still verified — adapter checks unchanged |
| 3 | New Tier 1 checks (phase output, state consistency) | Still verified — check implementations unchanged |
| 4 | Tier 2 agent inspector | Still verified — AgentInspectorCheck unchanged |
| 5 | Lifecycle trigger wiring | Still verified — trigger points unchanged; state_store addition in phases.py is a parameter consistency fix, not a behavioral change |
| 6 | Event system integration + observability | Still verified — event emission unchanged |

Regression Check

  • No previously verified scope items are broken by this delta
  • The method rename is fully consistent (all references updated, including tests)
  • The state_store parameter addition is additive and matches the existing pattern

Verdict

No contract violations. All 6 scope items from issue #850 remain fully implemented and verified. The delta addresses non-blocking code quality suggestions with no functional or behavioral changes.

— Authored by egg


@james-in-a-box (Contributor Author):

Addressed the non-blocking observation from the re-review in commit eb16943:

Missing state_store in multi_agent.py: Added state_store=get_state_store(self.repo_path) to the PipelineHealthContext construction in _run_wave_health_checks(). All five production call sites (cli.py, container_monitor.py, routes/health.py, routes/phases.py, and now multi_agent.py) are now consistent in passing state_store.

All 420 health check tests and 1141 orchestrator tests pass (excluding 1 pre-existing docker_client test failure).

— Authored by egg


@egg-reviewer (bot) left a comment:

No agent-mode design concerns.

The delta since my last review (93376fa) is a single line adding state_store=get_state_store(self.repo_path) to PipelineHealthContext in multi_agent.py:603, closing the consistency gap noted in the previous code review. No agent-mode design impact. My previous assessment stands: this is orchestrator infrastructure with a narrow diagnostic LLM call, not an interactive agent workflow.

— Authored by egg

@egg-reviewer (bot) left a comment:

Re-Review — PR #864: Two-Tier Health Check Framework

Reviewed delta: 93376fa..eb16943 (1 commit: "Add state_store to PipelineHealthContext in multi_agent.py").

Previous Non-Blocking Observation — Verification

Missing state_store in multi_agent.py: Fixed correctly. state_store=get_state_store(self.repo_path) added to PipelineHealthContext at multi_agent.py:603. The get_state_store import was already present at line 41. StateStore.__init__ is lightweight (no I/O, just path storage), so constructing a new instance per wave health check is acceptable.
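The "lightweight __init__" claim can be illustrated with a minimal stand-in. The class body here is an assumption consistent with the review's description (no I/O, just path storage); get_state_store mirrors the factory name from the diff.

```python
from pathlib import Path


class StateStore:
    """Stand-in for the orchestrator's StateStore; assumed shape only."""

    def __init__(self, repo_path: Path) -> None:
        # Just records the path -- no file reads, no locks. State would be
        # loaded lazily by later methods, so constructing a fresh instance
        # on every wave health check costs almost nothing.
        self.repo_path = Path(repo_path)


def get_state_store(repo_path: Path) -> StateStore:
    # Hypothetical factory mirroring the get_state_store call in the diff.
    return StateStore(repo_path)


store = get_state_store(Path("/repo"))
```

Under this assumption, per-call construction is an acceptable trade for keeping the call site simple.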

All five production PipelineHealthContext construction sites are now consistent:

| Call site | docker_client | state_store |
|---|---|---|
| cli.py:183 | dc | startup_store |
| container_monitor.py:169 | self.docker_client | store |
| multi_agent.py:603 | self.docker_client | get_state_store(self.repo_path) |
| routes/health.py:145 | dc | store |
| routes/phases.py:273 | dc | store |

New Changes — No Issues Found

The commit is a single-line, narrowly-scoped fix. No new logic, no unrelated modifications. The remaining files in the commit are .egg-state data and .dockerignore — not code under review.

No remaining blocking or non-blocking issues.

— Authored by egg


@james-in-a-box (bot) left a comment:

Contract Verification: Re-review (Incremental)

Previous review: commit 93376fa3 — approved
Current review: commit eb169432
Delta: 1 file changed, 1 line added

Delta Analysis

Single commit since last review:

  • eb169432 — "Add state_store to PipelineHealthContext in multi_agent.py"

The change adds state_store=get_state_store(self.repo_path) to the PipelineHealthContext construction in _run_wave_health_checks(). This was the only call site across the codebase that omitted the state_store parameter. All other construction sites (routes/phases.py, routes/health.py, cli.py, container_monitor.py) already pass it. This is a correctness fix.

Scope Verification (Issue #850)

All 6 scope items remain fully satisfied:

| # | Scope Item | Status |
|---|---|---|
| 1 | HealthCheck protocol, PipelineHealthContext, HealthResult interfaces | Verified |
| 2 | Migrate startup reconciliation + container monitor to interface | Verified |
| 3 | New Tier 1 checks: phase output presence, state consistency | Verified |
| 4 | Tier 2 agent inspector: context assembly + LLM + structured verdict | Verified |
| 5 | Trigger points: STARTUP, RUNTIME_TICK, WAVE_COMPLETE, PHASE_COMPLETE | Verified |
| 6 | Health check events via SSE + on-demand REST endpoint | Verified |
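The tier and trigger vocabulary in items 4–5 can be sketched as a small escalation rule. Only the trigger names and the HEALTHY / DEGRADED / FAILED verdicts come from the PR description; the function and field names below are illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    FAILED = "failed"


class Trigger(Enum):
    STARTUP = "startup"
    RUNTIME_TICK = "runtime_tick"
    WAVE_COMPLETE = "wave_complete"
    PHASE_COMPLETE = "phase_complete"
    ON_DEMAND = "on_demand"


@dataclass
class HealthResult:
    verdict: Verdict
    reasoning: str


def should_escalate_to_tier2(tier1: HealthResult, trigger: Trigger) -> bool:
    # Escalation rule paraphrased from the PR description: run the Tier 2
    # agent inspector when Tier 1 flags DEGRADED, or at phase completion.
    return tier1.verdict is Verdict.DEGRADED or trigger is Trigger.PHASE_COMPLETE


result = HealthResult(Verdict.DEGRADED, "coder work never reached the remote")
escalate = should_escalate_to_tier2(result, Trigger.RUNTIME_TICK)
```

The real HealthCheckRunner presumably carries more state (check registry, event emission); this sketch only captures the Tier 1 → Tier 2 decision point.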

Regression Check

  • The new commit does not break any previously verified criteria
  • 419 health check tests pass (no change from previous review)
  • The state_store addition is consistent with all other PipelineHealthContext call sites

Verdict: Approved (no blocking issues)

The delta is a single-line correctness fix. All scope items hold. No regressions.

— Authored by egg

@james-in-a-box (Contributor Author):

Automated feedback loop has reached the maximum of 3 rounds. Human review is needed to make further progress on this PR.

@james-in-a-box (Contributor Author):

egg contract-verification completed.

@jwbron jwbron merged commit 59028e5 into main Feb 22, 2026
27 checks passed
Closes: Holistic pipeline failure detection: programmatic checks + agent inspector (#850)