Conversation
Container ca452dbd3bfc6ab23bd9acd73a27ba46fff63c87ea61dbde429e68bfbc13ac85 exited with uncommitted changes. This commit preserves the agent's work-in-progress. Authored-by: egg
Container d365774501d9ad0bef9c23971a9099327cac2bd36473d9ac2795a355e2c32ffe exited with uncommitted changes. This commit preserves the agent's work-in-progress. Authored-by: egg
Implements the core health check framework and all Tier 1 programmatic
checks for holistic pipeline failure detection. This unifies existing
failure detection mechanisms (ContainerMonitor, startup reconciliation)
under a shared HealthCheck protocol and adds two new checks — phase
output presence and state consistency — that would have caught issue-835
where agents exited cleanly but produced no meaningful work.
New files:
- orchestrator/health_checks/ — HealthCheck protocol, HealthResult,
HealthCheckRunner with trigger dispatch and Tier 1→2 escalation,
PipelineHealthContext with lazy properties
- orchestrator/health_checks/tier1/ — ContainerLivenessCheck,
StartupStateCheck (adapters), PhaseOutputPresenceCheck,
StateConsistencyCheck (new)
- orchestrator/tests/test_health_check_types.py — 27 tests for core
types, runner, context, and escalation logic
- orchestrator/tests/test_health_check_tier1.py — 26 tests for all
four Tier 1 checks
Modified files:
- events.py — HEALTH_CHECK_STARTED/COMPLETED/DEGRADED/FAILED events
- models.py — HealthCheckResultModel, health_check_results on PhaseExecution
- cli.py — initialize runner at startup, run STARTUP trigger
- container_monitor.py — RUNTIME_TICK integration on state changes
- multi_agent.py — WAVE_COMPLETE trigger after each wave
- routes/phases.py — PHASE_COMPLETE trigger before advancement
- routes/health.py — GET /api/v1/pipelines/{id}/health endpoint
Issue: #850
Document the health check framework introduced in #850 (phase 1): - Add health_checks/README.md with full framework reference covering types, context, runner, all Tier 1 checks, integration points, and instructions for adding new checks - Update orchestrator/README.md with health check section, new API endpoint, file tree entries, and event types - Update docs/architecture/orchestrator.md with health check framework architecture, lifecycle integration, and lazy context design - Update docs/index.md task-specific guide table with health check entry
- Fix _run_single to return DEGRADED/ALERT on unexpected exception instead of HEALTHY/CONTINUE (issue #1: correctness) - Wrap health checks in 'if not force' in advance_phase so force parameter bypasses health checks consistently (issue #2: design) - Remove unused HealthCheckResultModel and health_check_results field from PhaseExecution (issue #3: dead code) - Replace deprecated datetime.utcnow() with datetime.now(UTC) in health check types and context (observation #4) - Add debug logging to bare except in container_monitor per-pipeline health check loop (observation #6)
Add missing docstrings to PipelineHealthContext.__init__() and cheap accessor properties. Add docstrings to HealthCheckRunner _emit_* methods and inline comments explaining aggregate status logic, graceful degradation strategy, and tier escalation at each lifecycle integration point (cli.py, container_monitor.py, multi_agent.py, routes/phases.py, routes/health.py). Expand events.py inline comments for new HEALTH_CHECK event types. Authored-by: egg
Auto-fix 10 ruff lint errors: remove unused imports (datetime, HealthCheckRunner, worst_action, HealthTrigger, AgentExecution, AgentExecutionStatus, AgentRole, UTC), sort imports, replace assert False with raise AssertionError (B011). Reformat 3 files for consistent code style. Update check results.
Implement the Tier 2 semantic health check system for pipeline observability. This adds an AgentInspectorCheck that calls the Claude API to analyze agent state (git log, diff stats, agent outputs, contract) and returns structured HEALTHY/DEGRADED/FAILED verdicts. Key changes: - Enrich PipelineHealthContext with contract lazy property and diff truncation (~4000 token cap) - Implement AgentInspectorCheck with httpx-based Claude API calls, JSON verdict parsing, timeout/retry, and graceful degradation - Wire AgentInspectorCheck into runner via cli.py registration - Add 61 comprehensive unit tests covering context assembly, prompt construction, verdict parsing, API mocking, escalation logic, event emission, and edge cases All 346 health check tests pass. Lint clean.
There was a problem hiding this comment.
Thorough Code Review — PR #864: Two-Tier Health Check Framework
Overall the framework design is solid: clean protocol types, lazy context, tiered escalation, and good graceful degradation. The test coverage at 305+ tests is substantial. However, there are several correctness and integration issues that should be addressed before merge.
Issue 1 (Bug): docker_client not passed to PipelineHealthContext in three call sites
Severity: High — Causes ContainerLivenessCheck and StartupStateCheck to silently return HEALTHY always.
_run_wave_health_checks() in multi_agent.py:597, the on-demand endpoint in routes/health.py:138, and the phase-advance gate in routes/phases.py:266 all construct PipelineHealthContext without passing docker_client. When docker_client is None, _fetch_live_container_ids() returns an empty set, so:
ContainerLivenessCheckalways finds "no missing containers" → reports HEALTHYStartupStateCheckalways finds "no stale containers" → reports HEALTHYStateConsistencyCheckCheck 1 never detects missing containers → misses FAILED state
The CLI startup path (cli.py:187) correctly passes docker_client=dc, and the container monitor path (container_monitor.py:173) correctly passes docker_client=self.docker_client. But the three integration points above do not.
Fix for multi_agent.py:597:
ctx = PipelineHealthContext(
pipeline=self.pipeline,
repo_path=self.repo_path,
trigger=HealthTrigger.WAVE_COMPLETE.value,
wave_number=wave_number,
docker_client=self.docker_client, # Add this
)Fix for routes/health.py:138:
try:
from docker_client import get_docker_client
dc = get_docker_client()
except Exception:
dc = None
ctx = PipelineHealthContext(
pipeline=pipeline,
repo_path=Path(repo_path),
trigger=HealthTrigger.ON_DEMAND.value,
state_store=store,
docker_client=dc, # Add this
)Fix for routes/phases.py:266: Same pattern as above.
Issue 2 (Bug): _read_contract truncates JSON before parsing, guaranteeing parse failure for large contracts
context.py:212:
raw = contract_path.read_text(errors="replace")[:_TIER2_CHAR_CAP]
return json.loads(raw)If the contract JSON is longer than 16000 chars, this slices mid-string, producing invalid JSON (e.g. {"tasks": [{"sta). json.loads will raise JSONDecodeError every time, and the method silently returns {}. The truncation is intended for Tier 2 prompt size control but should happen after parsing, not before.
Fix:
raw = contract_path.read_text(errors="replace")
contract = json.loads(raw)
# Truncation should happen in _build_user_prompt, not here
return contractAlternatively, truncate the serialized output only when building the Tier 2 prompt, not when reading the data. Tier 1 checks (StateConsistencyCheck._check_contract_consistency) also read this data and need the full structure.
Issue 3 (Bug): _parse_verdict crashes on code fence without newline
agent_inspector.py:134:
if cleaned.startswith("```"):
first_newline = cleaned.index("\n")str.index() raises ValueError if no \n is present. If Claude returns ```json{...}``` on a single line, this crashes. The outer except Exception in AgentInspectorCheck.run() catches it and gracefully degrades, so it's not catastrophic, but it means a valid (parseable) response gets discarded as an API failure.
Fix: Use cleaned.find("\n") and handle the -1 case, or use a regex:
if cleaned.startswith("```"):
nl = cleaned.find("\n")
if nl != -1:
cleaned = cleaned[nl + 1:]
else:
cleaned = cleaned[3:] # Strip just the backticksIssue 4 (Design): Duplicate HEALTH_CHECK_COMPLETED events
runner.py emits HEALTH_CHECK_COMPLETED in two places:
_emit_result()at line 194 emits it for each individual check that returns HEALTHY_emit_completed()at line 226 emits it as the aggregate completion event
Both use EventType.HEALTH_CHECK_COMPLETED with different payload shapes. Any subscriber filtering on this event type will receive both individual and aggregate events with no way to distinguish them.
Fix: Either use a distinct event type for aggregate completion (e.g. HEALTH_CHECK_RUN_COMPLETED), or add a discriminator field (e.g. "aggregate": true) to the aggregate event payload.
Issue 5 (Design): Tier 2 FAILED verdict cannot trigger FAIL_PIPELINE
AgentInspectorCheck.run() at line 264 maps non-HEALTHY verdicts to HealthAction.ALERT:
action=HealthAction.ALERT if status != HealthStatus.HEALTHY else HealthAction.CONTINUE,Even when the LLM returns FAILED (clear signs of stuck agent), the action is only ALERT, never FAIL_PIPELINE. Since the phase-advance gate in routes/phases.py:272 and the wave loop in multi_agent.py:604 both check for FAIL_PIPELINE specifically, a Tier 2 FAILED verdict has no mechanical effect — it only emits an event. This seems intentional (Tier 2 is advisory), but it undermines the stated goal of catching issue-835 patterns. Consider mapping HealthStatus.FAILED from Tier 2 to HealthAction.FAIL_PIPELINE.
Issue 6 (Minor): _run_health_checks_on_change in container_monitor runs on every state change per-container
container_monitor.py:200 calls _run_health_checks_on_change() inside _check_container(), which iterates over ALL running pipelines. If 10 containers change state in one poll cycle, health checks run 10 times against all pipelines. This should probably run once at the end of _check_all_containers() instead of per-container.
Issue 7 (Minor): HealthResult.timestamp serialization appends "Z" to already-UTC datetime
types.py:84:
"timestamp": self.timestamp.isoformat() + "Z",datetime.now(UTC) produces timezone-aware datetimes whose isoformat() already includes +00:00. Appending Z produces 2024-01-15T12:00:00+00:00Z which is not valid ISO 8601. Either use .isoformat() alone (with +00:00), or strip timezone info and append Z:
"timestamp": self.timestamp.replace(tzinfo=None).isoformat() + "Z",Summary
Issues 1–3 are correctness bugs that should be fixed. Issue 1 in particular renders the container liveness checks non-functional in most code paths. Issue 2 breaks contract parsing for larger projects. Issue 3 is a low-probability crash with graceful fallback but is trivial to fix.
Issues 4–7 are design and minor issues worth addressing but not blockers on their own.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Contract Verification: Issue #850 — Holistic Pipeline Failure Detection
The contract JSON for this issue has empty acceptance_criteria and phases fields. Verification is performed against the 6 scope items defined in the issue body, which is the authoritative source.
Scope Item Verification
1. Define HealthCheck, PipelineHealthContext, HealthResult interfaces — VERIFIED
HealthCheckprotocol defined inorchestrator/health_checks/types.pywithname,tier,triggers,run()methodHealthResultfrozen dataclass withstatus(HEALTHY/DEGRADED/FAILED),check_name,tier,reasoning,action(CONTINUE/FAIL_PIPELINE/ALERT),details,timestampPipelineHealthContextincontext.pywith cheap accessors (pipeline_id,branch,current_phase) and lazy-loaded expensive properties (git_log,git_diff_stat,agent_outputs,contract,live_container_ids)- Clean protocol-based design — no inheritance required
2. Migrate startup reconciliation and container monitor to interface — VERIFIED
StartupStateCheck(tier1/startup_state.py) adaptsreconcile_stale_containerslogic as a HealthCheckContainerLivenessCheck(tier1/container_liveness.py) adapts container monitor logic as a HealthCheck- Original modules preserved:
startup_reconciliation.pyuntouched,container_monitor.pyhas minimal non-intrusive hooks only (set_health_check_runner, optional_run_health_checks_on_change)
3. Implement new Tier 1 checks: phase output presence, state consistency — VERIFIED
PhaseOutputPresenceCheck(tier1/phase_output.py): Detects issue-835 pattern — agents completed but no commits on remote. Phase-specific logic for IMPLEMENT (checks git commits) and PLAN (checks draft artifacts). Returns DEGRADED/ALERT.StateConsistencyCheck(tier1/state_consistency.py): Cross-references orchestrator state vs Docker reality vs contract. Three checks: RUNNING agents with missing Docker containers, container/agent status mismatch, COMPLETE agents with PENDING tasks.
4. Implement Tier 2 agent inspector — VERIFIED
AgentInspectorCheck(tier2/agent_inspector.py): Calls Claude API with assembled context (git log, diff stats, agent outputs, contract)- Produces structured JSON verdict (
HEALTHY/DEGRADED/FAILED) with reasoning - Graceful degradation: API failures return HEALTHY with warning — Tier 2 never blocks pipeline
- Advisory-only: never returns FAIL_PIPELINE, only ALERT or CONTINUE
- Token budgeting: per-file cap (4000 chars), contract cap (3000 chars), diff stat cap (~16KB)
5. Wire trigger points into lifecycle — VERIFIED
Trigger matrix matches issue specification exactly:
| Trigger | Location | Tier 1 | Tier 2 | Action on FAIL_PIPELINE |
|---|---|---|---|---|
| STARTUP | cli.py |
Always | Never | Logged, non-fatal |
| RUNTIME_TICK | container_monitor.py |
Always | Never | Logged, non-fatal |
| WAVE_COMPLETE | multi_agent.py |
Always | If Tier 1 DEGRADED | Stops wave execution |
| PHASE_COMPLETE | routes/phases.py |
Always | Always | Blocks advance (409) |
| ON_DEMAND | routes/health.py |
Always | Always | Returns results (200) |
force=true parameter bypasses health checks in phase advance (escape hatch for stuck pipelines).
6. Surface health check results via SSE/event system — VERIFIED
- 5 new event types in
events.py:HEALTH_CHECK,HEALTH_CHECK_STARTED,HEALTH_CHECK_COMPLETED,HEALTH_CHECK_DEGRADED,HEALTH_CHECK_FAILED HealthCheckRunneremits per-check and aggregate events viaEventBus- Events flow through existing SSE infrastructure for observability
- New REST endpoint:
GET /api/v1/pipelines/<id>/healthfor on-demand inspection
Test Coverage
- 419 health check tests across 10 test files, all passing
- 1146 orchestrator tests passing (3 pre-existing docker_client failures unrelated to this PR)
- Lint: ruff check and format pass clean
- Test files cover: types, context, all 4 Tier 1 checks, Tier 2 inspector, runner dispatch/escalation, lifecycle integration, and edge cases
Code Review Feedback Status
- Phase 1 review feedback fully addressed: exception handling returns DEGRADED (not HEALTHY), force bypass works, dead code removed, datetime.now(UTC) adopted
- Phase 2 review approved with non-blocking minor suggestions (timestamp ISO format, RUNTIME_TICK debouncing, event type ambiguity)
Non-Blocking Observations
-
Model config inconsistency: Default model is
claude-sonnet-4-20250514but shared config definesANTHROPIC_HEALTH_CHECK_MODEL = "claude-3-haiku-20240307". The env varHEALTH_CHECK_MODELoverrides at runtime, but the disconnect between the hard-coded default and shared config could cause cost surprises. -
Timestamp ISO format:
HealthResult.to_dict()appends'Z'to a timezone-aware timestamp, producing malformed+00:00Zsuffix. Low impact but worth fixing. -
RUNTIME_TICK frequency: Health checks fire per-container state change, not per-poll-cycle. With many containers, this could cause redundant checks. Consider debouncing.
Verdict: APPROVE
All 6 scope items from issue #850 are fully implemented and verified. The two-tier health check framework correctly implements the proposed design with proper trigger integration, tier escalation, graceful degradation, and comprehensive test coverage. The issue-835 failure pattern (agents exit cleanly but produce no work) would now be caught by PhaseOutputPresenceCheck at Tier 1 and AgentInspectorCheck at Tier 2.
— Authored by egg
There was a problem hiding this comment.
No agent-mode design concerns.
This PR introduces orchestrator-level infrastructure (health check framework), not agent workflows. The Tier 2 agent inspector is a narrow, single-purpose LLM call for pipeline diagnostics — not a general agent with tool access — so the pre-assembled context and structured JSON output are appropriate patterns here. The agent-mode design guidelines target interactive agent workflows where the agent has tools and should explore freely; a diagnostic API call with a focused verdict protocol is a different category entirely.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
- Pass docker_client to PipelineHealthContext in multi_agent.py, routes/health.py, and routes/phases.py so container liveness and startup state checks work in all code paths (Issue 1) - Remove pre-parse truncation in _read_contract so large contracts parse correctly; truncation stays in prompt building (Issue 2) - Use str.find instead of str.index in _parse_verdict to avoid ValueError on code fences without newlines (Issue 3) - Add "aggregate": true discriminator to aggregate HEALTH_CHECK_COMPLETED events to distinguish from per-check events (Issue 4) - Move _run_health_checks_on_change from per-container to once per poll cycle in _check_all_containers (Issue 6) - Fix timestamp serialization to strip tzinfo before appending Z, producing valid ISO 8601 (Issue 7)
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Re-Review — PR #864: Two-Tier Health Check Framework
Reviewed delta: faa962e..b552372 (1 commit addressing 6 of 7 review issues).
Previous Issues — Verification
Issue 1 (docker_client not passed): Fixed correctly in all three call sites. multi_agent.py:602 passes self.docker_client. routes/health.py:140 and routes/phases.py:270 both acquire it via get_docker_client() with try/except for graceful degradation. Tests updated to set docker_client = None on the executor, matching the real initialization fallback path.
Issue 2 (_read_contract truncation): Fixed correctly. The pre-parse [:_TIER2_CHAR_CAP] slice removed from context.py:212. Full JSON is now parsed; downstream prompt-building caps individual fields. Test updated to verify large contracts parse successfully.
Issue 3 (_parse_verdict crash): Fixed correctly. str.index("\n") replaced with str.find("\n") and the -1 case handled by stripping just the backtick prefix. No test added for the single-line code fence path, but it degrades correctly (falls through to json.loads which handles or rejects appropriately).
Issue 4 (duplicate HEALTH_CHECK_COMPLETED): Fixed correctly. Aggregate event now includes "aggregate": True discriminator. Per-check events use result.to_dict() which has no such key, so subscribers can distinguish them.
Issue 5 (Tier 2 FAILED → FAIL_PIPELINE): Author disagreed and kept Tier 2 as advisory-only. The reasoning is sound — LLM inference can produce false positives, and gating pipeline termination on operational confidence before promoting the action severity is the right call for a new framework.
Issue 6 (per-container health checks): Fixed correctly. _run_health_checks_on_change() moved from _check_container() (per-container) to end of _check_all_containers() (once per poll cycle). The method runs every cycle regardless of state changes, which is acceptable for RUNTIME_TICK semantics.
Issue 7 (timestamp serialization): Fixed correctly. .replace(tzinfo=None).isoformat() + "Z" produces valid ISO 8601 like 2024-01-15T12:00:00Z.
New Changes — No Issues Found
The fix commit is clean and minimal. No new logic beyond the targeted fixes. Test updates properly mirror the code changes (adding docker_client = None to test executors, updating the contract truncation test assertion).
Non-Blocking Observations
-
Method naming:
_run_health_checks_on_change()now runs every poll cycle, not just on state changes. The docstring was correctly updated ("at the end of each monitor poll cycle"), but the method name is slightly misleading. Consider renaming to_run_runtime_tick_checks()for clarity. -
Missing
state_storeinroutes/phases.py: ThePipelineHealthContextinroutes/phases.py:274does not passstate_store, whileroutes/health.py:148does. Currently no Tier 1 check usesstate_storedirectly from the context, so this is not a bug, but it's an inconsistency that could bite if a future check expects it.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Contract Re-Verification: Issue #850 — Incremental Review
Previous review: commit faa962e — all 6 scope items verified.
This review: commit b552372 — 1 new commit since last review.
Delta Analysis
The new commit addresses 6 issues from the code review (egg-reviewer's CHANGES_REQUESTED review). Each fix verified:
| Issue | Description | Fix | Status |
|---|---|---|---|
| 1 | docker_client not passed to PipelineHealthContext in 3 call sites |
Added docker_client param in multi_agent.py, routes/health.py, routes/phases.py with graceful degradation |
Correct |
| 2 | _read_contract truncated before JSON parsing, corrupting large contracts |
Removed pre-parse [:_TIER2_CHAR_CAP] slice; truncation now happens in prompt building |
Correct |
| 3 | _parse_verdict used str.index() which raises ValueError on missing newline |
Replaced with str.find() + explicit -1 check |
Correct |
| 4 | Aggregate and per-check HEALTH_CHECK_COMPLETED events indistinguishable |
Added "aggregate": True to aggregate event payload |
Correct |
| 6 | RUNTIME_TICK health checks fired per-container instead of per-poll-cycle |
Moved _run_health_checks_on_change() from _check_container() to _check_all_containers() |
Correct |
| 7 | Timezone-aware timestamp produced malformed ISO 8601 (+00:00Z) |
timestamp.replace(tzinfo=None).isoformat() + "Z" strips tzinfo before appending Z |
Correct |
Tests updated in test_health_check_integration.py, test_health_check_lifecycle_integration.py, and test_health_check_tier2.py to match new behavior (docker_client param, contract parsing assertion).
Regression Check
All 325 health check tests pass (168 core + 157 tester-authored). No scope items regressed.
Scope Item Re-Verification
All 6 scope items from issue #850 remain fully satisfied:
- HealthCheck/PipelineHealthContext/HealthResult interfaces — Unchanged, verified.
- Startup reconciliation and container monitor migration — Unchanged, verified.
- New Tier 1 checks (phase output, state consistency) — Unchanged, verified.
- Tier 2 agent inspector —
_read_contractand_parse_verdictfixes improve correctness without changing scope coverage. Verified. - Trigger point wiring —
docker_clientnow correctly passed at all 3 call sites (multi_agent, routes/health, routes/phases). Health checks run once per poll cycle. Verified. - SSE/event system surfacing — Aggregate events now distinguishable via
"aggregate": True. Verified.
Verdict
Approved. The delta correctly addresses all flagged review issues without introducing regressions or contract violations.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Rename _run_health_checks_on_change() to _run_runtime_tick_checks() since the method now runs every poll cycle, not just on state changes. Add missing state_store parameter to PipelineHealthContext in routes/phases.py for consistency with routes/health.py.
|
Addressed both non-blocking suggestions in commit 93376fa:
All 430 health check tests pass. Ruff check and format clean. — Authored by egg |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
No agent-mode design concerns.
The delta since my last review (b552372) is a single commit (93376fa) that renames _run_health_checks_on_change() to _run_runtime_tick_checks() and adds a missing state_store parameter in routes/phases.py. These are minor naming and consistency fixes with no impact on agent-mode design. My previous assessment stands: this is orchestrator-level infrastructure with a narrow diagnostic LLM call, not an interactive agent workflow.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Re-Review — PR #864: Two-Tier Health Check Framework
Reviewed delta: b552372..93376fa (1 commit addressing 2 non-blocking suggestions).
Previous Non-Blocking Suggestions — Verification
Suggestion 1 (method naming): Fixed correctly. _run_health_checks_on_change() renamed to _run_runtime_tick_checks() in container_monitor.py:148. The call site in _check_all_containers() at line 235 updated. Docstring on set_health_check_runner() at line 137 also updated to reference the new name. All 8 test references across test_health_check_integration.py and test_health_check_lifecycle_integration.py updated consistently. No stale references to the old name remain in code or tests.
Suggestion 2 (missing state_store in routes/phases.py): Fixed correctly. state_store=store added to PipelineHealthContext at routes/phases.py:278, matching the pattern in routes/health.py:149.
New Changes — No Issues Found
The commit is clean and narrowly scoped to the two requested changes. No new logic, no unrelated modifications.
Minor Observation (Non-Blocking)
multi_agent.py:597 (_run_wave_health_checks) is now the only PipelineHealthContext construction site that omits state_store. No check currently reads state_store from the context, so this is not a bug. Noting for completeness — if a future check relies on state_store, this call site will need updating.
— Authored by egg
There was a problem hiding this comment.
Contract Re-Verification (Incremental)
Previous review: commit b552372f — approved
Current HEAD: commit 93376fa3
Delta: 1 commit ("Address non-blocking review suggestions")
Changes Since Last Review
The delta consists of a single commit with 4 files changed (14 additions, 13 deletions):
-
orchestrator/container_monitor.py— Renamed_run_health_checks_on_change()→_run_runtime_tick_checks(). Pure rename across method definition, docstring reference, and call site. The method now runs once per poll cycle (fixed inb552372f), and the new name accurately reflects that behavior. -
orchestrator/routes/phases.py— Added missingstate_store=storeparameter toPipelineHealthContextconstructor inadvance_phase(). This aligns withroutes/health.py(line 149) andcontainer_monitor.pywhich both already passstate_store. The parameter is optional in the constructor (defaults toNone), so this is a consistency improvement, not a bug fix. -
orchestrator/tests/test_health_check_integration.py— Updated 5 test references from_run_health_checks_on_changeto_run_runtime_tick_checks. No logic changes. -
orchestrator/tests/test_health_check_lifecycle_integration.py— Updated 4 test references from_run_health_checks_on_changeto_run_runtime_tick_checks. No logic changes.
Contract Compliance
The contract for issue #850 has no formal acceptance_criteria entries (empty array). The prior contract review verified against the issue's 6 scope items:
| # | Scope Item | Status |
|---|---|---|
| 1 | HealthCheck, PipelineHealthContext, HealthResult interfaces |
Still verified — no changes to types.py or context.py interfaces |
| 2 | Migrate startup reconciliation + container monitor | Still verified — adapter checks unchanged |
| 3 | New Tier 1 checks (phase output, state consistency) | Still verified — check implementations unchanged |
| 4 | Tier 2 agent inspector | Still verified — AgentInspectorCheck unchanged |
| 5 | Lifecycle trigger wiring | Still verified — trigger points unchanged; state_store addition in phases.py is a parameter consistency fix, not a behavioral change |
| 6 | Event system integration + observability | Still verified — event emission unchanged |
Regression Check
- No previously verified scope items are broken by this delta
- The method rename is fully consistent (all references updated, including tests)
- The
state_storeparameter addition is additive and matches the existing pattern
Verdict
No contract violations. All 6 scope items from issue #850 remain fully implemented and verified. The delta addresses non-blocking code quality suggestions with no functional or behavioral changes.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Addressed the non-blocking observation from the re-review in commit eb16943: Missing All 420 health check tests and 1141 orchestrator tests pass (excluding 1 pre-existing docker_client test failure). — Authored by egg |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
No agent-mode design concerns.
The delta since my last review (93376fa) is a single line adding state_store=get_state_store(self.repo_path) to PipelineHealthContext in multi_agent.py:603, closing the consistency gap noted in the previous code review. No agent-mode design impact. My previous assessment stands: this is orchestrator infrastructure with a narrow diagnostic LLM call, not an interactive agent workflow.
— Authored by egg
There was a problem hiding this comment.
Re-Review — PR #864: Two-Tier Health Check Framework
Reviewed delta: 93376fa..eb16943 (1 commit: "Add state_store to PipelineHealthContext in multi_agent.py").
Previous Non-Blocking Observation — Verification
Missing state_store in multi_agent.py: Fixed correctly. state_store=get_state_store(self.repo_path) added to PipelineHealthContext at multi_agent.py:603. The get_state_store import was already present at line 41. StateStore.__init__ is lightweight (no I/O, just path storage), so constructing a new instance per wave health check is acceptable.
All five production PipelineHealthContext construction sites are now consistent:
| Call site | docker_client |
state_store |
|---|---|---|
cli.py:183 |
dc |
startup_store |
container_monitor.py:169 |
self.docker_client |
store |
multi_agent.py:603 |
self.docker_client |
get_state_store(self.repo_path) |
routes/health.py:145 |
dc |
store |
routes/phases.py:273 |
dc |
store |
New Changes — No Issues Found
The commit is a single-line, narrowly-scoped fix. No new logic, no unrelated modifications. The remaining files in the commit are .egg-state data and .dockerignore — not code under review.
No remaining blocking or non-blocking issues.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Contract Verification: Re-review (Incremental)
Previous review: commit 93376fa3 — approved
Current review: commit eb169432
Delta: 1 file changed, 1 line added
Delta Analysis
Single commit since last review:
eb169432— "Add state_store to PipelineHealthContext in multi_agent.py"
The change adds state_store=get_state_store(self.repo_path) to the PipelineHealthContext construction in _run_wave_health_checks(). This was the only call site across the codebase that omitted the state_store parameter. All other construction sites (routes/phases.py, routes/health.py, cli.py, container_monitor.py) already pass it. This is a correctness fix.
Scope Verification (Issue #850)
All 6 scope items remain fully satisfied:
| # | Scope Item | Status |
|---|---|---|
| 1 | HealthCheck protocol, PipelineHealthContext, HealthResult interfaces | Verified |
| 2 | Migrate startup reconciliation + container monitor to interface | Verified |
| 3 | New Tier 1 checks: phase output presence, state consistency | Verified |
| 4 | Tier 2 agent inspector: context assembly + LLM + structured verdict | Verified |
| 5 | Trigger points: STARTUP, RUNTIME_TICK, WAVE_COMPLETE, PHASE_COMPLETE | Verified |
| 6 | Health check events via SSE + on-demand REST endpoint | Verified |
Regression Check
- The new commit does not break any previously verified criteria
- 419 health check tests pass (no change from previous review)
- The
state_storeaddition is consistent with all otherPipelineHealthContextcall sites
Verdict: Approved (no blocking issues)
The delta is a single-line correctness fix. All scope items hold. No regressions.
— Authored by egg
|
Automated feedback loop has reached the maximum of 3 rounds. Human review is needed to make further progress on this PR. |
|
egg contract-verification completed. View run logs 16 previous review(s) hidden. |
Add two-tier health check framework for pipeline failure detection
The orchestrator's failure detection was a collection of one-off mechanisms
(startup reconciliation, container monitor, iteration caps) with no shared
abstraction and no ability to detect semantic failures — cases where
infrastructure looks healthy but agents didn't accomplish meaningful work.
The issue-835 post-mortem demonstrated this gap: reviewers exited cleanly
but had nothing to review because coder work never reached the remote.
This PR introduces a unified two-tier health check framework:
Tier 1 — Programmatic checks (fast, deterministic, cheap): container
liveness, startup state reconciliation, phase output presence, and state
consistency. These run on every lifecycle event and catch structural failures.
Tier 2 — Agent inspector (LLM-powered semantic analysis): evaluates
whether the pipeline is making meaningful progress by reasoning over agent
outputs, git history, and logs. Produces structured verdicts (HEALTHY /
DEGRADED / FAILED) with reasoning. Runs when Tier 1 flags DEGRADED or at
phase completion.
Both tiers implement a shared
HealthCheckprotocol and produceHealthResultvalues. A
HealthCheckRunnerdispatches checks by lifecycle trigger(STARTUP, RUNTIME_TICK, WAVE_COMPLETE, PHASE_COMPLETE, ON_DEMAND) and
handles automatic Tier 1 → Tier 2 escalation. Results are surfaced via the
existing SSE/event system and new REST endpoints.
Integration points:
multi_agent.pyruns checks at wave completion,routes/phases.pyruns checks at phase completion, androutes/health.pyexposes on-demand health check endpoints.Closes #850
Test plan:
Tier 1 checks, Tier 2 agent inspector, runner dispatch/escalation,
lifecycle integration, and edge cases
pytest orchestrator/tests/test_health_check_*.pypassesAuthored-by: egg