Agent anchor mechanism for post-compaction state recovery #1032

@james-in-a-box

Summary

Define a persistent "anchor" mechanism that ensures agents maintain coherent working state after context compaction. When an agent's context window is compressed, it risks losing track of its current task, decisions made with other agents, and progress markers. This mechanism provides a structured recovery point.

Motivation

In the async team model (#1027, #1028, #1030), agents run longer sessions, coordinate via messaging, and make incremental progress across complex tasks. Context compaction is inevitable in long-running sessions, and without a recovery mechanism, compacted agents can:

  • Lose task focus — forget what sub-task they're working on or repeat completed work
  • Forget decisions — re-open questions already resolved with other agents (e.g., "coder and tester agreed on approach B" gets lost)
  • Lose coordination state — miss that they're waiting on another agent, or that another agent is waiting on them
  • Repeat mistakes — re-attempt approaches that already failed
  • Break consensus — act contrary to team agreements they no longer remember

The existing egg-contract system is a precursor, but it is scoped to phases and pipelines. The new team model needs finer-grained state that is scoped to individual agents.

Proposed Design

Agent Anchor File

Each running agent maintains a structured state file that serves as its post-compaction recovery point:

.egg-state/agent-anchors/<agent-id>.yaml

Contents (example):

agent_id: coder-abc123
role: coder
team: issue-432
task: "Fix auth bypass in gateway/auth.py"
spawned_by: liaison-xyz789

status: in_progress
progress:
  - completed: "Identified root cause in token validation"
  - completed: "Fixed validate_token() to check expiry"
  - current: "Updating error handling for expired tokens"
  - pending: "Notify tester that fix is ready for coverage"

decisions:
  - with: tester-def456
    decided: "Use parametrized tests for token edge cases"
    timestamp: "2026-03-11T14:30:00Z"
  - with: mediator-ghi789
    decided: "Approach B (strict validation) over approach A (lenient)"
    timestamp: "2026-03-11T14:45:00Z"

waiting_on: []
blocked_by: []

files_modified:
  - gateway/auth.py
  - gateway/token_utils.py

key_context:
  - "Token validation was skipping expiry check when token had admin scope"
  - "Must maintain backward compatibility with v1 tokens"
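
To make the schema above concrete, here is a minimal sketch of an in-memory model with an atomic save and a round-trip load. Everything in it is an assumption for illustration: the class and function names (`AgentAnchor`, `Decision`, `save_anchor`, `load_anchor`) are hypothetical, the `with` key is renamed `with_agent` because `with` is a Python keyword, and the file is written as JSON only to stay within the standard library (the proposal itself uses YAML, and the format is still an open question).

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Decision:
    with_agent: str  # the "with" key in the example above
    decided: str
    timestamp: str   # ISO 8601, UTC


@dataclass
class AgentAnchor:
    agent_id: str
    role: str
    team: str
    task: str
    spawned_by: str
    status: str = "in_progress"
    progress: list = field(default_factory=list)        # e.g. [{"completed": "..."}]
    decisions: list = field(default_factory=list)       # list of Decision
    waiting_on: list = field(default_factory=list)
    blocked_by: list = field(default_factory=list)
    files_modified: list = field(default_factory=list)
    key_context: list = field(default_factory=list)


def save_anchor(anchor: AgentAnchor, state_root: Path) -> Path:
    """Write the anchor via a temp file plus rename, so readers never see a partial file."""
    target = state_root / "agent-anchors" / f"{anchor.agent_id}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(asdict(anchor), fh, indent=2)
    os.replace(tmp_name, target)  # atomic when source and target share a filesystem
    return target


def load_anchor(path: Path) -> AgentAnchor:
    """Read an anchor back, rebuilding the nested Decision records."""
    raw = json.loads(path.read_text(encoding="utf-8"))
    raw["decisions"] = [Decision(**d) for d in raw.get("decisions", [])]
    return AgentAnchor(**raw)
```

The temp-file-then-rename step matters if the orchestrator may read an anchor while the agent is rewriting it; whether that can actually happen depends on the update mechanism chosen below.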

Update Triggers

The anchor file is updated:

Post-Compaction Injection

After context compaction, the anchor file is injected into the agent's context (similar to how CLAUDE.md is always loaded). This gives the agent enough state to continue coherently without re-reading its entire conversation history.
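
One possible shape for the injection step is sketched below: the parsed anchor is rendered into a short text preamble that gets prepended to the compacted context. The function name `render_anchor_preamble` and the wording of the preamble are hypothetical; the field names follow the example anchor above, and the input is assumed to be the already-parsed mapping.

```python
from typing import Any, Dict


def render_anchor_preamble(anchor: Dict[str, Any]) -> str:
    """Render a parsed anchor as a compact text block for post-compaction injection."""
    lines = [
        "Recovered working state (from anchor file):",
        f"You are {anchor['agent_id']} ({anchor['role']}) on team {anchor['team']}.",
        f"Task: {anchor['task']} [status: {anchor.get('status', 'unknown')}]",
    ]
    # Each progress entry is a single-key mapping such as {"completed": "..."}.
    for step in anchor.get("progress", []):
        for state, text in step.items():
            lines.append(f"- {state}: {text}")
    for decision in anchor.get("decisions", []):
        lines.append(f"- decided with {decision['with']}: {decision['decided']}")
    # Empty lists are skipped to keep the preamble small.
    if anchor.get("waiting_on"):
        lines.append("Waiting on: " + ", ".join(anchor["waiting_on"]))
    if anchor.get("blocked_by"):
        lines.append("Blocked by: " + ", ".join(anchor["blocked_by"]))
    for note in anchor.get("key_context", []):
        lines.append(f"- context: {note}")
    return "\n".join(lines)
```

Rendering to prose rather than injecting the raw file is a design choice, not a requirement; injecting the YAML verbatim would be simpler and keeps the anchor and the injected text byte-for-byte identical.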

Team-Level Anchor

The mediator (or orchestrator, in automated flows) maintains a team-level anchor:

.egg-state/agent-anchors/team-<team-id>.yaml

This tracks:

  • Which agents are active and their current status
  • Team-level decisions and consensus state
  • Cross-agent dependencies and handoff status
  • Escalation history
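
As a rough illustration of how the team-level view could be derived, the sketch below folds a set of parsed agent anchors into one team mapping. The function name `build_team_anchor` and the output keys (`agents`, `decisions`, `dependencies`, `escalations`) are assumptions, not part of the proposal; escalation history is left as an empty placeholder because the agent anchor example does not carry it.

```python
from typing import Any, Dict, List


def build_team_anchor(team_id: str, agent_anchors: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate per-agent anchors into a single team-level view."""
    agents: Dict[str, str] = {}
    dependencies: List[Dict[str, str]] = []
    merged: Dict[tuple, Dict[str, Any]] = {}

    for anchor in agent_anchors:
        agent_id = anchor["agent_id"]
        agents[agent_id] = anchor.get("status", "unknown")
        for other in anchor.get("waiting_on", []):
            dependencies.append({"agent": agent_id, "waiting_on": other})
        # Both parties may record the same decision; merge on (text, timestamp).
        for d in anchor.get("decisions", []):
            key = (d["decided"], d["timestamp"])
            entry = merged.setdefault(
                key,
                {"decided": d["decided"], "timestamp": d["timestamp"], "participants": set()},
            )
            entry["participants"].update({agent_id, d["with"]})

    # ISO 8601 UTC timestamps in one format sort correctly as plain strings.
    decisions = [
        {**entry, "participants": sorted(entry["participants"])}
        for entry in sorted(merged.values(), key=lambda e: e["timestamp"])
    ]
    return {
        "team": team_id,
        "agents": agents,
        "decisions": decisions,
        "dependencies": dependencies,
        "escalations": [],  # placeholder: not derivable from agent anchors
    }
```

Deriving the team anchor from agent anchors is only one option; the mediator could equally maintain it directly from message-bus traffic, which ties into the update-mechanism question below.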

Key Design Questions

  1. Format: YAML (human-readable) vs JSON (machine-parseable) vs both?
  2. Size budget: Anchors must be small enough to inject post-compaction without consuming too much of the refreshed context window. What's the max size? (Proposal: 2KB per agent anchor, 4KB for team anchor)
  3. Update mechanism: Should agents update their own anchors (self-report), or should the orchestrator/message bus update them (observed state)? Likely both.
  4. Conflict resolution: If an agent's in-memory state diverges from its anchor (e.g., anchor says "waiting on tester" but the tester already responded before compaction), how is this reconciled?
  5. Anchor cleanup: When should anchors be deleted? On agent termination? On team completion? Retained for checkpoint/audit?
  6. Gateway enforcement: Should the gateway enforce that agents can only write their own anchor file?
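
Questions 2 and 6 are easy to prototype. The sketch below checks the proposed size budgets (2KB per agent anchor, 4KB per team anchor) and tests whether a write targets the requesting agent's own anchor path. The helper names `within_budget` and `may_write_anchor` are hypothetical, and how the gateway would actually intercept writes is not specified in this issue.

```python
from pathlib import PurePosixPath

AGENT_ANCHOR_MAX_BYTES = 2 * 1024  # proposed budget from question 2
TEAM_ANCHOR_MAX_BYTES = 4 * 1024

ANCHOR_DIR = PurePosixPath(".egg-state/agent-anchors")


def within_budget(serialized: str, is_team: bool = False) -> bool:
    """Check the encoded size, since the budget is in bytes rather than characters."""
    limit = TEAM_ANCHOR_MAX_BYTES if is_team else AGENT_ANCHOR_MAX_BYTES
    return len(serialized.encode("utf-8")) <= limit


def may_write_anchor(agent_id: str, requested_path: str) -> bool:
    """Allow a write only if the path is exactly this agent's own anchor file."""
    requested = PurePosixPath(requested_path)
    # Reject traversal outright instead of trying to resolve it.
    if ".." in requested.parts:
        return False
    # PurePosixPath collapses "./" and duplicate slashes before comparing.
    return requested == ANCHOR_DIR / f"{agent_id}.yaml"
```

For example, `may_write_anchor("coder-abc123", ".egg-state/agent-anchors/tester-def456.yaml")` returns `False`. A real check would also need to keep agent ids from colliding with the `team-<team-id>` naming used for team anchors.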

Dependencies

Success Criteria

  • Agents maintain structured state files that are updated as work progresses
  • After context compaction, agents can resume coherently using their anchor file
  • Team-level anchors provide the mediator with a consistent view of team state
  • Anchor files are small enough to inject without significant context cost
  • No duplicate work or contradictory decisions after compaction events
  • Anchor contents are included in checkpoints for audit/debugging

Metadata

Labels

enhancement (New feature or request)
