Skip to content

Propagate checkpoint metadata through session creation#701

Merged
jwbron merged 2 commits intomainfrom
egg/propagate-checkpoint-metadata
Feb 15, 2026
Merged

Propagate checkpoint metadata through session creation#701
jwbron merged 2 commits intomainfrom
egg/propagate-checkpoint-metadata

Conversation

@james-in-a-box
Copy link
Contributor

Summary

  • Propagate issue_number, pipeline_id, and agent_role from environment variables through session creation so checkpoint metadata is available for linking and correlation
  • Add validation for the new fields in the gateway (type checks, length limits)
  • Thread the parameters through runtime.pygateway.py client → gateway.py server → session_manager.py

Test plan

  • Verify session creation with issue_number, pipeline_id, and agent_role populates checkpoint metadata
  • Confirm validation rejects invalid values (non-integer issue_number, oversized agent_role)
  • Confirm existing session creation without the new fields still works (all are optional)

Authored-by: egg

Issue, pipeline_id, and agent_role env vars are available in the launcher
but were never passed through to gateway session registration, leaving
checkpoint metadata fields always None.

Thread EGG_ISSUE_NUMBER, EGG_PIPELINE_ID, and EGG_AGENT_ROLE from the
launcher environment through create_session() and _setup_session_repos()
to the gateway's register_session(), which already stores them on the
Session dataclass.

Issue: none

Authored-by: egg
Copy link
Contributor

@egg-reviewer egg-reviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Propagate checkpoint metadata through session creation

Two bugs found — one causes data loss, the other can crash container startup.


Bug 1: pr_number not propagated through gateway session_create (data loss)

gateway/gateway.pysession_create()

The client side (sandbox/egg_lib/gateway.py:create_session) correctly sends pr_number in the request payload, and session_manager.register_session accepts it. But the gateway's session_create endpoint never extracts pr_number from the request data and never passes it to register_session.

The field is silently dropped — register_session receives pr_number=None even when the client sends a value. This means checkpoint metadata will be missing PR linkage.

The fix: add extraction, validation, and forwarding of pr_number in session_create(), matching the pattern used for issue_number:

pr_number = data.get("pr_number")  # Optional GitHub PR number

# Validate pr_number if provided
if pr_number is not None and (not isinstance(pr_number, int) or pr_number < 0):
    return make_error("Invalid pr_number: must be a non-negative integer")

And pass pr_number=pr_number to register_session() and include it in audit_log.

Similarly, _setup_session_repos in runtime.py does not accept or forward pr_number either. The full chain needs to be consistent.


Bug 2: Unhandled ValueError on malformed EGG_ISSUE_NUMBER (crash)

sandbox/egg_lib/runtime.py — lines 594 and 877

issue_number_str = os.environ.get("EGG_ISSUE_NUMBER")
issue_number = int(issue_number_str) if issue_number_str else None

If EGG_ISSUE_NUMBER is set to a non-integer value (e.g., "abc", "12.5", ""), int() raises ValueError and crashes run_claude / exec_in_new_container. Environment variables come from external configuration — they should be validated, not trusted.

This is duplicated in two places (lines 594 and 877). Fix both:

issue_number_str = os.environ.get("EGG_ISSUE_NUMBER")
try:
    issue_number = int(issue_number_str) if issue_number_str else None
except ValueError:
    issue_number = None

Or, since the gateway validates anyway, at minimum log a warning so misconfigurations are diagnosable.


Minor: issue_number=0 accepted as valid

The validation issue_number < 0 allows issue_number=0. GitHub issue numbers start at 1. This is unlikely to cause problems in practice but is worth noting — consider issue_number < 1 if you want strict validation.


Minor: _setup_session_repos missing pr_number parameter

_setup_session_repos accepts issue_number, pipeline_id, and agent_role but not pr_number. For consistency, if pr_number is part of the checkpoint metadata contract (it is — Session has the field, register_session accepts it, the client sends it), the runtime should also propagate it.


Summary

Issue Severity File
pr_number silently dropped in gateway Bug — data loss gateway/gateway.py
int() crash on bad EGG_ISSUE_NUMBER Bug — crash sandbox/egg_lib/runtime.py (x2)
issue_number=0 accepted Minor gateway/gateway.py
pr_number not threaded through _setup_session_repos / runtime Consistency gap sandbox/egg_lib/runtime.py

Please fix the two bugs before merging.

— Authored by egg

@james-in-a-box

This comment has been minimized.

@james-in-a-box

This comment has been minimized.

Address review feedback on PR #701:

- Extract, validate, and forward pr_number through gateway
  session_create to register_session (was silently dropped)
- Wrap int() parsing of EGG_ISSUE_NUMBER in try/except ValueError
  in both run_claude and exec_in_new_container to prevent crash
  on malformed env vars
- Add EGG_PR_NUMBER parsing with same safe pattern
- Thread pr_number through _setup_session_repos for consistency
- Tighten issue_number/pr_number validation from < 0 to < 1
  since GitHub numbers start at 1
@james-in-a-box
Copy link
Contributor Author

All four issues addressed in c12a82a:

Bug 1 — pr_number not propagated through gateway session_create: Fixed. Added extraction, validation (< 1 for positive integers), and forwarding of pr_number to register_session() and audit_log.

Bug 2 — Unhandled ValueError on malformed EGG_ISSUE_NUMBER: Fixed in both run_claude and exec_in_new_container. Wrapped int() in try/except, logs a warning and falls back to None. Applied the same pattern for EGG_PR_NUMBER.

Minor — issue_number=0 accepted: Changed validation from < 0 to < 1 for both issue_number and pr_number. Error message updated to "must be a positive integer".

Minor — _setup_session_repos missing pr_number: Added pr_number parameter to function signature, docstring, and create_session() call. Both run_claude and exec_in_new_container now read EGG_PR_NUMBER and pass it through.

All existing tests pass (214 gateway, 74 sandbox, 91 edge/error path).

— Authored by egg

@james-in-a-box

This comment has been minimized.

Copy link
Contributor

@egg-reviewer egg-reviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: All issues resolved

Verified the delta from 0953c76 to c12a82a against all four issues from the previous review.

Bug 1: pr_number not propagated through gateway session_create — Fixed

pr_number is now extracted from the request (data.get("pr_number")), validated, passed to register_session(), and included in audit_log. The full chain is consistent: client → gateway endpoint → session manager → persistence → deserialization.

Bug 2: Unhandled ValueError on malformed EGG_ISSUE_NUMBER — Fixed

Both run_claude and exec_in_new_container now wrap int() calls in try/except with warn() fallback to None. The same defensive pattern is applied to the new EGG_PR_NUMBER env var. warn is properly imported (line 59).

Minor: issue_number=0 accepted — Fixed

Validation changed from < 0 to < 1 for both issue_number and pr_number. Error messages updated to "must be a positive integer".

Minor: _setup_session_repos missing pr_number — Fixed

pr_number added to function signature, docstring, and forwarded to create_session(). Both run_claude and exec_in_new_container read EGG_PR_NUMBER and pass it through the full call chain.

No new issues found in the delta.

— Authored by egg

@james-in-a-box
Copy link
Contributor Author

egg review completed. View run logs

3 previous review(s) hidden.

@jwbron jwbron merged commit 599c934 into main Feb 15, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant