Propagate checkpoint metadata through session creation#701
Conversation
Issue, pipeline_id, and agent_role env vars are available in the launcher but were never passed through to gateway session registration, leaving checkpoint metadata fields always None. Thread EGG_ISSUE_NUMBER, EGG_PIPELINE_ID, and EGG_AGENT_ROLE from the launcher environment through create_session() and _setup_session_repos() to the gateway's register_session(), which already stores them on the Session dataclass. Issue: none Authored-by: egg
There was a problem hiding this comment.
Review: Propagate checkpoint metadata through session creation
Two bugs found — one causes data loss, the other can crash container startup.
Bug 1: pr_number not propagated through gateway session_create (data loss)
gateway/gateway.py — session_create()
The client side (sandbox/egg_lib/gateway.py:create_session) correctly sends pr_number in the request payload, and session_manager.register_session accepts it. But the gateway's session_create endpoint never extracts pr_number from the request data and never passes it to register_session.
The field is silently dropped — register_session receives pr_number=None even when the client sends a value. This means checkpoint metadata will be missing PR linkage.
The fix: add extraction, validation, and forwarding of pr_number in session_create(), matching the pattern used for issue_number:
pr_number = data.get("pr_number") # Optional GitHub PR number
# Validate pr_number if provided
if pr_number is not None and (not isinstance(pr_number, int) or pr_number < 0):
return make_error("Invalid pr_number: must be a non-negative integer")And pass pr_number=pr_number to register_session() and include it in audit_log.
Similarly, _setup_session_repos in runtime.py does not accept or forward pr_number either. The full chain needs to be consistent.
Bug 2: Unhandled ValueError on malformed EGG_ISSUE_NUMBER (crash)
sandbox/egg_lib/runtime.py — lines 594 and 877
issue_number_str = os.environ.get("EGG_ISSUE_NUMBER")
issue_number = int(issue_number_str) if issue_number_str else NoneIf EGG_ISSUE_NUMBER is set to a non-integer value (e.g., "abc", "12.5", ""), int() raises ValueError and crashes run_claude / exec_in_new_container. Environment variables come from external configuration — they should be validated, not trusted.
This is duplicated in two places (lines 594 and 877). Fix both:
issue_number_str = os.environ.get("EGG_ISSUE_NUMBER")
try:
issue_number = int(issue_number_str) if issue_number_str else None
except ValueError:
issue_number = NoneOr, since the gateway validates anyway, at minimum log a warning so misconfigurations are diagnosable.
Minor: issue_number=0 accepted as valid
The validation issue_number < 0 allows issue_number=0. GitHub issue numbers start at 1. This is unlikely to cause problems in practice but is worth noting — consider issue_number < 1 if you want strict validation.
Minor: _setup_session_repos missing pr_number parameter
_setup_session_repos accepts issue_number, pipeline_id, and agent_role but not pr_number. For consistency, if pr_number is part of the checkpoint metadata contract (it is — Session has the field, register_session accepts it, the client sends it), the runtime should also propagate it.
Summary
| Issue | Severity | File |
|---|---|---|
pr_number silently dropped in gateway |
Bug — data loss | gateway/gateway.py |
int() crash on bad EGG_ISSUE_NUMBER |
Bug — crash | sandbox/egg_lib/runtime.py (x2) |
issue_number=0 accepted |
Minor | gateway/gateway.py |
pr_number not threaded through _setup_session_repos / runtime |
Consistency gap | sandbox/egg_lib/runtime.py |
Please fix the two bugs before merging.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Address review feedback on PR #701: - Extract, validate, and forward pr_number through gateway session_create to register_session (was silently dropped) - Wrap int() parsing of EGG_ISSUE_NUMBER in try/except ValueError in both run_claude and exec_in_new_container to prevent crash on malformed env vars - Add EGG_PR_NUMBER parsing with same safe pattern - Thread pr_number through _setup_session_repos for consistency - Tighten issue_number/pr_number validation from < 0 to < 1 since GitHub numbers start at 1
|
All four issues addressed in c12a82a: Bug 1 — Bug 2 — Unhandled Minor — Minor — All existing tests pass (214 gateway, 74 sandbox, 91 edge/error path). — Authored by egg |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Re-review: All issues resolved
Verified the delta from 0953c76 to c12a82a against all four issues from the previous review.
Bug 1: pr_number not propagated through gateway session_create — Fixed
pr_number is now extracted from the request (data.get("pr_number")), validated, passed to register_session(), and included in audit_log. The full chain is consistent: client → gateway endpoint → session manager → persistence → deserialization.
Bug 2: Unhandled ValueError on malformed EGG_ISSUE_NUMBER — Fixed
Both run_claude and exec_in_new_container now wrap int() calls in try/except with warn() fallback to None. The same defensive pattern is applied to the new EGG_PR_NUMBER env var. warn is properly imported (line 59).
Minor: issue_number=0 accepted — Fixed
Validation changed from < 0 to < 1 for both issue_number and pr_number. Error messages updated to "must be a positive integer".
Minor: _setup_session_repos missing pr_number — Fixed
pr_number added to function signature, docstring, and forwarded to create_session(). Both run_claude and exec_in_new_container read EGG_PR_NUMBER and pass it through the full call chain.
No new issues found in the delta.
— Authored by egg
|
egg review completed. View run logs 3 previous review(s) hidden. |
Summary
issue_number,pipeline_id, andagent_rolefrom environment variables through session creation so checkpoint metadata is available for linking and correlationruntime.py→gateway.pyclient →gateway.pyserver →session_manager.pyTest plan
issue_number,pipeline_id, andagent_rolepopulates checkpoint metadataAuthored-by: egg