feat(relay): add cloud relay transport via Cloudflare + PartyServer#216
Add cloud relay transport track for Cloudflare Workers + PartyServer as an alternative to webhook-based event delivery. Includes spec with 8 FRs, plan with 11 tasks covering relay worker package, partysocket client, config extension, and orchestrator integration.
Add 'relay' as a third PollingMode alongside 'poll' and 'webhook'. Add RelayConfig interface (url, token, room, secret) to ServiceConfig. Add buildRelayConfig() with $ENV_VAR resolution for token and secret. Extend pollingModeValue() to recognize 'relay' mode.
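Below is a minimal TypeScript sketch of the config surface this commit describes. `PollingMode`, `RelayConfig`, `buildRelayConfig()` and `pollingModeValue()` are named in the commit, but their bodies here are assumptions. The behaviour given to `resolveEnvValue` is also assumed: a `$NAME` string resolves to that environment variable, or to `null` when it is unset. The real helper may differ. Routing all four fields through it, including `url` and `room`, is what lets validation notice a missing `$RELAY_URL`.

```typescript
export type PollingMode = 'poll' | 'webhook' | 'relay'

export interface RelayConfig {
  url: string | null
  token: string | null
  room: string | null
  secret: string | null
}

type EnvLike = Record<string, string | undefined>

const POLLING_MODES: readonly PollingMode[] = ['poll', 'webhook', 'relay']

// Assumed semantics: "$NAME" -> env.NAME (null if unset or empty),
// any other non-empty string is returned unchanged, everything else -> null.
export function resolveEnvValue(value: unknown, env: EnvLike): string | null {
  if (typeof value !== 'string' || value.length === 0)
    return null
  if (value.startsWith('$')) {
    const resolved = env[value.slice(1)]
    return resolved ? resolved : null
  }
  return value
}

// Unknown or missing modes fall back to 'poll' in this sketch.
export function pollingModeValue(value: unknown): PollingMode {
  return POLLING_MODES.find(mode => mode === value) ?? 'poll'
}

export function buildRelayConfig(raw: unknown, env: EnvLike): RelayConfig {
  const section = (typeof raw === 'object' && raw !== null ? raw : {}) as Record<string, unknown>
  return {
    // Every field goes through resolveEnvValue, so "$RELAY_URL" / "$RELAY_ROOM"
    // become null when the variable is missing instead of staying literal.
    url: resolveEnvValue(section.url, env),
    token: resolveEnvValue(section.token, env),
    room: resolveEnvValue(section.room, env),
    secret: resolveEnvValue(section.secret, env),
  }
}
```

A caller would pass `process.env` as `env`. The explicit parameter only exists to keep the sketch testable.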
Add RelayTransport class using partysocket for auto-reconnecting WebSocket connection to cloud relay. Integrate into Orchestrator start/stop lifecycle — when polling.mode is 'relay', creates and connects transport that calls triggerRefresh() on incoming events. Export RelayTransport and RelayConfig from core barrel.
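The sketch below shows the transport's lifecycle and message handling. To keep it self-contained and testable, the socket comes from an injected factory. That factory is a hypothetical seam: in the package it would presumably construct a `PartySocket`, whose options include `host`, `room` and `query`. The `'connected'` handshake message is taken from the architecture diagram. Two behaviours are assumptions, consistent with the later "remove triggerRefresh on unparseable messages" fix: every other parseable message counts as an event, and unparseable frames are skipped.

```typescript
// Minimal socket surface this sketch needs. Browser-style WebSockets (and
// partysocket, which is designed to mirror that API) expose these members.
export interface RelaySocketLike {
  addEventListener(type: 'message', listener: (event: { data: unknown }) => void): void
  close(): void
}

export interface RelayTarget {
  url: string
  room: string
  token: string | null
}

export type RelaySocketFactory = (target: RelayTarget) => RelaySocketLike

export class RelayTransport {
  private socket: RelaySocketLike | null = null
  private readonly target: RelayTarget
  private readonly createSocket: RelaySocketFactory
  private readonly onEvent: () => void

  constructor(target: RelayTarget, createSocket: RelaySocketFactory, onEvent: () => void) {
    this.target = target
    this.createSocket = createSocket
    this.onEvent = onEvent // e.g. () => orchestrator.triggerRefresh()
  }

  // Called from the orchestrator's start() when polling.mode is 'relay'.
  connect(): void {
    if (this.socket)
      return
    const socket = this.createSocket(this.target)
    socket.addEventListener('message', event => this.handleMessage(event.data))
    this.socket = socket
  }

  // Called from the orchestrator's stop().
  disconnect(): void {
    this.socket?.close()
    this.socket = null
  }

  private handleMessage(data: unknown): void {
    if (typeof data !== 'string')
      return
    let message: unknown
    try {
      message = JSON.parse(data)
    }
    catch {
      return // unparseable frame: skip it rather than trigger a refresh
    }
    if (typeof message !== 'object' || message === null)
      return
    if ((message as { type?: unknown }).type === 'connected')
      return // handshake from the relay, not a tracker event
    this.onEvent()
  }
}
```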
Scaffold packages/relay-worker/ as a new workspace for the cloud relay. Implements RelayParty server class with:
- onRequest: webhook ingress with HMAC SHA-256 signature verification
- onConnect: bearer token authentication via query parameter
- broadcast: fan-out webhook events to all connected clients
- Hibernation enabled for zero-cost idle rooms

Worker entry point routes /webhook/:room to the party's onRequest and /parties/relay-party/:room for WebSocket connections.
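Both checks in `RelayParty` can be sketched with the Web Crypto API, which Cloudflare Workers and Node.js both expose as `crypto.subtle`. The sketch assumes a GitHub-style `sha256=<hex>` signature header and a `token` query parameter. Other trackers, such as Asana, may use a different signature header. `verifySignature` matches the name used in review. `isAuthorizedToken` and `hexToBytes` are hypothetical helpers.

```typescript
const encoder = new TextEncoder()

function hexToBytes(hex: string) {
  if (hex.length % 2 !== 0 || !/^[0-9a-f]+$/i.test(hex))
    return null
  const bytes = new Uint8Array(hex.length / 2)
  for (let i = 0; i < bytes.length; i++)
    bytes[i] = Number.parseInt(hex.slice(i * 2, i * 2 + 2), 16)
  return bytes
}

// Verifies an HMAC SHA-256 signature header of the form "sha256=<hex>".
export async function verifySignature(body: string, header: string | null, secret: string): Promise<boolean> {
  const prefix = 'sha256='
  if (!header || !header.startsWith(prefix))
    return false
  const signature = hexToBytes(header.slice(prefix.length))
  if (!signature)
    return false
  const key = await crypto.subtle.importKey(
    'raw',
    encoder.encode(secret),
    { name: 'HMAC', hash: 'SHA-256' },
    false,
    ['verify'],
  )
  // subtle.verify recomputes the MAC and compares it itself, so there is no
  // hand-written string comparison that could exit early on a mismatch.
  return crypto.subtle.verify('HMAC', key, signature, encoder.encode(body))
}

// Compares the ?token= query parameter against the expected bearer token
// without exiting early on the first differing byte.
export function isAuthorizedToken(requestUrl: string, expected: string): boolean {
  const provided = new URL(requestUrl).searchParams.get('token')
  if (provided === null)
    return false
  const a = encoder.encode(provided)
  const b = encoder.encode(expected)
  let diff = a.length ^ b.length
  for (let i = 0; i < Math.max(a.length, b.length); i++)
    diff |= (a[i] ?? 0) ^ (b[i] ?? 0)
  return diff === 0
}
```

The architecture diagram shows how each check fails. A bad signature makes the party answer `401`, and a bad token closes the socket with code `4001`.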
Add validateConfig checks for relay mode: require relay.url and relay.room when polling.mode is 'relay'. Add Nitro plugin (04.relay.ts) that creates and manages RelayTransport lifecycle when relay mode is configured.
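The relay checks amount to a presence test that only applies in relay mode. The sketch below assumes a standalone `validateRelayConfig` function. That function is hypothetical: the PR adds these checks inside `validateConfig`. It also assumes an error object with a `code` and a `message`. Only the `missing_relay_url` and `missing_relay_room` codes come from the PR. Because the check tests for `null` or empty values, it only catches an unset `$RELAY_URL` if the config builder has already resolved placeholders.

```typescript
// Hypothetical error shape; only the two codes come from the PR.
export type RelayValidationError =
  | { code: 'missing_relay_url', message: string }
  | { code: 'missing_relay_room', message: string }

export interface RelayValidationInput {
  polling: { mode: 'poll' | 'webhook' | 'relay' }
  relay: { url: string | null, room: string | null }
}

export function validateRelayConfig(config: RelayValidationInput): RelayValidationError[] {
  // Other modes never open a relay connection, so there is nothing to check.
  if (config.polling.mode !== 'relay')
    return []
  const errors: RelayValidationError[] = []
  if (!config.relay.url)
    errors.push({ code: 'missing_relay_url', message: 'relay.url is required when polling.mode is "relay"' })
  if (!config.relay.room)
    errors.push({ code: 'missing_relay_room', message: 'relay.room is required when polling.mode is "relay"' })
  return errors
}
```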
Fix unused import (spyOn), unused variable (event → _event), and sort order issues in barrel exports and imports.
Add partysocket and partyserver to tech stack dependencies. Add relay-worker to project structure. Add cloud relay transport to product capabilities.
Address spec compliance issues:
- NFR-1: Add event_id to relay broadcast envelope and bounded dedup cache in RelayTransport to prevent duplicate processing after WebSocket reconnection
- Fix relay-worker TypeScript: await getServerByName(), cast env for routePartykitRequest()
- Extend ValidationError union with missing_relay_url and missing_relay_room codes
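A bounded dedup cache can be as small as an insertion-ordered `Set` with first-in, first-out eviction. The class name, the `markIfNew` method and the eviction policy are assumptions, not the PR's code. The architecture diagram only says that `event_id` values are checked against a `seenEventIds` set. Bounding the set keeps memory flat on long-lived connections while still catching recently repeated ids.

```typescript
export class BoundedEventCache {
  private readonly seen = new Set<string>()
  private readonly capacity: number

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0)
      throw new RangeError('capacity must be a positive integer')
    this.capacity = capacity
  }

  /** Returns true the first time an id is seen, false for repeats. */
  markIfNew(eventId: string): boolean {
    if (this.seen.has(eventId))
      return false
    this.seen.add(eventId)
    if (this.seen.size > this.capacity) {
      // Sets iterate in insertion order, so the first value is the oldest id.
      const oldest = this.seen.values().next().value
      if (oldest !== undefined)
        this.seen.delete(oldest)
    }
    return true
  }

  get size(): number {
    return this.seen.size
  }
}
```

In the transport's message handler, the cache would gate the refresh, for example `if (cache.markIfNew(envelope.event_id)) onEvent()`.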
Remove apps/agent-please/server/plugins/04.relay.ts — the orchestrator already creates and manages RelayTransport in its start()/stop() lifecycle. The Nitro plugin was creating a second WebSocket connection to the same relay room, causing double triggerRefresh() calls per webhook event. Also optimize relay-party.ts to read request body once and reuse for both signature verification and JSON parsing.
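Reading the body once matters because a `Request` body is a stream that can only be consumed a single time. The same text then feeds both signature verification and `JSON.parse`. The sketch below writes that flow as a standalone handler. It makes several assumptions:
- `verify` and `broadcast` are injected callbacks (hypothetical), so the sketch stays independent of partyserver's `Server` class.
- The signature arrives in a GitHub-style `x-hub-signature-256` header.
- The envelope is `{ event_id, payload }`, where only `event_id` comes from the PR.
- Unparseable JSON gets a `400` response. The PR only says the error is logged.

```typescript
type Verify = (body: string, signatureHeader: string | null) => Promise<boolean>

export async function handleWebhook(
  request: Request,
  verify: Verify,
  broadcast: (message: string) => void,
): Promise<Response> {
  if (request.method !== 'POST')
    return new Response('Method Not Allowed', { status: 405 })
  // Consume the body exactly once; calling request.text() again would throw.
  const body = await request.text()
  if (!(await verify(body, request.headers.get('x-hub-signature-256'))))
    return new Response('Unauthorized', { status: 401 })
  let payload: unknown
  try {
    payload = JSON.parse(body)
  }
  catch (err) {
    console.error('relay: failed to parse webhook body', err)
    return new Response('Bad Request', { status: 400 })
  }
  // event_id lets clients drop duplicates after a reconnect.
  const envelope = { event_id: crypto.randomUUID(), payload }
  broadcast(JSON.stringify(envelope))
  return new Response(JSON.stringify({ accepted: true }), {
    status: 200,
    headers: { 'content-type': 'application/json' },
  })
}
```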
Move relay-worker from packages/ to apps/ since it's an independently deployable application, not a shared library. Update partyserver from 0.0.66 to 0.3.3, fix Env type with index signature and typed DurableObjectNamespace<RelayParty> for v0.3.x compatibility. Update all documentation references.
Summary of Changes
This pull request significantly enhances the deployment flexibility of Agent Please by introducing a cloud-based event relay system. This new architecture allows the agent to receive issue tracker events via WebSockets from a Cloudflare Worker, eliminating the previous requirement for a publicly accessible HTTP endpoint. This change is particularly beneficial for local development environments and deployments behind restrictive firewalls, simplifying setup and improving operational resilience without altering the core event processing logic.
0 issues found across 1 file (changes from recent commits).
Requires human review: Significant feature adding a new architectural component (Cloudflare Worker), new production dependencies, and modifying core orchestrator lifecycle and configuration logic.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
1 issue found across 21 files
Confidence score: 3/5
- There is a concrete configuration risk in `packages/core/src/config.ts`: `relay.url` and `relay.room` are not using `resolveEnvValue`, so `$RELAY_URL`/`$RELAY_ROOM` can remain literal strings instead of resolving to env values or `null` when missing.
- This is moderate merge risk (severity 6/10 with high confidence): validation may incorrectly pass while runtime relay settings are effectively misconfigured, which can cause user-facing connection/room behavior issues.
- Pay close attention to `packages/core/src/config.ts`: ensure relay fields resolve env placeholders consistently so missing variables fail in a detectable way.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/src/config.ts">
<violation number="1" location="packages/core/src/config.ts:342">
P2: Use resolveEnvValue for relay.url and relay.room so $ENV_VAR placeholders are resolved (and missing envs surface as null). As written, "$RELAY_URL"/"$RELAY_ROOM" stays literal and validation will pass even though the relay can’t connect.</violation>
</file>
Architecture diagram
```mermaid
sequenceDiagram
    participant Tracker as Issue Tracker (GitHub/Asana)
    participant Worker as Relay Worker (CF Worker)
    participant DO as RelayParty (Durable Object)
    participant Client as RelayTransport (Agent Core)
    participant Orch as Orchestrator

    Note over Client,Orch: Initialization (Polling Mode: 'relay')
    Orch->>Client: NEW: new RelayTransport(config)
    Client->>Worker: NEW: WSS Connect /parties/relay/:room?token=xxx
    Worker->>DO: Route request to room
    alt Invalid Token
        DO-->>Client: Close connection (4001 Unauthorized)
    else Valid Token
        DO-->>Client: Message: { type: 'connected' }
    end

    Note over Tracker,Orch: Runtime Event Flow
    Tracker->>Worker: POST /webhook/:room (with HMAC Signature)
    Worker->>DO: NEW: Forward fetch request
    alt HMAC Signature Verification
        DO->>DO: NEW: verifySignature(body, secret)
        alt Invalid Signature
            DO-->>Worker: 401 Unauthorized
            Worker-->>Tracker: 401 Unauthorized
        else Valid Signature
            DO->>DO: NEW: Create event envelope with randomUUID
            DO->>Client: NEW: Broadcast (WebSocket Message)
            DO-->>Worker: 200 OK (accepted: true)
            Worker-->>Tracker: 200 OK
        end
    end

    Note over Client,Orch: Client Processing
    Client->>Client: NEW: JSON.parse(data)
    opt NEW: Duplicate Check
        Client->>Client: Check event_id in seenEventIds (Set)
    end
    alt NEW: Not a duplicate
        Client->>Orch: CHANGED: triggerRefresh()
        Orch->>Orch: Start workflow reconcile tick
    else Is duplicate
        Client->>Client: Log skip and ignore
    end

    Note over Client,Worker: Resilience
    opt Connection Lost
        Client->>Worker: NEW: Auto-reconnect (partysocket backoff)
    end
```
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Code Review
This pull request introduces a new Cloud Relay Transport feature, enabling Agent Please to function behind NAT or firewalls by utilizing a Cloudflare Worker as an event relay. Key changes include adding a new relay-worker application, integrating the partysocket client into the core, and updating configuration and orchestration logic to support the new relay mode. Feedback suggests updating metadata.json to use "TBD" for empty issue/PR fields, correcting plan.md to remove a reference to a non-existent Nitro plugin file, adding error logging to a catch block in apps/relay-worker/src/relay-party.ts, removing a redundant length check in the verifySignature function, and re-evaluating the logic for calling triggerRefresh() when unparseable messages are received in packages/core/src/relay-transport.ts.
- Use resolveEnvValue for relay.url and relay.room in buildRelayConfig
- Use "TBD" instead of empty string for issue/pr in metadata.json
- Remove deleted 04.relay.ts from plan.md Key Files / Create section
- Log error in empty catch block for JSON.parse in relay-party.ts
- Remove triggerRefresh on unparseable messages in relay-transport.ts
0 issues found across 5 files (changes from recent commits).
Requires human review: This PR introduces a new architectural component (Cloudflare Worker), a new transport layer (WebSocket), and security-sensitive logic (HMAC/bearer token auth) requiring human review.
Add relay: { url: null, token: null, room: null, secret: null } to
all test files that construct ServiceConfig literals, fixing TS2741
type errors in CI type-check.
0 issues found across 4 files (changes from recent commits).
Requires human review: Significant feature addition introducing a new transport architecture (Cloudflare Workers + WebSockets), new dependencies, and changes to the core orchestrator and configuration logic.
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name=".claude/agent-memory/please-please-code-explorer/project_service_config_schema.md">
<violation number="1" location=".claude/agent-memory/please-please-code-explorer/project_service_config_schema.md:16">
P2: The new ServiceConfig docs incorrectly omit `relay` from `polling.mode`; this conflicts with the relay mode introduced by this PR.</violation>
</file>
P2: The new ServiceConfig docs incorrectly omit relay from polling.mode; this conflicts with the relay mode introduced by this PR.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .claude/agent-memory/please-please-code-explorer/project_service_config_schema.md, line 16:
<comment>The new ServiceConfig docs incorrectly omit `relay` from `polling.mode`; this conflicts with the relay mode introduced by this PR.</comment>
<file context>
@@ -0,0 +1,81 @@
+| `platforms` | `Record<string, PlatformConfig>` | Keyed map; union of GitHub/Slack/Asana |
+| `projects` | `ProjectConfig[]` | Array with per-project status mappings |
+| `channels` | `ChannelConfig[]` | Array with per-channel platform + associations |
+| `polling` | `{ mode: PollingMode, interval_ms: number }` | `mode: 'poll' | 'webhook'` |
+| `workspace` | `{ root: string, branch_prefix: string \| null }` | Path with `~` expansion |
+| `hooks` | `{ after_create, before_run, after_run, before_remove: string \| null, timeout_ms: number }` | Shell hook scripts |
</file context>



Summary
Add cloud relay transport as an alternative to webhook-based event delivery. Uses Cloudflare Workers + Durable Objects (PartyServer) as a persistent edge relay, enabling Agent Please to operate behind NAT/firewalls without public endpoints.
- `relay` polling mode alongside existing `poll` and `webhook` modes
- `RelayTransport` client using `partysocket` with auto-reconnect and event deduplication
- `apps/relay-worker/`: Cloudflare Worker with PartyServer for webhook ingress and WebSocket fan-out
- `relay.url`, `relay.token`, `relay.room`, `relay.secret` config fields with `$ENV_VAR` resolution

Architecture
Track
- `cloud-relay-20260325`
- `.please/docs/tracks/active/cloud-relay-20260325/spec.md`
- `.please/docs/tracks/active/cloud-relay-20260325/plan.md`

Test plan
- `relay-worker` type-checks with `tsc --noEmit`

Summary by cubic
Adds a cloud relay transport via Cloudflare Workers + PartyServer to deliver tracker webhooks over WebSocket, letting Agent Please run behind NAT/firewalls without a public endpoint. Also adds config/validation for the new relay mode and fixes missing `relay` in test `ServiceConfig` fixtures.

New Features
- `relay` mode with config `relay.url`, `relay.room`, `relay.token`, `relay.secret` (supports `$ENV_VAR` for all fields).
- `RelayTransport` client using `partysocket` with auto-reconnect and bounded event dedup; orchestrator manages lifecycle; skips unparseable messages.
- `apps/relay-worker` (`@pleaseai/relay-worker`) on Cloudflare using `partyserver` for webhook ingress and WebSocket fan-out, with HMAC (SHA-256) verification and bearer token auth; hibernation enabled; logs JSON parse errors.
- `relay.url` and `relay.room` are required when mode is `relay`.

Migration
- `wrangler deploy`, then point tracker webhooks to `https://<worker>/webhook/:room`.
- `WORKFLOW.md`: set `polling.mode: relay` and provide `relay.url` and `relay.room` (you can use `$ENV_VAR`; optional `relay.token`/`relay.secret` or `RELAY_TOKEN`/`RELAY_SECRET` envs).
- `poll`/`webhook` modes.

Written for commit 4de2148. Summary will update on new commits.