feat(relay): add cloud relay transport via Cloudflare + PartyServer #216

Merged
amondnet merged 15 commits into main from amondnet/trapezoidal-ginger
Mar 25, 2026

@amondnet amondnet commented Mar 25, 2026

Summary

Add cloud relay transport as an alternative to webhook-based event delivery. Uses Cloudflare Workers + Durable Objects (PartyServer) as a persistent edge relay, enabling Agent Please to operate behind NAT/firewalls without public endpoints.

  • Add relay polling mode alongside existing poll and webhook modes
  • Add RelayTransport client using partysocket with auto-reconnect and event deduplication
  • Add apps/relay-worker/ — Cloudflare Worker with PartyServer for webhook ingress and WebSocket fan-out
  • Add relay config section (relay.url, relay.token, relay.room, relay.secret) with $ENV_VAR resolution
  • Add config validation for relay mode (require url and room)
  • HMAC SHA-256 signature verification on webhook ingress, bearer token auth on WebSocket connections
  • Hibernation enabled for zero-cost idle relay rooms
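
The HMAC SHA-256 check on webhook ingress listed above could look roughly like the following. This is a minimal sketch, not the worker's actual code: it uses Node's `node:crypto` module so it runs standalone (a Cloudflare Worker might use Web Crypto instead), and the helper names `signBody` and `verifyWebhookSignature`, as well as the `sha256=<hex>` header format, are assumptions made for illustration.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Hypothetical helper: hex-encoded HMAC SHA-256 digest of the raw webhook body.
export function signBody(rawBody: string, secret: string): string {
  return createHmac('sha256', secret).update(rawBody, 'utf8').digest('hex')
}

// Hypothetical verifier. The "sha256=<hex>" header format is an assumption
// (GitHub-style); the real worker may expect something different.
export function verifyWebhookSignature(
  rawBody: string,
  signatureHeader: string | null,
  secret: string,
): boolean {
  const prefix = 'sha256='
  if (signatureHeader === null || !signatureHeader.startsWith(prefix)) return false

  const expected = Buffer.from(signBody(rawBody, secret), 'hex')
  const received = Buffer.from(signatureHeader.slice(prefix.length), 'hex')

  // timingSafeEqual throws when lengths differ, so compare lengths first.
  if (received.length !== expected.length) return false
  return timingSafeEqual(received, expected)
}
```

The comparison is done over the raw request body, which is why the body should be read once as text and reused for both verification and JSON parsing.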

Architecture

Issue Tracker → HTTP webhook → Cloudflare Worker (PartyServer) → WebSocket → Agent Please (triggerRefresh)

Track

  • Track: cloud-relay-20260325
  • Spec: .please/docs/tracks/active/cloud-relay-20260325/spec.md
  • Plan: .please/docs/tracks/active/cloud-relay-20260325/plan.md

Test plan

  • All 776 core tests pass (14 new relay tests)
  • Lint clean across all workspaces
  • relay-worker type-checks with tsc --noEmit
  • Config parses relay section from YAML correctly
  • Config validates relay mode requires url and room
  • RelayTransport deduplicates events by event_id
  • Existing poll/webhook modes unaffected
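
The deduplication item in the test plan can be pictured with a small bounded cache keyed by `event_id`. The sketch below is only an illustration under stated assumptions: the class name `SeenEventIds`, the function `handleRelayMessage`, the default size of 1000, and the decision to ignore messages without a string `event_id` are all hypothetical and not taken from the PR's code.

```typescript
// Hypothetical bounded cache of recently seen event ids.
export class SeenEventIds {
  private readonly ids = new Set<string>()
  private readonly maxSize: number

  constructor(maxSize = 1000) {
    this.maxSize = maxSize
  }

  // Returns true when the id is new, false when it was already recorded.
  remember(id: string): boolean {
    if (this.ids.has(id)) return false
    this.ids.add(id)
    if (this.ids.size > this.maxSize) {
      // A Set iterates in insertion order, so the first value is the oldest.
      const oldest = this.ids.values().next().value
      if (oldest !== undefined) this.ids.delete(oldest)
    }
    return true
  }
}

export type RelayOutcome = 'refreshed' | 'duplicate' | 'skipped'

// Hypothetical message handler: parse, deduplicate, then trigger a refresh.
export function handleRelayMessage(
  raw: string,
  seen: SeenEventIds,
  triggerRefresh: () => void,
): RelayOutcome {
  let parsed: unknown
  try {
    parsed = JSON.parse(raw)
  } catch {
    return 'skipped' // unparseable messages do not trigger a refresh
  }
  if (typeof parsed !== 'object' || parsed === null) return 'skipped'

  const eventId = (parsed as { event_id?: unknown }).event_id
  // Assumption: messages without a string event_id are ignored.
  if (typeof eventId !== 'string') return 'skipped'

  if (!seen.remember(eventId)) return 'duplicate'
  triggerRefresh()
  return 'refreshed'
}
```

Bounding the cache keeps memory flat on long-lived connections; the trade-off is that an id evicted from the cache would be treated as new if it were ever redelivered.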

Summary by cubic

Adds a cloud relay transport via Cloudflare Workers + PartyServer to deliver tracker webhooks over WebSocket, letting Agent Please run behind NAT/firewalls without a public endpoint. Also adds config/validation for the new relay mode and fixes missing relay in test ServiceConfig fixtures.

  • New Features

    • New polling mode: relay with config relay.url, relay.room, relay.token, relay.secret (supports $ENV_VAR for all fields).
    • RelayTransport client using partysocket with auto-reconnect and bounded event dedup; orchestrator manages lifecycle; skips unparseable messages.
    • New apps/relay-worker (@pleaseai/relay-worker) on Cloudflare using partyserver for webhook ingress and WebSocket fan-out, with HMAC (SHA-256) verification and bearer token auth; hibernation enabled; logs JSON parse errors.
    • Config validation requires relay.url and relay.room when mode is relay.
  • Migration

    • Deploy the worker with wrangler deploy, then point tracker webhooks to https://<worker>/webhook/:room.
    • In WORKFLOW.md, set polling.mode: relay and provide relay.url and relay.room (you can use $ENV_VAR; optional relay.token/relay.secret or RELAY_TOKEN/RELAY_SECRET envs).
    • No changes needed for existing poll/webhook modes.
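
The validation rule in the feature list above (relay mode requires `relay.url` and `relay.room`) can be sketched as follows. The error codes `missing_relay_url` and `missing_relay_room` come from this PR's commit messages; everything else, including the `RelaySlice` type, the `validateRelay` name, and returning an array of errors, is an assumption made to keep the example self-contained.

```typescript
export type PollingMode = 'poll' | 'webhook' | 'relay'

export type RelayValidationError =
  | { code: 'missing_relay_url' }
  | { code: 'missing_relay_room' }

// Hypothetical slice of the config: only the fields this check needs.
export interface RelaySlice {
  polling: { mode: PollingMode }
  relay: { url: string | null, room: string | null }
}

// Hypothetical validator: relay fields are only required in relay mode.
export function validateRelay(config: RelaySlice): RelayValidationError[] {
  if (config.polling.mode !== 'relay') return []

  const errors: RelayValidationError[] = []
  if (!config.relay.url) errors.push({ code: 'missing_relay_url' })
  if (!config.relay.room) errors.push({ code: 'missing_relay_room' })
  return errors
}
```

Because the check is skipped for other modes, existing poll and webhook configurations would be unaffected by a rule shaped like this.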

Written for commit 4de2148. Summary will update on new commits.

amondnet added 11 commits March 25, 2026 23:18

Add cloud relay transport track for Cloudflare Workers + PartyServer
as an alternative to webhook-based event delivery. Includes spec
with 8 FRs, plan with 11 tasks covering relay worker package,
partysocket client, config extension, and orchestrator integration.

Add 'relay' as a third PollingMode alongside 'poll' and 'webhook'.
Add RelayConfig interface (url, token, room, secret) to ServiceConfig.
Add buildRelayConfig() with $ENV_VAR resolution for token and secret.
Extend pollingModeValue() to recognize 'relay' mode.

Add RelayTransport class using partysocket for auto-reconnecting
WebSocket connection to cloud relay. Integrate into Orchestrator
start/stop lifecycle: when polling.mode is 'relay', creates and
connects transport that calls triggerRefresh() on incoming events.
Export RelayTransport and RelayConfig from core barrel.

Scaffold packages/relay-worker/ as a new workspace for the cloud
relay. Implements RelayParty server class with:
- onRequest: webhook ingress with HMAC SHA-256 signature verification
- onConnect: bearer token authentication via query parameter
- broadcast: fan-out webhook events to all connected clients
- Hibernation enabled for zero-cost idle rooms

Worker entry point routes /webhook/:room to the party's onRequest
and /parties/relay-party/:room for WebSocket connections.

Add validateConfig checks for relay mode: require relay.url and
relay.room when polling.mode is 'relay'. Add Nitro plugin
(04.relay.ts) that creates and manages RelayTransport lifecycle
when relay mode is configured.

Fix unused import (spyOn), unused variable (event → _event),
and sort order issues in barrel exports and imports.

Add partysocket and partyserver to tech stack dependencies.
Add relay-worker to project structure. Add cloud relay transport
to product capabilities.

…Error

Address spec compliance issues:
- NFR-1: Add event_id to relay broadcast envelope and bounded
  dedup cache in RelayTransport to prevent duplicate processing
  after WebSocket reconnection
- Fix relay-worker TypeScript: await getServerByName(), cast env
  for routePartykitRequest()
- Extend ValidationError union with missing_relay_url and
  missing_relay_room codes

Remove apps/agent-please/server/plugins/04.relay.ts; the
orchestrator already creates and manages RelayTransport in its
start()/stop() lifecycle. The Nitro plugin was creating a second
WebSocket connection to the same relay room, causing double
triggerRefresh() calls per webhook event.

Also optimize relay-party.ts to read request body once and reuse
for both signature verification and JSON parsing.

Move relay-worker from packages/ to apps/ since it's an independently
deployable application, not a shared library. Update partyserver from
0.0.66 to 0.3.3, fix Env type with index signature and typed
DurableObjectNamespace<RelayParty> for v0.3.x compatibility.
Update all documentation references.
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the deployment flexibility of Agent Please by introducing a cloud-based event relay system. This new architecture allows the agent to receive issue tracker events via WebSockets from a Cloudflare Worker, eliminating the previous requirement for a publicly accessible HTTP endpoint. This change is particularly beneficial for local development environments and deployments behind restrictive firewalls, simplifying setup and improving operational resilience without altering the core event processing logic.

Highlights

  • Cloud Relay Transport: Introduced a new relay polling mode, offering an alternative to webhook-based event delivery for Agent Please instances operating behind NAT/firewalls.
  • Cloudflare Worker Integration: Added a dedicated Cloudflare Worker (apps/relay-worker/) utilizing PartyServer and Durable Objects to act as a persistent edge relay for webhook ingress and WebSocket event fan-out.
  • Client-Side Relay Transport: Implemented a RelayTransport client in packages/core/ using partysocket for WebSocket connections, featuring auto-reconnect and event deduplication.
  • Configuration and Validation: Extended the configuration to include a relay section (relay.url, relay.token, relay.room, relay.secret) with environment variable resolution and added validation to ensure required fields are present when relay mode is active.
  • Security Enhancements: Incorporated HMAC SHA-256 signature verification for webhook ingress and bearer token authentication for WebSocket connections to the relay worker.
  • Cost Optimization: Enabled hibernation for idle relay rooms within the Cloudflare Worker, ensuring zero compute cost when not actively in use.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@cubic-dev-ai cubic-dev-ai bot left a comment

0 issues found across 1 file (changes from recent commits).

Requires human review: Significant feature adding a new architectural component (Cloudflare Worker), new production dependencies, and modifying core orchestrator lifecycle and configuration logic.

codecov bot commented Mar 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 21 files

Confidence score: 3/5

  • There is a concrete configuration risk in packages/core/src/config.ts: relay.url and relay.room are not using resolveEnvValue, so $RELAY_URL/$RELAY_ROOM can remain literal strings instead of resolving to env values or null when missing.
  • This is moderate merge risk (severity 6/10 with high confidence): validation may incorrectly pass while runtime relay settings are effectively misconfigured, which can cause user-facing connection/room behavior issues.
  • Pay close attention to packages/core/src/config.ts - ensure relay fields resolve env placeholders consistently so missing variables fail in a detectable way.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/src/config.ts">

<violation number="1" location="packages/core/src/config.ts:342">
P2: Use resolveEnvValue for relay.url and relay.room so $ENV_VAR placeholders are resolved (and missing envs surface as null). As written, "$RELAY_URL"/"$RELAY_ROOM" stays literal and validation will pass even though the relay can’t connect.</violation>
</file>
Architecture diagram
sequenceDiagram
    participant Tracker as Issue Tracker (GitHub/Asana)
    participant Worker as Relay Worker (CF Worker)
    participant DO as RelayParty (Durable Object)
    participant Client as RelayTransport (Agent Core)
    participant Orch as Orchestrator

    Note over Client,Orch: Initialization (Polling Mode: 'relay')
    Orch->>Client: NEW: new RelayTransport(config)
    Client->>Worker: NEW: WSS Connect /parties/relay/:room?token=xxx
    Worker->>DO: Route request to room
    
    alt Invalid Token
        DO-->>Client: Close connection (4001 Unauthorized)
    else Valid Token
        DO-->>Client: Message: { type: 'connected' }
    end

    Note over Tracker,Orch: Runtime Event Flow
    Tracker->>Worker: POST /webhook/:room (with HMAC Signature)
    Worker->>DO: NEW: Forward fetch request
    
    alt HMAC Signature Verification
        DO->>DO: NEW: verifySignature(body, secret)
        alt Invalid Signature
            DO-->>Worker: 401 Unauthorized
            Worker-->>Tracker: 401 Unauthorized
        else Valid Signature
            DO->>DO: NEW: Create event envelope with randomUUID
            DO->>Client: NEW: Broadcast (WebSocket Message)
            DO-->>Worker: 200 OK (accepted: true)
            Worker-->>Tracker: 200 OK
        end
    end

    Note over Client,Orch: Client Processing
    Client->>Client: NEW: JSON.parse(data)
    
    opt NEW: Duplicate Check
        Client->>Client: Check event_id in seenEventIds (Set)
    end
    
    alt NEW: Not a duplicate
        Client->>Orch: CHANGED: triggerRefresh()
        Orch->>Orch: Start workflow reconcile tick
    else Is duplicate
        Client->>Client: Log skip and ignore
    end

    Note over Client,Worker: Resilience
    opt Connection Lost
        Client->>Worker: NEW: Auto-reconnect (partysocket backoff)
    end

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new Cloud Relay Transport feature, enabling Agent Please to function behind NAT or firewalls by utilizing a Cloudflare Worker as an event relay. Key changes include adding a new relay-worker application, integrating the partysocket client into the core, and updating configuration and orchestration logic to support the new relay mode. Feedback suggests updating metadata.json to use "TBD" for empty issue/PR fields, correcting plan.md to remove a reference to a non-existent Nitro plugin file, adding error logging to a catch block in apps/relay-worker/src/relay-party.ts, removing a redundant length check in the verifySignature function, and re-evaluating the logic for calling triggerRefresh() when unparseable messages are received in packages/core/src/relay-transport.ts.

@amondnet amondnet self-assigned this Mar 25, 2026
- Use resolveEnvValue for relay.url and relay.room in buildRelayConfig
- Use "TBD" instead of empty string for issue/pr in metadata.json
- Remove deleted 04.relay.ts from plan.md Key Files / Create section
- Log error in empty catch block for JSON.parse in relay-party.ts
- Remove triggerRefresh on unparseable messages in relay-transport.ts
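
The first fix in the list above, resolving `$ENV_VAR` placeholders for every relay field, might look something like this. It is a sketch under assumptions: `resolveEnvValue` and `buildRelayConfig` are names from this PR, but the signatures shown here (including passing `env` explicitly) and the exact placeholder rules are hypothetical. The `string | null` field types follow the test fixture `relay: { url: null, token: null, room: null, secret: null }`.

```typescript
export interface RelayConfig {
  url: string | null
  token: string | null
  room: string | null
  secret: string | null
}

type Env = Record<string, string | undefined>

// Hypothetical resolver: "$NAME" is looked up in env, plain strings pass
// through, and anything missing or empty becomes null.
export function resolveEnvValue(value: unknown, env: Env): string | null {
  if (typeof value !== 'string') return null
  const trimmed = value.trim()
  if (trimmed === '') return null
  if (!trimmed.startsWith('$')) return trimmed
  const resolved = env[trimmed.slice(1)]
  return resolved ? resolved : null
}

// Hypothetical builder: every relay field goes through the same resolver,
// so an unset variable surfaces as null instead of a literal "$RELAY_URL".
export function buildRelayConfig(raw: Record<string, unknown>, env: Env): RelayConfig {
  return {
    url: resolveEnvValue(raw.url, env),
    token: resolveEnvValue(raw.token, env),
    room: resolveEnvValue(raw.room, env),
    secret: resolveEnvValue(raw.secret, env),
  }
}
```

With this shape, a config that sets `url: $RELAY_URL` while the variable is unset yields `url: null`, which a required-field check can then report instead of letting a literal placeholder through.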

@cubic-dev-ai cubic-dev-ai bot left a comment

0 issues found across 5 files (changes from recent commits).

Requires human review: This PR introduces a new architectural component (Cloudflare Worker), a new transport layer (WebSocket), and security-sensitive logic (HMAC/bearer token auth) requiring human review.

Add relay: { url: null, token: null, room: null, secret: null } to
all test files that construct ServiceConfig literals, fixing TS2741
type errors in CI type-check.

@cubic-dev-ai cubic-dev-ai bot left a comment

0 issues found across 4 files (changes from recent commits).

Requires human review: Significant feature addition introducing a new transport architecture (Cloudflare Workers + WebSockets), new dependencies, and changes to the core orchestrator and configuration logic.

@amondnet amondnet enabled auto-merge (squash) March 25, 2026 17:56
@amondnet amondnet merged commit b438b71 into main Mar 25, 2026
6 checks passed
@amondnet amondnet deleted the amondnet/trapezoidal-ginger branch March 25, 2026 17:57
@pleaeai-bot pleaeai-bot bot mentioned this pull request Mar 25, 2026

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".claude/agent-memory/please-please-code-explorer/project_service_config_schema.md">

<violation number="1" location=".claude/agent-memory/please-please-code-explorer/project_service_config_schema.md:16">
P2: The new ServiceConfig docs incorrectly omit `relay` from `polling.mode`; this conflicts with the relay mode introduced by this PR.</violation>
</file>

| `platforms` | `Record<string, PlatformConfig>` | Keyed map; union of GitHub/Slack/Asana |
| `projects` | `ProjectConfig[]` | Array with per-project status mappings |
| `channels` | `ChannelConfig[]` | Array with per-channel platform + associations |
| `polling` | `{ mode: PollingMode, interval_ms: number }` | `mode: 'poll' | 'webhook'` |

@cubic-dev-ai cubic-dev-ai bot Mar 25, 2026

P2: The new ServiceConfig docs incorrectly omit relay from polling.mode; this conflicts with the relay mode introduced by this PR.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .claude/agent-memory/please-please-code-explorer/project_service_config_schema.md, line 16:

<comment>The new ServiceConfig docs incorrectly omit `relay` from `polling.mode`; this conflicts with the relay mode introduced by this PR.</comment>

<file context>
@@ -0,0 +1,81 @@
+| `platforms` | `Record<string, PlatformConfig>` | Keyed map; union of GitHub/Slack/Asana |
+| `projects` | `ProjectConfig[]` | Array with per-project status mappings |
+| `channels` | `ChannelConfig[]` | Array with per-channel platform + associations |
+| `polling` | `{ mode: PollingMode, interval_ms: number }` | `mode: 'poll' | 'webhook'` |
+| `workspace` | `{ root: string, branch_prefix: string \| null }` | Path with `~` expansion |
+| `hooks` | `{ after_create, before_run, after_run, before_remove: string \| null, timeout_ms: number }` | Shell hook scripts |
</file context>