
feat: SSH transport for local sandboxed deployments #104

@howie


Background

OpenAB targets k3s in the cloud, where Kubernetes NetworkPolicy and Pod isolation handle security. For local deployments (a developer laptop, a home server), the only path today is:

[agent]
command = "claude"
args = ["--acp"]  # full host permissions

The Claude subprocess inherits the host's full filesystem and network access. For a Discord bot accepting messages from arbitrary users, this is a meaningful attack surface.

Proposal: SSH as a zero-code-change transport

AcpConnection::spawn() treats the agent as a stdio JSON-RPC process. SSH is a transparent byte pipe over that same stdio — no changes to ACP protocol, SessionPool, or AcpConnection internals.

# Current (local, no isolation)
[agent]
command = "claude"
args = ["--acp"]

# Proposed (SSH to sandbox)
[agent]
command = "ssh"
args = [
  "-T",                                     # No PTY (see below)
  "-o", "BatchMode=yes",                    # Fail-fast, no interactive prompts
  "-o", "ServerAliveInterval=30",           # Keep-alive for long sessions
  "-o", "ServerAliveCountMax=3",
  "-o", "StrictHostKeyChecking=accept-new", # Daemon has no terminal
  "user@sandbox-host",
  "claude", "--acp"
]

That's the entire diff from OpenAB's perspective: zero code changes, zero new dependencies, zero additional maintenance burden. If SSH transport breaks, it's a user configuration issue, not an OpenAB bug.
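The transport-agnostic spawn model is easy to see in miniature. A sketch (a Python stand-in, not OpenAB's Rust internals) where `cat` plays the role of the agent — any command that speaks newline-delimited JSON-RPC over stdio works, whether it is `claude --acp` directly or `ssh -T ... claude --acp`:

```python
import json
import subprocess

# `cat` stands in for the agent process: it echoes each frame back unchanged,
# which is enough to show that the parent only ever sees a stdio byte pipe.
proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

request = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "initialize"})
proc.stdin.write((request + "\n").encode())
proc.stdin.flush()

echoed = json.loads(proc.stdout.readline())  # one frame per line

proc.stdin.close()
proc.wait()
```

Swapping `["cat"]` for the SSH argv above changes nothing on this side of the pipe, which is the whole point of the proposal.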

Current                              Proposed
───────────────────────              ──────────────────────────────
OpenAB                               OpenAB
  │ spawn                              │ spawn
  ▼                                    ▼
claude (host permissions)            ssh -T user@sandbox
  ├─ reads ~/.ssh ✗                    │ encrypted stdio pipe
  ├─ reads ~/Documents ✗               ▼
  └─ unrestricted network ✗          claude (inside sandbox)
                                       ├─ Landlock: /workspace only ✓
                                       ├─ Network: allowlist only ✓
                                       └─ MCP via host proxy ✓

Why -T is critical (experimentally verified)

I tested SSH stdio with an OrbStack VM. Results:

Flag  Behavior                                                  JSON-RPC safe?
────  ────────────────────────────────────────────────────────  ──────────────────────
-T    Clean byte pipe, stderr separated                         ✅ Yes
-t    Warns "PTY not allocated", stderr leaks into stdout       ❌ Corrupts JSON stream
-tt   Forced PTY + piped stdin → hangs (exit 144/SIGKILL)       ❌ Deadlock

A PTY rewrites line endings (\n → \r\n), merges stderr into stdout, and enables echo mode, all of which corrupt the JSON-RPC stream. -T is mandatory, not optional.
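The stderr-merging failure mode is reproducible without SSH at all. A sketch (hypothetical parser, not OpenAB's actual framing code) of strict newline-delimited JSON-RPC parsing, fed a stream where the PTY warning has leaked into stdout by hand:

```python
import json

def parse_frames(raw: bytes) -> list[dict]:
    """Strict newline-delimited framing: every non-empty line must be a
    JSON document, so any leaked warning text kills the whole stream."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]

clean = b'{"jsonrpc":"2.0","id":1,"result":{}}\n'
# With -t, the PTY warning ends up on the merged stdout ahead of the frame
# (warning text as observed with OpenSSH; injected manually here):
dirty = (
    b"Pseudo-terminal will not be allocated because stdin is not a terminal.\r\n"
    + clean
)

frames = parse_frames(clean)  # one valid frame
try:
    parse_frames(dirty)
    corrupted = False
except json.JSONDecodeError:
    corrupted = True          # the warning line is not JSON
```

One stray line is enough: the parser cannot resynchronize, so the whole session is lost, not just one message.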

Sandbox is user's choice

The proposal is about SSH as a transport, not any specific sandbox:

Environment     SSH target              Notes
──────────────  ─────────────────────   ───────────────────────────────────────
Mac (OrbStack)  vm-name@orb             Via ~/.orbstack/ssh/config ProxyCommand
Linux           user@nspawn-container   systemd-nspawn with SSH
Remote machine  user@10.0.0.5           Any Linux server
Docker          (wrapper script)        docker exec instead of SSH
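For the Docker row, a wrapper script isn't strictly needed; the same config pattern works directly (hypothetical container name, assuming claude is installed in the container):

```toml
# Docker variant: `docker exec` provides the stdio pipe instead of SSH.
# -i keeps stdin open for JSON-RPC; note the absence of -t
# (no PTY, for the same reason as ssh -T).
[agent]
command = "docker"
args = ["exec", "-i", "claude-sandbox", "claude", "--acp"]
```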

MCP server access from sandbox (experimentally verified)

Tested from OrbStack VM:

From VM → 127.0.0.1:18765      → FAIL ❌  (VM's localhost ≠ host's localhost)
From VM → host.internal:18765  → 200 OK ✅ (OrbStack DNS alias for the host)

For MCP servers running on the host, the sandbox cannot use localhost. Options vary by sandbox technology:

MCP access patterns:

Option A: Host DNS alias (OrbStack)
  claude (VM) ──http://host.internal:PORT──> MCP server (host)

Option B: SSH port forwarding (universal)
  ssh -L 8080:localhost:8080 user@sandbox
  claude (VM) ──http://localhost:8080──> [tunnel] ──> MCP server (host)

Option C: Network bridge (Docker --network host)
  claude (container) ──http://localhost:PORT──> MCP server (host)

This means web search from inside a sandbox works without opening the sandbox to arbitrary domains:

claude (sandbox) ──host.internal:8080──> MCP search proxy (host) ──HTTPS──> Brave/Tavily
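The host-side proxy can be very small. A sketch (hypothetical, with the upstream Brave/Tavily call stubbed out so the shape is self-contained; a real deployment would bind 0.0.0.0 on the chosen port and hold the API key on the host):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def search_upstream(query: str) -> list[str]:
    # Stand-in for the real HTTPS call to a search API. The key property:
    # only this host process needs outbound network access, not the sandbox.
    return [f"stub result for {query!r}"]

class SearchProxyHandler(BaseHTTPRequestHandler):
    """The sandboxed agent POSTs {"query": ...} to host.internal:PORT;
    this handler answers with JSON results."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        payload = json.dumps({"results": search_upstream(body["query"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the daemon's stdout quiet

# Ephemeral port for the demo; a real deployment would use a fixed port
# (e.g. 18765) and call server.serve_forever().
server = HTTPServer(("127.0.0.1", 0), SearchProxyHandler)
```

The sandbox's network allowlist then only needs to admit the host alias and port, nothing else.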

Known limitations

1. kill_on_drop does not reliably terminate remote processes

Experimentally verified: killing the local SSH client process leaves the remote subprocess running.

kill ssh-client → SSH server receives EOF → sends SIGHUP to remote shell
                                          → but remote claude may survive
                                            (especially with nohup or ControlMaster)

Mitigations:

  • Do not use SSH ControlMaster for agent connections
  • Ensure SSH server has ClientAliveInterval set (detects dead clients)
  • Session pool TTL cleanup should verify remote process health
  • Future: OpenAB could send an explicit ACP session/close before dropping the connection
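The TTL-cleanup mitigation can be sketched as a liveness probe: open a fresh non-interactive SSH connection and run pgrep on the sandbox, treating exit 0 as "agent still running". Hypothetical helper names; the argv builder is separated from execution so the logic is testable without a real sandbox:

```python
import subprocess

def probe_command(target: str, pattern: str = "claude --acp") -> list[str]:
    """Build the health-probe argv: a fresh BatchMode SSH connection
    running pgrep on the sandbox. Exit 0 = at least one matching process."""
    return [
        "ssh", "-T",
        "-o", "BatchMode=yes",
        "-o", "ConnectTimeout=5",
        target,
        "pgrep", "-f", pattern,
    ]

def process_alive(argv: list[str]) -> bool:
    # Run any probe argv and map its exit code to a boolean.
    return subprocess.run(argv, capture_output=True).returncode == 0

# Pool cleanup sketch: if not process_alive(probe_command("user@sandbox-host")),
# drop the session rather than reusing a half-dead connection.
```

This also catches the inverse failure: a live SSH client whose remote agent has died.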

2. SSH connection startup latency

Each AcpConnection::spawn() incurs SSH handshake overhead (~50-200ms). Negligible for long-lived sessions (pool TTL = 24h), but noticeable if sessions are frequently recycled. ControlMaster could reduce this but conflicts with limitation #1.

3. SSH key auth is required

OpenAB runs as a daemon without a terminal. Interactive password prompts will hang the process. -o BatchMode=yes forces fail-fast behavior. Users must configure SSH key-based auth beforehand.

Scope

Intentionally narrow:

  • ✅ Document SSH as a supported command pattern with config example and SSH flag rationale
  • ✅ Add config.toml.example snippet for the SSH sandbox case
  • ❌ No changes to AcpConnection, SessionPool, or ACP protocol
  • ❌ No sandbox-specific code or dependencies
  • ❌ No changes to cloud/k3s deployment path

If there's interest in a first-class [agent.backend] abstraction later, that can be a separate discussion.

Relation to #99

#99 addresses the input side — how prompts reach OpenAB (Discord vs HTTP).
This issue addresses the execution side — where and with what permissions the agent runs.

The two are orthogonal and can be implemented independently.
