A fork of beam-cloud/beta9 for Agentosaurus
Enable external worker support with a new Go-based agent and unified inference routing, so you can run hybrid GPU workloads over SSH/Tailscale with clear docs and a multi-arch CI build pipeline.
Work in Progress - This fork is under active development as the compute infrastructure layer for Agentosaurus, a platform focused on democratizing GPU access for climate research and AI workloads within the European Union.
Build a distributed GPU compute platform that:
- Brings Your Own GPU: Connect any machine (cloud VMs, workstations, Mac Studios) to a unified compute pool
- Supports Heterogeneous Hardware: NVIDIA CUDA, Apple MPS, AMD ROCm (planned), Intel Arc (planned)
- Enables Serverless by Default: Scale to zero, pay only for compute time used
- Maintains EU Data Sovereignty: All data and compute remain within European infrastructure
- Prioritizes Carbon Efficiency: Verified renewable energy usage with transparent carbon reporting
                        TAILSCALE MESH VPN
        (Encrypted overlay network - 100.x.x.x addressing)
          ┌──────────────────────┬──────────────────────┐
          │                      │                      │
          ▼                      ▼                      ▼
┌───────────────────┐  ┌───────────────────┐  ┌───────────────────┐
│      GATEWAY      │  │     WORKER 1      │  │     WORKER 2      │
│    (OCI Cloud)    │  │     (Mac MPS)     │  │   (NVIDIA GPU)    │
│                   │  │                   │  │                   │
│  ┌─────────────┐  │  │  ┌─────────────┐  │  │  ┌─────────────┐  │
│  │   Gateway   ├──┼──┼──┤   b9agent   │  │  │  │   b9agent   │  │
│  │  :1993/94   │  │  │  │    :9999    │  │  │  │    :9999    │  │
│  └──────┬──────┘  │  │  └──────┬──────┘  │  │  └──────┬──────┘  │
│         │         │  │         │         │  │         │         │
│         ▼         │  │         ▼         │  │         ▼         │
│  ┌─────────────┐  │  │  ┌─────────────┐  │  │  ┌─────────────┐  │
│  │   k3s API   ├──┼──┼──┤   Ollama    │  │  │  │    vLLM     │  │
│  │  Scheduler  │  │  │  │   :11434    │  │  │  │    :8000    │  │
│  └─────────────┘  │  │  └─────────────┘  │  │  └─────────────┘  │
└───────────────────┘  └───────────────────┘  └───────────────────┘
    Control Plane          MPS Inference         CUDA Inference
- New Go-based Agent (b9agent): Includes TUI, persistent config, control API (start/stop/pull models), keepalive, and job monitoring.
- Unified Inference Routing: Gateway inference router, model registry, and OpenAI-compatible endpoints; enabled Tailscale and hostNetwork for external connectivity (a request sketch follows this list).
- External Worker API: Expanded machine API with register/keepalive and TTL-based lifecycle; added detailed API/docs for external workers and self-hosting.
- Python SDK: Added the beta9.inference module (chat/generate/embed) and test scripts.
- Flexible Configuration: External worker config supports a direct Redis host, external image registries/ports, CoreDNS override, and Podman-friendly k3d settings.
- Multi-Arch CI: New GitHub Action to build/push multi-arch images to registry.agentosaurus.com with improved sequencing and reliability.

- Registry Credentials: Create the registry-credentials secret and set ExternalImageRegistry / the runner registry in config.
- Networking: Configure TAILSCALE_AUTHKEY or a direct REDIS_HOST; expose NodePorts for Redis/S3/Registry; deploy the Gateway with hostNetwork: true.
- Agent Setup: Initialize and run b9agent, then set up the SSH tunnel (forward 1994; reverse 6443 if needed).
- K8s Config: Apply the CoreDNS and metrics-server manifests; update the k3d config for Podman compatibility.
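The unified routing exposes OpenAI-compatible endpoints on the Gateway. As a rough sketch of a client call over the mesh, assuming the conventional /v1/chat/completions path and the Gateway HTTP port 1994 shown above (verify both against your deployment):

# Sketch: call the Gateway's OpenAI-compatible route over the Tailscale mesh.
# The path follows the OpenAI convention; the port is the Gateway HTTP port
# from the diagram. Adjust both to match your deployment.
import requests

GATEWAY = "http://100.72.101.23:1994"  # Gateway Tailscale IP (example value)

resp = requests.post(
    f"{GATEWAY}/v1/chat/completions",
    json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Hello from the mesh!"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])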
Real-time terminal interface showing:
┌── Beta9 Agent: 1c1b50c8 ─────────────────────────────────────────────────┐
│ Status: READY │ Gateway: http://100.72.101.23:1994 │ Pool: external      │
│ CPU: 28.1% │ Memory: 78.1% │ GPUs: 0 │ Last Heartbeat: 25s ago           │
├──────────────────────────────────────────────────────────────────────────┤
│ WORKER PODS                                                              │
├──────────────────────────────────────────────────────────────────────────┤
│ No jobs yet                                                              │
├──────────────────────────────────────────────────────────────────────────┤
│ INFERENCE                                                                │
├──────────────────────────────────────────────────────────────────────────┤
│ Status: running │ Endpoint: 100.100.74.117:11434                         │
│ Models: gemma3:1b                                                        │
├──────────────────────────────────────────────────────────────────────────┤
│ LOGS                                                                     │
├──────────────────────────────────────────────────────────────────────────┤
│ 10:15:23 Control API listening on :9999                                  │
│ 10:15:24 Inference: starting Ollama...                                   │
│ 10:15:26 Inference: ready on :11434                                      │
└──────────────────────────────────────────────────────────────────────────┘
Press Ctrl+C to quit
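The TUI reflects what the agent's control API reports on :9999 (the test script shown later drives the same API when it sends the start-inference command). A rough sketch of such a call follows; the route and payload are hypothetical placeholders, so check pkg/agent/control.go for the actual API:

# Hypothetical sketch of driving the agent's control API on :9999.
# The endpoint path and JSON fields are illustrative only, not the real API.
import requests

AGENT = "http://100.100.74.117:9999"  # worker's Tailscale IP (example value)

# e.g. ask the agent to start its inference server with a given model
requests.post(f"{AGENT}/inference/start", json={"model": "gemma3:1b"}, timeout=30)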
- Go 1.21+ (for building the agent)
- Tailscale account and network
- Ollama (for Mac inference)
cd backend/beta9
go build ./cmd/b9agent/...

./b9agent init \
  --gateway <GATEWAY_TAILSCALE_IP>:1994 \
  --token <MACHINE_TOKEN> \
  --pool external

./b9agent

# Test inference pipeline
TEST_MODEL=llama3.2 ./backend/remote_servers/scripts/dgpu/test_inference.sh

Agent configuration is stored in ~/.b9agent/config.yaml:
gateway:
  host: "100.72.101.23"
  port: 1994
machine:
  id: "1c1b50c8"
  token: "<machine-token>"
  hostname: "100.100.74.117"
  pool: "external"
k3s:
  token: "<k3s-bearer-token>"

The inference module provides a lightweight client for inference endpoints:
from beta9 import inference
# Configure endpoint
inference.configure(host="100.100.74.117", port=11434)
# Chat completion
result = inference.chat(
model="llama3.2",
messages=[{"role": "user", "content": "Hello!"}]
)
print(result.content)
# Text generation
result = inference.generate(
model="llama3.2",
prompt="Once upon a time"
)
# Embeddings
embedding = inference.embed(
model="nomic-embed-text",
input="Hello world"
)
# List models
models = inference.list_models()

Run the inference test suite:
# Default model (llama3.2)
./backend/remote_servers/scripts/dgpu/test_inference.sh
# Custom model
TEST_MODEL=gemma3:1b ./backend/remote_servers/scripts/dgpu/test_inference.sh
# Custom host
BETA9_INFERENCE_HOST=100.100.74.117 ./backend/remote_servers/scripts/dgpu/test_inference.sh

Test output:
[0/6] Sending start-inference command to agent... ✓
[1/6] Testing health endpoint... ✓
[2/6] Checking model availability... ✓
[3/6] Testing chat via curl... ✓
[4/6] Testing Python SDK... ✓
[5/6] Testing latency (3 requests)... ✓
[6/6] Stopping inference server... ✓
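The same checks can be scripted with the SDK calls shown earlier. A minimal smoke-test sketch, with host, port, and model as placeholders for your worker:

# Minimal smoke test mirroring the shell script: reachability, chat, latency.
# Host, port, and model are placeholders; match them to your worker.
import time
from beta9 import inference

inference.configure(host="100.100.74.117", port=11434)

models = inference.list_models()           # reachability + model availability
print("models:", models)

result = inference.chat(
    model="gemma3:1b",
    messages=[{"role": "user", "content": "ping"}],
)
print("chat:", result.content)

start = time.time()
for _ in range(3):                          # rough latency over 3 requests
    inference.generate(model="gemma3:1b", prompt="hello")
print(f"avg latency: {(time.time() - start) / 3:.2f}s")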
- External worker registration and keepalive
- Apple Silicon (MPS) inference via Ollama
- Control API for inference lifecycle
- TUI dashboard with inference status and logs
- Python SDK for inference
- vLLM integration for NVIDIA GPUs
- SGLang integration for structured outputs
- Model routing based on hardware capabilities
- Automatic model format conversion (GGUF/Safetensors)
- Prometheus metrics export
- Carbon footprint tracking
- Rate limiting and quotas
- Multi-tenant isolation
- EU AI Act compliance reporting
- Model provenance tracking
- Audit logging
- SSO integration
beta9/
├── cmd/
│   └── b9agent/              # Go agent binary
│       └── main.go
├── pkg/
│   └── agent/
│       ├── agent.go          # Agent lifecycle management
│       ├── control.go        # HTTP control API
│       ├── inference.go      # OllamaManager and inference types
│       ├── state.go          # Agent state for TUI
│       └── tui.go            # Terminal UI rendering
├── sdk/
│   └── src/
│       └── beta9/
│           └── inference.py  # Python inference SDK
└── docs/
    └── external-workers/     # External worker documentation
- Agentosaurus - Organization discovery platform and parent project
- FlowState - AI presentation system using this compute layer
- beam-cloud/beta9 - Upstream project
This fork maintains the same AGPL-3.0 license as the upstream beta9 project.
Contributions are welcome. Please open an issue to discuss proposed changes before submitting a pull request.
- Beam Cloud for the original beta9 project
- Ollama for the inference server
- Tailscale for the mesh VPN infrastructure
Beta9 uses Tailscale for secure mesh networking between components.
- Network Isolation: All traffic between Gateway, Workers, and Clients travels over an encrypted WireGuard mesh.
- Endpoint Security:
  - Internal management endpoints (inference control, keepalives) are bound to 0.0.0.0 but are effectively protected because the nodes are only reachable via the private Tailscale network.
  - The Gateway uses hostNetwork: true to expose these services directly to the mesh.
  - Note: Do not expose the Gateway's ports (1993, 1994) to the public internet. Access should only be possible via the Tailscale mesh or a secure ingress (a quick reachability check is sketched below).
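To sanity-check that nothing is reachable outside the mesh, a small sketch that probes the relevant ports from a host that is not on the tailnet (the public IP is a placeholder for your gateway's WAN address):

# Quick exposure check: these connections should FAIL from a host outside
# the Tailscale mesh. PUBLIC_IP is a placeholder, not a real address.
import socket

def is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

PUBLIC_IP = "203.0.113.10"  # placeholder: the gateway's public address
for port in (1993, 1994, 11434, 8000):
    status = "EXPOSED (fix this!)" if is_open(PUBLIC_IP, port) else "not reachable (good)"
    print(f"{PUBLIC_IP}:{port} -> {status}")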
The following security enhancements are planned for the upcoming Security Epic but are currently mitigated by the network architecture:
- Inference Endpoint Auth (beta9-b1j): Direct inference endpoints (ports 11434/8000) currently do not require per-request authentication.
  - Mitigation: These ports are only accessible within the encrypted Tailscale mesh. Workers are isolated from the public internet.
- RBAC Scoping (beta9-7aw): The k3d manifest uses broad ClusterRole permissions for simplicity during beta.
  - Mitigation: The cluster is intended for single-tenant use. Scoped RBAC profiles will be introduced in Phase 3.
- Token Binding (beta9-bx0): Keepalive tokens are not strictly bound to machine IDs in the current implementation.
  - Mitigation: Tokens are secret and transmitted over encrypted channels. Token binding will be enforced in the next auth refactor.