Beta9 - Distributed GPU Compute Platform

A fork of beam-cloud/beta9 for Agentosaurus

This fork adds external worker support through a new Go-based agent and unified inference routing, so hybrid GPU workloads can run over SSH/Tailscale, with accompanying documentation and a multi-arch CI build pipeline.

Project Status

Work in Progress - This fork is under active development as the compute infrastructure layer for Agentosaurus, a platform focused on democratizing GPU access for climate research and AI workloads within the European Union.

Vision

Build a distributed GPU compute platform that:

  • Brings Your Own GPU: Connect any machine (cloud VMs, workstations, Mac Studios) to a unified compute pool
  • Supports Heterogeneous Hardware: NVIDIA CUDA, Apple MPS, AMD ROCm (planned), Intel Arc (planned)
  • Enables Serverless by Default: Scale to zero, pay only for compute time used
  • Maintains EU Data Sovereignty: All data and compute remain within European infrastructure
  • Prioritizes Carbon Efficiency: Verified renewable energy usage with transparent carbon reporting

Architecture

                          TAILSCALE MESH VPN
            (Encrypted overlay network - 100.x.x.x addressing)
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚                                                             β”‚
    β–Ό                           β–Ό                           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   GATEWAY       β”‚     β”‚   WORKER 1      β”‚     β”‚   WORKER 2      β”‚
β”‚   (OCI Cloud)   β”‚     β”‚   (Mac MPS)     β”‚     β”‚   (NVIDIA GPU)  β”‚
β”‚                 β”‚     β”‚                 β”‚     β”‚                 β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚     β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚     β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ Gateway   │◄─┼─────┼──│ b9agent   β”‚  β”‚     β”‚  β”‚ b9agent   β”‚  β”‚
β”‚  β”‚ :1993/94  β”‚  β”‚     β”‚  β”‚ :9999     β”‚  β”‚     β”‚  β”‚ :9999     β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚     β”‚  β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β”‚     β”‚  β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜  β”‚
β”‚        β”‚        β”‚     β”‚        β”‚        β”‚     β”‚        β”‚        β”‚
β”‚        β–Ό        β”‚     β”‚        β–Ό        β”‚     β”‚        β–Ό        β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚     β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚     β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚ k3s API   │◄─┼─────┼──│ Ollama    β”‚  β”‚     β”‚  β”‚ vLLM      β”‚  β”‚
β”‚  β”‚ Scheduler β”‚  β”‚     β”‚  β”‚ :11434    β”‚  β”‚     β”‚  β”‚ :8000     β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚     β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚     β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
     Control Plane           MPS Inference         CUDA Inference

New Features in this Fork

  • New Go-based Agent (b9agent): Includes TUI, persistent config, control API (start/stop/pull models), keepalive, and job monitoring.
  • Unified Inference Routing: Gateway inference router, model registry, and OpenAI-compatible endpoints (see the example after this list); enabled Tailscale and hostNetwork for external connectivity.
  • External Worker API: Expanded machine API with register/keepalive and TTL-based lifecycle; added detailed API/docs for external workers and self-hosting.
  • Python SDK: Added beta9.inference module (chat/generate/embed) and test scripts.
  • Flexible Configuration: External worker config supports direct Redis host, external image registries/ports, CoreDNS override, and Podman-friendly k3d settings.
  • Multi-Arch CI: New GitHub Action to build/push multi-arch images to registry.agentosaurus.com with improved sequencing and reliability.
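
As an illustration of the inference routing, the sketch below sends a chat request to the gateway with curl. The gateway address and port reuse values shown elsewhere in this README, and the /v1/chat/completions path and bearer-token header assume the router follows the standard OpenAI wire format; treat all of them as placeholders rather than a confirmed contract.

# Illustrative only: chat against the gateway's OpenAI-compatible router.
# Address, port, token, and path are assumptions based on the OpenAI format.
curl -s http://<GATEWAY_TAILSCALE_IP>:1994/v1/chat/completions \
  -H "Authorization: Bearer <MACHINE_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Hello!"}]}'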

Migration Guide for Existing Users

  1. Registry Credentials: Create the registry-credentials secret (see the sketch after this list) and set ExternalImageRegistry / runner registry in config.
  2. Networking: Configure TAILSCALE_AUTHKEY or direct REDIS_HOST; expose NodePorts for Redis/S3/Registry; deploy Gateway with hostNetwork: true.
  3. Agent Setup: Initialize and run b9agent, then set up SSH tunnel (forward 1994; reverse 6443 if needed).
  4. K8s Config: Apply CoreDNS and metrics-server manifests; update k3d config for Podman compatibility.
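
The commands below sketch steps 1 and 3 in shell form. The namespace, registry server, credentials, and tunnel direction are assumptions to adapt to your deployment; only the secret name and the ports (forward 1994, reverse 6443) come from the steps above.

# Step 1 (illustrative): create the registry-credentials secret.
# Namespace, server, and credentials are placeholders.
kubectl create secret docker-registry registry-credentials \
  --namespace beta9 \
  --docker-server=registry.agentosaurus.com \
  --docker-username=<USER> \
  --docker-password=<PASSWORD>

# Step 3 (illustrative): SSH tunnel from the worker to the gateway host,
# forwarding the gateway port locally and exposing 6443 back if needed.
# Which side initiates, and therefore which flag is -L vs -R, depends on
# your topology.
ssh -N \
  -L 1994:localhost:1994 \
  -R 6443:localhost:6443 \
  <USER>@<GATEWAY_HOST>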

TUI Dashboard

Real-time terminal interface showing:

╔══ Beta9 Agent: 1c1b50c8 ═══════════════════════════════════════════════╗
β•‘ Status: READY β”‚ Gateway: http://100.72.101.23:1994 β”‚ Pool: external    β•‘
β•‘ CPU: 28.1% β”‚ Memory: 78.1% β”‚ GPUs: 0 β”‚ Last Heartbeat: 25s ago         β•‘
╠════════════════════════════════════════════════════════════════════════╣
β•‘ WORKER PODS                                                            β•‘
β•Ÿβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β•’
β•‘ No jobs yet                                                            β•‘
╠════════════════════════════════════════════════════════════════════════╣
β•‘ INFERENCE                                                              β•‘
β•Ÿβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β•’
β•‘ Status: running β”‚ Endpoint: 100.100.74.117:11434                       β•‘
β•‘ Models: gemma3:1b                                                      β•‘
╠════════════════════════════════════════════════════════════════════════╣
β•‘ LOGS                                                                   β•‘
β•Ÿβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β•’
β•‘ 10:15:23 Control API listening on :9999                                β•‘
β•‘ 10:15:24 Inference: starting Ollama...                                 β•‘
β•‘ 10:15:26 Inference: ready on :11434                                    β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•
Press Ctrl+C to quit

Quick Start

Prerequisites

  • Go 1.21+ (for building the agent)
  • Tailscale account and network
  • Ollama (for Mac inference)
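
A quick sanity check of these prerequisites might look like:

# Optional: verify the prerequisites are in place.
go version           # expect go1.21 or newer
tailscale status     # confirms this machine is on the tailnet
ollama --version     # only needed for Mac (MPS) inference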

1. Build the Agent

cd backend/beta9
go build ./cmd/b9agent/...

2. Initialize Configuration

./b9agent init \
  --gateway <GATEWAY_TAILSCALE_IP>:1994 \
  --token <MACHINE_TOKEN> \
  --pool external

3. Start the Agent

./b9agent

4. Test Inference (from remote machine)

# Test inference pipeline
TEST_MODEL=llama3.2 ./backend/remote_servers/scripts/dgpu/test_inference.sh

Configuration

Agent configuration is stored in ~/.b9agent/config.yaml:

gateway:
  host: "100.72.101.23"
  port: 1994
machine:
  id: "1c1b50c8"
  token: "<machine-token>"
  hostname: "100.100.74.117"
pool: "external"
k3s:
  token: "<k3s-bearer-token>"

Python SDK

The inference module provides a lightweight client for inference endpoints:

from beta9 import inference

# Configure endpoint
inference.configure(host="100.100.74.117", port=11434)

# Chat completion
result = inference.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(result.content)

# Text generation
result = inference.generate(
    model="llama3.2",
    prompt="Once upon a time"
)

# Embeddings
embedding = inference.embed(
    model="nomic-embed-text",
    input="Hello world"
)

# List models
models = inference.list_models()

Testing

Run the inference test suite:

# Default model (llama3.2)
./backend/remote_servers/scripts/dgpu/test_inference.sh

# Custom model
TEST_MODEL=gemma3:1b ./backend/remote_servers/scripts/dgpu/test_inference.sh

# Custom host
BETA9_INFERENCE_HOST=100.100.74.117 ./backend/remote_servers/scripts/dgpu/test_inference.sh

Test output:

[0/6] Sending start-inference command to agent... βœ“
[1/6] Testing health endpoint...                  βœ“
[2/6] Checking model availability...              βœ“
[3/6] Testing chat via curl...                    βœ“
[4/6] Testing Python SDK...                       βœ“
[5/6] Testing latency (3 requests)...             βœ“
[6/6] Stopping inference server...                βœ“

Roadmap

Phase 1: Foundation (Current)

  • External worker registration and keepalive
  • Apple Silicon (MPS) inference via Ollama
  • Control API for inference lifecycle
  • TUI dashboard with inference status and logs
  • Python SDK for inference

Phase 2: Multi-Backend Inference

  • vLLM integration for NVIDIA GPUs
  • SGLang integration for structured outputs
  • Model routing based on hardware capabilities
  • Automatic model format conversion (GGUF/Safetensors)

Phase 3: Production Hardening

  • Prometheus metrics export
  • Carbon footprint tracking
  • Rate limiting and quotas
  • Multi-tenant isolation

Phase 4: Enterprise Features

  • EU AI Act compliance reporting
  • Model provenance tracking
  • Audit logging
  • SSO integration

Project Structure

beta9/
β”œβ”€β”€ cmd/
β”‚   └── b9agent/           # Go agent binary
β”‚       └── main.go
β”œβ”€β”€ pkg/
β”‚   └── agent/
β”‚       β”œβ”€β”€ agent.go       # Agent lifecycle management
β”‚       β”œβ”€β”€ control.go     # HTTP control API
β”‚       β”œβ”€β”€ inference.go   # OllamaManager and inference types
β”‚       β”œβ”€β”€ state.go       # Agent state for TUI
β”‚       └── tui.go         # Terminal UI rendering
β”œβ”€β”€ sdk/
β”‚   └── src/
β”‚       └── beta9/
β”‚           └── inference.py  # Python inference SDK
└── docs/
    └── external-workers/     # External worker documentation

Related Projects

License

This fork maintains the same AGPL-3.0 license as the upstream beta9 project.

Contributing

Contributions are welcome. Please open an issue to discuss proposed changes before submitting a pull request.

Acknowledgments

Security & Networking

Beta9 uses Tailscale for secure mesh networking between components.

  • Network Isolation: All traffic between Gateway, Workers, and Clients travels over an encrypted WireGuard mesh.
  • Endpoint Security:
    • Internal management endpoints (inference control, keepalives) are bound to 0.0.0.0 but are effectively protected because the nodes are only reachable via the private Tailscale network.
    • The Gateway uses hostNetwork: true to expose these services directly to the mesh.
    • Note: Do not expose the Gateway's ports (1993, 1994) to the public internet. Access should only be possible via the Tailscale mesh or a secure ingress; one way to enforce this is sketched below.
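
A minimal sketch of that enforcement, assuming a Linux host with ufw and the default tailscale0 interface name; this is illustrative hardening, not part of the shipped deployment:

# Allow the Gateway ports only over the Tailscale interface, deny them
# everywhere else. ufw evaluates rules in the order they are added.
sudo ufw allow in on tailscale0 to any port 1993 proto tcp
sudo ufw allow in on tailscale0 to any port 1994 proto tcp
sudo ufw deny 1993/tcp
sudo ufw deny 1994/tcp
sudo ufw enable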

Known Limitations & Roadmap

The following security gaps are slated for the upcoming Security Epic; each is currently mitigated by the network architecture:

  1. Inference Endpoint Auth (beta9-b1j): Direct inference endpoints (port 11434/8000) currently do not require per-request authentication.
    • Mitigation: These ports are only accessible within the encrypted Tailscale mesh. Workers are isolated from the public internet.
  2. RBAC Scoping (beta9-7aw): The k3d manifest uses broad ClusterRole permissions for simplicity during beta.
    • Mitigation: The cluster is intended for single-tenant use. Scoped RBAC profiles will be introduced in Phase 3.
  3. Token Binding (beta9-bx0): Keepalive tokens are not strictly bound to machine IDs in the current implementation.
    • Mitigation: Tokens are secret and transmitted over encrypted channels. Token binding will be enforced in the next auth refactor.
