This backend provides an AI chat API using FastAPI as the router and llama.cpp for inference. It includes OpenAI Harmony format support for improved GPT-OSS model responses.
## Architecture

```
[Client] → [Router:8000] → [Inference:8080]
               ↓                 ↓
          FastAPI API      llama.cpp server
          + Harmony        + GPT-OSS model
```
## Local Development (Apple Silicon)

Docker has significant performance limitations on Apple Silicon. Use the local development script instead:

```bash
cd backend
# Start both services with full GPU acceleration (~15x faster than Docker)
./start-local-dev.sh
# Test the setup
./test-local-dev.sh
# Stop services: press Ctrl+C in the terminal running the start script
```

## Docker Setup

```bash
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f
```

## Quick Test

```bash
# Health check
curl http://localhost:8000/health
# Chat endpoint
curl -X POST http://localhost:8000/api/chat \
-H "Content-Type: application/json" \
-d '{"message": "Hello, how are you?"}'- Purpose: API gateway and request handler
- Features:
  - OpenAI Harmony format support
  - Request routing to the inference service
  - Response parsing and cleanup
- Endpoints:
  - `GET /health` - Health check
  - `POST /api/chat` - Chat completion
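
The proxy flow can be pictured with a short sketch. This is a hypothetical simplification, not the actual `main.py`: the request body follows the curl examples above, while `ChatRequest` and the response shape are illustrative names, and the real router adds Harmony rendering and response cleanup on top.

```python
# Minimal sketch of the router's proxy flow (illustrative; not the real main.py).
# Assumes the documented INFERENCE_URL default and the {"message": ...} request
# body from the curl examples; ChatRequest and the response shape are hypothetical.
import os

import httpx
from fastapi import FastAPI
from pydantic import BaseModel

INFERENCE_URL = os.getenv("INFERENCE_URL", "http://inference:8080")

app = FastAPI()


class ChatRequest(BaseModel):
    message: str


@app.get("/health")
async def health():
    return {"status": "ok"}


@app.post("/api/chat")
async def chat(req: ChatRequest):
    # With Harmony disabled, forward to the standard chat completion endpoint.
    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post(
            f"{INFERENCE_URL}/v1/chat/completions",
            json={"messages": [{"role": "user", "content": req.message}]},
        )
    data = resp.json()
    return {"response": data["choices"][0]["message"]["content"]}
```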
### Inference (Port 8080)

- Purpose: Runs the GPT-OSS 20B model
- Engine: llama.cpp server
- Model: `gpt-oss-20b-Q4_K_S.gguf` (quantized)
- Endpoints:
  - `/v1/completions` - Used with Harmony format
  - `/v1/chat/completions` - Standard chat format
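
For debugging, it can help to query the inference service directly and bypass the router. A sketch against the standard chat endpoint (the payload follows the OpenAI-compatible shape accepted by llama.cpp's server; `max_tokens` is an optional knob):

```python
# Query the llama.cpp server directly on port 8080, bypassing the router.
# Payload follows the OpenAI-compatible /v1/chat/completions shape.
import httpx

resp = httpx.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 128,
    },
    timeout=120.0,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```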
## Environment Variables

### Router

- `HARMONY_ENABLED` - Enable/disable Harmony format (default: `true`)
- `HARMONY_REASONING_EFFORT` - Reasoning depth: `low`/`medium`/`high` (default: `low`)
- `INFERENCE_URL` - Inference service URL (default: `http://inference:8080`)
- `LOG_LEVEL` - Logging level (default: `DEBUG`)
### Inference

- `MODEL_PATH` - Path to the GGUF model file
- `HOST` - Server host (default: `0.0.0.0`)
- `PORT` - Server port (default: `8080`)
- `CONTEXT_SIZE` - Context window size (default: `4096`)
- `THREADS` - CPU threads (`0` = auto)
- `GPU_LAYERS` - GPU layers for acceleration (`0` = CPU only)
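
Read together, the router variables map naturally onto a small settings module. A sketch of what `router/config.py` might look like, using the documented defaults (the real file may be organized differently):

```python
# Illustrative sketch of reading the documented router settings; the actual
# router/config.py may differ in structure.
import os

HARMONY_ENABLED = os.getenv("HARMONY_ENABLED", "true").lower() == "true"
HARMONY_REASONING_EFFORT = os.getenv("HARMONY_REASONING_EFFORT", "low")  # low/medium/high
INFERENCE_URL = os.getenv("INFERENCE_URL", "http://inference:8080")
LOG_LEVEL = os.getenv("LOG_LEVEL", "DEBUG")
```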
## Service Management

```bash
# Stop services
docker-compose down
# Rebuild router (after code changes)
docker-compose build router
# Restart services
docker-compose up -d
```

```bash
# All services
docker-compose logs -f
# Specific service
docker-compose logs -f router
docker-compose logs -f inference
```

```bash
# Complete cleanup and restart
docker-compose down -v
docker-compose build --no-cache
docker-compose up -d
```

## Harmony Format

The backend uses the OpenAI Harmony format to improve GPT-OSS model responses:
- Enabled: Model receives structured conversation context
- Disabled: Standard chat completion format
Harmony provides:
- Better reasoning with analysis channels
- Cleaner final responses
- Mobile-optimized brevity
- Structured token handling
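
Conceptually, when Harmony is enabled the router renders the conversation into a token-structured prompt for `/v1/completions` and keeps only the model's final channel when cleaning the response. A rough, assumption-laden illustration of that idea (not the actual `harmony_service.py`; the exact token layout is defined by the Harmony spec):

```python
# Rough illustration of Harmony-style structure: messages wrapped in special
# tokens, with only the "final" channel kept when parsing model output.
# Assumption-laden sketch; not the actual harmony_service.py logic.
import re


def render_prompt(system: str, user: str) -> str:
    # Structured conversation context sent to /v1/completions when Harmony is on.
    return (
        f"<|start|>system<|message|>{system}<|end|>"
        f"<|start|>user<|message|>{user}<|end|>"
        "<|start|>assistant"
    )


def extract_final(raw: str) -> str:
    # Drop analysis-channel reasoning; return only the final-channel text.
    m = re.search(
        r"<\|channel\|>final<\|message\|>(.*?)(?:<\|end\|>|<\|return\|>|$)",
        raw,
        re.S,
    )
    return m.group(1).strip() if m else raw.strip()
```

With `HARMONY_ENABLED=false`, both steps are skipped and a plain chat completion is sent instead.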
## Troubleshooting

```bash
# Check logs
docker-compose logs router
docker-compose logs inference
# Verify health
docker-compose ps
```

```bash
# Check if the model file exists
docker-compose exec inference ls -la /models/
# Check inference logs
docker-compose logs inference | grep -i error
```

```bash
# Rebuild router with fresh dependencies
docker-compose build --no-cache router
```

```bash
# Check if ports are in use
lsof -i :8000
lsof -i :8080
# Use different ports in docker-compose.yml if needed
```

## Testing Harmony

```bash
curl -X POST http://localhost:8000/api/chat \
-H "Content-Type: application/json" \
-d '{"message": "Explain quantum computing in one sentence"}'# Modify docker-compose.yml: HARMONY_ENABLED=false
docker-compose up -d router
# Test again
curl -X POST http://localhost:8000/api/chat \
-H "Content-Type: application/json" \
-d '{"message": "Explain quantum computing in one sentence"}'backend/
├── docker-compose.yml # Service orchestration
├── router/
│ ├── Dockerfile # Router container
│ ├── main.py # FastAPI application
│ ├── harmony_service.py # Harmony format handler
│ ├── config.py # Configuration
│ └── pyproject.toml # Dependencies
└── inference/
├── Dockerfile # Inference container
└── model/ # Model files (mounted)
## Production Considerations

- Add authentication
- Implement streaming responses
- Add request caching
- Set up monitoring/metrics
- Configure HTTPS with nginx
## Known Issues

- 4CST in CET doesn't work in prod but does in dev
- Follow-up on exercise works in prod but not in dev
- This likely has something to do with the Harmony main prompt