A production-ready FastAPI + Pydantic AI backend service that provides OpenAI-powered chat completions with streaming support, designed for seamless integration with the Vercel AI SDK.
Mamba Server wraps OpenAI models using Pydantic AI's Agent-based architecture, delivering real-time streaming responses via Server-Sent Events (SSE). Built with a layered architecture and streaming-first design, it provides a robust foundation for AI-powered chat applications.
- Streaming Chat Completions - Real-time responses via Server-Sent Events (SSE)
- Vercel AI SDK Compatible - Native support for Vercel AI SDK message format
- Mamba Agents Integration - Route requests to specialized pre-configured agents (research, code review)
- Display Tools - 4 built-in tools: `generateForm`, `generateChart`, `generateCode`, `generateCard`
- Flexible Authentication - Support for none, API key, or JWT authentication
- Kubernetes Ready - Health checks, liveness/readiness probes, and Helm-ready manifests
- Resilient - Exponential backoff retry for OpenAI API calls
- Multi-source Configuration - Environment variables, env file, and YAML file support
- Python 3.12 or higher
- UV package manager (recommended) or pip
- OpenAI API key
```bash
# Clone the repository
git clone https://github.com/your-org/mamba-server.git
cd mamba-server
# Install dependencies with UV
uv sync
# Or with pip
pip install -e .

# Set your OpenAI API key
export OPENAI_API_KEY="your-api-key"
# Start the server
uv run uvicorn mamba.main:app --reload
# Server runs at http://localhost:8000
```

Mamba Server supports multiple configuration sources with the following precedence (highest to lowest):
1. Environment variables with `MAMBA_` prefix
2. `~/mamba.env` file (user home directory)
3. `config.local.yaml` (optional, git-ignored)
4. `config/config.yaml`
5. Code defaults
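As a rough illustration of how this layered loading can be expressed with pydantic-settings (a hypothetical sketch, not the actual `src/mamba/config.py`; the YAML sources are omitted here):

```python
# Hypothetical sketch of MAMBA_-prefixed, __-delimited settings loading
# with pydantic-settings; the real logic lives in src/mamba/config.py.
from pathlib import Path

from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict

class ServerSettings(BaseModel):
    host: str = "0.0.0.0"  # code default (lowest precedence)
    port: int = 8000

class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_prefix="MAMBA_",           # MAMBA_SERVER__PORT=9000 -> server.port
        env_nested_delimiter="__",
        env_file=str(Path.home() / "mamba.env"),
    )
    server: ServerSettings = ServerSettings()

settings = Settings()  # environment variables win over the env file
```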
| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key (required) | - |
| `OPENAI_API_BASE_URL` | Custom OpenAI API base URL | `https://api.openai.com/v1` |
| `MAMBA_SERVER__HOST` | Server bind address | `0.0.0.0` |
| `MAMBA_SERVER__PORT` | Server port | `8000` |
| `MAMBA_OPENAI__DEFAULT_MODEL` | Default OpenAI model | `gpt-4o` |
| `MAMBA_AUTH__MODE` | Auth mode: `none`, `api_key`, `jwt` | `none` |
| `MAMBA_LOGGING__LEVEL` | Log level: `DEBUG`, `INFO`, `WARNING`, `ERROR` | `INFO` |
Use `__` as the nested delimiter (e.g., `MAMBA_OPENAI__TIMEOUT_SECONDS`).
Create a `~/mamba.env` file in your home directory for persistent settings:

```bash
OPENAI_API_KEY=your-api-key
MAMBA_AUTH__MODE=api_key
```

Create a `config.local.yaml` in the project root for local overrides:

```yaml
server:
host: "127.0.0.1"
port: 8000
openai:
default_model: "gpt-4o"
auth:
mode: "api_key"
api_keys:
- key: "your-secret-key"
name: "dev-key"| Endpoint | Method | Description |
| Endpoint | Method | Description |
|---|---|---|
| `/chat` | POST | Streaming chat completions via SSE (supports `agent` param) |
| `/title/generate` | POST | Generate conversation titles |
| `/models` | GET | List available models |
| `/health` | GET | Full health check (dependencies included) |
| `/health/live` | GET | Liveness probe for Kubernetes |
| `/health/ready` | GET | Readiness probe for Kubernetes |
```bash
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Hello, how are you?"}
]
  }'
```

Request Body:

```json
{
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
],
"model": "gpt-4o"
}
```

Response: Server-Sent Events stream in Vercel AI SDK format.
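As a rough sketch of what consuming that stream looks like from Python (this client is illustrative, not part of Mamba Server; event payloads follow the Vercel AI SDK stream protocol):

```python
# Hypothetical client sketch: consuming the /chat SSE stream with httpx.
import asyncio

import httpx

async def stream_chat() -> None:
    payload = {"messages": [{"role": "user", "content": "Hello!"}]}
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST", "http://localhost:8000/chat", json=payload
        ) as response:
            response.raise_for_status()
            async for line in response.aiter_lines():
                if line.startswith("data: "):  # SSE data frames
                    print(line[len("data: "):])

asyncio.run(stream_chat())
```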
Once running, access the auto-generated API docs:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Mamba Server supports routing requests to specialized pre-configured agents via the `agent` parameter. This allows leveraging purpose-built agents with their own tool sets and system prompts.
| Agent | Purpose | Tools |
|---|---|---|
| `research` | Information gathering and synthesis | Search tools |
| `code_review` | Code analysis and quality assessment | Complexity metrics tools |
Include the `agent` parameter in your chat completion request:

```bash
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Research the latest trends in AI"}
],
"model": "openai/gpt-4o",
"agent": "research"
  }'
```

- `agent: null` or omitted - Uses the standard ChatAgent flow (backward compatible)
- `agent: "research"` - Routes to the research agent (ignores the `tools` parameter)
- `agent: "code_review"` - Routes to the code review agent
- Invalid agent name - Returns an error event with the list of available agents
When using Mamba Agents, the `tools` parameter in the request is ignored, as each agent comes with its own pre-configured tool set. The sketch below illustrates the routing rule.
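This is a hypothetical rendering of the dispatch logic, not the actual `src/mamba/core/mamba_agent.py` implementation; all names are illustrative:

```python
# Illustrative sketch of the agent-routing rule described above.
from typing import Optional

MAMBA_AGENTS = {"research": "ResearchAgent", "code_review": "CodeReviewAgent"}

def resolve_agent(agent: Optional[str]) -> str:
    if agent is None:
        # Backward-compatible default: the standard ChatAgent flow.
        return "ChatAgent"
    if agent not in MAMBA_AGENTS:
        # The server reports this as an SSE error event rather than raising.
        raise ValueError(
            f"Unknown agent {agent!r}; available: {sorted(MAMBA_AGENTS)}"
        )
    # Request-level `tools` are ignored; the agent brings its own tool set.
    return MAMBA_AGENTS[agent]

assert resolve_agent(None) == "ChatAgent"
assert resolve_agent("research") == "ResearchAgent"
```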
```bash
# Install with dev dependencies
uv sync --all-extras
# Or with pip
pip install -e ".[dev]"# Run linter
uv run ruff check src tests
# Run formatter
uv run ruff format src tests
# Auto-fix issues
uv run ruff check --fix src tests
```

```bash
# Run all tests
uv run pytest
# Run with coverage
uv run pytest --cov=mamba
# Run specific test file
uv run pytest tests/unit/core/test_agent.py -v
```

```bash
# Build the image
docker build -t mamba-server:latest .
# Run the container
docker run -p 8000:8000 \
-e OPENAI_API_KEY="your-api-key" \
  mamba-server:latest
```

The Dockerfile uses a multi-stage build with Python 3.12-slim and runs as a non-root user (`mamba:1000`).
Kubernetes manifests are provided in the `k8s/` directory:

```bash
# Apply manifests
kubectl apply -f k8s/
# Or use kustomize
kubectl apply -k k8s/
```

Health probes are pre-configured:
- Liveness: `/health/live`
- Readiness: `/health/ready`
```
mamba-server/
├── src/mamba/
│ ├── api/ # HTTP layer
│ │ ├── handlers/ # Endpoint handlers (chat.py, health.py, models.py, title.py)
│ │ ├── deps.py # FastAPI dependency injection
│ │ └── routes.py # Route registration
│ ├── core/ # Business logic
│ │ ├── agent.py # ChatAgent - Pydantic AI wrapper
│ │ ├── mamba_agent.py # Mamba Agents framework adapter
│ │ ├── streaming.py # SSE encoding, timeout handling
│ │ ├── messages.py # Message format conversion
│ │ ├── tools.py # Tool definitions (forms, charts, code, cards)
│ │ ├── tool_schema.py # OpenAI function format conversion
│ │ └── title_utils.py # Title processing utilities
│ ├── middleware/ # Request processing chain
│ │ ├── auth.py # Auth modes: none, api_key, jwt
│ │ ├── logging.py # Structured request/response logging
│ │ └── request_id.py # X-Request-ID propagation
│ ├── models/ # Pydantic schemas
│ │ ├── events.py # StreamEvent discriminated union
│ │ ├── request.py # ChatCompletionRequest, UIMessage
│ │ ├── response.py # ModelsResponse
│ │ ├── health.py # HealthResponse, ComponentHealth
│ │ └── title.py # TitleGenerationRequest/Response
│ ├── utils/ # Utilities
│ │ ├── errors.py # ErrorCode enum, error classification
│ │ └── retry.py # @with_retry decorator (exponential backoff)
│ ├── config.py # Settings management (multi-source)
│ └── main.py # FastAPI app factory, middleware setup
├── tests/
│ ├── unit/ # Unit tests (25 test files)
│ ├── integration/ # Integration tests
│ └── e2e/ # End-to-end tests
├── k8s/ # Kubernetes manifests
│ ├── deployment.yaml # Deployment with health probes
│ ├── service.yaml # Service definition
│ ├── configmap.yaml # Configuration
│ ├── secrets.yaml.example # Secret template
│ ├── pdb.yaml # Pod disruption budget
│ └── kustomization.yaml # Kustomize config
├── config/
│ └── config.yaml # Default configuration
├── Dockerfile # Multi-stage production build
└── pyproject.toml # Project configuration
```
| Category | Technologies |
|---|---|
| Language | Python 3.12+ |
| Framework | FastAPI 0.115+, Pydantic 2.10+ |
| AI | Pydantic AI 0.0.49+, OpenAI |
| HTTP | httpx 0.28+, Uvicorn 0.32+ |
| Testing | pytest, pytest-asyncio, respx |
| Linting | Ruff |
| Build | Hatchling, UV |
Mamba Server uses a Layered/N-Tier architecture with a streaming-first design:
```
┌─────────────────────────────────────────────────────────────┐
│ Client Request │
└────────────────────────────┬────────────────────────────────┘
│
┌────────────────────────────▼────────────────────────────────┐
│ Middleware Chain │
│ CORS → RequestID → Logging → Auth │
└────────────────────────────┬────────────────────────────────┘
│
┌────────────────────────────▼────────────────────────────────┐
│ API Layer │
│ (handlers, routes, deps) │
└────────────────────────────┬────────────────────────────────┘
│
┌────────────────────────────▼────────────────────────────────┐
│ Core Layer │
│ (ChatAgent, MambaAgentAdapter, streaming, tools) │
└────────────────────────────┬────────────────────────────────┘
│
┌────────────────────────────▼────────────────────────────────┐
│ External APIs │
│ (OpenAI via pydantic-ai) │
└─────────────────────────────────────────────────────────────┘
```
Key Design Patterns:
- Factory Pattern - `create_app()`, `create_agent()`, `create_streaming_response()` for testability
- Discriminated Unions - Type-safe `StreamEvent` and `MessagePart` with Pydantic discriminators
- Adapter Pattern - Message format conversion between UI and OpenAI formats; `MambaAgentAdapter` for framework integration
- Decorator Pattern - `@with_retry()` for exponential backoff on failures
- Dependency Injection - FastAPI `Depends()` with `Annotated` types
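As one concrete taste of these patterns, a minimal `@with_retry`-style decorator might look like the sketch below; this is illustrative only, and the real implementation in `src/mamba/utils/retry.py` likely adds error classification and jitter:

```python
# Minimal sketch of an exponential-backoff retry decorator in the spirit
# of @with_retry(); illustrative, not the src/mamba/utils/retry.py code.
import asyncio
import functools

def with_retry(max_attempts: int = 3, base_delay: float = 0.5):
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return await func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts; surface the error
                    # Exponential backoff: 0.5s, 1.0s, 2.0s, ...
                    await asyncio.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator

@with_retry(max_attempts=3)
async def call_openai() -> str:
    ...  # e.g. a pydantic-ai agent run that may hit transient API errors
```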
This project is licensed under the MIT License - see the LICENSE file for details.