1 unstable release

Uses new Rust 2024

new 0.2.2	Dec 29, 2025
0.2.1	~~Dec 16, 2025~~

#502 in Machine learning

AGPL-3.0

625KB
16K SLoC

reflex-server

reflex-server owns the HTTP gateway (Axum) and builds the reflex binary.

It depends on the core library crate (reflex-cache) for the cache tiers, storage, embeddings, and vector DB integration.

Run (Dev)

# Requires Qdrant running (gRPC on 6334)
docker run -d -p 6334:6334 -p 6333:6333 qdrant/qdrant

RUST_LOG=debug cargo run -p reflex-server

Run (Release)

cargo run -p reflex-server --release

Docker Compose

From repo root:

docker compose up -d

Endpoints

GET /healthz
GET /ready
POST /v1/chat/completions (OpenAI-compatible)

Response Status

Responses include X-Reflex-Status:

hit-l1-exact: exact request match
hit-l3-verified: semantic hit verified by L3
miss: forwarded to provider and stored

The response choices[].message.content is Tauq-encoded.

Configuration

Most commonly used env vars:

Variable	Default	Notes
`REFLEX_PORT`	`8080`	HTTP port
`REFLEX_BIND_ADDR`	`127.0.0.1`	Bind address
`REFLEX_QDRANT_URL`	`http://localhost:6334`	Qdrant gRPC
`REFLEX_STORAGE_PATH`	`./.data`	Storage base path
`REFLEX_L1_CAPACITY`	`10000`	L1 capacity
`REFLEX_MODEL_PATH`	(unset)	Unset = stub embedder
`REFLEX_RERANKER_PATH`	(unset)	Optional reranker
`REFLEX_RERANKER_THRESHOLD`	`0.70`	L3 threshold
`REFLEX_MOCK_PROVIDER`	(unset)	Set to bypass real provider calls

Point Your Agent

export OPENAI_BASE_URL=http://localhost:8080/v1
export OPENAI_API_KEY=sk-your-key

Python (openai SDK)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-your-key")
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain quicksort"}],
)

GPU Features

Build/run with one of:

--features metal (Apple Silicon)
--features cuda (NVIDIA)
--features cpu (docs.rs-style CPU flag; usually you can omit)

Example:

cargo run -p reflex-server --release --no-default-features --features metal

Tests

cargo test -p reflex-server

Real integration tests (needs Qdrant):

REFLEX_QDRANT_URL=http://localhost:6334 cargo test -p reflex-server --test integration_real

Dependencies

~64–89MB
~1.5M SLoC