1 unstable release
Uses new Rust 2024
| Version | Released |
|---|---|
| 0.2.2 (new) | Dec 29, 2025 |
| 0.2.1 | |
reflex-server
reflex-server owns the HTTP gateway (Axum) and builds the reflex binary.
It depends on the core library crate (reflex-cache) for the cache tiers, storage, embeddings, and vector DB integration.
Run (Dev)
# Requires Qdrant running (gRPC on 6334)
docker run -d -p 6334:6334 -p 6333:6333 qdrant/qdrant
RUST_LOG=debug cargo run -p reflex-server
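Since the server expects Qdrant to be reachable on the gRPC port, a small pre-flight check can save a confusing startup failure. The following is a minimal sketch, not part of reflex-server; the helper names `port_open` and `wait_for_port` are hypothetical, and the check only confirms that something accepts TCP connections on the port, not that it is actually Qdrant.

```python
import socket
import time


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def wait_for_port(host: str, port: int, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll host:port until it accepts connections or the attempts run out."""
    for _ in range(attempts):
        if port_open(host, port):
            return True
        time.sleep(delay)
    return False
```

For the setup above, `wait_for_port("localhost", 6334)` would be called before `cargo run -p reflex-server`.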
Run (Release)
cargo run -p reflex-server --release
Docker Compose
From repo root:
docker compose up -d
Endpoints
- GET /healthz
- GET /ready
- POST /v1/chat/completions (OpenAI-compatible)
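The two GET endpoints can be probed from a script, for example in a deployment check. This is a sketch using only the Python standard library; `probe_urls` and `probe` are hypothetical helper names, and treating a 2xx status as "healthy" is an assumption, since the exact status codes and response bodies are not documented here.

```python
import urllib.error
import urllib.request


def probe_urls(base: str) -> dict:
    """Build the health and readiness URLs for a reflex-server base address."""
    root = base.rstrip("/")
    return {"health": f"{root}/healthz", "ready": f"{root}/ready"}


def probe(url: str, timeout: float = 2.0):
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code
    except OSError:
        # Connection refused, timeout, DNS failure, and similar.
        return None
```

Usage might look like `probe(probe_urls("http://localhost:8080")["ready"])`, with `None` meaning the server could not be reached at all.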
Response Status
Responses include X-Reflex-Status:
- hit-l1-exact: exact request match
- hit-l3-verified: semantic hit verified by L3
- miss: forwarded to provider and stored
The response choices[].message.content is Tauq-encoded.
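A client that wants to log cache behaviour can interpret the X-Reflex-Status header along these lines. This is a minimal sketch: `describe_status` is a hypothetical helper, only the three values listed above are known, and any other value is reported as unknown rather than guessed at.

```python
# Header values documented above, mapped to (is_cache_hit, description).
KNOWN_STATUSES = {
    "hit-l1-exact": (True, "exact request match"),
    "hit-l3-verified": (True, "semantic hit verified by L3"),
    "miss": (False, "forwarded to provider and stored"),
}


def describe_status(headers) -> tuple:
    """Map response headers to (is_cache_hit, description).

    Header names are matched case-insensitively, as HTTP requires.
    """
    lowered = {str(k).lower(): v for k, v in dict(headers).items()}
    value = lowered.get("x-reflex-status")
    if value is None:
        return (False, "no X-Reflex-Status header")
    # Unlisted values are surfaced as-is instead of being interpreted.
    return KNOWN_STATUSES.get(value, (False, f"unknown status: {value}"))
```

With the openai SDK, the headers are likely available through the raw-response variant of the call (`client.chat.completions.with_raw_response.create(...)`, then `.headers`), though that path is not verified against this server.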
Configuration
Most commonly used env vars:
| Variable | Default | Notes |
|---|---|---|
| REFLEX_PORT | 8080 | HTTP port |
| REFLEX_BIND_ADDR | 127.0.0.1 | Bind address |
| REFLEX_QDRANT_URL | http://localhost:6334 | Qdrant gRPC |
| REFLEX_STORAGE_PATH | ./.data | Storage base path |
| REFLEX_L1_CAPACITY | 10000 | L1 capacity |
| REFLEX_MODEL_PATH | (unset) | Unset = stub embedder |
| REFLEX_RERANKER_PATH | (unset) | Optional reranker |
| REFLEX_RERANKER_THRESHOLD | 0.70 | L3 threshold |
| REFLEX_MOCK_PROVIDER | (unset) | Set to bypass real provider calls |
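Client scripts and tests can derive the gateway URL from the same variables, so that changing REFLEX_PORT in one place is enough. The sketch below mirrors the REFLEX_BIND_ADDR and REFLEX_PORT defaults from the table; `reflex_base_url` is a hypothetical helper, and mapping a wildcard bind address to loopback is an assumption about how a local client would connect.

```python
import os

# Defaults copied from the configuration table.
DEFAULT_BIND_ADDR = "127.0.0.1"
DEFAULT_PORT = "8080"


def reflex_base_url(env=None) -> str:
    """Build the OpenAI-compatible base URL from REFLEX_* variables."""
    env = os.environ if env is None else env
    host = env.get("REFLEX_BIND_ADDR", DEFAULT_BIND_ADDR)
    port = int(env.get("REFLEX_PORT", DEFAULT_PORT))
    # A wildcard bind address is not a connectable host, so a local
    # client falls back to loopback (assumption, see lead-in).
    if host == "0.0.0.0":
        host = "127.0.0.1"
    return f"http://{host}:{port}/v1"
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise in tests.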
Point Your Agent
export OPENAI_BASE_URL=http://localhost:8080/v1
export OPENAI_API_KEY=sk-your-key
Python (openai SDK)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-your-key")
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain quicksort"}],
)
GPU Features
Build/run with one of:
- --features metal (Apple Silicon)
- --features cuda (NVIDIA)
- --features cpu (docs.rs-style CPU flag; usually you can omit it)
Example:
cargo run -p reflex-server --release --no-default-features --features metal
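For launcher scripts that run on mixed hardware, the feature choice can be automated roughly as follows. This is a heuristic sketch, not something the crate provides: the helper names are hypothetical, the presence of `nvidia-smi` on PATH is only a rough stand-in for a usable CUDA setup, and omitting the flags entirely for the CPU case is an assumption based on the note that the cpu feature can usually be left out.

```python
import platform
import shutil


def pick_feature(system: str, machine: str, has_nvidia: bool) -> str:
    """Choose one of the documented features: metal, cuda, or cpu."""
    if system == "Darwin" and machine == "arm64":
        return "metal"  # Apple Silicon
    if has_nvidia:
        return "cuda"
    return "cpu"


def cargo_command(feature: str) -> list:
    """Build the cargo invocation for a release run with the given feature."""
    base = ["cargo", "run", "-p", "reflex-server", "--release"]
    if feature == "cpu":
        # Assumption: the default build covers the CPU case.
        return base
    return base + ["--no-default-features", "--features", feature]


def detect_feature() -> str:
    """Inspect the current machine; nvidia-smi on PATH is a heuristic only."""
    has_nvidia = shutil.which("nvidia-smi") is not None
    return pick_feature(platform.system(), platform.machine(), has_nvidia)
```

The resulting list can be passed to `subprocess.run(cargo_command(detect_feature()))` from the repo root.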
Tests
cargo test -p reflex-server
Real integration tests (require a running Qdrant):
REFLEX_QDRANT_URL=http://localhost:6334 cargo test -p reflex-server --test integration_real