Enterprise-Grade OpenAI-Compatible Proxy for Cost Optimization and Operational Excellence
CCProxy is a high-performance, production-ready proxy service that enables organizations to leverage multiple AI model providers through a unified OpenAI-compatible interface. By implementing intelligent caching, request deduplication, and provider abstraction, CCProxy delivers significant cost reductions while maintaining enterprise-grade reliability and security.
Current provider pricing shows substantial cost disparities across AI models. OpenAI GPT-5 and xAI Grok models are significantly cheaper than Anthropic Claude Opus 4.1 (approximately $11.25 versus $90 per million tokens, summing input and output rates). CCProxy addresses this challenge by:
- Eliminating Duplicate API Calls: Intelligent caching prevents redundant requests
- Optimizing Transport Efficiency: HTTP/2 and connection pooling reduce overhead
- Enabling Provider Flexibility: Seamlessly switch between providers without code changes
- Standardizing Integration: Single API interface for all supported models
| Model | Input Tokens ($/1M) | Output Tokens ($/1M) |
|---|---|---|
| OpenAI: GPT-5 | $1.25 | $10.00 |
| Anthropic: Claude Opus 4.1 | $15.00 | $75.00 |
| Anthropic: Claude Sonnet 4.5 (≤200K) | $3.00 | $15.00 |
| Anthropic: Claude Sonnet 4.5 (>200K) | $6.00 | $22.50 |
| xAI: Grok 4 Fast (≤128K) | $0.20 | $0.50 |
| xAI: Grok 4 Fast (>128K) | $0.50 | $1.00 |
- GPT-5 input and output rates are confirmed via Wired, OpenAI's own API pricing page, and TechCrunch
- Claude Opus 4.1 pricing is stated directly on Anthropic's API pricing page
- Claude Sonnet 4.5 has tiered pricing based on context length (≤200K tokens vs >200K tokens)
- Grok Code Fast 1 pricing is from xAI's official OpenRouter listing
- Grok 4 Fast pricing is from xAI's official OpenRouter listing
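The "$11.25 versus $90" comparison above is simply the sum of a model's input and output rates from the pricing table. A quick illustration of the arithmetic:

```python
# Illustrative arithmetic only: rates are taken from the pricing table above,
# expressed as (input $/1M tokens, output $/1M tokens).
PRICING = {
    "gpt-5": (1.25, 10.00),
    "claude-opus-4.1": (15.00, 75.00),
    "grok-4-fast (<=128K)": (0.20, 0.50),
}

def combined_rate(model: str) -> float:
    """Input rate plus output rate, per million tokens."""
    input_rate, output_rate = PRICING[model]
    return input_rate + output_rate

for model in PRICING:
    print(f"{model}: ${combined_rate(model):.2f} per 1M input + 1M output tokens")
# gpt-5: $11.25, claude-opus-4.1: $90.00, grok-4-fast (<=128K): $0.70
```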
CCProxy enforces maximum output token limits for supported models:
| Model | Context Window | Max Output Tokens |
|---|---|---|
| o3 | 200,000 | 100,000 |
| o3-2025-04-16 | 200,000 | 100,000 |
| o4-mini | 128,000 | 100,000 |
| gpt-5-2025-08-07 | 400,000 | 128,000 |
| gpt-5 | 400,000 | 128,000 |
| gpt-5-mini-2025-08-07 | 400,000 | 128,000 |
| gpt-5-mini | 400,000 | 128,000 |
| deepseek-reasoner | 163,840 | 65,536 |
| deepseek-chat | 163,840 | 8,192 |
| x-ai/grok-code-fast-1 | 256,000 | 10,000 |
| x-ai/grok-4-fast | 2,000,000 | 30,720 |
Note: Models not listed in this table use their default maximum output token limits.
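As a minimal sketch of what enforcing these caps looks like, the snippet below clamps a client's requested max_tokens to the documented ceiling; the dictionary and function are illustrative, not CCProxy's actual data structures:

```python
# Hypothetical illustration: a subset of the limits from the table above.
MAX_OUTPUT_TOKENS = {
    "gpt-5": 128_000,
    "gpt-5-mini": 128_000,
    "o3": 100_000,
    "deepseek-chat": 8_192,
    "x-ai/grok-4-fast": 30_720,
}

def clamp_max_tokens(model: str, requested: int) -> int:
    """Clamp the requested output budget to the model's ceiling, if one is listed."""
    limit = MAX_OUTPUT_TOKENS.get(model)
    return min(requested, limit) if limit is not None else requested

assert clamp_max_tokens("gpt-5", 200_000) == 128_000       # capped to the table value
assert clamp_max_tokens("unlisted-model", 2_048) == 2_048  # unlisted models keep their default limit
```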
CCProxy includes high-performance HTTP client optimizations for faster OpenAI API communication:
- HTTP/2 Support: Enabled by default for request multiplexing
- Enhanced Connection Pooling: 50 keepalive connections, 500 max connections
- Compression: Supports gzip, deflate, and Brotli
- Smart Retries: Automatic retry with exponential backoff
- Response Caching: Prevents duplicate API calls and handles timeouts
- Async Processing: Full async/await architecture with ThreadPoolExecutor for CPU-bound operations
- Parallel Message Conversion: Concurrent processing of message batches for reduced latency
- Non-blocking I/O: Async streaming with httpx for improved throughput
Expected performance improvements:
- 30-50% faster single-request latency
- 2-3x better throughput for concurrent requests
- Reduced connection overhead with persistent connections
- 40% reduction in message conversion time via async parallelization
- Near-zero blocking on I/O operations with full async pipeline
See HTTP_OPTIMIZATION.md for details.
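For orientation, an httpx client configured along the lines listed above could look like the sketch below; the exact values and wiring CCProxy uses live in HTTP_OPTIMIZATION.md and the infrastructure layer, so treat this as illustrative rather than the actual implementation:

```python
import httpx

def build_http_client() -> httpx.AsyncClient:
    """Sketch of an HTTP/2 client with pooling and compression, per the notes above."""
    return httpx.AsyncClient(
        http2=True,                                        # request multiplexing (requires the h2 extra)
        limits=httpx.Limits(
            max_keepalive_connections=50,                  # keepalive pool size from the list above
            max_connections=500,                           # overall connection cap
            keepalive_expiry=120.0,                        # seconds; matches the 120s keepalive noted below
        ),
        headers={"Accept-Encoding": "gzip, deflate, br"},  # gzip, deflate, and Brotli
        timeout=httpx.Timeout(60.0, connect=5.0),          # illustrative timeouts
    )
```

Smart retries and response caching are layered on top of this client in the application layer rather than being httpx settings; see the architecture diagram below.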
CCProxy follows Clean Architecture (Hexagonal Architecture) principles with clear separation of concerns:
```mermaid
graph TB
subgraph "External Clients"
Client[Claude Code / API Clients]
end
subgraph "Interface Layer"
HTTP[FastAPI HTTP Interface]
Routes[Route Handlers]
MW[Middleware Chain]
Stream[SSE Streaming]
Guard[Input Guardrails]
end
subgraph "Application Layer"
Conv[Message Converters]
Cache[Response Cache]
Token[Tokenizer Service]
Model[Model Selection]
Valid[Request Validator]
Error[Error Tracker]
end
subgraph "Domain Layer"
DModel[Domain Models]
DExc[Domain Exceptions]
BLogic[Core Business Logic]
end
subgraph "Infrastructure Layer"
Provider[Provider Abstraction]
OpenAI[OpenAI Provider]
HTTP2[HTTP/2 Client]
Pool[Connection Pool]
end
subgraph "External Services"
OAPI[OpenAI API]
OR[OpenRouter API]
end
subgraph "Configuration & Monitoring"
Config[Settings/Config]
Log[JSON Logging]
Monitor[Metrics & Health]
end
Client -->|Anthropic Messages API| HTTP
HTTP --> Routes
Routes --> MW
MW --> Guard
Guard --> Conv
Conv --> Cache
Conv --> Token
Conv --> Model
Conv --> Valid
Conv --> Error
Conv --> DModel
Error --> DExc
Model --> BLogic
Conv --> Provider
Provider --> OpenAI
OpenAI --> HTTP2
HTTP2 --> Pool
Pool --> OAPI
Pool --> OR
Routes --> Stream
Stream --> Provider
Config -.->|Inject| HTTP
Log -.->|Track| MW
Monitor -.->|Observe| Cache
Monitor -.->|Health| HTTP
style Client fill:#e1f5fe
style HTTP fill:#fff3e0
style Routes fill:#fff3e0
style MW fill:#fff3e0
style Stream fill:#fff3e0
style Guard fill:#fff3e0
style Conv fill:#f3e5f5
style Cache fill:#f3e5f5
style Token fill:#f3e5f5
style Model fill:#f3e5f5
style Valid fill:#f3e5f5
style Error fill:#f3e5f5
style DModel fill:#e8f5e9
style DExc fill:#e8f5e9
style BLogic fill:#e8f5e9
style Provider fill:#fce4ec
style OpenAI fill:#fce4ec
style HTTP2 fill:#fce4ec
style Pool fill:#fce4ec
style OAPI fill:#ffebee
style OR fill:#ffebee
style Config fill:#f5f5f5
style Log fill:#f5f5f5
style Monitor fill:#f5f5f5
```
Domain Layer:
- Core Business Logic: Pure business rules independent of external concerns
- Domain Models: Core entities and data structures
- Domain Exceptions: Business-specific error handling
Application Layer:
- Use Cases: Orchestrates domain logic and infrastructure
- Message Conversion: Anthropic ↔ OpenAI format translation
- Caching Strategy: Response caching with de-duplication
- Token Management: Async token counting with TTL cache (300s)
- Model Mapping: Routes requests to appropriate models (Opus/Sonnet → BIG, Haiku → SMALL)
- Request Validation: Cryptographic hashing with LRU cache (10k capacity)
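The caching and request-validation entries above pair a cryptographic hash of the request with bounded, time-limited caches. A minimal sketch of that idea (class and method names are illustrative, not CCProxy's actual code):

```python
from __future__ import annotations

import hashlib
import json
import time
from collections import OrderedDict

class ResponseCache:
    """Toy LRU + TTL cache keyed by a SHA-256 hash of the request payload."""

    def __init__(self, capacity: int = 10_000, ttl: float = 300.0):
        self._store: OrderedDict[str, tuple[float, dict]] = OrderedDict()
        self._capacity = capacity
        self._ttl = ttl

    @staticmethod
    def key(request_body: dict) -> str:
        # Canonical JSON so logically identical requests hash identically.
        canonical = json.dumps(request_body, sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()

    def get(self, key: str) -> dict | None:
        entry = self._store.get(key)
        if entry is None or time.monotonic() - entry[0] > self._ttl:
            return None
        self._store.move_to_end(key)            # refresh LRU position
        return entry[1]

    def put(self, key: str, response: dict) -> None:
        self._store[key] = (time.monotonic(), response)
        self._store.move_to_end(key)
        if len(self._store) > self._capacity:   # evict the least recently used entry
            self._store.popitem(last=False)
```

The 300s TTL and 10k capacity defaults mirror the figures quoted in the list above.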
Infrastructure Layer:
- Provider Integration: OpenAI/OpenRouter API communication
- HTTP/2 Client: High-performance connection pooling (500 connections, 120s keepalive)
- Circuit Breaker: Fault tolerance and resilience patterns
- External Services: Handles all third-party integrations
Interface Layer:
- HTTP API: FastAPI application with dependency injection
- Route Handlers: Request/response processing
- SSE Streaming: Real-time response streaming
- Middleware: Request tracing, logging, error handling
- Input Validation: Security guardrails and sanitization
Configuration & Monitoring:
- Configuration: Environment-based settings with Pydantic validation
- Logging: Structured JSON logging with request correlation
- Monitoring: Performance metrics, health checks, cache statistics
- Error Tracking: Centralized error monitoring and alerting
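To make the provider abstraction concrete, one way to express it is a typing.Protocol that both the OpenAI and OpenRouter providers satisfy; the method names below are assumptions for illustration, not CCProxy's actual interface:

```python
from typing import AsyncIterator, Protocol

class ChatProvider(Protocol):
    """Minimal surface a backing provider must offer to the application layer."""

    async def complete(self, request: dict) -> dict:
        """Return a full chat completion for an already-converted OpenAI-format request."""
        ...

    def stream(self, request: dict) -> AsyncIterator[dict]:
        """Yield completion chunks suitable for SSE streaming."""
        ...
```

Because the rest of the service depends only on this abstraction, swapping OpenAI for OpenRouter (or adding another OpenAI-compatible backend) does not touch the route handlers or converters.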
- Create your environment file from the template:

```bash
cp .env.example .env
# edit .env to set OPENAI_API_KEY, BIG_MODEL_NAME, SMALL_MODEL_NAME
```

- Install Python dependencies into an isolated environment using uv:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv
. .venv/bin/activate
uv pip install -r requirements.txt
```

- Start the server (pure Python with Uvicorn):

```bash
./run-ccproxy.sh
```

For local development, you can set `IS_LOCAL_DEPLOYMENT=True` in your `.env` file to use a single worker process for reduced resource usage.

- Point your Anthropic client at the proxy:

```bash
export ANTHROPIC_BASE_URL=http://localhost:11434
```

Then start your coding session with Claude Code:

```bash
claude
```
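If you are calling the proxy from your own code rather than through Claude Code, a minimal sketch using the official anthropic Python SDK (assuming the proxy serves the standard Anthropic Messages endpoint on the port configured above, and that the Claude model alias is mapped to BIG_MODEL_NAME as described in the architecture section):

```python
from anthropic import Anthropic

# Point the SDK at CCProxy instead of api.anthropic.com; the upstream provider
# key lives in the proxy's .env, so the value passed here may be a placeholder
# depending on your setup.
client = Anthropic(base_url="http://localhost:11434", api_key="placeholder")

response = client.messages.create(
    model="claude-sonnet-4-5",   # assumed alias, mapped by CCProxy to BIG_MODEL_NAME
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this repository."}],
)
print(response.content[0].text)
```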
Key environment variables:
- `OPENAI_API_KEY`: Your OpenAI API key (or use `OPENROUTER_API_KEY`)
- `BIG_MODEL_NAME`: The OpenAI model to use for large Anthropic models (e.g., `gpt-5-2025-08-07`)
- `SMALL_MODEL_NAME`: The OpenAI model to use for small Anthropic models (e.g., `gpt-5-mini-2025-08-07`)
- `IS_LOCAL_DEPLOYMENT`: Set to `True` to use a single worker process for local development (default: `False`)
- `HOST`: Server host (default: `127.0.0.1`)
- `PORT`: Server port (default: `11434`)
- `LOG_LEVEL`: Logging level (default: `INFO`)
- `OPENAI_BASE_URL`: OpenAI API base URL (default: `https://api.openai.com/v1`)
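Per the architecture notes, settings are environment-based and validated with Pydantic. A sketch of how such a settings class could look, with field names mirroring the variables above (the actual settings module in CCProxy may be organized differently):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """Illustrative env-backed settings; names and defaults follow the list above."""

    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    openai_api_key: str                  # required
    big_model_name: str                  # e.g. "gpt-5-2025-08-07"
    small_model_name: str                # e.g. "gpt-5-mini-2025-08-07"
    is_local_deployment: bool = False
    host: str = "127.0.0.1"
    port: int = 11434
    log_level: str = "INFO"
    openai_base_url: str = "https://api.openai.com/v1"

settings = Settings()  # raises a validation error if required variables are missing
```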
CCWorkforce Engineers