Lightweight, fast alternative to litellm - a minimal unified interface for LLM providers.
- 🪶 Lightweight: Only ~2MB in memory vs 200MB for litellm
- ⚡ Fast: Minimal dependencies, optimized for performance
- 🔌 Compatible: Drop-in replacement for litellm in most cases
- 🎯 Focused: Only essential providers (OpenAI, Anthropic, Groq, AWS Bedrock)
- 🛠️ Modern: Built with httpx, Pydantic v2, and modern Python practices
```bash
# Basic installation
pip install ullm

# With AWS Bedrock support
pip install ullm[aws]

# Development installation
pip install ullm[dev]
```

Or with uv (recommended):

```bash
uv pip install ullm
```

Quick start:

```python
import ullm

# Same API as litellm
response = ullm.completion(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
```
```python
# Streaming
response = ullm.completion(
    model="anthropic/claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)
for chunk in response:
    print(chunk.choices[0].delta.content, end="")
```
```python
# Async support
import asyncio

async def main():
    response = await ullm.acompletion(
        model="groq/llama-3.1-70b-versatile",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```

Supported providers:

| Provider | Model Format | Example |
|---|---|---|
| OpenAI | `openai/model-name` | `openai/gpt-4o` |
| Anthropic | `anthropic/model-name` | `anthropic/claude-3-5-sonnet-20241022` |
| Groq | `groq/model-name` | `groq/llama-3.1-70b-versatile` |
| AWS Bedrock | `bedrock/model-id` | `bedrock/anthropic.claude-3-sonnet` |
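Switching providers is just a matter of changing the model string; an illustrative sketch, assuming the corresponding API keys are set in the environment:

```python
import ullm

# The same call works across providers; only the model string changes.
for model in ["openai/gpt-4o", "groq/llama-3.1-70b-versatile"]:
    response = ullm.completion(
        model=model,
        messages=[{"role": "user", "content": "Say hello in one word."}],
    )
    print(model, "->", response.choices[0].message.content)
```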
Tool calling:

```python
response = ullm.completion(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }]
)
```
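When the model decides to call the tool, the call should be available on the returned message, following the OpenAI-style response shape used throughout these examples; a hypothetical continuation of the example above (exact attribute names may differ):

```python
import json

message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)  # e.g. {"location": "SF"}
    print(f"Model requested {call.function.name} with {args}")
```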
Structured output with Pydantic:

```python
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

response = ullm.completion(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Tell me about Alice, age 30"}],
    response_format=Person
)
# Response automatically validated against the Person schema
```
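One way to get the validated object back is to parse the message content with the same model; this sketch assumes the JSON lands in `message.content`, which may differ from the library's actual accessor:

```python
person = Person.model_validate_json(response.choices[0].message.content)
print(person.name, person.age)
```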
Environment variables:

```bash
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GROQ_API_KEY=gsk_...
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION_NAME=us-east-1
```

Or pass directly:

```python
ullm.completion(
    model="openai/gpt-4",
    api_key="sk-...",
    messages=[...]
)
```

ullm is designed as a drop-in replacement for litellm's core functionality:
```python
# litellm code
import litellm
response = litellm.completion(model="gpt-4", messages=[...])

# ullm code (same API)
import ullm
response = ullm.completion(model="gpt-4", messages=[...])
```

Compatible APIs:
- ✅ `completion()` / `acompletion()`
- ✅ `responses()` / `aresponses()` (OpenAI Responses API)
- ✅ Streaming support
- ✅ Tool calling
- ✅ Structured output with Pydantic
- ✅ Retry logic with exponential backoff
- ✅ Standard exception types

Not included (to keep it lightweight):

- ❌ 100+ providers (only 4 core providers)
- ❌ Built-in caching (use dspy's cache or your own - see the sketch below)
- ❌ Proxy server mode
- ❌ Spend tracking/analytics
- ❌ Legacy text completion API
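Since caching is intentionally left out, a do-it-yourself wrapper can be as small as the sketch below; this is purely illustrative (in-memory, process-local) and not part of ullm:

```python
import json
from functools import lru_cache

import ullm

@lru_cache(maxsize=256)
def _cached(model: str, messages_json: str) -> str:
    response = ullm.completion(model=model, messages=json.loads(messages_json))
    return response.choices[0].message.content

def cached_completion(model, messages):
    # Serialize messages so they can be used as a hashable cache key.
    return _cached(model, json.dumps(messages, sort_keys=True))
```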
Memory usage comparison (Python 3.11):

- litellm: ~200MB
- ullm: ~2MB (100x smaller)

Import time:

- litellm: ~1.2s
- ullm: ~0.05s (24x faster)
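To reproduce the import-time figure on your own machine (numbers will vary by environment and hardware), a quick check is:

```python
import time

start = time.perf_counter()
import ullm  # measure the cost of the first import
print(f"import ullm took {time.perf_counter() - start:.3f}s")
```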
📚 Full documentation available at silvestrid.github.io/ullm
Use with DSPy:

```python
import dspy
import ullm

# Configure DSPy to use ullm instead of litellm:
# simply replace the import - the API is compatible
lm = dspy.LM(model="openai/gpt-4o-mini", temperature=0.7)
dspy.configure(lm=lm)
```
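Once the LM is configured as above, DSPy modules can be used as usual; a small illustrative continuation:

```python
qa = dspy.Predict("question -> answer")
result = qa(question="What is the capital of France?")
print(result.answer)
```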
Development setup:

```bash
# Clone the repo
git clone https://github.com/silvestrid/ullm.git
cd ullm

# Install with uv
uv venv
source .venv/bin/activate  # or `.venv\Scripts\activate` on Windows
uv pip install -e ".[dev]"

# Run tests
pytest

# Lint and format
ruff check .
ruff format .

# Type check
mypy ullm
```

Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass and code is formatted
- Submit a pull request
MIT License - see LICENSE file for details
Inspired by litellm - we're grateful for their pioneering work in unified LLM interfaces. ullm aims to provide a more lightweight alternative for users who need core functionality without the overhead.