# recursive-llm

Python implementation of Recursive Language Models (RLM) for processing unbounded context lengths.

Based on the paper by Alex Zhang and Omar Khattab (MIT, 2025) | [arXiv](https://arxiv.org/abs/2512.24601)
RLM enables language models to process extremely long contexts (100k+ tokens) by:
- Storing context as a Python variable instead of in the prompt
- Allowing the LM to recursively explore and partition the context
- Avoiding "context rot" (performance degradation with long context)
Instead of this:

```python
llm.complete(prompt="Summarize this", context=huge_document)  # Context rot!
```

RLM does this:

```python
rlm = RLM(model="gpt-5-mini")
result = rlm.complete(
    query="Summarize this",
    context=huge_document  # Stored as a variable, not in the prompt
)
```

The LM can then peek, search, and recursively process the context adaptively.
## Installation

Note: This package is not yet published to PyPI. Install from source:

```bash
# Clone the repository
git clone https://github.com/ysz/recursive-llm.git
cd recursive-llm

# Install in editable mode
pip install -e .

# Or install with dev dependencies
pip install -e ".[dev]"
```

Future: Once published to PyPI, you'll be able to install it with `pip install recursive-llm`.

## Requirements
- Python 3.9 or higher
- An API key for your chosen LLM provider (OpenAI, Anthropic, etc.)
- Or a local model setup (Ollama, llama.cpp, etc.)
## Quick Start

```python
from rlm import RLM

# Initialize with any LLM
rlm = RLM(model="gpt-5-mini")

# Process long context
result = rlm.complete(
    query="What are the main themes in this document?",
    context=long_document
)

print(result)
```

### API Keys

Set your API key via an environment variable:

```bash
export OPENAI_API_KEY="sk-..."  # or ANTHROPIC_API_KEY, etc.
```

Or pass it directly in code:

```python
rlm = RLM(model="gpt-5-mini", api_key="sk-...")
```

## Supported Models

Works with 100+ LLM providers via LiteLLM:
```python
# OpenAI
rlm = RLM(model="gpt-5")
rlm = RLM(model="gpt-5-mini")

# Anthropic
rlm = RLM(model="claude-sonnet-4")
rlm = RLM(model="claude-sonnet-4-20250514")

# Ollama (local)
rlm = RLM(model="ollama/llama3.2")
rlm = RLM(model="ollama/mistral")

# llama.cpp (local)
rlm = RLM(
    model="openai/local",
    api_base="http://localhost:8000/v1"
)
```
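The llama.cpp example assumes an OpenAI-compatible server is already listening on port 8000. As a sketch (the model path is a placeholder), llama.cpp's bundled `llama-server` binary can provide one:

```bash
# Serve a local GGUF model on the port used above (model path is hypothetical)
llama-server -m ./models/your-model.gguf --port 8000
```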
```python
# Azure OpenAI
rlm = RLM(model="azure/gpt-4-deployment")

# And many more via LiteLLM...
```

## Cost Optimization

Use a cheaper model for recursive calls:
```python
rlm = RLM(
    model="gpt-5",                # Root LM (main decisions)
    recursive_model="gpt-5-mini"  # Recursive calls (cheaper)
)
```

## Async Support

For better performance with parallel recursive calls:
```python
import asyncio

async def main():
    rlm = RLM(model="gpt-5-mini")
    result = await rlm.acomplete(query, context)
    print(result)

asyncio.run(main())
```

## Configuration

```python
rlm = RLM(
    model="gpt-5-mini",
    max_depth=5,        # Maximum recursion depth
    max_iterations=20,  # Maximum REPL iterations
    # Optional LiteLLM params: temperature, timeout, etc.
)
```

## How It Works

- Context is stored as a variable in a Python REPL environment
- Root LM gets only the query plus instructions
- The LM can explore the context using Python code:

  ```python
  # Peek at the context
  context[:1000]

  # Search with regex
  import re
  re.findall(r'pattern', context)

  # Recursive processing
  recursive_llm("extract dates", context[1000:2000])
  ```

- Returns the final answer via a `FINAL(answer)` statement (see the sketch below)
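The following is a minimal sketch of that loop, not the package's actual implementation; `generate` (the LM call), `run_code`, `extract_final`, and the exact handling of `FINAL()` are all illustrative assumptions:

```python
import io
import re
from contextlib import redirect_stdout

def run_code(code: str, env: dict) -> str:
    """Run LM-generated code and capture what it prints.
    (Hypothetical stand-in; the real executor sandboxes this step.)"""
    buf = io.StringIO()
    try:
        with redirect_stdout(buf):
            exec(code, env)
    except Exception as e:
        buf.write(f"Error: {e}")
    return buf.getvalue()

def extract_final(output: str):
    """Hypothetical parser: pull the answer out of a FINAL(...) statement."""
    m = re.search(r"FINAL\((.*)\)", output, re.DOTALL)
    return m.group(1) if m else None

def rlm_loop(generate, query, context, max_iterations=20):
    """Sketch of the root-LM loop: the context lives in a REPL variable,
    and only bounded snippets of REPL output flow back to the model."""
    env = {"context": context}  # context never enters a prompt
    history = []
    for _ in range(max_iterations):
        code = generate(query, history)        # LM writes exploration code
        output = run_code(code, env)
        answer = extract_final(output)
        if answer is not None:
            return answer
        history.append((code, output[:2000]))  # truncated feedback
    raise RuntimeError("No FINAL() answer within max_iterations")
```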
## Examples

See the `examples/` directory for complete working examples:

- `basic_usage.py` - Simple completion with OpenAI
- `ollama_local.py` - Using Ollama locally
- `two_models.py` - Cost optimization with two models
- `long_document.py` - Processing 50k+ token documents
- `data_extraction.py` - Extract structured data from text
- `multi_file.py` - Process multiple documents
- `custom_config.py` - Advanced configuration
Run an example:

```bash
# Set your API key first
export OPENAI_API_KEY="sk-..."

# Run the example
python examples/basic_usage.py
```

## Benchmarks

On the OOLONG benchmark (132k tokens):
- GPT-5: baseline
- RLM(GPT-5-Mini): 33% better than GPT-5 at similar cost
Tested with GPT-5-Mini on structured data queries (counting, filtering) across 5 different test cases:
**60k token contexts:**

- RLM: 80% accurate (4/5 correct)
- Direct OpenAI: 0% accurate (0/5 correct; all returned approximations)

RLM wins on accuracy: both approaches complete the request, but only RLM returns correct answers.

**150k+ token contexts:**

- Direct OpenAI: fails (rate limit errors)
- RLM: works (processes 1M+ tokens successfully)
Token efficiency: RLM uses ~2-3k tokens per query vs 95k+ for direct approach, since context is stored as a variable instead of being sent in prompts.
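To get a feel for that gap, you can count tokens yourself; a rough sketch using `tiktoken` (an assumed tokenizer choice, not a dependency of this package):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

huge_document = "word " * 95_000  # stand-in for a ~95k-token context
query = "What are the main themes in this document?"

# Direct approach: the entire context rides along in the prompt.
direct_tokens = len(enc.encode(query + huge_document))

# RLM approach: only the query plus instructions and small REPL
# snippets are sent (the ~2k figure is a rough per-query estimate).
rlm_tokens = len(enc.encode(query)) + 2_000

print(f"direct: ~{direct_tokens} tokens, RLM: ~{rlm_tokens} tokens")
```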
## Development

```bash
# Clone repository
git clone https://github.com/ysz/recursive-llm.git
cd recursive-llm

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run tests with coverage
pytest tests/ -v --cov=src/rlm --cov-report=term-missing

# Type checking
mypy src/rlm

# Linting
ruff check src/rlm

# Format code
black src/rlm tests examples
```

## Architecture

```
RLM
├── Core (async completion logic)
├── REPL Executor (safe code execution via RestrictedPython)
├── Prompt Builder (system prompts)
└── Parser (extract FINAL() answers)
```
Built on top of LiteLLM for universal LLM support.
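To illustrate the REPL Executor's approach, here is a minimal sketch built on RestrictedPython's public API (`compile_restricted`, `safe_globals`); how the package actually wires in `context` and which guards it enables are assumptions here:

```python
from RestrictedPython import compile_restricted, safe_globals
from RestrictedPython.Eval import default_guarded_getitem

def execute_snippet(code: str, context: str) -> dict:
    """Compile LM-generated code with RestrictedPython and run it
    with the long context exposed as a plain variable."""
    byte_code = compile_restricted(code, filename="<lm-code>", mode="exec")
    env = dict(safe_globals)                    # restricted builtins only
    env["_getitem_"] = default_guarded_getitem  # allow guarded indexing/slicing
    env["context"] = context                    # assumption: injected like this
    exec(byte_code, env)
    return env

# The LM "peeks" at the context without it ever entering a prompt:
env = execute_snippet("snippet = context[:100]", context="A" * 10_000)
print(env["snippet"])
```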
## Limitations

- REPL execution is sequential (no parallel code execution yet)
- No prefix caching (future enhancement)
- Recursion depth is limited (configurable via `max_depth`)
- No streaming support yet
## Troubleshooting

**Model hits the iteration limit:**

- Increase the `max_iterations` parameter
- Simplify your query
- Check if the model is getting stuck in a loop

**Authentication errors:**

- Set the appropriate environment variable (e.g., `OPENAI_API_KEY`)
- Or pass the `api_key` parameter to the `RLM` constructor

**Unrecognized model names:**

- Check the model name format for your provider
- See the LiteLLM docs: https://docs.litellm.ai/docs/providers

**Ollama connection errors:**

- Make sure Ollama is running: `ollama serve`
- Pull a model first: `ollama pull llama3.2`
- Use the model format: `ollama/model-name`
## Contributing

Contributions welcome! Please:

- Fork the repository
- Create a feature branch
- Add tests for new features
- Ensure all tests pass (`pytest tests/`)
- Follow the code style (use `black` and `ruff`)
- Submit a pull request
## Citation

This implementation is based on the RLM paper by Alex Zhang and Omar Khattab.

To cite this implementation:

```bibtex
@software{rlm_python,
  title  = {recursive-llm: Python Implementation of Recursive Language Models},
  author = {Gvadzabia, Grisha},
  year   = {2025},
  url    = {https://github.com/ysz/recursive-llm}
}
```

To cite the original paper:

```bibtex
@misc{zhang2025rlm,
  title         = {Recursive Language Models},
  author        = {Zhang, Alex and Khattab, Omar},
  year          = {2025},
  month         = {October},
  url           = {https://alexzhang13.github.io/blog/2025/rlm/},
  eprint        = {2512.24601},
  archivePrefix = {arXiv}
}
```

## License

MIT License - see the LICENSE file for details.
## Acknowledgments

Based on the Recursive Language Models paper by Alex Zhang and Omar Khattab from MIT CSAIL.

Built using:

- LiteLLM for universal LLM API support
- RestrictedPython for safe code execution

## Links

- Paper: https://alexzhang13.github.io/blog/2025/rlm/
- arXiv: https://arxiv.org/abs/2512.24601
- LiteLLM Docs: https://docs.litellm.ai/
- Issues: https://github.com/ysz/recursive-llm/issues