This repository contains a proof-of-concept implementation of Recursive Language Models (RLMs) based on the paper "Recursive Language Models" by Alex L. Zhang, Tim Kraska, and Omar Khattab (https://arxiv.org/pdf/2512.24601). This implementation is also inspired by code in https://github.com/alexzhang13/rlm-minimal.
## Table of Contents

- Overview
- Core Components
- Key Features
- Installation
- Configuration
- Usage
- Paper Compliance
- Benchmark Tasks
- Performance Characteristics
- Issues and Solutions
- Architecture
## Overview

This implementation provides a Recursive Language Model (RLM) system that allows LLMs to process arbitrarily long prompts through inference-time scaling by treating the prompt as part of an external environment. The system enables LLMs to programmatically examine, decompose, and recursively call themselves over snippets of the prompt.
## Core Components

### RLM Base Class

- Abstract base class defining the RLM interface (a minimal sketch follows this list)
- Methods: `completion()`, `cost_summary()`, `reset()`
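A minimal sketch of that interface, assuming the method signatures implied by the usage examples later in this README; the class name `RLMBase` is hypothetical and the repo's actual base class may differ:

```python
from abc import ABC, abstractmethod

class RLMBase(ABC):  # hypothetical name; the repo's actual class may differ
    """Interface every RLM implementation must provide."""

    @abstractmethod
    def completion(self, context: str, query: str) -> str | None:
        """Run the RLM over `context` to answer `query`; None on timeout."""

    @abstractmethod
    def cost_summary(self) -> dict:
        """Return accumulated root/sub-LLM call counts, tokens, and cost."""

    @abstractmethod
    def reset(self) -> None:
        """Clear REPL state and cost counters between runs."""
```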
### RLM_REPL

- Main RLM implementation using a REPL environment
- Context stored externally in the REPL, not passed to the model directly
- Support for both `root_model` and `sub_model`
- Cost tracking for root and sub-LLM calls
- Iterative interaction loop with termination conditions (sketched below)
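The loop itself is not reproduced in this README; the following is a hedged sketch of how such a loop could work. `root_llm` and `run_code` are passed in as callables because the repo's internal helper names are not known here; the context is assumed to already live inside the REPL rather than in the message history:

```python
from typing import Callable, Optional

def rlm_loop(
    root_llm: Callable[[list[str]], str],                   # chat model: messages -> reply
    run_code: Callable[[str], tuple[str, Optional[str]]],   # REPL: code -> (output, final answer or None)
    first_message: str,                                     # system prompt + context metadata
    max_iterations: int = 20,
    max_output_length: int = 500_000,
) -> Optional[str]:
    """Hedged sketch of the RLM_REPL interaction loop; names are assumptions."""
    messages = [first_message]
    for _ in range(max_iterations):
        reply = root_llm(messages)                # root model proposes Python code
        output, final = run_code(reply)           # execute it in the persistent REPL
        if final is not None:                     # FINAL()/FINAL_VAR() was called
            return final
        messages.append(output[:max_output_length])  # truncated output fed back
    return None  # max iterations reached: no forced answer, per the paper
```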
### REPL Environment

- Python execution sandbox (a simplified sketch follows this list)
- `llm_query(prompt)` function for recursive calls
- State persistence across iterations
- Output capture (stdout/stderr)
- Variable management for intermediate results
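A simplified sketch of such a sandbox using `exec` with a persistent namespace and captured output; the repo's actual sandbox may differ:

```python
import io
from contextlib import redirect_stdout, redirect_stderr

class SimpleREPL:
    """Hedged sketch of the Python execution sandbox described above."""

    def __init__(self, llm_query):
        # Persistent namespace: variables survive across iterations, and the
        # model can call llm_query(prompt) from inside its own code.
        self.namespace = {"llm_query": llm_query}

    def execute(self, code: str) -> str:
        buf = io.StringIO()
        try:
            with redirect_stdout(buf), redirect_stderr(buf):
                exec(code, self.namespace)   # state persists in self.namespace
        except Exception as e:
            buf.write(f"{type(e).__name__}: {e}")
        return buf.getvalue()                # captured stdout/stderr fed back
```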
### Supporting Modules

- `llm.py`: LLM client wrapper with cost tracking
- `prompts.py`: System prompts from paper Appendix D
- `tracing.py`: Detailed logging and tracing system
## Key Features

### Context as External Environment

- Context stored as an external variable in the REPL environment
- Metadata provided to the LLM about context size and structure (see the sketch below)
- Enables processing of contexts beyond the model's context window
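For illustration, one plausible shape for that metadata; the exact wording lives in `prompts.py` and may differ:

```python
# Hedged sketch: a plausible metadata message shown to the root model
# instead of the raw context. The exact wording in the repo may differ.
def context_metadata(context: str) -> str:
    lines = context.count("\n") + 1
    return (
        f"A variable `context` is loaded in your REPL: "
        f"{len(context):,} characters across {lines:,} lines. "
        f"Inspect it with code, e.g. print(context[:500])."
    )
```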
### Recursive LLM Calls

- `llm_query` function allows recursive LLM calls
- Separate tracking of root and sub-LLM costs
- Enables chunking and processing of large contexts, as in the example below
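For example, the root model might write something like the following inside the REPL, where `context` and `llm_query` are provided by the environment; the chunk size and prompt strings are illustrative, not taken from the repo:

```python
# Illustrative code the root model could execute inside the REPL:
# split the oversized `context` variable and query a sub-LLM per chunk.
chunk_size = 50_000
chunks = [context[i:i + chunk_size] for i in range(0, len(context), chunk_size)]

notes = []
for i, chunk in enumerate(chunks):
    notes.append(llm_query(
        f"Summarize any facts relevant to the question "
        f"in this excerpt (part {i + 1}/{len(chunks)}):\n{chunk}"
    ))

# Aggregate the per-chunk notes into a single answer via one more sub-call.
answer = llm_query("Given these notes, answer the question:\n" + "\n".join(notes))
print(answer)
```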
### Programmatic Context Handling

- Python code execution in the REPL environment
- Ability to examine, filter, and decompose context programmatically
- State persistence across iterations
### Termination

- `FINAL()` and `FINAL_VAR()` functions for terminating responses (sketched below)
- Proper termination condition checking
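A hedged sketch of how these helpers could be injected into the REPL namespace; the repo's actual wiring may differ:

```python
def make_final_helpers(namespace: dict) -> dict:
    """Build FINAL()/FINAL_VAR() and install them into a REPL namespace.
    Hedged sketch; the actual mechanism in the repo may differ."""
    result = {"answer": None}

    def FINAL(text):
        result["answer"] = str(text)             # terminate with a literal answer

    def FINAL_VAR(name):
        result["answer"] = str(namespace[name])  # terminate with a REPL variable's
                                                 # value (useful for long outputs)
    namespace["FINAL"] = FINAL
    namespace["FINAL_VAR"] = FINAL_VAR
    return result  # after exec, result["answer"] is set iff the run finished
```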
### Cost Tracking

- Separate tracking of root and sub-LLM costs
- Token counting for both input and output
- Call counting for analysis (see the sketch below)
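A sketch of the kind of tracker this implies; only the `total_cost` field is confirmed by the usage example later in this README, and the remaining field names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class CostTracker:
    """Hedged sketch of per-run cost accounting; field names are assumptions."""
    root_calls: int = 0
    sub_calls: int = 0
    input_tokens: int = 0
    output_tokens: int = 0
    total_cost: float = 0.0

    def record(self, is_root: bool, n_in: int, n_out: int, cost: float) -> None:
        # Root and sub-LLM calls are counted separately for analysis.
        if is_root:
            self.root_calls += 1
        else:
            self.sub_calls += 1
        self.input_tokens += n_in
        self.output_tokens += n_out
        self.total_cost += cost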
### API Configuration

- Support for the `RLM_API_URL` environment variable
- Automatic model selection from the available-models endpoint
- Default fallback to `http://localhost:8080/v1`
### Runtime Parameters

- `max_iterations`: Maximum number of root LLM iterations before timeout (default: 20)
- `max_output_length`: Maximum length of REPL output before truncation (default: 500,000 chars); see the truncation sketch below
- When max iterations are reached, the system returns `None` instead of forcing an answer, to align with the paper's natural convergence
- Increased output length limit to reduce the impact on long-output tasks
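A minimal truncation helper illustrating the intended behavior; the marker text is an assumption:

```python
def truncate_output(output: str, max_output_length: int = 500_000) -> str:
    """Cap REPL output before feeding it back to the root model (sketch)."""
    if len(output) <= max_output_length:
        return output
    dropped = len(output) - max_output_length
    return output[:max_output_length] + f"\n[... truncated {dropped} chars]"
```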
## Installation

1. Clone the repository:

```bash
git clone https://github.com/fullstackwebdev/rlm_repl
cd rlm_repl
```

2. Install dependencies (if any):

```bash
# This implementation uses standard Python libraries
# No additional dependencies required
```

3. Set up your local LLM server (e.g., using llama.cpp server, vLLM, etc.)
## Configuration

The implementation supports configurable API endpoints:

- Set the `RLM_API_URL` environment variable to point to your LLM server:

```bash
export RLM_API_URL="http://your-llm-server:port/v1"
```

- If not set, the system defaults to `http://localhost:8080/v1`.
- The system automatically detects available models from the `/models` endpoint and uses the first available model if none is specified (see the sketch below).
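A standard-library sketch of that resolution logic, assuming an OpenAI-compatible `/models` response; the function name is hypothetical:

```python
import json
import os
import urllib.request

def resolve_endpoint_and_model() -> tuple[str, str]:
    """Hedged sketch: pick the API base URL and first available model."""
    base = os.environ.get("RLM_API_URL", "http://localhost:8080/v1")
    with urllib.request.urlopen(f"{base}/models") as resp:
        models = json.load(resp)["data"]   # OpenAI-style /models payload
    return base, models[0]["id"]           # first available model
```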
## Usage

### Basic Usage

```python
from rlm.rlm_repl import RLM_REPL

# Create RLM instance
rlm = RLM_REPL(
    model="auto",            # Automatically selects first available model
    recursive_model="auto",  # Automatically selects first available model
    max_iterations=10
)

# Process long context
result = rlm.completion(
    context="Very long context...",
    query="What is the answer to the question?"
)

# Get cost summary
costs = rlm.cost_summary()
print(f"Total cost: ${costs['total_cost']:.4f}")
```

### Using a Specific Model

```python
from rlm.rlm_repl import RLM_REPL

# Create RLM instance with specific model
rlm = RLM_REPL(
    model="Qwen3-Coder-REAP-25B-A3B.Q5_K_M.gguf",
    recursive_model="Qwen3-Coder-REAP-25B-A3B.Q5_K_M.gguf",
    max_iterations=10,
    max_output_length=500000  # Characters before truncation
)

# Process long context
result = rlm.completion(
    context="Very long context...",
    query="What is the answer to the question?"
)

# Note: result may be None if max_iterations is reached without a final answer
if result is None:
    print("RLM reached max iterations without finding a final answer")
else:
    print(f"Result: {result}")
```

## Performance Characteristics

- Handles contexts significantly beyond model context windows
- Comparable or better quality than base LLMs and common long-context scaffolds
- Comparable or cheaper cost per query compared to alternatives
- Maintains strong performance as context length and task complexity increase
## References

- Zhang, A. L., Kraska, T., & Khattab, O. (2025). Recursive Language Models. arXiv preprint arXiv:2512.24601. https://arxiv.org/pdf/2512.24601
- Reference implementation: https://github.com/alexzhang13/rlm-minimal