Chainette is a tiny, type-safe way to compose LLM pipelines in Python.
- ≈2k LOC core package • MIT
- Works with any vLLM-served model (local `vllm_local`), the OpenAI API, a vLLM-Serve API, or Ollama; choose the backend at runtime
- Inputs & outputs are Pydantic models; no more brittle string parsing
- Automatic JSON guided decoding: the model must reply with the schema you declare
- Filesystem first: every run leaves reproducible artefacts (graph.json, step files, flattened view)
- Simple CLI: `warmup | run | kill | inspect`
pip install chainette
pip install chainette[openai]
pip install chainette[ollama]
Define your models, register an engine, create steps, and build a chain:
from pydantic import BaseModel
from chainette import Step, Chain, SamplingParams, register_engine
# 1a. Register a local in-process vLLM engine
register_engine(
    name="llama3",
    model="NousResearch/Meta-Llama-3-8B-Instruct",
    dtype="bfloat16",
    gpu_memory_utilization=0.6,
    lazy=True,  # only start when needed
)
# 1b. OpenAI HTTP backend (optional)
# Requires `pip install openai` and the env var `OPENAI_API_KEY`.
register_engine(
    name="gpt4o",
    model="gpt-4.1-mini",  # or gpt-4o-2024-08-06
    backend="openai",
    endpoint="https://api.openai.com/v1",  # default
)
# 1c. vLLM Serve HTTP backend
register_engine(
    name="vllm_api",
    model="gpt-4o-2024-08-06",
    backend="vllm_api",
    endpoint="http://localhost:8000/v1",  # vLLM OpenAI-compatible server
)
# 1d. Ollama HTTP backend
register_engine(
    name="qwen_local",
    model="qwen2.5-instruct",
    backend="ollama_api",  # Requires `ollama serve`
)
# 2. Define input/output schemas
class Question(BaseModel):
    text: str

class Answer(BaseModel):
    answer: str
    confidence: float
# 3. Create a step
qa_step = Step(
    id="qa",
    name="Question Answering",
    input_model=Question,
    output_model=Answer,
    engine_name="llama3",
    sampling=SamplingParams(temperature=0.2),
    system_prompt="Answer questions accurately and concisely.",
    user_prompt="Question: {{text}}",
)
# 4. Create a chain
chain = Chain(name="Simple QA", steps=[qa_step])
# 5. Run the chain
results = chain.run([
    Question(text="What is the capital of France?"),
    Question(text="How do transformers work?"),
])

A Step is a single LLM task with defined input and output models. Each step:
- Uses guided JSON decoding to ensure output follows your schema
- Handles batching for efficient processing
- Tracks ID, name, and other metadata for the run
A Chain executes a sequence of steps, handling:
- Data flow between steps
- Parallelism with branches
- Batching of inputs
- Output serialization
- Execution metadata
Use Branch for parallel workflows:
chain = Chain(
    name="Translation Chain",
    steps=[
        extract_step,
        [  # Parallel branches
            Branch(name="fr", steps=[translate_french]),
            Branch(name="es", steps=[translate_spanish]),
        ],
    ],
)

Merge outputs from parallel branches back into the main flow with Branch.join(alias). The final output of each branch becomes accessible in later templates via the alias you provide:
fr_branch = Branch(name="fr", steps=[translate_french]).join("fr")
es_branch = Branch(name="es", steps=[translate_spanish]).join("es")
agg = Step(
    id="agg",
    input_model=Translation,
    output_model=Summary,
    engine_name="llama3",
    sampling=SamplingParams(temperature=0),
    system_prompt="Summarise both translations.",
    user_prompt="FR: {{fr.translated}}\nES: {{es.translated}}",
)

chain = Chain(name="Translate & Summarise", steps=[[fr_branch, es_branch], agg])

Inject pure Python functions with apply:
from chainette import apply
def filter_low_confidence(items):
    return [item for item in items if item.confidence > 0.7]

chain = Chain(
    name="QA with filtering",
    steps=[qa_step, apply(filter_low_confidence)],
)

Internally, Chainette is now modelled as a directed acyclic graph:
inputs → Step/Apply → … → Branch(es) → outputs
Key runtime components:
- Graph / GraphNode: tiny dataclass helpers to link nodes.
- Executor: walks the graph depth-first, handles batching & engine reuse.
- AsyncExecutor: the same, but with an async def run using anyio threads.
- EnginePool: LRU cache of live vLLM / Ollama engines.
- Result: wrapper object that propagates either a value or an error.
You still build chains exactly the same way; Chain.run() now proxies to the
executor under the hood, so existing code doesn't change.
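
For intuition, here is a minimal standalone sketch of the value-or-error propagation idea described above. It is not Chainette's actual Result class; the field and method names below are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")

@dataclass
class ResultSketch(Generic[T]):
    """Illustrative value-or-error wrapper (not Chainette's real Result API)."""
    value: Optional[T] = None
    error: Optional[Exception] = None

    @property
    def ok(self) -> bool:
        return self.error is None

# A failing node yields an error-carrying result; downstream nodes can skip it
# instead of aborting the whole run.
r = ResultSketch(value={"answer": "Paris"})
if r.ok:
    print(r.value)
```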
Chainette now ships with a streaming execution Runner:
- Chunked batching via Executor.run_iter: constant memory even on millions of rows (see the sketch after this list).
- StreamWriter flushes each batch immediately and rolls files (000.jsonl, 001.jsonl, …). Optional Parquet support (pyarrow).
- Rich live logger (ASCII banner, DAG tree, progress bars) powered by an EventBus.
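
A rough sketch of what a streaming run could look like in code. Only Executor.run_iter and StreamWriter are named above; the import paths, the batch_size argument, and the write/close methods below are assumptions, so treat this as pseudocode for the documented idea rather than the real API:

```python
import json

from chainette.executor import Executor        # import path assumed
from chainette.writers import StreamWriter      # import path assumed

def read_questions(path):
    """Yield inputs lazily so the whole file never sits in memory."""
    with open(path) as fh:
        for line in fh:
            yield Question(**json.loads(line))

executor = Executor(chain)      # wraps the chain's DAG (constructor assumed)
writer = StreamWriter("out/")   # rolls 000.jsonl, 001.jsonl, ... as batches arrive

# run_iter is assumed to yield results batch by batch; each batch is written
# and flushed immediately, which keeps memory usage constant.
for batch in executor.run_iter(read_questions("inputs_huge.jsonl"), batch_size=1000):
    writer.write(batch)

writer.close()
```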
- CLI additions:
  - --stream-writer flag (recommended for big runs).
  - --quiet / --json-logs for headless or scripted runs.
  - inspect-dag command to visualise the graph without execution.
Quick demo:
poetry run chainette run examples/runner/huge_batch_demo.py demo_chain inputs_huge.jsonl out --stream-writer

Chainette ships with a Rich-powered UI:
Execution DAG
└── parallel × 2
    ├── fr_branch
    │   └── translate_fr
    └── es_branch
        └── translate_es

qa_step ━━━━━━━━ 70% • 7/10 • 0:03
Key features
- Icons for node types (Apply / Step / Branch root, etc.).
- Collapse long parallel groups with "N more…".
- Flags:
  - --no-icons: plain ASCII.
  - --max-branches 3: limit displayed branches.
- Live progress bars per step with a completed/total badge.
Print tree only:
poetry run chainette inspect-dag examples/translate.py my_chain --no-icons

Chainette provides a simple CLI for managing LLM engines:
# Start non-lazy engines
chainette warmup -f engines.yml -e llama3
# Run a chain from a module
chainette run examples.qa:my_chain -i inputs.json --output_dir results
# Run a chain using huggingface datasets
chainette run examples.qa:my_chain -i dataset_name/split_name --columns input_column_name_1,input_column_name_2 --output_dir results
# Terminate engines
chainette kill --all

Configure engines in YAML:
engines:
  - name: llama3
    model: NousResearch/Meta-Llama-3-8B-Instruct
    dtype: bfloat16
    gpu_memory_utilization: 0.6
    enable_reasoning: false
    devices: [0]
    lazy: true

  - name: gpt4o
    backend: openai
    model: gpt-4.1-mini
    endpoint: https://api.openai.com/v1
    # `OPENAI_API_KEY` must be set in your environment

Each run creates:
- graph.json: chain execution metadata
- Step output directories with data in JSON/CSV
- Optional flattened view combining all steps
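
For example, after a run you can load the graph.json metadata back from the output directory (only the graph.json filename comes from the list above; the directory is whatever --output_dir you passed):

```python
import json
from pathlib import Path

run_dir = Path("results")  # the --output_dir passed to `chainette run`
graph = json.loads((run_dir / "graph.json").read_text())
print(graph)  # chain execution metadata recorded for the run
```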
Check the examples/ directory:
- product_struct_extract.py: extract product attributes with translations
- multi_doc_summary_eval.py: document summarization with quality scoring
- join/inc_dec_join.py: tiny pure-Python Join demo (no LLM needed)
| backend value | Description | Transport |
|---|---|---|
| `openai` | OpenAI Chat Completions | HTTPS |
| `vllm_api` | vLLM OpenAI-compatible server (python -m vllm.entrypoints.openai.api_server) | HTTP |
| `ollama_api` | Ollama REST (ollama serve) | HTTP |
enable_reasoning is currently available only for vllm_api when the remote server supports reasoning flags.
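
For instance, a reasoning-capable vLLM server could be registered like this. This assumes register_engine accepts the same enable_reasoning option shown in the YAML config; the model name and endpoint are placeholders:

```python
register_engine(
    name="reasoning_server",
    model="your-org/your-reasoning-model",  # placeholder model name
    backend="vllm_api",
    endpoint="http://localhost:8000/v1",
    enable_reasoning=True,  # only honoured when the remote server supports reasoning flags
)
```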
Mandatory: Python ≥ 3.9, Pydantic v2, datasets.
Optional back-ends:
# In-process vLLM (GPU):
pip install vllm
# HTTP OpenAI:
pip install openai
# HTTP Ollama:
brew install ollama  # macOS helper
# or see https://ollama.ai

License: MIT
The EngineBroker is a minimal ref-count abstraction that ensures engines spin up lazily and are flushed deterministically.
from chainette.engine.broker import EngineBroker as EB
with EB.acquire("gemma_ollama") as eng:
    eng.generate(prompts, sampling)

# At the end of a run the Executor calls:
EB.flush(force=True)  # frees any idle engines

Key points
- Context manager: no manual release needed in nodes.
- Reference counting prevents premature kills while branches share an engine (see the sketch below).
- Idle engines auto-evict after 180 s, or immediately via force=True.
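
A small sketch of how the reference counting plays out when two branches share an engine. Nesting EB.acquire for the same engine name is an assumption here; only EB.acquire and EB.flush appear above, and the prompt/sampling variables are placeholders:

```python
from chainette.engine.broker import EngineBroker as EB

# Two branches that share "llama3": the inner acquire bumps the ref count,
# so the engine is not torn down until both context managers have exited.
with EB.acquire("llama3") as eng:
    fr_out = eng.generate(fr_prompts, sampling)         # placeholders
    with EB.acquire("llama3") as same_eng:               # ref count 1 -> 2
        es_out = same_eng.generate(es_prompts, sampling)
    # ref count back to 1 here; the engine stays alive
# ref count 0: the engine is now idle and eligible for eviction (180 s or flush)

EB.flush(force=True)  # explicitly free anything still idle
```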