A pure Rust implementation of the LLaMA model for inference and educational purposes. Supports LLaMA 1, 2, and 3 architectures.
This repository demonstrates how to run LLaMA inference with minimal dependencies, making it ideal for learning and understanding transformer internals.
- HF-aligned Architecture – Matches the HuggingFace reference implementation, with a clean, structured codebase that follows the official model layouts
- Parallel MHA – Multi-head attention parallelized with Rayon for a 2-4x speedup on multi-core systems (see the sketch after this list)
- Minimal Dependencies – Only uses `byteorder`, `rayon`, `rand`, and `thiserror`
- Educational – Line-by-line readable transformer implementation with inline documentation
- Type-safe – Leverages Rust's type system for memory safety without garbage collection overhead
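To make the Parallel MHA bullet concrete, here is a minimal sketch of splitting attention heads across cores with Rayon. The names and memory layouts are assumed for illustration, not this repository's actual API: each head is an independent scaled dot-product attention over the KV cache, so `par_chunks_mut` can hand every head its own disjoint slice of the output.

```rust
use rayon::prelude::*;

/// One head of scaled dot-product attention. Assumed layouts: `q` and `out`
/// are [head_dim], `keys`/`vals` are [seq_len * head_dim] for this head.
fn attention_head(q: &[f32], keys: &[f32], vals: &[f32], out: &mut [f32], head_dim: usize, seq_len: usize) {
    let scale = (head_dim as f32).sqrt();
    // Scaled dot-product scores against every cached key.
    let mut scores: Vec<f32> = (0..seq_len)
        .map(|t| {
            let k = &keys[t * head_dim..(t + 1) * head_dim];
            q.iter().zip(k).map(|(a, b)| a * b).sum::<f32>() / scale
        })
        .collect();

    // Softmax (max-subtracted for numerical stability).
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0;
    for s in scores.iter_mut() {
        *s = (*s - max).exp();
        sum += *s;
    }

    // Probability-weighted sum of the value vectors.
    out.fill(0.0);
    for (t, s) in scores.iter().enumerate() {
        let w = s / sum;
        let v = &vals[t * head_dim..(t + 1) * head_dim];
        for (o, x) in out.iter_mut().zip(v) {
            *o += w * x;
        }
    }
}

/// Heads are independent, so `par_chunks_mut` gives each head a disjoint
/// mutable slice of `out`, satisfying the borrow checker with zero copying.
fn multi_head_attention(
    q: &[f32], keys: &[f32], vals: &[f32], out: &mut [f32],
    head_dim: usize, seq_len: usize,
) {
    let kv = seq_len * head_dim;
    out.par_chunks_mut(head_dim).enumerate().for_each(|(h, out_h)| {
        attention_head(
            &q[h * head_dim..(h + 1) * head_dim],
            &keys[h * kv..(h + 1) * kv],
            &vals[h * kv..(h + 1) * kv],
            out_h,
            head_dim,
            seq_len,
        );
    });
}
```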
```
cargo run --release -- <checkpoint> <tokenizer> [prompt] [options]
```

| Flag | Description | Default |
|---|---|---|
| `--temp <float>` | Sampling temperature (0 = greedy) | 1.0 |
| `--topp <float>` | Top-p (nucleus) sampling | 0.9 |
| `--steps <int>` | Max tokens to generate | 256 |
| `--seed <int>` | Random seed | 0 |
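For reference, this is roughly what `--temp` and `--topp` control during decoding. The sketch below is an assumed, self-contained implementation of temperature scaling plus nucleus sampling (function and variable names are illustrative, and the rand 0.8 API is assumed), not necessarily the sampler in this codebase.

```rust
use rand::Rng;

/// Illustrative sampler over a raw logits vector. `temperature == 0.0`
/// falls back to greedy argmax, matching the documented `--temp 0` behavior.
fn sample(logits: &[f32], temperature: f32, top_p: f32, rng: &mut impl Rng) -> usize {
    if temperature == 0.0 {
        // Greedy: take the highest-scoring token.
        return logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i)
            .unwrap();
    }

    // Temperature-scaled softmax (max-subtracted for numerical stability).
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let mut probs: Vec<(usize, f32)> = logits
        .iter()
        .enumerate()
        .map(|(i, &l)| (i, ((l - max) / temperature).exp()))
        .collect();
    let sum: f32 = probs.iter().map(|(_, p)| p).sum();
    for (_, p) in probs.iter_mut() {
        *p /= sum;
    }

    // Nucleus (top-p): keep the smallest high-probability set whose
    // cumulative mass reaches top_p, then sample only within that set.
    probs.sort_by(|a, b| b.1.total_cmp(&a.1));
    let mut cumulative = 0.0;
    let mut cutoff = probs.len();
    for (i, &(_, p)) in probs.iter().enumerate() {
        cumulative += p;
        if cumulative >= top_p {
            cutoff = i + 1;
            break;
        }
    }
    let nucleus = &probs[..cutoff];

    // Draw from the renormalized nucleus.
    let mass: f32 = nucleus.iter().map(|(_, p)| p).sum();
    let mut r = rng.gen::<f32>() * mass;
    for &(i, p) in nucleus {
        r -= p;
        if r <= 0.0 {
            return i;
        }
    }
    nucleus.last().unwrap().0
}
```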
```
cargo run --release -- stories15M.bin tokenizer.bin "Once upon a time" --temp 0.8 --steps 128
```

The examples use small models trained by Andrej Karpathy for demonstration.
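If these checkpoints use Karpathy's llama2.c layout (a header of seven little-endian i32 config fields followed by f32 weight tensors), the header could be parsed with `byteorder` roughly as below. This is a hedged sketch; the struct and field names are illustrative, not this repository's exact types.

```rust
use byteorder::{LittleEndian, ReadBytesExt};
use std::fs::File;
use std::io::BufReader;

/// Assumed llama2.c-style config header: seven little-endian i32 fields.
#[derive(Debug)]
struct Config {
    dim: i32,
    hidden_dim: i32,
    n_layers: i32,
    n_heads: i32,
    n_kv_heads: i32,
    vocab_size: i32,
    seq_len: i32,
}

fn read_config(path: &str) -> std::io::Result<Config> {
    let mut r = BufReader::new(File::open(path)?);
    // The f32 weight tensors follow this header and would be read next,
    // e.g. with read_f32_into::<LittleEndian>.
    Ok(Config {
        dim: r.read_i32::<LittleEndian>()?,
        hidden_dim: r.read_i32::<LittleEndian>()?,
        n_layers: r.read_i32::<LittleEndian>()?,
        n_heads: r.read_i32::<LittleEndian>()?,
        n_kv_heads: r.read_i32::<LittleEndian>()?,
        vocab_size: r.read_i32::<LittleEndian>()?,
        seq_len: r.read_i32::<LittleEndian>()?,
    })
}
```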
If you're interested in LLaMA implementations in other languages:
- llama.go – Pure Go implementation
- llama.np – NumPy-based implementation
- llama.cu – CUDA-accelerated implementation
This project is licensed under either of the following licenses, at your option:
- Apache License, Version 2.0, (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or https://opensource.org/licenses/MIT)
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in llama.rs by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.