
🔍 llmx

Local-first codebase indexer with semantic search and chunk exports for agent consumption

Transform large codebases into searchable, intelligently chunked datasets with real neural network embeddings running entirely in your browser via WebGPU. No server, no API calls, no data leaving your machine.

Demo

Proof: 7,147 files indexed in 31 MB -> 1,625 tokens retrieved (99.98% savings) | 180+ tests | MCP server


⚡ Key Features

  • 🔍 Neural Semantic Search - Snowflake Arctic embeddings with WebGPU acceleration, same quality as server-side solutions
  • ⚡ Hybrid Search - Combines BM25 + vector search with RRF (Reciprocal Rank Fusion) for best results
  • 🗂️ Smart Chunking - Deterministic chunking by file type (functions, headings, JSON keys)
  • 📄 Semantic Exports - Hierarchical outline format with function names and heading breadcrumbs
  • 🔒 Privacy-First - Zero network calls, all processing in-browser via WASM
  • ⚡ Fast - Sub-second indexing, ~50ms embedding inference with GPU acceleration
  • 📥 Agent-Ready - Exports designed for selective retrieval, not bulk ingestion

The Problem

LLMs have limited context windows. Loading an entire codebase is:

  • Token-expensive - Wastes context on irrelevant code
  • Slow - Reading hundreds of files takes time
  • Inefficient - Agents can't filter until after reading everything

The Solution

llmx builds a searchable index with semantic chunk exports that enable agents to:

  1. Scan the manifest (llm.md) to understand structure
  2. Search for relevant concepts using BM25
  3. Retrieve only the specific chunks needed
  4. Navigate via function names, heading hierarchies, and file paths

🗂️ Semantic Outline Format

llmx exports token-efficient manifests (llm.md + manifest.llm.tsv) with semantic labels for intelligent chunk selection:

Code Files

### src/auth.js (js, 47 lines)
- c0001 (1-15) `loginUser()`
- c0002 (17-30) `validateToken()`
- c0003 (32-47) `logout()`

Markdown Documentation

### docs/api-reference.md (md, 234 lines)
- c0004 (1-45) API Reference
- c0005 (46-102) API Reference > Authentication
- c0006 (103-156) API Reference > Rate Limiting > Quotas

Agents can scan headings, function names, and file types to select relevant chunks without opening any files.
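
The outline lines follow a fixed shape (ref, line range, then a symbol or heading breadcrumb), so they are easy to parse mechanically. A minimal Rust sketch, assuming the line format shown above; `OutlineEntry` and `parse_outline_line` are illustrative names, not part of the llmx API:

```rust
use regex::Regex;

/// One entry from an llm.md file section, e.g. `- c0001 (1-15) loginUser()`.
struct OutlineEntry {
    ref_id: String, // chunk ref, e.g. "c0001"
    start: u32,     // first line of the chunk
    end: u32,       // last line of the chunk
    label: String,  // function name or heading breadcrumb
}

fn parse_outline_line(line: &str) -> Option<OutlineEntry> {
    // `- <ref> (<start>-<end>) <label>`, label optionally backticked
    let re = Regex::new(r"^- (\S+) \((\d+)-(\d+)\) `?([^`]+)`?$").ok()?;
    let caps = re.captures(line.trim())?;
    Some(OutlineEntry {
        ref_id: caps[1].to_string(),
        start: caps[2].parse().ok()?,
        end: caps[3].parse().ok()?,
        label: caps[4].to_string(),
    })
}
```

An agent can run each outline line through this, keep the entries whose label matches its query terms, and open only the corresponding chunks/<ref>.md files.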


⚡ Quick Start

Note

The CLI is recommended for most use cases; the Web UI requires a WASM build.

CLI Usage

# Build CLI
cargo build --release --features cli

# Index a codebase
llmx index ./my-project

# Search with token budget
llmx search "authentication login" --limit 10 --max-tokens 4000

# Explore structure
llmx explore files
llmx explore symbols --path src/

# Export for agents
llmx export --format zip -o ./export.zip

Web UI Setup

Build WASM

cd ingestor-wasm
wasm-pack build --target web --out-dir ../web/pkg

Run Web UI

python3 -m http.server 8001 --bind 127.0.0.1 --directory web

Open http://127.0.0.1:8001 in your browser.

Index a Codebase (Web UI)

  1. Select folder (Chromium) or drag files (Firefox/all browsers)
  2. Wait for indexing (sub-second for typical repos)
  3. Search using the query box
  4. Export -> Download an export bundle named after the selected folder (e.g. my-repo.llmx-1a2b3c4d.zip)

📁 Usage

For Agents

Give the agent an export bundle (compact, recommended):

llmx-export/
├── llm.md              # Compact pointer manifest (recommended)
├── manifest.llm.tsv    # Token-efficient chunk table for LLMs
└── chunks/
    ├── c0001.md        # Chunk body (minimal header + content)
    └── ...

Agent workflow:

  1. Read llm.md for the compact workflow and artifact pointers
  2. Scan manifest.llm.tsv to identify relevant files/chunks by label
  3. Open only the matching chunks/<ref>.md files
  4. Download *.index.json from the UI only if you need the full index structure
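
A sketch of step 3 under a token budget, assuming the chunks/<ref>.md layout above. `collect_chunks` is a hypothetical helper, and the bytes / 4 token estimate is a crude stand-in for the per-chunk token_estimate llmx records:

```rust
use std::fs;
use std::path::Path;

/// Hypothetical helper: concatenate selected chunk files until a token
/// budget is exhausted.
fn collect_chunks(export_dir: &Path, refs: &[&str], max_tokens: usize) -> std::io::Result<String> {
    let mut out = String::new();
    let mut used = 0;
    for r in refs {
        let body = fs::read_to_string(export_dir.join("chunks").join(format!("{r}.md")))?;
        let estimate = body.len() / 4; // ~4 bytes per token, rough heuristic
        if used + estimate > max_tokens {
            break; // budget spent; stop rather than truncate mid-chunk
        }
        used += estimate;
        out.push_str(&body);
        out.push('\n');
    }
    Ok(out)
}
```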

For Humans

  • Browse the web UI for real-time search
  • Export for offline analysis
  • Share the downloaded *.llmx-<id8>.zip bundle with team members (no server needed)

⚙️ How It Works

flowchart LR
    A[Codebase] --> B[Chunker]
    B --> C[Index + BM25]
    C --> D[llm.md manifest]
    D --> E[Agent queries]
    E --> F[Relevant chunks only]

Chunking Strategy

llmx chunks files deterministically by type:

| File Type | Chunking Method |
| --- | --- |
| JavaScript/TypeScript | Function/class declarations (via tree-sitter or fallback) |
| Markdown | Heading boundaries with ancestry preserved |
| JSON | Top-level keys or array ranges (max 50 elements) |
| HTML | Heading tags, scripts/styles stripped |
| Text | Paragraph boundaries |
| Images | Indexed by path, bytes included in export |
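
For Markdown, "heading boundaries with ancestry preserved" means each chunk keeps the breadcrumb of headings above it, which is what produces labels like "API Reference > Authentication" in the outline. A simplified sketch of the idea (not llmx's actual chunker):

```rust
/// Split Markdown at heading boundaries, returning (heading_path, body)
/// pairs where heading_path is the breadcrumb of ancestor headings.
fn chunk_markdown(text: &str) -> Vec<(Vec<String>, String)> {
    let mut stack: Vec<(usize, String)> = Vec::new(); // (level, title)
    let mut chunks: Vec<(Vec<String>, String)> = Vec::new();
    let mut body = String::new();
    for line in text.lines() {
        let level = line.chars().take_while(|&c| c == '#').count();
        if level > 0 && line.chars().nth(level) == Some(' ') {
            // flush the previous section with its breadcrumb
            if !body.trim().is_empty() {
                let crumbs: Vec<String> = stack.iter().map(|(_, t)| t.clone()).collect();
                chunks.push((crumbs, std::mem::take(&mut body)));
            }
            // pop siblings and deeper headings, then push this one
            while stack.last().map_or(false, |&(l, _)| l >= level) {
                stack.pop();
            }
            stack.push((level, line[level..].trim().to_string()));
        } else {
            body.push_str(line);
            body.push('\n');
        }
    }
    if !body.trim().is_empty() {
        let crumbs: Vec<String> = stack.iter().map(|(_, t)| t.clone()).collect();
        chunks.push((crumbs, body));
    }
    chunks
}
```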

Search

Hybrid search combining two approaches:

  1. BM25 (Keyword Search):

    • Term frequency (TF)
    • Inverse document frequency (IDF)
    • Document length normalization
    • Fast lexical matching
  2. Neural Semantic Search:

    • Snowflake Arctic Embed (384 dimensions, INT8 quantized)
    • WebGPU-accelerated inference (~50ms per query)
    • Falls back to CPU → hash-based → BM25-only
    • Understands meaning, not just keywords
  3. RRF Fusion:

    • Reciprocal Rank Fusion combines both rankings
    • Weighted blending for optimal results
    • Better than either method alone

Runs fully client-side in WASM with GPU acceleration.
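
The fusion step above is small enough to sketch: each ranking contributes weight / (k + rank) per document, and the summed scores are re-sorted. A minimal Rust version with the customary k = 60 default; the weights in the comment are illustrative, not llmx's actual blend (which lives in ingestor-core/src/rrf.rs):

```rust
use std::collections::HashMap;

/// Weighted Reciprocal Rank Fusion over several ranked id lists.
fn rrf_fuse(rankings: &[(&[u32], f32)], k: f32) -> Vec<(u32, f32)> {
    let mut scores: HashMap<u32, f32> = HashMap::new();
    for (ranked_ids, weight) in rankings {
        for (rank, id) in ranked_ids.iter().enumerate() {
            // rank is 0-based, hence the +1
            *scores.entry(*id).or_insert(0.0) += weight / (k + rank as f32 + 1.0);
        }
    }
    let mut fused: Vec<(u32, f32)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1));
    fused
}

// Illustrative call, fusing BM25 and vector rankings with a lexical bias:
// let top = rrf_fuse(&[(&bm25_ids, 0.6), (&vector_ids, 0.4)], 60.0);
```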

Export Formats

| Format | Contents | Use Case |
| --- | --- | --- |
| llm.md | Semantic manifest with outline | Quick scanning, agent navigation |
| manifest.json | Optimized columnar format | Machine parsing, tooling |
| index.json | Full index + inverted index | Offline search, backup |
| export.zip | All above + chunk files + images | Complete portable package |

📄 Real-World Example: Apple HIG Corpus

Tested on the Apple Human Interface Guidelines archive (1980-2009):

| Metric | Value |
| --- | --- |
| Files | 7,147 |
| Chunks | 21,369 |
| Raw size | 31 MB (~7.8M tokens) |

Token Savings

| Access Method | Tokens | Savings |
| --- | --- | --- |
| Read all files | ~7,800,000 | -- |
| Scan manifest (llm.md) | ~208,000 | 97% |
| Targeted search (3 queries) | ~1,625 | 99.98% |
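
For the last row: 1,625 / 7,800,000 ≈ 0.0002, so roughly 99.98% of the corpus tokens never enter the context window.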

Agent Workflow Demo

An agent answering "How did Apple's feedback guidelines evolve from Lisa to Aqua?" retrieved:

Search: "user feedback error message design"

[1] 1992 Macintosh Human Interface Guidelines
    Heading: Human Interface Design Principles > Feedback and Dialog
    "Keep users informed... Most people would not know what to do
    if they saw 'The computer unexpectedly crashed. ID = 13.'"

[2] 1980 Lisa UI Standards
    "The LISA User Interface has two main goals, simplicity and
    integration. We want LISA to be easy to learn and easy to use..."

[3] 2001 Aqua Human Interface Guidelines
    "Do not use sheets for dialogs that apply to several windows..."

The agent found relevant content spanning three decades using 0.02% of the total corpus tokens.


⚙️ Technical Details

  • Language: Rust (core), JavaScript (WASM bindings, web UI)
  • Architecture: Client-only, no server required
  • ML Framework: Burn 0.20 (compiles to WASM)
  • Embedding Model: Snowflake Arctic Embed Small (384-dim, INT8 quantized to ~9MB)
  • Storage: IndexedDB (persistent) or in-memory
  • Search: Hybrid (BM25 + neural embeddings) with RRF fusion
  • Chunking: Deterministic, content-hash based IDs (sketched after this list)
  • Performance:
    • Indexing: ~500ms for 10MB codebase
    • Embeddings: ~50ms per query (WebGPU)
    • Package: 2.4 MB WASM (model weights loaded separately from CDN)
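
As a sketch of the deterministic, content-hash based IDs noted in the Chunking bullet: assuming SHA-256 over the chunk body, with the 9-character refs seen in llm.md as a hex prefix of the full hash. The exact derivation in llmx may differ; this uses the sha2 and hex crates:

```rust
use sha2::{Digest, Sha256};

/// Hypothetical derivation: SHA-256 over the chunk body gives the full id,
/// and a short hex prefix serves as the manifest `ref`. Deterministic, so
/// the same input always produces the same index.
fn chunk_ids(body: &str) -> (String, String) {
    let full = hex::encode(Sha256::digest(body.as_bytes()));
    let short = full[..9].to_string(); // e.g. "abc123def", as in llm.md
    (full, short)
}
```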

Browser Compatibility

File Selection

  • Chromium (Chrome, Edge): Full support (showDirectoryPicker)
  • WebKit (Safari): Folder input via webkitdirectory
  • Firefox: File selection or drag-and-drop (no folder picker)

Semantic Search

  • WebGPU (Chrome 113+, Edge 113+): GPU-accelerated embeddings (~50ms)
  • CPU Fallback: All modern browsers with WASM support (~100-200ms)
  • Hash Fallback: Universal compatibility (deterministic, instant)
  • BM25 Only: Always available as final fallback

Module workers fall back to main thread if unavailable.
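
In code, the fallback chain reads as a simple preference order; the names below are illustrative, not llmx's actual types. BM25 remains available even in the Hash case, as the final fallback at query time:

```rust
/// Embedding backends, in descending order of preference.
enum EmbeddingBackend {
    WebGpu, // GPU-accelerated inference, ~50ms per query
    Cpu,    // WASM on CPU, ~100-200ms
    Hash,   // deterministic hash-based vectors, instant
}

fn pick_backend(webgpu_available: bool, model_loaded: bool) -> EmbeddingBackend {
    match (webgpu_available && model_loaded, model_loaded) {
        (true, _) => EmbeddingBackend::WebGpu,
        (_, true) => EmbeddingBackend::Cpu,
        _ => EmbeddingBackend::Hash,
    }
}
```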


📁 Development

Project Structure

llmx/
├── ingestor-core/      # Rust library (chunking, indexing, search, RRF)
│   ├── src/
│   │   ├── chunk.rs         # Chunking logic
│   │   ├── index.rs         # Indexing + hybrid search
│   │   ├── rrf.rs           # Reciprocal Rank Fusion
│   │   ├── embeddings.rs    # Embedding abstractions
│   │   └── mcp/             # MCP server tools
├── ingestor-wasm/      # WASM bindings + Burn embeddings
│   ├── src/
│   │   ├── lib.rs           # WASM exports
│   │   └── embeddings_burn.rs  # Burn-based neural embeddings
│   ├── build.rs         # ONNX model download & conversion
│   └── .cargo/          # WASM build configuration
├── web/                # Browser UI
│   ├── app.js          # Main UI logic
│   ├── worker.js       # Web Worker for WASM
│   └── pkg/            # Built WASM artifacts
└── docs/               # Specifications and usage guides

Build

# Set model URL (required for WASM builds with embeddings)
export LLMX_EMBEDDING_MODEL_URL="https://your-cdn.com/arctic-embed-s-q8.bin"

# Build WASM (includes neural embedding support)
cd ingestor-wasm
wasm-pack build --target web --release

# Development build (faster, larger)
wasm-pack build --target web --dev

# Run tests
cd ingestor-core
cargo test

Security Note on Model URLs:

  • The LLMX_EMBEDDING_MODEL_URL is embedded into the WASM binary at build time
  • Use only public, non-authenticated URLs (e.g., HuggingFace, public CDN)
  • Never use signed URLs or URLs with authentication tokens
  • The URL will be visible to anyone inspecting the WASM binary
  • For production, host models on a public CDN or use HuggingFace directly

Build-time Model Download: The build script automatically downloads and converts the model:

  1. Downloads safetensors from HuggingFace (if not cached)
  2. Converts to Burn binary format with INT8 quantization
  3. Stores in ingestor-wasm/models/ directory
  4. Model is loaded at runtime from the CDN URL specified above

Testing

# Core library tests
cargo test --package ingestor-core

# CLI integration tests
cargo test --features cli

# MCP protocol tests
cargo test --features mcp

180+ tests covering token savings, CLI commands, MCP protocol, edge cases, and all 30+ file types.


🔒 Privacy & Security

Runtime Privacy

  • Zero network calls during indexing - Your code never leaves your machine
  • No external dependencies for core indexing functionality
  • Content treated as untrusted - Prompt injection resistant UI
  • Deterministic output - Same input = same index every time
  • IndexedDB caching - Model weights cached locally after first download

Build-Time Security

  • Model URLs embedded at build time - URLs are visible in WASM binary
    • Only use public, non-authenticated URLs for model sources
    • Current setup uses public HuggingFace model repositories
    • Never embed signed URLs or authentication tokens
  • Model integrity verification - SHA-256 validation prevents tampering (planned)
  • Supply chain security - Models loaded from trusted sources (HuggingFace)
  • Quantization - INT8 quantization reduces model size with minimal quality loss

Security Notes

⚠️ WASM binaries are inspectable - Any URLs or constants in the build are visible to users. This is by design for transparency, but means secrets must never be embedded. Our current architecture uses only public model repositories and is safe for production use.


📥 Export Format Details

llm.md Format (v1.0)

Header:

# llm.md (pointer manifest)

Index ID: <sha256>
Files: 42  Chunks: 187

Chunk files live under `chunks/` and are named `{ref}.md`.
Prefer search to find refs, then open only the referenced chunk files.

File sections:

### src/utils.ts (js, 89 lines)
- abc123def (1-20) `parseDate()`
- ghi456jkl (22-45) `formatCurrency()`

Chunk files (chunks/<ref>.md):

---
ref: abc123def
id: <full-sha256>
slug: parseDate
path: src/utils.ts
kind: java_script
lines: [1, 20]
token_estimate: 145
heading_path: []
symbol: parseDate
---

export function parseDate(input) {
  // ... function body
}
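
The front matter maps directly onto a typed struct. A sketch assuming the serde and serde_yaml crates, with field names copied from the format above (whether symbol is optional for non-code chunks is an assumption):

```rust
use serde::Deserialize;

/// Mirrors the chunk front matter shown above.
#[derive(Deserialize)]
struct ChunkHeader {
    r#ref: String,             // short ref, e.g. "abc123def"
    id: String,                // full sha256
    slug: String,
    path: String,
    kind: String,              // e.g. "java_script"
    lines: [u32; 2],           // [start, end]
    token_estimate: u32,
    heading_path: Vec<String>, // empty for code chunks
    symbol: Option<String>,    // assumed absent for non-code chunks
}

/// Split a chunk file into front matter and body at the `---` fences.
fn parse_chunk(text: &str) -> Option<(ChunkHeader, &str)> {
    let rest = text.strip_prefix("---\n")?;
    let (front, body) = rest.split_once("\n---\n")?;
    Some((serde_yaml::from_str(front).ok()?, body))
}
```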

License

MIT License - See LICENSE file for details


Contributing

Contributions welcome! Please:

  1. Read the INGESTION_SPEC.md for architecture
  2. Check existing issues before opening new ones
  3. Run tests before submitting PRs
  4. Follow the existing code style

Roadmap

Phase 6 (Complete)

  • Neural semantic search with Burn framework
  • WebGPU-accelerated embeddings
  • Hybrid search with RRF fusion
  • WASM build pipeline
  • Model quantization (INT8)
  • IndexedDB caching for models
  • CLI for headless indexing (llmx index, search, explore, export)
  • MCP server for external agent retrieval

Phase 7 (Current - Security Hardening)

  • Implement SHA-256 model integrity verification
  • Add download size limits and rate limiting
  • Improve error handling in MCP server (remove panics)
  • Add cancellation support for async operations
  • Browser integration testing across all platforms

Future Phases

  • Performance optimizations (fused QKV, attention mask broadcasting)
  • Model configuration flexibility (support multiple model sizes)
  • Add LSP/tree-sitter symbol extraction for more languages
  • Support image OCR for screenshot indexing
  • MCP server hardening and production deployment
  • Support for more file types (Python, Go, Rust, etc.)

Made for agents, by humans ⚡
