Local-first codebase indexer with semantic search and chunk exports for agent consumption
Transform large codebases into searchable, intelligently chunked datasets with real neural network embeddings running entirely in your browser via WebGPU. No server, no API calls, no data leaving your machine.
Proof: 7,147 files indexed in 31 MB -> 1,625 tokens retrieved (99.98% savings) | 180+ tests | MCP server
- Neural Semantic Search - Snowflake Arctic embeddings with WebGPU acceleration, quality comparable to server-side embedding APIs
- Hybrid Search - Combines BM25 + vector search with RRF (Reciprocal Rank Fusion) for best results
- Smart Chunking - Deterministic chunking by file type (functions, headings, JSON keys)
- Semantic Exports - Hierarchical outline format with function names and heading breadcrumbs
- Privacy-First - Zero network calls, all processing in-browser via WASM
- Fast - Sub-second indexing, ~50ms embedding inference with GPU acceleration
- Agent-Ready - Exports designed for selective retrieval, not bulk ingestion
LLMs have limited context windows. Loading an entire codebase is:
- Token-expensive - Wastes context on irrelevant code
- Slow - Reading hundreds of files takes time
- Inefficient - Agents can't filter until after reading everything
llmx builds a searchable index with semantic chunk exports that enable agents to:
- Scan the manifest (`llm.md`) to understand structure
- Search for relevant concepts using BM25
- Retrieve only the specific chunks needed
- Navigate via function names, heading hierarchies, and file paths
llmx exports token-efficient manifests (llm.md + manifest.llm.tsv) with semantic labels for intelligent chunk selection:
### src/auth.js (js, 47 lines)
- c0001 (1-15) `loginUser()`
- c0002 (17-30) `validateToken()`
- c0003 (32-47) `logout()`
### docs/api-reference.md (md, 234 lines)
- c0004 (1-45) API Reference
- c0005 (46-102) API Reference > Authentication
- c0006 (103-156) API Reference > Rate Limiting > Quotas
Agents can scan headings, function names, and file types to select relevant chunks--without opening any files.
Note: CLI is recommended for most use cases. Web UI requires a WASM build.
# Build CLI
cargo build --release --features cli
# Index a codebase
llmx index ./my-project
# Search with token budget
llmx search "authentication login" --limit 10 --max-tokens 4000
# Explore structure
llmx explore files
llmx explore symbols --path src/
# Export for agents
llmx export --format zip -o ./export.zip
Web UI Setup
cd ingestor-wasm
wasm-pack build --target web --out-dir ../web/pkg
python3 -m http.server 8001 --bind 127.0.0.1 --directory web
Open http://127.0.0.1:8001 in your browser.
- Select a folder (Chromium) or drag and drop files (all browsers, including Firefox)
- Wait for indexing (sub-second for typical repos)
- Search using the query box
- Export -> Download an export bundle named after the selected folder (e.g. my-repo.llmx-1a2b3c4d.zip)
Give the agent an export bundle (compact, recommended):
llmx-export/
├── llm.md # Compact pointer manifest (recommended)
├── manifest.llm.tsv # Token-efficient chunk table for LLMs
└── chunks/
├── c0001.md # Chunk body (minimal header + content)
└── ...
Agent workflow:
- Read `llm.md` for the compact workflow and artifact pointers
- Scan `manifest.llm.tsv` to identify relevant files/chunks by label
- Open only the matching `chunks/<ref>.md` files
- Download `*.index.json` from the UI only if you need the full index structure
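For example, here is a minimal Rust sketch of the selection step, assuming the `llm.md` layout shown above (`### path (kind, lines)` headers followed by `- <ref> (start-end) label` entries); the query term and paths are placeholders:

```rust
use std::fs;

// Sketch only: scan llm.md for chunk entries whose label mentions a query
// term, then print the chunk files an agent would open. Assumes the manifest
// layout shown above; the real manifest may carry extra fields.
fn main() -> std::io::Result<()> {
    let query = "login"; // placeholder query term
    let manifest = fs::read_to_string("llmx-export/llm.md")?;

    let mut current_file = String::new();
    for line in manifest.lines() {
        if let Some(header) = line.strip_prefix("### ") {
            // e.g. "src/auth.js (js, 47 lines)"
            current_file = header.to_string();
        } else if let Some(entry) = line.strip_prefix("- ") {
            // Entry shape assumed: "<ref> (start-end) `label`"
            if entry.to_lowercase().contains(query) {
                let chunk_ref = entry.split_whitespace().next().unwrap_or("");
                println!("{current_file}: open chunks/{chunk_ref}.md");
            }
        }
    }
    Ok(())
}
```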
- Browse the web UI for real-time search
- Export for offline analysis
- Share the downloaded `*.llmx-<id8>.zip` bundle with team members (no server needed)
flowchart LR
A[Codebase] --> B[Chunker]
B --> C[Index + BM25]
C --> D[llm.md manifest]
D --> E[Agent queries]
E --> F[Relevant chunks only]
llmx chunks files deterministically by type:
| File Type | Chunking Method |
|---|---|
| JavaScript/TypeScript | Function/class declarations (via tree-sitter or fallback) |
| Markdown | Heading boundaries with ancestry preserved |
| JSON | Top-level keys or array ranges (max 50 elements) |
| HTML | Heading tags, scripts/styles stripped |
| Text | Paragraph boundaries |
| Images | Indexed by path, bytes included in export |
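To illustrate the deterministic, per-type idea, here is a small self-contained sketch (not the actual `chunk.rs` code) of markdown chunking at heading boundaries, with the heading ancestry kept as a breadcrumb label:

```rust
// Illustrative sketch of deterministic chunking for markdown: split at heading
// boundaries and keep the heading ancestry as a breadcrumb. The real chunkers
// in ingestor-core cover many more file types and edge cases.
struct Chunk {
    start_line: usize,
    end_line: usize,
    label: String, // heading breadcrumb, e.g. "API Reference > Authentication"
}

fn chunk_markdown(src: &str) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut breadcrumb: Vec<String> = Vec::new();
    let mut start = 1; // 1-indexed line where the current chunk begins
    for (i, line) in src.lines().enumerate() {
        if line.starts_with('#') {
            let level = line.chars().take_while(|&c| c == '#').count();
            if i + 1 > start {
                // Close the chunk that ended just before this heading.
                chunks.push(Chunk { start_line: start, end_line: i, label: breadcrumb.join(" > ") });
            }
            breadcrumb.truncate(level.saturating_sub(1));
            breadcrumb.push(line.trim_start_matches('#').trim().to_string());
            start = i + 1; // the heading line itself opens the next chunk
        }
    }
    chunks.push(Chunk { start_line: start, end_line: src.lines().count(), label: breadcrumb.join(" > ") });
    chunks
}

fn main() {
    let doc = "# API Reference\nIntro.\n\n## Authentication\nUse tokens.\n";
    for c in chunk_markdown(doc) {
        println!("{}-{} {}", c.start_line, c.end_line, c.label);
    }
}
```

Because the split points depend only on the file content, re-indexing an unchanged file always produces the same chunks.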
Hybrid search combines two retrieval approaches and fuses their rankings:
- BM25 (Keyword Search):
  - Term frequency (TF)
  - Inverse document frequency (IDF)
  - Document length normalization
  - Fast lexical matching
- Neural Semantic Search:
  - Snowflake Arctic Embed (384 dimensions, INT8 quantized)
  - WebGPU-accelerated inference (~50ms per query)
  - Falls back to CPU → hash-based → BM25-only
  - Understands meaning, not just keywords
- RRF Fusion:
  - Reciprocal Rank Fusion combines both rankings
  - Weighted blending for optimal results
  - Better than either method alone
Runs fully client-side in WASM with GPU acceleration.
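Reciprocal Rank Fusion itself is a one-liner per document: score(d) = sum over rankers of 1 / (k + rank of d). A minimal sketch, not the actual `rrf.rs` implementation (k = 60 and equal weights are assumed defaults):

```rust
use std::collections::HashMap;

// Minimal RRF sketch: fuse two ranked lists of chunk ids into one ranking.
// score(d) = sum over rankers of 1 / (k + rank of d in that ranker).
fn rrf_fuse(bm25: &[&str], neural: &[&str], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in [bm25, neural] {
        for (rank, id) in ranking.iter().enumerate() {
            // rank is 0-based, so rank + 1 is the document's position in the list
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<_> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    let bm25 = ["c0002", "c0001", "c0005"];
    let neural = ["c0001", "c0004", "c0002"];
    for (id, score) in rrf_fuse(&bm25, &neural, 60.0) {
        println!("{id}: {score:.4}");
    }
}
```

A document ranked highly by either method gets a meaningful score, and one ranked highly by both rises to the top.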
| Format | Contents | Use Case |
|---|---|---|
| llm.md | Semantic manifest with outline | Quick scanning, agent navigation |
| manifest.json | Optimized columnar format | Machine parsing, tooling |
| index.json | Full index + inverted index | Offline search, backup |
| export.zip | All above + chunk files + images | Complete portable package |
Tested on the Apple Human Interface Guidelines archive (1980-2009):
| Metric | Value |
|---|---|
| Files | 7,147 |
| Chunks | 21,369 |
| Raw size | 31 MB (~7.8M tokens) |
| Access Method | Tokens | Savings |
|---|---|---|
| Read all files | ~7,800,000 | -- |
| Scan manifest (llm.md) | ~208,000 | 97% |
| Targeted search (3 queries) | ~1,625 | 99.98% |
An agent answering "How did Apple's feedback guidelines evolve from Lisa to Aqua?" retrieved:
Search: "user feedback error message design"
[1] 1992 Macintosh Human Interface Guidelines
Heading: Human Interface Design Principles > Feedback and Dialog
"Keep users informed... Most people would not know what to do
if they saw 'The computer unexpectedly crashed. ID = 13.'"
[2] 1980 Lisa UI Standards
"The LISA User Interface has two main goals, simplicity and
integration. We want LISA to be easy to learn and easy to use..."
[3] 2001 Aqua Human Interface Guidelines
"Do not use sheets for dialogs that apply to several windows..."
The agent found relevant content spanning 4 decades using 0.02% of the total corpus tokens.
- Language: Rust (core), JavaScript (WASM bindings, web UI)
- Architecture: Client-only, no server required
- ML Framework: Burn 0.20 (compiles to WASM)
- Embedding Model: Snowflake Arctic Embed Small (384-dim, INT8 quantized to ~9MB)
- Storage: IndexedDB (persistent) or in-memory
- Search: Hybrid (BM25 + neural embeddings) with RRF fusion
- Chunking: Deterministic, content-hash based IDs
- Performance:
- Indexing: ~500ms for 10MB codebase
- Embeddings: ~50ms per query (WebGPU)
- Package: 2.4 MB WASM (model weights loaded separately from CDN)
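A sketch of how deterministic, content-hash based chunk IDs could be derived (the exact hashed fields are an assumption; uses the `sha2` and `hex` crates):

```rust
use sha2::{Digest, Sha256};

// Sketch: derive a deterministic chunk id from path + line range + content.
// The exact fields hashed by ingestor-core are an assumption; the point is
// that identical input always yields the identical id (and short ref).
fn chunk_id(path: &str, start: usize, end: usize, content: &str) -> (String, String) {
    let mut hasher = Sha256::new();
    hasher.update(path.as_bytes());
    hasher.update(start.to_le_bytes());
    hasher.update(end.to_le_bytes());
    hasher.update(content.as_bytes());
    let full = hex::encode(hasher.finalize());
    let short_ref = full[..9].to_string(); // short ref, e.g. "abc123def" as seen in llm.md
    (full, short_ref)
}

fn main() {
    let (id, r) = chunk_id("src/utils.ts", 1, 20, "export function parseDate(input) {}");
    println!("id={id} ref={r}");
}
```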
- Chromium (Chrome, Edge): Full support (`showDirectoryPicker`)
- WebKit (Safari): Folder input via `webkitdirectory`
- Firefox: File selection or drag-and-drop (no folder picker)
- WebGPU (Chrome 113+, Edge 113+): GPU-accelerated embeddings (~50ms)
- CPU Fallback: All modern browsers with WASM support (~100-200ms)
- Hash Fallback: Universal compatibility (deterministic, instant)
- BM25 Only: Always available as final fallback
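The fallback chain amounts to a simple preference order; a sketch, with the boolean flags standing in for real runtime feature detection:

```rust
// Sketch of the embedding backend preference order described above. The flags
// stand in for real checks (WebGPU adapter present, model weights loaded).
#[derive(Debug)]
enum EmbeddingBackend { WebGpu, Cpu, Hash, Bm25Only }

fn pick_backend(model_loaded: bool, has_webgpu: bool, has_wasm: bool) -> EmbeddingBackend {
    if model_loaded && has_webgpu {
        EmbeddingBackend::WebGpu   // ~50ms per query
    } else if model_loaded && has_wasm {
        EmbeddingBackend::Cpu      // ~100-200ms per query
    } else if has_wasm {
        EmbeddingBackend::Hash     // deterministic, instant
    } else {
        EmbeddingBackend::Bm25Only // lexical search only
    }
}

fn main() {
    println!("{:?}", pick_backend(true, true, true));   // WebGpu
    println!("{:?}", pick_backend(false, false, true)); // Hash
}
```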
Module workers fall back to main thread if unavailable.
llmx/
├── ingestor-core/ # Rust library (chunking, indexing, search, RRF)
│ ├── src/
│ │ ├── chunk.rs # Chunking logic
│ │ ├── index.rs # Indexing + hybrid search
│ │ ├── rrf.rs # Reciprocal Rank Fusion
│ │ ├── embeddings.rs # Embedding abstractions
│ │ └── mcp/ # MCP server tools
├── ingestor-wasm/ # WASM bindings + Burn embeddings
│ ├── src/
│ │ ├── lib.rs # WASM exports
│ │ └── embeddings_burn.rs # Burn-based neural embeddings
│ ├── build.rs # ONNX model download & conversion
│ └── .cargo/ # WASM build configuration
├── web/ # Browser UI
│ ├── app.js # Main UI logic
│ ├── worker.js # Web Worker for WASM
│ └── pkg/ # Built WASM artifacts
└── docs/ # Specifications and usage guides
# Set model URL (required for WASM builds with embeddings)
export LLMX_EMBEDDING_MODEL_URL="https://your-cdn.com/arctic-embed-s-q8.bin"
# Build WASM (includes neural embedding support)
cd ingestor-wasm
wasm-pack build --target web --release
# Development build (faster, larger)
wasm-pack build --target web --dev
# Run tests
cd ingestor-core
cargo test
Security Note on Model URLs:
- The `LLMX_EMBEDDING_MODEL_URL` is embedded into the WASM binary at build time
- Use only public, non-authenticated URLs (e.g., HuggingFace, public CDN)
- Never use signed URLs or URLs with authentication tokens
- The URL will be visible to anyone inspecting the WASM binary
- For production, host models on a public CDN or use HuggingFace directly
Build-time Model Download: The build script automatically downloads and converts the model:
- Downloads safetensors from HuggingFace (if not cached)
- Converts to Burn binary format with INT8 quantization
- Stores the result in the `ingestor-wasm/models/` directory
- Model is loaded at runtime from the CDN URL specified above
# Core library tests
cargo test --package ingestor-core
# CLI integration tests
cargo test --features cli
# MCP protocol tests
cargo test --features mcp
180+ tests covering token savings, CLI commands, MCP protocol, edge cases, and all 30+ file types.
- Zero network calls during indexing - Your code never leaves your machine
- No external dependencies for core indexing functionality
- Content treated as untrusted - Prompt injection resistant UI
- Deterministic output - Same input = same index every time
- IndexedDB caching - Model weights cached locally after first download
- Model URLs embedded at build time - URLs are visible in WASM binary
- Only use public, non-authenticated URLs for model sources
- Current setup uses public HuggingFace model repositories
- Never embed signed URLs or authentication tokens
- Model integrity verification - SHA-256 validation prevents tampering (planned)
- Supply chain security - Models loaded from trusted sources (HuggingFace)
- Quantization - INT8 quantization reduces model size with minimal quality loss
Export Format Details
Header:
# llm.md (pointer manifest)
Index ID: <sha256>
Files: 42 Chunks: 187
Chunk files live under `chunks/` and are named `{ref}.md`.
Prefer search to find refs, then open only the referenced chunk files.
File sections:
### src/utils.ts (js, 89 lines)
- abc123def (1-20) `parseDate()`
- ghi456jkl (22-45) `formatCurrency()`
Chunk files (chunks/<ref>.md):
---
ref: abc123def
id: <full-sha256>
slug: parseDate
path: src/utils.ts
kind: java_script
lines: [1, 20]
token_estimate: 145
heading_path: []
symbol: parseDate
---
export function parseDate(input) {
// ... function body
}
MIT License - See LICENSE file for details
Contributions welcome! Please:
- Read the INGESTION_SPEC.md for architecture
- Check existing issues before opening new ones
- Run tests before submitting PRs
- Follow the existing code style
- Neural semantic search with Burn framework
- WebGPU-accelerated embeddings
- Hybrid search with RRF fusion
- WASM build pipeline
- Model quantization (INT8)
- IndexedDB caching for models
- CLI for headless indexing (`llmx index`, `search`, `explore`, `export`)
- MCP server for external agent retrieval
- Implement SHA-256 model integrity verification
- Add download size limits and rate limiting
- Improve error handling in MCP server (remove panics)
- Add cancellation support for async operations
- Browser integration testing across all platforms
- Performance optimizations (fused QKV, attention mask broadcasting)
- Model configuration flexibility (support multiple model sizes)
- Add LSP/tree-sitter symbol extraction for more languages
- Support image OCR for screenshot indexing
- MCP server hardening and production deployment
- Support for more file types (Python, Go, Rust, etc.)