
VectorLite


A tiny, in-process Rust vector store with built-in embeddings for sub-millisecond semantic search.

VectorLite is a high-performance, in-memory vector database optimized for AI agent and edge workloads.
It co-locates model inference (via Candle) with a low-latency vector index, making it ideal for session-scoped, single-instance, or privacy-sensitive environments.

Why VectorLite?

  • Sub-millisecond search: In-memory HNSW or flat search tuned for real-time agent loops.
  • Built-in embeddings: Runs all-MiniLM-L6-v2 locally using Candle, or any other model of your choice. No external API calls.
  • Single-binary simplicity: No dependencies, no servers to orchestrate. Start instantly via CLI or Docker.
  • Session-scoped collections: Ideal for ephemeral agent sessions or sidecars.
  • Thread-safe concurrency: RwLock-based access and atomic ID generation for multi-threaded workloads.
  • Instant persistence: Save or restore collection snapshots in one call.

VectorLite trades distributed scalability for deterministic performance, making it perfect for use cases where latency matters more than scaling to millions of vectors.

When to Use It

  • AI agent sessions: Keep short-lived embeddings per conversation, with no network latency.
  • Edge or embedded AI: Run fully offline with model and index in one binary.
  • Real-time search / personalization: Sub-millisecond search over pre-computed embeddings.
  • Local prototyping & CI: Rust-native, no external services.
  • Single-tenant microservices: Lightweight sidecar for semantic capabilities.

Quick Start

Run from Source

cargo run --bin vectorlite -- --port 3001

# Start with preloaded collection
cargo run --bin vectorlite -- --filepath ./my_collection.vlc --port 3001
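
Once the server is up, a quick request against the /health endpoint (documented below) confirms it is listening on the chosen port:

curl http://localhost:3001/health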

Run with Docker

With default settings:

docker build -t vectorlite .
docker run -p 3001:3001 vectorlite

With a different embedding model and memory-optimized HNSW:

docker build \
  --build-arg MODEL_NAME="sentence-transformers/paraphrase-MiniLM-L3-v2" \
  --build-arg FEATURES="memory-optimized" \
  -t vectorlite-small .
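
The resulting image runs the same way as the default one:

docker run -p 3001:3001 vectorlite-small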

HTTP API Overview

  • Health: GET /health
  • List collections: GET /collections
  • Create collection: POST /collections, body {"name": "docs", "index_type": "hnsw"}
  • Delete collection: DELETE /collections/{name}
  • Add text: POST /collections/{name}/text, body {"text": "Hello world", "metadata": {...}}
  • Search (text): POST /collections/{name}/search/text, body {"query": "hello", "k": 5}
  • Get vector: GET /collections/{name}/vectors/{id}
  • Delete vector: DELETE /collections/{name}/vectors/{id}
  • Save collection: POST /collections/{name}/save, body {"file_path": "./collection.vlc"}
  • Load collection: POST /collections/load, body {"file_path": "./collection.vlc", "collection_name": "restored"}
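
A typical session with curl looks like the following sketch, using the endpoints and bodies listed above (metadata is assumed to be free-form JSON, as in the SDK example below):

# Create an HNSW-backed collection
curl -X POST http://localhost:3001/collections \
  -H "Content-Type: application/json" \
  -d '{"name": "docs", "index_type": "hnsw"}'

# Embed and index a document
curl -X POST http://localhost:3001/collections/docs/text \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "metadata": {"source": "readme"}}'

# Semantic search over the collection
curl -X POST http://localhost:3001/collections/docs/search/text \
  -H "Content-Type: application/json" \
  -d '{"query": "hello", "k": 5}'

# Snapshot the collection to disk
curl -X POST http://localhost:3001/collections/docs/save \
  -H "Content-Type: application/json" \
  -d '{"file_path": "./docs.vlc"}'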

Index Types

  • Flat: O(n) search, O(1) insert. Small datasets (<10K vectors) or exact search.
  • HNSW: O(log n) search, O(log n) insert. Larger datasets or approximate search.

See Hierarchical Navigable Small World.
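
The create-collection call above requests "hnsw"; a flat index should be selectable the same way (the exact "flat" value is an assumption based on the index name, not confirmed by the API table):

# "flat" index_type value assumed from the index name above
curl -X POST http://localhost:3001/collections \
  -H "Content-Type: application/json" \
  -d '{"name": "exact-docs", "index_type": "flat"}'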

Configuration profiles for HNSW

  • default: balanced; general workloads
  • memory-optimized: reduced precision, smaller graph; constrained devices
  • high-accuracy: higher recall, more memory; offline re-ranking or research

Select a profile at build time:

cargo build --features memory-optimized

Similarity Metrics

  • Cosine: Default for normalized embeddings, scale-invariant
  • Euclidean: Geometric distance, sensitive to vector magnitude
  • Manhattan: L1 norm, robust to outliers
  • Dot Product: Raw similarity, requires consistent vector scaling
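
For reference, these are the standard definitions for vectors $x$ and $y$:

$$\text{cosine}(x,y) = \frac{x \cdot y}{\lVert x \rVert\,\lVert y \rVert}, \qquad \text{euclidean}(x,y) = \sqrt{\sum_i (x_i - y_i)^2},$$
$$\text{manhattan}(x,y) = \sum_i \lvert x_i - y_i \rvert, \qquad \text{dot}(x,y) = \sum_i x_i y_i$$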

Rust SDK Example

use vectorlite::{VectorLiteClient, EmbeddingGenerator, IndexType, SimilarityMetric};
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The client owns the embedding model, so text is embedded locally.
    let client = VectorLiteClient::new(Box::new(EmbeddingGenerator::new()?));

    // Create a collection backed by an HNSW index.
    client.create_collection("quotes", IndexType::HNSW)?;

    // Embed and insert a document with optional JSON metadata;
    // returns the ID assigned to the new vector.
    let id = client.add_text_to_collection(
        "quotes",
        "I just want to lie on the beach and eat hot dogs",
        Some(json!({
            "author": "Kevin Malone",
            "tags": ["the-office", "s3:e23"],
            "year": 2005,
        }))
    )?;

    // Top-3 semantic search using cosine similarity.
    let results = client.search_text_in_collection(
        "quotes",
        "beach games",
        3,
        SimilarityMetric::Cosine,
    )?;

    for result in &results {
        println!("ID: {}, Score: {:.4}", result.id, result.score);
    }

    Ok(())
}

Testing

Run tests with mock embeddings (CI-friendly, no model files required):

cargo test --features mock-embeddings

Run tests with local models:

cargo test

Download ML Model

This downloads the BERT-based embedding model files needed for real embedding generation:

huggingface-cli download sentence-transformers/all-MiniLM-L6-v2 --local-dir models/all-MiniLM-L6-v2

The following files must be present in the ./models/{model-name}/ directory:

  • config.json
  • pytorch_model.bin
  • tokenizer.json

Using a different model

You can override the default embedding model at compile time using the custom-model feature:

DEFAULT_EMBEDDING_MODEL="sentence-transformers/paraphrase-MiniLM-L3-v2" cargo build --features custom-model

DEFAULT_EMBEDDING_MODEL="sentence-transformers/paraphrase-MiniLM-L3-v2" cargo run --features custom-model

License

Apache 2.0 License - see LICENSE for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
