Crate embellama


§Embellama

A high-performance Rust library for generating text embeddings using llama-cpp-2.

§Features

  • High-performance embedding generation using llama.cpp backend
  • Support for batch processing with parallel pre/post-processing
  • Thread-safe model management with Arc/RwLock
  • Comprehensive error handling
  • Builder pattern for configuration
  • Optional HTTP server with OpenAI-compatible API

§Example

use embellama::{EmbeddingEngine, EngineConfig};

fn main() -> Result<(), embellama::Error> {
    // Configure and build the engine
    let config = EngineConfig::builder()
        .with_model_path("models/all-MiniLM-L6-v2.gguf")
        .with_model_name("minilm")
        .with_context_size(512)
        .with_n_threads(4)
        .build()?;

    let engine = EmbeddingEngine::new(config)?;

    // Generate a single embedding
    let embedding = engine.embed("minilm", "Hello, world!")?;
    println!("Embedding dimension: {}", embedding.len());

    // Generate batch embeddings
    let texts = vec![
        "First document",
        "Second document",
        "Third document",
    ];
    let embeddings = engine.embed_batch("minilm", texts)?;
    println!("Generated {} embeddings", embeddings.len());

    Ok(())
}

Modules§

cache
Cache module for token and embedding caching
server
Server module for the Embellama HTTP API (feature-gated)

Structs§

BackendInfo
Backend information for diagnostics
BatchProcessor
Represents a batch of texts to be processed.
BatchProcessorBuilder
Builder for creating configured BatchProcessor instances.
CacheConfig
Configuration for caching system
CacheConfigBuilder
Builder for CacheConfig
EmbeddingEngine
The main entry point for the embellama library.
EmbeddingModel
Represents a loaded embedding model.
EngineConfig
Configuration for the embedding engine
EngineConfigBuilder
Builder for creating EngineConfig instances
GGUFMetadata
GGUF model metadata extracted from the file header
ModelConfig
Configuration for a single model
ModelConfigBuilder
Builder for creating ModelConfig instances
ModelInfo
Information about a loaded model.
VersionInfo
Library version information

Enums§

BackendType
Available backend types for llama.cpp
Error
Custom error type for the embellama library
NormalizationMode
Normalization mode for embedding vectors
PoolingStrategy
Pooling strategy for combining token embeddings

Constants§

VERSION
Library version

Functions§

clear_metadata_cache
Clear the GGUF metadata cache
detect_best_backend
Detect the best available backend based on compile-time features and runtime capabilities
extract_gguf_metadata
Extract comprehensive metadata from a GGUF file
get_compiled_backend
Get the currently compiled backend based on build features
init
Initialize the library with default tracing subscriber
init_with_env_filter
Initialize the library with a custom environment filter
metadata_cache_size
Get the current size of the GGUF metadata cache
version_info
Get version information
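The initialization and diagnostics functions listed above can be combined at startup. A minimal sketch, assuming `init`, `detect_best_backend`, and `version_info` take no arguments and that the returned `BackendType` and `VersionInfo` implement `Debug` (signatures not shown on this page):

```rust
use embellama::{detect_best_backend, init, version_info};

fn main() {
    // Install the default tracing subscriber before other library calls.
    init();

    // Report which llama.cpp backend this build will select.
    let backend = detect_best_backend();
    println!("Selected backend: {backend:?}");

    // Capture version details, e.g. for logs or bug reports.
    let info = version_info();
    println!("Version: {info:?}");
}
```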

Type Aliases§

Result
Type alias for Results in this crate
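Because the crate exposes its own `Result` alias alongside the `Error` enum, fallible calls can be propagated with `?` inside helpers and handled with an ordinary `match` at the boundary. A minimal sketch, assuming `Result<T>` aliases `std::result::Result<T, Error>` and that `Error` implements `Display`:

```rust
use embellama::{EmbeddingEngine, EngineConfig};

// Propagate configuration and load errors with `?` via the crate's alias.
fn load_engine(path: &str) -> embellama::Result<EmbeddingEngine> {
    let config = EngineConfig::builder()
        .with_model_path(path)
        .with_model_name("minilm")
        .build()?;
    EmbeddingEngine::new(config)
}

fn main() {
    // Handle the error once, at the application boundary.
    match load_engine("models/all-MiniLM-L6-v2.gguf") {
        Ok(_engine) => { /* call _engine.embed(...) here */ }
        Err(e) => eprintln!("failed to load model: {e}"),
    }
}
```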