4 releases
| new 0.1.29 | Dec 29, 2025 |
|---|---|
| 0.1.22 | Dec 9, 2025 |
| 0.1.2 | Nov 26, 2025 |
| 0.1.1 | Nov 20, 2025 |
#2177 in Database interfaces
1.5MB
29K
SLoC
Ruvector CLI
Command-line interface and MCP server for high-performance vector database operations.
Professional CLI tools for managing Ruvector vector databases with sub-millisecond query performance, batch operations, and MCP integration.
🌟 Overview
The Ruvector CLI provides a comprehensive command-line interface for:
- Database Management: Create and configure vector databases
- Data Operations: Insert, search, and export vector data
- Performance Benchmarking: Test query performance and throughput
- Format Support: JSON, CSV, and NumPy array formats
- MCP Server: Model Context Protocol server for AI integrations
- Batch Processing: Efficient bulk operations with progress tracking
⚡ Quick Start
Installation
Install via Cargo:
cargo install ruvector-cli
Or build from source:
# Clone repository
git clone https://github.com/ruvnet/ruvector.git
cd ruvector
# Build CLI
cargo build --release -p ruvector-cli
# Install locally
cargo install --path crates/ruvector-cli
Basic Usage
# Create a new database
ruvector create --dimensions 384 --path ./my-vectors.db
# Insert vectors from JSON
ruvector insert --db ./my-vectors.db --input vectors.json --format json
# Search for similar vectors
ruvector search --db ./my-vectors.db --query "[0.1, 0.2, 0.3, ...]" --top-k 10
# Show database information
ruvector info --db ./my-vectors.db
# Run performance benchmark
ruvector benchmark --db ./my-vectors.db --queries 1000
📋 Command Reference
Global Options
All commands support these global options:
-c, --config <FILE> Configuration file path
-d, --debug Enable debug logging
--no-color Disable colored output
-h, --help Print help information
-V, --version Print version information
Commands
create - Create a New Database
Create a new vector database with specified dimensions.
ruvector create [OPTIONS] --dimensions <DIMENSIONS>
Options:
-p, --path <PATH> Database file path [default: ./ruvector.db]
-d, --dimensions <DIMENSIONS> Vector dimensions (required)
Examples:
# Create database for 384-dimensional embeddings (e.g., MiniLM)
ruvector create --dimensions 384
# Create database with custom path
ruvector create --dimensions 1536 --path ./embeddings.db
# Create for large embeddings (e.g., text-embedding-3-large)
ruvector create --dimensions 3072 --path ./large-embeddings.db
insert - Insert Vectors from File
Bulk insert vectors from JSON, CSV, or NumPy files.
ruvector insert [OPTIONS] --input <FILE>
Options:
-d, --db <PATH> Database file path [default: ./ruvector.db]
-i, --input <FILE> Input file path (required)
-f, --format <FORMAT> Input format: json, csv, npy [default: json]
--no-progress Hide progress bar
Input Formats:
JSON (array of vector entries):
[
{
"id": "doc_1",
"vector": [0.1, 0.2, 0.3, ...],
"metadata": {"title": "Document 1", "category": "tech"}
},
{
"id": "doc_2",
"vector": [0.4, 0.5, 0.6, ...],
"metadata": {"title": "Document 2", "category": "science"}
}
]
CSV (id, vector_json, metadata_json):
id,vector,metadata
doc_1,"[0.1, 0.2, 0.3]","{\"title\": \"Document 1\"}"
doc_2,"[0.4, 0.5, 0.6]","{\"title\": \"Document 2\"}"
NumPy (.npy file with 2D array):
import numpy as np
vectors = np.random.randn(1000, 384).astype(np.float32)
np.save('vectors.npy', vectors)
Examples:
# Insert from JSON file
ruvector insert --input embeddings.json --format json
# Insert from CSV with progress
ruvector insert --input data.csv --format csv
# Insert from NumPy array
ruvector insert --input vectors.npy --format npy
# Batch insert without progress bar
ruvector insert --input large-dataset.json --no-progress
search - Search for Similar Vectors
Find k-nearest neighbors for a query vector.
ruvector search [OPTIONS] --query <VECTOR>
Options:
-d, --db <PATH> Database file path [default: ./ruvector.db]
-q, --query <VECTOR> Query vector (comma-separated or JSON array)
-k, --top-k <K> Number of results to return [default: 10]
--show-vectors Show full vectors in results
Query Formats:
# Comma-separated floats
ruvector search --query "0.1, 0.2, 0.3, 0.4, ..."
# JSON array
ruvector search --query "[0.1, 0.2, 0.3, 0.4, ...]"
# From file (using shell)
ruvector search --query "$(cat query.json)"
Examples:
# Search for top 10 similar vectors
ruvector search --query "[0.1, 0.2, 0.3, ...]" --top-k 10
# Search with full vector output
ruvector search --query "0.1, 0.2, 0.3, ..." --show-vectors
# Search for top 50 results
ruvector search --query "[0.1, 0.2, ...]" -k 50
Output:
🔍 Search Results (top 10)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
#1 doc_42 similarity: 0.9876
#2 doc_128 similarity: 0.9543
#3 doc_89 similarity: 0.9321
...
Search completed in 0.48ms
info - Show Database Information
Display database statistics and configuration.
ruvector info [OPTIONS]
Options:
-d, --db <PATH> Database file path [default: ./ruvector.db]
Examples:
# Show default database info
ruvector info
# Show custom database info
ruvector info --db ./embeddings.db
Output:
📊 Database Statistics
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total vectors: 1,234,567
Dimensions: 384
Distance metric: Cosine
HNSW Configuration:
M: 16
ef_construction: 200
ef_search: 100
benchmark - Run Performance Benchmark
Test query performance with random vectors.
ruvector benchmark [OPTIONS]
Options:
-d, --db <PATH> Database file path [default: ./ruvector.db]
-n, --queries <N> Number of queries to run [default: 1000]
Examples:
# Quick benchmark (1000 queries)
ruvector benchmark
# Extended benchmark (10,000 queries)
ruvector benchmark --queries 10000
# Benchmark specific database
ruvector benchmark --db ./prod.db --queries 5000
Output:
Running benchmark...
Queries: 1000
Dimensions: 384
Benchmark Results:
Total time: 0.48s
Queries per second: 2083
Average latency: 0.48ms
export - Export Database to File
Export vector data to JSON or CSV format.
ruvector export [OPTIONS] --output <FILE>
Options:
-d, --db <PATH> Database file path [default: ./ruvector.db]
-o, --output <FILE> Output file path (required)
-f, --format <FORMAT> Output format: json, csv [default: json]
Examples:
# Export to JSON
ruvector export --output backup.json --format json
# Export to CSV
ruvector export --output export.csv --format csv
# Export with custom database
ruvector export --db ./prod.db --output prod-backup.json
Note: Export functionality requires
VectorDB::all_ids()method. This feature is planned for a future release.
import - Import from Other Vector Databases
Import vectors from external vector database formats.
ruvector import [OPTIONS] --source <TYPE> --source-path <PATH>
Options:
-d, --db <PATH> Database file path [default: ./ruvector.db]
-s, --source <TYPE> Source database type: faiss, pinecone, weaviate
-p, --source-path <PATH> Source file or connection path
Examples:
# Import from FAISS index
ruvector import --source faiss --source-path ./index.faiss
# Import from Pinecone export
ruvector import --source pinecone --source-path ./pinecone-export.json
# Import from Weaviate backup
ruvector import --source weaviate --source-path ./weaviate-backup.json
Note: Import functionality for external databases is planned for future releases.
🔧 Configuration
Configuration File
Create a ruvector.toml configuration file for default settings:
[database]
storage_path = "./ruvector.db"
dimensions = 384
distance_metric = "Cosine" # Cosine, Euclidean, DotProduct, Manhattan
[database.hnsw]
m = 16
ef_construction = 200
ef_search = 100
[database.quantization]
type = "Scalar" # Scalar, Product, or None
[cli]
progress = true
colors = true
batch_size = 1000
[mcp]
host = "127.0.0.1"
port = 3000
cors = true
Configuration Locations
The CLI searches for configuration files in this order:
- Path specified via
--configflag ./ruvector.toml(current directory)./.ruvector.toml(current directory, hidden)~/.config/ruvector/config.toml(user config)/etc/ruvector/config.toml(system config)
Environment Variables
Override configuration with environment variables:
# Database settings
export RUVECTOR_STORAGE_PATH="./my-db.db"
export RUVECTOR_DIMENSIONS=384
export RUVECTOR_DISTANCE_METRIC="cosine"
# MCP server settings
export RUVECTOR_MCP_HOST="0.0.0.0"
export RUVECTOR_MCP_PORT=3000
# Run with environment overrides
ruvector info
🔌 MCP Server
The Ruvector CLI includes a Model Context Protocol (MCP) server for AI agent integration.
Start MCP Server
STDIO Transport (for local AI tools):
ruvector-mcp --transport stdio
SSE Transport (for web-based AI tools):
ruvector-mcp --transport sse --host 0.0.0.0 --port 3000
With Configuration:
ruvector-mcp --config ./ruvector.toml --transport sse --debug
MCP Integration Examples
Claude Desktop Integration (claude_desktop_config.json):
{
"mcpServers": {
"ruvector": {
"command": "ruvector-mcp",
"args": ["--transport", "stdio"],
"env": {
"RUVECTOR_STORAGE_PATH": "/path/to/vectors.db"
}
}
}
}
HTTP/SSE Client:
const evtSource = new EventSource('http://localhost:3000/sse');
evtSource.addEventListener('message', (event) => {
const data = JSON.parse(event.data);
console.log('MCP Response:', data);
});
// Send search request
fetch('http://localhost:3000/mcp', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
method: 'search',
params: {
query: [0.1, 0.2, 0.3],
k: 10
}
})
});
📊 Common Workflows
RAG System Setup
Build a retrieval-augmented generation (RAG) system:
# 1. Create database for your embedding model
ruvector create --dimensions 384 --path ./rag-embeddings.db
# 2. Generate embeddings and save to JSON
# (Use your preferred embedding model)
# 3. Insert embeddings
ruvector insert --db ./rag-embeddings.db --input embeddings.json
# 4. Query for relevant context
ruvector search --db ./rag-embeddings.db \
--query "[0.123, 0.456, ...]" \
--top-k 5
# 5. Start MCP server for AI agent access
ruvector-mcp --transport stdio
Semantic Search Engine
Build a semantic search system:
# Create database
ruvector create --dimensions 768 --path ./search-engine.db
# Batch insert documents
ruvector insert \
--db ./search-engine.db \
--input documents.json \
--format json
# Benchmark performance
ruvector benchmark --db ./search-engine.db --queries 10000
# Search interface via MCP
ruvector-mcp --transport sse --port 8080
Migration from Other Databases
Migrate from existing vector databases:
# 1. Export from source database
# (Use source database's export tools)
# 2. Create Ruvector database
ruvector create --dimensions 1536 --path ./migrated.db
# 3. Import data (planned feature)
ruvector import \
--db ./migrated.db \
--source pinecone \
--source-path ./pinecone-export.json
# 4. Verify migration
ruvector info --db ./migrated.db
ruvector benchmark --db ./migrated.db
Performance Testing
Test vector database performance:
# Create test database
ruvector create --dimensions 384 --path ./benchmark.db
# Generate synthetic test data
python generate_test_vectors.py --count 100000 --dims 384 --output test.npy
# Insert test data
ruvector insert --db ./benchmark.db --input test.npy --format npy
# Run comprehensive benchmark
ruvector benchmark --db ./benchmark.db --queries 10000
# Test search performance
time ruvector search --db ./benchmark.db --query "[0.1, 0.2, ...]" -k 100
🎯 Shell Completion
Generate shell completion scripts for faster command entry:
Bash
# Generate completion script
ruvector --help > /dev/null # Trigger clap completion
complete -C ruvector ruvector
# Or add to ~/.bashrc
echo 'complete -C ruvector ruvector' >> ~/.bashrc
Zsh
# Add to ~/.zshrc
autoload -U compinit && compinit
complete -o nospace -C ruvector ruvector
Fish
# Generate and save completion
ruvector --help > /dev/null
complete -c ruvector -f
⚙️ Performance Tips
Optimize Insertion
# Use larger batch sizes for bulk inserts (set in config)
[cli]
batch_size = 10000
# Disable progress bar for maximum speed
ruvector insert --input large-file.json --no-progress
Optimize Search
Configure HNSW parameters for your use case:
[database.hnsw]
# Higher M = better recall, more memory
m = 32
# Higher ef_construction = better index quality, slower builds
ef_construction = 400
# Higher ef_search = better recall, slower queries
ef_search = 200
Memory Optimization
Enable quantization to reduce memory usage:
[database.quantization]
type = "Product" # 4-8x memory reduction
Benchmarking Tips
# Run warm-up queries first
ruvector search --query "[...]" -k 10
ruvector search --query "[...]" -k 10
# Then benchmark
ruvector benchmark --queries 10000
# Test different k values
for k in 10 50 100; do
time ruvector search --query "[...]" -k $k
done
🔗 Related Documentation
- Rust API Reference - Core Ruvector API
- Getting Started Guide - Complete tutorial
- Performance Tuning - Optimization guide
- Main README - Project overview
🐛 Troubleshooting
Common Issues
Database file not found:
# Ensure database exists
ruvector info --db ./ruvector.db
# Or create it first
ruvector create --dimensions 384 --path ./ruvector.db
Dimension mismatch:
# Error: "Vector dimension mismatch"
# Solution: Ensure all vectors match database dimensions
# Check database dimensions
ruvector info --db ./ruvector.db
Invalid query format:
# Use proper JSON or comma-separated format
ruvector search --query "[0.1, 0.2, 0.3]" # JSON
ruvector search --query "0.1, 0.2, 0.3" # CSV
MCP server connection issues:
# Check if port is available
lsof -i :3000
# Try different port
ruvector-mcp --transport sse --port 8080
# Enable debug logging
ruvector-mcp --transport sse --debug
🤝 Contributing
Contributions welcome! Please see the Contributing Guidelines.
Development Setup
# Clone repository
git clone https://github.com/ruvnet/ruvector.git
cd ruvector/crates/ruvector-cli
# Run tests
cargo test
# Check formatting
cargo fmt -- --check
# Run clippy
cargo clippy -- -D warnings
# Build release
cargo build --release
📜 License
MIT License - see LICENSE for details.
🙏 Acknowledgments
Built with:
- clap - Command-line argument parsing
- tokio - Async runtime
- serde - Serialization framework
- indicatif - Progress bars and spinners
- colored - Terminal colors
Dependencies
~35–53MB
~743K SLoC