The 5MB Alternative to Ollama

Shimmy will be free forever. No asterisks. No "free for now." No pivot to paid.

Drop-in OpenAI API Replacement for Local LLMs

Shimmy is a 5.1MB single-binary that provides 100% OpenAI-compatible endpoints for GGUF models. Point your existing AI tools to Shimmy and they just work - locally, privately, and free.

# Install and run in 30 seconds
cargo install shimmy --features huggingface
shimmy serve
# → Running on http://localhost:11435

🚀 Works with Your Existing Tools

No code changes needed - just change the API endpoint:

VSCode Extensions: Point to http://localhost:11435
Cursor Editor: Built-in OpenAI compatibility
Continue.dev: Drop-in model provider
Any OpenAI client: Python, Node.js, curl, etc.

⚡ Zero Configuration Required

Auto-discovers models from Hugging Face cache, Ollama, local dirs
Auto-allocates ports to avoid conflicts
Auto-detects LoRA adapters for specialized models
Just works - no config files, no setup wizards

🎯 Perfect for Local Development

Privacy: Your code never leaves your machine
Cost: No API keys, no per-token billing
Speed: Local inference, sub-second responses
Reliability: No rate limits, no downtime

Quick Start (30 seconds)

Installation

🪟 Windows

# RECOMMENDED: Use pre-built binary (no build dependencies required)
curl -L https://github.com/Michael-A-Kuykendall/shimmy/releases/latest/download/shimmy.exe -o shimmy.exe

# OR: Install from source (requires LLVM/Clang)
# First install build dependencies:
winget install LLVM.LLVM
# Then install shimmy:
cargo install shimmy --features huggingface

⚠️ Windows Notes:

Pre-built binary recommended to avoid build dependency issues

If Windows Defender flags the binary, add an exclusion or use cargo install

For cargo install: Install LLVM first to resolve libclang.dll errors

🍎 macOS / 🐧 Linux

# Install from crates.io
cargo install shimmy --features huggingface

Get Models

Shimmy auto-discovers models from:

Hugging Face cache: ~/.cache/huggingface/hub/
Ollama models: ~/.ollama/models/
Local directory: ./models/
Environment: SHIMMY_BASE_GGUF=path/to/model.gguf

# Download models that work out of the box
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf --local-dir ./models/
huggingface-cli download bartowski/Llama-3.2-1B-Instruct-GGUF --local-dir ./models/

Start Server

# Auto-allocates port to avoid conflicts
shimmy serve

# Or use manual port
shimmy serve --bind 127.0.0.1:11435

Point your AI tools to the displayed port - VSCode Copilot, Cursor, Continue.dev all work instantly!

📦 Download & Install

Package Managers

Rust: cargo install shimmy
VS Code: Shimmy Extension
npm: npm install -g shimmy-js (coming soon)
Python: pip install shimmy (coming soon)

Direct Downloads

GitHub Releases: Latest binaries
Docker: docker pull shimmy/shimmy:latest (coming soon)

🍎 macOS Support

Full compatibility confirmed! Shimmy works flawlessly on macOS with Metal GPU acceleration.

# Install dependencies
brew install cmake rust

# Install shimmy
cargo install shimmy

✅ Verified working:

Intel and Apple Silicon Macs
Metal GPU acceleration (automatic)
Xcode 17+ compatibility
All LoRA adapter features

Integration Examples

VSCode Copilot

{
  "github.copilot.advanced": {
    "serverUrl": "http://localhost:11435"
  }
}

Continue.dev

{
  "models": [{
    "title": "Local Shimmy",
    "provider": "openai", 
    "model": "your-model-name",
    "apiBase": "http://localhost:11435/v1"
  }]
}

Cursor IDE

Works out of the box - just point to http://localhost:11435/v1

Why Shimmy Will Always Be Free

I built Shimmy because I was tired of 680MB binaries to run a 4GB model.

This is my commitment: Shimmy stays MIT licensed, forever. If you want to support development, sponsor it. If you don't, just build something cool with it.

Shimmy saves you time and money. If it's useful, consider sponsoring for $5/month — less than your Netflix subscription, infinitely more useful.

Performance Comparison

Tool	Binary Size	Startup Time	Memory Usage	OpenAI API
Shimmy	5.1MB	<100ms	50MB	100%
Ollama	680MB	5-10s	200MB+	Partial
llama.cpp	89MB	1-2s	100MB	None

API Reference

Endpoints

GET /health - Health check
POST /v1/chat/completions - OpenAI-compatible chat
GET /v1/models - List available models
POST /api/generate - Shimmy native API
GET /ws/generate - WebSocket streaming

CLI Commands

shimmy serve                    # Start server (auto port allocation)
shimmy serve --bind 127.0.0.1:8080  # Manual port binding
shimmy list                     # Show available models  
shimmy discover                 # Refresh model discovery
shimmy generate --name X --prompt "Hi"  # Test generation
shimmy probe model-name         # Verify model loads

Technical Architecture

Rust + Tokio: Memory-safe, async performance
llama.cpp backend: Industry-standard GGUF inference
OpenAI API compatibility: Drop-in replacement
Dynamic port management: Zero conflicts, auto-allocation
Zero-config auto-discovery: Just works™

Community & Support

🐛 Bug Reports: GitHub Issues
💬 Discussions: GitHub Discussions
📖 Documentation: docs/
💝 Sponsorship: GitHub Sponsors

Star History

Quality & Reliability

Shimmy maintains high code quality through comprehensive testing:

Comprehensive test suite with property-based testing
Automated CI/CD pipeline with quality gates
Runtime invariant checking for critical operations
Cross-platform compatibility testing

See our testing approach for technical details.

License & Philosophy

MIT License - forever and always.

Philosophy: Infrastructure should be invisible. Shimmy is infrastructure.

Testing Philosophy: Reliability through comprehensive validation and property-based testing.

Forever maintainer: Michael A. Kuykendall
Promise: This will never become a paid product
Mission: Making local AI development frictionless

Name		Name	Last commit message	Last commit date
Latest commit History 106 Commits
.cargo		.cargo
.claude		.claude
.github		.github
assets		assets
benches		benches
coverage		coverage
deploy		deploy
docs		docs
libs		libs
packaging		packaging
release-artifacts		release-artifacts
scripts		scripts
shimmy-vscode		shimmy-vscode
src		src
test-huggingface-model		test-huggingface-model
test-safetensors-model		test-safetensors-model
tests		tests
.gitignore		.gitignore
.mailmap		.mailmap
CHANGELOG.md		CHANGELOG.md
CODEOWNERS		CODEOWNERS
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
DCO.md		DCO.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README-DOCKER.md		README-DOCKER.md
README.md		README.md
ROADMAP.md		ROADMAP.md
SECURITY.md		SECURITY.md
SPONSORS.md		SPONSORS.md
benchmark_results.json		benchmark_results.json
build.rs		build.rs
build_rs_cov.profraw		build_rs_cov.profraw
docker-compose.yml		docker-compose.yml
shimmy		shimmy
shimmy.exe		shimmy.exe

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

The 5MB Alternative to Ollama

Drop-in OpenAI API Replacement for Local LLMs

🚀 Works with Your Existing Tools

⚡ Zero Configuration Required

🎯 Perfect for Local Development

Quick Start (30 seconds)

Installation

🪟 Windows

🍎 macOS / 🐧 Linux

Get Models

Start Server

📦 Download & Install

Package Managers

Direct Downloads

🍎 macOS Support

Integration Examples

VSCode Copilot

Continue.dev

Cursor IDE

Why Shimmy Will Always Be Free

Performance Comparison

API Reference

Endpoints

CLI Commands

Technical Architecture

Community & Support

Star History

Sponsors

Quality & Reliability

License & Philosophy

About

Uh oh!

Releases

Packages

Languages

License

SimpleYj/shimmy

Folders and files

Latest commit

History

Repository files navigation

The 5MB Alternative to Ollama

Drop-in OpenAI API Replacement for Local LLMs

🚀 Works with Your Existing Tools

⚡ Zero Configuration Required

🎯 Perfect for Local Development

Quick Start (30 seconds)

Installation

🪟 Windows

🍎 macOS / 🐧 Linux

Get Models

Start Server

📦 Download & Install

Package Managers

Direct Downloads

🍎 macOS Support

Integration Examples

VSCode Copilot

Continue.dev

Cursor IDE

Why Shimmy Will Always Be Free

Performance Comparison

API Reference

Endpoints

CLI Commands

Technical Architecture

Community & Support

Star History

Sponsors

Quality & Reliability

License & Philosophy

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages