
Model Management

Osaurus supports a wide variety of MLX-optimized models and Apple Foundation Models. This guide covers model management and configuration.

Model Manager

Access the Model Manager through the Osaurus menu bar icon.

Downloading Models

  1. Click the Osaurus menu bar icon
  2. Select Model Manager
  3. Browse or search for models
  4. Click Download on your chosen model
  5. Monitor progress in the download queue

Model Information

Each model displays:

  • Name — Model identifier
  • Size — Download and disk size
  • Quantization — Bit precision (4-bit, 8-bit)
  • Parameters — Parameter count, in billions
  • Download Status — Current state

Managing Storage

Models are stored at:

~/MLXModels

Override this location with the OSU_MODELS_DIR environment variable.
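
As a sketch, the override can be exported before launching Osaurus from a shell (the path below is illustrative, not a required location):

```shell
# Point Osaurus at a custom models directory (illustrative path).
# Launch the app from the same shell so it inherits the variable:
#   open -a Osaurus
export OSU_MODELS_DIR="$HOME/ExternalModels"
echo "$OSU_MODELS_DIR"
```

Note that an app launched from Finder or the Dock does not see variables exported in a terminal session; set the variable in the shell you launch from.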

To remove models:

  1. Open Model Manager
  2. Find the downloaded model
  3. Click Delete
  4. Confirm removal

Model Types

MLX Models

MLX models are optimized specifically for Apple Silicon. Osaurus supports a wide range of model architectures:

Supported Architectures:

  • Llama — Meta's Llama 3.2, Llama 3.1, and earlier versions
  • Qwen — Alibaba's Qwen 2.5 series
  • Gemma — Google's Gemma and Gemma 2 models
  • Mistral — Mistral AI's instruction-tuned models
  • DeepSeek — DeepSeek Coder and general models
  • Phi — Microsoft's Phi series

Quantization Options:

  • 4-bit Quantization — Best speed/quality trade-off
  • 8-bit Quantization — Higher quality, more memory
  • 16-bit — Maximum quality, significant memory usage

Apple Foundation Models

Available on supported macOS versions:

# Use with model ID "foundation"
curl -X POST http://127.0.0.1:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "foundation",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Features:

  • System-integrated model
  • No download required
  • Optimized for Apple Silicon
  • Privacy-focused design

Liquid Foundation Models

Osaurus supports Liquid AI's LFM family — on-device models built on a non-transformer architecture optimized for edge deployment. They offer fast decoding, a low memory footprint, and strong tool calling out of the box.

Features:

  • Non-transformer architecture tuned for edge inference
  • Fast token generation on Apple Silicon
  • Low memory footprint compared to equivalent-quality transformers
  • Strong tool calling performance

Cloud Providers

The harness is model-agnostic. Connect to cloud providers when you need more power — your agents, memory, and tools stay intact regardless of which provider you use.

  • OpenAI — GPT-4o, GPT-4, and other OpenAI models
  • Anthropic — Claude family of models
  • Gemini — Google's Gemini models
  • xAI / Grok — xAI's Grok models
  • Venice AI — Privacy-focused, uncensored inference with no data retention
  • OpenRouter — Unified access to multiple model providers
  • Ollama — Local and remote Ollama instances
  • LM Studio — Local model serving via LM Studio

Context and memory persist across all providers. Switch freely without losing what the AI has learned about you.

Model Naming Convention

Osaurus uses consistent model naming:

{model-family}-{version}-{size}-{variant}-{quantization}

Examples:

  • llama-3.2-3b-instruct-4bit
  • mistral-7b-instruct-v0.2-4bit
  • deepseek-coder-7b-instruct-4bit
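
Given that convention, the trailing quantization field can be peeled off a model ID with plain shell parameter expansion (a sketch; the model name is one of the doc's own examples):

```shell
# Split a model ID on its last hyphen per the naming convention
name="llama-3.2-3b-instruct-4bit"
quant="${name##*-}"   # text after the last hyphen
base="${name%-*}"     # everything before it
echo "$base / $quant"
```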

Performance Characteristics

Memory Requirements

  Model Size   4-bit      8-bit      16-bit
  2-3B         2-3GB      4-6GB      8-12GB
  7-8B         4-5GB      8-10GB     16-20GB
  13B          8-10GB     16-20GB    32-40GB
  30B+         20-25GB    40-50GB    80-100GB
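
These figures track the usual back-of-envelope rule: weight memory is roughly parameters × bits ÷ 8, plus runtime overhead for the KV cache and activations. A quick shell sketch of that estimate (the function name is ours, not an Osaurus tool):

```shell
# Rough weight-memory estimate in GB: params (billions) * bits / 8.
# Integer math; real usage adds KV cache and activation overhead.
est_gb() {
  echo $(( $1 * $2 / 8 ))
}
est_gb 8 4    # 8B model at 4-bit: ~4 GB of weights
est_gb 13 8   # 13B model at 8-bit: ~13 GB of weights
```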

Speed Benchmarks

Typical tokens per second on M2:

  Model   4-bit   8-bit
  3B      40-60   30-45
  7B      20-35   15-25
  13B     12-20   8-15

Model Configuration

Context Length

Default context lengths by model family:

  • Llama 3.2 — 4096 tokens
  • Mistral — 8192 tokens
  • Qwen 2.5 — 32768 tokens
  • DeepSeek — 4096 tokens

Temperature Settings

Recommended temperature ranges:

  • Creative Writing — 0.7-1.0
  • Code Generation — 0.1-0.3
  • General Chat — 0.5-0.7
  • Factual Responses — 0.0-0.3
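
For example, a low temperature suits the code-generation range above. A request body POSTed to /v1/chat/completions might look like this (a sketch; the model name follows the convention on this page):

```json
{
  "model": "llama-3.2-3b-instruct-4bit",
  "temperature": 0.2,
  "messages": [
    {"role": "user", "content": "Write a binary search in Swift"}
  ]
}
```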

System Prompts

Configure default system prompts in Settings:

{
  "model": "llama-3.2-3b-instruct-4bit",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful, concise assistant."
    },
    {
      "role": "user",
      "content": "Explain quantum computing"
    }
  ]
}

Advanced Configuration

There are no global model aliasing or preloading options at this time. Control behavior per request via the OpenAI-compatible API.

Troubleshooting

Model Not Found

  1. Verify model is downloaded in Model Manager
  2. Check exact model name:
    curl http://127.0.0.1:1337/v1/models
  3. Ensure correct spelling and format
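
Assuming the endpoint returns OpenAI-style JSON (a "data" array of objects with "id" fields), the exact names can be listed like this; the sample payload below stands in for a live response:

```shell
# List model IDs from a /v1/models response.
# Against a running server, replace the sample with:
#   response=$(curl -s http://127.0.0.1:1337/v1/models)
response='{"data":[{"id":"llama-3.2-3b-instruct-4bit"},{"id":"foundation"}]}'
echo "$response" | python3 -c 'import json,sys
for m in json.load(sys.stdin)["data"]:
    print(m["id"])'
```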

Slow Performance

  1. Check Activity Monitor for memory pressure
  2. Try smaller or more quantized models
  3. Close unnecessary applications
  4. Reduce context length in requests

Download Issues

  1. Check internet connection
  2. Verify available disk space
  3. Try pausing and resuming download
  4. Check Model Manager logs

Memory Errors

  1. Monitor RAM usage during inference
  2. Switch to more quantized versions
  3. Reduce max_tokens in requests
  4. Consider smaller models

Model Updates

Osaurus periodically updates available models:

  1. New models appear automatically in Model Manager
  2. Updated versions are marked with badges
  3. Old versions remain usable until deleted
  4. Check GitHub releases for model announcements

For model help, join our Discord community or check the benchmarks page.