GAMMA

Game Analyzing Model Methods Attentively

An interactive game that teaches you how LLMs work by letting you predict what they'll say next.

Play in your browser - No installation required!

See AGENTS.md for the active code-writing agent profile.

The project has since evolved to provide tools for experimenting with, and benchmarking, local models in a variety of ways.


Main features

The Game:


Try to guess which word the AI will choose next. See the probabilities in real-time. Learn how temperature, top-k, and sampling actually work by playing with them.
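For intuition, here is a minimal sketch (not GAMMA's actual code) of what temperature and top-k do to a next-token logit vector before sampling:

import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=40):
    """Reshape a next-token logit vector, then sample one token id."""
    logits = np.asarray(logits, dtype=np.float64) / temperature  # <1 sharpens, >1 flattens
    top_k = min(top_k, logits.size)
    cutoff = np.sort(logits)[-top_k]                     # k-th largest logit
    logits = np.where(logits < cutoff, -np.inf, logits)  # drop everything below it
    probs = np.exp(logits - logits.max())                # numerically stable softmax
    probs /= probs.sum()
    return np.random.choice(probs.size, p=probs)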

Mind Meld (Experimental):


Watch multiple models collaborate on the same response, swapping control dynamically based on confidence, patterns, or strategy.
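The swap logic can be as simple as a confidence threshold. A toy sketch with stand-in models (not the real Mind Meld controller): the active model keeps generating until its top-token probability drops below a floor, then control passes to the next model.

def mind_meld(models, steps=30, confidence_floor=0.35):
    """Toy handoff loop: `models` are callables returning (token, confidence)."""
    active, text = 0, []
    for _ in range(steps):
        token, confidence = models[active](text)
        if confidence < confidence_floor:           # active model is unsure:
            active = (active + 1) % len(models)     # hand control to the next model
            token, confidence = models[active](text)
        text.append(token)
    return " ".join(text)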

Natural Language Commands:


Describe what you want to do, and GAMMA generates the command (either with a local model or an agentic CLI, such as Claude Code).

"I want to play with Gemma 2B using temperature 0.9"

python gamma.py game --engine pytorch --model google/gemma-2-2b-it --temperature 0.9

"Compare Qwen and DeepSeek on a coding prompt"

python gamma.py game --comparison \
  --comparison-models \
    ollama:qwen3-coder:30b \
    ollama:deepseek-r1:32b \
  --prompt "Write a Python function to calculate fibonacci"

"Meld Gemma models with dynamic blending"

python tools/run_mind_meld_cli.py gemma-1b gemma-2b --blend dynamic

"Run the creative preset with a custom prompt"

python tools/run_mind_meld_cli.py --preset creative --prompt "Once upon a time"
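Under the hood this amounts to prompting a model to emit exactly one shell command. A minimal sketch, where `generate` is a hypothetical text-completion callable standing in for the local model or agentic CLI:

COMMAND_PROMPT = (
    "Translate the user's request into a single `python gamma.py` command.\n"
    "Request: {request}\n"
    "Reply with the command only, on one line."
)

def request_to_command(generate, request):
    """`generate` is any text-completion callable (hypothetical, for illustration)."""
    reply = generate(COMMAND_PROMPT.format(request=request))
    return reply.strip().splitlines()[0]   # keep only the first line of the reply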

Get Started

# Install
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt          # Base dependencies

# Install engine of your choice:
pip install -r requirements-pytorch.txt  # HuggingFace/Transformers (recommended)
pip install -r requirements-llamacpp.txt # GGUF models
pip install -r requirements-mlx.txt      # Apple Silicon (fastest on Mac)
pip install -r requirements-vllm.txt     # High-throughput (NVIDIA GPU)

# Play
python gamma.py game

See Engine Documentation for all engine options including CUDA/ROCm.

GAMMA also auto-detects your Ollama models and HuggingFace cache.
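Auto-detection boils down to scanning the standard cache locations. A sketch assuming the default paths (~/.cache/huggingface/hub and ~/.ollama/models/manifests; both can be relocated via environment variables):

from pathlib import Path

def detect_local_models():
    """List model names found in the default HuggingFace and Ollama caches."""
    found = []
    hf_hub = Path.home() / ".cache" / "huggingface" / "hub"
    for entry in hf_hub.glob("models--*"):   # dirs like models--google--gemma-2-2b-it
        found.append(entry.name.removeprefix("models--").replace("--", "/"))
    manifests = Path.home() / ".ollama" / "models" / "manifests"
    for manifest in (p for p in manifests.rglob("*") if p.is_file()):
        found.append(f"ollama:{manifest.parent.name}:{manifest.name}")  # model:tag
    return found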

See Game Documentation for more details.


Engines & Models

GAMMA supports multiple inference engines for running local LLMs. Engine status:

See also: Model Formats & Engines

Production Ready

| Engine   | Backend                | Models                | Notes |
| -------- | ---------------------- | --------------------- | ----- |
| PyTorch  | MPS (Mac) / CUDA       | HuggingFace models    | Full feature support; recommended for HF models |
| MLX      | Metal (Apple Silicon)  | MLX-optimized models  | Fastest on M1/M2/M3 Macs, ~2x faster than PyTorch MPS |
| LlamaCpp | Metal / CUDA / CPU     | GGUF quantized models | Great for quantized models, low memory usage |
| Ollama   | llama.cpp              | Ollama library        | Easy setup, auto-detects installed models |

Experimental

| Engine       | Backend             | Status |
| ------------ | ------------------- | ------ |
| JAX/Flax     | CPU / TPU           | JIT tracing issues with some models |
| vLLM         | CUDA                | Requires NVIDIA GPU with CUDA; not supported on macOS or ROCm in GAMMA |
| ONNX Runtime | CPU / CUDA / CoreML | Requires ONNX-exported models |
| TensorFlow   | CPU / GPU           | Limited model support |

Quick Engine Selection

# Apple Silicon Mac (fastest)
python gamma.py game --engine mlx --model mlx-community/gemma-2-2b-it-4bit

# Any Mac/Linux with PyTorch
python gamma.py game --engine pytorch --model google/gemma-2-2b-it

# Quantized GGUF models (low memory)
python gamma.py game --engine llamacpp --model models/model.gguf

# Ollama models (use GGUF with llama.cpp for logits)
python gamma.py game --engine llamacpp --model /path/to/ollama-model.gguf

Benchmark Results (Apple M-series)

| Engine   | Model              | Tokens/sec | Latency (p50) |
| -------- | ------------------ | ---------- | ------------- |
| MLX      | gemma-2-2b-it-4bit | 10.8       | 92 ms |
| PyTorch  | phi-2 (2.7B)       | 5.8        | 146 ms |
| LlamaCpp | qwen2-0.5b-q4      | 4.4        | 174 ms |

See Engine Documentation and Core Documentation for details.

Logits availability (game/comparison/mind-meld)

The game, comparison, and mind-meld modes require real logits (full token probability distributions). Wrapper engines do not expose logits via HTTP APIs, so the CLI will refuse them.

Engines without logits:

  • openai
  • huggingface_inference
  • ollama

If you are using an OpenAI-compatible vLLM server, you still need the native vllm engine to access logits.
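The distinction matters because HTTP APIs typically return at most a handful of top logprobs, while a local forward pass yields the full distribution. A minimal transformers sketch of reading real next-token logits:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "google/gemma-2-2b-it"                 # any HF causal LM works here
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]    # full next-token distribution
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, 5)
for p, i in zip(top.values, top.indices):
    print(f"{tok.decode(i)!r}: {p:.3f}")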

KV cache sharing (Mind Meld) prefers a direct transfer when prompt prefixes match; otherwise it replays the missing suffix through the target model to rebuild a correct cache (lossless, but more compute) rather than copying KV entries across incompatible tokenizations. Replay aligns on full-token prefixes to avoid tokenizer boundary drift. KV cache translation remains experimental: it is only attempted when --allow-kv-cache-translation is set, safety checks will skip it unless --force-kv-cache-translation is provided, and it still falls back to replay if translation is incompatible or fails.
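Schematically, the replay path is: find the longest shared full-token prefix, keep the cache for it, and forward only the rest. A sketch where `truncate` and `forward_suffix` are hypothetical names for whatever cache surgery the engine provides:

def rebuild_cache(target, cached_ids, cache, new_ids):
    """Keep cache entries for the shared full-token prefix; replay the rest."""
    keep = 0
    for old, new in zip(cached_ids, new_ids):
        if old != new:
            break
        keep += 1                               # full tokens only, so no boundary drift
    if keep == len(cached_ids) == len(new_ids):
        return cache                            # exact match: direct transfer
    cache = target.truncate(cache, keep)        # hypothetical: drop stale entries
    return target.forward_suffix(new_ids[keep:], past=cache)  # hypothetical: replay suffix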


More Example Usage

# Interactive menu (recommended)
python gamma.py game

# Quick game with defaults
python gamma.py game --engine llamacpp --model models/model.gguf

# Chat
python gamma.py game --chat --model qwen3-coder:30b

# Compare models
python gamma.py game --comparison \
  --comparison-models model1 model2

# Mind meld (new CLI with presets and aliases)
python tools/run_mind_meld_cli.py --preset creative
python tools/run_mind_meld_cli.py gemma-1b gemma-2b --blend dynamic --steps 50
python tools/run_mind_meld_cli.py gemma-1b@Optimist gemma-2b@Skeptic --preset debate

# Other common options
--help                     # Detailed explanation of commands
--temperature 0.7          # Sampling randomness (0.1-2.0)
--top-k 40                 # Top-K filtering
--top-p 0.95               # Nucleus sampling
--sampling-strategy sample # sample or argmax/greedy
--steps 50                 # Max generation steps
--show-attention           # Show attention heatmaps
--verbose                  # Detailed explanations
--prompt-chat-template     # Use chat template for --prompt/--initial-prompt (auto for instruct models)
--no-prompt-chat-template  # Force raw --prompt (skip chat template)
--prompt-system "TEXT"     # System prompt for chat templates
--no-default-system        # Disable the default system prompt
--no-step-delay            # Mind Meld: disable per-step delay
--summary-only             # Mind Meld: show only final output and brief stats (no live per-round stats)
--max-sentences N          # Mind Meld: stop after N sentences in the generated output
--shared-chat-template     # Mind Meld: reuse one chat template across models (auto-enabled when templates differ; disable with --no-shared-chat-template)
--stop-text "TEXT"         # Mind Meld: stop when generated output contains TEXT (repeatable; common chat end markers are used automatically when templates are applied)
--translate-logits         # Mind Meld: translate logits into the next model's vocab during swaps (experimental)
--order-neutral            # Mind Meld: alias for --use-weighted-average to reduce swap-order sensitivity
--soft-swap                # Mind Meld: blend all models each step but keep swap cadence by boosting the active model
--soft-swap-weight W       # Mind Meld: weight multiplier for the active model in --soft-swap (default 1.5)
--force-kv-cache-translation  # Mind Meld: force KV cache translation even when safety checks fail (unsafe)
--repetition-penalty 1.1   # Reduce repeated tokens during sampling (>1.0)
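A repetition penalty rescales the logits of tokens that have already appeared. A sketch of the common (CTRL-style) formulation, which may differ in detail from GAMMA's implementation:

import numpy as np

def apply_repetition_penalty(logits, generated_ids, penalty=1.1):
    """Make already-seen tokens less likely: divide positive logits by the
    penalty, multiply negative ones (both move toward 'less probable')."""
    logits = np.array(logits, dtype=np.float64)
    for i in set(generated_ids):
        logits[i] = logits[i] / penalty if logits[i] > 0 else logits[i] * penalty
    return logits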



License

MIT - See LICENSE
