GAMMA: Game Analyzing Model Methods Attentively
An interactive game that teaches you how LLMs work by letting you predict what they'll say next.
Play in your browser - No installation required!
See AGENTS.md for the active code-writing agent profile.
The project has evolved to provide tools for experimenting with and benchmarking local models in a variety of ways.
Try to guess which word the AI will choose next. See the probabilities in real-time. Learn how temperature, top-k, and sampling actually work by playing with them.
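Under the hood this is ordinary next-token sampling: the logits are divided by the temperature, optionally restricted to the top-k candidates, and renormalized into a probability distribution. A minimal sketch of that math (illustrative only, not GAMMA's actual implementation):

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 0.9, top_k: int = 40) -> int:
    """Standard temperature + top-k sampling over one vector of next-token logits."""
    scaled = logits / max(temperature, 1e-6)           # <1 sharpens the distribution, >1 flattens it
    if 0 < top_k < len(scaled):
        cutoff = np.sort(scaled)[-top_k]                # keep only the k highest-scoring tokens
        scaled = np.where(scaled >= cutoff, scaled, -np.inf)
    probs = np.exp(scaled - scaled.max())               # softmax, shifted for numerical stability
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))   # draw one token id from the distribution
```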
Watch multiple models collaborate on the same response, swapping control dynamically based on confidence, patterns, or strategy.
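One simple way such a handoff can be decided (a sketch of the idea, not GAMMA's internal logic): every model scores the next token, and control passes to another model only when the active model's confidence (the probability of its top token) drops below a threshold.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = np.exp(logits - logits.max())
    return z / z.sum()

def pick_controller(per_model_logits: list[np.ndarray], active: int, threshold: float = 0.5) -> int:
    """Keep the active model while it is confident; otherwise swap to the most confident one."""
    confidence = [softmax(l).max() for l in per_model_logits]   # top-token probability per model
    if confidence[active] >= threshold:
        return active
    return int(np.argmax(confidence))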
Describe what you want to do, and GAMMA generates the command (either with a local model or an agentic CLI such as Claude Code).
"I want to play with Gemma 2B using temperature 0.9"
python gamma.py game --engine pytorch --model google/gemma-2-2b-it --temperature 0.9
"Compare Qwen and DeepSeek on a coding prompt"
python gamma.py game --comparison \
--comparison-models \
ollama:qwen3-coder:30b \
ollama:deepseek-r1:32b \
--prompt "Write a Python function to calculate fibonacci""Meld Gemma models with dynamic blending"
python tools/run_mind_meld_cli.py gemma-1b gemma-2b --blend dynamic
"Run the creative preset with a custom prompt"
python tools/run_mind_meld_cli.py --preset creative --prompt "Once upon a time"
# Install
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt # Base dependencies
# Install engine of your choice:
pip install -r requirements-pytorch.txt # HuggingFace/Transformers (recommended)
pip install -r requirements-llamacpp.txt # GGUF models
pip install -r requirements-mlx.txt # Apple Silicon (fastest on Mac)
pip install -r requirements-vllm.txt # High-throughput (NVIDIA GPU)
# Play
python gamma.py game
See Engine Documentation for all engine options including CUDA/ROCm.
GAMMA also auto-detects your Ollama models and HuggingFace cache.
See Game Documentation for more details.
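Detection of this kind typically just reads the standard local caches. The sketch below shows one way it can be done with huggingface_hub's cache scanner and Ollama's local HTTP API; the paths and endpoint are those tools' defaults, not necessarily how GAMMA implements it.

```python
import json
import urllib.request
from huggingface_hub import scan_cache_dir   # ships with the base HF dependencies

def list_local_models() -> tuple[list[str], list[str]]:
    """Return cached HuggingFace repo ids and locally pulled Ollama model names."""
    hf_models = [repo.repo_id for repo in scan_cache_dir().repos]   # reads ~/.cache/huggingface/hub
    try:
        # Ollama's default local API; equivalent to what `ollama list` prints
        with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=2) as resp:
            ollama_models = [m["name"] for m in json.load(resp)["models"]]
    except OSError:
        ollama_models = []   # Ollama not installed or not running
    return hf_models, ollama_models
```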
GAMMA supports multiple inference engines for running local LLMs. Engine status:
See also: Model Formats & Engines
| Engine | Backend | Models | Notes |
|---|---|---|---|
| PyTorch | MPS (Mac) / CUDA | HuggingFace models | Full feature support, recommended for HF models |
| MLX | Metal (Apple Silicon) | MLX-optimized models | Fastest on M1/M2/M3 Macs, ~2x faster than PyTorch MPS |
| LlamaCpp | Metal / CUDA / CPU | GGUF quantized models | Great for quantized models, low memory usage |
| Ollama | llama.cpp | Ollama library | Easy setup, auto-detects installed models |
| Engine | Backend | Status |
|---|---|---|
| JAX/Flax | CPU / TPU | JIT tracing issues with some models |
| vLLM | CUDA | Requires NVIDIA GPU with CUDA; not supported on macOS or ROCm in GAMMA |
| ONNX Runtime | CPU / CUDA / CoreML | Requires ONNX-exported models |
| TensorFlow | CPU / GPU | Limited model support |
# Apple Silicon Mac (fastest)
python gamma.py game --engine mlx --model mlx-community/gemma-2-2b-it-4bit
# Any Mac/Linux with PyTorch
python gamma.py game --engine pytorch --model google/gemma-2-2b-it
# Quantized GGUF models (low memory)
python gamma.py game --engine llamacpp --model models/model.gguf
# Ollama models (use GGUF with llama.cpp for logits)
python gamma.py game --engine llamacpp --model /path/to/ollama-model.gguf
| Engine | Model | Tokens/sec | Latency p50 |
|---|---|---|---|
| MLX | gemma-2-2b-it-4bit | 10.8 | 92ms |
| PyTorch | phi-2 (2.7B) | 5.8 | 146ms |
| LlamaCpp | qwen2-0.5b-q4 | 4.4 | 174ms |
See Engine Documentation and Core Documentation for details.
The game, comparison, and mind-meld modes require real logits (full token probability distributions). Wrapper engines do not expose logits via HTTP APIs, so the CLI will refuse them.
Engines without logits:
- openai
- huggingface_inference
- ollama
If you are using an OpenAI-compatible vLLM server, you still need the native
vllm engine to access logits.
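The difference is in what each backend can return. A locally loaded model exposes the full logit vector at every step, which is exactly what the probability display needs; chat-style HTTP APIs return generated text and at most a few top log-probabilities. A minimal transformers sketch of pulling the full distribution from a local model (illustrative; roughly what a local engine such as the PyTorch one has access to):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-2b-it"            # any causal LM from the engine tables above
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # one score per vocabulary token
probs = torch.softmax(logits, dim=-1)        # the full next-token distribution, not just a sample
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tok.decode(idx)!r}: {p.item():.3f}")
```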
KV cache sharing (Mind Meld) prefers direct transfer when prompt prefixes
match; otherwise it replays the missing suffix through the target model to
rebuild a correct cache. Replay aligns full-token prefixes to avoid tokenizer
boundary drift. KV cache translation remains experimental and is only attempted
when --allow-kv-cache-translation is set; safety checks will skip translation
unless --force-kv-cache-translation is provided, and it still falls back to
replay if translation is incompatible or fails.
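Conceptually, replay only needs to know how many whole tokens the cached prompt and the new prompt share; everything after that point is re-run through the target model. A simplified sketch of the prefix alignment (hypothetical helper, not GAMMA's code):

```python
def shared_prefix_len(cached_tokens: list[int], new_tokens: list[int]) -> int:
    """Count the leading token ids two tokenizations have in common (whole tokens only)."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

# If the shared prefix covers the whole cached prompt, the KV cache can transfer directly.
# Otherwise only the tokens after the shared prefix are replayed through the target model,
# which is lossless but costs one extra forward pass over that suffix.
```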
# Interactive menu (recommended)
python gamma.py game
# Quick game with defaults
python gamma.py game --engine llamacpp --model models/model.gguf
# Chat
python gamma.py game --chat --model qwen3-coder:30b
# Compare models
python gamma.py game --comparison \
--comparison-models model1 model2
# Mind meld (new CLI with presets and aliases)
python tools/run_mind_meld_cli.py --preset creative
python tools/run_mind_meld_cli.py gemma-1b gemma-2b --blend dynamic --steps 50
python tools/run_mind_meld_cli.py gemma-1b@Optimist gemma-2b@Skeptic --preset debate
# Other common options
--help # Detailed explanation of commands
--temperature 0.7 # Sampling randomness (0.1-2.0)
--top-k 40 # Top-K filtering
--top-p 0.95 # Nucleus sampling
--sampling-strategy sample # sample or argmax/greedy
--steps 50 # Max generation steps
--show-attention # Show attention heatmaps
--verbose # Detailed explanations
--prompt-chat-template # Use chat template for --prompt/--initial-prompt (auto for instruct models)
--no-prompt-chat-template # Force raw --prompt (skip chat template)
--prompt-system "TEXT" # System prompt for chat templates
--no-default-system # Disable the default system prompt
--no-step-delay # Mind Meld: disable per-step delay
--summary-only # Mind Meld: show only final output and brief stats (no live per-round stats)
--max-sentences N # Mind Meld: stop after N sentences in the generated output
--shared-chat-template # Mind Meld: reuse one chat template across models (auto-enabled when templates differ; disable with --no-shared-chat-template)
--stop-text "TEXT" # Mind Meld: stop when generated output contains TEXT (repeatable; common chat end markers are used automatically when templates are applied)
--translate-logits # Mind Meld: translate logits into the next model's vocab during swaps (experimental)
--order-neutral # Mind Meld: alias for --use-weighted-average to reduce swap-order sensitivity
--soft-swap # Mind Meld: blend all models each step but keep swap cadence by boosting the active model
--soft-swap-weight W # Mind Meld: weight multiplier for the active model in --soft-swap (default 1.5)
--force-kv-cache-translation # Mind Meld: force KV cache translation even when safety checks fail (unsafe)
--repetition-penalty 1.1 # Reduce repeated tokens during sampling (>1.0); see the sketch below
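For reference, the --top-p and --repetition-penalty flags correspond to two more standard logit adjustments applied before sampling. A minimal sketch of both (standard formulations, not GAMMA's exact code):

```python
import numpy as np

def adjust_logits(logits: np.ndarray, generated_ids: list[int],
                  top_p: float = 0.95, repetition_penalty: float = 1.1) -> np.ndarray:
    """Apply a repetition penalty, then nucleus (top-p) filtering, to one vector of logits."""
    out = logits.astype(float).copy()
    for tok in set(generated_ids):                      # discourage tokens that were already generated
        out[tok] = out[tok] / repetition_penalty if out[tok] > 0 else out[tok] * repetition_penalty
    probs = np.exp(out - out.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                     # tokens from most to least probable
    cumulative = np.cumsum(probs[order])
    keep = np.searchsorted(cumulative, top_p) + 1       # smallest set whose mass reaches top_p
    out[order[keep:]] = -np.inf                         # everything outside the nucleus is excluded
    return out
```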
- Mind Meld: Multi-model collaboration system
- Benchmarks: Performance testing and DREAM suite
- Comparison: Model comparison tools
- Utilities: Profiling, caching, optimization
- Integrations: OpenAI API, LangChain compatibility
MIT - See LICENSE