AI-driven fuzz-testing toolkit that benchmarks LLM-generated fuzz drivers for C/C++ projects, detects hallucinations, analyses code quality & vulnerabilities, and orchestrates large-scale comparative experiments.
Modern fuzz testing of C/C++ libraries is bottlenecked by manual fuzz-driver authoring. Large Language Models can generate fuzz drivers automatically, but their output is noisy — hallucinated APIs, unsafe patterns, and code that doesn't even compile.
LLM Fuzz Monitor closes that gap:
| Capability | What It Does |
|---|---|
| LLM Benchmarking | Runs 14+ models (Ollama, OpenAI, Anthropic, …) against 60+ C/C++ repos and compares quality metrics. |
| Hallucination Detection | Catches fabricated function calls, phantom imports, and semantic inconsistencies in generated code. |
| Code-Quality Analysis | Measures cyclomatic complexity, nesting depth, code smells, and computes an overall quality score (0–10). |
| Vulnerability Scanning | Pattern-based + taint-tracking detection of buffer overflows, command injection, use-after-free, and more. |
| Experiment Orchestration | Clones repos, invokes fuzzers via CI Fuzz, records every LLM interaction, and produces comparative reports. |
| Thread-Safe Storage | LZ4-compressed JSON/CSV/SQLite storage with file-locking, async write queues, and automatic log rotation. |
| Rich CLI Dashboard | 40+ commands, eight log-format parsers, real-time process monitoring, and export to JSON/CSV/HTML. |
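Hallucination detection can be illustrated with a minimal sketch (this is not the project's actual detector; the symbol list and helper name are hypothetical): flag library-prefixed calls that match no exported symbol.

```python
import re

# Illustrative subset of zlib's exported API; a real checker would extract
# the symbol list from the library's headers.
ZLIB_SYMBOLS = {"deflateInit", "deflate", "deflateEnd",
                "inflateInit", "inflate", "inflateEnd"}

# Match calls whose names use zlib's deflate/inflate prefixes.
CALL_RE = re.compile(r"\b((?:deflate|inflate)\w*)\s*\(")

def hallucinated_calls(source: str, known: set) -> list:
    """Return library-prefixed calls in generated code that match no known symbol."""
    return sorted({name for name in CALL_RE.findall(source) if name not in known})

driver = "inflateInit(&strm); inflateMagic(&strm); deflateEnd(&strm);"
print(hallucinated_calls(driver, ZLIB_SYMBOLS))  # ['inflateMagic']
```

A real detector also has to handle macros, aliases, and headers pulled in transitively, which is why the project pairs this with semantic-consistency checks.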
┌───────────────┐
60+ C/C++ repos ──►│ Experiment │
│ Runner │
└──────┬────────┘
│ clone → invoke CI Fuzz w/ LLM
▼
┌───────────────┐
│ LLM Provider │ Ollama · OpenAI · Anthropic · …
│ Manager │
└──────┬────────┘
│ generated fuzz drivers
▼
┌──────────────────────────────────┐
│ Analysis Engines │
│ ┌──────────────────────────────┐ │
│ │ Hallucination Detector │ │
│ ├──────────────────────────────┤ │
│ │ Code-Quality Analyser │ │
│ ├──────────────────────────────┤ │
│ │ Vulnerability Analyser │ │
│ └──────────────────────────────┘ │
└──────────────┬───────────────────┘
│
▼
┌──────────────────────────────────┐
│ Storage Manager │
│ JSON · CSV · SQLite + LZ4 │
└──────────────┬───────────────────┘
│
▼
┌──────────────────────────────────┐
│ CLI / Dashboard │
│ Rich tables · progress bars │
│ 8 log parsers · HTML export │
└──────────────────────────────────┘
See docs/architecture.md for detailed design decisions.
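The flow above can be sketched as a simple pipeline (the class and function names here are hypothetical, not the package's real API):

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentResult:
    repo: str
    model: str
    drivers: list = field(default_factory=list)   # generated fuzz drivers
    findings: dict = field(default_factory=dict)  # analyser name -> per-driver results

def run_pipeline(repo: str, model: str, generate, analysers: dict) -> ExperimentResult:
    """Generate drivers for one repo/model pair and run each analysis engine on them."""
    result = ExperimentResult(repo=repo, model=model)
    result.drivers = generate(repo, model)  # stand-in for the LLM provider manager
    for name, analyse in analysers.items():
        result.findings[name] = [analyse(d) for d in result.drivers]
    return result

# Toy wiring: a fake generator plus one trivial "analyser".
r = run_pipeline("zlib", "deepseek-coder:33b",
                 generate=lambda repo, model: [f"// fuzz driver for {repo}"],
                 analysers={"length": len})
print(r.findings["length"])  # -> [23]
```

The real runner adds cloning, CI Fuzz invocation, and storage of every LLM interaction between these steps.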
git clone https://github.com/DARREN-2000/llm-integration.git
cd llm-integration
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
pip install -e .

# Install Ollama: https://ollama.com/download
ollama pull deepseek-coder:33b
ollama pull codellama:34b-instruct

# Single model, single repo
python -m llm_fuzz_monitor.experiments.runner \
--model deepseek-coder:33b \
--repo zlib
# Full comparative study (all 14 models × all repos)
python -m llm_fuzz_monitor.experiments.runner \
--config config/experiment_config.yaml \
--phase validation

llm-fuzz-monitor --config config/config.yaml

Run everything in containers, with no local install required:
# Build & start (Ollama + monitor)
docker compose up -d
# Tail logs
docker compose logs -f monitor

The docker-compose.yml starts:
- Ollama — GPU-accelerated local inference on port 11434
- Monitor — the experiment runner and CLI
All configuration lives in config/:
| File | Purpose |
|---|---|
| config.yaml | LLM endpoints, monitoring intervals, storage settings |
| models.yaml | 14 pre-configured Ollama models with token limits and priority |
| experiment_config.yaml | 6 experiment phases from quick validation (20 min) to full study (12 h) |
| repositories.yaml | 60+ C/C++ test repositories grouped by complexity |
| llm_environments.yaml | Per-model environment overrides |
Copy .env.example to .env to set API keys:
cp .env.example .env
# Edit .env with your keys (only needed for cloud providers)

| Category | Models |
|---|---|
| Code Specialists | deepseek-coder:33b · codellama:34b · starcoder2:15b · qwen2.5-coder:32b · wizardcoder:33b · devstral |
| General Purpose | deepseek-r1:32b · qwen3:32b · yi:34b · gemma3:27b · mixtral · magistral:24b · phi4:14b · llama3 |
| Group | Repos | Build Time |
|---|---|---|
| C — extra small | miniz, stb, minimp3, tinyexpr | ~1 min |
| C — small | zlib, libpng, libjpeg-turbo, libwebp, lz4, cJSON, libspng | 2–4 min |
| C — large | libxml2, curl, freetype, libarchive, … | 5–10 min |
| C++ — small | nlohmann/json, fmt, glm, spdlog, re2, cereal, rapidjson, … | 2–4 min |
| C++ — large | protobuf, abseil-cpp, leveldb, rocksdb, opencv, grpc, folly | 8–30 min |
# Install dev dependencies
pip install -r requirements-dev.txt
pip install -e ".[dev,perf]"
# Lint
make lint
# Tests (60 unit tests)
make test
# Tests with coverage
make test-cov

See CONTRIBUTING.md for the full contribution workflow.
.
├── llm_fuzz_monitor/ # Python package
│ ├── __init__.py # Lazy imports, package metadata
│ ├── core/
│ │ └── models.py # Data models, enums, config, exceptions
│ ├── storage/
│ │ └── manager.py # Thread-safe storage, compression, SQLite
│ ├── analysis/
│ │ └── engines.py # Hallucination / quality / vuln analysers
│ ├── cli/
│ │ └── main.py # Rich CLI, log parsers, process monitor
│ └── experiments/
│ ├── runner.py # Automated experiment orchestration
│ └── monitor.py # Daemon / entry-point script
├── config/ # YAML configuration files
├── tests/ # Pytest test suite (60 tests)
├── docs/ # Architecture documentation
├── Dockerfile # Multi-stage production image
├── docker-compose.yml # Ollama + monitor stack
├── Makefile # Dev shortcuts
├── pyproject.toml # PEP 621 packaging
├── requirements.txt # Production dependencies
├── requirements-dev.txt # Dev / test dependencies
├── .github/workflows/ci.yml # CI pipeline (lint → test → docker)
├── LICENSE # MIT
└── CONTRIBUTING.md
| Provider | Transport | Auth | Notes |
|---|---|---|---|
| Ollama | HTTP REST | None | Local GPU inference — recommended |
| OpenAI | HTTP REST | API key | GPT-4, GPT-3.5-turbo |
| Anthropic | HTTP REST | API key | Claude 3 family |
| HuggingFace | HTTP REST | API token | Inference API |
| LocalAI | HTTP REST | None | OpenAI-compatible local server |
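As a sketch of the Ollama transport (the endpoint and payload follow Ollama's public REST API; the helper name is ours, not the package's):

```python
import json
import urllib.request

def build_ollama_request(model: str, prompt: str,
                         host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming /api/generate request for a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(f"{host}/api/generate", data=payload,
                                  headers={"Content-Type": "application/json"})

# Sending it requires a running server, e.g.:
# with urllib.request.urlopen(build_ollama_request("deepseek-coder:33b", "...")) as resp:
#     print(json.loads(resp.read())["response"])
```

The OpenAI-compatible providers (OpenAI, LocalAI) differ mainly in URL, auth header, and payload shape.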
Install provider extras as needed:
pip install -e ".[llm]" # openai + anthropic
pip install -e ".[perf]" # lz4, ujson, scipy
pip install -e ".[docker]" # docker SDK

The CLI ships with eight built-in parsers:
- AFL — `execs_done`, `paths_total`, crashes
- LibFuzzer — `#N INITED`, `exec/s`, coverage
- HonggFuzz — iterations, speed, unique crashes
- CI Fuzz — structured JSON output
- LLM output — token counts, generation time
- Compiler — error / warning extraction
- Coverage — `gcov`/`lcov` percentage parsing
- Crash — ASAN / MSAN / UBSAN reports
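A minimal sketch of the AFL side (the bundled parsers are richer): AFL records `key : value` pairs in its `fuzzer_stats` file, so parsing reduces to splitting on the first colon.

```python
def parse_afl_stats(text: str) -> dict:
    """Parse AFL's fuzzer_stats format: one 'key : value' pair per line."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines with no colon
            stats[key.strip()] = value.strip()
    return stats

sample = """execs_done        : 1048576
paths_total       : 312
unique_crashes    : 2"""
print(parse_afl_stats(sample)["execs_done"])  # 1048576
```

Values stay as strings here; a real parser would coerce counters to int and timestamps to datetimes.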
MIT — use freely for research and production.