A curated collection of 195+ state-of-the-art open source AI capabilities, packaged in self-contained Docker environments. Find a recipe, test it, copy what you need.
CVlization stands on the shoulders of giants - we package and test amazing open source projects so you can use them with confidence.
git clone --depth 1 https://github.com/kungfuai/CVlization
cd CVlization
# Install CLI (optional - or just use bash scripts)
pip install .
# Optional: install with remote execution support
pip install ".[remote]" # SSH runner
pip install ".[aws]" # SageMaker runner
pip install ".[deploy]" # Serverless deployment (Cerebrium)
# Browse examples
cvl list # compact grouped overview
cvl list -k gpt # search by keyword
cvl list --format list # detailed view
# or browse examples/ on GitHub
# Run any example
cvl run nanogpt build
cvl run nanogpt train
# Copy into your project (bundled with cvlization)
cvl export perception/image_classification/torch -o your-project/
That's it! Each example is self-contained with its own Dockerfile and dependencies. (We battled CUDA versions and dependency conflicts so you don't have to.)
Our examples/ directory is organized by capability (what the model does) rather than modality (what data it processes).
examples/
analytical/ # Prediction & forecasting (time series, tabular ML)
perception/ # Understand signals (vision, speech, multimodal)
generative/ # Create content (text, images, video, audio, avatars)
physical/ # Robotics & embodied AI (vision-language-action models)
agentic/ # AI agents (RAG, tool use, optimization, workflows)
Legend: ✅ = Tested and maintained | 🧪 = Experimental
| Capability | Example Directory | Implementations | Status |
|---|---|---|---|
| Image Classification | examples/perception/image_classification | torch, cifar10-speedrun | ✅ |
| Object Detection | examples/perception/object_detection | mmdet, torchvision, rt-detr, yolov13 | ✅ |
| Segmentation | examples/perception/segmentation | instance (mmdet, torchvision), semantic (mmseg, torchvision), panoptic (detectron2, mmdet, torchvision), sam, sam_lora_finetuning, sam3, sam3_finetuning, flowrvs | ✅ |
| Pose Estimation | examples/perception/pose_estimation | dwpose, mmpose | ✅ |
| Tracking | examples/perception/tracking | global_tracking_transformer, soccer_visual_tracking | ✅ |
| Line Detection | examples/perception/line_detection | torch | ✅ |
| Document AI | examples/perception/doc_ai | OCR (chandra_ocr, deepseek_ocr, docling, doctr, dots_ocr, nanonets_ocr, olmocr_2, paddleocr_vl, surya), VLMs (donut_doc_classification, donut_doc_parse, granite_docling, granite_docling_finetune), Layout (doclayout_yolo), Parsing (dolphin_v2, churro_3b, extract0, nvidia_nemotron_parse), Leaderboard (leaderboard) | ✅ |
| Vision-Language Models | examples/perception/vision_language | florence_2 (+ finetune), gemma3_vision (+ grpo, sft), internvl3, joycaption_llava, kosmos2_grounded_ocr, lighton_ocr, llama3_vision, llava_next_video, minicpm_v_2_6, molmoe_1b, moondream2 (+ finetune), moondream3, owl_vit, paligemma2 (detection, segmentation), phi_3_5_vision_instruct, phi_4_multimodal_instruct, pixtral_12b, qwen3_vl | ✅ |
| 3D Reconstruction | examples/perception/3d_reconstruction | dust3r, mast3r, monst3r, hunyuanworld_mirror, map_anything, nerf_tf (experimental) | ✅ |
| Capability | Example Directory | Implementations | Status |
|---|---|---|---|
| LLMs (text generation) | examples/generative/llm | Pretraining (nanogpt, modded_nanogpt, nanomamba), Full pipeline (nanochat: pretrain, sft, rl), Fine-tuning (peft_mistral7b_sft, trl_sft, miles_grpo, unsloth: gpt_oss_grpo, gpt_oss_sft, llama_3b_sft, qwen_7b_sft, gemma3_4b_sft), Inference (mixtral8x7b, sglang, vllm, dllm, nanbeige4_3b_thinking, nomos_1, rnj_1_instruct), Diffusion LLM (semicat), Interpretability (gemma_scope_2_270m_it) | ✅ |
| Image Generation | examples/generative/image_generation | cfm, ddpm, diffuser_unconditional, dit, dreambooth, edm2, flux, next_scene_qwen, qwen_image_layered, rae, repa, stable_diffusion, uva_energy (experimental), vqgan | ✅ |
| Video Generation | examples/generative/video_generation | animate_diff, animate_diff_cog, animate_x, cogvideox, ctrl_world, deforum, dove, flashvsr, framepack, hunyuan_video_1_5, kandinsky_5, krea_realtime_scope (experimental), longcat_video, ltx2, mimic_motion, minisora, phantom (experimental), propainter, real_video, reward_forcing, skyreals, svd_cog, svd_comfy, turbodiffusion, vace, vace_comfy (experimental), video2x, video_enhancement, video_in_between, wan2gp, wan2gp_wan, wan_animate, wan_comfy, worldcanvas | ✅ |
| Text-to-Speech (TTS) | examples/generative/audio | cosyvoice3, vibevoice_realtime, voxcpm | ✅ |
| Avatar & Talking Head | examples/generative/video_generation/avatar | anytalker, egstalker, fastavatar, flashportrait, hunyuanvideo_avatar, imtalker, lite_avatar, live_avatar, livetalk, longcat_video_avatar, personalive, wan_s2v | ✅ |
| Capability | Example Directory | Implementations | Status |
|---|---|---|---|
| Time Series Forecasting | examples/analytical/time_series | Foundation models (chronos_zero_shot, moirai_zero_shot, uni2ts_finetune (experimental)), Supervised (patchtst_supervised), Statistical baselines (statsforecast_baselines), Hierarchical (hierarchical_reconciliation), Anomaly detection (anomaly_transformer, merlion_anomaly_dashboard) | ✅ |
| Tabular ML - AutoML | examples/analytical/tabular/automl | autogluon_structured, pycaret_structured | ✅ |
| Tabular ML - Causal Inference | examples/analytical/tabular/causal | causalml_campaign_optimization, dowhy_berkeley_bias, dowhy_policy_uplift, econml_heterogeneous_effects | ✅ |
| Tabular ML - Uncertainty Quantification | examples/analytical/tabular/uncertainty | Conformal (conformal_lightgbm, mapie_conformal), Quantile (catboost_quantile, quantile_lightgbm), Bayesian (pymc_bayesian_regression) | ✅ |
| Tabular ML - Business Use Cases | examples/analytical/tabular | Customer analytics (gbt_telco_churn), Marketing (gbt_upsell_propensity), Risk scoring (gbt_credit_default), Regression (gbt_housing_prices), Recommendation (ranking_lightgbm), Survival (pycox_retention), Anomaly detection (pyod_fraud_detection), Feature engineering (autofe_structured) | ✅ |
| Capability | Example Directory | Implementations | Status |
|---|---|---|---|
| Vision-Language-Action Models | examples/physical | openvla_single_step, openvla_simplerenv | 🧪 |
| World Models | examples/physical | stable_worldmodel_dmc | 🧪 |
| Capability | Example Directory | Implementations | Status |
|---|---|---|---|
| RAG & Knowledge | examples/agentic/rag | langgraph_helpdesk, clara | 🧪 |
| LlamaIndex Agents | examples/agentic/llamaindex | graph_rag_cognee, jsonalyze_structured_qa, react_finance_query_agent | 🧪 |
| Prompt Optimization | examples/agentic/optimization | dspy_gepa_promptops, mcts_prompt_agent | 🧪 |
| Tool Use & Coding | examples/agentic | Code (autogen_pair_programmer), Data analysis (smolagents_data_analyst), Data preparation (physio_signal_prep), Local AI (llamacpp_assistant) | 🧪 |
| Formal Reasoning | examples/agentic/formal | nanoproof (Lean theorem proving), qed-nano (Olympiad-level math proof generation) | 🧪 |
| Long Context (RLM) | examples/agentic/long_context | rlm_needle (needle-in-haystack), rlm_doc_qa (document QA), rlm_claude_code (Claude Code native) | 🧪 |
Note: These examples are regularly updated and tested to ensure compatibility with the latest dependencies. We recommend starting with the nanogpt example.
Browse on GitHub: perception/ • generative/ • analytical/ • physical/ • agentic/
Browse via CLI: cvl list for a compact overview, or cvl list -k <keyword> to search
You can run examples using the cvl CLI or directly with bash scripts.
Option 1: Using cvl CLI (recommended)
cvl run nanogpt build
cvl run nanogpt train
Option 2: Using bash scripts directly
cd examples/generative/llm/nanogpt
bash build.sh
bash train.sh
More examples:
# RL post-training with GRPO (cutting-edge reasoning model training)
cvl run miles_grpo train
# Generate video from text prompt (Tencent HunyuanVideo)
cvl run hunyuan-video-1-5 predict --prompt "A cat playing piano"
# Document AI extraction (IBM Granite-Docling)
cvl run granite-docling predict -i input_pdf=@document.pdf
# RAG agent with LangGraph
cvl run agentic-rag-langgraph-helpdesk predict --question "How do I list examples?"
For detailed instructions and available options, see the README.md in each example directory.
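The two invocation styles above could be combined in one helper. This is a minimal sketch, not part of CVlization: `run_example` is a hypothetical function, and it assumes each example directory contains a `<step>.sh` script (as in Option 2) that accepts the same trailing arguments as `cvl run <name> <step>`.

```shell
# Hypothetical helper: prefer the cvl CLI, fall back to the example's bash script.
# Usage: run_example <cvl-name> <example-dir> <step> [args...]
run_example() {
  local name="$1" dir="$2" step="$3"
  shift 3
  if command -v cvl >/dev/null 2>&1; then
    # CLI is installed: delegate to it.
    cvl run "$name" "$step" "$@"
  elif [ -f "$dir/$step.sh" ]; then
    # No CLI: run the script from inside the example directory.
    (cd "$dir" && bash "$step.sh" "$@")
  else
    echo "No cvl CLI and no $dir/$step.sh found" >&2
    return 1
  fi
}

# Example call (left commented so sourcing this file has no side effects):
# run_example nanogpt examples/generative/llm/nanogpt build
```

Whether a given script accepts extra arguments depends on the example, so check its README.md before relying on the fallback path.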
License Note: Each example may reference projects with different licenses. Check the license file in each example directory.
Run examples on cloud infrastructure instead of locally:
# AWS SageMaker (managed training)
cvl run nanogpt train --runner sagemaker --spot --output-path s3://bucket/outputs
# SkyPilot (any cloud: AWS, GCP, Azure, Lambda Labs)
cvl run nanogpt train --runner skypilot --gpu A100:1 --cloud aws
# SSH (existing GPU server)
cvl run nanogpt train --runner ssh user@gpu-host
Manage remote jobs:
cvl jobs list --runner sagemaker
cvl jobs logs <job-id>
cvl jobs kill <job-id>
See cvl/runners/README.md for setup instructions.
Deploy inference endpoints to serverless GPU platforms:
# Deploy to Cerebrium
cvl deploy ltx2 --gpu L40
# Manage deployed services
cvl services list
cvl services status ltx2
cvl services logs ltx2
cvl services delete ltx2
See cvl/deployers/README.md for setup and supported platforms.
Centralized Caching: All examples use ~/.cache/cvlization/ for models and datasets, avoiding re-downloads across examples:
- Automatic caching for HuggingFace Hub, PyTorch, and custom downloads
- Managed by build scripts - no manual setup required
- Shared across all examples to save disk space and bandwidth
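As a rough illustration of how a shared cache like this can be wired up, the sketch below points two common download caches at one root. The `CVL_CACHE` variable and the subdirectory names are assumptions made for this example, not the layout the build scripts actually use; `HF_HOME` and `TORCH_HOME` are the standard environment variables read by Hugging Face libraries and PyTorch Hub.

```shell
# Assumed layout under the shared cache root; the real scripts may differ.
CVL_CACHE="${CVL_CACHE:-$HOME/.cache/cvlization}"   # hypothetical variable name
export HF_HOME="$CVL_CACHE/huggingface"   # Hugging Face Hub models and datasets
export TORCH_HOME="$CVL_CACHE/torch"      # torch.hub / torchvision weights
mkdir -p "$HF_HOME" "$TORCH_HOME"

# Show how much disk space the shared cache currently uses.
du -sh "$CVL_CACHE"
```

Because every example reads from the same root, a model downloaded once by one example is found on disk by the next.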
Self-Contained Docker Environments: Each example is isolated with pinned dependencies:
- CUDA and dependency conflicts already resolved - saves hours or days of setup
- Source code is mounted at runtime (not baked into images) for easy iteration
- Dependency versions are pinned where possible for reproducibility
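Mounting source at runtime might look roughly like the following. This is a sketch under stated assumptions: the image name, the container paths (`/workspace`, `/root/.cache/cvlization`), and the in-container command `python train.py` are illustrative, and the actual scripts in each example may use different values. The function only prints the command so it can be inspected before running.

```shell
# Sketch: print a docker command that mounts the example source and the shared
# cache from the host instead of copying them into the image.
# Usage: docker_run_cmd <image> <source-dir> [cache-dir]
docker_run_cmd() {
  local image="$1" src_dir="$2"
  local cache_dir="${3:-$HOME/.cache/cvlization}"
  echo "docker run --rm --gpus all" \
    "-v $src_dir:/workspace" \
    "-v $cache_dir:/root/.cache/cvlization" \
    "-w /workspace $image python train.py"
}

# Example call (commented out; it would print the command for an image named my-example):
# docker_run_cmd my-example "$PWD"
```

With the source bind-mounted, edits made on the host are visible in the container on the next run without rebuilding the image.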
Production-Ready Patterns: Copy what works into your projects:
- Consistent build/train/predict script structure across all examples
- Battle-tested configurations for 195+ AI capabilities
- Examples regularly updated and tested for compatibility
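The consistent script structure makes examples easy to inspect programmatically. The sketch below reports which of the conventional scripts a directory provides; `list_scripts` is a hypothetical helper, and it assumes the scripts are named `build.sh`, `train.sh`, and `predict.sh` (an example may legitimately omit some of them, for instance an inference-only example).

```shell
# Sketch: list which of the conventional scripts an example directory provides.
list_scripts() {
  local dir="$1" found="" step
  for step in build train predict; do
    if [ -f "$dir/$step.sh" ]; then
      found="$found $step"
    fi
  done
  # Prints "<dir>: build train predict", or "<dir>: (none)" if nothing matched.
  echo "${dir}:${found:- (none)}"
}

# Example (commented out; run from the repository root):
# for d in examples/generative/llm/*/; do list_scripts "${d%/}"; done
```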
- Docker (Install Docker)
- NVIDIA GPU (most examples require 16GB+ VRAM; A10 or better recommended)
- nvidia-docker for GPU access
# Ubuntu
sudo apt-get install -y nvidia-container-toolkit
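Before building an example, the prerequisites above can be checked with a few lines of shell. This is a sketch only: `have` and `check_prereqs` are hypothetical helpers, and the check confirms that the tools are on the PATH, not that Docker can actually reach the GPU.

```shell
# Sketch: check the prerequisites listed above.
have() { command -v "$1" >/dev/null 2>&1; }

check_prereqs() {
  local ok=0
  have docker || { echo "missing: docker" >&2; ok=1; }
  if have nvidia-smi; then
    # Print GPU name and total memory so VRAM can be compared to the 16GB guideline.
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
  else
    echo "missing: nvidia-smi (is the NVIDIA driver installed?)" >&2
    ok=1
  fi
  return "$ok"
}

# Example (commented out):
# check_prereqs && cvl run nanogpt build
```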
No installation needed - run examples directly in Google Colab: Colab notebook: CIFAR-10 classification
CVlization includes Claude Code skills for AI-assisted development and automated verification:
- verify-training-pipeline - Validates that training examples are properly structured, build successfully, train without errors, and log appropriate metrics
- verify-inference-example - Validates that inference examples build correctly and run inference successfully
These skills enable Claude to automatically verify examples end-to-end, helping maintain code quality across the repository.
Contributing Guidelines:
- See CONTRIBUTING.md for standardization patterns and best practices
- All examples should follow the build/train/predict script pattern
- Use verification metadata in example.yaml to track testing status
- examples/: Dockerized AI examples (perception, generative, analytical, physical, agentic)
- cvl/: CLI tool source code
- cvlization/: Optional reusable library components
- doc/: Project documentation, including doc/runners/ for cloud setup guides
- tests/: Unit and integration tests
The cvlization/ directory provides optional reusable components:
Available:
- Training pipeline abstractions (image classification, object detection, LLMs, diffusion)
- Dataset builders with caching (PyTorch, TensorFlow, HuggingFace)
- Model factories (Torchvision, MMDetection, MMSegmentation)
- Utilities (metrics, logging, download helpers)
Installation: pip install -e .
Note: Examples are self-contained and don't require the library. For production use, copying example code directly is often simpler than depending on the library package.
Detailed documentation can be found in the doc/ directory:
- Computer vision model training workflow and quality checks
- Multi-task multi-input models: a common pattern
- Reusable model components
CVlization Library & CLI: MIT License
- The cvlization package and cvl CLI tool are released under the MIT License
- Safe for commercial use
Examples Directory: Mixed Licenses
- Examples may reference projects with various licenses (copyleft, non-commercial, etc.)
- Examples are NOT included when you pip install cvlization
- Always check the license file in each example directory before using in production
Note: Each example packages different open-source projects with their own licenses. Review licenses carefully for your use case.
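To make the license check less manual, a short script can flag example directories that have no license file. This is a sketch under two assumptions that may not hold for every example: an example directory is any directory containing a file named Dockerfile, and license files are named LICENSE* or LICENCE* (case-insensitive). `missing_licenses` is a hypothetical helper, not part of the cvl CLI.

```shell
# Sketch: print example directories that contain a Dockerfile but no license file.
# Usage: missing_licenses <root-dir>
missing_licenses() {
  local root="$1" dockerfile dir
  find "$root" -type f -name Dockerfile | sort | while IFS= read -r dockerfile; do
    dir=$(dirname "$dockerfile")
    # Match LICENSE, LICENSE.txt, LICENCE.md, license, and similar names.
    if ! ls "$dir" | grep -qiE '^licen[cs]e'; then
      echo "$dir"
    fi
  done
}

# Example (commented out; run from the repository root):
# missing_licenses examples/
```

A directory that appears in the output still needs a manual review: the upstream project's license may be documented in the example's README.md instead.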









