Aryan Garg1, Sizhuo Ma2, Mohit Gupta1
1 University of Wisconsin-Madison
2 Snap, Inc.
- Updates
- Installation
- Quick Start
- Pretrained Models and Dataset
- Inference
- Training
- Citation
- Acknowledgements
- Contact
- FAQs
9-Apr-2026: gQIR is accepted to CVPR as a highlight! (Top 10% of accepted papers)
8-Apr-2026: SD3.5 VAE code and weights are also open-sourced now!
Recommended Hardware: NVIDIA RTX 4090 (CUDA version: 12.5)
```shell
conda env create -f environment.yml
conda activate gqir
sh download_raft_weights.sh
```
Make sure `./pretrained_ckpts/models/raft-things.pth` exists and is a properly downloaded RAFT weight. (It is used in all burst pipelines; see L257 of `infer_burst.py`.)
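If in doubt, a quick sanity check can confirm the RAFT weight downloaded fully. This is a hypothetical helper, not part of the repo; the size threshold assumes the standard `raft-things.pth` checkpoint is roughly 20 MB, so anything tiny is almost certainly a truncated download:

```python
from pathlib import Path


def check_ckpt(path: str, min_bytes: int = 1_000_000) -> bool:
    """Return True if the checkpoint exists and is plausibly complete.

    min_bytes guards against truncated downloads; the usual raft-things.pth
    is on the order of 20 MB, far above this floor.
    """
    p = Path(path)
    return p.is_file() and p.stat().st_size >= min_bytes


if __name__ == "__main__":
    ok = check_ckpt("./pretrained_ckpts/models/raft-things.pth")
    print("RAFT weights OK" if ok else "RAFT weights missing or truncated")
```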
Launching the local gradio demo:
Once you are in the gqir env and the repo's root dir, run:
```shell
python gradio_app.py --single-config configs/inference/eval_sd2GAN.yaml --burst-config configs/inference/eval_burst_mosaic.yaml --device cuda
```
Other optional args you can specify:
- `--port`
- `--local`
- `--share` (creates a publicly shareable URL for the demo)
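For reference, here is a minimal sketch of how a CLI with these flags is typically wired up with `argparse`. This is illustrative only: the actual `gradio_app.py` may define its flags and defaults differently (the port default of 7860 is just gradio's usual default, assumed here):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the demo's documented flags; defaults are illustrative guesses.
    p = argparse.ArgumentParser(description="gQIR gradio demo (sketch)")
    p.add_argument("--single-config", required=True, help="single-image pipeline YAML")
    p.add_argument("--burst-config", required=True, help="burst pipeline YAML")
    p.add_argument("--device", default="cuda", help='e.g. "cuda" or "cpu"')
    p.add_argument("--port", type=int, default=7860, help="gradio's usual default port")
    p.add_argument("--local", action="store_true", help="bind to localhost only")
    p.add_argument("--share", action="store_true", help="create a public share URL")
    return p


args = build_parser().parse_args(
    ["--single-config", "configs/inference/eval_sd2GAN.yaml",
     "--burst-config", "configs/inference/eval_burst_mosaic.yaml",
     "--share"]
)
print(args.device, args.port, args.share)  # → cuda 7860 True
```

Note that argparse normalizes `--single-config` to the attribute `args.single_config`.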
See the full model card at Hugging Face 🤗: aRy4n/gQIR
| Color Model | Stage | Bit Depth | 🤗 Download Link |
|---|---|---|---|
| qVAE | Stage 1 | 1-bit | 1965000.pt |
| Adversarial Diffusion LoRA-UNet | Stage 2 | 1-bit | state_dict.pth |
| qVAE | Stage 1 | 3-bit | 0105000.pt |
| Adversarial Diffusion LoRA-UNet | Stage 2 | 3-bit | state_dict.pth |
| FusionViT | Stage 3 | 3-bit | fusion_vit_0050000.pt |
| Monochrome Model | Stage | Bit Depth | 🤗 Download Link |
|---|---|---|---|
| qVAE | Stage 1 | 3-bit | 0150000.pt |
| Adversarial Diffusion LoRA-UNet | Stage 2 | 3-bit | state_dict.pth |
| FusionViT | Stage 3 | 3-bit | fusion_vit_0020000.pt |
The XD dataset (390 videos) and its description can be found at Hugging Face 🤗: aRy4n/eXtreme-Deformable.
The real color SPAD captures can be found at Hugging Face 🤗: aRy4n/real-color-SPAD-indoor6.
Careful while downloading: the dataset is ~83 GB.
Special thanks to Avery Gump for helping capture this dataset.
Image datasets (for Stages 1 & 2 only):
- DIV2K
- FFHQ (for enhancing facial reconstruction)
- Flickr2K
- LAION-170M HQ (only a smaller subset is used)
- Landscapes HQ
Video datasets (for all 3 stages):
- Visionsim (Jungerman et al.). Download the dataset from the Single Photon Challenge download page
- I2-2000fps, from the papers QUIVER & QuDI
- XVFI (4096×2160 cropped to 768×768; each video capped at 5,000 frames / 5 seconds)
- UDM10 - 10 videos @24fps (Testing only)
- SPMC Videos - 30 videos @24fps (Testing only)
- REDS - 240 videos @120fps
- YouHQ
- XD (Testing only)
See `test_txt_files` for examples of how to create dataset txt files.
For Stage 1, simply add the `--only_vae` flag at the end of any of the example commands shown below.
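The exact txt format is defined by the repo's dataloaders; assuming the common one-path-per-line convention, a dataset txt file can be generated with a small script like the one below (our own sketch; `write_dataset_txt` is a hypothetical helper, and `test_txt_files` remains the authoritative reference for the format):

```python
from pathlib import Path


def write_dataset_txt(data_dir: str, out_txt: str, exts=(".png", ".jpg")) -> int:
    """Write one sorted absolute image path per line; return the entry count."""
    paths = sorted(
        str(p.resolve())
        for p in Path(data_dir).rglob("*")
        if p.suffix.lower() in exts
    )
    Path(out_txt).write_text("\n".join(paths) + ("\n" if paths else ""))
    return len(paths)
```

Example usage: `write_dataset_txt("DIV2K_valid_HR", "div2k_valid.txt")`.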
Single-image eval:
```shell
python3 infer_sd2GAN_stage2.py --config configs/inference/eval_3bit_color.yaml --eval_single_image --single_img_path "<path-to-dataset>/DIV2K_valid_HR/0829.png"
```
Eval against a ground-truth directory:
```shell
python3 infer_sd2GAN_stage2.py --config configs/inference/eval_3bit_mono.yaml --eval_gt_dir --gt_dir <path-to-gt-dir>
```
Eval on real captures:
```shell
python3 infer_sd2GAN_stage2.py --config configs/inference/eval_3bit_color.yaml --ds_txt ds_txt_real_captures.txt --real_captures
```
For realistic (77 GT frames → 77 binary frames) burst eval:
```shell
python3 infer_burst_realistic.py --config configs/inference/eval_burst_mosaic.yaml
```
For QUIVER-style (11 GT frames → 77 binary frames) burst eval:
```shell
python3 infer_burst.py --config configs/inference/eval_burst_mosaic.yaml
```
For training, first activate the training environment:
```shell
conda activate hypir
```
1-bit qVAE:
```shell
CUDA_VISIBLE_DEVICES=8,9,10,11,12,13,14,15 accelerate launch --main_process_port 29502 train_s1_mosaic.py --config configs/train/train_s1_mosaic_1bit.yaml
```
3-bit qVAE:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch --main_process_port 29503 train_s1_mosaic.py --config configs/train/train_s1_mosaic_3bit.yaml
```
Stage 2 adversarial diffusion:
```shell
python3 train_sd2GAN.py --config configs/train/train_sd2gan.yaml
```
For comparison with QUIVER & QBP:
```shell
CUDA_VISIBLE_DEVICES=1 python3 train_burst.py --config configs/train/train_burst.yaml
```
For the color-burst model:
```shell
CUDA_VISIBLE_DEVICES=1 python3 train_burst.py --config configs/train/train_burst_mosaic.yaml
```
Precomputing latents:
```shell
conda activate hypir && cd apgi/gQVR
python3 infer_sd2GAN_stage2.py --config configs/inference/eval_sd2GAN.yaml --device "cuda:0" --ds_txt dataset_txt_files/video_dataset_txt_files/combined_part00.txt
```
Please cite our work if you find it useful. Thanks! :)
```bibtex
@InProceedings{garg_2026_gqir,
    author    = {Garg, Aryan and Ma, Sizhuo and Gupta, Mohit},
    title     = {gQIR: Generative Quanta Image Reconstruction},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
}
```
This project is based on XPixelGroup's projects DiffBIR and HYPIR. Thanks for their amazing work!
Additionally, the project was supported by Ubicept for compute (cloud credits).
If you have any questions, please feel free to contact me at agarg54@wisc.edu or raise an issue here.
- What is the minimum average PPP at which gQIR works?
An average of 1 PPP was tested; it might work below that limit as well, though contrast correction is needed. Note: the training PPP was fixed at 3.25 (alpha = 1.0).
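For intuition on what "1 PPP average" means: in the standard single-photon (Poisson) model, a binary SPAD pixel fires when at least one photon arrives, i.e. with probability 1 − exp(−PPP · intensity). The sketch below is our own illustration of that model, not the repo's simulator:

```python
import numpy as np


def simulate_binary_frames(intensity, ppp, n_frames, rng=None):
    """Sample binary SPAD frames: a pixel fires iff >= 1 photon arrives.

    intensity: linear-intensity image, here normalized to [0, 1]
    ppp:       average photons per pixel per frame
    """
    rng = np.random.default_rng(rng)
    p_fire = 1.0 - np.exp(-ppp * np.asarray(intensity))  # P(>= 1 photon)
    return (rng.random((n_frames, *np.shape(intensity))) < p_fire).astype(np.uint8)


# A flat gray scene at 1 PPP: each pixel fires with prob 1 - e^{-1} ≈ 0.632.
frames = simulate_binary_frames(np.ones((32, 32)), ppp=1.0, n_frames=500, rng=0)
print(round(float(frames.mean()), 2))  # → 0.63
```

The empirical firing rate converging to 1 − e^{−1} illustrates why low-PPP frames are so sparse and noisy, and why a generative prior helps.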
- Can I increase the input size/temporal window for Stage 3?
Due to VRAM limitations, we could only train with 11 3-bit block sums. With more VRAM, scaling up is possible, and a smarter block-summing strategy could potentially provide further gains.
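The "11 3-bit block sums" arise from grouping 77 binary frames into 11 windows of 7 frames each: a sum of seven 1-bit values lies in [0, 7], which fits exactly in 3 bits. Below is a sketch of that block-summing step as we understand it (illustrative only, not the repo's code):

```python
import numpy as np


def block_sum(binary_frames, window=7):
    """Sum consecutive groups of `window` binary frames along the time axis.

    77 binary frames with window=7 -> 11 block sums, each in [0, 7] (3 bits).
    """
    t = binary_frames.shape[0]
    assert t % window == 0, "frame count must be divisible by the window"
    return binary_frames.reshape(
        t // window, window, *binary_frames.shape[1:]
    ).sum(axis=1)


# 77 random binary frames of an 8x8 patch -> 11 block sums.
frames = (np.random.default_rng(0).random((77, 8, 8)) < 0.5).astype(np.uint8)
sums = block_sum(frames)
print(sums.shape)  # → (11, 8, 8)
```

This also makes the QUIVER-style "11 GT frames → 77 binary frames" eval above concrete: one GT frame per block of 7 binary frames.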
- Can you quantify or control hallucination?
gQIR has no such mechanism yet.
- Can I use prompts to guide the reconstruction since the base prior is a T2I model (SD2.1)?

