
[CVPR 2026] gQIR: Generative Quanta Image Reconstruction (Highlight; Top 10%)


Aryan Garg1, Sizhuo Ma2, Mohit Gupta1

1 University of Wisconsin-Madison
2 Snap Inc.



Updates

Apr 9, 2026: gQIR is accepted to CVPR as a highlight (top 10% of accepted papers)!
Apr 8, 2026: SD3.5 VAE code and weights are now also open-sourced!

Installation

Recommended Hardware: NVIDIA RTX 4090 (CUDA version: 12.5)

conda env create -f environment.yml

conda activate gqir

RAFT setup:

sh download_raft_weights.sh

Make sure ./pretrained_ckpts/models/raft-things.pth is a valid, properly downloaded RAFT weight; it is used in all burst pipelines (see L257 of infer_burst.py).
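As a quick sanity check before launching a burst pipeline, you can verify the checkpoint from Python. This is an illustrative snippet, not part of the repo; only the path comes from the layout above.

```python
# Illustrative check that the RAFT checkpoint is in place.
# The default path comes from the repo layout; the helper itself is hypothetical.
from pathlib import Path

def check_raft_weights(path="./pretrained_ckpts/models/raft-things.pth"):
    """Return True if the RAFT checkpoint exists (used by all burst pipelines)."""
    ckpt = Path(path)
    if ckpt.is_file():
        print(f"RAFT weights found ({ckpt.stat().st_size / 1e6:.1f} MB)")
        return True
    print(f"Missing RAFT weights at {ckpt}; re-run download_raft_weights.sh")
    return False

check_raft_weights()
```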

Quick Start

Launching the local gradio demo:

Once you are in the gqir env and the repo's root dir, run:

python gradio_app.py --single-config configs/inference/eval_sd2GAN.yaml --burst-config configs/inference/eval_burst_mosaic.yaml --device cuda

Other optional args you can specify:

  1. --port
  2. --local
  3. --share (creates a publicly shareable URL for the demo)

Pretrained Models and Dataset

Model Zoo:

See the full model card on Hugging Face 🤗: aRy4n/gQIR

Color models:

| Model Name | Stage | Bit Depth | 🤗 Download Link |
|---|---|---|---|
| qVAE | Stage 1 | 1-bit | 1965000.pt |
| Adversarial Diffusion LoRA-UNet | Stage 2 | 1-bit | state_dict.pth |
| qVAE | Stage 1 | 3-bit | 0105000.pt |
| Adversarial Diffusion LoRA-UNet | Stage 2 | 3-bit | state_dict.pth |
| FusionViT | Stage 3 | 3-bit | fusion_vit_0050000.pt |

Monochrome models:

| Model Name | Stage | Bit Depth | 🤗 Download Link |
|---|---|---|---|
| qVAE | Stage 1 | 3-bit | 0150000.pt |
| Adversarial Diffusion LoRA-UNet | Stage 2 | 3-bit | state_dict.pth |
| FusionViT | Stage 3 | 3-bit | fusion_vit_0020000.pt |

XD-Dataset:

The XD dataset (390 videos) and its description can be found on Hugging Face 🤗: aRy4n/eXtreme-Deformable.

Real Color-SPAD Dataset:

The real color SPAD captures can be found on Hugging Face 🤗: aRy4n/real-color-SPAD-indoor6.

Careful while downloading: the dataset is ~83 GB.

Special thanks to Avery Gump for helping capture this dataset.

Other Datasets used for training/testing:

Image datasets (for stages 1 & 2 only):

  1. DIV2K
  2. FFHQ (for enhancing facial reconstruction)
  3. Flickr2K
  4. LAION-170M HQ (only a smaller subset was used)
  5. Landscapes HQ

Video-datasets (for all 3 stages):

  1. Visionsim (Jungerman et al.). Download the dataset from the Single Photon Challenge Download-Page
  2. I2-2000fps. From the papers: QUIVER & QuDI.
  3. XVFI (4096×2160, cropped to 768×768; each video capped at 5,000 frames / 5 seconds)
  4. UDM10 - 10 videos @24fps (Testing only)
  5. SPMC Videos - 30 videos @24fps (Testing only)
  6. REDS - 240 videos @120fps
  7. YouHQ
  8. XD (Testing only)

Inference

Stage 1 & 2 - Single Image Reconstruction:

⚠️ Update input dataset/image paths in the respective config files first! See test_txt_files for examples on how to create dataset txt files.
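As a sketch of that format (assuming, as in test_txt_files, one image path per line; the helper and example directory names here are hypothetical, not repo code):

```python
# Hypothetical helper: list image files one-per-line into a dataset txt file,
# mirroring the examples in test_txt_files.
from pathlib import Path

def write_dataset_txt(image_dir, out_txt, exts=(".png", ".jpg")):
    """Collect image paths under image_dir into out_txt, one per line."""
    paths = sorted(p for p in Path(image_dir).rglob("*") if p.suffix.lower() in exts)
    Path(out_txt).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)

# e.g. write_dataset_txt("DIV2K_valid_HR", "my_eval_set.txt")
```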

For Stage 1, simply add the --only_vae flag at the end of any of the example commands shown below.

Example for Single Image with GT:

python3 infer_sd2GAN_stage2.py --config configs/inference/eval_3bit_color.yaml --eval_single_image --single_img_path "<path-to-dataset>/DIV2K_valid_HR/0829.png"

Example for running directly on a GT Dir:

python3 infer_sd2GAN_stage2.py --config configs/inference/eval_3bit_mono.yaml --eval_gt_dir --gt_dir <path-to-gt-dir> 

For real world captures:

python3 infer_sd2GAN_stage2.py --config configs/inference/eval_3bit_color.yaml --ds_txt ds_txt_real_captures.txt --real_captures

Stage 3 - Burst Reconstruction:

For realistic (77 GT frames --> 77 binary frames) burst eval:

python3 infer_burst_realistic.py --config configs/inference/eval_burst_mosaic.yaml

For QUIVER-style (11 GT frames --> 77 binary frames) burst eval:

python3 infer_burst.py --config configs/inference/eval_burst_mosaic.yaml

Training

Stage 1 - Training SPAD-CMOS Aligned VAE:

conda activate hypir   

1-bit qVAE:

CUDA_VISIBLE_DEVICES=8,9,10,11,12,13,14,15 accelerate launch --main_process_port 29502 train_s1_mosaic.py --config configs/train/train_s1_mosaic_1bit.yaml

3-bit qVAE:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch --main_process_port 29503 train_s1_mosaic.py --config configs/train/train_s1_mosaic_3bit.yaml

Stage 2 - Latent Space Enhancement - Adversarial Training with Diffusion Initialization:

python3 train_sd2GAN.py --config configs/train/train_sd2gan.yaml   

Stage 3 - Burst Processing - Fidelity Upgrade

For comparison with QUIVER & QBP:

CUDA_VISIBLE_DEVICES=1 python3 train_burst.py --config configs/train/train_burst.yaml  

For color-burst model:

CUDA_VISIBLE_DEVICES=1 python3 train_burst.py --config configs/train/train_burst_mosaic.yaml 

Precomputing latents:

conda activate hypir && cd apgi/gQVR   

python3 infer_sd2GAN_stage2.py --config configs/inference/eval_sd2GAN.yaml --device "cuda:0" --ds_txt dataset_txt_files/video_dataset_txt_files/combined_part00.txt

Citation

Please cite our work if you find it useful. Thanks! :)

@InProceedings{garg_2026_gqir,
    author    = {Garg, Aryan and Ma, Sizhuo and Gupta, Mohit},
    title     = {gQIR: Generative Quanta Image Reconstruction},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
}

Acknowledgements

This project is based on XPixelGroup's projects: DiffBIR and HYPIR. Thanks for their amazing work.

Additionally, the project was supported by Ubicept for compute (cloud credits).

Contact

If you have any questions, please feel free to contact me at agarg54@wisc.edu or raise an issue here.

FAQs

  1. What is the minimum average PPP at which gQIR works?

An average of 1 PPP was tested; gQIR might work below that limit as well, though contrast correction is needed. Note: the training PPP was fixed at 3.25 PPP (alpha = 1.0).
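To see why low PPP is so challenging, here is a minimal single-photon sketch (an assumed sensor model for illustration, not code from this repo): with Poisson photon arrivals, a 1-bit pixel fires with probability 1 - exp(-PPP), so at 1 PPP only about 63% of frames register any photon at all.

```python
# Minimal 1-bit SPAD model (illustrative): Poisson arrivals with mean `ppp`
# photons per pixel; the pixel fires iff at least one photon arrives.
import math, random

def detection_rate(ppp, n_frames=20000, seed=0):
    rng = random.Random(seed)
    p_fire = 1.0 - math.exp(-ppp)  # P(at least one photon) under Poisson arrivals
    fired = sum(rng.random() < p_fire for _ in range(n_frames))
    return fired / n_frames

# At 1 PPP the empirical rate is close to 1 - e^{-1} ≈ 0.632
```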

  2. Can I increase the input size/temporal window for Stage 3?

Due to VRAM limitations, we could only train with 11 3-bit block sums. With more VRAM, scaling is possible, and a smarter block-summing strategy could potentially provide further gains.
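As an illustration of that input format (assuming numpy; the shapes mirror the 77-binary-frame burst setting described under Inference): summing non-overlapping groups of 7 binary frames yields 11 block sums with values in [0, 7], i.e. 3 bits per pixel.

```python
# Illustrative 3-bit block summing: 77 1-bit frames -> 11 blocks of 7 frames.
# Frame count and spatial size are examples, not the repo's exact tensors.
import numpy as np

rng = np.random.default_rng(0)
binary = rng.integers(0, 2, size=(77, 64, 64), dtype=np.uint8)  # 1-bit frames
blocks = binary.reshape(11, 7, 64, 64).sum(axis=1)              # 3-bit sums in [0, 7]
```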

  3. Can you quantify or control hallucination?

gQIR has no such mechanism as of yet.

  4. Can I use prompts to guide the reconstruction since the base prior is a T2I model (SD2.1)?

Absolutely yes! Cool-Prompt-Reconstruction
