
[CVPR 2026] gQIR: Generative Quanta Image Reconstruction (Highlight; Top 10%)


Aryan Garg1, Sizhuo Ma2, Mohit Gupta1

1 University of Wisconsin-Madison
2 Snap Inc.



Updates

Apr 9, 2026: gQIR is accepted to CVPR as a highlight (top 10% of accepted papers)!
Apr 8, 2026: SD3.5 VAE code and weights are now also open-sourced!

Installation

Recommended Hardware: NVIDIA RTX 4090 (CUDA version: 12.5)

conda env create -f environment.yml

conda activate gqir

RAFT setup:

sh download_raft_weights.sh

Make sure ./pretrained_ckpts/models/raft-things.pth is a valid, properly downloaded RAFT weight; it is used in all burst pipelines (see L257 of infer_burst.py).
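As a quick sanity check before launching a burst pipeline, you can verify the checkpoint from Python. This is an illustrative snippet, not part of the repo; only the path comes from the layout above.

```python
# Illustrative check that the RAFT checkpoint is in place.
# The default path comes from the repo layout; the helper itself is hypothetical.
from pathlib import Path

def check_raft_weights(path="./pretrained_ckpts/models/raft-things.pth"):
    """Return True if the RAFT checkpoint exists (used by all burst pipelines)."""
    ckpt = Path(path)
    if ckpt.is_file():
        print(f"RAFT weights found ({ckpt.stat().st_size / 1e6:.1f} MB)")
        return True
    print(f"Missing RAFT weights at {ckpt}; re-run download_raft_weights.sh")
    return False

check_raft_weights()
```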

Quick Start

Launching the local gradio demo:

Once you are in the gqir env and the repo's root dir, run:

python gradio_app.py --single-config configs/inference/eval_sd2GAN.yaml --burst-config configs/inference/eval_burst_mosaic.yaml --device cuda

Other optional args you can specify:

  1. --port
  2. --local
  3. --share (creates a publicly shareable URL for the demo)

Pretrained Models and Dataset

Model Zoo:

See the full model card on Hugging Face 🤗: aRy4n/gQIR

Color models:

| Model Name | Stage | Bit Depth | 🤗 Download Link |
|---|---|---|---|
| qVAE | Stage 1 | 1-bit | 1965000.pt |
| Adversarial Diffusion LoRA-UNet | Stage 2 | 1-bit | state_dict.pth |
| qVAE | Stage 1 | 3-bit | 0105000.pt |
| Adversarial Diffusion LoRA-UNet | Stage 2 | 3-bit | state_dict.pth |
| FusionViT | Stage 3 | 3-bit | fusion_vit_0050000.pt |

Monochrome models:

| Model Name | Stage | Bit Depth | 🤗 Download Link |
|---|---|---|---|
| qVAE | Stage 1 | 3-bit | 0150000.pt |
| Adversarial Diffusion LoRA-UNet | Stage 2 | 3-bit | state_dict.pth |
| FusionViT | Stage 3 | 3-bit | fusion_vit_0020000.pt |

XD-Dataset:

The XD dataset (390 videos) and its description can be found on Hugging Face 🤗: aRy4n/eXtreme-Deformable.

Real Color-SPAD Dataset:

The real color SPAD captures can be found on Hugging Face 🤗: aRy4n/real-color-SPAD-indoor6.

Careful while downloading: the dataset is ~83 GB.

Special thanks to Avery Gump for helping capture this dataset.

Other Datasets used for training/testing:

Image datasets (for stages 1 & 2 only):

  1. DIV2K
  2. FFHQ (for enhancing facial reconstruction)
  3. Flickr2K
  4. LAION-170M HQ (only a smaller subset was used)
  5. Landscapes HQ

Video-datasets (for all 3 stages):

  1. Visionsim (Jungerman et al.). Download the dataset from the Single Photon Challenge Download-Page
  2. I2-2000fps. From the papers: QUIVER & QuDI.
  3. XVFI (4096×2160, cropped to 768×768; each video capped at 5,000 frames / 5 seconds)
  4. UDM10 - 10 videos @24fps (Testing only)
  5. SPMC Videos - 30 videos @24fps (Testing only)
  6. REDS - 240 videos @120fps
  7. YouHQ
  8. XD (Testing only)

Inference

Stage 1 & 2 - Single Image Reconstruction:

⚠️ Update input dataset/image paths in the respective config files first! See test_txt_files for examples on how to create dataset txt files.
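As a sketch of that format (assuming, as in test_txt_files, one image path per line; the helper and example directory names here are hypothetical, not repo code):

```python
# Hypothetical helper: list image files one-per-line into a dataset txt file,
# mirroring the examples in test_txt_files.
from pathlib import Path

def write_dataset_txt(image_dir, out_txt, exts=(".png", ".jpg")):
    """Collect image paths under image_dir into out_txt, one per line."""
    paths = sorted(p for p in Path(image_dir).rglob("*") if p.suffix.lower() in exts)
    Path(out_txt).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)

# e.g. write_dataset_txt("DIV2K_valid_HR", "my_eval_set.txt")
```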

For Stage 1, simply add the --only_vae flag at the end of any of the example commands shown below.

Example for Single Image with GT:

python3 infer_sd2GAN_stage2.py --config configs/inference/eval_3bit_color.yaml --eval_single_image --single_img_path "<path-to-dataset>/DIV2K_valid_HR/0829.png"

Example for running directly on a GT Dir:

python3 infer_sd2GAN_stage2.py --config configs/inference/eval_3bit_mono.yaml --eval_gt_dir --gt_dir <path-to-gt-dir> 

For real world captures:

python3 infer_sd2GAN_stage2.py --config configs/inference/eval_3bit_color.yaml --ds_txt ds_txt_real_captures.txt --real_captures

Stage 3 - Burst Reconstruction:

For realistic (77 GT frames --> 77 binary frames) burst eval:

python3 infer_burst_realistic.py --config configs/inference/eval_burst_mosaic.yaml

For QUIVER-style (11 GT frames --> 77 binary frames) burst eval:

python3 infer_burst.py --config configs/inference/eval_burst_mosaic.yaml

Training

Stage 1 - Training SPAD-CMOS Aligned VAE:

conda activate hypir   

1-bit qVAE:

CUDA_VISIBLE_DEVICES=8,9,10,11,12,13,14,15 accelerate launch --main_process_port 29502 train_s1_mosaic.py --config configs/train/train_s1_mosaic_1bit.yaml

3-bit qVAE:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch --main_process_port 29503 train_s1_mosaic.py --config configs/train/train_s1_mosaic_3bit.yaml

Stage 2 - Latent Space Enhancement - Adversarial Training with Diffusion Initialization:

python3 train_sd2GAN.py --config configs/train/train_sd2gan.yaml   

Stage 3 - Burst Processing - Fidelity Upgrade

For comparison with QUIVER & QBP:

CUDA_VISIBLE_DEVICES=1 python3 train_burst.py --config configs/train/train_burst.yaml  

For color-burst model:

CUDA_VISIBLE_DEVICES=1 python3 train_burst.py --config configs/train/train_burst_mosaic.yaml 

Precomputing latents:

conda activate hypir && cd apgi/gQVR   

python3 infer_sd2GAN_stage2.py --config configs/inference/eval_sd2GAN.yaml --device "cuda:0" --ds_txt dataset_txt_files/video_dataset_txt_files/combined_part00.txt

Citation

Please cite our work if you find it useful. Thanks! :)

@InProceedings{garg_2026_gqir,
    author    = {Garg, Aryan and Ma, Sizhuo and Gupta, Mohit},
    title     = {gQIR: Generative Quanta Image Reconstruction},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
}

Acknowledgements

This project is based on XPixelGroup's projects: DiffBIR and HYPIR. Thanks for their amazing work.

Additionally, the project was supported by Ubicept for compute (cloud credits).

Contact

If you have any questions, please feel free to contact me at agarg54@wisc.edu or raise an issue here.

FAQs

  1. What is the minimum average PPP at which gQIR works?

An average of 1 PPP was tested; gQIR might work below that limit as well, though contrast correction is needed. Note: the training PPP was fixed at 3.25 PPP (alpha = 1.0).
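To see why low PPP is so challenging, here is a minimal single-photon sketch (an assumed sensor model for illustration, not code from this repo): with Poisson photon arrivals, a 1-bit pixel fires with probability 1 - exp(-PPP), so at 1 PPP only about 63% of frames register any photon at all.

```python
# Minimal 1-bit SPAD model (illustrative): Poisson arrivals with mean `ppp`
# photons per pixel; the pixel fires iff at least one photon arrives.
import math, random

def detection_rate(ppp, n_frames=20000, seed=0):
    rng = random.Random(seed)
    p_fire = 1.0 - math.exp(-ppp)  # P(at least one photon) under Poisson arrivals
    fired = sum(rng.random() < p_fire for _ in range(n_frames))
    return fired / n_frames

# At 1 PPP the empirical rate is close to 1 - e^{-1} ≈ 0.632
```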

  2. Can I increase the input size/temporal window for Stage 3?

Due to VRAM limitations, we could only train with 11 3-bit block sums. With more VRAM, scaling is possible, and a smarter block-summing strategy could potentially provide further gains.
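As an illustration of that input format (assuming numpy; the shapes mirror the 77-binary-frame burst setting described under Inference): summing non-overlapping groups of 7 binary frames yields 11 block sums with values in [0, 7], i.e. 3 bits per pixel.

```python
# Illustrative 3-bit block summing: 77 1-bit frames -> 11 blocks of 7 frames.
# Frame count and spatial size are examples, not the repo's exact tensors.
import numpy as np

rng = np.random.default_rng(0)
binary = rng.integers(0, 2, size=(77, 64, 64), dtype=np.uint8)  # 1-bit frames
blocks = binary.reshape(11, 7, 64, 64).sum(axis=1)              # 3-bit sums in [0, 7]
```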

  3. Can you quantify or control hallucination?

gQIR has no such mechanism as of yet.

  4. Can I use prompts to guide the reconstruction since the base prior is a T2I model (SD2.1)?

Absolutely yes! Cool-Prompt-Reconstruction
