Paper | Sup. material | Video
This repo was forked from the code for the scene completion diffusion method proposed in the CVPR'24 paper: "Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion".
Their notes:
"Our method leverages diffusion process as a point-wise local problem, disentangling the scene data distribution during in the diffusion process, learning only the point local neighborhood distribution. From our formulation we can achieve a complete scene representation from a single LiDAR scan directly operating over the 3D points."
Installing Python package prerequisites:
sudo apt install build-essential python3-dev libopenblas-dev
pip3 install -r requirements.txt
Installing MinkowskiEngine:
pip3 install -U MinkowskiEngine==0.5.4 --install-option="--blas=openblas" -v --no-deps
To set up the code, run the following command from the repository's main directory:
pip3 install -U -e .
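As a quick sanity check after installation (an illustrative snippet, not part of the original setup), you can verify that the core dependencies import and that the GPU is visible:
import torch
import MinkowskiEngine as ME

print("PyTorch:", torch.__version__)
print("MinkowskiEngine:", ME.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))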
To run this code on NVIDIA Blackwell GPUs with CUDA 12.8, follow these steps:
- Install PyTorch 2.7+ built for CUDA 12.8, for example:
pip install torch==2.7.0+cu128 torchvision==0.22.0+cu128 --index-url https://download.pytorch.org/whl/cu128
- Build and install MinkowskiEngine from source for CUDA 12.8:
git clone https://github.com/NVIDIA/MinkowskiEngine.git
cd MinkowskiEngine
python3 setup.py install
- Build and install PyTorch3D from source for CUDA 12.8:
pip install 'git+https://github.com/facebookresearch/pytorch3d.git@v0.7.5' --no-deps
- Install remaining requirements and the package:
pip3 install -r requirements.txt
pip3 install -U -e .
- Verify GPU usage and performance, e.g.:
torchrun --nproc_per_node=<num_gpus> lidiff/tools/diff_completion_pipeline.py \
    --diff DIFF_CKPT --refine REFINE_CKPT -T DENOISING_STEPS -s CONDITIONING_WEIGHT
- (Optional) To enable TF32 and cudnn.benchmark for extra throughput on Blackwell:
import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
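To double-check which device and CUDA build are actually in use, a small check like the following (illustrative only) can help:
import torch
print(torch.cuda.get_device_name(0))        # should report the Blackwell GPU
print(torch.cuda.get_device_capability(0))  # compute capability of the device
print(torch.version.cuda)                   # CUDA version PyTorch was built against, e.g. 12.8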
Note: The codebase has been updated to support modern dependencies including PyTorch 2.0+, PyTorch Lightning 2.1+, and diffusers 0.24+. See the Recent Improvements section for details.
The SemanticKITTI dataset has to be downloaded from the official site and extracted in the following structure:
./lidiff/
└── Datasets/
    └── SemanticKITTI
        └── dataset
            └── sequences
                ├── 00/
                │   ├── velodyne/
                │   │   ├── 000000.bin
                │   │   ├── 000001.bin
                │   │   └── ...
                │   └── labels/
                │       ├── 000000.label
                │       ├── 000001.label
                │       └── ...
                ├── 08/ # for validation
                ├── 11/ # 11-21 for testing
                └── 21/
                    └── ...
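For reference, each velodyne/*.bin file stores the points as float32 (x, y, z, intensity) and each labels/*.label file stores one uint32 per point, whose lower 16 bits encode the semantic class. A minimal loading sketch (the frame path is just an example):
import numpy as np

seq = "Datasets/SemanticKITTI/dataset/sequences/00"
scan = np.fromfile(f"{seq}/velodyne/000000.bin", dtype=np.float32).reshape(-1, 4)  # x, y, z, intensity
labels = np.fromfile(f"{seq}/labels/000000.label", dtype=np.uint32)
semantic = labels & 0xFFFF  # lower 16 bits hold the semantic class id
assert scan.shape[0] == semantic.shape[0]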
To generate the ground truth complete scenes, you can run the map_from_scans.py script. It uses the dataset scans and poses to build the sequence maps used as ground truth during training:
python3 map_from_scans.py --path Datasets/SemanticKITTI/dataset/sequences/
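Conceptually, the script aggregates the individual scans into a dense per-sequence map using the scan poses. A simplified sketch of that idea (not the actual script; it assumes poses already expressed in the LiDAR frame as flattened 3x4 matrices and ignores the calibration handling done by map_from_scans.py):
import numpy as np

def load_poses(pose_file):
    # each row: a flattened 3x4 pose matrix, turned into a 4x4 homogeneous transform
    poses = []
    for line in open(pose_file):
        T = np.eye(4)
        T[:3, :4] = np.array(line.split(), dtype=np.float64).reshape(3, 4)
        poses.append(T)
    return poses

def accumulate_scans(scan_files, poses):
    # transform every scan into the map frame and stack all points
    world = []
    for scan_file, T in zip(scan_files, poses):
        pts = np.fromfile(scan_file, dtype=np.float32).reshape(-1, 4)[:, :3]
        homog = np.hstack([pts, np.ones((len(pts), 1), dtype=np.float32)])
        world.append((homog @ T.T)[:, :3])
    return np.concatenate(world, axis=0)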
Once the sequence maps are generated, you can train the model.
For training the diffusion model, the configurations are defined in config/config.yaml, and the training can be started with:
python3 train.py
For training the refinement network, the configurations are defined in config/config_refine.yaml, and the training can be started with:
python3 train_refine.py
An improved version with modern diffusion techniques and optimizations is available:
python3 train_improved.py --config config/config_improved.yaml
This version includes:
- Multiple scheduler types (DDPM, DDIM, DPM-Solver, Euler); see the sketch after this list
- Mixed precision training (16-bit) for ~30% faster training
- Better memory management and gradient accumulation
- Modern PyTorch Lightning 2.0+ features
- Weights & Biases logging support
You can download the trained model weights and save them to lidiff/checkpoints/.
For running the scene completion inference, we provide a pipeline where both the diffusion and refinement networks are loaded and used to complete the scene from an input scan. You can run the pipeline with the command:
python3 tools/diff_completion_pipeline.py --diff DIFF_CKPT --refine REFINE_CKPT -T DENOISING_STEPS -s CONDITIONING_WEIGHT
We provide one scan as example in lidiff/Datasets/test/ so you can directly test it out with our trained model by just running the code above.
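For example, a concrete invocation could look like the following (the checkpoint file names and the -T / -s values are placeholders; use the ones matching the released weights):
python3 tools/diff_completion_pipeline.py \
    --diff checkpoints/diff_net.ckpt \
    --refine checkpoints/refine_net.ckpt \
    -T 50 -s 6.0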
The codebase has been modernized with several key improvements:
- PyTorch 2.0+ with improved performance
- PyTorch Lightning 2.1+ with modern training features
- Diffusers 0.24+ for state-of-the-art schedulers
- Added transformers, accelerate, einops, and torchmetrics
- Support for multiple noise schedulers (DDPM, DDIM, DPM-Solver, Euler, etc.)
- Variance-preserving noise schedules
- Classifier-free guidance with configurable scales
- V-prediction and epsilon prediction support
- Proper timestep embeddings with sinusoidal encoding (sketched after this section)
- Mixed precision (16-bit) training for ~30% speedup
- Gradient accumulation for larger effective batch sizes
- Modern callbacks: RichProgressBar, EarlyStopping, ModelSummary
- Weights & Biases integration for experiment tracking
- Optimized multi-GPU training with DDP
- Efficient data loading with LightningDataModule
- Improved memory management (removed excessive cache clearing)
- Better MinkowskiEngine tensor handling
- Configurable data augmentation pipeline
- Fixed hardcoded CUDA device issues
- Resolved intensity feature dimension handling
- Fixed deprecated PyTorch Lightning APIs
- Better handling of test sequences without ground truth
All improvements are backward compatible with existing checkpoints. The original training scripts continue to work with minor fixes applied.
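As one concrete example of the items above, the sinusoidal timestep embedding follows the standard formulation; a self-contained sketch, not the exact module used in this repo:
import math
import torch

def timestep_embedding(timesteps: torch.Tensor, dim: int, max_period: float = 10000.0):
    # standard sinusoidal embedding: sin/cos at geometrically spaced frequencies
    half = dim // 2
    freqs = torch.exp(-math.log(max_period) * torch.arange(half, dtype=torch.float32) / half)
    args = timesteps.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)  # shape (len(timesteps), dim)

emb = timestep_embedding(torch.tensor([0, 250, 999]), dim=128)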
If you use this repo, please cite as:
@inproceedings{nunes2024cvpr,
author = {Lucas Nunes and Rodrigo Marcuzzi and Benedikt Mersch and Jens Behley and Cyrill Stachniss},
title = {{Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion}},
booktitle = {{Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR)}},
year = {2024}
}