AsymFormer: Asymmetrical Cross-Modal Representation Learning for Mobile Platform Real-Time RGB-D Semantic Segmentation (CVPR 2024 - USM Workshop ) [Paper] [Pre-trained Model] [TensorRT Model]


This repository contains the official implementation of AsymFormer, a novel network for real-time RGB-D semantic segmentation.

  • Achieves efficient and precise RGB-D semantic segmentation
  • Allows effective fusion of multimodal features at low computational cost
  • Minimizes superfluous parameters by optimizing computational resource distribution
  • Enhances network accuracy through feature selection and multi-modal self-similarity features
  • Utilizes Local Attention-Guided Feature Selection (LAFS) module for selective fusion
  • Introduces Cross-Modal Attention-Guided Feature Correlation Embedding (CMA) module for cross-modal representations
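For intuition, here is a rough, self-contained sketch of what attention-guided selective fusion of RGB and depth features can look like in PyTorch. It is illustrative only: the actual LAFS and CMA implementations live in the model files under src and differ in detail.

# Illustrative sketch only; NOT the actual LAFS/CMA code (see src/ for the real modules).
# It shows the general idea of attention-guided selective fusion of RGB and depth features.
import torch
import torch.nn as nn

class ToyAttentionFusion(nn.Module):
    """Reweights RGB and depth feature channels with a learned gate before fusing them."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),              # global context per channel
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),                         # per-channel selection weights in [0, 1]
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([rgb, depth], dim=1))   # (B, C, 1, 1)
        return w * rgb + (1.0 - w) * depth              # soft selection between modalities

# Example: fuse two 64-channel feature maps at 1/4 resolution of a 480x640 input.
fuse = ToyAttentionFusion(64)
out = fuse(torch.randn(1, 64, 120, 160), torch.randn(1, 64, 120, 160))
print(out.shape)  # torch.Size([1, 64, 120, 160])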

📊 Results

AsymFormer achieves competitive results on the following datasets:

  • NYUv2: 54.1% mIoU
  • NYUv2 (Multi-Scale): 55.3% mIoU
  • SUNRGBD: 49.1% mIoU

Notably, it also provides impressive inference speeds:

  • Inference speed of 65 FPS on RTX3090
  • Inference speed of 79 FPS on RTX3090 (FP16)
  • Inference speed of 29 FPS on Tesla T4 (FP16)

🔄 Changelog

| Date | Update | Summary |
| --- | --- | --- |
| 2026-02-13 | SCC Module Optimization | Comprehensively optimized the SCC module (LAFS + CMA + fusion pipeline) for a ~14.6% speedup on Apple M3 Max @ 480×640 (cumulative ~24% total speedup over the original baseline). Full weight compatibility maintained. |
| 2026-02-13 | MLPDecoder Code Quality | Refactored MLPDecoder to improve code readability and reduce redundant operations. No performance impact. Full weight compatibility maintained. |
| 2026-02-13 | LAFS Code Quality | Refactored the LAFS module to reduce tensor operations and improve code readability. No performance impact (within measurement error, < 1%). Full weight compatibility maintained. |
| 2026-02-13 | ConvNeXt LayerNorm Optimization | Optimized the ConvNeXt LayerNorm with native F.layer_norm for a ~1.8% additional speedup on Apple M3 Max (cumulative ~5.4% total speedup with SDPA). |
| 2026-02-12 | SDPA Optimization | Optimized attention modules (CMA + MixTransformer) with F.scaled_dot_product_attention for a ~3.6% speedup on Apple M3 Max @ 480×640 (zero accuracy loss, full weight compatibility). |
| 2026-02-11 | API Modernization | PyTorch 2.x compatibility, torchvision.transforms.v2 migration, advanced augmentations (CutMix/MixUp/Mosaic), multi-platform support. |

📄 View Full Changelog →
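
The SDPA and LayerNorm entries above amount to swapping hand-written attention math for PyTorch 2.x fused kernels. The snippet below is a minimal, self-contained illustration of that kind of change; the shapes and names are made up for the example, not taken from the repository.

import torch
import torch.nn.functional as F

q = torch.randn(1, 2, 1200, 64)   # (batch, heads, tokens, head_dim); sizes are illustrative
k = torch.randn(1, 2, 1200, 64)
v = torch.randn(1, 2, 1200, 64)

# Hand-written attention, as the modules computed it before the optimization:
manual = torch.softmax(q @ k.transpose(-2, -1) / (64 ** 0.5), dim=-1) @ v

# Fused PyTorch 2.x kernel; numerically equivalent, so pre-trained weights still load:
fused = F.scaled_dot_product_attention(q, k, v)

print(torch.allclose(manual, fused, atol=1e-5))  # True up to floating-point tolerance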


🛠️ Installation

To run this project, we suggest Ubuntu 20.04, PyTorch 2.0.1, and CUDA 12.0 or higher.

The following packages are also needed for evaluation and TensorRT FP16 quantized inference:

pip install timm
pip install scikit-image
pip install opencv-python-headless==4.5.5.64
pip install thop
pip install onnx
pip install onnxruntime
pip install tensorrt==8.6.0
pip install pycuda

📁 Data Preparation

We used the same data source as ACNet. The processed NYUv2 data (.npy) can be downloaded from Google Drive.

We found that the former NYUv2 data contained some errors, so we re-generated the training data from the original NYUv2 MATLAB .mat file: Google Drive.

SUNRGBD Dataset: Google Drive

🏋️ Train

To train AsymFormer on the NYUv2 dataset, download the processed PNG-format dataset from Google Drive and unzip it in the current folder. After that, the folder should look like:

├── data
│   ├── images
│   ├── depths
│   ├── labels
│   ├── train.txt
│   └── test.txt
├── utils
│   ├── __init__.py
│   └── utils.py
├── src
│   └── model files
├── NYUv2_dataloader.py
├── train.py
└── eval.py
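
As a quick sanity check of the layout above, the hypothetical snippet below shows how the split files map sample ids to image/depth/label paths. The file-naming convention here is an assumption for illustration; the dataset class actually used by train.py is defined in NYUv2_dataloader.py.

import os

def list_samples(data_dir="data", split="train.txt"):
    """Read sample ids from a split file and build (image, depth, label) path triples.
    The .png naming below is a guess for illustration; see NYUv2_dataloader.py for the real logic."""
    with open(os.path.join(data_dir, split)) as f:
        ids = [line.strip() for line in f if line.strip()]
    return [(os.path.join(data_dir, "images", i + ".png"),
             os.path.join(data_dir, "depths", i + ".png"),
             os.path.join(data_dir, "labels", i + ".png")) for i in ids]

samples = list_samples()
print(len(samples), samples[0])   # number of training samples and the first triple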

Then run the train.py script:

python train.py

Note: Training with batch size 8 requires 19 GB of GPU VRAM. We will release a mixed-precision training script soon, which will require about 12 GB of VRAM; however, mixed-precision training will only work on Linux platforms.
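
Until that script is released, the sketch below shows how mixed-precision training could be wired up with torch.cuda.amp to cut VRAM use. The tiny stand-in model and random data are placeholders, not code from this repository.

import torch
import torch.nn as nn

device = "cuda"
model = nn.Conv2d(4, 40, 3, padding=1).to(device)        # stand-in for AsymFormer (RGB-D in, 40 classes out)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for _ in range(2):                                        # placeholder training loop
    rgbd = torch.randn(2, 4, 480, 640, device=device)     # RGB + depth stacked along channels
    label = torch.randint(0, 40, (2, 480, 640), device=device)
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():                        # forward pass in FP16 where safe
        loss = criterion(model(rgbd), label)
    scaler.scale(loss).backward()                          # scale loss to avoid FP16 gradient underflow
    scaler.step(optimizer)
    scaler.update()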

After training completes, the model will automatically be evaluated using the latest checkpoint. Results will be saved to a JSON file in the checkpoint directory.

Eval

Run the eval.py script to evaluate AsymFormer on the NYUv2 Dataset:

python eval.py

If you wish to run evaluation with the multi-scale inference strategy, run the MS5_eval.py script:

python MS5_eval.py
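
For reference, multi-scale inference in general just runs the network on several resized inputs and averages the upsampled logits. The sketch below illustrates that idea with a stand-in model; MS5_eval.py is the actual implementation and may differ in scales, flipping, and model input signature.

import torch
import torch.nn.functional as F

@torch.no_grad()
def multi_scale_logits(model, rgb, depth, scales=(0.75, 1.0, 1.25)):
    """Average class logits predicted at several input scales, resampled back to full size."""
    _, _, h, w = rgb.shape
    total = 0.0
    for s in scales:
        size = (int(h * s), int(w * s))
        r = F.interpolate(rgb, size=size, mode="bilinear", align_corners=False)
        d = F.interpolate(depth, size=size, mode="bilinear", align_corners=False)
        logits = model(r, d)                               # (B, num_classes, h*s, w*s)
        total = total + F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)
    return total / len(scales)

# Example with a stand-in model (the real AsymFormer call signature may differ):
dummy = lambda r, d: torch.randn(r.shape[0], 40, r.shape[2], r.shape[3])
out = multi_scale_logits(dummy, torch.randn(1, 3, 480, 640), torch.randn(1, 1, 480, 640))
print(out.shape)  # torch.Size([1, 40, 480, 640])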

Model Exporting and Quantization

Currently, we provide an ONNX model and a TensorRT FP16 model for evaluation and inference.

FP16 Inference (RTX3090 Platform)

The TensorRT inference notebook can be found in the Inference folder. You can test AsymFormer in your local environment by:

  • Download the 'Inference' folder
  • Download the TensorRT FP16 model, which was generated and optimized for the RTX 3090 platform: [AsymFormer FP16 TensorRT Model]
  • Download the NYUv2 dataset: NYUv2
  • Put 'AsymFormer.engine' in the 'Inference' folder
  • Modify the dataset path to your own path:
val_data = Data.RGBD_Dataset(
    transform=torchvision.transforms.Compose([scaleNorm(), ToTensor(), Normalize()]),
    phase_train=False,
    data_dir='Your Own Path',  # The file path of the NYUv2 dataset
    txt_name='test.txt',
)
  • Run the Jupyter Notebook
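
If you prefer a plain script over the notebook, the sketch below shows one common way to run a serialized engine with the TensorRT 8.6 Python bindings and pycuda. It reads binding shapes from the engine rather than assuming them, and assumes static input shapes baked into the engine; the Inference notebook remains the reference implementation.

import numpy as np
import pycuda.autoinit          # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("Inference/AsymFormer.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate one host/device buffer per binding, using the shapes stored in the engine.
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    shape = tuple(engine.get_binding_shape(i))
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = np.zeros(shape, dtype=dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host); dev_bufs.append(dev); bindings.append(int(dev))

# Copy the preprocessed RGB-D inputs in, run the engine, copy the predictions out.
for i in range(engine.num_bindings):
    if engine.binding_is_input(i):
        cuda.memcpy_htod(dev_bufs[i], host_bufs[i])   # replace the zeros with real inputs
context.execute_v2(bindings)
for i in range(engine.num_bindings):
    if not engine.binding_is_input(i):
        cuda.memcpy_dtoh(host_bufs[i], dev_bufs[i])   # class logits / predictions back to host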

Optimize AsymFormer for Your Own Platform

You can generate your own TensorRT engine from the ONNX model. We provide the original ONNX model and a corresponding notebook to help you generate the TensorRT model:

  • The ONNX model is exported with ONNX opset 17 and can be downloaded from [AsymFormer ONNX Model].
  • The Jupyter notebook covers loading the ONNX model, checking for numeric overflow, and generating a mixed-precision TensorRT engine; it can be downloaded from Generate TensorRT.
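
For completeness, an opset-17 export can be reproduced along these lines with torch.onnx.export. The stand-in module and the (rgb, depth) input signature below are assumptions for illustration; use the provided notebook and the model definition in src for the real export.

import torch
import torch.nn as nn

class _StandIn(nn.Module):             # placeholder; in practice load AsymFormer from src/ and its checkpoint
    def forward(self, rgb, depth):
        return torch.cat([rgb, depth], dim=1)

model = _StandIn().eval()
rgb, depth = torch.randn(1, 3, 480, 640), torch.randn(1, 1, 480, 640)

torch.onnx.export(
    model,
    (rgb, depth),                      # example inputs traced through the network
    "AsymFormer.onnx",
    opset_version=17,                  # matches the opset-17 export noted above
    input_names=["rgb", "depth"],
    output_names=["logits"],
)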

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Acknowledgements

If you find this repository useful in your research, please consider citing:

@misc{du2023asymformer,
      title={AsymFormer: Asymmetrical Cross-Modal Representation Learning for Mobile Platform Real-Time RGB-D Semantic Segmentation}, 
      author={Siqi Du and Weixi Wang and Renzhong Guo and Shengjun Tang},
      year={2023},
      eprint={2309.14065},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

For any inquiries, please contact siqi.du1014@outlook.com. Home page of the author: Siqi.DU's ResearchGate
