Implementation of the GaussianNoise transform for uint8 inputs #9169

NicolasHug merged 3 commits into pytorch:main
Conversation
NicolasHug left a comment
@diaz-esparza thank you so much for the fantastic PR! The new tests and the docs are great.
(And congrats on your degree!!)
Implements #9148, aka an implementation for the GaussianNoise transform for uint8 inputs.

Several configurations have been both benchmarked and validated to do this. Two main approaches have been tested:

- **Float-based**: converts the input to a `float` data type, adds the Gaussian noise and transforms the output back into `uint8`.
  - We avoid scaling pixel values from `[0, 255]` to `[0, 1]` and back, as those conversions are performed per-pixel and can be slow on big images. Instead, as the noise is going to be multiplied by the `sigma` parameter anyways, we can use `sigma * 255` as a coefficient instead (and then add `mean * 255`), thus saving two floating-point array-wide operations.
- **Int-based**: converts the noise to an integer data type (`int16`) to then perform the addition and finally transform the result back into `uint8`.
  - As before, we multiply the noise by `sigma * 255` and add `mean * 255` before transforming the output to `int16`.
  - `int16` is used to cover the `[-255, 255]` range that the noise tensor would theoretically generate, and also to have a margin to be able to clamp pixels that might lie outside said range; `int16` offers a range of `[-32_768, 32_767]`, which is more than enough for our use case.
  - Other variants tested include converting the noise to `int16` before adding the mean (thus performing addition of ints instead of floats), rounding the result (to have more accurate results at a performance cost) and performing implicit conversions of data types instead of explicit ones.

I've created a separate repo to host both the benchmark and the validation code (and results!) for these implementations. Here's the TL;DR:
- `int16` is almost always the best intermediate int dtype.
- Rounding with `int16` is slightly slower on all hardware (0.93x speed), and outputs are indistinguishable in spite of the better numerical precision. Leaving rounding off only leaves a very slight bias that lowers the effective standard deviation by about `0.3 * (1/255)`.
- Converting the noise to `int16` before adding a rounded mean parameter doesn't get us a significant speedup (even some slowdown was measured depending on hardware).

Given these results, the solution provided uses:

- Noise generated in `float32` dtype. It's faster on CPUs, slightly slower on GPUs; I decided to prioritize the CPU case given the typical use case of offloading augmentation to it, and the fact that the CPU is slower by orders of magnitude.
- The noise is multiplied by `255 * sigma`, `255 * mean` is added, and the result is converted to `int16` before being added to the input image.
- The sum is clamped to the valid range and converted back to `uint8`, matching the input's `uint8` dtype.

The repo I mentioned before contains all the results that validate this methodology, as well as visualizations that display the very slight differences in results from the proposed implementation w.r.t. a more traditional one.
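Putting the steps above together, the chosen approach can be sketched roughly as follows. This is an illustrative sketch, not the merged code; the function name and signature here are hypothetical:

```python
import torch


def gaussian_noise_uint8(image: torch.Tensor, mean: float = 0.0,
                         sigma: float = 0.1, clamp: bool = True) -> torch.Tensor:
    """Illustrative sketch of the uint8 path, not the merged implementation."""
    # 1. Generate the noise in float32 (faster on CPU, the typical device
    #    for augmentation; slightly slower on GPU).
    noise = torch.randn(image.shape, dtype=torch.float32, device=image.device)
    # 2. Work directly in the [0, 255] domain: scale by 255*sigma, shift by
    #    255*mean, then cast to int16. The cast truncates toward zero, which
    #    only introduces a tiny bias on the effective std (~0.3 * (1/255));
    #    explicit rounding was benchmarked and deemed not worth the slowdown.
    noise_i16 = (noise * (255.0 * sigma) + 255.0 * mean).to(torch.int16)
    # 3. Integer addition against the widened input. int16's range of
    #    [-32768, 32767] comfortably covers uint8 values plus the noise.
    out = image.to(torch.int16) + noise_i16
    # 4. Clamp to the uint8 range and cast back. Without the clamp, the
    #    final cast can wrap around (integer overflow).
    if clamp:
        out = out.clamp(0, 255)
    return out.to(torch.uint8)
```

With `sigma=0` and `mean=0` this reduces to the identity, which makes for a handy sanity check.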
The PR also updates the documentation on the `GaussianNoise` class (validated the HTML doc render), updates one exception message, implements a new suite of tests for this new functionality, and passes all the tests that were already there (and also the flake8 checks). There are two tests that fail, but I've checked and those are test-configuration errors for the `RandomIoUCrop` class, which has not been touched in these changes.

It's also important to mention that integer overflow is basically inevitable when setting the `clamp` flag to False, but I've documented that too!

Very open to criticism and/or improvements, hope you consider this PR!!
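The overflow caveat with `clamp=False` is easy to see in isolation: casting an out-of-range intermediate back to `uint8` wraps modulo 256. A standalone illustration (not code from the PR):

```python
import torch

# A bright pixel plus positive noise exceeds the uint8 range.
pixel = torch.tensor([250], dtype=torch.uint8)
noise = torch.tensor([20], dtype=torch.int16)

total = pixel.to(torch.int16) + noise          # tensor([270], dtype=int16)

wrapped = total.to(torch.uint8)                # 270 % 256 == 14: overflow!
clamped = total.clamp(0, 255).to(torch.uint8)  # saturates at 255 instead

print(wrapped.item(), clamped.item())  # 14 255
```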
cc @vfdev-5