Implementation of the GaussianNoise transform for uint8 inputs #9169

NicolasHug merged 3 commits into pytorch:main
Conversation
NicolasHug left a comment
@diaz-esparza thank you so much for the fantastic PR! The new tests and the docs are great.
(And congrats on your degree!!)
Implements #9148, aka an implementation for the GaussianNoise transform for uint8 inputs.

Several configurations have been both benchmarked and validated to do this. Two main approaches have been tested:

- **Float-based**: converts the input to a `float` data type, adds the Gaussian noise and transforms the output back into `uint8`.
  - We avoid scaling pixel values from `[0, 255]` to `[0, 1]` and back, as those conversions are performed per-pixel and can be slow on big images. Instead, as the noise is going to be multiplied by the `sigma` parameter anyways, we can use `sigma * 255` as a coefficient instead (and then add `mean * 255`), thus saving two floating-point array-wide operations.
- **Int-based**: converts the noise to an integer data type (`int16`) to then perform the addition and finally transform the result back into `uint8`.
  - As before, we multiply the noise by `sigma * 255` and add `mean * 255` before transforming the output to `int16`.
  - `int16` is used to cover the `[-255, 255]` range that the noise tensor would theoretically generate, and also to have a margin to be able to clamp pixels that might lie outside said range; `int16` offers a range of `[-32_768, 32_767]`, which is more than enough for our use case.
  - Other variants tested include converting the noise to `int16` before adding the mean (thus performing addition of ints instead of floats), rounding the result (to have more accurate results at a performance cost) and performing implicit conversions of data types instead of explicit ones.

I've created a separate repo to host both the benchmark and the validation code (and results!) for these implementations. Here's the TL;DR:
- `int16` is almost always the best intermediate int dtype.
- Rounding with `int16` is slightly slower on all hardware (0.93x speed), and outputs are indistinguishable in spite of the better numerical precision. Leaving rounding off only leaves a very slight bias that lowers the effective standard deviation by about `0.3 * (1/255)`.
- Converting the noise to `int16` before adding a rounded mean parameter doesn't get us a significant speedup (even some slowdown was measured depending on hardware).

Given these results, the solution provided uses:

- Noise generated in `float32` dtype. It's faster on CPUs, slightly slower on GPUs; I decided to prioritize the CPU case given the typical use case of offloading augmentation to it, and the fact that the CPU is slower by orders of magnitude.
- The noise is multiplied by `255 * sigma`, `255 * mean` is added, and the result is converted to `int16` before being added to the input image.
- The sum is clamped to the valid range and converted back to `uint8`, matching the input's `uint8` dtype.

The repo I mentioned before contains all the results that validate this methodology, as well as visualizations that display the very slight differences in results from the proposed implementation w.r.t. a more traditional one.
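Putting the steps above together, the chosen approach can be sketched roughly as follows. This is an illustrative sketch, not the merged code; the function name and signature here are hypothetical:

```python
import torch


def gaussian_noise_uint8(image: torch.Tensor, mean: float = 0.0,
                         sigma: float = 0.1, clamp: bool = True) -> torch.Tensor:
    """Illustrative sketch of the uint8 path, not the merged implementation."""
    # 1. Generate the noise in float32 (faster on CPU, the typical device
    #    for augmentation; slightly slower on GPU).
    noise = torch.randn(image.shape, dtype=torch.float32, device=image.device)
    # 2. Work directly in the [0, 255] domain: scale by 255*sigma, shift by
    #    255*mean, then cast to int16. The cast truncates toward zero, which
    #    only introduces a tiny bias on the effective std (~0.3 * (1/255));
    #    explicit rounding was benchmarked and deemed not worth the slowdown.
    noise_i16 = (noise * (255.0 * sigma) + 255.0 * mean).to(torch.int16)
    # 3. Integer addition against the widened input. int16's range of
    #    [-32768, 32767] comfortably covers uint8 values plus the noise.
    out = image.to(torch.int16) + noise_i16
    # 4. Clamp to the uint8 range and cast back. Without the clamp, the
    #    final cast can wrap around (integer overflow).
    if clamp:
        out = out.clamp(0, 255)
    return out.to(torch.uint8)
```

With `sigma=0` and `mean=0` this reduces to the identity, which makes for a handy sanity check.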
The PR also updates the documentation on the `GaussianNoise` class (validated the HTML doc render), updates one exception message, implements a new suite of tests for this new functionality, and passes all the tests that were already there (and also the flake8 checks). There are two tests that fail, but I've checked and those are test-configuration errors for the `RandomIoUCrop` class, which has not been touched in these changes.

It's also important to mention that integer overflow is basically inevitable when setting the `clamp` flag to False, but I've documented that too!

Very open to criticism and/or improvements, hope you consider this PR!!
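The overflow caveat with `clamp=False` is easy to see in isolation: casting an out-of-range intermediate back to `uint8` wraps modulo 256. A standalone illustration (not code from the PR):

```python
import torch

# A bright pixel plus positive noise exceeds the uint8 range.
pixel = torch.tensor([250], dtype=torch.uint8)
noise = torch.tensor([20], dtype=torch.int16)

total = pixel.to(torch.int16) + noise          # tensor([270], dtype=int16)

wrapped = total.to(torch.uint8)                # 270 % 256 == 14: overflow!
clamped = total.clamp(0, 255).to(torch.uint8)  # saturates at 255 instead

print(wrapped.item(), clamped.item())  # 14 255
```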
cc @vfdev-5