This repository contains a study comparing five batch normalization variants (BN, BNRS, BRN, BRNC, and BNWoS) on image classification datasets, with both small and large batch sizes.
## Method 1: Batch Normalization (BN)

BN matches the standard PyTorch implementation: batch statistics are used for normalization during training, and running statistics are reserved for inference only.
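As a rough illustration, here is a minimal sketch of this behavior, not the repository's code: it assumes a 2D `(N, C)` input, omits the learnable affine parameters, and the function name `bn_forward` is invented for this example.

```python
import torch

def bn_forward(x, running_mean, running_var, training, momentum=0.1, eps=1e-5):
    """Standard BN: batch stats during training, running stats at inference."""
    if training:
        mean = x.mean(dim=0)
        var = x.var(dim=0, unbiased=False)
        # Running stats are buffers, updated outside the autograd graph.
        with torch.no_grad():
            running_mean.mul_(1 - momentum).add_(momentum * mean)
            running_var.mul_(1 - momentum).add_(momentum * var)
    else:
        mean, var = running_mean, running_var
    return (x - mean) / torch.sqrt(var + eps)
```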
## Method 2: Batch Normalization w/ Running Statistics (BNRS)

BNRS uses running statistics during both training and inference; batch statistics serve only to update the running statistics and never receive gradients.
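A sketch of how this differs from BN, under the same simplifying assumptions (2D input, no affine parameters, hypothetical function name):

```python
import torch

def bnrs_forward(x, running_mean, running_var, training, momentum=0.1, eps=1e-5):
    """BNRS: always normalize with running stats; batch stats only feed
    the running-average update and carry no gradients."""
    if training:
        with torch.no_grad():  # batch stats are fully detached
            mean = x.mean(dim=0)
            var = x.var(dim=0, unbiased=False)
            running_mean.mul_(1 - momentum).add_(momentum * mean)
            running_var.mul_(1 - momentum).add_(momentum * var)
    # Both training and inference normalize with running statistics.
    return (x - running_mean) / torch.sqrt(running_var + eps)
```

Because the normalizer is detached from the current batch, gradients flow only through `x` itself.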
## Method 3: Batch ReNormalization w/o Clipping (BRN)

BRN normalizes with running statistics, in value, during both training and inference, but during training gradients still flow through the batch statistics. This version does not implement the value clipping of `r` and `d`.
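A sketch of the unclipped renormalization step, following the `r`/`d` formulation of the paper (same simplifying assumptions as above; as in the paper, the running standard deviation is tracked rather than the variance):

```python
import torch

def brn_forward(x, running_mean, running_std, training, momentum=0.01, eps=1e-5):
    """Batch ReNorm w/o clipping: equals running-stat normalization in value,
    but gradients flow through the batch statistics."""
    if not training:
        return (x - running_mean) / running_std
    mean = x.mean(dim=0)
    std = torch.sqrt(x.var(dim=0, unbiased=False) + eps)
    # r and d are treated as constants (stop-gradient), so in value
    # (x - mean) / std * r + d == (x - running_mean) / running_std.
    r = (std / running_std).detach()
    d = ((mean - running_mean) / running_std).detach()
    out = (x - mean) / std * r + d
    with torch.no_grad():
        running_mean.mul_(1 - momentum).add_(momentum * mean)
        running_std.mul_(1 - momentum).add_(momentum * std)
    return out
```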
## Method 4: Batch ReNormalization w/ Clipping (BRNC)

BRNC is the original version introduced in the Batch ReNormalization paper, with value clipping of `r` and `d`.
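Only the computation of `r` and `d` changes relative to BRN; a sketch of that difference (the fixed `r_max`/`d_max` values below are illustrative; the paper gradually relaxes them from 1 and 0 over the course of training):

```python
import torch

def clipped_r_d(mean, std, running_mean, running_std, r_max=3.0, d_max=5.0):
    """Clipped correction factors from the Batch ReNormalization paper."""
    r = torch.clamp((std / running_std).detach(), 1.0 / r_max, r_max)
    d = torch.clamp(((mean - running_mean) / running_std).detach(), -d_max, d_max)
    return r, d
```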
## Method 5: Batch Norm w/o Sync (BNWoS)

BNWoS mimics the behavior of batch norm under data parallelism: batch statistics are computed independently on each device, and only the running statistics from the first device are kept in the end.
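A single-process sketch of this behavior, splitting the batch into per-"device" chunks (the chunking and the function name are illustrative, not the repository's actual data-parallel setup):

```python
import torch

def bnwos_forward(x, running_mean, running_var, training,
                  num_devices=2, momentum=0.1, eps=1e-5):
    """BN w/o sync: each chunk is normalized with its own batch stats;
    only the first chunk updates the running averages."""
    if not training:
        return (x - running_mean) / torch.sqrt(running_var + eps)
    outs = []
    for i, chunk in enumerate(x.chunk(num_devices, dim=0)):
        mean = chunk.mean(dim=0)
        var = chunk.var(dim=0, unbiased=False)
        if i == 0:  # mimic keeping only the first device's running stats
            with torch.no_grad():
                running_mean.mul_(1 - momentum).add_(momentum * mean)
                running_var.mul_(1 - momentum).add_(momentum * var)
        outs.append((chunk - mean) / torch.sqrt(var + eps))
    return torch.cat(outs, dim=0)
```

With a small global batch (e.g. 2 split across 2 devices), each chunk's statistics come from a single sample, which is consistent with the failure mode noted in the conclusion below.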
| BS=2 | BS=16 | BS=128 |
|---|---|---|
| ![]() | ![]() | ![]() |
| BS=2 | BS=16 | BS=128 |
|---|---|---|
| ![]() | ![]() | ![]() |
In general, BRN(C) outperforms BN only when the batch size is small (e.g. 2) and the task is challenging (e.g. CIFAR100); otherwise, it is best to simply use BN. BNRS tends to converge much more slowly and should be avoided. BNWoS can fail when the batch size is small, but performs similarly to BN when the batch size is sufficiently large.
- [Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models](https://arxiv.org/pdf/1702.03275.pdf)
- ChatGPT for refining the README.md