This repository contains a study comparing five batch normalization variants (BN, BNRS, BRN, BRNC, and BNWoS) on image classification datasets, with both small and large batch sizes.
## Method 1: Batch Normalization (BN)

BN matches the standard PyTorch implementation: batch statistics are used for normalization during training, and running statistics are reserved for inference only.
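As a rough illustration, here is a minimal sketch of this behavior, not the repository's code: it assumes a 2D `(N, C)` input, omits the learnable affine parameters, and the function name `bn_forward` is invented for this example.

```python
import torch

def bn_forward(x, running_mean, running_var, training, momentum=0.1, eps=1e-5):
    """Standard BN: batch stats during training, running stats at inference."""
    if training:
        mean = x.mean(dim=0)
        var = x.var(dim=0, unbiased=False)
        # Running stats are buffers, updated outside the autograd graph.
        with torch.no_grad():
            running_mean.mul_(1 - momentum).add_(momentum * mean)
            running_var.mul_(1 - momentum).add_(momentum * var)
    else:
        mean, var = running_mean, running_var
    return (x - mean) / torch.sqrt(var + eps)
```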
## Method 2: Batch Normalization w/ Running Statistics (BNRS)

BNRS uses running statistics during both training and inference; batch statistics serve only to update the running statistics and never receive gradients.
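A sketch of how this differs from BN, under the same simplifying assumptions (2D input, no affine parameters, hypothetical function name):

```python
import torch

def bnrs_forward(x, running_mean, running_var, training, momentum=0.1, eps=1e-5):
    """BNRS: always normalize with running stats; batch stats only feed
    the running-average update and carry no gradients."""
    if training:
        with torch.no_grad():  # batch stats are fully detached
            mean = x.mean(dim=0)
            var = x.var(dim=0, unbiased=False)
            running_mean.mul_(1 - momentum).add_(momentum * mean)
            running_var.mul_(1 - momentum).add_(momentum * var)
    # Both training and inference normalize with running statistics.
    return (x - running_mean) / torch.sqrt(running_var + eps)
```

Because the normalizer is detached from the current batch, gradients flow only through `x` itself.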
## Method 3: Batch ReNormalization w/o Clipping (BRN)

BRN normalizes with running statistics, in value, during both training and inference, but during training gradients still flow through the batch statistics. This version does not implement the value clipping of `r` and `d`.
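A sketch of the unclipped renormalization step, following the `r`/`d` formulation of the paper (same simplifying assumptions as above; as in the paper, the running standard deviation is tracked rather than the variance):

```python
import torch

def brn_forward(x, running_mean, running_std, training, momentum=0.01, eps=1e-5):
    """Batch ReNorm w/o clipping: equals running-stat normalization in value,
    but gradients flow through the batch statistics."""
    if not training:
        return (x - running_mean) / running_std
    mean = x.mean(dim=0)
    std = torch.sqrt(x.var(dim=0, unbiased=False) + eps)
    # r and d are treated as constants (stop-gradient), so in value
    # (x - mean) / std * r + d == (x - running_mean) / running_std.
    r = (std / running_std).detach()
    d = ((mean - running_mean) / running_std).detach()
    out = (x - mean) / std * r + d
    with torch.no_grad():
        running_mean.mul_(1 - momentum).add_(momentum * mean)
        running_std.mul_(1 - momentum).add_(momentum * std)
    return out
```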
## Method 4: Batch ReNormalization w/ Clipping (BRNC)

BRNC is the original version introduced in the Batch ReNormalization paper, with value clipping of `r` and `d`.
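Only the computation of `r` and `d` changes relative to BRN; a sketch of that difference (the fixed `r_max`/`d_max` values below are illustrative; the paper gradually relaxes them from 1 and 0 over the course of training):

```python
import torch

def clipped_r_d(mean, std, running_mean, running_std, r_max=3.0, d_max=5.0):
    """Clipped correction factors from the Batch ReNormalization paper."""
    r = torch.clamp((std / running_std).detach(), 1.0 / r_max, r_max)
    d = torch.clamp(((mean - running_mean) / running_std).detach(), -d_max, d_max)
    return r, d
```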
## Method 5: Batch Norm w/o Sync (BNWoS)

BNWoS mimics the behavior of batch norm under data parallelism: batch statistics are computed independently on each device, and only the running statistics from the first device are kept in the end.
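A single-process sketch of this behavior, splitting the batch into per-"device" chunks (the chunking and the function name are illustrative, not the repository's actual data-parallel setup):

```python
import torch

def bnwos_forward(x, running_mean, running_var, training,
                  num_devices=2, momentum=0.1, eps=1e-5):
    """BN w/o sync: each chunk is normalized with its own batch stats;
    only the first chunk updates the running averages."""
    if not training:
        return (x - running_mean) / torch.sqrt(running_var + eps)
    outs = []
    for i, chunk in enumerate(x.chunk(num_devices, dim=0)):
        mean = chunk.mean(dim=0)
        var = chunk.var(dim=0, unbiased=False)
        if i == 0:  # mimic keeping only the first device's running stats
            with torch.no_grad():
                running_mean.mul_(1 - momentum).add_(momentum * mean)
                running_var.mul_(1 - momentum).add_(momentum * var)
        outs.append((chunk - mean) / torch.sqrt(var + eps))
    return torch.cat(outs, dim=0)
```

With a small global batch (e.g. 2 split across 2 devices), each chunk's statistics come from a single sample, which is consistent with the failure mode noted in the conclusion below.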
| BS=2 | BS=16 | BS=128 |
|---|---|---|
| ![]() | ![]() | ![]() |
| BS=2 | BS=16 | BS=128 |
|---|---|---|
| ![]() | ![]() | ![]() |
In general, BRN(C) outperforms BN only when the batch size is small (e.g. 2) and the task is challenging (e.g. CIFAR100); otherwise, it is best to simply use BN. BNRS tends to converge much more slowly and should be avoided. BNWoS can fail when the batch size is small, but performs similarly to BN when the batch size is sufficiently large.
- [Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models](https://arxiv.org/pdf/1702.03275.pdf)
- ChatGPT for refining the README.md