untitledunmastered1998/DistillationLab
experiment environment
- python3.8.12
- pytorch1.10.1
| dataset | #train samples | #test samples | #classes | resolution |
|---|---|---|---|---|
| CIFAR100 | 50000 | 10000 | 100 | low |
| MNIST | 60000 | 10000 | 10 | low |
| vggface2 | 2763078 | 548208 | 9131 | low |
| ImageNet | 1281167 | 50000 | 1000 | high |
| ImageNet_subset | 12610 | 5000 | 100 | high |
| ImageNet32 | 1281167 | 50000 | 1000 | low |
| ImageNet32_reduced | 384631 | 15000 | 300 | low |
| Tiny-ImageNet | 100000 | 10000 | 200 | low |
| Cars | 8144 | 8041 | 196 | high |
| flowers102 | 2040 | 6149 | 102 | high |
| stanford_dogs | 12601 | 8519 | 120 | high |
| aircrafts | 6667 | 3333 | 100 | high |
Available teacher and student networks including:
'resnet32', 'ResNet18', 'ResNet34', 'ResNet50', 'ResNet101', 'ResNet152',
'mobilenet_v2',
'shufflenet_v2_x0_5', 'shufflenet_v2_x1_0', 'shufflenet_v2_x1_5', 'shufflenet_v2_x2_0',
'squeezenet1_0', 'squeezenet1_1'
| networks | parameters |
|---|---|
| resnet32 | |
| ResNet18 | |
| ResNet34 | |
| ResNet50 | |
| ResNet101 | |
| ResNet152 | |
| mobilenet_v2 | |
| shufflenet_v2_x0_5 | |
| shufflenet_v2_x1_0 | |
| shufflenet_v2_x1_5 | |
| shufflenet_v2_x2_0 | |
| squeezenet1_0 | |
| squeezenet1_1 |
① knowledge distillation [Distilling the Knowledge in a Neural Network] (https://arxiv.org/abs/1503.02531)
② L2
③ FitNets [FitNets: Hints for Thin Deep Nets] (https://arxiv.org/abs/1412.6550)
④ PKT [Learning Deep Representations with Probabilistic Knowledge Transfer] ECCV2018 (https://arxiv.org/abs/1803.10837)
⑤ RKD [Relational Knowledge Distillation] CVPR 2019(https://arxiv.org/abs/1904.05068)
Baseline performance follows standard image classification training procedures.
| tricks | performance |
|---|---|
| baseline | |
| +xavier init / kaiming init | |
| +pretrained weights | |
| +no bias decay | |
| +label smoothing | |
| +random erasing | |
| +linear scaling learning rate | |
| +cutout | |
| +dropout | |
| +cosine learning rate decay | |
| +warm up stage | |
| +mixup | |
| +Zero γ | |
| data augmentation | |
| Learning Rate Schedule |
| teacher | ResNet18 | ResNet34 | ResNet50 | ResNet101 | ResNet152 |
|---|---|---|---|---|---|
| student | mobilenet_v2 | ||||
| t_baseline | |||||
| s_baseline | |||||
| KD | |||||
| FitNets | |||||
| RKD | |||||
| PKT | |||||
| L2 | |||||
| AT | |||||
| overhaul |
| teacher | ResNet18 | ResNet34 | ResNet50 | ResNet101 | ResNet152 |
|---|---|---|---|---|---|
| student | mobilenet_v2 | shufflenet_v1 | squeezenet_v0 | shufflenet_v2 | WRN-16-2 |
| t_baseline | |||||
| s_baseline | |||||
| KD | |||||
| FitNets | |||||
| RKD | |||||
| PKT | |||||
| L2 | |||||
| AT | |||||
| overhaul |
| teacher | ResNet18 | ResNet34 | ResNet50 | ResNet101 | ResNet152 |
|---|---|---|---|---|---|
| student | resnet8×4 | resnet32 | resnet18 | resnet34 | resnet50 |
| t_baseline | |||||
| s_baseline | |||||
| KD | |||||
| FitNets | |||||
| RKD | |||||
| PKT | |||||
| L2 | |||||
| AT | |||||
| overhaul |
| datasets | student | KD | AT | L2 | FitNets | CRD | RKD | PKT | teacher |
|---|---|---|---|---|---|---|---|---|---|
| CIFAR100→STL-10 | |||||||||
| CIFAR100→Tiny-ImageNet |