- 34318 train and 5509 test images with different resolutions.
- Resize to 64x128, normalize each channel to [0, 1] and keep HWC format.
- 3 x Convolution Blocks (CONV - CONV - CONV - BATCH NORM - LEAKY RELU - DROPOUT)
- CONV(label_length)
- Dense(hidden_size)
- Dense(vocab_size)
- Softmax(-1)
- Adam (Adaptive Moment Estimation) with a learning rate of 1e-3 and a constant learning-rate schedule
- batch_size = 512 on DGX A100 for 20 epochs (250 seconds)
- Reshuffle at each epoch
- Minimize categorical cross entropy between the softmax network output and the one-hot-encoded label
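
The preprocessing and loss bullets above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the original training code: it assumes 64x128 means height 64 and width 128, uses nearest-neighbour interpolation (the resize method is not stated), and assumes the network emits one softmax distribution over the vocabulary per label position. The sizes `label_length = 5` and `vocab_size = 36` are hypothetical, and random logits stand in for the real network.

```python
import numpy as np

def preprocess(image, out_h=64, out_w=128):
    """Resize a uint8 HWC image (nearest neighbour) and scale it to [0, 1]."""
    in_h, in_w = image.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    resized = image[rows[:, None], cols[None, :]]  # result is still HWC
    return resized.astype(np.float32) / 255.0

def softmax(logits):
    """Softmax over the last axis (assumed to be the vocabulary axis)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_cross_entropy(probs, one_hot, eps=1e-7):
    """Mean cross entropy over the batch and all label positions."""
    return float(-(one_hot * np.log(probs + eps)).sum(axis=-1).mean())

# Hypothetical sizes, chosen only for illustration.
label_length, vocab_size = 5, 36

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(90, 200, 3), dtype=np.uint8)
x = preprocess(image)  # shape (64, 128, 3), values in [0, 1]

# Random logits stand in for the network output before the softmax.
logits = rng.normal(size=(2, label_length, vocab_size))
labels = rng.integers(0, vocab_size, size=(2, label_length))
one_hot = np.eye(vocab_size)[labels]  # shape (2, label_length, vocab_size)

probs = softmax(logits)
loss = categorical_cross_entropy(probs, one_hot)
```

With a uniform prediction over `vocab_size` symbols the loss equals `log(vocab_size)`, which gives a quick sanity check for an untrained network.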
Accuracy is defined as the percentage of correctly predicted pictures.
| Model | Parameters | Size [MB] | Train accuracy | Test accuracy |
|---|---|---|---|---|
| 32-512 | 4,039,403 | 15.4 | 93.2 % | 70.1 % |
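
As a rough cross-check of the reported numbers, the sketch below computes a per-picture accuracy and converts the parameter count into a storage size. It rests on assumptions that the source does not state: a picture counts as correct only if every label position matches, weights are stored as 4-byte float32 values, and 1 MB is taken as 2**20 bytes. The function names and the toy batch are hypothetical.

```python
import numpy as np

def picture_accuracy(probs, labels):
    """Fraction of pictures whose predicted label matches at every position."""
    predicted = probs.argmax(axis=-1)  # shape (batch, label_length)
    return float((predicted == labels).all(axis=-1).mean())

def size_in_mb(num_params, bytes_per_param=4):
    """Weight storage, assuming float32 parameters and 1 MB = 2**20 bytes."""
    return num_params * bytes_per_param / 2**20

# Toy batch: 3 pictures, 2 label positions, vocabulary of 3 symbols.
labels = np.array([[0, 1], [2, 2], [1, 0]])
# One-hot "predictions"; the second picture has one wrong symbol.
probs = np.eye(3)[np.array([[0, 1], [2, 0], [1, 0]])]

acc = picture_accuracy(probs, labels)
mb = size_in_mb(4_039_403)
print(f"accuracy = {acc:.3f}, size = {mb:.1f} MB")
# prints: accuracy = 0.667, size = 15.4 MB
```

Under these assumptions, 4,039,403 parameters come to about 15.4 MB, which matches the size listed for the model.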