This repo partially implements the MNIST experiment from the paper *Learning Structured Output Representation using Deep Conditional Generative Models*.
Install the package, then train the baseline or the CVAE:

```bash
pip install .
tzq config/baseline.yml train
tzq config/cvae.yml train
```
All models are trained for 20 epochs with batch size 32 and learning rate 1e-3. By default, the CVAE decoder is not conditioned on the masked input x, i.e. it models p(y|z) rather than p(y|z, x).
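To make that toggle concrete, here is a minimal sketch of a decoder that can run either way; the class, layer sizes, and `condition_on_x` flag are hypothetical illustrations, not this repo's actual modules:

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Hypothetical CVAE decoder sketch (not this repo's actual API):
    models p(y | z) by default, or p(y | z, x) when condition_on_x=True."""

    def __init__(self, z_dim=64, x_dim=784, y_dim=784, condition_on_x=False):
        super().__init__()
        self.condition_on_x = condition_on_x
        in_dim = z_dim + (x_dim if condition_on_x else 0)
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512),
            nn.ReLU(),
            nn.Linear(512, y_dim),  # Bernoulli logits over output pixels
        )

    def forward(self, z, x=None):
        # Concatenating x to z is the standard way to condition the decoder.
        h = torch.cat([z, x], dim=-1) if self.condition_on_x else z
        return self.net(h)
```

The "w/ conditioned decoder" rows in the table below correspond to the `condition_on_x=True` case.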
| Method | NCLL via importance sampling (S = 100; lower is better, estimator sketched below the table) |
|---|---|
| Baseline | 112.382 |
| CVAE (w/ conditioned decoder, w/o baseline) | 83.745 |
| CVAE (w/ conditioned decoder) | 79.524 |
| CVAE (w/o conditioned prior) | 76.024 |
| CVAE | 72.255 |
| CVAE (w/o baseline) | 70.868 |
| CVAE (w/ jointly trained baseline, initialized from the pretrained baseline) | 69.352 |
| CVAE (w/ jointly trained baseline from scratch) | 67.813 |
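Here, NCLL is the negative conditional log-likelihood -log p(y|x), estimated as in the paper by importance sampling with S samples from the recognition network q(z|x, y). A minimal sketch of that estimator, where `q_zxy`, `p_zx`, and `p_yzx` are hypothetical callables standing in for this repo's networks:

```python
import math
import torch

def ncll_importance_sampling(x, y, q_zxy, p_zx, p_yzx, S=100):
    """Hypothetical sketch of the importance-sampling estimator:
    log p(y|x) ~= logsumexp_s [log p(y|z_s, x) + log p(z_s|x) - log q(z_s|x, y)] - log S,
    with z_s ~ q(z|x, y). q_zxy, p_zx, p_yzx are assumed to return
    torch.distributions objects with per-dimension log-probs."""
    q = q_zxy(x, y)                       # recognition network q(z | x, y)
    z = q.rsample((S,))                   # S latent samples: (S, batch, z_dim)
    log_w = (
        p_yzx(z, x).log_prob(y).sum(-1)   # log p(y | z, x), summed over pixels
        + p_zx(x).log_prob(z).sum(-1)     # (conditional) prior log p(z | x)
        - q.log_prob(z).sum(-1)           # proposal log q(z | x, y)
    )                                     # importance log-weights: (S, batch)
    log_py_x = torch.logsumexp(log_w, dim=0) - math.log(S)
    return -log_py_x.mean()               # mean NCLL over the batch
```

In the default (unconditioned) configuration, the decoder and prior would simply ignore x, i.e. p(y|z) and p(z).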
- The baseline seems not to help when the decoder is not conditioned on the masked image.
- Conditioning the decoder on the masked image worsens (increases) NCLL.
More details can be found here.

