TVAE (and presumably other models that use gradient-based optimizers) converges significantly differently in sequential and in distributed execution; models whose M-steps use analytical results (e.g. BSC) do not show similar effects.
For example, in a bars test, both TVAE and BSC showed differences between converged and ground-truth lower bounds approximately in the range [0,1] when executed sequentially. In distributed execution, on the other hand, the lower bound differences w.r.t. ground truth remained similar for BSC, while they increased to approximately [0,7] for TVAE (i.e., TVAE in this case fails to approach good optima).
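
For anyone who wants to reproduce the comparison, the ranges quoted above can be computed along the following lines. This is only a sketch: the helper name `lower_bound_gap_range` is hypothetical and not part of any library, and it assumes the "difference" is the absolute gap between each run's converged lower bound and the lower bound evaluated at the ground-truth parameters.

```python
from typing import Sequence, Tuple


def lower_bound_gap_range(
    converged: Sequence[float], ground_truth: float
) -> Tuple[float, float]:
    """Return (smallest, largest) gap to the ground-truth lower bound.

    converged:    final lower bound of each independent run (hypothetical input)
    ground_truth: lower bound evaluated at the generating parameters
    Assumption: the gap is taken as an absolute difference.
    """
    if not converged:
        raise ValueError("need at least one converged lower bound")
    # One gap per run; the range over runs is what is reported as [min, max].
    gaps = [abs(ground_truth - lb) for lb in converged]
    return min(gaps), max(gaps)
```

Calling this once with the runs from sequential execution and once with the runs from distributed execution, for the same model and data set, gives the two ranges that are being compared.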