Description
According to the expression in line 95, the KL-divergence term is

`0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)`

but I think the code in lines 96-97 actually computes

`0.5 * sum(1 + log(sigma^2) - mu^2 - sigma)`
This might not be essential, since the loss still decreases during training whether or not the last term is squared; the computed value, however, no longer matches the analytical KL divergence.
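
For reference, here is a minimal NumPy sketch contrasting the two expressions (the variable names `mu` and `log_var` and the sample values are hypothetical, not taken from the example script):

```python
import numpy as np

# Hypothetical encoder outputs: mu and log(sigma^2) for a batch of 2 latents.
mu = np.array([[0.3, -0.8], [1.1, 0.2]])
log_var = np.array([[-0.5, 0.1], [0.4, -1.2]])  # log(sigma^2)

# Expression documented in line 95: the last term is sigma^2 = exp(log_var).
kl_documented = 0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=-1)

# What the code in lines 96-97 appears to compute: the last term is
# sigma = exp(0.5 * log_var), i.e. not squared.
kl_as_coded = 0.5 * np.sum(1 + log_var - mu**2 - np.exp(0.5 * log_var), axis=-1)

print(kl_documented)  # the documented formula (the negative of the analytical KL)
print(kl_as_coded)    # differs whenever sigma != sigma^2, i.e. log_var != 0
```

The two values agree only when `log_var == 0` (sigma = 1), which is consistent with training still converging even though the reported loss drifts from the analytical KL.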