Description
According to the expression in line 95, the KL-divergence term is

`0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)`

but I think the code in lines 96-97 actually computes

`0.5 * sum(1 + log(sigma^2) - mu^2 - sigma)`
This might not be essential, since the loss still decreases during training whether or not the last term is squared; the computed value, however, no longer matches the analytical KL divergence.
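
For reference, here is a minimal NumPy sketch contrasting the two expressions (the variable names `mu` and `log_var` and the sample values are hypothetical, not taken from the example script):

```python
import numpy as np

# Hypothetical encoder outputs: mu and log(sigma^2) for a batch of 2 latents.
mu = np.array([[0.3, -0.8], [1.1, 0.2]])
log_var = np.array([[-0.5, 0.1], [0.4, -1.2]])  # log(sigma^2)

# Expression documented in line 95: the last term is sigma^2 = exp(log_var).
kl_documented = 0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=-1)

# What the code in lines 96-97 appears to compute: the last term is
# sigma = exp(0.5 * log_var), i.e. not squared.
kl_as_coded = 0.5 * np.sum(1 + log_var - mu**2 - np.exp(0.5 * log_var), axis=-1)

print(kl_documented)  # the documented formula (the negative of the analytical KL)
print(kl_as_coded)    # differs whenever sigma != sigma^2, i.e. log_var != 0
```

The two values agree only when `log_var == 0` (sigma = 1), which is consistent with training still converging even though the reported loss drifts from the analytical KL.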