Description
tf.moments(tf.initializers.leCunNormal().apply([1, 1, 1, 1000000], 'float32')).variance.print()
prints something like:
0.7735087871551514
while it should be close to 1.0 (the fan-in of this shape is 1, so the target variance is 1).
The stddev passed to the truncated normal distribution should be divided by 0.87962566103423978,
as is done in https://github.com/keras-team/tf-keras/blob/v2.19.0/tf_keras/initializers/initializers.py#L662 .
This is needed because truncation discards the tails, so a truncated normal distribution has a smaller variance than a regular normal distribution with the same stddev parameter. The observed 0.7735 is approximately 0.87962566103423978 squared.
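The effect can be reproduced without TensorFlow.js. The following is a minimal sketch, assuming the truncated normal is implemented by re-drawing any value that falls more than two standard deviations from the mean. The helper names (`sampleNormal`, `sampleTruncatedNormal`, `sampleVariance`) are hypothetical and are not part of any library.

```javascript
// Standard normal draw via the Box-Muller transform.
function sampleNormal() {
  const u1 = 1 - Math.random(); // in (0, 1], avoids log(0)
  const u2 = Math.random();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Truncated normal draw: re-draw until the standard normal value lies
// within `bound` standard deviations, then scale by `stddev`.
// (Assumed rejection scheme, for illustration only.)
function sampleTruncatedNormal(stddev, bound = 2) {
  let z;
  do {
    z = sampleNormal();
  } while (Math.abs(z) > bound);
  return z * stddev;
}

// Population variance of `n` values produced by `draw`.
function sampleVariance(draw, n) {
  let sum = 0;
  let sumSq = 0;
  for (let i = 0; i < n; i++) {
    const v = draw();
    sum += v;
    sumSq += v * v;
  }
  const mean = sum / n;
  return sumSq / n - mean * mean;
}

const n = 200000;
const SCALE = 0.87962566103423978; // constant quoted in this issue

// Requested stddev of 1, no correction: variance comes out well below 1.
const uncorrected = sampleVariance(() => sampleTruncatedNormal(1), n);
// Requested stddev of 1, divided by SCALE: variance comes out close to 1.
const corrected = sampleVariance(() => sampleTruncatedNormal(1 / SCALE), n);

console.log('uncorrected variance:', uncorrected);
console.log('corrected variance:', corrected);
```

The uncorrected estimate should land near 0.77 and the corrected one near 1.0, up to sampling noise.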
The downscaling constant (0.87962566103423978 in this case) can be calculated as:
sqrt(1 - 2 * x / sqrt(2 * pi) * exp(-0.5 * x**2) / erf(x / sqrt(2))), where x is the truncation bound of the distribution divided by stddev (x = 2 in the code, so 2 * x = 4)
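As a sanity check, the constant can be evaluated numerically. This is a sketch only: JavaScript has no built-in `erf`, so the code uses a plain Maclaurin series for it, and it writes the numerator in its general form `2 * x`, which equals 4 for x = 2. The function names `erf` and `truncatedNormalStddevScale` are hypothetical.

```javascript
// Error function via its Maclaurin series:
//   erf(z) = 2 / sqrt(pi) * sum_{n >= 0} (-1)^n z^(2n+1) / (n! (2n+1))
// Accurate for moderate |z| (here z = sqrt(2)); not intended for large |z|.
function erf(z) {
  let term = z; // holds (-1)^n z^(2n+1) / n!
  let sum = z;  // n = 0 contribution
  for (let n = 1; n < 100; n++) {
    term *= (-z * z) / n;
    sum += term / (2 * n + 1);
  }
  return (2 / Math.sqrt(Math.PI)) * sum;
}

// Ratio between the actual stddev of a normal distribution truncated at
// +/- x standard deviations and its nominal stddev parameter.
function truncatedNormalStddevScale(x) {
  const density = Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI);
  return Math.sqrt(1 - (2 * x * density) / erf(x / Math.sqrt(2)));
}

const scale = truncatedNormalStddevScale(2);
console.log(scale);         // approximately 0.8796
console.log(scale * scale); // approximately 0.7737, the variance reported above
```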
Unfortunately, the documentation also ignores this fact, so it is not clear whether the code or the documentation should be changed.