Description
Based on discussions in #113, #3, #128, I would like to propose the following addition to stdlib_experimental_stats:
var - variance of array elements
Description
Returns the variance of all the elements of array, or of the elements of array along dimension dim if dim is provided. If mask is provided, only the elements whose corresponding element in mask is true are included.
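The variance is defined as the unbiased sample variance and is computed as:

$$\mathrm{var}(\mathrm{array}) = \frac{1}{n-1} \sum_{i=1}^{n} \left( \mathrm{array}(i) - \mathrm{mean}(\mathrm{array}) \right)^2$$

where n here is the number of elements entering the computation (not the rank of array).
Syntax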
result = var(array [, mask])
result = var(array, dim [, mask])
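To illustrate how both call forms could coexist under one generic name, here is a minimal sketch that follows the same convention as the intrinsic sum(array [, mask]) and sum(array, dim [, mask]): one specific procedure per form, told apart by the integer dim argument. All module and procedure names (var_generic_sketch, var_selected, var_all_1d, var_all_2d, var_dim_2d) are hypothetical. The sketch assumes default-real arrays of rank 1 or 2, an array-valued mask, and a NaN result whenever fewer than two elements are selected; it is not a proposed implementation.

```fortran
! Hypothetical sketch: one generic "var" with two call forms.
! Assumptions: default real, rank 1 or 2, mask (if any) has the shape of x.
module var_generic_sketch
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
  private
  public :: var

  interface var
    module procedure var_all_1d   ! var(array [, mask]), rank 1
    module procedure var_all_2d   ! var(array [, mask]), rank 2
    module procedure var_dim_2d   ! var(array, dim [, mask]), rank 2
  end interface var

contains

  ! Two-pass variance of the elements of x selected by m.
  ! Returns NaN if fewer than two elements are selected (assumption).
  function var_selected(x, m) result(res)
    real, intent(in) :: x(:)
    logical, intent(in) :: m(:)
    real :: res
    real :: mu
    integer :: n

    n = count(m)
    if (n < 2) then
      res = ieee_value(1.0, ieee_quiet_nan)
      return
    end if
    mu = sum(x, mask=m) / real(n)                  ! pass 1: mean
    res = sum((x - mu)**2, mask=m) / real(n - 1)   ! pass 2: squared deviations
  end function var_selected

  function var_all_1d(x, mask) result(res)
    real, intent(in) :: x(:)
    logical, intent(in), optional :: mask(:)
    real :: res

    if (present(mask)) then
      res = var_selected(x, mask)
    else
      res = var_selected(x, spread(.true., 1, size(x)))
    end if
  end function var_all_1d

  function var_all_2d(x, mask) result(res)
    real, intent(in) :: x(:, :)
    logical, intent(in), optional :: mask(:, :)
    real :: res

    ! Flatten to rank 1 (array element order) and reuse the rank-1 kernel.
    if (present(mask)) then
      res = var_selected(reshape(x, [size(x)]), reshape(mask, [size(mask)]))
    else
      res = var_selected(reshape(x, [size(x)]), spread(.true., 1, size(x)))
    end if
  end function var_all_2d

  function var_dim_2d(x, dim, mask) result(res)
    real, intent(in) :: x(:, :)
    integer, intent(in) :: dim
    logical, intent(in), optional :: mask(:, :)
    real, allocatable :: res(:)
    logical, allocatable :: m(:, :)
    integer :: j

    allocate(m(size(x, 1), size(x, 2)))
    m = .true.
    if (present(mask)) m = mask

    select case (dim)
    case (1)   ! reduce along dim 1: one result per column
      allocate(res(size(x, 2)))
      do j = 1, size(x, 2)
        res(j) = var_selected(x(:, j), m(:, j))
      end do
    case (2)   ! reduce along dim 2: one result per row
      allocate(res(size(x, 1)))
      do j = 1, size(x, 1)
        res(j) = var_selected(x(j, :), m(j, :))
      end do
    case default
      error stop "var: dim must be 1 or 2 for a rank-2 array"
    end select
  end function var_dim_2d

end module var_generic_sketch
```

With this layout, var(x, x > 3.) resolves to the mask form because the second argument is logical, while var(a, 1) and var(a, 1, a > 3.) resolve to the dim form. Adding kinds, ranks, and the integer variants would multiply the specific procedures, which is presumably where templating or a preprocessor would help.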
Arguments
array: Shall be an array of type integer or real.
dim: Shall be a scalar of type integer with a value in the range from 1 to n, where n is the rank of array.
mask (optional): Shall be of type logical and either be a scalar or an array of the same shape as array.
Return value
If array is of type real, the result is of the same type as array.
If array is of type integer, the result is of type double precision.
If dim is absent, a scalar with the variance of all elements in array is returned. Otherwise, an array of rank n-1 is returned, where n is the rank of array; its shape is that of array with dimension dim dropped.
If mask is specified, the result is the variance of all elements of array corresponding to true elements of mask. If every element of mask is false, the result is IEEE NaN.
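The two return-value rules for integer input and an all-false mask can be sketched as follows. The names var_int_sketch and var_int are hypothetical, only rank-1 default-integer arrays are handled, and returning NaN for a single selected element is an assumption (the 1/(n-1) factor would otherwise give 0/0); the proposal itself only requires NaN when every mask element is false.

```fortran
! Hypothetical sketch: variance of a rank-1 integer array, returned in
! double precision, with NaN when no element is selected by mask.
module var_int_sketch
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
  private
  public :: var_int, dp

  integer, parameter :: dp = kind(1.0d0)   ! double precision kind

contains

  function var_int(x, mask) result(res)
    integer, intent(in) :: x(:)
    logical, intent(in), optional :: mask(:)
    real(dp) :: res
    real(dp) :: xd(size(x)), mu
    logical :: m(size(x))
    integer :: n

    m = .true.
    if (present(mask)) m = mask
    n = count(m)

    ! All-false mask (NaN as proposed) or a single selected element
    ! (assumption: 1/(n-1) would be 0/0).
    if (n < 2) then
      res = ieee_value(1.0_dp, ieee_quiet_nan)
      return
    end if

    ! Convert before averaging: in integer arithmetic the mean of [1, 2]
    ! would be truncated to 1.
    xd = real(x, dp)
    mu = sum(xd, mask=m) / n
    res = sum((xd - mu)**2, mask=m) / (n - 1)
  end function var_int

end module var_int_sketch
```

Converting the whole array up front costs one temporary of size(x) doubles; accumulating real(x(i), dp) inside an explicit loop would avoid it.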
Example
program demo_var
    use stdlib_experimental_stats, only: var
    implicit none
    real :: x(1:6) = [ 1., 2., 3., 4., 5., 6. ]
    print *, var(x)                              !returns __TOBECOMPLETED__
    print *, var( reshape(x, [ 2, 3 ] ))         !returns __TOBECOMPLETED__
    print *, var( reshape(x, [ 2, 3 ] ), 1)      !returns [__TOBECOMPLETED__]
    print *, var( reshape(x, [ 2, 3 ] ), 1,&
                  reshape(x, [ 2, 3 ] ) > 3.)    !returns [__TOBECOMPLETED__]
end program demo_var

To be discussed (not exhaustive):
- Based on the discussions in Style guide #3, I suggest first implementing a two-pass algorithm. Other algorithms can be implemented later, as proposed in Trade-off between efficiency and robustness/accuracy #134. Allowing dim and mask in the API means the function will not be as simple as the one proposed in a comment on #3.
- Centering an array along a dimension (e.g., x(:, i) - mean(x, 2)) will most likely require a loop. To keep the implementation of var clean, I propose adding a function center that performs the different centerings of an array x, which var would then call. However, I am concerned about the efficiency of this approach, especially memory usage, since center may need an additional temporary array.
- The proposed name for the variance function is var. But what about variance (or other suggestions)?
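The centering idea can be sketched for rank-2 arrays as follows, so the memory trade-off is concrete. The names center_sketch and var_dim are hypothetical, center follows the name proposed above but its signature is an assumption, and the sketch ignores mask and non-default kinds.

```fortran
! Hypothetical sketch of a "center" helper and a two-pass var along a
! dimension built on it (rank 2, default real, no mask).
module center_sketch
  implicit none
  private
  public :: center, var_dim

contains

  ! Return a copy of x with the mean along dimension dim subtracted.
  ! The result is a temporary the same size as x.
  function center(x, dim) result(xc)
    real, intent(in) :: x(:, :)
    integer, intent(in) :: dim
    real :: xc(size(x, 1), size(x, 2))
    real, allocatable :: mu(:)
    integer :: i

    select case (dim)
    case (1)   ! column means: xc(:, i) = x(:, i) - mean(x(:, i))
      mu = sum(x, dim=1) / size(x, 1)
      do i = 1, size(x, 2)
        xc(:, i) = x(:, i) - mu(i)
      end do
    case (2)   ! row means: xc(:, i) = x(:, i) - mean(x, 2)
      mu = sum(x, dim=2) / size(x, 2)
      do i = 1, size(x, 2)
        xc(:, i) = x(:, i) - mu
      end do
    case default
      error stop "center: dim must be 1 or 2"
    end select
  end function center

  ! Pass 1 (the means) happens inside center; pass 2 sums the squared
  ! deviations of the centered copy along dim.
  function var_dim(x, dim) result(res)
    real, intent(in) :: x(:, :)
    integer, intent(in) :: dim
    real, allocatable :: res(:)

    if (dim < 1 .or. dim > 2) error stop "var_dim: dim must be 1 or 2"
    res = sum(center(x, dim)**2, dim=dim) / (size(x, dim) - 1)
  end function var_dim

end module center_sketch
```

Written this way, center(x, dim) materializes a full copy of x, which is the memory concern raised above. A fused loop that accumulates squared deviations directly would avoid the temporary, at the cost of a less modular implementation.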
Similar functions in other languages:
Octave var
R var
Julia var
Numpy var
Requesting feedback from (at least) @certik @milancurcic @ivan-pi @aradi @leonfoks