Recently I ran into a situation where I wanted to calculate the mean of a set of unknown size. My first naive idea was to average the current mean with each new value:
```python
xs = [3, 7, 6]  # Assuming we don't know the length
mean = xs[0]
n = 1
while n < len(xs):
    mean = (mean + xs[n]) / 2
    n += 1
```

Which quickly turns out to be simply wrong:
```
(3 + 7 + 6)/3 = 5.3333
((3/2 + 7/2)/2 + 6/2) = 5.5
```

To find the actual relation between the current mean and the i-th value, I started by comparing the mean of 2 values with the mean of 3 values:
```
(3 + 7)/2     = 3/2 + 7/2
(3 + 7 + 6)/3 = (3 + 7)/3 + 6/3
```

From here, it's possible to rewrite the mean of 3 values in terms of the mean of 2 values:
```
(3 + 7)/3 + 6/3 =
= (3 + 7)/2 * 2/3 + 6/3 =
= (3/2 + 7/2) * 2/3 + 6/3
```

Here I noticed the pattern:
```
mean(i) = mean(i-1) * (i-1)/i + x(i-1)/i
```

In other words, mean(i) = ((i-1) * mean(i-1) + x(i-1)) / i: the previous mean is rescaled from a denominator of i-1 to i before the i-th value's contribution is added. This gives us the correct algorithm for iterative calculation of the mean:
```python
xs = [3, 7, 6]  # Assuming we don't know the length
mean = xs[0]
n = 1
while n < len(xs):
    n += 1
    mean = mean * (n - 1) / n + xs[n - 1] / n
```

While I know that iterative calculation is unnecessary in this example, I found this really handy for implementing a segment-growing algorithm, where I decide which pixels to add to the segment based on the current segment's mean value.
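For a stream of genuinely unknown length, the same recurrence can be wrapped in a small helper. Below is a minimal sketch under that assumption; the `RunningMean` name and its `update` method are mine, not from the original post:

```python
class RunningMean:
    """Incrementally tracks the mean of the values seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        # Algebraically the same as mean*(n-1)/n + x/n,
        # rewritten as mean + (x - mean)/n to reduce rounding error.
        self.n += 1
        self.mean += (x - self.mean) / self.n
        return self.mean


rm = RunningMean()
for x in [3, 7, 6]:
    print(rm.update(x))  # 3.0, 5.0, 5.333...
```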
Top comments (1)
The normal way is to keep a count of the numbers so far, along with their sum. The sum divided by the count is the mean at any point.
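A minimal sketch of that approach (variable names are mine):

```python
total = 0
count = 0
for x in [3, 7, 6]:
    total += x
    count += 1
    mean = total / count  # mean of everything seen so far
print(mean)  # 5.333...
```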
Further statistics, such as standard deviations, can be calculated in a similar way.
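For a running standard deviation, the usual single-pass technique is Welford's online algorithm; here's a sketch:

```python
import math

count = 0
mean = 0.0
m2 = 0.0  # running sum of squared deviations from the current mean

for x in [3, 7, 6]:
    count += 1
    delta = x - mean
    mean += delta / count
    m2 += delta * (x - mean)  # second factor uses the updated mean

variance = m2 / count  # population variance; use (count - 1) for a sample
print(mean, math.sqrt(variance))  # 5.333..., 1.699...
```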