Wednesday, June 24, 2009

Ninthers, Tukey and statistical approximation

A response to a post on The Endeavour about Ninthers and Tukey:

There are 2 useful things in the ninther concept: approximations to bulk statistics and the calculation of a median without any sorting.



Generalising from 3 groups of 3 to n groups of m, we could still calculate a median from a series of chunks of the dataset, but we would need to sort.



This would suggest problems when working on Very Large Datasets, but consider the case of the most annoying dataset - randomised data. If we take a chunk of data from this dataset, we can approximate the statistics of the bulk with the statistics of the chunk, or a series of chunks.



The answer may be, then, to randomly pick out a series of 9-value chunks, and calculate a series of ninthers. That way the number of comparisons per total values can be less than 1.



O(1) to O(N), depending on how accurate you require your statistics to be.

No comments: