
To be clear: are you saying denglish's rationale is incorrect? (I ask because it feels legit, but, alas, that doesn't mean it is.)


His rationale is not incorrect but incomplete.

Essentially, he's arguing that since the Benford distribution of leading digits is the sole fixed point under the scaling operation, it's the most natural distribution to expect in a large collection of measurements. Units of measurement (e.g. dollars, meters, miles) represent arbitrary quantities, and a data set could be examined using literally any unit of measure (a change of unit being a scaling operation, e.g. meters -> feet multiplies each datum by 3.28). So a sufficiently large set of measured data (e.g. an almanac) can be expected to obey Benford's distribution.
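
To make the scaling argument concrete, here's a rough sketch in Python. The data set is made up (values spread evenly in log-space over six decades, an idealized stand-in for "an almanac"), and the helper names are mine. It only illustrates that converting meters to feet leaves the leading-digit frequencies essentially unchanged; it doesn't prove uniqueness of the fixed point.

```python
import math
from collections import Counter

def leading_digit(x):
    """First significant digit of a positive number, read off scientific notation."""
    return int(f"{x:.15e}"[0])

def digit_freqs(values):
    """Fraction of values whose leading digit is 1, 2, ..., 9."""
    counts = Counter(leading_digit(v) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]

# Benford's distribution: P(d) = log10(1 + 1/d)
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Assumption: a made-up data set of lengths in meters, evenly spread in
# log-space across six orders of magnitude (1 m up to 1,000,000 m).
N = 100_000
meters = [10 ** (6 * (i + 0.5) / N) for i in range(N)]
feet = [m * 3.28 for m in meters]  # change of unit = one multiplication per datum

for name, data in (("meters", meters), ("feet", feet)):
    freqs = digit_freqs(data)
    worst = max(abs(f - b) for f, b in zip(freqs, BENFORD))
    print(f"{name}: P(1)={freqs[0]:.3f} P(9)={freqs[8]:.3f} max gap from Benford={worst:.4f}")
```

Both lines of output should show a leading 1 about 30% of the time and a leading 9 under 5% of the time, whichever unit you pick.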

Benford's Law is also not true of specific distributions that are very tight. Consider IQ. That the mean is 100 is completely arbitrary, but the standard deviation of ~15% is not. Observed ratio IQs in healthy children are log-normal with a multiplicative standard deviation of 1.15-1.16; in other words, the 85th-percentile 6-year-old will have the cognitive maturity of an average 7-year-old, a fact that is independent of the unit of measure. (Adult "deviation IQs" are a different matter entirely, as they are "forced" to conform to a normal distribution, e.g. a person who scores in the 99.0th percentile will be "assigned" a z-score of 2.33, corresponding to an IQ of 135.) Obviously, with 50% of IQs having a leading digit of 1 and almost none having a leading digit of 2 or 3, this is not a Benford distribution. You could use a different arbitrary scaling factor, setting the median to 50 instead of 100, but then leading digits of 4 and 5 would be overrepresented, with virtually no 1s or 2s. The issue, of course, is that normal IQs are very tightly distributed in the log-space and come nowhere near spanning an order of magnitude, so we will never get a Benford distribution no matter what scaling factor we choose.
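
If you want to check the IQ numbers, here's a small sketch that computes the leading-digit probabilities exactly from the log-normal CDF. It assumes a pure log-normal with multiplicative standard deviation 1.15, as described above; the function names and the range of decades summed over are my own choices.

```python
import math

def lognormal_cdf(x, median, mult_sd):
    """P(X <= x) for a log-normal with the given median and multiplicative SD."""
    z = math.log(x / median) / math.log(mult_sd)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def leading_digit_probs(median, mult_sd, decades=range(-3, 6)):
    """P(leading digit = d) for d = 1..9, summing over intervals [d*10^k, (d+1)*10^k)."""
    probs = []
    for d in range(1, 10):
        p = sum(
            lognormal_cdf((d + 1) * 10.0 ** k, median, mult_sd)
            - lognormal_cdf(d * 10.0 ** k, median, mult_sd)
            for k in decades
        )
        probs.append(p)
    return probs

# Benford's distribution for comparison: P(d) = log10(1 + 1/d)
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

probs_100 = leading_digit_probs(median=100, mult_sd=1.15)
probs_50 = leading_digit_probs(median=50, mult_sd=1.15)

print("digit  Benford  median=100  median=50")
for d in range(1, 10):
    print(f"{d:>5}  {BENFORD[d - 1]:7.3f}  {probs_100[d - 1]:10.3f}  {probs_50[d - 1]:9.3f}")
```

Rescaling just moves the lump of probability to different digits; it never spreads it out into anything resembling Benford's curve.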

The other problem with the OP's argument is that it doesn't cover figures like death tolls in natural disasters or sizes of cities. Neither involves an arbitrary unit, but both exhibit Benford-esque distributions, because the underlying variables combine multiplicatively rather than additively. An additive combination (e.g. a sum) of a large number of variables (e.g. height from genes) exhibits a normal distribution, to which Benford's Law does not apply. However, a multiplicative combination (e.g. a product) X of a large number of random variables will have a log-normal distribution, and if X varies over many orders of magnitude, its distribution will be locally flat enough (in the log-space) that Y - floor(Y), where Y = log10 X, will be approximately uniform on [0, 1), leading to the Benford distribution.
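
Here's a quick simulation of the sum-versus-product point. Everything specific in it is an assumption picked for illustration: 50 independent factors, each uniform on [0.5, 2.0], 20,000 samples, and a fixed seed. It's a toy, not a model of disasters or cities.

```python
import math
import random
from collections import Counter

def leading_digit(x):
    """First significant digit of a positive number, read off scientific notation."""
    return int(f"{x:.15e}"[0])

def digit_freqs(values):
    """Fraction of values whose leading digit is 1, 2, ..., 9."""
    counts = Counter(leading_digit(v) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]

# Benford's distribution: P(d) = log10(1 + 1/d)
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def max_gap(values):
    """Largest absolute difference between observed digit frequencies and Benford."""
    return max(abs(f - b) for f, b in zip(digit_freqs(values), BENFORD))

rng = random.Random(0)  # fixed seed so reruns give the same numbers
N_SAMPLES, N_FACTORS = 20_000, 50  # illustrative sizes, not taken from any data

products, sums = [], []
for _ in range(N_SAMPLES):
    factors = [rng.uniform(0.5, 2.0) for _ in range(N_FACTORS)]
    products.append(math.prod(factors))  # multiplicative: roughly log-normal
    sums.append(sum(factors))            # additive: roughly normal

print(f"products: max gap from Benford = {max_gap(products):.3f}")
print(f"sums:     max gap from Benford = {max_gap(sums):.3f}")
```

The products spread over several orders of magnitude and land close to Benford; the sums cluster tightly around 62 or so and land nowhere near it.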



