Guesstimation and Benford's Law

Nice senior colloquium presentation today (9/22/08) by Richard McDowell ’09 on Benford’s Law. This law describes why, in numbers that come to us from newspapers, stock quotes, and various other data sources, the digit 1 is the most common leading digit: it leads about 30% of the time, far more than the 1/9 you might naively expect. Benford’s Law follows from the principle that these worldly numbers tend to crop up in such a way that their logarithms are evenly distributed (more precisely, the fractional parts of their base-10 logarithms are evenly spread between 0 and 1).
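To see the connection between evenly distributed logarithms and the leading digit 1, here is a minimal Python sketch. It assumes a made-up pool of numbers whose base-10 logarithms are uniform over six decades (an arbitrary choice for illustration), and the function names are hypothetical. Benford’s Law predicts that the leading digit is d with probability log10(1 + 1/d), which is about 0.301 for d = 1 and about 0.046 for d = 9, and the simulated frequencies should land close to those values.

```python
import math
import random
from collections import Counter


def benford_probability(d: int) -> float:
    """Benford's Law: probability that the leading digit is d, for d in 1..9."""
    return math.log10(1 + 1 / d)


def leading_digit(x: float) -> int:
    """First significant digit of a positive number, read from scientific notation."""
    return int(f"{x:e}"[0])


def simulate_leading_digits(n: int, decades: int = 6, seed: int = 0) -> dict[int, float]:
    """Draw n numbers whose base-10 logarithms are uniform on [0, decades)
    and return the observed frequency of each leading digit."""
    rng = random.Random(seed)
    counts = Counter(leading_digit(10 ** rng.uniform(0, decades)) for _ in range(n))
    return {d: counts[d] / n for d in range(1, 10)}


if __name__ == "__main__":
    observed = simulate_leading_digits(100_000)
    for d in range(1, 10):
        print(f"{d}: Benford {benford_probability(d):.3f}  simulated {observed[d]:.3f}")
```

The nine probabilities sum to 1 because log10(2/1) + log10(3/2) + … + log10(10/9) telescopes to log10(10).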

This has an interesting connection with an idea in the new book Guesstimation, by Lawrence Weinstein and John A. Adams. The book is an exposition on back-of-the-napkin calculations, where you want to estimate some quantity to within an order of magnitude. (Quick: what is the average annual rainfall in the Amazon basin?)

They claim that if you guess upper and lower bounds for your unknown quantity, then your best estimate for the quantity should be the geometric mean (the square root of the product) of the two bounds, rather than the more intuitive arithmetic mean. Indeed, if you think some quantity is bigger than 10 and less than 100, then the arithmetic mean of 55 is a poor guess: it’s 5.5 times your lower bound but only about half of your upper bound. The geometric mean of 31.6, by contrast, is about three times your lower bound and about a third of your upper bound, so it sits the same factor (roughly 3.16) above one bound and below the other. (The average annual rainfall in the Amazon basin has got to be more than 10 and less than 1000 inches. Geometric mean = 100. Actual average, reported on various web sources, is 80 inches.)
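The arithmetic behind the 10-to-100 comparison and the rainfall estimate is easy to check. This short Python sketch (helper names are just for illustration) computes both means and reports how far each one sits from the two bounds.

```python
import math


def arithmetic_mean(low: float, high: float) -> float:
    return (low + high) / 2


def geometric_mean(low: float, high: float) -> float:
    """Square root of the product of the two bounds."""
    return math.sqrt(low * high)


def ratios(guess: float, low: float, high: float) -> tuple[float, float]:
    """How many times the guess exceeds the lower bound, and what fraction of the upper bound it is."""
    return guess / low, guess / high


for name, mean in [("arithmetic", arithmetic_mean), ("geometric", geometric_mean)]:
    g = mean(10, 100)
    above_low, of_high = ratios(g, 10, 100)
    print(f"{name} mean {g:.1f}: {above_low:.2f} times the lower bound, {of_high:.2f} of the upper bound")

# Rainfall bounds from the text: 10 and 1000 inches.
print(geometric_mean(10, 1000))  # 100.0
```

The geometric mean g satisfies g / low = high / g, which is exactly the "same factor above and below" balance the arithmetic mean lacks.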

The connection with Benford’s Law is this: suppose the number you are trying to estimate is from a pool of numbers whose logarithms are uniformly distributed on some interval. It’s a nice exercise to show that the median value of all such numbers between two given bounds is (drum roll….) the geometric mean of the two bounds.
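Here is one way to do the exercise. Take bounds a < b and suppose log10 X is uniformly distributed on [log10 a, log10 b]; restricting a log-uniform pool to the numbers between a and b leaves the logarithms uniform on that subinterval, so this covers the case above. The median m is the value with half the probability on each side:

```latex
\begin{aligned}
Y &= \log_{10} X \sim \mathrm{Uniform}\left[\log_{10} a,\ \log_{10} b\right],\\
\Pr(X \le m) &= \Pr(Y \le \log_{10} m) = \frac{\log_{10} m - \log_{10} a}{\log_{10} b - \log_{10} a} = \tfrac{1}{2}\\
\Longrightarrow\quad \log_{10} m &= \tfrac{1}{2}\left(\log_{10} a + \log_{10} b\right) = \log_{10}\sqrt{ab}\\
\Longrightarrow\quad m &= \sqrt{ab}.
\end{aligned}
```

The median of the logarithm is the midpoint of the two log-bounds, and undoing the logarithm turns that midpoint into the geometric mean.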

In short, Benford’s Law is a sign that the numbers that come to us from worldly sources tend to have uniformly distributed logarithms. If you are trying to estimate such a number and have guesses for its upper and lower bounds, then a good guess for the number itself is the geometric mean (square root of the product) of the two bounds.
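As a numerical sanity check, this Python sketch draws many numbers between the rainfall bounds of 10 and 1000 inches, assuming (as the argument does) that their logarithms are uniformly distributed; the sampling helper is a hypothetical name. The sample median should land near the geometric mean of 100, nowhere near the arithmetic mean of 505.

```python
import math
import random
import statistics


def sample_log_uniform(low: float, high: float, n: int, seed: int = 0) -> list[float]:
    """Draw n numbers between low and high whose base-10 logarithms are uniformly distributed."""
    rng = random.Random(seed)
    lo, hi = math.log10(low), math.log10(high)
    return [10 ** rng.uniform(lo, hi) for _ in range(n)]


low, high = 10, 1000  # rainfall bounds from the text, in inches
samples = sample_log_uniform(low, high, 100_001)
print("sample median:  ", round(statistics.median(samples), 1))
print("geometric mean: ", math.sqrt(low * high))
print("arithmetic mean:", (low + high) / 2)
```

The exact sample median depends on the random seed, but with this many draws it typically falls within a percent or so of 100.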