In case you haven’t heard, it’s Math Awareness Month, and the Mathematical Association of America (MAA) has chosen the Data Deluge that surrounds us all as the 2012 theme. Because this Big Data theme obviously involves statistics, the MAA has teamed with the American Statistical Association (ASA) for many of its events. One of those was the monthly lecture at the MAA’s Carriage House in Washington, D.C. (http://www.maa.org/dist-lecture/past-lectures.html). I was invited to talk about data mining, the area that’s been my focus for quite some time now. The audience, a mix of high school math and stat teachers, retired mathematicians and other math fans, provided a friendly atmosphere as I gave them a whirlwind tour of some of the methods and the kinds of problems that data mining can and can’t solve.
During the talk I confessed that until December I thought I knew the world of “Big” Data pretty well. That is, until I heard talks by two groups of physicists. The first group, from Switzerland, is hunting for the Higgs boson, the elusive “God particle” that so far has escaped detection. The other group, from Australia, is looking for evidence of gravitational waves. Both groups collect an enormous amount of data, from Petabytes (1 Petabyte is 1000 Terabytes) a day for the wave group to Petabytes per hour for the particle group! That’s a lot of data!
As I listened, I quickly changed the title of my upcoming talk from “Challenges of Big Data” to “Challenges of er… Medium Sized Data”, since even the kinds of data that I’m used to seeing, maybe 10 million customer records with 500 or 1000 variables on each, amount to “only” about 100 Gigabytes, a mere 1/10,000th of a Petabyte. As the physicists talked about their data and their searches, it occurred to me that both groups analyze massive amounts of data but, as of yet, haven’t found what they’re looking for!
When I questioned them on this, the particle group admitted that what I was saying was true, but one of the wave scientists got very excited and said, “No. No. We’ve seen the wave. We’ve seen it, definitely. We just can’t quite separate it from the noise yet.” I’m still thinking about that one…
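
For anyone who wants to check the back-of-the-envelope arithmetic behind my “Medium Sized Data”, here is a minimal sketch in Python. It assumes every variable is stored as an 8-byte number and uses decimal units (1 Petabyte = 1000 Terabytes = 10^15 bytes). Both are illustrative assumptions rather than details from the talk, and the function name `dataset_bytes` is made up for this example.

```python
# Rough storage estimate for a dense table of customer records.
BYTES_PER_VALUE = 8   # assumption: each variable stored as an 8-byte number
GIGABYTE = 10**9      # decimal units
PETABYTE = 10**15     # 1000 Terabytes


def dataset_bytes(records, variables, bytes_per_value=BYTES_PER_VALUE):
    """Raw size in bytes of a records-by-variables table (hypothetical helper)."""
    return records * variables * bytes_per_value


low = dataset_bytes(10_000_000, 500)    # 10 million records, 500 variables
high = dataset_bytes(10_000_000, 1000)  # 10 million records, 1000 variables

print(f"{low / GIGABYTE:.0f} to {high / GIGABYTE:.0f} Gigabytes")
print(f"100 Gigabytes as a fraction of a Petabyte: {100 * GIGABYTE / PETABYTE:.4f}")
```

Under these assumptions the raw table comes to tens of Gigabytes, so “about 100 Gigabytes” is the right order of magnitude once storage overhead is included, and it is still a tiny sliver of what the physicists collect in a single day.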

