Data-mine this!

(The term “data mining” is overloaded these days. People commonly use it to describe interactively exploring a data warehouse with OLAP, and it has become so popular that it is even used as a synonym for “data analysis”, regardless of method or context. In this post, when I say “data mining”, I mean specifically the application of machine learning and statistics to explore data sets in a discovery-driven (not hypothesis-driven) fashion, sometimes called “knowledge discovery in databases” (KDD). Anyway, moving on …)

When data mining is taught, one pedagogical example comes up again and again to illustrate how it can uncover unusual but useful trends:

Among items purchased at grocery stores, the item most frequently associated with the purchase of diapers is beer.

(That is, diapers and beer are commonly purchased together, like chips and salsa, or milk and cookies. One theory to explain this correlation is that new fathers, sent out on the diaper errand, reward themselves with beer because, as new fathers, they can’t go out drinking with their friends as often.)

Many variations of the story behind this “beer/diapers correlation” exist. The most common allegation seems to be that it came out of a Walmart market-basket analysis. Some variants claim that these beer/diapers purchases were made mainly on Friday afternoons by men between the ages of 25 and 35. Some versions go further and say that the store capitalized on the trend by putting a beer display next to the diapers, and sold more beer than ever.
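For the curious, here is roughly what “most frequently associated” means mechanically. This is a minimal Python sketch of the market-basket idea, assuming a tiny set of made-up shopping baskets; the data and the helper names `top_companion` and `rule_stats` are hypothetical, invented for illustration and not taken from any real analysis. It counts which item shows up most often alongside diapers, then computes the standard support, confidence, and lift measures for the rule {diapers} → {beer}.

```python
from collections import Counter

# Hypothetical toy baskets, invented for illustration only (NOT real sales data).
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"milk", "cookies"},
    {"chips", "salsa", "beer"},
    {"diapers", "beer", "wipes"},
]


def top_companion(baskets, item):
    """Return (other_item, count) for the item that co-occurs most often with `item`."""
    counts = Counter()
    for basket in baskets:
        if item in basket:
            # Count every other item in baskets that contain `item`.
            counts.update(basket - {item})
    return counts.most_common(1)[0] if counts else None


def rule_stats(baskets, antecedent, consequent):
    """Support, confidence, and lift for the rule {antecedent} -> {consequent}."""
    n = len(baskets)
    n_a = sum(1 for b in baskets if antecedent in b)
    n_c = sum(1 for b in baskets if consequent in b)
    n_ac = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = n_ac / n                               # fraction of all baskets with both items
    confidence = n_ac / n_a if n_a else 0.0          # P(consequent | antecedent)
    lift = confidence / (n_c / n) if n_c else 0.0    # confidence relative to consequent's base rate
    return support, confidence, lift


if __name__ == "__main__":
    print(top_companion(baskets, "diapers"))
    print(rule_stats(baskets, "diapers", "beer"))
```

On this toy data, beer turns up in three of the four diaper baskets (confidence 0.75), and a lift of roughly 1.1 says beer appears with diapers somewhat more often than its overall popularity alone would predict. Real market-basket analyses use algorithms such as Apriori to prune the search over itemsets instead of scanning every pair, but the measures are the same.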

Alas, many doubt this story’s veracity. (You can read more about the diapers/beer correlation and its likely origin here.) While this doesn’t stop it from being a good illustrative example, it does sadden me a little to think that such an interesting trend might not be real.

Anyway, I had an errand to run today. We ran out of cloth diapers, so I needed to go to the grocery store for an emergency pack of disposables to tide us over until the diaper service’s next delivery. Diapers were the only thing on the shopping list … could it be more obvious what needed to be done?

Mine this!

I say, if this isn’t an actual trend, let’s make one! New fathers everywhere: unite! Become a statistic! Always purchase beer with diapers!

(Plus, do you really need an excuse? I don’t know about you, but I don’t need sophisticated data mining techniques to learn: good beer + new dad = happy.)

