The parable of the beer and diapers

Never let the facts get in the way of a good story


Database myths and legends (Part 3)

BI (Business Intelligence) is about extracting information from data and data mining is an important part of that process. Data mining is a process that looks for patterns in data, so in a sense it is like querying the data. The crucial differences between simply querying the data and data mining can be summed up as intent and scale.

When humans query data we start with an idea, such as: "I think that we sell more DVDs to males than to females." And then we run a query to test the idea and the answer either confirms or disproves our hypothesis. A data mining algorithm doesn't have ideas. It has no intention of testing ideas for the simple reason that it doesn't have any.

What it does have is huge processing power at its disposal, so it simply tests a very large number of possible correlations. This can be done by firing a very large number of queries at the database, but that approach can be very slow.
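As a rough illustration of that brute-force idea, the sketch below scores every pair of items in a handful of shopping baskets without being told which pair to look at. The basket contents, the function name `mine_pairs`, the `min_count` threshold and the use of "lift" as the score are all assumptions made for this example; they are not taken from any particular data mining product.

```python
from collections import Counter
from itertools import combinations

# Toy baskets, invented purely for illustration (not real sales data).
BASKETS = [
    {"beer", "diapers", "crisps"},
    {"beer", "diapers"},
    {"gin", "tonic", "lemons"},
    {"gin", "tonic"},
    {"diapers", "milk"},
    {"beer", "crisps"},
    {"milk", "bread"},
    {"gin", "tonic", "bread"},
]


def mine_pairs(baskets, min_count=2):
    """Score every item pair by lift, with no prior hypothesis about which pairs matter."""
    n = len(baskets)
    # How many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # How many baskets contain each pair of items (pairs stored in sorted order).
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    results = []
    for (a, b), together in pair_counts.items():
        if together < min_count:
            continue  # too rare to say anything about
        # Lift above 1 means the pair turns up together more often than
        # it would if the two items were bought independently.
        lift = (together / n) / ((item_counts[a] / n) * (item_counts[b] / n))
        results.append((a, b, together, lift))
    return sorted(results, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    for a, b, together, lift in mine_pairs(BASKETS):
        print(f"{a} + {b}: bought together {together} times, lift {lift:.2f}")
```

Nobody asked this code about beer and diapers; the pair surfaces only because every combination was tested. With real data the number of combinations grows very quickly, which is one reason the naive approach is slow.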

The algorithms used nowadays often use more sophisticated approaches; they can, for example, create a multi-dimensional data structure and then examine it looking for patterns, and/or outliers. When they find something of interest, they flag it for attention.

It is easier to illustrate the difference between querying and data mining with a good example, and one already firmly enshrined in BI mythology is the "beer and diapers" story.

It goes (with minor variations) like this:

Some time ago, Wal-Mart decided to combine the data from its loyalty card system with that from its point of sale systems. The former provided Wal-Mart with demographic data about its customers, the latter told it where, when and what those customers bought. Once combined, the data was mined extensively and many correlations appeared. Some of these were obvious: people who buy gin are also likely to buy tonic. They often also buy lemons. However, one correlation stood out like a sore thumb because it was so unexpected.

On Friday afternoons, young American males who buy diapers (nappies) also have a predisposition to buy beer. No one had predicted that result, so no one would ever have even asked the question in the first place. Hence, this is an excellent example of the difference between data mining and querying.

The story goes on that, once the correlation was uncovered, it was easy to extrapolate back from the effect to the cause.

  • Young American males frequently indulge in ritualised carousing behaviour with friends on Friday nights.
  • Carousing usually involves the consumption of beer.
  • Most young American males only buy diapers after they have fathered offspring.
  • Offspring acquisition is a known carousing inhibitor.

So the proud new father is walking around the store on Friday afternoon. He knows there is no way that he is going to get out of the house to join his mates at the bar. However, there is nothing to stop him from drinking beer at home. All he needs is to be reminded of that fact. After seeing the results of the data mining, Wal-Mart moved the beer next to the diapers and beer sales went up.

Like all good myths, the beer and diapers story does have its origins in fact but, sadly, almost all of the detail (and specifically the detail that makes it a great BI story) is probably fabrication. We are all indebted to Daniel Power for uncovering the origins of the story. He provides all of the detail in his account of its history.

In short, he traced the story back much further than I would have believed, way back to 1992. At that time, Thomas Blischok was the manager of a group at Teradata. His group looked at point-of-sale data from Osco Drug stores, 1.2 million baskets' worth in all. It isn't clear what tools the team was using. They are described as "state-of-the-art, query generation tools", which were undoubtedly leading edge in 1992.

Those queries revealed that, between 5pm and 7pm, customers tended to co-purchase beer and diapers. No correlation with age or gender was established, although it isn't clear whether these questions were ever asked.
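To make the shape of that finding concrete, here is a minimal sketch of a time-sliced co-purchase check. It is not a reconstruction of what the Teradata group ran: the transactions, their timestamps and the function name `co_purchase_rate` are all invented for illustration.

```python
from datetime import datetime

# Hypothetical transactions as (timestamp, items). The dates and contents
# are arbitrary and invented for this example.
TRANSACTIONS = [
    (datetime(2000, 1, 7, 17, 30), {"beer", "diapers"}),
    (datetime(2000, 1, 7, 18, 10), {"beer", "diapers", "crisps"}),
    (datetime(2000, 1, 7, 18, 45), {"diapers"}),
    (datetime(2000, 1, 7, 11, 0), {"diapers", "milk"}),
    (datetime(2000, 1, 7, 12, 15), {"beer"}),
    (datetime(2000, 1, 7, 14, 20), {"diapers", "bread"}),
]


def co_purchase_rate(transactions, item, other, start_hour, end_hour):
    """Share of baskets containing `item`, bought in [start_hour, end_hour),
    that also contain `other`. Returns None if no such baskets exist."""
    in_window = [
        items
        for timestamp, items in transactions
        if start_hour <= timestamp.hour < end_hour and item in items
    ]
    if not in_window:
        return None
    return sum(other in items for items in in_window) / len(in_window)


if __name__ == "__main__":
    evening = co_purchase_rate(TRANSACTIONS, "diapers", "beer", 17, 19)
    earlier = co_purchase_rate(TRANSACTIONS, "diapers", "beer", 0, 17)
    print(f"5pm to 7pm: {evening:.0%} of diaper baskets include beer")
    print(f"before 5pm: {earlier:.0%} of diaper baskets include beer")
```

Note that this is still a query in the sense used earlier: someone has to decide to compare beer with diapers, and to slice by time of day, before running it.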

In addition, the store chain does not appear to have exploited the information by moving the products around. So we have a correlation between beer, diapers and time, but no correlations with age, gender or day. Worst of all, it is very much open to question whether the techniques used would qualify as data mining in its modern sense.

So, where does all that leave our beer and diapers story as an example of data mining? Well, I for one am prepared to accept that it didn't really happen in the way my grandfather described it to me as I sat upon his knee. Sigh.

However, that doesn't mean that we have to consign the story to Room 101 (for those readers outside the UK, also a popular TV comedy). The story is so popular because it is a good illustration of the difference between querying and data mining. The facts don't change that at all.

The image of a forlorn young man trudging round a supermarket on Friday night, his day suddenly brightened by the sight of a stack of beer sitting incongruously next to a pile of diapers, is somehow wonderfully compelling. And none of us would have asked that question in the first place.

There is no reason we cannot continue to use it as an illustrative story, as long as we are aware that it is simply an allegory or fable. In fact, given that it is a fable about a set of data, we could call it a table fable. ®



Biting the hand that feeds IT © 1998–2021