Published in TDAN.com January 2006
There is a famous, although perhaps apocryphal, story regarding data mining about beer and diapers. The story relates how some supermarket analyzed their product sales and noticed a correlation
between the purchase of beer and diapers on Friday afternoons and evenings. Noticing this correlation, the store management determined that young wives were sending their husbands off to the store
on Friday afternoons to pick up diapers, and while the husbands were there, they decided to buy their weekend refreshments at the same time. Because of the correlation, the store managers decided
to place beer next to the diapers on the shelves to encourage other young fathers toward the same behavior.
Whether or not the story is true, it is often used to demonstrate different aspects of the power of data mining, and there are several key elements to this story:
- The ability to use data mining to determine correlation between different operational transactions,
- The business acumen to identify root causes for the correlation, and most importantly,
- The ability to take action based on what has been learned.
While I am sure that there is some truth in the story, the thing that makes me wonder the most is how it perfectly reflects what is supposed to happen, which differs greatly from what usually
happens during a data analysis project. The first tip-off is that business managers are proactively looking to discover intelligence out of their data; the second is that it doesn’t discuss
the minimum 6-9 month lag between expressing the desire to analyze transactions and actually getting results, and the third is that there are managers poised and ready to actually take some action
based on what they learned.
However, the value of the story lies in its simplicity, as it conveys some very powerful messages. The first, and probably most important one, is the concept of the association rule, which
indicates a dependence of one set of attribute characteristics on the values of some other set of attribute characteristics, with some level of support and confidence. For example, we might
say that 20% of the time that someone buys diapers on a Friday afternoon, they also buy beer. The 20% is the confidence: the percentage of transactions containing the first set of items (diapers) that also contain the second (beer). The support
describes the percentage of the overall transactions in which the rule is observed, that is, in which both sets of items appear together.
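To make the two measures concrete, here is a minimal Python sketch that computes support and confidence for a "diapers implies beer" rule. The ten baskets are invented for illustration (they are not data from the story), and the helper names `support` and `confidence` are assumptions of this sketch rather than any particular tool's API.

```python
def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    items = set(items)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that
    also contain `consequent`."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    if n_antecedent == 0:
        return 0.0
    n_both = sum(1 for t in transactions if both <= set(t))
    return n_both / n_antecedent


# Hypothetical Friday-afternoon baskets, made up for this example.
transactions = [
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"diapers", "wipes"},
    {"diapers", "bread"},
    {"diapers", "juice"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"chips", "salsa"},
    {"milk", "eggs"},
]

rule_support = support(transactions, {"diapers", "beer"})
rule_confidence = confidence(transactions, {"diapers"}, {"beer"})
print(f"support={rule_support:.0%}, confidence={rule_confidence:.0%}")
```

In this toy data, five of the ten baskets contain diapers and one of those five also contains beer, so the rule has 20% confidence but only 10% support. A rule with high confidence and very low support may describe too few transactions to be worth acting on.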
Association rules are often used in an application called “market-basket analysis,” which is used to review how often certain events take place at the same time. The simplest example,
much like our beer and diapers story, is the market basket one uses at the supermarket. The object is to determine which products are purchased at the same time in order to exploit the correlation,
either by attempting to encourage the same behavior by making it easier to take place (e.g., moving the products together on the same shelf) or perhaps to prevent it from taking place, if
purchasing the correlated products is undesirable.
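A simple form of market-basket analysis can be sketched by counting how often each pair of products shares a basket and keeping only the pairs that clear chosen thresholds. The function `pair_rules`, its default thresholds, and the sample baskets below are all hypothetical, chosen for illustration; production tools typically use more scalable algorithms and handle item sets larger than pairs.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.4, min_confidence=0.5):
    """Return (antecedent, consequent, support, confidence) tuples for
    item pairs that meet both thresholds, strongest rules first."""
    n = len(baskets)
    if n == 0:
        return []

    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        supp = count / n
        if supp < min_support:
            continue
        # Each pair yields two candidate rules: a -> b and b -> a.
        for antecedent, consequent in ((a, b), (b, a)):
            conf = count / item_counts[antecedent]
            if conf >= min_confidence:
                rules.append((antecedent, consequent, supp, conf))

    # Highest confidence first, then highest support, then alphabetical.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical baskets, invented for this example.
baskets = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["diapers", "milk"],
    ["beer", "chips"],
    ["milk", "bread"],
]

rules = pair_rules(baskets)
for antecedent, consequent, supp, conf in rules:
    print(f"{antecedent} -> {consequent}: support={supp:.0%}, confidence={conf:.0%}")
```

Note that the direction of a rule matters: in this sample, every basket with chips also contains beer, but only two of the three baskets with beer contain chips. That asymmetry is what tells a store manager which product is driving the other.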
More generally, association rule mining is used to find events that take place at or near the same time. Other examples include network analysis (looking for sentinel network events that might be
precursors to network failures), attrition analysis (what activities lead up to a customer’s decision to end their affiliation with an organization), and even the evaluation of business decision
strategies, such as predicting the success of commercial real estate purchases, marketing channels, or advertising approaches. Association rule mining has also been deployed in text mining and
entity extraction applications to help determine guidelines for knowledge discovery.
But let’s quickly go back to the beer and diapers story. So far we have only touched on one of our key points – the ability to mine the data to discover the association rules. The two
important points not yet addressed involve not the technical side, but the business side. Without having a sound basis for cooperation between the technologists and the business clients, it is
almost impossible to achieve any kind of success using these approaches. First, there must be an agreement as to the value of organizational data and the potential for both discovering and
exploiting actionable knowledge, and this can only exist when sound data management principles are in place, including:
- A data quality management program, since without high quality data the ability to rely on discovered knowledge is severely limited;
- Sound and repeatable ROI models for data exploitation projects, since the business clients taking the chance on funding knowledge discovery should be assured of the potential to be gained from
taking that risk; and
- Nimbleness in taking action based on what is learned, since having the knowledge without being able to act on it is worse than not having the knowledge at all.
Realize that the value to be gained from a data mining program may be largely strategic during its early days. The early payoff is the ability to change the ways that
individuals within an organization work together, enhancing collaboration and setting up greater success as the program matures.
Copyright © 2006 Knowledge Integrity, Inc.