Most experienced data warehouse builders will tell you that data quality is a critical success factor for high return on investment. A data warehouse without quality data requirements, quality data
design, quality data ETL (extract, transform, load), quality data access, and so on is very often a failed data warehouse. I have often said that low-quality metadata in a decision
support environment can be more dangerous than no metadata at all because it promotes ill-informed (as compared to uninformed) decisions. When it comes to building a successful warehouse, data
quality is a necessity and not a luxury.
Then along comes Joe-clerk to foul everything up. Let me explain…
The other day (in the waning hours before the holidays) I stepped into one of my favorite stores that sells everything from electronics to appliances to computers to software to music (and many
items that cross the five boundaries). My mission was to buy a pair of speakers. The pair that I wanted was to arrive on a truck later in the day. On my way out I grabbed a soft drink out of the
cooler and went to the register to pay in cash.
At this particular store, the first question they ask at the checkout line is “What is your zip code?” I am used to this by now. This store knows exactly how many people on my street own
computers and the games that our kids (yeah right) play the most. This by itself is not frightening. It is the grocery store across the way that knows my food buying habits that I worry about (at
least they don’t sell beer in the supermarkets).
On this day, instead of asking for my zip code, Joe-clerk entered “15102”. Joe did not ask me for my zip code. I do not live in 15102. I asked Joe why he did that. He said the
zip code did not matter on small cash sales. He continued that “THEY” were only tracking the large sales (business rule #1: enter 15102 for all cash sales under $20.00? I think not!). Joe told me
that he was trained this way and his colleagues did this all the time as the “default” (I saw him hand-enter it). I did not have the time or reason to explain to the young guy that his company
had a reason for collecting this data. I left the store wondering about the people in 15102. I left the store wondering about what Joe’s actions meant to his company.
A large percentage of the traffic in this store this holiday season came from 15102. The people in 15102 bought a lot of pop and magazines. In fact, a high percentage of the people in 15102 bought
little of anything else. It was very evident that the people in 15102 preferred the “real thing” and spread their interests between game and music magazines. Repeatedly, the people who lived in
15102 just did not buy the expensive stuff.
Now let’s take a look into the future… The decision makers at this company look at the data and decide to stop conducting mass mailings to 15102 because such a high percentage of the people
from that zip code do not buy high-ticket items. Because 15102 is right next door to this store’s community, sales go down, the store closes, and posters are plastered all over the wooden
boards that cover the windows.
All of this because Joe entered my zip code as 15102? In reality, Joe and his friends messed up. No biggie! Right? Wrong! Joe’s company had just spent more than a million dollars on a data
warehouse. Joe’s company altered their front-line programs to collect the zip code because they use this information for their direct marketing campaigns. Joe and his friends cost themselves their
jobs and now I will have to shoot across town to find a similar store. End of story.
Embellished? Certainly! But you get the point.
Does your company ever make a mistake like this one? Do your systems provide default values that skew statistics? Do the people on the front-line understand why they are entering what they are
entering? Are there unwritten policies and rules for entering data? Is data quality an afterthought or a “document it now and we’ll take care of it later” issue? Does data quality get the
attention it requires? Answer these questions quietly to yourself because you may not like your responses.
Data quality is not something to be taken lightly. The quality of data and information from inception to intelligence is critical to the success of most companies. This means that Joe-clerk should
be taught the importance of his job (down to the data entry) in the grand scheme of corporate success. Joe-clerk’s work should be monitored and problems should be collected and corrected. There
are too many touch-points with data to let these controllable instances go by without notice and action.
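
What might that monitoring look like? Here is a minimal sketch in Python, not anything this store actually runs. It assumes each sale is a record with hypothetical fields "zip", "payment" and "amount", and it flags any zip code that accounts for a suspiciously large share of small cash sales. The $20.00 cutoff comes from Joe's unwritten rule; the 50 percent share threshold, the minimum sample size and the sample records are my own assumptions for illustration.

```python
from collections import Counter


def flag_suspect_defaults(sales, max_amount=20.00, share_threshold=0.5, min_count=5):
    """Return (zip_code, share) pairs that dominate small cash sales.

    A zip code is flagged when its share of cash sales under max_amount
    is at or above share_threshold. All field names and thresholds are
    illustrative assumptions, not a real point-of-sale schema.
    """
    small_cash_zips = [
        sale["zip"]
        for sale in sales
        if sale["payment"] == "cash" and sale["amount"] < max_amount
    ]
    total = len(small_cash_zips)
    if total < min_count:
        # Too few small cash sales to say anything meaningful.
        return []
    counts = Counter(small_cash_zips)
    return [
        (zip_code, count / total)
        for zip_code, count in counts.most_common()
        if count / total >= share_threshold
    ]


# Made-up sample data: four of six small cash sales carry the same zip code.
sample_sales = [
    {"zip": "15102", "payment": "cash", "amount": 1.25},
    {"zip": "15102", "payment": "cash", "amount": 4.99},
    {"zip": "15102", "payment": "cash", "amount": 2.50},
    {"zip": "15102", "payment": "cash", "amount": 6.00},
    {"zip": "15241", "payment": "cash", "amount": 3.75},
    {"zip": "15228", "payment": "cash", "amount": 1.25},
    {"zip": "15241", "payment": "card", "amount": 499.99},
]

if __name__ == "__main__":
    for zip_code, share in flag_suspect_defaults(sample_sales):
        print(f"Check zip {zip_code}: {share:.0%} of small cash sales")
```

A flag from a check like this proves nothing by itself; plenty of customers really may live in one zip code. It only tells someone where to go ask the clerks what they are actually keying in.
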
Perhaps I should think about giving the CIO at this company a call. I am thinking that the reason codes for returns and for customer complaints should be checked out too, and that
Joe-clerk probably needs some education in data quality.