Over the last 12 years, I have been involved in many customer data integration projects, a number of which have been data quality audits. Although it would be nice to think that most of them were
regularly scheduled ‘health checks’, the reality is that many were initiated only when an organization was convinced that poor data quality was negatively impacting its business — they just
didn’t know how much.
Some of the most damaging data quality problems I’ve witnessed were exposed during a data quality audit of the operational customer system of an international blue-chip company. Soon after
spending millions of dollars to implement its new system, the company suspected that there was a problem with its data quality. Unfortunately, those suspicions were correct. A rigorous data audit revealed that only 48% of the records within the new operational customer system were clean. That would be a crisis for any system, but it is a monumental crisis for an operational system, a world where error rates of half a percent can cost an organization millions.
Although some of these data audit projects eventually revealed that the project team simply overlooked the importance of ensuring quality data, in most cases the project teams had done their due
diligence in reviewing, evaluating and ultimately selecting data quality technologies.
When audits revealed low levels of data quality after such pains had been taken to select data quality tools to prevent the ‘garbage in, garbage out’ phenomenon, the project team couldn’t easily identify what had gone wrong. Interestingly, the failure point was often that the team didn’t follow through to put their data quality investment to work.
To avoid a similar mistake, it is important that any enterprise concerned with the integrity of its customer data review the eight critical components of a customer data management environment.
These are:
1. Business-driven accuracy definitions and thresholds
Accuracy can be defined in many different ways. What’s critical is that you make sure that your definitions are relevant and appropriate for your business and its intended use of the data.
Some data quality gurus will tell you that zero tolerance is the way to go. However, in an imperfect world of limited resources and budget constraints, it is important to recognize that there are
some accuracy targets that can add cost with virtually no ROI benefits.
For example, the best data quality audit result I’ve seen from a large operational system was an accuracy rate of 99.87% for a database of 30 million customers. An earlier audit of the data came
up with a score of 99.5%, also a very impressive rate. However, since the use of the data was for the call center, the organization invested significant time and effort to improve the rate to the
highest level possible.
The initial audit revealed that the offending 0.5% was due to notes that call center representatives had inserted within the customer record, notes such as “difficult customer” in the name line, something the company did not want read out loud during a customer interaction. These types of issues are not as simple to detect and repair as many other error conditions associated with
name and address data. In this case, the time and effort the company spent to improve its score from 99.5% to 99.87% was about the same as the time and effort it took to achieve the original
accuracy rate of 99.5%. The ROI of such an investment can legitimately be questioned, particularly since the likelihood of a customer service representative divulging this data to a customer is
comparatively low.
If the use of this data did not involve direct customer interaction, perhaps the company would have been satisfied with an accuracy rate of 99.5%, or perhaps not. But what is critical is to define
the relevant accuracy definitions for your organization and use.
Just as uses for the data vary, so do the cost implications of working with ill-considered accuracy definitions. If you have three Beth Miller customer records instead of one, you have two duplicate records, and the negative impact on your operations varies by purpose. If you are using this data for direct mail, you will have two wasted mailings. If you are using this data for sophisticated predictive modeling to decide which product to market to these three ‘different’ customers, then you have three fragmented profiles, and you will likely make three uninformed, perhaps even costly, decisions. On average, one duplicate record corresponds to 1.8 fragmented profiles, so a 6% duplication rate implies that more than 10% of these records are unfit for modeling.
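To make that arithmetic concrete, here is a trivial Python sketch using the averages quoted above (the variable names are mine):

```python
# Rough arithmetic behind the claim above: if each duplicate record
# corresponds, on average, to 1.8 fragmented profiles, a 6% duplication
# rate leaves more than 10% of records unfit for modeling.
duplication_rate = 0.06          # share of records that are duplicates
fragments_per_duplicate = 1.8    # average fragmented profiles per duplicate

unfit_share = duplication_rate * fragments_per_duplicate
print(f"Records unfit for modeling: {unfit_share:.1%}")  # -> 10.8%
```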
Also make sure that each definition has either revenue growth or a cost saving associated with it. Cost savings can be broadly defined to include ‘cost of doing business’ activities, such as compliance. For compliance applications, the cost saving might be the avoidance of penalties and fines, or of damage to the corporate brand.
Finally, these accuracy definitions and thresholds should be reviewed regularly to ensure that they keep pace with your changing business environment.
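One lightweight way to keep such definitions reviewable is to record each one alongside its threshold and its business rationale. The sketch below is purely illustrative; the definitions, thresholds and impact statements are invented examples, not recommendations:

```python
# Illustrative only: each accuracy definition carries a measurable threshold
# and the revenue or cost-savings rationale that justifies it.
accuracy_definitions = [
    {
        "name": "deliverable postal address",
        "threshold": 0.98,                     # minimum acceptable accuracy
        "impact": "cost savings: wasted mailings avoided",
    },
    {
        "name": "duplicate customer records",
        "threshold": 0.99,                     # 1 - maximum duplication rate
        "impact": "revenue growth: reliable predictive models",
    },
    {
        "name": "no free-text notes in name fields",
        "threshold": 0.995,
        "impact": "cost savings: avoided compliance penalties and brand damage",
    },
]
```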
2. Data investigation and analysis
It seems obvious that to plan for any successful journey, you need to know both where you are going and from where you are starting. Yet, many data integration projects fail because the team never
establishes a clear and accurate understanding of the starting point – the state of the legacy data.
A customer database is like a growing, evolving organism. Information is added, deleted and updated … structures, relationships and fields are altered to meet changing business and customer
information needs … and databases designed for one application are shared with others.
In an ideal world, all of these updates and adjustments would be documented and available for reference when a problem or question arises. But we all know that this usually isn’t the case, for a
variety of reasons. Unfortunately, problems that are hidden or patched-over in daily operations suddenly become major stumbling blocks during enterprise customer data integration.
With such unknowns, the result is often disastrous. In fact, studies show that 88% of all major data integration projects run over schedule or budget – or fail altogether. The main reason is the
lack of knowledge about the source data.
Without such knowledge, the mapping specifications for the integration must be based on assumptions, outdated information, and educated guesses. No one knows if the mapping specifications are
correct until the testing phase. If the test fails, the data analysts have to go back to the beginning, revise their assumptions, develop new specifications, and test again.
In major data integration projects, this trial-and-error process frequently takes up to 10 iterations before the integration is successful. The added cost can easily run into hundreds of thousands
of dollars — and seriously impact the ROI of the initiative that prompted the integration.
But the potential risk in terms of cost overruns is small compared to the possible impact of months of implementation delays. If the integration is related to a mission-critical CRM initiative, the
window of opportunity to achieve a competitive advantage may be lost. Organizations today simply don’t have the luxury of making a data integration mistake. They have to succeed the first time –
or be left far behind.
The increasing use of customer data across the enterprise is creating a serious dilemma. The more ways the data is used, the more valuable it becomes, but also the more susceptible it is to degradation. Entering into a major data integration initiative without first assessing the integrity of the database puts the project at substantial risk of failure.
The most precise, cost-efficient way to determine a database’s true content, quality, structure and relationships is with a data profiling system. Data profiling identifies the existing data
problems and, consequently, saves significant time, money and IT resources in making the necessary repairs before the integration takes place.
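As a rough illustration of what a column-level profile looks like, the Python sketch below computes fill rates, cardinality and the most common value per field. It assumes the legacy extract is available as a CSV (the file name is hypothetical) and is only a crude approximation of what a dedicated profiling system provides:

```python
import pandas as pd

def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Basic column-level profile: fill rate, distinct values, most common value."""
    rows = []
    for col in df.columns:
        series = df[col]
        rows.append({
            "column": col,
            "fill_rate": series.notna().mean(),        # share of non-null values
            "distinct": series.nunique(dropna=True),   # cardinality
            "top_value": series.mode(dropna=True).iloc[0] if series.notna().any() else None,
        })
    return pd.DataFrame(rows)

# Example: profile a legacy extract before writing mapping specifications.
legacy = pd.read_csv("legacy_customers.csv")   # hypothetical file name
print(profile_columns(legacy))
```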
3. Comprehensive conversion plans – and dress rehearsals
A successful conversion plan requires two dress rehearsals, but only one is usually done. The common test that most project teams conduct is one measuring volume performance using all of the data.
The forgotten test is the one that should be conducted much earlier in the project on statistically significant samples that are representative of the larger data set.
The results of such samples are necessary to establish acceptable thresholds for error rates. Working with manageable samples and the assistance of an automated review capability, it is possible to manually count errors in the records from ‘clean’ systems and establish a ‘passed error rate’. Only when the passed error rate for each definition is within acceptable thresholds should the volume performance test occur.
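In practice, that early rehearsal reduces to counting errors in a representative sample and comparing the observed rate with the threshold agreed in Step 1. A minimal sketch, with invented numbers:

```python
def passed_error_rate(sample_errors: int, sample_size: int) -> float:
    """Observed error rate in a manually reviewed, representative sample."""
    return sample_errors / sample_size

def ready_for_volume_test(sample_errors: int, sample_size: int,
                          max_error_rate: float) -> bool:
    # Proceed to the full-volume performance test only when the sample's
    # error rate for this definition sits inside the agreed threshold.
    return passed_error_rate(sample_errors, sample_size) <= max_error_rate

# Example: 1,000-record sample, 7 errors found, 1% threshold for this definition.
print(ready_for_volume_test(sample_errors=7, sample_size=1000, max_error_rate=0.01))
```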
There is a shortsighted bias against using review tools to perform manual review on error records. In an effort to avoid any sort of manual review, you can tune a scrubbing tool to automatically
define a name such as ‘John Francis Bank’ as an individual rather than an organization. However, the unintended result of such over-tuning may be that a record like ‘Chase Manhattan Bank’ is also treated as an individual, because ‘Chase’ can be read as a first name, usually male.
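A toy classifier makes the over-tuning risk concrete. The first-name and organization keyword lists below are invented for illustration and say nothing about how any particular scrubbing tool works internally:

```python
FIRST_NAMES = {"john", "chase", "beth"}         # illustrative first-name lexicon
ORG_KEYWORDS = {"bank", "inc", "ltd", "corp"}   # illustrative organization markers

def classify(name: str, prefer_individual_on_conflict: bool = False) -> str:
    """Toy rule showing how tuning toward individuals misfiles real organizations."""
    tokens = name.lower().split()
    looks_like_org = any(t in ORG_KEYWORDS for t in tokens)
    looks_like_person = bool(tokens) and tokens[0] in FIRST_NAMES
    if looks_like_org and looks_like_person:
        # The "tuned" behaviour that rescues 'John Francis Bank' ...
        return "individual" if prefer_individual_on_conflict else "organization"
    return "individual" if looks_like_person else "organization"

# ... also misfiles 'Chase Manhattan Bank', because 'Chase' reads as a first name.
print(classify("John Francis Bank", prefer_individual_on_conflict=True))     # individual
print(classify("Chase Manhattan Bank", prefer_individual_on_conflict=True))  # individual (wrong)
```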
By using a two-tier testing approach, you ensure that business representatives understand what decisions they need to make on data standards and de-duplication rules, and that those rules are based on a true understanding of the data.
4. Symmetrical update routines for batch and online worlds
For online data updates, organizations typically rely on vendor APIs built into their own software rather than on vendor batch tools.
In order to ensure the ongoing quality and consistency of your data, you must make sure that the business rules determined in the conversion testing (Step 3) are also incorporated into these online
update routines.
For example, if a call center operator adds a new customer, the online update routine should treat the record the same way the batch update process would.
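One way to keep the two worlds symmetrical is to route both the batch load and the online add through the same standardization rules. The sketch below is a simplified illustration of that idea, not a depiction of any vendor's API:

```python
def standardize(record: dict) -> dict:
    """Shared business rules, applied identically by batch loads and online adds."""
    return {
        "name": " ".join(record.get("name", "").split()).title(),
        "postcode": record.get("postcode", "").replace(" ", "").upper(),
    }

def add_customer_online(record: dict, store: list) -> None:
    # The call-center path uses the same rules as the overnight batch path,
    # so the same input produces the same stored record either way.
    store.append(standardize(record))

def add_customers_batch(records: list, store: list) -> None:
    for record in records:
        store.append(standardize(record))

store: list = []
add_customer_online({"name": "  beth   miller ", "postcode": "sw1a 1aa"}, store)
add_customers_batch([{"name": "BETH MILLER", "postcode": "SW1A1AA"}], store)
print(store)   # both paths yield the same standardized form
```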
5. Preventative Maintenance
No matter how good a job you do when building the system, no matter how good the update functions, data will degrade over time.
Why? Even with the best software in the world, 1,000 online operators will not make completely consistent decisions. And without manual intervention, overnight batch routines tend to add duplicates
rather than risk deleting customers in error.
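That conservatism usually comes down to where the match thresholds sit. A minimal sketch, with invented scores and cut-offs, of why unattended routines err toward keeping records separate or queueing them for review:

```python
def route_match(score: float, auto_merge_at: float = 0.95, review_at: float = 0.80) -> str:
    """Conservative duplicate handling: merge only near-certain matches,
    queue the grey area for human review rather than deleting in error."""
    if score >= auto_merge_at:
        return "merge"
    if score >= review_at:
        return "manual review"
    return "keep separate"   # the default that quietly accumulates duplicates

print(route_match(0.97), route_match(0.85), route_match(0.60))
```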
It is important to build the preventative maintenance process into the system at the time the system is built; otherwise it will likely not be added until several years into the project, when a C-level executive realizes that the data is poor and begins to ask questions about the wasted time, money and effort that went into building and maintaining an inaccurate system. Integrating new processes and software after “go-live” is always much more expensive than addressing the issues early on.
6. Data Audits
If you can’t measure it, you can’t manage it. It is amazing how many organizations pay lip service to the value and importance of their customer systems, such as CRM, yet have not conducted any quantifiable internal or external data audit in years, if ever. Worse yet, many organizations expend significant resources on data integrity, yet fail to monitor the efficacy of those investments over time.
Audits should be conducted regularly against the accuracy targets and thresholds identified in Step 1. They can help to determine what periodic maintenance is required, and can also be used as
input to improve update routines and staff training.
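An audit can be as simple as re-measuring each definition from Step 1 and flagging those that have slipped below their thresholds. A minimal sketch, with invented measurements:

```python
# Minimal audit check: compare measured accuracy against the Step 1 thresholds.
thresholds = {"deliverable postal address": 0.98, "duplicate customer records": 0.99}
measured   = {"deliverable postal address": 0.992, "duplicate customer records": 0.973}

for definition, target in thresholds.items():
    status = "OK" if measured[definition] >= target else "BELOW THRESHOLD"
    print(f"{definition}: measured {measured[definition]:.1%} vs target {target:.1%} -> {status}")
```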
7. Enhance customer data
You should always be looking for ways to improve your customer information — whether by appending useful demographic information to expand your understanding … by building and monitoring
traditional households … or by creating extended “network households” based on some other customer grouping criteria.
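A traditional household build often starts with nothing more than a grouping key. The sketch below keys on surname plus a normalized address, which is one simplistic criterion among many; real household rules are considerably richer:

```python
from collections import defaultdict

def household_key(record: dict) -> tuple:
    """Simplistic traditional-household key: surname plus normalized address."""
    surname = record["name"].split()[-1].lower()
    address = " ".join(record["address"].lower().split())
    return (surname, address)

customers = [
    {"name": "Beth Miller",  "address": "12 Oak Lane"},
    {"name": "James Miller", "address": "12  Oak  Lane"},
    {"name": "Beth Miller",  "address": "4 Elm Street"},
]

households = defaultdict(list)
for c in customers:
    households[household_key(c)].append(c["name"])
print(dict(households))   # the two Millers at Oak Lane form one household
```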
Keep in mind that the ‘R’ in ‘CRM’ should be seen as a multi-dimensional concept that encompasses more than just the direct relationship that exists between a buyer and a seller. Relationships
between buyers can be significant factors in determining appropriate pricing and service levels for customers and suppliers alike.
Just as the business rules built into the online and batch update functions should be consistent, any data augmentation or enhancement strategy should mirror your established rules and definitions.
Remember that data augmentation can provide priceless insight into your customers’ behavior patterns, but it can also distort the validity of customer models through the use of inaccurate or
incorrectly applied information.
8. Data Stewardship
The data management model and steps outlined in this article may seem fundamental and simple, but they are not easy to implement.
Forgetting about data integrity is easy once a major project has been completed. Yet it is imperative that a data ownership and oversight function be established at the outset of a major project. Such a group should be charged with helping to define the business-driven accuracy definitions and thresholds, ensuring that audits and periodic maintenance happen, and making sure that processes are continually improved.
Maintaining such a data oversight function, with a direct report to the executive level, typically occurs only when data quality is measured and tracked in terms of ROI and cost savings.
What’s Next …
Now that we’ve set the necessary foundation for our customer data integration projects with a closed-loop data management environment, we can move on to the core of most customer data integration
project failures: poor data quality.
The next article in this series will address this high-profile but often low-budget, low-priority issue. It will provide ways to score your organization’s true commitment to data quality, and practical steps for obtaining real executive buy-in to fund this important work.