Getting to great data quality need not be a blood sport! But there must be a method to the madness, and you need to be persistent and properly involve relevant stakeholders to achieve long-lasting improvements. This article aims to provide some practical insights gained from enterprise master data quality projects undertaken over the past ten years at global medical device and pharma companies.
Diverse projects included company acquisitions and spinoffs, warehouse network consolidations, ERP implementations, and other digitalization initiatives requiring foundations in master data management. I hope that some of these observations will resonate with fellow business analysts, data stewards, business function leaders, ERP systems program managers, MDM managers, internal and external business and technology consultants, and any other readers of TDAN.com intrigued by the subject matter. I also hope that you’ll be inspired by some of the graphics, which you might find useful for conveying these ideas in your own presentations.
More Carrot, Less Stick
Since my background is in regulated, medical-related industries, it doesn’t come as a surprise to me that most data quality initiatives are still initiated out of fear of non-compliance, or rather of the consequences of being caught out as non-compliant. Yet, if we only spend our time patching up broken processes and keeping the wolf from the door, we can’t expect to unlock competitive advantage from our data. It’s time for organizations to move beyond compliance as the traditional stick used to prod people into taking data quality seriously. I have found the sporting analogy of ‘defense and offense’ particularly useful when trying to get these concepts across at all levels of an organization. Most people understand that defense involves protecting what’s important, upholding laws, safeguarding reputation, and avoiding situations that can get us into trouble and result in fines. The offense side of data quality hasn’t received as much attention, largely because it has been somewhat intangible, or perhaps because we lacked the means to quantify it directly.
We need to think in terms of:
- How much quicker could we process products in our distribution network if the item master was accurate?
- How many more sales can we expect because we have defined and described products properly in online catalogs?
- How much sooner can we integrate an acquired company because we have established and repeatable methods of analyzing data quality, and the capabilities to onboard data into our systems?
Whatever your specific reasons, I recommend summarizing them on one slide, similar to the example provided below.
Turn Data Quality Knowledge Into Action
Another thing I’ve learned over the years is to invest in myself and to take personal control of improving my data quality-related skills, as well as the soft skills needed to manage stakeholders and run projects. I recommend seeking out training opportunities regularly, always with the intent of putting into practice what you have learned. If you are looking for ideas, there are plenty of helpful people participating in data forums on LinkedIn, or you can ask around at conferences such as EDW (Enterprise Data World), DGIQ (Data Governance and Information Quality Conference), or more specialized online events such as the Master Data Marathon. Also sign up for focused newsletters such as TDAN – The Data Administration Newsletter.
If you are a manager, I recommend you take a regular skills inventory of your team to figure out development needs. Or seek some consulting help to do so. Most team members will be glad you did, while others will need gentle nudging. In these times of the so-called ‘great resignation,’ putting your money where your mouth is in terms of training budgets will help retain and attract staff who value lifelong learning.
WIIFT (What’s in It for Them)
WIIFT (What’s in It for Them) is a play on the more common term WIIFM (What’s in It for Me). Let’s face it, someone is footing the bill for the effort to improve data quality. They need to know that they are getting a decent return on investment (ROI) for the money they are pumping into salaries, consulting fees, and potentially software to enable data quality improvement initiatives. Executives don’t need to know all about your impressive counts of data records cleansed or merged, but by all means, produce these stats when appropriate (or asked). Instead, focus front-and-center on the improved business outcomes made tangible by achieving greater levels of data quality. Most managers will listen if you can help them:
- Reach the same or better levels of quality with fewer controls (fewer people involved, fewer secondary checks).
- Avoid scrambling and rework by having the right data maintained at the right time, before it is needed in a business process.
- Trust reports because of confidence in the underlying data.
Data Quality Specifications
Obtaining current and relevant data specifications can be a time-consuming affair. Yet we need to know what ‘good’ is supposed to look like. Where specifications don’t exist (or can’t be found), we may have to (re)write them before continuing with any planned data profiling exercises. The task will be much easier if the organization has already implemented some level of data governance, whereby these artifacts may be both available and actively curated. If Enterprise Architecture is a well-established discipline in your organization, there is a good chance your EA people will know where to find what you need as prerequisites. Befriend them! And whatever you do, don’t discount legacy system DBAs (Database Administrators), who usually have a wealth of knowledge to mine.
While my personal style is to ask to interview current business and IT SMEs anyway, it’s imperative to gauge early on the level of available (and valuable) documentation around data specifications. This advice goes for internal teams, who need to work out the actual amount of work that lies ahead and then set management expectations. It applies equally to data consultants, who would be wise to get a feel for the state of this topic in a potential client organization before committing to assist with a cleanup project.
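Where at least a basic specification exists (or has just been rewritten), even a lightweight profiling pass can show how far the current data is from it. The sketch below is a minimal, hypothetical example in Python using pandas; the field names and rules are assumptions for illustration only, not tied to any particular ERP or specification format.

```python
import pandas as pd

# Hypothetical specification: field -> rules (names and rules are illustrative)
spec = {
    "material_number":   {"required": True},
    "base_unit":         {"required": True, "allowed": {"EA", "BOX", "KG"}},
    "gross_weight_kg":   {"required": True, "min": 0.001},
    "storage_condition": {"required": False, "allowed": {"AMBIENT", "COLD_CHAIN"}},
}

def profile(df: pd.DataFrame, spec: dict) -> pd.DataFrame:
    """Report completeness and rule conformance for each specified field."""
    rows = []
    for field, rules in spec.items():
        col = df.get(field)
        if col is None:
            rows.append({"field": field, "present_in_extract": False})
            continue
        filled = col.notna() & (col.astype(str).str.strip() != "")
        row = {"field": field, "present_in_extract": True,
               "pct_filled": round(100 * filled.mean(), 1)}
        if "allowed" in rules:
            row["pct_valid_values"] = round(100 * col[filled].isin(rules["allowed"]).mean(), 1)
        if "min" in rules:
            numeric = pd.to_numeric(col[filled], errors="coerce")
            row["pct_above_min"] = round(100 * (numeric >= rules["min"]).mean(), 1)
        rows.append(row)
    return pd.DataFrame(rows)

# Usage: profile(pd.read_csv("item_master_extract.csv"), spec)
```

A report like this is also a handy conversation starter with SMEs: it shows where the data and the written specification already disagree before any cleansing begins.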
Perfection is a Roadblock to Progress
People enthusiastic about data quality tend to be perfectionists, yet great can be the enemy of the good (enough). While you should be glad to have such folk, we need to be mindful that quality has a cost. There is a point beyond which the cost of maintaining a level of perfection outweighs the incremental gains in data quality. Take the item master in an ERP system: several hundred fields exist, yet they are not all equally important, and their relevance depends significantly on your business and process configuration. One way to focus data quality improvement efforts is to spend time defining CDEs (Critical Data Elements), socializing them among stakeholders, deciding what quality level is acceptable for each, and finally choosing which should be tackled first for improvement.
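To make the CDE discussion concrete, it can help to keep a simple CDE register with an agreed target quality level per element and rank the gaps against current measurements. The sketch below is a minimal illustration; the CDE names, owners, and targets are purely hypothetical and would need to be agreed with stakeholders.

```python
# Hypothetical CDE register: field, agreed target completeness (%), and owner.
cde_register = [
    {"field": "material_type",     "target_pct": 100.0, "owner": "Supply Chain"},
    {"field": "product_hierarchy", "target_pct": 98.0,  "owner": "Commercial"},
    {"field": "country_of_origin", "target_pct": 95.0,  "owner": "Trade Compliance"},
]

def rank_cde_gaps(measured: dict, register: list) -> list:
    """Compare measured completeness (%) per CDE to its agreed target
    and return the CDEs with the largest shortfall first."""
    gaps = []
    for cde in register:
        actual = measured.get(cde["field"], 0.0)
        shortfall = max(0.0, cde["target_pct"] - actual)
        gaps.append({**cde, "actual_pct": actual, "shortfall": round(shortfall, 1)})
    return sorted(gaps, key=lambda g: g["shortfall"], reverse=True)

# Usage with completeness figures taken from a profiling run:
print(rank_cde_gaps({"material_type": 99.2, "product_hierarchy": 87.5}, cde_register))
```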
How Can We Better Control Our Test Data?
One sure-fire way to improve test performance is to have dedicated business-savvy data quality resources involved early on in software development and validation projects, and then throughout as necessary.
I have participated in several ERP projects where a small band of data quality warriors supplied top-quality material master test data. To get this right, you need at least some SMEs who not only know the data inside out but are also well versed in the product and service supply chain. I can’t stress enough how often this is overlooked. I’m not suggesting that we handpick perfect data merely to ensure tests pass; rather, we need to know how the data ought to look and fit the new business processes. This can only be secured through involvement in process design and stage-gate reviews.
Some project managers appear to think that ‘data people’ should stay in their lane and leave business process design to functional consultants and their assigned Business Process Owners. I beg to differ. It’s a partnership, because knowledge acquired about the to-be process can inform any data cleansing that needs to take place. I can recall several situations where we discovered that a field that had little significance in a legacy application would now become a primary driver for revenue recognition processes in the new global ERP system. What previously caused only minor reporting errors when incorrect could now impact quarterly financial results.
Business Process Workarounds
We should not be entirely inflexible about business process workarounds. After all, the alternative can’t be to do nothing and sit on our hands like spoiled children until we get what we want. Workarounds are effective when they are a temporary Band-Aid, such as a short-term solution for a few weeks after a system GoLive, with a defined target date for a more permanent solution and the follow-up to make it happen. In master data management terms, an example could be that a new field was introduced into the material master of the new system. Data migration took care of filling it pre-GoLive, but when the first new materials are created, we quickly find that the field is left blank, yet it is critical to a business process. The root cause is that during the build phase, no group was assigned responsibility for maintaining the field going forward. Somehow a discussion never happened between the functional team who added the field and the master data governance team, who should have known about it, agreed a RACI with the appropriate business function, and embedded the new duties into existing procedures. The workaround in this case might be that someone on the project team is temporarily assigned to maintain the field for up to six weeks, pending clarification of who will ultimately be responsible operationally in the business.
Cost-Benefit Analysis
One time, I created a very elaborate cost-benefit analysis using a custom form that I developed specifically for a data quality project I was trying to get funded. It was very pretty and comprehensive, but it took way too long. I needn’t have spent that time had I just spoken sooner to our department controller. Not only did a form already exist, but it was mandatory to use it. Any request sent in without it would get rejected at the first hurdle, no matter how elaborately compelling it might otherwise have been!
Most firms of any size will have such a form. You might have to ask a few questions to get to the right form because it might be called something else. Some examples I have seen over the years are:
- Business Improvement Request (BIR) Form
- CapEx (Capital Expenditure) Form
- Continuous Improvement Idea Form
- Business Case Identification Form
Much More Than “Sign Here and You Are Done”
It takes more than ‘sign here, and you are done’ training to embed a data quality ethos. There is immense potential to reassess the way we deliver and assess training for those who need to maintain accurate data. It is no longer appropriate or reasonable to assume that tracking evidence of having read procedures is sufficient to deem someone trained, or even minimally capable. A quiz with a few trick questions doesn’t prove proficiency either; it merely proves that someone has learned to use the ‘back’ button in their web browser. Don’t get me wrong, I have worked most of my career to date in GMP environments, and I understand that an audit trail is important and non-negotiable.
A better approach may require a higher investment in permanent, coach-like resources who act as guides rather than auditors. These same resources can add further value by staying close to the operational action, where they are better placed to identify whether job aids and simulated, scenario-based training meet ongoing needs.
CSI (Crime Scene Investigation): Data Quality
Over the years, I have learned to be more patient at the scene of ‘data crimes.’ While there is a certain satisfaction in making things right as quickly as possible via expert data cleansing, there is more to be gained by taking the time to understand how the data got to be wrong in the first place. For sure, there are occasions when immediate correction is justifiable, e.g., the CEO wants the newly launched product to be shipped and billable. It would likely be career-limiting to hold up the show until you are finished dusting for prints! But in many cases, the data has been wrong for a while, and it can remain wrong for another few days or weeks until you have finished doing your forensics, interviewing the ‘suspects,’ and developing a theory about cause and motive. I keep a growing OneNote notebook full of ‘data disasters,’ which one day might be enough to fill a book, in which I’ll need to change names and dates to protect the guilty.
Some all too typical master data disasters include:
- Obsolete products available in online catalogues
- Product descriptions not at all related to the actual item
- Downright inappropriate product descriptions (printed on customer invoices)!
- Products blocked for sale in authorized markets
- Products not blocked for sale in unauthorized markets
- Batch management not configured for batch managed materials (and then a recall)
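Many of these disasters lend themselves to simple, automatable checks run regularly against item master and catalog extracts. Below is a hypothetical sketch of a few such rules; the field names and record layout are assumptions for illustration and not tied to any particular system.

```python
# Hypothetical item-master record; all field names are illustrative only.
sample_item = {
    "material": "100234",
    "status": "OBSOLETE",            # lifecycle status in the item master
    "in_online_catalog": True,       # still published in the online catalog
    "authorized_markets": {"DE", "FR"},
    "sales_blocked_markets": {"FR"},
    "requires_batch_management": True,
    "batch_management_flag": False,
}

def check_item(item: dict) -> list:
    """Return human-readable findings for a single item-master record."""
    findings = []
    if item["status"] == "OBSOLETE" and item["in_online_catalog"]:
        findings.append("Obsolete product still visible in the online catalog")
    blocked_but_authorized = item["authorized_markets"] & item["sales_blocked_markets"]
    if blocked_but_authorized:
        findings.append(f"Blocked for sale in authorized markets: {sorted(blocked_but_authorized)}")
    if item["requires_batch_management"] and not item["batch_management_flag"]:
        findings.append("Batch-managed material without batch management configured")
    return findings

print(check_item(sample_item))
```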
Data Not Yet Known
Data quality issues are often found when the technology requires data, but the business does not have the information available when the record needs to be created or updated. Many ERP systems require that mandatory product master data fields be filled before the record can progress to the next stage of setup or use. However, depending on PLM (Product Lifecycle Management) stage gates, we might need to assign and partially set up a material number today for a finished good that may not see the light of day commercially for months, or even years. In such scenarios, it may well be that information needed later is simply not known yet. Examples could be operational details like product storage and handling requirements, or report-enabling data such as product hierarchy assignments.
My experience is that pushing back on the business and expecting them to know the information sooner to feed the system is unrealistic and sucks you into all kinds of stressful arguments that are good for neither the business nor your career. A better approach is recognizing that certain data can and will become better defined later in the setup/extension cycle. With this realization, you should build workflows and ‘catch’ mechanisms to make the necessary corrections closer to when they matter most. For example, at the latest before you need to ship a product for the first time, you had better know how it should be stored and handled in transit.
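One way to build such a ‘catch’ mechanism is to tie mandatory fields to lifecycle stages, so a record can be created early with what is known and the remaining fields are enforced only at the stage where they actually matter. The sketch below is a minimal illustration; the stage names and field lists are assumptions, not a prescription.

```python
# Hypothetical stage-gated requirements: a field only becomes mandatory
# at the lifecycle stage where it is actually needed.
REQUIRED_BY_STAGE = {
    "CREATED":       ["material_number", "description"],
    "READY_TO_PLAN": ["material_number", "description", "product_hierarchy"],
    "READY_TO_SHIP": ["material_number", "description", "product_hierarchy",
                      "storage_condition", "handling_instructions"],
}

def missing_for_stage(record: dict, stage: str) -> list:
    """List the fields that must be filled before the record may enter `stage`."""
    required = REQUIRED_BY_STAGE.get(stage, [])
    return [f for f in required if not str(record.get(f) or "").strip()]

# Usage: block the first shipment until late-arriving data has been caught up.
new_material = {"material_number": "100777", "description": "Sterile kit, EU"}
print(missing_for_stage(new_material, "READY_TO_SHIP"))
# -> ['product_hierarchy', 'storage_condition', 'handling_instructions']
```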
Credible Sources For Verifying Data
Different systems in a company may each have pockets of data quality excellence, but it can be dangerous to pronounce any one of them the overall ‘book of record’ or ‘golden record.’
At one company, I discovered significant discrepancies between the finished good product descriptions used in the European regional sales and service ERP systems and what I found in the headquarters’ global manufacturing system. After many weeks of analysis, we learned that the data admins of the regional system had changed the descriptions after complaints over several years that the wording used for over 900 products was not sufficiently similar to what appeared on the actual physical product box labels. Ultimately, we used the finding to update all systems. Yet the fix wasn’t simple. Since the original ‘corrections’ had been made in response to direct complaints about certain product lines and country kits, only those were updated, leaving many related records inconsistent.
I use this example not to show that good intentions can make things worse, but rather to highlight the need for companywide forums to raise data quality issues, mechanisms for making decisions so that holistic corrections can occur, and ways to integrate findings back into data maintenance processes so that data created going forward is better.
Downstream Effects … of Mass Corrections
I once learned the lesson that even when data cleansing updates need to be made, there may be good reasons why they should not be made immediately, but instead be held and applied on an agreed date.
My example pertained to mass updates to several hundred item masters to get products reassigned to the ‘correct’ product line. The updates were carried out in mid-December to wrap up a data quality improvement mini-project for a business unit rearranging its product line structure. Luckily, I had a good relationship with the relevant Business Performance Controller, because she tipped me off on December 23rd that something looked very wrong in her numbers and that she was wondering how a whole product line could disappear overnight. It turned out that financial reporting for the product line was supposed to stay as-is until the close of the fourth quarter. While she had a workaround to move values from left to right in the reporting, it would be better if the sources were correct, to avoid audit questions. All I can say is that it was a busy Christmas Eve. I had to log in and manually revert the changes because there was no time to request access to the mass update tool.
My key learning was to ensure you get buy-in from key stakeholders when making mass changes to master data. This advice applies not only to the content of the data (which, in this case, had been obtained), but also to the timing of the updates. It might be perfectly reasonable (essential even) to make immediate updates to operational master data, e.g., weights, storage conditions, etc., but for finance-relevant data, where consistency is required within a given reporting period, there may be reasons why updates need to be scheduled.
Please note that this article is a streamlined version of an extended LinkedIn post from March 10, 2022, in which I wrote about these topics in the context of a book review of Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information™, 2nd Ed. (Elsevier/Academic Press, 2021) by Danette McGilvray.