Everyone talks about the importance of data quality, but no one does anything about it. Not anything much, that is. Now, have I offended you? Are you thinking about thousands of lines of ETL scripts with quality rules? About the data quality tools your organization has purchased? About a proof-of-concept dashboard monitoring a few key applications?
Well, that may be the case; many organizations have implemented data quality improvements at the project level. However, it is likely that your organization has failed to address data quality with an approach that has a solid chance of both significantly improving quality and sustaining those improvements. More typical is ‘do what you’ve always done, get what you’ve always gotten’: project sponsors and program managers are left to their own devices, and funding for quality improvements depends on discrete business lines and is aimed at the applications they sponsor.
Taking a step back, I often refer to the ‘Big Three’ pillars of Data Management: Architecture, Governance, and Quality. An organization must have the ambition, will, and resources to succeed at all three. All three pillars are critical to achieving a flexible, well-organized, governed data layer that delivers trusted, timely, and accurate data for operational decisions, reporting, and predictive modeling. And success cannot be achieved without an enterprise-wide focus, which implies, at a minimum, that:
- Data is understood to be a critical infrastructure asset
- Executives agree that data is vitally important
- Line-of-business staff accept that they own the data they create and manage
- The organization understands how data should support the business strategy
Just as Diogenes wandered the earth with his lamp looking for an honest man, when I begin to work with an organization, I look for signs that it has synthesized a vision for data. Vision = “the ability to think about or plan the future with imagination or wisdom.”
As we know, most organizations suffer from a disorganized data layer that grew by ad hoc accretion, data store by data store, over the decades. For example, data warehouse repositories built in the 90s typically morph into Godzilla-sized monsters of redundant data with hundreds of incoming and outgoing interfaces, mostly undocumented. Staff data gurus often store tribal knowledge about what data means and where it is in their heads – when they retire, the organization scrambles, suddenly bereft of their oral tradition.
If an organization is lacking in vision, perceiving its data assets “through a glass darkly,” it is impossible to craft a clear and practical path to the future. The formula that works is: Imagination + Thinking + Planning + Wisdom. First, the organization must be confident that it can answer the question ‘What does good look like?’ Good, from the standpoint of the business strategy, might be ‘a seamless customer experience in navigating easily and logically through our product offerings.’ It might be ‘reduce the time to permitting from three months to two weeks.’ It might be ‘launch the new global product line in Europe.’ There are countless examples of business strategy priorities in different industries, and an organization may have many on its wish list. Once data is added to the picture, by answering the question ‘What data do we need to achieve (the Good)?’, the organization can articulate the key drivers for its data management strategy (DMS), addressing high-level governance, architecture, and quality objectives. For this column, we’re going to assume that the organization has established an overall data management strategy, and take the next step, focusing on data quality.
In the Data Management Maturity Model, a practical and precise capability and maturity measurement framework, the first of four Process Areas in the Data Quality category is Data Quality Strategy (DQS). We differentiate the DQS from the overall data management strategy and program because:
- It represents a further decomposition of the DMS, appropriate due to the vital importance of quality data
- It requires analysis of the current state of selected elements (subject areas and physical components) of the data layer
- It is highly dependent on analysis and issue identification from the lines of business
- It presumes operationalized data governance
A Data Quality Strategy captures business goals, objectives, data scope, roles, specific initiatives, and sustained activities to improve data integrity, accuracy, and trustworthiness. Its purpose is to establish and embed a data quality program, a commitment to a persistent, sustainable focus on data quality.
Let’s look at the connections and dependencies between the DQS and other data management processes. The diagram below illustrates the key relationships, and we’ll walk through them briefly.
- Data Management Strategy – the DQS is dependent on the DMS and the business strategy, which answer WHY the organization wants to improve data quality
- Data Management Function – the data management organization (DMO) is typically responsible for data quality policies, processes and standards
- Governance Management – governance participants develop quality rules and assess quality needs
- Metadata Management – quality rules are developed for business terms and data elements and should be linked to them in the metadata repository
- Data Quality Assessment – the lines of business own the data they create and manage, and assess the level of quality needed to conduct business processes and make business decisions, referenced in the DQS
- Data Profiling – the determination of which data sets to profile, and the methods, toolsets, techniques, and frequency of profiling, are described in the DQS
- Data Cleansing – cleansing and improvement of data, root cause analysis, and cleansing of shared data at or near the source are described in the DQS
- Data Requirements – requirements for data needed to perform system functions are related to business terms, data elements, and quality rules
- Data Provider Management – quality requirements for data acquired internally and externally are captured in interface documentation, contracts, and Service Level Agreements
- Data Standards – data quality standards are referenced in the DQS, and reflected in policy and standards documents; data representation and design standards contribute to quality improvements
- Data Integration – data quality rules applied to data are essential for integrated repositories, views, and reports; often an organization will concentrate on a shared repository in the initial launch of data quality processes and standards, for instance, when implementing a data lake.
The Data Quality Strategy directly and indirectly underlies many important data management activities. It represents consensus-based overall guidance on what the organization plans to do, and on the program it is establishing to build and maintain a ‘quality culture.’
I’m going to outline the process of creating a Data Quality Strategy at the enterprise level, across multiple lines of business. However, this approach can also easily be applied to a narrower scope, such as shared repositories (e.g., data lake, data warehouse, master data store) or operational data stores owned by a business line.
The diagram below illustrates the three phases of creating and implementing a DQS: analysis of the current state and major issues; creation of the Strategy; and phased implementation according to a well-considered sequence plan.
Phase 1 – Data Quality Analysis
This task is to conduct a baseline analysis of the current state of data quality within the selected scope. You’re essentially looking for problems, and for what would be better if the problems were fixed. The fact-gathering portion of this effort can be accomplished relatively quickly, since you will be interviewing stakeholders to discover them.
Activities in Phase One – Data Quality Analysis
| Activity | Description |
|---|---|
| 1. Identify Participants | Determine who will lead the effort and who should be on the team that develops the Data Quality Analysis. Typically, this is the Data Management Organization, but the effort may also be initiated by Data Governance. |
| 2. Identify Interviewees | Based on the scope of the effort (e.g., enterprise-wide, business line, shared repositories) and the number of data sets (major subject areas), create a list of business stakeholders, including data owners and senior business data experts. Then identify other selected stakeholders, such as IT program managers and senior data architects. Keep the number small enough to complete the interviews within two weeks, while ensuring that you include individuals with significant knowledge and span of control. |
| 3. Create Questionnaire | Create a questionnaire of topics for discussion. Most interviewees will be willing to entertain requests for additional detail, such as questions about the adequacy of the data set(s) against data quality dimensions, for instance, uniqueness, completeness, and timeliness. |
| 4. Conduct Interviews | A 45- to 60-minute interview is suggested; provide the questionnaire as you open the interview. |
| 5. Analyze Results | Organize the interviews by business area / major program and subject area, then analyze them. Review your analysis as a team and draw conclusions. |
| 6. Verify Problem Description | Present the draft of the problem description portion of the Data Quality Analysis to the appropriate peer governance body for verification and additional business context, for example, other related quality issues and work that has been done to date. |
| 7. Verify Goals, Gaps, and Conclusions | Present the benefits and business alignment of the suggested data quality improvements to the peer governance body for verification, enhancement, and context. |
| 8. Summarize Capabilities and Recommend the DQS | In the Data Quality Analysis report, succinctly summarize current capabilities. Recommend that the organization develop a multi-year Data Quality Strategy and a formal data quality program. |
| 9. Secure Executive Approval | Present results to the senior data governance body and gain approval to create the DQS. |
This task, once initiated, should take no more than one to three months, depending on scope (enterprise, line of business, shared repository, single data store) and the number of governance bodies you select for their input. In most organizations, the analysis report will highlight the problems currently being experienced, the manual effort required, and the impacts of poor quality. I have found that conducting this analysis is clearly the best way to focus attention on data quality across multiple stakeholders and gain consensus on the need to implement improvements.
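The questionnaire step in Phase 1 refers to checking data sets against data quality dimensions such as uniqueness and completeness. To make those two dimensions concrete, here is a minimal Python sketch. It assumes records are plain dictionaries and treats None and blank strings as missing. Completeness is defined as the share of records with a populated value, and uniqueness as the share of populated values that are distinct. The field names and sample data are hypothetical, and data quality tools may define and compute these measures differently. Timeliness is not covered, because it depends on reference dates that the sketch does not model.

```python
from typing import Any

# Hypothetical sample records; in practice these would come from a profiled data store.
records: list[dict[str, Any]] = [
    {"customer_id": "C001", "email": "a@example.com"},
    {"customer_id": "C002", "email": ""},
    {"customer_id": "C002", "email": "b@example.com"},
    {"customer_id": "C004", "email": None},
]


def is_populated(value: Any) -> bool:
    """Treat None and empty or whitespace-only strings as missing (an assumption)."""
    if value is None:
        return False
    if isinstance(value, str) and not value.strip():
        return False
    return True


def completeness(rows: list[dict[str, Any]], field: str) -> float:
    """Share of rows in which the field is populated."""
    if not rows:
        return 0.0
    populated = sum(1 for row in rows if is_populated(row.get(field)))
    return populated / len(rows)


def uniqueness(rows: list[dict[str, Any]], field: str) -> float:
    """Share of populated values that are distinct (1.0 means no duplicates)."""
    values = [row[field] for row in rows if is_populated(row.get(field))]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


for field in ("customer_id", "email"):
    print(f"{field}: completeness={completeness(records, field):.2f}, "
          f"uniqueness={uniqueness(records, field):.2f}")
```

With this sample, `customer_id` scores 1.00 for completeness and 0.75 for uniqueness because one ID is duplicated. `email` scores 0.50 for completeness and 1.00 for uniqueness. Scores like these can give interviewees and data stewards a factual starting point for the current-state discussion.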
Phase 2 – Data Quality Strategy
I’ve found that organizations with strong data management capabilities either already have an approved enterprise-level DQS or are planning to develop one. This is not surprising: as ‘data awareness’ builds across the organization, more stakeholders become keenly focused on the business potential of trusted, accurate, high-quality data, and aware of the contrast between current persistent problems and the desired future state.
Remember, the purpose of a consensus-driven, approved Data Quality Strategy is to encapsulate the organization’s plans to ensure that data is fit for purpose and meets future business needs. It outlines WHAT the organization is committing to do and HOW the organization plans to do it, and it establishes a sustainable program to deliver these results. And, of course, it is the means to secure initial funding for at least the Year 1 initiatives.
Referencing the approved, factual current-state analysis from Phase 1, the DQS describes:
- Vision (remember this?) and principles for data quality (e.g., business engagement, cleanse data at the source, monitor critical data stores)
- Goals and objectives of the data quality program
- Business benefits and positive impacts as results are achieved
- Description of the program elements needed, including:
  - Data Quality Policy(ies)
  - Defined data quality processes (e.g., data profiling, data enhancements, business-driven quality assessments, data cleansing)
  - Technologies (e.g., tool selection, standard toolset designation, tool training)
  - Staff (e.g., recommended starting resources, which organization will lead, skill sets)
  - Governance (e.g., which governance bodies will participate and their role descriptions, what additional tasks will be required from governance participants, quality working groups)
  - Compliance (e.g., how the organization will ensure that relevant programs and projects are following the implemented policies and processes)
  - Information Technology (e.g., the role of IT in selecting, training on, and maintaining data quality toolsets; enhancements to the systems development life cycle to include quality processes; enhancements to data integration processes)
  - Training – how the organization will train technologists, governance participants, data management staff, and business data experts in data quality processes, approaches, and techniques
  - Metrics – how the organization will measure the success of its data quality program (e.g., data stores profiled, percentage decrease of defects, number of trained data stewards)
- Sequence Plan – the last section of the DQS is a multi-year plan for implementation and rollout, covering:
  - Priorities for capability building, and the proposed order in which they will be addressed (e.g., in Year 1, Q1, we will create a Data Quality Policy and conduct a data quality audit of a critical data store; in Q2, we will complete selection and designation of a standardized toolset)
  - Priorities for focus on subject areas / data sets, and the proposed order (e.g., in Year 1, Q1 and Q2, we will profile, cleanse, and make design enhancement suggestions for the five systems with Product data)
  - Quality objectives for major data stores (e.g., the data lake, a data warehouse, a master data hub, databases serving multiple business lines) and the proposed order; these objectives are the positive, affirmative version of the major issues and challenges identified in the Phase 1 Analysis
  - Formal organizational structure(s) engaged in the quality program, and the proposed order for phase-in (e.g., in Year 1, Q1, the DMO will hire a Data Quality Lead to support creation of standard quality processes and lead the Data Quality Pilot)
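The Metrics element lists ‘percentage decrease of defects’ as one measure of program success. The minimal Python sketch below shows one way to compute it. It compares defect rates (defects per record checked) rather than raw counts, on the assumption that the volume of data profiled may change between measurements. The function names and figures are hypothetical.

```python
def defect_rate(defects: int, records_checked: int) -> float:
    """Defects found per record checked during a profiling run."""
    if records_checked <= 0:
        raise ValueError("records_checked must be positive")
    return defects / records_checked


def percent_decrease(baseline: float, current: float) -> float:
    """Percentage decrease from a baseline value; negative if the value rose."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline * 100


# Hypothetical profiling results for one data store.
baseline_rate = defect_rate(defects=1200, records_checked=40_000)  # initial audit
current_rate = defect_rate(defects=900, records_checked=50_000)    # after Year 1

print(f"Defect rate: {baseline_rate:.1%} -> {current_rate:.1%}")
print(f"Decrease: {percent_decrease(baseline_rate, current_rate):.1f}%")
```

With these illustrative figures, the defect rate falls from 3.0% to 1.8%, a 40.0% decrease. Over the same period, the raw defect count fell by only 25% because more records were checked.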
This may seem like a lot, but some organizations have developed approved strategies, and you can do it too. The sequence plan is not intended to be a detailed Gantt chart; a useful way to depict a high-level summary plan is a chevron diagram that represents program elements across a three- to five-year period:
- Capability enablement – proposed timeline of what processes, standards, technologies, guidelines, and training will be established and implemented
- Subject area focus – what subject areas will be addressed in what order (e.g., Customer, Product, etc.)
- Organizational enhancements – proposed staff by function
- Quality training – what training will be developed in what order
- Launch projects – initiatives that build capabilities, in a proposed order
The overall sequence plan chart should be accompanied by a text description and additional decomposition, to clearly delineate WHAT the organization intends to accomplish and by WHEN. If appropriate, it may be useful to provide separate sequence plans for one or two major data stores, for example, the data lake. The diagram below presents a notional, simplified sequence plan as an example.
See, nothing to worry about, as long as you’ve given thoughtful consideration to what needs to be done, with the assistance and approval of data governance. Now that you’ve seen a sample sequence plan, let’s finish by outlining the activity steps to develop the Data Quality Strategy and get it approved.
Activities in Phase 2 – Data Quality Strategy
| Activity | Description |
|---|---|
| 1. Convene Data Strategy Working Group | Recruit an executive sponsor. Assuming that the Data Management Organization will lead the DQS effort, determine which key stakeholders / governance participants should be involved. It’s recommended to include at least one representative from each major business line. |
| 2. Establish Vision | Based on the aspirations captured in interviews for the Data Quality Analysis, draft a data quality vision for the organization and outline the business priorities. |
| 3. Verify Vision and Priorities | Present the draft vision and priorities to the executive sponsor and to the peer governance group for review. Analyze the input and reflect it in the draft. |
| 4. Evaluate Capabilities and Prioritize | Applying the DMM and the Phase 1 Analysis results, clearly describe the data quality capabilities that need to be developed, and prioritize them. |
| 5. Prioritize Subject Areas | Based on the business strategy, the data management strategy, and the Phase 1 Analysis, create a draft priority list for capability implementation and rollout (e.g., Year 1, Customer; Year 2, Product). |
| 6. Select Technologies | Conduct a data quality tool selection effort or select among existing available toolsets. My experience has been that most leading vendor products offer similarly robust features, so you may want to emphasize price, licensing, user interface, integration with your technology stack, and other supporting criteria. |
| 7. Designate Primary Stakeholders | With the assistance of data governance, determine who your primary stakeholders are for the data scope you addressed. Typically, stakeholders include business line executives, data owners of major applications and repositories, and senior data stewards. |
| 8. Establish DQS Roles | Data quality requires many hands rowing in the same direction according to a plan (the DQS). The DMO typically takes the lead, and it may expand as the data quality program grows. For example, when the Data Quality Policy is adopted, the DMO may conduct compliance audits. One activity that the DMO should undertake is to identify sound data quality processes and standards currently in use by specific projects and programs. These are strengths to leverage in developing standard processes, standards, guidelines, and training. Data stewards and data quality working groups will be needed to validate data quality rules, set targets and thresholds, determine which data sets should be profiled, and validate metadata. Information Technology typically provides tool expertise and resources for complex queries. Business executives make major decisions about which quality improvements should be prioritized and funded. |
| 9. Data Quality Training | Determine who needs training and what form the training should take. Some examples are: tool training for IT resources executing profiling tasks; training in data quality concepts and methods (e.g., applying dimensions, setting targets and thresholds, creating quality rules) for data stewards and business data experts; training for data owners and managers; and training for executives. A mixture of instructor-led and computer-based training is often the most effective. Some advanced organizations require ‘data awareness’ training, including basic data quality concepts, as part of orientation for all employees who produce or use data. This helps develop a ‘quality culture.’ |
| 10. Develop Initial Metrics | Develop a starter set of metrics for the organization to assess the progress of the program as well as the achievement of business benefits. |
| 11. Conduct Governance Review of DQS | Review the DQS document sections and metrics with peer governance groups, soliciting feedback from their business lines and surfacing any confusion or disagreements about priorities. Revise the DQS accordingly. |
| 12. Create DQS Sequence Plan and Review | Develop the DQS Sequence Plan with input from all relevant sources, and review it with data governance. Analyze feedback about projects, timelines, and priorities, and seek resolution first through governance, then from your executive sponsor if needed. |
| 13. Detail First-Year Projects (optional) | Provide initiative abstracts, high-level project plans, and projected costs for the Year 1 projects. These will be the baseline for project business cases linked to the data quality program; an estimated three to five projects would be a good start. |
| 14. Secure Executive Approval | Present results to the senior data governance body and gain approval for the organization to adopt the DQS. |
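Steps 8 and 9 mention validating data quality rules and setting targets and thresholds. The Python sketch below shows one way to record a rule's target and threshold and apply them to a measured score, such as a completeness score from profiling. It assumes the target is the desired score and the threshold is the minimum acceptable score, both expressed as fractions between 0 and 1. The class, rule name, and values are hypothetical and are not taken from any particular data quality tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityRule:
    """Hypothetical data quality rule with a governance-approved target and threshold.

    Assumption: 'target' is the desired score and 'threshold' is the minimum
    acceptable score, both fractions between 0 and 1.
    """
    name: str
    target: float
    threshold: float

    def __post_init__(self) -> None:
        # Reject inconsistent settings, e.g., a threshold above the target.
        if not 0.0 <= self.threshold <= self.target <= 1.0:
            raise ValueError("expected 0 <= threshold <= target <= 1")

    def evaluate(self, score: float) -> str:
        """Classify a measured score (e.g., from profiling) against the rule."""
        if score >= self.target:
            return "meets target"
        if score >= self.threshold:
            return "below target, above threshold"
        return "below threshold"


rule = QualityRule(name="Customer email completeness", target=0.98, threshold=0.90)
for measured in (0.99, 0.93, 0.50):
    print(f"{rule.name} at {measured:.0%}: {rule.evaluate(measured)}")
```

Keeping the target separate from the threshold lets data stewards distinguish data that is usable but not yet where the business wants it from data that falls below the minimum acceptable level.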
The Data Quality Strategy represents a major achievement in an organization’s management of its data assets, formalizing the commitment to treat data as a permanent infrastructure asset. It will put the data management organization on the map; it will greatly increase the knowledge and effectiveness of data governance; it will work hand-in-glove with data architecture to optimize the data layer over time; and it will lead to deeper business knowledge and better decisions, and ‘make the way straight’ for analytics insights. I hope that you can make use of this approach in your organization, and I’m eager to hear about your successes.
My previous TDAN.com column, “Boot-Strapping with Jet Packs: Accelerating Enterprise Data Quality,” provides you with the activity steps to conduct a Data Quality Pilot project and use the results to inform your Data Quality Strategy. The pilot may be useful, as it can pave the way for approval to develop the DQS.