A New Way of Thinking - July 2007
Data Quality, Data Control, and Data Quality Service Level Agreements
Published: July 10, 2007
There are always going to be data issues that require attention and remediation. This article looks at the protocols that need to be in place to detect data errors as early as possible in the processing stream.
Business processes are implemented within application services and components, which in turn are broken down into individual processing stages. Communication between processing stages is typically performed via an information exchange, either explicitly such as the generation of an output file that is then used as the input for the next processing stage, or implicitly via persistent storage, such as through new records or updates posted to a database. Of course, the business processing stages expect that the data being exchanged is of high quality; and, in fact, our methodologies for application development essentially assume that the data is always appropriate.
The issues occur later, when it becomes obvious that there is the potential for introducing flawed data into the system. Errors characterized as violations of expectations for completeness, accuracy, timeliness, consistency, and other dimensions of data quality often impede the ability of a processing stage to effectively complete its specific role in the business process. Data quality initiatives are intended to assess the potential for the introduction of data flaws, determine the root causes, and eliminate the source of the introduction of flawed data. Yet even the most sophisticated data quality management activities do not prevent all data flaws.
Consider the concept of data accuracy. While we can implement automated processes for validating that data values conform to format specifications, belong to defined data domains, or are consistent across columns within a single record, in the absence of an absolute “source of truth,” there is no way to automatically determine if a value is accurate.
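The three kinds of automated check named above (format, domain, and cross-column consistency) can be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: the record layout, the field names, the identifier pattern, and the list of valid states are all assumptions made up for the example.

```python
import re
from datetime import date

# Hypothetical rules for a quarterly wage record; the field names, the
# identifier pattern, and the state list are assumptions for illustration.
EMPLOYEE_ID_FORMAT = re.compile(r"^\d{3}-\d{2}-\d{4}$")
VALID_STATES = {"MD", "VA", "DC"}


def validate_record(record):
    """Return a list of rule violations for one record (empty if none)."""
    violations = []

    # Format check: the value matches the expected pattern.
    employee_id = str(record.get("employee_id") or "")
    if not EMPLOYEE_ID_FORMAT.match(employee_id):
        violations.append("employee_id: bad format")

    # Domain check: the value belongs to a defined set of valid values.
    if record.get("state") not in VALID_STATES:
        violations.append("state: not in domain")

    # Consistency check across columns within a single record.
    start, end = record.get("quarter_start"), record.get("quarter_end")
    if start is None or end is None or start >= end:
        violations.append("quarter_start must precede quarter_end")

    return violations


record = {
    "employee_id": "123-45-6789",
    "state": "MD",
    "quarter_start": date(2007, 4, 1),
    "quarter_end": date(2007, 6, 30),
    "wage": 45210,
}
print(validate_record(record))  # prints []
```

Note that the wage amount passes untouched: none of these rules can compare it with the real-world figure, so a mistyped but plausible amount would produce the same empty list.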
For example, employers are required to report their employees’ quarterly wages to a state workforce agency; but if an employer transposes two digits in an employee’s wage amount, the agency cannot detect the discrepancy unless a staff member actually calls the employer to verify the numbers.
The upshot is that despite your efforts to ensure the quality of data, there are always going to be data issues that require attention and remediation. The critical question now revolves around determining which protocols need to be in place to detect data errors as early as possible in the processing stream(s), whom to notify to address the issue, and whether the issue can be resolved appropriately within a “reasonable” amount of time. These protocols are composed of two aspects: controls, which are used to detect the issue, and service level agreements, which specify the reasonable expectations for response and remediation.
Controls
In practice, every processing stage has embedded controls, either of the “data control” or “process control” variety. The objective of the control process is to ensure that any issue that might incur a significant business impact late in the processing stream is identified early in the processing stream. The effectiveness of a control process is demonstrated when:
Contrary to the intuitive data quality ideas around defect prevention, we hope that the control process discovers many issues, since the goal is really to make sure that any issues that would cause problems downstream are captured very early upstream.
Data Quality Control vs. Data Validation
Data quality control differs from data validation in that validation is a process to review and measure conformance of data with a set of defined business rules, whereas control is an ongoing process to:
The value of a data quality control mechanism lies in establishing trust on behalf of downstream users that any issue that would have material impact will have been caught early enough to be addressed and corrected, thereby preventing the material impact altogether.
Trust and Data Quality Service Level Agreements
By establishing both the ability to identify the issues and initiate a workflow to mitigate them, the control framework bolsters the ability to establish data quality service level agreements. Trust grows as the ability to catch an issue is pushed further and further upstream until the point of data acquisition or creation. By providing that “safety net,” the control process reduces the anxiety of downstream users at the end of the business process that they also need to monitor for poor data quality. As long as the controls are transparent and auditable, those downstream users can be comfortable in trusting the resulting reports.
The key component of establishing the control framework is a service level agreement (SLA). That data quality SLA should delineate a number of items:
Data Controls, Downstream Trust, and the Control Framework
Data controls evaluate the data being passed from one processing stage to another and ensure that the data conforms to quality expectations defined by the business users. Data controls can be expressed at different levels of granularity: the data element, the data record, the data set (table), or a collection of data sets:
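As a rough illustration of these four levels of granularity, the sketch below applies one control at each level to a batch handed from one processing stage to the next. The order and customer layouts, the field names, and the specific rules are hypothetical, chosen only to make the levels concrete; amounts are integer cents to keep the arithmetic exact.

```python
def element_control(value):
    """Data element level: a single value is present and non-empty."""
    return value is not None and value != ""


def record_control(order):
    """Record level: values within one record agree with each other."""
    return order["total"] == order["quantity"] * order["unit_price"]


def data_set_control(orders, expected_count):
    """Data set level: the table holds the row count declared by the sender."""
    return len(orders) == expected_count


def collection_control(orders, customers):
    """Collection level: every order refers to a customer in the other set."""
    known = {c["customer_id"] for c in customers}
    return all(o["customer_id"] in known for o in orders)


def run_controls(orders, customers, expected_count):
    """Return the names of the granularity levels whose control failed."""
    failed = []
    if not all(element_control(o["customer_id"]) for o in orders):
        failed.append("element")
    if not all(record_control(o) for o in orders):
        failed.append("record")
    if not data_set_control(orders, expected_count):
        failed.append("data set")
    if not collection_control(orders, customers):
        failed.append("collection")
    return failed


customers = [{"customer_id": "C1"}, {"customer_id": "C2"}]
orders = [
    {"customer_id": "C1", "quantity": 2, "unit_price": 500, "total": 1000},
    {"customer_id": "C3", "quantity": 1, "unit_price": 250, "total": 250},
]
print(run_controls(orders, customers, expected_count=2))  # prints ['collection']
```

Running the controls at the hand-off point, rather than inside the consuming stage, is what lets the issue surface upstream: here the order for an unknown customer is flagged before any downstream stage reads it.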
The expectation of downstream trust suggests a number of notions regarding environmental controls:
Summary
In essence, data quality management and data governance programs must, by necessity, provide a means for both error prevention and error detection and remediation. Continued monitoring of conformance to data expectations provides only some support for keeping the data aspect of business processes under control. Introducing a service level agreement, and certifying that it is being observed, provides a higher level of trust at the end of the business process that any issues with the potential for significant business impact will have been caught and addressed early in the process.
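One way to picture certifying that an SLA is being observed is to record each control event together with the agreed resolution window and check whether remediation met it. The sketch below is only an assumption-laden illustration: the class names, the fields (whom to notify, how long resolution may take), and the compliance measure are hypothetical and stand in for whatever a real agreement specifies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DataQualitySLA:
    control_name: str             # which control this agreement covers
    notify: str                   # whom to notify when the control fires
    resolution_window: timedelta  # the agreed "reasonable" time to resolve


@dataclass
class ControlEvent:
    sla: DataQualitySLA
    detected_at: datetime
    resolved_at: Optional[datetime] = None

    def deadline(self):
        return self.detected_at + self.sla.resolution_window

    def within_sla(self, now):
        """True if resolved by the deadline, or still open with time left."""
        end = self.resolved_at if self.resolved_at is not None else now
        return end <= self.deadline()


def sla_compliance(events, now):
    """Fraction of events handled within their SLA; 1.0 when there are none."""
    if not events:
        return 1.0
    met = sum(1 for e in events if e.within_sla(now))
    return met / len(events)


sla = DataQualitySLA("wage_file_row_count", "data steward", timedelta(hours=4))
events = [
    # Detected at 09:00 and resolved at 11:00: inside the four-hour window.
    ControlEvent(sla, datetime(2007, 7, 10, 9, 0), datetime(2007, 7, 10, 11, 0)),
    # Detected at 09:00 and still open at 15:00: the window has lapsed.
    ControlEvent(sla, datetime(2007, 7, 10, 9, 0)),
]
print(sla_compliance(events, now=datetime(2007, 7, 10, 15, 0)))  # prints 0.5
```

Keeping the events themselves, rather than only the resulting ratio, is what makes such a report auditable by the downstream users who rely on it.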
David Loshin - David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of Enterprise Knowledge Management – The Data Quality Approach (Morgan Kaufmann, 2001) and Business Intelligence – The Savvy Manager's Guide and is a frequent speaker on maximizing the value of information. David can be reached at loshin@knowledge-integrity.com or at (301) 754-6350.