Business processes are implemented within application services and components, which in turn are broken down into individual processing stages. Communication between processing stages is typically
performed via an information exchange, either explicitly, such as the generation of an output file that is then used as the input for the next processing stage, or implicitly, via persistent
storage such as new records or updates posted to a database. Of course, the business processing stages expect that the data being exchanged is of high quality; and, in fact, our methodologies for
application development essentially assume that the data is always appropriate.
The issues occur later, when it becomes obvious that there is the potential for introducing flawed data into the system. Errors characterized as violations of expectations for completeness,
accuracy, timeliness, consistency, and other dimensions of data quality often impede the ability of a processing stage to effectively complete its specific role in the business process. Data
quality initiatives are intended to assess the potential for the introduction of data flaws, determine the root causes, and eliminate the source of the introduction of flawed data.
Yet even the most sophisticated data quality management activities do not prevent all data flaws. Consider the concept of data accuracy. While we can implement automated processes for validating
that data values conform to format specifications, belong to defined data domains, or are consistent across columns within a single record, in the absence of an absolute “source of
truth,” there is no way to automatically determine whether a value is accurate. For example, employers are required to report their employees’ quarterly wages to a state workforce agency;
but if an employer transposes two digits in an employee’s wage amount, the state workforce agency would not be able to detect the discrepancy without a state staff member actually calling
the employer to verify the numbers.
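To make the limits of automation concrete, the following minimal sketch (the record layout and field names are assumptions for illustration) shows the kinds of checks that can be automated: format conformance, domain membership, and reasonability. Every check passes even though the wage value contains transposed digits, because accuracy cannot be verified without an external source of truth.

```python
import re

# Hypothetical quarterly wage record reported by an employer.
record = {
    "employer_id": "84-1234567",
    "employee_ssn": "123-45-6789",
    "quarter": "2023Q4",
    "wages": "12840.00",   # transposed digits: the actual wages were 12480.00
}

def validate(rec):
    """Automated checks only: format, domain, and reasonability."""
    errors = []
    # Format conformance: wages must look like a non-negative decimal amount.
    if not re.fullmatch(r"\d+(\.\d{2})?", rec["wages"]):
        errors.append("wages: does not conform to the amount format")
    # Reasonability: wages should fall within a plausible quarterly range.
    elif not 0 <= float(rec["wages"]) <= 500_000:
        errors.append("wages: outside the reasonable range")
    # Domain membership: quarter must belong to the defined data domain.
    if not re.fullmatch(r"\d{4}Q[1-4]", rec["quarter"]):
        errors.append("quarter: not in the defined data domain")
    return errors

# Every automated check passes, yet the value is still inaccurate; without an
# external "source of truth" the transposition cannot be detected.
print(validate(record))   # -> []
```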
The upshot is that despite your efforts to ensure the quality of data, there will always be data issues that require attention and remediation. The critical question then becomes
determining which protocols need to be in place to detect data errors as early as possible in the processing stream(s), whom to notify to address the issue, and whether the issue can be resolved
appropriately within a “reasonable” amount of time. These protocols are composed of two aspects: controls, which are used to detect the issue, and service level agreements, which
specify the reasonable expectations for response and remediation.
Controls
In practice, every processing stage has embedded controls, either of the “data control” or “process control” variety. The objective of the control process is to ensure that
any issue that might incur a significant business impact late in the processing stream is identified early in the processing stream. The effectiveness of a control process is demonstrated when:
- Control events occur when data failure events take place,
- The proper mitigation or remediation actions are performed,
- Corrective actions that fix the problem and eliminate its root cause are performed within a reasonable time frame, and
- A control event for the same issue is never triggered further downstream.
Contrary to the intuitive data quality emphasis on defect prevention, we actually hope that the control process discovers many issues: the goal is to ensure that any issue that would cause
problems downstream is captured very early upstream.
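To make the idea of an embedded control concrete, the sketch below (in Python, with hypothetical stage and check names) shows one way a processing stage might apply its checks and raise a control event at the point of failure, so the same issue never has to be caught further downstream. It is an illustration of the pattern, not a prescribed implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List, Tuple

@dataclass
class ControlEvent:
    """Record that a data failure was detected at a given processing stage."""
    stage: str
    rule: str
    detail: str
    raised_at: datetime = field(default_factory=datetime.now)

def run_stage(records: List[dict],
              checks: Dict[str, Callable[[dict], bool]],
              stage: str) -> Tuple[List[dict], List[ControlEvent]]:
    """Apply a stage's embedded data controls, emitting a control event for
    every failed check so mitigation can begin as early as possible."""
    passed: List[dict] = []
    events: List[ControlEvent] = []
    for rec in records:
        failed = [name for name, check in checks.items() if not check(rec)]
        if failed:
            events.extend(ControlEvent(stage, name, repr(rec)) for name in failed)
        else:
            passed.append(rec)
    return passed, events

# Example: a wage-intake stage with a single completeness check.
clean, issues = run_stage(
    [{"employee_ssn": "123-45-6789", "wages": None}],
    {"wages-present": lambda r: r.get("wages") is not None},
    stage="wage-intake",
)
```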
Data Quality Control vs. Data Validation
Data quality control differs from data validation. Validation is a process to review and measure the conformance of data with a set of defined business rules, whereas control is an ongoing
process to:
- Reduce the number of errors to a reasonable and manageable level,
- Enable the identification of data flaws along with a protocol for (conditionally) halting the processing stream, and
- Institute a mitigation or remediation of the root cause within an agreed-to time frame.
The value of a data quality control mechanism lies in establishing trust among downstream users that any issue with a material impact would have been caught early enough to be addressed and
corrected, thereby preventing the material impact altogether.
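As a sketch of the conditional-halt protocol only (the threshold, names, and halt mechanism are assumptions), a control gate might compare the observed error rate against an agreed level and either stop the hand-off or log the flaws for remediation:

```python
def control_gate(records, check, error_threshold=0.02):
    """Conditionally halt the hand-off to the next stage when the observed
    error rate exceeds the agreed level; otherwise log flaws and continue."""
    flawed = [r for r in records if not check(r)]
    error_rate = len(flawed) / max(len(records), 1)
    if error_rate > error_threshold:
        # Halt: the volume of flaws is beyond the reasonable, manageable level,
        # so the downstream stage should not consume this data set until the
        # root cause is remediated within the agreed-to time frame.
        raise RuntimeError(
            f"processing halted: error rate {error_rate:.1%} exceeds "
            f"threshold {error_threshold:.1%}"
        )
    # Below the threshold: pass clean records on and log the flaws for
    # mitigation or remediation.
    return [r for r in records if check(r)], flawed
```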
Trust and Data Quality Service Level Agreements
By establishing both the ability to identify issues and the ability to initiate a workflow to mitigate them, the control framework bolsters the ability to establish data quality service level
agreements. Trust grows as the ability to catch an issue is pushed further and further upstream, toward the point of data acquisition or creation. By ensuring that “safety net,” the control
process relieves downstream users of the anxiety that they must also monitor for poor data quality at the end of the business process. As long as the controls are transparent and
auditable, those downstream users can be comfortable in trusting the resulting reports.
The key component of establishing the control framework is a service level agreement (SLA). That data quality SLA should delineate a number of items:
- The location in the processing stream that is covered by the SLA,
- The set of data elements covered by the agreement,
- The business impacts associated with potential flaws in the data elements,
- The data quality dimensions associated with each data element,
- Assertions regarding the expectations for quality for each data element for each of the identified dimensions,
- The methods for measuring conformance to those expectations (automated or manual),
- The acceptability threshold for each measurement,
- The individual to be notified in case the acceptability threshold is not met,
- The times for expected resolution or remediation of the issue,
- The escalation strategy when the resolution times are not met, and
- A process for logging issues, tracking progress in resolution, and measuring performance in meeting the service level agreement.
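To show how these items might fit together operationally, here is an illustrative sketch in Python; the field names, example values, and threshold are assumptions rather than a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class DataQualitySLA:
    """Illustrative record of the items a data quality SLA might delineate."""
    location: str                      # point in the processing stream covered
    data_elements: list[str]           # data elements covered by the agreement
    business_impacts: list[str]        # impacts associated with potential flaws
    dimensions: dict[str, list[str]]   # element -> applicable quality dimensions
    expectations: dict[str, str]       # element -> assertion about expected quality
    measurement_method: str            # "automated" or "manual"
    acceptability_threshold: float     # minimum acceptable conformance rate
    notify: str                        # individual notified when threshold is missed
    resolution_time_hours: int         # expected time to resolution or remediation
    escalation: list[str] = field(default_factory=list)  # escalation chain
    issue_log: list[dict] = field(default_factory=list)  # logged issues and progress

# Hypothetical SLA entry for the wage-reporting example used earlier.
wage_feed_sla = DataQualitySLA(
    location="quarterly-wage-intake",
    data_elements=["employee_ssn", "wages", "quarter"],
    business_impacts=["incorrect benefit determinations"],
    dimensions={"wages": ["accuracy", "reasonability"], "quarter": ["validity"]},
    expectations={"wages": "non-negative decimal amount", "quarter": "YYYYQn"},
    measurement_method="automated",
    acceptability_threshold=0.98,
    notify="data-steward@agency.example",
    resolution_time_hours=48,
    escalation=["data-quality-manager", "cio"],
)
```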
Data Controls, Downstream Trust, and the Control Framework
Data controls evaluate the data being passed from one processing stage to another and ensure that the data conforms to quality expectations defined by the business users. Data controls can be
expressed at different levels of granularity: the data element, the data record, the data set (table), or collection of data sets:
- Data element level controls review the quality of the value in the context of its assignment to the element. This includes accuracy, completeness, or reasonability tests of the value as it relates to the data element.
- Data record level controls examine the quality of the set of (element, value) pairs within the context of the record. This includes accuracy of the values, completeness of the data elements in relation to each other, and reasonability/consistency of sets of data elements.
- Data set and data collection level controls focus on completeness of the data set, availability of data, and timeliness in its delivery.
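As a brief illustration (the specific rules, field names, and functions below are assumptions continuing the wage-reporting example), controls at each level of granularity might look like the following:

```python
# Element level: the value in the context of its assignment to the element.
def check_wage_element(value: str) -> bool:
    return value.replace(".", "", 1).isdigit() and float(value) >= 0

# Record level: the (element, value) pairs in the context of the record.
def check_wage_record(rec: dict) -> bool:
    # Consistency: an active employee record must carry a reported wage.
    return not (rec.get("status") == "active" and rec.get("wages") is None)

# Data set level: completeness of the set and timeliness of its delivery.
def check_wage_data_set(records: list, expected_count: int,
                        hours_past_deadline: float) -> bool:
    return len(records) >= expected_count and hours_past_deadline <= 0
```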
The expectation of downstream trust suggests a number of notions regarding environmental controls:
- Since all downstream results are essentially byproducts of data elements created and exchanged earlier in the processing, it is reasonable to institute data element level controls as close as possible to the location in the processing stream where the data element is created or modified. Data element controls should be placed close to data element value assignment or modification.
- Downstream processing activities depend on collections of data (records, tables) coming from multiple upstream activities, and may require synchronization prior to initiation. Therefore, where a processing stream requires data handed off from more than one upstream process, the controls are likely to be at the data table or data set level of granularity. Data set level controls should be placed at each point of information exchange.
- When a data element is not expected to change during a process, and its value is validated before the process, there is effectively no reason to validate it again after the processing stage. The only reason would be to ensure that the process does not modify the data element, and that can be confirmed by reviewing the process to make sure there is no place where the data element is modified. Rather than instituting a continual assessment of relatively static data, a process review can reduce the complexity of ensuring data control. Reviewing a process to ensure that it does not impact a data element’s value is a valuable process control.
Summary
In essence, data quality management and data governance programs must, by necessity, provide a means for both error prevention and error detection and remediation. Continued monitoring of
conformance to data expectations provides only partial support for keeping the data aspect of business processes under control. Introducing a service level agreement and certifying
that the SLA is being observed provides a higher level of trust at the end of the business process that any issue with the potential for significant business impact will have been
caught and addressed early in the process.