Thomas Redman, a leading data quality consultant (www.dataqualitysolutions.com), believes that “For a typical organization, the cost of poor data quality is about 20% of revenue. If we can free up that 20%, we can create an economic boom that will make the 90’s seem like a depression.” Although some may dispute that statement, many agree that poor-quality data can lower customer satisfaction, increase employee frustration, raise the cost of doing business, and weaken management’s decision-making ability, all because the right data is not available in the right format at the right time. To stem this loss of revenue, many organizations are working to improve the quality of their data. These improvement efforts can take several forms.
Error Detection & Correction – Techniques such as comparison against another data source, data edit checks, and duplicate record removal can be used to clean up stored data. While these efforts may yield more accurate, complete, and consistent data in the short term, they seldom produce lasting improvement in databases where records are frequently inserted and updated.
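To make these error detection and correction techniques more concrete, here is a minimal Python sketch, assuming a hypothetical `CustomerRecord` layout and simple illustrative rules (a non-blank name, a five-digit ZIP code, a loosely formed e-mail address). It shows a data edit check, duplicate removal on a normalized match key, and a comparison against a second, trusted data source. The record fields, rules, and helper names are assumptions for illustration, not a prescribed method.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CustomerRecord:
    # Hypothetical record layout, used only for illustration.
    customer_id: str
    name: str
    zip_code: str
    email: str


# Assumed edit-check rules: a 5-digit ZIP code and a loosely formed e-mail address.
ZIP_RULE = re.compile(r"\d{5}")
EMAIL_RULE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def edit_check(record: CustomerRecord) -> List[str]:
    """Return the rule violations for one record (an empty list means it passes)."""
    problems = []
    if not record.name.strip():
        problems.append("name is blank")
    if not ZIP_RULE.fullmatch(record.zip_code):
        problems.append("zip_code is not 5 digits")
    if not EMAIL_RULE.fullmatch(record.email):
        problems.append("email is malformed")
    return problems


def match_key(record: CustomerRecord) -> tuple:
    """Normalize case and spacing so trivially different copies compare equal."""
    return (" ".join(record.name.lower().split()), record.zip_code, record.email.lower())


def remove_duplicates(records: List[CustomerRecord]) -> List[CustomerRecord]:
    """Keep the first record seen for each match key and drop later copies."""
    seen = set()
    unique = []
    for record in records:
        key = match_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def compare_to_reference(records: List[CustomerRecord], reference_zip: Dict[str, str]) -> List[str]:
    """Return IDs whose zip_code disagrees with a second, trusted source."""
    return [
        r.customer_id
        for r in records
        if r.customer_id in reference_zip and reference_zip[r.customer_id] != r.zip_code
    ]
```

For example, a record named "ann  lee" with the e-mail "ANN@example.com" would be treated by `remove_duplicates` as a copy of "Ann Lee" at "ann@example.com" in the same ZIP code, and only the first would be kept. As the paragraph above notes, a cleanup pass like this repairs stored data but does nothing to stop new errors from arriving.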
Process Control & Improvement – Quality control and process management are not just for manufacturing. Many organizations find that root cause analysis, statistical process control, and quality management principles can be successfully applied to existing information systems, yielding lasting data quality improvement at a reasonable long-term cost.
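Statistical process control can be applied to data much as it is applied to manufactured parts. The Python sketch below builds a p-chart (a control chart for the proportion of defective items) from hypothetical daily audit results: on day `i`, `sample_sizes[i]` records are inspected and `defects[i]` of them fail the organization’s data quality rules. A point outside the three-sigma control limits suggests a special cause worth a root cause analysis. The inputs, and the choice of a p-chart rather than another chart type, are assumptions for illustration.

```python
import math
from typing import List, NamedTuple, Tuple


class ChartPoint(NamedTuple):
    lcl: float             # lower control limit for this sample
    ucl: float             # upper control limit for this sample
    proportion: float      # observed fraction of defective records
    out_of_control: bool   # True if the proportion falls outside the limits


def p_chart(defects: List[int], sample_sizes: List[int]) -> Tuple[float, List[ChartPoint]]:
    """Compute a p-chart center line and per-sample 3-sigma control limits."""
    if len(defects) != len(sample_sizes) or not sample_sizes:
        raise ValueError("need equal-length, non-empty inputs")
    # Center line: overall defect rate pooled across all samples.
    p_bar = sum(defects) / sum(sample_sizes)
    points = []
    for d, n in zip(defects, sample_sizes):
        # Binomial standard error of a proportion for a sample of size n.
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        lcl = max(0.0, p_bar - 3 * sigma)
        ucl = min(1.0, p_bar + 3 * sigma)
        p = d / n
        points.append(ChartPoint(lcl, ucl, p, p < lcl or p > ucl))
    return p_bar, points


if __name__ == "__main__":
    # Hypothetical week of audits: 200 records checked per day.
    daily_defects = [4, 5, 3, 6, 4, 18, 5]
    p_bar, points = p_chart(daily_defects, [200] * 7)
    print(f"center line = {p_bar:.4f}")
    for day, pt in enumerate(points, start=1):
        flag = "OUT OF CONTROL" if pt.out_of_control else "ok"
        print(f"day {day}: p = {pt.proportion:.3f} (UCL {pt.ucl:.3f}) {flag}")
```

With these numbers only day 6 (18 of 200 records, or 9%) exceeds the upper control limit of roughly 7%. In practice the limits would usually be set from a stable baseline period rather than from the same days being judged, so that an unusual day does not widen its own limits.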
Process Design – When a process or information system is due to be reengineered, this is often the ideal time for an organization to review its information needs and to make the new system as error-proof as possible. This can be accomplished by redesigning data collection processes to include built-in data edit checks, data quality performance measures, and automated transcription, data entry, and format conversion activities that minimize the number of data handoffs. System designs that promote data quality can help organizations realize significant long-term improvement at a low long-term cost.
To aid them in their data quality improvement efforts, some organizations are creating a new job, the Data Quality Analyst (DQA). To get the most value out of this new position, organizations face three challenges: (1) how to define the duties of the DQA, (2) what skills and knowledge the DQA should possess, and (3) where to place the DQA position in the organization’s hierarchy.
What are the responsibilities of the DQA?
Inmon, Welch, and Glassey (1997) described the role of the DQA in their book, Managing the Data Warehouse. They envisioned the DQA as the overseer of data quality in the data warehouse, with these duties:
- Review data loaded into the data warehouse for accuracy.
- Recommend maintenance enhancements to data acquisition processes to improve accuracy of data warehouse data.
- Make recommendations to operational support for enhancements to systems of record to improve accuracy of operational data.
- Review referential integrity of data warehouse data.
- Review historical integrity of data warehouse data.
Expanding upon this early description of the DQA, organizations have found new uses for the position that go beyond the data warehouse to any organizational process where data quality is an issue. Much as they rely on the quality assurance function for information systems, organizations are looking to the DQA to assist them in the following activities:
- Monitor compliance of information flows and data stores against data quality standards.
- Identify areas for data quality improvement and help to resolve data quality problems through the appropriate choice of error detection and correction, process control and improvement, or
process design strategies.
- Measure and report to management on the progress of data quality improvement. This may also include documenting the return on investment associated with improving data quality.
- Provide data quality training and consulting services to others in the organization.
What skills and knowledge should the DQA possess?
Chung, Fisher, and Wang (2002) examined this question in their study “What Skills Matter in Data Quality?” Their research suggests that a data quality professional needs training in three areas.
Technical – This category represents a person’s ability to work with computer systems. In a survey of 26 Internet-posted job ads for the DQA position, Pierce (2003) found that the most commonly requested technical skills were proficiency with office software (e.g., Excel, Access, Word), knowledge of relational databases, and familiarity with statistical packages and analysis techniques.
Adaptive – This category represents a person’s ability to interact effectively with data users, managers, and other stakeholders. Pierce (2003) found that the adaptive skills most frequently requested in job ads for the DQA position were verbal and written communication skills, project and time management skills, and the ability to work in teams. Chung, Fisher, and Wang (2002) also recommend that data quality professionals have a good working knowledge of data quality measurement, total quality management, data entry improvement, and user requirements so that they can work effectively with others interested in improving data quality in their organization.
Interpretive – This category refers to a person’s ability to identify and describe the complex interplay between technology and organizational structure. Chung, Fisher, and Wang (2002) cite the ability to manage the change process, to understand the implications of data quality, to measure the costs and benefits of data quality, and to detect and correct errors in databases as examples of interpretive skills. While Pierce (2003) found that no job ads specifically requested these skills, over half did ask that DQA applicants have good problem-solving and analytical skills.
Although these studies of data quality skills provide some interesting insights, more research is needed to determine whether individuals with specialized training in data quality perform their job responsibilities better than individuals without it. More research is also needed to identify the best point in a person’s career for data quality training and what the content and format of that training should be.
Where should the DQA function be placed in the organization?
Proper placement of the DQA function is critical to its success. To establish and enforce data quality standards, the data quality assurance area must be placed high enough in the organizational hierarchy that it has sufficient influence and independence to accomplish its mission.
At the same time, however, the interdisciplinary nature of data requires that the DQA function be flexible enough to participate in cross-functional and quality improvement teams at every level of the organization. Companies wrestling with where to place their DQA personnel may want to follow Womack and Jones’s (2000) conception of a “lean enterprise.” In their vision, every functional area within an organization should serve two major roles: (1) to act as a school that summarizes current knowledge, acquires new knowledge, and trains others in that function’s area of expertise, and (2) to develop guidelines and best practices for interacting with the other functions in the organization.
Under this vision, a newly hired DQA may begin his career in the Data Quality Functional Area, learning the basics of data quality management. He then transfers to a cross-functional team responsible for providing products and/or services to a set of customers. While on this assignment, the DQA concentrates on sharing his data quality expertise with others on the team to improve the quality of the information needed to provide customers with the cheapest, fastest, and best products and/or services. Periodically the DQA returns full time to the Data Quality Functional Area to retool and upgrade his skill set before moving on to another project. This pattern of alternating between the Data Quality Functional Area and cross-functional teams continues throughout the DQA’s career. Although this style of organization requires substantial changes in how a company manages careers and staffs its functional areas, Womack and Jones believe it is a model whose time has come.
Given the “newness” of the DQA position, it is certain to continue to evolve over time. It is hoped that a better understanding of the DQA’s job responsibilities, skills, and
placement within the organization will lead to better education and utilization of persons interested in pursuing a career in data quality. As a final note, individuals interested in sharing their
thoughts and experiences with the DQA or similar positions within their organizations are encouraged to contact the author, Elizabeth Pierce (firstname.lastname@example.org).
Chung, W.; Fisher, C.; and Wang, R. What Skills Matter in Data Quality? Proceedings of the 7th Conference on Information Quality. Massachusetts Institute of Technology (2002), 331-342.
Inmon, W.; Welch, J.D.; and Glassey, K.L. Managing the Data Warehouse. John Wiley and Sons (1997).
Pierce, E. Pursuing a Career in Information Quality: The Job of the Data Quality Analyst. Proceedings of 8th Conference on Information Quality. Massachusetts Institute of Technology, (2003),
Redman, T. www.dataqualitysolutions.com/Data%20Doc%20-%20COPDQ.htm. Accessed on February 12, 2004.
Womack, J.P. and Jones, D.T. From Lean Production to the Lean Enterprise. Harvard Business Review on Managing the Value Chain. Harvard Business School Press (April 2000), 221-250.
Weber, R. Information Systems Control and Audit. Prentice Hall (1999).