The first article of the series was titled “The Data Won’t Govern Itself”. The second article was titled “Data Governance Is NOT a Methodology”. As you can tell by the names of the previous two articles, I have an attitude when it comes to the ways some people define and approach the building of data governance programs. In this article I will settle down a little bit and share with you tools that I have used as part of the data governance trade. Future articles in this series will address the Data Governance Organization, the Data Governance Council, and Pro-Active and Reactive data governance processes.
Introduction
Companies or organizations in the process of defining or implementing Data Governance Programs are finding that there are Tools of Data Governance, often existing tools in your data environment or tools that you can develop yourself, that can improve your chances of deploying a successful Data Governance program. This article will briefly discuss each of the tools that I have found to be beneficial in implementing data governance programs. Several of these tools do not require purchase and they can be developed and delivered internally at a low cost and over a short period of time. The tools that will be discussed in this article include:
- The Data Steward Repository
- The Meta-Data Repository
- The Work-Flow Management Tool
- The Data Quality Tool
- The Data Quality Issue Log
- The Common Data Matrix
- The Data Governance Activity Matrix
The Data Steward Repository
- If you have read my previous articles on Data Governance, you may be familiar with my 3-D Approach© that I use to initiate the Data Stewardship approach to Data Governance. The 3-D Approach discusses 1) “Defacto”, 2) “Discipline” and 3) “Database” as core components of successful Data Stewardship. The third “D” is the Data Steward Repository.
- A Data Steward Repository is a database that holds information (stewardship meta-data) about the relationship between the data stewards of the organization and the specific data that they steward. Some companies start by documenting this information in a spreadsheet but quickly find that the spreadsheet limits update-capability and accessibility to this information.
- A Data Steward repository is typically built to enable a company to identify and communicate with data stewards quickly and effectively.
- This tool is also used by the data steward coordinators (within each Business Unit), the data domain stewards (for each cross-Business Unit data set) and operational data stewards to make certain that they engage the appropriate people (stewards) across Business Units at the appropriate time. Since the stewards in your organization can number in the hundreds (depending on the granularity to which stewards are identified), it will be impossible for any single person to know who all the stewards are.
- If it is uncertain whether your organization will build a repository tool to house data stewardship meta-data, you may consider housing the information relating the data stewards to the data in a workflow management tool or in the Common Data Matrix defined in the next few pages.
- The Common Data Matrix will also store this same information; however, your organization will be required to build multiple matrices to house information pertaining to each domain of data for each Business Unit. As long as your organization makes these matrices available on-line to interested parties, that should satisfy the Stewardship Meta-Data needs of the company.
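To make the idea of stewardship meta-data concrete, here is a minimal sketch of a Data Steward Repository as a single database table that relates stewards to the data they steward. The table name, column names and the functions `build_repository`, `add_steward` and `stewards_for` are all assumptions made for illustration, as are the sample stewards; a real repository would be shaped by your own steward roles and data domains.

```python
import sqlite3


def build_repository() -> sqlite3.Connection:
    """Create an in-memory repository; the schema is an illustrative assumption."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE stewardship (
            steward_name  TEXT NOT NULL,
            email         TEXT NOT NULL,
            business_unit TEXT NOT NULL,
            steward_role  TEXT NOT NULL,
            data_domain   TEXT NOT NULL,
            data_element  TEXT NOT NULL
        )
        """
    )
    return conn


def add_steward(conn, steward_name, email, business_unit,
                steward_role, data_domain, data_element):
    """Record that one steward is responsible for one data element."""
    conn.execute(
        "INSERT INTO stewardship VALUES (?, ?, ?, ?, ?, ?)",
        (steward_name, email, business_unit,
         steward_role, data_domain, data_element),
    )


def stewards_for(conn, data_element):
    """Return (name, email, business unit, role) for each steward of an element."""
    return conn.execute(
        "SELECT steward_name, email, business_unit, steward_role "
        "FROM stewardship WHERE data_element = ? "
        "ORDER BY business_unit, steward_name",
        (data_element,),
    ).fetchall()


if __name__ == "__main__":
    # Hypothetical stewards, invented purely for the example.
    repo = build_repository()
    add_steward(repo, "Pat Lee", "pat.lee@example.com", "Sales",
                "operational", "Customer", "customer_address")
    add_steward(repo, "Sam Roy", "sam.roy@example.com", "Finance",
                "domain", "Customer", "customer_address")
    for row in stewards_for(repo, "customer_address"):
        print(row)
```

Even a sketch this small answers the question a spreadsheet struggles with: given a piece of data, who across the Business Units needs to be contacted about it?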
The Meta-Data Repository
- The implementation of an Enterprise-Wide Meta-Data Repository tool can be a lengthy experience and a large investment for large companies. For this reason, many organizations look for ways to manage their meta-data using tools that they have already purchased or presently license. Enterprise repositories are designed to manage both business-user meta-data and technical meta-data. Tools that are being re-purposed toward meta-data typically focus on either Business-User meta-data or Technical meta-data depending on the nature and purpose of the tool.
- Meta-Data requirements need to be clearly defined. Successful companies often embark on a quick and effective Meta-Data Requirements gathering process to clearly understand the steps that are needed to satisfy the meta-data requirements for Data Migration, Data Governance & Stewardship, Meta-Data Management, Customer Data Integration, Master Data Management and Business Intelligence & Data Warehousing.
- In order for the Meta-Data Repository to demonstrate specific value to the initiatives mentioned in the previous bullet and to the organization, it will be necessary to implement Meta-Data Governance and identify Meta-Data Stewards as part of the process. The Meta-Data Governance process, similar to the Data Governance Program, will identify and engage the Meta-Data Stewards (Definers, Producers & Users) in the process of defining Meta-Data Standards, Collecting Quality Meta-Data, Validating Meta-Data and Distributing Meta-Data to the people who will benefit from its use.
The Work-Flow Management Tool
- In order to assure consistent, efficient (timely) and effective communications and appropriate decision making (authorization and approval), it will be helpful to automate the work flow processes as much as possible. One way to provide this efficiency and effectiveness is to utilize a workflow management tool that allows people to enter issues/questions/requests and get the issues delivered to and addressed by the right people.
- A tool like this will be extremely helpful during both Pro-Active Data Governance processes (methodology-based) and Reactive Data Governance processes (resolving known data quality issues). During data migration, a workflow management tool gives the migration team the ability to put migration decisions into the hands of the people who have been designated with the authority to make them. After the migration step, the workflow management tool will route information about added, changed or retired data to the people authorized to make decisions about that data.
- Without a workflow management tool, decision-making information will be distributed, assessed, reviewed and tracked manually, which will add time to the process, decrease the quality of the process, and make tracking decisions cumbersome and slow.
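The routing behavior described above can be sketched in a few lines. This is only an illustration, assuming that decision authority is recorded per data domain and per kind of change (add, change, retire); the `Request` and `WorkflowRouter` names, the status values and the steward identifiers are hypothetical and do not describe any particular workflow product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Request:
    request_id: int
    data_domain: str
    kind: str                      # assumed values: "add", "change", "retire"
    description: str
    assigned_to: Optional[str] = None
    status: str = "open"
    history: List[str] = field(default_factory=list)


class WorkflowRouter:
    """Routes each request to the person authorized to decide on it."""

    def __init__(self, authorities: Dict[Tuple[str, str], str]):
        # (data_domain, kind) -> steward with decision authority
        self.authorities = authorities
        self.requests: List[Request] = []

    def submit(self, data_domain: str, kind: str, description: str) -> Request:
        req = Request(len(self.requests) + 1, data_domain, kind, description)
        approver = self.authorities.get((data_domain, kind))
        if approver is None:
            # Nobody is on record with authority; flag it rather than guess.
            req.status = "unrouted"
            req.history.append("no authorized approver on record")
        else:
            req.assigned_to = approver
            req.history.append(f"routed to {approver}")
        self.requests.append(req)
        return req

    def decide(self, request_id: int, decided_by: str, approved: bool) -> Request:
        req = self.requests[request_id - 1]
        if decided_by != req.assigned_to:
            raise PermissionError(f"{decided_by} is not authorized for this request")
        req.status = "approved" if approved else "rejected"
        req.history.append(f"{req.status} by {decided_by}")
        return req


if __name__ == "__main__":
    router = WorkflowRouter({("Customer", "change"): "customer.domain.steward"})
    req = router.submit("Customer", "change", "Lengthen the address line field")
    router.decide(req.request_id, "customer.domain.steward", approved=True)
    print(req.status, req.history)
```

The `history` list is the point of the exercise: every routing and decision step is recorded automatically, which is exactly the tracking that becomes cumbersome when done by hand.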
The Data Quality Tool
- The purpose of the Data Quality tool is to automate the comparison of a specific set of data to the data quality standards that have been defined for that data.
- The Data Quality standards can include format, content, dependencies on other data, business rules and quality rules, or whatever else your organization decides to include in its data quality standards.
- The data quality tool will be used as part of the control and monitoring of the quality of data.
- In the developmental portion of a CDI-MDM action plan (for example), the data quality tool will be used to compare master data that is to be included in an ERP system (for example, new customers or new materials) against existing data and to highlight potential duplicates to the individuals who are responsible for the first analysis of this data. The results of the comparison are then either entered into the ERP package databases and distributed back to the appropriate legacy applications, or used on the front line to determine that this data is a duplicate of data that already exists.
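The two jobs described above, comparing data to its standards and highlighting potential duplicates, can be sketched as follows. Everything specific here is an assumption for illustration: the three sample standards (one format rule, one content rule, one dependency rule), the field names, and the 0.85 similarity threshold. The duplicate check uses the standard library's `difflib.SequenceMatcher` as a simple stand-in for the matching logic a commercial data quality tool would provide.

```python
import re
from difflib import SequenceMatcher

# Hypothetical standards for a customer record.
# Each rule returns True when the record conforms.
STANDARDS = {
    # Format: customer IDs look like "C" followed by six digits.
    "customer_id format": lambda r: re.fullmatch(
        r"C\d{6}", str(r.get("customer_id", ""))) is not None,
    # Content: country must come from an agreed list of codes.
    "country content": lambda r: r.get("country") in {"US", "CA", "MX"},
    # Dependency: a state is required whenever the country is US.
    "state required when country is US": lambda r: (
        r.get("country") != "US" or bool(r.get("state"))),
}


def violations(record):
    """Return the names of the standards that the record fails."""
    return [name for name, rule in STANDARDS.items() if not rule(record)]


def _normalize(name):
    """Lower-case the name and drop punctuation and spaces before comparing."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def potential_duplicates(new_name, existing_names, threshold=0.85):
    """Return existing names similar enough to the new name to need review."""
    key = _normalize(new_name)
    return [
        name for name in existing_names
        if SequenceMatcher(None, key, _normalize(name)).ratio() >= threshold
    ]


if __name__ == "__main__":
    record = {"customer_id": "C000123", "country": "US", "state": ""}
    print(violations(record))
    print(potential_duplicates("ACME Corp.", ["Acme Corp", "Globex Inc"]))
```

Note that the duplicate check only proposes candidates. The decision on whether two records really are the same customer or material stays with the people responsible for the first analysis of the data.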
The Data Quality Issue Log
- The Data Quality Issue log is the tool that is used by the individuals in the organization to record problems or issues that they find with the data.
- The data quality issue log is reviewed regularly by the Data Steward Coordinators to determine the action that must be taken to resolve the issues.
- The data quality issue log must include places for the stewards to identify if the issue is an operational issue (just relevant to their Business Unit) or if the issue is a strategic problem that crosses Business Units (therefore associated with an “enterprise”-managed data domain).
- The issue log is also the place where data quality activity is held for reporting. Data quality reporting must include not only the total number of issues in the log, but also the number of issues that have been added to the log (and thus added to the queue to be resolved) and the number of issues that have been resolved.
- Knowing that there are 10 issues in the log this week is less important than knowing that 6 new issues have been added (now in the queue to be resolved) and 7 issues have been resolved.
- Organizations should define in detail how the Data Quality Issue Log will be used to identify and record issues, review and assess the issues, initiate an issue resolution effort (sample listed below), and track and report the results of data quality issues.
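The reporting described above can be sketched with a small issue record and a report function. The `Issue` fields and the `period_report` function are assumptions made for illustration, and "issues in the log" is taken here to mean issues still unresolved at the end of the reporting period. Under that reading, the example of 10 issues in the log with 6 added and 7 resolved implies that 11 issues were open at the start of the week (11 + 6 - 7 = 10).

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Issue:
    description: str
    business_unit: str
    scope: str                     # "operational" or "strategic"
    opened: date
    resolved: Optional[date] = None


def period_report(log: List[Issue], start: date, end: date) -> Dict[str, int]:
    """Count issues added, resolved, and still open for the period start..end."""
    added = sum(1 for i in log if start <= i.opened <= end)
    resolved = sum(
        1 for i in log
        if i.resolved is not None and start <= i.resolved <= end
    )
    open_at_end = sum(
        1 for i in log
        if i.opened <= end and (i.resolved is None or i.resolved > end)
    )
    return {"open_at_end": open_at_end, "added": added, "resolved": resolved}


if __name__ == "__main__":
    # Invented dates and issues that reproduce the 10 / 6 / 7 example.
    week_start, week_end = date(2024, 1, 8), date(2024, 1, 14)
    log = [
        # 11 issues carried in from earlier weeks; 7 get resolved this week.
        Issue(f"backlog issue {n}", "Sales", "operational", date(2024, 1, 2),
              date(2024, 1, 10) if n < 7 else None)
        for n in range(11)
    ] + [
        # 6 new issues logged during the week, not yet resolved.
        Issue(f"new issue {n}", "Finance", "strategic", date(2024, 1, 9))
        for n in range(6)
    ]
    print(period_report(log, week_start, week_end))
```

Reporting all three numbers together shows whether the queue is growing or shrinking, which the total alone cannot.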
The Common Data Matrix
- A Common Data Matrix consists of a two-dimensional chart that cross references the data of the organization to the Business Units that define, produce and use that data. This matrix enables companies to quickly see where the impact of changes to data will be reflected across the organization.
- A Common Data Matrix can be created to help the organization understand how similar or the same data from different domains are shared across business units.
- The Common Data Matrix should include Business Units & Specific Responsibilities across the Business Units on the top of the matrix, and the Domain (cross-Business Unit subject areas of data) and the Sub-Domains of the data along the left side of the matrix.
- Information should be added to the cross-reference boxes of the matrix where the specific stakeholders of the data (across the top) meet the specific data of the company (along the left side).
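The layout described above, with Business Units across the top and Domains and Sub-Domains down the left side, can be sketched as a simple in-memory structure. The `CommonDataMatrix` class, its method names and the sample Business Units are hypothetical; the cell contents assume the three relationships named earlier (define, produce, use), abbreviated to D, P and U when the matrix is printed.

```python
from collections import defaultdict


class CommonDataMatrix:
    """Cross-references data (rows) to Business Units (columns)."""

    ROLES = ("define", "produce", "use")

    def __init__(self):
        # (domain, sub_domain, business_unit) -> set of roles
        self._cells = defaultdict(set)

    def record(self, domain, sub_domain, business_unit, role):
        """Fill in one cross-reference box of the matrix."""
        if role not in self.ROLES:
            raise ValueError(f"unknown role: {role}")
        self._cells[(domain, sub_domain, business_unit)].add(role)

    def impacted_units(self, domain, sub_domain):
        """Business Units touched by a change to this data."""
        return sorted(
            bu for (d, s, bu), roles in self._cells.items()
            if d == domain and s == sub_domain and roles
        )

    def render(self):
        """Return the matrix as tab-separated text, one row per Sub-Domain."""
        units = sorted({bu for (_, _, bu) in self._cells})
        rows = sorted({(d, s) for (d, s, _) in self._cells})
        lines = ["\t".join(["Domain / Sub-Domain"] + units)]
        for d, s in rows:
            cells = []
            for bu in units:
                roles = self._cells.get((d, s, bu), ())
                text = "/".join(r[0].upper() for r in self.ROLES if r in roles)
                cells.append(text or "-")
            lines.append("\t".join([f"{d} / {s}"] + cells))
        return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical entries, invented for the example.
    matrix = CommonDataMatrix()
    matrix.record("Customer", "Address", "Sales", "define")
    matrix.record("Customer", "Address", "Sales", "use")
    matrix.record("Customer", "Address", "Finance", "use")
    print(matrix.render())
    print(matrix.impacted_units("Customer", "Address"))
```

The `impacted_units` lookup is the programmatic version of reading across a row of the matrix to see who is affected by a change to that data.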
The Data Governance Activity Matrix
- A Data Governance Activity Matrix consists of a two-dimensional chart that cross references the data of the organization to the data governance activities of each of the roles & responsibilities.
- This matrix enables the company to quickly see where the impact of changes to data activities will be reflected across the organization. A Data Governance Activity Matrix should be created to help the organization understand how data activities are carried out for each Business Unit.
- The Data Governance Activity Matrix should include Business Units & Specific Responsibilities across the Business Units on the top of the matrix, and the Data Activities such as Data Migration tasks, Data Quality tasks, Master Data tasks (already included in project activities) along the left side of the matrix.
- Information should be added to the cross-reference boxes of the matrix where the specific stakeholders of the data and the specific roles & responsibilities (across the top) meet the specific data activities of the company (along the left side).
SUMMARY
The Tools of Data Governance described in this article cover the basic requirements of most Data Governance programs. When built specifically for your environment, these tools will enable your organization to understand and manage many important aspects of your data including:
- Who “owns” what data-wise. OK … I do not like the term “own” as it implies that they can do with the data what they choose. The steward repository records information about who the data stewards are and what data they steward.
- What your company knows about your data. The meta-data repository can be an endless source of information about how the data of your organization is defined, produced and used.
- When the people who need to get involved need to get involved. The work flow management tool and the data governance activity matrix will play a vital role in assuring that the “right” data stewards are engaged at the “right” time and in enabling an efficient response.
- How bad the data is. The data quality issue log is an on-going record of organizational data deficiencies that are managed and resolved through reactive data governance processes.
- How data is shared across the company. The common data matrix is used to graphically represent stakeholder interest and responsibility over the management of Business Unit and cross-Business Unit data.
Your comments on the types of tools you are using for your Data Governance program and how you are using them would make a great follow-up article. If you are using similar tools, or you have created your own tools (and you don’t mind sharing them with us), or you have questions on the use of the tools described in this article, please send me an email and I will be glad to share them with the TDAN.com reader-base.