Every enterprise recognizes that data governance is an important practice: it unlocks the business value of data and helps identify risks. Yet organizations often step back when it comes to implementing data governance, even though it makes obvious business sense. It is perceived as a challenging proposition because it involves defining policies and processes and making people responsible for ensuring these "P words" are followed.
Undoubtedly, these are key components of any data governance program. But this top-down approach often runs into obstacles such as securing executive sponsorship, cultural barriers, and resourcing. An alternative is a bottom-up approach: first identify the enterprise data assets that are critical for data governance, catalog them, and let that catalog enable the wider governance program.
As they say: “A journey of a thousand miles begins with a single step.” Every big thing starts with little steps. The data management landscape has several stages (source systems, data modeling, data integration, and reporting), each involving different data assets, and those assets need to be managed at every stage. Taking small, tactical steps to collect these data assets at each stage allows data governance to develop effectively.
Source systems are the pipes that feed data into the data warehouse, whatever business problem it is meant to solve. Because an enterprise can source data from many disparate systems, it is valuable to capture information about every source and how it is used in the enterprise. A source system document that records the following facts serves this purpose:
- Name of the source system
- Type of the source system (transaction, CRM, web application, etc.)
- What type of data is generated from this system? (Transaction data, marketing data, web logs, etc.)
- What type of data feeds are generated from this system? (Flat file, XML file, unstructured data, etc.)
- Does this source provide master data, such as customer or product data?
- Who are the business owners and technology owners of the system?
- Which lines of business use this source, and for what purpose? (Analytics, operational reporting, regulatory reporting, etc.)
- How often is data sourced from this system? (Sourcing frequency)
Because the success of data governance lies in assuring that data is trustworthy and relevant, this document helps demonstrate that you are sourcing the right data from the right systems.
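Keeping the source system document in a structured, machine-readable form makes gaps easy to spot. The following Python sketch shows one way to do that, assuming each source is recorded as a single entry whose fields mirror the questions above; the `SourceSystem` class, its field names, the `missing_facts` helper, and the sample values are hypothetical and not part of any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


@dataclass
class SourceSystem:
    """One entry in a hypothetical source system document."""
    name: str
    system_type: str = ""                # e.g. "transaction", "CRM", "web application"
    data_generated: str = ""             # e.g. "transaction data", "web logs"
    feed_format: str = ""                # e.g. "flat file", "XML file"
    provides_master_data: bool = False   # does it provide customer or product master data?
    business_owner: str = ""
    technology_owner: str = ""
    consumers: dict[str, str] = field(default_factory=dict)  # line of business -> purpose
    frequency: str = ""                  # e.g. "daily", "hourly"


def missing_facts(source: SourceSystem) -> list[str]:
    """Return the names of fields that were left empty.

    Boolean fields are skipped because False is a legitimate answer.
    """
    return [
        f.name
        for f in fields(source)
        if not isinstance(getattr(source, f.name), bool) and not getattr(source, f.name)
    ]


# Hypothetical sample entry with the technology owner not yet filled in.
crm = SourceSystem(
    name="CRM_PROD",
    system_type="CRM",
    data_generated="marketing data",
    feed_format="flat file",
    provides_master_data=True,
    business_owner="Sales Operations",
    consumers={"Marketing": "analytics"},
    frequency="daily",
)
print(missing_facts(crm))  # ['technology_owner']
```

Storing consumers as a mapping from line of business to purpose keeps the "who uses it, and why" question answerable per line of business rather than buried in free text.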
When designing enterprise data models, the essential assets to maintain are the business glossary, which adds context to the data, and the data model lineage document, which depicts how data flows from source to database.
Business Glossary – A business glossary is a central place to manage business terms throughout their lifecycle. It should typically capture the details that answer the following questions:
- What is the name of the business term?
- What does the business term mean?
- Are there any synonyms for this business term? (The same concept may go by different names, so it is important to list all of them.)
- Who owns the business term? (The person who defines it)
- Who is the data steward? (The person who creates and maintains the business term)
- Who is the approver of the business term?
- What are the business rules applicable for this term? (Business logic, data quality rules, etc.)
- Is there any reference data associated with this business term?
- Who uses this business term and for what purpose? (An essential question, as the same term can mean different things to different people)
- Change history and documentation (who changed what?)
The business glossary gives IT and the business a common vocabulary for communication and collaboration, creating a solid foundation for the data governance program.
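To illustrate how a glossary can resolve synonyms and keep a change history, here is a minimal in-memory sketch in Python. It assumes one record per business term; the class names, the `find` and `redefine` methods, and the sample term are hypothetical. A real glossary would normally live in a catalog tool or database rather than in memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Change:
    """One entry in a term's change history: who changed what, and when."""
    changed_by: str
    changed_on: date
    description: str


@dataclass
class GlossaryTerm:
    name: str
    definition: str
    owner: str          # person who defines the term
    steward: str        # person who creates and maintains the term
    approver: str
    synonyms: list[str] = field(default_factory=list)
    business_rules: list[str] = field(default_factory=list)
    reference_data: str = ""
    usage: dict[str, str] = field(default_factory=dict)   # consumer -> purpose
    history: list[Change] = field(default_factory=list)


class BusinessGlossary:
    """Hypothetical in-memory glossary that resolves a term by its name or any synonym."""

    def __init__(self) -> None:
        self._terms: dict[str, GlossaryTerm] = {}
        self._aliases: dict[str, str] = {}   # lower-cased name or synonym -> canonical name

    def add(self, term: GlossaryTerm) -> None:
        self._terms[term.name] = term
        for alias in [term.name, *term.synonyms]:
            self._aliases[alias.lower()] = term.name

    def find(self, name: str) -> GlossaryTerm | None:
        canonical = self._aliases.get(name.lower())
        return self._terms.get(canonical) if canonical else None

    def redefine(self, name: str, new_definition: str, changed_by: str, on: date) -> None:
        """Change a definition and record the old and new values in the term's history."""
        term = self.find(name)
        if term is None:
            raise KeyError(name)
        term.history.append(
            Change(changed_by, on, f"definition: {term.definition!r} -> {new_definition!r}")
        )
        term.definition = new_definition


# Hypothetical sample term.
glossary = BusinessGlossary()
glossary.add(GlossaryTerm(
    name="Customer",
    definition="A party that has purchased at least one product.",
    owner="Head of Sales",
    steward="Sales data steward",
    approver="Data governance council",
    synonyms=["Client", "Account Holder"],
    usage={"Finance": "revenue reporting", "Marketing": "campaign targeting"},
))
print(glossary.find("client").name)  # Customer

glossary.redefine("Client", "A party with at least one active contract.",
                  changed_by="Sales data steward", on=date(2024, 1, 15))
print(len(glossary.find("Customer").history))  # 1
```

Resolving every synonym to one canonical term is what lets different teams search with their own vocabulary and still land on the same definition and owner.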
Data Model Lineage Document – A document that shows how logical and physical entities and data elements are mapped in the system. It should answer the following questions:
- What is the name of the logical entity?
- What source (actual source table or file name) is it generated from?
- What attributes contribute to this logical entity?
- Which source columns do the attributes map to?
- Which physical entity is it mapped to?
- Which physical attribute is each attribute mapped to?
- What are the definitions of the entities and attributes?
- What are the domain values of the attributes?
Maintaining a data model lineage document is crucial because it gives complete traceability of how source system data is modeled into the warehouse. It also supports impact analysis of any proposed changes to the system.
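Impact analysis becomes a simple query once each row of the lineage document is captured as a record. The Python sketch below assumes one row per logical attribute; `LineageRow`, `impacted_by`, and all table and column names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LineageRow:
    """One row of a hypothetical data model lineage document."""
    logical_entity: str
    logical_attribute: str
    source: str                 # actual source table or file name
    source_column: str
    physical_entity: str
    physical_attribute: str
    definition: str = ""
    domain_values: tuple[str, ...] = ()


def impacted_by(rows: list[LineageRow], source: str, source_column: str) -> list[tuple[str, str]]:
    """Impact analysis: which physical (table, column) pairs depend on a source column?"""
    return sorted(
        {
            (r.physical_entity, r.physical_attribute)
            for r in rows
            if r.source == source and r.source_column == source_column
        }
    )


# Hypothetical lineage rows.
lineage = [
    LineageRow("Customer", "Customer Status", "crm_customers.csv", "STATUS_CD",
               "DIM_CUSTOMER", "CUSTOMER_STATUS", "Current lifecycle status",
               ("ACTIVE", "INACTIVE", "CLOSED")),
    LineageRow("Customer", "Customer Name", "crm_customers.csv", "CUST_NM",
               "DIM_CUSTOMER", "CUSTOMER_NAME"),
    LineageRow("Account", "Account Owner Status", "crm_customers.csv", "STATUS_CD",
               "DIM_ACCOUNT", "OWNER_STATUS"),
]

print(impacted_by(lineage, "crm_customers.csv", "STATUS_CD"))
# [('DIM_ACCOUNT', 'OWNER_STATUS'), ('DIM_CUSTOMER', 'CUSTOMER_STATUS')]
```

Before a source column such as `STATUS_CD` is changed, this answers "what else breaks?" directly from the document instead of from tribal knowledge.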
As data is integrated into the data warehouse, it is necessary to capture where the data originated and how it is transformed as it moves through the system. A data-mapping document comprising the following details should provide that visibility:
- How are source tables/files mapped to database tables/columns?
- Data types of source and target columns
- What data quality rules are applied to the source data?
- What transformation rules are applied from source to target?
- What loading strategy is used to load the target?
- Is there any reference data used in the integration process?
- Frequency of loading the target table from the source
The data model lineage document and the data-mapping document become key assets for data governance because together they show how data flows from various sources into the target system. This transparency of data flow helps meet compliance needs, which is one of the objectives of data governance.
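The data-mapping details listed above can also be checked automatically. As a minimal sketch, assuming each mapping row is stored as a record, the hypothetical `undocumented_conversions` function below flags mappings where the source and target data types differ but no transformation rule has been written down. The class, field names, and sample columns are illustrative, and comparing data types as case-insensitive strings is a simplification.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ColumnMapping:
    """One row of a hypothetical data-mapping document."""
    source: str                  # "file_or_table.column"
    source_type: str
    target: str                  # "table.column"
    target_type: str
    dq_rules: list[str] = field(default_factory=list)
    transformation: str = ""     # empty means a direct move with no transformation
    load_strategy: str = "full"  # e.g. "full", "incremental"
    reference_data: str = ""
    frequency: str = "daily"


def undocumented_conversions(mappings: list[ColumnMapping]) -> list[str]:
    """Flag mappings whose data type changes without any documented transformation."""
    return [
        f"{m.source} ({m.source_type}) -> {m.target} ({m.target_type})"
        for m in mappings
        if m.source_type.lower() != m.target_type.lower() and not m.transformation
    ]


# Hypothetical mapping rows.
mappings = [
    ColumnMapping("orders.csv.ORDER_DT", "VARCHAR(8)", "FACT_ORDER.ORDER_DATE", "DATE",
                  dq_rules=["not null", "valid YYYYMMDD"],
                  transformation="parse YYYYMMDD to DATE", load_strategy="incremental"),
    ColumnMapping("orders.csv.AMT", "VARCHAR(20)", "FACT_ORDER.ORDER_AMOUNT", "DECIMAL(18,2)",
                  load_strategy="incremental"),
    ColumnMapping("orders.csv.CUST_ID", "INTEGER", "FACT_ORDER.CUSTOMER_ID", "integer"),
]

for issue in undocumented_conversions(mappings):
    print(issue)
# orders.csv.AMT (VARCHAR(20)) -> FACT_ORDER.ORDER_AMOUNT (DECIMAL(18,2))
```

A check like this turns the mapping document from passive documentation into something a review or build step can enforce.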
Reporting is a critical layer in data management, because it is through BI reporting that the data collected in the data warehouse yields meaningful insight. A report inventory document with the following details should be created to keep track of all reports and their usage:
- What is the name of the report/dashboard?
- How is the report used, and which business case does it address?
- What type of report is it? (Operational, Transactional, Analytical)
- Who owns the report on the business and technology sides?
- What are the dimensions and measures in the report and their definitions?
- What source (database, table, column) is used for these dimensions and measures?
- What business rules are applied to these measures?
- What is the frequency of report generation?
- Who is the target audience of the report?
- What is the SLA for report generation?
This document helps identify duplicate or unused reports and ensures that you are delivering the right information through the right reports to the right people.
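As a sketch of how the inventory can surface duplicate and unused reports, the Python below groups reports that expose the same dimensions and measures from the same source columns, and flags reports with no audience or no recent views. `ReportEntry`, the helper functions, and the sample reports are hypothetical; in particular, `views_last_90_days` is an assumed usage metric that is not among the inventory fields listed above and would have to come from whatever usage statistics the BI platform provides.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ReportEntry:
    """One row of a hypothetical report inventory."""
    name: str
    business_case: str
    report_type: str                  # "Operational", "Transactional", "Analytical"
    business_owner: str
    technology_owner: str
    dimensions: frozenset[str]
    measures: frozenset[str]
    sources: frozenset[str]           # "database.table.column" values feeding the report
    frequency: str = "daily"
    audience: list[str] = field(default_factory=list)
    sla: str = ""
    views_last_90_days: int = 0       # hypothetical usage metric, not in the original list


def duplicate_groups(reports: list[ReportEntry]) -> list[list[str]]:
    """Group reports that expose the same dimensions and measures from the same sources."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for r in reports:
        groups[(r.dimensions, r.measures, r.sources)].append(r.name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def unused(reports: list[ReportEntry]) -> list[str]:
    """Reports with no target audience or no recorded views."""
    return [r.name for r in reports if not r.audience or r.views_last_90_days == 0]


# Hypothetical inventory entries.
reports = [
    ReportEntry("Monthly Sales", "Track revenue", "Analytical", "Sales", "BI Team",
                frozenset({"month", "region"}), frozenset({"revenue"}),
                frozenset({"dw.fact_order.order_amount"}),
                audience=["Sales leadership"], views_last_90_days=42),
    ReportEntry("Regional Revenue", "Track revenue", "Analytical", "Finance", "BI Team",
                frozenset({"region", "month"}), frozenset({"revenue"}),
                frozenset({"dw.fact_order.order_amount"}),
                audience=["Finance"], views_last_90_days=0),
    ReportEntry("Open Orders", "Fulfilment backlog", "Operational", "Operations", "BI Team",
                frozenset({"warehouse"}), frozenset({"open_orders"}),
                frozenset({"dw.fact_order.status"}),
                audience=["Warehouse managers"], views_last_90_days=310),
]

print(duplicate_groups(reports))  # [['Monthly Sales', 'Regional Revenue']]
print(unused(reports))            # ['Regional Revenue']
```

This notion of a duplicate is deliberately coarse: two reports with the same dimensions and measures may still differ in filters or business rules, so flagged groups are candidates for review rather than automatic retirement.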
As you can see, the information in each layer links to the next layer as data flows through them. If you manage these data assets across every layer and connect the dots, you get a clear map of data movement and transformation throughout the environment. And believe me: your individual teams in data modeling, integration, and reporting are very likely already creating these documents in one form or another as a routine chore. They may also be using industry-standard tools to capture this information and maintain these data assets. To turn this “chore” into a “practice,” organizations need to formalize policies and processes and assign people to make sure these Ps are followed diligently across all layers. If we take these little steps and follow the Ps, data governance is not a complex undertaking but an attainable mission.
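Connecting the dots can be as simple as treating each documented mapping as an edge in a graph. The sketch below assumes the edges have already been extracted from the data-mapping document (source to warehouse) and the report inventory (warehouse to report); the node naming scheme, the sample edges, and the `upstream` function are hypothetical. A breadth-first walk over the reversed edges returns every element that feeds a given report measure.

```python
from __future__ import annotations

from collections import deque

# Hypothetical edges harvested from the documents: each key is an upstream element,
# each value lists the downstream elements derived from it.
edges: dict[str, list[str]] = {
    # from the data-mapping document (source -> warehouse)
    "source:orders.csv.AMT": ["warehouse:FACT_ORDER.ORDER_AMOUNT"],
    "source:orders.csv.ORDER_DT": ["warehouse:FACT_ORDER.ORDER_DATE"],
    # from the report inventory (warehouse -> report)
    "warehouse:FACT_ORDER.ORDER_AMOUNT": ["report:Monthly Sales.revenue"],
    "warehouse:FACT_ORDER.ORDER_DATE": ["report:Monthly Sales.month"],
}


def upstream(edges: dict[str, list[str]], target: str) -> set[str]:
    """Walk the edges backwards (breadth-first) to find every element feeding `target`."""
    reverse: dict[str, list[str]] = {}
    for parent, children in edges.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)

    seen: set[str] = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in reverse.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(upstream(edges, "report:Monthly Sales.revenue")))
# ['source:orders.csv.AMT', 'warehouse:FACT_ORDER.ORDER_AMOUNT']
```

The same graph walked forward instead of backward gives the impact analysis view, so one set of well-maintained documents serves both questions.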