Customer and employee data form an integral part of organizational data.
This data is used for important decision making related to improving sales, budget planning & allocation, resource utilization, etc.
At the same time, this data often contains sensitive customer and employee information. Therefore, when using it for decision making, it becomes imperative to:
– Protect the data from a regulatory compliance standpoint, and
– Ensure the data adheres to the highest standards of quality and accuracy.
In this article, we list and describe the data governance requirements that need to be addressed to ensure such reporting data is fully protected and maintained for optimal usage.
The Need for Data Governance
Data Governance provides a competitive advantage when it comes to effective utilization of data. It helps organize unsorted information, hide confidential information, and structure and protect the relevant information used for decision making. Achieving these objectives is a step in the right direction toward satisfying regulatory authorities. It also gives the executive leadership confidence in the validity of the information.
Different data governance principles can be implemented to accomplish each of these tasks. Let us discuss in detail the principles that need to be adhered to for proper, functional data management.
Data Governance Principles
These principles help determine the ownership of the Business Intelligence (BI) object or report. They also address the report's information source, quality, validity, and the protection around its data, including lineage and relevance. Abiding by these principles helps enterprises gain a 360-degree view of, and control over, the information used for critical decision making.
- Report (as an Asset) must be inventoried in the catalog and classified with appropriate steward/owner.
Every Report must be inventoried as part of an asset catalog with the appropriate ownership. It is advisable that the data owner/steward of the report is a subject matter expert (SME) of the report and its content.
- The report requirements including business need, usage and report fields along with any calculations/mapping to the source data assets need to be identified by the owner of the report and should be stored in the catalog.
Report-related information should be identified and recorded within the asset catalog. The information should include the business requirements of the report, the usage (Board of Directors reporting, regulatory reporting, etc.) and the corresponding report fields. In addition, metrics that accurately identify the characteristics of the report should be included.
- The business/technical metadata and the critical data elements within the report should be documented, aligned and captured in the catalog.
The business metadata (Logical Data Dictionary) and technical metadata (Physical Data Dictionary) for all the critical data elements within the report should be recorded and mapped within the catalog so that the business users can identify the elements within the report for usage and decision making.
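As an illustration of the first three requirements, a catalog entry for a report could be modeled roughly as follows. This is a minimal sketch, not the schema of any particular catalog product: the class names, field names and sample values (`ReportCatalogEntry`, `DataElement`, `RPT-001`, and so on) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataElement:
    """One report field with its business and technical metadata (hypothetical schema)."""
    name: str             # business name (logical data dictionary)
    definition: str       # business definition (logical data dictionary)
    physical_column: str  # physical location (physical data dictionary)
    source_asset: str     # asset the element is mapped to
    critical: bool = False


@dataclass
class ReportCatalogEntry:
    """A report inventoried as an asset, with its owner and requirements."""
    report_id: str
    owner: str            # data owner/steward, ideally an SME
    business_need: str
    usage: str            # e.g. Board of Directors reporting
    elements: List[DataElement] = field(default_factory=list)

    def critical_elements(self) -> List[DataElement]:
        return [e for e in self.elements if e.critical]

    def missing_metadata(self) -> List[str]:
        """Names of critical elements lacking a definition or a physical mapping."""
        return [
            e.name
            for e in self.critical_elements()
            if not e.definition or not e.physical_column
        ]


entry = ReportCatalogEntry(
    report_id="RPT-001",
    owner="jane.doe",
    business_need="Quarterly sales review",
    usage="Board of Directors reporting",
    elements=[
        DataElement("Customer Email", "Primary contact email",
                    "crm.customer.email_addr", "crm_db", critical=True),
        DataElement("Net Revenue", "",
                    "fin.ledger.net_rev", "finance_dw", critical=True),
    ],
)

# Critical elements whose metadata is still incomplete
print(entry.missing_metadata())
```

A check such as `missing_metadata()` gives the steward a simple way to find critical data elements that are not yet fully documented.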
- The report lineage needs to be traced back to the source information assets, which should be Authoritative Data Sources (ADS).
The critical data elements within the report can be drawn from multiple source data assets. They might also have gone through transformations across multiple data hops. The lineage of these elements should be traced back to the source asset within the organization. The source asset needs to be reliable and is ideally an Authoritative Data Source (ADS).
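The lineage check described above can be sketched as a walk over a dependency graph. The example below assumes lineage is recorded as a simple mapping from each asset to its upstream assets; the asset names and the `authoritative` set are made up for illustration.

```python
def trace_to_sources(lineage, asset):
    """Return the root assets (those with no upstream) reachable from `asset`."""
    roots, seen, stack = set(), set(), [asset]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guard against cycles and repeated hops
        seen.add(node)
        upstream = lineage.get(node, [])
        if upstream:
            stack.extend(upstream)
        else:
            roots.add(node)
    return roots


# Hypothetical lineage: report <- data mart <- two source assets
lineage = {
    "sales_report": ["sales_mart"],
    "sales_mart": ["crm_db", "regional_spreadsheet"],
}
authoritative = {"crm_db"}  # assets designated as ADS (assumed)

sources = trace_to_sources(lineage, "sales_report")
non_ads = sources - authoritative

print(sorted(sources))  # root source assets behind the report
print(sorted(non_ads))  # sources that are not authoritative
```

Any asset left in `non_ads` is a candidate for review: either it should be designated an ADS, or the report should be re-pointed to one.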
- The Source Information Asset used in the creation of the report must be inventoried in the catalog and should be assigned the appropriate asset owner/steward.
The source assets used across the lineage, as described in the previous requirement for the generation of the report, should also be inventoried in the asset catalog along with their ownership. The data owner/steward is preferred to be a subject matter expert for the given asset and its components.
- The business/technical metadata for the critical data elements within the source information asset should be documented, aligned and captured in the catalog.
The business metadata (Logical Data Dictionary) and technical metadata (Physical Data Dictionary) for all the critical data elements within the source assets of the report should be recorded and mapped within the catalog. This must be done so that the business users can identify the elements within the source assets and their transformations for report utilization.
- Rules must be defined for all six dimensions of Data Quality as applicable for the source information asset. All critical data elements within the asset must be monitored as approved by the owner of the asset. A Data Quality Index (DQI) must be calculated for the critical data elements (CDEs) within the asset in order to maintain the quality of the data used within the report. The DQI should be assessed against a threshold set up by the source information asset owner.
Data Quality rules need to be defined for all six dimensions, i.e., accuracy, completeness, consistency, timeliness, validity and uniqueness. The respective stewards need to assign and monitor these rules, as applicable, for all information assets and their critical data elements. There are different ways to monitor the application of these rules. The most common is to define a Data Quality Index for the CDEs and assess it against the data quality threshold defined in the system by the governance team in agreement with the asset owner.
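The article does not prescribe a formula for the DQI, so the sketch below makes an assumption: the index is the unweighted mean of the pass rates of the rules defined per dimension, expressed as a percentage. The records, rules and the 90% threshold are illustrative only, and only two record-level dimensions are shown; a dimension such as uniqueness would need a dataset-level check instead.

```python
def pass_rate(records, rule):
    """Fraction of records that satisfy a rule (0.0 for an empty data set)."""
    if not records:
        return 0.0
    return sum(1 for r in records if rule(r)) / len(records)


def data_quality_index(records, rules):
    """Assumed DQI: unweighted mean of per-dimension pass rates, on a 0-100 scale."""
    rates = {dim: pass_rate(records, rule) for dim, rule in rules.items()}
    dqi = 100 * sum(rates.values()) / len(rates)
    return dqi, rates


# Hypothetical CDE values from a source information asset
records = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C2", "email": ""},
    {"customer_id": "C3", "email": "not-an-email"},
    {"customer_id": "C4", "email": "d@example.com"},
]

# One simplified rule per applicable dimension
rules = {
    "completeness": lambda r: bool(r["email"]),
    "validity": lambda r: "@" in r["email"],
}

THRESHOLD = 90.0  # set by the asset owner (assumed value)

dqi, rates = data_quality_index(records, rules)
meets_threshold = dqi >= THRESHOLD

print(rates)
print(dqi, meets_threshold)
```

With this sample data, three of four records pass the completeness rule and two of four pass the validity rule, so the index falls well below the threshold and the asset would be flagged for remediation.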
- Data Quality issues and exceptions within the report must be identified and traced back to the source information asset promptly. They must be resolved within a stipulated time frame.
The data quality issues and exceptions flagged by the rules established for the assets need to be traced to their source and resolved before the asset is utilized by the organization.
- If the source information asset is external to the system, a service level agreement and information sharing agreement must be in place to identify the metadata of the asset and the critical data elements shared externally to build the report.
The external source information asset should have a service level agreement and information sharing agreement to identify and validate the data from the external source. It should have the metadata, data quality rules and privacy controls for the critical data elements and sensitive data defined within the agreement.
- Appropriate security and privacy controls need to be assessed and managed for all sensitive data elements within the report and all source information assets.
Sensitive data elements need to be identified for all assets within the lineage of the report. The appropriate security and privacy controls need to be implemented, assessed and managed for these sensitive data elements as per the privacy regulations that apply to the system. Privacy regulations can take the form of the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), etc. The privacy controls can be implemented in multiple ways, including data masking, encryption, data shuffling, etc.
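Two of these controls can be sketched with the Python standard library alone. The example shows simple character masking plus keyed hashing (HMAC-SHA-256) as one possible pseudonymization approach; the function names, the sample values and the key are hypothetical, and a real deployment would follow the organization's own key management and privacy policy.

```python
import hashlib
import hmac


def mask_value(value, visible=4, mask_char="*"):
    """Mask all but the last `visible` characters of a sensitive value."""
    if len(value) <= visible:
        return mask_char * len(value)
    hidden = len(value) - visible
    return mask_char * hidden + value[hidden:]


def pseudonymize(value, key):
    """Replace a value with a keyed hash, so equal inputs still match across assets."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


SECRET_KEY = b"replace-with-a-managed-secret"  # placeholder, not a real key

masked = mask_value("123-45-6789")
token = pseudonymize("jane.doe@example.com", SECRET_KEY)

print(masked)
print(token)
```

Masking suits values that people need to partially recognize in a report, while keyed hashing keeps records joinable without exposing the underlying value.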
- Retention periods and legal holds need to be developed, maintained, reviewed and published for the report.
All assets within the organization need to follow retention rules and adhere to legal holds for statutory and regulatory purposes. The retention periods and legal holds defined by regulations need to be developed, recorded, maintained and reviewed periodically. An asset can only be discarded from the system once its retention period and any legal holds have expired.
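The discard rule above reduces to a small check. The sketch below assumes retention is tracked as a number of days from the asset's creation date and a legal hold as a simple flag; real retention schedules are usually richer than this.

```python
from datetime import date, timedelta


def can_discard(created, retention_days, legal_hold, today):
    """An asset may be discarded only after retention expires and no legal hold applies."""
    retention_expires = created + timedelta(days=retention_days)
    return (not legal_hold) and today >= retention_expires


# Hypothetical report created on 1 Jan 2020 with a 365-day retention period
created = date(2020, 1, 1)

print(can_discard(created, 365, legal_hold=False, today=date(2020, 6, 1)))
print(can_discard(created, 365, legal_hold=False, today=date(2021, 1, 1)))
print(can_discard(created, 365, legal_hold=True, today=date(2021, 1, 1)))
```

Only the second call allows disposal: in the first the retention period is still running, and in the third an active legal hold blocks disposal even though retention has expired.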
These are the general data governance steps that need to be followed to ensure proper regulatory compliance, effective resource utilization and optimal utilization of assets (reports) within an organization. Even though they won’t solve the problem of decision making, they will provide accurate and well-defined data that business and technical stakeholders can use to make decisions. Other benefits of adhering to these rules include a well-defined data framework, ease of data management, and compliance with privacy rules and with state and federal regulatory reporting norms.