A year ago I wrote an article for TDAN.com about the Business Glossary Prime Directive, which can be found here. Simply stated, the Business Glossary Prime Directive is “to eliminate semantic confusion across the enterprise.” Eliminating semantic confusion has many implications. For Data Governance, it means that each business term has a unique name, a single definition, a single value set, a single set of business rules, a single authoritative source, and an identified accountable party.
For our Analytics program, it means that we have a single definition, a single value set, a single set of business and quality rules, and a single authoritative source for all of our business dimensions and facts. Thus, having data governance involved in governing our Analytics dimensions and facts is critical to the success of Analytics projects.
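To make the "one name, one definition" idea concrete, here is a minimal Python sketch of a glossary that refuses a second entry under an existing term name. The class names, field names, and sample term are assumptions made for illustration; they do not come from any particular glossary product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessTerm:
    """One glossary entry. All field names are illustrative assumptions."""
    name: str
    definition: str
    value_set: frozenset
    business_rules: tuple
    authoritative_source: str
    accountable_party: str


class BusinessGlossary:
    """Keeps one entry per term name, ignoring case and surrounding spaces."""

    def __init__(self):
        self._terms = {}

    def register(self, term):
        key = term.name.strip().lower()
        if key in self._terms:
            # A second entry under the same name is how semantic confusion starts.
            raise ValueError(f"term {term.name!r} is already defined")
        self._terms[key] = term

    def lookup(self, name):
        return self._terms[name.strip().lower()]


glossary = BusinessGlossary()
glossary.register(BusinessTerm(
    name="Customer Status",
    definition="Current standing of a customer account.",  # made-up example
    value_set=frozenset({"ACTIVE", "INACTIVE", "PROSPECT"}),
    business_rules=("Every customer has exactly one status.",),
    authoritative_source="CRM",
    accountable_party="VP Sales",
))
```

In this sketch, a second `register` call for "customer status" raises `ValueError`, which forces the competing definitions to be reconciled instead of living side by side.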
Over the last 25 years, our DW/BI and Analytics implementations have reportedly had a high rate of failure. Failure has been measured as “not meeting original defined requirements or not meeting business needs.” There is a plethora of perceived or actual root causes of these “failures.” From my 20+ years as a consultant implementing Analytics projects, I have found that a significant root cause of the “failure” is directly related to the lack of agreement on the definition of Analytics dimensions and facts. Oh, sure, we all think we agree on what a customer is or how to compute customer-lifetime-value. But then why do we have conflicting numbers on our analytics reports for what is seemingly the same metric? The answer is simple. We really don’t agree on the definitions of our dimensions and facts, nor on the filtering (another dimension) that we use in differing scenarios. Yet none of our differing definitions are wrong per se. They just come from different viewpoints. While the problem can be simply stated, the solutions are often very complex, and that is why we struggle to achieve Analytics project success without Data Governance.
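A small, hypothetical illustration of how this happens: the Python sketch below counts "active customers" under two definitions that are each reasonable on their own. The records, the two definitions, and the 365-day window are invented for the example.

```python
from datetime import date

# Hypothetical records: (customer_id, last_order_date, has_open_contract)
customers = [
    ("C1", date(2023, 11, 2), True),
    ("C2", date(2022, 3, 15), True),
    ("C3", date(2023, 9, 30), False),
    ("C4", date(2021, 6, 1), False),
    ("C5", date(2023, 12, 1), False),
]

AS_OF = date(2023, 12, 31)


def active_by_sales(rows, as_of=AS_OF, window_days=365):
    """Sales view (assumed): active means an order within the window."""
    return [cid for cid, last_order, _ in rows
            if (as_of - last_order).days <= window_days]


def active_by_finance(rows):
    """Finance view (assumed): active means an open contract."""
    return [cid for cid, _, has_contract in rows if has_contract]


print(len(active_by_sales(customers)))    # 3
print(len(active_by_finance(customers)))  # 2
```

Both functions are technically correct, yet two reports titled "Active Customers" would show 3 and 2. Neither number is wrong; the definitions simply were never reconciled.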
A recent survey of CFOs noted that 78% of them want more accurate reporting, need to find data faster, need to reduce the cost of reporting activities, and need to create an effective environment to share information. The same CFOs noted that their Analytics platforms were too tactical, too technically focused, and did not address the business issues, data quality issues, or provide reporting consistency. I believe that Analytics success begins with a clear understanding and focus on using data governance practices to eliminate the confusion of our dimensions and facts. Essentially, let’s focus on the business issues and eliminate the semantic confusion around our terminology. Today we use Data Governance processes, people, and technology to complete a business glossary.
Your business glossary should focus on enabling all analytics processes and people to easily find, understand, and trust the data they should be using, and not the data they should not be using. Effective governance allows the right people to use the right data, for the right business purpose, at the right time, with the right technology. Most individuals have the need to perform data analysis at some level. And almost all of us bring different backgrounds and experiences to our organization. It is those differing backgrounds that give us differing understandings of the business concepts. Thus, without a strong business glossary to guide us, we have semantic differences in our understanding of the data. If the definition, value set, business rules, authoritative source, and usage limitations are not clear, then we could use data incorrectly. We could make erroneous decisions that increase the risks in doing business. Without a good understanding of the data, we could create reporting that is very sexy and technically correct but inaccurate from a business standpoint. That is the problem the CFOs noted above are pointing to. So what is the process to provide great governance for our business glossary for BI/Analytics?
Most Data Governance teams that have tried the pure Top-Down approach for governing BI/Analytics have not been highly effective. I find that the effective and expedient approach is to begin with the data we have or want to have, both dimensions and facts, on a set of analytics reports. Let’s call this data our critical data elements, or CDEs. It is easier for our business and analytics teams to talk first about the CDEs that are needed on each report. You can use this approach with data analysts, business managers, and data scientists. The type of source for the data does not matter to the approach. The data can be from a Data Lake source, a Data Mart source, an application source, or even a spreadsheet source.
We use the CDEs to determine the scope of effort, which is the scope of the iteration to implement governance. The CDEs at this point can be discussed as the data on the report, the organization and filtering for the report, the columns, the computations of each, and the summarizations of the report. I suggest we label the Data Governance processes as “governance as you need it.” You need to govern the CDEs that will be included in a report or set of reports. This is a very practical approach that seems to resonate with business teams. Using this approach, you can control the scope of the governance project. Try to keep the scope to 50-75 CDEs. This should allow you to complete an “implementation” in 2-4 months.
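One simple way to hold an iteration to a CDE budget is sketched below in Python. The function name, the report names, and the rule of taking reports in priority order are assumptions for illustration, not a prescribed method.

```python
def plan_iteration(reports, max_cdes=75):
    """Select reports, in the given priority order, while the number of
    distinct CDEs stays within the budget.

    `reports` is a list of (report_name, set_of_cde_names) pairs.
    Returns (selected_report_names, cdes_in_scope).
    """
    selected, scope = [], set()
    for name, cdes in reports:
        candidate = scope | set(cdes)
        if len(candidate) <= max_cdes:
            selected.append(name)
            scope = candidate
    return selected, scope


# Hypothetical reports, with a tiny budget so the effect is visible.
reports = [
    ("Monthly Revenue", {"customer", "order_date", "net_revenue"}),
    ("Churn Summary", {"customer", "churn_flag", "region"}),
    ("Inventory Aging", {"sku", "warehouse", "days_on_hand"}),
]
selected, scope = plan_iteration(reports, max_cdes=5)
print(selected)    # ['Monthly Revenue', 'Churn Summary']
print(len(scope))  # 5
```

Because "customer" appears on both of the first two reports, it counts once. Shared CDEs are what let a second or third report join an iteration cheaply.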
You want relatively short implementation time frames to:
- Produce business value quickly
- Show progress in the Data Governance program
- Show progress while developing your governance processes and educating the business and technical staff on the processes and technology
- Establish well understood and trusted reports
- Reduce the conflicting reporting and political issues
- Consistently improve the elimination of semantic issues across the enterprise
Again, the data governance objectives are to leverage the business glossary to help data and reporting consumers to find, understand, and trust the data under governance.
OK, now you could say, “Well, great, Lowell, I have some CDEs, but now what? How do we get to approved data assets under governance and certified BI/Analytics reporting?” Glad you asked. The CDEs provide a scope for the Analytics and Data Governance teams, working in parallel, to complete an implementation. I’m going to focus on the activities of the Data Governance team, but both teams must work together.
Once we have a list of CDEs, then the data governance team can execute a top-down governance effort similar to the following.
- Engage with the business stewardship resources to define and document the CDEs as business assets (define business assets).
  - Each CDE should be defined as a business term in the business glossary. This applies to the analytics dimensions and facts, as well as to calculation or model components, even if they are not persisted in a database.
  - Abbreviations, business rules, quality rules, and quality thresholds should be documented.
  - Roles such as data owner, accountable person, and business steward should be defined.
  - Any CDE that has security or privacy constraints should be tagged in the business glossary.
  - Standards and associated policies should be defined as well.
- Engage with the technical stewardship resources of the CDE source databases/applications to define and document the physical data assets and IT assets (define the data assets).
  - All CDEs that are persisted in a database column will be documented as data assets.
  - All physical characteristics, data values, rules, and domains should be documented.
  - Technical stewards, application owners, etc. should be defined.
- Engage the business and technical stewards to map the relationship of the data assets and columns on the reports to the business assets (map business and data assets).
  - An analysis needs to be done on the business assets and data assets to ensure that all assets are mapped and that all columns on the reports are defined as assets in the business glossary.
  - We may find that we missed defining a business asset for some data assets.
  - We may have report columns that are just calculations or components in a model, and thus assets are mapped to the computation of the report column (such as percentages or averages).
  - Technical stewards and accountable individuals should be defined.
- Document the data quality metrics for the data assets (determine data quality fit for purpose).
  - Data quality metrics should be computed with the business rules established at the business asset level.
  - Where one business asset is mapped to multiple data assets, data quality must be computed at each physical source. This will help the stewards determine the best authoritative source for reporting.
  - Data quality fit for purpose should be discussed with the Analytics consumers to define the quality expectations needed for trust in data usage.
  - Data stewards and owners should establish processes to meet the fit-for-purpose quality.
- Engage with the Subject Matter Experts or technical data stewards to define and document the data lineage and traceability of the data assets (support consumers’ understanding of trust).
  - Import data integration metadata to help define the lineage and traceability.
- Define analytical reports in a Report Catalog (define critical reporting in a Catalog).
  - This is one place where the Analytics development team and the Data Governance team have to coordinate.
  - The Report Catalog is a responsibility of the report developers, not the Data Governance team. I often assign this responsibility to the business teams or BI/Analytics teams.
  - Self-service reporting can leverage the Report Catalog and enhance the Catalog as well.
- Document all report elements, or the columns in each report, and map those to data assets or business assets (define report elements and traceability). This completes the mapping of business asset to data asset to report asset.
  - Ask the report developer to define the report element, rules, and any computations.
  - Ask the report developer to have the business stewards, technical stewards, and the party responsible for the report approve the lineage of each report element: first to the report, then to the data asset, then to the business asset.
  - Ensure that all mappings, assets, lineage, and traceability are documented in the business glossary.
- Ask the stewards and the Analytics team to “certify” each report. Given that we know the full traceability of the data and all assets, we can consider the reports to be “certified.”
  - Of course, full testing and acceptance of the BI/Analytics reports have to be done as well. Any changes to the documented assets will have to be made in the business glossary as well as in the Analytics reporting application.
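The traceability check behind “certification” can be sketched in a few lines of Python. Everything below (asset names, table and column names, function names) is hypothetical; a real glossary tool would hold these mappings in its own repository.

```python
# Business assets defined in the glossary (hypothetical).
business_assets = {"Customer", "Net Revenue"}

# Data asset (table.column) -> business asset it implements, or None if unmapped.
data_to_business = {
    "dw.dim_customer.customer_id": "Customer",
    "dw.fact_sales.net_amount": "Net Revenue",
    "dw.fact_sales.promo_code": None,  # persisted, but no business term yet
}

# Report element -> data assets it is derived from.
report_elements = {
    "Customer": ["dw.dim_customer.customer_id"],
    "Revenue": ["dw.fact_sales.net_amount"],
    "Avg Revenue per Customer": ["dw.fact_sales.net_amount",
                                 "dw.dim_customer.customer_id"],
    "Promo": ["dw.fact_sales.promo_code"],
}


def lineage_gaps(report_elements, data_to_business, business_assets):
    """Return {report_element: [data assets with no valid business asset]}.

    An element with no recorded sources is reported with an empty list.
    """
    gaps = {}
    for element, sources in report_elements.items():
        broken = [src for src in sources
                  if data_to_business.get(src) not in business_assets]
        if not sources or broken:
            gaps[element] = broken
    return gaps


def can_certify(report_elements, data_to_business, business_assets):
    """A report qualifies only when every element traces to a business asset."""
    return not lineage_gaps(report_elements, data_to_business, business_assets)


print(lineage_gaps(report_elements, data_to_business, business_assets))
# {'Promo': ['dw.fact_sales.promo_code']}
```

Here the report cannot be certified until someone defines a business asset for the promo code column and maps it. This is the same gap described in the mapping step above. Note that the computed column "Avg Revenue per Customer" passes because both of its inputs trace back to governed business assets.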
Wow, that was easy to put on paper. I’m sure I missed something in the details of the process, but I wanted to give you a guide. However, it is not easy to organize, communicate, educate, and have the resources complete their activities in a timely manner. Yet, if you can get the governance efforts completed in coordination with the BI/Analytics project, then you should provide significant value and likely be considered much more successful than we have been in the past. Just don’t forget that this maturity of alignment is required before you can achieve the Prime Directive. It is OK; stay calm and allow your business glossary to prosper.