In this column I have discussed the significant concepts, processes, resources, and deliverables that can be maintained in a Business Glossary. But I may have neglected to use the “M” word, “Metadata”, until now. Yet everything maintained in the Business Glossary is metadata. I’ll double down on that.
Everything in a Business Glossary is metadata. I did not deliberately avoid the term Metadata until now; however, the kind of information classified as Metadata is wide-ranging and can confuse those new to Data Management. Many technical resources shy away from using the term. When I mention the term to business people, they immediately start checking email on their phones. Well, maybe, just maybe, I can shed some light on the subject of Metadata, specifically how we use Metadata in the function of Data Governance.
Metadata Management is one of the eleven Data Management functions identified in the DAMA-International Data Management Body of Knowledge (DMBoK). (Refer to www.dama.org if you are not familiar with the DMBoK, or send me an email and I’ll point you in the right direction.) Metadata is not a new concept. Most large enterprises have a technology team dedicated to the discipline of Metadata Management. Without reliable Metadata, organizations will be challenged to know what data they have, where the data resides, who is accountable for it, what the values in the data mean, what the life-cycle of the data is, what security and privacy controls are maintained for it, who used the data and for what business purposes, and what quality the data has. Without Metadata, an organization cannot manage data as an asset, or even manage data at all. The function of Metadata Management existed before Data Governance became popular and still exists separately from Data Governance. The DAMA-International Data Management Functions are shown in the figure below.
Figure 1: DAMA-I “Wheel” of Data Management Functions
For years I consulted with organizations looking to create and implement an enterprise Metadata Strategy. The concept of a Metadata Strategy became popular in the mid-1990s (some will say the late 1980s) as a component of an enterprise information or data warehousing strategy. At that time, the Metadata Strategy was very technically oriented: it focused on getting technical metadata about data movement and ETL processes into the data warehouse/data mart environment in order to communicate “data sources & lineage” to the BI/Analytics consumers. While the Metadata Strategy has a list of technical benefits and many consumers, it was communicating to BI/Analytics consumers that made the efforts to manage Metadata economically and politically feasible. Over time the popularity of maintaining the Metadata has dwindled as “self-service” capabilities and technologies have grown. Data Governance has created a new wave (or slight tremor, as it may be) of business recognition for a Metadata Strategy. Yet Data Governance and the Business Glossary are just one of many use cases for your Metadata Strategy.
To add context to the enterprise Metadata Strategy, we should consider the three types of Metadata: business, technical, and operational. I suggest, though, that it is easier to understand and discuss the sources of Metadata that can be in your strategy. An architectural picture is a critical deliverable of the Metadata Strategy. Some of the sources of an enterprise Metadata Strategy will include:
- Business Functional repositories (business functions, business units)
- Business Rule repositories (quality rules, processing rules)
- QA/Testing repositories (Testing/QA languages/scripts/databases)
- Operational Application repositories (Operations control processes/languages/scripts, User application security repositories)
- Logical Data Models (conceptual/logical models)
- Architectural Design Models (Data distribution, Application architecture integration)
- Governance Business Glossary
- Business Intelligence tools (user profiles, security, metrics)
- Configuration Management tools (Production Programs, Program modules, Program code, scripts)
- Data Dictionary/Database Management catalogues (physical data structures, keys, Performance tuning scripts/languages, Backup/archiving, recovery scripts/programs)
- Data Integration and mapping tools (ETL programs, control processing, audit)
- Data quality tools (profiling, quality metrics & measures)
- Event Messaging tools (programs, schedules)
- Reference & Master Data repositories (business rules, data integration rules)
- Service/SOA registries
- Big Data tagging repositories
That’s a long list that often takes organizations many years to implement. As you can see from the above list of potential Metadata sources, a Metadata Strategy is a significant definition, design, and implementation effort. It is most often the responsibility of an enterprise Metadata Management team or an Architectural Review Board. Every Data Governance team should leverage the existing Metadata Strategy, as well as the technology integrations that have been implemented to support it. In effect, “try to use what you have on the shelf so you don’t reinvent the wheel.”
Many Data Governance programs are staffed with new resources who may not recognize how critical it is to have a Metadata Strategy defined. Yes, a Metadata Strategy should be a foundational deliverable for all Data Governance programs. I’ve seen programs get delayed and even cancelled because the technology team wanted to ingest metadata from many sources as a first step for the Data Governance program. Yes, we need metadata from existing sources, but only in the context of a project iteration, and even then it should be a small number of sources within the delivery of a specific use case. Bringing in metadata from, say, 2000 tables is the equivalent of attempting to boil the ocean. It’s too much without a focus on the value proposition to the business.
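To make the scoping idea concrete, here is a minimal sketch in Python. It assumes a catalog is already available as a simple list of table entries and that the use case names the critical data elements it cares about. The `TableMetadata` shape, the `tables_in_scope` helper, and all table and column names are hypothetical and for illustration only; they are not taken from any particular catalog tool.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class TableMetadata:
    """One catalog entry; the shape and names are hypothetical."""
    source: str
    table: str
    columns: Tuple[str, ...]


def tables_in_scope(catalog: Iterable[TableMetadata],
                    critical_elements: Iterable[str]) -> List[TableMetadata]:
    """Keep only tables that hold at least one critical data element."""
    wanted = {name.lower() for name in critical_elements}
    return [t for t in catalog if wanted & {c.lower() for c in t.columns}]


# A tiny stand-in for an enterprise catalog (assumed data).
catalog = [
    TableMetadata("crm", "customer", ("customer_id", "birth_date", "email")),
    TableMetadata("crm", "campaign", ("campaign_id", "channel")),
    TableMetadata("billing", "invoice", ("invoice_id", "customer_id", "amount")),
]

# Use case for this iteration: govern customer identity and contact data.
scope = tables_in_scope(catalog, ["customer_id", "email"])
print([f"{t.source}.{t.table}" for t in scope])
```

The point of the sketch is the direction of the filter: the use case decides which metadata is brought in, rather than the full catalog deciding what the program must absorb.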
The objectives of Data Governance should include a Metadata Strategy to manage the assets that will be governed by the program. This does not include all the metadata in the enterprise, just the metadata associated with governing our data assets, and it starts only with our critical data assets. Having a strategy that identifies an architecture, processes, people, and technology is critical to the longer-term success of the Data Governance program. Let’s discuss which capabilities should be in the strategy and which should not.
Many of the Metadata sources that I noted above will not be in scope for your Governance Metadata Strategy. Examples of metadata that will likely not be in scope are the SOA registry and the QA/Testing repositories. So what types of Metadata are likely to be considered for your Governance Metadata Strategy? You should consider the following list:
- Policies, regulatory & internal (in document repository like SharePoint)
- Governance Operating procedures (in document repository like SharePoint)
- Standards (in document repository like SharePoint)
- Business Procedures (in document repository like SharePoint)
- Business terms (in Excel sheets or document repository)
- Data Elements (in database catalogs)
- Data Lineage & traceability (in data integration/inter-operability repositories)
- Reports, reporting elements, models (in reporting technologies)
- Metrics, KPIs, calculations, models (in Excel sheets)
- Risk models (in Excel sheets)
While everything on the above list likely should be in your Data Governance Metadata Strategy, it is recommended that you operationalize the strategy as you need the metadata to deliver on an iteration use case. Thus, we often start with the integration of Business Terms and Business Processes, then look to integrate Data Elements, Data Lineage, and so on. In this way each iteration builds upon the previous one, and the automation of asset capture and life-cycle management is achieved over time.
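As a rough illustration of iterations building on one another, the following Python sketch registers a business term in a first iteration and attaches data elements and lineage in a second. The `GlossaryTerm` structure, the `add_term` and `link_data` helpers, and the sample values are all assumptions made for this example; a real Business Glossary tool will have its own model.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class GlossaryTerm:
    """A business term enriched over successive iterations (hypothetical shape)."""
    name: str
    definition: str
    procedures: List[str] = field(default_factory=list)     # captured in iteration 1
    data_elements: List[str] = field(default_factory=list)  # captured in iteration 2
    lineage: List[str] = field(default_factory=list)        # captured in iteration 2


def add_term(glossary: Dict[str, GlossaryTerm], name: str, definition: str,
             procedures: Iterable[str] = ()) -> GlossaryTerm:
    """Iteration 1: register a term and the business procedures that use it."""
    glossary[name] = GlossaryTerm(name, definition, list(procedures))
    return glossary[name]


def link_data(glossary: Dict[str, GlossaryTerm], name: str,
              data_elements: Iterable[str],
              lineage: Iterable[str] = ()) -> GlossaryTerm:
    """Iteration 2: attach data elements and lineage to an existing term."""
    term = glossary[name]  # raises KeyError if the term was never defined
    term.data_elements.extend(data_elements)
    term.lineage.extend(lineage)
    return term


glossary: Dict[str, GlossaryTerm] = {}
add_term(glossary, "Customer", "A party that has purchased a product.",
         ["Customer onboarding"])
link_data(glossary, "Customer", ["crm.customer.customer_id"],
          ["crm.customer -> dw.dim_customer"])
print(glossary["Customer"])
```

Because `link_data` refuses to attach technical metadata to a term that does not exist yet, the second iteration can only extend what the first one established, which mirrors the ordering described above.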
For those of you who want further information on metadata, you may research www.tdan.com, www.dama.org, www.gartner.com, or www.dataversity.net. Gartner recently published a research paper, the 2018 Gartner Magic Quadrant for Metadata Management Solutions. This paper can be downloaded from Gartner, or free of charge from Collibra:
https://www.collibra.com/landing_page/2018-gartner-magic-quadrant-for-metadata-management-solutions/
As always, it is OK, stay calm, and allow your business glossary to prosper.