TDAN.com is proud to introduce this new quarterly column from Lowell Fryman called Business Glossaries and Metadata. In this column Lowell will write about different aspects of business glossaries and their associated metadata.
The Business Glossary – The Heart of Data Governance
What is a Business Glossary (BG)?
We initially defined the term Business Glossary, or BG, almost a decade ago, yet today there is still no industry-standard understanding or description of a BG. This should not be a surprise, as each vendor defines the BG to fit the technology implementation that they have developed. For the purpose of this article, we define a BG to be a repository that can contain many kinds of informational content associated with a Business Term. This repository must be easily managed and changed, as well as accessible by everyone in the organization (within certain limitations). A BG is more than a Glossary of Terms; more than just term names and definitions. The BG repository should contain business definitions, logical data descriptions, physical implementation descriptions, and much more. A list of the potential attributes and content of a BG appears later in this article.
Why should we consider it the Heart of the Data Governance program?
The BG and associated semantic vocabulary is a critical deliverable from the Data Governance team. I recall someone once said “you can’t manage what you can’t define”. A critical goal then for the Data Governance team will be to identify the scope of responsibility and then define the critical data within that scope to manage a set of the organization’s data. For example, think of the term “Retail Customer”. The BG will contain the information about those critical terms so that they can be distributed throughout the organization and its critical business functions.
There are significant definitional and descriptive pieces of information that the Governance organization will have to identify and manage. For example, the “Retail Customer” Governance team should identify the following:
- What is the definition of a Retail Customer?
- How is Retail Customer data different from any other Customer categories?
- What is the life-cycle and lineage of Retail Customer data?
- What are the controls in place throughout the data lineage to ensure data quality?
- What are the security, privacy, retention and compliance concerns for Retail Customer data?
- What are the internal metrics, and how and where are those metrics created and maintained?
- What are the data values and business rules associated?
- What applications create, maintain, report, and archive Retail Customer data?
- How do we communicate the policies, standards, data and metrics about a Retail Customer and our activities to all individuals in the organization?
Not all of the BG information will be captured at one time. Much of the information needed will be dispersed across the organization, not captured at all, or captured in many non-integrated technologies. We have found that it is commonplace for the critical data governance information to be highly federated across many technologies and individuals in the organization. The size of the organization is not necessarily the issue; similar challenges exist in large international firms and small non-profits alike. Often, the vocabulary of the business is only known by word of mouth or “tribal knowledge”.
How do we use the BG?
I recently worked with a midsized non-profit and found the challenges of terminology very similar in nature to larger Fortune 50 firms. Different business units have very different business context and thus different definitions, metrics, and usages for what is often called by the same business term. Let’s use the term “constituent” in this example and see how well the example relates to your organization.
The term “constituent” may mean many different things depending on the context of the individual and whether or not a BG was in place to help. It is made more complex with the two core operational applications that have their own context and business rules for this term. Those applications are purchased and managed by outside vendors. Every organization has similar challenges, either through purchased applications or through M&A activities.
On the surface it seemed that those operational applications had a common context of a “constituent” since they synchronized “constituent profiles” three times per day. Not a chance. Application A managed all Internet profiles and gift donations while application B managed all off-line profiles and gift donations. The two applications actually did not synchronize all data of the same types. Business decisions had been made over the years to essentially not pay for both vendors to maintain the same grain of data and business processes. The perception of the business was that the context of a “constituent” was the same in each application. However, application A had 11 million “constituent” profiles and application B had 23 million “constituent” profiles. Thus, many governance and business intelligence questions such as which application was the system of record, which application had the correct business metrics, or which application could be used for financial reporting, were difficult to answer.
This organization could resolve the business terminology issue through good BG practices. They could find that a “constituent” could be defined as:
- Event Participant
- Self-donor
- Team Captain
- Patient’s Family member
- Patient
- Patient’s friend
- Individual Advocate
- High volume Donor
- Event Volunteer
- Sponsoring Organization
- National Donor Organization
Each of the above had unique business definitions, business rules, reporting requirements, financial and compliance concerns, and even business operations activities. However, each of those terms was physically implemented as one identifier in the database of both applications. In this case, a Team Captain can also be an Event Participant, a Self-Donor, a High Volume Donor, a Patient Friend or a Patient Family Member. Thus, one “constituent” in the database can be recognized as one to six different types of “constituents” to count and track. As you can see, the business term definitions have a direct impact on the business rules, data quality and analytics. The BG should be used to define the differences in terminology and then be accessible to everyone in the organization for semantic clarity.
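To make the counting problem concrete, here is a minimal sketch in Python. The identifiers and role assignments are hypothetical sample data, not taken from the organization described above. It only illustrates how one physical identifier can carry several business roles, so the number of distinct constituents and the number of role assignments diverge.

```python
from collections import Counter

# Hypothetical sample data: one database identifier per constituent,
# each mapped to the set of business roles that constituent plays.
constituent_roles = {
    "C001": {"Team Captain", "Event Participant", "Self-Donor"},
    "C002": {"Patient"},
    "C003": {"Event Volunteer", "High Volume Donor"},
}


def count_by_role(roles_by_id):
    """Count how many constituents hold each business role."""
    counts = Counter()
    for roles in roles_by_id.values():
        counts.update(roles)
    return counts


# Number of physical identifiers in the database.
distinct_constituents = len(constituent_roles)

# Number of (constituent, role) pairs the business may want to track.
role_counts = count_by_role(constituent_roles)
total_role_assignments = sum(role_counts.values())

print(distinct_constituents)   # 3
print(total_role_assignments)  # 6
```

Which of the two numbers is "the constituent count" depends entirely on the business definition in use, which is the kind of decision the BG is meant to record.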
Who are the Users of the Glossary?
We have seen many implementations deemed failures due to a lack of forethought about who should be the Users of the BG. Many early BG implementations were focused on the IT users. We recommend that the potential users of the BG be everyone who works with data, which is almost everyone in the organization who is likely to use data as part of their job. They need to understand the context of that data, and how and how not to use that data.
Many studies over the last decade have identified that experienced middle managers and executives have made errors in decisions simply due to a misunderstanding of the context of the data presented to them. Thus, we are adamant that everyone in the organization is a likely user of the BG in specific use cases.
Attributes (the Metadata) of the Glossary
Data governance teams can be overwhelmed by the number of attributes that can be captured in a BG. For example, one can consider the ultimate set of attributes for each Business Term in the BG to be:
- Term Definition
- Term Definition Update Date
- Term Definition Status
- Term Definition Status Date
- Term Definition Deprecated Date
- Term Definition Updated By User ID
- Term Name
- Term Name Update Date
- Term Name Status
- Term Name Status Date
- Term Name Deprecated Date
- Term Name Updated By User ID
- Term Definition Example
- Term Business and Usage Rules (how to and how not to use)
- Data Quality Expectations
  - Accuracy rules
  - Completeness rules
  - Integrity rules
  - Validity rules
  - Consistency rules
  - Timeliness rules
- Data Quality Measured
  - Date Measured
  - Completeness %
  - Validity %
  - Accuracy %
  - Integrity %
  - Consistency %
  - Timeliness %
- Term Taxonomy Category, Sub-Category Name
- Term Acronym
- Term Abbreviation
- Related Term Name (1 or more)
- Replacement Term Name
- Managed by Business Unit
- Managed by Application
- Managed by Business Function
- Data Steward Name
- Data Steward Contact Phone, Email, Location
- Retention and Compliance Rules
- Privacy Classification
- Security Classification
- Mapping to IT Assets
  - Application name
  - Server Name
  - Physical Column Name and Table Name (1 or more)
  - Database Name/Schema Name/View Name (1 or more)
  - Report Name (1 or more)
  - Report Column Name (1 or more)
  - ETL Process Name (1 or more)
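As an illustration of the "Data Quality Measured" attributes in the list above, the following Python sketch computes a Completeness % and a Validity % for one field. The records, the e-mail pattern, and the choice to measure validity only over populated values are all assumptions made for this example; a real program would use the rules the governance team has agreed on.

```python
import re

# Hypothetical sample records for illustration only.
records = [
    {"id": "C001", "email": "a@example.org"},
    {"id": "C002", "email": ""},
    {"id": "C003", "email": "not-an-email"},
    {"id": "C004", "email": "d@example.org"},
]

# Assumed validity rule: a very loose e-mail shape check.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness_pct(rows, field):
    """Percentage of rows where the field is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field))
    return 100.0 * filled / len(rows)


def validity_pct(rows, field, is_valid):
    """Percentage of populated values that pass the validity rule."""
    values = [r[field] for r in rows if r.get(field)]
    if not values:
        return 0.0
    return 100.0 * sum(1 for v in values if is_valid(v)) / len(values)


email_completeness = completeness_pct(records, "email")
email_validity = validity_pct(
    records, "email", lambda v: EMAIL_PATTERN.match(v) is not None
)

print(f"Completeness %: {email_completeness:.1f}")
print(f"Validity %: {email_validity:.1f}")
```

The measured percentages, together with the date they were measured, are what would be stored against the Business Term in the BG.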
Does this mean that the Data Governance team will need to start by populating all of the above, about 50 attributes? Absolutely not. The team can start by defining the following 8-10 attributes for each business term.
- Term Definition, Name and Example
- Term Acronym or Abbreviation (optional)
- Term Security, Privacy and Compliance (usage) rules
- Data Steward Name and Contact information
- Business Rules
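The starter set above could be represented as a simple record. The sketch below is one possible shape, not a vendor schema: the class name `BusinessTerm`, its field names, and the `missing_fields` helper are hypothetical, and the sample values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BusinessTerm:
    """Starter attributes for one Business Term (hypothetical field names)."""
    name: str
    definition: str
    example: str
    steward_name: str
    steward_contact: str
    # Security, privacy and compliance (usage) rules.
    usage_rules: List[str] = field(default_factory=list)
    business_rules: List[str] = field(default_factory=list)
    # Acronym or abbreviation is optional.
    acronym: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the names of required text fields that are still blank."""
        required = {
            "name": self.name,
            "definition": self.definition,
            "example": self.example,
            "steward_name": self.steward_name,
            "steward_contact": self.steward_contact,
        }
        return sorted(k for k, v in required.items() if not v.strip())


# Placeholder values for illustration; the definition and contact are
# deliberately left blank to show the gap report.
term = BusinessTerm(
    name="Retail Customer",
    definition="",
    example="(example to be supplied by the data steward)",
    steward_name="(steward name)",
    steward_contact="",
)

print(term.missing_fields())  # ['definition', 'steward_contact']
```

A check like `missing_fields` gives the governance team a simple way to see which terms still need attention before the remaining attributes are added.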
Hopefully I have made a case for why the BG is a critical component, the heart, of the data governance program, and you can understand why it is important to govern the business terminology to achieve the goals of the data governance mandate. In my next column, I’ll address the challenges and guidelines for creating business definitions and term naming.