It’s all about communication. Everyone talks about collecting, storing, and analyzing data, but how do you make use of this data if you cannot understand it? The primary problem is what the data means in the context where it is used. Suppose you need to make some business decisions based on the number of customers you have. How do you count customers? Is a customer a person, a company, or both? Are there retail customers and wholesale customers? Do all lines of business look at customers the same way? Miscommunication about the definition of customer can lead to poor business decisions.
A business glossary is business metadata that adds semantic context to data. Business terms are data elements defined with an explicit meaning. Business rules document the criteria for how data is used, or the decisions that are made during daily business activity.
Building a Business Glossary
Once you realize the benefit of having a business glossary, the next step is to create the content. Business glossary metadata can come from a variety of sources, both technical and non-technical.
At this point it is a good idea to differentiate between a business glossary and a data dictionary. A business glossary is focused on business meaning for business people. Any kind of description for a business data element would be useful in the discovery process to build a business term definition. A data dictionary is focused on the technical specifications for the storage of data. Data dictionaries are useful for technology people to manage data.
People are not very good at documenting what they know. There is a plethora of information stored in the heads of people who work with data. This is what is known as tacit or informal knowledge. Conduct an interview to extract as much informal business knowledge as possible, then formalize it in the business glossary. In addition, these people are good candidates for stewards, which we will discuss later.
Documents that are used to conduct regular business activities contain business terms. Agreements and contracts contain terms like ‘effective date’. You may also discover synonyms for terms. For instance, ‘effective date’ and ‘start date’ may mean the same thing but appear as different terms.
The opposite problem involves homographs or homonyms. One word has multiple meanings. This is where context is important. One part of the organization may define Inventory as ‘An itemized catalog or list of tangible goods’. A different part of the organization may define Inventory as ‘The value of materials and goods held by an organization to support production’. In one context Inventory is finished goods. In the other context Inventory is raw materials to create finished goods.
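One way to picture how a glossary keeps synonyms and homonyms apart is to key each term by both its name and its context. The sketch below is only an illustration under assumptions: the `Term` and `Glossary` classes, the context labels (Sales, Manufacturing, Contracts), and the wording of the 'Effective Date' definition are hypothetical and do not come from any particular glossary product. Synonyms are treated as global across contexts, which is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    context: str  # hypothetical label for the part of the organization
    definition: str
    synonyms: set = field(default_factory=set)


class Glossary:
    def __init__(self):
        self._terms = {}     # (term name, context) -> Term
        self._synonyms = {}  # synonym -> preferred term name

    def add(self, term):
        self._terms[(term.name.lower(), term.context.lower())] = term
        for synonym in term.synonyms:
            self._synonyms[synonym.lower()] = term.name.lower()

    def _preferred(self, name):
        # Resolve a synonym to its preferred term name, if one is registered.
        key = name.lower()
        return self._synonyms.get(key, key)

    def lookup(self, name, context):
        return self._terms[(self._preferred(name), context.lower())]

    def contexts(self, name):
        # All contexts in which the same word carries a definition (homonyms).
        preferred = self._preferred(name)
        return sorted(t.context for (n, _), t in self._terms.items() if n == preferred)


glossary = Glossary()
glossary.add(Term("Inventory", "Sales",
                  "An itemized catalog or list of tangible goods"))
glossary.add(Term("Inventory", "Manufacturing",
                  "The value of materials and goods held by an organization "
                  "to support production"))
# Illustrative definition only; a steward would supply the real wording.
glossary.add(Term("Effective Date", "Contracts",
                  "The day on which an agreement takes effect",
                  synonyms={"Start Date"}))

print(glossary.lookup("start date", "contracts").name)
print(glossary.contexts("inventory"))
```

Looking up 'start date' resolves to the preferred term, while 'inventory' reports two contexts, which signals that a reader must know the context before relying on the definition.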
The technology assets used by the organization are good sources for business metadata. The important thing is to separate the data dictionary metadata from the business metadata. For example, CUST is the abbreviation for Customer, and the relationship between a Customer and an Order is that a single customer may place one or more orders.
This kind of information can contribute to the building of terms and rules. We are less concerned with the fact that a customer name is stored as 64 characters in the CUST table.
Enterprise Resource Planning (ERP) systems integrate data, process, business rules, and other pertinent information into a unified software application. Unfortunately, this information is buried deep in the mechanics of the system and requires some documentation from the vendor to understand.
There are software products that will allow you to examine the catalog of these systems to produce a data model showing the business names for the cryptic column names as well as definitions.
The column headings of reports contain useful business metadata. Formulas used to compute the values in the columns may contain definitions or explanations. Database catalogs are full of useful information. Reverse engineering a database catalog into a data model can reveal comments about tables and columns, references between tables that document important business rules, and allowable values.
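As a rough illustration of mining a database catalog, the sketch below builds a tiny in-memory SQLite database and reads its catalog back. SQLite is used only because it ships with Python; the `ord` table, the column names, and the `describe_catalog` helper are hypothetical. Other database systems expose their catalogs differently, and SQLite has no column comments, so only columns, references between tables, and the stored table definition are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cust (
        cust_id   INTEGER PRIMARY KEY,
        cust_name TEXT NOT NULL,
        cust_type TEXT NOT NULL CHECK (cust_type IN ('RETAIL', 'WHOLESALE'))
    );
    CREATE TABLE ord (
        ord_id  INTEGER PRIMARY KEY,
        cust_id INTEGER NOT NULL REFERENCES cust (cust_id)
    );
""")


def describe_catalog(conn):
    """Collect columns, references, and table definitions from the catalog."""
    findings = {}
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for name, ddl in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({name})")]
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        references = [
            (row[3], row[2], row[4])
            for row in conn.execute(f"PRAGMA foreign_key_list({name})")
        ]
        findings[name] = {"columns": columns, "references": references, "ddl": ddl}
    return findings


findings = describe_catalog(conn)
for table, info in findings.items():
    print(table, info["columns"], info["references"])
```

The reference from `ord.cust_id` to `cust.cust_id` hints at the rule that an order belongs to a customer, and the CHECK constraint kept in the table definition lists the allowable customer types. Both are candidates for glossary rules, while the column data types stay in the data dictionary.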
The Extract, Transform, and Load (ETL) environment depicts the movement of data and its transformation while it is being moved. Business rules may be embedded in the transformation logic.
Logical data models are one of the best sources for building a business glossary. Logical data models are the blueprints for the creation and storage of new data. Often, entities and attributes will have descriptions that can be used as the basis for a business glossary.
Business Glossary Structure
Once you have some potential business terms for the glossary, you will need to organize the business glossary in a manner that makes finding things easier. This starts with classifying terms into a hierarchical structure. The structure begins with a broad term and more refined terms are added as sub-terms. For example, an ‘agreement’ is a general term that can be used to classify other, more specific terms like ‘contract’ and ‘purchase order’. Another way of structuring the business glossary is to use areas of interest or subject areas as the starting point. For example, you could break down the organization into People and Organizations, Products, Shipments, and other related areas.
There are some general things to consider when building your classification scheme. Not every term will fit neatly into a single classification. There may be instances where a term belongs in multiple places. A simple structure that does not go too deep is more useful. There is no set nesting depth, but after about 5 levels it gets tedious. Avoid technical jargon or complexity. ‘People and Organizations’ is easier to understand than ‘Party Relationships’.
Should there be one business glossary or multiple business glossaries? There is no right answer to this question. Sometimes an agreement cannot be reached on a single meaning for a term due to a specialized industry vocabulary. A single business glossary is easier to manage but does require enterprise-wide standardization on the name and meaning of all terms. Multiple glossaries are useful in certain industries like healthcare and insurance but require careful management to make sure they remain separate. The important thing to know is who the end users are and what information they need.
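The classification ideas above can be sketched as a small structure in which a term may sit under more than one parent and nesting depth is checked against the rule of thumb of about 5 levels. This is a minimal sketch, not a reference design: the `Classification` class and `MAX_DEPTH` constant are hypothetical, placing 'Purchase Order' under 'Products' is an invented second placement for illustration, and the code assumes the hierarchy contains no cycles.

```python
MAX_DEPTH = 5  # rule of thumb from the text, not a hard limit


class Classification:
    def __init__(self):
        self._parents = {}  # term -> list of parent terms (empty for top level)

    def add(self, term, parent=None):
        parents = self._parents.setdefault(term, [])
        if parent is not None:
            self._parents.setdefault(parent, [])
            parents.append(parent)

    def paths(self, term):
        """Every route from a top-level term down to `term`."""
        parents = self._parents.get(term, [])
        if not parents:
            return [[term]]
        return [path + [term] for parent in parents for path in self.paths(parent)]

    def too_deep(self, term):
        return any(len(path) > MAX_DEPTH for path in self.paths(term))


scheme = Classification()
scheme.add("Agreement")
scheme.add("Contract", parent="Agreement")
scheme.add("Purchase Order", parent="Agreement")
scheme.add("Products")
scheme.add("Purchase Order", parent="Products")  # hypothetical second placement

for path in scheme.paths("Purchase Order"):
    print(" > ".join(path))
print(scheme.too_deep("Purchase Order"))
```

Storing parents rather than children keeps multiple placements cheap to record, and listing every path to a term makes it easy to spot branches that have grown too deep.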
Managing the Business Glossary
A business glossary without oversight leads to confusion and misunderstanding, which is exactly what we are trying to avoid. One of the key components of a useful business glossary is governance.
There must be a clearly defined process for the submission and approval of business terms and business rules. Stewards must be responsible for the definition, purpose, and use of a business glossary term or rule. This is where the people with the most tacit knowledge about a term or subject area become very important. A proper governance process leads to a level of trust among the end users.
Another thing that governance provides is quality measurement. A term with a poorly formed definition is as bad as no definition at all. A standard that documents what constitutes a quality definition is the first step in establishing measurable quality. Some examples include:
- Definition must be stated in the present tense
- Definition must be stated in a descriptive phrase or sentence
- Definition should avoid acronyms and abbreviations
- Definition must not contain the words used in the term (tautology)
By having an agreed-upon standard, stewardship becomes more consistent. Consistency is the basis for measuring progress. We can measure progress by the reduction in misunderstandings, the number of users accessing the business glossary, the reduction of synonym terms, and other metrics.
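Some of the example rules in the definition standard above can be checked mechanically, which gives stewards a consistent first pass before human review. The sketch below is an assumption-laden illustration: the `check_definition` function, the stop-word list, and the three-word minimum for a 'descriptive phrase' are all hypothetical choices. The present-tense rule is left to a human reviewer because it cannot be tested reliably with simple pattern matching, and inflected forms such as plurals are not detected by the tautology check.

```python
import re

# Small, hypothetical stop-word list so that words like "of" do not
# trigger the tautology check.
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "for"}


def check_definition(term, definition):
    """Return a list of rule violations; an empty list means no issues found."""
    problems = []
    words = re.findall(r"[A-Za-z']+", definition)

    # Rule: stated as a descriptive phrase or sentence (3+ words is an assumption).
    if len(words) < 3:
        problems.append("not a descriptive phrase or sentence")

    # Rule: avoid acronyms and abbreviations (approximated as all-caps tokens).
    if re.search(r"\b[A-Z]{2,}\b", definition):
        problems.append("contains an acronym or abbreviation")

    # Rule: must not reuse the words of the term itself (tautology).
    term_words = {w.lower() for w in re.findall(r"[A-Za-z']+", term)} - STOP_WORDS
    reused = sorted(term_words & {w.lower() for w in words})
    if reused:
        problems.append("repeats term words: " + ", ".join(reused))

    return problems


print(check_definition("Inventory", "An itemized catalog or list of tangible goods"))
print(check_definition("Effective Date", "The date an agreement takes effect"))
print(check_definition("Customer", "CUST"))
```

Counting the violations across the whole glossary over time is one simple way to turn the standard into a quality metric.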
Summary
Poor communication can have dire consequences ranging from loss of money to loss of life. Establishing a business glossary may prevent potential catastrophes. Business metadata can be found in the people who use data and the technology that captures it. Best practices for defining and classifying terms foster trust and findability.