An application’s flexibility and data quality depend quite a bit on the underlying data model. In other words, a good data model can lead to a good application, and a bad data model can lead
to a bad application. Therefore, we need an objective way of measuring what is good or bad about the model. After reviewing hundreds of data models, I formalized the criteria I have been using into
what I call the Data Model Scorecard.
The Scorecard contains 10 categories:
- How well does the model capture the requirements?
- How complete is the model?
- How structurally sound is the model?
- How well does the model leverage generic structures?
- How well does the model follow naming standards?
- How well has the model been arranged for readability?
- How good are the definitions?
- How well has real world context been incorporated into the model?
- How consistent is the model with the enterprise?
- How well does the metadata match the data?
This is the eighth in a series of articles on the Data Model Scorecard. The first article on the
Scorecard summarized the 10 categories, the second article focused on the correctness category, the third article focused on the completeness category, the fourth article focused on the structure category, the fifth article
focused on the abstraction category, the sixth article focused on the standards category, the seventh article focused on
the readability category, and this article focuses on the definitions category. That is, how good are the
definitions? For more on the Scorecard, please refer to my book, Data
Modeling Made Simple: A Practical Guide for Business & IT Professionals.
How Good are the Definitions?
Good definitions support the data model diagram and remove doubt about the contents of data elements and the relationships between entities. This category checks that every definition has three
characteristics: clarity, completeness, and correctness.
Clarity means that a reader can understand the meaning of a term by reading the definition only once. A clear definition does not require the reader to decipher how each sentence should be
interpreted. The definition describes what the entity represents, not what the entity contains or when it is used. A good way to make sure your definition is clear is to think about what
makes a definition unclear. We need to avoid restating the obvious, and we need to avoid obscure technical terminology and abbreviations in our definitions. Just to restate the obvious, restating the
obvious means that we are not providing any new information. We are merely describing something that has already been mentioned or that is easy to find elsewhere. For example, suppose
that the definition of associate identifier is “associate identifier” or “the identifier for the associate.” Equally unclear is the use of a synonym, as in the
pseudo-definition “the identifier for an employee.” As far as clarity is concerned, we also need to make sure our audience understands the terms in our definition. Using acronyms,
abbreviations, and industry jargon in definitions without explaining them can lead to confusion.
Completeness means that the definition is at the appropriate level of detail and that it includes all the necessary components, such as derivations, synonyms, exceptions, and examples. A
definition at the appropriate level of detail is not so generic that it provides little additional value, yet not so specific that it provides value only to one application or
department, or only at a certain point in time.
Correctness means that the definition matches exactly what the term means and is consistent with the rest of the business. An expert in the field would agree that the definition fits the
term. One difficulty with this category is that when we define broad terms that cross departments, such as product, customer, and employee, we tend to get more than
one accurate definition, depending on who is asked. A recruiting department, for example, may have a definition for employee that is correct but nonetheless different from the definition offered by
a benefits department. A good solution to this problem is to use subtypes on your model, one for each distinct state of an employee. By accurately defining each subtype,
we capture every state.
Here are a few of the violations I have found in the definitions category:
- Entity definitions that only describe what the entity contains, such as this definition for customer: “Customer contains last name, first name, and address.”
- Data element definitions that provide no additional value and restate the obvious. For example, associate identifier’s definition is “The identifier for the associate.”
- Definitions that appear to be wrong or inconsistent with other information we know about the data element. For example, customer shoe size is defined as “The middle name of the customer.”
- Derived data elements missing an explanation of the derivation.
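Some of these violations are mechanical enough that a simple script can flag candidates for review. The sketch below is a hypothetical illustration, not part of the Scorecard itself: the `Definition` class, the `check_definition` function, the `derived` flag, and the keyword lists are all assumptions chosen for this example. It catches only surface patterns (a definition whose words all come from the term, a definition built on "contains", a derived element with no derivation wording); synonyms such as “the identifier for an employee” and wrong definitions such as the shoe size example still need a human expert.

```python
import re
from dataclasses import dataclass


@dataclass
class Definition:
    term: str              # the entity or data element name, e.g. "associate identifier"
    text: str              # the definition as written in the model
    derived: bool = False  # hypothetical flag: True if the data element is derived


# Assumed keyword lists; a real review would tune these to the organization's style.
CONTENTS_PATTERN = re.compile(r"\b(contains|consists of|includes)\b", re.IGNORECASE)
DERIVATION_PATTERN = re.compile(r"\b(calculated|derived|computed|sum of|equals)\b", re.IGNORECASE)
STOPWORDS = {"the", "a", "an", "for", "of", "this"}


def _words(s: str) -> list[str]:
    # Lowercase alphabetic tokens only; punctuation is ignored.
    return re.findall(r"[a-z]+", s.lower())


def check_definition(d: Definition) -> list[str]:
    """Return heuristic warnings for one definition (an empty list means none were found)."""
    issues = []
    meaningful = [w for w in _words(d.text) if w not in STOPWORDS]
    # Restating the obvious: every meaningful word already appears in the term name.
    if not meaningful or set(meaningful) <= set(_words(d.term)):
        issues.append("restates the obvious")
    # Describing what the entity contains instead of what it represents.
    if CONTENTS_PATTERN.search(d.text):
        issues.append("describes contents, not meaning")
    # Derived elements should say how they are derived.
    if d.derived and not DERIVATION_PATTERN.search(d.text):
        issues.append("derivation not explained")
    return issues
```

For example, `check_definition(Definition("associate identifier", "The identifier for the associate."))` returns `["restates the obvious"]`. A check like this only narrows the list of definitions to read closely; the final judgment on clarity, completeness, and correctness remains a manual one.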
As a proactive measure to improve the definitions associated with the data model, make sure each definition is clear, complete, and correct. Ask yourself whether somebody from outside the
department or area being modeled can easily understand the data element or entity based on your definition.