An application’s flexibility and data quality depend quite a bit on the underlying data model. In other words, a good data model can lead to a good application, and a bad data model can lead
to a bad application. Therefore, we need an objective way of measuring what is good or bad about the model. After reviewing hundreds of data models, I formalized the criteria I have been using into
what I call the Data Model Scorecard™.
The Scorecard contains 10 categories:
- How well does the model capture the requirements?
- How complete is the model?
- How structurally sound is the model?
- How well does the model leverage generic structures?
- How well does the model follow naming standards?
- How well has the model been arranged for readability?
- How good are the definitions?
- How well has real world context been incorporated into the model?
- How consistent is the model with the enterprise?
- How well does the metadata match the data?
This is the tenth of a series of articles on the Data Model Scorecard™. The first article on the Scorecard summarized the 10
categories, the second article focused on the correctness category, the third article focused on the completeness category, the fourth article focused on the structure
category, the fifth article focused on the abstraction category, the sixth article focused on the standards category, the seventh article focused on the readability category, the eighth article focused on the definitions category, the ninth article focused on the real world
category, the tenth article focused on the consistency category, and this article focuses on the data category.
That is, How well does the metadata match the data? For more on the Scorecard, please refer to my book, Data Modeling Made Simple: A Practical Guide for Business & IT Professionals.
How Well Does the Metadata Match the Data?
This question ensures the model and the actual data that will be stored within the resulting tables are consistent with each other. This category determines how well the data elements and their
rules match reality. If a data element is called email address text, for example, will it actually contain the email address? Examining a handful to a few hundred sample values can help
ensure that the entity, data element, and relationship definitions are consistent with the data they will support.
Collecting real data can be difficult to do early in a project’s life cycle, but the earlier the better so you can avoid future surprises that can be much more costly. If data surprises are
not caught during the design phase, they will most likely be caught during system testing or user acceptance testing at a much higher cost of time and money.
Here are a few of the red flags I look for to validate this category:
-
Blatant errors. These are situations when the data is obviously completely different than what is implied by the data element’s name and definition. For example,
customer eye color text contains the customer’s shoe size. -
Linking data elements not linking. A linking data element is a data element that is a foreign key to an entity in a different subject area. For instance, item identifier
in order is a linking data element back to item. Student number in registration is a linking data element back to student. These are difficult to detect, but
one quickly appreciates how much frustration and wasted time can be caused by these linking data elements if they don’t link as expected. -
Length errors. For example, a data element 2,000 characters long in which at most only the first 10 characters are populated. Or worse yet, a data element five characters long
that appears to contain truncated data.
As a proactive measure to ensure the data matches the metadata, I have found using a standard template comparing the data to the metadata to be very effective. For example, I use a template called
the Data Quality Validation Template (refer to chapter 5 of The Data Modeler’s Workbench for more details including an example). A data to metadata comparison template can contain
metadata in one set of columns, the results of data checking in another set of columns, and validation by a business expert in the final set of columns. Validation by a business expert is essential
to ensuring there is a correct data to metadata match.