The Data Modeling Addict – October 2008

An application’s flexibility and data quality depend quite a bit on the underlying data model. In other words, a good data model can lead to a good application, and a bad data model can lead
to a bad application. Therefore, we need an objective way of measuring what is good or bad about the model. After reviewing hundreds of data models, I formalized the criteria I have been using into
what I call the Data Model Scorecard™.

The Scorecard contains 10 categories:

  1. How well does the model capture the requirements?
  2. How complete is the model?
  3. How structurally sound is the model?
  4. How well does the model leverage generic structures?
  5. How well does the model follow naming standards?
  6. How well has the model been arranged for readability?
  7. How good are the definitions?
  8. How well has real world context been incorporated into the model? 
  9. How consistent is the model with the enterprise?
  10. How well does the metadata match the data?

This is the tenth of a series of articles on the Data Model Scorecard™. The first article on the Scorecard summarized the 10
categories, the second article focused on the correctness category, the third article focused on the completeness category, the fourth article focused on the structure
, the fifth article focused on the abstraction category, the sixth article focused on the standards category, the seventh article focused on the readability category, the eighth article focused on the definitions category, the ninth article focused on the real world
, the tenth article focused on the consistency category, and this article focuses on the data category.
That is, How well does the metadata match the data? For more on the Scorecard, please refer to my book, Data Modeling Made Simple: A Practical Guide for Business & IT Professionals.

How Well Does the Metadata Match the Data?

This question ensures the model and the actual data that will be stored within the resulting tables are consistent with each other. This category determines how well the data elements and their
rules match reality. If a data element is called email address text, for example, will it actually contain the email address? Examining a handful to a few hundred sample values can help
ensure that the entity, data element, and relationship definitions are consistent with the data they will support.

Collecting real data can be difficult to do early in a project’s life cycle, but the earlier the better so you can avoid future surprises that can be much more costly. If data surprises are
not caught during the design phase, they will most likely be caught during system testing or user acceptance testing at a much higher cost of time and money.

Here are a few of the red flags I look for to validate this category:

  • Blatant errors. These are situations when the data is obviously completely different than what is implied by the data element’s name and definition. For example,
    customer eye color text contains the customer’s shoe size.

  • Linking data elements not linking. A linking data element is a data element that is a foreign key to an entity in a different subject area. For instance, item identifier
    in order is a linking data element back to item. Student number in registration is a linking data element back to student. These are difficult to detect, but
    one quickly appreciates how much frustration and wasted time can be caused by these linking data elements if they don’t link as expected.

  • Length errors. For example, a data element 2,000 characters long in which at most only the first 10 characters are populated. Or worse yet, a data element five characters long
    that appears to contain truncated data.

As a proactive measure to ensure the data matches the metadata, I have found using a standard template comparing the data to the metadata to be very effective. For example, I use a template called
the Data Quality Validation Template (refer to chapter 5 of The Data Modeler’s Workbench for more details including an example). A data to metadata comparison template can contain
metadata in one set of columns, the results of data checking in another set of columns, and validation by a business expert in the final set of columns. Validation by a business expert is essential
to ensuring there is a correct data to metadata match.


submit to reddit

About Steve Hoberman

Steve Hoberman is a world-recognized innovator and thought-leader in the field of data modeling. He has worked as a business intelligence and data management practitioner and trainer since 1990.  Steve is known for his entertaining, interactive teaching and lecture style (watch out for flying candy!) and is a popular, frequent presenter at industry conferences, both nationally and internationally. Steve is a columnist and frequent contributor to industry publications, as well as the author of Data Modeler’s Workbench and Data Modeling Made Simple. He is the founder of the Design Challenges group and inventor of the Data Model Scorecard™. Please visit his website to learn more about his training and consulting services, and to sign up for his Design Challenges! He can be reached at