The Data Modeling Addict – October 2006

Published in October 2006

An application’s flexibility and data quality depend quite a bit on the underlying data model. In other words, a good data model can lead to a good application and a bad data model can lead to a
bad application. Therefore we need an objective way of measuring what is good or bad about the model. After reviewing hundreds of data models, I formalized the criteria I have been using into what
I call the Data Model Scorecard.

The Scorecard contains 10 categories: 1. How well does the model capture the requirements? 2. How complete is the model? 3. How structurally sound is the model? 4. How well does the model leverage
generic structures? 5. How well does the model follow naming standards? 6. How well has the model been arranged for readability? 7. How good are the definitions? 8. How well has real world context
been incorporated into the model? 9. How consistent is the model with the enterprise? 10. How well does the metadata match the data?

This is the third of a series of articles on the Data Model Scorecard. The first article on the Scorecard summarized the 10 categories, the second article discussed the correctness category, and
this article focuses on the completeness category. That is, How complete is the model? For more on the Scorecard, please refer to the book, Data Modeling Made Simple: A Practical Guide for Business
& IT Professionals. How complete is the model?

This question ensures the scope of the model matches the scope of the requirements. You can validate the correctness category and this category in parallel, as the red flags and best practices are
very similar. For completeness, we need to make sure the scope of the project and model match, as well as ensuring all the necessary metadata on the model is populated.

If the scope of the model is greater than the requirements, we have a situation known as “scope creep.” This means that we are planning on delivering more than what was originally required. This
may not necessarily be a bad thing, as long as this additional scope has been factored into the project plan. If the model scope is less than the requirements, we will be leaving information out of
the resulting application, usually leading to an enhancement or “phase 2″ shortly after the application is in production.

Regarding metadata, there are certain types that tend to get overlooked when modeling, such as definitions, stewardship (defined by Anne Marie Smith as “responsibility without authority”), and
alternate keys. These types of metadata along with ones that are mandatory parts of our model such as data element name and format information need to be checked for completeness in this category.

Here are a few of the red flags I look for to validate this category. By red flag, I mean something that stands out as a violation of this category.

Information on reports or requirements specifications that do not appear in the model. Are they missing from the model because they are easily derivable in the application? Did we receive oral
commitment that they were not needed even though in the documentation it specified that they were needed? Were they simply forgotten from the model? Make sure everything is in writing and agreed to
by the business. This reduces surprises during development and implementation phases.

Missing or additional states. One of the more popular red flags for completeness concerns states. A state is a recognizable milestone in the life cycle of a business concept. For example, an order
goes through the following states: open, dropped, shipped, received, returned, and so on. It is important to be aware of the states that exist as it will help us make sure we have the right set. We
don’t want to bring in more than we need because it can cause security, data quality, and performance issues. Once I almost brought in research products into a sales BI application. This sensitive
state identifies products to be released to the market in the coming months. Likewise, we don’t want to bring in less than we need because we won’t make the users very happy. It is not uncommon
to leave a state off a model because of definition differences or lack of knowledge that the state even exists. Would a marketing BI solution need prospects or only current customers, for example?

Missing alternate keys when using surrogate keys. There needs to be something “real” to tie to so that the development team can code successfully add from change logic, and so that the business
can retrieve a specific entity instance using the data element keys they are accustomed to using. For example customer might be identified by the surrogate key customer identifier. Yet when a new
customer record is received in the system or when a business person wants to run a report on a specific customer, which data elements are being used to retrieve that customer? This is usually the
alternate key.

Missing definitions. This is an all too common red flag that is easy to detect and important to repair, especially for BI applications where the definitions can be critical for knowledge workers to
make informed decisions.

As a proactive measure to improve the completeness of the data model, I have found it helpful to apply the correctness category best practices plus:

  • Identify any ambiguously defined terms and clarify any areas of confusion. This is often around the topic of states.
  • Have all documented issues been addressed? Make sure there are not outstanding issues that would compromise the completeness of the model.
  • Use a metadata checklist. I believe in providing project teams with a complete checklist of the types of information you expect them to capture during the analysis and design phases. Such a
    list provides “no excuses” when you identify gaps. At one point I played a game called “metadata bingo,” in which we identified a superset of the types of metadata that were of interest to our
    department. The Data Modeler’s Workbench has an entire chapter dedicated to this checklist.

The completeness category will be difficult to grade without written signed off requirements and without a metadata checklist which contains the types of metadata that are required and those that
are optional.

Stay tuned! Our next article will focus on the structural soundness category.


submit to reddit

About Steve Hoberman

Steve Hoberman is a world-recognized innovator and thought-leader in the field of data modeling. He has worked as a business intelligence and data management practitioner and trainer since 1990.  Steve is known for his entertaining, interactive teaching and lecture style (watch out for flying candy!) and is a popular, frequent presenter at industry conferences, both nationally and internationally. Steve is a columnist and frequent contributor to industry publications, as well as the author of Data Modeler’s Workbench and Data Modeling Made Simple. He is the founder of the Design Challenges group and inventor of the Data Model Scorecard™. Please visit his website to learn more about his training and consulting services, and to sign up for his Design Challenges! He can be reached at