The Data Modeling Addict – July 2006

Published in TDAN.com July 2006

An application’s flexibility and data quality depend quite a bit on the underlying data model. In other words, a good data model can lead to a good application and a bad data model can lead to a
bad application. Therefore we need an objective way of measuring what is good or bad about the model. After reviewing hundreds of data models, I formalized the criteria I have been using into what
I call the Data Model Scorecard.

The Scorecard contains 10 categories:

  1. How well does the model capture the requirements?
  2. How complete is the model?
  3. How structurally sound is the model?
  4. How well does the model leverage generic structures?
  5. How well does the model follow naming standards?
  6. How well has the model been arranged for readability?
  7. How good are the definitions?
  8. How well has real world context been incorporated into the model?
  9. How consistent is the model with the enterprise?
  10. How well does the metadata match the data?

This is the second of a series of articles on the Data Model Scorecard. The first article on the Scorecard summarized the 10 categories, and each subsequent article will focus on a single category.
This article focuses on the first of the 10 categories, How well does the model capture the requirements? For more on the Scorecard, please refer to the book, Data Modeling Made Simple: A Practical
Guide for Business & IT Professionals. How well does the model capture the requirements?

This is the “correctness” category. That is, we need to understand the content of what is being modeled. This can be the most difficult of all 10 categories to grade, the reason being
that we really need to understand how the business works and what the business wants from their application. If we are modeling a sales data mart, for example, we need to understand both how the
invoicing process works in our company, as well as what reports and queries will be needed to answer key sales questions from the business.

What makes this category even more challenging is the possibility that perhaps the business requirements are not well-defined, or differ from verbal requirements, or keep changing usually with the
scope expanding instead of contracting. We need to ensure our model represents the data requirements, as the costs can be devastating if there is even a slight difference between what was required
and what was delivered. Besides not delivering what was expected is the potential that the IT/business relationship will suffer.

Here are a few of the red flags I look for to validate this first category. By red flag, I mean something that stands out as a violation of this category.

Modeling the wrong perspective. Usually one of my first questions when reviewing a data model is to identify why it is being produced in the first place. What are goals of the model and who is the
audience whose needs should be met with this model? For example, if there is a need for analysts to understand a business area such as manufacturing, a model capturing how an existing application
views the manufacturing area will not usually be acceptable. Although in reality it is likely both the manufacturing application and business processes will work very similar, they will be
differences at times and these differences can be large especially in the case of ERP packages.

Data elements with formats different from industry standards. For example, a five-character Social Security number or a six-character phone number. This is a red flag that can be identified without
much knowledge of the content of the model.

Incorrect cardinality. Assume the business rule is “We hire only applicants who completed a master’s degree.” Does the model in fig. 1 show this?


Fig. 1 Cardinality red flag

No. It shows that each applicant can obtain zero, one or many degrees. Yet the cardinality allows an applicant to have zero degrees, which violates our business rule. Also, degree includes all
possible types of degrees, with a master’s degree being just one of these. So if Bob has only a bachelor’s degree, that would not satisfy our business rule.

We will need to subtype to enforce the specific rule to master’s degree, as shown in fig. 2.


Fig. 2 Cardinality now matches business rule

In fig. 2, the subtyping symbol captures that each degree can be a masters degree. The relationship between applicant and masters degree captures that each applicant must have at least one
master’s degree, which supports our business rule.

As a proactive measure to improve the correctness of the data model, I have found the following techniques to be very helpful:

  • Normalize. Normalization forces us to understand the rules behind what is actually being modeled. We need to ask many questions, and the more we know the more our model will accurately support
    the business rules.
  • Use abstraction when in doubt. If the requirements are not known or are incomplete, abstracting allows us to accommodate the unknown by using generic structures. For example, if we don’t know
    and can’t confirm whether a customer has a main phone number, fax number, and mobile number; or whether they have a office number and home number, a simple abstract structure containing customer
    phone and customer phone type would accommodate all situations.
  • Understand similar situations. I was once able to leverage knowledge I had about the ordering process for a candy company while modeling how products were ordered for a health care provider.
    The products were completely different yet the process and a majority of the information were the same.
  • Require stakeholder signoff. Make sure your users are committed and supportive of the design by requiring their signoff upon model completion. They will take the review much more seriously and
    be more supportive of the resulting design. If the analysts would prefer not to look at the model, a popular technique is to validate the model through other mediums such as reports, business
    assertions, or prototypes.

Stay tuned! Our next article will focus on the completeness category.

Share

submit to reddit

About Steve Hoberman

Steve Hoberman is a world-recognized innovator and thought-leader in the field of data modeling. He has worked as a business intelligence and data management practitioner and trainer since 1990.  Steve is known for his entertaining, interactive teaching and lecture style (watch out for flying candy!) and is a popular, frequent presenter at industry conferences, both nationally and internationally. Steve is a columnist and frequent contributor to industry publications, as well as the author of Data Modeler’s Workbench and Data Modeling Made Simple. He is the founder of the Design Challenges group and inventor of the Data Model Scorecard™. Please visit his website www.stevehoberman.com to learn more about his training and consulting services, and to sign up for his Design Challenges! He can be reached at me@stevehoberman.com.

Top