Published in TDAN.com July 2006
An application’s flexibility and data quality depend quite a bit on the underlying data model. In other words, a good data model can lead to a good application and a bad data model can lead to a
bad application. Therefore we need an objective way of measuring what is good or bad about the model. After reviewing hundreds of data models, I formalized the criteria I have been using into what
I call the Data Model Scorecard.
The Scorecard contains 10 categories:
- How well does the model capture the requirements?
- How complete is the model?
- How structurally sound is the model?
- How well does the model leverage generic structures?
- How well does the model follow naming standards?
- How well has the model been arranged for readability?
- How good are the definitions?
- How well has real world context been incorporated into the model?
- How consistent is the model with the enterprise?
- How well does the metadata match the data?
This is the second of a series of articles on the Data Model Scorecard. The first article on the Scorecard summarized the 10 categories, and each subsequent article will focus on a single category.
This article focuses on the first of the 10 categories, How well does the model capture the requirements? For more on the Scorecard, please refer to the book, Data Modeling Made Simple: A Practical
Guide for Business & IT Professionals. How well does the model capture the requirements?
This is the “correctness” category. That is, we need to understand the content of what is being modeled. This can be the most difficult of all 10 categories to grade, the reason being
that we really need to understand how the business works and what the business wants from their application. If we are modeling a sales data mart, for example, we need to understand both how the
invoicing process works in our company, as well as what reports and queries will be needed to answer key sales questions from the business.
What makes this category even more challenging is the possibility that perhaps the business requirements are not well-defined, or differ from verbal requirements, or keep changing usually with the
scope expanding instead of contracting. We need to ensure our model represents the data requirements, as the costs can be devastating if there is even a slight difference between what was required
and what was delivered. Besides not delivering what was expected is the potential that the IT/business relationship will suffer.
Here are a few of the red flags I look for to validate this first category. By red flag, I mean something that stands out as a violation of this category.
Modeling the wrong perspective. Usually one of my first questions when reviewing a data model is to identify why it is being produced in the first place. What are goals of the model and who is the
audience whose needs should be met with this model? For example, if there is a need for analysts to understand a business area such as manufacturing, a model capturing how an existing application
views the manufacturing area will not usually be acceptable. Although in reality it is likely both the manufacturing application and business processes will work very similar, they will be
differences at times and these differences can be large especially in the case of ERP packages.
Data elements with formats different from industry standards. For example, a five-character Social Security number or a six-character phone number. This is a red flag that can be identified without
much knowledge of the content of the model.
Incorrect cardinality. Assume the business rule is “We hire only applicants who completed a master’s degree.” Does the model in fig. 1 show this?
Fig. 1 Cardinality red flag
No. It shows that each applicant can obtain zero, one or many degrees. Yet the cardinality allows an applicant to have zero degrees, which violates our business rule. Also, degree includes all
possible types of degrees, with a master’s degree being just one of these. So if Bob has only a bachelor’s degree, that would not satisfy our business rule.
We will need to subtype to enforce the specific rule to master’s degree, as shown in fig. 2.
Fig. 2 Cardinality now matches business rule
In fig. 2, the subtyping symbol captures that each degree can be a masters degree. The relationship between applicant and masters degree captures that each applicant must have at least one
master’s degree, which supports our business rule.
As a proactive measure to improve the correctness of the data model, I have found the following techniques to be very helpful:
- Normalize. Normalization forces us to understand the rules behind what is actually being modeled. We need to ask many questions, and the more we know the more our model will accurately support
the business rules.
- Use abstraction when in doubt. If the requirements are not known or are incomplete, abstracting allows us to accommodate the unknown by using generic structures. For example, if we don’t know
and can’t confirm whether a customer has a main phone number, fax number, and mobile number; or whether they have a office number and home number, a simple abstract structure containing customer
phone and customer phone type would accommodate all situations.
- Understand similar situations. I was once able to leverage knowledge I had about the ordering process for a candy company while modeling how products were ordered for a health care provider.
The products were completely different yet the process and a majority of the information were the same.
- Require stakeholder signoff. Make sure your users are committed and supportive of the design by requiring their signoff upon model completion. They will take the review much more seriously and
be more supportive of the resulting design. If the analysts would prefer not to look at the model, a popular technique is to validate the model through other mediums such as reports, business
assertions, or prototypes.
Stay tuned! Our next article will focus on the completeness category.