The Data Modeling Addict – April 2007

Published in April 2007

An application’s flexibility and data quality depend quite a bit on the underlying data model. In other words, a good data model can lead to a good application and a bad data model can lead to a
bad application. Therefore, we need an objective way of measuring what is good or bad about the model. After reviewing hundreds of data models, I formalized the criteria I have been using into what
I call the Data Model Scorecard.

The Scorecard contains 10 categories:

  1. How well does the model capture the requirements?
  2. How complete is the model?
  3. How structurally sound is the model?
  4. How well does the model leverage generic structures?
  5. How well does the model follow naming standards?
  6. How well has the model been arranged for readability?
  7. How good are the definitions?
  8. How well has real-world context been incorporated into the model?
  9. How consistent is the model with the enterprise?
  10. How well does the metadata match the data?

This is the fifth in a series of articles on the Data Model Scorecard. The first article summarized the 10 categories, the second focused on the correctness category, the third on the
completeness category, and the fourth on the structure category. This article focuses on the abstraction category: how well does the model leverage generic structures? For more on the
Scorecard, please refer to my book, Data Modeling Made Simple: A Practical Guide for Business & IT Professionals.

How well does the model leverage generic structures?

This category gauges the use of generic data element, entity and relationship structures. One of the most powerful tools data modelers have at their disposal is abstraction — the ability to
increase the types of information a design can accommodate using generic concepts. Going from Customer Location to a more generic Location, for example, allows the design to more easily handle
other types of locations, such as warehouses and distribution centers. This category ensures the correct level of abstraction is applied on the model.
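
As a rough sketch of the Customer Location example, assuming hypothetical attribute names (the article names only the two structures), the difference might look like this in Python. The specific structure bakes the kind of location into its name, while the generic one carries it as a value:

```python
from dataclasses import dataclass


@dataclass
class CustomerLocation:
    """Specific structure: can only describe where a customer is."""
    customer_id: int   # hypothetical attribute
    address: str       # hypothetical attribute


@dataclass
class Location:
    """Generic structure: the kind of location is data, not part of the name."""
    location_type: str  # hypothetical, e.g. "customer" or "warehouse"
    owner_id: int       # hypothetical reference to whatever owns the location
    address: str


# The same structure now holds customers, warehouses and distribution centers.
locations = [
    Location("customer", 101, "12 Main St"),
    Location("warehouse", 7, "400 Dock Rd"),
    Location("distribution center", 3, "9 Freight Way"),
]

location_types = sorted({loc.location_type for loc in locations})
print(location_types)
```

Note what is lost in the generic version: nothing in Location itself says that customers have locations. That business fact now lives only in the data.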

In applying this category to a model, I look for structures that appear to be under-abstracted or over-abstracted:

Under-abstracting. If a data model contains structures that appear to be similar in nature (i.e., similar types of things), I would question whether abstraction would be appropriate. Factored into
this equation is the type of application we are building. A data mart, for example, would rarely contain abstract structures while a data warehouse, which requires flexibility and longevity, would
be a good candidate for abstraction.

See Figure 1 for an example of under-abstracting. If this structure is part of a data warehouse model that requires longevity in the face of ever-changing requirements, we would question whether
the customer’s phone numbers should have been abstracted. Removing the phone number data elements and creating a separate Customer Phone structure where phone numbers are stored as values instead
of elements will provide more flexibility.

Figure 1 — Possibly under-abstracting
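
Since the figure itself is not reproduced here, the following is only a guess at the kind of structure being discussed. It is a minimal sketch using SQLite from Python, and every table and column name is an assumption. The first table stores phone numbers as data elements; the Customer Phone table stores them as values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Possibly under-abstracted: one data element (column) per kind of phone.
    CREATE TABLE customer_flat (
        customer_id INTEGER PRIMARY KEY,
        home_phone  TEXT,
        work_phone  TEXT,
        fax_phone   TEXT
    );

    -- Abstracted: the kind of phone number is stored as a value.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY
    );
    CREATE TABLE customer_phone (
        customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
        phone_type   TEXT NOT NULL,
        phone_number TEXT NOT NULL,
        PRIMARY KEY (customer_id, phone_type)
    );
""")

conn.execute("INSERT INTO customer VALUES (1)")
conn.executemany(
    "INSERT INTO customer_phone VALUES (?, ?, ?)",
    [
        (1, "home", "555-0100"),
        (1, "work", "555-0101"),
        # A new kind of phone number is just another row; no schema change.
        (1, "mobile", "555-0102"),
    ],
)

phone_types = [
    row[0]
    for row in conn.execute(
        "SELECT phone_type FROM customer_phone "
        "WHERE customer_id = 1 ORDER BY phone_type"
    )
]
print(phone_types)
```

Adding a mobile number to the flat table would require altering the table; in the abstracted version it is an insert.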

Over-abstracting. Likewise, if I see too much abstraction on a model, I would question whether the flexibility abstraction can bring is worth the loss of business information on the model and the
additional cost of time and money to implement such a structure. Writing the scripts to load data into an abstract structure or extract data out of an abstract structure is no easy task. It may
be a complete generalization, but I have found that modelers who were former developers tend to be the shrewdest abstracters because they understand the cost.
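
To make the extraction cost concrete, here is a small illustrative sketch in Python. The rows, the phone types and the pivot_phones helper are all hypothetical. It shows the extra work needed to turn rows from an abstract phone structure back into the one-record-per-customer shape a report typically wants:

```python
# Hypothetical rows as they might come out of an abstract phone structure:
# (customer_id, phone_type, phone_number)
phone_rows = [
    (1, "home", "555-0100"),
    (1, "work", "555-0101"),
    (2, "home", "555-0200"),
]

# The report has fixed columns, so the extract must know which phone types
# to expect. This list is an assumption that has to be maintained by hand.
REPORT_PHONE_TYPES = ("home", "work")


def pivot_phones(rows):
    """Pivot (customer, type, number) rows into one record per customer."""
    by_customer = {}
    for customer_id, phone_type, phone_number in rows:
        if phone_type not in REPORT_PHONE_TYPES:
            continue  # a type the report does not know about is dropped
        record = by_customer.setdefault(
            customer_id, {t: None for t in REPORT_PHONE_TYPES}
        )
        record[phone_type] = phone_number
    return by_customer


report = pivot_phones(phone_rows)
print(report)
```

With phone numbers stored as plain data elements, none of this pivoting logic would be needed. That is the trade made in exchange for flexibility.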

See Figure 2 for an example of over-abstracting. The purpose of this model was limited to obtaining a detailed understanding of Customer. Specifically, the business sponsor summarizes their
requirement as, “We need to get our arms around Customer. Our company has Customer maintained in multiple places with multiple definitions. We need a picture which captures a single agreed-upon
view of customer.”

Figure 2 — Definitely over-abstracting

A Party can be a person or organization, and that person or organization can play many roles. One of these roles is Customer. Although the final Customer model might contain such an abstract
structure, jumping straight to Party and Party Role before understanding Customer mistakenly skips the painful activity of getting a single view of customer.
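
A Party and Party Role structure might be sketched as follows in Python. The attribute names and sample data are assumptions, since the figure is not reproduced here. Notice how little the structure says about what a customer actually is:

```python
from dataclasses import dataclass


@dataclass
class Party:
    party_id: int
    party_type: str  # hypothetical: "person" or "organization"
    name: str


@dataclass
class PartyRole:
    party_id: int
    role_name: str   # hypothetical: "customer", "supplier", "employee", ...


parties = [
    Party(1, "organization", "Acme Corp"),
    Party(2, "person", "Jane Doe"),
]
roles = [
    PartyRole(1, "customer"),
    PartyRole(1, "supplier"),
    PartyRole(2, "employee"),
]

# "Customer" exists only as a role value; its definition is nowhere in the model.
customer_ids = {r.party_id for r in roles if r.role_name == "customer"}
customers = [p.name for p in parties if p.party_id in customer_ids]
print(customers)
```

The structure accommodates any role, but the agreed-upon definition of Customer that the sponsor asked for is reduced to a string.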

As a proactive measure to ensure the correct level of abstraction, I recommend performing the following activities:

  • Ask the “value” question. I find myself constantly asking the “value” question: if a structure is abstracted, can we actually reap the benefits in the not-so-distant future? In Figure 1,
    for example, the Customer’s names are abstracted into the Customer Name entity. The “value” question might take the form of, “I see you have abstracted Customer Name. What other types of
    customer names do you envision in the next two to three months?”
  • Abstract after normalizing. When you normalize, you learn how the business works. This gives you a substantial amount of information to make intelligent abstraction decisions.
  • Consider the type of application. Some types of applications, such as data warehouses and operational data stores, require more abstraction than others, such as data marts. A good rule of
    thumb: if the application needs to be around a long time, yet its future data requirements cannot be determined, abstraction tends to be a good fit.



About Steve Hoberman

Steve Hoberman is a world-recognized innovator and thought-leader in the field of data modeling. He has worked as a business intelligence and data management practitioner and trainer since 1990.  Steve is known for his entertaining, interactive teaching and lecture style (watch out for flying candy!) and is a popular, frequent presenter at industry conferences, both nationally and internationally. Steve is a columnist and frequent contributor to industry publications, as well as the author of Data Modeler’s Workbench and Data Modeling Made Simple. He is the founder of the Design Challenges group and inventor of the Data Model Scorecard™. Please visit his website to learn more about his training and consulting services, and to sign up for his Design Challenges! He can be reached at