An application’s flexibility and data quality depend quite a bit on the underlying data model. In other words, a good data model can lead to a good application, and a bad data model can lead
to a bad application. Therefore, we need an objective way of measuring what is good or bad about the model. After reviewing hundreds of data models, I formalized the criteria I have been using into
what I call the Data Model Scorecard™.
The Scorecard contains 10 categories:
- How well does the model capture the requirements?
- How complete is the model?
- How structurally sound is the model?
- How well does the model leverage generic structures?
- How well does the model follow naming standards?
- How well has the model been arranged for readability?
- How good are the definitions?
- How well has real world context been incorporated into the model?
- How consistent is the model with the enterprise?
- How well does the metadata match the data?
This is the ninth of a series of articles on the Data Model Scorecard. The first article on the Scorecard
summarized the 10 categories, the second article focused on the correctness category, the third article focused on the completeness category, the fourth article
focused on the structure category, the fifth article focused on the abstraction category, the sixth article focused on the standards category, the seventh
article focused on the readability category, the eighth article focused on the definitions category, and
this article focuses on the real world category. That is, how well has real world context been incorporated into the model? For more on the Scorecard, please refer to my book,
Data Modeling Made Simple: A Practical Guide for Business & IT
How Well Has Real World Context Been Incorporated into the Model?
This is the only category of the scorecard that pertains to just one type of data model: the physical data model. Here we consider important factors to an application’s success, such as
response time, storage space, backup and recovery, security, reporting tool needs, and so on.
Here are a few of the red flags I look for to validate this category:
- Too much being done by the reporting tool. In a business intelligence (BI) application, if a report is taking two hours to run, then the reporting tool might be doing too much of the work.
There could be very complex calculations being done “on the fly.”
- Long null data elements appearing toward the top of the table on a physical data model. Long null variable length string fields should be moved toward the end of each entity. There is a good
chance the data element will not be filled in or be shorter than the maximum field length. This can save space at the end of each record, which is often a factor when we are dealing with very large
tables. For example, consumer comments text defined as a null 255-character string most likely will save space by appearing as the last data element in an entity instead of the first.
- No partitioning. Both vertical and horizontal partitioning can be effective techniques to save space and increase retrieval, update, delete, and create speeds.
- No or little indexing. I was asked to review a data model recently with the hope of identifying areas to be modified to improve query performance. After reviewing the SQL behind the queries, I
learned that many of the underlying data elements were not indexed. Adding indexes to these data elements would substantially reduce retrieval time. Indexes can also painlessly be added reactively
after a system goes live.
- Denormalization chosen incorrectly over views. In a majority of cases, views can be used instead of creating denormalized tables. Views can preserve the underlying integrity of the model and
also accommodate some structure flexibility that would not be possible using denormalized tables.
As a proactive measure to improve the physical data model, I have found the following techniques to be very helpful:
- Become familiar with what your database and reporting tools like and don’t like – not how the reporting tool works, rather an understanding of the ideal structures for a particular
- There are different types of views, and the physical data modeler or database administrator will need to know when to use each type of view. For example, when to use materialized views versus
regular views versus letting a reporting tool do the work. This is a tough decision, but depending on volumes and usage, materialized views are increasingly the preferred choice.
- Be very selective with denormalization. I believe denormalization should be chosen as the last resort because of loss of data integrity and flexibility. If indexes and views don’t seem to
do the trick, then consider minor denormalization.
- Are the physical structures easily navigated? In BI applications where users have direct access, it is important to have structures that are user-friendly and understandable.