Excerpt from Data Modeling for the Business: A Handbook for Aligning the Business with IT using High-Level Data Models
By Steve Hoberman, Donna Burbank, and Chris Bradley
This is the ninth in a series of articles covering the ten steps for completing the High-Level Data Model (HDM), which is also known as a subject area model or conceptual data model. In our article series so far, we have covered an overview of the HDM and seven of the ten steps to building one: Identify Model Purpose, Identify Model Stakeholders, Inventory Available Resources, Determine Type of Model, Select Approach, Complete an Audience-View HDM, and Incorporate Enterprise Terminology. In this article we will discuss the eighth step, Signoff. Here are all ten steps for reference:
- Identify model purpose.
- Identify model stakeholders.
- Inventory available resources.
- Determine type of model.
- Select approach.
- Complete an audience-view HDM.
- Incorporate enterprise terminology.
- Signoff.
- Market.
- Maintain.
After the initial information gathering, make sure the model is reviewed both for adherence to data modeling best practices and for whether it meets the requirements. The signoff process for a HDM does not require the same formality as signoff on a physical design, but it should still be taken seriously. Usually, email confirmation that the model is accurate will suffice.
Validating whether the model meets data modeling best practices is often done by applying the Data Model Scorecard®, which contains 10 categories for validating a data model. Here is a brief description of each category:
- Model type. This question ensures that the high-level model being reviewed meets the definition for a high-level model. Concepts need to be intuitive to a business user with clear definitions. The model should fit neatly on one page. When we see a HDM that is fully attributed and a few steps away from actual database tables, it loses points in this category.
- Correctness. We need to understand the content of what is being modeled. This can be the most difficult of all 10 categories to grade because we really need to understand how the business works and what the business wants from its application. If we are modeling a sales data mart, for example, we need to understand the key concepts necessary for sales reporting, including both the measures and the levels of detail at which those measures need to be viewed, such as Month, Region, and Brand.
- Completeness. This category checks for data model components that are not in the requirements or requirements that are not represented on the model. If the scope of the model is greater than the requirements, we have a situation known as ‘scope creep.’ If the model scope is less than the requirements, we will be leaving information out of the resulting application, usually leading to an enhancement or ‘Phase 2’ shortly after the application is in production. For completeness, we need to make sure the scope of the project and model match, as well as ensuring all the necessary metadata on the model is populated. Regarding metadata, there are certain types that tend to get overlooked when modeling, such as definitions and stewardship. A HDM missing key concepts or definitions will lose points in this category.
- Structure. This is the ‘Data Modeling 101’ category. This category validates the design practices employed to build the model. Just like a blueprint for a house needs to be structurally sound, so does a data model for a database. For example, we would flag circular relationships in this category.
- Abstraction. This category gauges the use of generic concept and relationship structures. One of the most powerful tools a data modeler has at their disposal is abstraction: the ability to broaden the types of information a design can accommodate by using generic concepts. Going from Customer Location to a more generic Location, for example, allows the design to more easily handle other types of locations, such as warehouses and distribution centers.
- Standards. Correct and consistent naming standards are extremely helpful for knowledge transfer and integration. New team members who are familiar with similar naming conventions on other projects will not lose time learning a new set of naming standards. This category focuses on naming standard structure, abbreviations, and syntax. In reviewing a HDM, for example, we would expect concept names to be singular and to represent information rather than processes, and we would expect relationship names to be verbs.
- Readability. This question checks that the model is visually easy to follow. Readability needs to be considered at the model, concept, and relationship levels. At the model level, we like to see a large model broken into smaller logical pieces. At the concept level, one popular technique is to place child concepts below parent concepts. Child concepts are on the many side of the relationship and parent concepts are on the one side of the relationship. So if an Order contains many Order Lines, Order Line should appear below Order. At the relationship level, we try to minimize relationship line length, the number of direction changes a relationship line makes, and the number of relationships crossing each other. We also look for missing or incomplete relationship labels.
- Definitions. This category includes checking all definitions to make sure they are clear, complete, and correct. Clarity means that a reader can understand the meaning of a term by reading the definition only once. Completeness ensures the definition is at the appropriate level of detail and that it includes all the necessary components, such as derivations and examples. Correctness focuses on having a definition that fully matches what the term means and is consistent with the rest of the business.
- Consistency. Does this model complement the ‘big picture’? The structures that appear in a data model should be consistent in terminology and usage with structures that appear in related data models and with the enterprise model, if one exists. This way there will be consistency across projects. If a Marketing HDM contains the term ‘Customer’ and the enterprise HDM calls this concept a ‘Prospect’, we would need to resolve this difference.
- Data. This category determines how well the concepts and their rules match reality. Collecting real data can be difficult to do early in a project’s lifecycle, but the earlier it is done, the better, so that you can avoid surprises later that may be much more costly. If data surprises are not caught during the design phase, they will probably be caught during system testing or user acceptance testing at a much higher cost of time and money. A quick look at the data for the Customer concept, for example, can reveal whether Customer is always a company or can also be an individual.
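Parts of the Completeness review can be supported with simple set comparisons. The sketch below is a hypothetical helper, not part of the Data Model Scorecard itself. It assumes the requirements have already been reduced to a set of concept names and that the HDM is available as a mapping from concept name to definition. Names on the model but not in the requirements point to scope creep, names in the requirements but not on the model point to a likely ‘Phase 2’, and empty definitions are the kind of overlooked metadata this category calls out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CompletenessReport:
    # Hypothetical report structure; the Scorecard does not prescribe one.
    scope_creep: set[str] = field(default_factory=set)          # on the model, not in requirements
    missing_concepts: set[str] = field(default_factory=set)     # in requirements, not on the model
    missing_definitions: set[str] = field(default_factory=set)  # on the model, blank definition


def check_completeness(required: set[str], model: dict[str, str]) -> CompletenessReport:
    """Compare required concept names against a model's concept -> definition map."""
    modeled = set(model)
    return CompletenessReport(
        scope_creep=modeled - required,
        missing_concepts=required - modeled,
        # A definition that is empty or only whitespace counts as missing.
        missing_definitions={name for name, text in model.items() if not text.strip()},
    )


# Illustrative inputs only.
required = {"Customer", "Product", "Order", "Region"}
model = {
    "Customer": "A person or organization that buys our products.",
    "Product": "",
    "Order": "A request from a Customer for one or more Products.",
    "Warehouse": "A building where Products are stored.",
}
report = check_completeness(required, model)
print(report.scope_creep)  # {'Warehouse'}
```

Matching on exact names is a simplifying assumption; a reviewer still has to judge whether two differently named concepts actually mean the same thing.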
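For the Structure category, spotting circular relationships amounts to looking for a cycle in the graph of relationships. The following is a minimal sketch using depth-first search. It assumes each relationship is recorded as a hypothetical (parent, child) pair, with the child on the many side; `find_cycle` is an illustrative name, not a function from any modeling tool.

```python
from __future__ import annotations

from collections import defaultdict


def find_cycle(relationships: list[tuple[str, str]]) -> list[str] | None:
    """Return one circular path of concepts (start repeated at the end), or None.

    Each relationship is an assumed (parent, child) pair, read as
    'one parent relates to many children'.
    """
    graph: dict[str, list[str]] = defaultdict(list)
    for parent, child in relationships:
        graph[parent].append(child)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color: dict[str, int] = defaultdict(int)  # every concept starts WHITE
    path: list[str] = []

    def visit(node: str) -> list[str] | None:
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:
                # nxt is already on the current path, so we have looped back.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in list(graph):  # copy: visiting may add leaf keys to graph
        if color[start] == WHITE:
            found = visit(start)
            if found:
                return found
    return None


print(find_cycle([("Customer", "Order"), ("Order", "Order Line"), ("Order Line", "Customer")]))
# ['Customer', 'Order', 'Order Line', 'Customer']
```

A recursive relationship, such as an Employee managing other Employees, also shows up here as a one-concept cycle. Whether a flagged loop is an error or a deliberate design remains the reviewer's call.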
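A first pass at the Consistency category can be handled in a similar way by comparing a project model's concept names against enterprise terminology. In this sketch the set of enterprise terms and the alias mapping (for example, a Marketing ‘Customer’ that corresponds to the enterprise ‘Prospect’) are assumed inputs that someone has already compiled; `check_consistency` is a hypothetical helper.

```python
from __future__ import annotations


def check_consistency(
    model_concepts: set[str],
    enterprise_terms: set[str],
    known_aliases: dict[str, str],
) -> tuple[dict[str, str], set[str]]:
    """Compare a project model's concept names with enterprise terminology.

    Returns (renames, unknown):
      renames: model name -> enterprise name, where a known alias exists
      unknown: model names with neither an enterprise term nor a known alias
    """
    renames: dict[str, str] = {}
    unknown: set[str] = set()
    for name in model_concepts:
        if name in enterprise_terms:
            continue  # already consistent with the enterprise model
        if name in known_aliases:
            renames[name] = known_aliases[name]
        else:
            unknown.add(name)
    return renames, unknown


# Illustrative inputs only.
renames, unknown = check_consistency(
    {"Customer", "Campaign", "Channel"},
    {"Prospect", "Campaign", "Product"},
    {"Customer": "Prospect"},
)
print(renames)  # {'Customer': 'Prospect'}
print(unknown)  # {'Channel'}
```

Names returned as unknown are not necessarily wrong; they may be concepts the enterprise model does not yet cover, and the difference still has to be resolved with the people who own the terminology.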
Once the review is complete, you will need signoff from the model’s audience, most likely a business user or business manager. If the audience member or members responsible for signing off on the model have been involved in building the model, the signoff should just be a formality. Having signoff is critical to the credibility of the model.
In my next column I will go into detail on Step 9, Market.