The Data Modeling Addict – April 2010

The following is an excerpt from Data Modeling Made Simple, 2nd Edition, by Steve Hoberman, ISBN 9780977140060.

Precision with respect to data modeling means that there is a clear, unambiguous way of reading every symbol and term on the model. You might argue with others about whether the rule is accurate, but that is a different argument. In other words, it is not possible for you to view a symbol on a model and say, “I see A here” and for someone else to view the same symbol and respond, “I see B here.”

Going back to the business card example, let’s assume we define a “contact” to be the person or company that is listed on a business card. Someone states that a contact can have many phone numbers. This statement is imprecise, as we do not know whether a contact can exist without a phone number, must have at least one phone number, or must have many phone numbers. Similarly, we do not know whether a phone number can exist without a contact, must belong to only one contact, or can belong to many contacts. The data model introduces precision, such as converting this vague statement into these assertions:

  • Each contact must be reached by one or many phone numbers.
  • Each phone number must belong to one contact.
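
The two assertions above can also be sketched in code. This is a minimal illustration, not anything taken from the book: the class names `Contact` and `PhoneNumber`, the `contact_name` field, and the helper `make_contact` are all hypothetical, and a real data model would normally enforce these rules in the database schema rather than in application code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PhoneNumber:
    number: str
    # Hypothetical field: a single, mandatory reference to the owning contact.
    contact_name: str


@dataclass(frozen=True)
class Contact:
    name: str
    phone_numbers: Tuple[PhoneNumber, ...]

    def __post_init__(self) -> None:
        # "Each contact must be reached by one or many phone numbers."
        if len(self.phone_numbers) < 1:
            raise ValueError(f"contact {self.name!r} needs at least one phone number")
        # "Each phone number must belong to one contact."
        for phone in self.phone_numbers:
            if phone.contact_name != self.name:
                raise ValueError(
                    f"phone number {phone.number!r} belongs to another contact"
                )


def make_contact(name: str, numbers: List[str]) -> Contact:
    """Build a contact whose phone numbers all point back to it."""
    phones = tuple(PhoneNumber(number, name) for number in numbers)
    return Contact(name, phones)
```

Under these assumptions, `make_contact("Acme", ["555-0100"])` succeeds, while `make_contact("Nobody", [])` raises a `ValueError`, so there is no room to read "can have many phone numbers" as "can have none."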

Because the data model introduces precision, valuable time is not wasted trying to interpret the model. Instead, time can be spent debating and then validating the concepts on the data model.

There are three situations, however, that can degrade the precision of a data model:

  • Weak definitions. If the definitions behind the terms on a data model are poor or nonexistent, multiple interpretations of terms become a strong possibility. Imagine a business rule on our model that states that an employee must have at least one benefits package. If the definition of Employee is something meaningless like “An Employee is a carbon-based life form,” we may wrongly conclude that this business rule considers both job applicants and retired employees to be employees.

  • Dummy data. The second situation occurs when we introduce data that are outside the normal set of data values we would expect in a particular data grouping. An old-fashioned trick for getting around the rigor of a data model is to expand the set of values that a data grouping can contain. For example, if a contact must have at least one phone number and, for some reason, a contact arrives in the application with no phone numbers, one can create a fake phone number such as “Not Applicable” or “99” or “other” and then the contact can be entered. In this case, adding the dummy data allows a contact to exist without a phone number, which circumvents, and in effect violates, our original business rule.

  • Vague or missing labels. A model is read in much the same way as a book is read, with proper sentence structure. The verbs are a very important part of each sentence. On a data model, these verbs are captured when describing how concepts on the model relate to each other. Concepts like Customer and Order, for example, may relate to each other through the verb “place.” That is, “A Customer can place one or many Orders.” Vague verbs such as “associate” or “have,” or missing verbs altogether, reduce the precision of the model, as we cannot accurately read the sentences.
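
The dummy data situation lends itself to a small illustration. The sketch below shows one way an application might refuse placeholder phone numbers instead of letting them slip past the rule. The function name `is_plausible_phone_number`, the `DUMMY_VALUES` set (seeded with the three examples from the text), and the seven-digit minimum are all assumptions made for illustration, not a recommended validation standard.

```python
import re

# Placeholder values taken from the examples in the text; a real list
# would be longer and specific to the application.
DUMMY_VALUES = {"not applicable", "99", "other"}

# Assumed threshold: treat anything with fewer than seven digits as too
# short to be a real phone number.
MIN_DIGITS = 7


def is_plausible_phone_number(value: str) -> bool:
    """Return False for known dummy values and for strings with too few digits."""
    cleaned = value.strip().lower()
    if cleaned in DUMMY_VALUES:
        return False
    digits = re.sub(r"\D", "", cleaned)  # keep only the digit characters
    return len(digits) >= MIN_DIGITS
```

With a check like this, “Not Applicable,” “99,” and “other” are rejected, so a contact without a genuine phone number surfaces as a violation of the business rule rather than being hidden by a fake value.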

