The 10-Minute Data Model Review, Part 2

In this paper, Part 2 of a series, I will show you some of the techniques you can use to quickly review a data model for the most common mistakes made by data modelers. The list of mistakes is by
no means exhaustive – it’s a list of mistakes that I have identified over the course of 20 years, by working with and reviewing models developed by everyone from novices to experts.

6. Introducing “smart” attributes

“Smart” attributes aren’t. A smart attribute is one in which specific positions or values within the implemented field carry meanings of their own, which indicates that the
data modeler has embedded several attributes into a single physical implementation. For example, I have seen cases where an attribute definition will say: “This attribute defines the
product type. The 1st character indicates the manufacturing plant, the 2nd and 3rd characters indicate the year the product was made… etc.” It is obvious that the modeler has
taken several logical concepts and jammed them together into a single attribute – in other words, made a physical implementation decision and embedded it into a logical model.
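As a minimal sketch (the code format and field meanings are invented for illustration), unpacking such a smart attribute makes the hidden logical attributes visible – each piece is really its own attribute and should be modeled separately:

```python
# Hypothetical "smart" product code: position 1 = manufacturing plant,
# positions 2-3 = year of manufacture, remainder = other packed-in facts.

def split_product_code(code: str) -> dict:
    """Decompose a hypothetical smart code like 'A07XYZ' into its parts."""
    return {
        "plant_code": code[0],    # 1st character: manufacturing plant
        "year_code": code[1:3],   # 2nd-3rd characters: year of manufacture
        "remainder": code[3:],    # whatever else was packed in
    }

print(split_product_code("A07XYZ"))
# -> {'plant_code': 'A', 'year_code': '07', 'remainder': 'XYZ'}
```

Every application that touches the field must re-implement this parsing, which is exactly the maintenance burden a smart attribute creates.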

Embedding physical characteristics into a logical model invariably leads to trouble, from both a maintenance standpoint and a technology change standpoint. For example, historically, in the United
States and Canada, the middle digit of the telephone area code was always a “0” or “1”, indicating to the phone switch that it was an area code and signaling that 8
more digits would follow (additionally, a middle digit of “0” indicated an entire state or province, while “1” indicated a smaller region). Conversely, the middle digit
of the telephone exchange could not be a “0” or “1.” When all of the combinations of 3-digit numbers were assigned, a massive reprogramming of all telephone switches
had to be performed as the 0/1 restrictions were lifted – in other words, the inclusion of “smart” codes led to massive changes.

7. Partially null primary keys

The creation of partially null primary keys (PKs) will invariably lead to trouble. A partially null PK occurs when an entity contains a Foreign Key migrated from another table, but some of the key
attributes in the second table are marked as “optional,” as shown in the diagram below. The optionality for Parent id in the Child table indicates that it is not required; but by the very
nature of being a key, it is required. While the example below shows the optionality problem on a Foreign Key, the same issue can arise on a primary, non-foreign key attribute, although
this occurs less frequently.


When the developer comes across a partially null key in the database, an excellent developer will question the model and seek additional information from the analyst/data modeler; however, the
majority of developers will “fix” the problem themselves, based on their knowledge of the business or on how they have handled a similar situation in the past. While these developers will
resolve the problem to the best of their ability, most organizations do not want programmers setting corporate policy.
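The entity-integrity rule at stake – every part of a primary key must be present – can be sketched as a simple check (the table and column names below are hypothetical):

```python
def validate_primary_key(row: dict, pk_columns: list) -> list:
    """Return the names of any primary-key columns that are null or missing.

    A non-empty result means the row violates entity integrity:
    no part of a primary key may be null.
    """
    return [col for col in pk_columns if row.get(col) is None]

# Hypothetical Child row whose Parent id (part of the PK) was left null:
child = {"child_id": 42, "parent_id": None, "name": "widget"}
print(validate_primary_key(child, ["child_id", "parent_id"]))
# -> ['parent_id']
```

In a proper model the DBMS enforces this automatically, since declaring a column part of the primary key makes it mandatory; the check above is what gets re-invented, inconsistently, when the model leaves key attributes optional.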


8. Inconsistent or incorrect normalization

Normalization, when properly performed, leads to an understanding of the meaning of data and the interrelationships between the data items. Poorly performed, it leads to a large collection of
boxes and lines, with virtually no meaning. There are established rules that explain why normalization should be performed, when it should not be performed and, most importantly, how it should be
performed. The process of normalization is often used as an excuse to insert physical considerations into a logical model under the guise of “that’s how we’re going to implement
it anyway.” Nothing could be further from the truth – normalization is a process to further “logicalize” the data – not “physicalize” it.
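As an illustrative sketch, with invented table and column names, normalization moves repeated facts into their own relation so that each fact is stated in exactly one place:

```python
# Denormalized rows: customer facts are repeated on every order.
orders_flat = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Acme", "total": 250.0},
    {"order_id": 2, "customer_id": 10, "customer_name": "Acme", "total": 75.0},
]

# Normalizing splits the data into two relations keyed on customer_id:
# customer facts live once in `customers`, order facts in `orders`.
customers = {r["customer_id"]: {"customer_name": r["customer_name"]}
             for r in orders_flat}
orders = [{"order_id": r["order_id"],
           "customer_id": r["customer_id"],
           "total": r["total"]}
          for r in orders_flat]

print(customers)  # -> {10: {'customer_name': 'Acme'}}
print(orders)     # two orders, each carrying only the customer's key
```

Note that the split is driven by what the data means (a customer's name depends on the customer, not the order), not by how the tables will eventually be stored.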


9. Lack of naming and definitional standards/conventions

In my previous article, I mentioned the need for useful definitions; in addition to good definitions, the items must be well
named, in accordance with organizational standards. As data modelers, we must ensure that the names we use are indicative of the items’ logical meanings (not their uses). For example,
“Customer Name” is a good name for an attribute, as it indicates the meaning of the term; whereas “Customer Key” tells the reader how the item is used (a physical
characteristic) and not what the item is.

Many organizations have established standards for naming attributes to ensure that a quick glance at the attribute’s name will provide some insight to its meaning. For example, some
organizations have standards that state:

  • “All numeric attributes must have the word ‘Number’ as the last word in the attribute’s name (e.g., Social Security Number or Telephone Number).”
  • “All attributes that are used as references for other attributes must have the word ‘Code’ as the last word in the attribute’s name (e.g., State Code or Product
    Code).”

Notice that both examples contain completely spelled out words (Telephone, Number, Product, etc.) not abbreviations. If you were to review the “Employee” entity, you would expect to
see attributes that were clearly related to employee name, employee phone, employee grade, etc. – and not attributes called emp_nm, emp-p-num and emp-gr.

If the organization does have naming standards, your job is increased in complexity – you have to ensure that the names are good and that they are in line with applicable standards (even if
they are your own standards). On the other hand, attributes that do not follow the standards are easy to identify – thereby making your job somewhat easier.
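Standards like the two quoted above lend themselves to automated checking. The following sketch (the suffix rules and names are hypothetical, modeled on those examples) tests attribute names against the standards:

```python
import re

# Hypothetical organizational standards of the kind described above:
# numeric attributes end in "Number", reference attributes end in "Code".
STANDARDS = {
    "numeric": re.compile(r".+ Number$"),
    "reference": re.compile(r".+ Code$"),
}

def follows_standard(attribute_name: str, kind: str) -> bool:
    """Return True if the attribute name follows the standard for its kind."""
    pattern = STANDARDS.get(kind)
    return bool(pattern and pattern.match(attribute_name))

print(follows_standard("Telephone Number", "numeric"))  # -> True
print(follows_standard("emp_nm", "numeric"))            # abbreviation -> False
```

A check like this flags non-compliant names quickly; judging whether a compliant name actually conveys the item's logical meaning still takes a human reviewer.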

10. Missing a meta model

A meta model defines the “vocabulary” to be used when describing something. For example, when building a data model, we use the terms “entity,” “attribute” and
“relationship.” However, there are other terms that must be considered as part of a data modeling effort – some examples are “definition” and “allowable
value.” As I mentioned previously, all of the entities and attributes must have robust, non-overlapping definitions, along with every other item the meta model specifies. Some
organizations state that in order for a model to be complete, each entity must have the following items: Name, Definition, Primary Key, Foreign Key (if required), Non-Key Attributes, and
Anticipated Growth Rate. Similar requirements may be created for attributes and relationships as well.
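A completeness rule of that kind can be sketched as a check against the meta model (the item names below mirror the hypothetical entity requirements listed above):

```python
# Hypothetical meta model: items every entity must supply to be complete.
ENTITY_META_MODEL = [
    "name", "definition", "primary_key",
    "non_key_attributes", "anticipated_growth_rate",
]

def missing_meta_items(entity: dict) -> list:
    """List the required meta-model items an entity description lacks."""
    return [item for item in ENTITY_META_MODEL if item not in entity]

customer = {
    "name": "Customer",
    "definition": "A party that purchases goods or services.",
    "primary_key": ["customer_id"],
}
print(missing_meta_items(customer))
# -> ['non_key_attributes', 'anticipated_growth_rate']
```

Running such a check across a model gives a quick inventory of where the documentation falls short of the organization's meta model.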


Summary

As a technique, data modeling does not need to be confined to IT projects. For example, as part of my MBA program, I was enrolled in a contract law class with some of my IT classmates. As the
professor spoke about cases, plaintiffs, defendants, appeals, etc., I took my notes in data model notation. As the professor walked past my desk, she glanced down at my notes and noticed boxes,
lines and a few words written down. Surprised, she asked me what I had been doing instead of taking notes. When I started to explain, one of my IT friends chimed in with a comment: “Is he
taking notes in data modeling format again? He always does that.” And my friend was correct: I take most of my notes via boxes, lines, cardinality symbols, and textual definitions; after all,
the professor was stating her business rules.


About Scott Benson

Scott Benson is a Senior Data Scientist at SynergyBIS and can be contacted at sbenson@synergybis.com .
