The 10-Minute Data Model Review, Part 3

In this article, Part 3 of the series, I will show you some of the techniques you can use to quickly review a data model for the most common mistakes made by data modelers. The list of mistakes is
by no means exhaustive – it’s a list of mistakes that I have identified over the course of 20 years, by working with and reviewing models developed by everyone from novices to experts.

11. Mandatory one-to-one relationships

By definition, a mandatory one-to-one relationship exists when every occurrence of entity A requires exactly one occurrence of entity B and every occurrence of entity B requires exactly one
occurrence of entity A. This indicates the two entities are very closely related and may, in fact, be pieces of the same concept. In this case, chances are the one entity was incorrectly divided
into two entities, poorly attributed or poorly defined. If the two items are always required together, as indicated by the mandatory one-to-one relationship, then they should be combined into a
single entity.

12. Shaped relationships

There are two major “shaped” relationships, triangular and circular. They are similar in cause and impact. They differ in appearance.

A triangular relationship exists when every occurrence of entity A has one or more occurrences of entity B; entity B has one or more occurrences of entity C; and entity A has one or more
occurrences of entity C (see Figure 2).



A circular relationship exists when every occurrence of entity A has one or more occurrences of entity B; entity B has one or more occurrences of entity C; and entity C has one or more
occurrences of entity A (see Figure 3).

In either of these two shaped relationships, there is a fundamental error in understanding. Chances are the definitions of the three items are overlapping, such that information may be placed in
any or all of the entities. (Remember the rule on non-overlapping definitions from my first article. For example, if the
definitions are:

Entity A – A multi-sided object
Entity B – An object with parallel sides
Entity C – An object with 2 pairs of parallel sides

How would you categorize a square? It has multiple sides (entity A), it has parallel sides (entity B) and it has 2 pairs of parallel sides (entity C).

13. Incorrect subtyping

Subtyping is the process of taking a general concept and splitting it into smaller and more defined types based on the subtypes, attributes, relationships, or both. The subtypes inherit all of
the characteristics (attributes and relationships) of the parent (super-type), plus they have their own unique characteristics. For instance, the entity “Employee” can be subtyped
into “hourly” and “salaried,” provided that there are business reasons that require a differentiation between the attributes and relationships that pertain to the specific
subtypes. For instance, “salaried” may receive benefits (vacation, sick, etc.) that “hourly” does not – these would be shown as relationships to different entities.
Subtyping can be performed multiple times as well. “Salaried” could also be subtyped into “executives” and “staff,” provided there are differences in
characteristics.

Additionally, a second partition could be established that differentiated between “union” and “non-union” employees if there were different business rules (attributes and
relationships) that applied. Subtyping is not a requirement – it is a modeling technique that can be used to diagrammatically show the distinction with in a general category.

14. Entity pairs with many relationships

Similar to the mandatory one-to-one relationship problem, when many relationships between entity pairs exist, it is an indication that the two are very closely related and may, in fact, be pieces
of the same concept. In these cases, chances are the two entities are partitioned incorrectly, poorly attributed or poorly defined. While it is possible to have multiple relationships between
entity pairs, once the number of relationships exceeds four or five, the area should be investigated and probably restructured.

For example, in Figure 6, Employee is shown with four separate relationships to Car. While all four may be technically correct, only those that are relevant to the business should be shown.

 

15. History, log, audit and system entities

History, Log, and Audit tables are an important part of any database and computer system. However, they are not part of a logical model. All three of these items are needed for speed reasons
(history), backup/recovery reasons (log) and tracking/liability reasons (audit), but very rarely are they needed for business (logical) reasons.

System entities should never be part of a logical model. System entities exist solely for physical implementation reasons (i.e., the system) and therefore should not be included in any logical
data model.

 


Summary

As a technique, data modeling does not need to be confined to IT projects. For example, as part of my MBA program, I was enrolled in a contract law class with some of my IT classmates. As the
professor spoke about cases, plaintiffs, defendants, appeals, etc., I took my notes in data model notation. As the professor walked past my desk, she glanced down at my notes and noticed boxes,
lines and a few words written down. Surprised, she asked me what I had been doing instead of taking notes. When I started to explain, one of my IT friends chimed in with this comment: “Is he
taking notes in data modeling format again? He always does that.” And my friend was correct. I take most of my notes via boxes, lines, cardinality symbols and textual definitions. After all,
the professor was stating her business rules.

Share

submit to reddit

About Scott Benson

Scott Benson is a Senior Data Scientist at SynergyBIS and can be contacted at sbenson@synergybis.com .

Top