In this paper, Part 1 of a series, I will show you some of the techniques you can use to quickly review a data model for the most common mistakes made by data modelers. The list of mistakes is by
no means exhaustive – it’s a list of mistakes that I have identified over the course of 20 years, working with and reviewing models developed by everyone from novices to experts.
Throughout this paper, I use the term project to define the scope of work – a project can be the development of a stand-alone system, the development of an integrated set of systems,
the development of a departmental system, or the development of an entire enterprise's model. While your specific application of the word project will vary, the items described below will apply
regardless of the scope.
1. Not understanding the purpose of logical data modeling
Logical data modeling is a technique that enables an analyst to understand the data and data relationships within a defined scope (project). It is not a process modeling technique, it is not an
organizational modeling technique, and it is not a database or data warehouse design technique. It is a technique to enable people to understand data within a specific context (project).
A logical data model, with its components (diagram and definitions), defines the rules under which the business, and therefore its systems, must operate. It defines what data is important
to the project (within the scope) and, just as importantly, what is not important (outside of the scope). Data modeling allows the project team to learn about the data from a technology-, process-
and organization-independent view, thereby gaining an understanding of what is important – and not how it is gathered, used, stored or transmitted.
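To make the deliverable concrete, here is a minimal sketch – the entity names, definitions and structure are invented for illustration – of what a logical model captures: entities, definitions, attributes and relationships, with no trace of how anything is stored, processed or transmitted.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    definition: str                                  # what the thing means
    attributes: dict = field(default_factory=dict)   # attribute name -> definition

# A two-entity slice of a hypothetical project scope. Note what is absent:
# no column types, no indexes, no tables -- only meaning and business rules.
employee = Entity(
    "EMPLOYEE",
    "A person who is, or was, directly employed by the company.",
    {"hire date": "The date on which the person's employment began."},
)
desk = Entity("DESK", "A physical workspace owned and assigned by the company.")

# Relationships carry cardinality: (entity, verb phrase, entity, cardinality).
relationships = [("EMPLOYEE", "is assigned", "DESK", "1:1")]
```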
2. Building the model with a specific technology in mind
A logical model is just that – it is a model that describes the logical (non-physical) data and data relationships used by the enterprise without considering physical aspects of
either the current or anticipated implementation.
- A logical model that considers the current technology will usually cause the next implementation of the data model, often in a different technology, to be based on the exact technology that is
being replaced! All of the characteristics of the current technology will be embedded as requirements into the new model's design, ensuring they will become part of the new system
– the very system that is supposed to be an improvement.
- A logical model that considers the future implementation will confuse the distinction between what the business does and how it does it. If the logical model is based on
the future technology, what will happen when the database structure is updated – or the database technology itself is updated? You won't be able to distinguish between
requirements and technology anymore – and a new logical modeling project will have to be performed again instead of simply updating the model.
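One quick way to review for this mistake is to scan the model's definitions for implementation vocabulary. The sketch below is a toy version of such a check, assuming the definitions have been exported from the modeling tool as simple name/text pairs; the term list and sample definitions are mine, not a standard.

```python
# Vocabulary that suggests physical design has leaked into a logical model.
PHYSICAL_TERMS = ("varchar", "integer", "index", "partition",
                  "tablespace", "column", "table", "trigger")

def technology_leaks(definitions):
    """Flag definitions that describe implementation rather than meaning."""
    return {name: [t for t in PHYSICAL_TERMS if t in text.lower()]
            for name, text in definitions.items()
            if any(t in text.lower() for t in PHYSICAL_TERMS)}

defs = {
    "EMPLOYEE": "A person directly employed by the company.",
    "EMPLOYEE NAME": "A VARCHAR(30) column holding the employee's legal name.",
}
print(technology_leaks(defs))  # {'EMPLOYEE NAME': ['varchar', 'column']}
```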
3. Building a model without useful entity and attribute definitions
A logical data model consists of two parts – the diagram that is hung on the wall to intimidate the uninitiated and the definitions that define what everything means. A failure to develop
robust, complete, non-overlapping definitions will result in a pretty picture with no tangible use. One of the most difficult aspects of data modeling is the development of good definitions;
one of the first entities I look at when performing a model review is "Employee." Typically, the definition I see is "Someone who works here." At that point, the following
conversation usually occurs:
Scott: I am working here, am I an employee?
Data Modeler: No. So a better definition would be: “Someone who works here and receives a paycheck from the company.”
Scott: I am getting a paycheck.
Data Modeler: Ok, how about “Someone who is getting paid directly by the company”?
Scott: Do people who don’t work here get a paycheck?
Data Modeler: No.
Scott: So if I used to work here, and then retired, would I receive a paycheck?
Data Modeler: Uh, yes. So, I guess we do pay people who don't work here. So a better definition would be "Someone who is, or was, directly employed by the company and
receives a paycheck."
Scott: Now, do you want to go back and review all of the other definitions you've developed?
Data Modeler: No, but I see that I really need to.
While the definition that resulted from that conversation may not fit your specific organization, you should be able to see the type of review and discussion that must occur as part of the
definition process.
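A reviewer can automate a first pass over the definitions before sitting down for conversations like the one above. Below is a toy sketch, assuming definitions arrive as name/text pairs; the phrase list and word-count threshold are arbitrary choices of mine.

```python
# Placeholder wording that signals a definition never got past a first draft.
WEAK_PHRASES = ("works here", "see above", "tbd", "self explanatory")

def weak_definitions(definitions, min_words=8):
    """Return entities whose definitions are too short or mere placeholders."""
    flagged = {}
    for entity, text in definitions.items():
        reasons = []
        if len(text.split()) < min_words:
            reasons.append("too short to rule anything in or out")
        if any(phrase in text.lower() for phrase in WEAK_PHRASES):
            reasons.append("placeholder wording")
        if reasons:
            flagged[entity] = reasons
    return flagged

defs = {
    "EMPLOYEE": "Someone who works here.",
    "DESK": "A physical workspace owned by the company and assigned to staff members.",
}
print(weak_definitions(defs))  # flags EMPLOYEE on both counts
```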
4. Lack of understanding of cardinality and its implications
Simply put, cardinality identifies the numerical relationship between occurrences of entity A and occurrences of entity B; or, more simply, how many B's are required for each A. The
cardinality embedded in a data model documents the business rules about relationships between data in a manner that is relevant to the scope. For example, some organizations have a 1:1 relationship
between an employee and a desk. Each employee has one, and only one, desk; each desk has one, and only one, employee. This is radically different from an organization with a pool of desks available
for anyone in the office to use and is also radically different from an organization where a single employee may have multiple desks (usually at multiple locations).
There is no correct answer; it is a business-specific model. However, the implication of specifying the incorrect cardinality is significant. Using a different example, if you wanted to model the
relationship between an employee and a child, most people would show that an employee could have 0, 1 or many children, but each child could only be associated with a single employee. The implication
of this cardinality is that both parents of a child could not work for that company – a situation that could force an employee to quit, or to not have children if their partner worked for the
company!
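To make the implication concrete, here is a sketch of the two readings of the rule, using SQLite and invented table names. With a single foreign key on the child, a second working parent is simply unrecordable; an associative entity removes that restriction.

```python
import sqlite3

# Reading 1: each child is associated with exactly one employee.
# A child's second parent cannot be recorded, no matter what HR says --
# the structure itself forbids it.
one_parent = sqlite3.connect(":memory:")
one_parent.execute("PRAGMA foreign_keys = ON")
one_parent.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child (
        child_id INTEGER PRIMARY KEY,
        name     TEXT,
        emp_id   INTEGER NOT NULL REFERENCES employee (emp_id)
    );
""")

# Reading 2: an associative entity between EMPLOYEE and CHILD lets a
# child relate to any number of employees -- both parents can work here.
both_parents = sqlite3.connect(":memory:")
both_parents.execute("PRAGMA foreign_keys = ON")
both_parents.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child (child_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee_child (
        emp_id   INTEGER NOT NULL REFERENCES employee (emp_id),
        child_id INTEGER NOT NULL REFERENCES child (child_id),
        PRIMARY KEY (emp_id, child_id)
    );
""")
both_parents.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Pat"), (2, "Sam")])
both_parents.execute("INSERT INTO child VALUES (10, 'Alex')")
both_parents.executemany("INSERT INTO employee_child VALUES (?, ?)", [(1, 10), (2, 10)])
```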
There are multiple methods data modelers use to ensure they have “covered all the bases”:
Connect-the-Dots Model
In a Connect-the-Dots model, it appears that every entity is connected to every other entity “just in case.”
Mandatory One-to-One Relationships
Mandatory one-to-one relationships state that for every A entity there is exactly one B entity, and for every B entity there is exactly one A entity. If this is true, then one of three scenarios
probably exists (the second is sketched after this list):
- The modeler does not have sufficient business knowledge to identify the true cardinality and optionality constraints;
- The two entities are really one entity;
- There should be two entities, but they are partitioned incorrectly in the data model.
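Here is a sketch of the second scenario, with invented EMPLOYEE / EMPLOYEE DETAIL entities: if every employee has exactly one detail record and vice versa, nothing is lost by folding the attributes into a single entity.

```python
# Two "entities" locked in a mandatory one-to-one relationship...
employee        = {"emp_id": 1, "name": "Pat"}
employee_detail = {"emp_id": 1, "hire_date": "2018-04-02"}

# ...are usually one entity drawn as two boxes. Merging loses nothing:
merged = {**employee, **employee_detail}
print(merged)  # {'emp_id': 1, 'name': 'Pat', 'hire_date': '2018-04-02'}
```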
Many-to-Many Relationships
In a completed, normalized, logical model, there will not be any many-to-many relationships; in a model still under development, many-to-many relationships may legitimately be present. As a reviewer, you
need to know the status of the model you are reviewing before determining the appropriateness of many-to-many relationships.
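This check is easy to mechanize. The sketch below assumes the modeling tool can export relationships with their cardinalities; the names and format are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Relationship:
    entity_a: str
    entity_b: str
    cardinality: str  # "1:1", "1:N" or "M:N"

# Hypothetical export from a model that claims to be complete.
relationships = [
    Relationship("EMPLOYEE", "DESK", "1:1"),
    Relationship("EMPLOYEE", "PROJECT", "M:N"),
]

def unresolved_many_to_many(rels):
    """In a completed, normalized model, every M:N should already have been
    resolved into two 1:N relationships through an associative entity."""
    return [r for r in rels if r.cardinality == "M:N"]

for r in unresolved_many_to_many(relationships):
    print(f"review item: {r.entity_a} -- {r.entity_b} is still many-to-many")
```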
5. Not enforcing referential integrity
The proper use of referential integrity ensures that data relationships required by the business are identified and enforced via the model. If the business’s requirements are not correctly
modeled, then the database based on the model will be incorrect. Referential integrity ensures that if the business rules state that entity B’s existence is dependent upon the existence
of entity A, those rules are enforced via the key from entity A to entity B.
Referential integrity is enforced in the data model through the use of foreign keys, ensuring that the required keys contained in the child entity are already contained in the parent entity. The
implications of the dependency are quite simple: the child entity can't be added unless the parent entity is already present, and if the parent entity is deleted, the child entity must also be
deleted.
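Both implications can be demonstrated end to end with SQLite (which, notably, only enforces foreign keys when asked). The table names are invented; a real model would of course be larger.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request

con.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
con.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL
                REFERENCES department (dept_id) ON DELETE CASCADE
    )""")

# Implication 1: a child row cannot be added before its parent exists.
try:
    con.execute("INSERT INTO employee VALUES (1, 'Scott', 99)")
except sqlite3.IntegrityError as exc:
    print("rejected orphan:", exc)  # FOREIGN KEY constraint failed

con.execute("INSERT INTO department VALUES (99, 'Data Administration')")
con.execute("INSERT INTO employee VALUES (1, 'Scott', 99)")

# Implication 2: deleting the parent removes its dependent children.
con.execute("DELETE FROM department WHERE dept_id = 99")
print(con.execute("SELECT COUNT(*) FROM employee").fetchone()[0])  # 0
```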
Part 2 of this series will discuss five more techniques you can use to quickly review a data model.