
My name is Bill Burkett, and I am a data modeler.
I don’t often call myself that, and I sometimes have misgivings about doing so. I often get the feeling that being a “data modeler,” when considered in isolation from other engineering skills, is a less-than-flattering job title. To practical engineers, data modeling conjures images of eggheads sitting alone (in their ivory towers) and drawing box-and-line diagrams of no practical value while quoting philosophers like Aristotle, Wittgenstein, and Descartes — “I think, therefore it is.”
I created my first data model eight months after completing my undergraduate degree in Industrial and Systems Engineering and joining an industrial automation group in a large aerospace firm. I was assigned to a project developing data exchange standards for moving product design data between design and manufacturing systems. During the three months in which we created a set of initial data models, what struck me — very profoundly — was that there was no way to tell whether our data models were any good. There were plenty of “experts” who provided feedback — “This is the right way to do it.” — “That isn’t the right way to do it.” — but it was all opinion. There were absolutely no rules, no metrics, and no guidance for developing quality data models. (That wasn’t entirely true: I later learned about Codd’s normalization rules, which provided some measure of quality by ostensibly protecting data from, for example, update anomalies.)
I had very little training in computing technology when I developed these data models. This, as the saying goes, was both a curse and a blessing. On one hand, I knew nothing about joins or foreign keys or automata or programming language theory, so I was handicapped when talking to software engineers, database designers, and database administrators. On the other hand, this very same lack of knowledge provided me with a unique perspective on the practice of data modeling. I had no computing technology biases, preconceived notions, or predilections when it came to data modeling. My biases were all systems engineering-oriented: models, simulations, human factors, quality control, and holistic system thinking. I found these engineering considerations almost completely lacking in the world of data modeling.
So being a systems engineer, I began a quest to identify engineering design principles and rules for what made a data model “good.”
What Makes One Data Model Better Than Another?
There are many, many data modeling books that provide a wide, wide variety of data modeling advice for developing “good” data models, and many, many experts practicing in the field. The data model design principles presented below are those I’ve cultivated and found most useful in separating “good” data models from “bad.” They are my small contribution to that large body of data modeling advice.
You may wonder what data model design principles have to do with engineering skills — and, thus, how data modeling made me a better engineer. I have found that the most useful data model design skills really have nothing to do with data. Data modeling techniques like normalization or dimensional fact modeling have less impact on the usefulness of a data model than plain old conventional engineering practices like proper scoping, clear design specifications, and lifecycle management.
Three “beacon” data modeling principles that I think separate the “good” data models from the “bad” are:
- If you don’t know where you’re going, you might not get there.
- Scope! Scope! Scope!
- Choices have purpose. Choices have consequences. Choices affect outcomes.
On the surface, these principles have nothing to do with data modeling. They’re not about data structuring, column naming, or data types. Most books about data modeling fully address topics like normalization and keys, and they do a fine job at that. However, when it comes to modeling the “real world,” usually in the context of a conceptual data model, they say “choose things important to the business” and leave it at that. Not very helpful advice. Lame advice, actually. The data modeling principles presented here take the “choose things important to the business” guidance a few steps further and focus on making better choices.
1. If You Don’t Know Where You’re Going, You Might Not Get There.
Products are designed and engineering designs (e.g., blueprints) are created to meet a set of functional requirements: What is the product supposed to do? For data models, the question is: What is the data supposed to do?
This principle as stated is a mash-up of two separate Yogi Berra quotes. It’s about requirements: If you aren’t clear on the functional requirements that must be met by your data model, you’re basically either guessing or fumbling around in the dark.
What are functional requirements for a data model? There are different categories of functional requirements that depend on who’s using the data. One category of users is the people executing business processes (e.g., manufacturing line workers, customer service representatives) who need information (provided by data) to do their jobs — business process information requirements. Another category is the developers creating the software systems; they need efficient data structures to make their software work more effectively (e.g., denormalization), and very often they need data that the end user never sees (e.g., flags).
The needs of just these two user categories are often at odds with one another, requiring less-than-ideal compromises. This is why creating conceptual data models is so difficult. (And why I prefer to separate a conceptual data model into a concept model and a physical data model and map between them, so the needs of each community can be met more effectively.)
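To make that split concrete, here is a minimal sketch in Python of how the two models might relate, using a hypothetical order-tracking domain (the Customer/Order names and fields are illustrative assumptions, not a prescription): the concept model captures what the business talks about, while the physical model adds the denormalized columns and developer-facing flags the software needs, with an explicit mapping between the two.

```python
from dataclasses import dataclass

# Concept model: the things the business process talks about.
@dataclass
class Customer:
    name: str

@dataclass
class Order:
    customer: Customer
    total: float

# Physical model: what developers need to make the software effective.
# customer_name is denormalized onto the row to avoid a join, and
# is_archived is a developer-facing flag the end user never sees.
@dataclass
class OrderRow:
    order_id: int
    customer_name: str
    total: float
    is_archived: bool

def to_row(order_id: int, order: Order) -> OrderRow:
    """Map a concept-level Order onto its physical representation."""
    return OrderRow(order_id, order.customer.name, order.total, is_archived=False)
```

Keeping the mapping explicit is the point: each community’s requirements are met in its own model, and the compromise lives in one visible place instead of being smeared across a single “conceptual” model.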
2. Scope! Scope! Scope!
In my experience, one of the two biggest shortcomings of data modeling efforts is the failure to clearly define the scope of the data model as a whole (boundaries) and the things within it individually (foci). (The other shortcoming is the lack of good definitions — see below.)
A data model is a model: What is it a model of? And what isn’t it a model of? Any systems engineer will tell you immediately that models have limitations and that consciously choosing what is represented in the model, and what is left out, is absolutely critical to the model being good for anything.
The scope of a data model is the set of real-world things that the data governed by the data model is intended to represent and convey information about (the subject domain of the data model). The data model is a model of selected information about the subject domain, not the entire subject domain.
One way to describe the scope of a data model is by describing the end-user processes the data is supposed to support. The processes establish the functional information requirements for the data model — what the data has to be able to “do.” I’ve seen plenty of data models created in a process vacuum with an off-hand explanation that “The data is for anybody to use how they like — it’s not tied to any particular process.”
Is this good engineering? No, it is not. If I handed you a blueprint for a new woodworking tool and said, “Oh, I don’t know how it’s supposed to be used, but look at it — it sure looks useful — I’ll bet a lot of people will use it,” would you take me seriously as an engineer? Again, I think not.
3. Choices Have Purpose. Choices Have Consequences. Choices Affect Outcomes.
Good engineering requires a lot of deep, critical thought. Data models need a lot, lot more deep critical thought than they currently get. Engineers are taught to break down the problem, to isolate separate concerns and then design solutions to address those individual concerns. Taken one step further, identifying functionally independent aspects of the problem and its solution is key to a robust, high-quality design.
(3.1) Choose Functionally Independent, Mutually Exclusive Things
So, how does this apply to data models? Well, an entity type in a data model corresponds to a class (or group) of objects in the modeled subject domain, e.g., a “person” entity corresponds to a group of people in the subject domain and an instance of the person entity type corresponds to one of those people. Creating a high-quality data model requires (1) choosing entity types corresponding to classes that don’t overlap in terms of membership and (2) defining those entity types clearly enough so that membership is unambiguous. If your data model has both a PERSON and CUSTOMER entity type in it, non-overlapping means that you don’t and can’t have data about a single individual human being in both a PERSON and a CUSTOMER table. Eliminating questions like “Where does data about this person go?” is a huge step in improving the quality of your data model.
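As a minimal sketch of what non-overlapping membership can look like (Python here only to make the structure concrete; the person_id and account_number fields are illustrative assumptions), CUSTOMER can be modeled as a role that references PERSON, rather than a second place where person data lives:

```python
from dataclasses import dataclass

@dataclass
class Person:
    """One record per human being -- the only place person data lives."""
    person_id: int
    name: str

@dataclass
class Customer:
    """A role a Person plays, not a second copy of the person.
    It references Person by id, so class membership never overlaps."""
    person_id: int          # reference to the one Person record
    account_number: str

# The same human appears exactly once as a Person...
alice = Person(person_id=1, name="Alice")
# ...and acquiring the customer role adds a reference, not a duplicate.
alice_as_customer = Customer(person_id=alice.person_id, account_number="C-1001")
```

With this shape, “Where does data about this person go?” has exactly one answer, and the customer role carries only the attributes that belong to the role itself.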
(3.2) Start with Objective, Unarguable Things
The key to designing a high-quality data model is making it as clear and unambiguous as possible. Among the challenges that make this goal difficult to attain is the “eye of the beholder” creating the data model. People see different things, different things are important to them and their agendas (be they overt or covert), and what they see is equal parts perception and prior experience with what they’ve seen before. (All of which is recognizable as pop psychology, but few data modelers will admit that these factors bias their modeling skills, practices, or models.)
“Unarguable” means selecting things as starting points for the data model that are the least likely to result in arguments. I’ve learned to start the modeling process by “grounding it” in objective, unarguable things — things that the data modeler, members of the data modeling team, and any reviewers of the data model can physically “point at” (e.g., with their finger) and objectively see more-or-less equivalently. Human beings, trees, buildings, cars — these are all objective things that people essentially see and perceive the same way. Once identified and modeled, they can be combined with events, situations, and other intangible concepts. For example, a commercial purchase transaction (e.g., you buy something at the store) is an intangible event pattern that consists of objective physical things: an entity that purchases, an entity that sells, a good or service that is purchased, and the currency used to purchase.
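Here is a rough sketch of that purchase-transaction example, with the objective, “pointable” things modeled first and the intangible event composed from references to them (the Party/Good names and fields are illustrative assumptions):

```python
from dataclasses import dataclass

# Objective, "pointable" things are modeled first...
@dataclass
class Party:
    name: str

@dataclass
class Good:
    description: str

# ...then the intangible event is composed from references to them.
@dataclass
class PurchaseTransaction:
    buyer: Party
    seller: Party
    item: Good
    price: float
    currency: str

sale = PurchaseTransaction(
    buyer=Party("Alice"),
    seller=Party("Corner Store"),
    item=Good("hammer"),
    price=12.50,
    currency="USD",
)
```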
(3.3) If You Can’t Define It, You Don’t Know What It Is
The second major shortcoming of data modeling efforts is the lack of high-quality definitions for the elements of the data model — or the lack of definitions of any kind, for that matter. I’ve seen many data models that are just diagrams or lexical schemas — any meaning or understanding of the model had to be derived from the names used for the elements and their relationships to one another. Schemas and diagrams do convey a lot of information, but such a model is not complete — it’s missing explanation, context, and intention. Clear definitions and full model documentation are necessary if the model is to be considered a full and complete engineering specification.
Crafting clear, high-quality definitions is not easy. I’ve seen many data model element definitions that are just a list of examples, that merely explain how the element is supposed to be used, that are burdened with all kinds of unnecessary filigrees, or that are worded in a way so buried in the context of the author that they’re impossible for an “outsider” to understand. I use a very simple heuristic when evaluating the quality of definitions:
Given a definition D for term T and an arbitrary object X, is D clear, complete, and unambiguous enough for any average reader of English to ascertain with a very high degree of accuracy and confidence whether or not X is a T?[3] For example, “Customer: a party that has purchased at least one good or service from the company” passes this test; “Customer: anyone important to the business” does not.
Are good definitions part of good engineering? Absolutely! If your design specifications are ambiguous in any way, you can be sure that a developer, craftsman, or manufacturer will interpret your specification in an unintended way and the resulting product will not perform as intended.
Data Modeling Principles and Engineering Design Principles
What I eventually realized was that my training as a systems engineer was influencing how I saw the process of data modeling and what I felt was wrong with many data modeling efforts. I further realized, to my surprise, that the data modeling principles I was cultivating and applying were making me a better engineer overall. My critical thinking was honed, and my communications were clearer. I was becoming far more practical and realistic in what I was trying to do, and far less prone to the magical thinking of the “good idea faeries” that seems so prevalent in the computing world.
When I did some research on “What makes a good engineer,” “What makes a good engineering design,” and “Engineering design principles,” I was again surprised to find a great deal of correlation between my data model design principles and the principles of sound engineering design.
For example, some characteristics of a great engineer are[1]:
- Possesses a strong analytical aptitude
- Shows an attention to detail
- Has excellent communication skills
- Takes part in continuing education
- Creative
- Shows an ability to think logically
- Mathematically inclined
- Has good problem-solving skills
- A team player
- Has excellent technical knowledge
Wouldn’t you like to work with a data modeler with these skills?
Some characteristics of a good engineering design are[2]:
- Makes a product useful
- Innovative
- Makes a product understandable
- Aesthetic
- Makes products easy to transport, store, and maintain
- Long lasting
- Environmentally friendly
- As little design as possible
And wouldn’t you like to find and use data models that possess these properties? (The “environmentally friendly” could be interpreted as the trees saved by publishing good data models rather than the rambling, unfocused iterations of bad data models.)
Is “Data Modeling” an Engineering Practice? Is a Data Model an Engineering Design?
If I haven’t convinced you that the answer to these questions is a resounding “Yes,” and that engineering design principles and practices are eminently applicable to data model design, then I have poorly engineered this article.
A data model is every bit as much an engineering design as a blueprint for an aircraft wing spar. And — like the spars and bulkheads that form the structural core of an aircraft — the databases and data assets constructed from data models form the structural core of enterprise IT systems, bringing all the components of those systems together into an integrated whole.
Data modeling not only made me a better engineer, but changed the way I look at engineering. Looking back at the IT design guidance, system models, and specifications I’ve created, I’ve begun to think I had it backwards. It’s not that data modeling is engineering. In reality, all engineering is just data modeling.
References
[1] engineeringschools.com/resources/top-10-qualities-of-a-great-engineer
[2] vitsoe.com/gb/about/good-design
[3] Jackson, Michael. Software Requirements & Specifications: A Lexicon of Practice, Principles and Prejudices. ACM Press Books, 1995.