Introduction – The Number Two Challenge
Imagine you’re in an elevator, and someone asks (as they do in elevators) “what’s the purpose of data modeling?” My guess, supported by some research, is that your between-floors response will include the theme of communication – and particularly communication with business stakeholders. When I ask participants in advanced data modeling classes to nominate their greatest challenge, the second-most-common group of responses (after justifying the need for modeling – see my article Tackling Data Modeler’s Toughest Challenge in the April 2005 TDAN.com Newsletter) relates to understanding business requirements, and getting the business to understand data models. In this article, I have summarized ten of the most important principles and techniques for addressing each of these challenges. I’ve favored breadth over depth, and in some cases suggested some further reading if you want to explore the ideas further. Needless to say, they reflect my own philosophy of data modeling, in particular, my view that it is a design discipline (see You’re Making it Up – Data Modeling: Analysis or Design? October 2004 Newsletter).
In the spirit of Stephen Covey’s “Seek first to understand, then to be understood,” I deal first with the problem of understanding business requirements, then look at communicating data models to business people. But first, a few principles that apply to the process as a whole:
- The single most important factor in communication is a belief on the part of the business that it matters. People will work hard to understand (and push you to help them understand) if they recognize that the consequences of not doing so are serious. Be concrete: don’t tell them you’re building a representation of business reality (or, heaven forbid, the universe of discourse); tell them you’re specifying a database that will determine what their systems are able to do for a long time to come. This is assuming you’re doing data modeling in the context of systems development, an assumption I’ll make for the purposes of this article. Strategists and metadata managers should adapt the advice accordingly. It’s tempting to spend time on the mechanics of communication – languages, formats, tools. Do this by all means, but recognize that time spent on getting “buy in” is likely to be even more productive.
- Look to architects as role models. Architect is the most common noun in the titles of people who come to my classes, and I think the metaphor is a good one. The architect’s approach to eliciting requirements and reviewing designs translates well into the data modeling field – and you have a ready-made analogy for explaining what you’re doing to other stakeholders. Bryan Lawson’s book (see references at end of this article) is an insightful introduction to how designers from other fields – in particular architecture – think and work.
- Maintain a separation between requirements and modeling, analysis and design, and “their language” versus “our language.” The central question for the requirements phase is “have we got it right?”; for the modeling phase, it is “will this work?”
Seek First to Understand
Most data modelers recognize the value of a distinct business requirements phase, separate from modeling, even if they are not always certain as to what the deliverable should look like. Some methods, particularly those that characterize data modeling in terms of describing or mapping reality, omit this phase. I don’t see data modeling as reality-mapping, but as problem-solving – so I strongly support the inclusion of a phase that helps to define the problem. Here are some suggestions on how to get the most value from it.
- Don’t expect to get it all. A complete set of requirements is an unrealistic goal. On the contrary, once you have a draft model, it will become the focus of future work, and will suggest questions that the requirements didn’t address. Architects typically prefer to start with a short brief, and then move to a conjecture-analysis mode – making suggestions that they test with the client.
- Let the users choose the language. Don’t force the business users into using an unfamiliar language (that comes later!). Be prepared to capture requirements in whatever form they come. Resist the temptation to record them as data models, tying yourself into a single solution.
- Aim for a holistic understanding. Detailed analysis should take place in the context of the bigger picture of what the business is trying to achieve. It’s hard to ask the right questions or make sense of the answers without an overall understanding of the business area and problem – and good design is usually at least in part a holistic response rather than a simple mapping of components of the problem. Understanding context will also help with identifying gaps, recognizing opportunities to use patterns, and spotting integration issues.
- Work from the middle out. Actually gaining a holistic understanding can be a challenge. I suggest you start in the “middle” with straightforward questions about the purpose of the database and then work both upwards (“why” questions) and downwards (“what” and “how”) to respectively establish business context and flesh out the detail.
- Use an object class hierarchy to capture data requirements. My co-author Graham Witt developed the idea of using simple hierarchies of data objects – including candidate entities, attributes, and derived data – rather than data models to capture user data requirements. The technique has most of the advantages of using data models to record data requirements, but avoids imposing the strictures of a formal data modeling language too early (“that’s not a real entity, sorry”). It is described in some detail in Data Modeling Essentials, but the basic concept is simple: don’t filter or criticize what the business people say; just capture it and organize it into categories.
- Hold hands with the process modelers. You can’t define processes without implicitly defining data, and the ultimate database is going to have to support all those processes. Working closely with the process modelers, particularly when they meet with the business users, can do much to supplement the CRUD matrix in making sure that data and process fit together at the end. And you’ll hear things you might have missed in your own interviews.
- Ride the trucks. One of Daniel Moody’s “seven habits of highly effective modelers” is see for yourself, and another former colleague of mine used to advocate “riding the trucks.” The message is the same: don’t rely solely on interviews with people who “represent the users,” but make time to meet the real prospective users of the proposed (and current) system. Even if they don’t tell you anything new (an unlikely scenario), you will gain a far more concrete sense of the business.
- Ask about the timeframe. This is a rather specific item in the midst of more general principles, but it’s worth emphasizing. As you develop the data model, there will almost always be tradeoffs between flexibility and other goals (ease of programming, enforcement of business rules, ease of understanding, performance…). A key factor in choosing the best balance will be the expected life of the database. Too often, the argument is essentially “more is better” without adequate regard to the need or costs. Get the best estimate you can up front, and use it explicitly when making these decisions.
- Don’t forget the DBAs. It may be “only for performance reasons” but the DBAs will need information that you are well placed to obtain. Talk to them first about what they need (and get the relationship off to a good start). The obvious example is populations of the various entities (something that will add to the big picture for data modeling as well).
- Log the requirements. All of the requirements should be recorded in a form that enables them to be checked against the final model. A simple check-list, with references to other documents (e.g. process map, object class hierarchy, business rules) will do the job. This is the first half of the bridge between requirements and model.
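The object class hierarchy lends itself to a very simple representation. Below is a minimal sketch in Python, assuming a plain nested structure; the class name, categories, and candidate items are all invented for illustration, not taken from Graham Witt’s actual templates:

```python
# A minimal sketch of an object class hierarchy for capturing data
# requirements. All names (categories, candidates, sources) are
# illustrative assumptions only.
from collections import defaultdict

class ObjectClassHierarchy:
    """Record candidate data objects as the business names them,
    grouped into categories, without judging whether each is a
    'real' entity, attribute, or derived item."""

    def __init__(self):
        self.categories = defaultdict(list)

    def capture(self, category, candidate, source=""):
        # No filtering: record the term exactly as the user said it.
        self.categories[category].append({"candidate": candidate, "source": source})

    def report(self):
        lines = []
        for category, items in self.categories.items():
            lines.append(category)
            for item in items:
                note = f" (from {item['source']})" if item["source"] else ""
                lines.append(f"  - {item['candidate']}{note}")
        return "\n".join(lines)

reqs = ObjectClassHierarchy()
reqs.capture("Customer", "customer name", source="interview with sales")
reqs.capture("Customer", "total amount owing", source="finance workshop")  # derived data, captured anyway
reqs.capture("Account", "account type")
print(reqs.report())
```

The point of the structure is what it doesn’t do: there is no validation, so a derived item like “total amount owing” is captured alongside everything else rather than being rejected as “not a real attribute.”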
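The CRUD-matrix cross-check against the process model can likewise be sketched mechanically. A hedged sketch, assuming the matrix is captured as (process, entity) pairs, with all process and entity names invented for illustration:

```python
# A sketch of the CRUD-matrix cross-check. Process and entity
# names are invented for illustration.

crud = {
    ("Open Account", "Account"): "C",
    ("Post Transaction", "Account"): "RU",
    ("Post Transaction", "Transaction"): "C",
    ("Produce Statement", "Account"): "R",
    ("Produce Statement", "Transaction"): "R",
    ("Produce Statement", "Statement"): "C",
    ("Price Enquiry", "Product"): "R",
}

entities = {entity for (_, entity) in crud}

def operations(entity):
    """Every CRUD letter applied to an entity across all processes."""
    return set("".join(ops for (_, e), ops in crud.items() if e == entity))

# Mismatches between process and data models usually show up as data
# nobody creates, or data nobody reads.
for entity in sorted(entities):
    ops = operations(entity)
    if "C" not in ops:
        print(f"{entity}: no process creates this data - a missing process, or a redundant entity?")
    if "R" not in ops:
        print(f"{entity}: created but never read - why store it?")
```

In this invented example the check flags Product (read but never created) and Statement (created but never read) – exactly the kind of gap that working closely with the process modelers is meant to surface.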
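The requirements log itself need be no more elaborate than a check-list with references. A minimal sketch, with invented field names and document references:

```python
# A minimal sketch of a requirements log: one check-list entry per
# requirement, each with a reference to its source document, so it
# can later be ticked off against the final model. Field names and
# references are assumptions for illustration.

requirements = [
    {"id": "R1", "text": "Record each customer's accounts",
     "reference": "process map, step 3", "covered_by_model": False},
    {"id": "R2", "text": "Support monthly statements",
     "reference": "object class hierarchy", "covered_by_model": False},
]

def outstanding(log):
    """Requirements not yet traced to an element of the final model."""
    return [r["id"] for r in log if not r["covered_by_model"]]

requirements[0]["covered_by_model"] = True  # ticked off during model review
print(outstanding(requirements))
```

Anything still in the outstanding list when the model is reviewed is either a gap in the model or a requirement that was consciously traded away – and either way it deserves a conversation.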
Then to be Understood
Once you have a model, the challenge is to get business stakeholders to understand it and verify that it meets their requirements. Much research, both academic and practical, has gone into making models more understandable – and if there’s one clear message, it is that most models go forward without the users really understanding them or their implications. Here are a few things you can do to improve this situation.
- Map the model against the requirements. Nothing makes a model more convincing than showing how it meets a list of business requirements (which the reviewer recognizes as including everything he or she told you). This is the second half of the bridge between requirements and models.
- Re-emphasize the purpose. Reiterate why it’s important that the business stakeholders verify the model. You’re specifying a database to support their business; you want to be sure it will meet their needs. Simple stuff, but too often this task is presented only as “verifying that our model reflects reality.”
- Consider their ultimate relationship with the database. How will these stakeholders interact with the database? Through an application or relatively directly via a query language? Some research found that users understood tables better than diagrams because they were accustomed to dealing with implemented databases. If they will be dealing with the end product, it makes sense to show them the data in that form rather than require them to learn yet another representation. If they’ve been developing their own databases using end-user tools, they will also have some familiarity with at least one representation. If not, choice of language is an interesting challenge; as you simplify, you risk insulating the users from decisions that may have business consequences. I’m inclined to start with a simplified model (e.g. many-to-many relationships not resolved, common representation of category attributes), then take the business stakeholders with me as later decisions are taken.
- Don’t send them a diagram. An emailed E-R diagram with attached definitions and perhaps even instructions for interpretation (a circle means optional) is simply not going to be intelligently reviewed by anyone without prior, solid experience in modeling. A sign-off obtained under these circumstances is only good as an excuse – and shouldn’t even stand up as that.
- Start with a high level model. Remember the “seven plus or minus two” rule. Most models simply have too many objects on the page for overall comprehension. A high level model, built around supertypes, and leaving off some of the “minor” entities that don’t supertype readily, is a good starting point. Daniel Moody’s (1996) research in this area contains some useful ideas – and some evidence of the value of higher-level models for assisting understanding.
- Build a prototype. A simple prototype can be worth hours of abstract explanation, not only to business users, incidentally, but to process modelers and programmers. This is particularly true if the model uses unfamiliar concepts – for example, if it adopts a highly generic structure in the interests of stability. In lieu of or in addition to a prototype, some familiar (and even unfamiliar) scenarios can help explain how the database will work.
- Consider the assertions approach. Graham Witt has done some nice work on representing entire models as a set of plain language business “assertions.” The beauty of the approach is that the reviewer can step through these one by one, marking them “agree,” “disagree,” or “don’t know” with some assurance that they are covering the entire model. The approach is in Data Modeling Essentials, but the relevant extract was published in the April 2005 edition of this Newsletter. Graham continues to update this technique, and our hope is that one or more of the tool vendors will take it up.
- Walk them through. Whatever technique you use, you need to be there to hold the reviewers’ hands. Even if you’re using something as apparently straightforward as the assertions approach, the reviewers need an overview of concepts, definitions, and rationale. This may mean a group presentation, or individual reviews, but you need to personally explain the implications and be available to hear tentative feedback that the reviewer might not feel sure enough about to put in writing.
- Show more than one alternative. It’s always tempting to show only your preferred model, particularly if you’ve only produced one model! But showing more than one option will help the reviewer understand (a) that there is more than one option, (b) what choices you’ve made – choices they may wish to question – and (c) that just because the model is workable doesn’t mean it can’t be improved.
- Present it as a work-in-progress. The best answer is seldom achieved at the first attempt. The architect analogy is useful here: the “plan” is continually refined as more information and ideas come to mind, and all parties reflect on the problem and solution. You’ll be back – and in the meantime will be open to ideas and queries.
- An elevator pitch – Twenty points, plus the introduction, is a lot of advice. If I could suggest only one principle (an elevator pitch), it would be that if you can convince the users of the importance of the data model, and hence of their contribution, the rest will follow. Committed users won’t let you get away with inadequate communication.
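For the tool-minded: the assertions approach described earlier is the most mechanical of these review techniques, which is what makes it a candidate for tool support. A minimal sketch, assuming a toy model and invented cardinality phrasings (this is not Graham Witt’s published technique):

```python
# A sketch of generating plain-language assertions from a toy model,
# so a reviewer can mark each one "agree", "disagree", or "don't
# know". The model content and phrase templates are invented for
# illustration.

relationships = [
    # (entity, cardinality phrase, related entity)
    ("Customer", "may hold one or more", "Accounts"),
    ("Account", "must be held by exactly one", "Customer"),
    ("Account", "may be involved in zero or more", "Transactions"),
]

def assertions(rels):
    """One reviewable sentence per relationship in the model."""
    return [f"Each {a} {phrase} {b}." for a, phrase, b in rels]

review = {sentence: "don't know" for sentence in assertions(relationships)}
review["Each Customer may hold one or more Accounts."] = "agree"

for sentence, verdict in review.items():
    print(f"[{verdict}] {sentence}")
```

Because every relationship yields exactly one sentence, a reviewer who has marked every sentence has, by construction, covered the whole model – which is the assurance the approach is designed to give.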
References & Endnotes
Lawson, B. (1997). How Designers Think: The Design Process Demystified. Oxford, Architectural Press.
Moody, D. L. (1995). The Seven Habits of Highly Effective Data Modelers (and Object Modelers?). Entity-Relationship Approach – ER’95, LNCS 1021. Springer-Verlag.
Moody, D. L. (1996). Graphical Entity Relationship Models: Towards a more user understandable representation of data. 15th International Conference on Conceptual Modeling, Cottbus, Germany.
Covey, S. R. (2004). The 7 Habits of Highly Effective People (15th Anniversary Edition). Free Press.
Simsion, G. & Witt, G. (2005). Data Modeling Essentials. Morgan Kaufmann.
 Rather than, for example, an Account Type entity (with attributes) and a relationship to Account simply to show that the attribute Account Type can take only a nominated set of values.
 I know that there are many who would argue that there is only one right answer. This is a topic for another time!