Introduction – Our Own Worst Enemies?
Recently, I began a Data Modeling “Master Class” by asking participants to nominate the toughest challenges facing them in their professional work. Virtually all the responses were about “soft”
or “political” issues: persuading project teams to include a data modeling phase, negotiating territory with database technicians, gaining access to business people, and, of course, staying
employed. No mention of normalization, star schemas, time-dependent data, exploiting new DBMS features, choice of language or indeed anything remotely technical.
This response would not surprise anyone who has attended data modeling discussions or panels at recent conferences. It seems that, as a group, data modelers feel under-appreciated, under-used and
perhaps a little unloved. And not without some justification: at a DAMA conference earlier this year a prominent proponent of agile development approaches[1]
said that whenever he sees the word “data” in someone’s job title his first assumption is that “they’re a useless old relic who will be part of the overall problem.”
The good news is that we have, at least to some extent, brought this situation upon ourselves – good news because it means that we are in a position to do something about it. Data modeling is
relevant and important – and so are data modelers – but we have to learn to deliver the message effectively and stake out a clear role in the systems life-cycle.
Here are a few key things we can do. There are seven of them, and I’ve resisted the temptation to give them the status of “Seven Deadly Sins” or “Seven Habits”, because they do not even begin
to cover the full breadth of the data modeler’s work. There are plenty more things that professional data modelers need to do besides communicating the nature of their work, and I only touch
briefly on the analysis and design activities which are the core of data modeling work. But we won’t get to do them if we can’t persuade others of their value.
1. Develop an Elevator Pitch
As part of a research project I’ve been undertaking over the past three years[2], I’ve asked data modelers in Europe and North America to respond to the
question “What is data modeling?” I give them three minutes – certainly more time than they would have for the proverbial “elevator pitch”, and probably more than many of their IS and business
colleagues might allow them. Reading the responses is a fascinating exercise. Bluntly, few would make sense to someone outside the field, and very few address the purpose of modeling.
As supposed experts in definition, we might reasonably be expected to have a good common definition of our own field – “data modelers, heal thyselves!” In the apparent absence of that, we should
at least have personal “elevator pitches” pre-prepared for IS and general audiences (so that next time the taxi driver asks “what do you do”, you won’t have to say “I’m in computers” – and
you can take the opportunity to update your partner and kids). If we can’t explain clearly what we do, then we can’t expect people to support it.
At the risk of being overwhelmed by correspondence and denigrated for oversimplification, my own summary for the general public, which includes anyone who isn’t an IS professional, is that I
design[3] database structures to meet business requirements.
“So you design databases?” “Pretty much – not all the technical detail, but I specify what structures need to be in place to meet the business needs. Someone else builds it according to that
specification.”
It’s not perfect, but it does convey the purpose and the need.
Enterprise modelers need to address the same question, and their answer will depend on what they are trying to achieve in their organization. Recently I spoke to a senior IT consultant who had been
brought in to review a major project. He commented that the data architects had been the only group unable to give a clear account of what they were doing. Regardless of who’s to blame, it’s our
problem to solve.
2. Stake Out the Territory
If data modelers are to be seen as valuable contributors to systems development, they need to claim primary responsibility for at least one key deliverable in the information systems development
cycle – and in my (strong) view that deliverable is the conceptual schema[4] (I hesitate to use a less formal term because of the variations in
definitions – but in relational database terms I would say “the definition of the base tables”).
I suggest the conceptual schema because:
It has a physical manifestation in the final application: it is the database schema that will be seen by programmers or users. It’s easy to point to: “we did that”.
It addresses a distinct type of requirement, and accordingly requires a particular set of skills to develop. Essentially, conceptual schema design requires an understanding of the meaning of data
and of sound logical data organization. By contrast, the designer of the internal schema (indexes, physical file placement, etc.) requires an understanding of how data will be accessed and of how to
use the available tools to achieve performance goals. The conceptual schema designer needs to understand the business; the internal schema designer should be an expert in the use of the target DBMS (the sketch after this list illustrates the distinction).
Despite what the textbooks might say or imply, development of the conceptual schema does not usually proceed according to a neat “waterfall” of stages: earlier decisions are frequently revisited,
and it therefore makes sense for the same person to manage the total task rather than trying to break it into stages.
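To make that boundary concrete, here is a minimal sketch in Python with SQLite (the Customer and Order tables are invented purely for illustration): the base-table definitions are the conceptual schema deliverable, while the index is an internal-schema decision that changes how the data is accessed, not what it means.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the base tables, capturing business meaning and rules.
# (Customer and "Order" are hypothetical names, used only for illustration.)
conn.executescript("""
CREATE TABLE Customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);

CREATE TABLE "Order" (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES Customer (customer_id),
    order_date  TEXT NOT NULL
);
""")

# Internal schema: a performance decision. Adding or dropping this index
# changes how the data is accessed, not what the data means.
conn.execute('CREATE INDEX ix_order_customer ON "Order" (customer_id)')
```

The first part of the sketch requires knowing what the business means by a customer and an order; the second requires knowing how the DBMS will be asked to retrieve them.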
There are arguments for the business-focused data modeler stopping earlier, perhaps with a “conceptual data model” which is DBMS-independent (or even DBMS-architecture-independent), but we still
need someone to complete the task, and I would see that person as a data modeler.
A survey of data modelers[5] showed that the majority (over 80% in fact) saw the data modeler’s responsibility as spanning all of the conceptual schema design
tasks, except those primarily related to performance, such as de-normalization and the introduction of redundant data. On these latter tasks, which the respondents placed in sequence after the
non-performance-focused tasks, opinion was split roughly evenly as to whether they should be the responsibility of the data modeler or the DBA, with a few nominating a joint responsibility.
My position on this is clear. I don’t believe it is appropriate for a database technician to take responsibility for the completion of the conceptual schema design. Despite claims to the
contrary, it is no easy matter to isolate transformations which can be made to the conceptual schema without impact on its representation of business data and rules – and on the process modeler and
programmer’s work. Nor is it easy to restrict a database technician to making only such changes! Of course the DBA should be a participant in performance related decisions, and may well provide
many of the ideas. But the responsibility for making sure that the ultimate schema represents the best result for the business should remain with the modeler.
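As a small illustration of why such transformations are rarely free of business impact, here is a sketch in Python with SQLite (the tables and the redundant column are invented for the example): copying the customer’s name onto each order to save a join means the schema itself can no longer guarantee that the copies stay current, so that rule has to be policed somewhere else.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);

-- De-normalized for performance: customer_name is repeated on every order
-- so that common reports can avoid a join.
CREATE TABLE CustomerOrder (
    order_id      INTEGER PRIMARY KEY,
    customer_id   INTEGER NOT NULL REFERENCES Customer (customer_id),
    customer_name TEXT NOT NULL
);
""")

conn.execute("INSERT INTO Customer VALUES (1, 'Acme Pty Ltd')")
conn.execute("INSERT INTO CustomerOrder VALUES (100, 1, 'Acme Pty Ltd')")

# The customer is renamed, but nothing in the schema forces the copy to follow:
conn.execute("UPDATE Customer SET customer_name = 'Acme Ltd' WHERE customer_id = 1")

print(conn.execute("""
    SELECT c.customer_name, o.customer_name
    FROM Customer c JOIN CustomerOrder o ON o.customer_id = c.customer_id
""").fetchall())   # [('Acme Ltd', 'Acme Pty Ltd')] -- the two names now disagree
```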
Unfortunately, in many organizations, the data modeler walks away from the conceptual schema at a relatively early stage of design, leaving important decisions which require business data knowledge
and have business impact to the DBA. Programmers wait until the real designer has stepped forward (usually the DBA), and liaise with him or her to clarify data meaning and usage (54% of survey respondents nominated this task as a sole or joint responsibility of the DBA).
For some data modelers, this is a comfortable situation, as they avoid taking responsibility for the conceptual schema being workable. The result can be a vicious circle: the modelers realize that
their work will not be taken as definitive and accordingly are less concerned about producing implementable designs; the DBAs perceive the data modelers’ designs as inadequate and become
increasingly bold in making changes, reinforcing the data modelers’ perceptions…
And people begin asking – what do we need the data modelers for?
3. Use a Good Analogy
Once we get past a simple definition of modeling and its deliverables (and arguably even at that first stage), one of the most useful props for communicating with other stakeholders is a good
analogy with something more familiar. For most people, data modeling concepts served up raw make for a pretty indigestible dish.
Steve Hoberman emphasizes the value of good analogies and explores several in his book “The Data Modeler’s Workbench[6]”. I’ve found the analogy with
architecture particularly productive, even to the extent of encouraging me to re-examine some of my own assumptions about modeling based on what architects do. In particular:
The architect sits between the client / user and the builder, and the respective roles map quite well onto data modeling roles.
The architect is a respected professional with extensive expertise – even though some builders and clients believe they could get along without him or her.
Even if no-one with the formal title “architect” works on a project, there is still the task of developing a plan; no building was ever built without one, just as no database was ever built
without a data model, even if it only existed in someone’s head. (Of course, after the database or house has been built, the plan is relatively easily extracted: we can then all opine on whether
it was a good or bad one).
Architects work within town planning regulations (and indeed town plans) – so the analogy extends to working within an enterprise model.
I’ve found that we can take the analogy much, much further without being constrained by it: it’s perfectly reasonable to start an explanation with “unlike architects…” and still gain from the
grounding that the analogy provides.
4. Clarify the Process and the Terminology
My research has confirmed what most data modelers (and, unfortunately many of our non-modeling colleagues) have observed for themselves: data modelers are not in accord as to where data modeling
begins and ends in the database development life-cycle, what activities should be classed as data modeling, and who should perform them.
Nor is there a consensus as to the stages of data modeling and their boundaries.
If asked to nominate the principal phases in the overall database design process, from recognition of the broad need through to an implementable design, data modelers[7] will commonly nominate “Requirements Analysis, Conceptual Data Modeling, Logical Data Modeling, Physical Data Modeling or Physical Database Design” or some relatively minor
variant. So far, so good!
Unfortunately the agreement ends there. For example, survey respondents were roughly evenly divided as to whether Requirements Analysis and Physical Data Modeling were data modeling activities, and
whether they were the data modeler’s responsibility. Likewise there was no consensus as to what activities were performed and what decisions were taken in each phase. When is normalization done?
When are primary keys finalized? When is derivable data removed? In fact, activities were typically assigned to more than one phase – on average each activity was assigned to two different phases.
Perhaps some of this confusion arises because modelers’ real work and deliverables do not follow these textbook phases. Many CASE products do not provide for the documentation of distinct (but
cross-referenced) conceptual, logical and physical models, let alone a formal requirements model. In the real world, we often see iterative development of a single model, without clear lines
between the phases.
When theory and practice do not align, we should not automatically call for re-education of the practitioner. Practices arise for a reason; rules are broken or ignored for reasons – sometimes good
ones, but almost always relevant ones. If we look to the architecture analogy, we see the iterative development of a design, with progressive addition of detail and little maintenance of earlier
deliverables, as a legitimate way of working. So there is some theoretical support for this approach, as well as the practical argument that plenty of working databases have been built this way.
In writing Data Modeling Essentials, Graham Witt and I chose to present data modeling using a conventional four-stage approach, but, from my point of view at least, this was more to
provide some structure for the mass of activities and decisions than because I believed that it was the only way to organize a project. My thinking continues to develop on this one, informed as
much by “what works” as “what should work”.
The crucial point is to be clear and honest about describing what you do, rather than describing a textbook approach and then surreptitiously replacing it with an approach that works. Those around
you will understand what you are saying, and will see that your deeds match your words. And you will be better placed to reflect on and continue to improve what you do.
5. Don’t Dumb it Down
Defining an activity clearly and precisely is not the same as suggesting that it is easy to do. Unfortunately many descriptions of data modeling and its activities imply that it is a relatively
mechanical, descriptive exercise involving extracting information from users and representing it using a simple set of conventions. In that scenario, the data modeler’s principal skills are the
ability to conduct an interview and familiarity with the conventions. Data modelers and authors are their own worst enemies here: “learn the conventions and you can develop your own model” they
tell the user. Small wonder that the user cannot see what the data modeler is being paid for.
I have drawn much flak (and equally much support!) for my longstanding contention that data modeling is a design activity – that data modelers create a model in response to a set of
business requirements rather than merely documenting those requirements. I have learned that simple summaries of the argument will not persuade those who see data modeling as essentially
descriptive rather than prescriptive, so I will not attempt to justify the position here (see the TDAN.com article referenced earlier, or Data Modeling Essentials[8], for such an argument). But the position supports a view that data modeling is creative, not amenable to simple recipes, and, above all, difficult. Real data modelers, doing real data models, struggle – the answers do not pop up unbidden. The analogy with architecture is particularly pertinent here (and conversely, if you don’t believe that data modeling is design, then it is probably better to use a different metaphor[9]).
We don’t need to shout or complain about how difficult our job is, but we should be confident that what we are doing is a design discipline that takes time to learn and a long time to master.
Conveying that message to others starts with believing it ourselves.
6. Flaunt the Technical Expertise
The data modeler needs a substantial body of knowledge to make the journey from a hazy set of user requirements to a fully-defined conceptual schema design. Some of that knowledge relates to the
business (and business in general), some to generic principles for structuring data, and some to the facilities available within a particular DBMS or DBMS architecture. The data modeler needs to be
the acknowledged expert at least in the last two of these.
We work in a technical field, surrounded by people whose status is based on their knowledge of some technology and how to use it. In our case, that technology is the logical data structuring
facilities of the DBMS. We need to know these cold. Unfortunately, DBMS knowledge is often seen as the domain of the DBA, and many DBAs know more than their data modeling colleagues about what
structures and constraints can be supported. Small wonder they end up taking over the modeling, often through some loosely defined “handover” in which business knowledge and ideas are lost. If
you are a data modeler, and are not thoroughly familiar with your target DBMS (or current relational database structures in general), this should be your next training goal. Conversely, as an expert
in the field, you can expect to enjoy some respect from the “techies” with whom you work.
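As a rough illustration of the kind of facility at stake, here is a sketch in Python with SQLite (the Party table and its rule are invented): a modeler who knows what the target DBMS can enforce declaratively can make an informed case for which business rules belong in the schema rather than in program code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A hypothetical rule: every party is either a person or an organization,
# and only the attributes applicable to its type may be populated.
conn.execute("""
CREATE TABLE Party (
    party_id        INTEGER PRIMARY KEY,
    party_type      TEXT NOT NULL CHECK (party_type IN ('PERSON', 'ORGANIZATION')),
    family_name     TEXT,
    registration_no TEXT,
    CHECK ((party_type = 'PERSON'       AND registration_no IS NULL)
        OR (party_type = 'ORGANIZATION' AND family_name IS NULL))
)
""")

# The DBMS rejects rows that break the rule, with no program code required.
try:
    conn.execute("INSERT INTO Party VALUES (1, 'PERSON', 'Smith', '12345678')")
except sqlite3.IntegrityError as error:
    print("Rejected by the schema:", error)
```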
The need for us to be experts in logical data structures should be beyond debate; this is what our job’s about. We do, of course, need to convince others that such expertise is necessary to their
project. In a recent (public) debate, a participant questioned the need for specialist data modelers. I responded by presenting him with a scenario: you have three entities, each in a many-to-many
relationship with the other two. You mechanically resolve the relationships with three all-key tables. Someone suggests combining (joining) the three tables into a single table with a three-part
key. What are the pros and cons of the two options?
My fellow panelist admitted with good grace that he didn’t know the answer, and conceded that he would need to ask an expert. However, I fear he could have severely weakened my position by asking
the data modelers in the audience to provide that answer.
This is a standard structure and quite common – even more common in a simpler two-relationship form. If you don’t know the answer, it’s about time you hit the books again[10]. What about the practicalities of fourth and fifth normal forms? I ask this question of attendees at advanced data modeling classes and am lucky if anyone claims to
know. Again, this stuff matters, and knowing it puts you in a position to provide authoritative answers to common issues[11].
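For readers who want to check their own answer, here is a sketch in plain Python (Supplier, Part and Project are invented stand-ins; the debate named no specific entities). The combined table does mean one structure instead of three, and fewer joins for queries that need all three facts together, but it records a joint fact: it can substitute for the three separate tables only when the business guarantees that the pairwise facts always combine (the join dependency that fifth normal form is concerned with). Otherwise the combined form either loses independent facts or, if populated mechanically from the separate relationships, manufactures spurious ones.

```python
# Three independent many-to-many relationships, resolved as three all-key tables.
# Supplier/Part/Project and the sample facts are invented for illustration.
supplies      = {("S1", "P1"), ("S1", "P2")}    # Supplier supplies Part
used_on       = {("P1", "J1"), ("P2", "J2")}    # Part is used on Project
contracted_to = {("S1", "J1"), ("S1", "J2")}    # Supplier is contracted to Project

def combine(supplies, used_on, contracted_to):
    """Option 2: collapse the three tables into one table with a three-part key."""
    return {(s, p, j)
            for (s, p) in supplies
            for (p2, j) in used_on if p2 == p
            for (s2, j2) in contracted_to if s2 == s and j2 == j}

combined = combine(supplies, used_on, contracted_to)
print(combined)   # two joint facts: ('S1', 'P1', 'J1') and ('S1', 'P2', 'J2')

# Projecting the single table back onto pairs recovers the originals only if the
# join dependency assumed by the single-table design actually holds.
assert {(s, p) for (s, p, j) in combined} == supplies

# An independent pairwise fact with no matching third party simply vanishes:
supplies.add(("S2", "P9"))   # S2 supplies P9, but to no project yet
combined = combine(supplies, used_on, contracted_to)
print({(s, p) for (s, p, j) in combined})   # ('S2', 'P9') has disappeared
```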
I don’t belong to the school that berates modelers for failing to understand the mathematical underpinnings of the relational model, simply because I don’t see that as a major source of modeling
errors. You can be a very good modeler without understanding relational calculus – and probably not much better if you do understand it. We can do without intellectual snobbery. But there is a
wealth of knowledge which we do need to have, and it is the core of what we do. We need to make time away from the day-to-day politics of modeling to ensure we keep that body of knowledge up to
date.
It’s one thing to complain about being unloved because we’re misunderstood; another to deserve it because we don’t know our craft.
7. Stick Around
My former colleague Daniel Moody identified, in the manner of Stephen Covey’s 7 Habits of Highly Effective People, seven habits for data modelers[12].
His seventh habit was follow the job through to completion. This contrasts with Ernest Hemingway’s advice on writing screenplays: when you’re finished, throw it over the California state
line and get the hell out of there.
Far too often, the data modeler, whether by choice or not, ends up “doing a Hemingway”. Frequently we see data modelers moving on to new work before program designers have gained a proper
understanding of the data structures and how they are intended to be used. The task of interpretation then typically falls to (who else but) the DBA.
Time and again, I have seen good data models undermined by incorrect interpretation of their intention: “Yeah, I know it’s called ‘Party’, but it means ‘Customer’, the ‘Party Type’ column
isn’t used for anything, so use it for Customer Rating”.
The job of data modeling is not finished until the model is effectively communicated to all who need to use it. Just because it is now manifest in a physical database, hopefully well documented,
does not mean that the knowledge of what it means and how it works has been magically transferred to its new custodians.
And of course, we know well that blame for problems is typically assigned to those who are no longer around to defend themselves. So stick around, communicate the model, make sure changes are
handled properly, and become part of the team that delivers the final product.
[1] Scott Ambler, Data Modeling Panel, International DAMA / Metadata Conference, Los Angeles, May 2004. The exact quotation is from correspondence associated
with the presentation.
[2] The research, focusing on whether data modeling is better characterized as an analysis or design activity, but also looking broadly at backgrounds,
attitudes and approaches of experienced practitioners and thought leaders, is due to conclude in 2005. I will be progressively publishing summaries of research results, with a practitioner audience
in mind, on www.simsion.com.au in late 2004 and 2005.
[3] The word “design” reveals a personal position which not all may share, and which is covered in more detail in an earlier (1st Quarter, 2005) TDAN.com article. “Specify” is a more neutral alternative.
[4] I’m aware that not everyone is familiar with this term, which at least has the advantage of a reasonably tight formal definition. If you’re one of those people, please refer to Section 6.
[5] The survey of 55 attendees of advanced data modeling seminars, approximately half from Europe and half from North America, was carried out as part of a
research project for the University of Melbourne, still in progress.
[6] John Wiley, 2001
[7] Results are from the same survey referenced in Footnote 5.
[8] Simsion and Witt, Morgan Kaufmann, 2005.
[9] Accounting, for example, seeks to provide an objective description of a situation. “Data Accountant” does not have quite the ring of Data Architect or
Information Engineer, but perhaps this is only a question of familiarity. As an aside, an accountant might well consider the development of a chart of accounts for an organization as an exercise
in design (and it is, of course, a form of data modeling…)
[10] It’s covered in Chapter 13 of Data Modeling Essentials, and in many other texts, although the clarity of explanations varies considerably.
[11] See previous reference!
[12] Published in various places, including Guidelines for Data Resource Management, 4th Edition, Data Management Association.