Thought Leaders on Data Modeling, Part 2

This article presents the second excerpt from Chapter 6 of my new book Data Modeling Theory and Practice (Technics Publications, 2007). The first excerpt appeared in the April 2007 issue of The Data Administration Newsletter. The book as a whole looks at the nature of the data modeling
task through review of the literature, interviews, surveys, and “laboratory” experiments. This chapter is titled What the Thought Leaders Think and reports the results of
interviews with seventeen influential people in the data management field, most of them specialists in data modeling, who generously responded to my request to videotape them for my research. The
interviews were conducted in 2002 and, therefore, reflect my understanding of the interviewees’ opinions and positions at that time.

In this excerpt, we continue the discussion of a central question about data modeling: Is it a descriptive or a creative (design[1]) process? In the first excerpt, we looked at the data
modeling environment and data modeling problems. Here we cover two further dimensions: process and product, including the “one right answer” issue.

Process

We should de-skill the modeling, we should automate the design, and we should become full-time analysts and concentrate on understanding what the business needs. – Harry Ellis

Most interviewees talked in terms of a core three-stage process of requirements identification/negotiation, logical data modeling, and physical database design, and indicated that as data modelers
they performed or were involved in at least the first two stages. The term conceptual data modeling was used by some as a synonym for logical data modeling and by others to reflect a
preliminary “high level” stage.

I initiated discussion on the transition from data model to conceptual schema and on the role of creativity. No other common themes emerged, but one comment is worth noting as relating to one of
Lawson’s Properties of design:

“You reach the point of diminishing returns – it will never be perfect.” (Lawson’s Property: The design process is endless).

From Data Model to Conceptual Schema

You can push a button and from that generate your E-R model, logical models, your DDL. – Terry Halpin

The transition from data model to pre-performance-tuning conceptual schema was generally presented as mechanical: (the database design is) “automatic if you have a good model of the
business.”
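To make the “mechanical” claim concrete, here is a minimal sketch of what such a translation could look like. It is an illustration only: the Entity and Attribute classes, the to_ddl function, and the customer example are hypothetical names introduced here, not the output or algorithm of any tool the interviewees mentioned, and the one-entity-one-table rule is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    sql_type: str
    nullable: bool = True


@dataclass
class Entity:
    name: str
    key: str  # name of the single-column primary key (an assumed simplification)
    attributes: list[Attribute] = field(default_factory=list)


def to_ddl(entity: Entity) -> str:
    """Render one entity as a CREATE TABLE statement (hypothetical rule set)."""
    lines = []
    for attr in entity.attributes:
        null_clause = "" if attr.nullable else " NOT NULL"
        lines.append(f"    {attr.name} {attr.sql_type}{null_clause}")
    lines.append(f"    PRIMARY KEY ({entity.key})")
    body = ",\n".join(lines)
    return f"CREATE TABLE {entity.name} (\n{body}\n);"


# A made-up entity, used only to exercise the translation.
customer = Entity(
    name="customer",
    key="customer_id",
    attributes=[
        Attribute("customer_id", "INTEGER", nullable=False),
        Attribute("customer_name", "VARCHAR(100)", nullable=False),
        Attribute("credit_limit", "DECIMAL(10,2)"),
    ],
)

ddl = to_ddl(customer)
print(ddl)
```

Nothing in the function requires judgment about the business. Under this view, the decisions that matter have already been made by the time the entity is written down.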

Harry Ellis offered a perspective on the ready “translatability” of data models that is consistent with observations from some teachers of data modeling (de Carteret and Vidgen 1995 p.
xi; Simsion and Witt 2005 p. 33) and with Atkins’ (1996) observation that if the target is a relational database, the most useful categories in a conceptual model will be seen to be normalized,
candidate relations:

“I was actually visualizing a working data structure. People ask ‘What is an entity?’ I came to realize that … the bottom line was an entity is something you could
have a table for. A bit of a con; deep down in my subconscious I was creating an effective data structure, although openly, publicly, I was actually modeling the business. I wasn’t really.
I thought I was, and everybody else thought I was, and it worked. People who had no notion of what a good data structure was couldn’t do it – they would use the notation beautifully,
but it was completely useless – they’d have an entity for something you couldn’t have a table for.”

Tuning for performance was seen as involving important choices and even an “artistic component” (this comment from an interviewee who was a firm supporter of the descriptive position on
modeling).

Creativity

It’s a creative activity when I do the model, a documentation activity when I write the definitions (laughs). – Len Silverston

Presentation has much more creative content than coming up with the categories. – David Hay

Views on creativity were in line with positions on the description/design question. Those who supported the design characterization assigned it a central role. Several recounted personal
“eureka” moments from their data modeling experience: “Sometimes you can just get an insight and say ‘Hey, gee, if I do it this way, it’s going to work
better.’” Some supporters of the description characterization recognized a place for creativity, but in peripheral areas such as the layout and presentation of the model or the approach
to understanding the business: “There’s nothing creative about it from the standpoint of ‘I make it up’; it has to be a discovery of what the business must know in order to
operate effectively.”

Adjectives were used liberally by some in the design school: “Managed properly, it is a highly creative activity – or should be”; “Anything we do when we are
developing information systems is an extremely creative process”; “I prided myself on my … great creativity.”

Karen Lopez voiced a concern that creativity should not be interpreted as license: “I’d say it’s a creative endeavor but constrained by what I believe are standards of practice
about what’s right, what’s appropriate, what’s practical…there is creativity but I don’t believe that data modeling is an art.”

Some saw creativity as intrinsic to “creating” the objects in the model: “The people you interview don’t dictate the entities you have; perhaps 10-15% of entities are
obvious and everyone agrees with them, but (beyond that) the actual choice of entities requires a lot of imagination and creativity.” Creativity was not seen as being the sole preserve of the
professional modeler. Ron Ross observed that bringing people together with different points of view to obtain a consensus often results in new ideas that might not have been anticipated at the
outset and Alec Sharp prefers to “set up situations where the business people can be the ones who are creative.” Conversely “IT people” (a group that most of the
interviewees clearly did not see themselves as belonging to) were seen as impediments not only to creativity but to data modeling itself.

Product

Three key themes emerged in the product dimension: product diversity, the role of patterns, and product quality. Product diversity and quality drew a range of conflicting views, broadly
– but not entirely – in alignment with positions on the description/design question. In contrast, there was general agreement on the value of patterns[2], if not their exact
role. Some interviewees also commented on data model quality. We look at each of these themes in turn.

Diversity of Product – One Right Answer?

If we are both experts, we should come up with the same solution. – Eskil Swende

Every little bit of the (CBML) model down to the most minute detail is a precise expression of something that is either right or wrong. And that’s not a matter of choice. –
Harry Ellis

Given the same set of business rules, two very very good modelers will come up with completely different models. – Terry Moriarty

Why should all houses be the same, why should all cars be the same, why are there competitive businesses around? – Richard Barker

The “one right answer” question – Will different (competent) data modelers, faced with the same set of business requirements, produce different data models? – generated
strongly conflicting responses, but not exactly in line with positions on the description/design question. Some interviewees who believed that requirements were negotiable were less sure that
the models would vary once requirements were settled.

Several interviewees in fact contended that there should, and would, be a single right model, allowing for variation only in notation and (perhaps) the naming of objects: “They should come up
with the same answer which is the correct description of the business; we may have different notations but the underlying names and definitions should be identical; the relationships should be
identical” or as another put it “not the same words but the same number of entities”. The situation was seen as different from process modeling because (a) data was intrinsically
stable and/or (b) data modeling was mathematically based.

There should be, said one interviewee, borrowing a common statement about data and applying it to data structure, “only one version of the truth about the data that the business needs.”
Asked “Are you chasing truth rather than coming up with something?”, David Hay responded “Absolutely, these are inherent categories: I’m very platonic in that regard.”

The primacy of the business view of data was a common theme here. “I try to start hearing the categories they have in their heads” said one practicing modeler. Differences could and
should be resolved by “going back to the business.” These included differences in naming and in levels of abstraction.

A second group contended that different modelers would produce different models, but attributed the differences to perception. “It’s all in the eye of the beholder” said Bob
Seiner. Richard Barker, citing his own definition of an entity as “something of business significance,” said “We will see different significances.”

Those who saw differences in models as a natural consequence of the design paradigm spoke in terms of utility rather than truth: “Models are more or less useful – there is no absolute
right or wrong.” Ron Ross used an example of a model in the natural sciences: “Benzene is modeled as having a circular structure and that is a useful description whether or not that
circle truly exists in nature… let’s assume it and move on. But if we assume it and it’s not the best description, we’re going to have some limitations downstream.”

Two interviewees with strong process modeling experience offered comparisons. Karen Lopez saw more variation in abstracted process models, but observed that they came together at the primitive
level. Conversely (workable) data models “start out being wildly different and stay wildly different.”

A few interviewees proffered the basic theoretical position of choice in classification (“when you’re trying to conquer and divide the world conceptually… I don’t think
there can be a single correct answer”), but most who argued against the “one right answer” position drew primarily on personal experience. Instructors noted that students produced
different workable models in response to case study scenarios. The difficulties of integrating data within and across organizations (the case of mergers was cited) were also seen as evidence that
different workable models could be implemented for the same data.

The trade-off between level of generalization and enforcement of business rules was a central theme amongst those who believed in choice, and was seen as an area for expert decision making. It was
also seen as a key source of difference as different modelers made different decisions. It was apparent that there were disagreements between groups that might be described as the
literalists (concepts should be modeled as per common business use), the moderate abstractors (introducing some generalizations within traditional applications), and the rule
removers (deliberately removing business rules for representation elsewhere).

Ron Ross, as an advocate of the business rules approach, clearly belonged to the last group, and argued that stability was not innate, but required deliberate design:

“I had to unlearn a lot of data modeling practice and experience… when it comes to using a full rule-based approach in addition to a data approach, because the central
opportunity you have is that you can generalize to an extent that is reasonable and productive (emphasizing those words, because you don’t want to generalize beyond that), and then let
rules handle current business practices within that more generalized database structure. That gives you a much higher degree of flexibility than has been possible in the past because rules,
however implemented, are going to be far easier to change than the underlying data or knowledge structure.”
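The trade-off described in this section can be sketched in a few lines. This is an illustration under stated assumptions, not Ross’s method: the order and salesperson scenario, the class names, and both rules are hypothetical and do not come from the interviews. The literal structure builds today’s practice into the shape of the data, while the generalized structure leaves that practice to a rule that can be replaced.

```python
from dataclasses import dataclass
from typing import Callable


# Literal structure: the current practice ("an order has exactly one
# salesperson") is fixed by the shape of the data itself.
@dataclass
class LiteralOrder:
    order_id: int
    salesperson: str


# Generalized structure: any party can play any role on an order.
@dataclass
class PartyRole:
    party: str
    role: str


@dataclass
class GeneralOrder:
    order_id: int
    roles: list[PartyRole]


# Current practice is expressed as a rule held outside the structure.
Rule = Callable[[GeneralOrder], bool]


def exactly_one_salesperson(order: GeneralOrder) -> bool:
    return sum(1 for r in order.roles if r.role == "salesperson") == 1


def at_most_two_salespeople(order: GeneralOrder) -> bool:
    return sum(1 for r in order.roles if r.role == "salesperson") <= 2


def violations(order: GeneralOrder, rules: list[Rule]) -> list[str]:
    """Return the names of the rules that the order fails."""
    return [rule.__name__ for rule in rules if not rule(order)]


shared_order = GeneralOrder(
    order_id=1,
    roles=[PartyRole("Ann", "salesperson"), PartyRole("Raj", "salesperson")],
)

# Under today's rule the shared order is rejected...
before = violations(shared_order, [exactly_one_salesperson])
# ...and a change of policy swaps the rule, leaving the structure untouched.
after = violations(shared_order, [at_most_two_salespeople])
print(before, after)
```

In the literal version, the same policy change would mean altering LiteralOrder and every record that uses it. The cost of the generalized version is that the structure alone no longer prevents an invalid order, which is the point on which the literalists and the rule removers part company.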

One interviewee attributed her view to learning from a prominent authority who believed that there was “an absolute statement of truth” and then finding her experiences inconsistent
with that belief. Others simply discovered alternative models that they could not refute: “I can think of lots of cases where I’ve built models and someone else has come up with a
different model and I look at it and say ‘Hey, that would work.’” Len Silverston described the result of having another expert review his models:

“A lot of lunches he’d say ‘Len, I feel so bad because you’ve spent all this time, done all this research … and now I’m giving you a completely different
idea.’”

Some modelers found alternative and better solutions themselves, without feeling that they could discard the earlier solutions as wrong: “Every time I look back, I think, I wouldn’t do
it like that again.” Nevertheless, few modelers said that they deliberately generated alternatives.

The possibility of different models arising from subtly different names or definitions of data model objects was raised by two interviewees. Terry Moriarty said, “When you allow your data
analyst [a different person from the data modeler] to write definitions for the terms that are being used in the data model, and use those definitions as a way of validation, you know that
there’s more than one way of doing the same thing.” Len Silverston provided a detailed example of changing the name of Order Line to Order Item, prompting a change to
the way order adjustments were represented: “The name change showed me a lot of semantics.”

Finally, Steve Hoberman offered some experience beyond the personal. Through his website and e-mail list, he convenes a series of “data modeling challenges,” each presenting a
problem to which modelers are invited to submit solutions. He observed that he receives a variety of diverse solutions to each problem. Some are unsound or unworkable, a large group comprises
variants of the solution that he envisaged in setting the problem, and a few are “out of left field” innovative solutions. He noted that there were some modelers who regularly
contributed such innovative solutions.

Patterns

If you look at my model, Len Silverston’s, David Hay’s, the IAA IBM insurance industry model – all of these are very good workable solutions to the business problem –
we’ve just had different biases…I just think mine’s better. (laughs) – Terry Moriarty

Interviewees from both the descriptive and design camps considered patterns as important input to data modeling. Predictably, the former group saw patterns as representing general
“truth” (“there’s a certain basic set of generic entity classes”).

David Hay, author of two books of data modeling patterns (Hay 1996a; 2006), said:

“If you’re going after things that are of fundamental importance to the business, you’ll come up with things that are common across all businesses – people, organizations,
products, contracts are pretty standard in the world of commerce. If you use these as the basis of your organizing, you’ll come up with a model which is concrete enough that people will
recognize and understand it, but robust enough…”

He added that sometimes his clients asked that he start with a “blank slate” rather than use patterns, but “lo and behold – it ends up being close to the pattern after
all.”

Terry Moriarty agreed that “there are things that businesses have to do and have to know just to be in business”, but argued that the patterns were design products, departing from the
familiar architect metaphor to liken the pattern developer to a musical composer. She also pointed out that commonality across business did not equate to obviousness: “Some of these things
are very obscure.”

Len Silverston, also the author of two books of patterns (Silverston, Inmon et al. 1997; Silverston 2001), described his “universal models” as tools to support exploration
(“patterns and alternatives that you might well not have thought of”), and described deliberately generating multiple models: “In my course, I give five different variations
– any one could be used in any situation and I point out the pros and cons.”

Product Quality

Larry English nominated three critical characteristics of a quality data model: (1) stability (no destructive change when adding new applications); (2) flexibility (databases able
to adapt with minimal destructive change in the face of business change); and (3) reuse (the models and resulting databases are reused with only additive change as opposed to destructive
change). A few examples of poor models (“groaners”) were cited: “Person was a subtype of Agreement.”
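The distinction between additive and destructive change that runs through these three characteristics can be made operational in a simple way. The sketch below rests on an assumption that is mine rather than English’s: a change counts as additive if every existing table and column survives with its type unchanged, and destructive otherwise. The Schema alias, the classify_change function, and the sample tables are hypothetical.

```python
# Table name -> {column name: column type}; a deliberately crude schema model.
Schema = dict[str, dict[str, str]]


def classify_change(old: Schema, new: Schema) -> str:
    """Classify a schema change under the assumed definition above."""
    for table, columns in old.items():
        if table not in new:
            return "destructive"  # an existing table was dropped
        for column, col_type in columns.items():
            if new[table].get(column) != col_type:
                return "destructive"  # a column was dropped or retyped
    return "additive" if new != old else "unchanged"


old_schema = {"customer": {"id": "INTEGER", "name": "VARCHAR(100)"}}

# Adding a column and a table leaves everything that already existed intact.
added = {
    "customer": {"id": "INTEGER", "name": "VARCHAR(100)", "email": "VARCHAR(200)"},
    "contract": {"id": "INTEGER"},
}

# Changing the type of an existing column breaks what already depends on it.
retyped = {"customer": {"id": "INTEGER", "name": "TEXT"}}

print(classify_change(old_schema, added))
print(classify_change(old_schema, retyped))
```

A check like this covers only the structural side of the criteria. Whether a model will in fact absorb new applications and business change additively is the kind of assessment that the interviewees describe as subjective.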

Several interviewees criticized the quality of models that they encountered in industry and the quality of work done by some professional modelers. But they generally acknowledged that assessment
was subjective. Len Silverston said: “Recently I asked a very confident data modeling consultant about choosing between two and he said ‘I’d lean towards the way you did it
recently by about a 55% margin.’” Describing why he preferred one model over another, he said, “I thought that there was a sense of truth to the latter that just felt
better.” Those from the descriptive school were more likely to see the assessment decisions as simpler: “With experience these are ‘binary’ choices – one or other
configuration – nothing terribly complicated. I don’t find it challenging.”

References:

[1] The term design is used here in its plain English sense rather than as a stage in the applications development life cycle.
[2] Use of patterns might seem better placed under process; in Lawson’s framework they are discussed under the Product property “design solutions are a contribution to knowledge.”
[3] www.stevehoberman.com


Graeme Simsion
