This article presents an excerpt from Chapter 6 of my new book, Data Modeling Theory and Practice (Technics Publications). The book as a whole looks at the nature of the data modeling
task, through review of the literature, interviews, surveys, and “laboratory” experiments. This chapter, titled What the Thought Leaders Think, reports the results of interviews
with seventeen influential people in the data management field, most of them specialists in data modeling, who generously responded to my request to videotape them for my research. The interviews
were conducted in 2002 and, therefore, reflect my understanding of the interviewees’ opinions and positions at that time.
In this excerpt, we skip the introduction of the interviewees (most of whose names will be known to TDAN.com readers) and the discussion of the protocol and cut directly to a central question about
data modeling – is it a descriptive or a creative (design [1]) process? We then look at relevant aspects of the data modeling
environment and data modeling problems. In a later excerpt, we will cover three further dimensions: process, products and people.
Description vs. Design – Explicit Positions
I’m sure I could find people who fit both those ends of that spectrum very well and are adamant that they’re right about it.
– Karen Lopez
Opinions on the description / design question, presented directly, covered the full spectrum. Table 1 summarizes my overall classification of the interviewees’ positions, based both on their
direct responses and the wider-ranging discussions that followed.
The assessment shown in Table 1 is included primarily to illustrate that each characterization had support from several interviewees and that the diversity of views reported in this chapter was not
a product of including one or two “outlier” individuals. The “position depends on language” row in the table reflects the view of Harry Ellis, who argued that data modeling with
entity-relationship-based approaches should be strongly characterized as design, but that his current work using the CBML language [2] is properly characterized as highly descriptive. As such, his comments provided qualified support for both positions.
Interviewees clearly understood this question and their views were largely articulate and unambiguous.
It bears re-emphasizing that the context of the question was development of a new database rather than (for example) enterprise modeling, creating reference data for metadata repository mappings,
data warehouse design [3] or reverse engineering. Several interviewees were concerned that their views be reported only in
that context.
Proponents of the descriptive characterization as well as those of the design characterization generally contended that the resulting model could be translated relatively mechanically into a
conceptual schema (logical database design was the common term), or at least a default schema prior to performance tuning. They did not see a descriptive model as merely an input to a design stage.
“There is no translation: we model the tables,” said one advocate of the descriptive view.
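As a rough illustration of what a “relatively mechanical” translation might look like, the minimal sketch below derives a default table definition from an entity description using fixed rules only. The entity, the attribute names, the type mapping and the function name are all hypothetical, chosen for illustration; they are not drawn from any interviewee’s method or tool.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from model-level types to SQL types (an assumption for illustration).
SQL_TYPES = {"identifier": "INTEGER", "text": "VARCHAR(255)", "date": "DATE"}


@dataclass
class Attribute:
    name: str
    kind: str              # one of the keys of SQL_TYPES
    required: bool = True


@dataclass
class Entity:
    name: str
    key: str               # name of the identifying attribute
    attributes: list = field(default_factory=list)


def to_default_schema(entity):
    """Derive a default CREATE TABLE statement from an entity by fixed rules only."""
    columns = []
    for attr in entity.attributes:
        null_rule = "NOT NULL" if attr.required else "NULL"
        columns.append(f"  {attr.name} {SQL_TYPES[attr.kind]} {null_rule}")
    columns.append(f"  PRIMARY KEY ({entity.key})")
    return f"CREATE TABLE {entity.name} (\n" + ",\n".join(columns) + "\n);"


# A hypothetical "customer" entity as it might appear in a model.
customer = Entity("customer", "customer_id", [
    Attribute("customer_id", "identifier"),
    Attribute("name", "text"),
    Attribute("date_of_birth", "date", required=False),
])
ddl = to_default_schema(customer)
print(ddl)
```

No choice is exercised inside the function: each column follows from a rule applied to the model. That is the sense in which the model itself, rather than a later design stage, determines the default schema, with performance tuning left for afterwards.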
Key themes in the descriptive characterization were:
- The view that the system should mirror the real world: “Our goal is to minimize the gap between the real world and the representation of the real world.”
- A focus on the business rather than the database: “data modeling is an attempt to describe the structure of the organization as exhibited by the structure of its data – that is, the entities are things of significance to the business.”
- The use of the term requirements (in contrast to solution) to characterize the data model: “(A) data model is one method of doing a specific articulation of a user requirement; I’ve always considered data models as descriptive of certain types of requirements.”
- Discovery rather than mere documentation. The business may not understand its own data or may need help to describe it properly: “there is a set of information requirements that may or may not be known by knowledge workers or management”; “the business may have bad terminology or missing entities.”
- A requirement for a high level of skill and even creativity: “It is creative, but it’s creative in the scientific sense of discovering things, of figuring things out…”
The concept of a single objective reality was often at least implicit. One interviewee discussed a situation in which he had, on reflection, changed the way that the product concept was represented
in a data model. Was this a case of coming up with a better design? “No,” he said, “we finally figured out the product for this particular organization.” Finally? “I have no claims of
infallibility. It’s perfectly reasonable that someone would be smarter … and I’d look at him and say ‘yeah you’re right…’”
The practical impact of the descriptive position is illustrated by one interviewee’s account of his testimony as an expert witness in an intellectual property case. He argued that “the data model
is a description of the problem and therefore by definition one data model will look pretty similar to others. It’s not a patentable or copyrightable thing… My model is my best description of my
understanding of the nature of things and therefore I can’t patent that because it’s reality.”
Proponents of the design position offered less elaboration at this point. Three key themes that they raised – negotiability of requirements, diversity of product and the role of creativity – are
discussed later.
The “design” group saw data models as solutions. John Zachman used the Zachman Framework (Zachman 1987; Sowa and Zachman 1992) to distinguish the descriptive “business owner’s view” (Row 2)
from the logical data model of Row 3. In this formulation, the logical data model [4] is a solution to the business
owner’s well-articulated problem: “Once you’ve defined what the things are you’re trying to manage… someone has to invent the filing system… ” John Zachman was among several of the design
proponents who drew an analogy with architecture.
Two interviewees’ responses highlighted the role of the modeling language. Terry Halpin, noting that he was “biased” from using ORM [5], saw data modeling as both description and design. On the one hand, ORM supports a descriptive approach to requirements – “In ORM you verbalize the data requirements and
that verbalization itself is the model – essentially” – on the other, there are numerous opportunities to modify the default conceptual schema that results. Harry Ellis echoed some of the earlier
proponents of the ORM method when he argued that “if the language was adequately rich, what the domain expert is saying could be precisely and accurately written down … in such a way that the
technical applications would be absolutely definitive.”
Environment
Sometimes “it’s an art” is used as an excuse for not following generally accepted practices or internal standards.
– Karen Lopez
Describing only in terms of data misses the big picture.
– Ron Ross
Two themes fell under the heading of Environment – the context in which data modeling takes place. The first was the impact of an enterprise model on data modeling at the application level. Here
the choice and creativity associated with the design characterization were seen as impediments to the consistency needed to support data integration:
“… becomes problematic – and generally it isn’t.”
By establishing standards for data representation, an enterprise data model or architecture should (and would) render data modeling a more descriptive process. An enterprise model could become a
surrogate for the business as an “absolute statement of truth.” As one interviewee put it, if you still insist on being a soloist, you’ll be asked to leave the orchestra. Enterprise models were
also seen as valuable for encouraging a broader view at the application level: “if you see only one line of business your model is going to be very different than when you’re looking at it across
the entire enterprise.” All but one of the interviewees who raised the role of an enterprise model as a vehicle for enforcing conformity were speaking from the position of enforcer rather than
conformer, and the scenario was seen more as a goal than a current reality.
The second Environment theme was the need to see data modeling as only one technique amongst many, particularly in the context of understanding or negotiating business requirements. Several
interviewees pointed out the danger of over-reliance on data models as a means of understanding the business and its requirements. Other techniques nominated included process and workflow modeling,
Critical Success Factor and Key Performance Indicator analysis, Use Cases, and business objectives. These were seen as adjuncts or (often) precursors to data modeling. In the context of the
description / design question, they suggested a separate “requirements elicitation” stage rather than direct, descriptive mapping of business concepts onto a model.
Problem
The negotiability of requirements
Data modeling is all about helping a business come up with a better way of doing business.
– Alec Sharp
Data modelers should not resolve business problems.
– Michael Brackett
If the description position was more comprehensively argued when the description / design question was presented directly, the balance was restored in the discussion of the negotiability of
requirements. Consistently, the proponents of the design position argued not only that business requirements were negotiable, but that data modelers should be active in exposing new ways of doing
business. The preferred method was to inform business stakeholders of the (negative) consequences of existing perspectives (to “expose the business to itself”), using the data models to
facilitate discussion. Business rules may reflect the limitations of past technologies or systems, and need to be challenged rather than blindly accepted. Peter Aiken stated that “if the users
aren’t by a third to a half way through the session jumping up to the board and saying ‘this is wrong’ then … I don’t consider the modeling session a success.” Alec Sharp expressed the view
even more strongly: “it’s criminal not to do something to help people see the consequences of having chosen a particular reality.”
Implicit (or indeed explicit in some cases) is the view that the business does not know what it wants – or at least what is best for it. Modelers were seen as being able to make suggestions (“Have
you thought about doing it this way?”) and to bring in their own general or industry-specific business knowledge to provide new perspectives. One interviewee cited a case in which the business had
specified some 500 attributes to be included in a database for reporting; after the data modeler reviewed how they would be used in practice, the number was reduced to 150.
More broadly, Alec Sharp talked of the “myth of requirements,” arguing that the view that the business knows them already and that the job of the analyst is to extract them is “patently false”
and a legacy of the early days of computerized systems. Data modelers who buy into the myth “may end up with a better data model but not with a better business.” Architecture was invoked as a
very close analogy (“don’t tell me how to do it, tell me what you need to do”) including recognition of the right of the client to say “thanks for your idea but no thanks.”
Some of the aims and claims for business change were ambitious, even grandiose: “a skilled data modeler might help a business transform itself”; “the client said ‘this has been a
revelation’”; “the real benefit of data modeling … is in synthesis of new ways of looking at things”; “I can help companies see themselves differently”; “help them define where they want
their business to go – and model that”; “(changing) the management practices of the organization itself.” Alec Sharp, addressing the apparent “scope creep” in the definition of data modeling,
commented: “I’m perfectly happy to do this from my role as data modeler, because I don’t choose to limit that role…”
Richard Barker offered an opposing view: “I used to play around in that area but until I became a main board director of a company and learned the essence of running a business, I didn’t really
understand. That was a massive change.” Interviewees from the description camp also supported the primacy of the business in determining its data model: “What we’re modeling is what the domain
expert says is right. You have to presume that the domain expert knows exactly the way the business is or wants to be – every little bit of it. The modeler is only articulating that…”
On the subject of challenging business requirements, one interviewee simply stated, “I never have that conversation.” Another said that the business’s view of its data should be questioned only
in rare cases. And the final decision definitely lay with the business: “when there is a discrepancy it is the business which answers the discrepancy, not the data modeler.” This extended to
changing data names in the interests of precision: “Right up front you put your own spin on the business rather than letting the business have its say.”
[1] The term design is used here in its plain English sense rather than as a stage in the applications development lifecycle.
[2] Corporate Business Modelling Language (Department of Defence (UK) 2005; Ellis and Nell 2005).
[3] The issue with data warehouse design was not the use of different modeling languages (viz. star schemas) but the constraining effect of accommodating
existing (legacy) data structures.
[4] Most interviewees used the term “logical data model” to denote what in academic work would generally be called a conceptual data model.
[5] Object Role Modeling (Halpin 2001).