For an industry that is supposed to be helping the world at large get its use of language sorted out, the data modeling industry has been example number one of how not to do it. All you have to do is to start referring to “logical” data models, and you will get into an argument. Yours truly has certainly made his contribution to the controversy.
Whenever I have gone into a company, I have found that there is invariably one term that describes the single most important concept in the company. As it happens, that is the word for which it is impossible to find a single definition. For example, for the Alberta Ministry of Transportation, the word was “road.” They are in the business of building and maintaining roads. The problem is that a road is either a line, describing a path from one place (like the airport) to another (like the hotel); an area, if they are planning rights of way; or a solid, if they are engineering its physical construction.
So, the word “road” could not show up anywhere in my data model for them.
For us in the data modeling world, there are three words in this category: “conceptual model,” “logical model,” and “physical model.”
Danette McGilvray and I laid out a table of the kinds of models that exist and the way they are described by various players in the industry.5 This is reproduced here as Table 1.
The table shows four kinds of “data” models, and three sets of names associated with each. The three “approaches” are:
- Terms as used by Graeme Simsion and Graham Witt,1 and David Hay2
- Terms as used by Steve Hoberman3
- Terms as used by Donald Chapin4
Level | Description | First Approach Term | Second Approach Term | Third Approach Term |
1 | This model is a sketch that includes primary entities. There are many-to-many relationships and virtually no attributes. | Context | Conceptual or subject area | Environment |
2 | This model describes the semantics of the business. The semantic data model consists of boxes representing sets of things of significance to the business (“entity classes”), such as “Person” or “Activity,” and lines to describe relationships between pairs of such entity classes. This contains a more complete list of entity classes (often in a more abstract form). Nearly all attributes are shown, and many-to-many relationships may be resolved. This model contains basic and critical concepts for a given scope and is used to communicate between a data modeler and business stakeholders. It specifies data that might be held in a database, independent of the technology and the actual physical implementation that might be used. It includes both diagrams and supporting documentation. | Conceptual | Logical | Class of Platform (Technology) Independent Model |
3 | This model arranges data in terms to be used by a particular data management technology, to accommodate technical constraints and expected usage. Examples might be in terms of relational tables and columns, object-oriented classes, or XML tags. Structures from model level 2 can be implemented using database management systems, object-oriented programs, or XML schemas. But they are independent of specific database software (Oracle, DB2, etc.) or reporting tools. | Logical | Physical | Class of Platform (Technology) Specific Model |
4 | This model organizes the data on one or more physical media. It is concerned with physical table spaces, disk drives, partitions, and so forth. This includes changes made to logical structures to achieve performance goals. It is embedded in a particular vendor’s database approach. | Physical | (none) | Vendor Platform Specific Model |
Essentially, the differences seem to be between (1) “overview models,” (2) “models of the business,” (3) “models of a DBMS-specific data structure,” and (4) “models of the underlying physical database structure.” These seem to be reasonable categories. The problem is whether the term “conceptual” refers to (1) or (2), whether “logical” refers to (2) or (3), and whether “physical” refers to (3) or (4).
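The ambiguity can be made concrete with a small lookup. The following is only a sketch: it transcribes the terms of Table 1 into a Python dictionary (the keys `first`, `second`, and `third` and the function name `levels_for` are labels invented for this illustration) and reports every level at which a given word turns up in some approach's term.

```python
# Table 1, transcribed: level -> term used by each of the three approaches.
# None marks a level for which an approach offers no term.
TABLE_1 = {
    1: {"first": "Context",
        "second": "Conceptual or subject area",
        "third": "Environment"},
    2: {"first": "Conceptual",
        "second": "Logical",
        "third": "Class of Platform (Technology) Independent Model"},
    3: {"first": "Logical",
        "second": "Physical",
        "third": "Class of Platform (Technology) Specific Model"},
    4: {"first": "Physical",
        "second": None,
        "third": "Vendor Platform Specific Model"},
}


def levels_for(word):
    """Return the sorted levels at which any approach uses `word` in its term."""
    hits = set()
    for level, terms in TABLE_1.items():
        for term in terms.values():
            if term is not None and word.lower() in term.lower():
                hits.add(level)
    return sorted(hits)


for word in ("conceptual", "logical", "physical"):
    print(word, levels_for(word))
```

Each of the three contested words lands on two adjacent levels, which is exactly the confusion described above.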
The problem originated with the original “Three Schema Architecture,” laid out by the American National Standards Institute (ANSI) in 1975.6 This saw the world of data described from three perspectives. First of all, every individual makes sense out of the world in terms of structures assembled in his own head. This view of reality is called, in ANSI’s view, the “external schema.” Various people will look at the same chunk of reality and form different external schemata. There is (according to ANSI, at least) an underlying reality that is the source of all of these external views. This underlying reality could be represented by a “conceptual schema.” This is the schema from which the others are derived. With this, different schemata can be derived to use in support of various data technologies. Each of these is an “internal schema.” The “internal schema” can then be subdivided into the “logical schema,” which represents the conceptual structure translated into a particular data storage approach (in 1975, the principal candidates were “hierarchical” and “network”), and the “physical schema,” which deals with the actual physical storage media.
In 1987, John Zachman organized the world in a different way. He addressed other things besides data, but for our purposes here, we’ll discuss the “Data” column of his “Framework for Information (now ‘Enterprise’) Architecture.”7 As he saw the world in 1987, these were the perspectives of interest:
- Ballpark View (scope/description) – a list of the things that are important to the business, and therefore, that the business manages.
- Owner’s View (model of the business) – a view of the real things in the business—“business entities.”
- Designer’s View (model of an information system) – from the designer’s perspective, an “entity” is the record on a machine describing the real world thing.
- Builder’s View (technology model) – a view of the physical implementation or data design for the conceptual model of the information system.
- Out-of-Context View (Detailed description) – for example, the data definition language for a relational database.
- Actual System
In the years following 1987, I and others became active in producing business-oriented data models that tried to find ANSI’s “conceptual” schema. We attempted to model the business in a single, more coherent view rather than simply reproduce the different (and often conflicting) owners’ views. In Mr. Zachman’s terms, this was somewhere between the owner’s perspective and the design for an information system. Various of us tried to use entity/relationship models to describe Owner’s views literally, but as a simple repetition of what we were told, they were not very illuminating.
More interesting—even though they were relatively abstract—were what I and my colleagues called “conceptual” models. Contrary to what many people expected, these were in fact presentable and understandable to business management. Moreover, they were also instructive to them.
For example:
In the data models I and my colleagues produced, about 10% of the content was what we were told. The other 90% were the logical implications of what we were told. This turned out to be very illuminating to the management group as well as to the modelers.
The result was a model that truly represented the enterprise, but in terms much more fundamental than any revealed in interviews with the people immersed in the daily grind of carrying out the business.
So, when I wrote my book, Requirements Analysis: From Business Views to Architecture, I took the liberty of squeezing another perspective into the framework. Between the business owner’s view and the designer’s view, I inserted:
This moved the Designer’s View to row 4, which seemed appropriate, since the bottom three rows should have been about technology, while the top three rows should have been about the enterprise.
The Builder’s and the Out-of-Context View got collapsed into a single “Builder’s View.” This was not unreasonable, since the distinction between the two in Mr. Zachman’s description seemed less than clear, to me at least.
I got heat from some of Mr. Zachman’s followers (although not from him) for tinkering with the sacred Zachman Framework. Much to my pleasant surprise, however, a few years later, he and Stan Locke showed me their updated version of the framework. In this one, the views were these:
- Strategists (Scope)
- Executive Leaders (Business)
- Architects (Systems)
- Engineers (Technology)
- Technicians (Components)
- Workers (Operations)
Since by “system” Messrs. Zachman and Locke meant “a regularly interacting or interdependent group of items forming a unified whole”8—which is to say, for example, an enterprise—then this organization for the Framework is exactly what I had in mind. The architect is charged with defining a coherent view of the enterprise as a whole system.
Meanwhile, the question of what exactly it meant to model the business owner’s view kept arising, and with it the topic of semantics. How can we systematically address the language used by business owners to describe their work? This was not a trivial task, since people in different departments often described the same thing in different terms and used the same terms to describe very different things.
Out of this came the effort by the Business Rules Group that eventually was taken over by the Object Management Group to describe “The Semantics of Business Vocabulary and Business Rules” (SBVR). This was published in 2008.9
Here, for the first time, you had a comprehensive approach to describing an enterprise’s rules in a consistent form—and by virtue of that, you had the ability to describe the enterprise’s structure (as seen by the business owners) in a consistent way. The result of this is linguistic, not graphical, but it was a major step toward capturing the semantics of the enterprise.
Meanwhile, other people, completely outside the world of business and databases, who had been working in the area of linguistics for, say, several millennia, were inspired by the advent of the world-wide web to bring their knowledge of semantics and ontology into that world. It turned out that those architectural data models my colleagues and I were building were examples of something called an ontology.10 This is originally a word describing the branch of ancient Greek philosophy concerned with finding and describing “what exists.” (This is a lot trickier than you might suppose. Look at our efforts in this regard. Philosophers have been at it for a very long time. There are still some questions about that today, but that is beyond the scope of this discussion.) In modern terms, an ontology is a collection of terms to describe what exists in an enterprise. They must be meaningful in a particular context, with well-defined relationships and a means for drawing inferences from them. Out of conversations that included Sir Tim Berners-Lee and the World Wide Web Consortium (W3C) came the Semantic Web, a way of linking data (not just pages) over the world-wide web. From that, came the semantic languages RDF and OWL.
Without going into detail, the Resource Description Framework (RDF) is a way of describing the world in terms of simple sentences. The Web Ontology Language (OWL; don’t ask about the acronym) builds on RDF to create a much more powerful, structured language for describing the world. With these two languages, it became possible to describe an enterprise with the language provided by its workers, and then use software called “inference engines” to identify discontinuities and conflicts of terminology. An architectural model can be directly mapped into these two languages, by the way.11
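To give a feel for what “simple sentences” means, here is a toy sketch in plain Python. It does not use a real RDF library or a real inference engine: each statement is just a (subject, predicate, object) tuple, which is the shape of an RDF triple, and the check is a hand-written stand-in for one kind of terminology conflict an inference engine might surface. The department prefixes, the `is_a` predicate, and the function name are all hypothetical, borrowed loosely from the “road” example earlier.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) triple.
# The prefix before the colon is the (hypothetical) department speaking.
triples = [
    ("routing:Road", "is_a", "Line"),
    ("planning:Road", "is_a", "Area"),
    ("engineering:Road", "is_a", "Solid"),
    ("routing:Highway", "is_a", "Line"),
]


def conflicting_terms(statements):
    """Return terms that different speakers assign to more than one class."""
    classes = defaultdict(set)
    for subject, predicate, obj in statements:
        if predicate == "is_a":
            # Strip the department prefix to compare the bare term.
            term = subject.split(":", 1)[1]
            classes[term].add(obj)
    return {term: kinds for term, kinds in classes.items() if len(kinds) > 1}


print(conflicting_terms(triples))
```

Here “Road” is flagged because three departments classify it three different ways, while “Highway” passes. Real OWL reasoners work from formal class axioms rather than a string comparison like this one.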
In short, the issue of how to deal with the semantics of an organization (Row 2 of Mr. Zachman’s Framework) is finally being addressed.
So, where does this leave our original problem with conceptual and logical models? I hereby modify my original organization described above. Harking back to the original ANSI ideas about the “External,” “Conceptual” and “Internal” schema, as updated by the upgraded Zachman Framework, I propose the following definitions:
- Conceptual Model – Any model that describes the business. It may be one of the following:
  - Strategic Model – This may be a model of basic terms, linked with many-to-many relationships, if desired, but focusing on establishing basic categories.
  - Business Owner’s Model – This is about the semantics of the organization. If appropriate, entity/relationship models can be developed, but more useful is an SBVR analysis, and OWL descriptions. This is the “Semantic Model” (the “external schema” in the original ANSI view).
  - Architect’s Model – This is an entity/relationship model of fundamental entity classes, encompassing as much of the enterprise coherently as possible. This is the “Architect’s Model” (the “conceptual schema” in the original ANSI view).
- Technology Model – Any model that reflects the technological environment being addressed. It may be one of the following:
  - Designer’s Model – In the data world, this is the model that accommodates the technology being used for data management. It may be in terms of tables and columns, object-oriented classes, dimensions, XML tags, or whatever. This is the “Designer’s Model” (the “logical” part of the “internal schema” in the original ANSI view).
  - Builder’s Model – This is the configuration of physical databases, tablespaces, or even cylinders and tracks of the physical database. The builder is the one who spreads the “People” table over three continents (the “physical” part of the “internal schema” in the original ANSI view).
- The working system
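For readers who find a data structure easier to scan than a nested list, the proposed definitions can be summarized roughly as follows. This is only an illustrative sketch: the class name `ModelKind`, its field names, and the helper functions are invented here, and `None` marks the one model for which the list above names no ANSI counterpart.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelKind:
    name: str
    category: str                # "Conceptual" or "Technology"
    ansi_schema: Optional[str]   # ANSI counterpart, if the list names one


MODEL_KINDS = [
    ModelKind("Strategic Model", "Conceptual", None),
    ModelKind("Business Owner's Model", "Conceptual", "external schema"),
    ModelKind("Architect's Model", "Conceptual", "conceptual schema"),
    ModelKind("Designer's Model", "Technology",
              "internal schema (logical part)"),
    ModelKind("Builder's Model", "Technology",
              "internal schema (physical part)"),
]


def by_category(category):
    """List the model names that fall under one of the two categories."""
    return [kind.name for kind in MODEL_KINDS if kind.category == category]


def ansi_counterpart(name):
    """Look up the ANSI schema that a named model corresponds to, if any."""
    for kind in MODEL_KINDS:
        if kind.name == name:
            return kind.ansi_schema
    raise KeyError(name)


print(by_category("Conceptual"))
print(ansi_counterpart("Architect's Model"))
```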
Figure 1 shows how the middle three rows of the Framework play out in the data column for these categories of models. This should be a little clearer than “conceptual,” “logical,” and “physical.”
References:
1. Graeme Simsion and Graham Witt. 2005. Data Modeling Essentials, Third Edition (Boston: Morgan Kaufmann). Page 17.
2. David Hay. 2003. “What Exactly is a Data Model?” DM Review (Vol 13, No. 2).
3. Steve Hoberman. 2005. Data Modeling Made Simple, 2nd Edition (Bradley Beach, NJ: Technics Publications). Page 51.
4. Donald Chapin. 2008. “MDA Foundational Model Applied to Both the Organization and Business Application Software.” Object Management Group Working Paper. March 2008.
5. David Hay and Danette McGilvray, Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information (Boston: Morgan Kaufmann). Pages 48-49.
6. American National Standards Institute (ANSI). 1975. “ANSI/X3/SPARC Study Group on Data Base Management Systems; Interim Report.” FDT (Bulletin of ACM SIGMOD) 7:2.
7. John Zachman. 1987. “A framework for information systems architecture.” IBM Systems Journal, Vol. 26, No. 3. (IBM Publication G321-5298)
8. Merriam-Webster. 2010. Merriam-Webster OnLine Dictionary. Retrieved 9/14/2010 from http://www.merriam-webster.com/dictionary/system
9. Object Management Group. 2008. “The Semantics of Business Vocabulary and Business Rules” (SBVR). OMG Available Specification formal/2008-01-02.
10. For a description of the relationship between data modeling and semantics, see David C. Hay. 2006. “Data Modeling and OWL, Parts one-three.” The Data Administration Newsletter. Available at http://www.tdan.com/view-articles/5025/, http://www.tdan.com/view-articles/5001, http://www.tdan.com/view-articles/4594.
11. For a description of how to convert a data model into OWL, see David Hay. 2008. “Semantics, Ontologies, and Data Modeling.” Cutter Report on Business Intelligence, Vol 6, No. 7.