In conceptual (“ontological”) data modeling, relationships describe the structure of the connections between pairs of entity types. It is therefore inappropriate to use the part of speech that describes actions—that is to say, verbs—to describe them. Rather, since a model is intended to describe simply “what exists,” the structure should be represented by prepositions and prepositional phrases. What is the relationship of instances of one entity type to instances of another entity type? The only verb involved is “to be”—possibly in the form of “must be” or “may be.”
The year 2020 so far has produced three new books on data modeling. Each of these attempts to introduce a particular way of modeling that is oriented towards the businesspeople whose enterprises ostensibly are being modeled.
All three are very good at defining the nouns (the things of significance) in a domain. But where relationships among those things of significance should describe static structure, each of these authors instead uses verbs to describe, in effect, what one entity type does to the other. This is the domain of process modeling, not data modeling. For this reason, these books provide a very good canvas on which to make the arguments here.
This article is Part One of Two. In this article I address the first two books:
- Joseph Danielewicz: Models, Metaphor, and Meaning: How Models Use Metaphors to Convey Meaning[1]
- Steve Hoberman: The RoseData Stone: Achieving a Common Business Language using the Business Terms Model[2]
The second article in this series will address:
- Ron Ross: Business Knowledge Blueprints: Enabling Your Data to Speak the Language of the Business[3]
Buzzword Compliance
David Hay also wrote, a couple of years ago, a book on the subject of organizing data. His book, Achieving Buzzword Compliance: Data Architecture Language and Vocabulary,[4] however, takes on the industry itself. The books that are the subject of this article are examples of how data consultants are eager to help the world out there (commercial enterprises and government agencies, for example) arrive at a “common language.” But are they not just as guilty of mangling language within their industry? So, here Mr. Hay took it upon himself to sort out the words used by data architects themselves.
One of Mr. Hay’s findings is that, while various authors have various ideas about what a “conceptual” model is, it is simply any representation of information structure that is not concerned with the technology that might be used to manipulate it. In fact, there are (as of 2020) at least three flavors of conceptual data model in play:
- Overview – the executive view, with a relatively small number of broad concepts that encompass a large part of an enterprise.
- Semantic – a detailed exposition of language, as used in an enterprise. This includes jargon, technical terminology (that of the enterprise’s own technology), and departmental lore. Addressing this requires extensive work to reconcile the various points of view that exist throughout the enterprise. This can be a very large model.
- Essential – a more abstract, integrated view that uses patterns to encompass multiple departments. This should be a smaller model than the semantic one.
Another problem in industry language is that the view of data for many people is not business data at all, but the technology for managing those data.
For purposes of this evaluation, your author will contend that, for the most part, the subject authors are trying to address the need to capture an enterprise’s semantics—although there are several examples here that appear to be addressing essential industry patterns, as well.
In each case the author’s model is “conceptual,” in that it is independent of any database technology that might be used to implement it.
Ontologies
Among other things, the models presented here appear to be playing the role of enterprise “ontologies.” The word onto-logy is a 17th-century construction (“study of things”) describing the branch of philosophy that in fact originated with Aristotle in the fourth century BCE: the study of being, or the essence of things that exist.[5]
In modern times, the term is used to describe any catalogue of terms describing the things that are assumed to exist:
- In a domain of interest
- With rules governing how these terms can be combined to make valid statements,
- Along with sanctioned inferences that can be made.[6]
Think of “ontology” as the world’s first three-thousand-year-old hot new buzzword.
All three of the authors discussed in this series are very good at capturing the names of things that are assumed to exist, or at least categories of those things. (The boxes these gentlemen present in their diagrams represent entity types, which are classes of things, rather than entities, which are the things themselves.) What is noteworthy is that these categories of things are labeled with nouns. The means for arriving at definitions for nouns is well established (also by Aristotle).
All three authors, however, follow the pattern for naming relationships that was instituted by Thomas Bruce[7] and James Martin:[8] They label them with verbs.
This is unfortunate for two reasons:
First, since our objective here is to describe what exists, the only verb that applies is “to be.” These are assertions that a relationship exists.
Second, verbs describe processes and functions. The definition of a thing that exists does not include what actions it may take on another thing. The part of speech that describes the structure between two classes is the preposition, usually as part of a prepositional phrase. Remember Sesame Street and how Grover taught you about “over,” “under” and “around”?
Now, Douglas Adams has something to say about defining relationships correctly: “The editors of The [Hitchhiker’s] Guide [to the Galaxy] got sued by the families of those who had died as a result of taking the entry on the planet Traal literally (it said ‘Ravenous Bugblatter Beasts often make a very good meal for visiting tourists’ instead of ‘Ravenous Bugblatter Beasts often make a very good meal of visiting tourists.’)”[9]
Picking the right word (even if it is a preposition) is important.
UML is a remarkable notation precisely because it does not describe data in terms of their structure. Rather, it describes each class (analogous to an entity type) in terms of its “behavior.” This behavior then describes how it acts upon other classes. That’s fine for the object-oriented world, but if we are to define the nature of what things exist and how they are related to each other, this definition should not include a description of what one may do to another.
The evaluation of each book below recommends relabeling its relationships according to a syntax originally developed by Richard Barker and his colleague Harry Ellis in the 1980s:
Each <subject entity type> must be|may be <relationship name> one and only one|one or more <object entity type>.
For example, “Each Project Assignment must be of one and only one Person.” (This is the first example shown in the next section.) Here, while the structure is determined by the modeler, each of the phrases underlined here is for the business-oriented subject-matter expert to evaluate as true or false. So in this case, is it possible to have a Project Assignment without a Person? Can a Project Assignment be of more than one Person?
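Because the template is so regular, it can be filled in mechanically. The following is a minimal Python sketch, assuming a hypothetical `RelationshipEnd` record whose fields simply mirror the slots of the template; it is an illustration only, not part of Barker’s method or of any modeling tool. The modeler supplies the structure; the generated sentence is what the subject-matter expert then judges true or false.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationshipEnd:
    """One end of a relationship, read from the subject entity type's side.

    Hypothetical structure: each field corresponds to one slot of
    "Each <subject> must be|may be <name> one and only one|one or more <object>."
    """
    subject: str          # subject entity type, e.g. "Project Assignment"
    mandatory: bool       # True -> "must be", False -> "may be"
    name: str             # relationship name (a prepositional phrase), e.g. "of"
    many: bool            # True -> "one or more", False -> "one and only one"
    obj: str              # object entity type, e.g. "Person"
    obj_plural: str = ""  # plural of the object; if empty, obj + "s" is used


def render(end: RelationshipEnd) -> str:
    """Fill in the sentence template for one relationship end."""
    modality = "must be" if end.mandatory else "may be"
    if end.many:
        cardinality = "one or more"
        target = end.obj_plural or end.obj + "s"
    else:
        cardinality = "one and only one"
        target = end.obj
    return f"Each {end.subject} {modality} {end.name} {cardinality} {target}."


if __name__ == "__main__":
    print(render(RelationshipEnd("Project Assignment", True, "of", False, "Person")))
    # Each Project Assignment must be of one and only one Person.
    print(render(RelationshipEnd("Person", False, "subject to", True, "Project Assignment")))
    # Each Person may be subject to one or more Project Assignments.
```

Note that “must be” and “may be” carry the modality and a separate slot carries the cardinality, so the relationship name itself never needs to include “many.”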
Joe Danielewicz:
Models, Metaphor, and Meaning: How Models Use Metaphors to Convey Meaning
In this book, Joe Danielewicz sets out to describe “data modeling for information systems. But we will also examine modeling in general, why we model, and the cognitive basis for finding meaning in models. Although data modeling is a specialized activity within information systems development, I propose to show how modeling of any kind is another form of a language game with its own syntax, semantics and metaphorical tropes. I will also attempt to clear up some of the confusion between data modeling and information theory.” The book covers several philosophical topics, including information theory.
For his modeling exercise, Mr. Danielewicz made use of Information Engineering (IE) notation.
Example: Projects
Figure 1 shows that “(Each) Employee works on many ProjectAssignments.” Not shown is the assertion that (each) ProjectAssignment is assumed to have an (unnamed) relationship with one and only one Employee. Similarly, “(Each) Project is worked on by many ProjectAssignments.” Not shown is the assertion that (each) ProjectAssignment is assumed to have an (unnamed) relationship with one and only one Project.
This has several problems. First of all, relationships also exist going from Project Assignment to the other entity classes, and these should be named. Second, the labels describe what the symbol also shows, but they do so imperfectly. To say that an Employee works on many Project Assignments excludes the cases where he or she works on only one. The proper term in each case should be “one or more.”
As described in the Introduction, an alternative syntax is available:
Each <subject entity class> may be|must be <relationship name> one and only one|one or more <object entity class>.
This means that the following sentences could follow directly from the model shown in Figure 2:
- Each Project Assignment must be of one and only one Person, and must be to one and only one Project.
- Each Person may be subject to one or more Project Assignments, each of which must be to one and only one Project.
- Each Project may be the object of one or more Project Assignments, each of which must be of one and only one Person.
There are no missing sentences. The word “many” doesn’t have to be part of the relationship name, since it is contained in the syntax of the model itself. Even though each of the sentences above is a compound sentence, it consists of two pieces that exactly follow the syntax described above. The objective here is for the model to be reflected in natural English-language sentences.
Moreover, while the structure was created by the modeler, the truth of each underlined phrase must be determined by a business-oriented subject-matter expert who knows nothing about modeling.
As ERD master Michael Lynott once said, “the objective of presenting one of these models is not to be congratulated for your cleverness. It is to be wrong!” Only once the subject-matter experts have corrected it can you be sure that it truly represents the enterprise being addressed.
Properly done, the sentences should seem obvious. Note, however, that coming up with those “obvious” sentences can be hard. This is a linguistic exercise, not a technological one. When you come up with an intersect entity type like “Project Assignment,” for example, the relationship names that define part of it are pretty easy: the assignment is of a Person to a Project. But you may have to think about it for a while before you realize that the Person himself or herself is subject to the assignment, and the Project itself is the object of that assignment. This requires some subtlety in the modeler’s understanding of the nature of the relationships.
But that’s why not using verbs is important. Using a verb that was used casually in conversation is too easy. What does the relationship really mean? That’s harder.
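The same sentences can also be read as a specification of structure. The sketch below, in Python, assumes hypothetical `Person`, `Project`, and `ProjectAssignment` classes: each Project Assignment holds exactly one required reference to a Person and one to a Project (“must be … one and only one”), while the assignments of a Person are derived and may be empty (“may be … one or more”). It illustrates the intersect entity type only; it is not any author’s or tool’s implementation, and the sample names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Project:
    title: str


@dataclass(frozen=True)
class ProjectAssignment:
    """Intersect entity type between Person and Project (hypothetical sketch)."""
    of_person: Person    # must be OF one and only one Person
    to_project: Project  # must be TO one and only one Project


def assignments_of(person: Person, assignments: list[ProjectAssignment]) -> list[ProjectAssignment]:
    """Each Person may be subject to one or more Project Assignments (possibly none)."""
    return [a for a in assignments if a.of_person == person]


def projects_of(person: Person, assignments: list[ProjectAssignment]) -> set[Project]:
    """... each of which must be to one and only one Project."""
    return {a.to_project for a in assignments_of(person, assignments)}


if __name__ == "__main__":
    ann, bob, carol = Person("Ann"), Person("Bob"), Person("Carol")
    apollo, gemini = Project("Apollo"), Project("Gemini")
    assignments = [
        ProjectAssignment(ann, apollo),
        ProjectAssignment(ann, gemini),
        ProjectAssignment(bob, apollo),
    ]
    print(sorted(p.title for p in projects_of(ann, assignments)))  # ['Apollo', 'Gemini']
    print(assignments_of(carol, assignments))  # []  ("may be", so zero is allowed)
```

The many-to-many connection between Person and Project never appears as a direct link; it exists only through the assignments, which is exactly what the “each of which” sentences express.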
Steve Hoberman:
The RoseData Stone: Achieving a Common Business Language using the Business Terms Model
Steve Hoberman’s book The RoseData Stone: Achieving a Common Business Language using the Business Terms Model is intended as a means for improving communication among businesspeople. In his words, “Similar to how the Rosetta Stone provided a communication tool across languages, the RoseData Stone, called the Business Terms Model (BTM), or the Conceptual Data Model, displays a Common Business Language of terms for a particular business initiative.”
Example 1: Cookies
Mr. Hoberman also uses the Information Engineering (IE) notation, and begins by describing its syntax. This is shown in Figure 3.
The problem is that, with this approach, he has no logic for creating sentences. To be sure, the verb can be used for one sentence, “Each Cookie must contain many Ingredients,” but a second one has to be “made up,” since it is not apparent from the diagram. In the text, Mr. Hoberman does assert that “each Ingredient may be used in baking many Cookies.”
Among other things, this has the same problem we described for Mr. Danielewicz’s example above: saying “many” implies that a Cookie can never have just one Ingredient. And, on the other side, an Ingredient could never be used in just one Cookie. The proper term is “one or more.”
An alternative example is shown in Figure 4. The expression “contain” here is replaced by the concept “composition.”
As described in the previous section, the sentences derived from Figure 4 follow a presumed syntax:
Each <subject entity class> may be|must be <relationship name> one and only one|one or more <object entity class>.
Thus, the diagram in its entirety represents the following sentences:
- Each Cookie must be composed of one or more Ingredients.
- Each Ingredient may be part of one or more Cookies.
As described for Mr. Danielewicz’s book above, the objective again is for the model to be reflected in natural English-language sentences. The underlined phrases can each clearly be evaluated as true or false by a business-oriented subject-matter expert who knows nothing about modeling.
Example 2: Teaching and Cars
In the next example (Figure 5), Mr. Hoberman has modified the IE notation to show sub-type entity type boxes inside a super-type box. This is a very reasonable thing to do. Normally in IE (and in UML), sub-type boxes are shown separately from super-type boxes, with a “sub-type” relationship line between them. That makes it harder to see that, in this case, each Teacher is a Person. Similarly, each Student is a Person.
The relationship with Car is more problematic. It asserts that each Person may drive many Cars. There is no information here about what a Car may do to a Person.
An alternative is shown in Figure 6.
Again, two prepositional phrases much more clearly describe the structure represented by the diagram. Moreover, the resulting sentences are definitive:
- Each Person may be driver of one or more Cars.
- Each Car may be driven by one or more People.
Note that the sub-type structure also allows us to assert that:
- Each Teacher may be driver of one or more Cars.
- Each Car may be driven by one or more Teachers.
…and…
- Each Student may be driver of one or more Cars.
- Each Car may be driven by one or more Students.
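In object-oriented terms, the super-type/sub-type structure behaves like inheritance: whatever is asserted of every Person is also asserted of every Teacher and every Student. A minimal Python sketch, with hypothetical class and function names, might look like this. The relationship is declared once, on Person and Car, and applies to both sub-types without being restated.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Person:
    name: str
    # Each Person may be driver of one or more Cars (an empty list is allowed).
    cars_driven: list = field(default_factory=list, repr=False)


class Teacher(Person):
    """Each Teacher is a Person, so it inherits "may be driver of"."""


class Student(Person):
    """Each Student is a Person, so it inherits "may be driver of"."""


@dataclass(eq=False)
class Car:
    plate: str
    # Each Car may be driven by one or more People (an empty list is allowed).
    drivers: list = field(default_factory=list, repr=False)


def record_driving(person: Person, car: Car) -> None:
    """Record one pairing; both lists describe the same relationship."""
    person.cars_driven.append(car)
    car.drivers.append(person)


if __name__ == "__main__":
    lee, sam = Teacher("Ms. Lee"), Student("Sam")
    car = Car("ABC-123")
    record_driving(lee, car)
    record_driving(sam, car)
    print([d.name for d in car.drivers])  # ['Ms. Lee', 'Sam']
```

Storing the relationship in both directions mirrors the two sentences written for each relationship; a fuller design might derive one side from the other instead.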
Example 3: Accounts
Another example of unfortunate use of verbs is in Figure 7.
Again, the drawing shows only one relationship label in each case. As put by Mr. Hoberman:
- Each Customer may own many Accounts.
- Each Account may contain many Account Balances.
The missing relationship names are added (and the labels changed) in Figure 8:
- Each Customer may be the owner of one or more Accounts; each Account must be owned by one or more Customers.
- Each Account may be composed of one or more Account Balances; each Account Balance must be part of one and only one Account.
While the prepositional phrase “owner of” is indeed constructed from a verb, the part of speech is now different.
Example 4: Materials and Material Roles
Figure 9 shows a sub-type structure that is, simply, incorrect. According to Merriam-Webster, a “role” is “a function or part performed, especially in a particular operation or process.”[10]
The problem with this drawing is that an instance of the thing Finished Material is not in fact an instance of a function that is performed, called a Material Role. Nor is a Raw Material a Packaging Material. What they are examples of, however, is the thing (entity type) Material. Note, by the way, that one of the “Semi-finished” Materials (cookie dough, for example) may play not only the role of Raw Material but also, when sold on its own, the role of Finished Product.
Figure 10, however, shows what Mr. Hoberman may have meant by this:
Here, the super-type is, correctly, Material. That is, each Material must be a Cookie, Flour, a Cardboard Box, or some other kind of Material.
Material Role, however, is a separate entity type: each Material Role may be played by one or more Materials (such as Cookie), while each Material must be the player of one or more Material Roles. With this configuration, Cookie Dough is not confined to being a “Raw” Material; it could also play the Material Role “Finished.” Similarly, some of the Paper Bags with decorative logos might be sold separately as souvenirs (finished products).
The “many-to-many” relationship between Material and Material Role raises a flag, however. What does it mean for a Material Role to be related to a Material? A quantity of Cookie Dough may be a “Finished Material” for resale, or it could be an “Intermediate” in the creation of Cookies. What is required here is a representation of the “structure” of the Materials. This is shown in Figure 11, a commonly used pattern for representing the fact that one Material (in this case) is connected to one or more other Materials. For example,
- Flour may be part of one or more Material Structures, each of which must be the use in one and only one Material (say, Cookie).
Going the other direction,
- Cookie may be composed of one or more Material Structures, each of which must be the use of one and only one Material (such as Flour).
This provides a way to discuss the roles originally introduced in Figure 9. It reveals that the concept of “roles” introduced there is a lot trickier than it originally appeared.
While Cookie Dough might typically be considered a “raw material” (or more likely, an “intermediate”) for a kind of Cookie, Cookie Dough may be sold as a “finished product” in its own right. Similarly, while a particular kind of Cookie is usually considered a “finished product,” it may also be crumbled and added to a Pie, taking on the role of “intermediate.”
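The Material Structure pattern is essentially a bill of materials. The following Python sketch assumes hypothetical `Material` and `MaterialStructure` classes, with attribute names borrowed from the phrases “use of” and “use in” above; the sample Materials are illustrative. Because roles such as “intermediate” depend on context, the sketch does not store them at all. It only shows that the same Material can appear on both sides of the structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Material:
    name: str


@dataclass(frozen=True)
class MaterialStructure:
    """One use of a component Material in an assembly Material (hypothetical)."""
    use_of: Material  # must be the use of one and only one Material (e.g. Flour)
    use_in: Material  # must be the use in one and only one Material (e.g. Cookie)


def composed_of(material: Material, structures: list[MaterialStructure]) -> set[Material]:
    """Direct components: the Materials whose uses make up this one."""
    return {s.use_of for s in structures if s.use_in == material}


def part_of(material: Material, structures: list[MaterialStructure]) -> set[Material]:
    """Direct assemblies: the Materials this one is used in."""
    return {s.use_in for s in structures if s.use_of == material}


def all_components(material: Material, structures: list[MaterialStructure]) -> set[Material]:
    """Every Material used directly or indirectly (assumes the structure has no cycles)."""
    result: set[Material] = set()
    for component in composed_of(material, structures):
        result.add(component)
        result |= all_components(component, structures)
    return result


if __name__ == "__main__":
    flour = Material("Flour")
    dough, cookie, pie = Material("Cookie Dough"), Material("Cookie"), Material("Pie")
    structures = [
        MaterialStructure(flour, dough),
        MaterialStructure(dough, cookie),
        MaterialStructure(cookie, pie),  # a crumbled Cookie used in a Pie
    ]
    # Cookie is both an assembly (of Cookie Dough) and a component (of Pie).
    print([m.name for m in composed_of(cookie, structures)])  # ['Cookie Dough']
    print([m.name for m in part_of(cookie, structures)])      # ['Pie']
    print(sorted(m.name for m in all_components(pie, structures)))  # ['Cookie', 'Cookie Dough', 'Flour']
```

Whether a given Material is “raw,” “intermediate,” or “finished” is then a question about a particular use, not a fixed sub-type of the Material itself.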
Part Two of this article will address Ron Ross’ Business Knowledge Blueprints: Enabling Your Data to Speak the Language of the Business.
Bibliography
Aristotle. 323 BCE. Posterior Analytics, Book II. From Great Books Foundation, Ninth Year, Volume Two.
Armstrong Laboratory AL/HRGA. 1994. Information Integration for Concurrent Engineering. Knowledge based Systems, Inc.
Barker, Richard. 1989. CASE*Method™: Entity Relationship Modelling. Addison-Wesley.
Bruce, Thomas A. 1992. Designing Quality Databases with IDEF1X Information Models. Dorset House.
Danielewicz, Joe. 2020. Models, Metaphor, and Meaning: How Models Use Metaphors to Convey Meaning. Kindle Direct Publishing.
Hay, David C. 2018. Achieving Buzzword Compliance: Data Architecture Language and Vocabulary. Technics Publications, LLC.
Hoberman, Steve. 2020. The Rosedata Stone: Achieving a Common Business Language using the Business Terms Model. Technics Publications, LLC.
Martin, James. 1987. Recommended Diagramming Standards for Analysts and Programmers. Prentice Hall.
Ross, Ron. 2020. Business Knowledge Blueprints: Enabling Your Data to Speak the Language of the Business. Business Rules Solutions, LLC.
[1] Joseph Danielewicz. 2020. Models, Metaphor, and Meaning: How Models Use Metaphors to Convey Meaning. Kindle Direct Publishing.
[2] Steve Hoberman. 2020. The Rosedata Stone: Achieving a Common Business Language using the Business Terms Model. Technics Publications, LLC.
[3] Ron Ross. 2020. Business Knowledge Blueprints: Enabling Your Data to Speak the Language of the Business. Business Rules Solutions, LLC.
[4] David C. Hay. 2018. Achieving Buzzword Compliance: Data Architecture Language and Vocabulary. Technics Publications, LLC.
[5] Aristotle. 323 BCE. Posterior Analytics, Book II. From Great Books Foundation, Ninth Year, Volume Two. Page 1.
[6] Armstrong Laboratory AL/HRGA. 1994. Information Integration for Concurrent Engineering. Knowledge based Systems, Inc.
[7] Thomas A. Bruce. 1992. Designing Quality Databases with IDEF1X Information Models. Dorset House.
[8] James Martin. 1987. Recommended Diagramming Standards for Analysts and Programmers. Prentice Hall.
[9] Douglas Adams. 1982. The Restaurant at the End of the Universe. New York: Pocket Books, pp. 37–38.
[10] Merriam-Webster Online Dictionary. 2020. “Role.” Merriam-Webster. Retrieved 5/31/2020 from Merriam-Webster.com/Role.