Recently, Steve Hoberman published an article in TDAN.COM, “Three Situations that Weaken Data Model Precision”1 (April 1, 2010), that stressed three flaws common in data models: inadequate definitions, using dummy values when a value is required, and using vague labels for relationships (or leaving them out altogether).
While he is absolutely correct in criticizing those who leave out relationship names, he makes one assertion about those names that troubles me – even though it expresses an opinion common in the data modeling world: “A very important part of [a proper] sentence is the verbs.” In this, he is supported by no less than Graeme Simsion and Graham Witt, along with Ron Ross, among others.
I, on the other hand, wish to contest that assertion.
Interestingly enough, while the data modeling community sees relationship end names as verbs, the object-oriented community sees “association” ends (“roles”) as nouns. That is, they portray a role name as a label for a target entity class. As a label, it is a noun.
I contest this as well.
A relationship is not a verb. It’s not a noun. It’s a preposition.
Premises
First, please understand that this conversation is about conceptual data models. It is conceptual models that describe classes of things in the world and the relationships between them. Logical models are more oriented to the database technology, so relationship names are not as semantically significant. The relationships in physical models are concerned with foreign keys and other mechanisms.
With that in mind, let’s understand the nature of a conceptual data model. It has several characteristics that distinguish it from a logical model or a physical database design. These characteristics also distinguish it from a process model:
- It is about classes of things that are significant to a business or government agency (called here “entity classes”2), as well as the structure inherent in the relationships of these things to each other.
- It is a description of the semantics of the organization. That is, it describes the language used by a company or agency.
- It is not about what these classes do to each other. It does not represent actions, processes, or functions.
- Thus, it is about nouns (the entity classes) and prepositions (the relationships).
In other words, it describes what exists in the organization, and the (static) relationships among those things.
Note that a conceptual data model is a kind of ontology – a word from Greek philosophy that describes “the branch of metaphysics concerned with identifying, in the most general terms, the kinds of things that actually exist.”3 (I love using a hot new buzzword – especially when it’s 2500 years old!)
You can think of Aristotle as the father of data modeling.
In modern times, the word “ontology” means “a catalog of the types of things that are assumed to exist:
- …in a domain of interest
- …with rules governing how those terms can be combined to make valid statements
- …and ‘sanctioned inferences’.”4
In a model of what exists, then, the only verb of interest is “to be.” And the nouns involved describe classes of things, not relationships among them.
Two Common Views
There are two approaches commonly taken to naming relationships: one by the data modeling community, and one by the object-oriented community. They are different.
Data modelers
Mr. Hoberman’s books and course materials describe use of “verb phrases” to describe relationships. For example, in the article, he cites “A Customer can place one or many Orders.” In his concern for precision in a data model, he correctly points out that the verb must have a meaningful content, so things like “has” and “associated with” are not useful.5
Graeme Simsion and Graham Witt also use verbs in their relationships. For example, “Each Customer may make one, or more Purchases,” and “each Purchase must be made by exactly one Customer.”6
Ron Ross has often said that he is not producing “data models,” but rather “fact models.” For example, he cites as a list of facts:
- Party owns Property
- Party leases Property
- Party completes Loan Application, which requests financing for Property.
As it happens, many UML authors also use verbs for association names, but such a name applies to the entire relationship, ostensibly in both directions. In practice, the modeler usually picks a direction, picks a verb, and labels the association for that direction, much as a data modeler would. Those who take advantage of the UML feature that allows labeling “roles” at each end of an association see the role name as a noun, essentially describing the entity class that is its object.
In UML, a role describing an association end “represents the behavior of an element”8. This sounds like verbs again. But in fact a role name “provides a name to identify an association end within an association, as well as to navigate from one object to another using the association”.9 This name is usually a noun. It describes the part played by the property that is a related class. (“Properties” in UML may be either attributes or related classes.) In the example above, a Party would be a customer in an Order. In this case, two association roles would be expected between Party and Order, so the role names customer and vendor work well.
Note that to the object-oriented modeler, though, a UML “association” does not represent a relationship in the semantic sense. It represents the path to be taken by a program in navigating from one entity class to another. So the point of view is focused on how to find the other entity class, rather than on what the relationship means. From that point of view, all that is required is a label on the other class. Indeed, as often as not, the entity class name itself is deemed sufficient.
The standards organization ACORD has developed a UML model describing the insurance industry. Among other things, in an association from Contract to Contract Header the role played by Contract Header is labeled contractHeaderElement.10
Issue One
First of all, as stated above, a conceptual data model is fundamentally a representation of classes of things significant to an organization and the relationships of those things to each other. Data modelers are often keen to refer to “employee”, “customer” and “vendor”. However, these are not classes of significant entities. The significant entity classes are Person, Organization, and their super-type which, by convention, is most commonly called Party. “Employee” is a Person with a defined relationship (“employed by”) with a company.
The roles that are implicit in these names should be represented in the names of relationships, not buried in the class names of people and organizations playing those roles.
Thus, a Party (which may be either a Person or an Organization) may be a customer in one or more Orders. Other Parties may be vendors in the same Orders.
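One way to make this concrete is a small Python sketch. It is only an illustration under stated assumptions: the class `PartyInOrder`, the helper `parties_playing`, and the sample names are hypothetical and come from no particular model. The point it tries to show is that “customer” and “vendor” appear only in the relationship, never as entity class names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Party:
    """Super-type of Person and Organization."""
    name: str


@dataclass(frozen=True)
class Person(Party):
    pass


@dataclass(frozen=True)
class Organization(Party):
    pass


@dataclass(frozen=True)
class Order:
    number: str


@dataclass(frozen=True)
class PartyInOrder:
    """One relationship instance: <party> is <role> <order>."""
    party: Party
    role: str  # a prepositional phrase such as "a customer in"
    order: Order

    def sentence(self) -> str:
        return f"{self.party.name} is {self.role} Order {self.order.number}"


def parties_playing(role, order, relationships):
    """Return the Parties that play the given role in the given Order."""
    return [r.party for r in relationships if r.role == role and r.order == order]


# Made-up sample data: a Person and an Organization in the same Order.
alice = Person("Alice")
acme = Organization("Acme Ltd")
order = Order("A-1")
relationships = [
    PartyInOrder(alice, "a customer in", order),
    PartyInOrder(acme, "a vendor in", order),
]

for r in relationships:
    print(r.sentence())
```

Nothing in the sketch prevents Acme from being a customer in some other Order, which is exactly why the role belongs to the relationship.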
Now this is a point that has been made to death in the past, and many modelers do avoid the “Customer” entity class. Indeed, Mr. Ross did in his presentation. He modeled only People and Organizations. Similarly, the ACORD model strongly emphasizes Party. Unfortunately, neither Mr. Hoberman nor Messrs. Simsion and Witt are so rigorous.
Issue Two
The camouflaging of relationships in entity class names is but one problem. A more significant issue has two parts. The first is simply this:
Verbs describe activities or processes. These are more appropriately the subject of process models. To assert that a Customer places an Order is describing a business process. The input is presumably a requirement of some sort, and the output is a request for services or materials.
On the other hand, if you want to describe the Order as a thing of significance to the enterprise, then you probably also want to represent its relationship to other things. In this case, two of the related things are instances of Party, one of whom is presumably the customer in the Order, and the other is presumably the vendor in the Order, as described above.
No, the verb is not the part of speech to describe a relationship.
The second part of this issue is this:
In the ACORD insurance industry model, mentioned previously, the role describing the target entity class in the association from Contract to Contract Header was labeled contractHeaderElement.11 Another relationship is from Contract to Contract Element with Contract Element playing the role, elementsIncludedinContract.12 Note that what is being labeled is not the relationship between Contract and these elements. It is not about the nature of the association. Rather it is the answer to the question, “How can I identify that entity class when viewing it from the point of view of this entity class?”13
In these examples the target entity class is part of the relationship name. In many cases it is simply the object entity class name itself. For example, the relationship between an Order and Line Item might be simply “lineItem.”
In UML, both attributes and relationships to other classes are considered properties of a class. This means that, in the second example, above, “elementsIncludedinContract” is a property of Contract, identifying the class that it is related to. That is, it is the name of the role played by Contract Element in describing Contract. As it happens, “included in” is a reasonable relationship name. The problem is that it would be a property of Contract Element, not Contract. That is, “each Contract Element must be (“1,”) included in one and only one (“,1”) Contract.”
To make the role name a property of Contract, you have to say something like “each Contract may be associated with one or more (“0,*”) Contract Elements, each of which must be (“1,”) included in one and only one (“,1”) Contract.”
This is not just a little convoluted.
In the example above, a Java program would be expected to navigate from Contract to Contract Element guided by the path labeled “elementsIncludedinContract”. “Included in” is a clue to the role involved here, but this is fundamentally just a way to find the Contract Element. Some practitioners would simply label the role “contractElement.” Note that there is no indication of the meaning of the role.
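The navigation described here can be sketched in a few lines. The paragraph above speaks of Java; Python is used below only as a convenient stand-in, and the class shapes and sample descriptions are invented for illustration. They are not taken from the ACORD model. Only the role name is the one quoted above.

```python
class ContractElement:
    def __init__(self, description):
        self.description = description


class Contract:
    def __init__(self, elements):
        # The role name is simply the attribute a program follows
        # to reach the related Contract Elements.
        self.elementsIncludedinContract = list(elements)


# Made-up sample data.
contract = Contract([ContractElement("coverage"), ContractElement("exclusion")])

# Navigating from Contract to Contract Element by way of the role name.
descriptions = [e.description for e in contract.elementsIncludedinContract]
print(descriptions)
```

Renaming the attribute to `contractElement` would change nothing in how the program behaves, which suggests how little meaning the name is required to carry.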
No, a noun as a role name does not work in a conceptual model attempting to describe the world.
From the above two premises, we conclude that a relationship name is neither a verb nor a noun. It is a preposition.
Remember the children’s program Sesame Street, and Grover’s words “over,” “under,” “around,” and so forth? He was teaching kids about the relationships between physical things – but the part of speech is the same even if the “things” are human beings or intangible concepts.
Now don’t get the idea that verbs aren’t part of relationships. It’s just that in a model describing what exists, the only verb of interest is “to be.” This can be extended to describe optionality, in the form of “must be” and “may be.” That is, every relationship sentence should have the structure:
< entity class 1 > (noun)
must be (or) may be (verb)
< relationship > (prepositional phrase)
one and only one (or) one or more (adjective phrase)
< entity class 2 > (noun)
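The sentence structure above is regular enough to be generated mechanically. The following Python sketch is one possible rendering, not a standard notation: the class `RelationshipEnd`, its field names, and the leading “Each” (borrowed from the examples elsewhere in this article) are assumptions made for illustration.

```python
from dataclasses import dataclass

# The only verb is "to be", qualified for optionality.
OPTIONALITY = {True: "must be", False: "may be"}
# The adjective phrase that expresses cardinality.
CARDINALITY = {False: "one and only one", True: "one or more"}


@dataclass(frozen=True)
class RelationshipEnd:
    subject: str        # entity class 1 (noun)
    mandatory: bool     # True gives "must be", False gives "may be"
    phrase: str         # relationship (prepositional phrase)
    many: bool          # True gives "one or more"
    target: str         # entity class 2 (noun), singular
    target_plural: str  # entity class 2, plural form

    def sentence(self) -> str:
        noun = self.target_plural if self.many else self.target
        return (
            f"Each {self.subject} {OPTIONALITY[self.mandatory]} "
            f"{self.phrase} {CARDINALITY[self.many]} {noun}."
        )


customer_in = RelationshipEnd("Party", False, "a customer in", True, "Order", "Orders")
included_in = RelationshipEnd(
    "Contract Element", True, "included in", False, "Contract", "Contracts"
)

print(customer_in.sentence())
print(included_in.sentence())
```

Note that the generator never needs a verb other than “be.” Everything specific to the relationship sits in the prepositional phrase.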
Indeed, among the “verbish” examples, you’ll find such things as “is assigned to,” “is the parent of,” and so forth. The “verbish” part of these relationship names is “is.” The heart of the relationship name is in the preposition, even if the modeler doesn’t realize it.
Mr. Hoberman’s example, “A Customer can place one or many Orders” suffers from two problems. First of all, the entity class isn’t “Customer.” That name encodes the relationship into the class name. The entity class is either Person, Organization or Party. It is the relationship name, not the entity class name, where “customerness” should be captured.
In addition to providing too much information about a relationship, “customer” doesn’t tell us enough about the nature of the underlying class: Is it a Person, an Organization, or either – a Party? Taking the most general view, then, the sentence could read:
Each Party may be a customer in one or more Orders.
(Note also that “may be . . . one or more” is a little more graceful than “can . . . one or many.”)
In the case of Messrs. Simsion and Witt, they do have a rigorous structure for their relationships:
< entity class 1 > (noun)
must (or) may
< relationship > (verb)
one and only one (or) one or more
< entity class 2 > (noun)
This is consistently applied and rigorous, but it does mean that a lot of the verbs begin with “be,” as in “each Operation must be managed by a Surgeon.” To be sure, they can now say, going the other direction, that “each Surgeon may manage one or more Operations.” But that camouflages the fact that each Surgeon is in fact playing the role of being the manager of the Operations.
In the case of their relationship, “each Customer may make one, or more Purchases,” since “purchase” is more specific than “order,” this could be rendered as:
Each Party may be a buyer in one or more Purchases.
Again, I am compelled to assume that either a Person or an Organization may be a buyer in the Purchase, but I don’t know that because it is not clear in the original sentence.
Mr. Ross made his name as an advocate for documenting business rules, and he asserts that his “fact models” are essential as the basis for describing business rules. This is certainly true. The question is whether the syntax described above to make model relationships can also be used to support business rules. For example, Mr. Ross describes the following business rules:
- A Vehicle must not transport more than 4 Passengers.
- A Person may not lease a Vehicle that the same Person owns.
This is a reasonable way to describe the rules, and it appears to be clear enough to gain acceptance (or rejection) by a business person. Since Mr. Ross asserts only that this is a “fact” model, he’s under no obligations to exclude sentences with verbs as facts.
If we are concerned with the precision of the rules, however, it does not reduce their clarity to say instead:
- A Vehicle may be a transport for no more than 4 People.
- A Person may not be the lessee of and the owner of the same Vehicle.
Indeed, it could be said that in addition to being more precise, it is in fact clearer.
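To suggest how rules phrased this way might be checked against data, here is a hedged Python sketch. It assumes that relationship instances are stored as (subject, prepositional phrase, object) triples. The function `violations`, the constant `MAX_PASSENGERS`, and the sample facts are all made up for illustration and are not drawn from Mr. Ross’s material.

```python
from collections import Counter

MAX_PASSENGERS = 4  # "no more than 4 People"


def violations(facts):
    """Check the two rewritten rules against a set of relationship triples."""
    found = []

    # Rule 1: a Vehicle may be a transport for no more than 4 People.
    carried = Counter(s for s, phrase, _ in facts if phrase == "a transport for")
    for vehicle, count in sorted(carried.items()):
        if count > MAX_PASSENGERS:
            found.append(
                f"{vehicle} is a transport for {count} People (limit {MAX_PASSENGERS})"
            )

    # Rule 2: a Person may not be the lessee of and the owner of the same Vehicle.
    owned = {(s, o) for s, phrase, o in facts if phrase == "the owner of"}
    leased = {(s, o) for s, phrase, o in facts if phrase == "the lessee of"}
    for person, vehicle in sorted(owned & leased):
        found.append(f"{person} is the lessee of and the owner of {vehicle}")

    return found


# Made-up sample facts: five passengers in one van, and one person
# who both owns and leases it.
facts = {("Van 7", "a transport for", p) for p in ["Ann", "Bob", "Cy", "Di", "Ed"]}
facts |= {
    ("Ann", "the owner of", "Van 7"),
    ("Ann", "the lessee of", "Van 7"),
    ("Bob", "the owner of", "Car 2"),
}

for problem in violations(facts):
    print(problem)
```

Because each rule refers to the relationship by its prepositional phrase, the second rule reduces to a simple intersection of two sets of pairs.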
. . .
In deference to the object-oriented readers, it is true that these are not prepositions but prepositional phrases, and they do include nouns. Indeed, in the Vehicle example just cited, “transport,” “lessee,” and “owner” are nouns. But note that they describe the first entity class, not the second. The “roleness” is about what the first entity does, more than what the second entity does. The nouns chosen by object modelers, however, describe the second entity class, or worse, simply reproduce its name, as in contractHeaderElement.
The object-oriented modeler might say that in the relationship between Order and Party “customer” is a role played by Party, with “Party as customer” as a “property” of Order. But from the semantic perspective, “customer in Order” is a predicate of Party, not Order.
In the ACORD examples, the “correct” relationships should have read something like this:
- Each Contract may be composed of one or more Contract Elements.
- Each Contract may be identified by one or more Contract Headers.
Note that “identified by” may not be the correct name for this relationship. This was simply my guess. The original role name “contractHeaderElement” carries no information as to exactly what the Contract Header means to the Contract. If you cannot figure out what a role name is really describing, then go back to the subject-matter expert who was the source of your model to find out exactly what was meant.
Think prepositions, not verbs and not nouns.
Conclusion
The “relationship” part of “entity/relationship” modeling is far more challenging and subtle than most people realize. One complaint your author often hears when teaching the approach to naming relationships described here is that it is hard!
That is true. If you are successful, the person reading the relationship sentence will find it to be obvious. Before you found the right words to make it seem that clear, however, the essential nature of that relationship was not so obvious.
In data modeling, to come up with the right name to describe exactly how two things are related to each other – so as to make it sound obvious – requires you to understand the fundamental nature of the relationship to a much greater degree than your reader will. What exactly is the role being played there? You can see it. You know that it is true. But coming up with the right word is, well, challenging. It requires skill in using language.
For a novelist to come up with the right word to convey an image or a person’s temperament – that’s hard, too. This is what literary skill is all about. If you went into computer science because you couldn’t write well in college, you are now officially in trouble.
End Notes:
1. Hoberman, S. “Three Situations that Weaken Data Model Precision.” The Data Administration Newsletter. April 1, 2010.
2. When he invented data modeling, Dr. Peter Chen addressed “entity types” – classes of entities – not the individual entities themselves. Over the years, data modelers have gotten careless and come to using the word “entity” to describe classes (types) of individuals, not the individuals themselves. When object-orientation arrived on the scene, data modelers got grief for apparently not understanding the difference between objects and classes.
But we did understand the difference! We were simply careless in our language.
For this reason, in deference both to Dr. Chen and the object-oriented community, your author will henceforth refer to “entity classes” when describing the boxes in a conceptual entity/relationship model.
3. Kemmerling, G. Philosophical Dictionary. http://www.philosophypages.com/dy/o.htm#onty. 2002.
4. Knowledge Based Systems, Inc. Information Integration For Concurrent Engineering. Prepared for Armstrong Laboratory AL/HRGA. 1994.
5. Hoberman. Op. cit.
6. Simsion, G. and Graham Witt. Data Modeling Essentials, Third Edition. (San Francisco: Morgan Kaufmann Publishers). 2005. Page 99.
7. Ross, R. “Verbish Models: How to Coax Semantics into Your Data Models.” Presentation to Houston DAMA, February 16, 2010.
8. Rumbaugh, J., Ivar Jacobson, & Grady Booch. The Unified Modeling Language Reference Manual. (Reading, Massachusetts: Addison-Wesley). 1999. Page 414.
9. Ibid.
10. Neugebauer, F., Boris Bulanov, & Kenneth Ekers. The ACORD Information Model: A Primer. ACORD Corporation. 2009. Page 6-48.
11. Neugebauer, F., Boris Bulanov, & Kenneth Ekers. The ACORD Information Model: A Primer. ACORD Corporation. 2009. Page 6-48.
12. Note that in this description, I have taken the liberty of inserting spaces between words in the entity class name, to make these names comparable to the other examples. The object-oriented practice of not including them is a technological constraint and not appropriate for conceptual models. I have kept the relationship names in “camelCase” to illustrate the object-oriented approach.
13. UML Secret: The true rationale behind this approach to naming roles is that, in the world of Java programming, each class defines a “namespace,” which contains all its attributes and relationships as properties. But the related classes are not in this namespace, so effectively they cannot be “seen” by the program. All that can be seen of those classes is the role names that identify them.
This is clearly a technological constraint that should have no effect on how conceptual data models are described.