UML as a Data Modeling Notation, Part 4

UML as a Data Modeling Notation, Part 1

UML as a Data Modeling Notation, Part 2

UML as a Data Modeling Notation, Part 3

The series of articles was originally presented in three parts. Part 1 set the stage, describing the basic differences between
the notations and, in principle, how they can be reconciled. Part 2 went into more detail, addressing sub-types and
constraints, along with which elements of UML should not be used in a data model and what has to be added (unique identifiers). Part 3 discussed the aesthetics of modeling, as well as some quirky aspects of UML worth noting.
A Postscript

Dave Hay used the approach to conceptual entity/relationship modeling described in this series, published in the last three issues of TDAN, to present a version of the Information Management Metadata
Model to the Object Management Group for review. Specifically, the technology-independent models of both entity/relationship modeling and relational design were presented in this form. In
general, they were well received, but one reviewer pointed out that UML practitioners looking at this model would need information about its context to avoid misinterpreting it. The suggestion
was made that perhaps UML should not be used when the objectives of an entity/relationship model are clearly different from those of a more typical UML model.

Dave is receptive to that, since that was what he had expected to do in the first place, but he’s invested enough in this approach that he would like to see it work. So, he and Mike have come
up with the following modifications to the technique, in an attempt to make the results more palatable to the UML community.

Understand that the principal difference between entity/relationship modeling and UML modeling lies in their content: entity/relationship modeling, as presented in these articles, is
constrained to represent only objects and classes from a particular universe of discourse (such as a business). UML in its pure form has no such restriction. An object can be a
cursor, a window on a computer screen, a database, or any other computer artifact. These are not included in a technology-independent entity/relationship model.

A UML model is typically created by an object-oriented designer for a programmer, while an entity/relationship model is created by a business analyst to be reviewed by subject matter
experts and then submitted to a physical database designer. These audiences have very different perspectives on the issues at hand.

But this difference should not be an insurmountable obstacle to the UML and data modeling communities sharing the notation.

As we’ve seen in the previous three articles of this series, there are two areas where the approaches differ:

  • Sub-types
  • Role name representation


Most developers of UML models and some developers of entity/relationship models favor the approach of representing sub-type boxes outside super-type boxes, connected by lines denoting
specialization. We have argued here for the “box-in-box” notation (shown in Figure 1) for two reasons:

  • It is more compact. Given the constraint that a model must fit on an 8½ x 11 (or A4) sheet of paper, having to devote space
    to separate sub-type boxes is a cost.
  • It is more representative of the business reality. An instance of a sub-type really is an instance of its super-type(s). This notation makes it clearer that an attribute or
    relationship of a super-type is also an attribute or relationship of all its sub-types.1

Even so, we recognize that the box-in-box notation has already been taken. In UML version 2.0, a composite structure diagram is used to describe run-time
architectures that aren't clear from a typical object or class diagram. “UML 2 has added a composite structure diagram that shows the participating elements and their relationship in
the context of a specific classifier such as a use case, object, collaboration, class, or activity.”2

“A composite structure is a set of interconnected elements that collaborate at runtime to achieve some purpose. Each element has some defined role in the collaboration.”3

A composite structure diagram is a larger rectangle, with its components contained as rectangles within it. To look at the diagram shown in Figure 1 as a composite structure diagram is to imagine
that PERSON and ORGANIZATION are components of PARTY, not sub-types of it. Note that this is different from the composition diamond on a relationship. That denotes the class model
idea that an instance of one class is composed of instances of another class. A composite structure diagram asserts that a system component is composed of other system components.
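To make the two possible readings of a box-in-box drawing concrete, here is a minimal Python sketch. It is only an analogy (the articles are about diagram notation, not code), and the attributes `birth_date` and `tax_id` are hypothetical. Under generalization, every `Person` is also a `Party`; under the composite-structure reading, `Party` would be a whole containing a person part and an organization part, and neither part is itself a `Party`.

```python
from dataclasses import dataclass, field

# Reading 1, generalization (what Figure 1 intends):
# PERSON and ORGANIZATION are kinds of PARTY, so every sub-type instance
# is also a super-type instance and inherits the super-type's attributes.
@dataclass
class Party:
    name: str

@dataclass
class Person(Party):
    birth_date: str = ""  # hypothetical attribute

@dataclass
class Organization(Party):
    tax_id: str = ""  # hypothetical attribute

# Reading 2, UML 2 composite structure (what a UML reader might see):
# PARTY is a whole whose parts are a person component and an
# organization component; the parts are not themselves PARTYs.
@dataclass
class PersonPart:
    birth_date: str = ""

@dataclass
class OrganizationPart:
    tax_id: str = ""

@dataclass
class PartyStructure:
    name: str
    person: PersonPart = field(default_factory=PersonPart)
    organization: OrganizationPart = field(default_factory=OrganizationPart)


alice = Person(name="Alice")
acme = PartyStructure(name="Acme")
print(isinstance(alice, Party))         # True: a Person is a Party
print(isinstance(acme.person, Party))   # False: a part is not a Party
```

The inherited `name` attribute on `alice` is the code-level counterpart of the point that a super-type's attributes are also attributes of all its sub-types.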


Figure 1: Prior Example

Of course, the drawing in Figure 1 actually shows generalization, not composition. To make this clear, we recommend drawing the generalization lines inside the super-type box, as shown in Figure 2. This keeps the
aesthetic orientation we are looking for, but signals the correct meaning to UML aficionados. This should not really be an issue, because any viewer of this model should understand that it is a
conceptual model describing an enterprise, not a run-time model describing a system (and this should be noted in every diagram's legend), but the additional notation should help.


Figure 2: New Example

In your authors’ entity/relationship world, the pair of sentences describing a relationship consists of two assertions about two entity classes.4 Each sentence takes the form:
subject (first entity class) | predicate (role) | object (second entity class). Along with this is the analogous assertion that an entity class has an
attribute, with “described by” implied as the predicate and the attribute itself playing the part of the object.

In Figure 3, a sample relationship is described by two role names:

  • Each Association End must be owned by one and only one Association.
  • Each Association must be the owner of one or more Association Ends.
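The subject | predicate | object form can be sketched in Python as a small data structure that renders each half of a relationship as a sentence. The names `Assertion`, `mandatory`, `many`, and the attribute `Name` are hypothetical, and the pluralization is deliberately naive (it just appends "s").

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assertion:
    """One half of a relationship: subject | predicate (role) | object."""
    subject: str     # first entity class
    mandatory: bool  # True -> "must be", False -> "may be"
    role: str        # predicate, e.g. "owned by"
    many: bool       # True -> "one or more", False -> "one and only one"
    obj: str         # second entity class

    def sentence(self) -> str:
        modality = "must be" if self.mandatory else "may be"
        quantity = "one or more" if self.many else "one and only one"
        # Naive pluralization; adequate only for names like "Association End".
        obj = self.obj + "s" if self.many else self.obj
        return f"Each {self.subject} {modality} {self.role} {quantity} {obj}."


def attribute_sentence(entity_class: str, attribute: str) -> str:
    """The analogous attribute assertion, with "described by" as the implied predicate."""
    return f"Each {entity_class} is described by {attribute}."


owned_by = Assertion("Association End", True, "owned by", False, "Association")
owner_of = Assertion("Association", True, "the owner of", True, "Association End")
print(owned_by.sentence())  # Each Association End must be owned by one and only one Association.
print(owner_of.sentence())  # Each Association must be the owner of one or more Association Ends.
print(attribute_sentence("Association", "Name"))  # Each Association is described by Name.
```

The two `Assertion` instances reproduce the two sentences for Figure 3 exactly; the role is stored with the assertion, so each sentence reads from its own subject.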


Figure 3: Original E/R Example

When your authors first learned that “roles” and attributes are “owned by” or “properties of” a UML class, this seemed very compatible with the way we looked at
entity/relationship entity classes. What mystified us was the way UML modelers name the roles. What we now realize is that to the extent that one can create a sentence from a UML role name, the
role name turns out to be a property of the object rather than a property of the subject.

Figure 4 shows the UML version of this model.5 Here, the UML sentences would be:

  • Each Association End has (as a property) the role of an owning association with respect to one or more Association Ends. That is, “Each Association End has the role of [an Association’s being] an owning association with respect to one or more
    Association Ends.”
  • Each Association has (as a property) the role of one or more owned ends with respect to one and only one Association. That is, “Each Association has the role of one or more [Association Ends’ being] owned ends with respect to one and only one Association.”


Figure 4: UML Example

That is, to a UML modeler, each role is a property of the object of the sentence (Association End and Association, respectively, above) rather than its
subject (Association and Association End).
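One way to see the UML reading is to ask where each role name ends up in object-oriented code. In this Python sketch (the snake_case property names and the `own` helper are hypothetical, chosen for illustration), the role name drawn next to Association, "owning association", becomes a property of Association End, and the role name drawn next to Association End, "owned ends", becomes a property of Association. Each role is held by the class at the opposite end of the line from where it is written.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional

# eq=False keeps identity-based equality, which avoids recursive comparisons
# between objects that point at each other.
@dataclass(eq=False)
class Association:
    # Role name drawn next to Association End ("owned ends"),
    # held as a property of Association.
    owned_ends: List[AssociationEnd] = field(default_factory=list)

@dataclass(eq=False)
class AssociationEnd:
    # Role name drawn next to Association ("owning association"),
    # held as a property of Association End.
    owning_association: Optional[Association] = None

def own(association: Association, end: AssociationEnd) -> None:
    """Set both sides so the two properties stay consistent."""
    end.owning_association = association
    association.owned_ends.append(end)


a = Association()
e1, e2 = AssociationEnd(), AssociationEnd()
own(a, e1)
own(a, e2)
print(len(a.owned_ends))           # 2
print(e1.owning_association is a)  # True
```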

In our approach (“Each Association must be owning (the owner of, actually) one or more Association Ends.”), the role
“owning” is a property of Association, the subject of the sentence. Going the other direction, we would put it: “Each Association End must
be owned ends of (or rather, owned by) one and only one Association.”

In recognition of the different points of view, your authors have no problem with putting the role name at the other end of the line, as shown in Figure 5. We can still follow the
convention of reading in a clockwise direction, finding the cardinality symbols at the far end on the same side of the line. This differs from the way we originally portrayed the relationship in
Figure 3, where, for example, “owned by” would be next to Association.


Figure 5: Updated ER Example

This revised approach has the advantage of putting the role name next to the entity class playing the role, which may be more comfortable for UML readers. The UML reader can still interpret it to
say that each Association End has the property of one Association being owner of one or more Association Ends. Similarly, each
Association has the property of one or more Association Ends being owned by one and only one Association.

The UML modeler can think of the role as describing the second entity class, but being a property of the first entity class, and data modelers can think of it as a predicate of the first entity
class that is in terms of the second entity class. We have the policy of reading in a clockwise direction to preserve our sanity when dealing with multiple notations, but if one wants to read it in
the other direction, that’s okay too.

A word about the role names themselves. As stated in a previous article, because of the nature of the relationship sentences, role names in an entity/relationship model must be prepositional phrases or
gerunds. Nouns don’t work. The preposition is the part of speech for describing relationships. (Remember “Grover” words?) Nouns describe things, and we already have
entity classes to do that.
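The rule that role names be prepositional phrases or gerunds can be approximated mechanically. The following Python sketch is a crude heuristic, not an established tool: it accepts a role name if it ends in a preposition (from a short, hypothetical word list) or is a single word ending in "-ing", and rejects everything else, including bare nouns and noun phrases.

```python
# A short, illustrative list; a real checker would need a fuller lexicon.
PREPOSITIONS = {"about", "at", "by", "for", "from", "in", "into", "of", "on", "to", "with"}

def is_acceptable_role_name(role: str) -> bool:
    """Heuristic: accept prepositional phrases and bare gerunds, reject nouns."""
    words = role.lower().split()
    if not words:
        return False
    # Phrases ending in a preposition: "owned by", "the owner of", "composed of".
    if words[-1] in PREPOSITIONS:
        return True
    # A single gerund such as "owning". Crude: "thing" would also pass.
    return len(words) == 1 and words[0].endswith("ing")


for name in ["owned by", "the owner of", "owning", "owned ends", "owning association"]:
    print(f"{name!r}: {is_acceptable_role_name(name)}")
```

The UML-style names "owned ends" and "owning association" are rejected, which matches the argument that nouns belong in entity class names rather than role names.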

Note that this is still an entity/relationship model, so the entity class names and role names have spaces in them.

A further departure from common UML practice is that here entity class names are not repeated in the role names. That bit of redundancy in UML apparently comes from the fact that in Java
programming, a class only “knows” what is in its namespace. The other class in the relationship is not in its namespace, so apparently the role name has to communicate what that is. This is
clearly a technology-specific requirement, not appropriate for a technology-independent model. However, the conversion of the entity/relationship model to a design model could automatically append
the entity class name to the role name.
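That automated step might look something like the following Python sketch, which appends the name of the entity class at the other end of the relationship to the role name and converts the result to a Java-style camelCase identifier. The function name and the camelCase convention are assumptions for illustration; a real transformation would also have to deal with name collisions, leading articles, acronyms, and so on.

```python
def to_design_role_name(role: str, other_class: str) -> str:
    """Append the other entity class's name to an E/R role name, in camelCase.

    E/R names contain spaces; design-level identifiers (here, Java style) do not.
    """
    words = role.split() + other_class.split()
    first, *rest = words
    return first.lower() + "".join(w[:1].upper() + w[1:].lower() for w in rest)


print(to_design_role_name("owned by", "Association"))      # ownedByAssociation
print(to_design_role_name("owner of", "Association End"))  # ownerOfAssociationEnd
```

Keeping this step in the transformation, rather than in the model, leaves the entity/relationship role names free of technology-specific redundancy.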

One issue is that many tools allow a role to be the “property of” only one entity class. Given our differing interpretations of “property,” this will not work. To
resolve this, simply make all role names properties of the relationships they are in. Given the current state of tools, this makes it ambiguous how to convert the model to either a relational design
or an OO design, but that’s an assignment for the tool makers to resolve.
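Making role names properties of the relationship can be sketched as a data structure in which entity classes carry no role names at all: each relationship holds both ends, and the roles visible from a given entity class are derived on demand. The names `RelationshipEnd`, `Relationship`, and `roles_of` are invented for this Python sketch, and, as in the text, deciding how to map each role into a relational or OO design is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class RelationshipEnd:
    entity_class: str
    role: str  # predicate when this end's class is the subject of the sentence

@dataclass(frozen=True)
class Relationship:
    # The relationship, not either entity class, owns both role names.
    first: RelationshipEnd
    second: RelationshipEnd

def roles_of(model: List[Relationship], entity_class: str) -> List[Tuple[str, str]]:
    """Return (role, other entity class) pairs in which entity_class is the subject."""
    found = []
    for rel in model:
        if rel.first.entity_class == entity_class:
            found.append((rel.first.role, rel.second.entity_class))
        if rel.second.entity_class == entity_class:
            found.append((rel.second.role, rel.first.entity_class))
    return found


model = [
    Relationship(
        RelationshipEnd("Association End", "owned by"),
        RelationshipEnd("Association", "the owner of"),
    )
]
print(roles_of(model, "Association End"))  # [('owned by', 'Association')]
print(roles_of(model, "Association"))      # [('the owner of', 'Association End')]
```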

Despite our best efforts, in this approach to using UML as a data modeling notation the meanings of many of the symbols are clearly slightly different from those seen when UML is used to
support object-oriented design. This is natural; the symbols also have different meanings when the notation is used to support relational database design:

  • An entity class is a thing of significance to the enterprise. This is technology-independent.
  • An object-oriented class is a piece of program code, representing any kind of object. This is dependent on object-oriented technology.
  • A relational table is a collection of rows and columns stored on a computer. This is dependent on database technology.
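To make the relational meaning in the last bullet concrete, here is a rough Python sketch of one possible transformation: an entity class becomes a table, and a mandatory "one and only one" relationship to another entity class becomes a NOT NULL foreign key. The naming scheme (upper case with underscores, a surrogate `<TABLE>_ID` key) and the `VARCHAR(255)` column type are assumptions; the generated DDL is checked only against an in-memory SQLite database via Python's standard `sqlite3` module.

```python
import sqlite3
from typing import List, Tuple

def table_name(entity_class: str) -> str:
    # "Association End" -> "ASSOCIATION_END"
    return entity_class.upper().replace(" ", "_")

def to_ddl(entity_class: str, attributes: List[str],
           parents: List[Tuple[str, bool]]) -> str:
    """Render CREATE TABLE for an entity class.

    parents lists (entity class, mandatory) pairs, one for each relationship in
    which each instance of this entity class is related to one and only one parent.
    """
    name = table_name(entity_class)
    columns = [f"  {name}_ID INTEGER PRIMARY KEY"]  # surrogate key (assumption)
    columns += [f"  {table_name(a)} VARCHAR(255)" for a in attributes]
    for parent, mandatory in parents:
        p = table_name(parent)
        not_null = " NOT NULL" if mandatory else ""
        columns.append(f"  {p}_ID INTEGER{not_null} REFERENCES {p} ({p}_ID)")
    return f"CREATE TABLE {name} (\n" + ",\n".join(columns) + "\n);"


parent_ddl = to_ddl("Association", ["Name"], [])
child_ddl = to_ddl("Association End", ["Name"], [("Association", True)])
conn = sqlite3.connect(":memory:")
conn.executescript(parent_ddl + "\n" + child_ddl)  # raises if the DDL is invalid
print(child_ddl)
```

The table and its foreign key are artifacts of database technology; neither appears in the technology-independent entity/relationship model itself.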

Just as a transformation is required to convert an entity/relationship model into a relational database design, so is one required to convert an entity/relationship model into an object-oriented
design. This may involve an automated process of attaching class names to role names, as well as manual efforts to add UML design adornments such as navigation and composition (and, of course,

Because the meaning of the models is different, should the notations be different? There are strong arguments for making it so, but these articles attempted to show that this is not required for
the models to make sense. Whatever notation is used, precise, semantically clear models can be produced. To do so is worthwhile, regardless of the particular experiences of the modeler.


End Notes

  1. Yes, we acknowledge that this arrangement precludes representing multiple inheritance (a sub-type having more than one super-type), but it is our view that situations apparently requiring
    multiple inheritance should be modeled differently. The controversy continues.
  2. Eriksson, H-E, Magnus Penker, Brian Lyons, David Fado. UML 2 Toolkit. Indianapolis, Indiana: Wiley Publishing, Inc. Page 34.
  3. Wikipedia, “Composite Structure Diagram.”
  4. Back in the days when Dr. Chen invented “thing/relationship modeling” and the real-time programming community invented “thing-oriented programming”, they used different
    thesauri to come up with the language we use today. Dr. Chen called things “entities” and classes of things “entity class types”. The entity/relationship community got
    sloppy and lazy over the years and started calling the classes “entities.” When the two modeling communities started talking to each other, this caused some confusion. For this reason, your authors are calling classes of entities “entity classes” and instances of entities “instances of entity classes”. In this paper, an “entity
    class” is simply the kind of class being addressed here.
  5. Our thanks to Jim Logan of Model Driven Solutions for this example.




About David Hay

In the Information Industry since it was called “data processing”, Dave Hay has been producing data models to support strategic and requirements planning for thirty years. As President of Essential Strategies International for nearly twenty-five of those years, Dave has worked in a variety of industries and government agencies. These include banking, clinical pharmaceutical research, intelligence, highways, and all aspects of oil production and processing. Projects entailed defining corporate information architecture, identifying requirements, and planning strategies for the implementation of new systems. Dave’s recently-published book, “Enterprise Model Patterns: Describing the World”, is an “upper ontology” consisting of a comprehensive model of any enterprise—from several levels of abstraction. It is the successor to his ground-breaking 1995 book, “Data Model Patterns: Conventions of Thought”–the original book describing standard data model configurations for standard business situations. In addition, he has written other books on metadata, requirements analysis, and UML. He has spoken at numerous international and local data architecture, semantics, user group, and other conferences.
