This is the second in a series of articles on the relationship between the object-oriented approach to system development and the information engineering approach it appears to be
replacing.
How exactly is “object modeling” different from entity/relationship (data) modeling?
The components of an object model are exactly the same as those for an entity/relationship model. Object modeling calls the boxes “classes” where data modeling speaks of “entity types”, and it calls the lines “associations” where data modeling speaks of “relationships”. Despite the different names, the underlying concepts of things and how they relate to each other are fundamentally the same. (1)
Differences between notations are less significant, however, than the differences between the two primary ways in which data/object models are used. In one case they are used as part of the
requirements analysis process to describe the things of significance to the business, about which it wishes to hold information. Such models are pure representations of the information of the
business, without regard for technology.
A second use of data models is to represent database or class design for a physical system. Here the representation very much reflects the technology involved.
It is in this second case that object modeling techniques have additional characteristics to describe physical design issues – reflecting the fact that object-oriented design is different
from relational design.
When developing models of data structure to support requirements analysis, however, there should be no difference in content between an object model of a business and an entity/relationship model
of the same business.
When information engineers doing data modeling refer to entities as the “things of significance to a business” (as they have done for some fifteen years now), one could well argue that these are the “object classes” of the business, and that occurrences of these entities are the business’s “objects”. When we address a business as object-oriented analysts, if we limit ourselves to business object classes and their associations, we are in fact being information engineers.
But the very creation of “object modeling” suggests that something more than the business objects we have been looking at is involved. The data structures described by Mr. Rumbaugh and his colleagues are nothing other than what we information engineers have been doing for a long time. For them to take credit for these techniques is relatively harmless. More dangerous is the argument, often made, that the data structure part of object modeling is somehow more than data modeling.
Some of the books your author has encountered (including Rumbaugh’s) do limit their view of data analysis to the objects of significance to the business. (2)(3) Other authors, however, do not limit themselves to business objects, nor do they even seem oriented toward identifying business needs.
There are three areas where object and data models differ:
- Behavior: yes, this is the special feature of object models, but its implications are not what they seem.
- Bad models: these are not the sole province of the object-oriented camp, but that camp seems to have a special affinity for them.
- Class names: these reveal something about how object modeling is practiced.
Objects and Behavior
It is true that the most significant difference between object modeling and traditional entity/relationship modeling is the former’s inclusion of behavior in class definitions. As an approach to software design, this has been a significant addition to the body of knowledge in our industry. The extension of the idea to analysis models, however, is more problematic.
In the case of basic classes, it is simple enough to document behavior without affecting the underlying structure of the model. Note that this has no effect whatsoever on the question of which classes should be there in the first place. Indeed, adding behavior to an entity definition is not a bad idea, and could yield useful insights. As Bob Brown has said, “entities are just objects that don’t know how to behave.” (4) If behavior is added during analysis, however, it must be true business behavior, not the behavior anticipated to be designed into a system.
It would certainly be useful, for example, to describe the life cycle of an entity/object class. What events cause occurrences of it to be created? What events cause them to be updated or deleted? What operations are carried out in response to these events? These are legitimate extensions of the entity/relationship modeling technique. A technique called “entity life histories”, which predates UML and other object-oriented analysis techniques by many years, addresses exactly this issue. (5) It was originally described by Michael Jackson (no, not the popular singer) in his 1983 book, System Development (6), and it has been incorporated into SSADM (Structured Systems Analysis and Design Method), a methodology widely used in Europe. (7)
Note, however, that these entity life history models are much more complex than can be conveyed by simply listing operations in a class box.
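To make the three questions above concrete, here is a minimal Python sketch of the simplest possible life history: one creation event, then any number of update events, then at most one deletion event. The entity (a customer), the event names, and the helper `follows_life_history` are all hypothetical, chosen only for illustration. A real entity life history can express far more (selections among alternative events, nested iterations, and so on), which is exactly why a list of operations in a class box falls short.

```python
# Hypothetical events for a "Customer" entity; the article names no specific events.
CREATION_EVENTS = {"register"}
UPDATE_EVENTS = {"change_address", "change_credit_limit"}
DELETION_EVENTS = {"close"}


def follows_life_history(events: list[str]) -> bool:
    """Check one occurrence's event history against a simple life history:
    exactly one creation event first, then any number of update events,
    then at most one deletion event, which must come last."""
    if not events or events[0] not in CREATION_EVENTS:
        return False  # every occurrence must begin with a creation event
    middle = events[1:]
    if middle and middle[-1] in DELETION_EVENTS:
        middle = middle[:-1]  # a deletion event is allowed only at the end
    # Everything between creation and (optional) deletion must be an update.
    return all(event in UPDATE_EVENTS for event in middle)
```

For example, `["register", "change_address", "close"]` is accepted, while `["register", "close", "change_address"]` is rejected because an update follows the deletion.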
The behavior examples found in most object-oriented texts don’t describe business behavior. They are written in a kind of structured English that looks suspiciously like program code.
Objects Without Data
Things get stranger when the object model approach is carried to the logical conclusion that you can have “objects” with behavior but no data. In particular, there are two categories of object classes that are related to behavior only:
- “Control classes”, which are used to control processing in a system, and
- “Boundary” or “interface” classes, which “interact with things outside the application”. (8)
Neither of these describes business data structure, and indeed, neither describes business function, either. They are essentially programming constructs, describing a hypothetical system.
For example, one text specified “ProfessorCourseManager” and “AddACourseOffering” as classes. (9) The first is neither data nor function; it merely records the fact that a professor can control a course. If this were presented as an associative class between Professor and Course (named something like “CourseManagement”), where each occurrence of the class is one Professor’s role in managing one Course, it would be a legitimate class. Indeed, that role has attributes of its own (“effective date”, etc.). But as presented, the class is related only to Course, and it represents only control of the course, not a fact about it. See Figure 1.
Figure 1: Controlling with Classes
The second class, “AddACourseOffering”, is simply a function in disguise. In fact, even an object-oriented analyst would recognize it as a method that belongs inside the CourseOffering class.
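The alternative described above can be sketched in Python, assuming simple dataclasses stand in for the classes in the model. `CourseManagement` is the associative class, so each occurrence records one Professor’s managing one Course together with the attribute the text mentions, an effective date. The behavior that “AddACourseOffering” tried to capture becomes a method, `CourseOffering.add`. Apart from the class names and the effective date taken from the text, every attribute and method name here (`name`, `title`, `term`, `offerings`, `add`) is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Professor:
    name: str


@dataclass
class Course:
    title: str
    # Association from a Course to its offerings; excluded from repr/eq
    # so the two-way link between Course and CourseOffering cannot recurse.
    offerings: list["CourseOffering"] = field(
        default_factory=list, repr=False, compare=False
    )


@dataclass
class CourseManagement:
    """Associative class: one occurrence is one Professor's managing one Course."""
    professor: Professor
    course: Course
    effective_date: date  # a fact about the role itself, not about either party


@dataclass
class CourseOffering:
    course: Course
    term: str

    @classmethod
    def add(cls, course: Course, term: str) -> "CourseOffering":
        """The behavior 'AddACourseOffering' tried to model, kept with its class."""
        offering = cls(course=course, term=term)
        course.offerings.append(offering)  # record the new occurrence on its Course
        return offering
```

Here `CourseManagement` carries facts (who manages which course, and since when) rather than controlling anything, which is what makes it a legitimate class.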
Object Behavior or Process Models?
Your author once sat in on a presentation of an object model which consisted of an object class that performed some function, then handed off data to another object that did another function, which
in turn handed other data off to yet a third object that did something else.
Excuse me, but how is this different from information engineering’s data flow diagram? (Except that it is being presented as a data structure diagram.)
Information engineering already produces process models, with one box per process. One could argue that placing function/process boxes and control boxes alongside data boxes on object models is a natural extension of the object-oriented philosophy. But adding behavior to an object class is not the same thing as adding function boxes to an entity/class diagram. They really are different things, and they should be the subject of different models.
While there is a good argument for adding the behavior of an entity to the entity’s description, cobbling together object and function models simply confuses the issue.
Bad Models
A second issue between information engineering and object-orientation is that the latter seems to encourage the creation of object classes that simply make for bad models.
In one case, for example, the additional object classes “CustInfoMulti” and “CustInfo1” were defined as separate sub-types of a class referring to customers. (10) These sub-types were defined to distinguish customers with just one address from those with multiple addresses. This is nonsense, since the underlying object class is still “customer”. The number of addresses a customer has should be shown by the association between Customer and Address. The association “Each Customer may be at one or more Addresses” already asserts that a Customer may have exactly one address, or it may have more. There is absolutely no reason for two sub-types.
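A minimal Python sketch of the single-class alternative, assuming dataclasses stand in for entities: one `Customer` class with a one-to-many association to `Address` covers customers with one address and customers with many. The attribute names, the `add_address` method, and the `has_multiple_addresses` property are hypothetical, included only to show that the distinction the sub-types were meant to capture can be derived from the association.

```python
from dataclasses import dataclass, field


@dataclass
class Address:
    street: str
    city: str


@dataclass
class Customer:
    name: str
    # "Each Customer may be at one or more Addresses": a single association
    # covers customers with one address and customers with many.
    addresses: list[Address] = field(default_factory=list)

    def add_address(self, address: Address) -> None:
        self.addresses.append(address)

    @property
    def has_multiple_addresses(self) -> bool:
        # Derived from the association; no "CustInfo1"/"CustInfoMulti" sub-type needed.
        return len(self.addresses) > 1
```

Under the sub-type model, adding a second address would force an occurrence to migrate from “CustInfo1” to “CustInfoMulti”; here it remains the same Customer throughout.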
Now, bad models in themselves should not be held against object-orientation. After all, information engineers have produced their share of bad models as well. Your author has found only a few object-oriented books (11)(12)(13), however, whose models were not perverse in the manner of the above example. The issue is not what could be done, but what object modelers are taught to do. The fact of the matter is that, because of the aura surrounding “object models”, object modelers are rarely taught the information engineer’s respect for the underlying structure of the information.
Class Names
Finally, object-oriented modelers are often not even oriented toward the business. This is evident in the language they use. As a minor example, consider the names of entities and attributes. In common practice with UML and other object modeling notations, the words in class and attribute names are run together without spaces. After all, identifiers in most programming languages cannot contain spaces. If, however, this is to be a model of the business, validated by human beings, then conventional English must be used. The expression most business people will recognize is “purchase order”, not “PurchaseOrder” or “Purchase_order”. The statement this convention makes is that the model is being prepared not in service to the people of the business, but in service to the computer.
(It is true that some practitioners do use spaces between words, and the original specification for UML does not specify a rule either way (15), but the prevalence of those who don’t allow spaces says something profound about the state of the industry.)
The problem in fact goes beyond spaces between words in names: in at least one of the books your author has read, object classes (definitions of things of significance to the business, remember)
are called things like “CustInfo” instead of “Customer”. (16)
Another author at least spelled out “StudentInformation”, but she then defined this class as “Information needed to register and bill students.” As an afterthought she acknowledged that a
“student is someone who is registered to take classes at the University” (17), but this wasn’t the definition of the class.
The business object classes are not “student information” and “customer information” (even spelled out with spaces). The business object classes are “students” and “customers”.
Conclusions
Fundamentally, object models and entity/relationship models describe the same thing: the structure of data. Where before there were perhaps half a dozen notations available for entity/relationship modeling, the advent of object-orientation has now doubled that number. Yes, the object-oriented notations add space for describing behavior, but the task of structuring data has not changed.
UML promises to standardize notation, which is a good thing. Your author has some reservations about UML, but they will be addressed in the next article of this series.
More important than any issue of notation, however, is the target of a data/object model: it may describe either the data of the business or the data in a computer system. This distinction between targets matters far more than differences in notation or terminology. Muddying it is a common failing of system developers in both camps, and correcting that, not waging religious wars over symbols, is what should occupy our attention.
(1) David C. Hay, “A Comparison of Data Modeling Techniques”, Data Base Newsletter, Mar/Apr ’95. See also www.essentialstrategies.com.
(2) Rumbaugh, et al., op. cit.
(3) Craig Larman, Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design, Prentice-Hall PTR (Upper Saddle River, NJ: 1998).
(4) Bob Brown, “Extended Modeling Language”, GUIDE Proceedings, November 1990.
(5) David C. Hay, “Object Oriented Data Modeling: Entity Life Histories”, Oracle CASE Special Interest Group, 1993. Also available at www.essentialstrategies.com.
(6) Michael Jackson, System Development, Prentice Hall (Englewood Cliffs, NJ: 1983).
(7) Ed Downs, Peter Clare, and Ian Coe, Structured Systems Analysis and Design Method: Application and Context, Prentice Hall International (UK) Ltd. (Hemel Hempstead, Hertfordshire: 1988).
(8) Paul Harmon and Mark Watson, Understanding UML: The Developer’s Guide, Morgan Kaufmann Publishers, Inc. (San Francisco: 1998), pp. 123-125.
(9) Terry Quatrani, Visual Modeling with Rational Rose and UML, Addison-Wesley (Reading, MA: 1998), p. 54.
(10) Paul Harmon and Mark Watson, op. cit., pp. 150-152.
(11) James Rumbaugh, op. cit.
(12) Martin Fowler, UML Distilled, Addison-Wesley (Reading, MA: 1997).
(13) Craig Larman, op. cit.
(14) Martin Fowler, op. cit.
(15) Rational Software Corporation, Unified Modeling Language Notation Guide, Rational Software Corporation (Santa Clara, CA: 1997), pp. 42-44.
(16) Paul Harmon and Mark Watson, op. cit., p. 182.
(17) Terry Quatrani, op. cit., p. 51.