Published in TDAN.com January 2001
name and form.
Elective affinities, Book II, Chap. 7
Plutarch, Lives, Crassus
In the last two issues of TDAN.com, the articles in this space, “A Repository Model – The Analysis Model” and “A Repository Model – The Relational Design Model”, presented the first components of a data catalogue (“metadata repository” in the current
argot), in data model form. In the first article, emphasis was placed on the elements required to support analysis – entities, attributes, relationships, and so forth. The second article described
relational design, with its tables, columns, and keys. This issue covers the wonderful world of object-oriented design.
Your author is indebted to Meilir Page-Jones for his excellent book, Fundamentals of Object-Oriented Design in UML,[1] which provided the
theoretical basis for this article. In addition, he graciously reviewed this article and answered questions.
I would also like to thank Mark Spencer, Ed Landale, Joe Newcum, and Mark Gokman for their contributions to this and my previous two articles. Each provided extremely useful comments to
help me refine my points. Errors in this article, however, are mine, not my reviewers’.
As before, this article makes the point that, in most situations, there are relatively few very well defined things that we want to keep track of in a catalogue. To model these things should not be
very difficult. These articles present a relatively simple set of models to describe a catalogue that will support a typical application. Yes, these are sketches, and they could certainly be made
more elaborate. But they should accurately represent at least those things they set out to represent – concisely and in concrete terms.
This article is intended to describe object-oriented design. This is currently a hot topic, but one which is unfortunately often misunderstood. It is substantially more complex than relational
design, which has made this article much more difficult to write than either of the others. If I have successfully described the subject, object-oriented developers will find what is written here
self-evident. If that happens, the meta-model presented here must be reasonably correct and the first objective of the article will have been met.
A second objective, however, is to provide assistance to those who are not as familiar with the object-oriented approach. The nice thing about data modeling is that, properly done, it is a powerful
tool for exploring and describing areas with which you would be otherwise unfamiliar. If some readers come away with a better understanding of just what object-orientation is, the second objective of
the article will have been met.
Readers are encouraged to disagree with the particulars of these models. The nice thing about data modeling is that it gives us a very good language with which finally to clarify what we disagree
about.
Classes
One of the claims made about the virtues of UML is that the same symbols can be used for the classes identified during requirements analysis as are used to describe the classes created in a
computer system. This is unfortunate, since these are not the same thing, and to use the same symbol and terminology to describe both is misleading. The extent to which they are not the same thing
will be demonstrated in detail by this article.
In the relational world the distinction is made between entities that represent things of significance to the business, and tables and columns that are representations of these things in the
computer. It is true that many data modelers in the relational world confuse them, but at least in principle entities and tables can be treated separately. Indeed, while the database design should
be based on the entity model, it is often appropriate for the designer to depart from that structure for reasons of performance or other physical characteristics of the system.
The UML approach means that the same symbol is representing very different things – classes in the world and classes that are computer artifacts. The confusion between these things can be
unfortunate. Even in the object-oriented world, the bits of code that describe classes are not the same things as the classes the code describes.
Because of the importance of distinguishing the real-world “class” of requirements analysis from the computerized “class” of design, the former will be referred to as entity/class. The
latter will here be simply called class.
(In previous articles I referred to “object classes”, since I view the word “class” as referring to a wide range of things outside the world of system development. It has been pointed out to
me, however, that within the object-oriented domain, the things we are talking about here are simply “classes”. So, “entity/object class” from the previous articles is hereby renamed
entity/class, and henceforth this article will be about the class.)
Figure 1 shows class (implementation), representing the piece of code that describes a class. Note that this is not the same as the entity/class from previous articles that described a thing in the
world.
Unlike in the relational world, sub-types and super-types can be implemented directly. That is, each class may be a generalization of one or more other classes and each class may be
inheriting from one and only one other class. (Again, for philosophical reasons, in this article we are ruling out multiple inheritance – although, of course, the model could be changed to
accommodate it.) (OK, if you insist, change the model to say that “each class may be inheriting from one or more other classes.” You realize of course that this means you’ll have
to add an intersect entity.)
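As a rough illustration only, the single-inheritance assertion could be recorded in a catalogue along the following lines. This is a sketch in Python, not the notation of the model itself; the name ClassImplementation and the method inherit_from are assumptions made for the example.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class ClassImplementation:
    """A catalogue entry describing one class (implementation)."""
    name: str
    # "may be inheriting from one and only one other class"
    superclass: Optional[ClassImplementation] = None
    # "may be a generalization of one or more other classes"
    subclasses: list[ClassImplementation] = field(default_factory=list)

    def inherit_from(self, parent: ClassImplementation) -> None:
        # Single inheritance: refuse a second parent.
        if self.superclass is not None:
            raise ValueError(
                f"{self.name} already inherits from {self.superclass.name}"
            )
        self.superclass = parent
        parent.subclasses.append(self)


person = ClassImplementation("Person")
customer = ClassImplementation("Customer")
customer.inherit_from(person)
```

Allowing multiple inheritance would mean replacing the single superclass reference with a collection of links, which is the intersect entity mentioned above.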
Meilir Page-Jones describes a class/implementation as being in one of four “domains”:
A business class represents something in the business, which may be either:
- an entity class, such as “Person”, or “Contract”,
- an attribute class, such as a “Balance” (of a bank account), or a “Unit cost” (of a product),
- a role class such as “Customer”, or “Patient”, or
- a relationship class, such as “AccountOwnership” or “PatientSupervision”.
An application class represents something specific to an application, and may be either
- an event-recognizer, which is a software construct that monitors input to check for the occurrence of specific events in (messages from) the environment. For example,
this might be a “PatientTemperatureMonitor” that looks for the event “Patient becomes hypothermic”, or
- an event-manager, which carries out the appropriate policy when an event of a given type occurs. For example, the event “Patient becomes hypothermic” is a message from
the appropriate event-recognizer to the class “WarmHypothermicPatient”, which in turn sends the appropriate messages to other objects to increase the patient’s warmth and summon medical
attention.
An architectural class concerns the specifics of an implementation in a particular computer. This might be one of the following:
- a human interface class such as a “Window” or “Command Button”,
- a database manipulation class such as a “Transaction” or “Backup”, or
- a machine-communication class, such as “Port” or “RemoteMachine”.
A foundation class is one that is widely usable. Foundation classes include:
- a fundamental class such as “Integer”, “Boolean”, and “Char”,
- a structural class that implements a data structure, such as “Stack”, “Queue”, etc.
- a semantic class such as “Date”, “Time”, “Angle”, and so forth. These classes have richer meaning than the fundamental classes. In addition, their attribute
values may be expressed in specific units, such as “feet” or “seconds”.[2]
We can probably assert a business rule that a class that is in one of the domains listed can only inherit from other classes in the same domain (business class to business
class, foundation class to foundation class, etc.).
Note that an attribute of class is the “Program Code” that implements it. This is in addition to the “Name” of the class.
To provide for a bit more flexibility, the model redundantly also asserts that each class must be an example of one and only one class domain. These are the same domains represented as sub-types,
above. That is, “Business Class”, “Application Class”, “Architectural Class”, and “Foundation Class” are all class domains. Each class domain, however, may be composed of one or more other
class domains. That is, the class domain structure allows for specification of the sub-domains listed above, which are not shown on the model as sub-types.
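The domain structure and the proposed inheritance rule might be sketched as follows. This is a minimal Python illustration; it assumes that “the same domain” means the same one of the four top-level domains, and the names ClassDomain, top_level, and may_inherit are invented for the example.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class ClassDomain:
    name: str
    # A sub-domain is part of one other class domain; top-level domains have none.
    parent: Optional[ClassDomain] = None

    def top_level(self) -> ClassDomain:
        """Walk up the structure to one of the four top-level domains."""
        domain = self
        while domain.parent is not None:
            domain = domain.parent
        return domain


def may_inherit(child: ClassDomain, parent: ClassDomain) -> bool:
    """Assumed reading of the rule: both classes sit under the same top-level domain."""
    return child.top_level() is parent.top_level()


business = ClassDomain("Business Class")
foundation = ClassDomain("Foundation Class")
role = ClassDomain("Role Class", parent=business)
entity = ClassDomain("Entity Class", parent=business)
semantic = ClassDomain("Semantic Class", parent=foundation)
```

Here a role class may inherit from an entity class, since both are business classes, but not from a semantic class.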
Be aware, by the way, that there are other ways to classify classes as well, but we won’t go into those here.
In Figure 2, we show that each class may be described by one or more class elements. A class element is an attribute of one and only one class, describing it, just as an attribute
described an entity/class in the first article of this series. In object-oriented design, however, there are two kinds of attributes: an instance attribute takes on a different value for every
occurrence of the class. The attribute “Name” for the class “person” is different for each person. This is an instance attribute.
You can also have class attributes. In entity relationship modeling, these are usually handled by creating a parent entity, but in object-oriented design, they can be dealt with more directly and
more intimately within the entity being described. A class attribute for “contract”, for example could be “Next Contract Number”.
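For readers less familiar with the distinction, a small Python sketch may help. The class Contract and the attribute customer_name are assumptions for illustration; only “Next Contract Number” comes from the text.

```python
class Contract:
    # Class attribute: a single value shared by the class as a whole.
    next_contract_number = 1

    def __init__(self, customer_name: str) -> None:
        # Instance attributes: each contract has its own values.
        self.customer_name = customer_name
        self.contract_number = Contract.next_contract_number
        # Advance the shared counter for the next contract created.
        Contract.next_contract_number += 1


first = Contract("Acme")
second = Contract("Zenith")
```

Each contract carries its own number and customer name, while the next number to be assigned belongs to the class rather than to any one contract.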
Instance attributes are of two kinds. Discrete instance attributes, such as “State”, or “Color”, take values from a discrete list. Other instance attributes, such as “specific gravity”, take
values from a continuous range. A particular kind of discrete instance attribute, state, will be described in detail, below.
A discrete instance attribute may be given one or more legal values. Since we don’t have “polymorphism” in this model to deal with varying formats, it is necessary to have the explicit
attributes of legal value be “Text Value”, “Date Value”, and “Numeric Value”. A business rule decrees that only one would be used for an instance of an object in
this class.
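The rule that only one of the three value attributes is used could be enforced roughly as shown here. This is a Python sketch under the assumption that an unused attribute is simply left empty; the names are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Union


@dataclass(frozen=True)
class LegalValue:
    text_value: Optional[str] = None
    date_value: Optional[date] = None
    numeric_value: Optional[float] = None

    def __post_init__(self) -> None:
        # Business rule: exactly one of the three attributes is filled in.
        filled = [
            v
            for v in (self.text_value, self.date_value, self.numeric_value)
            if v is not None
        ]
        if len(filled) != 1:
            raise ValueError("give exactly one of text, date, or numeric value")

    @property
    def value(self) -> Union[str, date, float]:
        """Return whichever of the three attributes is in use."""
        return next(
            v
            for v in (self.text_value, self.date_value, self.numeric_value)
            if v is not None
        )


colors = [LegalValue(text_value="Red"), LegalValue(text_value="Green")]
```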
Note that class element also has the attribute “Visibility”. Is this class element (is this instance attribute, for example) visible to any part of a system outside the class it is part
of? Visibility is of at least three kinds:
- Public – the class element may be seen and used by any other class or operation.
- Protected – the class element may be seen and used only within its class implementation and those which are inheriting from that class.
- Private – the class element may only be seen within the context of its class.
There are other kinds of visibility that are implemented by specific object-oriented languages, but these three are the ones most commonly used.
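The three kinds of visibility can be restated as a small check. The sketch below is in Python, which does not itself enforce these rules, so the function can_see and the sample classes are purely illustrative assumptions.

```python
from enum import Enum


class Visibility(Enum):
    PUBLIC = "public"
    PROTECTED = "protected"
    PRIVATE = "private"


def can_see(visibility: Visibility, owner: type, viewer: type) -> bool:
    """May code in class `viewer` use a class element owned by class `owner`?"""
    if visibility is Visibility.PUBLIC:
        return True
    if visibility is Visibility.PROTECTED:
        # The owning class itself, or any class inheriting from it.
        return issubclass(viewer, owner)
    # PRIVATE: only the owning class.
    return viewer is owner


# Hypothetical classes used only to exercise the check.
class Person: ...
class Customer(Person): ...
class Invoice: ...
```

With these definitions, a protected element of Person is usable from Customer but not from Invoice, and a private element of Person is usable from neither.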
Note that a class element may itself be the use of another class. For example, the instance attribute “Name” could itself be a class.
As implied by the definition of its domain described above, a business class may be derived directly from the entity/relationship (or object) model created during requirements analysis.
Specifically, as shown in Figure 3, a business class may be based on one or more class definitions, each of which is in turn the use of either an entity/class or a relationship.
That is, a class definition is the fact that a particular analysis artifact (entity/class, or relationship) is implemented as a class – specifically, a business class.
Similarly an attribute definition is the fact that a particular attribute from the entity/relationship model is implemented as an instance attribute. That is, an attribute definition is the use of
an attribute as an instance attribute. (As pointed out above, the instance attribute may itself be the use of another class.)
Objects
Figure 4 shows that a class may be embodied in one or more actual objects. That is, an object is an instance of one and only one class.
It is possible, for example, to define a class Flap (as on an airplane wing) and then to discuss the specific objects leftFlap.Flap and rightFlap.Flap, where leftFlap and rightFlap are
specific objects of the class Flap. Then, when the program is run, there may be many occurrences of leftFlap.Flap and rightFlap.Flap.
Note that in this model – and in program code – the object described here is still the definition of an object, not the object itself. It is a piece of program code that describes an
instance of a class. When the program is run, there will in turn be one or more instances of the object, each with its own identifier (or “handle” as it is known) and with its own values
for attributes.
When using a relational database implementation, it is these run-time occurrences of objects that will constitute rows in a relational table. Since there may be several different objects defined
for the class, the table will require an additional column (“Object Type” or some such), to identify which object this instance is an instance of. In a table describing “Flaps”, for example,
for any particular instance of one of the objects, the “Object Type” would be either “leftFlap” or “rightFlap”.
Thus we have three levels of instantiation in object-oriented design: the class, the object, and occurrences of the object. This is as opposed to the analysis situation where you have only
two: the entity and occurrences of that entity.
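The three levels, and the extra “Object Type” column, might be pictured with the following Python sketch. The attribute angle, the handle counter, and all of the names are assumptions for illustration; only Flap, leftFlap, and rightFlap come from the text.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class ClassDef:
    """Level 1: the class, for example Flap."""
    name: str


@dataclass(frozen=True)
class ObjectDef:
    """Level 2: an object as defined in program code, for example leftFlap."""
    name: str
    of_class: ClassDef


_handles = count(1)  # hypothetical source of run-time identifiers


@dataclass
class Occurrence:
    """Level 3: an occurrence of the object that exists while the program runs."""
    object_def: ObjectDef
    angle: float
    handle: int = field(init=False)

    def __post_init__(self) -> None:
        self.handle = next(_handles)

    def as_row(self) -> dict:
        """One row in the table for the class, including the Object Type column."""
        return {
            "handle": self.handle,
            "object_type": self.object_def.name,
            "angle": self.angle,
        }


flap = ClassDef("Flap")
left_flap = ObjectDef("leftFlap", flap)
right_flap = ObjectDef("rightFlap", flap)
rows = [Occurrence(left_flap, 10.0).as_row(), Occurrence(right_flap, 12.5).as_row()]
```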
In another example (described in more detail below), when run, the statement New:hom1.Hominoid creates instances of the object hom1 of the class Hominoid.
(Mr. Page-Jones follows programming conventions and uses a period to separate an object name from its class in written descriptions. UML, on the other hand, uses a colon to separate an object name
from its class name.[3] In either case, the expression is underlined to denote its referring to an object, not a class.)
Persistence
Object-oriented programming may not have to be concerned with a physical database at all. It is perfectly common to define objects that only survive for the period that the program defining them is
running. In business applications, however, it may be necessary to preserve an object’s identity and data beyond the life of the program. That is, it is necessary to maintain
persistent objects. Given current technology, this is typically done by storing the objects in relational tables and columns.
Figure 5 shows that each class may be made persistent in one or more persistence mechanisms. Currently, the most common persistence mechanisms are tables and columns. Classes are typically
made persistent in tables, and instance attributes are typically made persistent in columns. This doesn’t mean, however, that others might not also be used. Historically, they
have been such things as ISAM files, network databases, and other kinds of data storage technology.
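To make the class-to-table and attribute-to-column mapping concrete, here is a small sketch using Python's standard sqlite3 module. The class Contract, its attributes, and the helper functions are assumptions for illustration; real persistence layers are considerably more involved.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class Contract:
    contract_number: int
    customer_name: str


def create_table_sql(cls) -> str:
    """The class becomes a table; each instance attribute becomes a column."""
    type_map = {int: "INTEGER", str: "TEXT", float: "REAL"}
    columns = ", ".join(f"{f.name} {type_map[f.type]}" for f in fields(cls))
    return f"CREATE TABLE {cls.__name__} ({columns})"


def save(conn: sqlite3.Connection, obj) -> None:
    """Store one object as one row."""
    placeholders = ", ".join("?" for _ in fields(obj))
    conn.execute(
        f"INSERT INTO {type(obj).__name__} VALUES ({placeholders})", astuple(obj)
    )


def load_all(conn: sqlite3.Connection, cls) -> list:
    """Rebuild objects from the rows of the class's table."""
    return [cls(*row) for row in conn.execute(f"SELECT * FROM {cls.__name__}")]


conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(Contract))
save(conn, Contract(1001, "Acme"))
restored = load_all(conn, Contract)
```

The restored objects outlive the ones originally created, which is the sense of persistence intended here.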
State
An instance attribute may be a state, which describes a condition for each object which is an instance of a class. A state is an instance attribute that is controlled by business rules which in
turn constrain how an object may move from one value to another. (See Figure 6.)
Note that the complete “state” of a class is the sum of the values of all its state attributes.
As stated above, a state, as a discrete instance attribute, may be given one or more legal values. In the case of state, however, a business rule states that it
must be given one or more legal values.
A transformation is a rule for changing the value of a state from one legal value to another. A business rule asserts that a transformation specifically applies to the conversion of one legal
value to another for a state, not for any other kind of discrete instance attribute.
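A state with its legal values and transformations might be sketched as follows in Python. The “Status” example and its values are hypothetical, as are the method names.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    name: str
    legal_values: frozenset
    # Each transformation is an allowed (from value, to value) pair.
    transformations: set = field(default_factory=set)

    def __post_init__(self) -> None:
        # Unlike other discrete instance attributes, a state must have legal values.
        if not self.legal_values:
            raise ValueError("a state must be given one or more legal values")

    def add_transformation(self, from_value: str, to_value: str) -> None:
        # A transformation may only connect legal values of this same state.
        if from_value not in self.legal_values or to_value not in self.legal_values:
            raise ValueError("both ends must be legal values of this state")
        self.transformations.add((from_value, to_value))

    def may_change(self, from_value: str, to_value: str) -> bool:
        return (from_value, to_value) in self.transformations


status = State("Status", frozenset({"Draft", "Signed", "Closed"}))
status.add_transformation("Draft", "Signed")
status.add_transformation("Signed", "Closed")
```

In this example a contract may move from “Draft” to “Signed” but not directly from “Draft” to “Closed”.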
Note, by the way, that this concept of state can also be implemented in a corresponding way in a relational design. Because object-orientation began in the real-time systems world, however, the
concept is more central to this approach.
Behavior
Figure 7 adds operation to the model. An operation is a function that is performed by objects in a class. Typically, an operation acts on one or more instance attributes, although
it need not.
“Visibility” is also an attribute of operation. That is, as with class elements, an operation may be seen throughout the system, within its own class, or only within its class and its sub-types.
Note that what relational programmers would consider an attribute may in fact be implemented as a call to an operation that returns the requested value. In object-oriented land, it
doesn’t matter whether the value was stored in a table or derived in some other way.
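A brief Python sketch shows the idea. The class Account and its operations are hypothetical; the point is only that the caller cannot tell a stored value from a derived one.

```python
class Account:
    def __init__(self) -> None:
        # Deposits are positive amounts, withdrawals negative.
        self._transactions: list[float] = []

    def post(self, amount: float) -> None:
        self._transactions.append(amount)

    @property
    def balance(self) -> float:
        # Looks like an attribute to the caller, but is computed on request.
        return sum(self._transactions)


account = Account()
account.post(100.0)
account.post(-30.0)
current = account.balance  # read exactly as a stored attribute would be
```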
Mr. Page-Jones uses an example in his book, written in his version of a generic object-oriented language. His class Hominoid is a video game character that turns right or left and goes
forward. It can detect if it is facing a wall and must turn. It is described as follows:
Hominoid
Essentially, the definition of the class is in terms of its operations. These include New, which creates an instance of Hominoid at run time, plus turnLeft, turnRight,
advance, and display. It does have two instance attributes (location and facingWall), but as noted above, these are each a call to an operation that
will return a value. So, even the instance attributes refer to operations as well.
The instance attribute facingWall, by the way, is an example of a state, with legal values “Yes” and “No”.
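For readers who would like to see the shape of such a class in a current language, here is a loose Python rendering. It is not Mr. Page-Jones's code: the square grid, the starting position, and the way walls are detected are all assumptions made so that the sketch can run.

```python
class Hominoid:
    # Unit steps for north, east, south, west (an assumed representation).
    _HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]

    def __init__(self, start=(0, 0), grid_size=5) -> None:
        # Plays the part of the operation New.
        self._x, self._y = start
        self._heading = 0  # initially facing north
        self._grid_size = grid_size

    @property
    def location(self):
        # An "attribute" that is really a call to an operation.
        return (self._x, self._y)

    @property
    def facing_wall(self) -> bool:
        # A state with two legal values, here True ("Yes") and False ("No").
        dx, dy = self._HEADINGS[self._heading]
        nx, ny = self._x + dx, self._y + dy
        return not (0 <= nx < self._grid_size and 0 <= ny < self._grid_size)

    def turn_left(self) -> None:
        self._heading = (self._heading - 1) % 4

    def turn_right(self) -> None:
        self._heading = (self._heading + 1) % 4

    def advance(self, no_of_squares: int) -> bool:
        """Move forward one square at a time; return False if a wall stops the move."""
        for _ in range(no_of_squares):
            if self.facing_wall:
                return False
            dx, dy = self._HEADINGS[self._heading]
            self._x, self._y = self._x + dx, self._y + dy
        return True

    def display(self) -> str:
        return f"Hominoid at {self.location}"


hom1 = Hominoid()
advance_ok = hom1.advance(3)
```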
An operation must be implemented by a method, a piece of program code that carries it out. Like other kinds of program code, this is a kind of module, where a module is any piece of code,
as we defined it in the previous article. Another kind of module is a package, which is a collection of classes. Actually, a module may be composed of other modules, so a method may be composed of
other methods, and a package may be composed of other packages.
An object behaves by having its operations send messages to other objects, thus triggering those objects’ operations. Specifically, as shown in Figure 8, each message is from one object
to another object. The message is actually sent by an operation that is performed by (the “from”) object in a class. Specifically, this is the class that is embodied in
the object that is the source of the message. Each message then acts as a trigger to invoke one of the operations that is performed by the class that the receiving object
is an example of.
If the messages are asynchronous, there may be a message queue in front of the receiving object to store messages until they can be processed. That is, messages which are
concurrent or asynchronous must be stored until the receiving object can process them. Hence, each message must be either to another
object, or to a message queue.
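A message queue in front of a receiving object might look something like this in Python. The classes QueuedObject and TemperatureLog are hypothetical, and the sketch ignores real concurrency; it shows only that messages wait in order until the receiver gets to them.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    operation: str          # name of the operation to be triggered
    arguments: tuple = ()


class QueuedObject:
    """A receiving object with a message queue in front of it."""

    def __init__(self, target) -> None:
        self._target = target
        self._queue = deque()

    def post(self, message: Message) -> None:
        # The sender does not wait; the message is simply stored.
        self._queue.append(message)

    def process_pending(self) -> int:
        """Invoke the triggered operations in arrival order; return how many ran."""
        handled = 0
        while self._queue:
            message = self._queue.popleft()
            getattr(self._target, message.operation)(*message.arguments)
            handled += 1
        return handled


class TemperatureLog:
    def __init__(self) -> None:
        self.readings = []

    def record(self, value: float) -> None:
        self.readings.append(value)


log = TemperatureLog()
inbox = QueuedObject(log)
inbox.post(Message("record", (36.5,)))
inbox.post(Message("record", (34.9,)))
```

Nothing is recorded until process_pending is called, at which point both readings arrive in the order they were sent.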
Again, in this model, as in the program code involved, we are dealing with the definition of a message (describing it, as well as its normal source and destination). Actual messages are
created when the program is run.
A message must be an example of a message type. Message types include “informative”, which provides an object with information to update itself, “interrogative”, which requests an object
to reveal something of itself, and “imperative”, which requests an object to take some action upon itself.
Each message may include one or more input or output message arguments, as shown in Figure 9. Each message argument must be for a particular message and it must be a reference to another
object. Message arguments may be either input arguments or output arguments, as determined by the value of each message argument’s “In indicator” and “Out indicator”. Both indicators are
present, since the same message argument could be both an input and an output argument. In a program, these arguments are shown with input arguments first (optionally preceded by the word “in”),
followed by the word “out” and the output arguments, optionally followed by “inout” and any arguments that are both input and output arguments.
Each argument is itself typically a reference to an object, but this can be an object in a “Foundation Class” – such as a kind of “integer”, “character”, or some such. In the example above,
“noOfSquares” could be an object in the class “Integer”.
In Mr. Page-Jones’ example, if hom1 is defined as an object of class Hominoid then a message advance would be specified as hom1.advance(noOfSquares, out advanceOK), where noOfSquares is an input
parameter (the number of squares to advance) and advanceOK is an output parameter (whether or not the advance was successful).[5] Again, a run-time
occurrence of hom1 would have an object id and would in fact advance a particular number of squares (like “5”).
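The two indicators and the written form of a message can be tied together in a short Python sketch. The names MessageArgument and render_call are assumptions, and the output follows the notation as it is written in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageArgument:
    name: str
    in_indicator: bool = False
    out_indicator: bool = False


def render_call(target: str, operation: str, arguments: list) -> str:
    """Write a message with input arguments first, then "out", then "inout"."""
    ins = [a.name for a in arguments if a.in_indicator and not a.out_indicator]
    outs = [a.name for a in arguments if a.out_indicator and not a.in_indicator]
    inouts = [a.name for a in arguments if a.in_indicator and a.out_indicator]
    parts = []
    if ins:
        parts.append(", ".join(ins))
    if outs:
        parts.append("out " + ", ".join(outs))
    if inouts:
        parts.append("inout " + ", ".join(inouts))
    argument_text = ", ".join(parts)
    return f"{target}.{operation}({argument_text})"


advance_arguments = [
    MessageArgument("noOfSquares", in_indicator=True),
    MessageArgument("advanceOK", out_indicator=True),
]
text = render_call("hom1", "advance", advance_arguments)
```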
Figure 10 shows that a message to an object may be acting as one or more state triggers, each of which must be (the trigger) of one transformation from one legal value to another legal value. The
two legal values must be of a state that is part of the class that is embodied in the destination object.
In this model, the business rule governing the transformation is simply presented as a text attribute of state trigger. Perhaps a more sophisticated model could represent the structure of such a
rule more explicitly. This is left as an assignment for the reader.
A Personal Comment
I would be dishonest if I did not confess that this article was by far the most difficult of the three repository articles. Indeed, it is one of the most difficult I have ever written. I have been
pleased throughout my career to be able to take my data modeling technique to any industry and within a few weeks understand that industry better than many people who work there. This is the first
time I have taken it to my own industry. The experience has been very illuminating.
For the last several years there has been friction between the object-oriented aficionados and those more schooled in relational technology. I confess to having contributed my part to that
friction. The problem has been that the language and the perspectives of the two groups are very different. The fascinating thing about putting together this article has been that finally I have
been able to dissect the object-oriented terms in a way that (it is to be hoped) can make them clearer to all, and perhaps to clarify the sources of some of the disputes.
You will find personal observations in the article to be sure, but I have tried hard to be as objective and honest as possible in presenting each concept. Please feel free to take me to task if you
believe I have failed in that anywhere in the discussion.
And of course, correct me where I am simply mistaken. As I have stated in the previous articles, should you, dear reader, take exception to any of the models presented above – good! It is about
time we had a discussion on the specific content we expect in a repository, instead of being surrounded by fluff pieces talking about what a good idea it is.
The purpose of a data model is to be wrong. This one represents your author’s best guess as to the truth, and it is there for people to correct. Tell me exactly which assertions (entities and/or
relationships) you disagree with.
Please either write to me at davehay@essentialstrategies.com or post your disagreements to the Data Management Mailing list. You may subscribe to this list by sending an e-mail to
dm-discuss-subscribe@egroups.com, or go to its homepage at http://www.egroups.com/list/dm-discuss.
Alternatively, if you think these models are completely wrong, please submit your own article to TDAN.com describing your counter argument. Send it to rseiner@tdan.com. I am sure Bob Seiner would
be glad to hear from you.
In your disagreements, I ask only two things:
1. The model is a set of assertions in the form: “Each [first entity] must be (where the line next to the first entity is solid) or may be (where the line next to the first entity is dashed) [relationship name] one or more (where
there is a “crow’s foot” next to the second entity) or one and only one (where there is no “crow’s foot” next to the second entity) [second entity].”
(For example, “Each column must be part of one and only one table; each table may be composed of one or more columns.”)
Please express counter assertions in the same form. Yes, it is true that this is an unconventional approach to defining relationship names, and it is hard. But it is hard because to come up with a
reasonable name (one that sounds perfectly obvious to the reader), you must really understand the nature of the relationship.
If UML is used, each can be shown as a role name.
2. If you draw an alternative model, organize it so that the crow’s feet (or the asterisks, if you use UML) are to the left or the top of the model. This tends to put reference entities in the
lower right part of the diagram, and intersect or transaction entities in the upper left. It provides a consistent organization for the drawing, and makes it easier for all to see where the
differences are.
I look forward to hearing your comments and observations.
[1] – Meilir Page-Jones, Fundamentals of Object-Oriented Design in UML, Addison-Wesley, (Reading, MA: 2000).
[2] – Ibid., pages 233-240.
[3] – Grady Booch, James Rumbaugh, and Ivar Jacobson, The Unified Modeling Language User Guide. Addison-Wesley. (Reading, MA: 1999), page 185.
[4] – Ibid., page 6.
[5] – Ibid., page 22.