When I Learned About Entities


Everything Looked Like an Instance

Looking Back

The conference room had a glass wall; I guess to discourage any literal groping. Watching the debate through the glass set me reeling through near-lost recollections. My endless discussions were
now a blur but made the scene behind the glass familiar to me. Years ago I was the one who had been so earnest endlessly asking, “What is a customer? What is a product?” Teams spent days trying
to hammer out legalistic definitions of these Entity Types. Looking back, I realize none of it mattered. Attributes became columns; Relationships became foreign keys. What did we ever do with those
carefully worded definitions?

To me now, “What is a car?” seemed a stupid question for these analysts at a rent-a-car company to ask. Everyone in the conference room knew what a car was but had convinced themselves they
needed an airtight definition. They asked themselves, “What if it were not intended for use on the roads?” “What if we didn’t own it?” “What if we had not yet received it?” Did these
analysts realize that Plato had struggled with these kinds of questions 2,000 years ago? Had they reviewed current (and not to be recommended) literature on fuzzy logic? Had they ever successfully
used Information Engineering or OO techniques to build a system?

There is no single definition of “car” that will satisfy everyone who works with cars at this rent-a-car company. The concerns of the Acquisitions Department are different from the Rental
Department, and from Leasing, and from Disposal. Those analysts who put so much energy into the definition of “car” as an Entity Type are bound to be frustrated because the concept of Entity Type
alone cannot describe the various ways the owners conceptualize a car. To model “car” the way the owners use the term you must not only see “car” as an Entity Type. “Car” exists in various
States, as a part of a Superclass, and as a part of more complex business rules that are made of any number of these concepts used in combination. An analyst who understands only the concept of
Entity Type overworks that concept and misuses it. It is as if he or she had only a hammer, and hence everything looks like a nail, including his or her thumbs.

Insisting on arriving at a single concise definition of an Entity Type is a mammoth waste of time. To the extent that a definition advances your understanding of an Entity Type, keep pressing the
owners to define it. When wordsmithing distracts from analysis, when everyone knows what the Entity Type is but cannot agree on the wording, just drop it. Regardless of
what kind of definition you wind up with, your success depends on finding out from the owners all the Predicates and the Arguments on which the value of each Predicate depends.

(Editor’s note: I didn’t know what a Predicate or an Argument were either. Since he had said he didn’t like to define things, I also wondered whether the author was going to tell me
what they were. He does.)

Basic Parts of a Proposition

A proposition is “a formal statement of mathematical truth to be proved or demonstrated.” A data model is composed of propositions of the form “x can be known of thing y.” Propositions are
made up of the Predicate (the x) and one or more Arguments (the y). The Predicate is what we want to know; the Argument is what we know about. The meaning of Argument we are using here is “one of
the independent variables upon whose value that of a function depends.”

Assume that data models only contain propositions that are true.

A Predicate describes some thing: for example, “How many US$ did the company receive” (for a car’s last rental.) The car’s last rental is the thing which is described by the Predicate, “How
many US$ did the company receive.” Be careful to distinguish that which you can know, a Predicate, and that which you do know. As the example suggests, Predicates can be stated as
questions. You can know how many US$ the company received. What you do know is expressed as a Statement: such as, “Car 54 last rented for $250.” (The concept Predicate is a
Superclass of the concept Attribute and the concept Relationship.)

The other half of a Proposition is the Argument. The Argument identifies the phenomenon that the Predicate describes. As you read above, “(for a car’s last rental.)” is the Argument. The Argument
or Arguments are the givens; the Predicate is the goal. For our purposes, the Arguments are usually symbols representing some thing in the real world. In the proposition “Given x grade of gas, at
y gas station, then the price is n,” “grade of gas” and “gas station” are Arguments.

A Predicate cannot be evaluated without first evaluating the Arguments; that is, the value of the Predicate depends on the value of the Arguments. (That should ring a bell, by Codd.) In the example
above, to determine n you must first know the values for x and y. The answer to the question “How many US$ did the company receive” depends on which car and the deal the company made with the
last renter. The Proposition can be stated, “‘How many US$ did the company receive’ is a function of (which car, which contract).”
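
Read this way, a Proposition behaves like a lookup function. The following is only a minimal sketch in Python of that reading; the names `US_PAID` and `us_paid` are invented for illustration, and the single sample value is the "Car 54 last rented for $250" Statement from earlier, paired with an assumed contract number.

```python
from decimal import Decimal

# Hypothetical store of Statements: the Arguments (car, contract) are the key,
# and the evaluated Predicate "US$ Paid" is the value.
US_PAID = {
    ("54", "ABC"): Decimal("250.00"),
}

def us_paid(car, contract):
    """Evaluate the Predicate 'How many US$ did the company receive'.

    The Predicate cannot be evaluated until both Arguments are known,
    so both are required parameters.
    """
    return US_PAID[(car, contract)]

print(us_paid("54", "ABC"))
```

Asking the question without both Arguments is not meaningful here: the function has no way to answer for a car alone.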

Note that record-keeping requires us to substitute a symbol to stand for a car (the car’s identifier) and a symbol to stand for the actual contract (the contract number); it is nonetheless the car
itself that was rented. Predicates need to be dependent on real-world phenomena because only real-world things have the essential behaviors that are mirrored in the data model. One such behavior is
acquisition; when a car is purchased it is available to rent even before it is assigned a silly number. The things we can know about a thing depend on the thing itself and not on the identifier,
even though we have become accustomed to saying the “value of the Attribute depends on the key and….” An Argument that does not represent something in the real world must be questioned.

The following diagram of a Proposition should be familiar to you even if the term Proposition is foreign. Note that the Arguments are marked with an asterisk; the Predicate is unmarked.

Car* | Contract* | US$ Paid

The proposition, “‘How many US$ did the company receive’ is a function of (which car, which contract)”, is more familiar to you as this record layout where the Arguments are the key fields and
the Predicate is the result field.

A system has any number of Predicates. Finding all the Predicates and understanding them is what Analysis is all about. Other Predicates might be “how many US$ did the renter pay in sales tax?”
or “how many miles did the renter drive?” Using the notation from above we would have:

Car* | Contract* | St. Sales Tax Paid

Car* | Contract* | Miles Driven

When you know all the Predicates, the job of assembling the data model is a relatively simple exercise of fixed rules.

First rule: Put all Predicates that have the same Argument or Arguments together. We usually extend the previous diagram to describe this set. (Again, the Arguments are marked with an asterisk.)

Car* | Contract* | US$ Paid | St. Sales Tax Paid | Miles Driven

Looks like a BDAM record layout. Well, how else would you do it? We have to understand the meaning of each of the individual Predicates, but the set as a whole has no need of definition. The set of
all the Predicates that have the same Argument is our concept of a thing or phenomenon; there is little need for a definition if this set is properly constructed.
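
The first rule is mechanical enough to sketch in a few lines of Python. This is an illustration under assumptions, not a prescribed tool: the function name `group_by_arguments` is invented, and the "Color" Predicate on (Car) is a hypothetical extra added only to show that Predicates with different Arguments land in different sets.

```python
from collections import defaultdict

# Each Proposition is written as (Predicate, Arguments).
propositions = [
    ("US$ Paid", ("Car", "Contract")),
    ("St. Sales Tax Paid", ("Car", "Contract")),
    ("Color", ("Car",)),  # hypothetical Predicate, not from the rental example
    ("Miles Driven", ("Car", "Contract")),
]

def group_by_arguments(props):
    """First rule: put all Predicates with the same Arguments together."""
    layouts = defaultdict(list)
    for predicate, arguments in props:
        layouts[arguments].append(predicate)
    return dict(layouts)

layouts = group_by_arguments(propositions)
for arguments, predicates in layouts.items():
    print(arguments, "->", predicates)
```

Each key of `layouts` plays the role of the key fields in the record layout, and each list holds the result fields.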

Getting all the Predicates together in this way is more than convenient; when you follow this rule you have put yourself in motion towards the ideal of integration. Integration means that
everything you know about something is in one place. One advantage of integrated data is that when any one of the Arguments of a Predicate vanishes then all the Predicates that depend on that
Argument disappear in the same instant. So, when a rental contract is canceled then the sales revenue and the sales tax payable disappear at the same instant.
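
A small Python sketch of that advantage, assuming an in-memory dictionary stands in for the integrated store. The name `cancel_contract` and the sample rows are illustrative assumptions.

```python
# Hypothetical integrated store: one entry per (car, contract) Argument,
# holding every Predicate that depends on that Argument.
rentals = {
    ("54", "ABC"): {"us_paid": 250.00, "sales_tax_paid": 12.50, "miles_driven": 444},
    ("55", "DEF"): {"us_paid": 300.00, "sales_tax_paid": 14.00, "miles_driven": 555},
}

def cancel_contract(store, car, contract):
    """Remove the Argument; every dependent Predicate goes with it.

    Returns the removed Predicates, or None if no such contract exists.
    """
    return store.pop((car, contract), None)

cancel_contract(rentals, "54", "ABC")
print(sorted(rentals))
```

Because the revenue and the sales tax live under the same key, there is no second place where a canceled contract's tax could linger.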

The set of all Predicates for a given Argument has particular significance in the building of databases. But, it is often important to ignore some Predicates. Sometimes the owners do not
need to see all the Predicates, some of the Predicates are enough. Another way of saying, “some of the Predicates” is “a subset of the Predicates.” This subset of Predicates is called a
Projection. The following is a Projection of the Predicates for the Argument (Car, Contract).

Car | Contract | US$ Paid | St. Sales Tax Paid

Since there are a very large number of possible projections for any set of Predicates, Projections are only documented when there is a need for the Projection. The example given above might be
needed to remit sales taxes.
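
As a rough Python illustration, a Projection keeps the Arguments and drops the Predicates the owner does not need. The function name `project`, the field names, and the sample row are assumptions made for this sketch.

```python
# Arguments always travel with a Projection; only Predicates are dropped.
ARGUMENTS = ("car", "contract")

def project(row, predicates):
    """Return the Arguments plus the chosen subset of Predicates."""
    wanted = ARGUMENTS + tuple(predicates)
    return {name: row[name] for name in wanted}

# Sample row with all three Predicates evaluated (illustrative values).
row = {
    "car": "54",
    "contract": "ABC",
    "us_paid": "250.00",
    "sales_tax_paid": "12.50",
    "miles_driven": 444,
}

# The Projection needed to remit sales taxes.
tax_view = project(row, ["us_paid", "sales_tax_paid"])
print(tax_view)
```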

A State is a Projection for which owners give rules regarding the inter-relationship of the Predicates and the evaluation of specific Predicates. For example, the owners may decree that “a
completed car contract should have an amount charged, a resolution of the sales tax, and a number of miles driven.”

Any Proposition may have any number of consistent States. Analysts who do work with the concept of State are probably familiar with the State Transition Diagram, an example of which is shown below.

This diagram describes meta-data about how States interact, but it does not show how the Predicates of a Proposition are utilized by the State. Since a Predicate can be used by many States, and a
State can use many Predicates, a matrix is a natural way to document this interaction. For example, the Proposition whose Argument is (Car, Contract) might have Predicates and States (shown
below in ALL CAPS) as follows:

State             | US$ Paid       | St. Sales Tax Paid | Miles Driven
NEW CONTRACT      | Not Applicable | Not Applicable     | Not Applicable
COMPLETE CONTRACT | Required       | Required           | Required

The concept of State does not replace the concept of Proposition or Class. It is an additional description of the behavior of the owner’s world.
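
The State matrix can be carried into code more or less as written. The sketch below is one possible Python reading, not the owners' actual rules: it assumes "Required" means the Predicate must have a value and "Not Applicable" means it must not, and the names `STATE_RULES` and `violations` are invented.

```python
# The matrix, keyed by State, then by Predicate.
STATE_RULES = {
    "NEW CONTRACT": {
        "US$ Paid": "Not Applicable",
        "St. Sales Tax Paid": "Not Applicable",
        "Miles Driven": "Not Applicable",
    },
    "COMPLETE CONTRACT": {
        "US$ Paid": "Required",
        "St. Sales Tax Paid": "Required",
        "Miles Driven": "Required",
    },
}

def violations(state, row):
    """List the ways a row breaks the rules of the given State."""
    problems = []
    for predicate, rule in STATE_RULES[state].items():
        value = row.get(predicate)
        if rule == "Required" and value is None:
            problems.append(f"{predicate} is required")
        elif rule == "Not Applicable" and value is not None:
            # Assumption: a Not Applicable Predicate must be left unevaluated.
            problems.append(f"{predicate} must be empty")
    return problems

print(violations("COMPLETE CONTRACT", {"US$ Paid": 250}))
```

Adding a State then means adding a row to `STATE_RULES`; the checking logic does not change.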

Data

So far I have only discussed data about data. What about the data itself? That seems pretty important. Data is the evaluation of the Predicates for any Argument.

Car | Contract | US$ Paid | St. Sales Tax Paid | Miles Driven
54  | ABC      | $250     | $12.50             | 444

Each row of data (below the header row) represents some phenomenon and is called an Entity Instance. You can have as many rows as you have phenomena that interest you. Note that some analysts who confuse
the ID number with the real thing may think that you can have only as many rows as you have available IDs. These people need more fresh air.

Any system worth writing will have many Entity Instances. Typically we like to keep these all together. The set of all Entity Instances that have the same Predicates is called a Class.

Car | Contract | US$ Paid | St. Sales Tax Paid | Miles Driven
54  | ABC      | $250     | $12.50             | 444
55  | DEF      | $300     | $14.00             | 555

It really is that simple, and it must remain so. The concept of Class or Entity Type gets overly complex when analysts try to get these simple ideas to describe every complex behavior of objects in
the owner’s real world. For instance, rental cars become used cars. Cars come in varieties such as van or truck. A car could be available for rent before it is in inventory. You simply cannot
describe this entire behavior with just the concepts we have reviewed so far: Predicate, Argument, Proposition, Projection, Entity Instance, and Class. We have to add State,
Superclass, and Selection to our vocabulary.

In the above example of a Class I showed all the Predicates; that is, US$ Paid, St. Sales Tax Paid, and Miles Driven. What if I were to ignore a Predicate? What if I were to create a set of all
Entity Instances that had this subset of Predicates: US$ Paid, St. Sales Tax Paid? Would the set of all Entity Instances that had the Predicates of a Projection be different from the set of all
Entity Instances that had all the same Predicates? If you said no, you would be right, provided the only Entity Instances were those that we had already listed. But what if there were a
cellular phone rental record as well? Now I propose that a thing called a cellular phone rental exists and that it has the Predicates of US$ Paid, St. Sales Tax Paid, and Minutes Used.

Phone | Contract | US$ Paid | St. Sales Tax Paid | Minutes Used
A     | 123      | $20      | $1.60              | 10
B     | 456      | $40      | $3.20              | 20

Then the set of all Entity Instances that have US$ Paid and St. Sales Tax Paid is:

Item | Contract | US$ Paid | St. Sales Tax Paid
A    | 123      | $20      | $1.60
B    | 456      | $40      | $3.20
54   | ABC      | $250     | $12.50
55   | DEF      | $300     | $14.00

The Superclass, a set of Entity Instances that share a subset of their Predicates, is an extremely useful concept. You can imagine that a small number of Predicates could balloon into a large
number of Superclasses. As with the Projection, we only document those Superclasses for which the owners have some use. The Superclass is not for the convenience of the analyst; the Superclass
documents an important way of looking at their data. Depending on the context, owners will use a Superclass just as they would a Class. Do not invent Superclasses to create phony inheritance
schemes. The example gives the owners the set of all those things for which we will need to remit sales tax. If the Superclass is implemented as a union set then new types of transactions that
involve sales tax can be added to the system as easily as adding another file to the union specification. Programs that use the union set are unchanged as new members are added.
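
The union idea can be sketched in Python as follows. This is a minimal illustration, assuming each Class is an in-memory list of rows; `TAXABLE_MEMBERS`, `SHARED_COLUMNS`, and `taxable_transactions` are invented names, and the rows repeat the sample data above.

```python
from decimal import Decimal
from itertools import chain

car_rentals = [
    {"item": "54", "contract": "ABC", "us_paid": Decimal("250"),
     "sales_tax_paid": Decimal("12.50"), "miles_driven": 444},
    {"item": "55", "contract": "DEF", "us_paid": Decimal("300"),
     "sales_tax_paid": Decimal("14.00"), "miles_driven": 555},
]
phone_rentals = [
    {"item": "A", "contract": "123", "us_paid": Decimal("20"),
     "sales_tax_paid": Decimal("1.60"), "minutes_used": 10},
    {"item": "B", "contract": "456", "us_paid": Decimal("40"),
     "sales_tax_paid": Decimal("3.20"), "minutes_used": 20},
]

# The union specification: add a new member Class here and nothing else changes.
TAXABLE_MEMBERS = [phone_rentals, car_rentals]
SHARED_COLUMNS = ("item", "contract", "us_paid", "sales_tax_paid")

def taxable_transactions(members=TAXABLE_MEMBERS):
    """Yield every Entity Instance of the Superclass, shared columns only."""
    for row in chain.from_iterable(members):
        yield {name: row[name] for name in SHARED_COLUMNS}

total_tax_due = sum(row["sales_tax_paid"] for row in taxable_transactions())
print(total_tax_due)
```

A program that totals the tax only ever calls `taxable_transactions()`, so it never needs to know whether a row began life as a car rental or a phone rental.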

One of the principles of good data stewardship holds that at any point in time the data must be in a consistent State.

Please do not confuse Superclass and State as many authors have done. A cell phone contract does not become a car rental contract; but an employee does become a manager. Even if a cell phone
contract were nominally turned into a car rental contract then nothing of the old phone contract would be meaningful to the new contract. This is in contrast to the way a car can go through states.
The data about an accident involving a rental car is still useful to the used car salesperson that is trying to disguise the fact.

All of this dodges the hardest working concept of all, the Selection. I say it is hardest at work because it is so widely used. If you ask someone to give an example of a Class they are likely to
say something like, “all the blue things.” This is a set; but for purposes of information systems development it is better to reserve the word Class for a particular kind of set. A Class
is a set that defines membership in terms of what can be known about something, not what is known about something. The set of all things that have a color is a Class. The set of
all things that are blue is a Selection. Classes are based on the existence of certain Predicates. Selections are based on the values those Predicates might take on.
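
The distinction is easy to see in a short Python sketch. The list of things below is entirely made up for illustration; the point is only that Class membership tests whether a Predicate exists, while a Selection tests its value.

```python
# Hypothetical things; some have the Predicate "color", one does not.
things = [
    {"name": "sky", "color": "blue"},
    {"name": "fire truck", "color": "red"},
    {"name": "idea"},
    {"name": "ocean", "color": "blue"},
]

# Class: membership depends on what CAN be known (the Predicate exists).
things_with_color = [t for t in things if "color" in t]

# Selection: membership depends on what IS known (the Predicate's value).
blue_things = [t for t in things_with_color if t["color"] == "blue"]

print([t["name"] for t in things_with_color])
print([t["name"] for t in blue_things])
```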

Selections of Entity Instances can be of Classes or of Superclasses. The Selection below is of those Entity Instances whose sales taxes paid is greater than $2.00.

Item | Contract | US$ Paid | St. Sales Tax Paid
B    | 456      | $40      | $3.20
54   | ABC      | $250     | $12.50
55   | DEF      | $300     | $14.00

This example shows how the concepts can be combined to create a practically infinite number of ways for an owner to refer to the Entity Instances they are concerned with.
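
The Selection over the Superclass can be written as a simple filter. The Python below is a sketch under the same assumptions as the sample data in this article; `select` and `superclass_rows` are names invented for the example.

```python
from decimal import Decimal

# Entity Instances of the sales-tax Superclass (sample data from the text).
superclass_rows = [
    {"item": "A", "contract": "123", "us_paid": Decimal("20"), "sales_tax_paid": Decimal("1.60")},
    {"item": "B", "contract": "456", "us_paid": Decimal("40"), "sales_tax_paid": Decimal("3.20")},
    {"item": "54", "contract": "ABC", "us_paid": Decimal("250"), "sales_tax_paid": Decimal("12.50")},
    {"item": "55", "contract": "DEF", "us_paid": Decimal("300"), "sales_tax_paid": Decimal("14.00")},
]

def select(rows, condition):
    """Return the Entity Instances whose Predicate values satisfy the condition."""
    return [row for row in rows if condition(row)]

over_two_dollars = select(
    superclass_rows,
    lambda row: row["sales_tax_paid"] > Decimal("2.00"),
)
print([row["item"] for row in over_two_dollars])
```

Because `select` takes any condition, the same function serves a Class or a Superclass, which is where the practically infinite number of combinations comes from.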

Conclusion

There are a number of critical concepts needed to adequately describe the business rules that drive your information systems. If you do not know how to use all the concepts you will be tempted to
misuse or overuse the concepts you have mastered. The result is mundane.

You cannot describe the world without using the concepts of Predicate, Argument, Proposition, Projection, Entity Instance, Class, State, Superclass, and Selection. Together these concepts form a
new grammar suited for systems development. The noun, verb, object model you learned in school is less well suited to systems development. Both the old model and the new model require the student
to understand all the pieces to put together a meaningful sentence, that is, a sentence that tells us something about the real world. (Imagine! A sentence without a noun.)

About Bob Schmidt

Bob Schmidt consults for Stone Carlie Consulting in St. Louis, Mo and authors course work on data modeling. His CBT is distributed by IBM, Sybase, and agpw. His book, Data Modeling for Information Professionals, was published in August 1998 by Prentice Hall (ISBN-0-13-080450-9).
