A Word to Readers . . .This article represents your author’s first public use of the term “essential data model”. This is equivalent to what he has always referred to as an “architectural data model”, serving Row Three of his version of the Zachman Framework. He was never comfortable with Mr. Zachman’s use of the term “designer’s view” for that row because it was still about the nature of an organization, not about designing a solution. He struggled for a long time, really wanting “philosopher’s view”, but figuring that would be too arcane. Finally, he settled on “architect’s view”, which actually turned up in Mr. Zachman’s and Stan Locke’s Version 2 of the Framework.
Over the last few years, however, two things have happened: First, the advent of the Semantic Web and recognition that data models are really examples of ontologies has made the use of more traditional philosophical language more acceptable; second, the use of language in the data modeling world to describe conceptual/logical/physical/?? models had become completely confused. See your author’s 30-minute video on the subject at http:// tiny.cc/6c9zow.
More recently, he remembered that his company name “Essential Strategies, Inc.” had, after all, been derived from his reading of John Palmer’s and Steve McMenamin’s “essential data flow diagrams” in their book, Essential Systems Analysis.1 Moreover, he realized that all these years he’s been overlooking the obvious. He’s been building “essential data models” – that is, models that describe the “essence” of an organization – the relatively simple structures that underlie the complexity that everyone sees. So, starting now, the kinds of models he’s describing are “essential data models”.
Derived AttributesBack in ancient times (say, around 1980), your author encountered a combination manufacturing planning and database management system called the “Mitrol Information System”, or “MIMS”. Mitrol was a small company based in Cambridge, Massachusetts, founded by several MIT graduates. This was early in the life of both the manufacturing planning software and the database management industries, and the product was better than most examples in both. This meant, of course, when General Electric bought the company, they had no idea how to market the product, and it died after a few years. This is a pity, because it had some great features.
It had commands to manipulate product structures and do materials requirements planning – as part of the language. SQL still doesn’t do that very well. More significantly, the network DBMS understood the concept of the “computed field”. This allowed the database designer to define a field, not with a format, but with a formula. The idea was that whenever anyone queried the field, the formula would be executed to retrieve a value. In MIMS, this was a very powerful ability, allowing extensive manufacturing planning systems – that would otherwise have taken years to build – to be developed in a few months.
In the object-oriented world, no attribute actually represents a value. Rather, there is a “method” implicit that allows one to retrieve the value. This is a subtle distinction, but recognizing it provides an opportunity for some profound things in the relational world as well.
There was only one problem with taking advantage of this MIMS feature: When a system was implemented and someone asked for data, the lights would dim. The computers of that day were much slower than they are today.
It turned out that in designing a database, it was important to note those situations when the data could be computed on input and stored in the derived form. It is only in certain dynamic situations that the formulae should be used for queries. Thus, the decision as to which approach to take is a design decision, depending on both the dynamic quality of the data and the circumstances under which they will be retrieved.
In producing a business-oriented “essential” data model, however, it is not necessary to consider design decisions yet. It is instead both possible and desirable to identify potential derived attributes. When presenting a model with these attributes explained to a business audience, they become interested, and the model becomes much more meaningful to them. Derived attributes are to be found throughout your author’s latest book on data model patterns.2
In most cases, the derivations are obvious. For example, Figure 1 shows the standard model for orders. A modified version of the UML notation is being used here.3 It happens to have a feature that permits derived attributes to be flagged as such. Note the “/” symbol in front of some attributes. In Line Item the attributes “Quantity” is an ordinary attribute with a stored value. The attributes “Price” and “Extended Value”, are shown to be derived. UML has a language that can be used to specify the algorithms involved, but it is way too complicated for your average business-oriented data modeler. Instead, a syntax borrowed from MIMS seems a suitable way of documenting the algorithms. Again, the ultimate system designer is free to use whatever language is appropriate. Beginning with “Extended Value”, the derivation logic could be encoded in the form:
- “Line Item Quantity Ordered * Line Item Price”.
Figure 1: Example of Derived Attributes
The idea can be extended across entity types, however. Suppose, for example, one wanted to see the “Total Value” for the “/Extended Price” of all Line Items that are part of the Order. This is shown on the entity type Order in Figure 1 as “/Total Value”. In this case, each Order may be composed of one or more Line Items. This could be accomplished via the algorithm
- “SUM-THROUGH (composed of, Line Item, Extended Value)”.
The function “SUM-THROUGH” is in terms of:
- a one-to-many relationship,
- the related entity type, and
- the attribute to be summed over.
The result, then, is the total “Extended Value” across all of the Line Items.
A similar formula could be specified for “/Total Sales” in Product Type. That is:
- “SUM-THROUGH (ordered via, Line Item, Extended Value)”
Going in the other direction across a many-to-one relationship, the attributes on the “one” side (often called the “parent”) are in fact available as attributes to the “many” side (the “child”). In our example, the “/Price” for Line Item may be inferred from the Product Type that is being ordered. That is, each Line Item must be for one and only one Product Type. It could be derived from “INFER-THROUGH (for, Product Type, Standard Price per Unit)”. The function “INFER-THROUGH” is in terms of:
- a many-to-one relationship,
- the related entity type, and
- the attribute being inferred.
It is important to recognize that what is presented here is in a business-oriented, “essential” model. It is for the designer implementing any databases based on a model to make the appropriate design decisions as to whether the derivation should be calculated when data are being stored, or when they are being retrieved. Calculating when entered means that changing the value is more difficult. But the cost of the calculation is only incurred once. Calculating it every time it is retrieved incurs costs, but designing it that way means that it is always current.
In our order example above, it would be reasonable to calculate both the Line Item “\Price”, and Order “Total Value” and store them when each order is created. Product Type “Total Sales”, on the other hand is dynamic, and is more appropriately calculated when someone needs the value.
Note that this issue in one sense can drive decisions about de-normalization. The logic here has anticipated, for example, the decision to copy parent column values into a child table. That is, in effect, what is being done “virtually” when the INFER-THROUGH logic is specified.
End Notes:
- McMenamin, S., and John Palmer. 1984. Essential Systems Analysis. (Englewood Cliffs, NJ: Yourdon Press).
- David C. Hay. 2011. Enterprise Model Patterns: Describing the World. (Bradley Beach, NJ:Technics Publications).
- See David C. Hay. 2011. UML and Data Modeling: A Reconciliation (Bradley Beach, NJ: Technics Publications) for a discussion of how UML can be modified for this kind of model.