Data Model Evaluation Workplan

The book Data Model Patterns: Conventions of Thought describes a set of standard data models that can be applied to standard business situations. These patterns, it turns out, occur on
several levels. At the basic level are models of the things seen in business. The patterns in the book are a bit more abstract than conventionally seen, but they do describe things that are easily
recognizable to anyone: people and organizations, products, contracts, and so forth.

There is a more abstract level of modeling, however, which is necessary when the things being modeled don’t fall into these tidy categories. This level, also described in the book, is the subject
of this paper.

The Basic Model

Before getting into the more exotic models, it is useful to be sure we understand the basic patterns that will apply to nearly all organizations. Each real organization will have variations on this
model, but here you will find the elements that will be present in nearly every one. Figure 1, for example, shows that the entity party encompasses person and organization. That is, a person and an
organization are each things of significance, and if you want to refer to either, you can refer to a party.

Parties may be related to each other, as shown by the entity party relationship. This is simply the fact that one party has a specified relationship with another, as in a reporting structure,
employment, marriage, membership in a club, etc.

A party may have more than one address. Each address is shown in this model as a site, where each party may be located via one or more party placements in a site. Each site must
be in one or more geographic areas, such as a city or region.

Figure 2 shows the “stuff” a company deals with. Here it is called product type and product instance. It could be called asset type and asset, item type and item occurrence, or something similar.
Note the distinction between product instance, a physical example of the product, and product type, which is the definition of it, such as you would see in a catalogue.

A product structure element is the fact that one product type may have another product type as a component. Thus an assembly may have three sub-assemblies as components, and this would be represented by three product structure element occurrences, in each of which the assembly plays the "assembly in" role and one sub-assembly plays the "component in" role.
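The product structure element pattern can be sketched in code. This is a minimal illustration, not taken from the book; the sample product types and the function name are invented, though the two roles follow the model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProductType:
    name: str

@dataclass(frozen=True)
class ProductStructureElement:
    assembly: ProductType     # the product type playing the "assembly in" role
    component: ProductType    # the product type playing the "component in" role

# An assembly with three sub-assemblies yields three occurrences:
pump = ProductType("Pump Assembly")
subs = [ProductType("Housing"), ProductType("Impeller"), ProductType("Motor")]
structure = [ProductStructureElement(pump, s) for s in subs]

def components_of(assembly, structure):
    """All product types that are components of the given assembly."""
    return [e.component for e in structure if e.assembly == assembly]
```

Because the structure is recursive, the same pattern lets a sub-assembly itself appear as the assembly in further product structure elements.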

 



 


Figure 1: People and Organizations

Note that a product instance may be either a discrete item, which is kept track of individually, or an inventory, which is a collection of items. In either case, each product instance must be at a site.

Figure 3 shows agreement, where an agreement is any formal relationship between two parties. Typically, this is a purchase order or a sales order, but it may encompass other kinds of agreements as well. Invariably, our organization is one of the parties – either the buyer in the agreement if it is a purchase order, or the seller in the agreement if it is a sales order.

Each agreement must be composed of one or more line items, where each line item is for a product type.

Activities are the things the organization does to carry out its business. This is shown in Figure 4. As with product types and product instances, there is a distinction drawn between activity
types (the definition of what is to be done) and activities (the actual doing of it). Attributes of an activity type include its description and a standard length of time it is expected to require,
while attributes of activity include the actual date it occurred and the actual time it took.
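The type/instance split for activities can be sketched as follows. This is an illustrative assumption, not the book's notation; the attribute names (`standard_hours`, `actual_hours`) and the sample activity are invented.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ActivityType:
    description: str
    standard_hours: float      # the standard length of time it is expected to require

@dataclass
class Activity:
    activity_type: ActivityType
    actual_date: date          # the actual date it occurred
    actual_hours: float        # the actual time it took

    def variance(self) -> float:
        """Actual duration versus the type's standard duration."""
        return self.actual_hours - self.activity_type.standard_hours

inspect_boiler = ActivityType("Inspect boiler", standard_hours=2.0)
done = Activity(inspect_boiler, date(2024, 3, 1), actual_hours=2.5)
```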

 


Figure 2: Product Types

 


Figure 3: Agreements

 


Figure 4: Activities

 

Parameters

The above model is a good start, but it is not adequate to describe certain common situations. For example, there is a problem with product type and product instance. For each of these to be an
entity suggests that the attributes for all occurrences of each are the same. This simply is not true.

The attributes of a compressor are quite different from the attributes of a computer or a barrel of crude oil. We would like to have a single concept for “Product”, but that concept has many
different flavors.

We could define a sub-type for each product type, but new product types are being invented all the time, and the data management task would be impossible.

Figure 5 shows parameter itself as an entity. A parameter is a characteristic that is used to define a product type.

A parameter assignment is the fact that a particular parameter is used to define a particular product type. For example, the parameter “capacity” might be used to describe a boiler, while the
parameter “interest rate” might be used to define a savings account.

(Yes, one of the advantages of this approach is that it works as well for banks as it does for nuclear power plants.)

Note that the parameter may be expressed in a unit of measure. That boiler "capacity", for example, might be in "cubic feet". A unit of measure specified for a parameter assignment can override the default unit of measure of the parameter itself.
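The unit-of-measure override can be sketched like this; the field names are assumptions, and the "storage tank" example is invented.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Parameter:
    name: str
    default_unit: Optional[str] = None    # the parameter's own unit of measure

@dataclass(frozen=True)
class ParameterAssignment:
    product_type: str
    parameter: Parameter
    unit: Optional[str] = None            # overrides the parameter's default

    def unit_of_measure(self) -> Optional[str]:
        """The assignment's unit if given, else the parameter's default."""
        return self.unit or self.parameter.default_unit

capacity = Parameter("capacity", default_unit="cubic feet")
boiler = ParameterAssignment("boiler", capacity)                      # inherits the default
tank = ParameterAssignment("storage tank", capacity, unit="gallons")  # overridden
```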

 


Figure 5: Parameter Assignments

 

Note that in Figure 6 three kinds of parameters are shown. A discrete list is a parameter that can take only one of a specified set of allowable values. For example, a "pharmacological category" for a pharmaceutical would have a discrete list of allowable values. A derived parameter is calculated from one or more other parameters and/or constants. Each derived parameter must be calculated from one or more parameter derivations, where each parameter derivation represents a formula of some kind. The formula, in turn, must be composed of one or more parameter derivation elements, where each parameter derivation element may be the use of another parameter or the use of a constant.
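One way to sketch a parameter derivation and its elements in code. The single-operator formula shape is a simplifying assumption (real derivations could encode arbitrary formulas), and all names here are invented.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul, "-": operator.sub, "/": operator.truediv}

def element_value(element, values):
    # Each parameter derivation element is the use of another parameter or of a constant.
    if "parameter" in element:
        return values[element["parameter"]]
    return element["constant"]

def evaluate(derivation, values):
    """Fold the derivation's elements together with its operator."""
    elements = [element_value(e, values) for e in derivation["elements"]]
    result = elements[0]
    for v in elements[1:]:
        result = OPS[derivation["operator"]](result, v)
    return result

# A derived parameter "volume" calculated from three other parameters:
volume = {"operator": "*",
          "elements": [{"parameter": "length"},
                       {"parameter": "width"},
                       {"parameter": "height"}]}
```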

Other parameters simply describe the product type. If numeric, these could be constrained by a "high value" and a "low value". Within these constraints, a parameter assignment could have its own "high value" and "low value".

 


Figure 6: Parameters

 

A set of parameter assignments defines the nature of a product type. Any product instance that is an example of the product type is then evaluated with values for the parameters assigned
to its associated product type.

Figure 7 shows this. Here a parameter value is the fact that a particular product instance takes a specified "value" of a parameter. Note that the arc here is less about the fact that some parameter values are of a parameter and some are of a parameter assignment than about the fact that you can model it either way. If you specify that the value is of a parameter assignment, you keep parameters from being specified that were not previously assigned to product types. This is a partial business rule, although it still does not require (as a business rule should) that the product type and parameter that the parameter value is for represent a legal combination as expressed by parameter assignments.

If the product type "Model 770 ThinkPad™", for example, had assigned to it the parameter "processor speed", the corresponding parameter value for the particular one I am looking at could be "233" (unit of measure: MHz).
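The high/low constraint on a parameter assignment amounts to a validation check on each parameter value. A minimal sketch, with invented field names and limits:

```python
def value_is_legal(value, assignment):
    """True if a numeric parameter value falls within the "low value" and
    "high value" limits of its parameter assignment (either may be absent)."""
    low = assignment.get("low_value")
    high = assignment.get("high_value")
    return (low is None or value >= low) and (high is None or value <= high)

# A hypothetical "processor speed" assignment for the Model 770 ThinkPad:
speed = {"product_type": "Model 770 ThinkPad",
         "parameter": "processor speed",
         "unit_of_measure": "MHz",
         "low_value": 0,
         "high_value": 300}
```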

 


Figure 7: Parameter Values

 

The Laboratory

Your author discovered this structure when doing work for a bank. Sometime thereafter he was working for a lumber products company that needed a model for its laboratory. Fortunately he had been doing the bank work, so he was fully prepared, arriving with the following variation:

The laboratory does tests on product samples. In this case (unlike others I came across later), the company knows the product type it is dealing with. The tests are simply to determine specific
characteristics of the product. For this reason, it is possible to ascertain what the expected characteristics are to be.

In Figure 8 it can be seen that each product type may be evaluated in terms of one or more expected observations each of which is of a particular variable. That is, the product type is
considered to be within specifications if the value of a variable is between a “high value” and “low value” specified in the expected observation.

The laboratory process begins with a sample being taken from a product instance (which is an example of the product type in question). This sample is then subject to one or more
laboratory tests. Each laboratory test, in turn, is the source of one or more observations – each on a variable.

If you rename variable to parameter, expected observation to parameter assignment, and observation to parameter value, and if you then collapse sample and laboratory test, you have the model shown
above in Figure 7.

 


Figure 8: The Laboratory

 

Clinical Research

This parameterization idea got stretched even further when applied to the collection of data from clinical pharmaceutical trials.

Pharmaceutical research is an example of a particularly messy modeling problem: Clinical data are captured on "case report forms" (CRFs), which, depending on the study – indeed, depending on the part of the study – have a variable number of sections, where each section could have one or several numbers, pieces of text, or even drawings. There is no fundamental, underlying structure here. The only way to address the problem is to go up one level of abstraction.

Figure 9 shows how a clinical study is defined to be composed of one or more visit specifications, each describing a planned visit by a patient to a physician. Each visit has been planned in the design of the study, as to what information is to be collected. This information is organized into standard blocks, such as "personal information", "hematological information", "cardio-vascular information", and so forth. Each standard block is defined in terms of the block variables it is composed of, where a block variable is the use of a variable as part of a standard block.

The standards are defined by the pharmaceutical company, but each block may be tailored (to some extent) to the study in a visit block, which is part of a visit specification. Each visit
block then may have its own definitions of which visit block variables are part of it. (A business rule defined by the research company determines the extent to which visit blocks must
conform to the specifications of a standard block.)

Once the CRFs have been defined as to what visit blocks and visit block variables each visit specification contains, data may be collected. Each element on the CRF is an observation, which may be
either text or numeric, and which is collected at a specific date from a visit by a specific patient.


Figure 9: Clinical Trials

 

It may be argued that, while this is the most orderly way to capture all these data, it makes them a little difficult to get at. To correlate measurements of two variables it is necessary to
construct a query that asks for all values of a particular variable and the circumstances of their collection, in conjunction with all values of another variable, where the circumstances of their
collection are matched with the circumstances of the first. This is hard.
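In code, that hard query amounts to a self-join of the observations on the circumstances of collection – here assumed, for illustration, to be patient and visit:

```python
def correlate(observations, var_a, var_b):
    """Pair the values of two variables collected under the same
    circumstances (same patient, same visit)."""
    a = {(o["patient"], o["visit"]): o["value"]
         for o in observations if o["variable"] == var_a}
    b = {(o["patient"], o["visit"]): o["value"]
         for o in observations if o["variable"] == var_b}
    return [(a[key], b[key]) for key in sorted(a.keys() & b.keys())]

observations = [
    {"patient": "P1", "visit": 1, "variable": "WBC", "value": 6.1},
    {"patient": "P1", "visit": 1, "variable": "RBC", "value": 4.7},
    {"patient": "P2", "visit": 1, "variable": "WBC", "value": 5.4},
]
```

Note that an observation with no counterpart (P2's "WBC" above) simply drops out of the result, which is exactly the inner-join behavior the statisticians must specify by hand.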

To address this, the pharmaceutical companies that have taken this approach have devised a table structure derived from this one. (This was the original “data mart” before that word became
fashionable.) The idea is that what the statisticians want to see is all the data of a certain kind together.

It turns out that the “block” structure described above gives us the opportunity to “de-abstract” the data into something a little more manageable. It is possible to write a single utility program that takes the observation data and reorganizes them into a single table for each visit block, with the variables showing up as columns in this table.
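A sketch of such a utility, assuming each observation carries its visit, visit block, and variable (the field names and sample data are illustrative):

```python
from collections import defaultdict

def de_abstract(observations):
    """Pivot generic observation rows into one table per visit block,
    with one row per visit and the variables as columns."""
    tables = defaultdict(dict)   # block name -> {visit -> row}
    for obs in observations:
        row = tables[obs["block"]].setdefault(obs["visit"], {"visit": obs["visit"]})
        row[obs["variable"]] = obs["value"]
    return {block: list(rows.values()) for block, rows in tables.items()}

raw = [
    {"visit": 1, "block": "hematological", "variable": "WBC", "value": 6.1},
    {"visit": 1, "block": "hematological", "variable": "RBC", "value": 4.7},
    {"visit": 2, "block": "hematological", "variable": "WBC", "value": 5.9},
]
```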

This appears in Figure 10. Each table represents a visit block, and the columns allow statistical analysis of correlations between similar variables. Even correlations between variables in different tables are easier than they were in the original observations table.

 


Figure 10: “De-abstracted” Clinical Data

 

Mapping Legacy Systems

Data modeling is not done in a vacuum. It’s often done in conjunction with a major project. These days, that project is as likely as not to build a “data warehouse” – a repository that is
supposed to hold all of a company’s data and make them available to management for inspection and analysis.

The problem with building a data warehouse is that, while a data model is valuable in defining its architecture, it doesn’t help much in dealing with all those old “legacy” systems that are going to be the source of the data. The designers of those systems often were not very cooperative in clearly identifying exactly what each datum means and where it fits into the larger scheme of things.

The data model does help, in that it provides a road map of what kind of data have to be in there somewhere. What is needed next, though, is some sort of mapping from the columns and tables (fields
and files?) of the old systems to the attributes and entities of the model.

In one sense, this is not a logical data modeling problem. After all, the legacy database designs are physical structures, not logical ones. The assignment, however, is to make our logical structures useful in describing these physical ones, and it is our job to do so.

So, it is necessary to look at the model of our “meta data repository” that is keeping the “model of our models”. In Figure 11, you can see this mapping. The legacy system consists (for the
sake of argument – we will not get into more complex legacy systems) of tables, each of which is composed of one or more columns. Our model, on the other hand, is made up of
entities, each of which is composed of one or more attributes. If life were simple, all we would have to do is to create a column mapping of each column of the legacy system to an
attribute which is part of an entity in our model.

 


Figure 11: Simple Mapping

 

Alas, life is not so simple. If you remember, much of the data described in our model is not contained in attributes of the entities, but separately in parameter values. That is, the definition of
the data structure is not in the entities and attributes at all, but in parameter assignments. This means that in some cases, we are not mapping a column to an attribute, but to a parameter
assignment, as shown in Figure 12. Here, the column mapping is either to an attribute or to a parameter assignment. Note that for the mapping to a parameter assignment to work, the table involved
had better have something to do with product types.
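The either/or target of a column mapping can be sketched as follows; the class and field names, and the sample legacy columns, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ColumnMapping:
    table: str
    column: str
    # Exactly one of these two targets must be set (the arc in Figure 12):
    attribute: Optional[Tuple[str, str]] = None             # (entity, attribute)
    parameter_assignment: Optional[Tuple[str, str]] = None  # (product type, parameter)

    def __post_init__(self):
        if (self.attribute is None) == (self.parameter_assignment is None):
            raise ValueError("map to an attribute or a parameter assignment, not both")

simple = ColumnMapping("CUST_MAST", "CUST_NM", attribute=("party", "name"))
parameterized = ColumnMapping("PROD_SPEC", "CAPACITY",
                              parameter_assignment=("boiler", "capacity"))
```

Enforcing the exclusive arc in the constructor mirrors the modeling rule: a column mapping with both targets, or neither, is simply not a valid occurrence.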

In this view of the world, note that the parameter values of a parameter are exactly like the values for an attribute. As implemented in a relational database, of course, those values will go into
a column in the table corresponding to the entity, rather than being captured in a separate table, but conceptually, a value of an attribute is exactly equivalent to a value of a parameter.

Note that we have brought together the application model of our business with the meta-model that is supposed to define the application model. Stay tuned. This gets weirder.

 


Figure 12: Not So Simple Mapping

 

Let’s look at our model of entities and attributes. (See Figure 13.)

You will recall that each entity is composed of one or more attributes. Looking more closely at attributes, we can see that each attribute must be constrained by a domain. The domain provides validation rules for the attribute. Looking at domains carefully, it turns out that there are at least three kinds: other domains, which simply provide a format and perhaps limits on numeric values; value sets, which require the attribute to take one of a specified set of attribute allowable values; and calculations, which are derived from other domains.

Now, look carefully. Does this model look familiar? We just did it in the parameter exercise. Figure 14 brings all this together. The model on the right, from our application model, is the
model of entities and attributes from the meta data repository. An attribute is nothing but a parameter assignment in disguise. A domain is a parameter.

 


Figure 13: Entities and Attributes

 

Now, what about that entity on the left and the product type on the right? Well, it so happens that the parameter model we’ve shown here is but a specialized example of a more general phenomenon.

There are different kinds of parties, for example, and we want to collect different kinds of parameters for them: departments, individuals, households, professional societies, etc. Similarly, different transaction types might be defined by different parameter assignments. In general, there are few enough of these that we can use the super-type/sub-type structure – you know, the device we use when there are different sub-categories of things that have different attributes? When we use sub-types, conceptually, we are doing exactly the same thing we do here with parameter assignments.

Each parameter assignment could as easily be to a product type, a party type, a transaction type, or anything else. That is, it could be to any entity.

All of this is to say that pretty much everything in the world (with the possible exception of accounting) can be represented by the “universal data model”, shown in Figure 15.

 


Figure 14: Attribute Meta Model

 

 


Figure 15: The Universal Data Model

 

1. David Hay, Data Model Patterns: Conventions of Thought, Dorset House Publishers, Inc. (New York: 1996). This article is largely derived from this book.

2. Definition: Recursion – (see recursion).

3. For the attribute model and the consolidated Attribute Meta Model which follows, your author is indebted to Allan Kolber, of Butler Technology Solutions, Inc.

A twenty-five year veteran of the Information Industry, with extensive experience developing on-line, database-oriented systems, Dave Hay has been producing data models to support strategic
information planning and requirements analysis for ten years. He has worked in a variety of industries, including, among others, power generation, clinical pharmaceutical research, cable
television, oil refining, and forestry.

This diversity of industrial experience prompted him to write Data Model Patterns: Conventions of Thought, published by Dorset House Publishers, which presents data models of common
business situations that cross industry.

Dave is President of Essential Strategies, Inc., a consulting firm dedicated to helping clients define corporate information architecture, identify requirements, build, and implement new systems.

You can read more of his works on the Essential Strategies, Inc. web page, www.essentialstrategies.com.

 
