Published in TDAN.com October 2000
Whose career was not showered in glory,
He had a really bad day
When he just couldn’t say . . .
Meta Data Repository”
To request a version of this article that has been updated since the original publishing of this
article in TDAN.com 14.0, please click here or send the publisher an email with the subject
“Request for Updated Hay Article in TDAN.com 14.0”.
Last issue in an article in this publication – http://www.tdan.com/view-articles/4869/ – I presented the first components of a data catalogue
(“metadata repository” in the current argot), in data model form. In that article, emphasis was placed on the elements required to support analysis – entities, attributes,
relationships, and so forth. This issue was to have covered the area of design, but that topic has turned out to be bigger than expected. For that reason, this issue will cover relational database
and system design, while object-oriented design will be saved for a later issue.
As before, this article makes the point that, in most situations, there are relatively few very well defined things that we want to keep track of in a catalogue. To model these things should not be
very difficult. These articles present a simple set of models to describe a catalogue that will support a typical application. Yes, these are sketches, and they could certainly be made more
elaborate. But they should accurately represent at least those things they set out to represent – concisely and in concrete terms.
Readers are encouraged to disagree with the particulars of these models. The nice thing about data modeling is that it gives us a very good language with which finally to clarify what we disagree
about.
Responses
Before moving into the design model, it is worth while to respond to some of the correspondents who commented on the last model.
Two respondents commented on the “entity view” concept, and both anticipated a change that your author made as soon as the article was published. Figure 1 shows the model from that
issue, with each entity view being composed of one or more entity definition elements each of which must be the use of an entity/object class. The problem with this is that each
entity view is really a virtual entity, which has many of the same characteristics and relationships as any other kind of entity/object class.
A better approach is shown in Figure 2. Here, a virtual entity is simply another kind of entity/object class. As in the previous model, this virtual entity may be composed of one or more
entity definition elements, each of which must be the use of another entity/object class. Also, as in the previous model, each virtual entity may be composed of one or more attribute
definition elements or relationship definition elements. For example, the virtual entity “vendor” is composed of the entity party, plus the relationship buyer in one
or more contracts.
Note that Figure 2 has one other change as well. Where the definition of a virtual entity is clearly superior, the change from role/relationship end to simply relationship. Where previously,
role/relationship end described just half of a relationship, here relationship describes the whole thing, from one entity/object class to another entity/object class. This is a
bit more intuitive, but it does require two sets of attributes – optionality, cardinality, and a name for the “from” entity… and optionality, cardinaity, and a name for the
“to” entity….
It also means that the model cannot show which end of the relationship is involved in defining the virtual entity. Presumably this can be inferred by the entity/object class that is used
in one of the entity definition elements that are the use in the definition of the virtual entity, but this is not guaranteed.
Relational Design – Tables and Columns
So, enough with the requirements analysis. What about design? First, if the system is to be based on relational technology, we must describe the database. Table and column are shown in Figure 3. A
table is a collection of data about something, organized into a set of rows and columns. A column defines a particular thing to be described by the table, and a row is an occurrence of the thing
the table is about. Each column must be part of a single table, while each table may be composed of one or more columns.
For our purposes here, “Name” is the only attribute required of table, although probably others can be specified. Attributes of column are (in addition to “Name”, of course)
“Format” and “Length”.
If the symbol to the left of an attribute name is an asterisk (*), then the attribute is mandatory. A value must be specified for each row in the table. If the symbol is a circle
(o), then the attribute is optional.
To mitigate controversy in these articles, unique identifiers (primary keys) are not shown.
Relational theory does not provide for the notion of calculated columns. The idea is that a calculated column duplicates the information already contained in the columns from which it was derived.
That may be so, but it is often useful to design in (at least conceptually) computed columns. This is shown in Figure 4, with the new sub-types for column. The attribute “in or out
indicator” tells whether the calculation is done at the time rows are populated in the table involved, or if it is done only when data are retrieved. In either case, the presence of a
computed column implies that a module is present that will do the calculations.
The primary attribute of each computed column is “Formula”, and the formula must be parsed into (composed of) one or more computed column elements. Each computed column element
may be the use of one and only one other column, or of a “Constant”. The structure here is remeniscent of a Hewlett Packard calculator, that uses the Reverse Polish Notation.
An expression such as “A+B*C + 10” is made up of the following elements:
Column | Constant | Operator |
A | plus | |
B | plus | |
C | multiply | |
10 | plus |
According to relational theory, each row of a table must be identified by one or more unique keys. This is shown in Figure 5. Each unique key must be for a table, of course, and
it must be composed of one or more columns.
Each unique key must be either a primary key or an alternate key.
Note that there are two business rules that apply to this model that cannot be directly represented by it:
- A table must be identified by only one primary key. It may be identified by any number of alternate keys.
- All columns that are part of a unique key must be part of the same table that the unique key is for.
Unique key has no attributes here other than “Name”.
A relationship end (from the previous article) is implemented in a relational database by means of a foreign key. That is, a foreign key is the mechanism by which tables can be related to each
other. Foreign key is shown in figure 6.
Each foreign key represents a constraint on a table. It is a reference to the primary key of the same or another table. This is achieved by its being composed of one or more
foreign key elements, each of which is a reference to a column that is part of the primary key for the table being referred to in the foreign key. Each foreign key element is also the use
of a column that is part of the table constrained by the foreign key.
As the above sentences suggest, this model is subject to two business rules that it cannot express directly:
- All foreign key elements that are part of a foreign key must be references to columns that are part of a single primary key. That primary key is for the
table being referred to in the foreign key. - Each foreign key element that is part of a foreign key must be the use of a column that is part of the same table that the foreign key is a constraint on. Indeed, that column
must be part of the primary key for that table.
Each foreign key has a “Name”, but there are no attributes for foreign key element.
Modules
Now, about the programs in our new system. The basic unit of programming is called a module. (In this catalogue, that is. Clearly, other terms could be used.) Each module must be written in a
particular programming language. Naturally, any programming language worth mentioning is the language of many modules.
In addition to “Name”, “Scenario” is shown here as an attribute of module. This links this specification to a document that describes the operation of the module. Each
programming language has a “Name”.
These days, a large proportion of the program modules written are for the purpose of displaying or entering data through an interactive screen or window. That is, as shown in Figure 8, a module may
be the implementation of one or more windows. Here, of course, a “window” is not a collection of glass and wood or aluminum in a wall, but a representation of a body of data on
a computer terminal or personal computer. As such, it may have component parts, here called display groups, where a display group is part of a window. That is, each window may be
composed of one or more display groups. Indeed, a window may be composed of one or more other windows.
Display groups can be classified, such that each display group must be an example of one and only one display group type.
In addition to “Name”, window has the attribute “Modal indicator” specifying whether or not opening the window locks up the screen and prevents the use of other windows.
These windows have stuff on them. Each has either data entry fields, buttons, or other icons for various purposes. In Figure 9, we represent this by showing that each window and each display group
may be composed of one or more of what are here called interface elements. That is, each interface element is an element on the screen that is part of either a display group or a
window.
An interface element must be either a field, displaying or accepting data, a trigger, the touching of which causes some action to take place, or a display element, which is just there to make the
window look pretty.
A field must be an example of a field type, such as “text field”, “radio button”, and so forth. It must be the use of a column, although this could be a
computed column.
A trigger is of a module. That is, pointing to and clicking on a trigger causes a module to be executed. A trigger must be an example of one and only one trigger type.
Interface element can have lots of attributes. Among them are:
- Name – the internal name by which the element is referred in program code.
- Label – the text that is displayed next to the field.
- Display order – the position of this interface element relative to others in the same window or display group.
In addition, field may have the following attributes:
- Default value – if this is an enterable field, the value taken if nothing is entered.
- Editable filter – whether or not the interface element can be edited once data are entered into it.
Connections
So, what are the relationships between the analysis elements presented in the previous issue and the design elements presented here?
Figure 10 shows how a business function may be mapped to a program module. A function implementation is the fact that a particular function is at least in part implemented by a particular module.
That is, each function implementation must be of a function as a module.
No attributes are required for function implementation, but a “Description” could be useful.
Figure 11 shows that an entity may be implemented with one or more tables and that each table may be the implementation of one or more entities. This is represented by entity implementation, where
each entity implementation must be of an entity as a table. That is, each table may be based on one or more entity implementations, each of which must be of one and only one entity. Alternatively,
each entity may be the basis for one or more entity implementation, each of which must be as one and only one table.
Just as entities are implemented as tables, then attributes of entities are implemented as columns in tables. An attribute implementation is of an attribute as a column. That is, each attribute may
be the basis for one or more attribute implementations, each of which must be as a column.
Relationship ends are implemented as foreign keys. One of the reasons that foreign keys do not belong in a conceptual entity/relationship model is that they are mechanisms for implementing the
relationships that are already shown. There is no need for them, and they clutter that diagram. It is only when we move to relational design that foreign keys come into play.
In Figure 12, a relationship implementation is the fact that a particular relationship end is implemented by a particular foreign key. That is, each relationship implementation must be of a
relationship end as a foreign key.
Responses
As expressed last issue, should you, dear reader, take exception to any of the models presented above – good! It is about time we had a discussion on the specific content we expect in a
repository, instead of being surrounded by fluff pieces talking about what a good idea it is.
Tell me exactly which assertions (entities and/or relationships) you disagree with. The purpose of a data model is to be wrong. This one represents your author’s best guess as to the truth,
and it is there for people to correct.
Please either write to me at davehay@essentialstrategies.com or post your disagreements to the Data Management Mailing list. You may subscribe
to this list by sending an e-mail to dm-discuss-subscribe@egroups.com, or go to its homepage at www.egroups.com.
Alternatively, if you think these models are completely wrong, please submit an article to TDAN describing your counter argument. Send it to rseiner@tdan.com.
In your disagreements, I ask only two things:
1. The model is a set of assertions in the form:
“Each | |
must be | where the line next to the first entity is solid |
or | |
may be | where the line next to the first entity is dashed |
one or more | where there is a “crow’s foot” next to the second entity |
or | |
one and only one | where there is no “crow’s foot” next to the second entity |
(For example, “Each column must be part of one and only one table; each table may be composed of one and only one column.)
Please express counter assertions in the same form. Yes, it is true that this is an unconventional approach to defining relationship names, and it is hard. But it is hard because to come up with a
reasonable name (that sounds perfectly obvious to the reader), you must really understand the nature of the relationship.
If UML is used, each can be shown as a role name.
2. If you draw an alternative model, organize it so that the crow’s feet (or the asterisks, if you use UML) are to the left or the top of the model. This tends to put reference entities in
the lower right part of the diagram, and intersect or transaction entities in the upper left. It provides a consistent organization for the drawing, and makes it easier for all to see where the
differences are.