|
A Repository Model - The Relational Design Model
Published: October 1, 2000
Published in TDAN.com October 2000
"There was a young fellow named Corey
Whose career was not showered in glory, He had a really bad day When he just couldn’t say . . . Meta Data Repository"
Publisher's Note:
To request a version of this article that has been updated since the original publishing of this article in TDAN.com 14.0, please click here or send the publisher an email with the subject "Request for Updated Hay Article in TDAN.com 14.0". Last issue in an article in this publication - http://www.tdan.com/view-articles/4869/ - I presented the first components of a data catalogue (“metadata repository” in the current argot), in data model form. In that article, emphasis was placed on the elements required to support analysis – entities, attributes, relationships, and so forth. This issue was to have covered the area of design, but that topic has turned out to be bigger than expected. For that reason, this issue will cover relational database and system design, while object-oriented design will be saved for a later issue. As before, this article makes the point that, in most situations, there are relatively few very well defined things that we want to keep track of in a catalogue. To model these things should not be very difficult. These articles present a simple set of models to describe a catalogue that will support a typical application. Yes, these are sketches, and they could certainly be made more elaborate. But they should accurately represent at least those things they set out to represent – concisely and in concrete terms. Readers are encouraged to disagree with the particulars of these models. The nice thing about data modeling is that it gives us a very good language with which finally to clarify what we disagree about. Responses Before moving into the design model, it is worth while to respond to some of the correspondents who commented on the last model. Two respondents commented on the “entity view” concept, and both anticipated a change that your author made as soon as the article was published. Figure 1 shows the model from that issue, with each entity view being composed of one or more entity definition elements each of which must be the use of an entity/object class. The problem with this is that each entity view is really a virtual entity, which has many of the same characteristics and relationships as any other kind of entity/object class.
Figure 1 : The Original Entity Views
A better approach is shown in Figure 2. Here, a virtual entity is simply another kind of entity/object class. As in the previous model, this virtual entity may be composed of one or more entity definition elements, each of which must be the use of another entity/object class. Also, as in the previous model, each virtual entity may be composed of one or more attribute definition elements or relationship definition elements. For example, the virtual entity “vendor” is composed of the entity party, plus the relationship buyer in one or more contracts.
Figure 2 : The New Virtual Entity
Note that Figure 2 has one other change as well. Where the definition of a virtual entity is clearly superior, the change from role/relationship end to simply relationship. Where previously, role/relationship end described just half of a relationship, here relationship describes the whole thing, from one entity/object class to another entity/object class. This is a bit more intuitive, but it does require two sets of attributes – optionality, cardinality, and a name for the “from” entity... and optionality, cardinaity, and a name for the “to” entity.... It also means that the model cannot show which end of the relationship is involved in defining the virtual entity. Presumably this can be inferred by the entity/object class that is used in one of the entity definition elements that are the use in the definition of the virtual entity, but this is not guaranteed. Relational Design – Tables and ColumnsSo, enough with the requirements analysis. What about design? First, if the system is to be based on relational technology, we must describe the database. Table and column are shown in Figure 3. A table is a collection of data about something, organized into a set of rows and columns. A column defines a particular thing to be described by the table, and a row is an occurrence of the thing the table is about. Each column must be part of a single table, while each table may be composed of one or more columns. For our purposes here, “Name” is the only attribute required of table, although probably others can be specified. Attributes of column are (in addition to “Name”, of course) “Format” and “Length”.
A word about the attributes shown in these diagrams:
If the symbol to the left of an attribute name is an asterisk (*), then the attribute is mandatory. A value must be specified for each row in the table. If the symbol is a circle (o), then the attribute is optional. To mitigate controversy in these articles, unique identifiers (primary keys) are not shown.
Figure 3: Tables and Columns
Relational theory does not provide for the notion of calculated columns. The idea is that a calculated column duplicates the information already contained in the columns from which it was derived. That may be so, but it is often useful to design in (at least conceptually) computed columns. This is shown in Figure 4, with the new sub-types for column. The attribute “in or out indicator” tells whether the calculation is done at the time rows are populated in the table involved, or if it is done only when data are retrieved. In either case, the presence of a computed column implies that a module is present that will do the calculations. The primary attribute of each computed column is “Formula”, and the formula must be parsed into (composed of) one or more computed column elements. Each computed column element may be the use of one and only one other column, or of a “Constant”. The structure here is remeniscent of a Hewlett Packard calculator, that uses the Reverse Polish Notation. An expression such as “A+B*C + 10” is made up of the following elements:
Figure 4 : Computed Columns
According to relational theory, each row of a table must be identified by one or more unique keys. This is shown in Figure 5. Each unique key must be for a table, of course, and it must be composed of one or more columns. Each unique key must be either a primary key or an alternate key. Note that there are two business rules that apply to this model that cannot be directly represented by it:
Unique key has no attributes here other than “Name”.
Figure 5 : Unique Keys
A relationship end (from the previous article) is implemented in a relational database by means of a foreign key. That is, a foreign key is the mechanism by which tables can be related to each other. Foreign key is shown in figure 6. Each foreign key represents a constraint on a table. It is a reference to the primary key of the same or another table. This is achieved by its being composed of one or more foreign key elements, each of which is a reference to a column that is part of the primary key for the table being referred to in the foreign key. Each foreign key element is also the use of a column that is part of the table constrained by the foreign key. As the above sentences suggest, this model is subject to two business rules that it cannot express directly:
Each foreign key has a “Name”, but there are no attributes for foreign key element.
Figure 6 : Foreign Keys
ModulesNow, about the programs in our new system. The basic unit of programming is called a module. (In this catalogue, that is. Clearly, other terms could be used.) Each module must be written in a particular programming language. Naturally, any programming language worth mentioning is the language of many modules. In addition to “Name”, “Scenario” is shown here as an attribute of module. This links this specification to a document that describes the operation of the module. Each programming language has a “Name”.
Figure 7 : Modules
These days, a large proportion of the program modules written are for the purpose of displaying or entering data through an interactive screen or window. That is, as shown in Figure 8, a module may be the implementation of one or more windows. Here, of course, a “window” is not a collection of glass and wood or aluminum in a wall, but a representation of a body of data on a computer terminal or personal computer. As such, it may have component parts, here called display groups, where a display group is part of a window. That is, each window may be composed of one or more display groups. Indeed, a window may be composed of one or more other windows. Display groups can be classified, such that each display group must be an example of one and only one display group type. In addition to “Name”, window has the attribute “Modal indicator” specifying whether or not opening the window locks up the screen and prevents the use of other windows.
Figure 8 : Windows
These windows have stuff on them. Each has either data entry fields, buttons, or other icons for various purposes. In Figure 9, we represent this by showing that each window and each display group may be composed of one or more of what are here called interface elements. That is, each interface element is an element on the screen that is part of either a display group or a window. An interface element must be either a field, displaying or accepting data, a trigger, the touching of which causes some action to take place, or a display element, which is just there to make the window look pretty. A field must be an example of a field type, such as “text field”, “radio button”, and so forth. It must be the use of a column, although this could be a computed column. A trigger is of a module. That is, pointing to and clicking on a trigger causes a module to be executed. A trigger must be an example of one and only one trigger type. Interface element can have lots of attributes. Among them are:
In addition, field may have the following attributes:
Figure 9 : Interface Elements
ConnectionsSo, what are the relationships between the analysis elements presented in the previous issue and the design elements presented here? Figure 10 shows how a business function may be mapped to a program module. A function implementation is the fact that a particular function is at least in part implemented by a particular module. That is, each function implementation must be of a function as a module. No attributes are required for function implementation, but a “Description” could be useful.
Figure 10 : Function Implementations
Figure 11 shows that an entity may be implemented with one or more tables and that each table may be the implementation of one or more entities. This is represented by entity implementation, where each entity implementation must be of an entity as a table. That is, each table may be based on one or more entity implementations, each of which must be of one and only one entity. Alternatively, each entity may be the basis for one or more entity implementation, each of which must be as one and only one table. Just as entities are implemented as tables, then attributes of entities are implemented as columns in tables. An attribute implementation is of an attribute as a column. That is, each attribute may be the basis for one or more attribute implementations, each of which must be as a column.
Figure 11 : Entity and Attribute Implementations
Relationship ends are implemented as foreign keys. One of the reasons that foreign keys do not belong in a conceptual entity/relationship model is that they are mechanisms for implementing the relationships that are already shown. There is no need for them, and they clutter that diagram. It is only when we move to relational design that foreign keys come into play. In Figure 12, a relationship implementation is the fact that a particular relationship end is implemented by a particular foreign key. That is, each relationship implementation must be of a relationship end as a foreign key.
Figure 12 : Relationship Implementations
ResponsesAs expressed last issue, should you, dear reader, take exception to any of the models presented above – good! It is about time we had a discussion on the specific content we expect in a repository, instead of being surrounded by fluff pieces talking about what a good idea it is. Tell me exactly which assertions (entities and/or relationships) you disagree with. The purpose of a data model is to be wrong. This one represents your author’s best guess as to the truth, and it is there for people to correct. Please either write to me at davehay@essentialstrategies.com or post your disagreements to the Data Management Mailing list. You may subscribe to this list by sending an e-mail to dm-discuss-subscribe@egroups.com, or go to its homepage at www.egroups.com. Alternatively, if you think these models are completely wrong, please submit an article to TDAN describing your counter argument. Send it to rseiner@tdan.com. In your disagreements, I ask only two things: 1. The model is a set of assertions in the form:
(For example, “Each column must be part of one and only one table; each table may be composed of one and only one column.) Please express counter assertions in the same form. Yes, it is true that this is an unconventional approach to defining relationship names, and it is hard. But it is hard because to come up with a reasonable name (that sounds perfectly obvious to the reader), you must really understand the nature of the relationship. If UML is used, each can be shown as a role name. 2. If you draw an alternative model, organize it so that the crow’s feet (or the asterisks, if you use UML) are to the left or the top of the model. This tends to put reference entities in the lower right part of the diagram, and intersect or transaction entities in the upper left. It provides a consistent organization for the drawing, and makes it easier for all to see where the differences are. Go to Current Issue | Go to Issue Archive Recent articles by David C. Hay
David C. Hay - In the information industry since the days of punched cards, paper tape and teletype machines, Dave has been producing data models to support strategic and requirements planning for more than twenty
years. He has worked in a variety of industries, including, among others, banking, clinical pharmaceutical research, and all aspects of oil production and processing.
He is the founder and President of Essential Strategies, Inc., a seventeen-year-old consulting firm dedicated to helping clients define corporate information architecture, identify requirements, and plan strategies for the implementation of new systems. Dave is the author of the book, Data Model Patterns: Conventions of Thought, and Requirements Analysis: From Business Views to Architecture. His new book Data Model Patterns: A Metadata Map is a comprehensive schema of metadata from many different perspectives. He has also spoken at numerous international and local DAMA conferences, Oracle user group conferences, and many others.
He can be reached at dch@essentialstrategies.com, (713) 464-8316, or via his company's website at http://www.essentialstrategies.com. |