Data Modeling. The art of cutting corners.
Have you ever wondered why the most difficult book is often the one entitled “Introduction to…”? It means that the author knows far more about the subject than can fit even in a thick volume. Such a
book usually offers several levels of understanding and can serve well beyond the initiation into the subject.
The author balances the depth of knowledge he wants to share against the need to make the narrative comprehensible to the inexperienced reader. The readers, on the other hand, accept the effort of
working through the tough book in the hope of gaining a competitive edge.
Data modeling reminds me of this situation. Experienced data modelers draw on knowledge from previous assignments and from books. They know how it should be done to be perfect and 100% correct. But do we
ever implement a 100% correct model? Every project has limitations: budget, time, the developers’ experience, management commitment, and so on. A data modeler’s maturity is not in observing third normal
form; it is in making the right compromises, in the art of cutting the right corners. An experienced practitioner balances the needs of the most likely future against the limitations of today. Developers
should accept some additional complexity in their SQL queries to prolong the expected life of the software.
Model to Implementation Mappings
With the industry’s high turnover rate, adequate documentation is a must during software maintenance. This need gives the Data Administration function an excellent opportunity to expand into
software maintenance support.
Maintenance enhancements do not lead to the design of a completely new database; they are confined to the existing data structures. They are handled by the developers who support the application,
not by high-level analysts. With such a limited project scope, the developers have little motivation to understand the logical data model. Abstracting the existing tables into the logical
model, and then interpreting the new elements back into the physical world, becomes the Data Administrator’s (DA) burden. An adequate physical-to-logical mapping is crucial here.
For a number of reasons, the physical implementation differs from the logical data model. The usual performance improvements for an operational data store lead to denormalization. Attributes
inherited from a supertype are repeated in multiple subtype tables. The atomic data warehouse tables add the overhead of keeping value changes over time. A data mart design collapses a chain
of parent-child relationships into a single dimension table.
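A minimal DDL sketch of two of these transformations; the table and column names are purely illustrative and are not taken from any model in this article. The first part shows a supertype attribute repeated in its subtype tables, the second a parent-child chain collapsed into one dimension table.

-- Logical model: a PARTY supertype with a Party Name attribute, and
-- PERSON and ORGANIZATION subtypes. Physically there is no PARTY table;
-- the inherited attribute is repeated in each subtype table.
CREATE TABLE PERSON (
    PersonId   INTEGER     NOT NULL PRIMARY KEY,
    PartyName  VARCHAR(60) NOT NULL,   -- inherited from the PARTY supertype
    BirthDate  DATE
);

CREATE TABLE ORGANIZATION (
    OrganizationId INTEGER     NOT NULL PRIMARY KEY,
    PartyName      VARCHAR(60) NOT NULL,   -- the same supertype attribute, repeated
    TaxId          VARCHAR(20)
);

-- Data mart: the STORE -> REGION -> COUNTRY parent-child chain is
-- collapsed into a single denormalized dimension table.
CREATE TABLE STORE_DIM (
    StoreId     INTEGER     NOT NULL PRIMARY KEY,
    StoreName   VARCHAR(60) NOT NULL,
    RegionName  VARCHAR(60) NOT NULL,   -- from the parent REGION entity
    CountryName VARCHAR(60) NOT NULL    -- from the grandparent COUNTRY entity
);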
Base Attribute Mapping
The simplest way to map physical to logical is to identify a base attribute for each column.
Figure 1. Base Attribute Mapping
As a result,
- Each physical column gets documented with a full business name and a definition. Developers start to appreciate the Data Administrator’s work.
- Requests to add table columns point the Data Administrator to the appropriate spots in the model. The DA studies the existing model and uses it as a foundation for discussions.
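A minimal sketch of how such a column-to-attribute mapping might be recorded as metadata; the table and column names below are hypothetical and are not those of the tool described later in this article.

-- One row per physical column, pointing at its base attribute in the logical model.
CREATE TABLE BASE_ATTRIBUTE_MAP (
    TableName     VARCHAR(30) NOT NULL,
    ColumnName    VARCHAR(30) NOT NULL,
    EntityName    VARCHAR(30) NOT NULL,   -- entity that owns the base attribute
    AttributeName VARCHAR(60) NOT NULL,   -- base attribute carrying the business definition
    PRIMARY KEY (TableName, ColumnName)
);

INSERT INTO BASE_ATTRIBUTE_MAP VALUES ('EMPLOYEE', 'EmployeeId',     'EMPLOYEE',   'Employee Identifier');
INSERT INTO BASE_ATTRIBUTE_MAP VALUES ('EMPLOYEE', 'EmployeeName',   'EMPLOYEE',   'Employee Name');
INSERT INTO BASE_ATTRIBUTE_MAP VALUES ('EMPLOYEE', 'DepartmentCode', 'DEPARTMENT', 'Department Code');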
However, the base attribute mapping is imprecise and generic. It’s difficult to help developers who deal with details if all you have is a big picture vaguely correlated to the real world. Any
change still requires additional interpretation of the physical schema. This time-consuming effort is repeated over and over again.
Tailoring of the logical model after the physical data structures minimizes this effort, so the models used for maintenance have a tendency to closely follow the physical schema.
Transformation Mapping
The objective is to strike a balance between the need for information and the ability to keep the documentation up to date. This can be accomplished by
- building it as an extension to the base attribute mapping;
- keeping it relatively simple, limited to the frequently used scenarios.
Some of the transformations and corresponding requirements for documenting the physical-logical mapping are described below:
Redundant Attribute
To improve application performance, an attribute from a parent entity might be redundantly stored in a child entity. The transformation can be documented by referencing the base attribute and
all relationships on the access path from the parent to the child entity. The resulting column definition can then be compiled by tracing the access path from the child entity back to the base attribute.
For example: a number of EMPLOYEES are assigned to a DEPARTMENT. The DEPARTMENT is identified by the DepartmentCode. The DepartmentCode column is propagated to the EMPLOYEE table as a foreign key.
Figure 2. Employees and Departments
All management reports refer to departments by their names instead of their codes. To avoid joins with the DEPARTMENT table and to improve query performance, the development team added DepartmentName
to the EMPLOYEE table.
EmployeeId | EmployeeName | DepartmentCode | DepartmentName
101 | Joe | HR | Human Resources
102 | Carl | ENG | Engineering
103 | Brenda | ENG | Engineering
Without arguing the merits of the decision, let’s consider the consequences. The DepartmentName is redundantly stored in a child entity. The new column does not fit into a proper logical model,
but ignoring it altogether creates maintenance headaches.
It can be mapped to the Department Name attribute from the DEPARTMENT entity, the DEPARTMENT entity itself, and the EMPLOYEE-DEPARTMENT relationship. The following is the compiled definition for
EMPLOYEE.DepartmentName:
An EMPLOYEE is assigned to a DEPARTMENT. A DEPARTMENT is an organizational unit within a company. Department Name is a textual label, used to distinguish the DEPARTMENT.
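A sketch of the physical structures behind this example; the data types and constraints are assumed, since the article gives only the column names.

CREATE TABLE DEPARTMENT (
    DepartmentCode VARCHAR(5)  NOT NULL PRIMARY KEY,
    DepartmentName VARCHAR(40) NOT NULL
);

CREATE TABLE EMPLOYEE (
    EmployeeId     INTEGER     NOT NULL PRIMARY KEY,
    EmployeeName   VARCHAR(40) NOT NULL,
    DepartmentCode VARCHAR(5)  NOT NULL REFERENCES DEPARTMENT (DepartmentCode),
    DepartmentName VARCHAR(40) NOT NULL   -- redundant copy of DEPARTMENT.DepartmentName
);

-- The join that the redundant column lets the management reports avoid:
SELECT e.EmployeeName, d.DepartmentName
FROM   EMPLOYEE e
JOIN   DEPARTMENT d ON d.DepartmentCode = e.DepartmentCode;

The mapping record for EMPLOYEE.DepartmentName would reference the Department Name base attribute together with the EMPLOYEE-DEPARTMENT relationship, which is what allows the definition above to be compiled by tracing the access path back to the DEPARTMENT entity.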
Pivoting Transformation
Pivoting transforms a relational table into a ‘spreadsheet’. It creates a repeating attribute (or repeating group of attributes). The table grows sideways, and shrinks in the number of rows.
For example: The EXPENSE entity has three attributes. It keeps track of an Expense Amount by Expense Type and Period. The Expense Types are Travel Expense, Meals, and so on.
Period | Expense Type | Expense Amount
July ’99 | Travel Expense | $150
July ’99 | Meals | $30
July ’99 | Tuition | $220
The physical implementation has a column for Period and a number of dollar-amount columns: one for TravelExpenseAmount, another for MealsExpenseAmount, and so on. In this case all the amount
columns share the same base attribute, Expense Amount, so the base attribute mapping cannot tell the difference between TravelExpenseAmount and MealsExpenseAmount.
Period | TravelExpenseAmount | MealsExpenseAmount | TuitionExpenseAmount
July ’99 | $150 | $30 | $220
The transformation documentation includes the base attribute, the pivoting attribute, and the value of the pivoting attribute. In the above example, the TravelExpenseAmount column maps to the
Expense Amount attribute when the Expense Type is ‘Travel Expense’.
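A sketch of how the pivoted physical table relates to the logical EXPENSE entity; the data types are assumed, and the period value is written as ‘July 1999’ for simplicity. Each amount column carries the same base attribute, Expense Amount, restricted to one value of the pivoting attribute, Expense Type.

-- Logical shape of EXPENSE: one row per Period and Expense Type.
CREATE TABLE EXPENSE (
    Period        VARCHAR(10)   NOT NULL,
    ExpenseType   VARCHAR(20)   NOT NULL,
    ExpenseAmount DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (Period, ExpenseType)
);

INSERT INTO EXPENSE VALUES ('July 1999', 'Travel Expense', 150);
INSERT INTO EXPENSE VALUES ('July 1999', 'Meals',           30);
INSERT INTO EXPENSE VALUES ('July 1999', 'Tuition',        220);

-- Pivoted physical shape: one row per Period, one amount column per Expense Type.
SELECT Period,
       SUM(CASE WHEN ExpenseType = 'Travel Expense' THEN ExpenseAmount END) AS TravelExpenseAmount,
       SUM(CASE WHEN ExpenseType = 'Meals'          THEN ExpenseAmount END) AS MealsExpenseAmount,
       SUM(CASE WHEN ExpenseType = 'Tuition'        THEN ExpenseAmount END) AS TuitionExpenseAmount
FROM   EXPENSE
GROUP  BY Period;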
Data Warehouse Snapshots
Applying a historical perspective to the Logical Model creates an Atomic Data Warehouse. Rows whose effective-termination period overlaps a snapshot’s begin-end period are selected.
Each attribute is aggregated over the snapshot period; the aggregation might be totaling, averaging, counting, and so on. The snapshots are accumulated over time.
The meaning of each attribute in the Data Warehouse changes, because it is defined over a period of time instead of at a point in time as in the operational data store. If the attribute is averaged,
the semantic difference is small and might be neglected. The mapping for this transformation includes the base attribute and the fact that a snapshot was taken.
On the other hand, totaling or counting over the snapshot period might produce a completely new datum.
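A sketch of how one snapshot might be built; the tables and columns below (an account balance history is used as the example) are assumptions for illustration, not structures from the article. Rows whose effective-termination period overlaps the snapshot’s begin-end period are selected, and each attribute is aggregated over that period.

-- Assumed operational history table: one row per account per effective period.
CREATE TABLE ACCOUNT_BALANCE (
    AccountId       INTEGER       NOT NULL,
    EffectiveDate   DATE          NOT NULL,
    TerminationDate DATE          NOT NULL,
    BalanceAmount   DECIMAL(12,2) NOT NULL
);

-- Atomic data warehouse table: one row per account per snapshot period.
CREATE TABLE ACCOUNT_BALANCE_SNAPSHOT (
    SnapshotBeginDate  DATE          NOT NULL,
    SnapshotEndDate    DATE          NOT NULL,
    AccountId          INTEGER       NOT NULL,
    AverageBalance     DECIMAL(12,2),   -- averaging: stays close to the point-in-time meaning
    BalanceChangeCount INTEGER          -- counting: a new datum that exists only over the period
);

-- Build the July 1999 snapshot: select rows whose effective-termination
-- period overlaps the snapshot period, then aggregate over that period.
INSERT INTO ACCOUNT_BALANCE_SNAPSHOT
SELECT DATE '1999-07-01',
       DATE '1999-07-31',
       AccountId,
       AVG(BalanceAmount),
       COUNT(*)
FROM   ACCOUNT_BALANCE
WHERE  EffectiveDate   <= DATE '1999-07-31'
  AND  TerminationDate >= DATE '1999-07-01'
GROUP  BY AccountId;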
Meta Data Transformations Tool
A tool was developed in Microsoft Access to perform Meta-data Transformations. It accepts data from the CASE tool and the database catalog, and provides a graphical interface for mapping. If you
are interested in this, you can contact Data Cartography for details.
The tool was successfully used for a model of about 200 entities, implemented as three separate databases. One of the databases broke all the relational rules and, because of that, was
extremely difficult to support.
The mapping process by itself gave the Data Administrator a significant boost of confidence. All change requests could now be immediately decoded into logical model constructs. In that
context, they were much easier to understand and to steer towards a more relational resolution. The tool also helped transfer knowledge to the next DA.
Advantages of Documenting the Meta Data Transformations
- Better documentation provides for easier knowledge transfer and streamlines the software maintenance process.
- Full life-cycle software support promotes the Data Administrator’s function within the IT organization. It changes the image of Data Administration from analytical overhead to a practical and intrinsic part of IT.
- Flexible cross-referencing eliminates the need for the logical model to closely follow the physical design. It allows the logical model to be modernized with new ideas and concepts without losing the information needed to support the installed base of applications.