Author: Len Silverston, W.H.Inmon, Kent Graziano
Does your organization have an enterprise data model? How long would it take to create one? What about an enterprise data warehouse design? The Data Model Resource Book by Len Silverston, W.H.
Inmon, and Kent Graziano can help in this area. The Resource Book provides library of logical data models and sample data warehouse and data mart designs. It highlights the importance of
subject-area modeling and shows the relationship among high-level models, logical data models, warehouse designs, and data mart designs. It also provides a methodology to transform logical data
models into data warehouse designs. But what are data models anyway?
Data models represent information areas of interest. While there are many ways to create data models, two modeling methodologies stand out—top-down and bottom-up. Bottom-up models are often
the result of a reengineering effort. They usually start with existing data structures–forms, fields on application screens, or reports. These models are usually physical, application-specific,
and incomplete from an enterprise perspective. They may not promote data sharing, especially if they are built without reference to other parts of the organization.
Top-down logical data models, on the other hand, are created in an abstract way by getting information from people who know the subject area. A system may not implement all the entities in a
logical model, but the model serves as a reference point or template. Sometimes models are created in a mixture of the two methods: by considering the data needs and structure of an application and
by consistently referencing a subject-area model. Unfortunately, in many environments the distinction between a logical data model and a physical data model is blurred. In addition, some CASE tools
don’t make a distinction between logical and physical data models.
Data models are the result of a conscious, deliberate effort to understand and represent business data. But what good are they, especially in the context of a data warehouse? Data models and the
modeling process are useful in a number of ways:
Communication. Data modeling can facilitate dialog among members of the business and IT community. Business people have the opportunity to share experience, come to a common understanding, and
see different points of view. Consistent Representation of Corporate Data. Modeling standards allow for data to be represented in the same way across the enterprise. This is especially important for
data elements which may be represented in multiple ways. Common examples include phone numbers, social security numbers, dates, and addresses. Data Integrity. A normalized data model implemented in a
database improves data integrity by implementing referential integrity and reducing data redundancy. Data Flexibility. In the context of a data warehouse, only application-neutral, subject-oriented
data structures can easily satisfy needs for business intelligence across several departments. For example, a customer-profitability model might require data from several subject areas, including
customer (customer service calls), finance (revenue and cost information), and human resources (hourly wage information). If the data warehouse is “tuned” for a specific area of reporting or
analysis, easily extracting data for other business areas may be difficult. Warehouse data structures that are flat or “tuned” for a specific report often require a dedicated staff of report
writers to satisfy business intelligence needs. Notation
The modeling notation in the Resource Book is somewhat “standard.” You’ll see rounded rectangles (entities), lines, and “crow’s feet.” Entities are named with singular nouns.
Primary keys are marked with a “pound” sign (“#”). Supertype/subtype relationships are shown as a box-in-a-box. Primary keys are not migrated to related entities.
The Subject Areas
The main part of the book presents seven logical data models of core subject areas common to most industries:
- People and Organizations
- Ordering Products
- Order Delivery and Invoicing
- Work Effort
- Accounting and Budgeting
- Human Resources
The chapter on each subject area describes key features of the subject area, entities, attributes, and the subject area’s relationship to other subject areas.
Levels of Models
The Resource Book contains one of the best notations I have seen for showing the relationship among high-level models, mid-level models, and physical designs. Figure 1 shows major subject areas,
while Figure 2 shows the relationship among the subject areas, mid-level models, and physical models. High-level models are appropriate for business presentations. Mid-level logical data models are
appropriate for business-area analysis and transformation into physical designs for implementing systems. The physical model in Figure 2 is implicit. The book doesn’t present any physical
data models for operational systems (as opposed to decision support systems). Rather, it assumes that logical models are transformed into physical designs.
Figure 1. High-Level Model
Figure 2. Relationship among models and designs
Transforming the Logical Model
While the methodology for creating a logical data model is well-known, the methodology for creating a data warehouse model from a logical (or physical) data model is not as well known. In the
Resource Book, logical data models are transformed into data warehouse designs. Key transformations include
- Removing purely operational data
- Adding an element of time
- Merging data from multiple tables into one table
- Adding derived data
- Creating data artifacts
- Sample Enterprise Data Warehouse Model
Chapter Ten applies the transformation rules just mentioned to derive a sample data warehouse data model. Purely operational data (such as comments) are removed. Invoice and invoice detail tables
are merged. A “load_date” column is added to several tables. These transformations result in three subject areas appropriate for business analysis: sales, budgeting and purchasing, and human
resources. The data warehouse data model consists of several denormalized and reference tables, such as geography and products. The enterprise data warehouse design is suitable for decision support
or for a repository, whose data will eventually be distributed to a departmental data mart.
Sample Data Mart Schemas
If the enterprise data warehouse or data warehouse design is inappropriate for departmental analysis, designers may want to create a departmental data mart. The Resource Book presents two data mart
schemas for sales analysis and human resource analysis. These schemas are standard “star” schemas. Star schemas contain a “central table, called a fact table, with relationships to several
lookup-tables called dimensions.” Star schema structures allow for browsing of dimension tables and generally only require a few joins to get useful information.
The Resource Book is one of the more useful books I have seen on data architecture. It is just what it says: an (excellent) resource library for seven logical data models, one warehouse design, and
two data mart designs. These data models and designs can be used as templates or starting points for your own modeling, an introduction to subject areas you might not be familiar with, validating
your existing models, and a help to building a corporate data model. The logical models tend to be very complete. You probably won’t need all their features, but they provide a good
The book provides a good notation for showing the relationship among high-level models, mid-level models, and data warehouse and data mart designs. Instance tables (sample data) help bring the
models to life. The book also provides a good methodology for transforming logical data models to data warehouse designs. I do have one nit: I think the logical data models would be more readable
if primary keys were migrated into the related entities, especially when an entity receives keys from several (four or five) other entities. All in all, the book is an extremely useful resource.
The Resource Book is published by John Wiley & Sons. A CD-ROM of SQL scripts to create the models via a CASE tool is also available. The ISBN number for the CD-ROM is 0-471-15366-4.