Some of the material presented in this article was excerpted with permission from The Data Model Resource Book: A Library of Logical Data Models and Data Warehouse Designs published by John Wiley and Sons and authored by Len Silverston, W. H. Inmon and Kent Graziano.
The Concept
The age of the data modeler as artisan is passing. Organizations can no longer afford the labor or time required for handcrafting data models from scratch. In response to these constraints, the age of the data modeler as engineer is dawning.
Engineers build new products using proven components and materials. In data modeling, the analogue to a component is a “universal data model.” A universal data model is a generic or template data model that can be used as a building block to jump-start development of the corporate data model, logical data model or data warehouse data model.
Resistance to the use of universal data models is usually based on the belief that a particular organization has unique needs or the dreaded “not invented here” syndrome. This article describes the application of universal data models to several disparate organizations. It demonstrates that the same basic models, with minor customization, can be successfully applied in each example.
One Size Fits All?
The belief that a particular organization is unique because of its missions, goals, policies, values, functions, processes and rules can be very strong. After all, some businesses sell to people and others sell to other organizations. Some deal with products and others deal with services. Each industry has its own set of business issues, and each organization within an industry varies as much as the differences between the personalities of various individuals.
People and Organizations
A subject data area that is common to most enterprises involves the people and organizations that are part of conducting business. There is an important need to track the names, addresses, contact numbers and various relationships and interactions between the parties conducting business. Enterprises need to track information about customers, distributors, agents and suppliers as well as the internal organizations and people within the enterprise. This type of information is critical throughout all aspects of business including sales, marketing, customer service, purchasing, shipping, invoicing, budgeting, accounting and human resources.
Enterprises spend significant effort and time defining the most effective ways to model this type of information. The data model may lead to sub-optimal solutions if careful analysis is not conducted. For example, many data models depict separate entities for each type of party that exists in an enterprise. There may be entities for CUSTOMER, SUPPLIER, INTERNAL ORGANIZATION, BROKER, EMPLOYEE, INVESTOR and any other role that a person or organization may play in the enterprise.
There are problems with modeling the information this way. What if a person or organization plays more than one role in the organization? For instance, what if an organization supplies products and/or services to our organization and also buys products from us? Does this mean that we maintain their name, addresses, contact numbers and other organizational information in both the CUSTOMER and SUPPLIER entities? Under this scenario, if a name or address changes, the information needs to be changed in two places. Furthermore, does the organization play other roles such as an agent of the company or a distributor of products? Each time an organization’s role is modeled as a separate entity, there is a potential for redundant and inconsistent information.
The same argument applies to people. Should we have a separate EMPLOYEE entity as well as a CONTRACTOR entity? What if a contractor becomes an employee of the enterprise or vice versa? The person’s name, demographics and contact information may still be the same. The only thing that has changed is the nature of the relationship between the parties.
It only makes sense to refer to pre-defined templates or universal data models when modeling common data structures. Universal data models can point out the most effective means to maintain this information and assure that subtle, yet important, data integrity issues are not overlooked.
People and Organization Information
Figures 1, 2, 3 and 4 depict universal data models for the people and organizations involved in conducting business. These data models include information concerning the relationships between each person and organization as well as their associated contact information.
Before beginning our discussion of these models, let’s clarify some data modeling conventions. Entities are represented using rounded-edge rectangular boxes. Sub-types are represented by showing boxes within the larger box. For example, in Figure 1, ORGANIZATION and PERSON are both represented as sub-types of PARTY.
The lines between entities define relationships. The dashed section of each line represents that a relationship is optional. For example, in Figure 1, a PARTY does not necessarily have an associated PARTY DEFINITION. The part of the line closest to PARTY DEFINITION is solid, and this represents a mandatory relationship. Therefore, each PARTY DEFINITION must be associated with a related PARTY. The small crossed line across the PARTY DEFINITION to PARTY relationship specifies that the key to PARTY (party_id) is included as part of the key to PARTY DEFINITION.
The “crows feet” (three small lines at the end of each relationship line) denote a one-to-many (1:M) relationship. For example, Each PARTY may be defined by one or more PARTY DEFINITIONs. Vice versa, each PARTY DEFINITION must be used to define one and only one PARTY since the line from PARTY DEFINITION to PARTY does not end with a “crows foot.”
Now let’s discuss the data models. Figure 1 identifies a super-type named PARTY, with two sub-types, PERSON and ORGANIZATION. Information about a person or organization is maintained independent of their roles or relationships. This leads to a much more stable and normalized data structure since information about various people and organizations is only stored once. The same information can then be associated with each of the party’s roles.
The reason that PERSON and ORGANIZATION are both sub-typed into a PARTY entity is that there is common information related to both people and organizations such as their credit rating, credit limit, address, phone number, fax number or e-mail address. Additionally, organizations and people can serve in similar roles. Both people and organizations may be buyers, sellers, members or parties to a contract. Parties may be classified into various categories (i.e., industry codes, minority classifications) using the PARTY DEFINITION which stores each category into which parties may belong.
Figure 2 depicts that each PARTY may be involved in one or more PARTY RELATIONSHIPs. PARTY RELATIONSHIP is used to define the relationship between two parties. An occurrence of a PARTY RELATIONSHIP may be between two organizations, such as a customer relationship to an internal company. The relationship may be between a person and an organization–for example, an employee of an internal company. Finally, the relationship may be between two people. An example of this is the relationship between a purchasing agent and their preferred supplier representative. The PARTY RELATIONSHIP TYPE defines the possible types of relationships. Possible instances of PARTY RELATIONSHIP TYPE are “employer/employee,” “parent/subsidiary,” and “customer/customer representative.” The PARTY TYPE ROLE defines the two parts of the relationship. For example, one role of the relationship may be “employer” and the other role for that same relationship may be “employee.” Finally, the PARTY PRIORITY and PARTY RELATIONSHIP STATUS TYPE entities allow each PARTY RELATIONSHIP to be prioritized (high, medium, low) and defined via a status (active, inactive).
By distinguishing whether information should be associated with the PARTY or the PARTY RELATIONSHIP, we can avoid data anomalies. For example, many data models associate a status with a PARTY. This does not account for the fact that three sales representatives may have three distinct relationships with the same party. Each sales representative may want to record a different status for their relationship with the party. If the status were stored with the PARTY, then the sales representatives would have to override each other’s information. In actuality, there are really three separate relationships, and the status should be associated with the PARTY RELATIONSHIP.
Figures 3 represents address or location information about parties. It shows that ADDRESS is its own entity and can be applied to many parties. The PARTY ADDRESS is a cross-reference or associative entity that allows each party to have many addresses (home address, work address) and each address to have many parties (an office location of many employees). Each PARTY ADDRESS may have many PARTY ADDRESS ROLES and vice versa. These relationships determine the purpose of the address. Examples of PARTY ADDRESS ROLE include “corporate headquarters,” “sales office” and “warehouse.”
Figure 4 is a model to maintain phone numbers, fax numbers, cell numbers, e-mail addresses and all other CONTACT MECHANISMs. Instead of defining these contact mechanisms as attributes, this model provides flexibility in allowing as many contact mechanisms to be stored for a PARTY or PARTY LOCATION as needed. The CONTACT MECHANISM TYPE entity identifies the type of mechanism such as phone, fax, cellular or pager. The PARTY CONTACT MECHANISM is an associative entity that allows each CONTACT MECHANISM to be related to many PARTY ADDRESSes or PARTYs (a shared telephone number for several consultants). Conversely, each PARTY or PARTY ADDRESS may be contacted via many PARTY CONTACT MECHANISMs (a person or location with numerous contact mechanisms of different types).
The line connecting the two relationships under PARTY CONTACT MECHANISM represents an exclusive arc and states that either one of these relationships exists, but not both. A PARTY CONTACT MECHANISM may be either the mechanism to contact a PARTY or a PARTY ADDRESS. Similar to addresses, contact mechanisms may have roles. Examples of PARTY CONTACT MECHANISM ROLE TYPEs include “general information number,” “sales information” and “customer service number.”
After extensive analysis and consideration of many alternate data models, I believe that these four universal data models represent a very effective way to model people and organizations for most enterprises. Now let’s take a look at how these universal data models can be applied to specific enterprises.
A Manufacturing Enterprise
Let’s consider the needs of a particular type of enterprise, specifically a manufacturing firm. Suppose this firm manufactures personal computers. They sell their products to retail chains, distributors and directly to individuals and organizations. It is important to record contact information on each distributor and the people within those organizations. They need to track supplier information to indicate who provides PC components for their machines. Information on their end-user customers who have bought their equipment is critical. They also maintain employee information as well as information about the many subsidiaries, divisions and departments and their associated locations.
The first comments an enterprise may make about using the previously presented universal data models are “Where is the customer entity? Our most important information needs are about our customers. We need to record their credit limit, billing options and their customer status. Similarly, where are the entities for supplier, employee, distributor or internal organization?”
Each of these business entities is characterized by very common information. They all have names, addresses, phone numbers, statuses and other contact information. This leads us to the conclusion that they could be sub-typed together. Should we then modify the model to add the sub-types CUSTOMER, SUPPLIER, DISTRIBUTOR, INTERNAL ORGANIZATION and EMPLOYEE all within the PARTY entity? An issue is that a single person or organization may be involved in more than one of these relationships. For example, a distributor of the manufacturer may also be a supplier of some of their PC components. Again, we do not want to maintain more than one occurrence of the same person or organization as this can lead to data inconsistencies.
Figure 5 illustrates how the previously described universal data models can be modified to meet the information needs of our manufacturing example. For simplicity reasons, only a few important entities are shown in Figure 5, but all previously described entities also apply to our manufacturing firm.
The PARTY RELATIONSHIP is sub-typed into the applicable business relationships, CUSTOMER, SUPPLIER, DISTRIBUTOR, EMPLOYEE and INTERNAL ORGANIZATION. This allows each person or organization to be involved in one or more of those relationships. If there are other types of relationships such as sales agents, government agencies who regulate manufacturing or stockholders, they can also be defined as additional PARTY RELATIONSHIP sub-types.
The basic information about each person or organization such as their names, credit rating, addresses, phone numbers and other contact information is associated with the PARTY. The information about each relationship is stored in the PARTY RELATIONSHIP entity. All sub-types of PARTY RELATIONSHIP have a from_date, through_date and comments. Each PARTY RELATIONSHIP sub-type may have different attributes to define that specific relationship type. For example, the CUSTOMER sub-type has a credit limit, statement_day (defined as the closing day for statements), and statement_frq (defining the frequency of statements such as weekly, bi-monthly or monthly).
This model provides an extensive, flexible and stable means of maintaining person and organization information for the manufacturing organization. The only customization required was to add the PARTY RELATIONSHIP sub-types applicable to the manufacturer.
A Financial Securities Company
Is this model applicable to other enterprises such as a financial securities company? Let’s assume that this enterprise sells investment vehicles such as mutual funds, stocks, bonds and other investments to the general public, mostly through brokers.
The same type of base information is needed for people and organizations in this type of company: their names, addresses, phone numbers and information about different types of parties and relationships. The difference is in the types of business relationships involved in a financial securities firm. A financial securities company needs to track information about their brokers, investors, wholesalers (the party selling to the broker), employees and internal organizations.
Figure 6 shows that this same universal data model can be effectively used for a financial securities company by adding the applicable PARTY RELATIONSHIP sub-types and associating appropriate attributes to each type of relationship. Investment goal is an attribute of INVESTOR, annual quota is an attribute of the WHOLESALER that sells to the broker and the exclusive indicator and broker license number are stored for the BROKER sub-type.
Other Enterprises
By now, it is hopefully evident that most enterprises can use this same model and customize it by adding sub-types to represent their own party relationships. Health care enterprises may have DOCTOR, PATIENT, INSURANCE COMPANY, HMO and PPO as PARTY RELATIONSHIP sub-types. Universities may have STUDENT, FACULTY, ADMINISTRATOR, GOVERNMENT AGENCY (for grants) and DONATOR as possible sub-types. In a consulting services enterprise the party relationships may be CONSULTANT, CLIENT, INTERNAL STAFF and DEPARTMENT. Every enterprise has the same type of data structures related to people and organizations, but they are applied toward different types of relationships.
Other Models
We have only examined data models for one common aspect of business; namely, managing party and relationship information. There are many other universal data models for maintaining information on product/services, orders/agreements, shipments, time entry, invoices, accounting, budgeting and human resources. There are also universal data models for standard data warehouse design applications such as sales analysis, financial analysis and human resource analysis. I have successfully assisted enterprises in applying these template models against many industries and have found that quite often 60 percent of the universal data model constructs are applicable. This translates into data models of much better quality and substantial savings of time and cost.
Conclusion
Universal data models can substantially reduce the time to complete a corporate data model, logical data model or data warehouse design. They can lead to higher quality designs by identifying subtleties that may be overlooked by inexperienced modelers or harried modelers who may have tight project deadlines.
Businesses are very unique. However, most businesses’ underlying data structures are very similar. This article used specific examples to illustrate that universal data models can be effectively applied to diverse types of enterprises. There are many other examples of universal data models that can save tremendous amounts of time and, at the same time, produce high quality data models and data warehouse designs.
This article was previously published in Data Management Review (www.dmreview.com) magazine (Vol. 8 No. 8) in September 1998 and is being republished with their permission.