An Enterprise Data Model is an integrated view of the data produced and consumed across an entire organization. It incorporates an appropriate industry perspective. An Enterprise Data Model (EDM) represents a single integrated definition of data, unbiased of any system or application. It is independent of “how” the data is physically sourced, stored, processed or accessed. The model unites, formalizes and represents the things important to an organization, as well as the rules governing them.
An EDM is a data architectural framework used for integration. It enables the identification of shareable and/or redundant data across functional and organizational boundaries. Integrated data provides a “single version of the truth” for the benefit of all. It minimizes data redundancy, disparity, and errors, which is core to data quality, consistency, and accuracy.
As a data architectural framework, an EDM is the “starting point” for all data system designs. The model is to data systems what an architectural blueprint is to a building: it provides a means of visualization, as well as a framework supporting the planning, building and implementation of data systems. For enterprise data initiatives, such as an Operational Data Store (ODS) or Data Warehouse (DW), an EDM is mandatory, since data integration is the fundamental principle underlying any such effort. An EDM facilitates the integration of data, diminishing the data silos inherent in legacy systems. It also plays a vital role in several other enterprise-type initiatives:
Data Quality
Data is an important enterprise asset, so its quality is critical. Disparate, redundant data is one of the primary contributing factors to poor data quality. An EDM is essential for data quality because it exposes the data discrepancies inherent in redundant data. Existing data quality issues can be identified by “mapping” data systems to the EDM. As new data systems are built from an enterprise data model framework, many potential data quality issues will be exposed and resolved prior to implementation.
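As a rough illustration of the “mapping” idea, the sketch below assumes each system's fields have already been mapped by hand to an EDM concept. The system names, field names, and the find_redundant_concepts helper are hypothetical and are not part of any EDM tool.

```python
from collections import defaultdict

# Hypothetical mapping of (system, field) pairs to EDM concepts.
SYSTEM_FIELD_TO_CONCEPT = {
    ("reservations", "cust_name"): "Customer",
    ("loyalty", "member_name"): "Customer",
    ("reservations", "flight_no"): "Flight",
    ("maintenance", "tail_number"): "Equipment",
}


def find_redundant_concepts(mapping):
    """Return concepts stored by more than one system, with the systems involved."""
    systems_by_concept = defaultdict(set)
    for (system, _field), concept in mapping.items():
        systems_by_concept[concept].add(system)
    return {
        concept: sorted(systems)
        for concept, systems in systems_by_concept.items()
        if len(systems) > 1
    }


redundant = find_redundant_concepts(SYSTEM_FIELD_TO_CONCEPT)
print(redundant)  # {'Customer': ['loyalty', 'reservations']}
```

Each concept reported here is a candidate for a data quality review, since the same business data is maintained in more than one place.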
Data Ownership
Ownership of enterprise data is important because of its sharable nature, especially in its maintenance and administration. An EDM is used as a data ownership management tool by identifying and documenting the data’s relationships and dependencies that cross business and organizational boundaries. This supports the concept of “shared” ownership, essential in an enterprise data initiative.
Data System Extensibility
An EDM supports an extensible data architecture. Extensibility is the capability to extend, scale, or stretch a system’s functionality, effectively meeting the needs of the user’s changing environment. Extensible systems can add or extend functionality with few adverse effects. An EDM, based on a strategic business view and independent of technology, supports extensibility, enabling movement into new areas of opportunity with minimal IT changes.
Industry Data Integration
No business operates in a vacuum. Because an EDM incorporates an external view, or “industry fit,” it enhances the organization’s ability to share common data within its industry. Organizations within the same industry often consume some of the same basic data, such as customers, locations, and vendors. Organizations can also share data with related industries or “business partners.” For example, within the airline industry, data is often shared with car rental companies. An EDM, with its industry perspective, incorporates a framework for industry data integration.
Integration of Packaged Applications
An EDM can be used to support the planning and purchasing of packaged applications, as well as their integrated implementation. This is accomplished through “mapping” the packaged application to the EDM, establishing its “fit” within the enterprise. Since existing systems are also “mapped” to the EDM, the integration points between the packaged application and existing systems can be identified, providing a road map for the flow of consistent quality data through the packaged product.
Strategic Systems Planning
Since an EDM is independent of existing systems, it represents a strategic view. It also identifies data dependencies. As existing systems are mapped to the EDM, a strategic gap analysis can be performed, identifying the business’s strategic information needs. From the gap analysis and data dependencies, prioritization of data systems releases can be determined.
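The gap analysis and dependency-driven prioritization described above can be sketched as a set difference followed by a topological sort. This is only an illustration under stated assumptions: the concept names, the coverage set, and the dependency table are hypothetical, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical inputs: concepts in the EDM, and concepts already covered
# by existing systems that have been mapped to the EDM.
EDM_CONCEPTS = {"Customer", "Location", "Flight", "Booking", "Ticket"}
COVERED_BY_EXISTING_SYSTEMS = {"Customer", "Location"}

# Hypothetical data dependencies: each concept lists the concepts it depends on.
DEPENDS_ON = {
    "Flight": {"Location"},
    "Booking": {"Customer", "Flight"},
    "Ticket": {"Booking"},
}


def gap_analysis(edm_concepts, covered):
    """Concepts the business needs that no existing system supplies."""
    return edm_concepts - covered


def release_order(gaps, depends_on):
    """Order the gap concepts so each follows the gap concepts it depends on."""
    graph = {concept: depends_on.get(concept, set()) & gaps for concept in gaps}
    return list(TopologicalSorter(graph).static_order())


gaps = gap_analysis(EDM_CONCEPTS, COVERED_BY_EXISTING_SYSTEMS)
order = release_order(gaps, DEPENDS_ON)
print(order)  # ['Flight', 'Booking', 'Ticket']
```

Dependencies on concepts that are already covered are ignored when ordering, since they do not constrain the sequence of new releases.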
The enterprise data modeling process utilizes a “top-down – bottom-up” approach for all data system designs (ODS, DW, data marts and applications). The process is driven from the top-down. The EDM is the artifact produced from the top-down steps. The bottom-up is also important because it utilizes existing data sources to create data designs in an efficient, practical manner.
An EDM is built in three levels of decomposition. The Enterprise Subject Area Model (ESAM) is created first, and then expanded, creating the Enterprise Conceptual Model (ECM), which is further expanded, creating the Enterprise Conceptual Entity Model (ECEM). Although the models are interrelated, they each have their own unique identity and purpose. Creating an EDM is much more an art than a science. An EDM is created in its entirety, relative to the best knowledge available at the time, since more will always be revealed. An EDM can be thought of in terms of “levels,” as shown in Figure 1.
Figure 1 – Enterprise Data Model Levels
Enterprise Subject Area Model
Enterprise data is any data important to the business and retained for additional use. Data would not be saved unless there was a perceived additional need, so most data could be considered enterprise data, making its scope immense. It is almost impossible, even for a large team, to design, develop, and maintain enterprise data without breaking it into more manageable pieces. A fundamental objective of an Enterprise Subject Area Model (ESAM) is the idea of “divide and conquer.” An ESAM covers the entire organization. All data produced and/or consumed across the business is represented within a subject area. The average number of subject areas for an organization is between 10 and 12. Additional subject areas may be required for more complex organizations. An ESAM is the framework for the Enterprise Data Model (EDM).
Each subject area is a high-level classification of data representing a group of concepts pertaining to a major topic of interest to an organization. Subject areas can represent generic business concepts (customer, product, employee and finance), as well as industry-specific concepts. The subject areas for an airline are shown in Figure 2.
Figure 2 – Airline Subject Area Model
Subject Area Groupings
Subject areas can be grouped by three high-level business categories: Revenue, Operation, and Support. These groupings are significant because each represents a distinctly different business focus. Revenue types focus on revenue activities, including revenue planning, accounting, and reporting. Operation types represent the main business functions involved in daily operations. Support types aid the business activity, rather than represent the main business. All organizations share these high-level business groupings. An airline’s subject areas are grouped as follows:
- Revenue – Ticket, Booking, Sales, Inventory, Pricing
- Operation – Flight, Location, Equipment, Maintenance, Schedule
- Support – IT, Finance, Employee, Customer
Subject Area Data Taxonomy
Taxonomy is the science of naming, categorizing and classifying things in a hierarchical manner, based on a set of criteria. Data Taxonomy (*see Data Taxonomy paper) is a hierarchical classification tool applied to data for understanding, architecting, designing, building, and maintaining data systems. Data Taxonomy includes several hierarchical levels of classification. At the highest level, all data can be placed into one of three classes: Foundational, Transactional, or Informational, as shown in Figure 3. These classes are distinguished by patterns of data production and consumption, as well as their data life cycles.
Figure 3 – Data Taxonomy
Foundational Data is used to define, support and/or create other data. It includes reference type data, metadata, and the data required to perform business transactions. Transactional Data is the data produced or updated as the result of business transactions. It is dynamic in nature and current within operational systems. Informational Data is historic, summarized, or derived; normally created from operational data. It is found primarily within decision support systems and occasionally used within operational systems for operational decision support.
Subject areas can be categorized according to their predominant data classification. At the detail level, subject areas contain all three data classes. The classification is based on the size, usage and implementation of that class within the subject area. An airline’s 14 subject areas can be classified as follows:
- Foundational – Equipment, IT, Employee, Sales, Location, Customer
- Transactional – Ticket, Booking, Flight, Finance, Maintenance
- Informational – Pricing, Inventory, Schedule
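The two classifications above (business grouping and data taxonomy class) can be held side by side in a simple lookup. The sketch below uses only the airline subject areas listed in the text, with the singular name Schedule used in both lists; the invert helper and the profile structure are illustrative assumptions.

```python
# Business groupings and taxonomy classes for the airline example.
GROUPS = {
    "Revenue": ["Ticket", "Booking", "Sales", "Inventory", "Pricing"],
    "Operation": ["Flight", "Location", "Equipment", "Maintenance", "Schedule"],
    "Support": ["IT", "Finance", "Employee", "Customer"],
}
TAXONOMY = {
    "Foundational": ["Equipment", "IT", "Employee", "Sales", "Location", "Customer"],
    "Transactional": ["Ticket", "Booking", "Flight", "Finance", "Maintenance"],
    "Informational": ["Pricing", "Inventory", "Schedule"],
}


def invert(classification):
    """Turn {label: [subject areas]} into {subject area: label}."""
    return {area: label for label, areas in classification.items() for area in areas}


group_of = invert(GROUPS)
class_of = invert(TAXONOMY)

# Every subject area should appear exactly once in each classification.
assert set(group_of) == set(class_of)
assert len(group_of) == 14

profile = {area: (group_of[area], class_of[area]) for area in sorted(group_of)}
print(profile["Inventory"])  # ('Revenue', 'Informational')
```

The two built-in checks catch a subject area that is missing from, or misspelled in, either list.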
Subject Area Model Creation
An ESAM is developed working closely with the business subject matter experts, under the guidance of any existing enterprise knowledge. Organizational structure and business functions need to be identified and understood. Subject areas common to most organizations (Customer, Employee, Location, and Finance) are identified first. Additional subject areas are then defined, ending up with a complete list of the “official” subject areas, and their definitions. These are then validated with the business experts.
The process of defining and naming each subject area is important because it provides an opportunity to gain consensus across business boundaries on topics vital to an organization. These topics include such questions as “What is a customer?” If agreement can be gained at a high level, the more detailed concepts will be much easier to define. During this process, priorities are established for the more detailed analysis needed in the subsequent development of the EDM.
Questions may arise regarding Informational subject areas, because they usually consist of the summarized and/or historic data of a Transactional subject area. An Informational subject area’s definition may make it appear as if it belongs to the original Transactional subject area. In the airline example, Booking is a Transactional subject area and Inventory is an Informational one. A core concept within the Inventory subject area is called “Booking History”, containing the data needed to derive the available seat inventory, an airline’s “product inventory.” Booking and Inventory are both important, but separate, airline subject areas. This is where Data Taxonomy is valuable for understanding.
Subject area names should be very clear, concise, and comprehensive; ideally one word. Whenever possible, industry standard business names (Customer, Employee, and Finance) are used. Definitions are formulated from a horizontal view, as all relevant information is considered. The definitions help determine the scope of a subject area. Definitions are important because they are viewed by the entire organization, so they need to be as simple and as understandable as possible. Theoretical, academic or proprietary language should never be used.
The relationships between subject areas represent significant business interactions and dependencies. A simple line is used to represent the major business relationship between subjects. There is no optionality (whether the relationship is required) or cardinality (the numeric relationship: 0, 1, infinite) at this level. For practical reasons, not all possible relationships are represented. The ESAM is not intended to represent each subject area as a “silo”, but rather as an integrated view of the business, which is the point of the relationships. There are very “gray” boundaries between subject areas. An ESAM can be thought of as a Venn diagram, with overlaps ending up in only one subject area.
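Because relationships at this level carry no direction, optionality, or cardinality, they can be sketched as plain unordered pairs. The pairs below are hypothetical stand-ins for the lines on an organization's own ESAM diagram, and related_to is an illustrative helper.

```python
# Hypothetical relationship lines between airline subject areas.
# Each line is an unordered pair: no direction, optionality, or cardinality.
RELATIONSHIPS = {
    frozenset({"Customer", "Booking"}),
    frozenset({"Booking", "Flight"}),
    frozenset({"Flight", "Equipment"}),
}


def related_to(subject_area, relationships):
    """Subject areas joined to the given one by a line on the diagram."""
    return sorted(
        other
        for pair in relationships
        if subject_area in pair
        for other in pair - {subject_area}
    )


print(related_to("Booking", RELATIONSHIPS))  # ['Customer', 'Flight']
```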
Color plays an important role in the ESAM, as well as the entire EDM. Each subject area and its subsequent concepts, as well as its data objects, have a distinct color. One color is used for all data concepts, entities and tables belonging to a specific subject area. Use of color conveys an instant understanding when viewing any of an organization’s data models. The airline’s 14 subject area example, shown in Figure 2, displays 14 distinct colors. As the ESAM becomes institutionalized, the subject areas may even be referenced by their color.
Creation of the ESAM follows enterprise data standards, a naming methodology and a review process. The ESAM is validated by the business in an iterative manner. After gaining consensus across the business, the subject areas are assigned a high-level data taxonomy class (Foundational, Transactional, or Informational) and added to the Metadata repository. Subject areas are core to an enterprise Metadata repository strategy, because all data objects will be tied to a subject area. Subject areas are assigned one or more business area owners.
At first glance, an ESAM may appear as if it would only take a few hours to create, because it looks like a very simple diagram. However, a true ESAM will take much longer, due to the participation required across the entire organization. Coordination and consensus of this magnitude take time. With an average-size organization and experienced design professionals, the process may take up to two or three months. To facilitate this process, meetings with business experts can be informal. It is essential to have enterprise-wide participation and interaction, since the value of the ESAM is in its depth of business understanding and agreement.
A method of organization is a way of grouping things into an orderly structure. An ESAM provides the structure for organizing an EDM by business subjects rather than by applications or data systems. Enterprise data systems (ODS or DW) are also organized by the ESAM, providing an orderly structure for their design, use, management, and planning. The process to create the ESAM is also important. It provides an opportunity to “sell” the value of enterprise-integrated data, as well as uncover many of the organization’s core data integration issues.
Enterprise Conceptual Model
An Enterprise Conceptual Model (ECM) is the second level of the Enterprise Data Model (EDM), created from the identification and definition of the major business concepts of each subject area. The ECM is a high-level data model with an average of 10-12 concepts per subject area. The concepts convey much greater business detail than the subject areas. An ECM consists of concepts, their definitions and their relationships.
Concepts describe the information produced and consumed by an organization, independent of implementation issues and details. Concepts are grouped by subject areas within the ECM. A concept can represent a relationship between subject areas. Even in this case, concepts always belong to only one subject area. The concepts help to further define the subject areas, including their scope. They are the details of the subject area definitions. Concepts may be found at different levels of granularity depending on their business relevance. Each concept may cover a very large or small area or volume of data. The point is that the concepts represent the important business ideas, not an amount of data.
The relationships between concepts define the interdependency of the data, void of optionality (whether the relationship is required) or cardinality (the numeric relationship: 0, 1, infinite). A simple line is used to represent the major business relationships between concepts. Not all possible relationships are represented. The concepts are not intended to be “stand alone” or “silo” areas of the business, but rather an integrated view of the business. There can be very gray boundaries between concepts, even concepts connecting subject areas. Gray areas are desirable because they represent a more “tightly coupled” or integrated enterprise design.
Concepts are based on the organization’s main business. An airline’s main business is to provide transportation services. In the normal operations of any organization, there are many supportive areas, such as Finance, Information Technology (IT), and HR. Supportive areas may contain business functions similar to the main business. For example, IT has customers, but these customers are not the airline customers. Including the IT customers in the airline customer concept causes confusion and unnecessary complexity, and does not represent data integration. Care must be taken to have the main business drive the concept definitions.
The ECM also needs to fit within the bigger picture of an industry view. This includes concepts such as vendors/suppliers and business partners, as well as the external reference data. It is important to be careful not to have the industry view drive or define the definition of an organization’s internal concepts. The industry viewpoint would be irrelevant if it weren’t for the organization. Always remember the dog wags the tail, the tail does not wag the dog.
Enterprise Conceptual Model Creation
Concentrating on one subject area at a time, the ECM is developed from a top-down approach using an enterprise view, not drawn from just one business area or specific application. As with the ESAM, the ECM is developed under the guidance of any existing enterprise work. Multiple sessions are held with the appropriate subject matter experts and business area owners. The business users ultimately provide the information needed to build the model.
The first step is to identify and formally document the creators and consumers of the data. Informal interviews are conducted with the identified business users, as well as subject matter experts. The data designers identify the initial set of data concepts and then conduct working sessions to further develop and verify the concepts. During the working sessions, relationships and overlaps between the concepts of subject areas are identified and resolved. Many concepts are moved from one subject area to another due to the gray nature of data integration and subject area scope. The process also helps to establish the areas needing more detailed analysis in the subsequent EDM development.
Concepts are formulated from a horizontal view of data created and consumed by the business functions. Enterprise concept names and definitions are derived from the intersection of all the business definitions or usage of that data. Concept names should be very clear, concise, and comprehensive. They are business oriented, not system or application aligned. They are not abbreviated. The concept definition needs to be clear and concise, but as complete and detailed as necessary for comprehension. The concept definitions are inclusive of the scope.
The model graphically displays the concept name and definition. All definitions are consistently written and begin with “The concept of XXXX describes”, so each definition makes its level clear on its own. For example, the name “customer” may be used for a subject area, a concept, and a table, so its level must be specified. A simple line is used to represent the major business relationships between concepts. Not all possible relationships are represented. Relationship names may or may not be displayed on the model, but are always defined within the model documentation.
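The naming and definition standards described above lend themselves to a simple automated check. The sketch below is one possible way to do it: the check_concept helper, the abbreviation heuristic, and the sample definition text are all assumptions made for illustration.

```python
import re

# Required opening phrase for a concept definition, per the standard above.
DEFINITION_PREFIX = "The concept of {name} describes"


def check_concept(name, definition):
    """Return a list of standards violations for one concept (empty if none)."""
    problems = []
    expected = DEFINITION_PREFIX.format(name=name)
    if not definition.startswith(expected):
        problems.append(f"definition should begin with '{expected}'")
    # Crude heuristic (an assumption): treat all-caps tokens of two or more
    # letters in the name as abbreviations.
    if re.search(r"\b[A-Z]{2,}\b", name):
        problems.append("name appears to contain an abbreviation")
    return problems


ok = check_concept(
    "Customer", "The concept of Customer describes a party that purchases travel."
)
bad = check_concept("PNR", "A passenger name record.")
print(ok)        # []
print(len(bad))  # 2
```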
With an average model size of 100 concepts, the ECM can be an overwhelming amount of information to comprehend. A large format plot of the ECM is important because people tend to learn visually. It is used both during and after the model’s development. Color plays a vital role in visual comprehension: the appropriate subject area colors are used, making it easy to instantly relate the concepts to subject areas. Subject area concepts are grouped together, with dependent concepts and subject areas located near each other.
After several working sessions, the appropriate business experts, including the experts from related subject areas, validate each set of subject area concepts. A plot of a subject area’s concepts is used to facilitate the validation process. Validating the entire ECM with all of the subject area business experts would be a daunting task. Gaining consensus one subject area at a time is much more feasible. The validation sessions should be very lively because the concepts are independent of technology and implementation, making it easy for the business experts to contribute to discussions. From these sessions, documentation is created, describing enterprise overlap, conflicts, and data integration issues or concerns.
After the business validation is complete and adjustments made, a design review is conducted, verifying consistent adherence to enterprise standards. The concepts are assigned a high-level data taxonomy classification (Foundational, Transactional, or Informational). Many concepts within a subject area will have the same classification as their subject area, but there are exceptions. The concepts are added to the Metadata repository and mapped to their appropriate subject area. All data designs and subsequent data stores will be tied to the appropriate enterprise concepts and subject areas.
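A minimal sketch of how repository entries might record each concept's subject area and taxonomy class, and how the exceptions mentioned above could be listed. The Concept record and the repository contents are hypothetical; only “Booking History” and the subject area classes come from the airline example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    subject_area: str
    taxonomy_class: str  # "Foundational", "Transactional", or "Informational"


# Subject area classes from the airline example.
SUBJECT_AREA_CLASS = {
    "Booking": "Transactional",
    "Inventory": "Informational",
    "Customer": "Foundational",
}

# Hypothetical repository entries; only "Booking History" is named in the text.
REPOSITORY = [
    Concept("Booking History", "Inventory", "Informational"),
    Concept("Reservation", "Booking", "Transactional"),
    Concept("Customer Travel History", "Customer", "Informational"),
]


def classification_exceptions(concepts, subject_area_class):
    """Names of concepts whose class differs from that of their subject area."""
    return [
        c.name
        for c in concepts
        if c.taxonomy_class != subject_area_class[c.subject_area]
    ]


exceptions = classification_exceptions(REPOSITORY, SUBJECT_AREA_CLASS)
print(exceptions)  # ['Customer Travel History']
```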
The process of creating the ECM is iterative; as more detail is discovered in the development of the Enterprise 3rd level model, changes and updates to the ECM may be necessary.
An ECM is used to confirm the scope of the subject areas and their relationships. As concepts are defined, questions arise regarding what’s included within a subject area. Concepts clarify the scope and definition of subject areas. Sometimes, subject area definitions are updated from discoveries made during the development of an ECM.
An ECM defines significant integration points, as the subject area’s integration points are expanded. Relationships between subject areas are represented as one or more relationships between subject area concepts, or simply as a concept. In other words, subject area relationships can become a concept within an ECM. At the subject area level, enterprise data ownership is assigned to a business area. At the conceptual level, business experts with a broad knowledge are assigned enterprise data ownership.
With an average size of around 100 concepts, the level of the ECM is ideal for information systems planning activities. The concepts can be plotted poster size or transferred to a word document and formatted into an enterprise data book; an excellent tool for planning, as well as communication. Subsets of concepts can be extracted, representing future and existing information systems. These subsets are a great tool for visualization and understanding of existing and/or future information systems, as well as the identification of system overlaps and dependencies.
The ECM serves as the foundation for creating the Enterprise Conceptual Entity Model (ECEM), the third level of the EDM. Creating the ECEM would be much more difficult without the framework provided by the ECM, and many data integration points would be missed. It would be like trying to hang drywall without the studs in place.
Enterprise Conceptual Entity Model
An Enterprise Conceptual Entity Model (ECEM) is the third level of the Enterprise Data Model (EDM) representing the things important to each business area from an enterprise perspective. It is the detail level of an EDM; expanding each of the concepts within each of the subject areas, adding finer detail. Beginning with the Enterprise Conceptual Model (ECM), the data designers, working with the business area experts, create the ECEM. The business and its data rules are examined, rather than existing systems, to create the major data entities (conceptual entities), their business keys, relationships, and important attributes.
Although an ECEM is created as the next step following the creation of the ECM, it is developed in a phased approach. As many 2nd level concepts as possible are initially expanded. The remaining concepts are expanded based on business importance and prioritization. The greater the number of concepts expanded, the more solid a framework an ECEM will provide for data systems design and development. The detailed “build out” of the EDM is often driven by the development of an ODS, EDW and/or large enterprise application.
Conceptual Entity Model Components
There are four major components to the ECEM as follows:
Conceptual Entities
Conceptual entities represent the things important to the business, similar to the “major” entities found within a logical data model. They can be thought of as “pre-normalized” logical model entities. They are independent of technology and implementation concerns. They exist at different levels of granularity, depending on their business and/or data relevance. The level of granularity can also depend on the information known at the time of their creation. The idea is to define the important data, not necessarily the size of the data, although there can be some correlation between the size of the data and the number of conceptual entities. An entity concept may also be a common super-type, or important subtype. Each entity concept will ultimately represent multiple logical entities and possibly physical tables.
Primary Keys & Significant Attributes
A conceptual entity contains a primary key representing its unique identity in business terms. Although a conceptual entity may represent multiple logical entities, the key remains realistic at the root level. A key validates business rules; as entity concepts are related and keys are inherited, they must continue to work correctly.
Additional attributes are included for business significance and/or enterprise data integration. An example is a reference table’s key attribute. Since reference tables are not generally included in an ECEM, the type code key is added to the conceptual entity, as the foreign key would have been, if the referenced table were included in the ECEM.
Relationships
Relationships between conceptual entities represent many of the data rules important to the business. Relationships define the interdependency of the conceptual entities. Many-to-many relationships are not generally resolved, unless the resolution represents an important business data concept. The relationships incorporate both optionality (whether the relationship is required) and cardinality (the numeric relationship: 0, 1, infinite). They can be identifying or non-identifying, depending on the business rules. Relationship names may or may not be displayed on the model, but are always defined and documented. They need to make sense within an English sentence. Relationships are defined in both directions.
Enterprise data integration is generally defined in terms of the keys and relationships. If a relationship does not work and/or a key is not being inherited correctly, there’s probably an incorrect assumption about the business rules, or the conceptual entity may be too “conceptual” or artificial. This is where the “Ah Ha’s” happen and many potential issues are resolved. Discovering these issues represents one of the most important values of an EDM. Working out the “kinks” is essential before proceeding to the development of the organization’s data systems.
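The interplay of keys and relationships can be sketched with two small record types and one check. Everything here is a hypothetical illustration: the entity names, the key attributes, and the rule that an identifying relationship requires the child's key to contain the parent's key, which is a common modeling convention rather than something the text prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptualEntity:
    name: str
    primary_key: tuple  # business key attribute names


@dataclass(frozen=True)
class Relationship:
    parent: ConceptualEntity
    child: ConceptualEntity
    parent_to_child: str  # name read from parent to child
    child_to_parent: str  # name read from child to parent
    required: bool        # optionality: must every child have a parent?
    cardinality: str      # "one" or "many" children per parent
    identifying: bool     # does the child's key include the parent's key?


def key_inherited_correctly(rel):
    """An identifying relationship works only if the child's key contains the parent's key."""
    if not rel.identifying:
        return True
    return set(rel.parent.primary_key) <= set(rel.child.primary_key)


booking = ConceptualEntity("Booking", ("booking_reference",))
segment = ConceptualEntity("Booking Segment", ("booking_reference", "segment_number"))
flight = ConceptualEntity("Flight", ("carrier_code", "flight_number", "departure_date"))

good = Relationship(booking, segment, "contains", "is part of", True, "many", True)
suspect = Relationship(flight, segment, "carries", "is flown on", True, "many", True)

print(key_inherited_correctly(good))     # True
print(key_inherited_correctly(suspect))  # False
```

A failed check such as the second one is a prompt to revisit the business rule: either the relationship is non-identifying, or the child's key is incomplete.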
Name & Definition
Conceptual entity names are business oriented; not influenced by systems or applications. Abbreviations and acronyms are not used. The names are as simple as possible, yet appropriately descriptive. Enterprise definitions are created from the intersection of all business definitions/usage. Business area definitions can differ depending on the viewpoint or consumption usage. The enterprise definition improves the context of information. It is as complete and detailed as necessary for clarity, while remaining simple and concise. All definitions are consistently written, beginning with: “The XXXX conceptual entity describes”, in order to clearly identify its level.
Conceptual Entity Model Creation
An ECEM is created using a “top down” approach, from an enterprise business view; not from one specific application or business area. The information gathered during informal interviews with the appropriate business data creators and consumers is analyzed under the guidance of existing enterprise work; expanding and enhancing the ECM. The data designers then create the initial subject areas of the ECEM. Working sessions are held with subject matter experts, to further develop and verify the ECEM. The sessions also serve to identify and document relationships and overlaps between subject area entity concepts. The ECEM design process is highly iterative, as more is continually discovered.
Business validation sessions are conducted with the proper business experts for each subject area of the ECEM. It is important the business understands that the model is a conceptual representation from an enterprise view. There are business users who are unable, or may not want to see their business area from an enterprise perspective. The validation is not a “sign-off” by the business to approve modeling techniques. It is to verify the business is completely and correctly understood. There may be more than one session necessary, due to the number of entity concepts, business complexity, or number of issues discovered.
After the business validation is complete and adjustments made, an enterprise standards review is conducted to verify model consistency and accuracy, assuring adherence to enterprise design standards. A detailed document describing enterprise overlaps, conflicts, and integration points is created. The document is used as a tool in the development and management of the organization’s data resource.
An ECEM can easily contain more than a thousand conceptual entities, so it may be separated by subject area into individual models or files. This is based on a combination of tool limitations and model size. It is also much simpler to coordinate updates and mappings when the model is in separate files. Even if the model is separated, it is important that the model stay in sync and integrated. When the model is separated into subject areas, each will need to include additional conceptual entities from related subject areas where a key is inherited. This will help to assure the models stay in sync, as well as give an integrated view when a subject area ECEM is plotted or viewed. Even if the model is split into separate files, it is still considered one model; all or part of it is referred to as the Enterprise Conceptual Entity Model.
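Splitting the model by subject area while carrying along the outside entities whose keys are inherited can be sketched as follows. The entity names, subject area assignments, and key inheritance pairs are hypothetical.

```python
# Hypothetical ECEM fragment: conceptual entity -> owning subject area.
ENTITY_SUBJECT_AREA = {
    "Customer": "Customer",
    "Booking": "Booking",
    "Booking Segment": "Booking",
    "Flight": "Flight",
    "Airport": "Location",
}

# Hypothetical (parent, child) pairs where the child inherits the parent's key.
KEY_INHERITANCE = [
    ("Customer", "Booking"),
    ("Booking", "Booking Segment"),
    ("Flight", "Booking Segment"),
    ("Airport", "Flight"),
]


def subject_area_model(subject_area, entity_area, key_inheritance):
    """Return (owned, borrowed): entities in the subject area, plus outside
    parents whose keys those entities inherit."""
    owned = {e for e, area in entity_area.items() if area == subject_area}
    borrowed = {
        parent
        for parent, child in key_inheritance
        if child in owned and parent not in owned
    }
    return owned, borrowed


owned, borrowed = subject_area_model("Booking", ENTITY_SUBJECT_AREA, KEY_INHERITANCE)
print(sorted(owned))     # ['Booking', 'Booking Segment']
print(sorted(borrowed))  # ['Customer', 'Flight']
```

Keeping the borrowed entities in a separate set makes it clear which ones are maintained in another subject area's file.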
A large format plot of the model is important because people tend to learn visually. The model displays the conceptual entity names, definitions, key(s), and relationships. Color is fundamental for visual comprehension, making it easy to instantly relate the conceptual entities to subject areas.
An ECEM provides a data architectural framework for the organization’s data designs and subsequent data stores, in support of data quality, scalability and integration. The framework can be thought of in much the same way as the framework (stud walls, roof trusses, and floor joists) in the construction of a house. The siding, drywall, molding, and fixtures attached to the framework are the finish materials that complete the house. In a similar manner, the business’s data requirements and data sources supply the finish material for a data design. The details or “finish materials” that complete the data designs are “attached” to an ECEM framework. These “finish materials” are drawn from data sources, including legacy systems, as well as business requirements. When data designs are created using only “finish materials”, the designs and resulting data stores tend to be very weak (poor data quality, non-scalable and not integrated), similar to a building constructed only of finish materials.
An ECEM, serving as the integrated data architectural framework, is also the source of reusable data objects for construction of the organization’s data stores (ODS, DW, application, and data mart). The first step in creating any data designs is the creation of a Business Conceptual Entity Model (BCEM). This model is a “subset” of the ECEM, representing the logical/conceptual view of the potential data store, within an enterprise perspective. A BCEM is a 3rd level model, as is the ECEM. It is a separate model, but always drawn from the ECEM. When data designs are drawn from the same model, many data objects can be appropriately reused, enabling development to proceed much faster.
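The subset relationship between a BCEM and the ECEM, and the reuse it enables, can be sketched with plain sets. The entity names and the two data store names are hypothetical.

```python
# Hypothetical ECEM entity names and two BCEMs drawn from it.
ECEM = {"Customer", "Booking", "Booking Segment", "Flight", "Airport", "Ticket"}
BCEMS = {
    "reservations_ods": {"Customer", "Booking", "Booking Segment", "Flight"},
    "revenue_mart": {"Customer", "Ticket", "Flight"},
}


def not_in_ecem(bcem, ecem):
    """Entities in a BCEM that were not drawn from the ECEM (should be empty)."""
    return bcem - ecem


def reused_entities(bcems):
    """Entities that appear in more than one BCEM and can be reused across designs."""
    names = list(bcems)
    shared = set()
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared |= bcems[first] & bcems[second]
    return shared


strays = {name: not_in_ecem(bcem, ECEM) for name, bcem in BCEMS.items()}
reused = reused_entities(BCEMS)
print(sorted(reused))  # ['Customer', 'Flight']
```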
Data designs and subsequent data stores are mapped to the ECEM through their BCEM, providing an enterprise perspective, essential for data integration and core to achieving a high quality data resource. The ECEM is the “glue”, tying all of an organization’s data together, including packaged applications. A BCEM is created for packaged applications. When the data designs and subsequent data stores are drawn from the same model, they will have a common ‘look and feel’, enabling a consistent flow of data, enhancing the development of new systems.
Data is one of an organization’s most valuable assets. All current and future business decisions hinge on data. An EDM is essential for the management of an organization’s data resource. The core principle of data management is order: applying order to the vast universe of data. To manage data is to apply order. According to the second law of thermodynamics, the universe and everything in it continually heads toward chaos; it takes energy to bring order. The same holds true for data: left alone, it continually deteriorates to a state of disorder. It takes concerted effort to keep data in order. An EDM brings order.
There’s a saying, “the journey counts more than the destination.” The process of creating the EDM is important in itself because it provides opportunities for the business to work together to understand the meaning, inter-workings, dependency and flow of its data across the organization. In day-to-day operations, many never get an opportunity to “look up” and see the bigger picture, the enterprise data view: where data comes from, its transformation, where it goes, what happens to it, and where they fit in. The modeling process gives this opportunity, bringing focus to data’s importance. The “big picture” understanding and support from the business are essential in establishing a data quality program, data ownership, and data governance; all necessary within an enterprise data environment.
The process also provides the opportunity to build relationships and trust between Information Technology (IT) and the business. Often the business feels IT doesn’t understand. The data designers, representing IT, work closely with the business in the development of an EDM, gaining trust and providing assurance of IT’s understanding and partnership. If the business is presented with an EDM in whose creation it was not involved, the model has little meaning, resulting in a lack of ownership and commitment. The opportunity to build the IT-business relationship is lost. The EDM and the process to create it are essential for any organization that values its data resource.