Data Warehousing & ERP – A Combination of Forces

Published in April 2002

Since the introduction of the term “data warehousing” in 1990, companies have explored the ways they can capture, store and manipulate data for analysis and decision support. At the
same time, many companies have been instituting enterprise resource planning (ERP) software to coordinate the common functions of an enterprise. ERP software usually has a central database as its
hub, allowing applications to share and reuse data more efficiently than previously permitted by separate applications. The use of ERP has led to an explosion in source data capture, and the
existence of a central ERP database has created the opportunity to develop enterprise data warehouses for manipulating that data for analysis. This paper will provide an overview of the issues and
challenges that the intersection of these two IS concepts are creating.

Data warehouses are one of the foundations of the decision support systems of many IS operations. They serve as the storage facility of millions of transactions, formatted to allow analysis and
comparison. As defined by the “father of data warehouse”, William H. Inmon, a data warehouse is “a collection of integrated, subject-oriented databases where each unit of data is
specific to some period of time. Data Warehouses can contain detailed data, lightly summarized data and highly summarized data, all formatted for analysis and decision support”
(“Building a Data Warehouse”, Inmon, W. H.; Wiley, 1996). In the “Data Warehouse Toolkit”, Ralph Kimball gives a more succinct definition: “a copy of transaction data
specifically structured for query and analysis” (“The Data Warehouse Toolkit”, Kimball, R.; Wiley, 2000). Both definitions stress the data warehouse’s analysis focus, and
highlight the historical nature of the data found in a data warehouse.

Enterprise Resource Planning software is a recent addition to the manufacturing and information systems that have been designed to organize the flow of data from process start to finish. This flow
of information has existed since the first manufacturers traded with the first merchants, but until the advent of ERP software and the processes that accompany it, this information was largely
ignored and not captured. ERP software attempts to link all internal company processes into a common set of applications that share a common database. It is the common database that allows an ERP
system to serve as a source for a robust data warehouse that can support sophisticated decision support and analysis.

ERP software is divided into functional areas of operation; each functional area consists of a variety of business processes. The main, common functional areas of operation in most companies would
include: Marketing and Sales; Production and Operations (Materials Management, Inventory, etc.); Accounting and Finance; Human Resources. Historically, businesses have had clear divisions among
each of these areas, and IS development was also clearly delineated so that systems did not share data or processes and cross-functional analysis of information was not possible. Since all
functional areas ARE interdependent, this separation was not a valid representation of a business’ activities and the divisions among the many information systems created artificial barriers
that needed to be overcome.

ERP software was designed to eliminate the barriers to sharing data and processes that occur when companies design and implement information systems for a single function or activity. ERP software
coordinates the entire business process, and stores all the captured data in a common database, accessible to all the integrated applications of the ERP suite. As explained in “Concepts in
Enterprise Resource Planning” (Brady, Monk, Wagner; Course Technology, 2001) companies can achieve many cost savings and related benefits from the use of ERP for transaction processing and
management reporting through the use of the ERP’s common database and integrated management reporting tools.

However, much of the work performed by managers and knowledge workers in the 21st century is not transaction or management reporting-based. The main activity of knowledge and management staff is
analysis, and this analysis is supported by the development and use of decision support systems. The most common application of DSS in companies today is the data warehouse. With the use of the
ERP’s common database and the implementation of DSS/DW user support products companies can design a decision support/data warehouse database that allows cross-functional area analysis and
comparisons for better decision-making.

Since companies usually implement an ERP in addition to their current applications, the problem of data integration from the various sources of data to the data warehouse becomes an issue.
Actually, the existence of multiple data sources is a problem with ERP implementation regardless of whether a company plans to develop a data warehouse; this issue must be addressed and resolved at
the ERP project initiation phase to avoid serious complications from multiple sources of data for analysis and transaction activity. In data warehouse development, data is usually targeted from the
most reliable or stable source system and moved into the data warehouse database as needed. Identification of the correct source system is essential for any data warehouse development, and is even
more critical in a data warehouse that includes an ERP along with more traditional transaction systems. Integration of data from multiple sources (ERP database and others) requires rigorous
attention to the metadata and business logic that populated the source data elements, so that the “right” source is chosen.

Another troubling issue with ERP data is the need for historical data within the enterprise’s data warehouse. Traditionally, the enterprise data warehouse needs historical data (see
Inmon’s definition). And traditionally ERP technology does not store historical data, at least not to the extent that is needed in the enterprise data warehouse. When a large amount of
historical data starts to stack up in the ERP environment, the ERP environment is usually purged, or the data is archived to a remote storage facility. For example, suppose an enterprise data
warehouse needs to be loaded with five years of historical data while the ERP holds at the most, six months worth of detail data. As long as the corporation is satisfied with collecting a
historical set of data as time passes, then there is no problem with ERP as a source for data warehouse data. But when the enterprise data warehouse needs to go back in time and bring in historical
data that has not been previously collected and saved by the ERP, then using the ERP environment as a primary source for the data warehouse is not a viable option.

Metadata in the ERP is another consideration when building a data warehouse is in the ERP environment. As the metadata passes from the ERP to the data warehouse environment, the metadata must be
moved and transformed into the format and structure required by the data warehouse infrastructure. There is a significant difference between operational metadata and DSS/DW metadata. Operational
metadata is primarily for the developer and programmer. DSS metadata is primarily for the end user. The metadata that exists in the ERP application’s database must be converted, and such a
conversion is not always easy or uncomplicated, and requires experienced data administrators and users to collaborate in the effort.

Mr. Inmon suggests some guidelines for using the ERP database as a source for a data warehouse. They would include the existence of a solid interface that pulls data from the ERP environment to the
data warehouse environment. The ERP to enterprise data warehouse interface needs to:

  • be easy to use
  • enable the access of ERP data
  • capture the meaning of the data that is being transported into the data warehouse
  • be aware of restrictions within the ERP that might exist when it comes to the accessing of ERP data
  • be aware of referential integrity
  • be aware of hierarchical relationship
  • be aware of logically defined – implicit – relationships
  • be aware of application conventions
  • be aware of any structures of data supported by the ERP
  • be efficient in accessing ERP data, supporting –
    • direct data movement
    • change data capture
  • be supportive of timely access of ERP data
  • understand the format of data

(taken from “Data Warehousing and ERP”, a white paper by Wm. H. Inmon, Kiva Productions, LLC, 1999)

In summary, the development of data warehouses and the emergence of ERP as factors in the information systems explosion must be addressed and resolved by experienced information systems
professionals with a clear understanding of the challenges each environment poses. Integrating ERP data into a data warehouse can lead to a superior source of data for analysis and decision-making
if the data is formatted for query and reporting, and if the ERP environment is coordinated with the decision support needs of the organization. To ignore the wealth of data and information that is
available from an ERP is to ignore a valuable corporate resource, one that can serve as a foundation for a superior data warehouse.


submit to reddit

About Anne Marie Smith, Ph.D.

Anne Marie Smith, Ph.D., is an acclaimed data management professional, consultant, author and speaker in the fields of enterprise information management, data stewardship and governance, data warehousing, data modeling, project management, business requirements management, IS strategic planning and metadata management. She holds a doctorate in Management Information Systems, and is a certified data management professional (CDMP), a certified business intelligence professional (CBIP), and holds several insurance certifications.

Anne Marie has served on the board of directors of DAMA International and on the board of the Insurance Data Management Association.  She is a member of the MIS faculty of Northcentral University and has taught at several universities. As a thought leader, Anne Marie writes frequently for data / information management publications on a variety of data-oriented topics.  She can be reached through her website at and through her LinkedIn profile at