Enterprise Data Warehouse: A Patterns Approach to Integration Evolution

When a large organization decides to implement an enterprise data warehouse (EDW), a major challenge is agreeing on and implementing integrated data, a key goal for an EDW. Data integration, in this case, involves combining data residing in different sources and providing users with a unified view. It is the ability to define common entities and attributes around subject areas of importance to the business, and then to source, map and load data into common structures. Typically, an enterprise is composed of several lines of business (LoBs), each with overlapping and distinct information needs. Although the various LoBs may concede that integrated data is required to achieve that critical 360-degree view of the enterprise, each business usually defines this view from its own perspective. For example, each business may have a different definition of Customer that works sufficiently well within the context of that business, but is an obstacle to conformity within the EDW. LoBs may not have the resources or motivation to work with IT and other businesses to analyze and agree on common definitions and on the integration of data and business rules.
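To make the Customer example concrete, here is a minimal sketch of what conforming two LoB definitions into one common entity might look like. Everything in it is an assumption made for illustration: the `ConformedCustomer` structure, the two LoBs (banking and insurance), their field names, and the use of a tax ID as the shared key are all hypothetical and not taken from any particular EDW.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConformedCustomer:
    """Hypothetical enterprise-level Customer entity."""
    customer_key: str   # assumed shared identifier across LoBs
    name: str
    source_lob: str


def from_banking(record: dict) -> ConformedCustomer:
    # Assumption: banking treats the account holder as the customer.
    return ConformedCustomer(
        customer_key=record["tax_id"],
        name=record["holder_name"].strip().title(),
        source_lob="banking",
    )


def from_insurance(record: dict) -> ConformedCustomer:
    # Assumption: insurance treats the policy owner as the customer.
    return ConformedCustomer(
        customer_key=record["owner_tax_id"],
        name=record["policy_owner"].strip().title(),
        source_lob="insurance",
    )


def unify(customers):
    """Group conformed records by key, giving one view per customer."""
    view = {}
    for customer in customers:
        view.setdefault(customer.customer_key, []).append(customer)
    return view


banking_rows = [{"tax_id": "T-100", "holder_name": "ada lovelace"}]
insurance_rows = [{"owner_tax_id": "T-100", "policy_owner": "ADA LOVELACE "}]

conformed = [from_banking(r) for r in banking_rows]
conformed += [from_insurance(r) for r in insurance_rows]
unified = unify(conformed)
```

The code is the easy part. The hard part, as described above, is getting each LoB to agree on what the shared key and attributes should be.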

Forcing business groups down the integration path in a top-down approach is usually a no-win situation because too much time is spent analyzing, defining and mapping data for the entire organization. Besides being long running and resource intensive, holistic efforts such as these usually fail because they cannot deliver the entire solution in a timely manner. Conversely, a subject area approach leaves enormous gaps of data in the EDW, because competing business priorities dictate which subject areas are implemented, and this leads to low adoption and “shadow” or “silo” solution development. Likewise, locking your solutions into either a centralized warehouse or a federated data mart architecture limits your flexibility: the former pushes integration when your businesses might not be ready, and the latter requires redundant processes to build marts from the same sources. These approaches jeopardize the success of an EDW initiative. What may be needed is an evolutionary approach to business intelligence (BI) and the EDW that adapts to the BI maturity levels of the enterprise.

Figure 1

The first step in this journey is critical: Data Sourcing and Warehouse Consolidation. This involves acquiring data once into a single store and then distributing it in a mart fashion, one mart per LoB or solution. This is a hybrid approach to building the EDW that utilizes both a centralized store and federated marts. The goal is to collocate the data and processes onto a single, consistent infrastructure. This infrastructure consists of shared hardware and storage, a single set of tools and processes for ETL, security, data quality, etc., and a cohesive deployment environment. Although the data may not yet be integrated, this gives each LoB access to a far greater range of data in a more timely fashion. The problem focus now shifts from “How do I get the data I need?” to “What does the data mean, why are there so many versions of it, and how can I link it together in new and insightful ways?” This sets the context for the necessary integration and governance discussions to occur, which is the next step in the evolution. As common, integrated data and business rules are agreed on and developed, they can begin migrating from LoB data mart solutions toward the central data store, thus reducing redundancy and conflict and enabling the final step in the evolution. Once data integration starts occurring in the central data store, attention can turn to conforming the data marts, either in a federated or consolidated fashion. Since data disparity and semantics are no longer the reason to isolate data marts, physical segmentation of the data is used for other compelling reasons, such as performance, latency, security/privacy, or unique feature sets dependent on structure/format.
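The first step, acquiring data once and distributing it per LoB, can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: the `CentralStore` class, its `acquire` and `publish` methods, and the sample order rows are all hypothetical, and a real EDW would use an ETL engine and a database rather than in-memory lists.

```python
from collections import defaultdict


class CentralStore:
    """Hypothetical single landing area: each source is extracted once."""

    def __init__(self):
        self._datasets = {}
        self.load_counts = defaultdict(int)  # how often each source was extracted

    def acquire(self, source_name, extract_fn):
        # Extract from the source only if the store does not hold it yet.
        if source_name not in self._datasets:
            self._datasets[source_name] = list(extract_fn())
            self.load_counts[source_name] += 1
        return self._datasets[source_name]

    def publish(self, source_name, row_filter=lambda row: True):
        # Distribute a copy of the stored rows to a mart, optionally filtered.
        return [dict(row) for row in self._datasets[source_name] if row_filter(row)]


def extract_orders():
    # Stand-in for a real source system extract (invented sample data).
    return [
        {"order_id": 1, "lob": "retail", "amount": 120.0},
        {"order_id": 2, "lob": "wholesale", "amount": 950.0},
        {"order_id": 3, "lob": "retail", "amount": 35.5},
    ]


store = CentralStore()
store.acquire("orders", extract_orders)
store.acquire("orders", extract_orders)  # second request reuses the stored copy

# One mart per LoB, both fed from the same acquired data.
retail_mart = store.publish("orders", lambda row: row["lob"] == "retail")
wholesale_mart = store.publish("orders", lambda row: row["lob"] == "wholesale")
```

Both marts are built from a single extract, so the redundant sourcing processes of a purely federated design are avoided even though the data is not yet integrated.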

With these three steps, a set of solution patterns can be defined that provides a balance between the need to collect and integrate data for the EDW and the need of the business to have immediate, tactical solutions. A pattern offers a well-established solution to a problem in software engineering by capturing the essential elements of an architecture and depicting those elements in a way that allows you to categorize the components. In the case of our EDW evolution, we can define three patterns to classify how and where a solution fits within the architecture based on the BI maturity level of the business. These patterns are predicated on an overall EDW architecture that utilizes a common infrastructure, an integrated central data store (DW) and a business-driven distribution (data mart) model, as described earlier. This architecture is not an either/or proposition between the two classic schools of thought for warehousing, but rather uses the best of both. It provides the flexibility to integrate when needed or warranted and still accommodate LoB requirements in a timely manner, because it reduces the cross dependencies associated with reconciling business needs and requirements into a single application. It also supports self-service BI by allowing centralized data to be published to a sandbox environment.

Figure 2

Pattern 3: These are stand-alone point solutions, either inherited as legacy or developed for business areas in which there may be limited or no strategic value in integrating within the EDW. They are supported by IT, have non-common data models and data sources, and may utilize EDW infrastructure such as hardware, storage and process (ETL engine). Pattern 3 provides the bridge between the EDW and a DW that remains independent for legitimate reasons.

Pattern 2: These solutions benefit from fully utilizing the EDW infrastructure, common data sources and the integrated data within the central data store. A key assertion for Pattern 2 solutions is that they may utilize data directly from staging or the central data store. This is an intermediate benefit of Pattern 2 in that it allows for the quick acquisition and use of data without waiting for the eventual integration or collocation of data in the central data store. In addition, it provides the ability to define and implement LoB-specific data models. Although such “collocated” data may be redundant and in conflict with the enterprise-level model, it offers an enormous advantage in flexibility to the business and in acquiring and reusing data that other solutions and businesses may find useful. This last point cannot be overstated: the sourcing and common storage of data from across the enterprise is a critical first step in getting data into the hands of the business. It provides tactical advantage along with a better understanding of existing data assets, including associated problems with sources, definition, gaps, and data management quality and control.

Pattern 1: These are fully integrated solutions that utilize a common, subject-oriented data model and conformed domains/dimensions. The data is tightly governed and adheres to the data management policies of the enterprise.
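One way to read the three patterns is as a classification rule applied to each component of a solution. The sketch below is a deliberate simplification built on assumptions: the `Component` and `Solution` types, the three boolean attributes, and the decision order in `classify_component` are hypothetical and reduce the descriptions above to yes/no flags. Because a solution may mix patterns across its components, `patterns_for` returns a set rather than a single value.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Component:
    """Hypothetical description of one part of a BI solution."""
    name: str
    uses_edw_infrastructure: bool   # shared hardware, storage, ETL engine
    uses_common_sources: bool       # data taken from staging or the central store
    uses_conformed_model: bool      # common subject-oriented model and dimensions


@dataclass
class Solution:
    name: str
    components: List[Component] = field(default_factory=list)


def classify_component(component: Component) -> int:
    """Map a component to Pattern 1, 2 or 3 (simplified rule of thumb)."""
    if component.uses_common_sources and component.uses_conformed_model:
        return 1  # fully integrated and governed
    if component.uses_edw_infrastructure and component.uses_common_sources:
        return 2  # collocated data with an LoB-specific model
    return 3      # stand-alone point solution, possibly on EDW infrastructure


def patterns_for(solution: Solution) -> Set[int]:
    """Return every pattern represented among a solution's components."""
    return {classify_component(c) for c in solution.components}


sales_reporting = Solution(
    name="sales_reporting",
    components=[
        Component("revenue_facts", True, True, True),
        Component("regional_forecast", True, True, False),
        Component("legacy_commissions", True, False, False),
    ],
)

patterns = patterns_for(sales_reporting)
```

In this invented example the solution spans all three patterns, which is what allows individual components to migrate toward Pattern 1 at their own pace.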

It should be apparent that the boundaries between these patterns are not discrete. That is, a solution can have components that fit in one, two or all three patterns. This is by design, and it is what makes this a powerful framework for evolving toward the ultimate goal of an EDW, as opposed to the big bang, boil-the-ocean approach. The advantages of this framework include:

  • Data is sourced (acquired) once
  • Data is available from a common store, either integrated or collocated
  • Provides insight into problems with data quality, redundancy and gaps
  • Facilitates analysis and profiling of data for later integration
  • Integration can evolve over time when the business is ready
  • A single, commonly supported infrastructure
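As an illustration of how collocated data can surface quality problems, redundancy and gaps, the following sketch profiles a few LoB datasets held in a common store. It rests on assumptions: the `profile` and `shared_columns` helpers, the dataset names and the sample rows are all hypothetical, and real profiling would normally be done with dedicated data quality tooling.

```python
def profile(rows):
    """Return the fill rate per column: share of rows with a non-empty value."""
    if not rows:
        return {}
    columns = sorted({col for row in rows for col in row})
    total = len(rows)
    return {
        col: sum(1 for row in rows if row.get(col) not in (None, "")) / total
        for col in columns
    }


def shared_columns(datasets):
    """Columns present in more than one LoB dataset: candidates for conformance."""
    seen = {}
    for lob, rows in datasets.items():
        for col in {c for row in rows for c in row}:
            seen.setdefault(col, set()).add(lob)
    return {col: sorted(lobs) for col, lobs in seen.items() if len(lobs) > 1}


# Invented sample data standing in for collocated LoB tables.
datasets = {
    "banking": [
        {"customer_id": "C1", "email": "a@example.com"},
        {"customer_id": "C2", "email": None},
    ],
    "insurance": [
        {"customer_id": "C1", "policy_no": "P9"},
    ],
}

banking_profile = profile(datasets["banking"])
overlap = shared_columns(datasets)
```

Low fill rates point to gaps, while columns shared across LoBs point to redundancy and to the attributes most worth discussing when integration begins.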

Given the challenges associated with traditional data warehouse implementations, patterns provide a measurable and systematic approach to migrating business intelligence solutions to an enterprise-class warehouse without constraining the business from having immediate, tactical benefit. These advantages could be the difference between successfully implementing an EDW over time and being yet another project failure statistic.



About Michael Eldridge

Michael is a Senior Principal Architect at Microsoft working in the business intelligence (BI) engineering organization within Microsoft IT. He has more than 25 years of IT experience spanning aerospace/defense, banking and insurance, and forest products manufacturing. His fourteen years at Microsoft have been focused primarily on designing and building BI and warehouse systems for highly diverse and globally distributed business areas.