A New, Flexible Architecture for Developing Business Intelligence Systems

Through the years, many architectures have been introduced for developing business intelligence systems, including such classic architectures as Ralph Kimball’s Data Warehouse Bus Architecture, Bill Inmon’s Corporate Information Factory, and his more recent architecture called Data Warehouse 2.0.

These architectures have been deployed by most organizations and have served them well for the last fifteen to twenty years. For a long time we had good reasons to use these architectures, because the state of database, ETL, reporting, and analytical technology did not truly allow us to develop systems based on other architectures. In addition, most tools were aimed at supporting these classic architectures, which made developing systems with other architectures hard.

The question we have to ask ourselves is: Are these still the right architectures? Are these still the best possible architectures we can come up with, especially if we consider the new demands and requirements of users, and if we look at new technologies available in the market, such as analytical database servers, data warehouse appliances, and in-memory analytical tools? My answer would be no! To me, we are slowly reaching the end of an era in which the classic architectures were dominant. It’s time for change. This article describes an alternative architecture, one that is more flexible and fits the needs and demands of most organizations for (hopefully) the next twenty years.

This new architecture is called the Data Delivery Platform (DDP) and was introduced in a number of articles published at BeyeNETWORK.com (see The Definition of the Data Delivery Platform). The definition of the DDP is:

The Data Delivery Platform is a business intelligence architecture that delivers data and meta data to data consumers in support of decision-making, reporting, and data retrieval; whereby data and meta data stores are decoupled from the data consumers through a meta data driven layer to increase flexibility; and whereby data and meta data are presented in a subject-oriented, integrated, time-variant, and reproducible style.

Fundamental to the DDP are two principles. The first one is decoupling of data consumers and data stores and the second is shared specifications. Let’s explain those two principles.

In a business intelligence system with a DDP-based architecture, data consumers are decoupled from the data stores by a software layer. This means that data consumers (such as reports developed with SAP BusinessObjects WebIntelligence, SAS Analytics, JasperReports, or Excel) don’t know which data stores are being accessed: a data warehouse, a data mart, or an operational data store. Nor do they know which data store technologies are being accessed (an Oracle or DB2 database, or maybe Microsoft Analysis Services). The data consumers will only see and access the software layer, which presents all the data stores as logically one big database; see Figure 1. The data consumers have become data store independent.

Figure 1: The Data Delivery Platform
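
To make the decoupling idea concrete, here is a minimal sketch in Python. It is not how any particular product implements the DDP; the class DeliveryLayer, the functions, and the sample data are all invented for illustration. The report only knows the logical table name "sales", and the layer decides which data store answers.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class DeliveryLayer:
    """Hypothetical stand-in for the DDP software layer (not a real product API)."""

    def __init__(self) -> None:
        # Maps a logical table name to the function that fetches its rows.
        self._routes: Dict[str, Callable[[], List[Row]]] = {}

    def route(self, logical_table: str, fetch: Callable[[], List[Row]]) -> None:
        """Point a logical table at a physical data store."""
        self._routes[logical_table] = fetch

    def query(self, logical_table: str) -> List[Row]:
        """Data consumers call this; they never see which store answers."""
        return self._routes[logical_table]()


# Two invented data stores holding the same sales data.
def data_mart_sales() -> List[Row]:
    return [{"region": "North", "revenue": 120}, {"region": "South", "revenue": 80}]


def data_warehouse_sales() -> List[Row]:
    return [{"region": "South", "revenue": 80}, {"region": "North", "revenue": 120}]


def revenue_report(layer: DeliveryLayer) -> int:
    """A 'report' that only knows the logical table name."""
    return sum(row["revenue"] for row in layer.query("sales"))


layer = DeliveryLayer()
layer.route("sales", data_mart_sales)
before = revenue_report(layer)

# Migration: only the route inside the layer changes, the report does not.
layer.route("sales", data_warehouse_sales)
after = revenue_report(layer)
```

In this sketch, moving the report from the data mart to the data warehouse is a one-line change inside the layer, and revenue_report itself is untouched.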

The advantages resulting from decoupling are:

Easier data store migration: Data store independence means that a report that accesses a particular data store can easily be migrated to another data store. The report’s queries can be redirected through the DDP to that other data store. For example, if a report is currently accessing a data mart, migrating it to the data warehouse doesn’t require any changes in the report definition. The same applies if a need exists to migrate from a relational database to MDX-based technology, or if SQL Server has been replaced by Netezza. In most cases, these changes will have no impact on the reports. In short, if a DDP is in place, migration to another data store (technology) is easy. There are various reasons why an organization may want to migrate. For example, it may want to use technology that offers faster query performance, or data storage may be outsourced and need to be accessed differently.

Cost reduction due to simplification: If the DDP is installed in an existing business intelligence architecture, for example one based on the Corporate Information Factory architecture, the DDP makes it possible to simplify the architecture. Data marts and cubes can be removed and the existing reports must be redirected to another data store, which, as indicated, is easy to do with the DDP. The advantage of this simplification of the architecture is cost reduction.

Increased flexibility of the architecture: With less code and fewer specifications, it is easier to change a system. The DDP makes it possible to simplify the architecture and to work with shareable specifications. The effect is that new user requirements and demands can be implemented faster. In other words, the time to market for new reports is shortened.

Seamless adoption of new technology: New database and storage technology has appeared on the market, such as data warehouse appliances, analytical databases, columnar databases, and solid state disk technology. As indicated, because the DDP separates the data consumers from the data stores, replacing an existing data store technology with a new one is relatively easy and has no impact on the reports.

Transparent archiving of data: Eventually data warehouses become so big that ‘older’ data has to be archived. But if data is old, it doesn’t always mean that no one is interested in it anymore. The DDP can hide where and how archived data is stored. Archiving data, meaning data is taken out of the original data store and moved to another, can be hidden from the data consumers. If users are still interested in all the data, the DDP can combine the non-archived data with the data in the archive store. The effect might be that the performance is slower, but reports don’t have to be changed. Therefore, the DDP hides that some data has been archived.
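
The archiving case above can be sketched with a plain SQL view. The example below is only an illustration: it uses Python’s built-in sqlite3 module with an in-memory database instead of a real federation server, and the table names orders_live and orders_archive, the view name orders, and the sample rows are all assumptions. The view plays the role of the DDP: reports query orders and never learn that part of the data has been moved to an archive table.

```python
import sqlite3


def build_demo_connection() -> sqlite3.Connection:
    """Create an in-memory database with a live table, an archive table,
    and a view that presents both as one logical table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders_live    (order_id INTEGER, order_year INTEGER, amount REAL);
        CREATE TABLE orders_archive (order_id INTEGER, order_year INTEGER, amount REAL);

        INSERT INTO orders_live    VALUES (3, 2010, 50.0), (4, 2010, 25.0);
        INSERT INTO orders_archive VALUES (1, 2001, 10.0), (2, 2002, 15.0);

        -- The only object that reports are allowed to see.
        CREATE VIEW orders AS
            SELECT order_id, order_year, amount FROM orders_live
            UNION ALL
            SELECT order_id, order_year, amount FROM orders_archive;
        """
    )
    return conn


def total_amount(conn: sqlite3.Connection, since_year: int) -> float:
    """A 'report' query written against the logical table only."""
    row = conn.execute(
        "SELECT SUM(amount) FROM orders WHERE order_year >= ?", (since_year,)
    ).fetchone()
    return row[0] or 0.0


conn = build_demo_connection()
all_years = total_amount(conn, 2000)    # touches live and archived rows
recent_only = total_amount(conn, 2010)  # matches live rows only
```

A query that reaches back into archived years may run slower in a real system, but the report text stays the same, which is the point of the advantage described above.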

Decoupling data consumers from data stores is based on the concept of information hiding. This concept was introduced by David L. Parnas (see ‘Software Fundamentals, Collected Papers by David L. Parnas’, Addison-Wesley Professional, 2001) in the ’70s and was adopted soon after by object-oriented programming languages, component-based development, and service-oriented architectures. But until now, the concept of information hiding has received only limited interest in the world of data warehousing.

The second principle of the DDP is called shareable specifications. Most reporting and analytical tools require specifications to be entered before reports can be developed. Some of those specifications are descriptive and others are transformative. Examples of descriptive specifications are definitions of concepts; for example, a customer is someone who has bought at least one product, and the Northern region doesn’t include the state of Washington. But defining alternative names for tables and columns, and defining relationships between tables, are also descriptive specifications. Examples of transformative specifications are ‘how should country codes be replaced by country names’ and ‘how should a set of tables be transformed to one cube’. In the DDP those specifications are centrally managed and are shareable. The advantages resulting from shared specifications are:

Easier maintenance of specifications: Unfortunately, in most cases descriptive and transformative specifications can’t be shared amongst reporting and analytical tools. So, if two users use different tools the specifications must be copied. The advantage of the DDP is that most of those specifications can be defined once and can be used by all the tools. Therefore, maintaining existing and adding new specifications is easier.

More consistent reporting: If all reporting and analytical tools use the same specifications to determine results, the results will be consistent, even if the tools are from different vendors. This improves the perceived quality of and trust in the business intelligence environment.

Increased speed of report development: Because most specifications already exist within the DDP and can be re-used, it takes less time to develop a new report. Development can focus primarily on the use of the specifications.
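
As a rough illustration of shared specifications, the Python sketch below keeps one descriptive specification (what counts as a customer) and one transformative specification (country code to country name) in a single catalog. The class SpecificationCatalog, the two "tool" functions, and the sample data are hypothetical and only serve to show the idea; a real DDP would hold such definitions in its meta data driven layer.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Person = Dict[str, Any]


@dataclass
class SpecificationCatalog:
    """Hypothetical central store for specifications shared by all tools."""
    descriptive: Dict[str, Callable[[Person], bool]] = field(default_factory=dict)
    transformative: Dict[str, Callable[[str], str]] = field(default_factory=dict)


COUNTRY_NAMES = {"NL": "Netherlands", "US": "United States"}

catalog = SpecificationCatalog()
# Descriptive: a customer is someone who has bought at least one product.
catalog.descriptive["customer"] = lambda p: p["products_bought"] >= 1
# Transformative: replace a country code by a country name.
catalog.transformative["country_name"] = lambda code: COUNTRY_NAMES.get(code, code)


def tool_a_customer_count(people: List[Person], specs: SpecificationCatalog) -> int:
    """First 'reporting tool': counts customers using the shared definition."""
    return sum(1 for p in people if specs.descriptive["customer"](p))


def tool_b_customer_countries(people: List[Person], specs: SpecificationCatalog) -> List[str]:
    """Second 'reporting tool': lists customer countries using both shared specs."""
    is_customer = specs.descriptive["customer"]
    to_name = specs.transformative["country_name"]
    return sorted({to_name(p["country"]) for p in people if is_customer(p)})


people = [
    {"name": "Ann", "products_bought": 2, "country": "NL"},
    {"name": "Bob", "products_bought": 0, "country": "US"},
    {"name": "Cho", "products_bought": 1, "country": "US"},
]
count = tool_a_customer_count(people, catalog)
countries = tool_b_customer_countries(people, catalog)
```

Because both functions read the definition of a customer from the same catalog, changing that definition in one place changes the result of both, which is what keeps the reports consistent.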

Currently, the simplest way to develop a DDP-based business intelligence system is by using a federation server. There are many federation servers available on the market, including Composite Information Server, Denodo Platform, IBM InfoSphere Federation Server, Informatica Data Services, Oracle BI Server, and Red Hat MetaMatrix Enterprise Data Services. As an example, the article Using Composite Information Server to Develop a DDP describes how to develop a DDP with Composite’s product.

To summarize, the Data Delivery Platform is a business intelligence architecture that offers many practical advantages for developing business intelligence systems, including increased flexibility of the architecture, shareable transformation and reporting specifications, easy migration to other data store technologies, cost reduction due to simplification of the architecture, easy adoption of new technology, and transparent archiving of data. The DDP can co-exist with other, better-known architectures, such as the Data Warehouse Bus Architecture and the Corporate Information Factory.

