Overview
The BI stack is usually thought of as a multi-layered, multi-functional pie, performing various functions including:
- Connecting to data sources
- Loading data into a data warehouse
- Subjecting the data to various application queries
- Presenting the data in various forms such as simple KPIs, dashboards, or complex analysis
No matter how the BI stack is constructed, its core service is to act as a data funnel. Its function is to constrain data as it moves through the funnel.
Data can be constrained by an ETL process that selects a subset of the data and transforms it to fit the rigid data model of a data warehouse, a multidimensional database, or a columnar database. Data can also be constrained by application logic, through report and dashboard definitions or KPI logic.
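As a minimal sketch of these two kinds of constraint, the Python below runs a toy ETL step followed by a KPI definition. The record layout, the `SaleFact` model, and the function names are hypothetical and chosen only for illustration; a real ETL process would be far more involved.

```python
from dataclasses import dataclass

# Hypothetical raw records: loosely constrained, so fields may be
# missing, mistyped, or carry extra attributes.
RAW_EVENTS = [
    {"region": "EU", "amount": "120.50", "channel": "web", "note": "promo"},
    {"region": "US", "amount": "80", "channel": "store"},
    {"region": "EU", "amount": "n/a", "channel": "web"},
    {"amount": "15.25", "channel": "web"},
]


@dataclass(frozen=True)
class SaleFact:
    """Rigid target model: exactly two typed fields survive the ETL step."""
    region: str
    amount: float


def etl(raw_rows):
    """Constraint 1: select a subset of rows and force them into SaleFact."""
    facts = []
    for row in raw_rows:
        region = row.get("region")
        try:
            amount = float(row.get("amount", ""))
        except ValueError:
            continue  # unparseable amounts never reach the warehouse
        if region is None:
            continue  # rows without a region are dropped as well
        facts.append(SaleFact(region=region, amount=amount))
    return facts


def kpi_revenue_by_region(facts):
    """Constraint 2: application logic reduces the facts to a single KPI view."""
    totals = {}
    for fact in facts:
        totals[fact.region] = totals.get(fact.region, 0.0) + fact.amount
    return totals


facts = etl(RAW_EVENTS)
revenue = kpi_revenue_by_region(facts)
```

Four raw records go in, two typed facts come out, and the KPI exposes only one aggregate per region. Each step makes the data easier to consume and narrows what can still be asked of it.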
We tend to forget that a BI stack is just a means to an end, a tool to support our data aspirations. Some BI stack users want to know how “dirty” their data is, while others want to make informed decisions based on data.
Today, the two forces impacting the BI stack are:
- New BI tools breaking down the functional layers of the BI stack
- BI users wanting the functionality they need instead of a general-purpose “Swiss Army knife” solution
There are three technical trends affecting the BI stack:
- Data & user-centric view of the BI stack funnel
- BI stack compression
- BI stack plurality
Data/User Centric View of the BI Stack Funnel
The main aspect of the BI stack data funnel is that it always starts with unconstrained or loosely constrained data that becomes more and more constrained as it travels through the funnel. The less constrained the data, the more complex processing this data requires, and the more skilled the user needs to be.
Think about Hadoop without Hive or Pig. The promise of Hadoop is that you can keep a lot of data with very few constraints and, in theory at least, this allows for any kind of analysis and intelligence. The catch: This great tool also requires the great skill of an experienced software developer.
The minute there is a higher-level interface on top of Hadoop, such as Hive’s SQL-like query language or Pig’s dataflow scripts, life gets easier. With Hive, a SQL programmer can now use the tool, but fewer things are possible, since we just constrained our distributed storage with the semantic model of SQL.
What if we apply an ETL process and pull data out of Hadoop into a data warehouse? We just introduced another set of data constraints. Not only do we have a semantic layer limiting what we can do with the data but we’ve constrained the data set. But life just got a little easier because a data analyst can have access to the data warehouse.
If we continue the process all the way to reports, dashboards, and KPIs, we end up with increasingly constrained data.
“The key takeaway is that the level of constraints imposed on data determines the skill level of the user that can deal with this data analytically.”
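The funnel described above can be sketched as an ordered list of stages, each paired with the kind of user who can typically work with it. The numeric `constraint_level` is only an illustrative ordinal, the "business user" label for the last stage is an assumption (the text names no role for it), and so is the rule that a user can also work with every stage more constrained than their entry point.

```python
from typing import List, NamedTuple


class Stage(NamedTuple):
    name: str
    constraint_level: int  # illustrative ordinal: higher means more constrained
    typical_user: str


# Stages follow the walk-through in the text, least to most constrained.
FUNNEL = [
    Stage("raw Hadoop storage", 0, "software developer"),
    Stage("SQL interface on Hadoop", 1, "SQL programmer"),
    Stage("data warehouse", 2, "data analyst"),
    Stage("reports, dashboards, KPIs", 3, "business user"),  # assumed role label
]


def stages_usable_by(user: str) -> List[str]:
    """Return the user's entry stage plus every more constrained stage.

    Assumption: anyone able to handle a given stage can also handle
    the stages further down the funnel.
    """
    entry = next(s for s in FUNNEL if s.typical_user == user)
    return [s.name for s in FUNNEL if s.constraint_level >= entry.constraint_level]
```

For example, `stages_usable_by("data analyst")` returns the data warehouse and the reporting stage, while `stages_usable_by("software developer")` returns all four stages.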
Is it true that constrained data really means less analysis, and does that matter? Let’s explore.
BI Stack Compression
There are numerous products that qualify as BI tools without being a classical BI stack.
- Analytical engines such as Aster Data (Teradata), Vertica (HP), Greenplum (EMC)
- Desktop tools such as QlikView, Tableau, and Excel
- Analytical tools for clickstream data such as Google Analytics and Omniture (Adobe)
Take Excel, the most broadly used analytical tool.
It can:
- Connect to a lot of different data sources
- Transform data while loading into memory
- Hold data in a lightweight data warehouse in the form of an in-memory engine
- Create reports, kind of
All the elements of the BI stack are in place, along with very important data constraints in the form of a row limit and a grid-based data model, which ultimately render the tool useless for many analytical applications. These data constraints also define the end-user audience for Excel: users who do not know how to manipulate data in a database or a data warehouse but who know hundreds of Excel functions.
Each one of these products simplifies the BI stack by compressing it, fusing different functional layers while focusing on a specific end-user audience. Constraining the data makes it easier for this audience to consume, which explains why the products have had phenomenal success in the marketplace.
Does this suggest that BI professionals should pick the best-of-breed BI tool for the task at hand and forget about general-purpose BI products? Let’s continue.
BI Stack Plurality
There are two notable trends in the BI world. The BI stack is getting compressed while BI users have access to more specialized tools. The legacy vendors such as Oracle, IBM, SAP, MicroStrategy, and Microsoft are the proverbial elephant in the room. Many organizations are using a BI solution from one or more of these vendors.
The answer to this “plural stack” conundrum can be summarized as:
- Understand the different BI user roles that exist in your organization
- Understand the different levels of constrained data and how they map to different BI user roles
- Understand how to create and store data sets with different constraints
- Implement a BI stack funnel whose main purpose is to create different data sets with different constraints
- Treat each data set as a starting point for a “mini BI stack” that is highly specialized to address the needs of its end user audience
Example
A BI stack implementation supports the following data sets:
- Data sources – files, databases, message buses
- Data cache – a snapshot of a data source brought into memory or some other transient storage
- In-Memory engine – a columnar database or OLAP
- Reports – report definitions with report data
Mini BI stacks supporting these data sets include:
- Tools to understand data distribution, data quality, data integrity
- Tools to explore data analytically through the data model employed by the data cache
- Tools to do multidimensional analysis of data
- Tools to analyze report data, trend analysis, search, and KPIs
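The pairing between the two lists above can be written down as a simple lookup. This is only a sketch: it assumes the lists correspond item by item in the order given, and the dictionary and function names are hypothetical.

```python
# Each data set from the example, paired with the mini BI stack that
# serves it (assumes the two lists correspond in order).
MINI_STACKS = {
    "data sources": "tools to understand data distribution, quality, and integrity",
    "data cache": "tools to explore data through the cache's data model",
    "in-memory engine": "tools for multidimensional analysis",
    "reports": "tools for report data analysis, trends, search, and KPIs",
}


def mini_stack_for(data_set: str) -> str:
    """Look up the specialized tool set that starts from a given data set."""
    try:
        return MINI_STACKS[data_set]
    except KeyError:
        raise ValueError(f"no mini BI stack defined for {data_set!r}") from None
```

The point of the mapping is that every data set in the funnel is a legitimate starting point for its own specialized tools, not just a stop on the way to the final report.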
The most important thing is that mini BI stacks don’t have to be limited to traditional functions for a particular data set. They can be a more complete functional set usually attributed to a full BI stack.
For example, multiple instances of the same report created for different dates could be turned into a multidimensional cube by simply manipulating the data that is already present in the reports. Similarly, one could analyze raw data without putting it into a database first. Both cases are examples of highly compressed BI stacks.
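A minimal sketch of the report-to-cube idea follows, assuming each report instance is a list of rows sharing the same columns. The report contents, dimension names, and helper functions are hypothetical and exist only to show the mechanics.

```python
from collections import defaultdict

# Hypothetical instances of one report definition, run for different dates.
REPORTS = {
    "2013-01-01": [
        {"region": "EU", "product": "A", "revenue": 100.0},
        {"region": "US", "product": "A", "revenue": 50.0},
    ],
    "2013-01-02": [
        {"region": "EU", "product": "A", "revenue": 70.0},
        {"region": "EU", "product": "B", "revenue": 30.0},
    ],
}

# The report date becomes an extra dimension alongside the report's own columns.
DIMENSIONS = ("date", "region", "product")


def build_cube(reports):
    """Stack the report instances into cells keyed by (date, region, product)."""
    cube = defaultdict(float)
    for date, rows in reports.items():
        for row in rows:
            cube[(date, row["region"], row["product"])] += row["revenue"]
    return dict(cube)


def roll_up(cube, keep):
    """Aggregate the cube down to the dimensions listed in `keep`."""
    positions = [DIMENSIONS.index(d) for d in keep]
    result = defaultdict(float)
    for key, value in cube.items():
        result[tuple(key[i] for i in positions)] += value
    return dict(result)


cube = build_cube(REPORTS)
by_region = roll_up(cube, ["region"])
by_date = roll_up(cube, ["date"])
```

Nothing here goes back to the data warehouse. The cube is built entirely from data already present in the report instances, which is what makes this a compressed stack.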
Conclusions
- Constrained data does not mean less analysis
- Best-of-breed BI tools can be combined with the classic BI stack
- Compressed BI stacks can be added to a classic BI stack
- The right classic BI stack is the one that closely resembles a data funnel that has integration points for BI tools and compressed BI stacks