The BI and data warehousing architectures of the past can’t step up to the challenges of big data, analytics, and self-service. Yet many proposed modern data architectures fail to address the realities of legacy data warehousing and BI. Few organizations can take a greenfield approach to big data analytics. We need an architecture that gracefully accommodates legacy BI and recognizes the need for evolutionary and iterative modernization projects.
A Common Sense Approach to Modern Data Architecture
There is no doubt that data architecture must change. It must adapt to the age of big data, analytics, and self-service. The data warehousing and BI architectures of the past can’t gracefully handle the many innovations of the past decade. Each of these changes – big data, analytics, and self-service – is a mix of great potential and big challenges when taken individually. Putting them together creates an environment of abundant opportunity and extreme complexity. Carefully considered architecture is the key to managing complexity without undue complication. We must modernize data architecture. The hard questions are about what and how to modernize.
Let’s begin with a quick look at where we’re coming from. Many companies today still have at least one foot in the old world of legacy data warehousing and BI. That world is characterized by linearity, structure, and latency (see figure 1).
Inherent in this architecture are several characteristics that are often barriers to realizing the full potential of big data and self-service analytics:
- Linear data flow and work flow
- Structured enterprise data
- Batch processing with corresponding data latency
- Rigid infrastructure where scale-up is the dominant growth strategy
- Central services with high dependency on IT departments
Modern data architecture must accommodate multi-directional data flow, iterative processing, unstructured and external data, stream processing, low-latency and real-time data, scale-out growth management, and self-service with autonomy. This begins with a new information supply chain where we think of data in a different way. We have traditionally treated data as a thing to be stored, processed, and managed as a technical asset. Today we must manage data as a resource that is available and accessible to all who need it. Think of data not as something that is static and stored, but as something live, dynamic, and flowing through every business process.
Live and dynamic data is achieved through a new kind of information supply chain that is iterative, intelligent, and adaptive. The five stages of the information supply chain are:
- Ingestion to bring data into the analytics ecosystem
- Cataloging to manage the inventory of datasets and supporting metadata
- Preparation to improve, enrich, format, and blend data
- Analysis to explore, model, and visualize data
- Action to turn analytic insights into business results
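To make the five stages concrete, here is a minimal Python sketch that maps each stage to one small function. Everything in it is an assumption made for illustration: the `Dataset` class, the function names, the sales records, and the follow-up threshold are hypothetical and do not describe any particular product or platform.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A named collection of records plus descriptive metadata (hypothetical)."""
    name: str
    records: list
    metadata: dict = field(default_factory=dict)


def ingest(name, raw_rows):
    # Ingestion: bring data into the analytics ecosystem as-is.
    return Dataset(name=name, records=list(raw_rows))


def catalog_dataset(catalog, ds):
    # Cataloging: record the dataset and its metadata in an inventory.
    ds.metadata = {
        "row_count": len(ds.records),
        "fields": sorted({key for row in ds.records for key in row}),
    }
    catalog[ds.name] = ds.metadata
    return ds


def prepare(ds):
    # Preparation: drop rows with a missing amount and normalize region names.
    cleaned = [
        {**row, "region": row["region"].strip().lower()}
        for row in ds.records
        if row.get("amount") is not None
    ]
    return Dataset(ds.name + "_prepared", cleaned, dict(ds.metadata))


def analyze(ds):
    # Analysis: total amount per region.
    totals = {}
    for row in ds.records:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


def act(totals, threshold):
    # Action: turn the insight into a decision, here regions needing follow-up.
    return sorted(region for region, total in totals.items() if total < threshold)


def run_supply_chain(raw_rows, threshold):
    # Chain the five stages; a real supply chain would iterate and branch.
    catalog = {}
    ds = catalog_dataset(catalog, ingest("sales", raw_rows))
    totals = analyze(prepare(ds))
    return catalog, totals, act(totals, threshold)
```

The sketch runs the stages in a straight line only to keep it short. In practice the supply chain is iterative: an analysis result may send you back to preparation, or back to the catalog to find another dataset.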
The new supply chain depends on enabling technologies for data pipeline management, data cataloging, and data preparation. See my report on Big Data Management Software for the Data Driven Enterprise for more about these technologies.
The new information supply chain becomes the basis for a modern analytics architecture (see figure 2).
Comparing the modern analytics architecture with traditional BI architecture, several shifts become apparent:
- From linear data flow and work flow to multi-directional data flow and iterative work flow
- From structured enterprise data to structured and unstructured data, both enterprise and external
- From batch processing and data latency to batch, stream, and real-time processing to deliver right-time data at the speed of analysis
- From rigid infrastructure and scale-up growth to elastic infrastructure with scale-out growth
- From central services and dependency to self-service and autonomy
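The shift from batch to stream processing can be illustrated with a small Python sketch. It computes the same per-key totals two ways: once after all events have been collected, and once incrementally as each event arrives. The event shape (`key`, `value`) and both function names are assumptions chosen for illustration, not part of any specific streaming framework.

```python
from collections import defaultdict


def batch_totals(events):
    # Batch: wait until all events are collected, then compute once.
    totals = defaultdict(float)
    for event in events:
        totals[event["key"]] += event["value"]
    return dict(totals)


def stream_totals(events):
    # Stream: update running totals per event and emit a snapshot each time,
    # so consumers see current results without waiting for the batch window.
    totals = defaultdict(float)
    for event in events:
        totals[event["key"]] += event["value"]
        yield dict(totals)
```

Both approaches arrive at the same final totals. The difference is latency: the streaming version makes an up-to-date answer available after every event, which is what allows right-time data at the speed of analysis.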
Perhaps most importantly, this is an architecture that doesn’t discount or diminish the past. It fully supports legacy integration. Few organizations can take a greenfield approach to modern analytics. Most still have at least one foot in the old BI and data warehousing world. A pragmatic architecture accommodates the past (shown in blue) as well as the future (shown in orange) and fully supports evolutionary and iterative modernization.