Modernizing Data Architecture

The BI and data warehousing architectures of the past can’t step up to the challenges of big data, analytics, and self-service. Yet many proposed modern data architectures fail to address the realities of legacy data warehousing and BI. Few organizations can take a green field approach to big data analytics. We need an architecture that gracefully accommodates legacy BI and recognizes the need for evolutionary and iterative modernization projects.

A Common Sense Approach to Modern Data Architecture

There is no doubt that data architecture must change. It must adapt to the age of big data, analytics, and self-service. The data warehousing and BI architectures of the past can’t gracefully handle the many innovations of the past decade. Each of these changes – big data, analytics, and self-service – is a mix of great potential and big challenges when taken individually. Putting them together creates an environment of abundant opportunity and extreme complexity. Carefully considered architecture is the key to managing complexity without undue complication. We must modernize data architecture. The hard questions are about what and how to modernize.

Let’s begin with a quick look at where we’re coming from. Many companies today still have at least one foot in the old world of legacy data warehousing and BI. That world is characterized by linearity, structure, and latency (see figure 1).

Figure 1: Traditional BI Architecture

Inherent in this architecture are several characteristics that are often barriers to realizing the full potential of big data and self-service analytics:

  • Linear data flow and work flow
  • Structured enterprise data
  • Batch processing with corresponding data latency
  • Rigid infrastructure where scale-up is the dominant growth strategy
  • Central services with high dependency on IT departments

Modern data architecture must accommodate multi-directional data flow, iterative processing, unstructured and external data, stream processing, low-latency and real-time data, scale-out growth management, and self-service with autonomy. This begins with a new information supply chain in which we think of data in a different way. We have traditionally treated data as a technical asset: a thing to be stored, processed, and managed. Today we must manage data as a resource that is available and accessible to all who need it. Think of data not as something that is static and stored, but as something live, dynamic, and flowing through every business process.

Live and dynamic data is achieved through a new kind of information supply chain that is iterative, intelligent, and adaptive. The five stages of the information supply chain are:

  • Ingestion to bring data into the analytics ecosystem
  • Cataloging to manage the inventory of datasets and supporting metadata
  • Preparation to improve, enrich, format, and blend data
  • Analysis to explore, model, and visualize data
  • Action to turn analytic insights into business results
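To make the five stages concrete, here is a minimal Python sketch of one pass through the supply chain. Everything in it is an illustrative assumption rather than part of any product or reference architecture: the `Dataset` class, the function names, the sales-by-region example, and the `threshold` parameter are all hypothetical. A real supply chain would be iterative and would rely on pipeline, catalog, and preparation tools rather than hand-written functions.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical container for rows plus descriptive metadata."""
    name: str
    rows: list
    metadata: dict = field(default_factory=dict)


def ingest(name, raw_rows):
    """Ingestion: bring data into the analytics ecosystem."""
    return Dataset(name=name, rows=list(raw_rows))


def catalog_dataset(catalog, dataset):
    """Cataloging: record the dataset and its metadata in an inventory."""
    dataset.metadata["row_count"] = len(dataset.rows)
    catalog[dataset.name] = dataset.metadata
    return dataset


def prepare(dataset):
    """Preparation: drop incomplete rows and normalize formats."""
    cleaned = [
        {"region": r["region"].strip().title(), "sales": float(r["sales"])}
        for r in dataset.rows
        if r.get("region") and r.get("sales") is not None
    ]
    return Dataset(dataset.name, cleaned, dict(dataset.metadata))


def analyze(dataset):
    """Analysis: total sales per region."""
    totals = {}
    for r in dataset.rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["sales"]
    return totals


def act(totals, threshold):
    """Action: turn the insight into a decision (regions needing follow-up)."""
    return sorted(region for region, total in totals.items() if total < threshold)


def run_supply_chain(raw_rows, threshold=100.0):
    """Run one pass through all five stages; threshold is an assumed cutoff."""
    catalog = {}
    dataset = catalog_dataset(catalog, ingest("sales", raw_rows))
    totals = analyze(prepare(dataset))
    return catalog, totals, act(totals, threshold)
```

Calling `run_supply_chain` with a handful of raw rows returns the catalog entry, the per-region totals, and the list of regions that fall below the threshold. The point of the sketch is only that each stage has a distinct responsibility and hands a better-understood dataset to the next.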

The new supply chain depends on enabling technologies for data pipeline management, data cataloging, and data preparation. See my report on Big Data Management Software for the Data Driven Enterprise for more about these technologies.

The new information supply chain is the basis for a modern analytics architecture (see figure 2).

Figure 2: Modern Analytics Architecture

Comparing the modern analytics architecture with traditional BI architecture, several shifts become apparent:

  • From linear data flow and work flow to multi-directional data flow and iterative work flow
  • From structured enterprise data to structured and unstructured data, both enterprise and external
  • From batch processing and data latency to batch, stream, and real-time processing to deliver right-time data at the speed of analysis
  • From rigid infrastructure and scale-up growth to elastic infrastructure with scale-out growth
  • From central services and dependency to self-service and autonomy
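The shift from batch to stream processing in the list above can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the event shape (a dict with an `amount` key) and both function names are hypothetical, and a production system would use a stream processing platform rather than a generator.

```python
def batch_total(events):
    """Batch: wait for the full set of events, then compute once."""
    return sum(e["amount"] for e in events)


def stream_totals(events):
    """Stream: update the running total as each event arrives."""
    running = 0.0
    for e in events:
        running += e["amount"]
        yield running
```

Both approaches arrive at the same final total. The difference is latency: the batch version has nothing to report until all of the data has landed, while the streaming version has a current answer after every event.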

Perhaps most importantly, this is an architecture that doesn’t discount or diminish the past. It fully supports legacy integration. Few organizations can take a green field approach to modern analytics. Most still have at least one foot in the old BI and data warehousing world. A pragmatic architecture accommodates the past (shown in blue) as well as the future (shown in orange) and fully supports evolutionary and iterative modernization.

About Dave Wells

Dave Wells leads the Data Management Practice at Eckerson Group, a business intelligence and analytics research and consulting organization. Dave works at the intersection of information management and business management, where real value is derived from data assets. He is an industry analyst, consultant, and educator dedicated to building meaningful and enduring connections throughout the path from data to business value. Knowledge sharing and skills development are Dave’s passions, carried out through consulting, speaking, teaching, and writing. He is a continuous learner – fascinated with understanding how we think – and a student and practitioner of systems thinking, critical thinking, design thinking, divergent thinking, and innovation. He can be reached at dwells@eckerson.com.

  • Ted Hills

    Hear, hear! Many established data architectures, such as the warehouse, continue to have value. The challenge is to incorporate the new with the old.

  • Martijn ten Napel

    It would be nice if for once people would draw architectures based on use cases, requirements and consequences instead of technological flows.
