The Data Maze January 2010

Over the last several years, the rapid explosion of data, along with businesses clamoring for that data and the need to turn it into information faster and faster, has driven our calm, staid data world into a new era. This new era demands the ability to capture data, rationalize it and disseminate it rapidly, along with the ability to use that information as the foundation for decision making based on analytics.

This new emphasis on analytics has become the drumbeat for corporate America as it continues to compete in a global marketplace. Corporate leaders are learning that in order to survive and thrive they must optimize their business and incorporate analytics as a necessary tool in their competitive toolkit. Companies such as Google, Amazon and Progressive Insurance, along with many others, understand this new paradigm and are embracing it throughout their infrastructure and culture. This column will start to define how this groundbreaking new way of business is affecting some of the cornerstones of our information environment.

Before we delve into that topic, we need to make sure we have a common definition for what we mean by analytics. I referred to two sources, Wikipedia, which sometimes has to be taken with a grain of salt, and the renowned author on competitive analytics, Tom Davenport.

First, Wikipedia defines analytics as “…the science of analysis. A simple and practical definition, however, would be how an entity (i.e., business) arrives at an optimal or realistic decision based on existing data. Business managers may choose to make decisions based on past experiences or rules of thumb, or there might be other qualitative aspects to decision making; but unless there are data involved in the process, it would not be considered analytics.”

From an expert’s viewpoint, Tom Davenport wrote in his book, Competing on Analytics, that “By analytics we mean the extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions.”

Both of these definitions treat data as the foundation for analytics. More importantly, they point to how successful companies are revamping their business processes to make use of analytic techniques, basing their decisions on facts and on the ability to use that information optimally.

So how does all this affect a data professional in his or her daily life? It does so in many ways, and a dozen or more books could be, and are being, written about them. In addition, because of the rapid changes currently underway, these books need to be published rapidly and updated frequently, which makes this column a good delivery mechanism to start broaching the subject. In future columns, I will expand into different areas from a data perspective on how and where we are affected, but for today’s column I will concentrate only on the database platform perspective.

Today, the majority of the modern corporation’s data is stored in a relational database such as Oracle, DB2 or SQL Server. These database management system (DBMS) vendors designed and implemented their databases for online transaction processing (OLTP) systems. As data warehousing and decision support systems (DSS) grew, it made sense to use these same database platforms for the DSS data stores, since the enterprise licenses were already available and paid for and the labor skill set was already in place. Those stores include the operational data stores, atomic data warehouses and departmental data marts that were, and still are, proliferating across the enterprise. Traditionally, these legacy database platforms were designed to update a small set of rows and to serve small, narrowly bounded queries, with a “one size fits all” (OLTP and DSS) mentality. Because of the rapid change caused by competitive analytics, these legacy database platforms are no longer enough for the enterprise, and a need for database platform specialization has emerged.
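To make the difference between the two workloads concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns and the function names are hypothetical, chosen only for illustration; they do not come from any product mentioned above. The OLTP-style statement touches a single row by key, while the DSS-style query has to read every row to produce a summary.

```python
import sqlite3

REGIONS = ["east", "west", "north", "south"]  # hypothetical sample data


def build_orders(conn, n_rows=1000):
    """Create and fill a small, made-up orders table."""
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    rows = [(i, REGIONS[i % 4], float(i % 10)) for i in range(1, n_rows + 1)]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


def oltp_update(conn, order_id, new_amount):
    """OLTP-style work: change one row, located by its primary key."""
    cur = conn.execute(
        "UPDATE orders SET amount = ? WHERE id = ?", (new_amount, order_id)
    )
    return cur.rowcount  # number of rows touched


def dss_summary(conn):
    """DSS-style work: scan the whole table and aggregate by region."""
    cur = conn.execute(
        "SELECT region, COUNT(*), SUM(amount) "
        "FROM orders GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
build_orders(conn)
touched = oltp_update(conn, 42, 99.0)
summary = dss_summary(conn)
```

On a table this small both statements are trivial. The point is only the shape of the access pattern: the first scales with the number of rows changed, the second with the size of the table.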

This need for database platform specialization between OLTP applications and DSS applications has been proven in the marketplace for three main reasons:

  1. Businesses are being inundated with data. Data is growing at an unprecedented pace not only because of additional transactional data and sources but because of the rapid growth of unstructured data and new streams of data from sensors and other types of instruments that feed data on an ongoing basis. Much of this data is a rich source of information, but it is currently just being stored and dropped because there are no resources to analyze it and convert it into competitive analytics.
  2. Businesses need information faster. As discussed earlier, the need for real-time analytics across the enterprise is driving architecture and design changes. Not only is the data deluge getting bigger, but businesses need it faster and in a consumable format.
  3. Current software and hardware limitations hamper the previous two drivers. In today’s new world, the ability to scale up, scale out and “parallelize” workload is an overriding requirement.
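As a rough illustration of what “parallelizing” a workload means in the third point, the sketch below splits a data set into partitions, aggregates each partition independently and then merges the partial results. This is a generic scatter/gather pattern under my own assumptions, not a description of how any particular database platform works; the function names and sample rows are hypothetical. A thread pool is used only to keep the example self-contained, so it shows the structure of the technique rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts roughly equal slices (round-robin)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def partial_totals(rows):
    """Aggregate one partition on its own: total amount per region."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def merge(partials):
    """Combine the per-partition totals into one final answer."""
    merged = {}
    for part in partials:
        for region, amount in part.items():
            merged[region] = merged.get(region, 0.0) + amount
    return merged


def parallel_totals(rows, n_parts=4):
    """Scatter the work across workers, then gather and merge."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_totals, partition(rows, n_parts)))
    return merge(partials)


# Hypothetical sample data: (region, amount) pairs.
rows = [("east", 1.0), ("west", 2.0), ("east", 3.0), ("west", 4.0), ("north", 5.0)]
totals = parallel_totals(rows, n_parts=2)
```

Because each partition is processed without reference to the others, the same pattern applies whether the partitions sit on more processors in one machine (scale up) or on separate machines (scale out).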

Database platform specialization has started to address many of these concerns through a variety of methods, accompanied by a proliferation of companies and products not seen since the advent of the web explosion. In future columns I will describe how the traditional relational DBMS vendors have extended their products to enable analytics more readily and delve deeper into the different types of specialized database platforms that are available on the market, how the field is exploding on a daily basis and how successful companies are leveraging these technologies for a competitive advantage.

As data professionals, we cannot afford to learn just one or two new skills but must constantly learn and keep up with the fast-paced changes happening in our sphere of influence. We must be prepared when our management asks about ways to modify or enhance the current architecture and infrastructure to enable analytics across the enterprise and meet the pent-up demand for these capabilities. To that point, as we learn together, hopefully we will be motivated to take the necessary steps to broaden our knowledge so that we are prepared to support these future business requirements with an efficient approach.

About Dan Sutherland

Dan is an IT Architect at IBM specializing in business intelligence solutions and integrated data architectures. Over the past 20+ years, he has gained valuable experience working in multiple technical leadership roles defining requirements, architecting solutions, designing large scale relational database management systems using accepted design practices and successfully implementing systems on multiple software and hardware platforms. 
