Getting Big Data Right: A Checklist to Evaluate Your Environment

Anyone who has been awake for the past few years cannot help but notice the groundswell of interest in Big Data.

Big Data is the technology that has the following properties –

  • Very large amounts of data can be handled
  • The storage medium for the data is inexpensive
  • Data is managed by means of the “Roman Census” method
  • Data managed under Big Data is unstructured

Big Data certainly has potential. There is a wealth of information available in Big Data. But in reality, most corporations are failing to unlock that potential. Consider the following three anecdotal points of reference –

Wall Street Journal article, Dec 2013 – “a recent survey states that the return on investment of Big Data has been a disappointing $0.55 per dollar spent…”

Manager of Big Data for a large consulting firm – “in the past 18 months our firm has done over 150 proofs of concept for Big Data. Five of those projects ended up going into production. The remaining projects were abandoned. The failure rate for Big Data is over 95%…”

Large financial institution – “we have been working on Big Data for two years now. We have bought everything our vendors have told us to buy. We have tried everything our vendors have told us to try. But after two years’ time we have nothing in the way of business value to show for our efforts and investment.”

So, there is no question that there is great potential in Big Data, and there is no question that organizations have put a lot of resources and effort into trying to make Big Data succeed. But where are the concrete results? In fact, is it possible to find concrete results? And why are concrete results so difficult to come by with Big Data?

To that end, we have written this short, simple paper to answer one question: what do you need to do in order to achieve concrete results with your Big Data project?

There are twelve recommendations we make in order to help ensure concrete results –

1 – Look for Business Value Before You Start Building Your Big Data Infrastructure

Do not build your Big Data infrastructure and then hope that you can find business value. Before you build even the first part of your Big Data infrastructure, you need a very clear idea of what output you expect, what business value that output will have for your organization, and whether that output will be used operationally (as day-to-day data) or informationally (as analytical data). If you do not have a clear and concise definition of your expectations before you start, you should not be doing a Big Data project.

2 – Know How You Are Going to Analyze Your Results When You Find Them

The structure of Big Data is fundamentally different from the structure of classical database management systems, and you cannot use classical analytical tools against Big Data. Analyzing Big Data is a completely different proposition from analyzing classical structured systems. So, in order to derive business value from Big Data, you must determine up front how you are going to do the analysis.

3 – Understand the Difference Between Search and Analysis

There is a fundamental difference between search and analysis. Search is a simple count of objects. Analysis requires context in order to qualify each object as it is found. In order to do analysis, you need to be able to derive the context of your Big Data. Do not assume that just because you can count raw data, you can analyze it. In almost every case, getting business value from data requires analysis, not a simple count of objects. If you don’t understand the difference between search and analysis, you shouldn’t be doing a Big Data project.
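To make the distinction concrete, here is a minimal Python sketch. The sample records, the search term "pressure", and the CONTEXT_RULES lookup are all hypothetical, and qualifying a word by the single word in front of it is a deliberately crude stand-in for real context derivation. The point is only that search returns one number, while analysis tells you what each occurrence actually refers to.

```python
import re
from collections import Counter

# Hypothetical raw, unstructured records.
DOCUMENTS = [
    "Patient reports high blood pressure after exercise.",
    "Tire pressure was low on the delivery truck.",
    "Blood pressure normal; no further action.",
    "Customer complained about pressure from the sales rep.",
]

# Hypothetical context rules: the preceding word qualifies the term.
CONTEXT_RULES = {"blood": "medical", "tire": "vehicle"}


def search_count(docs, term):
    """Search: a simple count of how many times the term occurs."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return sum(len(pattern.findall(doc)) for doc in docs)


def contextual_count(docs, term, rules):
    """Analysis (crude sketch): qualify each occurrence by its context."""
    # The optional group captures the word before the term, if any;
    # findall returns '' for the group when there is no preceding word.
    pattern = re.compile(rf"(?:(\w+)\s+)?\b{re.escape(term)}\b", re.IGNORECASE)
    counts = Counter()
    for doc in docs:
        for preceding in pattern.findall(doc):
            counts[rules.get(preceding.lower(), "unqualified")] += 1
    return counts
```

Search reports four hits for "pressure"; the contextual version reports two medical, one vehicle, and one occurrence it cannot qualify, which is the kind of result a business question can actually use.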

4 – Build Your Big Data Infrastructure in an Iterative Manner

Do not attempt to build your Big Data infrastructure in a Big Bang approach. There is no reason to build your environment all at once and every reason to build it a step at a time. Building a step at a time means that the mistakes you make will have minimal consequences. Because the Big Data environment is brand new, mistakes are a certainty. Make sure that the consequences of those mistakes are minimal and recoverable, not large, expensive, and politically embarrassing.

5 – Make Sure You Know How You Are Going to Relate Your Big Data Environment to Your Existing Operational / Analytical Environment

Many shops build their Big Data environment as if it were going to exist on a different planet from the existing operational/analytical environment. That will NEVER be the case. Mundane subjects such as how to transfer data, how to find and compare keys, how to audit the quality of data, and how to create a unified result across the Big Data, operational, and analytical environments need to be addressed BEFORE the Big Data infrastructure is built. A successful Big Data environment is one that integrates smoothly with the existing corporate analytical environment.
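As an illustration of just one of these mundane subjects, finding and comparing keys, here is a minimal Python sketch. It assumes, hypothetically, that the only key shared by the two environments is an email address and that trimming whitespace and lowercasing is enough to compare keys; real key matching is usually much messier. The names Customer, normalize_key, and relate are illustrative, not part of any product.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    email: str


# Hypothetical rows from the existing operational/analytical environment.
CUSTOMERS = [
    Customer(101, "ann@example.com"),
    Customer(102, "bob@example.com"),
]

# Hypothetical records extracted from the Big Data environment; the only
# shared key is an email address, captured with inconsistent formatting.
BIG_DATA_RECORDS = [
    {"email": " Ann@Example.com ", "text": "Asked about a refund."},
    {"email": "carol@example.com", "text": "Signed up for the newsletter."},
]


def normalize_key(raw: str) -> str:
    """Trim whitespace and lowercase so both environments compare keys alike."""
    return raw.strip().lower()


def relate(records, customers):
    """Attach customer_id to Big Data records; return (matched, unmatched)."""
    index = {normalize_key(c.email): c.customer_id for c in customers}
    matched, unmatched = [], []
    for rec in records:
        cid = index.get(normalize_key(rec["email"]))
        if cid is None:
            unmatched.append(rec)
        else:
            matched.append({**rec, "customer_id": cid})
    return matched, unmatched
```

Keeping the unmatched records, rather than silently dropping them, gives the data quality audit something concrete to examine.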

6 – Plan for the Separation of Useful Big Data from Less Than Useful Big Data

It is inevitable that some Big Data will be more useful than other Big Data. That simply is the nature of Big Data, and trying to lump all of your Big Data together is a terrible strategy. You need to be able to separate your Big Data according to its usefulness. Determining that some data is less useful does not mean that you should throw it away. It simply means that there is a hierarchy of data based on importance and usefulness, and this hierarchy should be recognized in your architecture.

7 – Be Open to the Fact that You Will Have Multiple Vendors

No one vendor has a complete solution (despite what the vendor tells you). If you commit to using technology from only a single vendor, you greatly limit your chances of success.

8 – Have a Clear Idea Who Your User Is

Is your user the IT department? Is your user marketing? Sales? Finance? Management? In many cases it is not clear who the user is. It is much easier to understand your objectives if you know who your user is (and is not). There are a thousand good reasons for catering to your ultimate corporate end user: political, economic, technological, and so forth.

9 – Have a Clear Measurement of Success

Unless you have a clearly stated objective and a clearly stated means of measuring success, you will never be able to tell whether your Big Data project has been successful or not. The measurement of success can take many forms. It can take the form of availability of new data, of new queries being written and satisfied, of increased revenue, of increased sales leads, and so forth. If you are serious about success with a Big Data project, you will outline the criteria for success at the outset.

10 – Enable Exploration of Data

One thing is certain with Big Data: there will be new data to exploit. But exploiting data is an art. An infrastructure is required for exploitation, but so are the right people with the right motivation. Exploration of data requires a different mindset and a different set of skills than most organizations are staffed for. Most organizations are geared up for creating and examining a set number of key performance indicators (KPIs). While the organization certainly needs people with the repetitive KPI mindset, with Big Data there also needs to be a complementary set of skills geared toward finding new KPIs and new opportunities.

11 – Understand Textual Disambiguation

All Big Data is unstructured. As such, there is NO context to be found in Big Data in the normal sense of context: there are no attributes, no keys, no records. But context is necessary in order to do sophisticated analytical processing. Therefore it is mandatory that the organization understand how to do textual disambiguation. With textual disambiguation, the context that is naturally present in unstructured text is found and structured into a form that is familiar to analytical processing. The problem is that vendors of Big Data technology have little or no understanding of the technology of textual disambiguation.
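The following is a minimal Python sketch of the basic idea, not a description of any particular textual disambiguation product. It assumes, hypothetically, that each note carries one piece of outside context (the medical specialty that wrote it) and uses a tiny hand-built lookup table so that the same abbreviation resolves to different meanings. The output is a list of structured rows that ordinary analytical tools could load. Real textual disambiguation involves far more (taxonomies, proximity, negation, and so on).

```python
import re
from typing import NamedTuple


class Note(NamedTuple):
    doc_id: int
    specialty: str  # context that lives outside the raw text itself
    text: str


class Fact(NamedTuple):
    doc_id: int
    term: str
    meaning: str


# Hypothetical lookup: the same abbreviation means different things
# depending on the context in which it was written.
ABBREVIATIONS = {
    ("ha", "cardiology"): "heart attack",
    ("ha", "general practice"): "headache",
}


def disambiguate(notes):
    """Turn free-text notes into structured (doc_id, term, meaning) rows."""
    facts = []
    for note in notes:
        # Split the text into lowercase word tokens.
        for token in re.findall(r"[a-z]+", note.text.lower()):
            meaning = ABBREVIATIONS.get((token, note.specialty))
            if meaning is not None:
                facts.append(Fact(note.doc_id, token, meaning))
    return facts


NOTES = [
    Note(1, "cardiology", "Pt has history of HA, 2019."),
    Note(2, "general practice", "Complains of HA after long shifts."),
]
```

A simple search would report two occurrences of "HA"; the structured rows record that one is a heart attack and the other a headache.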

12 – Support Metadata

Underlying all of Big Data is the fact that in order to do effective analytical processing, it is necessary to support metadata. In Big Data, there are different forms of metadata, and all of them are needed in order to effectively use and analyze the information found in Big Data.


A Simple Self-Readiness Test

From these recommendations, a simple self-readiness test can be constructed. The test is like solitaire: you are only cheating yourself if you fudge the answers –

1 – Do I have a clear vision of my business objectives for Big Data? Yes/no

2 – Do I know how to do sophisticated analysis on Big Data when I get it captured? Yes/no

3 – Do I understand the differences between search and analysis? Yes/no

4 – Is my infrastructure built (or to be built) iteratively? Yes/no

5 – Can I relate the data found in my Big Data environment to the data in my existing analytical environment? Yes/no

6 – Do I know how to separate my useful Big Data from my less than useful Big Data? Yes/no

7 – Am I open to technology from multiple vendors? Yes/no

8 – Do I know who the end user of the information found in Big Data will be? Yes/no

9 – Do I have a clear and concise way to measure the success of the Big Data project? Yes/no

10 – Do I have the people and tools with the knowledge of how to explore new types of data? Yes/no

11 – Do I (or does my vendor) know what textual disambiguation is, and why it is central to success with Big Data? Yes/no

12 – Do I know how I am going to identify and support metadata from the Big Data environment? Yes/no


If you answered yes to all 12 questions, your chances of success are very high. If you answered yes to 9 to 11 questions, you have a reasonable chance of success. If you answered yes to 5 to 8 questions, you would be advised to do more research and preparation before you waste money on a Big Data project. If you answered yes to fewer than 5 questions, Big Data is almost sure to be a wasted effort in your environment.
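For readers who want to tally the test mechanically, the scoring bands above can be restated as a small Python function. This is only a restatement of the bands in code; the function name and the return strings are illustrative.

```python
from typing import Sequence


def readiness(answers: Sequence[bool]) -> str:
    """Map the 12 yes/no answers to the scoring bands of the readiness test."""
    if len(answers) != 12:
        raise ValueError("expected answers to all 12 questions")
    yes = sum(answers)  # True counts as 1, False as 0
    if yes == 12:
        return "very high chance of success"
    if yes >= 9:
        return "reasonable chance of success"
    if yes >= 5:
        return "do more research and preparation first"
    return "Big Data is almost sure to be a wasted effort"
```

For example, readiness([True] * 10 + [False] * 2) returns "reasonable chance of success".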

The truth is that ALL of these factors are needed in order for Big Data to be a success. Unfortunately, MOST of them are ignored by the vendors of Big Data. The vendors focus on what is familiar and known to them. If a factor is outside their comfort zone, they simply ignore it and try to push more of their technology down their customers’ throats. At the end of the day, the vendor is there to sell a product, not to make your organization successful.



About Bill Inmon

Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.