Architecture For Cloud Based Processing

Data in the Cloud

For a variety of motivations, many organizations have decided to place data and processing on the cloud.

One approach to using the cloud is to just throw a lot of data onto the cloud. The cloud vendors advocate this approach.

But, for a variety of reasons, an architectural refinement can be made to simply throwing data onto the cloud, one that greatly improves the economy and efficiency of cloud usage.

There is a much better way to manage data and processing on the cloud that is:

  • Much less expensive
  • Much more efficient
  • More functionally enabled
  • With no loss of any functionality

This approach can be called the “Architected Cloud Management Approach” and is described in this paper.

So, what can an architected approach to the management of cloud data do for you?

The Architected Cloud Management Approach can:

  • Save lots of money by removing data from the cloud that you are not going to use in any case. Removing large amounts of data from the cloud pays you back directly and immediately.
  • Speed up queries. You no longer have to run runaway queries, because you have much more control over exactly which data gets processed. This, too, saves you money.
  • Increase functionality. You can immediately start to process textual data on the cloud, understand its context, and use standard databases for analytical processing.
  • Keep every functional capability of moving to the cloud. In fact, you improve analytical processing capabilities on the cloud.
  • Do all of this in a simple fashion.

Too Good To Be True?

Does this sound too good to be true?

Let us show you how… because it is true.

(And by the way, all of this is easier to do than you would have ever imagined.)

Basic Types Of Data

Our story for an architecture for cloud management begins with an understanding of the basic types of data you are putting onto the cloud.

There are three basic types of data on the cloud:

  • Structured, transaction-based data (the “standard” kind of data)
  • Textual data (conversations, memos, Internet snips, call center conversations, etc.)
  • Analog/IoT data (generated by machines)

Lots Of Textual Data

Of the different types of data, there is usually far more textual data than structured data. If there isn’t now, there will be tomorrow. The cloud vendor encourages you to put as much data on the cloud as you can.

The vendor makes money by charging you for the data on the cloud and for the analytical processing done against that data. The more data the vendor can get you to put on the cloud, the more you pay, either directly or indirectly. You pay directly for data on the cloud. But you pay even more indirectly for the queries that run against that data. The more data you have, the larger your queries, and the larger the query, the more you pay.
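To make the cost argument concrete, here is a minimal Python model, assuming storage is billed per gigabyte-month and each query is billed per gigabyte scanned (a pricing style some cloud warehouses use). The prices and the `monthly_cost` function are hypothetical illustrations, not any vendor's actual rates.

```python
# Hypothetical prices for illustration only; real rates vary by vendor, region, and tier.
STORAGE_PRICE_PER_GB_MONTH = 0.02  # assumed storage price, per GB per month
SCAN_PRICE_PER_GB = 0.005          # assumed query price, per GB scanned


def monthly_cost(stored_gb: float, queries_per_month: int, fraction_scanned: float = 1.0) -> float:
    """Return storage cost plus query cost for one month.

    Assumes every query scans the same fraction of the stored data, so both
    terms grow linearly with stored_gb.
    """
    storage = stored_gb * STORAGE_PRICE_PER_GB_MONTH
    scanning = queries_per_month * stored_gb * fraction_scanned * SCAN_PRICE_PER_GB
    return storage + scanning


# Same query workload, before and after a tenfold reduction in stored volume.
raw = monthly_cost(stored_gb=10_000, queries_per_month=100)
reduced = monthly_cost(stored_gb=1_000, queries_per_month=100)
print(f"raw text: ${raw:,.2f}  reduced: ${reduced:,.2f}")
```

Because both terms are linear in `stored_gb`, cutting the stored volume by a factor of ten cuts the modeled bill by the same factor. With these assumed prices the query term dwarfs the storage term, which mirrors the point that you pay more indirectly than directly.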

So, what if you could put a lot less data on the cloud and still do all of the processing you want to do?

Other Issues

But the volume of data on the cloud is not the only stubborn and expensive issue. The next major issue is that of your ability to actually process and use textual data analytically on the cloud.

There are lots of reasons why textual data — on the cloud or anywhere else — is so hard to process analytically.

Textual data contains lots of verbiage that will never have any value analytically.

Consider a simple sentence: “I talked with my daughter when we were in the store.”

There may be some analytical value in the words “daughter” and “store” (although I doubt it), but it is far-fetched to think that words like “I”, “the”, “with”, “in” and so forth will ever have any analytical value. Yet the vendor asks you to pay as much for these words as for any other. The problem is that there is a lot more useless textual data than useful textual data, and the vendor asks you to pay for all of the data that is never going to be useful. Never.
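A minimal Python sketch of how much of that example sentence survives once function words are dropped. The `STOP_WORDS` set is a small hypothetical list chosen for this sentence; real stop-word lists are far longer, and deciding what has analytical value usually takes more than a stop-word filter.

```python
import re

# Hypothetical, deliberately small stop-word list for this example.
STOP_WORDS = {"i", "the", "with", "in", "my", "when", "we", "were", "a", "an", "and", "of", "to"}


def tokenize(sentence: str) -> list[str]:
    """Lowercase the sentence and split it into simple word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def content_words(sentence: str) -> list[str]:
    """Return the words that are not on the stop-word list, in order."""
    return [w for w in tokenize(sentence) if w not in STOP_WORDS]


sentence = "I talked with my daughter when we were in the store."
kept = content_words(sentence)
total = len(tokenize(sentence))
print(kept)                              # ['talked', 'daughter', 'store']
print(f"{len(kept)} of {total} words kept")
```

Only three of the eleven words survive, and even “talked” may not be worth keeping.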

In other words, the vendor wants you to pay endlessly for something you are never going to use. How much sense does that make?

Another major issue with the analytical processing of text, on the cloud or anywhere else, is that both text and context must be captured in order to do analytics. You cannot process text successfully without understanding its context. The problem is that context takes a very different form in text than it does in the structured world, where context is supplied by the structure of the data itself. It is harder to derive context from text than it is to handle the text itself.

Another major issue with text is the ambiguity of words in language: the same word can carry several different meanings. For example, what does the word “fire” mean? Is it what happens when my house burns down? Is it what happens when my boss tells me I am no longer working at the company? Is it what happens when I pull the trigger on a gun?

In fact, the word “fire” is all of these things and more. But to understand text and to use it analytically, I have to have a very precise understanding of the word. And language is FULL of such anomalies. FULL!
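Resolving a word like “fire” means looking at the words around it. Here is a deliberately tiny Python sketch of that idea, assuming a hand-made list of cue words per sense. `FIRE_SENSES` and `sense_of_fire` are hypothetical names, and this keyword-overlap scoring is a toy stand-in, not how textual ETL or any particular product disambiguates text.

```python
import re
from typing import Optional

# Hypothetical cue words for each sense of "fire"; a real system would draw on a curated taxonomy.
FIRE_SENSES = {
    "combustion": {"house", "burn", "burns", "burned", "smoke", "flames"},
    "dismissal": {"boss", "job", "company", "employer", "working"},
    "discharge": {"gun", "trigger", "rifle", "shot", "weapon"},
}
FIRE_FORMS = {"fire", "fired", "fires", "firing"}


def sense_of_fire(sentence: str) -> Optional[str]:
    """Pick the sense of "fire" whose cue words overlap most with the sentence.

    Returns None if no form of "fire" appears or no cue word is present.
    Ties go to the sense listed first in FIRE_SENSES.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    if not words & FIRE_FORMS:
        return None
    scores = {sense: len(words & cues) for sense, cues in FIRE_SENSES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(sense_of_fire("The fire burned the house down"))          # combustion
print(sense_of_fire("My boss said the company would fire me"))  # dismissal
print(sense_of_fire("He pulled the trigger to fire the gun"))   # discharge
```

A fixed cue list breaks down quickly on real language, which is exactly why the author stresses that context is harder to capture than the text itself.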

And the list of obstacles to reading raw text and using it for analysis goes on and on. When you store raw text in the cloud, you really can’t do much with it. You certainly cannot analyze raw text efficiently or effectively in the cloud. Yet the vendor makes you pay for it, and it clogs up your processing, making queries inefficient and expensive.

So, it is not just the amount you pay for data storage and processing. It is also your analysts’ ability to actually use what is there once you have put it on the cloud.

The Architected Cloud Management Solution

So… is there a solution to all of these problems? Yes, thankfully there is.

The solution is called the Architected Cloud Management Approach.

Before textual data is placed in the cloud, it can be passed through a technology called textual ETL. With textual ETL, raw text can be reduced to a contextualized database. This reduction:

  • Greatly reduces the amount of data you need to put on the cloud (and saves you money)
  • Greatly speeds up the queries that you do (and saves you money and time)
  • Greatly enhances your analytical capabilities because now you have a contextualized database that is suitable for immediate analysis. Now, all the barriers to doing text analysis are removed.
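A toy version of that reduction, sketched in Python under stated assumptions: a hypothetical hand-built taxonomy (`TAXONOMY`) decides which words carry analytical value and which context each belongs to, and Python's built-in `sqlite3` stands in for the standard database. This illustrates the idea of turning raw text into contextualized rows; it is not the textual ETL product itself.

```python
import re
import sqlite3

# Hypothetical taxonomy: which words matter analytically, and the context each belongs to.
TAXONOMY = {
    "refund": "billing", "invoice": "billing", "charge": "billing",
    "late": "delivery", "shipping": "delivery", "package": "delivery",
    "rude": "service", "helpful": "service",
}


def contextualize(doc_id: str, text: str) -> list[tuple[str, str, str]]:
    """Keep only words found in the taxonomy, each tagged with its document and context."""
    words = re.findall(r"[a-z]+", text.lower())
    return [(doc_id, w, TAXONOMY[w]) for w in words if w in TAXONOMY]


def load(rows: list[tuple[str, str, str]], conn: sqlite3.Connection) -> None:
    """Insert contextualized rows into an ordinary relational table."""
    conn.execute("CREATE TABLE IF NOT EXISTS term (doc_id TEXT, word TEXT, context TEXT)")
    conn.executemany("INSERT INTO term VALUES (?, ?, ?)", rows)


calls = {
    "call-1": "The package was late and the shipping cost was too high.",
    "call-2": "I want a refund because the invoice shows a double charge.",
    "call-3": "The agent was helpful.",
}

conn = sqlite3.connect(":memory:")
for doc_id, text in calls.items():
    load(contextualize(doc_id, text), conn)

# Plain SQL now answers an analytical question without touching the raw sentences.
summary = conn.execute(
    "SELECT context, COUNT(*) FROM term GROUP BY context ORDER BY context"
).fetchall()
print(summary)
```

Only seven small rows reach the table, while the sentences they came from can stay in cheaper storage.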

In other words, you have a viable solution with all of the advantages of the cloud and none of the disadvantages.

What About Unknown Future Analysis?

One of the immediate questions that is asked at this point is: By not putting all my text on the cloud, don’t I lose the ability to do future unknown analysis? What if I want to analyze something that I didn’t put on the cloud?

There is no problem. You simply decide what you want to analyze, go back to bulk storage, find what you are looking for, put it into a contextualized database, and place it on the cloud. So, you can still analyze any data that you deem to be interesting.
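The “go back to bulk storage” step can be sketched in Python, assuming bulk storage is simply a directory of plain-text files (in practice it might be object storage or an archive tier). `find_in_bulk_storage` and the sample documents are hypothetical; only the documents it returns would then be contextualized and sent to the cloud.

```python
import tempfile
from pathlib import Path


def find_in_bulk_storage(archive_dir: Path, term: str) -> dict[str, str]:
    """Return {document name: text} for archived .txt files that mention term (case-insensitive)."""
    needle = term.lower()
    matches = {}
    for path in sorted(archive_dir.glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        if needle in text.lower():
            matches[path.stem] = text
    return matches


# Stand-in for bulk storage: a temporary directory of plain-text documents.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp)
    docs = {
        "memo-1": "Quarterly budget review for the facilities team.",
        "memo-2": "Customer asked whether the warranty covers water damage.",
        "call-7": "Caller was upset that the Warranty claim was denied.",
        "call-8": "Caller wanted to update a mailing address.",
    }
    for name, text in docs.items():
        (archive / f"{name}.txt").write_text(text, encoding="utf-8")
    found = find_in_bulk_storage(archive, "warranty")

print(sorted(found))
```

A plain substring scan is the simplest possible selection rule; a real selection step could use an index or richer criteria.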

Only now, it is inexpensive and efficient to do that analysis on the cloud.

An Archival Facility

Furthermore, you can use bulk storage as an archival facility for all kinds of data, not just textual data. By moving data that is not being used, and likely will not be used, off the cloud, you can reduce the amount of data you have on the cloud and speed up your processing there.
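One way to decide what to move is an idle-time rule. The Python sketch below assumes each dataset records when it was last accessed and treats anything idle for more than 180 days as an archive candidate; the `Dataset` type, the threshold, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Dataset:
    name: str
    size_gb: float
    last_accessed: date


def plan_archive(datasets: list[Dataset], today: date, idle_days: int = 180):
    """Split datasets into those to keep on the cloud and those idle long enough to move to bulk storage."""
    cutoff = today - timedelta(days=idle_days)
    keep = [d for d in datasets if d.last_accessed >= cutoff]
    to_archive = [d for d in datasets if d.last_accessed < cutoff]
    return keep, to_archive


datasets = [
    Dataset("sales_2024", 50.0, date(2024, 5, 30)),
    Dataset("call_logs_2019", 800.0, date(2021, 1, 15)),
    Dataset("sensor_raw_2023", 1200.0, date(2023, 2, 1)),
]
keep, to_archive = plan_archive(datasets, today=date(2024, 6, 1))
freed = sum(d.size_gb for d in to_archive)
print([d.name for d in to_archive], f"{freed:.0f} GB freed")
```

Last-access time is only one possible signal; the same split could be driven by business rules about which data is still needed for analysis.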

The Choice Is Yours

So, the choice is yours. Do you want to do expensive, wasteful, unnecessary processing on the cloud that consumes a lot of time and resources? Do you enjoy paying eternally for resources that you will never use?

Or do you want to take an architected approach, saving time and money, and making your analytical process streamlined?

The choice is yours.

Bill Inmon

Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.
