The Data Forecast: (Un-)Reasons to do Data Analytics in the Cloud

Let’s get this out of the way: you need to move your data analytics to the cloud.

Some people may read that and say, “Anthony, that’s your opinion. There’s a bunch of reasons to keep data analytics in the data centers that we spent so much time and money building.”

Nope. Sunk cost fallacy — go look it up.

There ARE reasons to keep some data out of the cloud, such as operational systems that may not have always-on internet connectivity. But when you need to analyze that data, you should put it in the cloud. On-premises options for data analytics are simply too slow, limited, or expensive to justify continued investment.

As much fun as it might be to continue arguing by myself, that’s not what this article is about — that’s what the comments AFTER the article are about. (So bring it on! Isn’t the Internet great?)

Though there are far more than five reasons to move your data to the cloud, there were only five I could think of that start with “Un-” before things got too silly. The goal here isn’t to provide a complete rationalization for your specific use case. But if you haven’t yet given the cloud much thought for data analytics, this article will hopefully shed enough light on the topic that you will seek out far-less-entertaining sources for more information.

So, with that, here are five “un”-reasons to do data analytics in the cloud:

1. Un-planned Infrastructure Availability

Anyone who has built technology systems for any amount of time has had this happen. Your project finally gets the green light from the business, the finance people, and anybody else who likes to say no, and what happens?

You wait.

Well, actually, first you go through the hassle of ordering the new data center equipment. Then you wait for it to be processed, shipped, installed, configured, and provisioned before you might be able to start working with it: typically weeks, if not months, of hassle and delay.

In the cloud, the process is roughly the same, except the steps all happen through a console on your PC. Oh, and it takes about five minutes from start to finish.

And that’s for planned infrastructure availability. Unplanned infrastructure availability takes the same amount of time once you have made the decision to do it. You’ll want to have some internal controls in place to keep people from going nuts with it, but that’s a problem worth solving to cut weeks or months off of every acquisition of new computing power.

After all, isn’t this how things should be? The hard part should be strategically deciding what to do; everything else is just friction between the potential value of an opportunity and realizing the value from it. Compared with the old ways of procuring infrastructure, the cloud makes traditional data center supply chain delays seem nothing short of ridiculous.

2. Un-limited Scalability

Speaking of ridiculous, at various points in previous, unenlightened, on-premises stops in my career, I would be building a new database environment and realize I’d need new servers on which to run my creation. I’d go beg for some money, get approval, and then it would take time to order the equipment.

Inevitably, I’d be contacted by somebody in a deeper, darker IT cave than the one in which I lived, and they would ask me the question I hated most: “So what specs do you need on that server?”

As a database nerd, I knew I wanted something good, but I had no idea how my database performance translated into the server specs. Knowing I’d only have one chance to buy the machine I needed, I wasn’t going to err on the side of underpowered or undersized. So I’d get the biggest, fastest, most expensive option I could get funded.

Every. Single. Time.

And the IT purchaser couldn’t have cared less — they just wanted their headache (me) to go away. The net result is that companies probably spent more than they needed to on servers, but their databases had plenty of processing headroom.

But what was the alternative? To accidentally get too weak a machine and doom the project before one piece of software was written or applied? One could argue that this mismatch between the computing power acquired and what was actually necessary is just one more intrinsic cost of living in the old data center world.

Now with the cloud, we can manage infrastructure resources the way we develop code: by trial and error. See if it works; if not, make some changes. The scalability of the cloud is so close to limitless that it enables entirely new ways of working with infrastructure. I can start with my best guess, then iteratively measure performance and change the underlying capabilities on the fly.
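For the code-minded, here is a rough sketch of that guess, measure, and adjust loop. Everything in it is hypothetical: the size names, the utilization thresholds, and the `next_size` function are made up for illustration, and a real cloud provider would supply its own sizes and metrics.

```python
# Hypothetical instance sizes, smallest to largest; real providers use their own names.
SIZES = ["small", "medium", "large", "xlarge"]

def next_size(current: str, cpu_utilization: float,
              low: float = 0.30, high: float = 0.75) -> str:
    """Return the size to try next, given average CPU utilization (0.0 to 1.0)."""
    i = SIZES.index(current)
    if cpu_utilization > high and i < len(SIZES) - 1:
        return SIZES[i + 1]   # running hot: step up one size
    if cpu_utilization < low and i > 0:
        return SIZES[i - 1]   # mostly idle: step down and stop paying for headroom
    return current            # inside the comfortable band: leave it alone

# Start with a best guess, then adjust after each measurement window.
size = "medium"
for measured in [0.90, 0.82, 0.55, 0.20]:
    size = next_size(size, measured)
    print(size)
```

The point is not the thresholds, which are arbitrary here, but that a wrong first guess costs one iteration instead of one purchase order.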

This is especially helpful in new environments that may receive little traffic initially, but will grow in usage over time. You can even configure environments to scale automatically based on usage metrics that are natively tracked. But the most exciting extension of unlimited scalability is not how big you can grow it, but how small.

3. Un-lock New Development Approaches

Two of the more recent innovations in cloud technologies are micro-services and server-less computing. Server-less computing essentially allows the compute resources to exist only as long as you need them to run a small application. Micro-service applications typically do one discrete task, like landing an individual data file in a database table.

The magic is in how they are instantiated. You can schedule them to run at set times, but that misses the point. The beauty of micro-services is that they can be event-driven. Cloud service providers have built mechanisms that watch for particular conditions to occur, like files being saved to disk, and then fire up the micro-service applications that are set up to respond to those conditions.
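To make the event-driven idea concrete, here is a toy, in-process sketch. It is not any provider’s actual API: the event shape (`type`, `path`), the `on` decorator, and the `dispatch` function are all assumptions invented for this example, and in a real service the provider does the watching and dispatching for you.

```python
from typing import Callable, Dict, List

# Registry mapping an event type to the functions that respond to it.
_handlers: Dict[str, List[Callable[[dict], str]]] = {}

def on(event_type: str):
    """Decorator that registers a function for one kind of event."""
    def register(func):
        _handlers.setdefault(event_type, []).append(func)
        return func
    return register

@on("file_saved")
def load_file_into_table(event: dict) -> str:
    # A real function would parse the file and insert rows; this one only reports.
    table = event["path"].rsplit("/", 1)[-1].split(".")[0]
    return f"loaded {event['path']} into table {table}"

def dispatch(event: dict) -> List[str]:
    """Run every handler registered for the event's type; unknown types do nothing."""
    return [handler(event) for handler in _handlers.get(event["type"], [])]

print(dispatch({"type": "file_saved", "path": "landing/orders.csv"}))
```

Nothing runs until a file shows up, and each file is handled on its own, which is a very different shape from a nightly batch job.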

This allows for much greater variability in how we process data. ETL should have moved away from rigid processing structures and once-a-day batches long ago. With micro-services and server-less computing as a backbone, we can change everything about how we process data and start breaking down the time and complexity barriers that cause delays between data being available and the business being able to act on it.

Just think, now you can truly build iteratively with data! As business requirements evolve, so can your technical approach and underlying hardware. In a pay-what-you-use model, you have perfect elasticity in keeping and enhancing what works while discarding the less effective approaches. Again, the limitation falls back to where it should be — on the ability of the business to make decisions, and the skills of the team that carries out those decisions.

4. Un-paralleled Power

As we see above, cloud technologies unlock capabilities that were recently only fantasy. An individual business can plug into effectively infinite computing power, changing resource allocations in constant response to its evolving needs.

The most exciting aspect is that we shouldn’t need to wait weeks or months to make new analytics capabilities available to the business. Turnaround times can be as short as minutes or hours for tasks that used to take weeks. The power of the cloud enables us to implement fully-interactive, data warehouse-driven solutions in the time it once took to deliver a basic report.

To accomplish this, however, we need to adopt new methodologies and train our people to interact differently: as real partners. Business and technology must solve problems together, iteratively, with the impact to the business always top of mind. Whether we choose to adopt a well-defined Agile Scrum approach, or simply change reporting structures and leave it up to individual managers, we must learn to operate in a new way.

It’s like the discovery of nuclear energy. Cloud technologies are that powerful in data and computational areas. As nuclear energy did for submarines, cloud technologies can be harnessed to provide businesses power that fundamentally changes what they can accomplish. Similarly, without proper controls and governance, things can rapidly get out of hand.

These are the kinds of challenges you want as a business. All the power you could ever use, with a model that allows you to start small, experiment, and build on top of what works. This leads to the bottom line: a cost/benefit perhaps more compelling than any other investment your business makes.

5. Un-matched Cost/Benefit

I often deploy a cloud-based, columnar-storage, massively parallel processing data warehouse product for clients. This product has all the scalability of the cloud, yet it costs about $1,000 per terabyte per year and starts at about $250 a month all-in. It wasn’t too long ago that the only way into this kind of technology was a dedicated data center appliance, with a $100,000+ price tag before you’d even turned it on.
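If you want to play with those numbers, here is a back-of-the-envelope sketch. The three prices come from the paragraph above; how they combine is my assumption (the monthly starting price is treated as a floor on the per-terabyte rate), and the comparison ignores everything else an appliance costs to house, power, and staff.

```python
PER_TB_PER_YEAR = 1_000      # dollars, per-terabyte figure quoted above
MONTHLY_MINIMUM = 250        # dollars, entry-level price quoted above
APPLIANCE_PRICE = 100_000    # dollars, lower bound quoted for a data center appliance

def annual_cloud_cost(terabytes: float) -> float:
    """Yearly cost, ASSUMING the monthly minimum acts as a price floor."""
    return max(MONTHLY_MINIMUM * 12, PER_TB_PER_YEAR * terabytes)

for tb in (1, 5, 20):
    yearly = annual_cloud_cost(tb)
    years = APPLIANCE_PRICE / yearly
    print(f"{tb} TB: ${yearly:,.0f} per year, {years:.1f} years to match the appliance price")
```

Under these assumptions, even a 20-terabyte warehouse would take about five years of cloud spend to reach the appliance’s sticker price.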

Because the technology is so inexpensive at small scale, you can run a pilot project with real data in just a couple of weeks from start to finish. When was the last time any data-related project with a measurable business impact was done in two weeks? Most projects can’t even be scoped in two weeks!

Therein lies the opportunity, and the challenge. What we’ve taken a quick glimpse at in this article just scratches the surface of what the cloud is, and how it can impact your world of data and analytics. In every industry, there are competitors seizing this opportunity and upending the old institutions that seemed immune to threats.

In the short-term, it may feel easier to ignore the world changing around us, and seek comfort in the inertia of big boxes in big rooms with big price tags. But that decision may erode the competitiveness of your business to the point of no return. The decisions we make today about our technology infrastructure will have a greater impact on the future of our businesses than they ever have before.

The time is now to decide whether your company will be bold and join the disruptors, or avoid the cloud and hope the status quo will keep you competitive.

Good luck — and un-til next time, go make an impact!


About Anthony Algmin

Anthony J. Algmin was the first Chief Data Officer for the Chicago Transit Authority and is now Chief Data Officer for Uturn Data Solutions, a Chicago-based consultancy that helps companies use data and cloud technologies to get better at what they do best. For more information on OpenGrid or Uturn Data Solutions, contact Anthony at aalgmin@uturndata.com.
