The term “big data” has been bandied about for a number of years now, and it has been used so much that it is now a part of IT culture. It’s hard to define precisely, yet everyone seems to have a good idea what is meant by it; big data is here to stay. And that is a good thing!
I typically define big data as the result of a confluence of trends. Incessant data growth, alternative data storage and management systems (Hadoop, NoSQL), improved analytical tools, AI and machine learning, cloud computing, social media, sensor-based data, and mobile computing have all contributed to what we refer to as ‘big data.’
Moreover, big data reflects a shift from disk-based data storage alone to in-memory storage and processing as well. It is not just relational, but also NoSQL; not just DBMS, but also Hadoop and Spark. It is not just commercial software, but also open source. It is not just on-premises data and computing, but also the cloud. Take note that these shifts are not replacing existing technology and capabilities, but adding to them. Relational databases are not outdated or obsolete; they should be a core component of your multiple data platform strategy.
Furthermore, it is not just the technology we use that is shifting, but how we are using it and what we are doing with it. Big data is a result of the transition from mostly internal data to information from multiple sources; from transactional data alone to transactional plus analytical data; from structured data alone to structured plus unstructured data; and from persistent data alone to persistent data plus data that is constantly on the move.
I’m sure you will recall the analyst definition of big data as consisting of four V’s: Volume, Velocity, Variety, and Variability. Although it is interesting, and a noble attempt at defining something as all-encompassing as big data, I don’t think it matters much.
Other analysts have denigrated the term big data altogether, saying that it is not about the volume of data so much as what we are doing with it. Well, sure, but that has always been the case.
To me, big data is so simple that it needs no definition. It is similar to saying “big dog”: you immediately know what I’m talking about. Big data is all about a lot of data. Big data doesn’t have to be NoSQL. And you don’t have to sit there counting up your V’s to see if you’re doing it. Real-time analytics on large relational data warehouses qualifies as big data to me. And it should to you, too. Our heritage transactional systems generate a large amount of data, and that data is the most interesting for large enterprises to process in their big data analytics systems.
The point I’m making here is given in the title of this piece: It is all big data! And that is the way you should be thinking. How can we better store, manage, integrate, administer, analyze, and ingest all of our data to make better business decisions? How can we augment our data with partner data, social media data, and other sources of relevant data? What tools will help us do that?
If you are a DBA, then all of the management and administrative tools that you use or need to manage databases at your organization are big data tools. By adjusting the way you think about your requirements, you can focus your budget requests to hit that “big data budget” and perhaps finally get those performance or recovery tools that you’ve needed for years. The amount of data DBAs are managing is growing at many times the rate at which the number of DBAs is growing, so management and automation tools will be imperative to succeed.
Although I’m usually skeptical of industry trends, this one is different. Many recent IT trends have been process-oriented (e.g., object-oriented programming, web services, SOA), but I believe that data is more important than code. As I’ve stated before, applications are temporary, but data is forever! And if the big data trend helps us better protect, administer, and use our data, then I’m all in favor of it.
We can use the rise of big data to the forefront of computing as a means to improve data quality, institute data governance, and pay more attention to our data management infrastructure. After all, if you’re going to have big data, it had better be good big data. Big data forever!