You’ve all heard the expression before: “Go jump in a lake.” It means roughly the same thing as “Get outta town” or “Leave me alone.” These days many organizations are jumping into a lake – a Data Lake – a relatively new feature of the data topography of many companies. It’s all in the data.
What is a Data Lake? Some sources say that a Data Lake is a place to keep your data; it holds a vast amount of raw data in its native format until it is ready to be used. When a data lake is not managed appropriately, organizations have come to call it a “data swamp” (or even a “data cesspool”). Your organization may, in fact, be setting sail on its first data lake.
But that’s not what this column is about. I want to talk to you about how “data lakes” are the latest example of the slang used in the data management industry.
Data people are unique. Believe me, I know this firsthand. People in my industry come up with awkward names for different aspects of data and of managing data. To the data people these names seem totally logical. To non-data people these terms just muddy the water.
In this column I want to share with you some of the “slang-iest” data terms being used today, along with definitions that will help you understand what the heck your data people are talking about. Here is a brief list of some of the most-searched data terms and a simple definition of what each means in most situations:
Meta Data is a term that has been newsworthy in recent times. Meta data (or metadata, as it is commonly written) is data about the data. In the news coverage of the Snowden disclosures, for example, the metadata was the information about phone calls, such as who called whom and when, rather than the conversations themselves. Metadata is data documentation, so to speak: it defines such things as what the data is called, a business description of the data, where the data has come from, and the valid values the data can take on. Metadata is the backbone of successful data management because it improves the value and understanding of the data, which results in better usage.
Big Data is just what you might expect. Big data is the name given to sets of data that are so large and complex that traditional data-processing tools are inadequate to prepare them for use in applications and software packages. Big Data is often described in terms of several Vs: Volume, Velocity, Variety, and Veracity.
Small Data needs explanation. Small data are tiny (by comparison), finely tuned sets of data that serve a specific purpose for a selected audience. Small data is a relatively new term that has only recently made its way into the data management industry. Small data is likely to become a more interesting topic once organizations begin showing value from their Big Data initiatives.
Smart Data is less straightforward. Smart data is data that is formatted so it can be acted upon both where it is collected and later, downstream, in an analytical platform, where further data consolidation and analytics take place. What makes this data smart is the forethought and design that go into making it fit for purpose from the moment it is collected.
Data Warehouse is the granddaddy of all data terms. A Data Warehouse is not a building where the data is stored. A data warehouse is a data resource, or a system, designed specifically for reporting and data analysis. The data warehouse is considered a core component of a business intelligence strategy. Metadata describing the data in the warehouse is key to getting a successful return on the investment in a data warehouse.
Data Lake is also a relatively new term. As I wrote earlier, a data lake is another place to keep your data; it holds a vast amount of raw data, in its native format, until it is needed. Data lakes are initially undocumented (they have little metadata) until the data must be made digestible to the business and analytical communities.
And finally…
The Internet of Things is a term that is popping up more and more these days. The Internet of Things (or IoT for short) is the network of physical devices, vehicles, home appliances, and other items embedded with electronics, software, sensors, and network connectivity that enables these objects to connect and exchange data. You can think of the IoT as D2D (Device to Device) data exchange, the same way that B2B is focused on business-to-business exchange.
New words, terms, and phrases are added to the data landscape year over year. Hopefully this quick fix of new and old data terms will help you understand the data people’s lingo and get you started asking how these “data things” are related and how they can add value for you and your organization. As I have told you before … It’s all in the data.