Metadata has been with us since the first human smeared charcoal from their campfire and placed the mark of their hand on the wall of their prehistoric cave.
That hand print could signify:
- Someone staking a claim on the cave (in case some other proto-human wandered in while the inhabitants were out hunting and gathering)
- A philosophical statement of identity
- A pretty picture of a hand
What is missing is the context that describes for us what that hand print is and what it means. It is mysteries like this that have archaeologists and anthropologists puzzling over pictures of animals in prehistoric cave paintings, wondering what they represent.
If we could pull one of the Paleolithic artists from the Lascaux caves in modern-day France forward in time (and if, by some sheer fluke of science fiction, she were able to speak perfect English), she’d be able to explain the animal paintings to us and clarify what they mean. But we can’t, so archaeologists and anthropologists have to dig around for artifacts and clues about what the drawings mean, or they have to make guesses based on studies of other hunter-gatherer tribes that exist today. (And UFO fanatics point at the drawings and say “That’s a spacesuit” and decide we’ve been visited by Ancient Aliens, but that is a discussion for another day.)
What is missing is metadata. Metadata is often described as “data that describes other data.” More importantly, it is the data that provides context for the data it is associated with, so that the two together produce meaningful information. For example, if I were to write “007” and “Russia” on a whiteboard, what meaning would you draw from that? Am I talking about a certain fictional spy? Am I cryptically referencing a movie featuring that fictional spy?
If I give you the context that “007” is what you dial from much of the world to reach direct dial telephone numbers in a particular country (the international prefix “00” followed by the country code “7”), that piece of metadata allows you to infer, with some confidence, a meaning:
“007 is the international dialing code for Russia”
That fact represents a piece of business metadata that gives context to some raw data, and allows a fact to be derived. There are other types of metadata.
Technical metadata describes the format and flow of information through your organization’s systems. From the mundanity of field lengths in a database table, to the challenge of identifying the source and target for data transfers in the log files of the organization, this technical metadata can tell you things like:
- How long a string of text is allowed to be (and how long it is in a given instance)
- What source system a particular data set has come from and when it was uploaded
- Where data fields are being used in downstream processes in the organization
From this technical metadata, analysts can begin to build a map of the meaning and purpose of information in the organization, and can begin to develop classifications of that information in the overall architecture.
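As a rough illustration of the three questions listed above, the technical metadata for a single field might be recorded along the following lines. This is only a sketch: the class, the field name, and the system names are hypothetical, and real metadata repositories vary widely in what they capture.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class FieldMetadata:
    """Hypothetical technical metadata for one database field."""
    name: str
    max_length: int          # how long a string of text is allowed to be
    source_system: str       # which system the data set came from
    uploaded_at: datetime    # when it was uploaded
    used_by: List[str] = field(default_factory=list)  # downstream processes

    def fits(self, value: str) -> bool:
        """Return True if the value is within the allowed length."""
        return len(value) <= self.max_length


# Illustrative values only; none of these systems or fields are real.
surname = FieldMetadata(
    name="customer_surname",
    max_length=30,
    source_system="crm_export",
    uploaded_at=datetime(2017, 1, 15, 9, 30),
    used_by=["billing_report", "marketing_extract"],
)

print(surname.fits("O'Brien"))  # a short value, within the limit
print(surname.fits("x" * 31))   # one character over the limit
print(surname.used_by)          # where the field is used downstream
```

Even a record this small lets an analyst answer "where did this come from?" and "who depends on it?" without inspecting the data itself.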
Another type of metadata is data relating to the classification of data. This could be a security classification (e.g. Commercially Sensitive, Confidential, Top Secret), or it could be a classification based on the type of data itself (e.g. Personal Data, Sensitive Personal Data, Personal Financial Data). This provides yet another layer of context and meaning to data that informs the potential uses and access rights for that data.
From this classification metadata, staff working in Risk and Compliance roles can identify critical areas of risk to the organization, and staff working in data handling and processing will be able to identify the appropriate controls and precautions to apply when accessing, sharing, or using that data.
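To make the link between classification and controls concrete, here is a minimal sketch in which a classification label determines the handling precautions for a data set. The data set names and the specific controls are assumptions invented for the example; an organization's actual controls would come from its own policies and regulatory obligations.

```python
from enum import Enum
from typing import List


class Classification(Enum):
    PERSONAL = "Personal Data"
    SENSITIVE_PERSONAL = "Sensitive Personal Data"
    PERSONAL_FINANCIAL = "Personal Financial Data"


# Hypothetical mapping from classification to handling controls.
CONTROLS = {
    Classification.PERSONAL: ["restrict access to named roles"],
    Classification.SENSITIVE_PERSONAL: [
        "restrict access to named roles",
        "encrypt before sharing",
    ],
    Classification.PERSONAL_FINANCIAL: [
        "restrict access to named roles",
        "encrypt before sharing",
        "log every access",
    ],
}

# Hypothetical classification metadata for two data sets.
DATASET_CLASSIFICATION = {
    "customer_contacts": Classification.PERSONAL,
    "payroll_2016": Classification.PERSONAL_FINANCIAL,
}


def required_controls(dataset: str) -> List[str]:
    """Look up the controls implied by a data set's classification."""
    classification = DATASET_CLASSIFICATION.get(dataset)
    if classification is None:
        # No classification metadata: we cannot tell which controls apply.
        return ["classify before use"]
    return CONTROLS[classification]


print(required_controls("payroll_2016"))
print(required_controls("unknown_extract"))
```

The second call shows the point of the paragraph above: without the classification, the only safe answer is that nobody yet knows how the data should be handled.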
Metadata is important because it is what gives data meaning and informs action. It helps provide a map of the information landscape, but more importantly, it provides context and meaning for that picture so that technical analysts and business stakeholders can infer the correct basis for action or insight from the data they are presented with.
- Without metadata, you would not know if a red flag on a project status report was a good thing or a bad thing.
- Without metadata, you would not know until AFTER you had tried to migrate data from one database to another if the fields were correctly mapped, or if they were of the right type, or if any data had been truncated or lost because the field lengths were different.
- Without metadata, you would not know if the data that you are about to send out as an email attachment is subject to any regulatory constraint.
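The migration example can be sketched as a check that compares field metadata before any data is moved. The schemas, field names, and mapping below are hypothetical, and the check is deliberately simplified: it looks only for missing fields, differing types, and a target length shorter than the source.

```python
from typing import Dict, List


def find_mapping_problems(
    source: Dict[str, dict],
    target: Dict[str, dict],
    mapping: Dict[str, str],
) -> List[str]:
    """Compare field metadata for each mapped pair and list the problems."""
    problems = []
    for src_name, tgt_name in mapping.items():
        src = source.get(src_name)
        tgt = target.get(tgt_name)
        if src is None or tgt is None:
            problems.append(f"{src_name} -> {tgt_name}: field missing")
            continue
        if src["type"] != tgt["type"]:
            problems.append(f"{src_name} -> {tgt_name}: type mismatch")
            continue
        # Lengths are only recorded for some types, so default to 0.
        if src.get("length", 0) > tgt.get("length", 0):
            problems.append(f"{src_name} -> {tgt_name}: possible truncation")
    return problems


# Hypothetical field metadata for a source and a target table.
source_schema = {
    "surname": {"type": "varchar", "length": 50},
    "joined": {"type": "date"},
}
target_schema = {
    "last_name": {"type": "varchar", "length": 30},
    "join_date": {"type": "varchar", "length": 10},
}
field_mapping = {
    "surname": "last_name",
    "joined": "join_date",
    "email": "email_address",
}

problems = find_mapping_problems(source_schema, target_schema, field_mapping)
for problem in problems:
    print(problem)
```

With these made-up schemas the check reports one problem of each kind, all found before a single row has been copied.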
What Does it All Mean?
Somewhat ironically, that question is its own answer. What does metadata mean? It gives meaning to ‘all of the things.’ Without metadata, you are left staring at the shadow of a hand print on the cave wall of your organization, wondering what the heck it means and if it is important.
This is why metadata is important to the business. It’s why metadata needs to be managed and governed by the business. It’s why metadata is not a technical or technology issue. By encoding meaning, metadata enables effective internal and external communication about facts, events, and concepts. And, ultimately, Information Management is all about communication.
[A shorter version of this piece first appeared on Adaptive.com’s blog in January 2017. The original article can be found here: http://blog.adaptive.com/what-is-metadata-and-why-does-it-matter]