Metadata is a much-abused term that has entered mainstream vocabulary in recent years through the revelations about mass surveillance by agencies such as Government Communications Headquarters (GCHQ) in Great Britain and the National Security Agency (NSA) in the United States. Unfortunately, much of the data that was labelled ‘metadata’ in media commentary on surveillance isn’t – it’s actual data recording the facts of a phone call or an email.
Metadata is the stuff that is attached to a piece of data to give it meaning, context, and value. That ‘stuff’ includes technical metadata about the labels associated with data fields (e.g. “x_cust_lbl_en”), the format of the content of the data fields (e.g. “varchar(30)” or “integer”), and the business metadata describing relationships between ‘things’ that are described by your data (“a customer has only one billing account”) and the rules governing interactions between those things. This kind of information is essential to understanding the map of the universe of information in your organization.
Yawn. Yeah, technobabble bores me too. Here are some other examples of metadata:
- The track length, genre, and beats per minute associated with the songs in your MP3 collection.
- The date and time you uploaded tracks to your MP3 collection, and whether you bought them from an online source or ripped them from a CD.
- The name of the person who authored a document in Microsoft Office, the names of users who edited the document, and when they edited the document.
- Star reviews for products on Amazon, movies in the cinema, or shows in the theatre.
So, why should non-technologists care about metadata? Here are five reasons why:
Provide Context for Decisions
What does the data mean? What is the significance of the fact you are deriving? How will the correctness of this information affect your decision? Metadata provides the context for assessing how much you can trust the information for your purposes.
An example from my personal experience is a management report that could not be trusted to give reliable information for decision making. Data was being extracted from two different systems and joined based on two fields – <Customer ID> in System A and <Customer Number> in System B.
Everyone assumed that this report would provide wisdom and insight. Instead it was producing rubbish. The root cause was found in the metadata. <Customer ID> in System A was a numeric field (just numbers) that uniquely identified a customer entity. <Customer Number> in System B was an alphanumeric field (letters and numbers) that actually identified a billing account that had already been mapped and linked to the Customer ID in System A. The report was trying to link an apple to an orange to get a count of the number of apples: all because of a poor business understanding of the relevant metadata.
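For readers who do want a little technobabble, here is a minimal Python sketch of the kind of check that could have flagged this problem before the report was built. The FieldMetadata class, its attributes, and the join_problems function are hypothetical names invented for illustration; only the descriptions of the two fields come from the example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FieldMetadata:
    """Hypothetical record of what we know about one field."""
    system: str
    name: str
    data_type: str   # technical metadata, e.g. "numeric" or "alphanumeric"
    identifies: str  # business metadata: the 'thing' this field identifies


def join_problems(left: FieldMetadata, right: FieldMetadata) -> List[str]:
    """Return the reasons why joining on these two fields looks unsafe."""
    problems = []
    if left.data_type != right.data_type:
        problems.append(
            f"type mismatch: {left.data_type} vs {right.data_type}"
        )
    if left.identifies != right.identifies:
        problems.append(
            f"entity mismatch: {left.identifies} vs {right.identifies}"
        )
    return problems


# The two fields from the report, described as in the text above.
customer_id = FieldMetadata(
    "System A", "Customer ID", "numeric", "customer"
)
customer_number = FieldMetadata(
    "System B", "Customer Number", "alphanumeric", "billing account"
)

for problem in join_problems(customer_id, customer_number):
    print(problem)
```

Both checks fail for this pair of fields. The second one is the important one: even if the formats had matched, the report would still have been counting billing accounts while claiming to count customers.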
Save Time and Reduce Your Stress
Over the years, I’ve sat in client offices working on Data Governance or Information Quality initiatives and observed the end-of-month or end-of-quarter reporting cycles go to management boards. Numbers are taken from different teams and disparate divisions. The numbers are put into executive board briefing packs, and then someone has the job of trying to reconcile differences in reported figures from different business units.
This results in late nights, fudged fixes, and inevitable questions from observant leadership. “Why does 2+2=5 for sales growth this quarter?” This is avoidable. One example from an organization I’ve worked with concerned variances in the value for a raw materials input figure depending on which part of the internal supply chain the figure was taken from. It turned out that the metadata showed the data was coming from three different modes of capture:
- One figure was supplied by the suppliers of the raw material as part of their billing process
- Another figure was supplied by the operational teams based on manually input information
- Another figure was obtained from the process machinery based on its measure of volume of materials consumed
All three figures had been given synonymous labels. But they measured different things, albeit within the context of the same process. Understanding that allowed management to pick a number that was their reference metric for that process, reducing effort required to produce quarterly reports.
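As a rough illustration, the situation can be sketched in a few lines of Python. Everything here is an assumption made for the example: the Measure class, the label text, and in particular the choice of the machinery figure as the reference metric, since which number management actually picked is not something I am stating here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    """Hypothetical metadata record for one reported figure."""
    label: str
    source: str
    capture_mode: str
    is_reference: bool = False


measures = [
    Measure("raw material input", "supplier", "supplier billing process"),
    Measure("raw material input", "operational team", "manual input"),
    # Assumption for illustration only: the machinery figure is the reference.
    Measure("raw material input", "process machinery",
            "measured volume consumed", is_reference=True),
]


def ambiguous_labels(measures):
    """Map each label to its capture modes, keeping labels with more than one."""
    modes_by_label = {}
    for measure in measures:
        modes_by_label.setdefault(measure.label, set()).add(measure.capture_mode)
    return {
        label: modes
        for label, modes in modes_by_label.items()
        if len(modes) > 1
    }


def reference_for(label, measures):
    """Return the single agreed reference measure for a label."""
    references = [
        m for m in measures if m.label == label and m.is_reference
    ]
    if len(references) != 1:
        raise ValueError(f"expected exactly one reference for {label!r}")
    return references[0]


print(ambiguous_labels(measures))
print(reference_for("raw material input", measures).source)
```

The point of the sketch is that the label alone cannot tell the three figures apart. Once the source and mode of capture are recorded next to the label, the ambiguity is visible and a single reference figure can be agreed.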
Understand Information Flow
The example above also highlights the role of metadata in understanding the lineage of your data. It is important to know where the information you are basing your decisions on comes from. It can affect the quality of your decisions. Do you have the most up-to-date information? Is the data coming from a source that is trustworthy?
An example of this is Spotify’s streaming music service. Often, after searching for tracks by a particular artist, you might find yourself listening to cover versions of songs by those artists. A key contributor to this is the quality of recording artist metadata in the music industry.
In a business context, understanding the flow of information lets you better understand the impact of planned changes, determine the best point to measure the quality of data for different purposes, and assess the ‘fitness for purpose’ of data for decision making. Understanding the flow of information (its lineage) is actually a key requirement of regulatory controls across a range of industries.
Moving to a “Self-Service” BI Model
In the old days, if your report was wrong, you could kick a developer or blame IT. But if you are running your own reporting in a self-service business intelligence environment, you are responsible for the outputs. Whether it is a fancy tool like Tableau or the old-reliable Microsoft Excel pivot table analysis of a data extract, you need to know that you are comparing apples with apples, and that you have the right apples. In a self-service environment you need to understand metadata so that you understand the reporting you are producing and can make the correct decisions based on that information.
Metadata Describes How Your Business Works
Metadata describes how your business works in reality. Regardless of the languages used in your organization, the inherent meaning of things can be mapped, understood, and communicated clearly and consistently.
I’ve lost track of the number of times I’ve seen the “ah ha!” moment in organizations when the assumptions about how data flows, how it is transformed, what it means, and how it works are validated and corrected through a holistic approach to defining the meaning of ‘things that matter.’
I’ve also lost count of the times I’ve seen strategically important business initiatives flounder in disputes and debates about the meaning and purpose of information in the organization because each stakeholder brought their own map of the information universe to the table and could not see where the overlaps and errors were. A proper business understanding of metadata helps you fill in the bits on your maps that are currently filled in with “Here be Dragons.” Metadata matters. By being able to build a trusted map of information, and its meaning and purpose, organizations can better identify and mitigate information-related risks.
This article is based on an April 2016 blog from Daragh OBrien originally appearing on Adaptive, International.