The terms meta-data and metadata have been used, misused, and abused to the point that their real meaning is totally unclear and may even be unknown. These terms are an excellent example of the huge lexical challenge that data resource management faces today. Confusion about a concept that is central to understanding the meaning of all the data in an organization’s data resource leads to increased data disparity and to low-quality data that do not fully meet the organization’s current and future business information demand.
The term meta-data has typically been defined as data about data, which is a tautology. That definition is neither comprehensive nor denotative, so each person interprets it connotatively in his or her own way. If a group of people were asked to define meta-data, there would be as many different definitions as there were people in the group.
The term meta-data has become quite meaningless and has led to numerous misconceptions about what meta-data represent and how they are used. Those misconceptions are seriously impairing formal data resource management.
The first common misperception concerns the difference between meta-data and business data. Meta-data traditionally represented data names, data definitions, data structure, data edits, and so on. They were data that helped people understand and manage their data resource so that it could be fully utilized to support the organization’s current and future business information demand. However, that traditional definition has expanded unreasonably due to poor definition and many connotative interpretations.
For example, publishers regard publication title, author, publisher, publication date, ISBN, price, and so on as business data. Other data about publications, such as reviews, sales, distribution, and so on, are considered meta-data. However, the financial officers in a publishing company regard sales, distribution, and so on as data, and title, author, copyright, and so on as meta-data.
Another example is the photo industry, which considers the photo itself to be a piece of data and the f-stop, speed, filters, time, date, location, and so on to be meta-data. A software company refers to its products as the data and the details about those products as meta-data.
These different views have led to the statement “One person’s data is another person’s meta-data.” Different parts of an organization each have a set of data useful to them that they consider business data, while any other data that are ancillary to the business data are considered meta-data. These views indicate that meta-data are a perception about business data that varies with the observer.
A second misperception concerns the spelling of meta-data. The term originated as meta-data and was used consistently for many years. Circa the late 1980s, Ron Ross wrote an article in his published and copyrighted Database Newsletter stating that meta-data had come of age and deserved to be elevated to metadata.
After that article appeared, a company in California trademarked the term metadata and proceeded to threaten lawsuits against anyone using that term. It suggested using the terms meta-data or meta data instead. Such lawsuits would likely not have been won because the company failed to protect its trademark.
A recent Internet search showed over 27,000,000 hits for metadata and over 87,000,000 hits for meta-data, with thousands of those hits being definitions that are incomplete, overlapping, and conflicting. Many of the definitions are tautologies, such as “data about data.”
The term meta data is not proper English. The dictionary defines meta as a prefix, meaning it cannot stand by itself. The dictionary goes on to list at least 20 words with meta as a prefix, such as metaphysics, metamorphosis, metastasis, metaplasia, metaphase, metathesis, metacarpal, metacenter, meta-galaxy, meta-fiction, and so on. Meta must be combined with a root word, either with a hyphen or by concatenation. Therefore, the only proper terms are meta-data and metadata.
The DAMA Dictionary of Data Management defines meta-data as:
Literally, “data about data”, data that defines and describes the characteristics of other data, used to improve both business and technical understanding of data and data-related processes. Because the term ‘metadata’ is a trademark of The Metadata Company, DAMA specifically uses the term meta-data.
That Dictionary goes on to define meta-data architecture, meta-data integration, Meta-data Management, meta-data repository, meta-data synchronization, administrative meta-data, business meta-data, meta-data stewardship, descriptive meta-data, meta-data preservation, meta-data process, meta-data rights management, structural meta-data, technical meta-data, and meta-data usage. These definitions make the management of meta-data sound like something totally different from the management of business data.
A third misconception about meta-data, which complicates an already bad situation, is the proliferation of terms like meta-meta-data, meta-meta-meta-data, and so on. Terms have been proposed all the way up to meta-meta-meta-meta-meta-meta-data, often referred to as 6-meta-data. These terms are purely academic and quite esoteric, and do not warrant any explanation or use with respect to data resource management.
A fourth misperception is use of the term meta-model. Meta-model means a model about models, which is another tautology. However, when most people use the term meta-model, they really mean a meta-data model. Most discussions about meta-models are really discussions about meta-data models. Few discussions ever cover true meta-models.
A similar situation is use of the term meta-process, which means a process about processes. People using that term usually mean a process model. Similar confusion arises over the terms data standard and meta-data standard, and the terms data interchange and meta-data interchange.
A fifth misperception is that meta-data do not need to be normalized. Meta-data typically appear in many different software products, data modeling tools, database management systems, and other repositories. The same meta-data often appear multiple times within and across different products, and those multiple instances are often incomplete and out of sync. Changes or enhancements to the meta-data are seldom applied to all instances, making the situation progressively worse. In other words, unnormalized meta-data suffer from the same redundancy problems that normalization resolves for business data.
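The duplication problem described above can be illustrated with a small sketch. It assumes hypothetical meta-data exports from three tools (the source names, data names, and definitions are invented for illustration), groups the definitions by data name, and flags any name whose definitions disagree or that is missing from some source. The `normalize` function then keeps a single instance of each definition, in the spirit of normalization (one fact stored in one place); treating one source as authoritative is an assumption of this sketch, not a prescribed method.

```python
from collections import defaultdict

# Hypothetical meta-data exports: each source maps a data name to its definition.
# Source names, data names, and definitions are illustrative only.
SOURCES = {
    "modeling_tool": {
        "customer_name": "The legal name of a customer.",
        "order_date": "The date an order was placed.",
    },
    "dbms_catalog": {
        "customer_name": "Customer name.",
        "order_date": "The date an order was placed.",
    },
    "repository": {
        "customer_name": "The legal name of a customer.",
    },
}


def find_disparities(sources):
    """Return data names whose definitions conflict or are missing in some source."""
    definitions = defaultdict(dict)  # data name -> {source: definition}
    for source, entries in sources.items():
        for name, definition in entries.items():
            definitions[name][source] = definition

    report = {}
    for name, by_source in sorted(definitions.items()):
        missing = sorted(set(sources) - set(by_source))  # sources lacking this name
        variants = sorted(set(by_source.values()))       # distinct definitions found
        if missing or len(variants) > 1:
            report[name] = {"missing_in": missing, "variants": variants}
    return report


def normalize(sources, authority):
    """Keep one definition per data name, preferring the hypothetical authoritative source."""
    normalized = {}
    all_names = sorted({name for entries in sources.values() for name in entries})
    for name in all_names:
        if name in sources[authority]:
            normalized[name] = sources[authority][name]
        else:
            # Fall back to the first source, in alphabetical order, that defines the name.
            for source in sorted(sources):
                if name in sources[source]:
                    normalized[name] = sources[source][name]
                    break
    return normalized


if __name__ == "__main__":
    for name, issue in find_disparities(SOURCES).items():
        print(name, issue)
    print(normalize(SOURCES, "repository"))
```

Both sample data names are flagged: customer_name has two conflicting definitions, and order_date is missing from the repository. Once every tool reads from the single normalized set, a change is made in one place rather than in each copy.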
These misperceptions tend to portray meta-data as something magical and mythical that is managed separately from business data in an organization’s data resource. They imply that meta-data must be designed and managed separately from business data. These judgements are self-defeating and perpetuate the burgeoning quantities of disparate data, which leads to disparate meta-data, and in some situations massively disparate meta-data.
People frequently ask what is perpetuating the lexical challenge in data resource management. The answer is that people are simply repeating the words without realizing what they are saying or what the words really mean. Data management professionals must stop using these words without understanding their true meaning, and start using words and terms that have a comprehensive and denotative meaning based on roots, prefixes, and suffixes as defined in the dictionary. That is the only way to overcome the lexical challenge and promote a formal data management profession.
The point has been reached where the term meta-data has become meaningless and cannot be resolved. A real meta-data fiasco is evolving and will not be resolved with any formal definitions or formal spelling of the terms meta-data or metadata. Any emphasis on formally defining meta-data or metadata, no matter how strong or by what organization, cannot and will not solve the meta-data fiasco.
The terms meta-data and metadata need to be abandoned! The cause has been lost!
This article is an excerpt from Michael Brackett’s Simplexity series of books available through Technics Publications.