Metadata is fundamentally about integrating data across the levels of data management abstraction (e.g., the Zachman Framework): business, application, data, and technology. This integration spans the business and technical domains of a company. A few key points about metadata to remember:
- Metadata is data about integration
- Metadata provides the characteristics that allow data to be located, understood, and consistently used and reused
- Metadata enables the transformation of data into information and actionable insight
- Metadata connects data across time (history) and across processes
- Metadata provides data with a context that enables people to think about and share data in useful ways
- Metadata is data – expected content, structure, and context – that imbues meaning and facilitates managing information
Famed English writer Samuel Johnson once said, “Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it” (James Boswell, Life of Samuel Johnson LL. D., 1791). Metadata is data that helps describe a data element, thereby enabling it to be located, accessed, linked to other data, properly secured, and trusted. The scope of Metadata is large and growing, as new business activities and technologies generate ever more metadata.
Metadata plays an important role in Data Management, especially in the Data Repository environment and in Enterprise Advanced Analytics. In the Data Repository environment, the first thing an analyst needs to know is what data is available and where it is found:
- When an analyst receives a request, the first step is to identify what data might be useful in fulfilling it
- The Metadata inherent in the Data Repository is vital to the preparatory work done by data analysts
- Missing Metadata severely hampers any data integration effort
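To make the analyst's first question ("what data is available, and where?") concrete, here is a minimal sketch of a keyword search over a metadata catalog, assuming each entry records an element's name, physical location, and business definition. The `CatalogEntry` class, the `find` function, and the sample entries are hypothetical illustrations, not part of any particular Metadata Repository product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog entry: one data element and the metadata needed to locate it."""
    name: str        # data element name
    location: str    # where the element is physically stored (system.table.column)
    definition: str  # business definition of the element


# Illustrative sample entries; names and locations are made up.
CATALOG = [
    CatalogEntry("customer_id", "crm.customer.cust_id", "Unique identifier assigned to each customer"),
    CatalogEntry("order_total", "erp.orders.amt", "Total monetary value of a customer order"),
    CatalogEntry("ship_date", "wms.shipment.ship_dt", "Date an order left the warehouse"),
]


def find(term: str, catalog: list[CatalogEntry]) -> list[CatalogEntry]:
    """Case-insensitive keyword search over element names and business definitions."""
    term = term.lower()
    return [e for e in catalog if term in e.name.lower() or term in e.definition.lower()]


for entry in find("order", CATALOG):
    print(entry.name, "->", entry.location)
```

Searching the business definition as well as the name matters here: `ship_date` is found through its definition even though "order" does not appear in its name.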
Although new types of Metadata emerge constantly, for simplicity the Metadata Repository is described here as containing four broadly defined types of Metadata:
- Business
- Technical
- Operational
- Organizational
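As a sketch of how these four broad types might sit together for a single data element, the hypothetical `MetadataRecord` below keeps each type in its own field and reports which types are still empty, a simple completeness check. The field contents, the `METADATA_TYPES` tuple, and the `missing_types` helper are illustrative assumptions, not a prescribed repository schema.

```python
from dataclasses import dataclass, field

# The four broad metadata types named above.
METADATA_TYPES = ("business", "technical", "operational", "organizational")


@dataclass
class MetadataRecord:
    """Hypothetical record describing one data element, grouped by metadata type."""
    element: str
    business: dict = field(default_factory=dict)        # meaning, definitions, business rules
    technical: dict = field(default_factory=dict)       # data type, physical location
    operational: dict = field(default_factory=dict)     # load times, row counts, job status
    organizational: dict = field(default_factory=dict)  # owner, steward, responsible unit

    def missing_types(self) -> list[str]:
        """Return the metadata types for which nothing has been recorded yet."""
        return [t for t in METADATA_TYPES if not getattr(self, t)]


record = MetadataRecord(
    element="customer_id",
    business={"definition": "Unique identifier assigned to each customer"},
    technical={"data_type": "CHAR(10)", "location": "crm.customer.cust_id"},
    organizational={"steward": "Customer Data Steward"},
)
print(record.missing_types())  # ['operational']
```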
Until recently, Metadata served only the Information Technology professional. Because Metadata informs and provides context for business data, it now plays a much more active and important part in information management and integration across all environments and domains. Metadata in the integration environment serves the decision support community and therefore requires a much more formal and intensive level of support than it did when it served the information technology community alone.
Metadata Defined Further
The Gartner Group and others state, “Metadata is data about data. It describes data content but it is not the content.” This has long been the accepted definition, originating in Information Technology Data Administration work over the past few decades. Frustratingly, many competing definitions have since emerged:
- “Metadata is defined to be data that defines and describes other data.” From ISO/IEC 11179 [Metadata Registries]
- NISO [Understanding Metadata] states, “Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.”
- Open Data Support states, “Metadata provides information enabling to make sense of data (e.g. documents, images, and datasets), concepts (e.g., taxonomies, classification schemes) and real-world entities (e.g., people, organizations, places, images, products).”
- USGS states, “Metadata describe information about a dataset, such that a dataset can be understood, re-used, and integrated with other datasets. Information described in a Metadata record includes where the data were collected, who is responsible for the dataset, why the dataset was created, and how the data are organized. Metadata generally follow a standard format, making it easier to compare datasets and to transfer files electronically.”
- DAMA (Dictionary of Data Management) states, “Literally, ‘data about data’; data that defines and describes the characteristics of other data, used to improve both business and technical understanding of data and data-related processes.”
An Example of Metadata
Metadata Management as Part of a Holistic Perspective
Turning data into usable assets that improve decision making and strengthen competitive advantage requires an all-encompassing approach. Metadata Management raises confidence in data and helps deliver a ‘single version of the truth’. Always start from a strong, holistic Enterprise Information Management base.
- Metadata Management establishes the foundations from which to gain actionable insight (via enterprise integrated analytics) by imbuing data with context (characteristics, meaning, legacy…) in a language commonly understood by the business (semantics)
- Metadata Management is a methodology that focuses on data policies, rules, procedures, and standards to govern the understanding of data and to confirm that data is ‘fit for use and purpose’
- Metadata Management is about data integration throughout the technology stack and landscape, as well as across the enterprise
- Applies whether data is held internally or sourced externally
- Becomes even more important when bringing external data into the fold (e.g., Big Data)
- Comprises a comprehensive solution set that incorporates people, process, and technologies
- Metadata is most valuable when it elucidates context by adding relevant characteristics, traceability, and lineage, especially for sourcing, compliance, proprietary algorithms, and other derivations that require certified and properly formulated calculations
- Metadata requires governance, hence Metadata Management; Metadata Management is inextricably entwined with Data Governance
Metadata provides the connective tissue for the information architecture and facilitates the integration and leverage of data assets across departments, functions, and applications:
- Metadata Management comprises the processes, tools, solutions, and capabilities necessary to create, control, govern, integrate, access, and analyze Metadata
- The value of Metadata in data management activities is not an artifact of technology; it is an artifact of the proper business management of data
- Developing Metadata capability incrementally builds a better understanding of the value of data as the metadata about each data element grows in complexity and nuance
- Investment in Metadata is an investment in sophisticated technologies, infrastructure, and process management
- The complexities of Metadata Management and governance require close coordination across many disciplines and technologies, so much so that the role of Metadata has been transformed from an afterthought into a fundamental approach to managing complex data requirements
Components of Metadata Management
- Metadata Tools
- Capabilities of the Metadata Tools
- Customization of the Metadata Tools
- Acquisition, Storage, Delivery Techniques
- Engaging Users (Consumers)
- Involvement of Users
- Feedback and suggestions
- Richness of Content
- Relevance of Content
- Metadata System Support
- Support Structure
- Support Processes
- Development Processes
- Report Development Process Touch Points
- Integration
- Completeness
- Correctness
- Awareness and Usage
- Extent of Awareness
- Extent of Usage
- Training
- Adoption
- Sustainability
- Governance and Stewardship
- Ownership
- Stewardship
- Conflict Resolution
- Oversight
Management of Metadata
The core philosophy around Metadata Management centers on the alignment of data, connecting the end points. Key thoughts on the management of metadata:
- Bi-directional meta-repositories provide an active means to align business drivers, governance policies and procedures, orchestration, event management, organizational structures, and enterprise data management
- These traditional approaches now complement emerging tactics for gaining control over data management, which rely on maturing technologies originally developed to trace data lineage and from which Metadata Management grew
- The bi-directional Metadata repository architecture provides the requisite workspace in which to organize enterprise integrated analytics into an intuitive, useful, and inviting discipline framework that combines policies, standards, and data management for the management and delivery of business intelligence
- Metadata Management tools have been in use for nearly three decades, but only recently have they been incorporated into the technology suites of many vendors
- These catalog scrapers and organizers have proven useful for addressing regulatory compliance issues, and are now finding their way into maturing governance organizations as a means of identifying and verifying the proper alignment of data with traditional constructs such as Master Data, Data Governance, Data Quality, and Business Process Management, for the concise and appropriate management of data as a corporate asset
A Cautionary Tale
The Zachman Framework™ is a meta-model useful for the instantiation and organization of metadata. Processes based on an ontological structure, such as the Zachman Framework, are predictable and produce repeatable results. Conversely, processes without ontological structures are ad hoc, fixed, and dependent on practitioner skills.
Toward a Better Understanding of Metadata
Metadata is certainly “data about data”. To be more precise, metadata encompasses a realm that includes:
- Define
- Meaning, context, requirements, and purpose of a data element, depending on use, viewpoint, or perspective
- Business rules, taxonomies, classification schemes, and allowed values supporting consistent use and reuse across all domains
- Terminology defined for a specific purpose or use
- Using a business-oriented lexicon and native language exposes the most significant relationships and semantics based on business-specific usage, delineating the denotation versus the connotation of a data element; this distinction gives rise to, and supports, competing perspectives or viewpoints on the same underlying data
- Ensures understanding of the data elements for collaboration
- Provenance
- Provides lineage, derivation, traceability, auditability, and history
- Records the transformations a data element underwent and the processes it participated in during its lifecycle
- Meta-models of provenance connect and relate data across levels of abstraction, as well as within each level, for lineage and traceability
- Locate
- Allows a data element to be found through search, query, or direct access using specific coordinates, relationships, or inferences
- Data that cannot be found is unusable; data that cannot be found in the context of the query or search is also unusable
- Govern
- Oversight of Metadata is the same as that for Data; the two are inextricably entwined
- Development of Policies, Standards and Guidelines
- Business Rules
- Classification Schemes and Taxonomies
- Certifiably correct and consistent
- Guidelines for rich Metadata definitions
- Scoring integrity and veracity
- Allowable values
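The Provenance item above can be illustrated with a small lineage graph. This sketch assumes lineage is recorded as a mapping from each derived element to the elements it was computed from; the element names, the `LINEAGE` mapping, and the two helper functions are hypothetical. A breadth-first walk over those edges answers the basic traceability question: where did this number come from?

```python
from collections import deque

# Hypothetical lineage edges: each derived element maps to the elements it was computed from.
LINEAGE: dict[str, list[str]] = {
    "report.net_margin": ["mart.revenue", "mart.cost"],
    "mart.revenue": ["staging.orders.amount"],
    "mart.cost": ["staging.orders.unit_cost", "staging.orders.quantity"],
    "staging.orders.amount": ["erp.orders.amt"],
    "staging.orders.unit_cost": ["erp.orders.cost"],
    "staging.orders.quantity": ["erp.orders.qty"],
}


def upstream_sources(element: str, lineage: dict[str, list[str]]) -> set[str]:
    """Walk lineage edges breadth-first and return every upstream element."""
    seen: set[str] = set()
    queue = deque([element])
    while queue:
        current = queue.popleft()
        for parent in lineage.get(current, []):
            if parent not in seen:  # guard against revisiting shared ancestors
                seen.add(parent)
                queue.append(parent)
    return seen


def origin_systems(element: str, lineage: dict[str, list[str]]) -> set[str]:
    """Return upstream elements with no recorded parents, i.e. where recorded lineage begins."""
    return {e for e in upstream_sources(element, lineage) if e not in lineage}


print(sorted(origin_systems("report.net_margin", LINEAGE)))
# ['erp.orders.amt', 'erp.orders.cost', 'erp.orders.qty']
```

Elements with no recorded parents are treated here as original sources; in practice that may only mean lineage capture stops at that point, which is itself useful Metadata for an auditor.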
By now it should be easy to see that Metadata enables the transformation of data into information, actionable insight, and risk amelioration. Equally important, Metadata allows and promotes competing perspectives of the same underlying data without requiring any change to that data. Fortunately, new Metadata Management capabilities allow for the holistic management and administration of Metadata.
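The claim that Metadata supports competing perspectives without changing the underlying data can be sketched by keeping the data and the perspective-specific metadata separate. The column, the two glossaries in `PERSPECTIVES`, and the `view` function below are hypothetical; each perspective attaches its own label and purpose to the same, unmodified values.

```python
# The underlying data: one column of values that no perspective modifies.
REVENUE_COLUMN = ("sales.fact.rev_amt", (1200.0, 850.5, 430.0))

# Hypothetical business glossaries: each perspective describes the same element differently.
PERSPECTIVES = {
    "finance": {"sales.fact.rev_amt": {"label": "Gross Revenue (USD)",
                                       "purpose": "Quarterly revenue reporting"}},
    "sales": {"sales.fact.rev_amt": {"label": "Booked Sales",
                                     "purpose": "Tracking quota attainment"}},
}


def view(perspective: str, column: tuple[str, tuple[float, ...]]) -> tuple[str, tuple[float, ...]]:
    """Return the perspective's label for the column together with the untouched values."""
    element, values = column
    meta = PERSPECTIVES[perspective][element]
    return meta["label"], values


for p in ("finance", "sales"):
    label, values = view(p, REVENUE_COLUMN)
    print(f"{p}: {label} = {values}")
```

Only the Metadata differs between the two views; adding a third perspective means adding a glossary entry, not copying or altering the data.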