The Metadata Repository offers a central place to store and manage an organization's metadata. It is inherently complex and requires careful management and administration.
The Metadata Repository
The Metadata Repository provides a single (albeit often logical) repository for gathering, integrating, storing, sharing, and visualizing metadata and its associated capabilities and structures. The repository provides several key functions:
- Identify and sanction origins of business data and metadata
- Hold the metadata in a form that can be analyzed, manipulated, and promoted
- Ensure traceability and lineage of business definitions and derivations from their origin to the point of consumption or obsolescence
- Manage and administer the Metadata Repository tool(s) as a common structure across entities and domains, ensuring data integrity and consistency
- Define and implement standard operating processes and precepts, including those for Data Governance
- Offer unencumbered availability and easy access for business users
- Build branding, communication, and adoption of Metadata and the Metadata Repository to encourage usage
- Establish accountability
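The traceability function in the list above can be pictured as a walk over lineage mappings. The following is a minimal sketch, not a reference implementation: the element names (`report.net_revenue`, `crm.orders`, and so on), the `LINEAGE` dictionary, and the `trace_origins` helper are all hypothetical, and a real repository would read these mappings from its own store.

```python
from collections import deque

# Hypothetical lineage mappings: each element lists the elements it is derived from.
LINEAGE = {
    "report.net_revenue": ["warehouse.revenue", "warehouse.refunds"],
    "warehouse.revenue": ["staging.orders"],
    "warehouse.refunds": ["staging.returns"],
    "staging.orders": ["crm.orders"],
    "staging.returns": ["crm.returns"],
}


def trace_origins(element, lineage):
    """Walk upstream from an element and return its origins.

    An origin is any upstream element with no recorded source of its own.
    """
    origins = set()
    seen = {element}
    queue = deque([element])
    while queue:
        current = queue.popleft()
        sources = lineage.get(current, [])
        if not sources:
            origins.add(current)
        for source in sources:
            if source not in seen:  # guard against cycles and repeated visits
                seen.add(source)
                queue.append(source)
    return sorted(origins)


print(trace_origins("report.net_revenue", LINEAGE))
# ['crm.orders', 'crm.returns']
```

Storing lineage as "derived from" edges keeps the question "where did this number come from?" a simple breadth-first search; the reverse question (impact analysis) would need the edges inverted.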
A key component of Metadata Management solutions involves the rigorous identification and description of Metadata Categories, Types, Classes, and Sub-classes – a taxonomy, if you will, for the organizational structure and easy retrieval of Metadata:
- Data Management
  - Automation
  - Orchestration
  - Interoperability
  - Heterogeneity
  - Security
- Data Quality
  - Certification
  - Verification
  - Reproduction
  - Repeatable
- Data Discovery
  - Searchable
  - Locatable
  - Available
  - Obtainable
- Data Consumption
  - Interoperable
  - Analytic
  - Visualization
  - Report
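One way to picture this taxonomy is as a small lookup structure. The sketch below assumes that each "Data ..." entry in the list is a category that owns the entries following it; that grouping is an interpretation of the list, and the `TAXONOMY` and `find_category` names are hypothetical.

```python
# Category -> types, assuming each "Data ..." entry heads the items that follow it.
TAXONOMY = {
    "Data Management": ["Automation", "Orchestration", "Interoperability",
                        "Heterogeneity", "Security"],
    "Data Quality": ["Certification", "Verification", "Reproduction", "Repeatable"],
    "Data Discovery": ["Searchable", "Locatable", "Available", "Obtainable"],
    "Data Consumption": ["Interoperable", "Analytic", "Visualization", "Report"],
}


def find_category(type_name, taxonomy=TAXONOMY):
    """Return the category that contains the given type, or None if it is unknown."""
    for category, types in taxonomy.items():
        if type_name in types:
            return category
    return None


print(find_category("Verification"))  # Data Quality
print(find_category("Locatable"))     # Data Discovery
```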
The Metadata Catalog
The Metadata Catalog (Repository) provides a structure for managing and organizing Metadata so that it is easy to retrieve and use. It maps the resources allocated to the Metadata Management View, Abstraction, Implementation, and Infrastructure areas in order to identify elements, dependencies, priorities, efficiencies, and opportunities to leverage and reuse those resources.
The Metadata Catalog:
- Illustrates the state of Metadata structures across the enterprise, including how various systematic events interrelate
- Organizes projects, programs, and initiatives to synchronize efforts and identify essential Metadata Management capability elements to address
- Serves as a common frame of reference to communicate Information Technology requirements and limitations between the Information Technology providers and the operational data consumers
- Holds the requisite characteristics of Metadata organized by use or function to enable clearly articulated governance, responsibilities, relationships, meanings, orchestrations, and other constructs that augment the veracity of data used for decision making and allow for the garnering of insight from Advanced Analytics
The Metadata Repository Services
Proper construction of the Metadata Repository enables a variety of services. Most importantly, the structural and organizational components of the Metadata Repository allow Metadata to be represented in meta-models that support many business-critical services:
- Create, load, change, manage, and navigate Metadata models
- Aggregate and stitch Metadata models from disparate sources
- Run queries and reports on repository content
- Transform Metadata models into standard meta-models
- Export Metadata models into different formats
- Facilitate collaboration
- Partition Metadata models into logical stores
- Version and compare Metadata models
- Control access for security and role-focused simplicity
- Provide a web application for universal access
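As an illustration of the versioning and comparison service, the sketch below compares two versions of a metadata model. It assumes a deliberately simple representation, a dictionary of attribute name to data type; the `diff_models` function and the sample `customer` attributes are hypothetical and do not reflect any particular tool's format.

```python
def diff_models(old, new):
    """Compare two versions of a metadata model given as {attribute_name: data_type}."""
    added = {name: new[name] for name in new.keys() - old.keys()}
    removed = {name: old[name] for name in old.keys() - new.keys()}
    changed = {
        name: (old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Two hypothetical versions of the same model.
v1 = {"customer_id": "INTEGER", "name": "VARCHAR(50)", "fax": "VARCHAR(20)"}
v2 = {"customer_id": "BIGINT", "name": "VARCHAR(50)", "email": "VARCHAR(255)"}

result = diff_models(v1, v2)
print(result["added"])    # {'email': 'VARCHAR(255)'}
print(result["removed"])  # {'fax': 'VARCHAR(20)'}
print(result["changed"])  # {'customer_id': ('INTEGER', 'BIGINT')}
```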
The Metadata Taxonomy holds the evolving set of Metadata Catalog entries. The Metadata Taxonomy design drives the ability to create “views” for the Metadata Consumption Services capabilities and tools.
Metadata Domains
A variety of Metadata Domains allow for the categorization of Metadata elements into meaningful subjects for effective management.
Domain | Example |
KPIs & Metrics | Business Rules (e.g., Privacy) |
Security | Access Rights & Permissions, Authorization, Encryption, Masking |
Audit | Version |
Data Model | Logical, Physical, Schema, Attribute |
Governance | Policy |
Lineage | Source, Target, Intermediary, Mappings |
Traceability | Derivations, Manipulation, History, Provenance |
Taxonomy | Codification, Patterns, Classification Schemes, Category |
Orchestration | People, Process, Technology |
Ontology | Dictionary, Lexicon, Glossary |
Semantics | Business perspectives and views |
Technological | Infrastructure, Operating System, DBMS, Table, Tablespace, Server, Node, Views |
Business | Glossary, Rules, Functions, Unit, Process |
Location | Geography, Region, Territory |
Application | Component, Package |
Organization of the Metadata Repository
The Metadata Repository organizes its entries into domain-specific subsets of metadata. Each subset makes business-specific metadata visible, along with its associated glossary, processing, and supporting capabilities, in a holistic, self-contained environment. Key components of the Metadata Repository include:
- Glossary, Dictionary, Lexicon, Ontology, Language
- Taxonomy, Classification schemes, Domains
- Views (support competing perspectives)
The prototypical Meta-Model for Metadata Management relies on a 5-tier model:
- Business Viewpoint
- Application Viewpoint
- Component Viewpoint
- Technology Viewpoint
- Deployment Viewpoint
The Meta-Model must address 2 Primary Concerns:
- Behavior
- Information
In addition, the Meta-Model may address numerous Secondary Concerns:
- Roles
- Security
- Governance
- Analytics
- Collaboration
- Compliance
Enterprise Meta-Model (Metadata Framework)
The prototypical Enterprise Meta-Model encompasses multiple layers of abstraction, process, roles, and management. The Meta-Model may be extended with time periods (for example, effective-from and effective-to dates) to enable historical change management.
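To make the idea of extending the Meta-Model with periods of time concrete, here is a minimal sketch of an "as of" lookup over versioned business definitions. Everything in it is an assumption for illustration: the `DefinitionVersion` class, the `valid_from`/`valid_to` fields, the half-open interval convention, and the sample "Active Customer" definitions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DefinitionVersion:
    term: str
    definition: str
    valid_from: date                 # inclusive
    valid_to: Optional[date] = None  # exclusive; None means still current


def as_of(history, term, when):
    """Return the version of a term that was in effect on the given date, or None."""
    for version in history:
        if version.term != term:
            continue
        still_open = version.valid_to is None or when < version.valid_to
        if version.valid_from <= when and still_open:
            return version
    return None


# Hypothetical change history for one glossary term.
HISTORY = [
    DefinitionVersion("Active Customer", "Purchased in the last 12 months",
                      date(2020, 1, 1), date(2023, 1, 1)),
    DefinitionVersion("Active Customer", "Purchased in the last 6 months",
                      date(2023, 1, 1)),
]

print(as_of(HISTORY, "Active Customer", date(2021, 6, 1)).definition)
# Purchased in the last 12 months
```

Treating `valid_to` as exclusive means consecutive versions can share a boundary date without ever overlapping.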
The Metadata Repository Model
To take a step further, a prototypical Meta-Model of the Metadata Repository emerges. The Metadata Repository Model encompasses various services, capabilities, and orchestrations that enable executing Metadata Management for each business perspective, even those competing for data services.
- The Metadata Repository holds a variety of types of metadata:
  - Operational metadata that supports reporting requirements
  - Federated integration metadata that facilitates data virtualization or reporting across multiple locations and domains
  - Runtime metadata that fulfills historical and lineage requests
- Front-end applications provide the Metadata Consumption Services, such as the catalog and glossary, for easy, consistent access across the Metadata Repository
  - For example, business-specific front-end tools pull a very focused set of data out of the Metadata Repository to help people do their work
Metadata Repository Capabilities
To delve deeper into the capabilities of the Metadata Repository, consider the following.
Sources of Metadata
A robust Metadata Management capability discovers and collects metadata throughout the technology stack.
- Dashboards & Reporting
  - Business Objects
  - Cognos
  - Microsoft Analysis and Reporting Services
  - MicroStrategy
  - SAP Business Warehouse
  - …
- Data Integration
  - ELT
  - ETL (Informatica Data Integration)
  - Data Quality (e.g., Informatica PowerCenter, Cloudera)
  - …
- Data Models
  - CA ERwin
  - Embarcadero ER/Studio
  - …
- DBMS
  - IBM DB2 (Linux, Unix, Windows)
  - IBM DB2 for z/OS
  - JDBC
  - Microsoft SQL Server
  - Netezza
  - Oracle
  - Sybase ASE
  - Teradata
  - …
- System Services
  - Platforms
    - Cloudera
    - Hadoop
  - Runtime
  - …
- KPIs & Metrics
  - Validity
  - Patterns
  - Quality Scores
- Reference Data
  - Taxonomies
  - Codification
  - Classification Schemes
  - Hierarchies
  - …
- Business
  - Ontology
  - Glossary (Dictionary, Lexicon)
  - Semantics
  - Rules
  - …
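Collecting metadata from a DBMS source usually means reading the database's own catalog. The sketch below uses SQLite from the Python standard library purely as a stand-in for the DBMS products listed above; each of those has its own catalog views and drivers, which are not shown. The `harvest_columns` function and the sample `customer` table are hypothetical.

```python
import sqlite3


def harvest_columns(conn):
    """Collect table and column metadata from a SQLite database catalog."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
        rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        for _cid, column, col_type, notnull, _default, pk in rows:
            entries.append({
                "table": table,
                "column": column,
                "type": col_type,
                "not_null_declared": bool(notnull),
                "primary_key": bool(pk),
            })
    return entries


# A throwaway in-memory database standing in for a real source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

for entry in harvest_columns(conn):
    print(entry)
```

The harvested entries are plain dictionaries so that they could be loaded into whatever repository model is in use; that loading step is outside the scope of this sketch.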
Expected Outcomes – Taxonomy
Establishing the initial groundwork for a Metadata Management program incorporates people, process, and enabling technology components into a holistic strategy.
- Dashboards & Reporting
  - Reporting Standards
  - ETL Standards for Reporting
  - Analytics Standards
  - Data Warehouse Metadata
  - Self-service
  - …
- Data Integration
  - Sourcing
  - Quality
  - Transforms
  - Integration
  - Conformance
  - Business Rules
  - …
- Data Models
  - Conceptual
  - Logical
  - Physical
  - Dimensional
  - …
- DBMS
  - Catalogs
  - Performance Measures
- System Services
  - Platforms
  - Runtime Metrics
  - …
- KPIs & Metrics
  - Metric to KPI causal effects
  - Validity
  - Patterns
  - Meaningfulness
  - …
- Reference Data
  - Taxonomies, Classification Schemes, Hierarchies that enable competing perspectives
  - Underlying data element remains immutable and sacrosanct
  - …
- Business
  - Lexicons and Semantics that enable competing perspectives
  - Data Valuation
  - …
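The Reference Data outcome above, classification schemes that enable competing perspectives while the underlying data element stays immutable, can be sketched as follows. The country codes, the two region schemes, and the `ReferenceElement` and `group_by` names are hypothetical examples chosen only to show the separation between an element and the schemes that classify it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the underlying element cannot be modified
class ReferenceElement:
    code: str
    name: str


ELEMENTS = {
    element.code: element
    for element in (
        ReferenceElement("FR", "France"),
        ReferenceElement("DE", "Germany"),
        ReferenceElement("JP", "Japan"),
    )
}

# Two competing perspectives over the same elements (hypothetical schemes).
SALES_REGIONS = {"FR": "EMEA", "DE": "EMEA", "JP": "APAC"}
FINANCE_REGIONS = {"FR": "Eurozone", "DE": "Eurozone", "JP": "Non-Eurozone"}


def group_by(scheme, elements=ELEMENTS):
    """Group element names under the labels of one classification scheme."""
    groups = {}
    for code, label in scheme.items():
        groups.setdefault(label, []).append(elements[code].name)
    return groups


print(group_by(SALES_REGIONS))    # {'EMEA': ['France', 'Germany'], 'APAC': ['Japan']}
print(group_by(FINANCE_REGIONS))  # {'Eurozone': ['France', 'Germany'], 'Non-Eurozone': ['Japan']}
```

Because each scheme only maps codes to labels, a new perspective can be added without touching the elements themselves.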
Vendor Landscape
As vendors flood the Metadata arena, confusion also grows about which functions, capabilities, and collaborations are “fit for purpose.”
- Established Vendors
  - Informatica
  - IBM
  - ASG Rochade
  - Oracle
  - SAP – Business Objects
  - Talend
  - SAS/DataFlux
  - Trillium
  - …
- Gartner’s Cool List of Vendors
  - Active Navigation
  - Masai Technologies
  - Saffron Technology, Inc.
  - Megaputer
  - Palantir Technologies
  - Adaptive
  - Geotix
  - …
- Emerging Capabilities
  - Cloudera
  - Hadoop
  - …
Unfortunately, few vendors provide holistic services such as those described here in a single tool, suite, or offering. Even so, this list provides a starting point for building your Metadata Repository and its services and capabilities.