Current meta data management products tend to be mainframe-oriented and rely on a centralized meta data repository. Although this approach has been somewhat successful in the online
transaction processing (OLTP) systems world, it has met with limited success in the data warehouse environment because it cannot deal with the current state of the market and the technology.
While enterprise repository efforts require a centralized approach, most data warehouse environments today are distributed in nature. They consist of mixed architectures including enterprise data
warehouses, data warehouses, operational data stores (ODS), hybrids, and independent and federated data marts. Furthermore, most data warehouses are heterogeneous, using multiple vendors’
databases; extraction, transformation, and loading (ETL) tools; and business intelligence tools (OLAP [On-Line Analytical Processing], query, and reporting) — all distributed among various nodes
of the data warehouse environment.
In other words, the typical data warehouse consists of many different products, each generating its own meta data, which is then distributed around the various nodes of the data warehouse. Unless
you have implemented some sort of centralized or enterprise repository, this meta data has remained, for the most part, stranded at its own node, disconnected from or unavailable to other systems and users.
Building a centralized or enterprise repository is a massive undertaking that requires a considerable organization-wide commitment to implementing standards for data warehouse tools and development
techniques. Some refer to it as an “all or nothing” approach to meta data management.
Enterprise repository efforts are difficult to implement, mainly due to scalability and integration issues, which are attributed to the lack of any real standards among data warehouse tools for
exchanging and sharing meta data. As a result, several competing repository and standards efforts are now under way to develop a common "industry standard" for the exchange of meta
data between development tools and business intelligence tools.
Most visible in this area is the Microsoft Repository effort (in conjunction with Platinum Technology). Microsoft’s main competition in this endeavor is the Object Management Group’s (OMG)
repository effort spearheaded by Oracle, IBM, and Unisys. The establishment of both of these standards will go a long way toward easing the difficulty of exchanging meta data in heterogeneous
warehouse environments. However, the industry will still find itself needing to support several meta data management standards.
There are a number of reasons why the centralized repository approach has met with only limited success in the data warehouse environment:
- Centralized repositories use proprietary formats and provide limited synchronization capabilities.
- They do not provide all the meta data needed by administrators and end users in the formats they want.
- They are expensive to build and maintain.
- They are not easily scalable, requiring an “all or none” implementation.
- They are centralized, whereas data warehouse environments are distributed.
In addition, most meta data management products have tended to focus on managing technical, not business, meta data, and do not provide any real link between the two forms. That link matters
because it enables users to drill back from business terms to the detailed technical meta data maintained in the repository. One of the key challenges of managing meta data is providing local
autonomy for meta data as close as possible to the end user while still being able to share that meta data throughout the distributed data warehouse and decision support systems (DSS) environment.
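The link between the two forms of meta data can be pictured as a simple mapping. In the following sketch (the term names, identifiers, and fields are all hypothetical, invented for illustration), a business term points at a technical meta data record, so a user can "drill back" from a familiar business name to the table, column, and transformation behind it:

```python
# Hypothetical example: business meta data linked to technical meta data,
# so a business term can be traced back to its technical detail.

business_terms = {
    "Customer Revenue": {"technical_id": "DW.SALES.CUST_REV"},
}

technical_meta = {
    "DW.SALES.CUST_REV": {
        "table": "SALES",
        "column": "CUST_REV",
        "etl_source": "orders_extract.dat",          # invented source name
        "transform": "SUM(order_amount) - SUM(returns)",
    },
}

def drill_back(term):
    """Follow a business term to the detailed technical meta data behind it."""
    tech_id = business_terms[term]["technical_id"]
    return technical_meta[tech_id]

detail = drill_back("Customer Revenue")
print(detail["table"], detail["column"], detail["transform"])
```

Without this kind of link, a business user who questions a number on a report has no path back to the ETL rule that produced it.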
Centralized meta data management environments have difficulty meeting these conflicting requirements because they involve extracting, transforming, and loading meta data from different distributed
warehouse nodes into a centralized repository. Although administrators and IT staff are afforded a “global view” of the organization’s meta data, this meta data is far removed from the end
users. In addition, once you get the meta data into the central repository, you don’t know what’s going to happen with it, who owns it, who will change it, or where it will go.
What is needed is a meta data management architecture that supports existing distributed data warehouse architectures. In this manner, ownership and local autonomy of meta data can be implemented
naturally, by location, without sacrificing the ability to share the meta data that is needed throughout the organization. This environment also needs to provide the ability to exchange meta data
(created by different products) throughout the data warehouse environment, so that local users always see and have available the meta data in their local format, regardless of its origin.
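One way to achieve "local format, regardless of origin" is to exchange meta data in a neutral common model and render it into each node's local convention on arrival. This is a minimal sketch of that idea; the interchange model, node names, and format styles are all assumptions made for illustration, not any product's actual API:

```python
from dataclasses import dataclass

# Hypothetical common interchange model: every node publishes meta data in
# this neutral form, and each consumer renders it into its local format.

@dataclass
class MetaDataItem:
    name: str
    source_node: str
    definition: str

def to_local_format(item, style):
    """Render a shared meta data item into a node's local convention."""
    if style == "uppercase_names":      # e.g. a mainframe-oriented tool
        return {"NAME": item.name.upper(), "DEFINITION": item.definition}
    if style == "qualified_names":      # e.g. a BI tool expecting node.name
        return {"name": f"{item.source_node}.{item.name}",
                "definition": item.definition}
    raise ValueError(f"unknown local style: {style}")

item = MetaDataItem("cust_rev", "sales_mart", "Net customer revenue, monthly")
print(to_local_format(item, "uppercase_names"))
print(to_local_format(item, "qualified_names"))
```

The point of the neutral model is that each pair of tools no longer needs a dedicated translator; every tool converts to and from the common form once.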
Also, there should be some way to synchronize meta data throughout the environment. This synchronization should be automated on a timed schedule, so that users remain confident that
they are seeing the most recently updated version of the meta data and can therefore rely on the information they get out of their data warehouse.
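The scheduled synchronization described above can be reduced to a simple rule: copy an item from one node to another only when the source holds a newer version. The sketch below illustrates that rule with version numbers; the node and item names are hypothetical, and a real scheduler would run the `synchronize` step on a timer:

```python
# Minimal sketch of version-based meta data synchronization between nodes.

class Node:
    def __init__(self, name):
        self.name = name
        self.items = {}          # item name -> (version, definition)

    def publish(self, name, version, definition):
        self.items[name] = (version, definition)

def synchronize(source, target):
    """Copy every item that is newer on source than on target."""
    updated = []
    for name, (version, definition) in source.items.items():
        if name not in target.items or target.items[name][0] < version:
            target.items[name] = (version, definition)
            updated.append(name)
    return updated

hub = Node("hub")
mart = Node("finance_mart")
hub.publish("cust_rev", 2, "Net customer revenue, monthly")
mart.publish("cust_rev", 1, "Customer revenue")   # stale local copy

changed = synchronize(hub, mart)
print(changed)   # the stale item is refreshed on the mart
```

Running the same synchronization again copies nothing, which is what lets a timed schedule run frequently without churning unchanged meta data.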
Finally, users should be able to implement the meta data management system in an incremental fashion to avoid having to commit to an “all or nothing” meta data management approach.
This is an excerpt from a larger article that appeared in the November 1998 issue of Data Management Strategies covering Pine Cone Software's new "Meta Exchange" data warehouse meta data management product.