Published in TDAN.com July 2000
Most large organizations today have had some experience with data warehousing implementations. These typically take the form of data-mart-style implementations in various departmental focus
areas such as financial analysis or customer focused systems assisting business units. Many organizations have multiple warehousing initiatives underway simultaneously and these systems will most
likely be based on products from multiple data warehousing vendors, in the typical decentralized approach of most corporations. This approach has worked to date in that it has allowed reasonably
rapid implementation of these systems and demonstrated to the organization the benefit and potential of data warehousing as a business tool at a fraction of the cost of the enterprise data
warehouse model.
However, this is the typical “ready, fire, aim” approach which got us to the legacy data Tower of Babel we have today, and in keeping with that, some areas of the business are beginning to show
signs of stress as a result of this approach to implementing data warehousing. Data and meta data are spread across multiple data warehousing systems, and system managers are wondering how best to
coordinate and manage the dispersed meta data mess they have today. How do we maintain consistency when business rules change as a result of corporate reorganizations, regulatory changes, or other
changes in business practices? What happens when an application wants to change the technical definition? How many places are impacted for each of these potential changes? These issues among others
are forcing businesses to take a larger view — an enterprise view — of meta data management systems. Coordinating meta data across multiple data warehouses is one significant step in
the right direction, and a repository is just the tool to do that.
A Repository as a Meta Data Integration Platform
Ideally, a corporation should adopt a repository as a meta data integration platform, making meta data available across the organization. This would serve to manage key meta data across all of the
data warehouse and data mart implementations within an organization. This would allow all of the participants to share common data structures, business rule definitions, and data definitions from
system to system across the enterprise.
The platform would accept and manage information from multiple sources. These would include databases from major vendors (e.g., IBM, Informix, Oracle, Microsoft, and Sybase) and
a broad spectrum of tools, from extraction tools to analysis tools. On the output side, the system should provide open access by multiple tools as well as APIs for custom needs.
The meta data repository also facilitates consistency and maintainability. It provides a common understanding across warehouse efforts, promoting sharing and reuse. If a new data element definition
is required for a data mart implementation, the platform should permit versioning to support the need. With a shared meta data repository, the exchange of key information between business decision
managers (facilitated by solid end-user access tools) becomes more feasible. And, when multiple data marts and data warehouses are involved, a central meta data platform will simplify and
reduce the effort required to maintain them when viewed as a whole.
Lifecycle Issues
Repository systems need to contribute to and integrate with the existing legacy system environment and play an active role throughout the lifecycle of data warehousing systems to be truly
considered enterprise meta data repositories.
Documenting database and legacy information is an important capability of meta data repositories. Legacy models provide the information sourcing, data inventorying, and design that are key to
developing an effective data warehouse. The meta data surrounding the acquisition, access, and distribution of warehouse data is the key to providing the business user with a complete map of the
data warehouse.
The repository should play an active role in the entire life cycle of the data warehouse and in everything it delivers in system and business value. This includes existing legacy systems as sources,
third-party tools, and so on. This broadens the repository’s role so that it contributes in the development phases as well as to the bulk of the cost of all IS systems (the downstream support and maintenance
costs). The relevant tools and components include systems management, database management, business intelligence, and application development, as listed below.
- Systems management tools that can be used to manage jobs, improve performance, and automate operations, not only in operational systems but also in data warehouse systems.
- Database management tools that can help create and maintain the database management systems for operational systems, data warehouses, and data marts.
- Data movement tools that transform and integrate disparate data types and move data reliably to the warehouse.
- Business intelligence tools that provide end-user access and analysis for making business decisions.
- Business applications that provide packaged warehouse solutions for specific markets.
- Data warehouse consulting that uses a methodology based on the experiences of hundreds of other companies, thereby reducing the risk associated with making uninformed business decisions.
- Application development solutions that help you build, test, deploy, and manage operational and warehouse applications throughout the enterprise.
- CASE tool support that provides consistency and maintainability immediately by developing consistent terminology and structures.
- Repository-to-CASE interfaces that enable an organization to manage multiple CASE workstations from the repository. These tools are designed to allow an organization to better utilize the data
maintained in their CASE workstations by providing a central point of control and storage.
- Sophisticated version control, collision management, and bi-directional interfaces, enabling the sharing and reuse of meta data among programmers and analysts working independently.
What Needs to Be In an Enterprise Repository to Make the Warehouse Work Better
Some areas to focus on when reviewing repository functionality are discussed in the following sections.
Nonproprietary Relational Database Management System
A repository should ideally use an industry-standard DBMS, which provides significant advantages over vendor-developed DBMSs. These advantages include advanced tools and utilities for database
management (such as backups and performance tuning) as well as dramatically enhanced reporting capabilities. Furthermore, maintainability and accessibility are enhanced by an “open” system.
Using a standard database also allows the repository vendor to focus on the quality of the repository, not the features of the database management system. In addition, it allows the vendor to take
advantage of new features made available by the DBMS vendor.
Fully Extensible Meta Model
A repository should be completely self-defining and extensible, based on a common entity/relationship diagram. By using a model that reflects industry standards, it can let users
easily customize the meta model to meet their specific needs. The repository should support the following meta model extensions:
- adding or modifying an entity type,
- adding or modifying a linkage between entity types (associations or relationships),
- adding user views (with different screen layouts or validations) to entities or relationships,
- adding, deleting, or modifying attributes of relationships or entities,
- modifying the list of allowable values for an attribute type,
- adding or modifying commands or user exits,
- adding custom command macros, and
- adding or modifying help and informational messages.
The vendor should also support the Microsoft Open Information Model, which will allow information to be shared across multiple vendor products. Ideally, the vendor will be part of the Open
Information Model design team.
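
As a rough illustration of what a self-defining, extensible meta model means in practice, the following minimal Python sketch covers three of the extensions listed above: adding an entity type, adding a relationship between entity types, and setting the allowable values for an attribute. Every name in it (MetaModel, EntityType, the "Table" and "Program" entity types, the "status" attribute) is a hypothetical choice made for this example and does not describe any particular repository product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AttributeType:
    name: str
    allowed_values: Optional[set] = None  # None means any value is accepted


@dataclass
class EntityType:
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> AttributeType


@dataclass
class MetaModel:
    entity_types: dict = field(default_factory=dict)  # entity name -> EntityType
    # Each relationship is (name, source entity type, target entity type).
    relationships: set = field(default_factory=set)

    def add_entity_type(self, name):
        return self.entity_types.setdefault(name, EntityType(name))

    def add_attribute(self, entity, attr, allowed=None):
        values = set(allowed) if allowed is not None else None
        self.entity_types[entity].attributes[attr] = AttributeType(attr, values)

    def add_relationship(self, name, source, target):
        # Both ends must already be defined in the meta model.
        for end in (source, target):
            if end not in self.entity_types:
                raise KeyError(f"unknown entity type: {end}")
        self.relationships.add((name, source, target))

    def validate(self, entity, values):
        """Return a list of problems found in an instance's attribute values."""
        problems = []
        attrs = self.entity_types[entity].attributes
        for key, value in values.items():
            if key not in attrs:
                problems.append(f"{key}: not defined for {entity}")
            elif attrs[key].allowed_values is not None and value not in attrs[key].allowed_values:
                problems.append(f"{key}: {value!r} is not an allowed value")
        return problems


# The model is extended at run time rather than fixed in code.
model = MetaModel()
model.add_entity_type("Table")
model.add_entity_type("Program")
model.add_attribute("Table", "status", allowed={"test", "production"})
model.add_relationship("reads", "Program", "Table")
```

Because the entity types, attributes, and allowable values are themselves data, a user can change them without changing the code that validates against them.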
Application Programming Interface (API) Access
API access to the repository can provide an organization with the flexibility needed to create a meta data management system that suits its unique needs. Such an architecture makes the repository
more powerful by allowing users to create custom applications and programs.
In addition, the API separates meta data from the tools that access and manipulate it, which adds flexibility. The tools manipulate meta data through the API, giving them
transparent access to the data. If the data structures change, the tools do not need to be changed. This allows for greater efficiency and flexibility in an organization’s application development.
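
The point about tools not needing to change can be sketched in a few lines of Python. The interface name RepositoryAPI, its get_definition method, the two storage classes, and the CUST_ID element are all assumptions made for illustration, not the API of any real repository.

```python
from abc import ABC, abstractmethod


class RepositoryAPI(ABC):
    """Hypothetical interface: tools depend on this, not on storage details."""

    @abstractmethod
    def get_definition(self, element):
        ...


class FlatStore(RepositoryAPI):
    """Meta data kept in a single lookup structure."""

    def __init__(self):
        self._rows = {"CUST_ID": "Unique customer identifier"}

    def get_definition(self, element):
        return self._rows[element]


class NormalizedStore(RepositoryAPI):
    """The same meta data, restructured into two linked structures."""

    def __init__(self):
        self._elements = {"CUST_ID": 1}
        self._definitions = {1: "Unique customer identifier"}

    def get_definition(self, element):
        return self._definitions[self._elements[element]]


def describe(repo, element):
    """A 'tool' that knows only the API, never the underlying structures."""
    return f"{element}: {repo.get_definition(element)}"
```

The describe function returns the same result against either store, which is the sense in which a change to the data structures leaves the tools untouched.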
Central Point of Meta Data Control
The repository serves as a central point of control for data, providing a single place of record about information assets across the enterprise. It documents where the data is located, who created
and maintains the data, what application processes it drives, what relationship it has with other data, and how it should be translated and transformed. This provides users with the ability to
locate and utilize data that was previously inaccessible. Furthermore, a central location for the control of meta data ensures consistency and accuracy of information, providing users with
repeatable, reliable results and organizations with a competitive advantage.
Impact Analysis Capability
If the repository has an impact analysis facility, it can provide virtually unlimited navigation of the repository definitions to show the total impact of any change. Users can easily determine where
any entity is used or what it relates to by using impact analysis views.
An impact analysis facility answers the real questions that arise in the analysis phases without forcing a user to sift through large quantities of unfocused information. Furthermore, sophisticated impact
analysis capabilities allow better time estimates for system maintenance tasks. They also reduce the amount of rework resulting from faulty impact analysis (e.g., a program not being changed as a
result of a change to a table that it queries).
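
One common way to compute the total impact of a change is a breadth-first walk over "is used by" links. The sketch below assumes the repository can expose those links as a simple mapping; the object names in USED_BY are invented for the example.

```python
from collections import deque

# Hypothetical "is used by" links: each key is referenced by the objects in its list.
USED_BY = {
    "CUSTOMER.CUST_ID": ["CUSTOMER table"],
    "CUSTOMER table": ["load_customers job", "billing_report program"],
    "load_customers job": ["SALES_MART"],
    "SALES_MART": ["quarterly_sales report"],
}


def impact_of(changed, used_by):
    """Return every object reachable from `changed`, in breadth-first order."""
    seen = {changed}
    queue = deque([changed])
    impacted = []
    while queue:
        current = queue.popleft()
        for dependent in used_by.get(current, []):
            if dependent not in seen:  # guard against cycles and duplicates
                seen.add(dependent)
                impacted.append(dependent)
                queue.append(dependent)
    return impacted
```

Calling impact_of("CUSTOMER.CUST_ID", USED_BY) lists the table, the job and program that read it, the data mart the job loads, and the report built on that mart, so a program that queries a changed table is not overlooked.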
Naming Standards Flexibility
A repository should provide a detailed map of data definitions and elements, thereby allowing an organization to evaluate redundant definitions and elements and decide which ones should be
eliminated, translated, or converted. By enforcing naming standards, the repository assists in reducing data redundancies and increasing data sharing, making the application development process
more efficient and therefore less costly. In addition, an easily enforceable standard encourages organizations to define and use consistent data definitions, thereby increasing the reuse of
standard definitions across disparate tools.
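
Enforcing a naming standard can be as simple as checking each name against a pattern. The standard assumed here (upper-case words separated by underscores, ending in an approved class word such as ID or DATE) is only an example, as are the element names.

```python
import re

# Assumed standard: UPPER_SNAKE_CASE ending in an approved class word.
CLASS_WORDS = ("ID", "NAME", "DATE", "AMT", "CODE")
NAME_PATTERN = re.compile(
    r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*_(?:%s)" % "|".join(CLASS_WORDS)
)


def violations(names):
    """Return the names that do not match the assumed standard."""
    return [n for n in names if not NAME_PATTERN.fullmatch(n)]
```

For instance, violations(["CUST_ID", "custName", "ORDER_DATE", "ORDER_TOTAL"]) flags custName (wrong case) and ORDER_TOTAL (no approved class word).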
Versioning Capabilities
In repository discussions, “versioning” can have many different definitions. For example, version control capabilities may include:
- version control as in test vs. production (lifecycle phasing);
- versions as unique occurrences;
- versioning by department or business unit; and
- versioning by aggregate or workstation ID.
The repository’s versioning capabilities facilitate the application lifecycle development process by allowing developers to work with the same object concurrently. Developers should be able to
modify or change objects to meet their requirements without affecting other developers.
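
The first kind of versioning in the list above, test versus production, can be sketched as a history kept per object and lifecycle phase. The class name VersionedObjects, the phase labels, and the CUSTOMER object are assumptions for this example only.

```python
from copy import deepcopy


class VersionedObjects:
    """Keeps an append-only history per (name, phase); phases are assumed labels."""

    def __init__(self):
        self._history = {}  # (name, phase) -> list of definitions, oldest first

    def save(self, name, phase, definition):
        versions = self._history.setdefault((name, phase), [])
        versions.append(deepcopy(definition))
        return len(versions)  # 1-based version number within that phase

    def latest(self, name, phase):
        # Return a copy so callers cannot alter stored history by accident.
        return deepcopy(self._history[(name, phase)][-1])

    def promote(self, name, source="test", target="production"):
        """Copy the latest definition from one phase to another."""
        return self.save(name, target, self.latest(name, source))


repo = VersionedObjects()
repo.save("CUSTOMER", "production", {"columns": ["CUST_ID", "NAME"]})

# A developer works on a private copy and saves it to the test phase.
draft = repo.latest("CUSTOMER", "production")
draft["columns"].append("REGION")
repo.save("CUSTOMER", "test", draft)
```

Until promote("CUSTOMER") is called, the production definition is unaffected by the developer's change, which is the isolation the paragraph above describes.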
Robust Query and Reporting
The repository should provide business users with a vehicle for robust query and report generation. The end-user interface should seamlessly pass queries to the repository’s own tool or to third-party products for
automatic query generation and execution. Furthermore, business users should be able to create detailed reports from these tools, increasing the amount of valuable decision support information they
are able to receive from the repository.
Data Warehousing Support
The repository provides information about the location and nature of operational data, which is critical in the construction of a data warehouse. It acts as a guide to the warehouse data, storing
information necessary to define the migration environment, mappings of sources to targets, translation requirements, business rules, and selection criteria to build the warehouse.
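
A source-to-target mapping with its translation requirement and business rule might be recorded along the following lines. The Mapping structure and the field names (ORDERS.cust_no, SALES_FACT.CUST_ID, and so on) are hypothetical and serve only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Mapping:
    source: str                      # operational field
    target: str                      # warehouse column
    transform: Callable[[str], str]  # translation requirement
    rule: str                        # business rule, stated in words


MAPPINGS = [
    Mapping("ORDERS.cust_no", "SALES_FACT.CUST_ID", str.strip, "Trim padding"),
    Mapping("ORDERS.region", "SALES_FACT.REGION_CODE", str.upper, "Region codes are upper case"),
]


def build_row(source_row):
    """Apply every mapping whose source field is present in the row."""
    return {
        m.target: m.transform(source_row[m.source])
        for m in MAPPINGS
        if m.source in source_row
    }
```

With the mappings held as data, the same records can drive the load process and document it for business users.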
Conclusion
Organizations are becoming increasingly aware of the limitations of their own systems and internal data. The attempts to liberate and leverage data across the organization’s stovepipes have been
replete with frustration and too many examples of failure. These experiences, coupled with drivers demanding flexibility in business processes, are hastening the day that businesses will implement
an enterprise-level view of meta data. All major vendors are aggressively pursuing this enterprise-level capability. It is critical that corporations understand the issues
at hand as they adopt enterprise strategies and that they be in a position to evaluate which set of vendor products is appropriate to their situation.
Business Information Demand: An organization’s continuously increasing, constantly changing need for current, accurate information, often on short notice, to support its business activities.