Some people talk about meta data all the time as though it holds the answer to ALL questions about managing data. I may even be considered as one of those people. If YOU are already getting tired of talking about meta data, you are probably in the wrong business.
Meta data is not really new. It was around long before you and I arrived in this industry. Meta data will be around long after we go to that great repository in the sky. But it was not until the past ten years, maybe much less, that the importance of managing meta data became apparent. This increase in importance has been a result of companies integrating the vast number of information systems and databases, creating decision support environments, and paying attention to managing data as a valued corporate asset.
Meta data (as data about data; or documentation about data) has now become an integral part of EVERYTHING having to do with information technology. If you do not believe me, consider the following questions.
1. Should meta data be a consideration in the evaluation of IT tools?
Yes. Meta data should always be a consideration when evaluating IT tools, new and old. The data (or information) that is found in all IT tools is meta data. More often than not, that meta data is used ONLY by the tool to perform its function (extraction, movement, security, modeling, reporting, …). Each function is very important, but the meta data is rarely viewed or used by people who are not directly associated with the use of that tool.
The truth is that the meta data has the potential for value far beyond how it is used by the IT tool in which it originates. But since no person can view it, the meta data never provides additional value. Examples of under-used meta data include business rules in data models, allowable values in auditing and transformation tools, user ids and privileges in security packages, and on and on. Business rules are valuable information, but often nobody sees them and they are not actively being used to control processes. Allowable values are often audited when data is moved to the warehouse; however, the lists of accepted values and their descriptions cannot be viewed by the processes that are generating the data. User ids and privileges from security packages could be the basis of information stewardship if only that information, the meta data, were available. … And these are just three examples.
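As a rough illustration of the allowable values example, the sketch below keeps one list of accepted values and their descriptions and uses it both to audit incoming data and to show people what the codes mean. The element name, the codes, and the function names are all hypothetical; no particular auditing tool stores its meta data this way.

```python
# Hypothetical allowable-values meta data: data element -> {code: description}.
ALLOWABLE_VALUES = {
    "account_status": {
        "A": "Active",
        "C": "Closed",
        "S": "Suspended",
    },
}


def audit_value(element, value):
    """Return (is_valid, description) for one value of the named data element."""
    domain = ALLOWABLE_VALUES.get(element, {})
    if value in domain:
        return True, domain[value]
    return False, None


def describe_domain(element):
    """List the accepted values with descriptions so people can read them too."""
    domain = ALLOWABLE_VALUES.get(element, {})
    return [f"{code} = {text}" for code, text in sorted(domain.items())]
```

The point of the sketch is that the same list serves the warehouse audit (`audit_value`) and the people or processes creating the data (`describe_domain`), instead of staying hidden inside one tool.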
If one is to uncover the hidden value in meta data, it becomes important to pay closer attention to the meta data sources. Some IT tool vendors store meta data in proprietary formats while others store meta data in open formats. When using tools that store meta data in a proprietary format, it becomes difficult (if not impossible) to use that meta data for any reason other than to operate the tool. Tools that provide open (and documented) meta data structures provide the opportunity to extract that meta data for use in a centralized meta data repository or data asset catalog. This, in turn, makes the meta data available to individuals who may benefit from its use.
Meta data, in all of its shapes and forms in the dozens of IT tools we all use, has increasingly become the knowledge bank of what companies are doing with their data. Therefore, it is better to evaluate and understand the meta data availability and capabilities of an IT tool prior to spending the money, than it is to find out too late that the meta data will serve one purpose only and stay hidden.
2. Is meta data a factor in achieving a high Return on Investment (ROI) in Data Warehousing?
Simply stated, if meta data is not provided to the knowledge workers, the chances of data warehouse / mart under-use or mis-use are significantly increased.
Most individuals want to know simple information about the warehouse data, such as, what types of data exist, how the data is named, defined, and referenced, and how the data can be selected. There are also end-users that want to know more complex information about warehouse data, such as, what business rules were used to select the data, how the data was mapped, transformed, cleansed, and moved to the data warehouse, and how the data is audited and balanced. (1)
Warehouse end-users are less likely to use the data warehouse if they do not understand the data. A client of mine said it succinctly when he stated that “if they are confused, they will not use it”.
In many organizations, researching, selecting, extracting and verifying data consumes more time than the data analysis itself. “Our goal is to flip the percentages from 70% of the time preparing data and 30% of the time doing data analysis, to 30% and 70%” stated one vice president of a large financial institution. This type of statement is fairly common and is often used to cost justify data mart development. One way to “flip” the figures is to improve understanding of the warehouse data by using meta data.
Warehouse mis-use is also a risk when meta data is not available. When several individuals spend their time researching the same data and come to different conclusions through different results, or results from one data source do not match results from another data source, the cost of data preparation is multiplied (as is the frustration of the decision makers). In this type of situation, the company is often required to select the “best” answer to the original question, as opposed to THE answer. More often than not, the reason for the discrepancy in the results comes from the lack of data understanding or inconsistencies created through less-than-adequate data management practices. Both of these can be improved through improved meta data management practices.
3. Is meta data a component of data quality initiatives and data quality improvement?
There are several reasons why companies initiate data quality efforts. The driving reason may be poor quality data discovered during the integration of several legacy systems into packaged solutions such as SAP, PeopleSoft, or Oracle Financials. Another reason may include the same discovery during the development of a decision support environment. One more reason may include known and documented faults in operational data that are causing business problems such as delayed and/or rejected transactions. These are all legitimate reasons for focused efforts on improving data quality.
Companies that initiate (or have on-going) data quality efforts spend a large amount of resources investigating data make-up and definition, documenting data accountabilities and responsibilities (stewardship), and mapping data across the corporation. This information can be found in meta data.
If the meta data needed for the corporate data quality effort is already available, the company has a tremendous advantage over companies that do not manage meta data and make it available. If the meta data necessary for the data quality effort is not available, companies should consider taking advantage of the research and documentation created during the data quality initiatives by capturing and maintaining meta data in a centralized data asset catalog (repository) to support future IT & data quality programs.
4. How do you control or reduce data redundancy without using meta data?
It is very difficult to manage something if you know little or nothing about its existence. This observation holds true about data and the prevention of duplicate or redundant data. In many companies, the data administrator or the data modeler is the first line of defense when defining new data. Often, how well these individuals define and reuse data is a result of the information on hand about existing data.
As an example, the “best practices” approach to data modeling includes sharing entities and attributes from an enterprise data model across multiple project (or subject area) data models. In the absence of an enterprise model (or some form of reusable data model entities), the same data becomes defined repeatedly by different individuals in the organization. Data modeling by itself, on a project by project basis, does not provide the ability to share data unless there is access to the meta data (data about data models) that already exist.
If the intent is to share data across the enterprise, each application development area needs to know what data structures already exist before it can define the requirements for what does not yet exist. The data documentation that provides this ability to see what exists, and to share and re-use data, is meta data.
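To make the idea of seeing what exists before defining something new a little more concrete, here is a minimal sketch of a lookup against a data asset catalog. The catalog entries, the stop-word list, and the word-overlap matching are all assumptions made for illustration; a real repository would offer much richer search.

```python
import re

# Hypothetical catalog entries: existing data elements and their definitions.
CATALOG = [
    {"name": "CUSTOMER_ID", "definition": "Unique identifier assigned to a customer"},
    {"name": "CUST_BIRTH_DT", "definition": "Date of birth of the customer"},
    {"name": "ORDER_TOTAL_AMT", "definition": "Total monetary amount of an order"},
]

# Common words that say nothing about the meaning of a data element.
STOP_WORDS = {"a", "an", "the", "of", "to", "for", "and"}


def _keywords(text):
    """Lower-case the text and keep only the meaningful words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS


def find_candidates(proposed_definition, catalog=CATALOG, min_shared=2):
    """Return names of existing elements whose definitions overlap the proposal."""
    wanted = _keywords(proposed_definition)
    matches = []
    for entry in catalog:
        shared = wanted & _keywords(entry["definition"])
        if len(shared) >= min_shared:
            matches.append((len(shared), entry["name"]))
    # Best overlap first.
    return [name for _, name in sorted(matches, reverse=True)]
```

A modeler about to define "Birth date of a customer" would call `find_candidates` first and be pointed at the existing element rather than creating a duplicate.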
5. Is there a relationship between data modeling and meta data?
The information that is manually entered into the CASE (data modeling) tools is meta data (or data about the logical or physical components of the data). Therefore, if data modeling is a part of your IT processes and data modeling information is important to your organization, meta data is important as well.
When the data modeler creates an entity-relationship diagram (ERD), they define the way that data is represented logically and physically in your company. The modeler creates data entities and their attributes, the relationships between the data entities, the logical and physical names for the data, domains, and more … that represent how data is defined for that enterprise, project, or subject area. All of this information is meta data.
The first question in this article mentioned the use of meta data beyond the IT tools themselves. In this case, the modeling meta data remains in the CASE tool where no one other than the modelers can view it. The business names and definitions that originate as meta data in the CASE tool should be shared with knowledge workers and application developers that are interested in how the data of the organization is defined and related.
6. Is meta data helpful when forcing compliance to IT standards?
Forcing compliance to IT standards is typically an on-going battle. Some companies are successful and some companies are less than successful. Often the success or failure of following standards is a result of the corporation’s environment (use of packages, merging companies and IT functions, centralization of IT service functions, etc.). Other times, the success or failure is based on the company’s ability or willingness to force individuals to comply with the rules of IT development.
Most IT standards are based on meta data. Component naming, storage location, and component interaction, are a few examples of meta data that can play a significant role in IT standards.
Naming standards, for example, define specific ways in which components are named. Examples of standards for naming include embedded component types (identifying JCL, program, table, etc.), embedded owning applications or contexts (through prefixes/application coding), or the identification of the platform and tier on which the component is based (personal, departmental, organizational, etc.). Many common standards are created and controlled by meta data (e.g., component A cannot move to production because it does not have an appropriate application code in positions 2 and 3 of the name and it does not start with the letter “P” for production).
Meta data (stored in a repository) can be used to identify components that do not follow naming standards. Meta data can be used to identify how many versions of each component exist and where they exist. In an ideal environment, standards based on meta data could be built into the change management environment making it impossible to process components that do not follow standards. Since many IT standards are based on meta data, the ability to track and report on that meta data is very helpful when forcing compliance to IT standards.
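The naming rule used as an example above (a leading “P” for production and an application code in positions 2 and 3) can be sketched as a simple check that a change management process might run. The application codes and the function names are hypothetical; only the rule itself comes from the example.

```python
# Hypothetical list of application codes registered in the repository.
VALID_APPLICATION_CODES = {"AR", "GL", "HR"}


def naming_violations(name):
    """Return the list of naming-standard problems for one component name."""
    problems = []
    if not name.startswith("P"):
        problems.append("does not start with 'P' for production")
    # Positions 2 and 3 of the name (1-based) are name[1:3] in Python.
    if name[1:3] not in VALID_APPLICATION_CODES:
        problems.append("no valid application code in positions 2 and 3")
    return problems


def can_promote(name):
    """A component may move to production only if it has no violations."""
    return not naming_violations(name)


def violation_report(names):
    """Map each non-compliant component name to its list of problems."""
    report = {}
    for name in names:
        problems = naming_violations(name)
        if problems:
            report[name] = problems
    return report
```

Run against the component names held in a repository, `violation_report` gives the list of components that do not follow the standard, and `can_promote` is the kind of gate that could be built into the change management environment.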
7. Does meta data play a role in package implementation?
The answers to the first six questions made the case that meta data is an integral part of every tool and every component of most companies’ information technology. Over the past twenty years or so, while companies were developing their IT architectures and data structures, they inherently knew the data structures, development tools, and software modules and features (so to speak) like the back of their hands. With that in mind, companies that purchase packages have the dilemma of moving data that they understand well (hopefully) to a piece of vendor software that they had nothing to do with developing.
When implementing packages like SAP, PeopleSoft, D&B, … the amount of knowledge your company has about the vendor’s software and data (the meta data) is typically non-existent at the beginning of the implementation. Moving data to a vendor developed system without knowing the target data definition, the flow of the processes, the program names, the data field names, and on and on, is a recipe for disaster. This information is found in meta data. Some package vendors are better than others when it comes to supplying meta data in the form of data models, data definitions, and component documentation. Buyer beware.
A company that is implementing packaged software spends a lot of time researching, coordinating, and developing the movement of data into and out of the package data structures. It is common sense that integrators need to know the meaning and structure of both the source and target data. In many organizations, since meta data for older systems was not managed, the information gained through the research of the legacy applications should be considered valuable meta data. It is good advice to take advantage of research that is performed for large package software integration or data warehousing implementation, since most of the documentation created during the development effort can prove to be very valuable to the package users and warehouse end users.
8. Does meta data play a role in your Year 2000 effort?
The Year 2000 problem (Y2K) is a complicated problem that was caused by, and can be fixed using, meta data. The Y2K problem exists because of meta data problems such as dates being stored in non-date data types, the elimination of the century part of the year (reducing the year from four positions to two) to save data storage costs, and date fields that cannot be identified through their element names.
Other complex Y2K meta data problems include business rules that were coded in programs that do not account for the use of “00” as the year, sorting problems, embedded dates in production data as a means of record keeping, and more.
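The sorting and “00” problems are easy to see in a few lines. The sketch below stores years in two positions, shows how they sort and subtract incorrectly, and then applies a fixed-window expansion. Windowing with a pivot of 50 is an assumption chosen for illustration; it is one common remediation approach, not something prescribed here.

```python
def expand_year(two_digit_year, pivot=50):
    """Windowing: values below the pivot are read as 20xx, the rest as 19xx."""
    yy = int(two_digit_year)
    century = 2000 if yy < pivot else 1900
    return century + yy


def years_between(start_yy, end_yy):
    """Return (naive, windowed) differences between two 2-position years."""
    naive = int(end_yy) - int(start_yy)
    windowed = expand_year(end_yy) - expand_year(start_yy)
    return naive, windowed


# Years as they might be stored in two positions.
stored_years = ["98", "99", "00", "01"]

# Sorting the stored values puts 2000 and 2001 before 1998 and 1999.
sorted_as_stored = sorted(stored_years)

# Sorting on the expanded year restores chronological order.
sorted_chronologically = sorted(stored_years, key=expand_year)
```

Here `sorted_as_stored` comes out as 00, 01, 98, 99, and `years_between("99", "00")` reports a naive difference of -99 years against a windowed difference of 1.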
Companies are spending millions of dollars having SOMEONE ELSE document how their data interacts between applications, locating date fields, and coordinating system changes with internal and external sources and targets of data. Most of the outsourced vendors and consultants ship source code and data definitions to a “factory” where they are scanned/parsed and automatically analyzed for occurrences of potential date related problems. The results of the parsing, and the data that is analyzed to solve the Y2K problem, are meta data.
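A very small version of that kind of scan might look like the sketch below, which flags field names that hint at dates. The list of hints and the sample field names are hypothetical, and real Y2K analysis tools go far beyond matching names.

```python
import re

# Hypothetical name fragments that suggest a field holds a date or a year.
# A fragment must be a whole segment of the name, bounded by '-' or '_'.
DATE_HINT = re.compile(
    r"(^|[-_])(DATE|DT|DTE|YR|YEAR|YY|MMDDYY|YYMMDD)([-_]|$)",
    re.IGNORECASE,
)


def find_date_candidates(field_names):
    """Return the field names whose segments suggest date content."""
    return [name for name in field_names if DATE_HINT.search(name)]
```

Given the names `CUST-BIRTH-DT`, `ORDER_DATE`, `ACCT-NBR`, `UPDATED`, and `FILLER-07`, only the first two are flagged. A date hidden in something like `FILLER-07` is missed entirely, which is exactly why date fields that cannot be identified through element names make the problem so expensive.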
Companies that have managed meta data for years (and these are few and far between) have a leg up on solving the Y2K problem. Companies that do not manage meta data are quickly finding that meta data holds the key to successful completion of the Year 2000 work. It is not unheard of for companies to purchase their meta data repository solely as a tool to prepare them (and their systems and data) for the new millennium.
These eight questions (and therefore the related meta data) most likely touch on the data and information architecture core of most companies.
Many companies have a difficult time justifying the cost in money, time and resources to manage meta data and implement meta data repositories. If the individuals who control the IT budget would evaluate the role that meta data plays in EVERY aspect of the IT environment, the benefits of such an investment would become more apparent and justifying the expense of meta data management would become much easier.
(1) Pre-Decision Support, Robert S. Seiner
The Data Administration Newsletter (TDAN) – www.tdan.com – 12/1/1997
Decision Support Star (DS*) – www.tgc.com – 2/3/1998