The world of information technology has “grown up” dramatically in the last twenty-five years – the term of my comparatively short career. From the days of punching cards and feeding deck readers at midnight at the university computer lab to the world of dot-coms, electronic business, business intelligence, big data, artificial intelligence … one might believe that they have seen it all.
But not even close … One can only imagine what the next twenty-five years have in store for us. Post-Y2K and for the foreseeable future, the need to manage data, information, and knowledge, and the speed at which that is done, will (if it has not already) become THE business driver.
“Managing data, information, and knowledge will be the business driver.”
That is a phrase worth repeating several times if you don’t already know it. A company’s ability to manage data, information, and knowledge will determine how successful the company can be, or whether it can be successful at all.
To manage data, information, and knowledge, companies need to know what data they have. Companies need to know precisely how their data is being used and how that data can be used to create competitive advantage. To know these things, a company needs to manage and use its Metadata.
Metadata is information, documented in IT tools, that improves both business and technical understanding of data and data-related processes. (1) This definition is significantly longer than the “data about data” definition that is overused by folks in our industry. When you break this definition into pieces, it tells us what Metadata is, where it can be found, how it can be helpful, and who it will help.
Metadata will become increasingly important over the next several years. Metadata will no longer be the “Wednesday’s child of information processing systems,” (2) as stated by the father of data warehousing, Bill Inmon, in Data Management Review. Every company has Metadata. There is no question about that. Databases are built on Metadata. Data models are built on Metadata. Programs, screens, reports, queries, data movement … all the components of information systems are built using Metadata. This, on its own, should make it obvious that managing Metadata is important. But it doesn’t.
[From time-to-time, TDAN.com republishes articles that were popular when originally published and still appear relevant and important in today’s data-driven age. This article was originally published more than ten years ago and reappears, with minor modifications, in these pages due to the popularity of the topic.]
Metadata Questions
Questions are still raised about Metadata. What exactly is Metadata? How much will it cost to manage Metadata? How do I justify the “investment” in Metadata? Who uses Metadata? How does one get started managing Metadata? These are all very important questions whose answers become a key determinant of whether a company will proceed with a Metadata management strategy and implementation plan.
These questions are not always easy to answer, particularly when the person asking is somewhat removed from the daily building of the data and technical architectures that support the enterprise: namely, the person most likely to foot the bill for the effort. Experts have written volumes that answer these questions. These questions will not be addressed in this article.
Instead of answering these questions, I choose to take a different approach. Instead of focusing on “answers” to Metadata questions, this article will focus on the “questions” that Metadata can answer.
Question Categories
The “questions Metadata can answer” fall into ten categories. I selected ten partly because it is a good, round number and partly because this is a logical breakdown of Metadata that I have used before. If these categories do not suit your needs, organize your own according to the requirements of your organization. The ten categories I selected are:
- Database Metadata
- Data Model Metadata
- Data Movement Metadata
- Business Rule Metadata
- Data Stewardship Metadata
- Application Component Metadata
- Data Access / Reporting Metadata
- Rationalization Metadata
- Data Quality Metadata
- Computer Operations Metadata
Reading the Questions
While you are reading the list of “questions Metadata can answer”, ask yourself three simple questions:
In your current environment:
- Can my company answer these questions?
- What is it costing my company to answer these questions?
- What are the results when we are not able to answer these questions?
I think you will be surprised at how easy it is to justify Metadata management once you look at your answers to the three questions listed above for each of the Questions Metadata Can Answer.
Many of the questions fall under multiple categories. For example, during data movement, data flows from source to target. The action that is taken (the value assigned) to the target may come from a map list (or conversion table), depending on the source or several sources. The action that is taken when source data is missing, or when a source value does not have an assigned target value (sometimes known as a missing rule), can be considered data movement Metadata or data quality Metadata. I list each question once, trusting that you can draw the connection if necessary.
The questions should not be considered all encompassing. Rather consider the Metadata questions as a “starter kit” that can assist your company to understand that:
- The answers to these questions are important.
- The answers to these questions are NOT always available.
- The IT division will “perform” better if they have access to this information.
- “Cost savings” and “competitive advantage” are associated with managing data through Metadata.
Questions Metadata Can Answer
Database Metadata
Database Metadata describes the physical data. Database Metadata is typically stored in the database catalog or in copybook/segment definitions and is accessed by developers and database administrators using database tools or File-AID-type tools.
- Does the data exist in a database (or a flat/sequential file)?
- What databases exist?
- What is the physical name of the database where the data is stored?
- Where is the data located? (e.g., platform or DBMS, server)
- What are the names of the tables in the database?
- What columns are on the tables?
- What is the primary key?
- What other indexes exist?
- How is this table related to other tables?
- Is this table part of any views?
- When was the database last updated?
- Who last updated the data?
- What flat and sequential files exist?
- What is the physical name of the dataset where my data is stored?
- Where is the data located? (e.g., mainframe, region, dataset name)
- How many generations of the data exist?
- Do the datasets exist on tape or on storage?
- What copybooks represent the data in the file?
- What programs use the copybook?
- What job streams execute the programs?
- How is the data processed, combined, and sorted?
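Several of the database questions above can be answered directly from a database catalog. The following is a minimal sketch, assuming a SQLite database and Python's standard `sqlite3` module; the `customer` and `orders` tables and the helper names `list_tables` and `describe_table` are hypothetical and exist only for illustration. Other database systems expose similar information through their own catalog views.

```python
import sqlite3

def list_tables(conn):
    """Answer "What are the names of the tables?" from the SQLite catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [r[0] for r in rows]

def describe_table(conn, table):
    """Collect columns, primary key, indexes, and related tables for one table."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # PRAGMA index_list rows start with: (seq, name, unique, ...)
    indexes = conn.execute(f'PRAGMA index_list("{table}")').fetchall()
    # PRAGMA foreign_key_list rows start with: (id, seq, referenced_table, ...)
    fkeys = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
    pk_columns = sorted((c for c in columns if c[5] > 0), key=lambda c: c[5])
    return {
        "columns": [c[1] for c in columns],
        "primary_key": [c[1] for c in pk_columns],
        "indexes": [i[1] for i in indexes],
        "references": sorted({fk[2] for fk in fkeys}),
    }

# Hypothetical schema, created in memory so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        placed_on TEXT
    );
    CREATE INDEX idx_orders_customer ON orders(customer_id);
""")

print(list_tables(conn))
print(describe_table(conn, "orders"))
```

The point of the sketch is that this Metadata already exists in the catalog; the effort lies in making it visible to the people who need it.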
Data Model Metadata
Data Model Metadata describes the logical design of the data and the mapping from the logical design to physical data. Data model Metadata can also include business rules, entity relationships, domain values, etc. Data model Metadata is typically found in data modeling and CASE tools although some may still track this information in diagram and spreadsheet tools.
- What data models exist?
- Where can the models be found?
- Is there an enterprise data model?
- Who created the models, and for what purpose (project, database, etc.)?
- Who is responsible for keeping the models up to date?
- What business entities have been defined and what models do they exist on?
- Where are the business entities represented in databases-tables or systems-files?
- What are the definitions of the business entities?
- What attributes make up these entities?
- What is the business definition of the attributes?
- Do the attributes have restrictive domains?
- What are the allowable values for the attributes?
- What is the relationship between the logical data model and the physical data model?
- Is the physical data model in synch with the logical data model?
- Is the physical data model in synch with the physical database?
- What maps exist between entities and tables, attributes and columns, etc.?
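The questions about whether the logical and physical models are in synch come down to comparing two lists. The following is a rough sketch, assuming the attribute-to-column maps have been exported from a modeling tool into simple records; the `AttributeMap` class, the `find_drift` function, and all entity, table, and column names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AttributeMap:
    entity: str      # logical entity name
    attribute: str   # logical attribute name
    table: str       # physical table name
    column: str      # physical column name

def find_drift(maps, physical_columns):
    """Compare mapping Metadata with the columns that physically exist.

    Returns (unmapped, missing):
      unmapped - physical columns that no logical attribute maps to
      missing  - mapped columns that do not exist in the physical schema
    """
    mapped = {(m.table, m.column) for m in maps}
    physical = set(physical_columns)
    return sorted(physical - mapped), sorted(mapped - physical)

# Hypothetical maps and physical columns, for illustration only.
maps = [
    AttributeMap("Customer", "Customer Name", "CUST", "CUST_NM"),
    AttributeMap("Customer", "Customer Birth Date", "CUST", "BIRTH_DT"),
]
physical_columns = [("CUST", "CUST_NM"), ("CUST", "CUST_ID")]

unmapped, missing = find_drift(maps, physical_columns)
print("Physical columns with no logical attribute:", unmapped)
print("Mapped columns missing from the database:", missing)
```

Either list being non-empty is a sign that the models and the database have drifted apart.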
Data Movement Metadata
Data movement Metadata describes the movement of data from source to target. Data movement Metadata includes information about the selection and extraction of data, mapping, transformation, and loading of data. Data movement Metadata can be found in ETL or data movement tools, spreadsheets, desktop databases, or in the logic of the code written to perform the data movement.
- Where did my data originate? What system or database did it come from?
- What field was used to populate this data or was the field derived?
- How was the data derived? Using calculation, conditionals, or both?
- In the derivation, what other data was used?
- Is the value of this data dependent on the values of other data? What data and how?
- Is the target data allowed to be null?
- What was done when data was missing?
- What action was taken when source data did not fall within quality guidelines?
- What action was taken when the source value was not assigned a mapped target value?
- What values can the target data take on?
- How do these values map to the previous values?
- When is the data moved?
- Has the data always “moved” this way or is there a history of changes over time?
- When did those changes take place?
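A map list with a rule for missing or unmapped source values can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular ETL tool: the status codes, the default values, and the `move_status` function are all hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical map list (conversion table): source code -> target value.
STATUS_MAP = {"A": "ACTIVE", "I": "INACTIVE", "P": "PENDING"}

DEFAULT_WHEN_MISSING = "UNKNOWN"    # assigned when the source value is absent
DEFAULT_WHEN_UNMAPPED = "UNMAPPED"  # assigned by the "missing rule"

def move_status(source_value: Optional[str]) -> Tuple[str, str]:
    """Return (target_value, rule_applied) so each value's derivation is recorded."""
    if source_value is None or source_value.strip() == "":
        return DEFAULT_WHEN_MISSING, "source-missing"
    code = source_value.strip().upper()
    if code in STATUS_MAP:
        return STATUS_MAP[code], "map-list"
    return DEFAULT_WHEN_UNMAPPED, "missing-rule"

for raw in ["a", None, "Z"]:
    print(repr(raw), "->", move_status(raw))
```

Recording which rule produced each target value is what later lets someone answer "How was the data derived?" without reading the code.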
Business Rule Metadata
Business Rule Metadata describes how the business operates using its data. Business Rule Metadata describes entity relationships, cardinality, domain rules, etc. that define the use of data. Business Rule Metadata typically exists in data modeling or CASE tools, or in documentation maintained outside of such tools, for example in word processing documents or spreadsheets.
- What is the relationship between two entities of data in the logical data model?
- What is the cardinality between those same entities?
- What are the conditions under which a piece of data can take on certain values?
- What values can a piece of data take on? What are the values’ meanings?
- How is data created, updated, and deleted?
- When are rules established? By whom?
Data Stewardship Metadata
Data Stewardship Metadata describes who in the organization is accountable for actions taken using data. Data Stewardship Metadata defines who in the organization defines the data, who in the organization creates, maintains, and eliminates data, and who consumes the data or directly uses the data or information in their jobs. Data Stewardship Metadata is not maintained by a lot of companies (yet!) but those that do manage this type of Metadata use desktop databases and spreadsheets.
- Who do you call if you have a question about the data?
- Who is responsible for defining, creating, reading, updating, and deleting the data?
- What accountabilities go along with the actions that individuals can take with the data?
- Who are the data “consumers” who use the data as part of their job?
- What information can be shared within the company? Outside the company?
- Who has to approve reports that are being distributed outside the company?
- Who is responsible for assigning acceptable values for the data?
- How does the stewardship program relate to the company information policy?
- What information exists in the information policy?
- Where can I find the information policy?
Application Component Metadata
Application Component Metadata describes all objects of an application from data files or tables, to programs, to scripts and jobs, to screens, and more. Application Component Metadata is a giant cross reference of all the components that make up a system and how the components are shared and re-used. Mainframe cross-reference tools and desktop tools with repositories often are the place where this information is stored.
- What application components are considered standard re-usable objects?
- How was this “re-usable object” determination made?
- How were these objects tested and who maintains these objects?
- What programs (& data & screens, etc.) are part of a system (or process or function)?
- What jobs (or procs or scripts) execute the programs?
- What data is used by the programs and jobs? How is the data used?
- How is the data passed from program to program, job to job, system to system, etc.?
- What system is the data dependent on? What system is dependent on the data?
- What programs and jobs are reused? Where are they reused?
- What changes have been made to the programs and jobs over time?
- Who wrote the programs and jobs?
- Who is responsible for supporting and maintaining the programs and jobs?
- What programs update the data?
- What reports display the data? What screens report the data?
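The "giant cross reference" can be pictured as a list of simple relationships between components. The sketch below assumes that list has been extracted into (component, relationship, target) rows; the job, program, report, and table names and the two helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical cross-reference rows: (component, relationship, target).
XREF = [
    ("JOB_NIGHTLY", "executes", "PGM_LOAD_CUST"),
    ("JOB_NIGHTLY", "executes", "PGM_BILLING"),
    ("PGM_LOAD_CUST", "updates", "CUSTOMER"),
    ("PGM_BILLING", "reads", "CUSTOMER"),
    ("PGM_BILLING", "updates", "INVOICE"),
    ("RPT_AGING", "reads", "INVOICE"),
]

def used_by(xref, target):
    """Group the components that touch `target` by relationship type."""
    result = defaultdict(list)
    for component, relationship, tgt in xref:
        if tgt == target:
            result[relationship].append(component)
    return dict(result)

def jobs_touching(xref, data_object):
    """Find jobs that execute any program reading or updating `data_object`."""
    programs = {
        c for c, rel, t in xref
        if t == data_object and rel in ("reads", "updates")
    }
    return sorted({c for c, rel, t in xref if rel == "executes" and t in programs})

print(used_by(XREF, "CUSTOMER"))
print(jobs_touching(XREF, "INVOICE"))
```

With the relationships in one place, "What programs update the data?" and "What jobs execute the programs?" become lookups rather than research projects.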
Data Access / Reporting Metadata
Data Access and Reporting Metadata describes how to access the data and which reports have already been created that can be read or recreated. Data Access and Reporting Metadata may also describe the steps that must be taken to get authorization to read the data, how the data can be interpreted, available tools, descriptions of reports, etc. Data Access and Reporting Metadata is typically found within reporting tools and in traditional types of documentation (e.g., desktop databases, word processing documents, and spreadsheets).
- What reports have been written that use the data?
- What is the description of a report?
- How do I access the reports?
- What steps should be taken to get authorization to use the data?
- How do the reports select, organize/sort, group, total and display the data?
- What data was used by my report?
- What reports use my data?
- When were the reports last updated?
- Do I have to execute the report myself or are the results already available?
- Where will I find the results?
Rationalization Metadata
Rationalization Metadata describes standard “corporate accepted” pieces of information and how those pieces of information are represented or mapped to data captured in the systems. The standard pieces of information can be a select list of data elements that have accepted meanings, histories, and values and/or the standard pieces of information can come from an enterprise data model. The Rationalization Metadata can describe the degree to which the data elements are the same piece of information and the differences. Rationalization Metadata is often stored in repositories and traditional types of documentation.
- What are the standard (core) elements that exist in the company?
- What are the business names and definitions of these elements?
- How were the standard elements chosen? By whom?
- Are the standard elements verified for re-use?
- Where do the standard elements map to existing data?
- How should the standard elements be used?
Data Quality Metadata
Data Quality Metadata describes the quality of the data: the level of confidence in its accuracy, the change management, the history of the data values and definitions, and how changes over time affect how the data can be understood. Data Quality Metadata also describes what actions are taken when data is “bad”, missing, or a duplicate. Data Quality Metadata is tracked using data quality tools, repositories, and traditional documentation types.
- How have the accepted values of the data changed over time?
- When did the accepted values change?
- How has the definition of the data changed over time?
- When did the definition of the data change?
- What constitutes “bad” data?
- What quality checks were performed against my data?
- What are the quality check procedures? Who wrote and executed them?
- Who analyzed the results?
- With what level of confidence can I trust my data?
- What is the accepted level of confidence before the data is considered “low quality” data?
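A simple quality check that produces a confidence figure and compares it with an accepted threshold might look like the sketch below. The 95% threshold, the sample values, and the `quality_report` function are assumptions chosen for illustration; a real program would define "bad" data according to its own business rules.

```python
def quality_report(values, allowed, threshold=0.95):
    """Score a column against its allowed values and an accepted threshold.

    `threshold` is an assumed figure: the minimum share of valid values
    below which the data is flagged as "low quality".
    """
    total = len(values)
    missing = sum(1 for v in values if v is None or v == "")
    invalid = sum(1 for v in values if v not in (None, "") and v not in allowed)
    valid = total - missing - invalid
    valid_ratio = valid / total if total else 0.0
    return {
        "total": total,
        "missing": missing,
        "invalid": invalid,
        "valid_ratio": valid_ratio,
        "low_quality": valid_ratio < threshold,
    }

# Hypothetical column values and domain.
ALLOWED = {"ACTIVE", "INACTIVE", "PENDING"}
sample = ["ACTIVE", "INACTIVE", "", "actve", "ACTIVE", None, "PENDING", "ACTIVE"]

report = quality_report(sample, ALLOWED)
print(report)
```

Storing each report along with who ran it and when is what turns a one-off check into Data Quality Metadata.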
Computer Operations Metadata
Computer Operations Metadata describes the activities of the data and scheduling center. Computer Operations Metadata describes data storage, tape usage, job operations, server operations, scheduling dependencies, abend procedures, backup and restore procedures, etc. Computer Operations Metadata can be found through scheduling systems, storage systems, operating and server systems, and others.
- What operations / jobs are scheduled to run against my data?
- What types of data backup and recovery are available?
- When was the last time my data was backed up, restored, verified?
- What is the process for backing up and restoring data?
- Who is responsible for backup and recovery?
- Who has security privileges to use my data? Create, Read, Update, or Delete?
- When is the best time to run a program/report against specific data?
- What operations are dependent on data from another process?
- What are the actions taken when a job or system fails or abends?
- Who should be called when a job or system fails?
- What version of the software are we running?
- If licensed, how many licenses do we have, who is using them?
- When are the licenses scheduled to expire?
- When is the next release of the software due to be installed?
- What changes/enhancements are being made to the software with the new release?
- How much disk storage is available?
- How much disk storage is being used? At what rate is the data growing?
- Who allocates storage and should be contacted for questions about disk storage?
- How are the tape storage headers defined?
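The storage questions above are a matter of arithmetic once usage is sampled over time. The sketch below uses Python's standard `shutil.disk_usage` for current figures; the usage samples, the capacity, and the helper functions `growth_per_day` and `days_until_full` are made up for illustration, and the projection assumes growth stays roughly linear.

```python
import shutil
from datetime import date

def growth_per_day(samples):
    """Average daily growth in bytes between the first and last (date, bytes_used) sample."""
    (first_day, first_used), (last_day, last_used) = samples[0], samples[-1]
    days = (last_day - first_day).days
    if days <= 0:
        raise ValueError("samples must span at least one day")
    return (last_used - first_used) / days

def days_until_full(total_bytes, used_bytes, daily_growth):
    """Rough number of days until capacity is reached; None if usage is not growing."""
    if daily_growth <= 0:
        return None
    return (total_bytes - used_bytes) / daily_growth

# Made-up usage samples: 400 GB used on Jan 1, 430 GB used on Jan 31.
samples = [(date(2024, 1, 1), 400 * 10**9), (date(2024, 1, 31), 430 * 10**9)]
rate = growth_per_day(samples)
remaining_days = days_until_full(500 * 10**9, 430 * 10**9, rate)
print(f"Growth: {rate:.0f} bytes/day, about {remaining_days:.0f} days of capacity left")

# Current figures for the volume holding the working directory.
usage = shutil.disk_usage(".")
print(f"Total: {usage.total}, used: {usage.used}, free: {usage.free}")
```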