Published in TDAN.com January 2000
Built on Metadata
The world of information technology has grown up dramatically in the last fifteen years, the span of my comparatively short career. From the days of punching cards and feeding deck readers at midnight in the university computer lab to the world of dot-coms, electronic business, and business intelligence, one might believe one has seen it all.
But not even close… One can only imagine what the next fifteen years have in store for us. Post-Y2K and for the foreseeable future, the need to manage data, information, and knowledge, and the speed at which we can do it, will (if it has not already) become THE business driver.
That is a phrase worth repeating several times if you don’t already know it. A company’s ability to manage data, information, and knowledge will determine how successful the company can be, or whether it can be successful at all.
To manage data, information, and knowledge, companies need to know what data they have. Companies need to know precisely how their data is being used and how that data
can be used to create competitive advantage. To know these things, a company needs to manage and use its Metadata.
Metadata is information, documented in IT tools, that improves both business and technical understanding of data and data-related processes. (1) This definition is significantly longer than the “data about data” definition that is overused by folks in our industry. When you break this definition into pieces, it tells us what Metadata is (information), where it can be found (in IT tools), how it can be helpful (by improving understanding of data and data-related processes), and who it will help (both business and technical people).
Metadata will become increasingly important over the next fifteen years. Metadata will no longer be the “Wednesday’s child of information processing systems” (2), as the father of data warehousing, Bill Inmon, put it in Data Management Review. Every company has Metadata. There is no question about that. Databases are built on Metadata. Data models are built on Metadata. Programs, screens, reports, queries, data movement: all of the components of information systems are built using Metadata. This alone should make it obvious that managing Metadata is important. But it doesn’t.
Metadata Questions
Questions are still raised about Metadata. What exactly is Metadata? How much will it cost to manage Metadata? How do I justify the “investment” in Metadata? Who uses Metadata? How does one get started managing Metadata? These are all very important questions, and their answers often determine whether a company will proceed with a Metadata management strategy and implementation plan.
These questions are not always easy to answer, especially when the person asking them is somewhat removed from the daily building of the data and technical architectures that support the enterprise: namely, the person most likely to foot the bill for the effort. Experts have written volumes that answer these questions, so they will not be addressed in this article.
Instead of answering these questions, I choose to take a different approach: rather than focusing on “answers” to Metadata questions, this article focuses on the “questions” that Metadata can answer.
Question Categories
The “questions Metadata can answer” fall into ten categories. I selected ten categories because… it is a good round number. There was really no reason other than that this is a logical breakdown of Metadata that I have used before. If these categories do not suit your needs, organize your own according to the requirements of your organization. The ten categories I selected are:
- Database Metadata
- Data Model Metadata
- Data Movement Metadata
- Business Rule Metadata
- Data Stewardship Metadata
- Application Component Metadata
- Data Access / Reporting Metadata
- Rationalization Metadata
- Data Quality Metadata
- Computer Operations Metadata
Reading The Questions
While you are reading the list of “questions Metadata can answer”, ask yourself three simple questions:
In your current environment:
- Can my company answer these questions?
- What is it costing my company to answer these questions?
- What are the results when we are not able to answer these questions?
I think you will be surprised at how easy it is to justify Metadata management once you look at your answers to these three questions for each of the Questions Metadata Can Answer.
Many of the questions fall under multiple categories. For example, during data movement, data flows from source to target. The value assigned to the target may come from a map list (or conversion table) based on the value of one source field or of several sources. The action taken when source data is missing, or when a source value has no assigned target value (sometimes known as a missing rule), can be considered either data movement Metadata or data quality Metadata. I list each question only once, trusting that you can draw the connections where necessary.
The questions should not be considered all-encompassing. Rather, consider the Metadata questions a “starter kit” that can help your company understand that:
- The answers to these questions are important.
- The answers to these questions are NOT always available.
- The IT division will “perform” better if they have access to this information.
- “Cost savings” and “competitive advantage” are associated with managing data through Metadata.
Questions Metadata Can Answer
- Database Metadata
administrators using database or file-aid type tools.
– What databases exist?
– What is the physical name of the database where the data is stored?
– Where is the data located (platform or DBMS, server, …)?
– What are the names of the tables in the database?
– What columns are on the tables?
– What is the primary key?
– What other indexes exist?
– How is this table related to other tables?
– Is this table part of any views?
– When was the database last updated?
– Who last updated the data?
– What flat and sequential files exist?
– What is the physical name of the dataset where my data is stored?
– Where is the data located (mainframe, region, dataset name, …)?
– How many generations of the data exist?
– Do the datasets exist on tape or on storage?
– What copybooks represent the data in the file?
– What programs use the copybook?
– What job streams execute the programs?
– How is the data processed, combined, sorted?
– much more …
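Several of the database questions above can be answered directly from the catalog that a DBMS keeps about itself. As a minimal sketch, assuming SQLite stands in for your DBMS (other products expose similar catalogs under different names), the Python below reads the `sqlite_master` table and the `table_info`, `index_list`, and `foreign_key_list` pragmas to list each table’s columns, primary key, indexes, and related tables, plus the views that exist. The helper name `describe_database` is hypothetical.

```python
import sqlite3


def describe_database(conn):
    """Answer some 'Database Metadata' questions from SQLite's own catalog.

    describe_database is a hypothetical helper; SQLite stands in for any DBMS.
    """
    report = {}
    # sqlite_master records every table, index, and view in the database.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk);
        # pk is the column's 1-based position in the primary key, else 0.
        columns = conn.execute(f"PRAGMA table_info('{table}')").fetchall()
        primary_key = [c[1] for c in sorted(columns, key=lambda c: c[5]) if c[5] > 0]
        # index_list rows: (seq, name, unique, origin, partial). This may also
        # include indexes SQLite creates automatically for PRIMARY KEY/UNIQUE.
        indexes = [row[1] for row in conn.execute(f"PRAGMA index_list('{table}')")]
        # foreign_key_list rows: (id, seq, table, from, to, ...).
        related = sorted({row[2] for row in conn.execute(f"PRAGMA foreign_key_list('{table}')")})
        report[table] = {
            "columns": [c[1] for c in columns],
            "primary_key": primary_key,
            "indexes": indexes,
            "related_to": related,
        }
    views = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'view' ORDER BY name")]
    return report, views
```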
- Data Model Metadata
domain values, … Data model Metadata is typically found in data modeling and CASE tools, although some may still track this information in diagramming and spreadsheet tools.
– Where can the models be found?
– Is there an enterprise data model?
– Who created the models, and for what purpose (project, database, …)?
– Who is responsible for keeping the models up to date?
– What business entities have been defined and what models do they exist on?
– Where are the business entities represented (database tables, system files)?
– What are the definitions of the business entities?
– What attributes make up these entities?
– What is the business definition of the attributes?
– Do the attributes have restrictive domains?
– What are the allowable values for the attributes?
– What is the relationship between the logical data model and the physical data model?
– Is the physical data model in sync with the logical data model?
– Is the physical data model in sync with the physical database?
– What maps exist between entities and tables, attributes and columns, …?
– much more …
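The questions about whether the physical data model is in sync with the physical database lend themselves to an automated comparison. The sketch below assumes the physical model has been exported as a simple mapping of table names to column names (the `PHYSICAL_MODEL` contents and the `compare_model_to_database` name are hypothetical) and compares it against a SQLite database’s catalog. It checks only table and column names, not data types, keys, or domains.

```python
import sqlite3

# Hypothetical physical data model, as it might be exported from a modeling tool:
# table name -> column names.
PHYSICAL_MODEL = {
    "customer": ["customer_id", "name", "email"],
    "orders": ["order_id", "customer_id", "order_date"],
}


def compare_model_to_database(model, conn):
    """List table/column name differences between a model and a SQLite database."""
    differences = []
    for table, model_columns in sorted(model.items()):
        # Table names come from the (trusted) model; table_info is empty if absent.
        rows = conn.execute(f"PRAGMA table_info('{table}')").fetchall()
        if not rows:
            differences.append(f"table {table}: in model, missing from database")
            continue
        db_columns = [row[1] for row in rows]
        for column in model_columns:
            if column not in db_columns:
                differences.append(f"{table}.{column}: in model, missing from database")
        for column in db_columns:
            if column not in model_columns:
                differences.append(f"{table}.{column}: in database, missing from model")
    return differences
```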
- Data Movement Metadata
loading of data. Data movement Metadata can be found in ETL or data movement tools, spreadsheets, desktop databases, or in the logic of the code written to perform the data movement.
– What field was used to populate this data or was the field derived?
– How was the data derived? Using calculation, conditionals, both, …?
– In the derivation, what other data was used?
– Is the value of this data dependent on the values of other data? What data, and how?
– Is the target data allowed to be null?
– What was done when data was missing?
– What action was taken when source data did not fall within quality guidelines?
– What action was taken when the source value was not assigned a mapped target value?
– What values can the target data take on?
– How do these values map to the previous values?
– When is the data moved?
– Has the data always “moved” this way or is there a history of changes over time?
– When did those changes take place?
– much more …
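Several data movement questions (what was done when data was missing, what happened when a source value had no mapped target value) can only be answered if the movement process records which rule it applied. Here is a minimal sketch of the map list and missing rule described under “Reading The Questions”, assuming a single hypothetical status field with string codes; `STATUS_MAP`, the `UNKNOWN`/`UNMAPPED` defaults, and the function names are illustrative, not taken from any particular ETL tool.

```python
# Hypothetical conversion table (map list): source status code -> target value.
STATUS_MAP = {"A": "ACTIVE", "I": "INACTIVE", "C": "CLOSED"}

# Hypothetical missing rules: what to load when the source is empty or unmapped.
MISSING_VALUE = "UNKNOWN"
UNMAPPED_VALUE = "UNMAPPED"


def map_status(source_value):
    """Return (target_value, rule_applied) so the action taken is itself recorded."""
    if source_value is None or source_value.strip() == "":
        return MISSING_VALUE, "missing rule: source empty"
    key = source_value.strip().upper()
    if key in STATUS_MAP:
        return STATUS_MAP[key], "conversion table"
    return UNMAPPED_VALUE, f"missing rule: no target value for {key!r}"


def move_rows(rows):
    """Map each source row and log the rule applied (data movement Metadata)."""
    loaded, log = [], []
    for row in rows:
        target, rule = map_status(row.get("status"))
        loaded.append({"id": row["id"], "status": target})
        log.append((row["id"], rule))
    return loaded, log
```

Keeping the log alongside the loaded data is what later lets someone answer “What action was taken when the source value was not assigned a mapped target value?” for a specific row.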
- Business Rule Metadata
data. Business Rule Metadata typically exists in data modeling or CASE tools, or in other forms of documentation maintained outside of a tool (word processing documents, spreadsheets, …).
– What is the cardinality between those same entities?
– What are the conditions under which a piece of data can take on certain values?
– What values can a piece of data take on? What do the values mean?
– How is data created, updated, deleted, …?
– When are rules established? By whom?
– much more …
- Data Stewardship Metadata
organization creates, maintains, and eliminates data, and who consumes the data or directly uses the data or information in their jobs. Data Stewardship Metadata is not maintained by a lot of companies (yet!), but those that do manage this type of Metadata use desktop databases and spreadsheets.
– Who is responsible for defining, creating, reading, updating, and deleting the data?
– What accountabilities go along with the actions that individuals can take with the data?
– Who are the data “consumers” who use the data as part of their job?
– What information can be shared within the company? Outside the company?
– Who has to approve reports that are being distributed outside the company?
– Who is responsible for assigning acceptable values for the data?
– How does the stewardship program relate to the company information policy?
– What information exists in the information policy?
– Where can I find the information policy?
– much more …
- Application Component Metadata
Application Component Metadata describes all objects of an application, from data files or tables, to programs, to scripts and jobs, to screens, … Application Component Metadata is a giant cross-reference of all of the components that make up a system and of how those components are shared and re-used. Mainframe cross-reference tools and desktop tools with repositories are often where this information is stored.
– How was this “re-usable object” determination made?
– How were these objects tested and who maintains these objects?
– What programs (and data, screens, …) are part of a system (or process or function)?
– What jobs (or procs or scripts) execute the programs?
– What data is used by the programs and jobs? How is the data used?
– How is the data passed from program to program, job to job, system to system, …?
– What system is the data dependent on? What systems are dependent on the data?
– What programs and jobs are reused? Where are they reused?
– What changes have been made to the programs and jobs over time?
– Who wrote the programs and jobs?
– Who is responsible for supporting and maintaining the programs and jobs?
– What programs update the data?
– What reports display the data? What screens report the data?
– What programs use
– much more …
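Because Application Component Metadata is essentially a cross-reference, many of the questions above amount to lookups in one direction or the other. A minimal sketch, assuming the cross-reference is held as (component, relationship, component) entries; the job, program, copybook, and table names and the helper functions are hypothetical. Indexing the entries both ways answers questions such as which jobs execute a program, which programs include a copybook, and which programs are reused.

```python
from collections import defaultdict

# Hypothetical cross-reference entries: (component, relationship, component).
XREF = [
    ("JOB001", "executes", "PGM100"),
    ("JOB002", "executes", "PGM100"),
    ("JOB002", "executes", "PGM200"),
    ("PGM100", "includes", "CUSTREC"),   # CUSTREC: a copybook
    ("PGM200", "includes", "CUSTREC"),
    ("PGM100", "reads", "CUSTOMER"),     # CUSTOMER: a table or file
    ("PGM200", "updates", "CUSTOMER"),
]


def build_indexes(xref):
    """Index the cross-reference in both directions."""
    forward = defaultdict(set)  # (component, relationship) -> components it points at
    reverse = defaultdict(set)  # (component, relationship) -> components pointing at it
    for source, relationship, target in xref:
        forward[(source, relationship)].add(target)
        reverse[(target, relationship)].add(source)
    return forward, reverse


def used_by(reverse, target, relationship):
    """Reverse lookup, e.g. which programs include a copybook."""
    return sorted(reverse.get((target, relationship), set()))


def reused_programs(xref, reverse):
    """Programs executed by more than one job, with the jobs that execute them."""
    programs = sorted({target for _, rel, target in xref if rel == "executes"})
    return {
        program: used_by(reverse, program, "executes")
        for program in programs
        if len(used_by(reverse, program, "executes")) > 1
    }
```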
- Data Access / Reporting Metadata
the steps that must be taken to get authorization to read the data, the description of how the data can be interpreted, available tools, descriptions of reports, … Data Access and Reporting Metadata is typically found within reporting tools and in traditional types of documentation (i.e., desktop databases, word processing documents, and spreadsheets).
– What is the description of a report?
– How do I access the reports?
– What steps should be taken to get authorization to use the data?
– How do the reports select, organize/sort, group, total and display the data?
– What data was used by my report?
– What reports use my data?
– When were the reports last updated?
– Do I have to execute the report myself or are the results already available?
– Where will I find the results?
– much more …
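“What data was used by my report?” and “What reports use my data?” often require following dependencies through several layers (report to view, view to table, table to source file). The sketch below assumes each object’s direct dependencies are recorded in a dictionary; the object names, the `RPT_` prefix used to recognize reports, and the helper functions are hypothetical.

```python
# Hypothetical lineage Metadata: each object -> the objects it reads directly.
DEPENDS_ON = {
    "RPT_MONTHLY_SALES": ["V_SALES_SUMMARY"],
    "RPT_CUSTOMER_LIST": ["CUSTOMER"],
    "V_SALES_SUMMARY": ["ORDERS", "CUSTOMER"],
    "ORDERS": ["ORDERS.DAILY.EXTRACT"],
    "CUSTOMER": ["CUST.MASTER.FILE"],
}


def upstream(obj, depends_on):
    """All data an object uses, directly or indirectly (iterative depth-first walk)."""
    seen, stack = set(), list(depends_on.get(obj, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # the seen set also guards against cycles
            seen.add(current)
            stack.extend(depends_on.get(current, []))
    return sorted(seen)


def reports_using(data, depends_on, report_prefix="RPT_"):
    """Reports whose upstream lineage includes the given data."""
    return sorted(
        name for name in depends_on
        if name.startswith(report_prefix) and data in upstream(name, depends_on)
    )
```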
- Rationalization Metadata
pieces of information can be a select list of data elements that have accepted meanings, histories, and values, and/or the standard pieces of information can come from an enterprise data model. Rationalization Metadata can describe the degree to which data elements represent the same piece of information, and how they differ. Rationalization Metadata is often stored in repositories and traditional types of documentation.
– What are the business names and definitions of these elements?
– How were the standard elements chosen? By whom?
– Are the standard elements verified for re-use?
– Where do the standard elements map to existing data?
– How should the standard elements be used?
– much more …
- Data Quality Metadata
changes over time affect how data can be understood, … Data Quality Metadata describes what actions are taken when data is “bad”, missing, duplicated, … Data Quality Metadata is tracked using data quality tools, repositories, and traditional types of documentation.
– When did the accepted values change?
– How has the definition of the data changed over time?
– When did the definition of the data change?
– What constitutes “bad” data?
– What quality checks were performed against my data?
– What are the quality check procedures? Who wrote and executed them?
– Who analyzed the results?
– With what level of confidence can I trust my data?
– What is the accepted level of confidence before the data is considered “low quality” data?
– much more …
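The last two questions imply that “confidence” can be measured against an agreed threshold. One simple way to do that, sketched below under assumed rules, is to treat confidence as the fraction of rows that pass every quality check. The checks, the accepted status values, the 0.95 threshold, and `assess_quality` are all hypothetical; in practice they would come from your own data quality Metadata.

```python
# Hypothetical checks: each maps a check name to a rule every valid row must meet.
ACCEPTED_STATUS = {"ACTIVE", "INACTIVE", "CLOSED"}
CHECKS = {
    "customer_id present": lambda row: row.get("customer_id") is not None,
    "status is an accepted value": lambda row: row.get("status") in ACCEPTED_STATUS,
}
# Hypothetical accepted level of confidence; below it the data is "low quality".
ACCEPTED_CONFIDENCE = 0.95


def assess_quality(rows, checks=CHECKS, threshold=ACCEPTED_CONFIDENCE):
    """Return per-check pass rates, overall confidence, and a low-quality flag."""
    if not rows:
        raise ValueError("no rows to assess")
    total = len(rows)
    pass_rates = {
        name: sum(1 for row in rows if check(row)) / total
        for name, check in checks.items()
    }
    # Confidence here = share of rows passing every check (one possible measure).
    clean = sum(1 for row in rows if all(check(row) for check in checks.values()))
    confidence = clean / total
    return {
        "pass_rates": pass_rates,
        "confidence": confidence,
        "low_quality": confidence < threshold,
    }
```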
- Computer Operations Metadata
scheduling dependencies, abend procedures, backup and restore procedures, … Computer Operations Metadata can be found in scheduling systems, storage systems, operating systems, server systems, …
– What types of data backup and recovery are available?
– When was the last time my data was backed up, restored, verified?
– What is the process for backing up and restoring data?
– Who is responsible for backup and recovery?
– Who has security privileges to use my data? Create, Read, Update, Delete, …?
– When is the best time to run a program/report against specific data?
– What operations are dependent on data from another process?
– What actions are taken when a job or system fails or abends?
– Who should be called when a job or system fails?
– What version of the software are we running?
– If licensed, how many licenses do we have, and who is using them?
– When are the licenses scheduled to expire?
– When is the next release of the software due to be installed?
– What changes/enhancements are being made to the software with the new release?
– How much disk storage is available?
– How much disk storage is being used? At what rate is the data growing?
– Who allocates storage and should be contacted for questions about disk storage?
– How are the tape storage headers defined?
– much more …
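Storage questions such as how much disk storage is being used and at what rate the data is growing can be answered from periodic usage samples. A minimal sketch, assuming (date, megabytes used) samples are available from your storage system; it computes a simple average growth rate between the earliest and latest samples, which ignores seasonal or irregular growth. The function names and figures are hypothetical.

```python
from datetime import date


def growth_rate(samples):
    """Average growth in MB per day between the earliest and latest samples.

    samples: list of (date, used_mb) tuples, e.g. from a storage report.
    """
    ordered = sorted(samples)
    (first_day, first_mb), (last_day, last_mb) = ordered[0], ordered[-1]
    days = (last_day - first_day).days
    if days <= 0:
        raise ValueError("need samples from at least two different days")
    return (last_mb - first_mb) / days


def days_until_full(samples, capacity_mb):
    """Days until capacity is reached at the average rate, or None if not growing."""
    rate = growth_rate(samples)
    if rate <= 0:
        return None
    latest_used = sorted(samples)[-1][1]
    return (capacity_mb - latest_used) / rate


samples = [(date(2000, 1, 1), 1000), (date(2000, 1, 11), 1500)]
print(growth_rate(samples))            # 50.0 (MB per day)
print(days_until_full(samples, 2000))  # 10.0 (days)
```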
PLEASE feel free to send me additional Questions Metadata Can Answer and I will be glad to add them to the article. Almost every question that is asked during the system development life cycle or
by end-users of data can be answered by Metadata if the Metadata is captured and made available.
Before leaving the article, consider rescanning the “Reading the Questions” paragraphs one more time, and keep in mind that Metadata is very important to running a successful IT “shop”. Metadata also becomes essential to the business community as that community grows closer to, and more dependent on, the services of the IT “shop”.
“Questions Metadata can answer” is a new way to look at a constantly revisited topic (Metadata) without getting bogged down in political mumbo-jumbo. The questions offer a different way to look at the importance of Metadata.