Introduction
Meta data exists in every function of Information Technology (IT). Information about data and data-related processes exists in Application Development/Package Implementation, Information Resource Management, Data Center Operations and Network Services to name just a few IT functions. The volume of meta data that is generated by a company makes the task of managing this meta data seem like a monstrous, if not impossible, feat.
One reason meta data efforts struggle to get off the ground is that meta data is considered a technical resource. Dedicated Information Resource Management (IRM) managers and staff find themselves expending a large amount of energy convincing executive management and business management that meta data is vital to the well-being of the company. It is rare to find a company that funds its meta data initiatives through money allocated by business management. Meta data efforts often start and end with funds provided under the IT budget, specifically that of the IRM division.
To be successful, IRM management must first properly size and understand the effort that will be required to manage meta data. As the first step towards sizing the effort, a company must select a finite amount of meta data to be managed.
Since the IRM budget is used to fund the effort, it is logical that meta data that originates in and is controlled by IRM should be the focus of the effort. IRM can be separated into sub-functions such as data administration, database administration, data movement, and business intelligence to make it easier to select the right meta data to manage. This article breaks IRM meta data into these four categories as a starting point for selecting the “right” meta data to manage.
The Categories of IRM Meta data to be discussed in this article include:
- Data Administration Meta data
- Database Meta data
- Data Movement Meta data
- Business Intelligence Meta data
Once the categories of meta data are identified, companies must define specific meta data in that category that will be managed, how this meta data will be used, and who will benefit from having this meta data. Meta data advocates must be able to answer the following questions in regard to the categories of meta data that are selected:
- What meta data types make up this category?
- What questions will this meta data answer?
- Who will benefit from the availability of this meta data?
This article defines four categories of IRM meta data (Data Administration meta data, Database meta data, Data Movement meta data, and Business Intelligence meta data) and answers these three questions for each category.
Data Administration Meta data
In order to properly use the data in a data warehouse, data users must understand the data and have access to quality business definitions of the data. The management of data definition is a primary responsibility of the data administration or data management unit of a company. Data Administration facilitates the identification and capture of data definitions; however, it is not the responsibility of data administration to create the business definitions. That responsibility falls on the shoulders of the business.
There are several reasons why companies find it difficult to manage the business definition of data. In companies with decentralized MIS or in companies with poor data administration practices, it is “normal” to define data as it pertains to a particular business function and not the enterprise as a whole. This results in the creation of multiple (and often different) definitions of the same data for a single company.
Mergers and acquisitions bring multiple companies, each with its own definitions of data, together as a single enterprise, compounding the problem even further. Many large companies have lived through decentralized IT, poorly managed data administration and merged business units.
An inability to manage data definition can result in an extremely large amount of time being spent performing unproductive work. It is not out of the ordinary for data analysts to spend eighty percent of their time identifying and researching data, leaving only the other twenty percent for performing analysis. The amount of unproductive work alone can be used as a means of justifying the need for managed meta data. Meta data strategies should focus a large percentage of the effort on reversing those numbers so that data users spend more time doing their jobs and less time trying to locate and understand the data.
There are typically two places in the organization where the business definition of data exists. The first place is in people’s heads. Unwritten rules for defining, requesting, entering, processing and making decisions from data exist at every point in the company where data is touched. Companies that survive on unwritten rules and definitions of data make themselves vulnerable to low-quality data and data misuse as a result of the lack of consistency and the lack of confidence in the data.
The second place that business definitions of data exist is in data models. Data modeling tools such as ERwin and Cayenne do an acceptable job of collecting logical business information about data including (but not limited to) subject areas, data entities, attributes, domain values and business rules. Data modeling tools are also used to define the physical representation of data and to generate data definition language (see database meta data) for database administration.
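To make the idea concrete, the logical information a data model holds can be pictured as a small set of linked records. The Python sketch below is illustrative only: the class names, the "Customer" entity, its definition, and its steward are hypothetical and do not reflect the format of any particular modeling tool.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    definition: str
    # Domain values: code -> business meaning (empty if unrestricted).
    allowed_values: dict = field(default_factory=dict)


@dataclass
class Entity:
    name: str
    definition: str
    steward: str  # who is accountable for the definition
    attributes: dict = field(default_factory=dict)


# Hypothetical model content, for illustration only.
customer = Entity(
    name="Customer",
    definition="A party that has purchased at least one product.",
    steward="Sales Operations",
)
customer.attributes["Status"] = Attribute(
    name="Status",
    definition="Whether the customer relationship is currently open.",
    allowed_values={"A": "Active", "C": "Closed"},
)


def describe(entity: Entity, attribute_name: str) -> str:
    """Answer 'what does this attribute mean and what values can it take?'"""
    attr = entity.attributes[attribute_name]
    values = ", ".join(f"{code}={meaning}" for code, meaning in attr.allowed_values.items())
    return (
        f"{entity.name}.{attr.name}: {attr.definition} "
        f"Values: {values}. Steward: {entity.steward}."
    )


print(describe(customer, "Status"))
```

Even a structure this small is enough to answer several of the questions a data user typically asks: how an attribute is defined, what its codes mean, and whom to contact about it.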
Companies that are inconsistent in how they manage their data models or companies that use data modeling tools only on occasion can suffer from the same problems as those with decentralized IT, poorly managed data administration and merged business units. Multiple data modeling tools can also propagate inconsistent data definition if the information captured in the tools is not shared between data modelers or delivered to the data users.
The lists that follow include samples of data administration meta data types, questions that data administration meta data can answer, and the individuals in the company that will benefit from the availability of data administration meta data.
Data Administration Meta data Types:
- Data Model Names and Descriptions
- Entity Names and Descriptions
- Attribute Names and Descriptions
- Domain Names and Descriptions
- Value Tables and Allowable Values
- Key Attributes
- Business Rules Relating Entities
- Anomaly Information / Confidence Factors / Missing Rules
- Data Rationalization and Aliases / Data Mapping
- Data Modeling Standards / Policies / Procedures / Restrictions / Landmines
- Business Policies Affecting Data Capture / Data Reporting
- Mapping Between Logical Data Models and Physical Databases
- Changes to Policies and Data Models Over Time – Versioning
- Translation of Business Names to Physical Names and Vice-Versa
- Glossary / Token / Abbreviation Information
- Accountability / Stewardship Information
Questions That Data Administration Meta data Can Answer:
- What business entities exist in the company and how are they defined?
- What facts (attributes) are kept about those business entities?
- Does a standard definition exist for this piece of data? What is that definition?
- What values can those attributes take on? How can those values be interpreted?
- What other information is available that is related to this business entity?
- How have business changes impacted the data’s definition?
- What effect have these changes had on how the data can be interpreted?
- How does the definition of this data differ from business area to business area?
- Who is responsible for the quality of the data and data definition?
- Who do I contact if I have a question about the business definition and use of data?
Who Will Benefit From Data Administration Meta data:
- Application Development / Package Implementation Management and Staff
- Business Analysis Management and Staff
- Business Management and Staff
- Data Administration Management and Staff
- Data Analysis Management and Staff
- Data Architects
- Data Research Management and Staff
- Data Warehouse Architects
- Database Administration Management and Staff
Database Meta data
Database management systems such as DB2, Oracle, Sybase, SQL Server and Informix provide the DBMS catalog as the meta data resource for the database environment. All of the information found in the catalog is meta data. DBAs often have direct access to the physical database meta data through database management tools. It is for this reason that database administrators do not complain about having a lack of access to meta data. DBAs are often the first and largest users of meta data in the company.
Database meta data includes information about the physical databases, tables, views of tables, columns, indexes, partitions, packages, plans, … some of which can be very useful to a data user. Information about the table structures, columns, and indexes is absolutely required by anyone who must query information from the DBMS. This information is always available to the database administrator through the catalog, but it is not standard operating procedure to allow general access to the database catalog. Life would not be pretty if the DBAs’ access to the catalog were cut off for one hour, let alone one day.
One difference between data administration meta data and database meta data is that the DBMS cannot operate without the database meta data that is captured in the DBMS catalog. This meta data is typically fed to the catalog through the execution of data definition language (DDL). DDL can be written manually or generated as output from a data modeling tool.
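The relationship between DDL and the catalog can be seen in a few lines of code. The sketch below uses SQLite only because it ships with Python; it is not one of the products named above, and each DBMS exposes its catalog through its own tables and views. The table, column, and index names are hypothetical.

```python
import sqlite3

# Hypothetical DDL, as might be written by hand or generated by a modeling tool.
DDL = """
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    region_code   TEXT
);
CREATE INDEX ix_customer_region ON customer (region_code);
"""


def load_catalog(ddl: str) -> dict:
    """Execute DDL in an in-memory database, then read the meta data back."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    catalog = {}
    # sqlite_master is SQLite's catalog table: one row per table, index or view.
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = [
            {"name": row[1], "type": row[2], "not_null": bool(row[3])}
            for row in conn.execute(f"PRAGMA table_info({table})")
        ]
        indexes = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'index' AND tbl_name = ? ORDER BY name",
                (table,),
            )
        ]
        catalog[table] = {"columns": columns, "indexes": indexes}
    conn.close()
    return catalog


catalog = load_catalog(DDL)
for table, info in catalog.items():
    print(table, [c["name"] for c in info["columns"]], info["indexes"])
```

Nothing was entered into the catalog by hand here: executing the DDL populated it, and the same catalog then answers "what columns are stored on this table?" and "how is the data indexed?".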
When questioned in detail, DBAs are quick to admit that they do not have all of the meta data that they need to do their jobs. Increased complexities in database management make it necessary for DBAs to look for network and server information, security information, and data ownership information that is not included in the database catalogs.
Legacy applications often use copybooks or record layouts as their means of defining data in flat files, VSAM files, and IMS segments. Copybooks do not contain the same level of detail about the physical data as the DBMS catalog. However, copybooks contain significant amounts of information about the data in the files (copybook names, record names, element names, PIC clauses, 88-values, …). It is important to consider copybooks as a primary source of database meta data.
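As a rough illustration of how much meta data a copybook carries, the Python sketch below pulls element names, PIC clauses and 88-values out of a small layout. The copybook fragment is hypothetical, and the parser is deliberately simplistic: it assumes one clause per line and ignores features such as OCCURS, REDEFINES and continuation lines that a real parser would have to handle.

```python
# Hypothetical copybook fragment; real layouts are usually far larger.
COPYBOOK = """\
       01  CUSTOMER-RECORD.
           05  CUST-ID          PIC 9(8).
           05  CUST-NAME        PIC X(30).
           05  CUST-STATUS      PIC X.
               88  STATUS-ACTIVE    VALUE 'A'.
               88  STATUS-CLOSED    VALUE 'C'.
"""


def parse_copybook(text: str) -> list:
    """Return one dict per data element, with its 88-level values attached."""
    elements = []
    for raw in text.splitlines():
        # Drop the terminating period, then split into whitespace-separated tokens.
        tokens = raw.strip().rstrip(".").split()
        if len(tokens) < 2 or not tokens[0].isdigit():
            continue  # skip blank lines and anything that is not an entry
        level, name = int(tokens[0]), tokens[1]
        if level == 88:
            # An 88-level names an allowable value of the element above it.
            value = tokens[tokens.index("VALUE") + 1].strip("'")
            elements[-1]["values"][name] = value
            continue
        pic = tokens[tokens.index("PIC") + 1] if "PIC" in tokens else None
        elements.append({"level": level, "name": name, "pic": pic, "values": {}})
    return elements


for element in parse_copybook(COPYBOOK):
    print(element)
```

The result is the copybook equivalent of column names, data types and allowable values, which is why a copybook can stand in for the catalog when the data lives in files rather than in a DBMS.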
The lists that follow include samples of database meta data types, questions that database meta data can answer, and the individuals in the company that will benefit from the availability of database meta data.
Database Meta data Types:
- Database Names
- Table / View Names
- Copybook Names
- Column Names
- Indexes
- Network / Server Information and Addresses
- Schema Names / Dimensions / Facts
- Connectivity Information
- Authority / Privileges Information
- Balancing Row Information
- Row Counts / Growth Information
- Data Usage / Activity / Timing
- Data Access Performance / Timing
- Data Refresh Schedules / Completion Information
- Accountability / Stewardship Information
Questions That Database Meta data Can Answer:
- What databases exist and what are the tables that make up each database?
- What columns are stored on each of the tables?
- What copybooks exist and what elements make up the copybooks?
- What are the physical characteristics of the data?
- How is the data indexed?
- What data is being used? By whom? When is the data “busy”?
- How will I know if I receive the information that I request?
- Are the business descriptions in sync with the databases?
- When was the last time this data was updated? Did that process end successfully?
- Who are the business stewards responsible for the databases?
- Who do I contact if I have a database problem or question?
Who Will Benefit From Database Meta data:
- Application Development / Package Implementation Management and Staff
- Business Analysis Management and Staff
- Business Management and Staff
- Data Administration Management and Staff
- Data Analysis Management and Staff
- Data Architects
- Data Research Management and Staff
- Data Warehouse Architects
- Database Administration Management and Staff
Data Movement Meta data
Data does not stand still. Data moves from process-to-process, function-to-function, database-to-database, and business enterprise to business enterprise. Data is, perhaps, the most dynamic asset of the company. As data is created, processed, and moved, the mapping of data from one place to another and the activity that takes place during these processes (value changes, calculations/derivations, …) determine exactly how data should be interpreted.
Data movement meta data has always been captured in programs that manipulate and process data. This programming code that drives the data movement and transformation processes is not typically made available to data users. Even when program code is provided to the data user, the code is time consuming to decipher and difficult to understand. As a result, data movement activities are not generally understood by data users.
Data movement information is recognized as a key ingredient in the understanding of data in decision support databases. Data users require information about where the data came from and the values that the data take on when they are asked to move away from their “comfortable” sources of data to a new data resource called the data warehouse or data mart.
As a direct result of the growth of the data warehousing industry, software products have been created to make it possible for information about data movement to be entered (without being coded) into a tool. These tools allow the person responsible for the data movement to enter instructions that define the extraction, transformation and loading of data. Tools such as Prism, ETI, and Carleton are entry points for data movement meta data.
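The difference between movement logic buried in program code and movement logic held as meta data can be sketched in Python. In the example below, each source-to-target mapping is a record that both drives the transformation and can be read back to explain it. The element names, target names and rules are hypothetical, and the sketch does not represent how any of the tools named above store their mappings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Mapping:
    source: str   # source element name
    target: str   # target column name
    rule: str     # transformation logic, stated for the data user
    transform: Callable[[str], object]  # the same logic, in executable form


# Hypothetical source-to-target mappings, for illustration only.
MAPPINGS = [
    Mapping("CUST-ID", "customer_id", "cast to integer", int),
    Mapping("CUST-NAME", "customer_name", "trim trailing spaces", str.rstrip),
    Mapping("CUST-STATUS", "is_active", "'A' means active", lambda v: v == "A"),
]


def move(record: dict) -> dict:
    """Apply every mapping to one source record to build the target row."""
    return {m.target: m.transform(record[m.source]) for m in MAPPINGS}


def lineage(target: str) -> str:
    """Answer 'where does this data come from?' for one target column."""
    m = next(m for m in MAPPINGS if m.target == target)
    return f"{m.target} <- {m.source} ({m.rule})"


row = move({"CUST-ID": "00001234", "CUST-NAME": "ACME CORP   ", "CUST-STATUS": "A"})
print(row)
print(lineage("is_active"))
```

Because the mapping list is data rather than procedural code, the same records that move the data can be published to data users without asking them to read a program.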
The lists that follow include samples of data movement meta data types, questions that data movement meta data can answer, and the individuals in the company that will benefit from the availability of data movement meta data.
Data Movement Meta data Types:
- Source Element / Table / Database Names
- Target Element / Table / Database Names
- Source to Target Mappings
- Transformation Logic and Types
- Transformation Versioning
- Mapping Values and Types
- Calculations / Derivation Rules
- Movement Timing / Completion Information
- Staging Information
- Verification / Confidence Information
- Accountability / Stewardship Information
Questions That Data Movement Meta data Can Answer:
- Where does this data come from?
- What source systems supply data to this reporting data source?
- What data is supplied by each source system?
- How is the data extracted? What selection criteria are used?
- How is this data value determined?
- What changes occur in the movement of data from source to target?
- Where does this data feed other data?
- Has the data value always been determined this way?
- Who is responsible for the data creation?
- When was the data moved?
- Did the data extraction / transformation / load complete acceptably?
- What actions are taken with data exceptions?
- Who do I contact if I have a problem or question about how data was moved or transformed?
Who Will Benefit From Data Movement Meta data:
- Application Development / Package Implementation Management and Staff
- Business Analysis Management and Staff
- Business Management and Staff
- Data Administration Management and Staff
- Data Analysis Management and Staff
- Data Architects
- Data Research Management and Staff
- Data Warehouse Architects
- Database Administration Management and Staff
Business Intelligence Meta data
The last category of IRM meta data to be discussed in this article is that of business intelligence meta data. For the purpose of this article, business intelligence meta data will include information that enables data users to turn data into knowledge. A case can be made that the three prior categories also fall under the business intelligence category using this definition. Business intelligence meta data goes beyond the data definition, beyond the data structure, beyond the data movement, to enable data users to turn the data into knowledge.
To reverse the 80% research and 20% analysis time breakdown (from the data administration meta data category introduction), companies will be required to provide business intelligence meta data that makes the data user’s time more productive.
Business intelligence meta data includes catalogs of reports and queries that have been written, verified, and are presently being distributed. Business intelligence meta data includes connectivity information, contact information for data stewards that can grant data authority, tools that are available to access the data, and help desk information to quickly resolve data access issues.
Without these types of meta data, data users get frustrated and struggle to gain access to the data that they need. Without this meta data, data users create their reports and queries from their own library of queries or from a blank piece of paper.
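A report catalog of the kind described above need not be elaborate to be useful. The Python sketch below shows one possible shape for it, assuming a handful of fields per report; the report names, authors, data sources and contact value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportEntry:
    name: str
    author: str
    data_sources: tuple
    verified: bool   # has the report passed the verification process?
    contact: str     # where to go with access problems


# Hypothetical catalog entries, for illustration only.
CATALOG = [
    ReportEntry("Monthly Sales by Region", "J. Smith", ("sales_mart",), True, "help-desk"),
    ReportEntry("Customer Churn Draft", "A. Jones", ("customer_dw", "sales_mart"), False, "help-desk"),
    ReportEntry("Open Orders", "J. Smith", ("orders_db",), True, "help-desk"),
]


def find_reports(data_source: str, verified_only: bool = True) -> list:
    """List the names of reports that read from a given data source."""
    return [
        r.name
        for r in CATALOG
        if data_source in r.data_sources and (r.verified or not verified_only)
    ]


print(find_reports("sales_mart"))
print(find_reports("sales_mart", verified_only=False))
```

A data user who can run a lookup like this before starting work has an alternative to the blank piece of paper: an existing, verified report and the name of the person who wrote it.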
The lists that follow include samples of business intelligence meta data types, questions that business intelligence meta data can answer, and the individuals in the company that will benefit from the availability of business intelligence meta data.
Business Intelligence Meta data Types:
- Connectivity and Security Procedures
- Report / Query Tools Available
- Verified Reports / Query Names
- Verification Process Information
- Report Distribution Information
- Report Data Source Information
- Help Desk Information
- Accountability / Stewardship Information
Questions That Business Intelligence Meta data Can Answer:
- What reports / queries that are already written may give the results I need?
- Who else uses this report / query?
- Where did the data on these reports come from?
- What confidence can I have in report / query results?
- Who wrote these queries / reports?
- Have these reports / queries been verified?
- Who wrote these queries and who uses them?
- Who do I contact if I need authority to view certain data?
- How do I gain access to a particular data source?
- Who do I contact if I have a problem getting connected to the database?
Who Will Benefit From Business Intelligence Meta data:
- Business Analysis Management and Staff
- Business Management and Staff
- Data Administration Management and Staff
- Data Analysis Management and Staff
- Data Architects
- Data Research Management and Staff
- Data Warehouse Architects
- Executive Management and Staff
Summary
Prospective meta data managers can consider this information as a starting point for a meta data initiative. The IRM functions of data administration, database administration, data movement and business intelligence are the ideal focus for a meta data management effort.
This article has provided information about four categories of IRM meta data that can vastly improve a company’s understanding and ability to use its data. The article included samples of the types of meta data that exist in each category, questions that these meta data types will answer, and the individuals that will benefit from the availability of this meta data. The breakdown of meta data into categories and the lists provided in this article provide an easy to understand introduction to selecting the “right” meta data to manage.