Organizations that are implementing data management and/or data governance programs look to the software tools of the trade as an enabler for their program’s success. While the software does not solve the organizations’ data problems by itself, the metadata stored and managed in these tools becomes a valuable resource to improve understanding, quality, analysis, and protection of data. In fact, some people, myself included, will tell you that you cannot manage or govern your data without metadata.
Metadata is not a new term. Most people know metadata as “data about data”. Many people think of metadata as a technical resource, though it is beginning to be viewed more as a business resource. My definition of metadata is “data recorded in Information Technology tools that improves the business and technical understanding of the data and data-related assets.” Let me break that definition down for you:
Data – Metadata is data. Metadata is data about data – the location, meaning, values, quality…
Recorded in IT tools – Data about your data cannot be made available until it is recorded somewhere.
That improves the business and technical understanding – Everybody needs to understand the data.
Of data and data-related assets – Data about the process, people, reports etc. … associated with data.
The need for metadata is obvious to people in the data management and governance industries and, increasingly, the general public. The term metadata has also taken on other meanings. Metadata is the term now used to describe the detailed information about phone calls and emails. Edward Snowden brought the term into the public spotlight regarding phone tapping, and the former Madame Secretary and First Lady has kept it in the news regarding her emails. The importance of metadata is on the rise. There is no denying this fact.
When organizations begin the process of identifying metadata tools to bolster their data management or data governance programs, it is important to identify both technical and business requirements that will be used to match vendors and tools with the organization’s needs. The business metadata requirements, or how the business will use the information about the data, are extremely valuable to the selection process. These requirements can include the need for metadata about the business rules, the classification and handling of data, and the availability of and access to the data.
However, there are ten specific categories of technical metadata tool requirements that must also be reviewed as part of the selection process, no matter what is defined as the business requirements.
The categories are:
- Metamodels and Software Releases
- Extensibility
- Self-Defined Loads
- Role Representation
- Process Integration
- Change Control and Versioning
- Communications
- End User Requirements
- Training and Education
- Resource Requirements
Metamodels and Software Releases
Metamodels are the logical and physical models that represent the metadata stored in the tool, how that metadata is stored, the connections between the items, the database information, and any other pertinent information regarding the metadata in the tool.
Often vendors reuse components of the metamodels for categories of metadata maintained in their tools. For example, the physical database metamodel will reuse the components of a physical database (that is, database name, table name, column name, and key), and the logical metamodel will reuse those of a logical database design (entities, attributes, and domains), across the different tools they support.
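To make the idea of a physical database metamodel concrete, here is a minimal sketch in Python. The class names, field names, and the sample `sales_dw` database are all illustrative assumptions; they are not taken from any vendor's actual metamodel.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    """One column of a physical table (hypothetical metamodel entity)."""
    name: str
    data_type: str
    is_key: bool = False


@dataclass
class Table:
    """A physical table that owns a list of columns."""
    name: str
    columns: list[Column] = field(default_factory=list)


@dataclass
class Database:
    """A physical database that owns a list of tables."""
    name: str
    tables: list[Table] = field(default_factory=list)

    def column_paths(self) -> list[str]:
        """Return fully qualified 'database.table.column' names."""
        return [
            f"{self.name}.{table.name}.{column.name}"
            for table in self.tables
            for column in table.columns
        ]


# Illustrative content only; the names below are made up.
warehouse = Database(
    name="sales_dw",
    tables=[
        Table(
            "customer",
            [
                Column("customer_id", "INTEGER", is_key=True),
                Column("customer_name", "VARCHAR(100)"),
            ],
        ),
    ],
)
print(warehouse.column_paths())
```

Knowing how a tool's own metamodel connects databases, tables, and columns is what makes this kind of direct reporting against the metadata possible.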
The information in the metamodels will be very important:
- if you will consider accessing the metadata with means other than through the tool itself
- if you will be developing your own metadata load capabilities
- if you will be developing your own reports against the metadata in the tools
- if you will be integrating metadata between tools and platforms
- if you want to assess the quality of the metadata in the tool
Metamodels are often associated with releases or versions of the metadata tool. For example, when a product progresses from version 2.3 to version 2.4, there are typically changes to the way the metadata is stored in the tool. The metamodels are updated to represent those changes.
Extensibility
Extensibility is the ability to add metadata categories and attributes to the out-of-box implementation of the metadata tool, and incorporate them into existing metamodels and tool functionality.
Extensibility becomes important when you find the tool does not precisely meet your needs or requirements. Extensions typically result in changes to the metamodels described above. Organizations typically maintain a development environment to create, test, and integrate the extended changes physically in the tool database.
Populating these extensions with metadata typically requires either the ability to enter and maintain the metadata directly in the tool or a customized (self-defined) load process, discussed in the next section of this article.
Self-Defined Loads
Self-Defined Loads is a term used to describe the ability to design and build customized processes to load metadata into the tool. Vendors typically provide the means to load metadata from a finite number of tools, both their own and those of other vendors. The ability to quickly and easily define and deploy a customized loading process is critical to loading metadata from sources not supported by the metadata tool vendor.
Another reason these loads are important is that, even though you can purchase the “engine” to move metadata from a tool to the metadata tool, vendors often release new versions of their software independently, and the engine may cease to function properly when a vendor changes the way the metadata is stored in its tool. In KIK’s experience, synchronizing the tools’ ability to pass metadata between them becomes a software “wait and see” exercise. Therefore, having the ability to define and deploy self-defined loads becomes critical to keeping the metadata in the metadata tool current.
This feature relates closely to the extensibility function described above. When the metadata tool is extended to include new information, the engines that function between tools do not populate the extensions, or the extensions result in a category or subject area of metadata that cannot be loaded through traditional means. Therefore, the ability to write and maintain customized load functions becomes critical to loading metadata into the tool.
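As a rough illustration of a self-defined load, the sketch below reads column-level metadata from a CSV export and carries along an extension attribute. The function name `load_columns`, the CSV layout, and the `sensitivity` extension field are assumptions made for this example; a real tool would expose its own load interface.

```python
import csv
import io


def load_columns(csv_text: str, extension_fields: tuple[str, ...] = ()) -> list[dict]:
    """Parse column metadata from CSV text into plain records.

    The required fields and the layout are hypothetical; extension
    fields are optional attributes added on top of the base metamodel.
    """
    required = ("database", "table", "column")
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for line_no, row in enumerate(reader, start=2):  # the header is line 1
        missing = [f for f in required if not (row.get(f) or "").strip()]
        if missing:
            raise ValueError(f"line {line_no}: missing required fields {missing}")
        record = {f: row[f].strip() for f in required}
        for f in extension_fields:
            # Extension values may be absent; store None in that case.
            record[f] = (row.get(f) or "").strip() or None
        records.append(record)
    return records


# Made-up sample export from a source the vendor does not support.
SAMPLE = """database,table,column,sensitivity
sales_dw,customer,customer_id,internal
sales_dw,customer,customer_name,confidential
"""

records = load_columns(SAMPLE, extension_fields=("sensitivity",))
for record in records:
    print(record)
```

Rejecting incomplete rows at load time is one possible design choice; it keeps partially described columns out of the tool rather than cleaning them up later.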
Role Representation
Role Representation is the ability of the tool to record people in roles that have been defined as part of your data governance operating model of roles and responsibilities. Clearly defined accountability for data lies at the heart of your data governance program. This accountability must be for the data being governed as well as the metadata associated with that data. Metadata tools must have the ability to associate 1) a person to a role and 2) a role to a function of the tool.
Often, multiple people will play the same or similar roles in a data governance program. For example, there will be several people on the Data Governance Council, there will be several Subject Matter Experts for different data, and multiple people can participate in the same role for the same data. For example, all people that use a specific type of data must abide by the same data protection, risk management and data quality rules. And a single person may participate in several roles.
Therefore, it is important that 1) people can be associated with multiple roles and 2) multiple roles can be associated with data, metadata, and processes at the same time. The metadata tool must be able to represent these relationships effectively.
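The relationships described above (a person to a role, a role to a tool function, and many of each at once) can be sketched as follows. The `RoleRegistry` class, the role names, the function names, and the people are all hypothetical and serve only to illustrate the many-to-many structure.

```python
class RoleRegistry:
    """Minimal many-to-many model of people, roles, data assets, and tool functions."""

    def __init__(self) -> None:
        # Each assignment is a (person, role, asset) triple.
        self._assignments: set[tuple[str, str, str]] = set()
        # Maps a role to the tool functions that role may use.
        self._role_functions: dict[str, set[str]] = {}

    def allow(self, role: str, function: str) -> None:
        """Associate a role with a function of the tool."""
        self._role_functions.setdefault(role, set()).add(function)

    def assign(self, person: str, role: str, asset: str) -> None:
        """Associate a person with a role for a given data asset."""
        self._assignments.add((person, role, asset))

    def roles_of(self, person: str) -> set[str]:
        return {r for p, r, _ in self._assignments if p == person}

    def people_in(self, role: str, asset: str) -> set[str]:
        return {p for p, r, a in self._assignments if r == role and a == asset}

    def can(self, person: str, function: str) -> bool:
        """True if any of the person's roles grants the tool function."""
        return any(
            function in self._role_functions.get(role, set())
            for role in self.roles_of(person)
        )


registry = RoleRegistry()
registry.allow("Data Steward", "edit_definition")
registry.allow("Subject Matter Expert", "comment")
registry.assign("Ana", "Data Steward", "customer")
registry.assign("Ana", "Subject Matter Expert", "orders")
registry.assign("Ben", "Data Steward", "customer")

print(registry.roles_of("Ana"))
print(registry.people_in("Data Steward", "customer"))
print(registry.can("Ben", "comment"))
```

Here one person holds two roles, and two people share the same role for the same data, which are the two cases a tool needs to handle.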
Process Integration
Process Integration is a term used to describe how the metadata tool can become part of daily activities associated with data and the governing of that data. The metadata in the tool will only become valuable if it is stored and used in a manner that makes it easy for people to utilize that metadata to improve efficiency and effectiveness of business operations. Therefore, it is important that the tools can be integrated into processes easily and effectively.
Process integration may include simple metadata processes such as adding new metadata, updating that metadata, and deleting metadata. Or the processes may be more complex, such as providing data meaning and lineage along with the data on a report or a screen, or coordinating the feedback process on new or refined data definitions. The ability to put the metadata in the hands of people is paramount to metadata tool success.
Change Control and Versioning
Change Control is an important aspect of metadata tool implementation. Loading and entering metadata into the tool requires significant attention but the manual and automated change control of the metadata in the tool is critical to the on-going successful implementation and use of the tool. Without successful change control, the metadata that is stored in the tool becomes a “snap-shot” of the metadata at a point in time. For example, loading the database structure for the data in the data warehouse into the metadata tool is extremely important. However, keeping that information up to date when the data warehouse design evolves is critical to maintaining current and high quality of metadata in the tool.
Change Control will differ depending on how the metadata is loaded in the tool. If the metadata load routine is automated (for example you scan the database catalogue to load tables and columns), a process and function must be put in place to update the metadata tool to reflect changes to the tables and columns. Automated change control can be triggered automatically if there is a “hook” or alert built into the process of making the database change. If metadata is entered manually into the tool, processes must be put in place to make certain that metadata is kept up to date and current.
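One way to picture automated change control is to compare what the tool currently holds with a fresh scan of the database catalogue. The sketch below assumes both are available as simple nested dictionaries (table name to column name to data type); the function name `diff_catalogue` and the sample data are illustrative only.

```python
def diff_catalogue(
    stored: dict[str, dict[str, str]],
    scanned: dict[str, dict[str, str]],
) -> dict[str, list[str]]:
    """Compare stored metadata with a fresh catalogue scan.

    Both arguments map table name -> {column name -> data type}.
    Returns the added, removed, and changed columns as 'table.column' names.
    """

    def flatten(snapshot: dict[str, dict[str, str]]) -> dict[str, str]:
        return {
            f"{table}.{column}": data_type
            for table, columns in snapshot.items()
            for column, data_type in columns.items()
        }

    old, new = flatten(stored), flatten(scanned)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Made-up snapshots: what the metadata tool holds vs. what the catalogue now says.
stored = {
    "customer": {"customer_id": "INTEGER", "name": "VARCHAR(50)", "fax": "VARCHAR(20)"},
}
scanned = {
    "customer": {"customer_id": "INTEGER", "name": "VARCHAR(100)", "email": "VARCHAR(255)"},
}

changes = diff_catalogue(stored, scanned)
print(changes)
```

A comparison like this could be run on a schedule or triggered by the “hook” in the database change process, with the resulting differences applied to the metadata tool.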
Metadata that is out of date or incorrect can result in misuse and misunderstanding of the data, nullifying the positive impact of the metadata tool and potentially leading to decision making based on misinformation. Change control is put in place to ensure that the metadata is current and of high quality.
Versioning is also an important aspect of tool implementation. A record of changes to metadata becomes a valuable asset to the organization. Metadata tools typically have effective capabilities to keep versions (for example, to track the history of changes) of the different categories of metadata, from data warehousing and database metadata to modelling, movement, and reporting metadata.
Versioning also allows for the management of data elements that share a name but have different meanings depending on the usage or physical storage of the data. It is therefore also necessary to be able to store different versions of things with the same name. Metadata tool vendors must be able to provide information on change management and versioning.
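Both uses of versioning, keeping a history of changes and holding same-named elements with different meanings, can be sketched by keying each history on a name plus a context. The `DefinitionHistory` class, the notion of a "context" string, and the sample definitions below are assumptions for illustration, not a description of how any particular tool stores versions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DefinitionVersion:
    """One immutable version of a data element definition."""
    version: int
    text: str
    recorded_at: datetime


class DefinitionHistory:
    """Keeps every version of a definition, keyed by (element name, context)."""

    def __init__(self) -> None:
        self._history: dict[tuple[str, str], list[DefinitionVersion]] = {}

    def record(self, name: str, context: str, text: str) -> DefinitionVersion:
        """Append a new version rather than overwriting the previous one."""
        versions = self._history.setdefault((name, context), [])
        entry = DefinitionVersion(len(versions) + 1, text, datetime.now(timezone.utc))
        versions.append(entry)
        return entry

    def current(self, name: str, context: str) -> DefinitionVersion:
        return self._history[(name, context)][-1]

    def history(self, name: str, context: str) -> list[DefinitionVersion]:
        return list(self._history.get((name, context), []))


# The same element name carries different meanings in two hypothetical contexts.
definitions = DefinitionHistory()
definitions.record("status", "orders", "Fulfilment state of an order.")
definitions.record("status", "orders", "Fulfilment state of an order (open, shipped, closed).")
definitions.record("status", "customer", "Whether the customer account is active.")

print(definitions.current("status", "orders").version)
print(definitions.current("status", "customer").text)
```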
Communications
Communications also lies at the heart of a successful data governance program. The metadata tool’s ability to improve communications associated with critical data becomes vital to getting the most out of the metadata tool.
It is important that the metadata tool provide the most basic of communications functions. These functions can be as simple as providing the phone number or the email address of a person that needs to be contacted regarding a question about the data or a report or a process, or as complex as sending and monitoring emails that are sent to all participants in a specific process or notifying all users that a rule associated with the data has changed (compliance rule, business rule, etc.).
End User Requirements
End User Requirements cover what users need in order to navigate the tool and gain access to the information required to complete their job function. In short, they cover the functions and operations that must be associated with the critical metadata in order to get the most out of the metadata tool.
It is important that the metadata tool provide the most basic of metadata end-usage functions. These functions can be as simple as providing and limiting metadata availability based on an individual’s or group’s approved access and capabilities.
Training and Education
Training and Education must be provided to orient people to the metadata tool, its functionality, and how it can and will be used. Most vendors provide significant training to their customer base via different channels (for example off-site, on-site, on-line/remote and on-demand). It is important that your company consider training and education in the metadata tool evaluation process.
Training and Education are typically part of the purchase and maintenance price and must be considered as part of the total cost of ownership. Training and education must be considered for management of the product (including installation, testing, development of the test and production environments, maintenance and new releases of the tool, development and synchronization of metadata engines for moving metadata to and from the tool) as well as usage of the product.
End-user Training and Education also becomes very important in the implementation of the tool. Providing the end user base with a solid foundation on data governance and metadata definition, production and usage is important as well as teaching them and refreshing them on tool access and functionality. On-going end-user support is also extremely important not only when the tool is being rolled out to technical and business users, but on a regular and as-needed basis as well.
Resource Requirements
Resource Requirements are a critical success factor when it comes to investing in and implementing a metadata tool. Metadata tools require (sometimes significant) resources to achieve a successful and sustainable implementation. The number of required resources varies depending on the size of the company and the size, breadth, and goals or expectations of the metadata implementation. Organizations have deployed anything from a single part-time resource to a dedicated team of individuals responsible for the tool implementation.
More often than not, having the resources required to implement and sustain a metadata tool becomes one of the key determining factors of success. KIK Consulting often observes that companies operate with a very lean staff, with people participating in many different roles. Therefore, it is critical that your company gain a thorough understanding of the resources that will be required to get the most value and use out of its metadata tool. The vendor should be able to provide approximations of the resources required to deploy the tool and to support the technical and business communities within the company, based on similar implementations with expectations similar to those of your company.
I hope this list of technical metadata tool requirements and their descriptions will be useful in your search for the best metadata tool to support your data management and data governance programs. Each of these categories requires specific questions that should accompany any Request for Proposal (RFP) that will be delivered to evaluate and select the vendor and metadata tool that is right for you. Feel free to request a sample list of technical metadata tool requirement questions that are associated with each category.
While matching business requirements with appropriate tools or toolsets is invaluable, the technical tool requirements must play a valuable role in selecting the best tool for your environment. In a future article I will write about how to collect, record, and share business requirements for metadata to support your data management and data governance programs.