The Road to a Well-Ordered Enterprise is Through Metadata Management

Published in April 2007

No one would dispute that enterprises should have their financial books in order. How else can you know where you’ve been, where you are and where you project you’d like to go? But just
what is the “data” that’s in the financial books? It’s not the real data; rather, it’s an abstract representation of the real data. If it were the real data, then you’d actually see
inventory data, reserves data, manufacturing cost data and HR benefits data. But you don’t. What you see are abstracted financial representations of these data. That is, when you look at
financial statements, you’re looking at data about these data, or, to use the formal term, you’re looking at metadata.

What, then, is the financial system that manages all the financial data? It’s a financial metadata management system. Of course, these systems go by familiar names such as General
Ledger. A key measure of the quality of the enterprise is the quality of the financial metadata within its financial metadata management system. You cannot have enterprise-wide, integrated and
non-redundant quality financial data about your products, sales, employees and customers without an integrated, federated and non-redundant financial [metadata] management system. They go hand in glove.

Analogously, IT has its set of books, and those books contain metadata. As with financial metadata, you need a metadata management system to manage IT metadata. It also follows that a key measure
of the quality of the enterprise is the quality of its IT metadata management system. You cannot have enterprise-wide, integrated and non-redundant quality IT metadata without an integrated,
federated and non-redundant metadata management system. They, too, go hand in glove.

It’s all about quality, and this paper focuses on quality IT metadata. How important is good data quality? A Google search on the phrase “data quality” turns up more than 2,600,000 hits,
largely dealing with data quality problems and solutions. Data quality problems are rooted in discordant semantics, that is, conflicting rules for meaning and usage. For example, what happens if your
company lacks policies for:

  • Having or not having dashes in social security numbers
  • Using 0 and 1 for Gender, vs. 1 and 2 for Gender (value domain mismatch)
  • Name consistency: Mike Gorman vs. Michael M. Gorman (same person, different name)
  • Name differentiation: Michael M. Gorman vs. Michael M. Gorman (different persons, same name)
  • Standard formulas: For instance, March East Region Sales = March Sales of NE Division + March Sales of SE Division, but the March Sales of NE Division is Net After Expenses, while the
    March Sales of SE Division is Total Monthly Sales, so the formula adds together two different measures.
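The first two policy gaps above can be made concrete with a minimal Python sketch that enforces one canonical representation at the point of capture. The canonical forms (nine digits with no dashes; Gender stored as "F"/"M"), the system names `system_a` and `system_b`, and their code mappings are all hypothetical. They illustrate what a written policy would pin down and are not a recommended standard.

```python
import re

# Hypothetical enterprise policy: SSNs are stored as nine digits with no dashes.
# Input may be "123-45-6789" or "123456789"; anything else is rejected.
SSN_PATTERN = re.compile(r"(\d{3}-\d{2}-\d{4}|\d{9})")

# Hypothetical source encodings for Gender, mapped onto one enterprise value domain.
GENDER_DOMAINS = {
    "system_a": {0: "F", 1: "M"},  # assumed: system A codes Gender as 0/1
    "system_b": {1: "F", 2: "M"},  # assumed: system B codes Gender as 1/2
}


def canonical_ssn(raw: str) -> str:
    """Return the SSN in the single enterprise format, or reject it."""
    value = raw.strip()
    if not SSN_PATTERN.fullmatch(value):
        raise ValueError(f"SSN not in an accepted format: {raw!r}")
    return value.replace("-", "")


def canonical_gender(system: str, code: int) -> str:
    """Translate a source system's Gender code into the enterprise value domain."""
    try:
        return GENDER_DOMAINS[system][code]
    except KeyError:
        raise ValueError(f"{code!r} is not a valid Gender code for {system!r}") from None


print(canonical_ssn("123-45-6789"))     # 123456789
print(canonical_gender("system_a", 1))  # M
print(canonical_gender("system_b", 1))  # F
```

The same code value, 1, means different things in the two systems. Only a shared value-domain definition, which is metadata, makes that visible.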

The challenge is not simply whether there are data quality issues, nor how to fix them, but how to design these data quality issues out of the IT process from the very beginning. Designing for
data quality not only makes errors faster to fix; it also makes IT systems faster to develop and cheaper to evolve and maintain.

The way to achieve data quality is to install an infrastructure of quality IT metadata, and with it a quality metadata management system. But before you can do that, you’ll probably have to
convince your boss of the benefits of metadata management.

Convincing the Boss

Any enterprise-wide metadata management system and infrastructure has to have a beneficial impact on the bottom line. Here are four key measures that will determine your success:

  • Improves quality
  • Improves productivity
  • Decreases cost
  • Decreases risk

If your installed metadata management system cannot deliver these four key measures, then you should and will be judged a failure. So, what are you going to do? Here are six suggestions for
measuring the quality of IT metadata, which, when accomplished, will lead to improved quality and productivity, and decreased cost and risk:

  • Design away ETL
  • Consolidate and integrate
  • Standardize reference data
  • Engineer and manage your master data
  • Assign enterprise identifiers
  • Remove redundancies

Design Away ETL

Determine how many programs are being written to essentially perform an extract, transform and load (ETL) function. For example, if your sales, customer management, ordering, inventory and billing
systems are all “stove pipes,” then ETL software is required to bring all that data together to achieve total customer or product management. If the data across these systems had high-quality,
integrated and non-redundant semantics, and/or if these systems operated off a single integrated database, then these programs would largely be unneeded. In place of massive quantities of ETL
programs, the enterprise can move to increasing levels of integration, ranging from integrated semantics across multiple databases and information systems, to an integrated database that can be accessed by all
the separately created information systems, to a completely integrated system. Yes, programs would still be needed for building summaries and the like; but in most enterprises, online analytical
processing (OLAP) programs largely fill that role. The existence of these ETL systems, therefore, is a direct consequence of not having a quality metadata infrastructure. Compute the life cycle cost of
these systems and add it to the cost of bad metadata management.
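To show where ETL code comes from, here is a minimal Python sketch of the "transform" step between two hypothetical stove-pipe systems. The record layouts (`SalesCustomer`, `BillingAccount`), the status codes, and the assumption that the two systems' keys line up are all invented for illustration. The point is that the mapping exists only because the two systems define "customer" and "status" differently.

```python
from dataclasses import dataclass


# Hypothetical stove-pipe records: the sales and billing systems describe
# the same customer with different field names and different status codes.
@dataclass
class SalesCustomer:
    cust_no: str
    name: str
    status: str  # assumed codes: "A" = active, "I" = inactive


@dataclass
class BillingAccount:
    account_id: str
    customer_name: str
    active: bool


# The "T" in ETL: this mapping exists only because the two systems
# define customer status differently.
SALES_STATUS_TO_ACTIVE = {"A": True, "I": False}


def transform(rec: SalesCustomer) -> BillingAccount:
    return BillingAccount(
        account_id=rec.cust_no,  # assumed: the two systems' keys happen to line up
        customer_name=rec.name,
        active=SALES_STATUS_TO_ACTIVE[rec.status],
    )


def run_etl(sales_rows, billing_store: dict) -> dict:
    """Extract sales rows, transform them, and load them keyed by account_id."""
    for rec in sales_rows:
        acct = transform(rec)
        billing_store[acct.account_id] = acct
    return billing_store


store = run_etl([SalesCustomer("C1", "Acme", "A"), SalesCustomer("C2", "Beta", "I")], {})
print(store["C1"].active, store["C2"].active)  # True False
```

If both systems shared one definition of customer and status, `transform` would reduce to the identity, and the program would have no reason to exist.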

Consolidate and Integrate

Determine how long it takes to design a program to process against a database. Does the program have to “fight” with the database’s design and build all sorts of temp files, extracts and
the like? All of that is generally due to bad database design. If your enterprise policy is well engineered – that is, integrated, not conflicting, and non-redundant – then so too will
be your database designs. If a database is very well designed, then building the essential logic of the program is a code-generation step. If you are building many database integration programs,
then you need to consider redesigning your databases to be fewer in quantity and more integrated. In short, consolidate and integrate.
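The claim that a well-designed database makes program logic "a code-generation step" can be illustrated with a small Python sketch that generates SQL from table metadata. The `Column`/`Table` structures and the `customer` definition are hypothetical. The sketch assumes every table has at least one key column and that table and column names come from trusted metadata rather than user input. It runs the generated statements against an in-memory SQLite database using the standard-library `sqlite3` module's named (`:name`) parameters.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    sql_type: str
    is_key: bool = False


@dataclass(frozen=True)
class Table:
    name: str
    columns: tuple  # tuple of Column


def generate_create(t: Table) -> str:
    cols = ", ".join(f"{c.name} {c.sql_type}" for c in t.columns)
    keys = ", ".join(c.name for c in t.columns if c.is_key)
    return f"CREATE TABLE {t.name} ({cols}, PRIMARY KEY ({keys}))"


def generate_insert(t: Table) -> str:
    cols = ", ".join(c.name for c in t.columns)
    params = ", ".join(f":{c.name}" for c in t.columns)
    return f"INSERT INTO {t.name} ({cols}) VALUES ({params})"


def generate_select_by_key(t: Table) -> str:
    cols = ", ".join(c.name for c in t.columns)
    where = " AND ".join(f"{c.name} = :{c.name}" for c in t.columns if c.is_key)
    return f"SELECT {cols} FROM {t.name} WHERE {where}"


# Hypothetical table definition held as metadata.
customer = Table("customer", (
    Column("customer_id", "INTEGER", is_key=True),
    Column("name", "VARCHAR(100)"),
    Column("zip_code", "CHAR(5)"),
))

conn = sqlite3.connect(":memory:")
conn.execute(generate_create(customer))
conn.execute(generate_insert(customer), {"customer_id": 1, "name": "Acme", "zip_code": "20715"})
print(conn.execute(generate_select_by_key(customer), {"customer_id": 1}).fetchone())
# (1, 'Acme', '20715')
```

Every statement is derived from the metadata. A change to the table definition therefore flows into the generated code instead of being hand-edited into each program.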

If you’ve already done that and your programs still “fight with the database,” then likely the database design is wrong. That’s either a design problem that needs IT attention or an
enterprise policy problem that needs corporate management attention. If your data capture and updating programs are all okay, but your reporting programs are a processing nightmare, then consider
building “data warehouse” databases that contain data from many other databases and are tuned in design, for example, for just reporting the total customer experience with your business (that is,
ordering, delivery, maintenance, returns, and feedback). Determine the number of function points in each of these programs and divide it by the number of database tables the program accesses. You
should have about 80 function points per table. A higher number indicates more embedded processes, since those are the main source of the counts; a lower number indicates few or essentially no embedded
processes. I estimate the average cost to build each function point, if you are using a code generator, to be about $50.

The very existence of this fight and all the infrastructure that must be created to compensate is a direct result of not having a quality, enterprise-wide and federated metadata management
infrastructure. Compute these costs, take the excess and add it to the cost of bad metadata management.
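The function-point arithmetic above can be sketched in Python. The 80-function-points-per-table ratio and the $50-per-function-point figure come from the text. The program names and counts are hypothetical. Reading "take the excess" as pricing only the function points above 80 per table is an assumption.

```python
from dataclasses import dataclass

# Figures from the text: about 80 function points per table is expected,
# and each function point costs about $50 to build with a code generator.
EXPECTED_FP_PER_TABLE = 80
COST_PER_FP = 50  # US dollars


@dataclass
class Program:
    name: str
    function_points: int
    tables_accessed: int


def fp_per_table(p: Program) -> float:
    return p.function_points / p.tables_accessed


def excess_cost(p: Program) -> int:
    """Assumed reading of "take the excess": price only the function points
    above the expected 80-per-table ratio."""
    excess_fp = max(0, p.function_points - EXPECTED_FP_PER_TABLE * p.tables_accessed)
    return excess_fp * COST_PER_FP


# Hypothetical inventory of reporting programs.
programs = [Program("customer_history_report", 1200, 10), Program("order_lookup", 300, 5)]
for p in programs:
    print(p.name, fp_per_table(p), excess_cost(p))
# customer_history_report 120.0 20000
# order_lookup 60.0 0
print(sum(excess_cost(p) for p in programs))  # 20000
```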

Standardize Reference Data

Do you have standardized reference data? Reference data is metadata. It is created, managed, distributed and employed across all databases and information systems in the enterprise either
physically or virtually. In the examples provided earlier, are there standard “city-state-ZIP” code tables that are available to all databases and systems? Is there one place for all key customer
data such as addresses, contacts, assessments, rankings and phone numbers? Compute the cost of defining each effectively duplicated set of reference data, and then add the cost of every instance beyond
the first (that is, the cost of creating and maintaining the same reference data more than once) to the cost of bad metadata management.
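The reference-data calculation can be sketched as follows. The inventory of copies and their annual costs is hypothetical. Keeping the most expensive copy of each set is an assumption chosen so that the estimate of waste errs low.

```python
from collections import defaultdict

# Hypothetical inventory: (reference data set, system holding a copy, annual cost in $).
copies = [
    ("city-state-ZIP", "sales", 12_000),
    ("city-state-ZIP", "billing", 9_000),
    ("city-state-ZIP", "shipping", 11_000),
    ("country codes", "sales", 2_000),
]


def duplicate_reference_cost(copies) -> int:
    """Cost of every copy beyond one per reference data set.

    Assumption: the most expensive copy is the one kept, so the
    estimate of waste errs on the low side.
    """
    costs_by_set = defaultdict(list)
    for ref_set, _system, cost in copies:
        costs_by_set[ref_set].append(cost)
    return sum(sum(costs) - max(costs) for costs in costs_by_set.values())


print(duplicate_reference_cost(copies))  # 20000
```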

Engineer and Manage Your Master Data

Do you have authoritative data sources? These are now called master data. Some organizations also call this data strategic data. For example, is there one definitive place to which all key
customer data updates are driven so that there is one definitive place from which all customer data can be referenced and/or employed? Master data is just another version of the key slogan “define
once, use many times.” Master data’s slogan is “create once, store once and update once to then use many times.” Regardless, the process and infrastructure of defining, knowing about and
managing all this master data is metadata management. Thus, if there is more than one definitive source for any multiple-use data, then compute the cost of all this data creation and maintenance
(beyond the first instance) and add that to the cost of bad metadata management.
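The master-data slogan "create once, store once and update once to then use many times" can be expressed as a minimal Python sketch. `CustomerMaster` and its methods are hypothetical names. A real master data hub would add persistence, security and distribution, all of which are omitted here.

```python
class CustomerMaster:
    """Hypothetical single authoritative store for customer master data:
    create once, store once, update once, then use many times."""

    def __init__(self) -> None:
        self._records: dict[str, dict] = {}

    def create(self, customer_id: str, **attrs) -> None:
        if customer_id in self._records:
            raise ValueError(f"customer {customer_id} already exists")
        self._records[customer_id] = dict(attrs)

    def update(self, customer_id: str, **changes) -> None:
        # Every update is driven to this one definitive place.
        self._records[customer_id].update(changes)

    def get(self, customer_id: str) -> dict:
        # Consumers receive a copy, so they cannot change the master record
        # except through update().
        return dict(self._records[customer_id])


master = CustomerMaster()
master.create("C1", name="Acme", city="Springfield")
master.update("C1", city="Riverside")
print(master.get("C1"))  # {'name': 'Acme', 'city': 'Riverside'}
```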

Assign Enterprise Identifiers

Do you have enterprise identifiers for all assets that reside in the IT enterprise? An enterprise identifier is a unique number that is not information-bearing and that is assigned to each asset as
it first comes into existence within the enterprise. Is there a central database and supporting metadata management system for maintaining all metadata about these enterprise-identified assets? In
the earlier example, there would be “master” customer identifiers, product identifiers, employee identifiers and the like. Once these identifiers are defined and deployed, complete knowledge
about all uses of customer data is available because the customer is definitively known through the enterprise identifier. Enterprise identifiers are key to the successful creation and deployment
of authoritative data sources. If enterprise identifiers are not created and maintained, then compute the cost of all the cross-reference identifiers, and compute the cost of all the human and
computer resources necessary to determine the unique set of assets. Subtract from these costs the costs to create and use a single set of enterprise identifiers. Add the difference to the cost
of bad metadata management.
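A minimal Python sketch of an enterprise-identifier registry follows. Each asset receives a non-information-bearing number when it first comes into existence, and every local system key that refers to it is recorded against that number. The class and method names and the system keys are hypothetical. A plain sequence number is only one way to obtain a non-information-bearing identifier.

```python
import itertools


class EnterpriseIdRegistry:
    """Hypothetical registry: assigns a non-information-bearing enterprise
    identifier when an asset first comes into existence, and records every
    local system key that refers to that asset."""

    def __init__(self) -> None:
        self._sequence = itertools.count(1)  # plain sequence number; encodes no meaning
        self._by_local_key = {}  # (system, local_key) -> enterprise id
        self._uses = {}          # enterprise id -> set of (system, local_key)

    def register_new_asset(self, system: str, local_key: str) -> int:
        if (system, local_key) in self._by_local_key:
            raise ValueError(f"{system}:{local_key} is already registered")
        eid = next(self._sequence)
        self._uses[eid] = set()
        self.link(eid, system, local_key)
        return eid

    def link(self, eid: int, system: str, local_key: str) -> None:
        if eid not in self._uses:
            raise KeyError(f"unknown enterprise id {eid}")
        existing = self._by_local_key.get((system, local_key))
        if existing is not None and existing != eid:
            raise ValueError(f"{system}:{local_key} already belongs to {existing}")
        self._by_local_key[(system, local_key)] = eid
        self._uses[eid].add((system, local_key))

    def resolve(self, system: str, local_key: str) -> int:
        return self._by_local_key[(system, local_key)]

    def all_uses(self, eid: int) -> set:
        return set(self._uses[eid])


reg = EnterpriseIdRegistry()
eid = reg.register_new_asset("crm", "CUST-0042")
reg.link(eid, "billing", "A-981")
print(eid, reg.resolve("billing", "A-981"))  # 1 1
print(sorted(reg.all_uses(eid)))  # [('billing', 'A-981'), ('crm', 'CUST-0042')]
```

The `all_uses` lookup is what delivers "complete knowledge about all uses of customer data". Without an enterprise identifier, that knowledge has to be rebuilt from cross-reference tables and human effort.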

Remove Redundancies

When the managers get together to build budgets for your business, do they argue about what the “numbers” mean? That is, do they have different numbers for the count of employees, total sales,
cost of inventory, value of assets, and all that? If they do, then account for the cost of creating all those conflicting numbers, and compute the cost of all those
arguments and of determining the correct “numbers.” The infrastructure that contains all the definitions, the processes that compute the numbers, their supporting
systems, the calendar and business cycles that are operating to compute the results and, of course, the interrelationships among all these definitions, numbers, calendars, and business cycles are
all metadata; and the system to manage it is the metadata management system. Again, compute the cost of all these redundancies, fights, time, effort and energy to resolve the discrepancies, and add
that cost to the cost of bad metadata management.
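Conflicting "numbers" can be detected mechanically once each department's figures are collected in one place, as in this Python sketch. The departments, metric names and values are hypothetical. Each metric the sketch flags is one that lacks a single managed definition.

```python
from collections import defaultdict

# Hypothetical budget inputs: (department, metric, reported value).
reported = [
    ("finance", "employee_count", 1_240),
    ("hr", "employee_count", 1_275),
    ("sales", "total_sales", 8_400_000),
    ("finance", "total_sales", 8_400_000),
]


def conflicting_metrics(rows) -> dict:
    """Return metrics for which departments report different values."""
    by_metric = defaultdict(dict)
    for dept, metric, value in rows:
        by_metric[metric][dept] = value
    return {
        metric: values
        for metric, values in by_metric.items()
        if len(set(values.values())) > 1
    }


print(conflicting_metrics(reported))
# {'employee_count': {'finance': 1240, 'hr': 1275}}
```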

Now, if you are armed with these numbers, then you can quickly say something like this to the CIO:

If we have quality metadata and a quality metadata management system, then we can reduce the cost of all software design and implementation by at least 40%. We can reduce the quantity of data
storage by 50%.

The CIO will listen to your story because you’re talking the CIO’s language: improved productivity, reduced cost, lowered risk and increased quality. Metadata management is the key
foundation block of a well-ordered enterprise, just as its well-ordered financial books are. It’s that simple.



About Michael Gorman

Michael, the President of Whitemarsh Information Systems Corporation, has been involved in database and DBMS for more than 40 years. Michael has been the Secretary of the ANSI Database Languages Committee for more than 30 years. This committee standardizes SQL. A full list of Whitemarsh's clients and products can be found on the website. Whitemarsh has developed a very comprehensive Metadata CASE/Repository tool, Metabase, that supports enterprise architectures, information systems planning, comprehensive data model creation and management, and interfaces with the finest code generator on the market, Clarion. The Whitemarsh website makes available data management books, courses, workshops, methodologies, software, and metrics. Whitemarsh prices are very reasonable and are designed for the individual, the information technology organization and professional training organizations. Whitemarsh provides free use of its materials for universities/colleges. Please contact Whitemarsh for assistance in data modeling, data architecture, enterprise architecture, metadata management, and for on-site delivery of data management workshops, courses, and seminars. Our phone number is (301) 249-1142. Our email address is: