Published in TDAN.com July 2001
The following is an excerpted chapter from Adrienne’s soon-to-be-published book, “Metadata Solutions: Using Metamodels, Repositories, XML, and Enterprise Portals to Generate Information
on Demand,” coming from Addison-Wesley in 2001.
Alvin Toffler – Future Shock (New York: Bantam Doubleday Dell, 1971)
By the 1990s, information was an asset available to anyone with a computer. Unfortunately, information does not always yield the knowledge being sought. In many cases, people
may be worse off with this information than without it.
Attempts at managing this information have been less than ideal. One area’s success is usually contradicted by another area’s major mistakes. In the end, we are slightly better off, at best.
Anarchical Data Management
Often, when a resource is considered valuable, organizations create sub-organizations responsible for its management. For example, many large corporations have departmental human resource areas,
where all aspects of an individual’s career (other than salary and benefits) are tracked and managed. The ease with which individuals can get training through such a sub-organization, as
opposed to through a centralized human resource function, is dramatic and often encourages the existence of subareas. On the other hand, attempts to integrate information about an individual’s career
at the corporation, across many departments, do not always reveal the same amount of detail in the same style and format when all the sub-corporate information is gathered in one place. Hence, the
perpetuation of anarchical data management.
When data first became valuable, it was clearly an IT resource. The data that was tracked and organized outside of IT was not considered to be of “corporate” value in most situations. Aside from
the fact that non-IT data was often tracked manually, it was generally designed, created, and maintained by the benefiting organization. Over time, responsibility for data management has
shifted, eventually settling in IT, and that shift has had a major impact on how data is managed in large organizations.
Standard Data Resource Management
In the beginning, data in most organizations was managed via naming standards and data dictionaries. Neither the use of the data nor its copying and extraction was of concern. As data
propagated, it was clear that the existing data management standards were not sufficient. Data management then moved into its next historical phase—reactive data management.
Reactive Data Management
As data propagated, the standard management practices could not keep up. Data management organizations saw the need not only to manage data at its inception but also to ensure its integrity and
value throughout development and well into the maintenance of post-production applications. The advent of client/server applications made it much easier to bring applications to fruition in their
own limited world, and this often included the creation and definition of new data. In many cases, what was perceived to be new data was just a new version, or a simple recombination, of existing data.
Data reproduced; information was created; Data Management reacted. New policies and procedures were created; the results of their infiltration were assessed. Abstainers from procedures were eventually
discovered, and Data Management began introducing new techniques: Enterprise Data Models, Data Stewards, and Corporate Data Definitions. In many cases, data issues were believed to be contained via
the purchase of packaged software at the Enterprise level (ERP Solutions). With standard software, most felt, the software’s internal data would become “standard.” Little did they
realize that the term extract is as familiar to business users as it is to a dentist.
Data warehousing became a popular means of providing integrated data views to those who needed them most. Metadata became popular, but only from the viewpoint of those using the data warehouse.
Eventually, data management tired of never being able to catch up: of not being able to manage the data that was fueling the warehouses, and not being able to control the quality of the
“metadata” that accompanied all of this data.
Proactive Data Management
Once data spun out of control, two major events occurred in the data management community:
- Creation of proactive data management practices that would prevent data chaos
- Entry into the data management world of newcomers who had never heard of data management but were now unknowingly suffering its repercussions
With proactive data management, policies and procedures affect the life of data, rather than vice versa. In theory, data is then defined from the point of view of the organization rather than that
of a single application, with the intention of having it standardized for the purpose of reuse as opposed to redefinition. Many organizations have embraced this philosophy, yet its implementation
has not quite penetrated to the depths for which it was intended.
Today data management exists not only in scattered parts of most IT organizations but also, because of data’s newfound value, outside the IT organization. Each sub-organization may or may
not have its own rules, and the rules may or may not have their intended impact. Despite all the variations in implementation, data management is still not an exact science as we begin the twenty-first century.
Methodologies abound, interpretations of the methodologies abound, and software supporting the methodologies abounds—what better definition of anarchy!
The Data Warehouse Web
Anarchical data management does not make it easy for people who need data or information the most to access and interpret it. So data warehouses began populating the corporate coastline, focusing
first on the integration of transaction data (i.e., invoicing and sales transactions) geared toward the analysis of corporate market penetration. Corporate decision makers wanted answers to questions such as these:
- Who is buying our products?
- Who is not buying our products?
- When are they buying our products?
- Which products are popular?
- Is product popularity based on region? Time of year?
Actual data has more value than market projections do, especially when the data is accurate. Multiple sources, multiple interpretations, multiple definitions…and each organization has its
own analysis requirements and its own reasons for performing the analysis, perhaps with the same data, perhaps not. Data warehouses became the source of instant gratification, the place to get
quick bar and pie charts. Once a warehouse was created for one part of an organization, it wasn’t long before smaller variations began spreading through the rest of it, many created by
end-user organizations with minimal IT assistance. In fact, many of these have been renamed “data marts” based on their simplicity and single source.
Multiple data marts and warehouses resulted in an unplanned data warehouse web. These webs caused problems not only from the previously discussed data management point of view but
also from the data integration and interpretation angles. The exponential increase of “integrated multidimensional data stores” caused a reevaluation of data warehouses: What are they
really? Are they the best quick fix for every data-reporting situation? Data managers were forced to revisit their philosophies, which resulted in new concepts. Corporate data staging areas that
represented extracted, cleansed, and translated data from many production systems were to be placed in a single location for use and further extraction by those who needed them.
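The staging idea described here can be sketched in a few lines of Python. This is a minimal illustration of the extract-cleanse-consolidate pattern, not any particular product’s design; the source-system names and record fields are hypothetical:

```python
# A minimal sketch of a corporate data staging area: extract rows from
# several production systems, apply one shared cleansing/translation step,
# and land everything in a single location for further use and extraction.

def cleanse(row):
    """Shared translation step: trim and upper-case string values."""
    return {k: (v.strip().upper() if isinstance(v, str) else v)
            for k, v in row.items()}

def stage(sources):
    """Combine extracts from many systems into one cleansed staging area."""
    staging_area = []
    for system, rows in sources.items():
        for row in rows:
            cleaned = cleanse(row)
            cleaned["source_system"] = system  # preserve lineage
            staging_area.append(cleaned)
    return staging_area

# Two hypothetical production extracts with inconsistent formatting.
sources = {
    "billing":  [{"customer": " acme ", "amount": 100}],
    "shipping": [{"customer": "Acme", "carrier": " ups "}],
}
for record in stage(sources):
    print(record)
```

The point of the `source_system` tag is the lineage question that recurs throughout this chapter: once rows from many systems land in one place, each record should still say where it came from.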
It doesn’t take long, as illustrated in Figure 2, for a data warehouse web to take hold as more and more sources are combined, resulting in more and more data warehouses, data marts,
operational data stores (ODSs), or general reporting databases. But because these data stores are designed for “decision support,” they are never as stringently controlled as production data is.
Tools, Tools, and More Tools
Whenever the IT process is not quite what it should be, vendors are always ready with the solution—another tool. The acquisition of a myriad of tools has had a global impact on the data,
information, and metadata in virtually every IT organization, both large and small. Seldom was the decision to purchase a tool based on the tool’s accompanying metadata. As such, the growth
of a metadata web was unintentionally fueled to its current state, and it is still growing!
It makes sense to evaluate a tool based on its functionality. In the early 1980s the “best of breed” philosophy became popular, whereby specific IT groups had permission to acquire their own set
of tricks, specifically a distinct set of software development tools, from a particular vendor in most cases. This philosophy actually started the repository bandwagon. Unfortunately, vendors would not be in business if their tools were not distinct…and this distinctiveness requires an associated set of
distinct information…or is it really that distinct? In most cases, one vendor tool’s information shares many qualities with the information used, accessed, and created by their
competitors’ products. What varies is usually how the information is processed and how it is correlated with other resident and nonresident vendor tool information.
An easy example is the portrayal of the world of the database administrator (DBA) and the tools he or she uses. This individual is distinctly responsible for the performance of an
organization’s databases, many of which are very large and vital to corporate information processing. Depending on the deployed DBMS and platform/operating system combination, specific
performance monitoring tools are available, in many cases from the DBMS vendors themselves. Aside from monitoring performance, these tools have resident capability to modify the underlying database
structure. Even in the best of scenarios, these modifications are typically not shared with anything other than the DBA tool itself. So, in the best of worlds, a logical data model may have
originated from a modeling or CASE tool, and the connections between these business-level definitions and the physical implementation may have been created when the physical database was built. But
that is usually the extent of the business-implementation connection. Once the database becomes part of a production world, DBA organizations are typically responsible for its fine-tuning. If that
means defining new keys, new indices, new database structures, or even entirely new databases, those actions are well within their responsibilities; vendor tools that monitor the associated
performance typically also perform the associated changes.
In most organizations, these changes are not backtracked to any other tool or repository. So the organization’s best intentions are squashed once the DBA, or any other group, acquires
its own set of vendor software with its own database documentation.
Likewise, the frustration of the business user community has led vendors to pursue the largest area of growth in recent years—that of decision support tools. Most of this growth is based on
the recent ability of users to obtain, decipher, and store data for the purposes of marketing-based analysis with or without a data warehouse. Now people outside of IT can obtain data (downloading,
extracting, consolidating, or creating for perhaps the nth time) and create more information. And as more information gets created, the data and information web propagates, without organization or
overview, often to the point of no return.
Tools are perhaps an immediate solution to immediate problems, yet they have a profound tendency to create more of the problems that they were originally marketed to solve, such as the following:
- Can’t get to your data? Don’t worry: Tool X can download it at the click of a mouse, and Tool Y can extract it from virtually any type of database. All you need to do is make sure you have permission to read it (which most users do).
- Can’t organize your data into a decent database structure? Don’t worry; you don’t have to do that anymore. Decision support tools assume a standard star schema design, and most can load the data for you into a predefined set of facts and dimensions. All you have to do is tell the tool where to go to get the stuff.
- Don’t understand the difference between your new data and its original source? Don’t worry; every one of these tools has a metadata repository, a place for you to store the definitions that you finally decipher, so that from now on no one will ever question the reports that result from this new tool.
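The star schema that decision support tools assume can be sketched concretely. The following uses Python’s built-in sqlite3 module; all table names, columns, and values are invented for illustration:

```python
import sqlite3

# In-memory database standing in for a small data mart.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables describe the "who / what / where" of each transaction.
cur.execute("""CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name TEXT,
    category TEXT)""")
cur.execute("""CREATE TABLE dim_region (
    region_id INTEGER PRIMARY KEY,
    name TEXT)""")

# The fact table holds the measures, keyed by the dimensions.
cur.execute("""CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    region_id INTEGER REFERENCES dim_region(region_id),
    sale_date TEXT,
    quantity INTEGER,
    revenue REAL)""")

cur.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware')")
cur.execute("INSERT INTO dim_region VALUES (10, 'Northeast')")
cur.execute("INSERT INTO fact_sales VALUES (1, 10, '2001-03-15', 4, 99.8)")

# A typical decision-support query: revenue by region.
cur.execute("""SELECT r.name, SUM(f.revenue)
               FROM fact_sales f JOIN dim_region r USING (region_id)
               GROUP BY r.name""")
print(cur.fetchall())  # [('Northeast', 99.8)]
```

The design choice the tools bank on is exactly this shape: one central fact table of measures, surrounded by small descriptive dimensions, so that every report reduces to a join, a filter, and an aggregate.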
Maybe metadata is the solution to these problems. Metadata has become the buzzword since the year 2000, and it, with all its supporting functions, has become a specialty with a required role on most
data warehousing teams.
Metadata: The Silver Bullet
We can understand why virtually every reporting tool comes with a disclaimer that relates to the vendor’s lack of liability for improper analysis and data-based conclusions. IT itself
usually “pleads the Fifth” and blames improper conclusions on the “lack of proper metadata.” Whose job is it to create this stuff to begin with? What exactly is it?
Metadata is often defined as the “data about the data” and most organizations perceive it to represent descriptive information (element names, definitions, lengths, etc.) about the
populated data fields. In fact, metadata is much more, and the choice of items actually described and detailed is up to the responsible metadata specialist. True metadata involves not only the
descriptive information pertinent to the users of the items in question, but also the generation, maintenance, and display of the items by tools, applications, other repositories, and the
users. Unfortunately, very few of today’s metadata deliverables consider this bigger picture. And the bigger picture must permit each tool, application, piece of software, and individual who
touches a piece of data to answer the 5 Questions:
- What data do I have?
- What does it mean?
- Where is it?
- How did it get there?
- How do I get it?
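As a sketch of what it means for metadata to answer these questions, consider a single metadata record, here as a plain Python dictionary. Every name and description in it is hypothetical, not drawn from any real repository:

```python
# A hypothetical metadata record for one warehouse column, structured so
# that each of the five questions has an explicit answer.
FIVE_QUESTIONS = [
    "what data do I have?",
    "what does it mean?",
    "where is it?",
    "how did it get there?",
    "how do I get it?",
]

customer_revenue = {
    "what data do I have?":  "customer_revenue",
    "what does it mean?":    "Total invoiced revenue per customer, net of returns",
    "where is it?":          "warehouse.fact_sales.revenue",
    "how did it get there?": "extracted from billing invoices, cleansed, summed",
    "how do I get it?":      "SELECT revenue FROM fact_sales WHERE customer_id = ?",
}

def is_complete(metadata):
    """A record is useful only if it answers all five questions."""
    return all(metadata.get(q) for q in FIVE_QUESTIONS)

print(is_complete(customer_revenue))  # True
```

A record that leaves any of the five answers blank fails the completeness check, which is exactly the sense in which incomplete metadata makes the described data useless.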
Remember that data is useless without metadata. Most members of the user community are well aware of this fact, but are not aware that their individual metadata solutions are now making
metadata just as useless as the described data.
The Metadata Web
Metadata, metadata everywhere, and not a drop has meaning. Narrowly focused solutions have caused major organizations to have so much metadata that much of it has to be scrapped and redefined. In a
typical data warehouse architecture, for example, virtually all architectural components contain their own processing-specific metadata (see Figure 3), and few of the individual instances can be
related generically to the metadata in the other warehouse components. The result is a complicated metadata web, with virtually no official starting point.
To avoid the disasters described in this chapter, the best approach to data management is to consider the current state of your data and metadata environments. Would it be better to add strands to
an existing metadata web, or to plan, design, and implement a true metadata solution? The difference between isolated metadata stores and a true metadata solution rests on an understanding that
there is a difference.
Regardless of the planned scope of your metadata solution, it is crucial that you view each possible solution in its larger corporate environment. Continue to discover what metadata really is, why
metadata is created, and how you can leverage that which you already have.
 IBM began the early repository movement with AD/Cycle in an attempt to integrate information from major CASE tool vendors.
 The 5 Questions is a trademark of and the questions and method are copyrighted by Database Solutions, Inc., Bernardsville, New Jersey.