Why is Meta data Important?
This article addresses a common problem: how can we convince managers to plan, budget and apply resources for meta data management? What is meta data and why is it important? What
technologies are involved? Internet and Intranet technologies are part of the answer and will get the immediate attention of management. XML is the other part.
Every country is now interconnected in a vast, global telephone network. We are now able to telephone anywhere in the world. We can phone a number, and the telephone assigned to that number will
ring in Russia, or China, or in Outer Mongolia. But when it is answered, we may not understand the person at the other end. They may speak a different language. So we can be connected, but what is
said has no meaning. We cannot share information.
Today, we also use a computer and the World Wide Web. We enter a web site address into a browser on our desktop machine – a unique address in words that is analogous to a telephone number. We can
then be connected immediately to a computer assigned to that address and attached to the Internet anywhere in the world. That computer sends a web page based on the address we have supplied, to be
displayed in our browser. This is typically in English, but may be in another language. We are connected, but as in the telephone analogy, if the page is in another language, what is said has no meaning.
We cannot share information.
Now consider why it is difficult for some of the systems used in an organization to communicate and share information with other systems. Technically,
the programs in each system can be interconnected and so can communicate with other programs. But they use different terms to refer to the same data that needs to be shared.
For example, an accounting system may use the term “customer” to refer to a person or organization that buys products or services. Another system may refer to the same person or organization as a
“client”. Sales may use the term “prospect”. They all use different terminology, a different language, to refer to the same data and information. Because they do not use the same language, again they
cannot share information.
But the problem is even worse. Consider the terminology used in different parts of the business. Accountants use a “jargon”, a technical language, which is difficult for
non-accountants to understand. So too is the jargon used by engineers, production people, sales and marketing people, or managers.
They all speak a different “language”. What is said has no meaning. They cannot easily share common information. In fact in some enterprises it is a miracle that people manage to communicate
meaning at all!
Each organization has its own internal language, its own jargon, which has evolved over time so that similar people can communicate meaning. As we saw above, there can be more than one language
used in an organization. Meta data identifies an organization’s own “language”.
Where different terms refer to the same thing, a common term is agreed for all to use. Then people can communicate more clearly. And systems and programs can intercommunicate with meaning. But
without a clear definition and without common use of an organization’s meta data, information cannot be shared effectively throughout the enterprise.
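To make the idea of an agreed common term concrete, here is a minimal sketch in Python. The glossary, the field names and the sample records are all hypothetical; they simply illustrate how records from systems that say “customer”, “client” or “prospect” could be translated into one shared vocabulary.

```python
# Hypothetical glossary: each system's local term mapped to the agreed common term.
COMMON_TERMS = {
    "customer": "customer",  # term used by the accounting system
    "client": "customer",    # term used by another system
    "prospect": "customer",  # term used by sales
}


def to_common_record(record):
    """Return a copy of record whose field names use the agreed common terms.

    Field names that are not in the glossary are kept unchanged.
    """
    return {COMMON_TERMS.get(field, field): value for field, value in record.items()}


# Illustrative records from two systems that describe the same organization.
accounting = {"customer": "Acme Pty Ltd", "balance": 120.0}
sales = {"prospect": "Acme Pty Ltd", "region": "West"}

print(to_common_record(accounting))
print(to_common_record(sales))
```

Once both records use the same term, they can be matched and merged. The hard part is not the code but agreeing on the glossary itself.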
Previously each part of the business maintained its own version of “customer”, “client” or “prospect”. Each defined processes, and assigned staff, to add new customers, clients or
prospects to its own files and databases. When common details about customers, clients or prospects changed, each redundant version of that data also had to be changed, and staff were needed to make
these changes. Yet these are all redundant processes making the same changes to redundant data versions. This is enormously expensive in time and people. It is also quite unnecessary.
The importance of meta data can now be seen. Meta data defines the common language used within an enterprise so that all people, systems and programs can communicate precisely. Confusion
disappears. Common data is shared. And enormous cost savings are made, because the redundant processes (used to keep redundant data versions up to date) are eliminated as the redundant
data versions are integrated into a common data version for all to share.
How Is Meta data Used with XML?
Much effort has previously gone into the definition and implementation of Electronic Data Interchange (EDI) standards to address this problem of intercommunication between dissimilar systems and
databases. EDI has now been widely used for business-to-business commerce for many years.
It works well, but it is quite complex and very expensive. As a result, it is generally cost-justifiable only for large corporations.
Once an organization’s meta data is defined and documented, all programs can use it to communicate. EDI was the mechanism used previously, but this intercommunication has now become much easier.
Extensible Markup Language (XML) is a new Internet technology that has been developed to address this problem. XML can be used to document the meta data used by one system so that it can be
integrated with the meta data used by other systems. This is analogous to language dictionaries that are used throughout the world, so that people from different countries can communicate. Legacy
files and other databases can now be integrated more readily. Systems throughout the business can now coordinate their activities more effectively as a direct result of XML and management support
for meta data.
XML now provides the capability that was previously only available to large organizations through the use of EDI. XML allows the meta data used by each program and database to be published as the
language to be used for this intercommunication. Unlike EDI, however, XML is simple to use and inexpensive to implement for both small and large organizations. Because of this simplicity, we like
to sum it up this way:
“XML is EDI for the Rest of Us”
XML will become a major part of the application development mainstream. It provides a bridge between structured databases and unstructured text: content is delivered as XML and, during a
transition period, converted to HTML for display in web browsers. The XML family includes the following components:
Extensible Markup Language (XML)
– Defines document content using meta data tags and namespaces
Document Type Definition (DTD)
– Defines XML document structure (analogous to database schema)
Extensible Stylesheet Language (XSL)
– XSL or Cascading Style Sheets (CSS) separate layout from data
Extensible Linking Language (XLL)
– XLL implements multi-directional links (single or multiple)
Document Object Model (DOM)
– Standard, language-neutral interface for processing XML from any programming language
Resource Description Framework (RDF)
– W3C Interoperability Project for data content interchange
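To show how some of these components fit together, here is a minimal sketch in Python. It uses the DOM interface from the standard library (xml.dom.minidom) to read a small XML document and render it as HTML, so that layout stays separate from data. The product data and tag names are hypothetical, and a real site would more likely use XSL or CSS for the presentation step.

```python
from html import escape
from xml.dom.minidom import parseString

# Hypothetical product data; the tag names are illustrative only.
XML_DATA = """<products>
  <product><name>Widget</name><price>9.95</price></product>
  <product><name>Gadget &amp; Co</name><price>24.50</price></product>
</products>"""


def text_of(parent, tag):
    """Return the text inside the first <tag> element found under parent."""
    node = parent.getElementsByTagName(tag)[0]
    return "".join(
        child.data for child in node.childNodes if child.nodeType == child.TEXT_NODE
    )


def to_html(xml_text):
    """Walk the DOM tree and render each product as an HTML table row."""
    dom = parseString(xml_text)
    rows = []
    for product in dom.getElementsByTagName("product"):
        name = escape(text_of(product, "name"))
        price = escape(text_of(product, "price"))
        rows.append(f"<tr><td>{name}</td><td>{price}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


print(to_html(XML_DATA))
```

The XML carries only the data and its meta data tags. The same document could be rendered differently for another audience without changing the data at all.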
Meta data is used to define the structure of an XML document or file. It is published in a Document Type Definition (DTD) file for reference by other systems.
A DTD is analogous to the Data Definition Language (DDL) file that is used to define the structure of a database, but with a different syntax.
Consider an XML document identifying data retrieved from a PERSON database. It includes meta data markup tags (each surrounded by angle brackets, < and >) that provide various details about a person.
With such tags, it is easy to find specific contact information, such as the different kinds of telephone numbers, including mobile (cell phone) numbers. The DTD also specifies whether certain tags must
exist or are optional, and whether some tags can exist more than once.
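A PERSON document of the kind described above might look like the following sketch, shown here inside a short Python program so that it can be run. All tag names, values and DTD declarations are our own assumptions for illustration. In the DTD, `?` marks an optional tag, `+` a tag that must occur one or more times, and `*` a tag that may occur zero or more times.

```python
import xml.etree.ElementTree as ET

# Hypothetical PERSON document with its DTD included as an internal subset.
PERSON_XML = """<!DOCTYPE person [
  <!ELEMENT person  (name, contact)>
  <!ELEMENT name    (#PCDATA)>
  <!ELEMENT contact (email?, phone+, mobile*)>
  <!ELEMENT email   (#PCDATA)>
  <!ELEMENT phone   (#PCDATA)>
  <!ELEMENT mobile  (#PCDATA)>
]>
<person>
  <name>Pat Example</name>
  <contact>
    <email>pat@example.com</email>
    <phone>555-0100</phone>
    <phone>555-0101</phone>
    <mobile>555-0199</mobile>
  </contact>
</person>"""

# ElementTree reads past the DTD but does not validate the document against it.
person = ET.fromstring(PERSON_XML)
contact = person.find("contact")

# Because every value is tagged, finding contact details is a simple lookup.
phones = [p.text for p in contact.findall("phone")]
mobiles = [m.text for m in contact.findall("mobile")]

print(person.findtext("name"), phones, mobiles)
```

Note that Python's standard library parser only checks that the document is well formed. Enforcing the DTD rules would need a validating parser, which is outside this sketch.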
Meta data that is used by various industries, communities or bodies can be used with XML, XSL and XLL to define markup vocabularies. The World Wide Web Consortium (W3C) has developed a standard
framework that can be used to define these vocabularies. This is called the Resource Description Framework (RDF). It is a model for meta data applications that support XML. RDF was initiated by the
W3C to build standards for XML applications so that they can interoperate and intercommunicate more easily, avoiding the communication problems that we discussed earlier.
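As a rough illustration of RDF expressed in XML, the sketch below describes one web resource and reads the description back in Python. The resource URL and property values are hypothetical. The use of the Dublin Core vocabulary for the title and creator properties is our own choice for the example; it is one commonly used vocabulary, not something RDF requires.

```python
import xml.etree.ElementTree as ET

# Namespace URIs identify the vocabularies that the tags come from.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core, assumed for this example

# Hypothetical description of a single web resource.
RDF_XML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://www.example.com/report.html">
    <dc:title>Quarterly Sales Report</dc:title>
    <dc:creator>Sales Department</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(RDF_XML)
for desc in root.findall(f"{{{RDF_NS}}}Description"):
    # ElementTree addresses namespaced names as "{namespace-uri}local-name".
    resource = desc.get(f"{{{RDF_NS}}}about")
    title = desc.findtext(f"{{{DC_NS}}}title")
    creator = desc.findtext(f"{{{DC_NS}}}creator")
    print(resource, title, creator)
```

Because each property name is qualified by a namespace, two organizations can use the same vocabulary and know that they mean the same thing by “title” or “creator”.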
There is considerable effort in various industries to define their own standard language, called a markup vocabulary, using XML for their meta data. These become unique languages for
intercommunication between participants in an industry. Markup vocabularies include: Mathematical Markup Language (MathML); Chemical Markup Language (CML); Open Financial Exchange (OFX); Information
and Content Exchange (ICE); Voice Recognition Markup Language (VML); JavaBean Markup Language (JBL); Synchronized Multimedia Integration Language (SMIL); and Wireless Markup Language (WML). Other
markup languages have been defined for Channel Definition Format (CDF), Meta Content Framework (MCF), Open Software Description (OSD) and Web Interface Definition Language (WIDL). For example, the
Channel Definition Format, which was delivered as part of Microsoft Internet Explorer 4.0 and is now widely used, is based on XML.
The W3C and RDF web sites are two good starting points for more information about the above markup languages. The RDF web site is at www.w3.org/Metadata.
RDF is now a W3C Recommendation, the final stage of the W3C standards process. The W3C web site is at www.w3.org. Both will point you to specific web
sites that provide additional details about the above markup languages.
With XML, even more effective applications become possible. For example, an organization can define the unique meta data used by its suppliers’ legacy inventory systems. This will enable that
organization to place orders via the Internet directly with those suppliers’ systems, for automatic fulfillment of product orders. This application and eight other typical XML applications are
available from the Microsoft XML Scenarios web site.
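The supplier ordering scenario can be sketched in a few lines of Python. Everything named here is hypothetical: the mapping from our own field names to the supplier's tag names stands in for that supplier's documented meta data, and the order itself is invented. Sending the document over the Internet is left out of the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical meta data for one supplier: our field name -> the supplier's tag name.
SUPPLIER_TAGS = {
    "product_code": "PartNo",
    "quantity": "Qty",
    "deliver_by": "RequiredDate",
}


def build_order(order_lines, tag_map):
    """Build a purchase order XML string using the supplier's own tag names."""
    root = ET.Element("PurchaseOrder")
    for line in order_lines:
        item = ET.SubElement(root, "Item")
        for field, value in line.items():
            # Translate our field name into the supplier's vocabulary.
            ET.SubElement(item, tag_map[field]).text = str(value)
    return ET.tostring(root, encoding="unicode")


order_xml = build_order(
    [{"product_code": "W-100", "quantity": 25, "deliver_by": "1999-06-30"}],
    SUPPLIER_TAGS,
)
print(order_xml)
```

Supporting a second supplier would then mean adding another mapping table rather than writing another program, which is where the defined meta data pays for itself.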
XML is an enabling technology for integrating unstructured text and structured databases in next generation E-Commerce and EDI applications. Web sites will evolve over time to use XML to provide the
capability and functionality presently offered by HTML, but with far greater power and flexibility. Netscape Communicator 5.0 and Microsoft Internet Explorer 5.0 browsers will soon be released.
Microsoft Office 2000 will also be released in the second quarter of 1999. All of these will support XML. New XML development tools will also be released in 1999 to enable XML applications to be
developed more easily.
The acceptance and application of XML are progressing rapidly, as it offers a very simple, yet extremely powerful, way to intercommunicate between different databases and systems, both within and
outside an organization. So far we have discussed structured data, which is available from databases and legacy files. Yet for most enterprises, over 90% of the knowledge resources exist not in structured databases
and files, but in unstructured text documents, in graphics and images, as well as in audio and video files.
How well an organization accesses and uses its knowledge resources often determines its competitive advantage and future prosperity. The use and application of knowledge will become even more
important in the future competitive Armageddon that we are all about to enter.
The tools are coming, but a greater task still remains to be completed. This is the definition of your own meta data, your common enterprise language for intercommunication, so that you can
use these tools effectively. The definition of meta data depends on knowledge of data modeling, previously carried out by IT people. But this is not just a task for IT. As it is vitally dependent
on business knowledge, it also requires the involvement of business experts. Not by interview, but by their active participation. While data modeling has until now been a technical IT discipline,
business data modeling is not. It can be learned by business people as well as IT staff.