XML: Catalyst for Convergence?

In the article XML: The New Esperanto? I suggested that if XML is accepted by a critical mass of e-commerce participants and
industries as a technological enabler, it will then act as an organizational motivator. It appears that this motivation is indeed picking up steam. The primary evidence for this momentum is the
increased activity surrounding industry-specific vocabularies implemented in XML “schemas”.

An XML schema, according to Microsoft, is a definition of a document [type], which includes:

  • the elements that can appear within the document
  • the attributes that can be associated with an element, including whether an element is empty or can include text, and any default values that may exist
  • the structure of the document: which elements are child elements of others, the sequence in which the child elements can appear, and the number of child elements
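To make those three ingredients concrete, here is a minimal sketch in Python. It is not a real XML schema language. The `SCHEMA` dictionary, the `conforms` function, and the element names `ACCOUNT` and `ACCTID` are all hypothetical, invented only to show elements, attributes with defaults, and child sequence being checked.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema description (not a real schema language): for each
# element, its allowed attributes with default values, and the exact
# sequence of child elements it must contain.
SCHEMA = {
    "ACCOUNT": {"attributes": {"currency": "USD"}, "children": ["ACCTID", "BALAMT"]},
    "ACCTID": {"attributes": {}, "children": []},
    "BALAMT": {"attributes": {}, "children": []},
}


def conforms(element):
    """Return True if the element and all of its descendants match SCHEMA."""
    rule = SCHEMA.get(element.tag)
    if rule is None:
        return False  # element is not declared at all
    if any(name not in rule["attributes"] for name in element.attrib):
        return False  # attribute is not declared for this element
    if [child.tag for child in element] != rule["children"]:
        return False  # wrong children, wrong order, or wrong count
    return all(conforms(child) for child in element)


good = ET.fromstring(
    "<ACCOUNT currency='EUR'><ACCTID>123</ACCTID><BALAMT>10.00</BALAMT></ACCOUNT>"
)
bad = ET.fromstring(
    "<ACCOUNT><BALAMT>10.00</BALAMT><ACCTID>123</ACCTID></ACCOUNT>"
)

print(conforms(good))
print(conforms(bad))
```

Note that the check passes or fails purely on form. Nothing in it knows what a `BALAMT` actually means.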

Technological curmudgeons may note that any similarities between an XML schema and a COBOL FD (or a data declaration in just about any computing language) may not be purely coincidental. There are
only so many ways to describe what is, after all, a data record. Also, as we know, describing the syntax of a data structure does not necessarily indicate the meaning of its contents. (There he
goes again with that semantics thing.) One could certainly define within an XML schema an element with a tag name “XY01”. But this is where the “critical sharing mass” comes in: the tag is
understood either implicitly (“heck, everybody knows what an XY01 is!”) or explicitly (i.e., documented in a “schema repository”), or nobody will use/share that document. If a given XML schema
is shared, it would certainly follow that some meaning is being conveyed.

Progress on XML schema development can be tracked in at least two portals (formerly known as Web sites), XML.org and biztalk.org. These are “schema repositories” at which schemas will be registered
(stored). There are a good number of participants currently registered at xml.org. Also, there appears to be a significant amount of activity around mapping the older format-standards, such as EDI,
FIX and IFX in the finance industry, to XML schemas. So is some cautious optimism warranted regarding XML as a “catalyst for convergence” toward fewer, more ubiquitous data-element-naming standards?

Maybe not, due to several factors. The same element name (tag) can occur in an unlimited number of schemas. The same element name can occur multiple times within the same schema. And it’s also
highly likely, for the sake of expediency, that the data elements within various older formats will merely be mapped one-for-one to XML schemas. Let’s look at some examples (maybe hypothetical,
maybe not) related to the representation of bank account balances.

  • Say OFX (Open Financial Exchange) maps its data elements one-for-one to XML schemas. The OFX element named BALAMT becomes an XML element with tag BALAMT.
  • Say IFX (Interactive Financial Exchange) also maps its data elements, one-for-one, to XML schemas. The IFX element named BALAMT becomes an XML element also tagged BALAMT.

Can we now say that OFX and IFX have “converged” in XML? Can we assume that these two XML elements, having identical tags, are semantically equivalent, i.e., synonymous? Actually we cannot,
because, going back to the sources, IFX and OFX qualify balance amounts (Ledger, Available, Current, etc.) differently. In IFX, the meaning of the balance amount is qualified by the value assigned
to another field (BALTYPE). In OFX, the meaning is qualified by the name of the “aggregate” (i.e., group level) field (LEDGERBAL, AVAILBAL) in which BALAMT is nested. There can be multiple BALAMT
elements in a single OFX/XML document.
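The difference can be sketched in a few lines of Python. The two documents below are simplified, hypothetical layouts that follow the description above, not excerpts from the actual OFX or IFX specifications. The wrapper tags `STMTRS` and `ACCTBAL`, the `BALTYPE` value `Avail`, and all function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical OFX-style layout: the parent ("aggregate") tag carries the meaning.
OFX_STYLE = """
<STMTRS>
  <LEDGERBAL><BALAMT>500.00</BALAMT></LEDGERBAL>
  <AVAILBAL><BALAMT>350.00</BALAMT></AVAILBAL>
</STMTRS>
"""

# Hypothetical IFX-style layout: a sibling BALTYPE field carries the meaning.
IFX_STYLE = """
<ACCTBAL>
  <BALTYPE>Avail</BALTYPE>
  <BALAMT>350.00</BALAMT>
</ACCTBAL>
"""


def naive_amounts(xml_text):
    """Collect every BALAMT value by tag name alone, ignoring its context."""
    root = ET.fromstring(xml_text.strip())
    return [element.text for element in root.iter("BALAMT")]


def ofx_balances(xml_text):
    """Qualify each BALAMT by the tag of the aggregate it is nested in."""
    root = ET.fromstring(xml_text.strip())
    return {
        aggregate.tag: aggregate.findtext("BALAMT")
        for aggregate in root
        if aggregate.find("BALAMT") is not None
    }


def ifx_balances(xml_text):
    """Qualify BALAMT by the value of the sibling BALTYPE field."""
    root = ET.fromstring(xml_text.strip())
    return {root.findtext("BALTYPE"): root.findtext("BALAMT")}


print(naive_amounts(OFX_STYLE))
print(ofx_balances(OFX_STYLE))
print(ifx_balances(IFX_STYLE))
```

Matching on the tag alone returns bare amounts with no indication of which balance is which. Recovering the meaning takes a different rule for each source format, which is exactly why identical tags do not add up to convergence.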

Convergence occurs only when the meanings of equivalent labels are precisely synonymous. A Balance Amount (Available) is not equivalent to a Balance Amount (Ledger), as anyone who’s tried to write
a check on an un-cleared deposited check can attest.

So we’re not converged yet — is there hope? Check out UDEF.com. These folks have the right idea. UDEF is a system for classifying and identifying data elements according to
their meaning.

Under UDEF, a data element is assigned a unique identifier based on its meaning. Applying this to the above IFX/OFX/XML example, “Ledger Balance Amount” could be assigned a fully-qualified UDEF
identifier, say U-g.9_13.11 (I didn’t say it was pretty). Since XML is eXtensible, within any schema an attribute “UDEF_ID” could be defined on any element. The value of UDEF_ID could be set to
U-g.9_13.11, for example, for any XML element that is equivalent to “Ledger Balance Amount”. A less-precisely-qualified “Balance Amount” element would take a less-well-qualified UDEF_ID value,
say U-g.9_13. True semantic convergence could begin to become a reality.
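As a rough sketch of how that might work in practice, the Python below looks up elements by their UDEF_ID attribute rather than by tag name. The three documents, the `AMOUNT` and `SUMMARY` tags, and the `find_by_udef` function are hypothetical. Treating a longer identifier as a more specific form of a shorter one (prefix matching) is my assumption about how the qualification levels relate, not something taken from the UDEF documentation.

```python
import xml.etree.ElementTree as ET

LEDGER_BALANCE = "U-g.9_13.11"  # example identifier from the text above
BALANCE = "U-g.9_13"            # the less-qualified example identifier

# Hypothetical documents from three different schemas, all annotated with UDEF_ID.
DOC_A = '<STMTRS><LEDGERBAL><BALAMT UDEF_ID="U-g.9_13.11">500.00</BALAMT></LEDGERBAL></STMTRS>'
DOC_B = '<ACCTBAL><BALTYPE>Ledger</BALTYPE><BALAMT UDEF_ID="U-g.9_13.11">500.00</BALAMT></ACCTBAL>'
DOC_C = '<SUMMARY><AMOUNT UDEF_ID="U-g.9_13">75.00</AMOUNT></SUMMARY>'


def find_by_udef(xml_text, udef_id, include_more_specific=False):
    """Return (tag, text) pairs for elements whose UDEF_ID matches udef_id.

    With include_more_specific=True, identifiers that extend udef_id with
    further dot-separated qualifiers also match (an assumed convention).
    """
    root = ET.fromstring(xml_text)
    matches = []
    for element in root.iter():
        value = element.get("UDEF_ID")
        if value is None:
            continue  # element carries no semantic identifier
        exact = value == udef_id
        narrower = include_more_specific and value.startswith(udef_id + ".")
        if exact or narrower:
            matches.append((element.tag, element.text))
    return matches


for doc in (DOC_A, DOC_B, DOC_C):
    print(find_by_udef(doc, LEDGER_BALANCE))
```

The lookup no longer cares what the tag is called or how the source format qualifies it. Two elements are treated as equivalent only when their identifiers say so.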

So, to summarize the XML landscape: progress in semantic content still needs to catch up with the pace of progress in syntactic form. The shape of the XML conference table and the initial agenda
have been proposed and generally agreed upon. Like spectators in the gallery at the Yalta conference, we’re watching the participants enter the negotiating room and take their seats. Substantive
talks are about to begin; the diplomatic language must be very precise.



About William Lewis

William has more than 20 years’ experience delivering data-driven solutions to business challenges across the financial services, energy, healthcare, manufacturing, software and consulting industries. Bill has gained recognition as a thought leader and leading-edge practitioner in a broad range of data management and other IT disciplines including data modeling, data integration, business intelligence, meta data management, XML and XSLT, requirements structuring, automated software development tools and IT Architecture. Lewis is a Principal Consultant at EWSolutions, a GSA schedule and Chicago-headquartered strategic partner and systems integrator dedicated to providing companies and large government agencies with best-in-class business intelligence solutions using enterprise architecture, managed meta data environment, and data warehousing technologies. Visit http://www.ewsolutions.com/. William can be reached at wlewis@ewsolutions.com.