Published in TDAN.com April 2003
A lot has been written about XML in the past year or so. It is one of the most hyped, and least exciting new technologies in years. If you believe everything you read, XML is going to slay dragons,
leap from tall buildings, and enable your Grandma to write an order entry application. What nonsense! Sure, XML can be used as a format for exchanging data. The tags help to make the data somewhat
self-documenting. But it ain’t all that groundbreaking and it is dangerous to the health of your databases!
What is XML?
XML stands for eXtensible Markup Language. And that is another thing wrong with it – I hate acronyms that use letters from the middle of words! Like HTML, XML has its origins in SGML
(Standard Generalized Markup Language). Actually, both HTML and XML are subsets of SGML. HTML uses tags to describe how data appears on a page; XML uses tags to describe
what the data is, more or less. Well, actually, less. An XML tag is really nothing more than a name for the data that follows it. Which is helpful, but it is not a true
description of the data. Some people say that XML is like metadata – but that is only true if you don’t really know what metadata is.
So, XML retains a key SGML advantage of “self-description,” while avoiding the complexity of full-blown SGML. XML allows tags to be defined by users and those tags then name the data
elements in the XML document. This capability concentrates both the data and its structure into a single XML document. So, in essence, the document becomes somewhat self-describing.
The simple syntax of XML makes it easy to process by machine while remaining understandable to people. Once again, let’s use HTML as a metaphor to help us understand XML. HTML uses tags to
describe the appearance of data on a page. For example text would specify that the “text” data should appear in bold face. XML uses tags to document the data itself, instead of its
appearance. So, Mullins says that “Mullins” is a Last Name. So far, so good.
Fear and Loathing
Okay, so what is there to fear? XML looks like a harmless little language to help define data so it is easier to understand the data when it is transmitted all over the place. And everyone
transmits a lot of data, right? Well, first of all, XML may not be the best technique for sharing data if you are concerned about transmission speed and performance. By jamming all of those tags
into the data you can easily more than double the size of the data files being sent across the network. And the bigger the file, the slower the transmission time.
But that is the least of a DBA’s worries, in my humble opinion. Those XML documents are trying to worm their way into our databases. And that spells trouble. Why? There are many reasons, but the
biggest objection I have is the ad hoc capabilities it will give to non-DBAs.
There are two basic approaches for storing XML data in a SQL database:
- Break the XML document apart by reading the tags and storing each tagged element as a separate column in one or more tables;
- Store the entire XML document as a text field or character large object (CLOB).
The first approach is to be preferred. It is less troublesome to administer but more complicated to implement. It requires someone (the DBA?) to map each tagged element to an appropriate column
that already exists in a table. If done properly, at least the result will be a database that is maintainable and usable.
The second approach, I fear, is more popular. In this case the entire XML document is just stored as a big “clump” in a large column defined as text, varchar, or CLOB, depending on the
DBMS in use. In this case, what you wind up with is something that is useful only to a program that can parse the XML, read the tags, and “understand” the data in the big clump.
Furthermore, whenever the XML programmer wants to add more data – wham! – there it is. All that needs to be done is to add the tags to the XML document and then shove it into the
database. And, once again, that data is of no use to anyone who does not have the parsing program. So the database is holding data that is not really defined in any way other than as a giant clump
of text. Hello? Does anyone really view this as progress?
With this second approach a database becomes nothing more than a simple holding ground for XML documents. Querying the database is of no advantage unless you understand how to traverse the
hierarchical XML document. The metadata in the system catalog is of no particular use because it will just tell you “here we have a big clump of text.” It won’t tell you about all of
the tags – and that is the useful information.
Furthermore, XML offers little control over data type and length. So, you can create a tag called in the XML DTD and then create an XML document including something like this Fred. I don’t know
about you, but I sure as heck don’t want my account balance to be “Fred.”
So, what we have with this approach is a big text field where developers can arbitrarily add and remove data elements at will. And a storage mechanism with little integrity control over data type
and length. But wait, there’s more – that big text field is useless unless you can parse XML or use a tool that understands the XML.
Which brings us to a third approach. Now this third approach is the worst of them all – hard to believe, huh? The third approach stores XML documents in a new type of DBMS called an XML DBMS.
This new type of DBMS manipulates and reads XML as XML, without having to break it apart. In other words, the storage mechanism is XML. This is just stepping back to the days of the hierarchical
DBMS and all the potential pitfalls that come along with hierarchies. Don’t fall into that trap!
Before jumping on the XML bandwagon make sure you know what it is that you are jumping on. XML will not eradicate terrorism and make the world safe for our children. It is not the answer to
everything. Use it if you must to transfer data here and there. But keep it out of your databases if you can.