What’s the Difference between JSON and XML?

JSON and XML are two languages commonly used to express data, but they are very different, and each suits some applications better than others. Some folks get emotional about one or the other, and insist that one should be banned and that the other solves all problems. This is exactly like saying that a screwdriver is better than a hammer, or vice versa. It’s really a matter of using the right tool for the right job. So, let’s take a quick look at these two data languages, and understand what they’re meant for.

XML

The eXtensible Markup Language (XML) is used for exchanging human-readable documents in electronic form. We learn right from the name that XML is a markup language, which means that it provides tools for inserting annotations, called markup, into ordinary human-readable text. The annotations “mark” boundaries within the text, and tell us things about the text between the boundaries that might not be obvious—at least, not obvious to a machine. Another markup language in wide use is the HyperText Markup Language (HTML), where markup is used primarily to indicate how text should be rendered on a Web page for a human reader. In contrast, the proper use of XML is to indicate what the text means, so that separate specifications can indicate how text should be rendered based on what it means. This enables one set of text to be rendered differently for different audiences and viewing devices.

See below for a snippet of an XML document. The names enclosed in angle brackets are called tags, and constitute the markup of what is otherwise plain text. Most tags come in pairs with text between the start tag and end tag, and the whole construction is called an element. For example, in Figure 1 the plain text Chapter 1 is surrounded by the start tag <title> and the end tag </title>. Elements can nest. For example, the Chapter 1 title is nested inside a <chapter> element. The same <chapter> element also contains two <para> elements. The <chapter> element is nested inside the <book> element.

[Figure 1: snippet of an XML document]
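
As a hedged illustration of the nesting just described, here is a minimal sketch that parses a small XML document with Python's standard xml.etree.ElementTree module. Only the element names (book, chapter, title, para) and the Chapter 1 title come from the description above; the paragraph wording is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only the element names and the "Chapter 1" title
# come from the text; the paragraph wording is invented for illustration.
XML_SNIPPET = """\
<book>
  <chapter>
    <title>Chapter 1</title>
    <para>First paragraph of the chapter.</para>
    <para>Second paragraph of the chapter.</para>
  </chapter>
</book>
"""

book = ET.fromstring(XML_SNIPPET)   # root element: <book>
chapter = book.find("chapter")      # <chapter> is nested inside <book>
title = chapter.find("title")       # <title> is nested inside <chapter>
paras = chapter.findall("para")     # the two <para> elements

print(book.tag, chapter.tag, title.text, len(paras))
```

Each start tag and its matching end tag become one element object, and the nesting in the text becomes parent/child relationships in the parsed tree.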

JSON

JavaScript Object Notation (JSON) is a very simple language for expressing data values. JSON is built around two kinds of composite structures: array and object. An array is a list of values that are distinguished solely by the order in which they appear, while an object is a set of name/value pairs that are distinguished solely by their names. Officially, a unit of JSON text is called simply that: a JSON text, although it is also informally called a document.

See below for an example of a JSON text. Text enclosed in curly braces expresses the value of an object, while text enclosed in square brackets expresses the value of an array. Within an object, each component has a name, followed by a colon, followed by a value. Within an array, nameless values follow each other in a list. The values may themselves be objects, however. For instance, within the phoneNumbers array, there are two nameless objects, each of which consists of two name/value pairs.

[Figure: example of a JSON text]
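
As a hedged illustration of the object and array structures just described, here is a minimal sketch using Python's standard json module. Only the phoneNumbers array of two nameless objects, each with two name/value pairs, comes from the description above; every other name and all of the values are invented for the example.

```python
import json

# Hypothetical JSON text: only the phoneNumbers array of two objects, each
# with two name/value pairs, comes from the text; all values are invented.
JSON_TEXT = """
{
  "firstName": "Jane",
  "lastName": "Doe",
  "phoneNumbers": [
    {"type": "home", "number": "555-0100"},
    {"type": "mobile", "number": "555-0101"}
  ]
}
"""

person = json.loads(JSON_TEXT)          # JSON object -> Python dict
phones = person["phoneNumbers"]         # JSON array  -> Python list

# Object members are reached by name, array elements by position.
print(person["firstName"], len(phones), phones[1]["type"])
```

The curly braces map to a dictionary keyed by name, and the square brackets map to a list indexed by position, which mirrors the distinction between the two composite structures.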

 

Confusion

JSON is often compared to XML as a more efficient language with the same expressive power. This is not accurate. The confusion has arisen because, before JSON was available, XML was used heavily as a data interchange language, even though its original design intent was that it be used as a markup language. As a data language, where an annotation’s position within human-readable text is irrelevant, XML is horribly inefficient because of all those end tags, where the element name is repeated with a slash in front of it. The result can be an XML document many times larger than the data it is carrying.
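
To make the cost of repeated end tags concrete, here is a rough sketch that serializes one small record both ways and compares the lengths. The record and its field names are invented for the example, and the ratio will vary with the names and values chosen, so treat this as an illustration rather than a benchmark.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented for illustration.
record = {"firstName": "Jane", "lastName": "Doe", "city": "Springfield"}

# JSON: each name appears once.
json_text = json.dumps(record, separators=(",", ":"))

# XML: each name appears twice, in the start tag and again in the end tag.
root = ET.Element("person")
for name, value in record.items():
    ET.SubElement(root, name).text = value
xml_text = ET.tostring(root, encoding="unicode")

payload = sum(len(v) for v in record.values())  # characters of actual data
print(len(json_text), len(xml_text), payload)
```

For this record the XML form comes out longer than the JSON form, and both are longer than the data values alone.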

A JSON text might include human-readable text as data, but not marked-up text in the same sense as XML. It would be an odious task to adapt JSON for marking up text, because JSON does not preserve the order of name/value pairs in an object. In case you haven’t noticed, important order is in natural language. Just ask Yoda.
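
The point about order can be sketched in a few lines. The JSON specification (RFC 8259) describes an object as an unordered collection of name/value pairs, so a receiver should not rely on member order, whereas array order is significant. The example below assumes Python's standard json module; the member names and values are invented.

```python
import json

# Two hypothetical JSON texts with the same members in a different order.
a = json.loads('{"subject": "Yoda", "verb": "speaks"}')
b = json.loads('{"verb": "speaks", "subject": "Yoda"}')
same_object = (a == b)      # objects: member order carries no meaning

# The same words as arrays, where position is the only distinction.
c = json.loads('["order", "is", "important"]')
d = json.loads('["important", "order", "is"]')
same_array = (c == d)       # arrays: order is significant

print(same_object, same_array)
```

The two objects compare equal while the two arrays do not. A particular parser may happen to keep members in the order it read them, but nothing in the language obliges other tools to do the same.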

Both XML and JSON perpetuate terminological confusion by their use of the terms attribute (XML) and object (JSON).

In XML, an attribute is a compact way of associating a simple string value with an element, without that value being considered part of the element itself. But from a data-theoretic point of view, an element’s value is just as much a data attribute of the element it’s nested within, as an attribute is a data attribute of the element to which it applies. We would have preferred some other term than attribute.
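
A small sketch may help here. It assumes Python's standard xml.etree.ElementTree module, and the element and attribute names are invented for the example. One value is carried by an XML attribute and another by a nested element, yet from the data point of view both are simply values that belong to the chapter.

```python
import xml.etree.ElementTree as ET

# Hypothetical element: the names "chapter", "number", and "title" are illustrative.
chapter = ET.fromstring('<chapter number="1"><title>Chapter 1</title></chapter>')

from_attribute = chapter.get("number")       # value carried by an XML attribute
from_element = chapter.findtext("title")     # value carried by a nested element

print(from_attribute, from_element)
```

The two values are reached through different calls only because XML gives them different syntax.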

What JSON calls an object is really a data structure. Properly speaking, an object is material, and occupies space. A computer object occupies space in a computer’s memory or storage. In contrast, a JSON object expresses a value that can be represented by the state of an object in a computer, or just by ink on paper.

These two overloads of the terms attribute and object help keep the entire computer industry from breaking down the barriers between data, semantics, and software. My book, NoSQL and SQL Data Modeling, explains how to escape this confusion.

Which Should I Use?

It really gets quite simple. For marking up human-readable text so a machine has access to fragments of meaning, and to prepare for rendering text in a variety of contexts, use XML. For exchanging data that is not to be embedded in human-readable text, use JSON.

You will find that there are robust ecosystems built around XML as a markup language and JSON as a data language. There are tools, analyzers, schemas for validation, user groups, and all sorts of resources to help you use each language in the best way possible. It’s not about which tool is better; it’s about which tool is fit for the purpose at hand.

This monthly blog talks about data architecture and data modeling topics, focusing especially, though not exclusively, on the non-traditional modeling needs of NoSQL databases. The modeling notation I use is the Concept and Object Modeling Notation, or COMN (pronounced “common”), and is fully described in my book, NoSQL and SQL Data Modeling (Technics Publications, 2016). See http://comn.dataversity.net/ for more information.

Copyright © 2017, Ted Hills

About Ted Hills

As an author, speaker, consultant, and data management executive, Ted Hills helps businesses get the most value out of their data. Ted is both an advanced theorist and a committed pragmatist, with grounding in software and systems development. His book, NoSQL and SQL Data Modeling, promises to change how we represent data, moving from the rigid, prescriptive world of SQL databases to the more fluid domains of Big Data and NoSQL. Ted’s deep experience with large data projects in multiple industries and knowledge of new and established technologies give him perspective and insight into how an organization can maximize its existing investments while leveraging new technologies.

  • Richord1

    Great article. One of the major problems with data is the lack of Data Literacy. What I mean is data is seldom designed using basic constructs such as semantics, syntax and pragmatics.

    As a result we have a plethora of badly named and described data and significant ambiguity and duplication of data. XML provided some relief with a more structured and verbose description of the data. JSON is going backwards to cryptic and semantically obtuse data.

    As stated in the article, XML is universal and can be used across various mediums of transmission and rendering. XML imposes some degree of Data Literacy. JSON does not. We had terse data in the early days of computing – flat files. It appears we are going “back to the future” with JSON!

  • Timothy Cook

    The choice isn’t simply “For exchanging data that is not to be embedded in human-readable text, use JSON.” One case is when the order of data items is meaningful, as it can be and often is, so your statement that “JSON does not preserve the order of name/value pairs in an object” applies. Another is the lack of a robust and comprehensive validation approach in JSON. So, if your use case is well known and you do not plan to share information outside of your own ecosystem, by all means use JSON.
