Subtyping in W3C XML Schema, Part 1

In 2001, the long-awaited W3C XML Schema language (XSD) introduced a typing system as a fundamental principle of its design. This is much more than just strong data types like dates and decimals,
which XSD calls simple types – although it certainly has an embarrassment of riches in that department. But of more profound significance is its concept of a complex type that defines a data
structure of elements and sub-elements, which can then be referenced by other XML elements as their content model.

This feature, with its ability to support extensions, provides powerful data inheritance capabilities that are a fairly natural fit for object-oriented languages such as Java and C#. Class
generators, like JAXB and XMLBeans, run against an XML schema and automatically produce a class hierarchy that supports basic CRUD operations for all the schema components before the programmer has
even opened her editor.

Yes, I am saying that the data modeler’s super-subtype concepts can be directly represented in the schema and the programmers will smile and thank you – well, depending on their personality
type.


1st Degree: Complex Type Reuse

This is the simplest form of data inheritance in XSD. Say our data modeler needs two subtypes of Address, a Home Address and a Business Address:

[Figure: data model showing Home Address and Business Address as subtypes of Address]

Since their attributes are identical, some modelers might refer to these as roles rather than subtypes. Either way, the schema first defines a complex type for the address structure:

[Figure: schema listing of the AddressType complex type definition]

Notice that the element tags have only a “ref” attribute, meaning they are references to elements that are declared globally elsewhere in the schema. Here they are:

[Figure: schema listing of the global element declarations referenced by AddressType]

Now the address subtype elements can be declared and they both inherit the same content model by binding themselves to AddressType:

[Figure: schema listing of the HomeAddress and BusinessAddress element declarations]
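The three steps above can be sketched as one small schema. This is a minimal sketch rather than the original listing: the child element names Street, City and PostalCode, their xs:string types, and the absence of a target namespace are all assumptions made for illustration. Only AddressType, HomeAddress and BusinessAddress come from the text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Globally declared leaf elements (names and types are assumptions) -->
  <xs:element name="Street" type="xs:string"/>
  <xs:element name="City" type="xs:string"/>
  <xs:element name="PostalCode" type="xs:string"/>

  <!-- Globally defined complex type: its children only reference the
       global elements above instead of declaring them locally -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element ref="Street"/>
      <xs:element ref="City"/>
      <xs:element ref="PostalCode"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Both subtype elements bind to the same type, so they share one
       content model -->
  <xs:element name="HomeAddress" type="AddressType"/>
  <xs:element name="BusinessAddress" type="AddressType"/>

</xs:schema>
```

A later change to AddressType, such as an added child element, would then apply to both HomeAddress and BusinessAddress without touching their declarations.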

The XMLSpy design view depicts BusinessAddress like this:

[Figure: XMLSpy design view of BusinessAddress]

The yellow box identifies a globally defined complex type and the diagonal arrow indicates that this is just a reference to a globally declared element.


Garden of Eden: The Key to Reuse

Just by following this first example you have already learned the preferred style of XML schema design for enterprise data exchange. It is called Garden of Eden, where everything has a name, a term
coined by Eve Maler of Sun Microsystems. The bottom line is that it requires both complex types and elements to be globally declared (i.e., they all appear at the top level of the schema). If
types are anonymously defined within an element declaration, or if elements are locally declared within some nested structure, then they are unavailable for reuse elsewhere. So global declaration is
the key to flexible reuse of schema fragments at all levels of granularity because it:

  • Eliminates unnecessary multiple XML declarations for the same business data element, whether they were created intentionally or accidentally
  • Ensures the maximum potential for reuse and extension in all situations
  • Ensures that any style of Web service will be able to reference the element or type it needs to identify its payload

Therefore, establishing Garden of Eden as the required schema design style should probably be part of the XML reuse strategy of most organizations in order to prevent the construction of a new
Tower of Babel. Let’s hope you are in time because so often XML travels below the radar of formal metadata management. Do you know where your corporate data-in-motion was serialized today?
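For contrast, here is a hedged sketch of the nested style that Garden of Eden is meant to replace. The element names Street and City are illustrative assumptions. Because the complex types are anonymous and their children are declared locally, no other part of the schema can reference them, so the second address has to repeat the whole structure.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Anonymous type with local children: nothing inside HomeAddress
       has a global name, so nothing inside it can be reused -->
  <xs:element name="HomeAddress">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Street" type="xs:string"/>
        <xs:element name="City" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- The identical structure must be declared a second time -->
  <xs:element name="BusinessAddress">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Street" type="xs:string"/>
        <xs:element name="City" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

This schema is valid, but the two copies of Street and City are unrelated declarations that can drift apart over time, which is the duplication described in the first bullet above.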


2nd Degree: Complex Type Extension

When subtypes have their own unique data attributes in addition to inheriting common ones, then they need to use an extended version of the original complex type. In this example, our supertype is
Instructor and here are its element and complex type:

[Figures: schema listings of the Instructor element declaration and the InstructorType complex type]

Both subtypes have one extra element of their own. ProfessorType is based on InstructorType and extended by the addition of TenureDate:

[Figure: schema listing of ProfessorType, extending InstructorType with TenureDate]

Similarly, LecturerType is also based on InstructorType but extended by the addition of the optional LoaningSchoolName:

[Figure: schema listing of LecturerType, extending InstructorType with the optional LoaningSchoolName]
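The supertype and its two extensions might look roughly like this. It is a sketch under stated assumptions: the text does not list the contents of InstructorType, so the single Name child is hypothetical, as are the xs:string and xs:date types and the absence of a target namespace. The type names, TenureDate and the optional LoaningSchoolName come from the text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Global leaf elements (Name and all three types are assumptions) -->
  <xs:element name="Name" type="xs:string"/>
  <xs:element name="TenureDate" type="xs:date"/>
  <xs:element name="LoaningSchoolName" type="xs:string"/>

  <!-- Supertype: the common content shared by every instructor -->
  <xs:complexType name="InstructorType">
    <xs:sequence>
      <xs:element ref="Name"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="Instructor" type="InstructorType"/>

  <!-- Derivation by extension: only the added element is written out;
       the content of InstructorType is inherited and comes first -->
  <xs:complexType name="ProfessorType">
    <xs:complexContent>
      <xs:extension base="InstructorType">
        <xs:sequence>
          <xs:element ref="TenureDate"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- minOccurs="0" makes the added element optional -->
  <xs:complexType name="LecturerType">
    <xs:complexContent>
      <xs:extension base="InstructorType">
        <xs:sequence>
          <xs:element ref="LoaningSchoolName" minOccurs="0"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

</xs:schema>
```

In an instance, the elements added by an extension always follow the inherited ones, so a professor's Name would appear before TenureDate.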

This is called derivation by extension. Note that the inherited elements are not repeated in the syntax, but the XMLSpy design view shows the base elements too, so you can see exactly what you are
getting:

[Figure: XMLSpy design view of a derived type, showing the inherited base elements]


3rd Degree: Substitution Groups

Sometimes a data model defines relationships between a number of business concepts that have subtypes, and those subtypes can validly participate in the relationships defined for their supertypes.
In fact, keeping the relationships at the supertype level can be a great design strategy for avoiding the complexity of a rat’s nest of low-level relationships. XSD can directly support this
with its substitution group feature. Take the simplest possible example of two related concepts: Schools and their Instructors:

[Figure: data model relating School to Instructor]

Schools can have one or more Instructors. So it would be valid to say that a College has one or more Lecturers or a University has one or more Professors, or a mixture of Lecturers and Professors.
The substitution group feature allows XML data instances to just say that, without any mention of Schools or Instructors.

To participate in a substitution group, an element must do two things. It has to identify its supertype, known as the head element, by using the substitutionGroup attribute:

[Figure: element declaration using the substitutionGroup attribute]

and it must also have the same complex type as the head element, or one derived from it. We already know that LecturerType and ProfessorType are derived by extension from InstructorType, so we are
good to go:

[Figure: Lecturer and Professor element declarations bound to their derived types]

In OO style, the head element may be declared abstract in order to force instance files to choose a subtype. So let’s do that:

[Figure: Instructor head element declared abstract]

Now, how does your garden grow?

[Figure: diagram of the School schema]
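Putting the pieces together, a School schema along these lines would satisfy both substitution group requirements. Treat it as a sketch, not the original schema: the Name child, the simple types, the SchoolType name, the lack of a target namespace, and the choice to model College and University as members of an abstract School substitution group are all assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Global leaf elements (Name and all three types are assumptions) -->
  <xs:element name="Name" type="xs:string"/>
  <xs:element name="TenureDate" type="xs:date"/>
  <xs:element name="LoaningSchoolName" type="xs:string"/>

  <!-- Supertype and its two extensions -->
  <xs:complexType name="InstructorType">
    <xs:sequence>
      <xs:element ref="Name"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ProfessorType">
    <xs:complexContent>
      <xs:extension base="InstructorType">
        <xs:sequence>
          <xs:element ref="TenureDate"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="LecturerType">
    <xs:complexContent>
      <xs:extension base="InstructorType">
        <xs:sequence>
          <xs:element ref="LoaningSchoolName" minOccurs="0"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Abstract head element: it cannot appear in an instance itself -->
  <xs:element name="Instructor" type="InstructorType" abstract="true"/>

  <!-- Members name their head and use a type derived from the head's type -->
  <xs:element name="Professor" type="ProfessorType" substitutionGroup="Instructor"/>
  <xs:element name="Lecturer" type="LecturerType" substitutionGroup="Instructor"/>

  <!-- The relationship is declared once, at the supertype level -->
  <xs:complexType name="SchoolType">
    <xs:sequence>
      <xs:element ref="Name"/>
      <xs:element ref="Instructor" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- School is modeled the same way (an assumption of this sketch) -->
  <xs:element name="School" type="SchoolType" abstract="true"/>
  <xs:element name="College" type="SchoolType" substitutionGroup="School"/>
  <xs:element name="University" type="SchoolType" substitutionGroup="School"/>

</xs:schema>
```

SchoolType mentions only Instructor, yet a validator will accept Professor or Lecturer wherever that reference occurs, in any mixture.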

The School schema allows us to create the following valid XML instance that contains a College that has a Lecturer and a Professor. We could have added more staff but you get the idea.

[Figure: XML instance of a College with one Lecturer and one Professor]
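An instance of this kind could look like the following. It is illustrative only: it assumes a no-namespace schema in which College holds a Name followed by one or more Instructor substitutes, a Lecturer has a Name and an optional LoaningSchoolName, and a Professor has a Name and a TenureDate. The Name element and all data values are made up.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<College>
  <Name>Example College</Name>
  <!-- Lecturer and Professor stand in for the abstract Instructor head -->
  <Lecturer>
    <Name>A. Lecturer</Name>
    <LoaningSchoolName>Example University</LoaningSchoolName>
  </Lecturer>
  <Professor>
    <Name>B. Professor</Name>
    <TenureDate>2005-09-01</TenureDate>
  </Professor>
</College>
```

Neither School nor Instructor appears anywhere in the document; only the substituted element names do.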

A word of warning, though. This powerful abstraction capability is one of the more advanced and lesser-used XSD features. It is well supported by validators and the major class generators, but
push-button WSDL/SOAP generators for Web services tend to be less sophisticated. So put your development toolset and run-time environment through their paces before committing to this design pattern, and
check with your data exchange partners too.

In Part 2 we will see how the substitution group technique can be used to extend an XML vocabulary in another namespace and therefore allow you and your data exchange partners to escape the
straitjacket of an industry standard vocabulary.

Jack Lindsey

Jack is principal consultant of InfoXman Inc., specialists in data architecture, data modelling and XML information management. A Limey transplanted to Canada's snowy capital long, long ago, Jack stays warm in the Ottawa winter by using his Boston terriers as cushions.