Managing an XML Data Model in Your SOA – Best Practices

Published in TDAN.com April 2005

The promise of Service-Oriented Architecture (SOA) is that it will enable greater inter-application and inter-enterprise processing by allowing us to integrate applications with a hitherto unseen
level of flexibility. This article argues that if the technology underpinning any part of the implementation threatens or compromises the flexibility promised by SOA, then SOA itself is under
threat and there can be little benefit gained.


SOA – The Business Case

Service-Oriented Architecture (SOA) is a conceptual approach to integrating applications in an IT landscape. The business case for SOA is financial. Simply stated, the value of linking applications
together is surpassing the value of building new ones.

Traditionally, linking applications together is an expensive, time-consuming exercise that results in an inherently fragile set of point-to-point integrations that cause ongoing reliability
problems and cost a great deal of money. SOA changes all that. SOA will enable levels of flexibility in application integration that can slash implementation times and costs, and enable far greater
expansion in the future.


SOA – The Principles

Importantly, SOA is an abstract concept; some would even go so far as to call it a philosophy. It is ‘Architecture’ and not ‘Design’, and architecture is all about principles, not implementation
details. Accordingly, SOA provides neither a blueprint for developers and integrators, nor a fixed set of technical standards and terminology. There is no off-the-shelf framework of components and
templates for fast-tracking your SOA implementation. Rather, SOA gives us a set of goals to aim for and some clear requirements to satisfy.

We test the success of any given SOA implementation against the principles of SOA. If any principles were to summarize the entire set, they would be these:

  • Services are Web services;
  • Services should be loosely coupled;
  • Services should be dynamic.

The benchmark for testing whether services are sufficiently loosely coupled is the degree to which they avoid being application-specific. If developers can write services that are application
agnostic, much of the battle has been won. The benchmark for testing whether services are sufficiently dynamic is the degree to which consumers can locate them ‘on the fly’.

Any implementation that validates the principles of SOA can be said to be truly service-oriented.


SOA – In Practice

All SOA principles place fairly stringent demands on the eventual implementation designs, which is where we tend to see the abstract architectural concepts taking shape in repeatable, physical
implementations. The industry is agreeing on standard after standard, and software vendors are producing steadily more and more of the pieces of the technical solution, all of which serve to
influence what comes next. This emergence of numerous repeatable implementation recommendations and de facto standards is collectively defining our current understanding of SOA as a great deal more
than principles alone. The upshot is that the SOA terrain is much easier to chart and navigate for newcomers.


Integrate Data Models

While the services themselves may appear central to the implementation of SOA, they are only its visible face. At the heart of any SOA implementation must be a data model. Furthermore, the data
model driving SOA must be an integrated data model, one that unifies all the underlying data models and business processes.

Gartner advises organizations wishing to fully exploit service-oriented business applications to start by focusing on integrating the processes and underlying data models, rather than on
integrating individual application components. Failure to integrate these aspects will place the organization at a competitive disadvantage. Gartner believes that such metadata management is
“essential to reduce the escalating complexity of management and maintenance of integrated software platforms.” (Gartner, Inc., “Predicts 2004: Application Integration and Middleware,” December
19, 2003).

This is very sound advice. When exposing an application through services, the most cost-effective short-term path in development is to write services that speak directly and specifically to the
application. Unfortunately, this means exposing the interface to the application component and little more. Services that access an application in this way provide the equivalent of RPC-style,
point-to-point integration; in other words, they are ‘tightly coupled’. The integrated data model is the essential ingredient in abstracting services away from application-specific
implementations.

The technical infrastructure that enables application agnosticism depends on a layered approach to SOA. Application-agnostic services are only possible if the SOA implements a data model that
properly represents and integrates the underlying data models in the enterprise at a layer of abstraction significantly higher than the interface layer. The layered approach is very important in
the interests of long term maintainability and evolution. At the heart of these layers, the integrated data model is the source of all data definitions and interface definitions needed by services,
but it is also the basis for resolving model-to-model mappings in the interface layer.

The layers that must be delineated are as follows:

6. Services.
5. Schemas constraining services.
4. Integrated data model.
3. Interface layer.
2. Underlying data models.
1. Application components.

The relationships between services (6) and application components (1) are described through these layers. The payloads of message-centric services are described by schemas in the schema layer (5).
Schemas are assembled from the integrated data model (4). The relationship between the payload schemas and the underlying data models is managed in the interface layer (3) through transformations
that are also built against the integrated data model. The model must therefore have knowledge both of the underlying data models (2) and of the payload schemas that have been assembled from the
model (5). For this reason, the integrated data model is sometimes referred to as the ‘interface model’, or ‘transformation model’.
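The assembly step between these layers can be sketched in miniature. Everything below is hypothetical (the model objects, the mapping names, the helper functions are invented for illustration); the point is only that payload schemas and interface-layer mappings are both derived from one single-source model rather than hand-edited.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XSD)

# Hypothetical single-source model objects (layer 4). Names and types
# are invented for illustration.
MODEL = {
    "CustomerName": {"type": "xs:string"},
    "OrderTotal":   {"type": "xs:decimal"},
}

# Hypothetical interface-layer knowledge (layer 3): where each model
# object lives in an underlying application data model (layer 2).
MAPPINGS = {
    "CustomerName": "CRM.cust_fullname",
    "OrderTotal":   "ERP.order_amt",
}

def assemble_schema(payload_objects):
    """Assemble a payload schema (layer 5) from the model, not by hand."""
    schema = ET.Element(f"{{{XSD}}}schema")
    for name in payload_objects:
        ET.SubElement(schema, f"{{{XSD}}}element",
                      name=name, type=MODEL[name]["type"])
    return ET.tostring(schema, encoding="unicode")

def mappings_for(payload_objects):
    """Resolve transformation source fields against the same model."""
    return {name: MAPPINGS[name] for name in payload_objects}

print(assemble_schema(["CustomerName", "OrderTotal"]))
print(mappings_for(["CustomerName", "OrderTotal"]))
```

Because both outputs are generated from the same model entry, a change to one model object propagates to every schema and transformation that uses it, which is exactly the behaviour the layered approach is meant to guarantee.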

An integrated data model is partly a collection of existing data models, and partly a new schematic representation of the data and processes in an enterprise. This metadata topography throws up a
number of interesting issues. Most obviously, perhaps, there are multiple layers of description: application landscapes can be described by pieces in each layer. More problematically, however,
references to any given object can occur in multiple layers and contexts. From a programming perspective, multiple layers of metadata and application objects suggest a high level of duplication and
potential redundancy, especially when you consider maintenance and evolution.

This article does not discuss how to integrate data models. Suffice it to say that duplication and redundancy issues should be resolved without loss of semantic value; that the model should be
stored in a repository; and that the model is a real model that must be captured and described, not just a logical concept.

Crucial to a successful SOA implementation are the following points:

  • The language of description can be anything that is rich enough for the task (e.g. UML), but the expression of the data model for the services layer must be in XML Schema.
  • The integrated data model should be treated as an active master for ongoing maintenance and development, rather than a passive reflection.

These points are explained in more detail below.


XML and the Data Model

XML Schema is the appropriate expression of the integrated data model for services because, as introduced above, services in SOA are Web services. The industry has overwhelmingly consolidated on
one common approach to SOA based on XML and Web services, using the Web Services Description Language (WSDL) to describe the service interface and marking up the document payload in XML. The XML
document payload is one part of the service that is described by an XML schema, either implicitly or explicitly. The XML schema is the metadata upon which the service is based as far as the data is
concerned, and the WSDL is the metadata that controls the service as far as the technical implementation of a process is concerned. The other protocols in a SOA, for security, handshaking, and so
on, are likewise XML-based. It is therefore highly recommended that metadata in a SOA is captured and made accessible at a high level in XML Schema format.
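To make this concrete, here is a minimal, entirely hypothetical payload/schema pair. The Python standard library cannot run full XSD validation, so the check below is only a stand-in for what a real XSD processor does; it simply shows the schema acting as the data metadata for the service payload.

```python
import xml.etree.ElementTree as ET

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical payload schema: the explicit metadata for the service data.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="CustomerName" type="xs:string"/>
  <xs:element name="OrderTotal" type="xs:decimal"/>
</xs:schema>"""

# Hypothetical message payload carried by the Web service.
PAYLOAD = """<Order>
  <CustomerName>Ada Lovelace</CustomerName>
  <OrderTotal>99.50</OrderTotal>
</Order>"""

# Stand-in check: every element used in the payload must be declared in
# the schema. A real XSD processor would also check types and structure.
declared = {el.get("name")
            for el in ET.fromstring(SCHEMA).iter(XSD_NS + "element")}
used = {child.tag for child in ET.fromstring(PAYLOAD)}
print(used <= declared)  # True: the payload matches its schema metadata
```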


Active Metadata

Once we have an integrated data model, however, the metadata takes on an active role. Active metadata is business-driven; passive metadata is technology-driven. Prior to this stage, the role of
metadata was passive: it served no purpose beyond describing and constraining data, and it merely reflected existing systems and application components. Active metadata drives new development
effort from within the metadata itself. This subject is worth examining a little further, as we are witnessing a shift in the balance of power for metadata!


Metadata as a Passenger

When an enterprise works with a stable organizational schema, developers and project managers need their metadata to be visible. They load their metadata into a repository for viewing and reporting
purposes. There are dozens of products that supply this sort of functionality. The ability to view metadata helps to eliminate redundancy, and enables project planning, enforcement of consistency,
a modicum of impact analysis, and so on. In this scenario, the metadata is a passive reflection of the underlying systems. The evolution of systems described by this metadata happens outside the
scope of the metadata repository. Refreshing the repository happens after the fact, as it were.

The metadata in this case is a passive passenger in the process. When some of this metadata is expressed as XML schemas and transformations, which also need to evolve, the work is usually carried
out in single-user desktop tools. When multiple developers and consumers are involved, this quickly becomes an inflexible, manual process in which experts become bottlenecks and the risk of error
is high.


Leveraged Metadata

When an enterprise works with standards that are shared with external trading partners, where the relationships are known and trusted, we see a need to leverage the metadata in order to transform
internal models to external models. The requirements for a metadata repository are enhanced by the need to edit schemas and create transformations. This moves us into the realm of schema management
tools. Typically, schema management solutions are also single-user desktop systems that provide the ability to break up schemas into reusable fragments.

The drawback of schema management solutions is that the schema or fragment level of management is too coarse-grained for XML metadata evolution, which can result in very high duplication and poor
synchronization between multiple developers or teams. However, leveraging metadata in this way represents a middle stage in the transition from passive to active metadata, which is at least a step
in the right direction.
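The synchronization problem with fragment-level management can be illustrated with a small sketch (the fragment names and contents are hypothetical): when two teams each maintain a fragment, nothing at the schema level prevents the same object being declared twice, and the duplicates then drift apart.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"

# Two hypothetical schema fragments maintained by different teams.
FRAGMENTS = {
    "customer.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="CustomerName" type="xs:string"/>
    </xs:schema>""",
    "order.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="CustomerName" type="xs:string"/>
        <xs:element name="OrderTotal" type="xs:decimal"/>
    </xs:schema>""",
}

# Count element declarations across fragments; any name declared more
# than once is a duplicate the teams must keep synchronized by hand.
counts = Counter(
    el.get("name")
    for doc in FRAGMENTS.values()
    for el in ET.fromstring(doc).iter(XSD_NS + "element")
)
duplicates = [name for name, n in counts.items() if n > 1]
print(duplicates)  # ['CustomerName']
```

A fragment-level repository can store these files, but only object-level management of the model itself removes the duplicate at the source.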


Metadata as a Driver

Once an enterprise starts to engage in collaborative e-commerce with unknown trading partners, the schemas controlling the interaction are often industry schemas outside the control of the
participating trading partners. The level of negotiation and ongoing integration effort is usually very high. The balance of power shifts from the metadata being a passive passenger in the IT
landscape to being the controlling, active driver of the landscape. It is unthinkable in such an environment that applications can change without first analyzing the situation from the perspective
of the schemas.

SOA lends itself beautifully to such complex situations. However, SOA is driven by fully integrated data models and processes, which means that the development and integration of services is
controlled primarily by editing objects in a metadata model. Now we are straying into the realms of model-driven evolution of XML metadata.

Driving change from the perspective of the metadata has very serious consequences for tooling and evolution management. From this point onwards, any changes to the way the business functions will
force developers to go to the metadata first, that is, to the integrated data model, and make their changes there before changing code. This means that when we need to modify the payload of a
service to satisfy a changing business requirement, our starting point must be the externalized schema that describes the payload. (An externalized schema is a schema that is external to the
service and external to the WSDL describing the service; keeping schemas external is a highly recommended practice for lower maintenance.)
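As an illustration of the externalized-schema practice, the sketch below parses a minimal, hypothetical WSDL fragment and confirms that its types section imports the payload schema by reference rather than declaring elements inline; the namespace and file name are invented for the example.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "{http://schemas.xmlsoap.org/wsdl/}"
XSD_NS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical WSDL whose types section imports an external schema
# instead of embedding the payload definitions.
WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
                       xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <types>
    <xs:schema>
      <xs:import namespace="urn:example:order"
                 schemaLocation="order.xsd"/>
    </xs:schema>
  </types>
</definitions>"""

root = ET.fromstring(WSDL)
types = root.find(WSDL_NS + "types")

# Externalized: one import by reference, no inline element declarations.
imports = [el.get("schemaLocation") for el in types.iter(XSD_NS + "import")]
inline = list(types.iter(XSD_NS + "element"))
print(imports, len(inline))  # ['order.xsd'] 0
```

Because the schema lives outside the WSDL, the payload definition can evolve, and be regenerated from the model, without touching the service description itself.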


Keeping Systems Synchronized

The most important reason for managing a wide-area SOA through its metadata is that we can dictate a set of contracts to participating organizations, but we cannot dictate the tooling, the
architectures, the choice of programming language, and so on.

When a significant modification needs to be made in a SOA, the dependence on XML schemas and transformations means that single conceptual objects (elements, attributes, and so on) are referred to
or used in multiple places: in the schemas constraining the services, and in the services themselves. This is because schemas themselves are not models, but expressions of a subset of a model. A
modification to a significant object in the modelling sense can easily affect all the schemas and transformations involved in the process, and potentially force you to modify and redeploy all the
services. Keeping systems synchronized can be quite a challenge.
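A sketch of the kind of impact analysis this implies (schema names and contents are hypothetical): given a set of deployed schemas, find every schema, and hence every service, that a change to one model object would touch.

```python
import xml.etree.ElementTree as ET

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical deployed schemas, each constraining a different service.
SCHEMAS = {
    "invoice-service.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="CustomerName" type="xs:string"/>
    </xs:schema>""",
    "shipping-service.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="CustomerName" type="xs:string"/>
        <xs:element name="Address" type="xs:string"/>
    </xs:schema>""",
    "catalog-service.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="ProductCode" type="xs:string"/>
    </xs:schema>""",
}

def impact_of(model_object):
    """List every schema, and thus every service, a model change touches."""
    return sorted(
        path for path, doc in SCHEMAS.items()
        if any(el.get("name") == model_object
               for el in ET.fromstring(doc).iter(XSD_NS + "element"))
    )

print(impact_of("CustomerName"))
```

In a real SOA the scan would have to cover transformations and WSDL documents as well, which is precisely why doing it by hand, without a model that already knows the references, is so error-prone.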


Evolution Challenges

Evolution of complex XML schema-driven environments is always difficult because deployment environments are constructed from schemas, transformations, document instances, and so on-all of which
provide a container for multiple references to single objects, duplicated to the nth degree. The more complex a metadata-driven application environment becomes, the more difficult it is to chart
the dependencies between all the objects in the system. This is particularly true when multiple developers or teams are working on the system, and especially in a SOA.

Unfortunately, there is no cross-platform, guaranteed scientific way of charting all the places in a SOA where any given object has been referenced because developers do not have access to a
conceptual model of the metadata in a system insofar as it relates to XML schemas, WSDL and transformations. Developers cannot therefore see at a glance how the metadata has been implemented in
their services layer. Analyzing the impact of change in the metadata, and then safely making the modification across all affected objects, is therefore difficult to do elegantly and efficiently. It
is a largely manual process that can give rise to errors and knowledge bottlenecks. This is the Achilles’ heel of SOA.


Collaborative Development

In a multiple developer environment, developers typically check out a schema from a repository of some kind, work on the schema and check it back in again, and then other developers check it out,
and so on. Often the work is carried out in desktop environments using desktop tools such as XML schema editors. This works well for one schema if there is only one developer carrying out the
modification. For multiple schemas and multiple developers, it is a serious risk.

Registries provide multiple consumers with runtime access to metadata in deployed systems, and can guarantee the uniqueness of each schema. Development-time maintenance effort, however, falls
entirely outside their scope. When two or more of the following factors are present, the development and maintenance of XML metadata in a SOA becomes exceptionally difficult to manage safely:

  • Change is a constant factor
  • Multiple teams collaborate on development and maintenance
  • Multiple divisions or enterprises collaborate on development or maintenance
  • There is a dependency on a shared industry standard
  • Multiple tools and architectures are used in the IT infrastructure.

The level of granularity is the important factor in determining how to manage metadata evolution. Objects such as whole schemas and transformations are too coarse-grained: no semantic understanding
of the actual contents can be supported when working at this level, which makes canonical comparisons impossible.
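Canonical XML illustrates the point: two declarations that differ only in attribute order and whitespace are textually different but semantically identical, so meaningful comparison has to happen at the object level, not the file level. A small sketch using the standard library's canonicalize function (available in Python since 3.8):

```python
import xml.etree.ElementTree as ET

# Two renderings of the same declaration: attribute order and spacing
# differ, so a text-level (whole-file) comparison reports a change...
A = '<xs:element xmlns:xs="http://www.w3.org/2001/XMLSchema" name="OrderTotal" type="xs:decimal"/>'
B = '<xs:element type="xs:decimal"  name="OrderTotal" xmlns:xs="http://www.w3.org/2001/XMLSchema" />'

print(A == B)  # False: coarse-grained text comparison sees a difference

# ...while Canonical XML (C14N) normalizes attribute order and
# whitespace, comparing the objects themselves.
print(ET.canonicalize(A) == ET.canonicalize(B))  # True
```

Tools that diff schemas as text therefore report spurious changes; tools that work at the object level can tell real modifications from formatting noise.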


Essential Infrastructure for Managing an XML Data Model in a SOA

For collaborative development on the basis of a fully integrated data model, you need to apply strong version control and be able to treat every object as a true, single-source object. You need a
software ‘build’ mechanism around the parts of the integrated data model that are expressed in schemas. You need to shift away from the management of metadata at the schema level. You need a user
administration framework with roles, groups, permissions, and security. You need conflict resolution, transaction support, and automated generation of schema releases and associated files from
identifiable versions of your integrated data model. You should not need to manage metadata by editing schemas and transformations.

This brings us inevitably to a model-driven architecture for managing the integrated data model, in which the XML parts are managed in a true single source object model. Only then can metadata take
on the active role that will enable SOA to deliver the benefits of flexibility promised at the outset.


Jim Gabriel

Jim Gabriel is an inventor who has experienced a number of metadata evolution management problems. He works with London-based digitalML Ltd, where he is responsible for the company’s CortexML division. Jim can be reached at jgabriel@digitalml.com.
