Published in TDAN.com July 2006
Most organizations are looking for better ways to share information, cut costs, increase responsiveness, improve data quality and reduce risk. With increasing demand to link disparate data systems
and categorize items uniformly, 21st century analysts and developers need an enterprise-wide map of what information is available, who is responsible for it and the detailed structure of the data.
Metadata is a part of the solution, along with a generalized, centralized system to define, manage and share structural and taxonomic schemas. A schema describes the logical organization of any
information resource, including the structure and definition of data and metadata. Typical examples include:
- structured fields and tables in a database
- metadata fields and vocabularies used to tag unstructured content
A schema also defines the legal values, terminology and rules that constrain information in order to improve consistency and data integrity.
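As a minimal sketch of that idea, a schema can be represented as field definitions plus the legal values that constrain records. The field names and vocabulary below are invented for illustration, not taken from any real system:

```python
# Hypothetical schema: typed fields, one of which is constrained by a
# controlled vocabulary of legal values.
ORDER_SCHEMA = {
    "fields": {
        "order_id": {"type": int},
        "status":   {"type": str, "vocabulary": {"open", "shipped", "closed"}},
        "total":    {"type": float},
    },
}

def validate(record: dict, schema: dict) -> list:
    """Return a list of violations; an empty list means the record is valid."""
    errors = []
    for name, rule in schema["fields"].items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        if not isinstance(value, rule["type"]):
            errors.append(f"{name}: expected {rule['type'].__name__}")
        vocab = rule.get("vocabulary")
        if vocab is not None and value not in vocab:
            errors.append(f"{name}: {value!r} not in controlled vocabulary")
    return errors

print(validate({"order_id": 7, "status": "open", "total": 19.5}, ORDER_SCHEMA))     # []
print(validate({"order_id": 7, "status": "pending", "total": 19.5}, ORDER_SCHEMA))  # one violation
```

Enforcing the vocabulary at validation time is what turns a list of terms into a rule that actually improves consistency and data integrity.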
Because each situation starts with different requirements, people use somewhat different ways of defining data (structure, metadata, taxonomy, semantics, etc.), which hinder cross-system
information integration and retrieval. Unique and changing data definitions within “islands of information” clearly impede information flow across an enterprise.
Where does one currently find the “enterprise schema standards”? What are the various schemas in use? Who are the stakeholders and stewards of data? What technical, policy and political
problems might arise if I make changes to my data definitions or schema?
The process of defining and maintaining a centralized repository or registry of enterprise schemas is called Enterprise Schema Management (ESM). The justification for Enterprise Schema Management
begins with two key insights:
- Individual business units within a company have distinct ways of categorizing comparable actions and events, based on independent classification criteria, business processes, and related terminology.
- To connect applications and make information flow where it needs to go, people need a network-accessible schema management system specifying the schema and metadata definitions in use, who is
responsible for each item and interdependencies among systems.
Shared definitions of schema and metadata drive configuration, mapping and synchronization processes that ensure fast integration, continuous interoperability and frictionless information access.
With the increasing need for interoperability, information sharing and IT responsiveness, with – at best – flat headcount, efficiency is more important than ever. Doing things the same
old way will create no advantage.
Most enterprises can easily remove costs and improve availability of information using ESM. What’s more, most large companies have duplicate efforts underway—and don’t even know it.
Perhaps more dangerous, much stored information is simply unavailable because it is organized inconsistently or stove-piped within the application in which it was created.
The fact that information is organized, structured and described differently in each information source makes interoperability, cross-system retrieval and information sharing a struggle. This remains true even
after years of investment in extract, transform and load (ETL), data integration, application integration and federated search.
To answer the question “So what?”, let’s consider the business value of Enterprise Schema Management. While many factors contribute to ROI, let’s review five primary business imperatives:
- Requirement to share information. Information systems within corporations and across government agencies are notoriously bad at sharing information, as evidenced by the difficulty of
“connecting the dots” in intelligence matters or even obtaining accurate cost and revenue information at the customer or product level. Pick your metaphor: islands of information, data
silos or stovepipe applications—none of these describe systems architected to share information, but these are widely used to describe today’s systems. Creating a shared structure and
terminology simplifies integration, increases interoperability, improves navigation (common hierarchies and pull-down lists) and improves the search and retrieval process. Information sharing is
simplified by exposing the public interfaces, the structure and the semantics used by each of the interconnected systems within an enterprise schema management system. One emerging metric for IT is
“Efficiency of information distribution.” ESM helps improve this measure by enabling information flow between disparate applications, sharing among departments and information retrieval.
- Need to decrease costs and boost efficiency. Operational costs can be cut by increasing reuse and automating manual processes. Creating a registry or repository of a set of core
schemas, which developers consult first, will cut development costs via reuse. Integration costs are reduced because a common schema is used and, where standardization is not desirable,
relationships can be mapped between core schemas and system-specific methods. Maintenance and administration costs are reduced because approved changes to shared schemas, such as a new product code
within a controlled vocabulary, can be automatically pushed out to subscribing systems, which eliminates the time-consuming job of changing perhaps dozens of individual systems by hand.
- Need for greater responsiveness. Business innovation is often slowed because of integration difficulties or simple backlogs within the IT team. Yet organizations are rewarded when they adapt
and react faster than their rivals. Whether the goal is information superiority in the war-fighting sense (business, too, is a war — to win a greater share of customer budget) or noticeably better
service to the customer (think Amazon.com), the fast analysis, approval and propagation of change can have a noticeable effect on business agility and delivery speed. When Amazon.com decided to
offer used products, along with products from many partner retailers (electronics, apparel, hardware…), the responsiveness of the IT team had a huge impact on the success of the business.
- Requirement for better data quality. Redundant and conflicting data, lack of standards and inconsistency are bad news in this era of increased accountability and governance. A hallmark of any
data stewardship program is the centralized, cross-system definition of who is responsible for what information at a very granular level. The formation of workable standards involves a blend of
reconciliation, consensus, diversity of viewpoint, shifting viewpoints and canonical references. A common, enterprise-wide schema repository and Web-based collaboration tools simplify this process
of stewardship, stakeholder analysis and quality improvement.
- Need to reduce risk and disruption. Uptime, the real-time enterprise and operational continuity depend on analysis of change and reduced dependence on what is inside the head of individuals.
When architects, information stewards, data managers and stakeholders can see and resolve the impact of proposed changes, the likelihood of disruption is greatly reduced. (If this is done in an
automated way, the pace of change is likewise accelerated.) The key is “no surprises”. Any Enterprise Schema Management system should include a process for impact analysis, alerting,
change management, consensus rules or voting, approvals and versioning. An important outcome of this process is the centralization and preservation of knowledge that is inevitably lost from staff
turnover, which reduces risk.
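The automatic change propagation mentioned under cost reduction above can be sketched as a simple publish/subscribe registry. The class, vocabulary and system names here are hypothetical:

```python
# One approved vocabulary change is pushed to every subscribing system,
# instead of editing dozens of systems by hand.
class VocabularyRegistry:
    def __init__(self):
        self._terms = {}      # vocabulary name -> set of approved terms
        self._callbacks = {}  # vocabulary name -> subscriber callbacks

    def define(self, name, terms):
        self._terms[name] = set(terms)

    def subscribe(self, name, callback):
        """Register a subscribing system and synchronize it immediately."""
        self._callbacks.setdefault(name, []).append(callback)
        callback(set(self._terms[name]))

    def add_term(self, name, term):
        """An approved change: push the updated vocabulary to all subscribers."""
        self._terms[name].add(term)
        for callback in self._callbacks.get(name, []):
            callback(set(self._terms[name]))

# Two "systems" keep local copies of the product-code vocabulary.
crm_codes, billing_codes = set(), set()

registry = VocabularyRegistry()
registry.define("product_code", {"A100", "B200"})
registry.subscribe("product_code", crm_codes.update)
registry.subscribe("product_code", billing_codes.update)

registry.add_term("product_code", "C300")  # one change, both systems updated
print(sorted(crm_codes))      # ['A100', 'B200', 'C300']
print(sorted(billing_codes))  # ['A100', 'B200', 'C300']
```

A real ESM system would add approvals and impact analysis before the push; the sketch shows only the propagation step.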
These business reasons can provide the financial motivation to move forward with ESM. Moving forward involves leadership, process and technology: centralization, reconciliation, impact analysis
and change management.
A registry of what information is available, how it is used (structure and semantics) and who its stewards and stakeholders are is remarkably absent from most organizations.
Where does one currently find the “enterprise schema standards”? What are the various schemas in use? Who are the stakeholders and stewards of data? What technical and political
problems might arise if I make changes to my data definitions or schema? Without a reference point, how can one compare to the authoritative source? These questions are asked every day by people
working with schema and metadata during system integration and information retrieval projects.
The larger the organization, the more likely that communication gaps exist. The more likely that overlapping projects – even conflicting projects—are actively producing information that
goes unseen. The expense of duplicate effort is hidden from top management, but that doesn’t mean it is acceptable, given our endless quest to reduce cost.
Typical users of ESM include information architects, data managers, subject matter experts, software developers and application integrators. Key tasks include:
- Import: gathering schema, including metadata definitions, taxonomy and vocabulary
- Process: assignment of stewardship, stakeholders, definition of relationships, reconciliation of diverse viewpoints, analysis of impact of change, and change management
- Export: making schema available via search and automated synchronization of subscribing systems
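The Import / Process / Export tasks can be sketched as a toy registry. The class, method and field names below are illustrative assumptions, not any product's API:

```python
# A minimal registry following the three ESM tasks: import a schema,
# process it (assign stewardship), and export it via search.
class SchemaRegistry:
    def __init__(self):
        self.entries = {}

    def import_schema(self, name, definition):
        """Import: gather a schema definition into the registry."""
        self.entries[name] = {"definition": definition,
                              "steward": None, "stakeholders": []}

    def assign_steward(self, name, steward, stakeholders=()):
        """Process: record who is responsible and who depends on the schema."""
        self.entries[name]["steward"] = steward
        self.entries[name]["stakeholders"] = list(stakeholders)

    def export(self, keyword):
        """Export: make schemas findable by keyword search."""
        return [name for name, entry in self.entries.items()
                if keyword in name or keyword in str(entry["definition"])]

registry = SchemaRegistry()
registry.import_schema("customer", {"fields": ["id", "name", "region"]})
registry.assign_steward("customer", "Data Management", ["CRM", "Billing"])
print(registry.export("region"))  # ['customer']
```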
Reconciliation of Disparate Systems
Enterprise Schema Management is used to reconcile the relationships and different meanings used by various applications. The semantics must be described and understood by others, because the word
“State” could be associated with concepts including “completed” or “restless”. “Cost” and “Price” could mean the same thing
or very different things within various systems that are supposed to share information or reduce redundancy.
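One way to picture this reconciliation is a crosswalk that records meanings explicitly rather than assuming labels match. All the system and concept names below are invented for illustration:

```python
# Different labels, same concept -- and same label, different concepts.
# The repository maps each (system, field) pair to a shared concept.
FIELD_CONCEPTS = {
    ("order_system",   "Price"): "unit_price",
    ("finance_system", "Cost"):  "unit_price",        # same meaning here
    ("inventory",      "Cost"):  "acquisition_cost",  # different meaning!
}

# "State" carries a different concept depending on which system uses it.
STATE_MEANINGS = {
    "shipping_system": "geographic_region",  # e.g. "WA"
    "workflow_system": "processing_status",  # e.g. "completed"
}

def same_meaning(field_a, field_b):
    """Two (system, field) pairs are interchangeable only if they map to
    the same shared concept in the repository."""
    return FIELD_CONCEPTS[field_a] == FIELD_CONCEPTS[field_b]

print(same_meaning(("order_system", "Price"), ("finance_system", "Cost")))  # True
print(same_meaning(("order_system", "Price"), ("inventory", "Cost")))       # False
```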
This reconciliation or mediation process can result in consistency along a continuum. On one end is extreme standardization, in which all systems use the same hierarchy (structure, taxonomy) and
vocabulary (terminology). This is generally impractical for two reasons: 1) disruption and 2) loss of information fidelity.
At the same time, total independence (developers creating self-serving, but perhaps arbitrary schema) sets up an environment in which redundancy and inconsistency add cost and interfere with information sharing.
Experience tells us that support for varied viewpoints and shifting viewpoints is essential for effective information production and knowledge management.
Enterprise Schema Management does not command absolute conformance, because mappings and alternatives can be documented within the enterprise schema repository. Indeed, flexibility is an important
aspect in order to support varied viewpoints, which are sometimes important to maintain the fidelity of information. What is required is the registry and documentation of interdependencies, so that
when anyone makes a change, interconnected systems are aware, ideally before any disruptive impact.
Once the schema from interconnected (or potentially interconnected) systems are imported, reconciled, relationships mapped and stewards/stakeholders identified, you have vastly simplified the
integration and retrieval process. However, requirements change, which may impact this now carefully constructed network of interrelated schema.
Given change, Enterprise Schema Management systems generally include some sort of impact analysis. Impact analysis can reveal which people and systems are affected, at a granular level, by any
potential change. The key, of course, is understanding and resolving impacts before the change occurs. This way, interdependency does not mean vulnerability.
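Impact analysis of this kind can be sketched as a walk over recorded interdependencies. The systems, schemas and stewards below are made up for illustration:

```python
# Given a proposed change, list every system affected, directly or
# transitively, along with the steward who must be notified.
DEPENDS_ON = {                   # consumer -> schemas/systems it uses
    "billing":   ["product_vocab"],
    "reporting": ["billing", "crm"],
    "crm":       ["product_vocab"],
}
STEWARDS = {"billing": "Finance", "reporting": "BI team", "crm": "Sales ops"}

def impacted_by(changed):
    """Return the set of systems directly or transitively affected."""
    affected = set()
    frontier = [changed]
    while frontier:
        target = frontier.pop()
        for consumer, uses in DEPENDS_ON.items():
            if target in uses and consumer not in affected:
                affected.add(consumer)
                frontier.append(consumer)
    return affected

systems = impacted_by("product_vocab")
print(sorted(systems))                       # ['billing', 'crm', 'reporting']
print(sorted(STEWARDS[s] for s in systems))  # the people to involve first
```

Note that `reporting` is flagged even though it never touches the vocabulary directly; catching such transitive effects is the point of doing the analysis before the change.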
Using various types of collaboration, consensus rules, voting, etc., impacted individuals can discover, resolve and approve changes. Using the Web and email, rather than requiring “yet
another meeting,” accelerates the resolution process by enabling solutions and approvals without requiring everyone to get into one room all at the same time.
The process of Enterprise Schema Management essentially captures knowledge, to aid in information sharing. Capturing schema (structure and semantics) within an enterprise repository also minimizes
the loss of information that employee turnover causes. (Ever been dependent on an individual because only they knew the internals of a system?)
Streamlining the process of change management via distance collaboration can chop weeks from integration schedules.
By creating an enterprise schema repository, you move the know-how from people’s heads and their personal workspaces into a shared, clearly documented view of enterprise information. What
terminology and structure does each use? What are the relationships among distributed information assets? Are there gaps, overlaps or conflicts? This process is difficult enough when doing
point-to-point integration, but becomes impossible to reconcile and manage over time when the goal is integrating an evolving network of interconnected enterprise applications, databases and content repositories.
XML Schema is not enough
While XML and XML Schema would seem to make things better, they also can fuel the flame of independent, potentially arbitrary, definitions and rework, when re-use and interoperability are the goals.
XML Schemas include definitions for the legal names of elements, valid attributes and vocabulary, along with the specific hierarchical structure of an XML document. To be valid, an XML document
must conform to the specifications or schema. An XML Schema also includes meta-data describing the purpose or meaning of the values. For example: when describing a location, “WA” is the
information (data value) and “State” is the meta-data.
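The “WA”/“State” example can be shown concretely with Python's standard XML library: the element name carries the meta-data and the element text carries the data value.

```python
import xml.etree.ElementTree as ET

# A tiny XML fragment: the tag is the meta-data, the text is the data value.
document = "<location><State>WA</State></location>"
element = ET.fromstring(document).find("State")

print(element.tag)   # the meta-data: 'State'
print(element.text)  # the data value: 'WA'
```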
Any valid XML document conforms to a defined structure or schema, which defines legal elements, attributes and vocabulary. Simply using XML Schemas does not ensure interoperability. While
self-describing and structurally consistent, one cannot be sure of the meaning or semantics represented by perfectly legal elements and attributes. Information exchange requires a shared
understanding by source and receiver. Here are some quotes to illustrate the point:
“Communication cannot occur unless there is a shared context for communication…through semantic mediation, which now ranks as one of the most complex and important issues,”
according to IDC in a June 2002 report.
“Anyone building any kind of software these days needs to build in the capability to communicate not just physically, but semantically with the rest of the world,” wrote Esther Dyson in
Release 1.0, in February 2003.
In addition to the interoperability risks associated with unmanaged sprouting of XML Schema, there is the obvious waste of time when developers “re-invent the wheel” to define structure
and taxonomy that already exists or can be re-used. Removing this hidden cost of redundant development (and subsequent integration) provides a huge ROI for enterprise schema management. The META
Group wrote, in May 2002, “Most corporations…have not recognized that they are about to have a management problem, as the number of XML Schemas and DTDs (Document Type Definitions)
they must deal with grows out of control.”
A big motivation for XML is reuse and standardization of artifacts and schemas, but without Enterprise Schema Management, reuse is hard to achieve. Perhaps worse, people may think XML enables
systems to interoperate, but one system may have a different use of terminology or different interpretation of the data being exchanged. Trouble starts if the semantic meaning of that data was not
understood or delineated within the data exchange process. Given that current specifications do not prescribe a semantic solution beyond providing narrative information within the WSDL file or
associated XML Schema, the schema repository has a key role to play.
To successfully integrate and share information from across the enterprise requires an enterprise perspective, which is not now – or at any point in the near future – going to be 100%
XML. While XML-based documents are increasingly pervasive within the modern enterprise, critical information is also stored and retrieved within relational databases. In this situation, schema is
used to define the columns and data types within the tables of a database. Legacy applications generally have their own templates not based on XML Schema. Some systems still use DTD, or Document
Type Definitions, used with SGML.
Given the heterogeneous nature of enterprise information, a cross-system, language-independent perspective is required when modeling enterprise schema. This is particularly true given the business
imperative to integrate structured and unstructured information from disparate sources. A generalized schema model is necessary to find overlaps, gaps and conflicts. Reuse and interoperability
require the ability to compare “like” with “unlike” and to establish relationships based on meaning and intent rather than syntax and structure.
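A generalized, language-independent model might normalize definitions from different technologies into one neutral shape so they can be compared by meaning. The canonical-type table and field names below are assumptions for this sketch:

```python
# Normalize a relational column and an XML Schema element into a common
# record, so definitions can be compared across technologies.
CANONICAL_TYPE = {"varchar": "string", "text": "string", "xs:string": "string"}

def normalize(source_type, raw):
    """Map a source-specific definition onto a common, neutral shape."""
    if source_type == "relational":
        return {"name": raw["column"].lower(),
                "datatype": CANONICAL_TYPE[raw["sql_type"].lower()]}
    if source_type == "xml":
        return {"name": raw["element"].lower(),
                "datatype": CANONICAL_TYPE[raw["xsd_type"]]}
    raise ValueError(f"unknown source type: {source_type}")

db_column = normalize("relational", {"column": "State", "sql_type": "VARCHAR"})
xml_field = normalize("xml", {"element": "state", "xsd_type": "xs:string"})

# Different syntax (VARCHAR vs. xs:string), same normalized definition.
print(db_column == xml_field)  # True
```

Once definitions share a shape, finding overlaps, gaps and conflicts becomes a comparison over records rather than a manual review of dissimilar notations.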
Now that we have reviewed the driving forces for ESM, we can turn to key characteristics of any ESM solution:
1) Manages both “Structural” Schema and Taxonomic Schema (elements, classes & vocabularies, terms and vocabulary views)
Schema Standards and existing Schema editing tools tend to focus on either Structural Schema definitions (simple scalar and complex data element definitions, e.g. Dublin Core, ebXML) or taxonomic
schemas (controlled vocabularies such as UN Geography or Getty vocabularies). Complete specifications of enterprise schemas involve both. An integrated system must address the domain specification
and classification mechanisms of vocabularies and the overlying data definitions of all of the string, date, numeric and vocabulary-type elements and how they are combined into complex structures.
2) Respects the inevitability of diversity and heterogeneity within the standards management process. Despite the larger organizational need for structural data standards, for a variety of technical,
functional or cultural reasons, complete uniformity is not usually practical or desirable. Schema diversity is caused by a variety of things:
- Feature capabilities and data models of different products and systems
- Business function and goals of different groups or departments
- Technical barriers to change such as hard-coded database, user interface or reporting functions
- Business cultural or natural language differences
A workable ESM system must allow for the management of disparate schema structures while promoting consistency and evolution toward an enterprise standard, without squelching adoption
with an all-or-nothing mandate.
3) Implements guarantees to stakeholders that schema definition entities placed into the shared domain will be managed in such a way that their interests will be accounted for.
Stakeholders will be hesitant to adopt and support a standards methodology that does not ensure that their systems architectures will not be adversely affected. If the definition of elements,
vocabularies or schema structures which were previously locally maintained are modified without notice or approval and subsequently implemented or mandated in their systems, their business or
technical processes may break. Impact analysis, voting and consensus-enforcing mechanisms must be ensured as part of a well-designed ESM system so that the “schema donors” are assured
of continued shared control of the now-shared schema assets.
4) Culturally Responsive:
Allows change management processes to be customized and “tweaked” at all levels of the organizational tree.
Not all organizations are the same. Some have a very top-down management structure, and others range from collaborative to orchestrated chaos. The ESM system will influence the culture but should be
responsive to the natural business and social culture of the enterprise it serves. In some cases, organizations have hybrid mosaics of top-down and bottom-up collaborative styles.
Schema Change Management processes, an essential part of the ESM, should allow the appropriate style and rules to be established for the enterprise and its subdivisions.
5) Granular/ Modular:
Implements schema asset management, change management and permissions at a highly granular level to allow maximum reusability and distributed management with the appropriate level of organizational
security and control.
XML Schema standards such as DTD and XSD provide a useful baseline model for defining schemas, particularly for on-the-wire interchange of data packets. Regardless of which standard is agreed upon,
any mechanism that makes a complete schema document the atomic unit of management, ownership and workflow is critically limited in flexibility and reusability. The ideal ESM system manages not a
list of versioned documents but a living network of referential schema definition objects, which can be reused, combined and managed in rich ways. Consequently, the power of object-oriented
inheritance, modularity, granular permission control, history logging, impact analysis and change control can be brought to bear to simplify the challenge of managing a complex and massive enterprise schema.
6) Lifecycle / Evolutionary:
ESM Is a full-lifecycle system that not only sets the standard but makes it practical and workable to keep the standards current.
Establishing an enterprise schema is a daunting challenge that few organizations have yet achieved. But more difficult still may be the discipline of keeping the standard current through the large
and small changes necessitated by reorganizations, changing business needs, ongoing analysis and mergers and acquisitions. An ESM system should keep stakeholders engaged and involved in the
day-to-day process of tracking and resolution, by keeping the “noise” level of irrelevant distractions down, informing every appropriate stakeholder automatically when changes affect
them and enabling the most agile change process possible with a simple, repeatable online process that reduces unnecessary committee administrivia.
7) Human-Readable (vs strictly technical):
Schema Standards should include not only technical structure definitions but human-readable labeling, descriptive information and the appropriate validation and display hints
necessary to drive client interactive functions.
Enterprise Schema Management systems should be designed with the understanding that knowledge domain experts aren’t always programmers, but should still be full partners in the schema
management and development process. As such, the terms and techniques for modeling schemas must be accessible to anyone who can administer a typical content management system or manage a taxonomy.
Furthermore, an ESM system should support globalized representations including language, date-time and currency conventions that are present in large, global organizations. Finally, schema is more
than machine-readable standards; it also encompasses standards for human-mediated data management and consumption. The Enterprise Schema must include the ability to manage labeling, descriptive and display-hint
information for schema definitions, which can be propagated across all impacted systems and languages so that each user in the information value chain accurately understands the meaning and intent
of the data they are viewing.
Schema Standards should be described in such a standard, detailed and consistent manner that the implementation and enforcement of those standards can usually be accomplished automatically through
a standard implementation infrastructure or methodology. Rather than collecting dust as a “study,” an enterprise-wide schema or taxonomy map should be living and connected to impacted systems.
The ideal ESM contains a detailed, consistent, thoroughly cross-referenced specification of the Enterprise Schema that is accessible online by all systems regardless of architecture or language.
Through a standardized API, process or infrastructure all systems under management by the ESM should be configurable through a repeatable, subscription process to those parts of the enterprise
schema that apply to that system.
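The subscription idea can be sketched as each managed system pulling just the subset of the enterprise schema that applies to it, through one repeatable call. Schema and system names are hypothetical:

```python
# The enterprise schema, and which parts each system subscribes to.
ENTERPRISE_SCHEMA = {
    "customer": {"fields": ["id", "name", "region"]},
    "product":  {"fields": ["sku", "title", "price"]},
    "invoice":  {"fields": ["number", "customer_id", "total"]},
}

SUBSCRIPTIONS = {
    "crm":     ["customer"],
    "billing": ["customer", "invoice"],
}

def configure(system):
    """Return the subset of the enterprise schema this system subscribes to."""
    return {name: ENTERPRISE_SCHEMA[name] for name in SUBSCRIPTIONS[system]}

print(sorted(configure("billing")))  # ['customer', 'invoice']
print(sorted(configure("crm")))      # ['customer']
```

Because configuration is a repeatable function of the registry, re-running it after an approved schema change brings every subscriber up to date without per-system manual work.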
Note: Systems that have no practical programmatic schema administration interface (e.g. one-off custom business applications) must be updated by developers to the specification implicit in the
schema components to which they subscribe and which the system stakeholders are involved in collaboratively maintaining.
Why is Enterprise Schema Management worth the trouble? Consider the goals: interoperability, information sharing and cost reduction. Aren’t these critical objectives for both IT and business
management? Indeed, consider what happens if you do not implement Enterprise Schema Management: information overload continues to get worse. Redundant data and redundant effort continue to drain
profit from your enterprise. And spending on information integration continues to consume perhaps 70% of your IT resources, handicapping your ability to respond to new business requirements.
Just one factor – the reuse of “core” schemas – provides tremendous cost justifications for ESM, by increasing programmer productivity, speeding the completion of projects
and ensuring interoperability.
When you add another aspect – the automatic propagation of agreed upon changes to all subscribing systems – it is easy to imagine proclaiming that you have discovered a way to decrease
spending, while improving information sharing.
Once you consider the schemas used across your enterprise as an information asset, critical to integration and an untapped source of cost savings, you’ll discover that Enterprise Schema
Management is critically important to achieving the goals of the IT team and business management.
Defining the structure of information using schemas is nothing new. But gathering together the schemas used among interconnected applications, in an enterprise schema repository, has become a
source of competitive advantage for pioneers in the field of information management.