A New Way of Thinking – October 2006

Published in TDAN.com October 2006

The concept of a “federated community” has started to pop up in a number of instances, along with questions about the nature of information sharing within (or across?) a federated community. As more
organizations begin to explore ways to exploit free-flowing information streams (such as those enabled through service-oriented frameworks), the notion of federation becomes increasingly important. So
as an advocate for data quality management, I think it is worthwhile for us to consider the special challenges that are introduced when attempting to define a data quality strategy for
interoperability within a federated community model.

First of all, what is a federated community? At the simplest level, it is a collection of participants (individuals or organizations), each operating under its own administrative domain and
governance, who agree to collaborate in some way that benefits the participants, both individually and as a community. These communities may cross organizational, political, geographic, and
jurisdictional boundaries. An easy way to identify the formation of a federation, at least one based on information sharing, is to watch the development of data standards. The need for a standard
exists when two parties need to agree on a way to understand each other; the more parties that join the activity, the stronger the evidence that there is general agreement that collaboration is
beneficial.

In addition, there may be external obligations and expectations with which each participant may be required to comply. Each participant may have different roles and obligations, and the level of
expected conformance to these obligations can vary from voluntary adherence to various degrees of mandated compliance.

Assessing the degree to which participants conform to best practices, and the varying implementations of those practices, introduces interesting challenges, especially in the area of data quality
management. First of all, within an administered environment, policies regarding the quality of information can be defined and enforced, but as data leaves the organization boundary, so too does
the ability to control its quality. Second, the quality expectations for data used within a functional or operational activity within one organization may be insufficient for the needs of the
“extended enterprise.” Third, the existence of data outside the administrative domains suggests the notion of ownerless data, for which no one is necessarily accountable.

Consider this: while data quality cannot necessarily be mandated, the expected benefits of collaboration through data sharing can only be achieved when all participants willingly contribute to
successful data quality management. An important objective of the community is the development of a data quality framework that encourages participants to willingly conform to and broaden the
integration of data quality practices. Here are some of the challenges:

Quantification and Measurement

By considering a definition of “quality data” as data that is “fit for purpose,” we can infer that there are few purely objective metrics for measuring data quality, and that the quality of
data depends on business user expectations. Even within individual organizations, there often is no formalized approach for quantifying data quality. In a federated community, the absence of
formal methods for expressing ways to measure the quality of data adds further complexity. Consistent criteria and measurements are required.

This introduces two kinds of challenges. One challenge lies in the need to express business expectations for data quality that support the various business needs of shared services within the
constraints imposed by the collaborative environment. The other challenge is specifying a universal set of dimensions of data quality for the entire community against which data quality performance
can be measured.
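To make the measurement challenge concrete, here is a minimal sketch of scoring a record set against community-agreed dimensions of data quality. The dimension names (completeness, validity), the sample records, and the reference list of countries are all invented for illustration; they are not part of any particular community's standard.

```python
# Hypothetical sketch: scoring a shared record set against two agreed
# dimensions of data quality. All names and sample data are illustrative.

def completeness(records, field):
    """Fraction of records with a non-empty value for `field`."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def validity(records, field, is_valid):
    """Fraction of records whose value for `field` passes `is_valid`."""
    ok = sum(1 for r in records if is_valid(r.get(field)))
    return ok / len(records)

records = [
    {"name": "Acme Corp", "country": "US"},
    {"name": "", "country": "DE"},
    {"name": "Widget Ltd", "country": "XX"},
]

known_countries = {"US", "DE", "GB"}
scores = {
    "completeness(name)": completeness(records, "name"),
    "validity(country)": validity(records, "country",
                                  lambda v: v in known_countries),
}
for dimension, score in scores.items():
    print(f"{dimension}: {score:.2f}")
```

The point of the sketch is that once the community agrees on the dimensions and the predicates behind them, every participant can report comparable scores, which is exactly the consistency the challenge calls for.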

Politics and Organizational Behavior

While shared services provide the ability to access data materialized from numerous supplier sources, the actual components of this virtual record are likely to have been distributed across
multiple organizations, and may be located in different geographical areas and governmental jurisdictions. The challenge involves providing a framework for information sharing across
multi-jurisdictional domains while considering the differences in policies across those different jurisdictions. Since the policies for security, privacy, and management may be defined in different
ways by different jurisdictions, non-governmental boards/standards organizations, and private organizations, the framework must accommodate data quality management while maintaining conformance to
political and organizational policies.

Regulatory

Accompanying the individual political and organizational challenges are the regulatory policies and legislation that may govern the various geopolitical jurisdictions. While it is reasonable to
assume that there will be overarching policies regulating the use, sharing, and privacy of the data exchanged through the use of shared services, it is also likely that each jurisdiction’s
policies may have slight differences. The challenge is to provide a means of data quality policy management that maps the various policies into business and data rules for validating consistent
conformance to the policies, while accommodating variant rules as information crosses jurisdictional borders.
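One way to picture this mapping is a shared baseline of data rules with per-jurisdiction overrides, so that the same record is validated against whichever policies apply where it currently resides. The rule names, the two jurisdictions, and the privacy scenario below are all hypothetical illustrations, not drawn from any real regulation.

```python
# Hypothetical sketch: baseline data rules plus per-jurisdiction overrides.
# Rule names and jurisdictions are invented for illustration.

BASELINE_RULES = {
    "birth_date_present": lambda r: bool(r.get("birth_date")),
    "id_number_present": lambda r: bool(r.get("id_number")),
}

# Suppose jurisdiction B's privacy policy forbids sharing id_number, so
# the rule flips: the field must be absent rather than present.
JURISDICTION_OVERRIDES = {
    "B": {"id_number_present": lambda r: not r.get("id_number")},
}

def rules_for(jurisdiction):
    """The baseline rules, with any jurisdictional variants applied."""
    rules = dict(BASELINE_RULES)
    rules.update(JURISDICTION_OVERRIDES.get(jurisdiction, {}))
    return rules

def violations(record, jurisdiction):
    """Names of the rules this record fails in this jurisdiction."""
    return [name for name, rule in rules_for(jurisdiction).items()
            if not rule(record)]

record = {"birth_date": "1970-01-01", "id_number": "12345"}
print(violations(record, "A"))  # no violations under the baseline
print(violations(record, "B"))  # fails the overridden privacy rule
```

The same record is conformant in one jurisdiction and non-conformant in another, which is precisely the variance the policy management framework has to absorb.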

Technical

Typically, the collaboration is an attempt to exploit the capabilities of production or legacy systems, and layer service-oriented functionality on top of the existing applications. The combination of
existing application systems implemented across a distributed environment consisting of heterogeneous systems introduces a need to provide a data quality infrastructure that can accommodate
existing systems while ensuring alignment with future systems. Differences between hardware platforms, operating systems, data storage, and database management systems can introduce challenges in
consistency of the data and integration challenges for the conceptual consolidation of many data sources into a virtual “master data source.”

Operational

The ability to make the best use of shared information services relies on the ability to provide accurate and current data from trusted data sources. To ensure that the quality of this data is
maintained, there is a need to establish a process for “qualifying” these data sources, as well as continuously monitoring them and establishing their trustworthiness across the
relevant dimensions of data quality.
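A qualification process like this might, for example, compare each source's latest measured scores against community thresholds on the agreed dimensions. The dimension names, threshold values, supplier names, and scores below are all assumptions made up for the sketch.

```python
# Hypothetical sketch: re-qualifying data sources each monitoring cycle by
# comparing measured scores against community thresholds. All values invented.

THRESHOLDS = {"completeness": 0.95, "accuracy": 0.98, "currency": 0.90}

def qualify(source_scores, thresholds=THRESHOLDS):
    """Return (trusted, failures): dimensions falling below threshold."""
    failures = {d: s for d, s in source_scores.items()
                if d in thresholds and s < thresholds[d]}
    return (not failures, failures)

# One monitoring cycle: each supplier reports its latest measurements.
sources = {
    "supplier_a": {"completeness": 0.99, "accuracy": 0.99, "currency": 0.97},
    "supplier_b": {"completeness": 0.91, "accuracy": 0.99, "currency": 0.95},
}
for name, scores in sources.items():
    trusted, failures = qualify(scores)
    status = "trusted" if trusted else f"flagged: {sorted(failures)}"
    print(f"{name}: {status}")
```

Running such a check on every cycle, rather than once at admission time, is what turns one-time qualification into the continuous monitoring the paragraph describes.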

Many of these issues boil down to more fundamental questions regarding the definition of a governance framework for data quality management that is implementable (and doesn’t impose
too much of a burden on the participants), acceptable (i.e., maintains data quality at a level high enough for all to benefit), and operational (i.e., can provide continuous measures of
community-wide conformance to data quality standards). The trick lies in gaining acceptance for two ideas:

  1. Integrating data quality management and monitoring within the service-oriented architecture. The technical components supporting community-wide data quality management must also be provided as
    component services that can be embedded within the architecture.
  2. Providing a forum for balancing the differences in administrative and jurisdictional policies, certification, and participant requirements, to provide methods for managing and implementing the
    numerous policies that reflect how data quality expectations are validated. This implies establishing protocols for configuring and deploying the shared and collaborative management and measurement
    of data quality.
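The first idea — packaging data quality management as an embeddable component service — can be sketched roughly as follows. The class, its interface, and the sample rules are invented for illustration; a real service-oriented deployment would expose the same behavior behind a service contract rather than an in-process class.

```python
# Hypothetical sketch: a data quality check packaged as a reusable component
# that other services gate their inputs through. Names are illustrative.

class DataQualityService:
    """A validation component embeddable in a service-oriented pipeline."""

    def __init__(self, rules):
        self.rules = rules  # rule name -> predicate over a record

    def validate(self, record):
        """Map each rule name to whether the record passes it."""
        return {name: rule(record) for name, rule in self.rules.items()}

    def accepts(self, record):
        """True only if the record passes every rule."""
        return all(self.validate(record).values())

dq = DataQualityService({
    "has_name": lambda r: bool(r.get("name")),
    "has_country": lambda r: bool(r.get("country")),
})

# A consuming service gates incoming records through the shared component.
incoming = [{"name": "Acme", "country": "US"}, {"name": "Widget"}]
accepted = [r for r in incoming if dq.accepts(r)]
print(len(accepted))  # only the complete record passes
```

Because every participant calls the same shared component, community-wide monitoring becomes a matter of aggregating its results rather than reconciling each participant's private checks.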

It is inevitable that data quality issues will erupt, and developing a governance program built on cross-organizational consensus will help in developing policies and protocols to address
those issues. And what I find amusing is that the more we look at the issues associated with federated communities, the more we will start to see (in an almost fractal way) similarities within our own
organizations’ boundaries – cross-division, cross-region, cross-department, cross-application, cross-database… Perhaps the federation concept is not such a new one after
all.

Copyright © 2006 Knowledge Integrity, Inc.


About David Loshin

David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of The Practitioner's Guide to Data Quality Improvement, Master Data Management, Enterprise Knowledge Management: The Data Quality Approach, and Business Intelligence: The Savvy Manager's Guide. He is a frequent speaker on maximizing the value of information. David can be reached at loshin@knowledge-integrity.com or at (301) 754-6350.

Editor's Note: More articles and resources are available in David's BeyeNETWORK Expert Channel. Be sure to visit today!
