Information sharing has long been a significant problem across government and industry. Once IT portfolios grew beyond a few simple applications, the need to share and integrate information across those applications grew in step with the number of applications. There are numerous reasons why information sharing is so complicated, from the way applications are built to the way projects are funded.
Metadata has long played a key role in facilitating information sharing as a method for capturing knowledge about information and what it means. As the need for information sharing continues to grow and evolve, the role of metadata in solving those issues will become more critical. Effective application of metadata (including data governance and stewardship) also has the potential to transform how IT applications are built and managed, producing a much more transparent IT enterprise.
What is the “transparent enterprise”?
In most organizations today, the business rules that govern how information is manipulated, transformed, and massaged end up being expressed as code somewhere. This has been the case for recent history, and newer technologies like the enterprise service bus and SOA simply move that code to different places and languages. These technologies are certainly more efficient and easier to manage than their predecessors, but they still produce "opaque" systems in which business rules and information logic cannot be seen without moving behind the veil of application code. In the transparent enterprise, these business rules and transformation logic are managed in metadata (or created using metadata) so that it becomes easy to see how the information is managed.
- ETL tools in data warehousing have become extremely good at providing visibility into how information is transformed, in part due to the metadata requirements levied on warehousing projects. Other areas should take note of the benefits of this transparency and use it as a pattern for their own processes.
Mergers, acquisitions and portfolio consolidation are all driving enterprises to share information across divisions and stove-piped applications. Data warehousing is one successful and prevalent
technique to implement information sharing – extracting and transforming information from a variety of sources into actionable reports to help drive effective business decisions. The coming
challenges in information sharing are more real-time and broader in scope. Portfolio consolidation, enterprise information integration (EII), service oriented architecture (SOA), ERP systems and enterprise
search are all examples that will be driving information sharing.
Many enterprises (and especially the federal government) are in the process of assessing their current IT portfolios. In the government, years of disjointed applications and contractors intent on
holding operation and maintenance contracts have produced a cornucopia of applications with overlapping data, functions and competing interests. In order to get more value from the IT investment,
these applications are being evaluated to determine how applications and redundant data stores can be removed and where the functions will be reallocated. There is also a desire to move away from
these batches of custom solutions and leverage commercial off the shelf (COTS) applications to provide lower costs and more features. As a result, a wide variety of applications are targeted for disposal, with their functions subsumed by other applications. Consolidating an application requires that its information be extracted (shared) by the subsuming application.
Implementing COTS application portfolios requires information sharing across the various COTS tools to provide a seamless system for the users.
Enterprise Information Integration (EII) is another technique being applied by enterprises to provide an enterprise view of information. The objectives for EII are often very similar to those in
data warehousing and EII has sometimes been referred to as real-time data warehousing. There are a number of tool vendors in this space (such as Unicorn and MetaMatrix), and they typically rely
heavily on metadata as part of their implementations.
Perhaps one of the latest technological buzzwords is SOA. Almost every enterprise and government agency is working to implement some form of SOA, with the DOD even publishing their Net-Centric data
strategy outlining their vision for SOA in the DOD. It is a certainty that metadata will be required for successful SOA implementations, and its role is explicitly called out in the DOD Net-Centric Data Strategy.
A practical impact of a SOA implementation is that information formerly locked in difficult to access applications will become easily accessible by a wide audience. Without metadata to provide
context and semantics for this information, the probability of misuse and data quality problems will be high.
The government is also looking at federated searching to help solve numerous communication issues. Information silos can be opened up across agencies with federated search allowing more effective
decisions to be made – especially in the areas of homeland security. Metadata is again a core component of these processes, and the DOD has defined the Defense Discovery Metadata Standard (DDMS) to
be used at the core of the technology.
Information sharing will be driven by all of these technologies in the coming years, and metadata will be an essential component. There are still significant barriers to overcome, including semantic mismatches, data governance and stewardship gaps, and data quality problems.
The semantics of information is one of the most common problems in information sharing: what does Quantity On Hand really mean in an inventory system? This has a significant impact when trying to create a consolidated inventory system if in one system it means "the number of items in the warehouse" and in another it means "the number of items available for ordering". As information becomes more broadly shared, these semantic deltas will also have a negative impact on data quality.
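The Quantity On Hand scenario above can be sketched in a few lines of code. This is an illustrative example, not an implementation from the article; the system names, field names, and the `safe_merge` helper are invented for the sketch. The point is that a naive consolidation silently mixes incompatible meanings, while attached metadata makes the mismatch detectable before the merge happens.

```python
# Hypothetical sketch: two inventory systems each report "quantity on hand",
# but the field means something different in each source system.
warehouse_system = {"widget": 100}  # items physically in the warehouse
ordering_system = {"widget": 60}    # items available for ordering (unreserved)

# A naive consolidation treats the two fields as interchangeable and
# produces a single "authoritative" number with no coherent meaning.
naive_total = warehouse_system["widget"] + ordering_system["widget"]
print(naive_total)  # 160 -- but 160 of *what*?

# With definitions captured as metadata, the mismatch is detectable.
metadata = {
    "warehouse_system.quantity_on_hand": "number of items in the warehouse",
    "ordering_system.quantity_on_hand": "number of items available for ordering",
}

def safe_merge(field_a: str, field_b: str) -> None:
    """Refuse to combine fields whose documented semantics differ."""
    if metadata[field_a] != metadata[field_b]:
        raise ValueError(f"semantic mismatch: {field_a!r} vs {field_b!r}")

try:
    safe_merge("warehouse_system.quantity_on_hand",
               "ordering_system.quantity_on_hand")
except ValueError as err:
    print(err)  # the merge is rejected instead of silently corrupting data
```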
The cost of ignoring information sharing issues is also significant. There are numerous examples of man-hours wasted tracking down why report A doesn't agree with report B, or why the data on report C suddenly changed. In the government world, contracting dollars are spent over and over again to determine the same information. Imagine ten projects to interface with a given system: how much repetitive research is done because the precise semantic meaning of the information in that system is not well documented and easily available? There are also significant issues with data governance and stewardship. Without clear governance and stewardship, organizations will continue to be plagued by information silos, unclear authoritative sources, data quality issues and a lack of accountability.
Metadata can and should play a significant role in facilitating information sharing. The DOD Net-Centric Data Strategy requires that information be shared in the Global Information Grid along with
its associated metadata. There are also a number of metadata standards which can be leveraged to help create effective metadata solutions.
The Common Warehouse Metamodel (CWM) is an OMG standard which covers many areas of metadata including relational systems, XML, transformation (ETL), OLAP, data warehousing and more. It was formed in 2000 as the combination of the Open Information Model (OIM) and the initial OMG CWM. It has broad coverage across most technical metadata, and CWM models can be exchanged using the OMG's XML Metadata Interchange (XMI) format.
ISO/IEC 11179 is an international standard for metadata registries which includes one of the most refined data element models. While it is not perfect, it embodies some excellent ideas about capturing information about data elements and how they interrelate. Data elements have been the focus of many metadata efforts over the years (how many data dictionary projects has the average organization been through?), and the ISO 11179 data element model is one that every serious metadata practitioner should take time to understand. (This is not to imply that the rest of the standard is not worth understanding, only that there is so much benefit in learning the concepts in the data element model.) Many information sharing issues arise at the atomic level (i.e., the data element), and the concepts in ISO 11179 provide a way to capture information aimed at preventing those issues.
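To make the data element idea concrete, here is a minimal sketch of an ISO 11179-style registry entry. The field names below are simplified paraphrases of the standard's concepts (object class, property, definition, value domain), not the normative metamodel, and the registry itself is an invented illustration.

```python
# Illustrative sketch of ISO 11179-style data element registration.
# Field names are simplified paraphrases of the standard's concepts.
from dataclasses import dataclass

@dataclass
class DataElement:
    name: str          # registered name, e.g. "Item Quantity On Hand"
    object_class: str  # the thing being described
    property: str      # the characteristic of that thing
    definition: str    # the precise, agreed meaning
    value_domain: str  # permitted representation (code list, ISO date, ...)
    steward: str       # who is accountable for this element

registry: dict[str, DataElement] = {}

def register(element: DataElement) -> None:
    """Add an element to the registry, refusing duplicate names."""
    if element.name in registry:
        raise ValueError(f"{element.name} already registered")
    registry[element.name] = element

register(DataElement(
    name="Item Quantity On Hand",
    object_class="Inventory Item",
    property="Quantity On Hand",
    definition="The number of items physically present in the warehouse.",
    value_domain="non-negative integer",
    steward="Inventory Data Steward",
))
print(registry["Item Quantity On Hand"].definition)
```

Even this toy registry shows the payoff: the precise definition of Quantity On Hand is recorded once, with a named steward, instead of being rediscovered by every interfacing project.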
The federal government has just adopted the Federal Enterprise Architecture Data Reference Model (FEA DRM), which is an abstract model to help organizations determine what information needs to be
captured from a data perspective to help foster better information sharing.
There are also a number of models for capturing taxonomies, ontologies and concept systems. These play an important role in many metadata strategies (and are prevalent buzzwords in their own right). They too can help with information sharing issues by providing a common point of reference for multiple systems and organizations.
All of these issues and technologies come into play as organizations move towards more interoperable systems (or systems that share information effectively). Interoperability is also evolving –
most systems today interoperate via many point to point interfaces. This is an inherently inefficient architecture, requiring on the order of N² interfaces for N systems (N(N−1)/2 distinct connections). Systems are currently moving toward
architectures like Enterprise Service Bus (ESB), where each system only interfaces with the bus and information is routed through the bus to appropriate destinations. This is much more efficient,
only requiring N interfaces (one for each system to the bus) if a common model is implemented (the common model ensures that each system only has one view to convert to/from). The issue is that it
still doesn’t completely address many information sharing issues such as authoritative sources, common conversions and reuse. Since each system still has its own conversion code it is difficult to
ensure that common lookup tables are used, conversion logic is identical everywhere, etc.
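The interface-count argument above is simple arithmetic, sketched here for concreteness: full point-to-point connectivity needs a distinct interface for every pair of systems, while a bus with a common model needs only one interface per system.

```python
# Back-of-the-envelope comparison of integration architectures.
def point_to_point_interfaces(n: int) -> int:
    """Every pair of systems connected directly: N(N-1)/2 interfaces."""
    return n * (n - 1) // 2

def bus_interfaces(n: int) -> int:
    """Each system converts only to/from the common model: N interfaces."""
    return n

for n in (5, 10, 20):
    print(n, point_to_point_interfaces(n), bus_interfaces(n))
# At 20 systems: 190 point-to-point interfaces versus 20 bus interfaces.
```

The gap widens quadratically as the portfolio grows, which is why the bus plus common model approach scales where point-to-point does not.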
The real opportunity in information sharing is to move toward the "transparent enterprise". Such enterprises use metadata to drive the conversion of information from system to system (usually via a common model approach). This provides visibility into the conversion logic, lookup tables, etc. across the enterprise. It makes managing and maintaining information flows more efficient, and it also provides visibility into how information moves throughout the enterprise. This is a tremendous advantage over current architectures; it helps to solve many of the issues discussed earlier and can also reduce the effort required to manage data governance and stewardship initiatives.
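A metadata-driven conversion might look like the following sketch. All field names, the lookup table, and the mapping structure are invented for illustration; the essential idea is that the mapping from a source system's fields to the common model lives in declarative metadata rather than in per-system conversion code, so it can be inspected, reported on, and governed centrally.

```python
# Hypothetical sketch of metadata-driven conversion to a common model.
status_lookup = {"A": "ACTIVE", "I": "INACTIVE"}  # shared lookup table

# Mapping metadata: target field -> (source field, optional transform).
# Because this is data, not code, it is visible across the enterprise.
inventory_mapping = {
    "item_id": ("itm_no", None),
    "status":  ("sts_cd", status_lookup.get),
    "on_hand": ("qty_oh", int),
}

def to_common_model(source_record: dict, mapping: dict) -> dict:
    """Convert a source record to the common model using mapping metadata."""
    target = {}
    for target_field, (source_field, transform) in mapping.items():
        value = source_record[source_field]
        target[target_field] = transform(value) if transform else value
    return target

record = {"itm_no": "W-1", "sts_cd": "A", "qty_oh": "100"}
print(to_common_model(record, inventory_mapping))
# {'item_id': 'W-1', 'status': 'ACTIVE', 'on_hand': 100}
```

Because every system's mapping references the same shared lookup tables and transforms, the problems of divergent conversion code and inconsistent lookups largely disappear.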
There is an opportunity for data management professionals to provide a comprehensive and visionary approach to information sharing using metadata. Organizations already invest in metadata as part of every project; leveraging these approaches and technologies can deliver significant enterprise value compared with today's repetitive expenditures. By understanding and espousing the benefits of using metadata to create a transparent enterprise, the data management community can help shape the direction of system development in the future.