Repository Directions – Part Two

Published in June 1999

Articles in this series – Part 1, Part 2


Many of the challenges and solution trends of the recent past continue to prevail. These are aggravated by organizational changes and challenges posed by new technology, especially the Internet and
new application development environments from vendors such as Microsoft. For example, the MSDN library for assisting developers in coding software systems is more than 1 gigabyte and releases
appear as frequently as one every three months. These new technologies are complex and tightly integrated with the delivery environment. With the consolidation of vendors and the emergence of
Windows as a dominant desktop and server platform, Microsoft proprietary technologies are beginning to become de facto standards. With Microsoft’s offerings extending to the operating systems,
graphical user interface (GUI) platform, relational database platforms, repository, application development environments, compilers, programming languages, documentation tools, object modeling
tools, object browsers, object libraries, the word proprietary has acquired a meaning as the new standard. The UNIX development environment continues to be hamstrung by small unit volumes and high
unit prices for everything from database to repository engine to development environments to documentation and development tools.

A change in focus today is the realization that tremendous investment leverages can be obtained by making decisions at as high a row in the Zachman Framework as possible. For example, the decision
to invest in an information system can cause significantly more expense than lack of productivity at the construction or analysis and design levels. Every feature or function added at the Row 1
level to an information system can cause significant investment in its development at all succeeding rows. It is this realization that the enterprise architecture planning level provides the maximum
leverage that is driving current trends to define and represent Rows 1 and 2 of the Zachman Framework for Information Systems Architecture.

Year 2000 Challenge Continues

The Year 2000 challenge continues to be addressed. Enterprises are investing in tools for detecting date problems and consultants advising them in restructuring their code. Investments are also
being made in testing tools that verify compliance for the Year 2000 changeover. The immutable deadline of December 31, 1999 has not moved out yet!

The challenge that occupies the collective mindset past the year 2000 is the conversion of the national currencies of the European Economic Community (EEC) to the Euro, which is scheduled to start after the
year 2000. This involves refurbishing legacy systems to handle Euro currency conversions until the EEC has transitioned to the Euro as a single unit of currency.

Microsoft Repository 2.0

With the delivery of Microsoft Repository 2.0 as a free and integral part of Visual Studio 6.0, a significant capability for versioning repository data instances has been added. Microsoft’s
repository is also tightly integrated into the Microsoft Windows Registry for registering Class and Interface IDs. A new capability for workspace management has been provided, which allows users to
set the scope of the repository that they wish to work with and prevents unpleasant side effects of locks that are created due to version management schemes. Microsoft Repository 2.0 continues to
be offered on Microsoft SQL Server and the Microsoft Jet Engine. Platinum Technologies is continuing to port the Microsoft Repository and has stated its intention to announce a product in the
first quarter of 1999.

Microsoft Repository supports multiple Type Information Models (TIMs) and treats repository schema objects in the same manner as repository data instances. Microsoft Repository is accessed through
COM Interfaces. Interfaces can be versioned and hide the underlying Object Class, Property and Relationship Structure from the application programmer using the repository. The Repository also
supports multiple interfaces for the same set of classes and provides insulation of interface from repository changes to the tool builder.

Microsoft Repository 2.0 continues to be aimed at the tool builder. It is delivered with a repository browser that is tree oriented.

Unified Modeling Language OOA/D and Component Based Design

The UML from Rational Software is gaining support as a common implementation language for Object Oriented Analysis and Design. With the three-tier architecture of UML comprising User Services,
Business Services and Data Services, and its Logical, Component and Deployment views, UML covers Rows 3, 4 and 5 of the Zachman Framework. Microsoft offers Rational’s product, Microsoft Visual
Modeler, free as part of Visual Studio 6.

The Zachman Framework is a useful tool in analyzing where component-based analysis and design tools fit into the Information Systems Architecture. A significant amount of progress has been achieved
in the area of seamlessly integrating tools between rows. It is possible to generate significant amounts of code automatically from Row 4 tools. This code can then be used in Microsoft’s and other
software development environments in Row 5. Row 3 tools by nature require manual intervention to transform designs to Row 4.

The vision of reuse held out by object oriented technology is meeting the reality tests of actual software implementations. Without careful attention to requirements for reusable design, the levels
of reuse previously envisioned have not been achieved except in the case of GUI components. Detecting defective components and applying quality controls to the acceptance of reusable components is
still a challenge. The insertion of architectural components into every potentially reusable object has caused significant bloat of object libraries. Microsoft is now offering a thinner library as an
alternative to the Microsoft Foundation Classes (MFC).

A significant challenge is assessing which component to reuse. The challenge is akin to having several thousand parts that are identified by characteristics and must be retrieved by hunting and
pecking on characteristics. Private local registries and the need to register classes and interfaces in local registries compound challenges of registration. Microsoft’s COM and DCOM architectures
and techniques for resolution of distributed systems issues are still evolving in comparison with the more established and tested CORBA and IDL technology in the UNIX and OS/2 world.

Knowledge Management Systems

A new term that has entered the industry is the concept of Knowledge Management. The basis for knowledge management is recognizing that knowledge exists in a variety of formats and representations
inside an enterprise. A knowledge management system provides a framework for cataloging and classifying these items of knowledge and provides access mechanisms for interested users to retrieve
them. With the widespread use of the Internet and the WWW, Knowledge Management systems serve as classification, cataloging and retrieval schemes for information located as web pages, documents or
other electronic media around the enterprise – in servers, in workstations and on the mainframe. The classification scheme is considered a key component of the knowledge management system.
Some of these knowledge based systems offer the Zachman Framework both as a classification mechanism and as a user interface front end to access the individual items of the knowledge base.

Knowledge Management systems are concerned with classifying and providing access to electronic items that are deemed containers of some “knowledge” aspect. Some systems actually store these items
in a central “repository” and provide access to requestors from this source. As the keeper of requested information they are able to guarantee retrieval success. Few of these systems go to the
next step, which is to act as an authoritative source for the knowledge item and guarantee the quality of the retrieved item. To do this they require sophisticated check-in and check-out schemes
for items, change management and versioning mechanisms.

Other Knowledge Management systems simply store a set of pointers to the sources of information and are seldom able to guarantee retrieval success. This is particularly true of web based systems
that manage pointers simply as a collection of URLs.
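
The difference between the two kinds of system can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the KnowledgeItem and Catalog names, the category labels, and the "reachable" dictionary that stands in for the servers and workstations where pointed-to documents may or may not still exist. It is not modeled on any particular product.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class KnowledgeItem:
    """One cataloged item; it holds either the content itself or only a pointer."""
    title: str
    categories: List[str]
    content: Optional[str] = None   # set when the catalog stores the item centrally
    pointer: Optional[str] = None   # set when the catalog only knows a location


class Catalog:
    def __init__(self, reachable: Dict[str, str]):
        # Hypothetical stand-in for the outside world: location -> current content.
        self.reachable = reachable
        self.items: List[KnowledgeItem] = []

    def add(self, item: KnowledgeItem) -> None:
        self.items.append(item)

    def find(self, category: str) -> List[KnowledgeItem]:
        """Classification-based lookup: every item filed under the category."""
        return [item for item in self.items if category in item.categories]

    def retrieve(self, item: KnowledgeItem) -> Optional[str]:
        """Stored items always come back; pointer items only if the target still exists."""
        if item.content is not None:
            return item.content
        if item.pointer is None:
            return None
        return self.reachable.get(item.pointer)


# The pointed-to page has been moved or deleted, so nothing is reachable.
catalog = Catalog(reachable={})
stored = KnowledgeItem("Pricing policy", ["Row 1", "Policy"], content="Full policy text")
linked = KnowledgeItem("Network diagram", ["Row 4"], pointer="http://intranet.example.invalid/net")
catalog.add(stored)
catalog.add(linked)

stored_result = catalog.retrieve(stored)
linked_result = catalog.retrieve(linked)
policy_titles = [item.title for item in catalog.find("Policy")]
```

In this sketch the centrally stored item is always returned, while the pointer-only item yields nothing once its target disappears, which is the retrieval guarantee the text describes.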

Enterprise Resource Planning (ERP)

As enterprises have concentrated on their core businesses they have walked away from application developments and maintenance that could be better provided by third party vendors. As a result a
number of packaged applications from third party vendors have entered the enterprise and have now been promoted to running mission critical functions. As a consequence of the enterprise level
information that they manage in their applications, the vendors of packaged applications are also now moving to the next step: offering analysis tools that look into the enterprise data and
enable decision support, trend analysis and forward planning activities.

ERP has a direct correlation with EAP, especially since the meta data of the packaged application is a logical extension of the Enterprise Data Model. Enterprises are still dealing with the
challenge of acquiring/incorporating packaged application schemas into their enterprise data architectures.

Data Warehouse/Data Marts

With the maturing of the Data Warehouse market, there has been both a consolidation of vendors (acquisitions and partnerships) and clarity of classifications for offerings. Dominant database
vendors such as Oracle are offering packaged data warehousing platforms. Modeling Tool vendors such as LogicWorks are offering dedicated data modeling tools for data warehouse schema development.
Other companies offer commodity products for extracting and refining legacy data for loading into the warehouse. As a result of this infusion of commodity technology the once difficult task of
putting the data warehouse together is becoming easier. At the same time, the data quality issues, the meta data management issues and the tasks of analyzing the information have not gone away.

The data catalog for the data warehouse is an integral part of an enterprise’s data architecture. The schema of the data warehouse, though ultimately implemented in a relational database engine, is
conceptually different from a logical data model. This distinction is in the same vein as the structure of an ER model being conceptually different from (it carries more information than) the resulting
table-column implementation without referential integrity.

Object-Relational Extensions to RDBMS

Another recent trend is to extend relational database engines to support complex datatypes (e.g. Oracle 8). Some of these extensions actually alter the metamodel of RDBMSs and
involve changes in the information models of dictionaries that have to support them. Fortunately, application developers maintaining legacy applications are reluctant to embrace new database engine
features, and the resulting lag between availability and adoption provides some breathing room.

Enterprise Architecture Planning (EAP) and EAP Product Management

With the revelation that enterprise driven planning could produce a much higher leverage than tweaking the software development processes, many organizations have embarked on an EAP exercise. One
of the popular methodologies is the EAP process advocated by Dr. Steven Spewak in his book “Enterprise Architecture Planning”. This methodology is a step by step walkthrough of the EAP process.

Because of the step by step nature of the EAP and the amount of process standardization that it involves, it is amenable to significant degrees of automation. The EAP involves copious amounts of
data entry that can be eliminated by researching and using electronic documents that many enterprises probably already have. The EAP exercise also involves conducting a number of interviews with
diverse organizational sub-units. An automated mechanism for rolling up the results of these interviews and playing them back to the interviewees for confirmation produces many benefits. Some of
these are increased levels of involvement and buy-in, and a higher degree of accuracy of information collected (because of the rapid feedback and correction loop). Other tools provide widespread
dissemination of EAP information over the Internet and the Intranet to the desktops of the personnel involved in the activities of Rows 1 and 2 of the Zachman Framework (the highest leverage rows).

The EAP process produces many by-products. The most significant of these are data architectures, applications architectures, technology architectures, an Information Resource Catalogue (IRC) and an
applications implementation strategy plan and schedule. These by-products are durable items that must be managed and maintained by the enterprise if the investment in the EAP exercise is to be
preserved and constantly put to work. In addition, the benefit of an EAP comes from promoting organizational coherence in Rows 1 and 2 of the Zachman Framework. Later, it will become apparent that Rows
1 and 2 (and 3) will represent the primary areas of organizational innovation and competitive advantage. Rows 4 and 5 will become a mechanism for rapid, reliable and cost-effective applications
implementation, judging by application development trends, increasing technological complexity and the projected composition of the application development workforce.

Data (and other Objects) Standardization

The data standardization efforts that were extensively enforced during the 1980s and the early 1990s by enterprises such as the DoD are showing their age. Most of the standardization efforts were
formulated in an era when the units of standardization were the building blocks of physical database systems. With the onset of model driven application development in the early and mid 1990s,
developers are working with the data model as a unit rather than the individual pieces of the data model. As a result of this change, data standardization efforts have shifted to model
standardization. The DoD for example has formulated a single large model called the Defense Data Model (DDM) as a single authoritative model for all model developments in the DoD.

With the advent of object oriented analysis and design in the mid nineties, few standardization paradigms have been formulated or enforced for objects containing elements of data and process.

Many enterprises have completely thrown up their hands in the face of mergers and acquisitions. They have found it most trying to resolve the often conflicting data standards of the diverse
merging organizations. A significant consolidation in the banking industry has posed tremendous standardization challenges. Mergers and acquisitions place significant demands for detecting overlaps
and detecting complementary data and processes during the planning process for a merger and the digestion phase after the merger. In such an environment enterprise standardization activities are
always ongoing and have a business focus that has measurable dollar results.

ANSI X11179 Data Element Registry

Another significant step in the area of data standardization in the mid 1990s was the formulation and adoption of the ANSI X11179 standard for data element registration. The ANSI 11179 standard
came from the X3L8 working group of ANSI, which was charged with defining a standard classification scheme that could be used for data element registration. The resulting specification is important
not so much as a specification as for the thinking and concepts that went into it. These concepts are very relevant to the repository area, and elements of them are
discernible in most repository systems today.

The key concepts in ANSI 11179 revolve around stewardship, the separation of meaning from representation and the concept of flexible classification schemes that can be applied to the same group
of underlying items. Every data related asset, every classification scheme, every formula and every composite data item needs to be registered formally as a part number and associated with a person
or organization that is the registrar.

Every data item has two parts – a part that relates to its meaning (data concept) and the part that relates to the way it is physically represented in a database system (value domain).
Meanings are defined in the context of data concepts and the allowed values of data items are determined by value domains. Value domains are inherent to the nature of data and need to be
standardized so that all users of data from the same value domains receive the same set of values. Value domains can be discrete sets of values or continuous ranges of values. Classification
schemes can universally classify any of these items based on diverse criteria of membership. Thus, the same registry can contain multiple classification schemes defined over the same base data
elements. This separation of the intrinsic definition of meaning and value components from the classification scheme provides a purity of storage coupled with a facility of seeing stored items
through familiar classifications.
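
The concepts above can be sketched in a few lines. This is only an illustration of the ideas as the article states them, not an implementation of the standard's actual metamodel. All class names, identifiers, value sets and scheme names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class DataConcept:
    """The meaning of a data item, independent of how it is stored."""
    name: str
    definition: str


@dataclass(frozen=True)
class ValueDomain:
    """The allowed values: either a discrete set or a continuous range."""
    name: str
    allowed: FrozenSet[str] = frozenset()
    low: Optional[float] = None
    high: Optional[float] = None

    def accepts(self, value) -> bool:
        if self.allowed:
            return value in self.allowed
        if self.low is None or self.high is None:
            return False
        return self.low <= value <= self.high


@dataclass(frozen=True)
class DataElement:
    """A registered item: meaning plus representation plus a responsible registrar."""
    identifier: str
    concept: DataConcept
    domain: ValueDomain
    registrar: str


@dataclass
class Registry:
    elements: Dict[str, DataElement] = field(default_factory=dict)
    # scheme name -> category name -> identifiers of member elements
    schemes: Dict[str, Dict[str, Set[str]]] = field(default_factory=dict)

    def register(self, element: DataElement) -> None:
        if element.identifier in self.elements:
            raise ValueError(f"{element.identifier} is already registered")
        self.elements[element.identifier] = element

    def classify(self, scheme: str, category: str, identifier: str) -> None:
        if identifier not in self.elements:
            raise KeyError(f"{identifier} is not registered")
        self.schemes.setdefault(scheme, {}).setdefault(category, set()).add(identifier)

    def members(self, scheme: str, category: str) -> List[str]:
        return sorted(self.schemes.get(scheme, {}).get(category, set()))


registry = Registry()
country = DataElement(
    "DE-001",
    DataConcept("Country of residence", "Country in which a person lives"),
    ValueDomain("Country codes", allowed=frozenset({"US", "DE", "FR"})),
    registrar="Data Administration",
)
salary = DataElement(
    "DE-002",
    DataConcept("Annual salary", "Gross yearly pay of an employee"),
    ValueDomain("Non-negative amount", low=0.0, high=10_000_000.0),
    registrar="Human Resources",
)
registry.register(country)
registry.register(salary)

# Two independent classification schemes over the same base elements.
registry.classify("By subject area", "Personnel", "DE-001")
registry.classify("By subject area", "Personnel", "DE-002")
registry.classify("By sensitivity", "Confidential", "DE-002")
```

Because the classification schemes refer to elements only by identifier, new schemes can be added or dropped without touching the stored definitions of meaning and value.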

Model Management

During the 1980s and the early 1990s the emphasis on data administration shifted to model administration as enterprises undertook a model driven approach to the analysis and design of their
databases. Some of the needs for model management that emerged are model accountability, change management, detection of model differences, configuration management and the ability to
provide rapid “starter kits” for new developments that leverage parts of earlier models.
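
Detection of model differences, one of the needs listed above, can be sketched as follows. The sketch assumes a deliberately simplified representation in which a data model is a mapping from entity names to attribute names and types. The diff_models function and the sample models are hypothetical and do not reflect any modeling tool's format.

```python
from typing import Dict, List

# Assumed representation: entity name -> {attribute name -> datatype}
Model = Dict[str, Dict[str, str]]


def diff_models(old: Model, new: Model) -> List[str]:
    """Report entity and attribute changes between two versions of a model."""
    changes: List[str] = []
    for entity in sorted(set(old) | set(new)):
        if entity not in new:
            changes.append(f"entity removed: {entity}")
            continue
        if entity not in old:
            changes.append(f"entity added: {entity}")
            continue
        old_attrs, new_attrs = old[entity], new[entity]
        for attr in sorted(set(old_attrs) | set(new_attrs)):
            if attr not in new_attrs:
                changes.append(f"attribute removed: {entity}.{attr}")
            elif attr not in old_attrs:
                changes.append(f"attribute added: {entity}.{attr}")
            elif old_attrs[attr] != new_attrs[attr]:
                changes.append(
                    f"type changed: {entity}.{attr} {old_attrs[attr]} -> {new_attrs[attr]}"
                )
    return changes


v1 = {
    "Customer": {"id": "INTEGER", "name": "VARCHAR(40)"},
    "Order": {"id": "INTEGER"},
}
v2 = {
    "Customer": {"id": "INTEGER", "name": "VARCHAR(80)", "email": "VARCHAR(120)"},
    "Invoice": {"id": "INTEGER"},
}
changes = diff_models(v1, v2)
for line in changes:
    print(line)
```

A report of this kind is only the technical part of model management; the acceptance, quality assurance and stewardship processes still have to be supplied by the organization.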

Technical solutions to the model management issues from the companies that developed the modeling tools themselves did not incorporate the model acceptance process, the model quality assurance
process, assigning of responsibility and stewardship and management of production models through an organizational process. In addition, enterprises wished to look at all of their architectural
models uniformly, not just the data models, and manage them through the same organizational processes. These included data models (logical and physical), process models, organization charts,
business plans, object models, communication and computer resources network models etc. In short, all the artifacts of information systems planning, development and deployment.

Where is the future going?

No one can really tell. The pace of technology innovation continues. The fusion of multimedia, computer processing and communications technology is placing relentless demands on application
delivery. The development of portable handheld computing platforms is pushing down application delivery to the palmtop and the use of the Internet and wireless communication is eliminating
physical communications media. The quantity of information has increased directly in proportion to the decrease in reliability and responsibility for the information. Enterprises are facing
increased expectations for openness and disclosure. Companies such as Microsoft are already working the issues of fusion between the platform and application development environments. They are also
embedding complex technologies such as communications, remote procedure calls, object oriented technology and delivering them in a manner invisible to the technicians working through the
development environment.

With the year 2000 rapidly approaching and the imminent liability issues that will arise out of any system failures, comes a need to clearly document the design of systems (blueprinting) both as a
defensive measure and a mechanism for effecting rapid corrections.

Other volume manufacturing industries such as the automobile industry have automated the lower rows of the Zachman Framework with a focus on technicians and trade skills at those levels. They
require relatively more expensive personnel at the upper rows of design and analysis, product innovations, planning and market research. Information systems will also deploy a technician
level work force for Rows 4 and 5 of Zachman’s Framework and potentially Row 3. A confirmation of this direction is the emergence of certification programs for systems administration, software
development and individual development platform skills from major vendors such as Microsoft and others.

In this environment, taking a page from other industries’ books on lessons learned would be prudent. Standardized terms and definitions, standardized tools, standardized methodologies, standardized
performance measures, standardized productivity aids, standardized workflow, and standardized vocabularies are essential for a strategy that automates and reduces the costs of successful Rows 3, 4
and 5 development. In short, an engineering approach to the development of information systems becomes mandatory.

The extreme increase in complexity of the application development activity makes any other paradigm based on the use of generalists such as college graduates in computer science untenable, both from
a cost perspective and because of the breadth and depth of coverage required. The current practice of handing down relatively unstructured directives for product development to generalists
with advanced education will have to be replaced with a clear command and control structure where precision and clarity of command to highly trained and skilled specialists is followed by
crisp and precise execution. Current business strategies based on the rapid development and deployment of information systems have been marred by unpredictable execution. The addition of an
engineering ingredient restores a high degree of precision and predictability to the execution phase of applications development while containing costs through highly skilled, trained and adaptive
specialists. Doing so will allow enterprise management to spend expensive resources on the formulation of business strategies and product planning.

As the imperatives to standardize the means of application production such as tools, methodologies, architectures, terms, definitions, and work processes increase, every organization that has
responded to the pressure will have a repository for the management of these items. For smaller organizations, this repository will be “rented” or leased out and may not even be physically
located on the premises. For larger enterprises these will be owned, managed and extended as the infrastructure for development evolves with changing technologies.

© Metadata Management Corporation, Ltd. 1999



About Paula Pahos

Paula Pahos is the President of Metadata Management Corp., Ltd. Located near Washington, D.C. in Vienna, Virginia, Metadata Management Corporation (MMC) specializes in providing enterprise information management capability to organizations that value the strategic importance of information. MMC recognizes that organizations, in both the public and private sector, are changing. More than ever, this requires the information systems departments to be both reactive and proactive in their support of these changes.