Making the SOA Implementation Successful

 

Published in TDAN.com October 2006

 

First the organization built its systems. Then the potential of the Internet and the web was discovered. And quickly the issue arose – how do I interface my organizational systems to the web?

The answer to this question is SOA, service oriented architecture. There is a wealth of information in the systems the organization has built. Unlocking that wealth and exposing it to the Internet
fulfills the potential of the Web and online processing.

The key to making the connection between the organization’s systems and the effective usage of those systems is the SOA, as seen in Fig 1.

 

 

Certainly the ability to access data for usage on the web is important. But there are other issues that relate to the successful usage of the web for the access of organizational data, and at the
top of the list is the issue of performance. Performance refers to the amount of time that it takes to go and find the data in the organization’s systems. A response time of 2 to 3 seconds is the
expectation. A response time of 10 to 20 seconds is a possibility if the end user is prepared to wait that long. But a response time of greater than 60 seconds is unacceptable even when the
end user has been conditioned for a long wait. When the end user has to wait a lengthy period of time, the end user simply goes elsewhere. The end user runs another query. The end user
turns off the machine. The end user goes to lunch, or finds some other distraction. Simply stated, under normal circumstances the end user demands good performance through the SOA interface.

Fig 2 shows the end user demand for good performance.

 

 

What is required to give the performance that is being demanded through the SOA? The answer is that both the query and the data beneath the query need to be carefully arranged. If the end user wants
to analyze every transaction at the detailed level for the past five years, there will be no fast response time. If the end user wants to look for data from twenty different places using different matching
criteria for each type of data being sought, there will not be fast response time. In a word, with even the most elegant SOA interface, if the data underneath the SOA is not carefully arranged and
if the request for data has not been properly made, then the SOA will achieve very poor response time. Merely having an SOA does not mean that there will be good response time.
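
As a rough illustration of what arranging the data can mean, the Python sketch below compares scanning detail rows on every request with reading a summary that was prepared ahead of time. The rows, the function names, and the monthly grouping are all hypothetical; this is one possible arrangement under those assumptions, not a prescribed technique.

```python
from collections import defaultdict

# Hypothetical detail rows: (year_month, amount). Invented for illustration.
detail = [
    ("2006-01", 100),
    ("2006-01", 250),
    ("2006-02", 75),
    ("2006-02", 25),
    ("2006-03", 300),
]

def total_by_scanning(rows, month):
    """Answer the request by reading every detail row each time."""
    return sum(amount for m, amount in rows if m == month)

def build_summary(rows):
    """Arrange the data once, ahead of time, into one total per month."""
    summary = defaultdict(int)
    for m, amount in rows:
        summary[m] += amount
    return dict(summary)

# Built once, before any request arrives.
monthly_summary = build_summary(detail)

def total_from_summary(summary, month):
    """Answer the request with a single lookup in the prepared summary."""
    return summary.get(month, 0)
```

Both functions return the same total for a given month. The difference is that the scan grows with the volume of detail, while the lookup does not.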

But there are other issues relating to the data underlying the SOA. Data underneath the SOA needs to be integrated as well. As an example of the need for integration, suppose the SOA looks at
revenue from three data bases. The data is retrieved and returned to the SOA interface. The reply given to the SOA interface is $4,977,330.76. This seems to be an impressive number.

But looking closely at the data bases that have been accessed, one data base stores its revenues in US dollars, the next data base stores its revenues in Canadian dollars, and the third data base
stores its revenues in Australian dollars. Adding US dollars, Canadian dollars, and Australian dollars together makes no sense whatsoever. In order for the data to be useful it must be integrated
before the SOA has a chance to access the data.
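
The currency problem above can be sketched in a few lines of Python. The exchange rates and revenue figures here are made up purely for illustration (they are not market rates and not the figures quoted above), and the function names are hypothetical.

```python
from decimal import Decimal

# Hypothetical exchange rates to US dollars. Illustrative only, not real rates.
RATES_TO_USD = {
    "USD": Decimal("1.00"),
    "CAD": Decimal("0.90"),
    "AUD": Decimal("0.75"),
}

# Hypothetical revenue as each of the three data bases would report it.
revenues = [
    (Decimal("1000000.00"), "USD"),
    (Decimal("2000000.00"), "CAD"),
    (Decimal("1500000.00"), "AUD"),
]

def naive_total(rows):
    """Add the raw amounts and ignore the currency: the mistake described above."""
    return sum((amount for amount, _ in rows), Decimal("0"))

def integrated_total(rows, rates):
    """Convert every amount to US dollars first, then add."""
    total = Decimal("0")
    for amount, currency in rows:
        total += amount * rates[currency]
    return total.quantize(Decimal("0.01"))
```

With these assumed inputs the naive total is 4,500,000.00 in no particular currency, while the integrated total is 3,925,000.00 US dollars. Only the second number means anything.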

Unfortunately the issue of integration goes much deeper than the mere conversion of currency. Integration touches almost every aspect of the system – from the structure of the key values to the
definition of the data. The fact that the SOA can access information is delightful only as long as the data has first been integrated.

Fig 3 shows the need for integration of data BEFORE the SOA is effective.

 

 

Another interesting issue is the need to access historical data. Traditionally SOA accesses applications. And traditionally applications do not have any degree of historical data attached
to them. In order to get good performance at the application level, historical data is jettisoned. The problem is that if historical data is needed, it is not found in the applications that sit
beneath the SOA. In order to make the SOA effective, then, it is necessary to have a store of historical data, and that store of historical data is not found in applications.
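
One way to picture this is a service that decides where to read from based on how old the requested data is. The sketch below is a minimal Python illustration under stated assumptions: the 90-day retention window, the store names, and the function name are hypothetical and would differ from one organization to the next.

```python
from datetime import date, timedelta

# Assumed retention window: how far back the application keeps its data.
APPLICATION_RETENTION = timedelta(days=90)

def choose_source(requested: date, today: date) -> str:
    """Return the name of the store a request for `requested` should read from."""
    if requested > today:
        raise ValueError("requested date is in the future")
    if today - requested <= APPLICATION_RETENTION:
        # Recent data is still held by the application.
        return "application"
    # Older data has been jettisoned from the application,
    # so it must come from a separate store of historical data.
    return "historical_store"
```

If no historical store exists, the second branch has nowhere to go, which is exactly the gap described above.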

Fig 4 shows the need for historical data sitting beneath the SOA.

 

 

Yet another issue is that of consistency of data. One SOA process accesses data bases A, B, and C and arrives at a value of $4,760,669. Another SOA process sets out to perform the same or a very similar
analysis and uses data bases D, E, and F. The second SOA analysis arrives at a value of $5,339,817. Unfortunately the values are not reconcilable. When management uses these numbers for the
purposes of decision making, it is like throwing at a dart board: there are conflicting numbers and no confidence in any of them. SOA has made data available on the web, but SOA has done nothing to make
those numbers believable.
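
By contrast, when different analyses are derived from one shared set of granular records, their results can be reconciled. The Python sketch below illustrates the idea with invented records and hypothetical names; it is not drawn from any particular product.

```python
from collections import defaultdict

# Hypothetical granular records: (region, product, amount in whole dollars).
transactions = [
    ("east", "widgets", 120),
    ("east", "gadgets", 80),
    ("west", "widgets", 200),
    ("west", "gadgets", 50),
]

def summarize(rows, key_index):
    """Total the amount column, grouped by the column at key_index."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)

# Two different views built from the same granular data.
by_region = summarize(transactions, 0)
by_product = summarize(transactions, 1)
```

The two views slice the data differently, yet both add up to the same grand total (450 here), because both come from the same underlying records.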

Fig 5 shows the need for confidence in numbers and the fact that the SOA does nothing for the believability of the numbers.

 

 

And there are even more issues. Suppose a new analysis or query needs to be done through SOA. Where does the analyst go to find the information that is needed to satisfy the analysis? The answer is
that in a large complex organization, there needs to be a repository of metadata that describes what data exists and where it resides. Otherwise it is difficult to find the right data for the
analysis that is needed.
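
A metadata repository can be as elaborate as the organization needs, but the core idea is a searchable description of the data. The Python sketch below is a minimal illustration; the entry fields, the catalog contents, and the search function are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetEntry:
    name: str         # logical name of the data set
    location: str     # where the data physically lives
    currency: str     # unit of the monetary fields, if any
    description: str  # plain-language definition of the contents

# Hypothetical catalog entries, invented for this sketch.
CATALOG = [
    DatasetEntry("us_revenue", "data base A", "USD", "Monthly revenue for US sales"),
    DatasetEntry("ca_revenue", "data base B", "CAD", "Monthly revenue for Canadian sales"),
    DatasetEntry("customer_master", "data base C", "n/a", "Customer names and addresses"),
]

def find_datasets(catalog, term):
    """Return the entries whose name or description mentions the search term."""
    needle = term.lower()
    return [
        entry for entry in catalog
        if needle in entry.name.lower() or needle in entry.description.lower()
    ]
```

A search for "revenue" returns the two revenue entries along with their locations and currencies, which is the information the analyst needs before writing the query.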

Fig 6 shows the need for metadata in support of SOA processing.

 

 

A final problem arises when it comes time to build a new SOA transaction: is there any foundation that can be used from previous SOA activities? In other words, when I get ready to build
SOA program ABC, can I reuse some or all of the previously built SOA transactions? The answer is no. Unless care is taken with the underlying infrastructure, each new SOA program or transaction
must build its own infrastructure from scratch.

Fig 7 shows the need for building each SOA program or activity independently or from scratch.

 

 

It is seen, then, that while SOA is a fine way to link the web and organizational data, the SOA does not address major issues such as:

 

  • the performance of the SOA as it executes,
  • the integration of data that the SOA operates on,
  • the need to occasionally access historical data,
  • the need for reconcilability of the results of SOA processing,
  • the need for metadata to point to existing data, and
  • the need for a foundation that can be built upon so that there is reuse of both code and data.

 

In a word, there needs to be an infrastructure on which to build SOA processing.

There are indeed different infrastructures for organizational systems. Fig 8 shows some of the classical infrastructures which exist.

 

 

On the left hand side of Fig 8 is the legacy stovepipe environment. There is massive redundancy, limited historical data, and very limited integration in the legacy stovepipe environment. In addition,
because data resides in many places, it is slow to access. And there is usually very little metadata in the legacy stovepipe environment. Trying to use SOA on top of the legacy stovepipe
environment is an exercise in frustration and wasted resources.

On the right hand side of Fig 8 is the classical “information factory” architecture. The information factory architecture exists in the form of a corporate information factory (CIF) or its cousin, the
government information factory (GIF). In the CIF or GIF there is metadata, there is historical data, there is integrated data, there is data that is separated into sectors based on the need for
performance, there is granular data that can be used as a basis for many ways of looking at data without losing reconcilability, and so forth. In a word, for all of the reasons mentioned above,
the CIF or GIF forms the IDEAL foundation for SOA processing.

 

 

Founded in Colorado by Bill Inmon, Guy Hildebrand and Dan Meers, Inmon Data Systems (IDS) is a software company dedicated to the proposition that there needs to be a bridge between the worlds
of structured data and unstructured data. IDS has foundation technology that allows unstructured data to be brought into the structured environment and once there, integrated into the structured
environment.

Applications – unstructured visualization (with Compudigm)

  • Enterprise metadata consolidation
  • CRM enhancement
  • Communication compliance
  • Email and unstructured indexing for bulk storage

IDS is located in Castle Rock, Colorado.

Further information about architecture can be found on www.inmoncif.com.


About Bill Inmon

Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.
