Data Integration and Sharing - Part One
Published: October 1, 2003
Published in TDAN.com October 2003
We data modelers have a great passion for data. We understand (and even enjoy) the process of creating data models. We love to see a business unfold before our eyes in the form of the data model. We often say, “Data doesn’t flow.” In the late 1970’s and early 1980’s, I too said, “Data doesn’t flow.” By 1984, I had stopped saying this. Of course, data does not flow within a data model; it resides there. The reality is that data does flow, and flows abundantly, throughout an organization in the form of data movement. In most organizations, there is a huge amount of data movement. Reference systems pass data to transactional and analytical systems. Transaction systems pass data to one another. External data is absorbed into transaction and analytical systems. Warehouses pass data to marts. Warehouses and marts pass data for data mining. Some people even contend that more data moves throughout organizations than is stored by them.

This article, which is in two parts, addresses different ways in which data can move throughout an organization, focusing specifically on methods for data sharing. Part I, this month, discusses messaging methods for data sharing. Part II, next month, will address data movement methods for data sharing.

Data integration and sharing deals with the use of common data by multiple applications, or the exchange of data across multiple applications. When multiple applications exchange data, in one form or another, messages are exchanged among them. We can do this in several ways, some more flexible than others and some more powerful than others. Understanding integration models first requires understanding a few simple, common integration concepts and terms. We define these concepts before going into the data sharing models.

1.0 Common Integration Terms

Messaging refers to a mechanism for getting systems to interact via the passing of messages. A message is a single unit of communication encapsulating some information.
It is the unit for sending data and values across applications. Messages may contain factual or status information about application objects or processes, or even instructions for the recipient. A message consists of a header, containing message identification, and a body, containing user-defined information. Depending on how the sending and receiving applications behave, they can be in different states and have different levels of coupling between them.

Senders, receivers, and messages themselves can have state. State is the description of the current situation of a component or object; it represents knowledge of the object. State is typically held in memory. State could describe the identity of the object or the progress of an object through different processes, such as an Order being in the state of Certified, then In Process, and later Fulfilled. State could also describe the operations that a transaction can validly require. A stateful application retains state information in memory after a service or operation has been performed. A stateless application flushes state information from memory after a service or operation has been performed. Some integration methods are stateful, such as request/reply. Others are stateless, such as messaging (see below).

Coupling has to do with how intimately components relate to other components. Tight coupling is a form of integration in which each component has knowledge of the other component, so a change in one component can affect the other. In loose coupling, one component does not have knowledge of the other and is thereby insulated from changes in the other. Some integration methods use tight coupling, such as a database link. Others use loose coupling, such as messaging (see below).

Synchronization has to do with how extensively components cooperate in ensuring that transactions are properly completed.
Synchronous communication means that two or more separate objects or systems partake in a single unit of work. One is dependent on the other: the requester must wait until the service provider responds, and resumes its execution only after it receives the response. The entire unit of work, from end to end, is completed, or nothing is completed. A typical form of synchronous communication is called request/reply (see below). Asynchronous communication means that the work is broken into separate parts, and one component is not dependent on the other. The requester does not have to wait for the remote process to complete, nor for a reply; in fact, it can do other work while waiting for an answer. Two forms of asynchronous communication are queue-based and publish/subscribe (see below).

For applications to exist in a loosely coupled, asynchronous relationship, special software is required. Message-Oriented Middleware (MOM) provides this. MOM is software that provides a common, reliable way for programs to create, send, receive, and read messages in a distributed environment. MOM provides fast and reliable asynchronous electronic communication, guaranteed message delivery, receipt notification, and transaction control. MOM is probably the best way to achieve asynchronous, loose coupling.

The basic unit of work on data is the transaction. A transaction is a logical construct through which applications perform work on shared resources, such as databases. A transaction is a complete unit of work, though it can involve multiple sub-units of work, which may even be performed on one or more systems. A transaction has four major characteristics, called its ACID properties, defined as follows:

Atomicity: all of the work in the transaction is completed, or none of it is.
Consistency: the transaction takes the shared resources from one valid state to another valid state.
Isolation: the intermediate effects of a transaction in progress are not visible to other concurrent transactions.
Durability: once a transaction is committed, its results persist, even through a subsequent system failure.
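The atomicity property can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for a shared transactional database; the `account` table and the `make_db`, `transfer`, and `balances` functions are hypothetical and not part of the article. The debit and the credit either both commit or are both rolled back.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `account` table."""
    conn = sqlite3.connect(":memory:")
    with conn:
        conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
        conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one unit of work.

    Returns True if the transaction committed, False if it was rolled back.
    """
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Consistency rule violated: abandon the whole unit of work.
                raise ValueError("insufficient funds")
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        return True
    except ValueError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return the current balances as {account_id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


if __name__ == "__main__":
    conn = make_db()
    print(transfer(conn, "A", "B", 30), balances(conn))   # True {'A': 70, 'B': 30}
    print(transfer(conn, "A", "B", 500), balances(conn))  # False {'A': 70, 'B': 30}
```

In the second call the debit has already been applied inside the transaction when the check fails, and the rollback undoes it. A transaction spanning several systems would need a coordinator (for example, two-phase commit), which this single-database sketch does not attempt.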
2.0 Data Integration and Sharing Models

We will now take the above concepts and form them into the different integration models. Messaging systems can be either synchronous or asynchronous, and can be classified into four interaction models that determine how messages are passed.
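The asynchronous, queue-based style defined in section 1.0 can be sketched in a few lines. This assumes an in-process `queue.Queue` as a stand-in for Message-Oriented Middleware; it offers none of MOM's guaranteed delivery or persistence. The `Message` class mirrors the header/body structure described earlier, while the `order_id` field and the `handle` and `start_receiver` names are hypothetical.

```python
import queue
import threading
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    body: dict  # user-defined information
    # Header carries message identification, generated per message.
    header: dict = field(default_factory=lambda: {"message_id": str(uuid.uuid4())})


def handle(msg: Message) -> str:
    """Hypothetical receiver logic."""
    return f"processed order {msg.body['order_id']}"


def start_receiver(inbox: "queue.Queue[Optional[Message]]", results: List[str]) -> threading.Thread:
    """Run the receiver on its own thread; it consumes messages whenever they arrive."""
    def run() -> None:
        while True:
            msg = inbox.get()
            if msg is None:  # sentinel: no more messages
                break
            results.append(handle(msg))

    worker = threading.Thread(target=run)
    worker.start()
    return worker


if __name__ == "__main__":
    inbox: "queue.Queue[Optional[Message]]" = queue.Queue()
    results: List[str] = []
    worker = start_receiver(inbox, results)
    for order_id in range(3):
        # put() returns immediately: the sender does not wait for processing.
        inbox.put(Message(body={"order_id": order_id}))
    # ... the sender is free to do other work here ...
    inbox.put(None)
    worker.join()
    print(results)  # ['processed order 0', 'processed order 1', 'processed order 2']
```

Because the receiver runs on its own thread, the sending loop finishes without waiting for any message to be processed; the sender knows only the queue, not the receiver.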
2.1 Four Messaging Models

The following table summarizes these models:
Conversation

In this model, applications A and B exchange messages reciprocally and steadily, and state is maintained in both. This type of messaging is inappropriate for business applications and is usually reserved for lower-level system and network functions.

Request-Reply

Request-Reply is used when an application sends a message and waits to receive a corresponding message in return. This is typically done with a remote procedure call. It is the standard synchronous object-messaging format. In Request-Reply, state is maintained in the calling application only. The called application is only acting as a server and only needs to know how to respond to an incoming message.

Publish-Subscribe

When multiple applications need to receive the same messages, Publish-Subscribe Messaging can be used. State is kept only in the subscribing application. The publishing application just sends messages; it is up to the Subscriber to keep track of where it is in the published messages. Multiple publishers can send messages to a topic, and all subscribers to that topic receive all the messages sent to it. This model is extremely useful when a group of applications wants to notify each other of a particular occurrence, such as new product data being available. In Publish-Subscribe Messaging, there may be multiple Senders and multiple Receivers. It is not necessary that the applications act as both, only that the solution supports both. For example, a reference data owner may want to notify all Subscribers of the arrival or availability of a new version of an organizational hierarchy. The Subscribers can use this information to subscribe to and retrieve the necessary sets of this data.

Point-To-Point

Point-To-Point Messaging is used when one or more senders need to send messages to a single receiver. However, this may or may not be a one-way relationship. An application in a messaging system may only send messages, only receive messages, or both send and receive messages.
At the same time, another application can also send and/or receive messages. In the simplest case, one application is the sender of the message, and the other application is the receiver of the message. There are two basic types of Point-to-Point Messaging:
In Event Messaging, even though there may be multiple Senders of messages, there is only a single Receiver. For example, multiple departments may send messages to a Purchasing Department requesting items to be purchased. These messages are intended only for Purchasing, and other applications will not receive them.

2.2 Synchronization Models

Here is a summary of the two methods for synchronizing applications:
2.3 Synchronous Integration

Synchronous communication follows the request-response model. An application initiates a request to another target application. The calling application then blocks its processing in the request invocation thread while it waits for a response from the called application, and continues its execution after it receives the response. Typically, an application uses a remote procedure call to issue synchronous requests to the other application. For example, an application might define a remote procedure call to create an accounts receivable item in the database. The calling application invokes this remote function to create an accounts receivable item and waits until it receives a reply containing the results. This interaction is synchronous because the calling application's program waits in lockstep with the called application and continues only when it gets the remote response. Synchronous interaction is applicable, for instance, where it is critical that multiple database updates are exactly synchronized.

Synchronous interaction leads to tight coupling between applications. One should consider the implications of this when integrating applications within an organization. Synchronous interaction reveals three potential disadvantages:
Here are several scenarios.
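The blocking request-reply interaction of this section can be sketched with the accounts receivable example. This is a minimal sketch under stated assumptions: in-process queues and a thread stand in for a real remote procedure call, and `receivables_server`, `create_ar_item`, and the reply fields are hypothetical. The calling thread is blocked in `get()` until the reply arrives, and gives up if the called application never answers.

```python
import queue
import threading


def receivables_server(requests: queue.Queue) -> None:
    """Hypothetical called application: creates an AR item and replies."""
    ar_items = []  # stands in for the accounts receivable table
    while True:
        req = requests.get()
        if req is None:  # sentinel: shut down
            break
        ar_items.append(req["item"])
        req["reply_to"].put({"status": "created", "ar_item_id": len(ar_items)})


def create_ar_item(requests: queue.Queue, item: dict, timeout: float = 2.0) -> dict:
    """Calling application: send a request, then block until the reply arrives.

    Raises queue.Empty if no reply arrives within `timeout` seconds.
    """
    reply_to: queue.Queue = queue.Queue(maxsize=1)
    requests.put({"item": item, "reply_to": reply_to})
    return reply_to.get(timeout=timeout)  # the calling thread waits here


if __name__ == "__main__":
    requests: queue.Queue = queue.Queue()
    server = threading.Thread(target=receivables_server, args=(requests,))
    server.start()
    print(create_ar_item(requests, {"customer": "C042", "amount": 250}))
    # {'status': 'created', 'ar_item_id': 1}
    requests.put(None)
    server.join()

    # With no server running, the caller is stuck until the timeout expires.
    try:
        create_ar_item(queue.Queue(), {"customer": "C042", "amount": 10}, timeout=0.1)
    except queue.Empty:
        print("no reply: called application unavailable")
```

The timeout case shows the tight coupling discussed above: the caller cannot make progress unless the called application is available and responsive.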
2.4 Asynchronous Integration

Asynchronous integration involves message-based communication across applications. An application sends a request to a target application. The sender continues its own processing, while the target application handles the request independently. The sender does not have to wait for the remote processing to complete, nor for a reply to come back. Instead, the sending thread sends the message and continues processing client requests. When using asynchronous communication, applications are said to be loosely coupled. With loose coupling, an application can continue processing without interference from performance or communication aberrations. The requesting application is bound neither to the responding application nor to the communication delivery mechanism.

2.5 Comparing Approaches

When designing an application, one needs to decide whether to use synchronous or asynchronous integration with other applications. Both approaches are valid for application integration, and the choice should be based on the integration requirements and use cases. In deciding, use the following guidelines:
Publisher's Note: Look for Part Two of this article in the January issue of TDAN.com.
Tom Haughey

Tom is considered one of the four founding fathers of Information Engineering in America. He is currently President of InfoModel, Inc., a training and consulting company specializing in practical and rapid development methods. His courses on data management, data warehousing, and software development have been delivered to Fortune 100 companies around the world. He has worked on the development of seven different CASE tools, over 40,000 copies of which have been sold to date. He was formerly Chief Technology Officer for the Pepsi Bottling Group and Enterprise Director of Data Warehousing for PepsiCo. He was also formerly Vice President of Technology for Computer Systems Advisers, which markets the CASE tools POSE and SILVERRUN. He wrote his own CASE tool in 1984. He previously worked for IBM for 17 years as a Senior Project Manager. He is the author of many articles on Data Management, Information Engineering, and Data Warehousing. His book, Designing the Data Warehouse-The Real Deal, will be published later this year.