|
Data Integration and Sharing, Part Two
Published: January 1, 2004
Published in TDAN.com January 2004
Publisher's Note ... Click here to read Part One of this article. 3.0 Data Movement Methods for Data Sharing and Integration Now let us discuss the actual methods or manners in which data is moved. These are specific implementation methods for implementing the data movement strategies we discussed above. 3.1 Data LayerThe standard client-server model divides architecture into three layers of Data, Function and Presentation. A Data Layer provides programming access to any database from any application. A Data Layer consists of two main parts:
The Data Layer should be used to gain access to data required by an application, without that application becoming DBMS dependent. It should not be used to access data in other applications, to access data from a reporting environment, nor to access reference data stored in other applications. 3.2 Database LinkA database link is a schema object in the local database that enables you to access objects on a remote database. For example, Oracle supports this. A database link connection allows local users to access data on a remote database. A database link can be used refer to tables and views on the remote database. For example, you can query a remote table or view with the SELECT statement. The remote database need not be an Oracle system. 3.2.1 One Way A database link is a one-way link in this sense:
If local users on database B want to access data on database A, then they must define a link that is stored in database B. 3.2.2 Why Use Database Links? The great advantage of database links is that they allow users to access another user's objects in a remote database so that they are bounded by the privilege set of the object's owner. In other words, a local user can access a link to a remote database without having to be a user on the remote database. For example, assume that employees submit expense reports to Accounts Payable (AP), and further suppose that a user using an AP application needs to retrieve information about employees from the HQ database. The AP users should be able to connect to the HQ database and execute a stored procedure in the remote HQ database that retrieves the desired information. The AP users should not need to be HQ database users to do their jobs; they should only be able to access HQ information in a controlled way as limited by the procedure. Database links allow you to grant to local users a limited access on remote databases. By using current user links, you can create centrally managed global users whose password information is hidden from both administrators and non-administrators. For example, A/P users can access the HQ database as SMYTH, but unlike fixed user links, Smyth’s credentials are not stored where database users can see them. 3.2.3 General Guidelines on DB Links
3.3 Extract-Transformation-Load (ETL) Extract-Transformation-Load (ETL) is the process of gathering data from source systems, enhancing and enriching that data, and then loading it into a target system. Generally speaking, ETL is the mainstay of data warehousing and reporting systems, though it is by no means restricted to them. ETL is ideally suited to data movement that has the following characteristics:
3.4 ReplicationReplication is the storing of a separate copy of data. Replication may use synchronous or asynchronous technology, though asynchronous is more common. Data can be complete or partial. A complete replica copies the entire source. Partial replicas can be horizontal or vertical. A horizontal replica is a copy of some rows, such as all customers in New Jersey only. A vertical replica is a copy of some columns, such as Customer ID, Customer Name, Customer Address and Customer Phone Number only. Replication has these advantages:
Replication has two main disadvantages.
Replication should be considered when:
3.5 Global ViewsGlobal views are a method implemented in some organizations. A view is a virtual table, constructed dynamically as the data is requested. The real data is stored in base tables. The view provides a dynamic in that its contents are selected each time the view is used. A view may combine some or all data from several base tables. A view only returns the specific columns contained in the view. It can thereby hide base data, which the developer does not want the user to see. Global views are predefined views that a client organization has defined for either security or performance purposes. Views have several uses and advantages:
Views have several disadvantages:
3.6 Guidelines for Application Development3.6.1 Transactional Systems Transactional Systems should be focused on data and processing. They should deal with operational requirements of the business process being supported and should not divert themselves into reporting or maintenance of reference data. Reporting should be minimized by limiting it to operational reports. Application reporting, also called Intra-Application Reporting, has the following additional characteristics
Transactional applications should also be autonomous in its usage of data. If other applications need the data, then the data should to be ‘pushed out’ or published for their usage. Their data retention requirements are different from those of reporting applications. They are as follows:
In terms of data sources, transactional applications obtain data from other sources in the following ways:
If synchronous sharing is required, the following methods should be evaluated:
Asynchronous is acceptable or needed:
When transactional systems need to provide data to other sources:
3.6.2 Analytical Reporting Applications A Reference Data Application should not be part of a reporting framework but may feed the reporting framework. A Reference Data Application has the following characteristics:
Analytical Reporting has the following characteristics:
SummaryAs we said at the beginning, data does flow within an organization. It flows frequently and plentifully throughout most organizations. We in data management, with our focus on integrated data, often lose sight of the realities of dealing with the large and complex issues in managing the vast amounts of data that drive the modern organization. Unfortunately, vast amounts of disparate data are often the byproduct of this phenomenon. Sound data management principles, implemented with sound messaging practices, can work together over time to provide order to this state. Go to Current Issue | Go to Issue Archive Recent articles by Tom Haughey
Tom Haughey -
Tom is considered one of the four founding fathers of Information Engineering in America. He is currently President of InfoModel, Inc, training and consulting company specializing in practical and rapid development methods. His courses on data management, data warehousing, and software development have been delivered to Fortune 100 companies around the world. He has worked on the development of seven different CASE tools, over 40,000 copies of which have been sold to date. He was formerly Chief Technology Officer for the Pepsi Bottling Group and Enterprise Director of Data Warehousing for Pepsico. He was also formerly Vice President of Technology for Computer Systems Advisers, who market the CASE tools called POSE and SILVERRUN. He wrote his own CASE tool in 1984. He formerly worked for IBM for 17 years as a Senior Project Manager. He is an author of many articles on Data Management, Information Engineering and Data Warehousing. His book, Designing the Data Warehouse-The Real Deal will be published later this year. |