There are many database replication tools available on the market today, but not all of them are created equal, and how each of them might fit with your application will vary greatly. How to
compare these database replication tools often becomes more of a guessing game, especially to those who are new to the field.
This technical paper describes the principal features of database replication tools which a DBA or applications architect should take note of when comparing replication offerings.
The most often requested information about a database replication tool is what database platforms it supports: is it limited to a particular vendor’s database (eg. Microsoft, Oracle, Informix,
Sybase), or does it allow cross-platform replication, eg. between Microsoft SQL Server and Oracle Enterprise Server?
The ability to replicate across multiple platforms allows an organization to maximize its investment in its existing database installations, instead of having to spend additional money on
standardizing on a single platform. Not only can the cost of this standardization be prohibitive, but it can also mean purchasing new versions of applications which are database-dependent, or worse
yet it can mean rewriting the organization’s existing applications to allow them to work with the new standard database platform.
Most database vendors are not interested in providing your organization with maximum flexibility when it comes to your purchasing decisions; they want to sell you the most of their own software as
they possibly can. It is in their interest to restrict your organization’s use of competitors’ databases, so their replication technology is naturally going to require you to use only their own
databases. While some of these vendors now claim to provide cross-platform support, the fine print shows that such cross-platform capability is restricted in how data can flow, usually in just one
direction (in or out) of a competitor’s database but not both, or with data type, performance, or other limitations. In particular, beware “heterogeneous” replication tools that are based on
using gateways, because this method always restricts capabilities and performance.
Database platform restriction incurs the additional cost of preventing your organization from deploying the most appropriate database platform in a given situation if it is not from the same
A database vendor independent replication technology is likelier to provide your organization with the flexibility it needs, whether today or in the future, and allows your organization to limit
Replication Speed and Efficiency
Another commonly asked question is that of throughput: how much data can the replication software move between sites, and how quickly / efficiently will it do so? In order to best answer this
question, one has to examine several factors, including the underlying technology used by the replication software.
First-generation replication technology involves what are variously called “change capture” or “store and forward”, and “log based” methods.
These techniques require that the changes to data be captured by the replication tool, stored locally at each site, and forwarded to other sites at replication time. These techniques suffer from a
number of drawbacks:
- In order to identify (“capture”) data changes, a log-based replication tool relies entirely on the database’s transaction log. Log-based tools require a hub-and-spoke configuration, and they
often do not allow updates to occur at multiple databases (“update-anywhere”). The size of the log places an upper limit on the amount of time that can pass between replications (see below).
- Otherwise, in order to identify (“capture”) data changes, a middleware-based replication tool must sit between the application and the database and create its own form of log, intercepting
the command stream going from the application to the database, and storing each data modification command (ie. insert / update / delete) in a queue file. This means that your application must be
written to access its database only through the replication tool, which can limit the application to using only direct SQL commands (ie. no triggers or stored procedure calls would be allowed) and
to using specific development environments or languages supported by the replication tool. As well, this can result in performance bottlenecks, since your application must go through the
replication tool to access its data.
- If your organization has an existing application and would like to retrofit replication capability into it, then changing the way your application accesses its data or removing its ability to
make use of triggers and/or stored procedures can make the use of change capture or store & forward technology a prohibitively daunting task. Further, middleware-based change capture typically
cannot detect and correctly handle changes made by DBAs through interactive SQL consoles or by third-party software such as OLAP, reporting, backup, mining, and other database tools.
- The queue file containing change records (in the case of log-based methods, the database’s transaction log serves as the queue file) requires significant amounts of disk space, depending on
how quickly your database contents change. Often the amount of disk space used by the queue file must be set at design time, and while its size can be augmented manually, if the maximum capacity of
this queue file is exceeded, either your application must stop running, or replication between sites must stop, or both. Your application’s remote sites will be required to replicate at minimum,
regular intervals in order to avoid overflowing their queue files, and if they fail to do so, all sites will be forced to cease making data changes until all sites have replicated. Otherwise,
remote sites have to be erased and replaced with a new full copy of the database, which loses all changes made at the remote site since the hub’s logs/queues became full.
- The storage efficiency of these queue files is low, since every change to a given data record is stored, so that if a record were changed 100 times in a given hour, there would be 100 change
records stored in the queue file for that data record alone.
- Queue files and log files sometimes get corrupted, and in at least one vendor’s case there are known bugs regarding log file corruption. In one leading point of sale system for the automotive
industry, a pilot project involving the use of log-based replication has failed because of multi-day downtimes per log corruption incident, since each incident requires that the affected sites be
restored from tape backups, and all sites must cease replicating until the affected sites are repaired. The administrative overhead of maintaining such a system would make production deployment
- Queue file based replication tools can cause data discrepancies across replicated sites, since they transmit only relative changes to data and never look directly at the actual data stored in
each site’s database. These replication tools often require an occasional “synchronization” of replicated sites, which is a time consuming and complex process involving the use of snapshots
taken from tape backups. Since most organizations are tending towards minimal administration wherever possible, these site synchronizations will incur significant cost since they require DBA
involvement on a regular basis. These technologies are also particularly vulnerable to network outages that take place during a replication cycle, since they leave the target database in an
Modern (or “second-generation”) replication technologies involve the use of “direct to the database” methods which examine only the net data changes that have taken place since the last
replication cycle. These replication tools have the following advantages:
- By going direct to the database, these replication tools entirely bypass the need for large, inefficient (and easily corrupted) queue files. Less administrative overhead (hence cost) is
required, since no queue file sizing needs to be done, no special disk partitions have to be created to store these queue files, and no queue file size limits will ever get exceeded. All your
application’s sites can replicate on their own schedules without fear of causing a queue file overflow somewhere in the system.
- “Direct to the database” replication tools provide your application developers with total flexibility in terms of what development tools, languages, or environment they need to best achieve
your needs, because the replication tool uses its own independent data connection. Integration with other applications becomes a non-issue, since your developers have complete freedom to use any
standard or non-standard data access methods, including (but not limited to) C/C++/Java, CORBA/COM/RPC/ODBC, application servers in N-tier architectures, DBAs manually using interactive SQL
consoles, and any third party software including OLAP, reporting, backup, mining, and other database tools.
- There are no restrictions on the use of triggers, stored procedures, or interactive SQL consoles, and there are no performance bottlenecks requiring that the application access its data only
through the replication tool’s middleware.
- Since “direct to the database” replication tools examine actual data at replicated sites, they are inherently “auto-synchronizing” and do not require the time consuming and expensive
process of regular synchronization needed by first-generation tools. This also provides complete robustness since even a network interruption will not cause data corruption or require time
consuming restores from tape.
- Retrofitting replication capability into your existing applications is much faster, and hence lower cost, using “direct to the database” replication tools, since there is no need to rewrite
your applications using a restricted set of data access tools. If you are in the business of selling applications, this true “plug-in” approach significantly reduces time to market with a
distributed application, giving you an extra edge over the competition.
Database Schema Changes
Most replication tools available today require modifications to an organization’s database schema, usually to include an extra column in each table to be replicated. While this can easily be
accommodated for new applications being developed, it can mean a time consuming or expensive reengineering of an existing application to which replication capability is being added. Additionally,
if your organization is attempting to add replication to an application provided by a 3rd party vendor (eg. BAAN, SAP), the vendor may not allow schema changes and may no longer support your
organization if it went ahead with those database alterations.
Modern replication tools do not require any changes to existing tables, allowing your organization to integrate replication capability as a true “plug-in” solution with minimal or no
reengineering, and to add replication to a 3rd party vendor’s application without loss of support.
Ownership of Data
First-generation replication tools require that each piece of data to be replicated have a designated owner site (also called “publisher” or “master” site), from which other sites (called
“subscriber” or “slave” sites) obtain their updates. If a user at a subscriber or slave site wishes to modify a given data record, the update request must be sent to the publisher or master
site, where the change is made to the specified record, but it is only at the next replication cycle that the change is propagated to the subscriber or slave site. Not only can this delay can be
significant, making users at remote sites keenly aware of their status as a secondary site, but it also requires the application to manage these central site data changes differently from local
changes. In the case of existing applications in which replication is being retrofitted, this differentiation between local and remote changes often requires significant reengineering of the
Modern replication tools implement the “update-anywhere” (also called “true peer to peer”) replication paradigm, which allows all sites to insert, update, and delete records (unless restricted
by administrators). This provides local users with the same high performance as if they were using a non-replicated (or single site) system locally at head office, eliminating the “second-class
user” syndrome. Data updates take place at local speeds, and with modern tools’ continuous trickle replication the updates can be rapidly distributed across all sites. The flexibility of modern
update-anywhere replication tools removes design time restrictions on which users can perform what operations where, and any unnecessary distinctions between users at central versus remote sites.
This also means that the addition of this type of replication capability to an existing application will require less development effort and will be completed much more quickly and at lower overall
Supported Network Topologies
Another commonly asked question about replication tools is what network topologies they support: eg. hub-and-spoke, server-to-server, or some combination of the two. Hube and spoke topologies
consist of a central server (the hub) which replicates with all other sites (the spokes). Server-to-server topologies allow replication for common tasks such as mirroring for backup or redundancy
purposes. As we will see, it is not common for a replication tool to support both these kinds of topologies.
Most first-generation replication tools support either one or the other, but not both, because of the limitations imposed by their change capture- or queue file-based underlying technology. Only
modern replication tools allow both workstation and server replication in any combination, with automatic replication failover if some sites are temporarily or permanently unavailable, and
automatic replication load balancing with dynamic configuration as new sites are added to ensure optimal replication.
Data Transmission Efficiency and Security
As outlined above, first-generation replication tools make use of inefficient queue files which contain all the changes made to a given data record. Transmitting all these change records requires
significantly more network bandwidth (orders of magnitude greater, in some cases, depending on the application and on how regularly replication takes place) than that used by “direct to database”
replication tools which transmit only the net changes to a given record.
Besides offering efficient net-change operation, including at finer- than-record granularity, modern “direct to database” replication tools also automatically compress data transmissions (typical
business data volumes are reduced by a factor of 4 to 1) and encrypt them using industry-standard Triple-DES security to allow safe replication even across public communications networks such as
the Internet or wireless connections. The efficiency of these modern tools allows for replication across network connections that could otherwise be unusably slow, including dial-up modem or
cellular connections. In some parts of the world, the networking infrastructure of a geographical region precludes the use of first-generation replication tools.
Replication Over Public Networks and the Internet
A question that comes up is whether or not a given tool will allow for replication across public networks, the Internet being mentioned the most often of these types of networks. Some organizations
require the redundancy of being able to use their own internal network as well as the Internet in order to provide for failover capability.
The most common concern of these organizations is the vulnerability of their data as it moves across the Internet. Most replication tools vendors do not include encryption as an embedded component
of their offering, leaving it to the implementor to provide encryption as an add-on, or sometimes providing non-standard encryption technologies that leave the data transmission vulnerable to
There is nothing worse than the unfounded belief that your organization’s data is secure when in fact it is falling into the hands of your competitors. Since most organizations implementing
replication are distributing their most precious commodity (business data), it is in their interest to obtain the most secure transmissions channels available, which newer replication tools provide
using Triple-DES, government-approved standard encryption.
Another concern related to Internet is whether or not the replication tool can communicate through firewalls, which are necessary to protect the organization’s sites from intrusion. Most
replication tools are not Internet standards and leave the organization with the uncomfortable choice of removing their firewalls or abandoning replication completely.
Enterprise-class replication tools can operate securely through firewalls because they are registered as Internet standards and are associated with official Internet port numbers which can be
opened selectively by network administrators at each replicated site. This allows replication to take place through firewalls yet leaves the firewalls in place to protect the organization’s vital
Manual vs. Automatic Data Routing
Another important consideration is how the replication tool routes data to sites which need to receive updates; that is, how each site determines the path to be taken by its locally stored data
changes to other replicated sites.
Most first-generation replication tools require statically defined master/slave or publish/subscribe relationships, and support only fixed hub-and-spoke or server-to-server topologies. In such an
environment, the manner in which each site connects to others is fixed at installation time or requires manual administration. This includes which pairs of sites will connect with each other, and
in which order.
Changing this routing, for example when adding a new replicated site or to compensate for a shift in the relative data volumes at the replicate sites, requires manual administrator intervention.
Until this manual intervention is performed, overall systemwide replication performance will be degraded, manifesting either as a general slowdown in the distribution of data changes, or as system
downtime at the heavily loaded sites. If queue files overflow at the hub site, remote databases are typically not recoverable and will need to be recreated manually from the master site’s image,
causing data loss (losing any updates made since the previous replication).
In comparison, modern peer-to-peer replication tools dynamically determine the optimal route to take in order to distribute data changes to all interested sites using the most efficient paths
possible. This flexibility provided by dynamic routing allows administrators to add new sites to the system without any manual intervention: at the next replication cycle, these new sites negotiate
with existing ones and automatically add themselves to the network, allowing existing sites to take advantage of the additional connectivity provided by the new site.
A byproduct of this dynamic routing capability is that load balancing becomes an inherent property of the replication network. For example, in the case of a hub-and-spoke topology, instead of
becoming depending on the single point of failure represented by the hub site, an administrator can drop a new site next to the existing hub site, and at the next replication cycle these two hub
sites will automatically divide the connections to the spoke sites between themselves, with no need for manual reconfiguration of any routing data. Since these replication tools also take into
account overall network throughput to each site, and CPU performance at each site, the distribution of work across all sites is maximally efficient.
As mentioned above, organizations needing failover network capability can exploit this dynamic routing capability since these newer replication tools also detect the presence of available network
connections to each site, and choose the most appropriate connection (ie. highest available bandwidth and/or least busy partner) route to each site.
Granularity and Collision Handling
Many first-generation replication tools provide “record level” change control, ie. they detect and handle data changes at the record level.
Some first-generation replication tools go even further in providing “field level” change control, in that if they detect that a single field has changed in a given record, only that value is
transmitted to other sites.
Field-level change control may appear to increase replication efficiency by reducing transmission data volumes, but it usually also has the side effect of causing data corruption. This can occur
when, for example, two users at different sites update a customer’s address record simultaneously: one user modifies the city field to New York, and the other user modifies the ZIP code to reflect
a Boston ZIP code. The result of permitting both changes, as field-level replication does, is corruption of the customer’s address record.
Modern replication tools allow an administrator to define which sets of fields in a record should be updated together. In the above example, the administrator would specify that in the customer
table all the address fields (including street address, city, ZIP, and country) are defined to be a single “fragment”. Changes to any of the fields in a given fragment are replicated together,
guaranteeing consistency of data even if other collisions take place.
Copyright (c) 1999, PeerDirect Inc.