Proven Strategies for Large-Scale Data Migration Projects

As one of the core data management activities, data migration has been practiced ever since the invention of computers. However, it can be the most neglected task on IT managers' to-do lists, resulting in poor-quality data in the target system. The observation is not new; it is commonly made throughout the industry. It is estimated that 84% of data migration projects fail.¹ The impacts of data migration project failure are numerous and include:

  • Breakdown of target systems
  • Poor data quality in the target environment
  • Loss of business opportunity
  • Cost overruns

What is Data Migration?
The term “data migration” is used in several contexts for data movement activities. Let’s look at the definition of data migration:

Data migration is the process of transferring data between storage types, formats or computer systems. Data migration is usually performed programmatically to
achieve an automated migration, freeing up human resources from tedious tasks. It is required when organizations or individuals change computer systems or upgrade to new systems, or when systems
merge (such as when the organizations that use them undergo a merger/takeover).
     (Source: Wikipedia)

This article addresses large-scale data migration projects, where data is moved from source (old) system(s) to target (new) system(s) on a one-time basis, usually as the result of an application or technology upgrade initiative.

The business objective of a data migration project is to move the data set of interest from the source system to the target system, while improving data quality and
maintaining business continuity.

Proven Strategies
In this article we discuss proven strategies for executing large-scale data migration projects. The list has been compiled over years of working on numerous large-scale, critical data migration projects. Each strategy can be adopted with some level of customization to an individual organization's needs.

Strategy 1: Invest in Profiling Source Data

Source data is the starting point for any data migration effort. Understanding the characteristics of the source data is paramount to the success of the project: profiling uncovers undocumented data relationships, data quality issues, data volumes, data anomalies, etc. Data profiling essentially provides an x-ray view of the source data sets, revealing their strengths and weaknesses. The investment made here has a direct impact on the effectiveness of downstream processes and software code components. It is also important to define the scope of the profiling exercise up front to avoid overspending on this task. Basic data profiling can be performed with scripts; however, for highly complex and large data sets, an industrial-strength data profiling tool is worth the investment.
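For teams starting with scripts rather than a commercial tool, a minimal profiling pass might look like the sketch below. It assumes the source data has been extracted to CSV; the function names and the profile statistics chosen (null counts, distinct counts, most frequent values) are illustrative, not prescribed by any particular tool.

```python
import csv
from collections import Counter

def profile_column(values):
    """Compute basic profile statistics for one column's raw string values."""
    non_null = [v for v in values if v not in ("", "NULL", None)]
    counts = Counter(non_null)
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(counts),
        "top_values": counts.most_common(3),  # helps spot skew and anomalies
    }

def profile_csv(path):
    """Profile every column of a CSV extract from the source system."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    columns = rows[0].keys() if rows else []
    return {col: profile_column([r[col] for r in rows]) for col in columns}
```

Even a rough profile like this surfaces unexpected nulls, out-of-range codes and undocumented value distributions before any transformation code is written.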

Strategy 2: Create a Data Migration Process Model

Data migration can be as simple as a single step with one source and one target system, or a highly complex process involving multiple source systems, multiple steps and multiple target systems. Create an elaborate process model depicting every step of the migration. The artifact serves as the road map for moving data, as well as an agreement among the stakeholders involved. The process model also serves as input to the downstream administration, configuration management and software development processes. It should include interim steps to validate the volume and quality of data flowing through the process. With these embedded checkpoints, data analysts can make sure exceptions stay within accepted limits and there are no hidden surprises.
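One way to embed such checkpoints in migration code is a guard function that each step's output must pass before the next step runs. This is a sketch under assumed thresholds (minimum row count, maximum exception rate); the names and the decision to halt on failure are illustrative choices, not a standard API.

```python
def checkpoint(step_name, records, min_rows, max_exception_rate, is_valid):
    """Validate volume and quality of data flowing out of a migration step.

    Halts the pipeline (raises) when the record count falls below the agreed
    threshold or the exception rate exceeds the accepted limit; otherwise
    returns the valid records and the exceptions for analyst review.
    """
    if len(records) < min_rows:
        raise RuntimeError(f"{step_name}: volume check failed "
                           f"({len(records)} rows, expected >= {min_rows})")
    valid = [r for r in records if is_valid(r)]
    exceptions = [r for r in records if not is_valid(r)]
    rate = len(exceptions) / len(records)
    if rate > max_exception_rate:
        raise RuntimeError(f"{step_name}: quality check failed "
                           f"(exception rate {rate:.1%} > {max_exception_rate:.1%})")
    return valid, exceptions
```

The thresholds themselves should come straight from the process model, so the code enforces exactly what the stakeholders agreed to.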

Strategy 3: Define Roles and Responsibilities Up Front

Data migration can be a complex and daunting project involving several stakeholders and IT task managers/team leaders. A formal handshake at every critical step of the data migration process is
imperative. The architects and the project manager should identify all possible roles and assign responsibilities to the roles as part of project planning. The project manager then should formally
assign these roles to all project staff members. By assigning roles and responsibilities up front, project leadership can ensure that entire data migration life cycle is supported with appropriate
accountability established. Conduct a formal walkthrough of the “Roles and Responsibilities” document to get buy-in from all stakeholders and project staff members.

Strategy 4: Divide and Conquer

Just like any other large and complex task, data migration should be divided around logical groupings of data – such as business area, geography, cost center, etc. The choice of grouping depends on the business context for the migration. It is recommended to migrate the smallest data set first (Hawaii) and then move on to larger data sets (California). By following this methodology, the team can learn and fine-tune the migration process early on with smaller data sets, thus minimizing risk. The migration of each data set can be treated like a release, which helps the team immensely in communication. Each release should be followed by a formal release evaluation step – to document lessons learned, to educate the team, and to identify refinements for the next release.
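The smallest-first ordering is trivial to automate once row counts per logical grouping are known (the profiling from Strategy 1 supplies them). A minimal sketch, with hypothetical geography names and counts:

```python
def plan_releases(group_row_counts):
    """Order migration releases smallest-first, so early releases de-risk
    the process before the largest data sets are attempted.

    `group_row_counts` maps a logical grouping (e.g. a geography)
    to its row count in the source system.
    """
    return sorted(group_row_counts, key=group_row_counts.get)
```

Each entry in the resulting list then becomes one release, with its own evaluation step before the next begins.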

Strategy 5: Invest in Technology/Tool Training

For large-scale migration projects, it is recommended to invest in a proven technology/tool for obvious reasons – automation, metadata collection, scheduling, error handling, etc. If the team is new to the technology/tool, it is highly recommended to invest in formal training for the staff responsible for developing and executing the data migration code components. By investing in such training, project leadership can minimize the risk associated with the project's learning curve. During the training process, the team also gets the opportunity to establish a relationship with the vendor's technical support staff.

Strategy 6: Conduct Performance Testing

For large-scale data migration efforts, the size of the data sets being moved from source to destination data stores can be overwhelming. Due to business and/or operational requirements, most data migration projects have predefined, short time windows for moving data. Hence, it is imperative for the code components to have acceptable performance levels. It is highly recommended to fully test the code components for performance at production scale. The project staff should continue to tune the software and/or configuration parameters until the desired throughput has been achieved. The repetitive performance testing will also help staff get acquainted with the technology and the migration process.
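A simple throughput harness makes "desired throughput" concrete: measure rows per second against production-scale batches, then check whether the full data set fits the migration window. This is an illustrative sketch; `migrate_batch` stands in for whatever migration callable the project actually uses.

```python
import time

def measure_throughput(migrate_batch, batches):
    """Run the migration callable over production-scale batches and
    return the observed throughput in rows per second."""
    total_rows = 0
    start = time.perf_counter()
    for batch in batches:
        migrate_batch(batch)
        total_rows += len(batch)
    elapsed = time.perf_counter() - start
    return total_rows / elapsed if elapsed > 0 else float("inf")

def fits_window(total_row_count, rows_per_second, window_seconds):
    """Check whether measured throughput moves the full data set
    inside the predefined migration window."""
    return total_row_count / rows_per_second <= window_seconds
```

Tuning continues until `fits_window` holds with comfortable margin, since production runs rarely go faster than rehearsals.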

Strategy 7: Have a Plan B

Last but not least, just like any mission-critical project, have a plan B. Even after significant planning, testing and rehearsal, migration projects tend to face surprises. Hence, from a business continuity standpoint, it is imperative to have an alternative solution planned and tested before the project begins. The plan B must be formulated with input from all stakeholders, including business leadership, business users, operational IT staff and migration project leadership. The migration project leadership should get sign-off from all stakeholders on the plan B and communicate any changes thereafter. By communicating the plans and intentions to all stakeholders, the project leadership can ensure that all dependent business processes are prepared for the change.

As we learned, most data migration projects fail, for various reasons. One of the primary reasons for such failure is underestimation of the scale and complexity of the data migration effort. By proactively investing in estimation and planning, IT managers can get a good handle on the project. Data migration is a multidimensional effort, which can be time sensitive and mission critical. By following these simple and proven strategies, IT managers can certainly improve the probability of success.

End Notes:

  1. Data Migration in the Global 2000, Bloor Research (September 2007)



About Satyajeet Dhumne

Satyajeet is an experienced consultant in the fields of data warehousing, business intelligence and data management. He has more than 23 years of experience in the information technology industry, and for the past 12 years he has focused on business intelligence, data warehousing and data architecture. Satyajeet holds an M.S. in Management of Information Technology from the McIntire School of Commerce at the University of Virginia. You may reach Satyajeet via email at