Many established organisations have legacy information systems that are expensive to maintain and difficult to modify. These legacy systems can damage an organisation’s competitiveness, reputation – even its viability – and as time passes there are fewer people with the skills to fix them.
There are several hurdles in the way of conversion from legacy systems to modern systems, and one of those is data migration.
Data migration is a messy, time-consuming and difficult undertaking. Some surveys report that 84% of data migration projects fail [1]. A survey of UK-based financial services firms found that 72% of organisations deferred moving applications because data migration is ‘too risky’ [2]. Knowledge of these problems can lead organisations to procrastinate over replacing their legacy systems, sometimes delaying replacement past an appropriate retirement timeframe until a crisis forces unplanned action. Clearly, a crisis-driven approach to managing and replacing information systems is undesirable.
Exacerbating the problem is the ‘shiny new toy’ syndrome. Increasingly, legacy systems are being replaced not by in-house developed systems but by proven package solutions that are supported by a broad range of service providers. Understandably, IT professionals are attracted to developing marketable skills in modern packages and are reluctant to spend time analysing data in a custom-built legacy system. This contributes to the lack of focus on data migration.
So, data migration is a difficult and unattractive task with high potential for failure. But data migration is unavoidable if an organisation wants to retain its valuable data and knowledge assets as it replaces its legacy systems. Our observations, confirmed by reviewing the literature, indicate that there are five significant barriers to data migration success:
Delaying the data migration effort until it adversely affects the system conversion effort
Failing to make informed data migration decisions due to lack of cost and time estimates
Failing to fully engage the business in the data migration project
Inability to access scarce internal subject matter experts
Using inexperienced staff with homegrown tools and unproven processes
This paper outlines five considerations to address these barriers.
Consideration 1 – Data migration is unavoidable, so start now!
Example: A leading Australian organisation is working on converting from a legacy system to a package solution. However, their total focus is on implementing and configuring the package and there is no attention being paid to data migration. This will inevitably lead to delays because the system cannot go live until the legacy system data has been migrated. Data migration will be on the critical path, increasing the risk of arbitrary decisions and shortcuts.
Data migration is important, difficult and expensive; however, it often does not receive appropriate attention and funding until late in the system conversion exercise. Many organisations are tempted to focus on purchasing and implementing the target system while postponing commencement of data migration tasks. There is an understandable attraction towards working on the new, proven future system and an equally understandable reluctance to undertake the tedious, time-consuming and risky task of analysing and executing data migration. However, this only delays the inevitable. Unless system requirements allow for zero data migration, where all the data is left behind, the data must be understood, converted, tested and loaded, and that process will be on the critical path for going live with the target system.
It does not matter if the target system is not yet known; the first stage of data migration work can begin on the first day of a system conversion project, even before replacement options have been evaluated. Organisations can begin work immediately on documenting and assessing their legacy data assets.
It is impossible to migrate from one database to another without a clear understanding of the structure and quality of the data in the source database. Documenting and assessing the data quality of the source database is mandatory; otherwise the loading of data into the target database will fail. Failure occurs when data is loaded into the wrong place (e.g., phone numbers being loaded into a car registration field), when data fails the target environment’s validation requirements (e.g., rejection of bad dates, missing fields or invalid values), when data takes on different meanings, when referential integrity is lost, or through any one of hundreds of other problems caused by poor data or poor analysis. At some point these problems need to be resolved. The effort involved in understanding the source database can begin at the start of the project or at some later point, but it cannot be avoided.
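The kinds of checks described above can be sketched in a few lines of Python. This is a minimal illustration only, not a substitute for a profiling tool: the function names, field names, date format and sample rows are all hypothetical, and a real assessment would cover far more rules.

```python
from datetime import datetime


def profile_rows(rows, required, date_fields, date_format="%Y-%m-%d"):
    """Count missing required values and unparseable dates per field."""
    issues = {"missing": {}, "bad_date": {}}
    for row in rows:
        for field in required:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                issues["missing"][field] = issues["missing"].get(field, 0) + 1
        for field in date_fields:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                continue  # empty dates are only reported if the field is required
            try:
                datetime.strptime(str(value).strip(), date_format)
            except ValueError:
                issues["bad_date"][field] = issues["bad_date"].get(field, 0) + 1
    return issues


def orphaned_keys(child_rows, fk_field, parent_rows, pk_field):
    """Return foreign-key values in child_rows that have no matching parent row."""
    parent_keys = {row[pk_field] for row in parent_rows}
    return sorted({row[fk_field] for row in child_rows} - parent_keys)


# Hypothetical sample data standing in for two source tables.
vehicles = [
    {"rego": "ABC123", "owner_id": 1, "registered_on": "2001-03-15"},
    {"rego": "", "owner_id": 2, "registered_on": "1999-02-30"},        # missing rego, impossible date
    {"rego": "XYZ789", "owner_id": 9, "registered_on": "15/03/2001"},  # wrong date format, unknown owner
]
owners = [{"owner_id": 1}, {"owner_id": 2}]

report = profile_rows(vehicles, required=["rego"], date_fields=["registered_on"])
orphans = orphaned_keys(vehicles, "owner_id", owners, "owner_id")
print(report)
print(orphans)
```

Counts like these give the project team an early, concrete picture of which fields would fail the target system's validation and where referential integrity is already broken in the source.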
The good news is that understanding and cleaning the data is a useful activity in its own right. Even if the migration project is cancelled, the cleansing and documenting of data assets will assist with all future activities involving the source database. It is also critical to the next consideration – creating estimates.
Consideration 2 – The sooner you can create meaningful, defendable cost and time estimates the better.
Example: A major Government organisation is preparing to go to market to tender for a replacement of its core information system. They are exploring options for migrating to the new system; however, it is difficult to develop meaningful cost comparisons between the different data migration options without understanding the source system’s data assets.
Typical estimating approaches first identify what type of data is in the source system (e.g., how many ‘look up’ tables, how many duplicate tables, or how many tables used for once-off purposes such as staging tables). The tables with true business data can then be classified as low, medium or high complexity for the data migration; the complexity measures reflect whether the tables contain a large amount of outlier data, weird dates, outlandish values or other complicating factors. The estimates should be tested several times during the data migration project to ensure they reflect knowledge gained during the process and match the actual effort required. A good first step is to migrate a small but representative set of data, carefully measure the actual effort, and refine the total estimates based on what is discovered.
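The estimating approach above can be illustrated with a small worked example. Everything numeric here is an assumption made for illustration: the complexity thresholds, the person-days per table, the table inventory and the pilot figures are hypothetical and would need to be calibrated for each project.

```python
# Hypothetical effort, in person-days per table, for each complexity class.
EFFORT_PER_TABLE = {"low": 0.5, "medium": 2.0, "high": 5.0}


def classify(table):
    """Assign a complexity class from the share of problem rows.

    Look-up, duplicate and staging tables are not business data, so they are
    left out of this estimate (returns None). Thresholds are assumptions.
    """
    if table["kind"] != "business" or table["rows"] == 0:
        return None
    rate = table["problem_rows"] / table["rows"]
    if rate < 0.01:
        return "low"
    if rate < 0.10:
        return "medium"
    return "high"


def initial_estimate(tables, effort_per_table=EFFORT_PER_TABLE):
    """Sum the per-table effort over all classified business tables."""
    classes = [c for c in map(classify, tables) if c is not None]
    return sum(effort_per_table[c] for c in classes)


def refined_estimate(estimate, pilot_estimated, pilot_actual):
    """Scale the total by the ratio of actual to estimated pilot effort."""
    return estimate * pilot_actual / pilot_estimated


# Hypothetical inventory of source tables.
tables = [
    {"name": "customer", "kind": "business", "rows": 10000, "problem_rows": 50},
    {"name": "policy", "kind": "business", "rows": 20000, "problem_rows": 1000},
    {"name": "claim_history", "kind": "business", "rows": 5000, "problem_rows": 1500},
    {"name": "state_codes", "kind": "lookup", "rows": 8, "problem_rows": 0},
    {"name": "tmp_load", "kind": "staging", "rows": 300, "problem_rows": 0},
]

baseline = initial_estimate(tables)  # one low, one medium and one high table
# Suppose a pilot estimated at 2 person-days actually took 3.
revised = refined_estimate(baseline, pilot_estimated=2.0, pilot_actual=3.0)
print(baseline, revised)
```

The simple ratio used to refine the estimate is only one possible choice; the point is that the estimate is recalculated whenever measured effort from a pilot becomes available.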
These estimates will enable informed decisions about what data is migrated and what data is not. In the absence of good cost estimates, the default decision will be to migrate all source data to the target system and store it on high-speed storage technology. Unfortunately, this is the most expensive option and will often lead to unacceptable data migration costs.
Solid cost estimates will support decisions regarding which data elements are:
Migrated to the target database
Left in the source system
Transferred to a data warehouse
Dealt with in some other manner
These questions are particularly relevant to historical data, which can require a large amount of disk space and be expensive to move in its entirety. Historical data was often created under different business rules, and while the existing system may contain 30 years of accreted business rules, it is usually prohibitively expensive to implement all of those rules in the new system. Historical data is therefore often not valid in the new system unless the rule changes are simulated. Decisions on the handling of historical data have a significant impact on the success of the target system and the future operation of the organisation, and need to be made with the best possible understanding of the source system data.
Data migration cost estimates are critical to ensuring the data migration effort is adequately funded and receives appropriate management attention. The longer the data migration costs are labelled ‘unknown’ the longer it will be before management are able to understand the magnitude of the effort and cost and the longer it will be before the organisation can make informed decisions and fully commit the necessary funding, effort and attention. In contrast, the purchase and implementation costs of a package solution can be readily ascertained and understood, which contributes to the focus being on target system implementation rather than data migration.
Time estimates also assist with meaningful discussions regarding outage periods during data migration. The business default position is often for zero or minimal outage. The estimates can highlight the reality of the situation and provoke discussion of how to work around any necessary outages.
The estimates must include the time and effort commitments required from business people to support the next consideration.
Consideration 3 – Full business involvement in the data migration.
Example: A State Government agency successfully converted a significant legacy system to a modern solution. A key contributor to their success was the allocation of key business representatives to the project on an almost full-time basis.
Data migration issues are business issues – IT does not have the expertise or the authority to resolve business ambiguities, trade-offs and uncertainties. Business involvement is not negotiable. The credibility of the target system will be compromised if it is loaded with data that is known to be incorrect and is unacceptable for future business use.
Business involvement begins with governance of the data migration project. Figure 1 provides an initial framework for team-based governance of the data migration. The data migration project team presents the Data Quality Committee with findings about data quality issues. The committee recommends what action, if any, should be taken to rectify the data. The Data Migration Steering Committee approves or rejects the Data Quality Committee’s recommendations. Surrounding this team-based governance would be overall program governance.
Figure 1: Generic Governance Structure
Consideration 4 – Your internal subject matter experts (SMEs) are in short supply, so look for ways to leverage their skills.
Example: A Federal Government agency is migrating from their internally developed legacy system. They recognise that expertise in this system is entirely within their internal staff and are implementing approaches to leverage their in-house expertise.
Systems that have been created internally are best understood by internal staff (both business and IT) who have worked on these systems for years. However, these resources already have full-time jobs maintaining and working with the source system, and they are often critical to defining the requirements for configuring the new solution.
One solution is to divide the workload so that routine data migration tasks are handled by generalists and only those tasks requiring specialist domain knowledge are handled by SMEs, which reduces demand for the internal SMEs. For example, the data conversion effort will deal with both technical and business matters. Technical matters, such as converting from EBCDIC to ASCII, can be handled by general technical resources. A large amount of mechanical work, such as identifying outlier values, can also be performed by generalists, leaving the SMEs to decide how the outliers should be dealt with.
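As a rough illustration of this split, the sketch below shows two generalist tasks in Python: decoding EBCDIC text and queuing outlier values for SME review. The code page (`cp037`, one of several EBCDIC code pages that Python ships a codec for), the function names, the acceptable range and the sample data are assumptions; a real legacy file may use a different code page and may contain packed-decimal or binary fields that must not be decoded as text.

```python
def ebcdic_to_text(raw: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC-encoded text; the code page is an assumption per source system."""
    return raw.decode(codepage)


def flag_outliers(rows, field, low, high):
    """Return rows whose value falls outside [low, high].

    Generalists can produce this list mechanically; deciding what each
    outlier means, and how to treat it, is left to the SMEs.
    """
    return [row for row in rows if not (low <= row[field] <= high)]


# Five EBCDIC (cp037) bytes spelling a short text field.
record = b"\xC8\xC5\xD3\xD3\xD6"
text = ebcdic_to_text(record)
ascii_bytes = text.encode("ascii")  # raises UnicodeEncodeError if not representable in ASCII

# Hypothetical rows and a hypothetical plausible range for an age field.
people = [
    {"id": 1, "age": 34},
    {"id": 2, "age": -1},
    {"id": 3, "age": 212},
]
review_queue = flag_outliers(people, "age", low=0, high=120)
print(text, ascii_bytes, [row["id"] for row in review_queue])
```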
The SME effort should also be reserved for changes to data concepts. For example, imagine an organisation that issues licences to companies, where licence applications are commonly made by agents. The source system may have one record containing details about the agent, company and licence, whereas the target system stores agent, company and licence details in three different tables. Inevitably there will be issues, such as how to merge duplicate agents, which will require assistance from the SMEs to resolve.
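The licence example can be sketched as follows. This is a hypothetical illustration: the field names, the sample records and the matching rule (case and whitespace normalisation only) are assumptions. The rule is deliberately conservative, so that only trivially identical agents are merged automatically and anything less certain is left as a separate row for the SMEs to review.

```python
def normalise_name(name):
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def split_records(flat_records):
    """Split flat source records into agent, company and licence tables.

    Agents and companies are keyed on their normalised names and given
    surrogate ids; each licence row refers to them by id.
    """
    agents, companies, licences = {}, {}, []
    for rec in flat_records:
        agent_key = normalise_name(rec["agent_name"])
        company_key = normalise_name(rec["company_name"])
        agent_id = agents.setdefault(agent_key, len(agents) + 1)
        company_id = companies.setdefault(company_key, len(companies) + 1)
        licences.append(
            {"licence_no": rec["licence_no"], "agent_id": agent_id, "company_id": company_id}
        )
    return agents, companies, licences


# Hypothetical source records: one row holds agent, company and licence details.
flat = [
    {"licence_no": "L-001", "agent_name": "Jane Smith", "company_name": "Acme Pty Ltd"},
    {"licence_no": "L-002", "agent_name": "jane  smith ", "company_name": "Bravo Pty Ltd"},
    {"licence_no": "L-003", "agent_name": "J. Smith", "company_name": "Acme Pty Ltd"},
]

agents, companies, licences = split_records(flat)
print(agents)
print(companies)
print(licences)
```

Here the first two agent spellings collapse into one row, while "J. Smith" remains a separate agent. Whether it is the same person is exactly the kind of question that needs SME judgement.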
Figure 2 shows a hierarchy of SME involvement. Ideally the SMEs will spend most of their time on activities at the top of the hierarchy.
Figure 2: SME Deployment
Consideration 5 – Use experienced migration practitioners as well as proven tools and methodology.
For most organisations a core system conversion may happen once in a decade, so in-house staff will not have had the opportunity to develop data migration expertise. Specialist organisations will be more experienced with the problems faced during data migrations and will have developed techniques to deal with these issues. For example, data migration involves different activities, such as data cleansing, that are not usually contained within the typical System Development Life Cycle (SDLC) that is used in-house – the specialists can help identify the additional activities and develop a refined one-time SDLC.
Data migration involves data quality assessments and cleansing as well as extract, transform and load activities. It is possible to build tools internally to perform these activities but increasingly organisations are using proven third-party tools that can be readily purchased and supported. Third-party tools are continually enhanced to meet the needs of their user base and to respond to the competitive demands of the market place.
A data migration project is usually a significant and risky undertaking and is not well suited to experimenting with new, unproven approaches. The safest approach is to use a proven data migration methodology and customise it for the specific needs of a given project. Figure 3 shows a high level approach to data migrations. This would be customised to meet the particular needs of an organisation.
Figure 3: Generic Data Migration Methodology
Figure 4 shows a high level flow of deliverables between stages, and again this would be customised for each project. For example, if the source system stored addresses in a single unstructured field then the decomposing and cleansing of address data may occur during the ‘convert data into production’ stage rather than ‘plan and conduct data cleansing’.
Figure 4: Generic Data Migration Deliverables
Data migration is difficult but unavoidable, so start work now on understanding your legacy data assets. Use the knowledge gained about the data assets to develop defendable time and cost estimates to attract management attention, resources and commitment. Ensure the business is fully engaged and get the best possible leverage from scarce internal SMEs. And finally, data migration risks can be reduced by using proven tools, methods and specialist expertise.
[1] Data Migration in the Global 2000, Bloor Research (September 2007).
[2] Business-centric Data Migration, Philip Howard, Bloor Research (January 2009).