Having been involved in several large-scale data migration projects, my company has come across a number of recurring themes that have the potential to derail not only the data migration itself, but the entire parent program. In this piece, I provide an overview of the top five mistakes that should be avoided at all costs.
Underestimating the Complexity, Duration, and Associated Costs of a Data Migration
Data migration is typically required because of a broader systems implementation that is being planned, and all too often we have heard stakeholders say that the data migration is the ‘easy part’. Thus, it gets relegated to a line item near the bottom of the project plan. BIG mistake. Regardless of the data volumes in question, data migration is a complex endeavor that requires dedicated, focused, and structured attention starting early in the program. Some of this complexity comes from the fact that there are so many questions about the new system’s functionality, its data structures, and the transition from old to new that cannot be answered easily, or at all. Many of these questions remain open late into the program.
This situation demands a well-defined migration methodology that helps to provide a common reference point for all involved, as well as a disciplined approach that is still agile enough to allow for the inevitable and frequent changes in direction. It also demands the early engagement of experienced data migration professionals, who have an appreciation for what’s to come, and can help to confidently navigate the common pitfalls and frustrations inherent to such a project. A good yardstick is to budget at least 15% to 20% of the total services costs of the parent project to the data migration. Yes, it’s that big!
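To make the yardstick concrete, here is a small worked calculation in Python. The function name and the 4,000,000 services figure are purely illustrative assumptions; only the 15% to 20% range comes from the rule of thumb above.

```python
def migration_budget_range(total_services_cost, low=0.15, high=0.20):
    """Return the (low, high) budget implied by the 15%-20% yardstick."""
    return total_services_cost * low, total_services_cost * high


# Hypothetical parent program with 4,000,000 in total services costs.
low_estimate, high_estimate = migration_budget_range(4_000_000)
print(f"Plan for at least {low_estimate:,.0f} to {high_estimate:,.0f} for data migration")
```

Treat the result as a floor for planning conversations rather than a precise estimate.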
Not Involving Business Early Enough
The data in an organization belongs to the business, and they must be the ones making the business decisions about it. Coupled with this, in a data migration project there is always an expectation that “the new system” will make everyone’s lives easier. But this very likely involves new or improved processes that will in turn be reliant on a well-defined and aligned data representation in the target system. Business must provide the vision and together with subject matter experts they need to guide the implementation team toward a sound target model that will support the desired business outcomes.
This model, however, is typically very different from that found in the legacy systems, and often relies on data elements that have never been considered or even captured! Together with the inevitable data quality problems in the source (discussed under the next mistake, Not Adequately Addressing Data Quality), this leads to a situation where focused and timely input and decisions are needed from business stakeholders.
We have found that the early (in the planning phase!) establishment of a formal platform for interaction with business is often the best way to ensure this. Such a forum (we like to call it the “Data Migration Working Group”) needs to meet regularly, at least weekly, with the data migration leads to be apprised of the data landscape, data related risks and migration project progress, and to help make the required business decisions and provide overall data related guidance.
Not Adequately Addressing Data Quality
There is often a lot of hype and a lot of expectation generated in the process of selling the ‘new system’, both from external and internal parties, as the motivations and business cases run their merry courses. Consequently, prior to every new system implementation, we hear common expectations expressed such as ‘the data quality will be better’ and ‘we will have a 360-degree view of our customer’.
Well, this does not just happen and the new technology never magically ‘sorts the data out’: left to its own devices the data in the target will be as good, as bad, and as fragmented as it is in the source! Complicating this situation is the fact that the System Integrator (SI) selected for the implementation will, even though they may take on the actual data migration, explicitly exclude the resolution of data quality problems from their project charter. They will leave it up to you, the client, to sort out (as if you don’t already have enough to do!).
What we have seen work best is to outsource key aspects of the data migration and cleansing to professionals, and to contract directly with them rather than through the SI, so that they are working directly with you, the client, to address what are generally very complex issues. This also means profiling the data early and often, and putting in place systems, technology and processes that regularly validate data integrity and identify two kinds of data quality problem: those that will cause the new system to fall short of the expected objectives, and those that will likely cause the data load into the target to simply fail.
This requires careful articulation, management and alignment of business and data rules (covered under the fifth mistake, No Processes for Managing Dynamic Business & Data Rules) across all activity. This includes, not least, extraction, exclusion, validation, cleansing, mapping and target front-end validation rules. These rules also need to be made visible to the business regularly via the Data Migration Working Group (see the earlier mistake, Not Involving Business Early Enough) and then decisively dealt with through each iteration as the target evolves.
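One way to keep such rules visible is to hold them as data rather than burying them in load scripts. The sketch below is a minimal illustration, not a prescribed design: the `Rule` structure, the rule identifiers, the field names and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    rule_id: str                      # hypothetical identifier scheme
    category: str                     # e.g. "exclusion", "validation", "cleansing"
    description: str                  # wording the business has signed off
    passes: Callable[[Record], bool]  # True when the record satisfies the rule


RULES: List[Rule] = [
    # For an exclusion rule, "passes" means the record stays in scope.
    Rule("EXC-001", "exclusion", "Skip accounts closed before 2015",
         lambda r: r.get("closed_year", "") == "" or int(r["closed_year"]) >= 2015),
    Rule("VAL-001", "validation", "Customer must have a non-empty surname",
         lambda r: bool(r.get("surname", "").strip())),
    Rule("VAL-002", "validation", "Email, if present, must contain '@'",
         lambda r: not r.get("email") or "@" in r["email"]),
]


def evaluate(records: List[Record], rules: List[Rule]) -> Dict[str, List[int]]:
    """Return {rule_id: [indexes of records that fail the rule]}."""
    failures: Dict[str, List[int]] = {rule.rule_id: [] for rule in rules}
    for index, record in enumerate(records):
        for rule in rules:
            if not rule.passes(record):
                failures[rule.rule_id].append(index)
    return failures


records = [
    {"surname": "Nkosi", "email": "t.nkosi@example.com", "closed_year": ""},
    {"surname": " ", "email": "no-at-sign", "closed_year": "2010"},
]
report = evaluate(records, RULES)
for rule in RULES:
    print(rule.rule_id, rule.category, len(report[rule.rule_id]), "failing record(s)")
```

A per-rule failure count like this is the kind of summary that can be tabled at each working group meeting, so that decisions are made against facts rather than impressions.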
Another problem that regularly crops up is that most organizations expect to deal with poor data quality without specialized data quality tools. Whilst manual data quality resolution is almost always a part of a data migration, its extent can be minimized by following a programmatic cleansing approach where possible. Especially with high data volumes or complex data problems (or typically a combination of both!), it will be impossible to deal with data quality issues adequately, predictably, and consistently using home-grown SQL/Excel/Access type solutions. And often the objective to build a ‘single view’ for the new system will require sophisticated match/merge algorithms that are generally only found in specialized data quality tools.
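To give a feel for what matching involves, here is a deliberately naive sketch using the `difflib` module from the Python standard library. It is an assumption-laden toy (the 0.85 threshold, the greedy grouping and the sample names are ours), and it is exactly the sort of home-grown approach that stops scaling once volumes and edge cases grow.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lower-case and replace punctuation with spaces so trivial differences do not block a match."""
    kept = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return " ".join(kept.split())


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 on the normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_candidates(names, threshold=0.85):
    """Greedy grouping: a name joins the first group whose first member it resembles."""
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical customer names as they might appear across legacy systems.
customers = [
    "Acme Trading (Pty) Ltd",
    "ACME Trading Pty Ltd.",
    "Zenith Logistics",
    "Acme Tradng Pty Ltd",
]
groups = group_candidates(customers)
for group in groups:
    print(group)
```

Even this small example hides hard decisions: which record survives a merge, how conflicting attributes are resolved, and how false matches are reviewed. Those are the areas where specialized tools earn their keep.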
The bottom line is that data migration is not simply a ‘source to target mapping’ exercise and is not just about Extract, Transform and Load (ETL), but about Extract, Cleanse, Transform and Load (ECTL). Ultimately, a data migration sub-project needs to take a holistic approach that takes into account the organization’s high expectations of ‘better data’ in the new application. Finally, don’t forget to prevent the data quality problems from happening all over again in the new system. Ensure that preventative measures matching the corrective ones taken during cleansing effectively protect the new database(s) from re-contamination.
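The difference between ETL and ECTL can be sketched as an explicit cleanse stage sitting between extract and transform. Everything in this Python example is hypothetical (the field names, the cleansing choices and the in-memory "source" and "target"); it only illustrates where cleansing sits in the flow.

```python
def extract(source_rows):
    """Extract: copy rows out of the (here in-memory) source."""
    return [dict(row) for row in source_rows]


def cleanse(rows):
    """Cleanse: trim whitespace, standardize case, and drop exact duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        fixed = {
            "name": row["name"].strip().title(),
            "country": row["country"].strip().upper(),
        }
        key = (fixed["name"], fixed["country"])
        if key not in seen:
            seen.add(key)
            cleaned.append(fixed)
    return cleaned


def transform(rows):
    """Transform: map cleansed source fields to a hypothetical target model."""
    return [{"full_name": r["name"], "country_code": r["country"]} for r in rows]


def load(rows, target):
    """Load: append rows to the target and return how many were loaded."""
    target.extend(rows)
    return len(rows)


source = [
    {"name": " jane doe ", "country": "za"},
    {"name": "Jane Doe", "country": "ZA "},
    {"name": "li wei", "country": "cn"},
]
target = []
loaded = load(transform(cleanse(extract(source))), target)
print(loaded, "rows loaded:", target)
```

Without the cleanse stage, both variants of the same customer would land in the target, which is precisely how the new system ends up as fragmented as the old one.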
Delays Because the Target is Undefined
We often hear from the project stakeholders that “it is too early for the data migration team to start as we have not yet defined the target.” What absolute rubbish! The reality is that there is so much data related work to do that the earlier you start the better. For starters, there is the Data Migration Strategy which needs to be fleshed out. Yes, you will not be able to complete it 100%, but the topics to be addressed are so vast that the mere mention of them will, at the very least, kick-start the many important discussions that need to occur.
A typical Data Migration Strategy needs to cover areas such as Data Migration Architecture, Approach, Forums, Governance, Technology Choices, Extract, Load, Transform, Data Quality, Cleanse Approach, Audit Control, Reconciliation Approach, Cutover Approach, Testing and many more! And so, there is a massive amount of work to be done in fleshing all of these out! Granted, a lot of this will be a work in progress for a while to come and there will be many unknowns in the early phases, but at the very least, you will have identified and prioritized these!
Another recommendation is to profile the source data as early as possible, as this will provide very useful insights into what exactly you will be dealing with from a source data structure and content perspective. We all know that source system documentation is generally sorely lacking, if it exists at all. And so, data profiling is a relatively quick and easy way to establish a foundation of fact, especially regarding the data gremlins lurking in your legacy systems. The earlier you discover these and put plans in place to deal with them, the better. You want a well-informed and thoroughly considered Extract, Cleanse, Transform and Load solution to be built, or else you will suffer from the obvious time and budget setbacks inherent in the typical “Code, Load and Explode” solutions that we often see.
Finally, there is a ton of work to do on business and data rules that need to be considered in the context of the new system. Even if the target is not defined, many rules are based on simple truths that must be catered for regardless. At the very least, make a start on the processes and templates that will be needed to manage rules within what will seem to be an ever-changing landscape (covered under the next mistake). For a Data Migration, there is no such thing as beginning too early. Just start. You will be amazed at how much there is to do!
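Profiling does not need the target to be defined, and a first pass can be very simple. The Python sketch below is a minimal illustration using only the standard library; the sample extract and its column names are invented, and a real profile would also look at patterns, ranges and cross-column relationships.

```python
import csv
import io
from collections import Counter


def profile(rows, columns):
    """Per column: row count, blank count, distinct count and most common values."""
    summary = {}
    for column in columns:
        values = [(row.get(column) or "").strip() for row in rows]
        non_blank = [value for value in values if value]
        summary[column] = {
            "rows": len(values),
            "blank": len(values) - len(non_blank),
            "distinct": len(set(non_blank)),
            "top": Counter(non_blank).most_common(3),
        }
    return summary


# Hypothetical legacy extract; in practice this would be a file or a query result.
SAMPLE = """customer_id,gender,birth_date
1001,M,1980-02-11
1002,male,
1003,F,11/02/1980
1003,F,1975-07-30
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
rows = list(reader)
stats = profile(rows, reader.fieldnames)
for column, column_stats in stats.items():
    print(column, column_stats)
```

Even on four rows, the profile surfaces typical gremlins: a duplicated customer_id, inconsistent gender codes, a missing birth date and mixed date formats. Each of these is a question for the business, and it is far better asked early.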
No Processes for Managing Dynamic Business & Data Rules
One of the most prevalent characteristics of a data migration project is the significant extent of continual change that must be dealt with, often late into the program. This is understandable because, despite best efforts to define and decide as much as possible up front, the business is generally not completely ready in the early stages. Because stakeholders cannot yet fully comprehend and frame the expected target landscape, decisions typically evolve organically as the program proceeds. Another factor, from a data management perspective, is that unknown data quality and modelling issues are discovered only down the road, when the data and its relationships start becoming clearer.
Coupled with the other challenges covered in the earlier listed mistakes, this all comes down to a distinct requirement for well-defined, solid processes to manage continual change within the data migration stream. It is, therefore, worthwhile spending considerable time as early as possible to define the roles, ownerships, processes, artefacts, and technologies that will be required to keep things under control. Particular attention must be given to the artefacts that record and help manage business and data rules across all migration activity, including not least extraction, exclusion, validation, cleansing, mapping and target front-end validation rules.
Formal templates for these artefacts must be crafted, agreed and proven in practice long before they are used in earnest, when (controlled!) chaos becomes the norm down the road! It must be easy not only to keep track of all these changing rules, but also to understand the end-to-end impacts of the changes on the data models overall. Complicating all of this is the fact that work is taking place across various technologies and databases, across data areas such as source, landing and staging, and on multiple platforms (e.g., Dev, QA, Prod, DR), all of which must be kept in sync via a well-controlled release process.
Whilst Excel is frequently the tool of choice to record and manage business and data rules, it must be borne in mind that a high degree of automation should be built in to ensure alignment across multiple artefacts, because it is very challenging to manually keep up with the rate of change and resultant cross-dependency impacts, and of course anything manual is bound to be error prone.
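As a small illustration of what such automation might look like, the Python sketch below cross-checks two artefacts: a target model and a set of mapping rules. The structures, field names and rule identifiers are hypothetical; in practice both would more likely be read from exported spreadsheets than typed in by hand.

```python
def check_alignment(target_model, mapping_rules):
    """Compare two artefacts and return a list of human-readable discrepancies."""
    issues = []
    mapped = {rule["target_field"] for rule in mapping_rules}

    # Every mapping rule must point at a field that exists in the target model.
    for rule in mapping_rules:
        if rule["target_field"] not in target_model:
            issues.append(
                f"{rule['rule_id']}: maps to unknown target field '{rule['target_field']}'"
            )

    # Every mandatory target field must be covered by at least one mapping rule.
    for field, spec in target_model.items():
        if spec["mandatory"] and field not in mapped:
            issues.append(f"target field '{field}' is mandatory but has no mapping rule")

    return issues


# Hypothetical artefacts after a late change to the target model.
target_model = {
    "customer_name": {"mandatory": True},
    "tax_number": {"mandatory": True},
    "nickname": {"mandatory": False},
}
mapping_rules = [
    {"rule_id": "MAP-001", "source_field": "CUSTNAME", "target_field": "customer_name"},
    {"rule_id": "MAP-002", "source_field": "FAXNO", "target_field": "fax_number"},
]

issues = check_alignment(target_model, mapping_rules)
for issue in issues:
    print(issue)
```

Run on every change to either artefact, a check like this catches cross-dependency breaks on the day they are introduced rather than during a trial load.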
Finally, it would make sense to call on professionals who have already thought through the requirements, made the mistakes, and have as part of their data migration arsenal a well-defined and proven set of templates to put to confident use as early as possible in the program. The subject of Data Migration is vast, and I hope that this short set of five mistakes has helped you understand where to focus extra effort to ensure the success of the migration and, in turn, of the parent program.