Understanding Legacy-to-Package Data Movement

Published in TDAN.com April 2001


The shift from custom development to purchased systems has led some to believe that thorough requirements analysis is somehow less relevant. Consider that the set of required data elements, the set
of legacy data elements, and the set of elements a package can store are overlapping, yet distinct. A three-set Venn diagram can be used to classify data elements and help ease the
pain of integration. Each region in the diagram has relevance in the implementation and migration process. An understanding of possible impact for the elements in each region may ease the
uncertainty of the integration effort. The ability to do this analysis depends on package-independent data requirements.

The Challenge

Many organizations are moving toward commercial off-the-shelf (COTS) packages to replace some or all of their legacy applications. This can be an excellent idea, especially in non-strategic areas
of the enterprise. But even if selection is done correctly (requirements are done before selection and the selection is based on those requirements), migrating the legacy data and interfacing the
package into the systems environment can be a daunting challenge.

Challenge Analysis

In practice, the organization migrating from legacy to COTS is dealing with three distinct sets of data elements: those in the legacy application (set L), those actually required (set R), and those
offered as part of the COTS data structure (set C). Each set of data elements overlaps with both, one, or none of the others.

While there is significant overlap between these three sets, they are not perfectly coincident. For instance, some legacy data fields may be unused, or may be used simply to store
information for which they were not designed. In addition, the generic COTS package is rarely a perfect fit for either the actual requirements or the legacy data. Since there is overlap, but not perfect
coincidence, the three sets of data elements may be represented as three interlocking circles in a Venn diagram (See Figure 1).


Figure 1: Venn Diagram of Three Interlocking Sets

In Figure 1, the reader will note that the three interlocking circles form seven distinct regions. For convenience, they will be referred to as regions 1-7. Note that the circles are drawn such
that the center region, region 7, is a relatively large portion of each set of data elements. This is what every organization hopes mirrors their own reality; ideally this region for a given
implementation will be even larger. Also note that the diagram shows all three data sets overlapping equally. This will rarely be the case.


Figure 2: Three Set Diagram, Numbered and Shaded Regions

For ease of reference, Figure 2 presents the Venn diagram with each region numbered and shaded with a different color. Each region presents a unique set of challenges for those charged with mapping
from an existing structure to a target structure.

A Data Element Classification Scheme

If the integration team uses a classification scheme for identified data elements, the data mapping results can highlight areas where integration problems might occur. The following paragraphs present
a scheme based on the previous analysis.

Region 1 represents all data elements in the existing legacy system that neither support actual data requirements nor can be mapped into the new structure. These are fields in existing segments,
files or tables that are either currently unused or used for some purpose other than originally intended. The latter situation makes data archaeology more challenging.

Region 2 represents the data elements that are currently in use and are necessary, but the target data structure has no place for them. This region contains the data elements that end
users will recognize as missing during or soon after implementation, unless high quality requirements were done before selection, in which case the fact that the needed data element is missing
should have already been identified. It is possible to work around missing functionality, but customization makes keeping up with vended packages difficult and costly.

Region 3 represents those data requirements that are supported neither in the legacy application nor in the COTS package. This region’s data elements translate to requirements missed,
which are likely to trickle in as requests during the first few months after a package is put into production. This probably will result in additional cost to assess each request, to perform impact
analysis, to design and implement a solution if customization is possible. Alternatively, the organization could learn to live without the functionality, but this will result in significant
opportunity costs. A final possibility is to record all missed requirements and submit them to the package vendor in the hope that they will be incorporated in a future release.

Region 4 is where functionality is needed and supported by the COTS package but no legacy data exists for conversion or integration. Some might attempt to reconstruct or synthesize data,
at no small expense. However, most organizations will simply draw a line in the sand and go on. But end users performing analytical processing will continuously wish that this required data had been
captured sooner, at least until the useful time horizon passes.

Region 5 is the subset of columns or fields in the new solution that are neither useful to the business nor able to be sourced in the legacy system. It may be important to somehow disable
the user entry facilities for these data elements to avoid the risk of misused fields. Even so, this region is mostly benign because no mapping and little integration are necessary here.

Region 6 potentially represents a resource-wasting opportunity. The legacy system contains the data elements and the COTS database has a place to put them. But how much time and how many
resources is the organization willing to expend on an effort that has no return? Typically, this region is small, but organizations migrating data should be aware of the possibility of wasted
effort.

Region 7 represents the ideal area. The legacy data is available, it is useful and required and it is supported by the COTS application. This region is the one that unwary managers
envision to be the totality of what is currently supported, required and will be supported in the future. It would be difficult to ever find an example where this is the case. The integration team
should map the legacy to target, and check off the required data elements as these are accommodated.
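
The seven-region scheme above can be sketched with ordinary set membership. The following is a minimal illustration, not a prescribed tool: the element names are hypothetical, and it assumes that legacy, required and package elements have already been reconciled to a common, package-independent vocabulary, which is where most of the real analysis effort lies.

```python
# Region numbers follow the descriptions in this article.
# Keys are (in legacy set L, in required set R, in COTS set C).
REGION_BY_MEMBERSHIP = {
    (True, False, False): 1,   # legacy only: unused or misused fields
    (True, True, False): 2,    # in use and required, no place in package
    (False, True, False): 3,   # required, supported nowhere
    (False, True, True): 4,    # required and in package, no legacy data
    (False, False, True): 5,   # package only: not needed, not sourced
    (True, False, True): 6,    # can be migrated, but not required
    (True, True, True): 7,     # available, required and supported
}


def classify(element, legacy, required, cots):
    """Return the region number (1-7) for a single data element."""
    membership = (element in legacy, element in required, element in cots)
    if membership not in REGION_BY_MEMBERSHIP:
        raise ValueError(f"{element!r} is not in any of the three sets")
    return REGION_BY_MEMBERSHIP[membership]


def classify_all(legacy, required, cots):
    """Group every known element by region number."""
    regions = {number: set() for number in range(1, 8)}
    for element in legacy | required | cots:
        regions[classify(element, legacy, required, cots)].add(element)
    return regions


# Hypothetical element names, for illustration only.
legacy = {"cust_id", "cust_name", "fax_no", "filler_07", "credit_limit"}
required = {"cust_id", "cust_name", "credit_limit", "email", "loyalty_tier"}
cots = {"cust_id", "cust_name", "email", "fax_no", "web_url"}

regions = classify_all(legacy, required, cots)
for number in sorted(regions):
    print(number, sorted(regions[number]))
```

Elements landing in regions 2 and 3 are the ones that call for a remediation decision, while region 6 flags conversion work that may not be worth doing.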


For the organization seeking to migrate from a legacy system to a purchased package, it is important to complete a thorough and accurate requirements assessment. These requirements can be used as
the basis to assess risk and potential cost of integration and data conversion. By comparing this with the legacy data structure, an assessment of fit to the current environment may be done,
allowing the organization to correctly prioritize need.

Furthermore, these requirements can serve as an objective basis for package selection. Fit to requirements should be very heavily weighted in the selection decision. The difference between what is
needed and what is offered can now be objectively assessed.
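
One simple way to make "fit to requirements" concrete is the share of required data elements that a candidate package can store. The sketch below is an assumed metric for illustration, not a formula from this article; the package names and element names are hypothetical, and a real selection would likely weight elements by business importance.

```python
def requirements_fit(required, cots):
    """Return (fraction of required elements covered, set of uncovered elements)."""
    if not required:
        raise ValueError("the required set must not be empty")
    covered = required & cots
    return len(covered) / len(required), required - cots


# Hypothetical requirements and candidate packages.
required = {"cust_id", "cust_name", "credit_limit", "email", "loyalty_tier"}
candidates = {
    "package_a": {"cust_id", "cust_name", "email", "fax_no", "web_url"},
    "package_b": {"cust_id", "cust_name", "credit_limit", "email"},
}

for name, elements in sorted(candidates.items()):
    fit, missing = requirements_fit(required, elements)
    print(f"{name}: fit={fit:.0%}, missing={sorted(missing)}")
```

The uncovered set is the difference between what is needed and what is offered, which is exactly the list that feeds remediation planning.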

Integration plans should include remediation plans for each requirement not present in the offered solution. Each requirement may be addressed by customization, by a request to the vendor, or by accepting
the package as-is. For each requirement not present, the opportunity cost of its omission should be estimated in terms of time lost per transaction or some other metric, to assist in
determining how to deal with the lack of functionality.
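
As a rough illustration of the time-lost-per-transaction metric, the sketch below converts lost seconds into an annual cost and compares it with a customization estimate. Every figure, field name and the three-year horizon are assumptions made up for the example; the decision rule is deliberately simplistic and only shows how such an estimate might inform the choice.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    """A required data element missing from the package (hypothetical fields)."""
    name: str
    seconds_lost_per_txn: float
    txns_per_year: int
    customization_cost: float


def annual_opportunity_cost(gap, hourly_rate):
    """Cost per year of working around the gap, from time lost per transaction."""
    hours_lost = gap.seconds_lost_per_txn * gap.txns_per_year / 3600
    return hours_lost * hourly_rate


def suggest_remediation(gap, hourly_rate, horizon_years=3):
    """Suggest customization only when the workaround costs more over the horizon."""
    workaround_cost = annual_opportunity_cost(gap, hourly_rate) * horizon_years
    if workaround_cost > gap.customization_cost:
        return "customize"
    return "accept as-is or request from vendor"


# Hypothetical gaps and labor rate.
gaps = [
    Gap("credit_limit", seconds_lost_per_txn=30, txns_per_year=120_000,
        customization_cost=20_000),
    Gap("loyalty_tier", seconds_lost_per_txn=5, txns_per_year=10_000,
        customization_cost=15_000),
]

for gap in gaps:
    print(gap.name, annual_opportunity_cost(gap, hourly_rate=40),
          suggest_remediation(gap, hourly_rate=40))
```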

In an organization that is planning a mass migration of three or more COTS integrations within a short time frame, doing solution-independent requirements analysis becomes even more critical. An
enterprise-level requirements analysis is likely justified. Prioritization might be influenced by the precedence of functions.



About Todd Owens

Todd Owens has over fifteen years of experience as an industrial engineer, programmer, systems analyst, data administrator, methodologist, college instructor and consultant. His most recent endeavors have been in IS consulting for an aircraft manufacturer in North Texas. Todd M. Owens Independent Consultant (972) 410-1040