Because patches and work-overs can keep smaller databases running, most organizations never realize how seriously their database has been degraded until they attempt a major integration. In fact, over 80% of major integrations fail or run over budget. There are five danger signs of potential data integration disaster, any one of which can derail even the most well-planned project.
The way to accurately identify the potential problems within a database — and prevent integration failure — is through data profiling. There are currently two basic data profiling methodologies: manual and automated. Because automated data profiling is far faster and more accurate than manual approaches, it saves substantial time and money — and reduces the risk of failure — when used with data integration projects. Automated data profiling can also be used on an ongoing basis to ensure continued maximum data integrity.
The Five Danger Signs of Data Integration Disaster
A database is like a growing, evolving organism. Information is added, deleted and updated … structures, relationships and fields are altered to meet changing business needs … and databases designed for one application are shared with others.
In an ideal world, all of these updates and adjustments would be documented and available for reference when a problem or question arises. But we all know that this usually isn’t the case, for a variety of reasons.
Unfortunately, problems that are hidden or patched over in daily operations suddenly become major stumbling blocks during an enterprise data integration. Here are the five danger signs that often precede serious data integration problems.
- The people responsible for setting up and maintaining the application have moved on
- The application was outsourced, and the vendor no longer exists, or has merged with another company with new priorities
- An emergency update was made under extreme time pressures, and proper documentation procedures were not followed
- The documentation regarding changes was misfiled or lost
- Proper data entry standards, policies and accountability were not in place, permitting users to creatively use fields for unintended purposes
Most databases are in high demand. When a problem develops, the organization can’t afford for the database to be down for long. Often, if the cause cannot be identified because knowledge or documentation of the underlying structure is unavailable, a patch or work-over is applied, further degrading the quality of the database.
The database may still operate at a reasonable level and meet the needs of the people using it — but the underlying errors and problems still exist. Over time, with continuing alterations to its structure and content, the integrity of the database can reach such a level that it becomes unreliable.
The Breaking Point — Data Integration Failure
Patches and work-overs can keep smaller databases running, but most organizations never realize how seriously the integrity of their database has been eroded until they attempt a major integration. That's when the accumulated anomalies combine to derail the project.
With all the problems and unknowns, the result is often disastrous. In fact, over 80% of all major data integrations run over schedule or budget – or fail altogether. The main reason is lack of knowledge about the source data.
Without that knowledge, the mapping specifications for the integration must be based on assumptions, outdated information and educated guesses. No one knows if the mapping specifications are correct until the testing phase. If the test fails, the data analysts have to go back to the beginning, revise their assumptions, develop new specifications, and test again.
In major data integration projects, this trial-and-error process may take up to 10 iterations before the integration is successful. The added cost can easily run into hundreds of thousands of dollars – and seriously impact the ROI of the merger, acquisition or other initiative that originally prompted the integration.
However, the potential risk in terms of cost overruns is small compared to the potential effect of months of implementation delays. If the integration is related to a mission-critical CRM (customer relationship management) initiative, the window of opportunity to achieve a competitive advantage is likely to be lost. Today, organizations simply don’t have the luxury of making a data integration mistake. They have to succeed the first time – or be left far behind.
Yet, many organizations going through a major data integration project for the first time often grossly underestimate potential obstacles. The result is a serious risk of project failure.
How to Determine Your Data’s True Quality
The only reliable way to determine the true nature of your data is through a data audit, involving thorough data analysis or profiling. There are two methods of data profiling: manual and automated.
Manual Data Profiling
In manual data profiling, business or professional-services analysts create data assessments that use existing documentation and assumptions about the data to develop an educated guess as to where data problems are likely to be encountered. Since they cannot realistically evaluate 100 percent of the available data, they instead take a sample of the data and analyze it to determine whether their suspicions were correct.
If the analysis confirms their judgment, they write programs to correct the problems they identified. At this point, the analysts revisit the source data, find out if their corrections worked, and then try to determine if other problems may exist. They then analyze another sample, and test again.
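The sample-and-check cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's method; the record set, the ZIP-code check, and the `sample_and_check` helper are all hypothetical. It also illustrates the core weakness of the manual approach: a small sample can miss rare defects entirely.

```python
import random

def sample_and_check(records, check, sample_size=100, seed=0):
    """Manual-style audit: inspect only a random sample of records and
    report the share of sampled records that fail a suspected-problem check."""
    rng = random.Random(seed)  # fixed seed so the audit is repeatable
    sample = rng.sample(records, min(sample_size, len(records)))
    failures = [r for r in sample if not check(r)]
    return len(failures) / len(sample)

# Hypothetical suspicion: some ZIP codes were left blank during data entry.
records = [{"zip": "12345"}] * 90 + [{"zip": ""}] * 10
failure_rate = sample_and_check(records, lambda r: bool(r["zip"]), sample_size=20)
```

Depending on which 20 records the sample happens to draw, the measured failure rate may over- or understate the true 10% defect rate, which is exactly why analysts must iterate.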
This can be a very costly, time-consuming process, requiring a number of iterations and draining IT resources. Of greater concern is the fact that there is no guarantee that all problems have been identified and corrected. As the saying goes: You don’t know what you don’t know. Undetected data problems can crash a major migration or integration initiative.
Automated Data Profiling
The more efficient and accurate way of examining data is to use an automated data profiling tool that is able to examine all the data, not just a sample. This kind of tool can identify virtually all problems resident in the data, providing a complete picture of its structure, relationships and quality, including metadata descriptions, data values, formats, frequencies, patterns, and ranges. It is also usually much faster than manual methodologies.
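To make the idea concrete, here is a minimal sketch of what profiling a single column across all of its values (rather than a sample) might look like. The `profile_column` and `mask_pattern` helpers and the phone-number data are illustrative assumptions, not the interface of any particular profiling product; real tools compute these same statistics (nulls, frequencies, patterns, ranges) at scale.

```python
from collections import Counter
import re

def mask_pattern(value: str) -> str:
    """Reduce a value to its shape: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

def profile_column(values):
    """Profile every value in one column: null count, distinct values,
    value frequencies, format patterns, and min/max range."""
    non_null = [v for v in values if v not in (None, "")]
    freqs = Counter(str(v) for v in non_null)
    patterns = Counter(mask_pattern(str(v)) for v in non_null)
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(freqs),
        "top_values": freqs.most_common(3),
        "patterns": patterns.most_common(3),
        "min": min(non_null, key=str) if non_null else None,
        "max": max(non_null, key=str) if non_null else None,
    }

# Hypothetical phone-number column with mixed formats and a missing value
phones = ["555-1234", "555-9876", "5551234", None, "555-1234"]
report = profile_column(phones)
```

Even on this tiny column, the pattern counts immediately expose that one value does not follow the dominant `999-9999` format, the kind of anomaly a sampled manual review can easily miss.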
Robust automated data profiling solutions offer a number of capabilities that make viewing the profiling results and correcting data errors and anomalies easier and more efficient. Among these is the ability to view analysis information, examples of the data, and analysts’ notes about the data, all on a single screen.
When a data problem is identified, “drill-down” capability enables users to actually examine the data elements in question, not just review statistics about the data. For example, if the analysis reveals that there is a format conflict in a particular data field, the user can look directly at that field and see exactly what the problem is. This makes correcting any problems much faster and easier, saving both time and money.
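The drill-down step can be pictured as moving from the statistic ("some values don't match the expected format") to the offending records themselves. The sketch below assumes a hypothetical employee table and an expected `YYYY-MM-DD` date format; both are illustrations, not features of any specific tool.

```python
import re

# Assumed expected format for the hire_date field: YYYY-MM-DD
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")

rows = [
    {"id": 1, "hire_date": "2021-03-15"},
    {"id": 2, "hire_date": "03/15/2021"},  # format conflict
    {"id": 3, "hire_date": "2020-11-02"},
]

# "Drill down": pull the actual rows that violate the expected format,
# rather than just counting how many violations exist.
conflicts = [r for r in rows if not DATE_PATTERN.match(r["hire_date"])]
```

Seeing the actual value `03/15/2021` tells the analyst exactly what the fix is (a US-style date slipped into an ISO-format field), which a summary statistic alone would not.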
Preventing Data Integration Breakdown
Due to its efficiency and ease of use, an automated data profiling system can also be used for ongoing data quality audits to ensure continued maximum data integrity. Used this way, it reinforces the value of your enterprise data as an asset and helps prevent data quality surprises when a data integration initiative does arise.
If you have not audited your data quality on a regular basis, automated data profiling prior to a major data integration or migration initiative is critical. It can save hundreds of thousands of dollars, and weeks — if not months — of delay.
Increased sharing of data across the enterprise is creating a serious dilemma. The more ways the data is used, the more valuable it becomes – but also the more susceptible it is to degradation. Entering any major data integration initiative without confirming the database’s integrity puts the project at substantial risk of failure. The most precise, cost-efficient way to determine a database’s quality, structure, and relationships is with an automated data profiling system. This type of system identifies all the errors, problems and anomalies in the database with exceptionally high levels of accuracy. It also enables analysts to drill down to view the actual data strings or fields in question. Being able to work directly with the data itself saves significant time, money and IT resources in making the necessary repairs.