In Part 1 of this article, I explained that known data issues are only the tip of the iceberg. The majority of these issues are still hidden and unknown.
Data issues are one of the most underestimated and misunderstood challenges organizations face today. Most organizations do not realize the extent of their data issues and fail to address their core data problems.
In these articles, I identify nine underestimated or unrecognized data challenges. I covered the first four in Part 1 and will cover the last five below:
5. Redundant-Inconsistent Data
Redundant data is any duplicated data or data captured and stored in multiple systems for different purposes. This practice is either intentional or unintentional. Inevitably, redundant data quickly becomes inconsistent data when the multiple copies of the data end up with different values. Unfortunately, redundant-inconsistent data is the norm in most organizations. Redundant-inconsistent data is a significant but hidden data challenge.
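To make the idea concrete, the following is a minimal Python sketch of how copies of the same record in different systems can be compared to surface conflicting values. The system names ("crm", "billing"), the record key, and the field names are hypothetical and chosen only for illustration; a real comparison would also need to handle differing keys, formats, and field names across systems.

```python
from collections import defaultdict


def find_inconsistent_fields(copies):
    """Return (key, field, {system: value}) for every field whose copies disagree.

    `copies` maps a system name to {record key -> {field name -> value}}.
    Values are assumed to be hashable (for example, strings or numbers).
    """
    seen = defaultdict(dict)  # (key, field) -> {system: value}
    for system, records in copies.items():
        for key, fields in records.items():
            for field, value in fields.items():
                seen[(key, field)][system] = value

    conflicts = []
    for (key, field), by_system in sorted(seen.items()):
        # A conflict needs at least two copies holding different values.
        if len(by_system) > 1 and len(set(by_system.values())) > 1:
            conflicts.append((key, field, by_system))
    return conflicts


# Hypothetical copies of one customer record held in two systems.
copies = {
    "crm": {"C001": {"email": "pat@example.com", "zip": "30301"}},
    "billing": {"C001": {"email": "pat@example.com", "zip": "30309"}},
}

conflicts = find_inconsistent_fields(copies)
for key, field, by_system in conflicts:
    print(key, field, by_system)
```

A report like this only shows that the copies disagree. Deciding which value is correct remains a business question, which is where the cost described in this section arises.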
Redundant data is very costly to an organization, creating technical debt. Each redundant data element requires additional software to input, move, and manipulate. Then there are the maintenance costs to support redundant data elements. Multiply this by the number of redundant data systems. Then add in the human capital, systems, storage, and network costs. The cost to an organization is significant. However, the even greater cost is the intangible cost of the poor-quality data inherent to redundant-inconsistent data. Consider, for example, the effort and risk involved in guessing which version to use when making a critical business decision.
An added complication of redundant-inconsistent data is that departments rarely resolve discrepancies in the same way. Each department uses a different method and thus has a different answer for the same question. Inconsistent data issues are elusive due to an application-centric focus where departments are comfortable with their application data silos and their versions of data. This situation is especially challenging for analytics.
Redundant-inconsistent data is one of the biggest culprits of failed Analytics and Big Data. Analytics requires data across departmental boundaries. The nature of analytics is like peeling an onion: as soon as you remove a layer or answer a question, there is yet another. One answer leads to another question, and so forth. Analytical questions typically require data details that cross business functional areas (and their data silos). The deeper the question, the more holistic (connected) the data must be to answer it. Successful Analytics requires consistent data across the organization. Addressing redundant-inconsistent data is a challenge for most organizations because it requires an organizational shift from an application-centric to a data-centric or holistic data strategy.
6. Under-Managed and Unmanaged Data
Most organizations are under the false belief that they manage their data because they manage the technology systems that house, move, or process the data. Data technology management and data management are not the same. Unfortunately, very few organizations practice formal data management. From experience, organizations typically operate “data-management” in a reactive mode. They use a tactical approach, addressing data issues as they occur and only when they negatively impact business operations. This scenario boils down to reactive “data management” in production as an emergency band-aid to fix a symptom. This method rarely considers or addresses core data problems. The underlying data problems continue to create data issues, often requiring weekly or even daily “cleaning.”
Reactive data management is costly and takes place across many departments. Each department addresses the same data issue, typically in a different way. The redundant activities waste a considerable amount of time looking for missing data, investigating issues, and guessing at the best band-aid approach. Reactive “data-management” is typically outside the normal job activities of a department. Thus, these activities are rarely accounted for in a budget. Most organizations never realize the actual cost of continually reacting to and chasing their data issues.
Without a centrally coordinated, planned, and proactive effort, organizations rarely address the root cause of their data issues. Continuous project urgency and tight budgets drive a quick-fix mentality (treating the symptoms), rather than performing a root cause analysis to fix the underlying problem. These band-aid fixes often end up causing more data issues than they initially “solved.” This scenario leaves little time, money, and resources to strategically manage data and proactively stay ahead of the issues. Effective data management is proactive and preventative.
Organizations with “formal” data management processes, standards, policies, strategies, and governance often end up practicing data management in theory only. The urgency of timelines and tight budgets cause IT departments to bypass data management standards and processes without realizing it. Projects typically get an “exception” based on their intention of fixing the data someday and doing it right the next time. In reality, someday never arrives, and there is no next time.
These organizations fool themselves into believing that they practice formal data management and then wonder why they continue to have data issues. They question the effectiveness of data management as a good practice rather than examining the effectiveness of their data management execution. It takes time and commitment to get and keep the data right. The irony is that it takes more time and resources to get the data wrong.
7. The Unnecessary Data-Overload
Data-overload is a costly, often unknown data issue facing most organizations today. Organizations are continually building new data structures and data systems to compensate for missing (existing but unknown) data, a unique version of data, non-scalable inflexible systems, and poor-quality data. Additionally, many old data systems are not fully retired when new systems replace old technology. This is often because there is a need for a specific version of data.
These scenarios lead to unnecessary data-overload, where the volume of unused or unusable data structures negatively impacts the usable data and system performance. Unnecessary data-overload also puts an organization at risk for compliance and information protection. Unnecessary data structures may contain sensitive data. The sheer volume of unnecessary data hinders information discovery and security.
The cost of unnecessary data-overload goes unrecognized and unaccounted for, but is significant. The costs include the time, labor, and loss of productivity to support the unneeded data structures. Additional costs include the extra hardware and software to support the unnecessary data, as well as the human capital required to maintain the extra data structures. The even greater costs are the unrecognized intangible costs from poor decisions, lost business opportunity, and customer dissatisfaction resulting from the unnecessary data.
8. Regulatory Compliance Data Challenge
The growing regulatory and corporate governance requirements are exposing a multitude of data challenges. Keeping data secure, private, and compliant requires an accurate understanding of the meaning, location, ownership, governance, source, lineage, and use of the data. Unidentified, misunderstood, inaccurate, and untrusted data affect an organization’s ability to secure and protect its data. This situation puts the organization at significant risk, often unknowingly, due to unknown data issues.
Addressing compliance and securing information is more complicated and costly than it needs to be due to unknown data issues. It is a difficult undertaking for most organizations to identify the sensitive data due to the lack of a data inventory and missing business data context, as well as all the redundant-inconsistent data. Even with identified sensitive data, without proactive data management, securing and protecting that data is extremely challenging due to the organic nature of data as it grows and changes.
Appropriately addressing compliance and securing data requires an accurate accounting of the right data fields and data content, not simply listing the names of data fields on a form during an audit. A field name or definition within a database does not guarantee that the data in that field matches. Often data fields are overloaded. The lack of proactive data management means that a full analysis of the data fields by a qualified data professional is necessary on an ongoing basis to address compliance and information protection appropriately.
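As a small illustration of why a field name alone cannot be trusted, here is a Python sketch that profiles the actual content of a field. The `phone_column` sample values and the three content categories are hypothetical, and the regular expressions are deliberately rough; this is a simplified stand-in for the fuller analysis a data professional would perform, not a substitute for it.

```python
import re
from collections import Counter


def classify(value):
    """Assign a raw value to a coarse, illustrative content category."""
    text = value.strip()
    if not text:
        return "empty"
    # Rough patterns for illustration only; not full validators.
    if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", text):
        return "email"
    if re.fullmatch(r"\+?[\d\s().-]{7,}", text):
        return "phone"
    return "other"


def profile_field(values):
    """Count how many values in a field fall into each category."""
    return Counter(classify(v) for v in values)


# Hypothetical contents of a field that is named "phone".
phone_column = [
    "404-555-0134",
    "pat@example.com",
    "",
    "call after 5pm",
    "(404) 555-0199",
]

profile = profile_field(phone_column)
for category, count in profile.most_common():
    print(category, count)
```

In this sample, only two of the five values in the "phone" field look like phone numbers. The rest are an email address, a free-text note, and a blank, which is the kind of overloading that an audit based on field names would miss.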
9. Acceptance of Data Issues as Normal
One of the biggest hurdles to improving data is the acceptance of poor-quality and inconsistent data as the status quo. Many organizations consider data issues a normal cost of doing business: a little low-quality data is inevitable and harmless. By this reasoning, achieving data quality is not worth the effort because there is no economically compelling reason to change. Inaccurate assumptions regarding the scope of the known data issues, their cost, and their impact are the basis for this thinking.
In reality, the scope of the unknown broken data is significantly broader than most could imagine. The tangible and intangible costs of hidden data issues, including their negative impacts, are substantial and unaccounted for. If the actual scope, costs, and negative impacts of the unknown data issues were understood, the benefits of getting the data right would far outweigh any cost. Organizations would treat data as a valuable business asset where poor quality is unacceptable.
The failure to recognize the data problems and their true scope, cost, and impact, and to account for their actual cost, has severely hindered most organizations from appropriately addressing their data problems. The current band-aid approach to “fixing” the data symptoms is not working. To continue is not an option. It is imperative to change this inadequate approach to data quality and stop chasing the symptoms. Organizations need to look deeper, understand their core data issues and the impact, accurately account for the cost, and appropriately address their core data problems.