The Data Centric Revolution: Integration Debt

Integration Debt is a Form of Technical Debt

As with so many things, we owe the coining of the metaphor “Technical Debt” to Ward Cunningham and the agile community. It is the confluence of several interesting conclusions the community has come to. The first was that being agile means being able to make a simple change to a system in a limited amount of time, and being able to test it easily. That sounds like a goal anyone could get behind, and yet, this is nearly impossible in a legacy environment. Agile proponents know that any well-intentioned agile system is only six months’ worth of entropy away from devolving into that same sad state where small changes take big effort.

One of the tenets of agile is that patterns of code architecture exist that are conducive to making changes. While these patterns are known in general (there is a whole pattern languages movement to keep refining the knowledge and use of these patterns), how they will play out on any given project is emergent. Once you have a starting structure for a system, a given change often perturbs that structure. Usually not a lot. But changes add up, and over time, can greatly impede progress.

One school of thought is to be continually refactoring your code, such that, at all times, it is in its optimal structure to receive new changes. The more pragmatic approach favored by many is that for any given sprint or set of sprints, it is preferable to just accept the fact that the changes are making things architecturally worse; as a result, you set aside a specific sprint every 2-5 sprints to address the accumulated “technical debt” that these un-refactored changes have added to the system. Like financial debt, technical debt accrues compounding interest, and if you let it grow, it gets worse—eventually, exponentially worse, as debt accrues upon debt.

Integration Debt

I’d like to coin a new term: “integration debt.” In some ways it is a type of technical debt, but as we will see here, it is broader, more pervasive, and probably more costly.

Integration debt occurs when we take on a new project that, by its existence, is likely to lead someone at some later point to incur additional work to integrate it with the rest of the enterprise. While technical debt tends to occur within a project or application, integration debt takes place across projects or applications. While technical debt creeps in one change at a time, integration debt tends to come in large leaps.

Here’s how it works: let’s say you’ve been tasked with creating a system to track the effectiveness of direct mail campaigns. It’s pretty simple – you implement these campaigns as some form of project and their results as some form of outcomes. As the system becomes more successful, you add in more information on the total cost of the campaign, perhaps more granular success criteria. Maybe you want to know which prospects and clients were touched by each campaign.

Gradually, it dawns on you that getting this additional information (and especially getting it without incurring more research time and re-entry of data) will require integration with other systems within the firm: the accounting system to get the true costs, the customer service systems to get customer contact information, the marketing systems to get the overlapping target groups, etc. At this point, you recognize that the firm is going to consume a great deal of resources to get a complete data picture. Yet this could have been known and dealt with at project launch time. It could even have been prevented.

Integration Debt You Inherited

This integration debt is easier to see after the fact. In a mature (read “legacy”) environment, most of the cost of implementing a new system comes from integrating it with existing data that is not yet integrated. It is not unusual for a new systems project to incur more work drawing in data from other sources than it does building its own functionality. If you implement an order taking system, you are going to drag customer data in from some customer system(s) and product and price data from other systems. The cost to integrate into what already exists is fairly obvious if you look for it, because it represents real costs that your project is now incurring. This is your project paying the tax that was incurred by those who have gone before you.

The subtler aspect, which most overlook, is assessing how the project adds to the firm’s total integration debt. To be more aware of this, you must ask yourself: am I causing a data set to come into existence, or am I taking an existing data set and maintaining it in such a way that someone will eventually be motivated to bring this data back into alignment with the rest of the firm?

“Enums” Considered Integration Debt

Even the smallest data sets add to integration debt. If your firm creates (or more often, if it obtains by virtue of having purchased a system) a small taxonomy, sooner or later you will need to integrate that with your other taxonomies and the rest of your data. Let’s say your applicant tracking system captures gender (0 = male, 1 = female). Not only are you incurring the future integration cost of figuring out whether “0” maps to “M” in your HR system, but you have the added burden of understanding what you are going to do when you start allowing more than two genders. The point is, every independently managed data set, big or small, adds to integration debt. The whole “reference data” industry is an attempt to deal with this lack of integration at the small data set level. (Show of hands: how many systems at your firm have, independently, a list of valid country codes?)
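
To make the cost concrete, here is a minimal Python sketch of the crosswalk that someone eventually has to build and maintain. Only the applicant tracking codes (0 = male, 1 = female) come from the example above; the HR code list, including its "X" entry, and all function names are assumptions made for illustration.

```python
# Applicant tracking codes, as given in the example above.
ATS_GENDER = {0: "male", 1: "female"}

# Hypothetical HR code list; the "X" entry is an assumption for illustration.
HR_GENDER = {"M": "male", "F": "female", "X": "nonbinary"}


def build_crosswalk(source, target):
    """Map each source code to the target code that carries the same meaning.

    A source code with no counterpart in the target maps to None.
    """
    by_meaning = {meaning: code for code, meaning in target.items()}
    return {code: by_meaning.get(meaning) for code, meaning in source.items()}


def unmapped(source, target):
    """Return the target codes that the source list cannot express."""
    source_meanings = set(source.values())
    return sorted(
        code for code, meaning in target.items() if meaning not in source_meanings
    )


crosswalk = build_crosswalk(ATS_GENDER, HR_GENDER)
gaps = unmapped(ATS_GENDER, HR_GENDER)

print(crosswalk)  # {0: 'M', 1: 'F'}
print(gaps)       # ['X']
```

Every pair of independently managed code lists needs a table like this, and the gap list shows where one system already cannot represent what the other can.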

Rogue Systems

Rogue systems (those Access and Excel-based systems that spring up outside the control of IT) are another breeding ground for integration debt. Rogue systems come in two varieties: those that get a feed from a corporate system and those that don’t. Those that get a feed typically add additional fields (categories and analytic values, usually), as this was why they needed to build a rogue system in the first place. These add-on fields, if successful, eventually become reluctant enterprise “assets” and sooner or later need to be re-incorporated. The rogue systems that were not based on corporate data have the same problem, but all of their data constitutes integration debt.

Integration Debt as a Service

When we use Software as a Service (SaaS), we are almost always incurring integration debt. In this case, we often incur the debt on an employee-by-employee level. We have arrived at the place where any employee with a credit card can implement a system. Most of these systems must get populated from somewhere, and that somewhere is mostly data entry on the part of the employee. This is why it flies under the radar so well.

But each record added into the SaaS is another bit of integration debt. The CRM systems are building up vast troves of prospect and customer data (sales person activity and the like) that will eventually be integrated, at great cost to the enterprise.

The Application Data Model

The single biggest contributor to integration debt is the application data model.

When a project launches, one of the first things to do is design the application data model. In most cases it is designed strictly from the point of view and the requirements of the immediate problem to be solved. This all but guarantees that the model creates data sets that are not integrated with the rest of the firm. In the areas where the new application happens to refer to data sets that are already covered by other systems in the firm, the new system will inevitably redefine them (e.g., “We needed to maintain additional data about the employees, so we made a new table,” or, “This is only about hand tools, which aren’t managed by the inventory system,” though they are procured by the purchasing system, etc.).
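
One way to catch this redefinition at design time is to compare the proposed model against what the firm already maintains. The Python sketch below is a rough illustration, not a real matching technique: the table names, column names, and the `overlap_report` helper are all hypothetical, and it matches columns by exact name, which real schemas rarely allow.

```python
# Hypothetical schemas: table name -> set of column names.
HR_SYSTEM = {
    "employee": {"employee_id", "name", "hire_date", "department"},
}
NEW_APP = {
    "staff_member": {"employee_id", "name", "certification", "badge_color"},
}


def overlap_report(existing, proposed, min_shared=2):
    """List proposed tables that appear to redefine an existing table.

    Two tables are treated as overlapping when they share at least
    `min_shared` column names (a naive, name-based heuristic).
    """
    report = []
    for p_table, p_cols in proposed.items():
        for e_table, e_cols in existing.items():
            shared = p_cols & e_cols
            if len(shared) >= min_shared:
                report.append(
                    {
                        "proposed": p_table,
                        "existing": e_table,
                        "duplicated": sorted(shared),
                        "new": sorted(p_cols - e_cols),
                    }
                )
    return report


report = overlap_report(HR_SYSTEM, NEW_APP)
for entry in report:
    print(entry)
```

In this made-up case the "duplicated" columns are data the firm will later have to reconcile, while the "new" columns are the additional employee data that motivated the new table in the first place.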

Packaged Software and Integration Debt

Every application software package comes with its own data model. None of them is integrated with your other enterprise applications. At best, a package may be integrated with a few others from the same vendor, but as anyone who has tried to integrate two separate systems will attest, even systems from the same vendor, integration debt is alive and well in all major application software packages.

What Can Be Done?

The best thing to do is to recognize, at the inception of a new project, the integration debt that the project will create. As a mining company does with its environmental reclamation budget, we should recognize before we do the damage what its correction will cost.

This mandate to estimate and budget the damage being created has to come from the leadership of the firm. The project approval process needs to include an identification of and a means for dealing with integration debt.

When project teams have to justify and set aside the cost of dealing with the future integration problems they create, they will recognize that many of these future costs can be mitigated now for a fraction of their future cost. Integration debt is not inevitable. It is possible to construct environments with minimal debt, which leads to an agile enterprise.



About Dave McComb

Dave McComb is President of Semantic Arts, Inc., a Fort Collins, Colorado-based consulting firm specializing in Enterprise Architecture and the application of Semantic Technology to Business Systems. He is the author of Semantics in Business Systems, and program chair for the annual Semantic Technology Conference.

  • Richord1

    Data Poverty

    “Data centric” is another marketing term that has been thrown out there akin to man believing the earth was the center of the universe.

    Designing viable solutions requires a holistic rather than “centric” approach. Systemic designs must encompass technology and human factors including software, data and human values (norms) and behaviors rather than “data centricity”.

    Although the concept of integration debt is amusing, I think most of the data that has been collected over the years, regardless of whether it resides in spreadsheets, Access databases or RDBMS databases, is of poor quality. All systems should be considered “rogue” since most of the designers of these systems were data illiterate and unfortunately remain so today.

    Impoverished metadata, polluted semantics and institutionalized data design mediocrity have left us with data that is ambiguous, meaningless and obfuscated.

    We are faced with data poverty. Data that remains valueless despite imaginative attempts to claim data as an asset. Our “data currency” has little value.

    Preventing data poverty requires that designers and users alike learn to be data literate and to prevent the ongoing data poverty. Creating data with trustworthy metadata, meaningful semantics and reducing ambiguity should be the goal of data professionals.

    We are not suffering with integration debt but with a valueless currency: data.

    • Lonna Hannan

      Interesting perspective and I totally agree. Data poverty is a great term for the data that has built up over time without the discipline of good data management practices, policies, and procedures.

      As a seasoned data professional and advocate of clean, well understood (semantics), unified (reference data management), managed (stewardship), inventoried, and discoverable data assets, I believe it is our responsibility to describe the dangers of both data poverty and data integration debt to our fellow associates. I do not think they are mutually exclusive concepts.

      Motivating people to change behavior is not successful without common understanding.