I’m reading the book Kill it with Fire by Marianne Bellotti. It’s a good book. Plenty of pragmatic advice, both on the architectural side (how to think through whether and when to break up that monolith) and the organizational side (how to get and maintain momentum for what are often long, drawn-out projects). So far in my reading, she seems to advocate incremental improvement over rip and replace, which is sensible, given the terrible track record with rip and replace. Recommended reading for anyone who deals with legacy systems (which is to say anyone who deals with enterprise systems, because almost all are or will be legacy systems).
But there is a better way to modernize legacy systems. Let me spoil the suspense: it’s Data-Centric. We’re calling it Incremental Stealth Legacy Modernization because no one is going to get the green light to take this on directly. This article is for those playing the long game.
Legacy Systems
Legacy is the covering concept for a wide range of activities involving aging enterprise systems. I had the misfortune of working in Enterprise IT just as the term “Legacy” became pejorative. It was the early 1990s, and we were just completing a long-term strategic plan for Johns Manville. We decided to call it the “Legacy Plan,” as we thought those involved with it would leave a legacy to those who came after. The ink had barely dried when “legacy” acquired a negative connotation. (While writing this, I looked it up. Wikipedia seems to think the term had already acquired its negative connotation in the 1980s. Seems to me that if it were in widespread use, someone would have mentioned it before we published that report.)
There are multiple definitions of what makes something a legacy system. Generally, it refers to older technology that is still in place and operating. What tends to keep legacy systems in place are networks of complex dependencies. A simple stand-alone program doesn’t become a legacy system, because when the time comes it can easily be rewritten and replaced. Legacy systems have hundreds or thousands of external dependencies, which often are not documented. Removing, replacing, or even updating them in place runs the risk of violating some of those dependencies. It is the fear of this disruption that keeps most legacy systems in place. And the longer a system stays in place, the more dependencies it accretes.
If these were the only forces affecting legacy systems, they would stay in place forever. The countervailing forces are obsolescence, dis-economy, and risk. While many parts of the enterprise depend on the legacy system, the legacy system itself has dependencies. The system is dependent on operating systems, programming languages, middleware, and computer hardware. Any of these dependencies can become obsolescent and eventually obsolete. Obsolete components are no longer supported and therefore carry a high risk of total failure of the system. The two main dimensions of dis-economy are operations and change. A modern system can typically run at a small fraction of the operating cost of a legacy system, especially when you tally up all the licenses for application systems, operating systems, and middleware, and add in salary costs for operators and administrators. The dis-economy of change is well known. Legacy systems are complex and brittle, which makes change hard. The cost to make even the smallest change to a legacy system is orders of magnitude more than the cost of a similar change to a modern, well-designed system. Legacy systems are often written in obscure languages that developers don’t want to invest their careers in. One of my first legacy modernization projects involved replacing a payroll system written in assembler language with one that was to be written in “ADPAC.” You can be forgiven for thinking it insane to have written a payroll system in assembler, and even more so to replace it with a system written in a language that no one in the 21st century has heard of, but this was a long time ago, and it is indicative of where legacy systems come from.
Legacy Modernization
Sooner or later, the pressure to change overwhelms the inertia to leave things as they are. This usually doesn’t end well, for a number of reasons. Legacy modernization is usually long delayed. There isn’t a compelling need to change, and as a result, for most of the life of a legacy system, resources have been assigned to other projects that promise short-term net positive returns. Upgrading the legacy system is seen as low upside. The new system will do the same thing the old legacy system did, perhaps a bit cheaper or a bit better, but not fundamentally differently. Your old payroll system is paying everyone, and so will a new one.
As a result, the legacy modernization project is delayed as long as possible. When the inevitable precipitating event occurs, the replacement becomes urgent. People are frustrated with the old system and the project to replace it is highly anticipated. Replacing the legacy system with some more modern system seems like a desirable thing to do. Usually this involves replacing an application system with a package, as this is the easiest project to get approved. These projects were called “Rip and Replace” until the success rate of this approach plummeted. It’s remarkable how expensive these projects are and how frequently they fail. Each failure further entrenches the legacy system and raises the stakes for the next project.
As Ms. Bellotti points out in Kill it with Fire, many times the way to go is incremental improvement in place. By skillfully understanding the dependencies and engineering decoupling techniques, such as APIs and intermediary data sets, it is possible to stave off some of the highest-risk aspects of the legacy system. This is preferable to massive modernization projects that fail, but it has its own downsides: major portions of the legacy system continue to persist, and, as she points out, few developers want to sign on to this type of work.
We want to outline a third way.
The Lost Opportunity
After a presentation on Data-Centricity, someone in the audience pointed out that data-warehousing represented a form of Data-Centricity. Yes, in a way it does. With Data Warehousing, and more recently Data Laking, and even more recently Data Lakehousing, yes you have taken a subset of the data from a large number of silos and put it in one place to make reporting easier. This captures a few of the data-centric tenets.
But what a lost opportunity. Think about it: we’ve spent the last 30 years setting up ETL pipelines and gone through several generations of data warehouses (from Kimball/Inmon roll-your-own, to Teradata and Netezza, to Snowflake, and dozens more along the way), but we have not gotten one inch closer to replacing any legacy systems. Indeed, the data warehouse further entrenches the legacy systems by depending on them as its source of data. The industry has easily spent hundreds of billions of dollars, maybe even trillions, over the last several decades on warehouses and their ecosystems, but rather than getting closer to legacy modernization we have gotten further from it.
Why No One Will Take You Seriously
If you propose replacing a legacy system with a Knowledge Graph you will get laughed out of the room. Believe me, I’ve tried. They will point out that the legacy systems are vastly complex (they are), have unknowable numbers of dependent systems (they do). They will point out that the enterprise depends on their continued operation for its very existence (it does) and there are few if any reference sites of firms that have done this (also true). Yet, this is exactly what needs to be done, and at this point is the only real viable approach to legacy modernization.
So, if no one will take you seriously, and therefore no one will fund you for this route to legacy modernization, what are you to do? Go into stealth mode.
Think about it: if you did manage to get funded for a $100 million legacy replacement project, and it failed, what do you have? The company is out $100 million, and your reputation is trashed. If instead you get approval for a $1 million Knowledge Graph-based project that delivers $2 million in value, they will encourage you to keep going. Nobody but you cares what the end game is.
The answer then, is incremental stealth.
Tacking
It’s much like sailing into the wind. You can’t sail directly into the wind. You must tack, and sail as close into the wind as you can, even though you are not headed directly at your target. At some point you will have gone far to the left of the direct line to your target, and you need to tack to starboard (boat speak for “right”). After a long starboard tack, it’s time to tack to port.
In our analogy, taking on legacy modernization directly is sailing directly into the wind. It doesn’t work. Incremental stealth is tacking. Keep in mind though, just incremental improvement without a strategy is like sailing with the wind (downwind): it’s fun and easy, but it takes you further from your goal not closer.
The rest of this article lays out what we think the tacking strategy should be for a firm that wants to take the Data-Centric route to legacy modernization. We have several clients that are on the second and third tack in this series. We are looking for smaller clients that may be able to complete the entire exercise in a reasonable amount of time.
I’m going to use a hypothetical HR/Payroll legacy domain for my examples here, but they apply to any domain.
Leg 1 – ETL to a Graph
The first tack is the simplest. Just extract some data from legacy systems and load it into a Graph Database. You won’t get a lot of resistance to this, as it looks familiar. It looks like yet another data warehouse project. The only trick is getting sponsors to go this route instead of the tried-and-true data warehouse. The key enablers here are to find problems well suited to graph structures, such as those that rely on graph analytics or shortest-path algorithms, or to find data that is hard to integrate in a data warehouse. A classic example is integrating structured data with unstructured data, which is nearly impossible in traditional warehouses and merely tricky in graph environments.
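To make the first tack concrete, here is a minimal sketch of what such an ETL load might look like in Python with rdflib, using a SQLite file as a stand-in for the legacy HR database. The table name, columns, and namespace URIs are all invented for illustration; the real mapping would come from your own ontology.

```python
import sqlite3
from rdflib import Graph, Literal, Namespace, RDF, URIRef
from rdflib.namespace import XSD

# Hypothetical namespace for the enterprise ontology
EX = Namespace("https://example.com/ontology/")

def load_employees(db_path: str) -> Graph:
    """Extract employee rows from a (hypothetical) legacy table and load them as triples."""
    g = Graph()
    g.bind("ex", EX)

    conn = sqlite3.connect(db_path)
    cur = conn.execute(
        "SELECT emp_id, full_name, hire_date, dept_code FROM employee_master"
    )
    for emp_id, full_name, hire_date, dept_code in cur.fetchall():
        emp = URIRef(f"https://example.com/id/employee/{emp_id}")
        g.add((emp, RDF.type, EX.Employee))
        g.add((emp, EX.name, Literal(full_name)))
        g.add((emp, EX.hireDate, Literal(hire_date, datatype=XSD.date)))
        # Link to a department resource rather than repeating the raw legacy code
        g.add((emp, EX.memberOf, URIRef(f"https://example.com/id/department/{dept_code}")))
    conn.close()
    return g

if __name__ == "__main__":
    graph = load_employees("legacy_hr.db")
    graph.serialize("employees.ttl", format="turtle")
```

In practice this extract would run against the actual legacy database (or a flat-file export) and be scheduled as a recurring refresh, since this leg depends on the graph being updated frequently.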
The only difficulty is deciding how long to stay on this tack. As long as each project is adding benefit, it’s tempting to stay on this tack for a long, long time. We recommend staying this course at least until you have a large subset of the data in at least one domain in the graph and it is being refreshed frequently.
Let’s say that after being on this tack for a long while, you have all the key data on all your employees in the graph, and it is being updated frequently.
Leg 2 – Architecture MVP
On the first leg of the journey, no updates are made directly to the graph. Just as in a data warehouse, no one makes updates in place: the warehouse isn’t designed to handle that, and it would mess with everyone’s audit trails.
But a graph database doesn’t have the limitations of a warehouse. It is possible to have ACID transactions directly in the graph, but you need a bit of architecture to do so. The challenge here is creating just enough architecture to get through your next tack. Where you start depends a lot on what you think that next tack will be. You probably need constraint management to make sure your early projects are not loading invalid data back into your graph. Depending on the next tack, you may also need to implement fine-grained security.
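As a hedged illustration of the constraint-management piece, the sketch below uses rdflib and pySHACL to validate an incoming batch of triples before it is committed to the graph. The shape, class, and property names are assumptions carried over from the HR example, not a prescribed design.

```python
from rdflib import Graph
from pyshacl import validate

# Hypothetical SHACL shape: every Employee must have exactly one name and at least one hire date
SHAPES_TTL = """
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <https://example.com/ontology/> .

ex:EmployeeShape a sh:NodeShape ;
    sh:targetClass ex:Employee ;
    sh:property [
        sh:path ex:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
    sh:property [
        sh:path ex:hireDate ;
        sh:datatype xsd:date ;
        sh:minCount 1 ;
    ] .
"""

def check_before_commit(incoming: Graph) -> bool:
    """Reject an update transaction if the incoming triples violate the shapes."""
    shapes = Graph().parse(data=SHAPES_TTL, format="turtle")
    conforms, _report_graph, report_text = validate(incoming, shacl_graph=shapes)
    if not conforms:
        print(report_text)  # surface the violations to the calling use case
    return conforms
```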
Whatever you choose, you will need to build or buy enough architecture to get your first update in place functionality going.
Leg 3 – Simple New Functionality in the Graph
In this leg, we begin building update-in-place use cases. We recommend not trying to replace anything yet. Concentrate on net-new functionality. Some of the current best places to start are maintaining reference data (common shared data such as country codes, currencies, and taxonomies) and/or some metadata management. Everyone seems to be doing data cataloging projects these days; they could just as well be done in the graph and give you some experience working through this new paradigm.
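For instance, maintaining reference data in the graph can be as simple as modeling each code list as a SKOS concept scheme. The sketch below shows one possible shape for country codes; the URIs and scheme name are hypothetical.

```python
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import SKOS

# Hypothetical identifier namespace; in practice this lives in the shared enterprise graph
EX = Namespace("https://example.com/id/refdata/")

def add_country(g: Graph, code: str, label: str) -> None:
    """Add or update one country code as a SKOS concept in the reference-data scheme."""
    concept = EX[f"country/{code}"]
    g.add((concept, RDF.type, SKOS.Concept))
    g.add((concept, SKOS.notation, Literal(code)))
    g.add((concept, SKOS.prefLabel, Literal(label, lang="en")))
    g.add((concept, SKOS.inScheme, EX["scheme/country"]))

g = Graph()
g.bind("skos", SKOS)
add_country(g, "NZ", "New Zealand")
add_country(g, "CH", "Switzerland")
print(g.serialize(format="turtle"))
```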
The objective here is to spend enough time on this tack that developers are comfortable with the new development paradigm. Coding directly against the graph involves new libraries and new patterns.
Optionally, you may want to stay on this tack long enough to build “model driven development” (low code/no code in Gartner speak) capability into the architecture. The objective of this effort is to drastically reduce the cost of implementing new functionality in future tacks. This will come in very handy when you compare before and after metrics to make the case for the new approach. Or you could leave model driven to a future tack.
Using the payroll/HR example, maybe you add new functionality that depends on HR data but that nothing else depends on yet. Maybe you build a skills database, or a learning management system. It depends on what isn’t yet in place and can be purely additive. These are good places to start.
Leg 4 – Understand the Legacy System and its Environment
Eventually, you will get good at this and want to actually replace some legacy functionality. Before you do, it will behoove you to do a bunch of deep research. Many legacy modernization attempts have run aground from not knowing what they didn’t know.
There are three things that you don’t fully know at this point:
- What data the legacy system is managing
- What business logic the legacy system is delivering
- What systems depend on the legacy system, and what the nature of those dependencies is
If you’ve done the first three tacks well, you will have all the important data from the domain in the graph. But you won’t have all the data. In fact, at the metadata level, it will appear that you have only the tiniest fraction of it. In your Knowledge Graph you may have populated a few hundred classes and used a few hundred properties, but your legacy system has tens of thousands of columns. By appearances, you’re missing a lot. What we’ve discovered anecdotally, but haven’t proven yet, is that legacy systems are full of redundancy and emptiness. You will find that you do have most of the data you need, but before you proceed you need to prove this.
We recommend data profiling using software from a company such as GlobalIDs, IoTahoe, or BigID. This software reads all the data in the legacy system and profiles it. It discovers patterns and creates histograms, which reveal where the redundancy is. More importantly, you can find data that isn’t in the graph and have a conversation about whether it is needed. A lot of the data in legacy systems consists of accumulators (YTD, MTD, etc.) that can easily be replaced by aggregation functions, processing flags that are no longer needed, denormalized copies of data, and a vast number of fields that are no longer used but that people are afraid to let go of.
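The commercial profiling tools do far more than this, but a rough sketch of the underlying idea, here using pandas against a hypothetical extract of one legacy table, shows how the emptiness reveals itself.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect("legacy_hr.db")          # stand-in for the legacy database
df = pd.read_sql("SELECT * FROM employee_master", conn)

profile = pd.DataFrame({
    "null_rate": df.isna().mean(),              # columns that are mostly empty
    "distinct_values": df.nunique(),            # constant or near-constant columns
    "sample_value": df.apply(lambda c: c.dropna().iloc[0] if c.notna().any() else None),
})

# Columns that are >95% empty or carry a single value are candidates
# for the "no longer used but afraid to let go" conversation.
suspects = profile[(profile["null_rate"] > 0.95) | (profile["distinct_values"] <= 1)]
print(suspects.sort_values("null_rate", ascending=False))
```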
Another source of fear is “business logic” hidden in the legacy system. People fear that we don’t know all of what the legacy system is doing and that turning it off will break something. There are millions of lines of code in that legacy system; surely it is doing something useful. Actually, no. There is remarkably little essential business logic in most legacy systems. I know; I’ve built complex ERP systems and implemented many packages. Most of this code is just moving data from the database to an API, to a transaction, to another API, into a conversational control record (or the DOM, if it’s a more modern legacy system), onto the screen, and back again. There is a bit of validation sprinkled throughout, which some people call “business logic,” but that’s a stretch; it’s just validation. There is some mapping (when the user selects “Male” in the drop-down, put “1” in the gender field). And occasionally there is a bit of real business logic. Calculating economic order quantities, critical paths, or gross-to-net payroll is genuine business logic. But it represents far less than 1% of the code base. The trick is to be sure you’ve found it.
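To illustrate the distinction, the kind of code worth carefully extracting is a small kernel like the following (a deliberately simplified, hypothetical gross-to-net calculation), as opposed to the thousands of lines that merely shuttle data between layers.

```python
def gross_to_net(gross: float, pre_tax_deductions: float,
                 tax_rate: float, post_tax_deductions: float) -> float:
    """Deliberately simplified gross-to-net pay: the kind of genuine
    business logic that must be found and preserved during migration."""
    taxable = max(gross - pre_tax_deductions, 0.0)
    tax = taxable * tax_rate
    return round(taxable - tax - post_tax_deductions, 2)

# e.g. gross_to_net(5000.00, 400.00, 0.22, 150.00) -> 3438.00
```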
This is where reverse engineering, or legacy understanding, software comes in. Ms. Bellotti is 100% correct on this point as well: if you think these reverse engineering tools are going to automate your legacy conversion, you’re in for a world of hurt. But what they can do is help you find the real business logic and provide some comfort to the sponsors that there isn’t something important the legacy system is doing that no one knows about.
The final bit of understanding is the dependencies. This is the hardest one to get complete. The profiling software can help: some of it can detect that when the histogram of social security numbers in system A changes, and the next day the same change shows up in system B, there must be an interface between them. But beyond this, the best you can do is catalog all the known data feeds and APIs. These are the major mechanisms by which other systems become dependent on the legacy system. You will need strategies to mimic these dependencies in order to begin the migration.
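A crude approximation of that histogram technique is sketched below: take day-over-day snapshots of a column in two systems and check whether the values added in system A show up in system B shortly after. The snapshots and system names are made up for illustration.

```python
from collections import Counter

def new_values(yesterday: list[str], today: list[str]) -> set[str]:
    """Values that appear in today's snapshot of a column but not in yesterday's."""
    delta = Counter(today)
    delta.subtract(Counter(yesterday))
    return {value for value, count in delta.items() if count > 0}

# Tiny illustrative snapshots of an SSN column in two systems
system_a_mon = ["111-22-3333", "222-33-4444"]
system_a_tue = ["111-22-3333", "222-33-4444", "555-66-7777"]  # new hire appears in A
system_b_tue = ["111-22-3333", "222-33-4444"]
system_b_wed = ["111-22-3333", "222-33-4444", "555-66-7777"]  # same value shows up in B a day later

propagated = new_values(system_a_mon, system_a_tue) & new_values(system_b_tue, system_b_wed)
if propagated:
    print(f"{len(propagated)} value(s) propagated from A to B; likely an undocumented interface")
```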
This tack is purely research, and therefore doesn’t deliver any immediate gain. You may need to bundle it with some other project that does provide immediate gain to get it funded, or you may fund it out of some form of contingency budget.
Leg 5 – Become the System of Record for Some Subset
Up to this point, data has been flowing into the graph from the legacy system or originating directly in the graph.
Now it’s time to begin the reverse flow. We need to find an area where the data can start flowing in the other direction. We now have enough architecture to build use cases in the graph; it’s time to start publishing rather than subscribing.
It’s tempting to want to feed all the data back to the legacy system, but the legacy system has lots of data we don’t want to source, and doing so would further entrench it. We need to pick off small areas where we can decommission part of the legacy system.
Let’s say there was a certificate management system in the legacy system. We replace this with a better one in the graph and quit using the legacy one. But from our investigation above, we realize that the legacy certificate management system was feeding some points to the compensation management system. I am totally making this up, but assume that completing certificates granted points that feed into your annual review. We just make sure the new system can feed the compensation system those points.
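A sketch of what mimicking that dependency might look like: query the new certificate data in the graph and emit the same flat feed the compensation system has always received. The ontology terms and feed layout below are invented for the example.

```python
import csv
from rdflib import Graph

# The new certificate use case writes ex:CertificateCompletion triples into the graph;
# the compensation system still expects its nightly CSV feed of points per employee.
COMPLETED_POINTS = """
PREFIX ex: <https://example.com/ontology/>
SELECT ?empId (SUM(?points) AS ?totalPoints)
WHERE {
    ?completion a ex:CertificateCompletion ;
                ex:employeeId ?empId ;
                ex:pointsAwarded ?points .
}
GROUP BY ?empId
"""

def write_compensation_feed(graph_file: str, feed_path: str) -> None:
    """Produce the legacy-style feed from the graph so the downstream dependency keeps working."""
    g = Graph().parse(graph_file, format="turtle")
    with open(feed_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["employee_id", "certificate_points"])  # legacy feed layout
        for row in g.query(COMPLETED_POINTS):
            writer.writerow([str(row.empId), int(row.totalPoints)])

write_compensation_feed("certificates.ttl", "comp_points_feed.csv")
```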
Leg 6 – Replace the Dependencies Incrementally
Now the flywheel is starting to turn. Encouraged by the early success of the reverse flow, you can work out the data dependencies in the legacy system and a sequence for replacing them.
The legacy payroll system is dependent on the benefit elections system. You now have two choices. You could replace the benefits system in the graph; then you will need to feed the results of the benefit elections (how much to deduct for the health care options, etc.) back to the legacy system. This might be the easier of the two options.
But the option with the most impact is the other: replace the payroll system. You already have the benefits data from the feed you are getting from the legacy system. If you replace the payroll system, there is nothing else (in HR) you need to feed. You will need to feed the financial system and the government reporting system, but you will have taken a much bigger leap in the legacy modernization effort.
Leg 7 – Lather, Rinse, Repeat
Once you’ve worked through a few of those, you can safely decommission the legacy system a bit at a time. Pick off an area that can be isolated. Replace the functionality with an equivalent in the new system. Feed the remaining bits of the legacy infrastructure if need be. Just stop using that portion of the legacy system. The system will gradually atrophy. No need for any big bang replacement. The risk is incremental and can be rolled back and retried at any stage if it doesn’t work.
Conclusion
We don’t go into our clients claiming to be doing legacy modernization, but it is our intent to put them in a position where they could, over time.
We all know that at some point all legacy systems will have to be retired. At the moment, the state of the art seems to be either “rip and replace,” usually putting a packaged application (neo-legacy) in place of the incumbent legacy system, or incrementally improving the legacy system in place. We think there is a safer, self-funding route to legacy modernization, and it is done through Data-Centric implementation.