The Data-Centric Revolution: Data-Centric vs. Application-Centric

Data Centric vs. Software Wasteland

Dave McComb’s new book “Software Wasteland: How the Application-Centric Mindset is Hobbling our Enterprises” has just been released.

In it, I make the case that the opposite of Data Centric is Application Centric, and that our preoccupation with Application Centric approaches over the last several decades has caused the cost and complexity of our information systems to be at least 10 times what they should be and, in most cases we’ve examined, 100 times what they should be.

This article is a summary of how diametrically opposed these two world views are, and how the application-centric mindset is draining our corporate coffers.

An information system is essentially data and behavior.

On the surface, you wouldn’t think it would make much difference which one you started with, since you need both and they feed off each other. But it turns out it does make a difference. A very substantial difference.

What does it do?

The application-centric approach starts with “what does this system need to do?” Often this is framed in terms of business process and/or workflow. In the days before automation, information systems were workflow systems. Humans executed tasks or procedures. Most tasks had prerequisite data input and generated data output. The classic “input / process / output” mantra described how work was organized.

Information in the pre-computer era was centered around “forms.” Forms were a way to gather some prerequisite data, which could then be processed. Sometimes the processing was calculation. The form might be hours spent and pay rate, and the calculation might be determining gross pay.

These forms were also often filed, and the process might be to retrieve the corresponding form from the corresponding (paper) file folder and augment it as needed.

While this sounds like ancient history, it persists. If you’ve been to the doctor recently, you might have noticed that despite decades of “Electronic Medical Records,” the intake is weirdly like it always has been: paper form based.

This idea that information systems are the automation of manual workflow tasks continues. In the financial services industry, it is called RPA (Robotic Process Automation) despite the fact that there are no robots. What is being automated is the myriad of tasks that have evolved to keep a financial services firm going.

When we automate a task in this way, we buy into a couple of interesting ideas, without necessarily noticing that we have done so. The first is that automating the task is the main thing. The second is that the task defines how it would like to see the input and how it will organize the output. This is why there are so many forms in companies and especially in the government.

The process automation essentially exports the problem of getting the input assembled and organized into the form the process wants. In far too many cases this falls on the user of the system to input the data, yet again, despite the fact that you know you have told this firm this information dozens of times before.

In the cases where the automation does not rely on a human to recreate the input, something almost as bad is occurring: developers are doing “systems integration” to get the data from wherever it is to the input structures and then aligning the names, codes and categories to satisfy the input requirements.
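To make this kind of integration work concrete, here is a minimal sketch in Python. The two record layouts, the field names and the code values are all hypothetical, invented only for illustration; the point is how much of the code exists just to align names and codes between two structures.

```python
# Hypothetical record as one application stores it.
source_record = {"pat_id": "12345", "sex": "F", "dob": "1980-04-02"}

# Hypothetical field names and codes that a second process expects as input.
FIELD_MAP = {"pat_id": "patientId", "sex": "gender", "dob": "birthDate"}
CODE_MAP = {"gender": {"F": "female", "M": "male"}}


def to_target(record):
    """Rename fields and translate codes into the target's vocabulary."""
    out = {}
    for src_key, value in record.items():
        dst_key = FIELD_MAP[src_key]  # align the names
        # Align the codes; values without a code table pass through unchanged.
        out[dst_key] = CODE_MAP.get(dst_key, {}).get(value, value)
    return out


target_record = to_target(source_record)
print(target_record)
```

Every pair of systems that exchanges data needs its own version of these mapping tables, and none of this code does any business processing.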

Most large firms have thousands of these processes. They have implemented thousands of application systems, each of which automates anywhere between a handful and dozens of these processes. The “modern” equivalent of the form is the document data structure. A document data structure is not a document in the same way that Microsoft Word creates a document. Instead, a document data structure is a particular way to organize a semi-structured data structure. The most popular now is JSON (JavaScript Object Notation).

A typical JSON document looks like:

{"Patient": {"id": "12345", "meds": ["2345", "3344", "9876"]}}

JSON relies on two primary structures: lists and dictionaries (the JSON specification calls them arrays and objects). Lists are shown inside square brackets (the list following the word ‘meds’ in the above example). Dictionaries are sets of key / value pairs and are shown inside curly brackets. In the above, ‘id’ is a key and ‘12345’ is its value, ‘meds’ is a key and the list is its value, and ‘Patient’ is a key and the complex structure (a dictionary that contains both simple values and lists) is its value. These can be arbitrarily nested.
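As a small sketch of how a program sees this structure, the patient example can be parsed with Python’s standard json module. The variable names are only illustrative, and the document is written with the straight double quotes that JSON parsers require.

```python
import json

# The patient example from the text, using the double quotes JSON requires.
raw = '{"Patient": {"id": "12345", "meds": ["2345", "3344", "9876"]}}'

doc = json.loads(raw)

patient = doc["Patient"]    # a dictionary (JSON object)
patient_id = patient["id"]  # a simple value
meds = patient["meds"]      # a list (JSON array)

print(type(patient).__name__, type(meds).__name__)
print(patient_id, meds[0], len(meds))
```

Note that the code has to know the exact keys (‘Patient’, ‘id’, ‘meds’) and the nesting in advance; nothing in the document itself says what they mean.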

Squint very closely and you will see the document data structure is our current embodiment of the form.

The important parallels are:

The process created the data structure to be convenient to what the process needed to do.
There is no mechanism here for coordinating or normalizing these keys and values.

Process-centric is very focused on what something does. It is all about function.

Aren’t Database Systems Data-Centric?

If we design a database and use that instead of forms or JSON objects for our data definition, then haven’t we avoided the application-centric mindset?

The answer is “yes,” but only very locally. If you only had one application, you may have achieved data centricity, at least temporarily. The database provides some data centricity for the various processes that share the same data model.

But most large enterprises have hundreds to thousands of applications. Each has its own database. There is some local sharing, but if a firm has 100 applications, each application is only sharing with 1% of the firm.

Many of the applications were created externally, especially “packaged software.” There is no hope that the packaged software from one vendor will have data structures similar to those from another.

Most software packages have created portholes into their data structures that they call “APIs” (Application Programming Interfaces). These APIs are descriptions of how you must organize and name your data to submit it to the application, and descriptions of how the result will be organized and named. Many APIs are in JSON these days.

This doesn’t solve the problem. Each application is speaking a different language. It is as if there were no common language for the UN and each nation had to translate to each other nation.
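The arithmetic behind the UN analogy can be sketched in a few lines of Python. The function names are hypothetical, and the second function is an assumption added for comparison: it counts the translations needed if every application mapped only to and from one shared language.

```python
def point_to_point(n):
    """Directed translations when each of n applications maps to every other one."""
    return n * (n - 1)


def via_shared_language(n):
    """Translations when each application maps only to and from one shared language."""
    return 2 * n


for n in (10, 100, 1000):
    print(n, point_to_point(n), via_shared_language(n))
```

The point-to-point count grows with the square of the number of applications, which is why the problem feels manageable with a few systems and overwhelming with hundreds.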

Data Structures and Code Complexity

Each application writes code to deal with the data structures it is ingesting and creating. Developers do their best to keep these structures as simple as possible, but it is inescapable that they are coding to these structures.

Most code in most applications exists to deal with the data structures that have been created. It is surprising how little algorithmic or complex processing exists. It is mostly accessing, moving, validating, converting, transposing, summarizing and presenting data structures.

By allowing each process and each application to define its own data structures, we have made it nearly impossible for any real sharing to occur. Each application writes code to deal with similar but not identical data.

This is one of the main things that prevents us from seeing the sharing opportunity. If you have 100 applications, each of which has created code to handle its own structured data around sending invoices and following up to make sure they are paid (Accounts Receivable), you will not be able to see that you have 100-fold redundancy.
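A minimal sketch of what that hidden redundancy can look like, using two invented invoice layouts (all field names, values and function names here are hypothetical). Both functions answer the same business question, “is this invoice overdue?”, but because each is written against its own structure, neither can be reused by the other application.

```python
from datetime import date

# Hypothetical invoice records from two applications: same facts, different shapes.
app_a_invoice = {"inv_no": "A-1", "amt_due": 250.0, "due": "2024-01-15", "paid": False}
app_b_invoice = {"InvoiceId": "B-7", "Balance": 250.0,
                 "DueDate": date(2024, 1, 15), "Status": "OPEN"}


def app_a_is_overdue(inv, today):
    # Application A stores dates as ISO strings and payment as a boolean.
    return not inv["paid"] and date.fromisoformat(inv["due"]) < today


def app_b_is_overdue(inv, today):
    # Application B stores date objects and payment as a status code.
    return inv["Status"] == "OPEN" and inv["DueDate"] < today


today = date(2024, 2, 1)
print(app_a_is_overdue(app_a_invoice, today), app_b_is_overdue(app_b_invoice, today))
```

Multiply this by every field, every rule and every application, and the duplication is hard to spot because no two copies look alike.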

This is a mindset. We are so immersed in it we don’t see it. We think this is normal. Until we see the problem, we will be blind to the solution.

I hope I’ve piqued your interest. “Software Wasteland” attempts to quantify just how bad this situation is, and the vast opportunity for improvement. It also describes why most of the advances over the last 30 years have not helped the core problem, and describes some of the mindsets that keep us stuck. While there are some suggestions for reversing the extreme excess that these approaches have fostered, this book doesn’t offer a comprehensive solution. It offers tactical suggestions, which for many firms could save tens of millions or even hundreds of millions of dollars, but it doesn’t get to the core and to the end game.

I’m taking that up in a trilogy that will hopefully be completed this year: The Data Centric Solution lays out the vision and is written for an executive audience. The Data Centric Pattern Language covers the complex trade-offs and design decisions that the modelers and designers will need to make, and The Data Centric Architecture is a blueprint for developers and architects who wish to build systems based on these principles.

Dave McComb
