The Data Centric Revolution: Data Centric vs. Application Centric

Dave McComb’s new book “Software Wasteland: How the Application-Centric Mindset is Hobbling our Enterprises” has just been released.

In it, I make the case that the opposite of Data Centric is Application Centric, and that our preoccupation with Application Centric approaches over the last several decades has caused the cost and complexity of our information systems to be at least 10 times what they should be and, in most cases we’ve examined, 100 times what they should be.

This article is a summary of how diametrically opposed these two world views are, and how the application-centric mindset is draining our corporate coffers.

An information system is essentially data and behavior.

On the surface, you wouldn’t think it would make much difference which one you started with, since you need both and they feed off each other.  But it turns out it does make a difference.  A very substantial difference.

What does it do?

The application-centric approach starts with “what does this system need to do?” Often this is framed in terms of business process and/or workflow.  In the days before automation, information systems were workflow systems.  Humans executed tasks or procedures.  Most tasks had prerequisite data input and generated data output.  The classic “input / process / output” mantra described how work was organized.

Information in the pre-computer era was centered around “forms.”  Forms were a way to gather some prerequisite data, which could then be processed.  Sometimes the processing was calculation.  The form might be hours spent and pay rate, and the calculation might be determining gross pay.

These forms also often were filed, and the process might be to retrieve the corresponding form, in the corresponding (paper) file folder and augment it as needed.

While this sounds like ancient history, it persists.  If you’ve been to the doctor recently, you might have noticed that despite decades of “Electronic Medical Records,” the intake is weirdly like it always has been: paper form based.

This idea that information systems are the automation of manual workflow tasks continues.  In the Financial Services industry, it is called RPA (Robotic Process Automation), despite the fact that there are no robots.  What is being automated is the myriad of tasks that have evolved to keep a Financial Services firm going.

When we automate a task in this way, we buy into a couple of interesting ideas, without necessarily noticing that we have done so.  The first is that automating the task is the main thing.  The second is that the task defines how it would like to see the input and how it will organize the output.  This is why there are so many forms in companies and especially in the government.

The process automation essentially exports the problem of getting the input assembled and organized into the form the process wants.  In far too many cases it falls on the user of the system to input the data yet again, despite the fact that you know you have told this firm this information dozens of times before.

In the cases where the automation does not rely on a human to recreate the input, something almost as bad is occurring: developers are doing “systems integration” to get the data from wherever it is to the input structures and then aligning the names, codes and categories to satisfy the input requirements.
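A minimal Python sketch of what this kind of integration code tends to look like. The field names, the code table, and the `to_billing_input` function are all hypothetical, invented for illustration; they do not come from any real system.

```python
# Hypothetical record as one application (say, an admissions system) emits it.
admissions_record = {
    "pat_id": "12345",
    "sex": "F",
    "dob": "1980-02-29",
}

# Hypothetical code table: the receiving process uses different category codes.
SEX_CODES = {"F": "female", "M": "male", "U": "unknown"}


def to_billing_input(record):
    """Reshape an admissions record into the form a (hypothetical) billing process expects."""
    return {
        "patientIdentifier": record["pat_id"],          # align the name
        "gender": SEX_CODES[record["sex"]],             # align the code
        "birthDate": record["dob"].replace("-", "/"),   # align the format
    }


billing_input = to_billing_input(admissions_record)
print(billing_input)
```

Nothing in this function computes anything new. It only renames, recodes, and reformats, and a function like it is needed for every pair of systems that exchange this data.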

Most large firms have thousands of these processes.  They have implemented thousands of application systems, each of which automates anywhere between a handful and dozens of these processes.

The “modern” equivalent of the form is the document data structure.  A document data structure is not a document in the sense that Microsoft Word creates a document.  Instead, it is a particular way to organize semi-structured data.  The most popular format now is JSON (JavaScript Object Notation).

A typical JSON document looks like:

{"Patient": {"id": "12345", "meds": ["2345", "3344", "9876"]}}

JSON relies on two primary structures: lists and dictionaries (JSON itself calls them arrays and objects).  Lists are shown inside square brackets (the list following the word ‘meds’ in the above example).  Dictionaries are sets of key / value pairs and are shown inside curly brackets.  In the above, ‘id’ is a key and ‘12345’ is the value, ‘meds’ is a key and the list is the value, and ‘Patient’ is a key and the complex structure (a dictionary that contains both simple values and lists) is the value.  These can be arbitrarily nested.
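As a small illustration, here is how the patient example might be read in Python using the standard `json` module. The document is written with straight double quotes, which strict JSON requires; the variable names are just for this sketch.

```python
import json

# The patient example, written as strict JSON (double-quoted strings).
text = '{"Patient": {"id": "12345", "meds": ["2345", "3344", "9876"]}}'

doc = json.loads(text)

patient = doc["Patient"]   # the nested dictionary (a JSON "object")
meds = patient["meds"]     # the list (a JSON "array")

print(type(patient).__name__)  # dict
print(type(meds).__name__)     # list
print(patient["id"], len(meds))
```

Note that the code has to know the keys ‘Patient’, ‘id’ and ‘meds’ in advance. Nothing in the document itself says what they mean or how they relate to keys used elsewhere.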

Squint very closely and you will see the document data structure is our current embodiment of the form.

The important parallels are:

  • The process created the data structure to be convenient to what the process needed to do.
  • There is no mechanism here for coordinating or normalizing these keys and values.

Process-centric is very focused on what something does.  It is all about function.

Aren’t Database Systems Data-Centric?

If we design a database and use that instead of forms or JSON objects for our data definition, then haven’t we avoided the application-centric mindset?

The answer is “yes,” but only very locally.  If you only had one application, you may have achieved data centricity, at least temporarily.  The database provides some data centricity for the various processes that share the same data model.

But most large enterprises have hundreds to thousands of applications.  Each has its own database. There is some local sharing, but if a firm has 100 applications, each application is only sharing with 1% of the firm.

Many of the applications were created externally, especially “packaged software.” There is no hope that the packaged software from one vendor will have similar data structures to the data structure from another.

Most software packages have created portholes into their data structures that they call “APIs” (Application Programming Interfaces).  These APIs are descriptions of how you must organize and name your data to submit it to the application, and descriptions of how the result will be organized and named.  Many APIs use JSON these days.

This doesn’t solve the problem.  Each application is speaking a different language.  It is as if there were no common language for the UN and each nation had to translate to each other nation.
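The arithmetic behind the UN analogy can be sketched roughly as follows. It assumes, purely for illustration, that every application eventually exchanges data with every other one, which overstates most real application estates; the function names are made up for this sketch.

```python
def point_to_point_translations(n):
    """Directed translations needed when each of n applications maps to every other one."""
    return n * (n - 1)


def common_language_translations(n):
    """Translations needed when each application maps only to and from one common language."""
    return 2 * n


for n in (10, 100, 1000):
    print(n, point_to_point_translations(n), common_language_translations(n))
```

Under that assumption, 100 applications need 9,900 translations point to point, against 200 with a common language. The gap widens quickly as the number of applications grows.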

Data Structures and Code Complexity

Each application writes code to deal with the data structures it is ingesting and creating.  They do their best to make sure these structures are as simple as they need to be, but it is inescapable that they are coding to these structures.

Most code in most applications exists to deal with the data structures that have been created.  It is surprising how few algorithms or complex processes exist.  It is mostly accessing, moving, validating, converting, transposing, summarizing and presenting data structures.

By allowing each process and each application to define their data structures, we have made it nearly impossible for any real sharing to occur. Each application writes code to deal with similar but not identical data.

This is one of the main things that prevents us from seeing the sharing opportunity.  If you have 100 applications, each of which has created code to handle its own structured data around sending invoices and following up to make sure they are paid (Accounts Receivable), you will not be able to see that you have 100-fold redundancy.

This is a mindset.  We are so immersed in it we don’t see it.  We think this is normal. Until we see the problem, we will be blind to the solution.

I hope I’ve piqued your interest.  “Software Wasteland” attempts to quantify just how bad this situation is, and the vast opportunity for improvement.  It also describes why most of the advances over the last 30 years have not helped the core problem, and describes some of the mindsets that keep us stuck.  While there are some suggestions for reversing the extreme excess that these approaches have fostered, this book doesn’t offer a comprehensive solution. It offers tactical suggestions, which for many firms could save tens of millions or even hundreds of millions of dollars, but it doesn’t get to the core and to the end game.

I’m taking that up in a trilogy that will hopefully be completed this year: The Data Centric Solution lays out the vision and is written for an executive audience.  The Data Centric Pattern Language covers the complex trade-offs and design decisions that the modelers and designers will need, and The Data Centric Architecture is a blueprint for developers and architects who wish to build systems based on these principles.

About Dave McComb

Dave McComb is President of Semantic Arts, Inc., a Fort Collins, Colorado-based consulting firm specializing in Enterprise Architecture and the application of Semantic Technology to Business Systems. He is the author of Semantics in Business Systems, and program chair for the annual Semantic Technology Conference.

  • Richord1

    I agree that a more data aware approach is required but I don’t think software development project failures will be significantly reduced. Most of the project failures are the result of ineffective human dynamics rather than poor software development practices. Typically a lack of collaboration, power struggles, clinging to the past and attempts to retain autonomy are the causes of failed projects.

    A data centric approach does little to address these human factors. I think it is possible to predict which projects will fail or stall early in the project by observing the behaviors of those involved and equally critical, not involved in the project.

    For example, the senior management at a large bank were concerned that many projects would progress normally and then suddenly fail or stall: over budget, behind schedule and not meeting requirements. They asked if there was any way to get a sense of a project before it reached the failure stage. The solution was to monitor the sentiment of the e-mails exchanged between the project team participants. This turned out to be a good predictor of projects that would fail. Most of the sentiments related to behaviors between participants or non-participants rather than technology or methodology problems.

    Until we address the human dynamics, projects will continue to fail at the same rate regardless of technology or software development methodology.

    However, a data centric approach, using data as the common goal and purpose of a project, can help to re-balance a project to reduce the current technical and software dominance.

  • David Grover

    I really like this idea. The model makes a lot of sense: While we started using forms to collect data, we’ve now made the automation of the completion of forms the main purpose of software development.

    I’ve long thought the entire point of an application was, however, just to capture a tiny bit of judgment as a response to a whole lot of structured data. Applications are really just mechanisms to capture judgments that can’t be automated. Historically our ability to automate judgment didn’t extend very far; we couldn’t even really get a computer to figure out which strings were addresses and which were first names, so we needed a human to do that. But understanding the process your way helps us get past a form-based meta-architecture.
