The Data Centric Revolution: Data Centric’s Role in the Reduction of Complexity

Complexity Drives Cost in Information Systems

A system with twice the number of lines of code will typically cost more than twice as much to build and maintain.

There is no economy of scale in enterprise applications.  There is a diseconomy of scale.  In manufacturing, every doubling of output results in a predictable reduction in the cost per unit.  This is often called a learning curve or an experience curve.

Just the opposite happens with enterprise applications.  Every doubling of code size means that additional code is added at ever lower productivity.  This is because of complex dependency.  When you manufacture widgets, each widget has no relationship to, or dependency on, any of the other widgets.  With code, it is just the opposite.  Each line must fit in with all those that preceded it.  We can reduce the dependency, with discipline, but we cannot eliminate it.

If you are interested in reducing the cost of building, maintaining, and integrating systems, you need to tackle the complexity issue head on.

The first stopping point on this journey is recognizing the role that schema has in the proliferation of code.  Study software estimating methodologies, such as function point analysis, and you will quickly see the central role that schema size has on code bloat.  Function point analysis estimates effort based on inputs such as the number of fields on a form, the elements in a transaction, or the columns in a report.  Each of these is directly driven by the size of the schema.  If you add attributes to your schema, they must show up in forms, transactions, and reports; otherwise, what was the point?

I recently did a bit of forensics on a popular, well-known, high-quality application: QuickBooks, which I think is representative.  The QuickBooks code base is 10 million lines of code.  The schema consists of 150 tables and 7500 attributes (or 7650 schema concepts in total).  That means that each schema concept, on average, contributed another 1300 lines of code to the solution.  Given that most studies have placed the cost to build and deploy software at between $10 and $100 per line of code (it is an admittedly large range, but you have to start somewhere), that means that each attribute added to the schema is committing the enterprise to somewhere between $13K and $130K of expense just to deploy, and probably an equal amount over the life of the product for maintenance.
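The arithmetic behind these figures can be checked with a short Python sketch. The constants are the numbers quoted above; the function and variable names are illustrative only, and the calculation simply spreads the code base evenly across all schema concepts, which is the same simplification the paragraph makes.

```python
# Figures quoted in the article for QuickBooks.
LINES_OF_CODE = 10_000_000
TABLES = 150
ATTRIBUTES = 7_500

# Cost-per-line range cited in the article (USD per line of code).
COST_PER_LINE_LOW = 10
COST_PER_LINE_HIGH = 100


def lines_per_concept(lines: int, tables: int, attributes: int) -> float:
    """Average lines of code per schema concept (tables plus attributes)."""
    return lines / (tables + attributes)


def cost_range_per_concept(lines_each: float) -> tuple[float, float]:
    """Low and high deployment cost implied by one schema concept."""
    return lines_each * COST_PER_LINE_LOW, lines_each * COST_PER_LINE_HIGH


per_concept = lines_per_concept(LINES_OF_CODE, TABLES, ATTRIBUTES)
low, high = cost_range_per_concept(per_concept)

print(f"{per_concept:.0f} lines of code per schema concept")
print(f"${low:,.0f} to ${high:,.0f} to deploy each concept")
```

The result lands at roughly 1,300 lines per concept and roughly $13K to $130K per concept, matching the rounded figures in the text.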

I’m hoping this would give data modelers a bit of pause.  It is so easy to add another column, let alone another table to a design; it is sobering to consider the economic impact.

But that’s not what this article is about.  This article is about the insidious multiplier effect that not following the data centric approach is having on enterprises these days.

Let us summarize what is happening in enterprise applications:

  • The size of each application’s schema is driving the cost of building, implementing, and maintaining it (even if the application is purchased).
  • The number of applications drives the cost of systems integration (which is now 30-60% of all IT costs).
  • The overlap, without alignment, is the main driver of integration costs (if the fields are identical from application to application, integration is easy; if the applications have no overlap, integration is unnecessary).

We now know that most applications can be reduced in complexity by a factor of 10-100.  That is pretty good.  But the systems of systems potential is even greater.  We now know that even very complex enterprises have a core model that has just a few hundred concepts.  Most of the rest of the distinctions can be made taxonomically and not involve programming changes.

When each sub domain directly extends the core model, instead of the complexity being multiplicative, it is only incrementally additive.

We worked with a manufacturing company whose core product management system had 700 tables and 7000 attributes (7700 concepts).  Our replacement system had 46 classes and 36 attributes (82 concepts), almost a 100-fold reduction in complexity.  They acquired another company that had its own systems, completely and arbitrarily different, smaller and simpler at 60 tables and 1000 attributes, or 1060 concepts in total.  To accommodate the differences in the acquired company we had to add 2 concepts to the core model, or about 3%.

Normally, trying to integrate 7700 concepts with 1060 concepts would require a very complex systems integration project.  But once the problem is reduced to its essence, we realize that there is a 3% increment, which is easily managed.
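To make the comparison concrete, here is a small Python sketch using the concept counts from the example above. The "candidate pairings" figure is an assumption added for illustration, not something the article measures: it treats the worst case as every concept in one system potentially needing to be compared with every concept in the other.

```python
# Concept counts quoted in the article.
legacy_concepts = 700 + 7_000      # original product management system
core_concepts = 46 + 36            # replacement built on the core model
acquired_concepts = 60 + 1_000     # acquired company's systems
concepts_added = 2                 # additions needed to cover the acquisition

# How much smaller the replacement model is than the legacy schema.
reduction_factor = legacy_concepts / core_concepts

# Growth of the core model needed to absorb the acquired company.
relative_increment = concepts_added / core_concepts

# Assumption (illustrative only): worst-case number of concept-to-concept
# comparisons if the two legacy schemas were integrated directly.
candidate_pairings = legacy_concepts * acquired_concepts

print(f"Reduction factor: {reduction_factor:.1f}x")
print(f"Core model growth: {relative_increment:.1%}")
print(f"Worst-case pairings for direct integration: {candidate_pairings:,}")
```

The reduction factor comes out a little under 94, and the two added concepts are a low single-digit percentage of the 82-concept core, in line with the "about 3%" quoted above.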

What does this have to do with data centricity?

Until you embrace data centricity, you think that the 7700 concepts and the 1060 concepts are valid and necessary.  You’d be willing to spend considerable money to integrate them (it is worth mentioning that in this case the client we were working with had acquired the other company ten years ago and had not integrated their systems, mostly due to the “complexity” of doing so).

Once you embrace data centricity, you begin to see the incredible opportunities.

You don’t need data centricity to fix one application.  You merely need elegance.  That is a discipline that helps guide you to the simplest design that solves the problem.  You may have thought you were doing that already.  What is interesting is that real creativity comes with constraints.  And when you constrain your design choice to be in alignment with a firm’s “core model,” it is surprising how rapidly the complexity drops.  More importantly for the long-term economics, the divergence for the overlapped bits drops even faster.

When you step back and look at the economics though, there is a bigger story:

The total cost of enterprise applications is roughly proportional to:

[Figure: cost formula combining number of apps, schema per app, code per schema, and schema overlap]

These items are multiplicative (except for the last, which is a divisor).  This means that if you cut any one of them in half, the overall result drops in half.  If you cut two of them in half, the result drops by a factor of four, and if you cut all of them in half, the result is an eight-fold reduction in cost.

Dropping any of these in half is not that hard.  If you drop them all by a factor of ten (very doable), the result is a 1000-fold reduction in cost.  That sounds too incredible to believe, but let’s take a closer look at what it would take to reduce each in half or by a factor of ten.
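The compounding effect is easy to verify. The Python sketch below assumes three multiplicative cost drivers, named after the section headings that follow; the names and the function are hypothetical, and the sketch does not attempt to reproduce the exact formula, only the way independent reductions multiply.

```python
from math import prod


def overall_reduction(cuts: dict[str, float]) -> float:
    """Overall cost reduction when each multiplicative driver is divided
    by its own cut factor (2 means 'cut in half', 10 means 'cut tenfold')."""
    return prod(cuts.values())


# Hypothetical driver names, taken from the section headings below.
halve_everything = {"number_of_apps": 2, "schema_per_app": 2, "code_per_schema": 2}
tenfold_everything = {"number_of_apps": 10, "schema_per_app": 10, "code_per_schema": 10}
halve_two = {"number_of_apps": 2, "schema_per_app": 2, "code_per_schema": 1}

print(overall_reduction(halve_two))           # two drivers halved
print(overall_reduction(halve_everything))    # all three halved
print(overall_reduction(tenfold_everything))  # all three cut tenfold
```

Because the drivers multiply, modest improvements on each one compound: the three calls give reductions of 4, 8, and 1000 respectively.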

Number of apps

Most large enterprises have thousands of multiuser applications that are centrally supported.   It is not hard to imagine cutting this number in half.  Indeed many “rationalization” projects are aimed at just this.  The only caveat is, in doing so, be careful you don’t increase the average schema complexity per app (converging a bunch of simple apps onto an ERP system might seem like a complexity reduction until you realize how unnecessarily complex most ERP systems are). Most companies could profitably reduce their total number of apps by a factor of 5-10.

Schema / App

We are always amazed at how complex individual application schemas can become.  We’ve yet to find one that isn’t 10x more complex than it needs to be.  The more amazing thing is that the discipline that brings the complexity down also reduces unnecessary inter-application overlap.

Code / Schema

If you look very hard at application code (which we have done repeatedly) you find that most of it is mindless repetition.  Gartner have, finally, recognized this and identified the “low code/no code” movement.  This is really two movements with two different sponsors, but they come to almost the same place.  “Low code” is for developers who would like to automate a lot of what they do through code generation.  In some ways, this is just a newer better CASE tool from the 80’s (which is fine).  The “no code” movement is environments that allow non-programmers to define application behavior through models.

We were pioneers in this movement (we have the first patents from the late 1990’s).  What we have learned in the ensuing 20 years is that the optimal position is to create most of the behavior through model driven development, but then allow the high value use cases to be built in a custom way, while still adhering to the constraints set up in the model driven environment.

Our current belief is that 80-90% of enterprise use cases can now be handled with purely model driven approaches, and the remaining 10-20% will use far less code to create a highly customized experience.  Our new target is to have 80% of the use cases require zero lines of application code and 20% require 1000, for an average of around 200 lines of code per use case.  Compare this to QuickBooks, which has 100-200 use cases and therefore 50K-100K lines of code per use case, and you can start to see the scope of the change.
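A quick Python sketch of this comparison, using only the numbers quoted above (the helper name is illustrative): the target is a weighted average over the two classes of use case, and the QuickBooks range follows from dividing its code base by its estimated number of use cases.

```python
def average_loc_per_use_case(mix: list[tuple[float, int]]) -> float:
    """Weighted average lines of code, given (share of use cases, lines each)."""
    return sum(share * lines for share, lines in mix)


# Target mix from the article: 80% need no code, 20% need about 1000 lines.
target_average = average_loc_per_use_case([(0.8, 0), (0.2, 1_000)])

# QuickBooks figures from the article: 10 million lines, 100-200 use cases.
QUICKBOOKS_LINES = 10_000_000
quickbooks_low = QUICKBOOKS_LINES / 200   # if there are 200 use cases
quickbooks_high = QUICKBOOKS_LINES / 100  # if there are 100 use cases

print(f"Target: about {target_average:.0f} lines per use case")
print(f"QuickBooks: {quickbooks_low:,.0f} to {quickbooks_high:,.0f} lines per use case")
print(f"Ratio: {quickbooks_low / target_average:.0f}x to "
      f"{quickbooks_high / target_average:.0f}x")
```

On these figures the gap between the target and the QuickBooks baseline is between two and three orders of magnitude.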

Schema Overlap

Most enterprise applications have a lot of overlap (which is why systems integration is a business; if there were no overlap, there would be no need for integration).  The overlap is mostly ad hoc, and discovered after the fact.

If every time a new application was created we knew exactly what concepts were shared with the enterprise and which were locally created, it would be trivial to do integration.

The metric here should be unplanned overlap.  If there is no overlap, then the unplanned overlap is 0%.  If a new system is designed and built as an extension to the core, the “unplanned” overlap is also 0%, because we know the overlap ahead of time.

It is the overlap we create “accidentally,” and must resolve later, that counts as “schema overlap” here.  Anytime you buy a package application that has anything to do with your core business (which most of them do), you have committed yourself to “accidental schema overlap.”
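One way to picture the unplanned overlap metric is as a set calculation. The Python sketch below is only an illustration under strong assumptions: concepts are compared by name (real overlap is semantic, so differently named concepts can still overlap), and the function, the concept names, and the three example applications are all hypothetical.

```python
def unplanned_overlap(app_concepts: set[str],
                      declared_shared: set[str],
                      enterprise_concepts: set[str]) -> float:
    """Fraction of an application's concepts that duplicate concepts already
    present in the enterprise without having been declared as shared."""
    if not app_concepts:
        return 0.0
    accidental = (app_concepts & enterprise_concepts) - declared_shared
    return len(accidental) / len(app_concepts)


# Hypothetical core model plus concepts already defined by existing apps.
core = {"Person", "Organization", "Product", "Order"}
enterprise = core | {"Customer", "Invoice"}

# An application with no overlap at all.
isolated = unplanned_overlap({"BadgeReader"}, set(), enterprise)

# An application built as an extension of the core: overlap is known up front.
extension = unplanned_overlap({"Person", "Product", "WarrantyClaim"},
                              {"Person", "Product"}, enterprise)

# A purchased package that brings its own versions of existing concepts.
package = unplanned_overlap({"Customer", "Invoice", "Item", "TaxCode"},
                            set(), enterprise)

print(isolated, extension, package)
```

The first two cases score 0% for the reasons given above; only the package, which redefines concepts nobody planned to share, carries overlap that has to be resolved later.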

Adding it up

It is hard to break existing habits.  But this is exactly what we must do.

Even existing habits as useful as agile development (of which we are huge fans) need to be re-examined in light of: is this creating more silos?

Our prescription is:

  1. Define your enterprise core model / ontology (this is 100 times simpler than your enterprise data model which attempted to be the union of all your application models).  The core enterprise model is the essential core semantic model of the important concepts and their relationships.
  2. Inventory your current applications in terms of what subset of the core they are empowering, and whether they differ from the other applications dealing with the same type of items.
  3. Pilot a new application built right on the core model, which will showcase that it is possible to build an application that is pre-integrated and has far fewer lines of code per function than traditional applications.
  4. Estimate the savings that would be made by accelerating this pilot into the application space in general.

Summary

There are huge (100-1000 fold) improvements possible by adopting some of the data centric principles.  Some of these come from individual application modeling elegance.  Some come from sharing large parts of a model and thereby making integration very simple.  And some come from eliminating the need for code for many application use cases.

When you add these all up, the impact is huge.  The point is not a 10-20% improvement for a given application (which will easily be achieved), but getting a firm hold on the path to where the really big improvements lie.


About Dave McComb

Dave McComb is President of Semantic Arts, Inc., a Fort Collins, Colorado based consulting firm specializing in Enterprise Architecture and the application of Semantic Technology to Business Systems. He is the author of Semantics in Business Systems, and program chair for the annual Semantic Technology Conference.
