The Data-Centric Revolution

This is the first of a regular series of columns from Dave McComb. Dave’s column, The Data Centric Revolution, will appear every quarter. Please join TDAN.com in welcoming Dave to these pages and stop by often to see what he has to say.

We are in the early stages of what we believe will be a very long and gradual transition of corporate and government information systems. As the transition gets underway, many multi-billion-dollar industries will be radically disrupted. Unlike many other disruptions, the revenues currently flowing to information systems companies will not merely be reallocated to newer, more nimble players. Much of the revenue in this sector will simply evaporate as we collectively discover how large a portion of current IT spending is unnecessary.

The benefits will mostly accrue to the consumers of information systems, and those benefits will be proportional to the speed and completeness with which they embrace the change.

The Data Centric Revolution in a Nutshell

In the data centric enterprise, data will be a permanent shared asset and applications will come and go. When your re-ordering system no longer satisfies your changing requirements, you will bring in a new one, and let the old one go. There will be no data conversion. All analytics that worked before will continue to work. User interfaces, names of fields, and code values will be similar enough that very little training will be required.

The Current Status Quo is Application Centric

This is in stark contrast to what happens now. Currently, if your re-ordering function no longer meets your needs, you will typically create a patchwork of add-on systems. These add-ons provide partial solutions at the expense of ossifying the data structures and interfaces that were established to support them. Over time, the ability to change the system continues to decline until the lack of functionality crosses a threshold. At that point, a new application project is announced to solve the problem. The new system may be a package or it may be custom. It may be on premises or it may be in the cloud. The only thing that is guaranteed is that the new system will have a different data structure and data model.

If you buy a package or implement a SaaS system, the data model will already exist and will have no relation to your current system. If you custom build a system, you could in theory recreate your current data structures; however, this almost never happens. Designers believe, correctly in most cases, that a good part of the limitations of the existing system stem from its inadequate data structures.

Whatever the reason, the decision to change the data structures has many direct consequences, including:

Data Conversion – a data conversion becomes a necessary part of the system replacement project.

User Retraining – the change in the data structure ripples all the way through to the user interface and workflow. All the users of the system will need to be retrained to do their jobs. Not only will the number and sequence of screens change, but the names of the fields, the levels of abstraction, and the codes used in the system will all be different. Thousands of small, evolved processes and workarounds will be made obsolete.

Big Bang – due to the data conversion and user retraining, you will end up with a big bang project: no improvement at all until the conversion occurs. Big bang conversions have been shown to be high risk, and because of that they add further costs for risk mitigation efforts that only sometimes succeed.

Data Quality – most projects of this type spawn a “data quality” sub-project. The new system shines a light on data quality problems that have lain dormant for decades. Very often the new system can’t go live until the data quality issues are taken care of. Who can argue with data quality? But the truth is, most of what comes up as data quality issues in the conversion are just differences in integrity management and constraints between the old system and the new system. Perhaps there are more integrity checks in the new system than in the old, but most of the effort goes into reconciling constraints that are simply, arbitrarily different, a side effect of the two systems having different models.
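To make the data quality point concrete, here is a minimal sketch, assuming two hypothetical rule sets (the field names, the state code list, and the rules themselves are invented for illustration). The same customer record passes the legacy system's checks and fails the new system's, so a "data quality" issue surfaces even though nothing about the underlying fact has changed; only the constraints differ.

```python
# Hypothetical field names and rules, for illustration only.
# The legacy system accepted any string for state and tolerated a blank postal code;
# the replacement system enforces a fixed code list and a 5-digit postal code.
LEGACY_RULES = {
    "state": lambda v: isinstance(v, str),
}

NEW_RULES = {
    "state": lambda v: v in {"CO", "NY", "WA"},
    "postal_code": lambda v: isinstance(v, str) and len(v) == 5 and v.isdigit(),
}


def violations(record: dict, rules: dict) -> list[str]:
    """Return the names of the fields in `record` that fail the given rules."""
    return [name for name, check in rules.items() if not check(record.get(name))]


# One record, valid for decades in the old system.
customer = {"state": "Colo.", "postal_code": ""}

print(violations(customer, LEGACY_RULES))  # []
print(violations(customer, NEW_RULES))     # ['state', 'postal_code']
```

Nothing in the record was "wrong" by the old system's definition; the conversion effort goes into reconciling two arbitrary sets of rules.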

What is a bit subtle is that the sum total of these implications adds up, in almost every case, to large projects. These are typically projects large enough to need to go through the capital budgeting process (years can be lost here). These are projects that are big enough to fail, and big enough for people to notice when they fail.

The Other Way

Unfortunately, we are going to incur some of the downsides listed above when we first implement a data centric architecture. You will have a data conversion. The main goal of a data centric architecture is not to preserve the lousy data structures you have, but to create an environment where your next set of data structures can persist.

A data centric enterprise needs a few things that are currently uncommon, but are all technically quite feasible. We will touch on these briefly here and more completely in subsequent columns.

Some of the key requirements of a data centric architecture:

A federate-able data store – there are many reasons (performance and security being two) to believe that any reasonably sized data centric enterprise will rely on a large number of coordinated data stores that can be queried and updated individually or in groups.

A flexible data structure – current data structures are rigid (not innately flexible), and that promotes rigid coding practices (developers code to the data structures, which tends to lock them in place). More flexible structures (XML, JSON and RDF) promote flexible coding and allow the data structures to be extended in place.

Shared meaning – the data centric enterprise will need a way to communicate and propagate the meaning of the data in the system, as it can no longer rely on the application to be the arbiter of meaning.

Schema later – some call this “schema on read” to emphasize that a schema need not exist prior to writing. We prefer the term “schema later”, because it also covers extending the schema of data already in place, and making that schema available as a shared resource rather than just to the current reader.

Curated and uncurated data sets – there will be data sets that are highly curated, where integrity constraints are consistently applied. There will also be harvested and external data sets that are uncurated. The architecture will need a mechanism to enforce constraints on the curated sets independently of any application, and will need ways to deal with data sets of mixed levels of curation.

Identity Management – far beyond the identity of users, the system will need to manage the identities of everything else as well. The system needs to know whether a potential “add” is creating a person, widget, or transaction that the enterprise has already registered, and if so, notify the initiator. At the same time, the system will need to manage the inevitable and almost innumerable identity synonyms that our existing system of system-building has created.
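Several of these requirements (a flexible structure, schema later, and identity management) can be illustrated together in a few lines. The following is a minimal in-memory sketch, not a design for a real data centric platform: the `SharedStore` class, its methods, and the `ex:` property names are all hypothetical. Facts are held as RDF-style (subject, predicate, object) triples, so a new property can be attached to existing data in place; an "add" first checks whether an entity with the same identifying value is already registered; and legacy keys are recorded as synonyms of one canonical identity instead of becoming duplicates.

```python
from collections import defaultdict


class SharedStore:
    """Hypothetical, minimal triple store: every fact is (subject, predicate, object).

    Because facts are triples rather than fixed table columns, a new predicate
    can be added to existing data in place, with no schema migration.
    """

    def __init__(self) -> None:
        self.triples: set[tuple[str, str, str]] = set()
        self.synonyms: dict[str, str] = {}  # alias id -> canonical id

    def canonical(self, ident: str) -> str:
        # Follow an alias to the single identifier the enterprise has registered.
        return self.synonyms.get(ident, ident)

    def add_synonym(self, alias: str, canonical_id: str) -> None:
        self.synonyms[alias] = self.canonical(canonical_id)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # Facts about an alias are always stored against the canonical id.
        self.triples.add((self.canonical(subject), predicate, obj))

    def find(self, predicate: str, obj: str) -> set[str]:
        return {s for (s, p, o) in self.triples if p == predicate and o == obj}

    def register(self, new_id: str, key_predicate: str, key_value: str) -> tuple[str, bool]:
        """Create an entity unless one with the same identifying value exists.

        Returns (id, created); created=False tells the initiator the entity
        was already registered, and which id it has.
        """
        existing = self.find(key_predicate, key_value)
        if existing:
            return min(existing), False
        self.add(new_id, key_predicate, key_value)
        return new_id, True

    def describe(self, subject: str) -> dict[str, set[str]]:
        subject = self.canonical(subject)
        out: dict[str, set[str]] = defaultdict(set)
        for s, p, o in self.triples:
            if s == subject:
                out[p].add(o)
        return dict(out)


store = SharedStore()
print(store.register("person:1", "ex:email", "ann@example.com"))  # ('person:1', True)
print(store.register("person:2", "ex:email", "ann@example.com"))  # ('person:1', False)

# A legacy CRM key becomes a synonym rather than a second person.
store.add_synonym("crm:A-17", "person:1")
# "Schema later": a property no application defined up front, added in place.
store.add("crm:A-17", "ex:loyaltyTier", "gold")
print(store.describe("person:1"))
```

Matching on a single identifying property is a deliberate simplification; real identity resolution would weigh many attributes, and a production store would be federated and persistent rather than a Python set.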

A Long, Slow Road

I liken this to the electrification of factories. In the 1800s, factories ran on steam engines and complex systems of drive shafts, pulleys, and gears. By the 1880s the electric motor had been invented, and it was obvious that future factories would be populated exclusively by electric motors. And yet it was not until the 1920s that the revolution was mostly complete.

Many things allowed this inevitable transition to drag on for 40 years. Inertia was one. Another was an over-reliance on the idea that the steam engines and gears represented an investment (which they did), which encouraged re-investment in them. A complex set of skills and expertise also held the status quo in place far longer than it should have: architects whose expertise lay in designing factories around the large steam apparatus, workflow designers who were oriented around the drive shafts, attempts to replace the monolithic steam engine with a monolithic electric motor, a limited choice of electric machines, and the like.

But some companies did make it through the transition, and no doubt some more rapidly than others.

We need to remind ourselves that we are in the same position. The transition may now seem obvious and inevitable, but we will still face great resistance, and things will take far longer than they theoretically should. This is the nature of change.

Some of the Themes I Plan to Explore in Future Columns

In no particular order, here are some of the themes I plan to cover in future columns.

How to tell that you’re not already data centric

Data centric is far more than a data warehouse on steroids

Estimates on the order of magnitude improvement possible in the data centric enterprise

How we got in this mess, and what keeps us stuck there

What kind of data structures will endure through generations of applications

Case studies in the impact of not being data centric

Case studies in data centric

The Status Quo will strike back

The difference between a data centric application and a data centric enterprise

What does security look like in a data centric enterprise

How data centric deals with the huge numbers of current applications

Data centric and external and unstructured data

The role of identity management in data centric architectures

What does integrity management look like if it’s not application centric

By the way, if you have any case studies in adopting a data centric approach, I’d be happy to feature them. They are few and far between and need to get into the collective consciousness.

If you would like to declare your support, lend your name and credibility to the cause, and interact with like-minded professionals, please visit and sign the manifesto:

http://datacentricmanifesto.org


Dave McComb