The Data Centric Revolution

This is the first of a regular series of columns from Dave McComb. Dave’s column, The Data Centric Revolution, will appear every quarter. Please join TDAN.com in welcoming Dave to these pages and stop by often to see what he has to say.

We are in the early stages of what we believe will be a very long and gradual transition of corporate and government information systems. As the transition gets underway, many multi-billion-dollar industries will be radically disrupted. Unlike many other disruptions, the revenues currently flowing to information systems companies will not merely be reallocated to newer, more nimble players. Much of the revenue in this sector will simply evaporate as we collectively discover that a large portion of the current amount spent on IT is unnecessary.

The benefits will mostly accrue to the consumers of information systems, and those benefits will be proportional to the speed and completeness with which they embrace the change.

The Data Centric Revolution in a Nutshell

In the data centric enterprise, data will be a permanent shared asset and applications will come and go. When your re-ordering system no longer satisfies your changing requirements, you will bring in a new one, and let the old one go. There will be no data conversion. All analytics that worked before will continue to work. User interfaces, names of fields, and code values will be similar enough that very little training will be required.

The Current Status Quo is Application Centric

This is in stark contrast to what occurs now. Currently, if your re-ordering function no longer meets your needs, you will typically create a patchwork of add-on systems. These add-on systems will provide partial solutions, at the expense of ossifying the data structures and interfaces that were established to allow this. Over time, the ability to change the system continues to decline until a threshold of missing functionality is crossed. At this point a new application project is announced that will solve this problem. The new system may be a package or it may be custom. It may be on premise or it may be in the cloud. The only thing that is guaranteed is that the new system will have a different data structure and data model.

If you buy a package or implement a SaaS system, the data model will pre-exist and have no relation to your current system. If you custom build a system, you could in theory recreate your current data structures, but this almost never happens. Designers believe, correctly in most cases, that a good part of the limitations of the existing system stem from its inadequate data structures.

Whatever the reason, the decision to change the data structures has many direct consequences, including:

  • Data conversion – a data conversion becomes a necessary part of the system replacement project.
  • User Retraining – the change in the data structure ripples all the way through to the user interface and work-flow. All the users of the system will need to be retrained to do their jobs. Not only do the number of screens and the sequence change, the names of the fields, the levels of abstractions, and the codes used in the system will all be different. Thousands of small, evolved processes and workarounds will be obsoleted.
  • Big Bang – due to the data conversion and user training, you will end up with a big bang project. That is, there is no improvement at all until the conversion occurs. Big bang conversions have been shown to be high risk, and that risk adds cost: the project must pay for mitigation efforts that only sometimes succeed.
  • Data Quality – most projects of this type spawn a “data quality” sub project. The new system shines a light on data quality problems that have lain dormant for decades. Very often the new system can’t convert until the data quality issues are taken care of. Who can argue with data quality? But the truth is, most of what comes up as data quality issues in the conversion are just differences in integrity management and constraints between the old system and the new system. Perhaps there are more integrity checks in the new system than the old, but most of the effort goes into requirements caused by the fact that they are just arbitrarily different, which is a side effect of the different models.

What is a bit subtle is that the sum total of these implications adds up, in almost every case, to large projects. These are typically projects large enough to need to go through the capital budgeting process (years can be lost here). These are projects that are big enough to fail, and big enough for people to notice when they fail.

The Other Way

Unfortunately, we are going to incur some of the downsides listed above when we first implement data centric. You will have a data conversion. The main goal of a data centric architecture is not to preserve the lousy data structures you have, but to create an environment where your next set of data structures can persist.

A data centric enterprise needs a few things that are currently uncommon, but are all technically quite feasible. We will touch on these briefly here and more completely in subsequent columns.

Some of the key requirements of a data centric architecture:

  • A federate-able data store – there are many reasons (performance and security being two) to believe that any reasonably sized data centric enterprise will rely on a large number of coordinated data stores that can be queried and updated individually or in groups.
  • A flexible data structure – current data structures are rigid, and that promotes rigid coding practices (developers code to the data structures, which tends to lock them in place). More flexible structures (XML, JSON, and RDF) promote flexible coding and allow the data structures to be extended in place.
  • Shared meaning – the data centric enterprise will need a way to communicate and propagate the meaning of the data in the system, as it can no longer rely on the application to be the arbiter of meaning.
  • Schema later – some call this “schema on read” to emphasize the fact that schema need not exist prior to writing. We prefer the term “schema later”, because it also includes extending the schema to the data in place, and having it available as a shared resource and not just for the current reader.
  • Curated and uncurated data sets – there will be data sets that are highly curated, where integrity constraints are consistently applied. There will also be harvested and external data sets that are uncurated. The architecture will need a mechanism to enforce constraints on the curated sets, independent of any application, and will need ways to deal with data sets that are of mixed levels of curation.
  • Identity Management – far beyond the identity of users, the system will need to manage the identities of everything else as well. The system needs to know whether a potential “add” is creating a person, widget, or transaction that the enterprise already has registered, and if so, notify the initiator. At the same time, the system will need to manage the inevitable and almost innumerable identity synonyms that our existing system of system-building has created.
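
To make a few of these requirements concrete, here is a minimal sketch in Python. It is a toy, not a reference implementation. The class name SharedStore, its methods, and the identifiers such as "crm:cust-17" are all hypothetical and chosen only for illustration. The sketch assumes data is held as simple (subject, predicate, value) facts, loosely in the spirit of RDF. It shows data being written before any schema exists, a schema being attached later and checked against the data in place, and two identifiers from different systems being recorded as synonyms for the same thing.

```python
from collections import defaultdict


class SharedStore:
    """Toy store of (subject, predicate, value) facts, owned by no application."""

    def __init__(self):
        self.facts = set()    # the data itself
        self.schema = {}      # predicate -> expected Python type, added later
        self.synonyms = {}    # alternate identifier -> canonical identifier

    def canonical(self, identifier):
        # Follow synonym links until the canonical identifier is reached.
        while identifier in self.synonyms:
            identifier = self.synonyms[identifier]
        return identifier

    def add(self, subject, predicate, value):
        # "Schema later": nothing has to be declared before writing.
        self.facts.add((self.canonical(subject), predicate, value))

    def declare_same(self, alias, canonical_id):
        # Record that two identifiers name the same real-world thing,
        # then rewrite existing facts onto the canonical identifier.
        alias = self.canonical(alias)
        canonical_id = self.canonical(canonical_id)
        if alias == canonical_id:
            return
        self.synonyms[alias] = canonical_id
        self.facts = {(self.canonical(s), p, v) for s, p, v in self.facts}

    def extend_schema(self, predicate, expected_type):
        # Attach a constraint to data that is already in place.
        self.schema[predicate] = expected_type

    def violations(self):
        # Constraints are checked by the store, not by any one application.
        return [
            (s, p, v)
            for s, p, v in self.facts
            if p in self.schema and not isinstance(v, self.schema[p])
        ]

    def describe(self, subject):
        # Everything known about a thing, whichever identifier is used.
        subject = self.canonical(subject)
        result = defaultdict(set)
        for s, p, v in self.facts:
            if s == subject:
                result[p].add(v)
        return dict(result)


# Two hypothetical applications write about the same customer.
store = SharedStore()
store.add("crm:cust-17", "name", "Acme Corp")
store.add("erp:0042", "credit_limit", "50000")  # written as text, no schema yet
store.add("erp:0042", "reorder_point", 25)

store.declare_same("erp:0042", "crm:cust-17")   # identity synonym
store.extend_schema("credit_limit", int)        # schema arrives after the data

print(store.describe("erp:0042"))
print(store.violations())
```

In this sketch, the schema and the synonym table live with the data rather than inside either application, so a replacement application would inherit both. A real architecture would need far more (federation, security, and scale among other things), which this toy deliberately ignores.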

A Long, Slow Road

I liken this to the electrification of factories. In the 1800s, factories ran on steam engines and complex systems of drive shafts, pulleys, and gears. By the 1880s the electric motor had been invented, and it was obvious that the factories of the future would be populated exclusively by electric motors. And yet it was not until the 1920s that the revolution was mostly complete.

Many things allowed this inevitable transition to drag on for 40 years. Inertia was one. Another was an over-reliance on the idea that the steam engine and the gears represented an investment (which they did), an idea that also encouraged re-investment. There was a complex set of skills and expertise that held the status quo in place far longer than it should have stayed: architects whose expertise lay in designing factories for the large steam apparatus, work flow designers who were oriented around the drive shafts, attempts to replace the monolithic steam engine with a monolithic electric engine, few electric machines to choose from, and the like.

But some companies did make it through the transition, and no doubt some more rapidly than others.

We need to remind ourselves that we are in the same position. The change may now be obvious and inevitable, but we will still face great resistance, and things will take far longer than they theoretically should. This is the nature of change.

Some of the Themes I Plan to Explore in Future Columns

In no particular order, here are some of the themes I plan to cover in future columns.

  • How to tell that you’re not already data centric
  • Data centric is far more than a data warehouse on steroids
  • Estimates on the order of magnitude improvement possible in the data centric enterprise
  • How we got in this mess, and what keeps us stuck there
  • What kind of data structures will endure through generations of applications
  • Case studies in the impact of not being data centric
  • Case studies in data centric
  • The Status Quo will strike back
  • The difference between a data centric application and a data centric enterprise
  • What does security look like in a data centric enterprise
  • How data centric deals with the huge numbers of current applications
  • Data centric and external and unstructured data
  • The role of identity management in data centric architectures
  • What does integrity management look like if it’s not application centric

By the way, if you have any case studies in adopting a data centric approach, I’d be happy to feature them. They are few and far between and need to get into the collective consciousness.

If you would like to declare your support, lend your name and credibility to the cause, and interact with like-minded professionals, please visit and sign the manifesto:

http://datacentricmanifesto.org

About Dave McComb

Dave McComb is President of Semantic Arts, Inc., a Fort Collins, Colorado-based consulting firm specializing in Enterprise Architecture and the application of Semantic Technology to Business Systems. He is the author of Semantics in Business Systems, and program chair for the annual Semantic Technology Conference.

  • Richord1

    Great article. I would add however that before we embark on the journey of transition we need to look at the nature of work in organizations. Peter Drucker referred to the knowledge worker decades ago but it appears we have dumbed down work with data. We see many people in organizations pushing data around from one database or spreadsheet to another. Much of the time is spent trying to integrate, “master” or clean data.
    Our data design practices are technology driven. Optimize the design of data to satisfy the database rather than design data to empower the knowledge of workers.
    I envision a transition where white collar jobs will undergo a decline like we witnessed in manufacturing. Those who currently push data will be replaced by algorithms. Organizations will realize that silos of data create opportunities for rework rather than knowledge and as stated, will reduce their IT expenditures.
    We need to leap from data centric to knowledge centric designs. How to create knowledge rather than data. Technologists are not trained or versed in techniques of knowledge creation so we need new skills such as Data Literacy.

  • tfeltz

    Well written overview of the data-centric paradigm, which I strongly support. The biggest challenge is transforming an organisation from application-centric to data-centric in a world where purchased software packages (COTS) with undisclosed/undocumented data models dominate. Starting from scratch is hardly ever an option, though fully replacing an unmanageable/complex legacy IT with an elegant data-centric architecture might ultimately be the most (cost) effective option … if done well.

  • Gordon Everest

    Dave, I can see you are on a mission to change the world of IS, and an important mission it is. The proponents of data management (i.e., managing our organizational data resources) are too often seen as holding back progress in the development of information, their value not given the recognition it deserves. But deep in our hearts we know it is important to the life of an organization.
    I believe that is a central purpose of DAMA – for people responsible for data to come together to commiserate and learn how to get noticed, accepted, and meaningfully contribute to the success of organizations through the use of information systems.

    I have long felt and advocated that data resources must be the central focus of IS/IT and the development of information systems in organizations. Get the underlying data “right” and you have some chance of achieving integration (one view), interoperability, sharability, flexibility, evolvability to satisfy new and changing information system requirements.

    Thirty years ago I wrote about the “Copernican Revolution” in data processing [Gordon Everest, “Database Management: Objectives, System Functions, and Administration,” McGraw-Hill, 1986, pages 4-7 and 29; that was an outgrowth of my doctoral dissertation at the University of Pennsylvania, “Managing Corporate Data Resources,” 1974]. There I suggested that rather than view the world in terms of data being inputs and outputs of programs (application program centric), we must view the world with data at the center and programs revolving around the data, drawing from and adding to the organizational information resources. As Dave McComb correctly observes, the battle of the journey continues.
