A bombshell was recently dropped into the middle of our latest Agile project: a directive from management about a new data governance policy – absolutely no replication of data! From now on, all access to data outside a particular application database must be done via web services written by our integration group and implemented on our integration hub server. Our management refers to this as “an SOA approach to data governance.”
As a data management professional, I might be expected to be happy about this, but I’m not. From where I sit (in the middle of our project’s Ground Zero), this is an absolutely disastrous decision, on several levels.
First, it severely impacts our project’s schedule. Writing, implementing, and testing web services takes time, and most of this work must be done by resources external to – and therefore not committed to – the project. A data contract must be negotiated for each web service, the external work must be managed, and test cases must be created and executed, not to mention the application code that must be written (and tested) to call the web services. As I’ve mentioned before, our project has an inflexible, non-negotiable deadline. Taking on this extra work will make it harder for our team to meet this deadline.
Second, it introduces additional layers of complexity into the application. In addition to the application layer and the database layer, we now have a web service layer and an integration layer. Additional complexity means a greater maintenance burden and a greater potential risk of failure.
Third, it relegates the database to being nothing more than a persistence mechanism for application-specific data. If you’ve read my writings over the years, you know that my approach has always been to create application-neutral databases that encompass a particular subject area of the business, and which can be used to support multiple applications and multiple business uses of the data (including business analysis, reporting, quality improvement, relationship management, strategic planning, etc.). The SOA approach means that only data generated by a particular application will be stored in that application’s database and that all access to data (and use of the data) must be done via the application. This precludes any useful ad hoc querying of the data for any other business purposes!
Our current project involves aftermarket support of our company’s product. My original intention was to merge each product’s “as manufactured” data (stored in an IMS database on the mainframe) with updates to the product captured during servicing and support, to create a queryable “as maintained” view of each of our company’s products in the aftermarket. I think that data of this sort would be immensely valuable to our company’s business divisions. Alas, this is not to be.
One important thing that’s been overlooked in this decision is that relational databases exist for the purpose of facilitating ad hoc querying and reporting of data. If we wanted to keep our data secure, pristine, and accessible only to applications, we could keep it in IMS databases on the mainframe, and abandon the relational architecture altogether!
I believe that there is a crucial distinction to be made between the concepts of data management and data governance. The purpose of data management, properly understood, is to maximize the business value (ROI) of data. The purpose of data governance, as far as I can tell, is to establish centralized control of data quality so that its purity can be protected from the unwashed masses of business users who might, God forbid, copy it into Excel spreadsheets or Access databases and try to change it.
The question that needs to be asked here is whether, and to what extent, the business is being helped or harmed by replicated data. Our data managers are constantly railing about replicated data that exists in multiple databases, but the reality is that our company runs a lot of applications, and supports a lot of critical business processes, using replicated data. In fact, as I’ve often stressed, enabling the easy reuse of data by the business is the key to increasing the business value of data.
Does this mean that we shouldn’t be concerned about data quality? Absolutely not! Ensuring the quality of data is also key to enabling its reuse. Just as an application component won’t be reused if it’s found to be buggy, data won’t be reused if it’s found to be incorrect or untrustworthy. However, managing data to ensure quality is not the same thing as locking it up to ensure purity. Rather than insisting on “a single source of the truth” (a phrase I absolutely detest), it is more important to know how (and to what extent) a given piece of data residing in a particular location is authoritative to a certain area of the business. As long as we can say, “the authoritative source of data for as-manufactured products is the IMS database,” it shouldn’t matter whether that data is then replicated to some more easily accessible location (as long as the cost of doing so does not exceed the added value to the business!).
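To illustrate what I mean by knowing where data is authoritative, here is a minimal sketch in Python of a catalog that records, for each location, which subject it holds and whether it is the authoritative source. The class, the function names, and the “aftermarket replica” entry are hypothetical illustrations of mine, not a description of any system we actually run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataLocation:
    name: str            # where the data physically lives
    subject: str         # what the data describes
    business_area: str   # which part of the business relies on this copy
    authoritative: bool  # True for the source of record, False for a replica


# Hypothetical catalog entries; only the IMS database is named in this article.
CATALOG = [
    DataLocation("IMS database", "as-manufactured products", "manufacturing", True),
    DataLocation("aftermarket replica", "as-manufactured products", "aftermarket support", False),
]


def authoritative_source(subject, catalog=CATALOG):
    """Return the single location that is authoritative for a subject."""
    matches = [loc for loc in catalog if loc.subject == subject and loc.authoritative]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one authoritative source for {subject!r}")
    return matches[0]


def replication_is_justified(added_value, cost):
    """Replication is acceptable as long as its cost does not exceed its added value."""
    return cost <= added_value
```

With a catalog like this, a replica can exist alongside the source of record without anyone being confused about which one to trust.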
It also needs to be understood that replicating a subset of data from one location to another can often change the business context of that data. As mentioned above, the purpose of replicating the mainframe data was not to make it more easily accessible (although I don’t think this would be a bad thing); it was to change the business context of that data from an “as manufactured” view to an “as maintained” (aftermarket) view. Not all of the “as manufactured” data would have been replicated; only that portion of the data necessary and sufficient to give business users a complete aftermarket view of a product.
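To make the idea concrete, here is a minimal sketch in Python of how such an “as maintained” view might be assembled. Every field name in it (serial_number, component_id, part_number, new_part_number, service_date, and so on) is a hypothetical stand-in of mine; nothing here reflects the actual IMS segments or our service system. It copies only a chosen subset of the as-manufactured fields, then applies service events in date order.

```python
# Hypothetical subset of as-manufactured fields needed for the aftermarket view.
AFTERMARKET_FIELDS = ("serial_number", "model", "component_id", "part_number")


def replicate_subset(as_manufactured_rows, fields=AFTERMARKET_FIELDS):
    """Copy only the fields the aftermarket view needs, dropping everything else."""
    return [{f: row[f] for f in fields} for row in as_manufactured_rows]


def as_maintained(as_manufactured_rows, service_events):
    """Overlay service events on the replicated subset, oldest event first."""
    view = {
        (row["serial_number"], row["component_id"]): dict(row, source="as_manufactured")
        for row in replicate_subset(as_manufactured_rows)
    }
    # ISO-format date strings (YYYY-MM-DD) sort correctly as plain text.
    for event in sorted(service_events, key=lambda e: e["service_date"]):
        key = (event["serial_number"], event["component_id"])
        if key not in view:
            continue  # this sketch ignores events for components it does not know
        view[key]["part_number"] = event["new_part_number"]
        view[key]["last_serviced"] = event["service_date"]
        view[key]["source"] = "service"
    return list(view.values())
```

Each row in the result says whether its current value came from manufacturing or from a later service event, which is exactly the change of business context I am describing.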
The emphasis on “data governance” reminds me, uncomfortably, of the old Catholic Church, with its focus on centralized control of people’s spiritual lives (only priests were allowed to conduct the Mass, or interpret the Bible). Martin Luther’s Reformation, with its concept of the “Priesthood of the Believer,” gave people the freedom to conduct their own spiritual lives as they saw fit.
What’s needed here is, if not a Reformation, at least a Vatican II of data governance, an approach which acknowledges that users have a right to quickly and easily access the data they need to manage their business. We absolutely do not want to return to the “bad old days” of mainframe-centric, centralized IT, where users who needed data often had to wait months for a specialized report or data extract to be created. In today’s complex, fast-paced, Agile environment, centralized control of data simply won’t work!
NOTE: I’d like to make this a dialogue, so please feel free to email questions, comments and concerns to me. Thanks for reading!