Crossing the Data Divide: Metrics Stores Remind Me That Data Work Is Hard

If you haven’t heard about metrics stores yet, they’re “newish,” so you likely will. They are interesting to an extent, but mostly, they feel like a late-night re-run and remind me that data work is hard. 

So, what is a metrics store? Most of the young vendors trying to create this category will tell you it’s a unified way to define and manage business metrics, abstracting away the confusion and sprawl of layers of BI tools into a common semantic meaning. It sounds like a utopia, right? 

Upon further investigation, what becomes obvious to those of us who have been around a while is that it looks a lot like online analytical processing (OLAP) and cubes or, in some flavors, multidimensional online analytical processing (MOLAP). The general idea is to define common dimensions (customer, date, product) and their hierarchies, along with some precomputed measures. BI and analytic tools can then use those definitions to query data. Depending on the query, results may be served from the middle layer provided by the metrics store, or the query may drill through to the operational sources. 
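To make the general idea concrete, here is a minimal sketch in Python of what a shared metric definition and the query it generates might look like. The dimension names, the net_revenue metric, and the compile_query helper are hypothetical illustrations of the pattern, not any particular vendor’s API.

```python
from dataclasses import dataclass, field

# Shared conformed dimensions and their hierarchies (illustrative only).
DIMENSIONS = {
    "date": ["year", "quarter", "month", "day"],
    "customer": ["region", "segment", "customer_id"],
    "product": ["category", "subcategory", "product_id"],
}

@dataclass
class Metric:
    name: str                     # business-facing metric name
    expression: str               # aggregation over fact-table columns
    table: str                    # source fact table
    dimensions: list[str] = field(default_factory=list)  # dimensions it can be sliced by

# A central registry of definitions that every BI tool would share.
REGISTRY = {
    "net_revenue": Metric(
        name="net_revenue",
        expression="SUM(order_amount - order_discount)",
        table="fact_orders",
        dimensions=["customer", "date", "product"],
    ),
}

def compile_query(metric_name: str, group_by: list[str]) -> str:
    """Turn a shared metric definition into a SQL string a BI tool could run."""
    metric = REGISTRY[metric_name]
    unknown = [d for d in group_by if d not in metric.dimensions or d not in DIMENSIONS]
    if unknown:
        raise ValueError(f"{metric_name} cannot be sliced by {unknown}")
    cols = ", ".join(group_by)
    return (
        f"SELECT {cols}, {metric.expression} AS {metric.name} "
        f"FROM {metric.table} GROUP BY {cols}"
    )

if __name__ == "__main__":
    # Every downstream tool asks the same question the same way.
    print(compile_query("net_revenue", ["customer", "date"]))
```

The pitch is that every tool asking for net_revenue gets the same expression, grain, and source table. Whether that holds up across an entire enterprise is exactly the question raised below.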

I’m trying hard to be constructive and not just moan, but come on. We have seen this re-run before. Cognos, Business Objects, Hyperion, Qlik, Tableau, and MicroStrategy, to name a few, have all included this kind of capability in their products for years. If it’s so great and will provide the entire enterprise with a unified semantic meaning and understanding of the business, why hasn’t it? These were all billion-dollar businesses before being acquired. If their push did not do it, how will a fresh crop of startups do it by “reinventing” the concept? 

The simple answer is that it won’t, and they won’t. I am not saying that they will not grow, but if they do, it will be because they sell into departments that have a narrow set of requirements or are simply so sick of their current tools that they are willing to give it a go. 

Once Upon a Time, We Tried to Find a Shortcut 

This topic got me thinking about data modeling, requirements, and data transformation. Way back when, we first pursued dimensional modeling and warehousing to architect a solution for answering a range of analytic questions for specific business processes. The idea was that if a question could not be answered using the model, then the model would be enhanced. 

The problem was that it was hard, and hard takes time, and time was not something the business could provide. 

Then came an entire generation of graphical reporting tools, and the message was essentially: It’s not hard. They (IT) just need to empower others and break down barriers to data. So, we started doing that, chipping away at the engineering discipline of data architecture and modeling, which was a young but maturing profession serving the enterprise. 

As the chipping away progressed, the capabilities of the graphical reporting tools increased. Surprisingly (sarcasm), SQL query response time was too slow. Primarily, it was because tool users did not know how to write efficient SQL, were poor modelers, were running against operational data, or all of the above. So, the tool vendors introduced modeling capabilities. Theoretically, the tool users could now structure and refresh their data so their queries would run faster. The problem was that you had to know how to model, and you had to know how to transform. So, the tool vendors added data transformation capability. Some departments did OK. After all, the vendors I cited above grew, but from an enterprise perspective, it was silos on steroids. 

Why do so many enterprises have so many BI tools if each is “the answer”? The truth is that each time things got hard, they concluded, with the help of vendors, that they had the wrong tool, so they switched. Some are still cycling through this pattern. 

Somewhere along the way, the tool vendors decided that the answer to the speed problem wasn’t to return to a disciplined approach to data, but to add a new mousetrap. The mousetrap was the ability to define common dimensions and precompute metrics — enter OLAP, MOLAP, and HOLAP. 
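To show what that mousetrap amounts to, the sketch below precomputes a measure across a couple of shared dimensions so that dashboard queries hit a small aggregate instead of the raw fact table. It uses pandas and made-up fact_orders data purely for illustration.

```python
import pandas as pd

# Hypothetical fact table rows; in practice these would come from the warehouse.
fact_orders = pd.DataFrame(
    {
        "month": ["2024-01", "2024-01", "2024-02", "2024-02"],
        "product": ["widget", "gadget", "widget", "gadget"],
        "order_amount": [120.0, 80.0, 150.0, 60.0],
    }
)

# "Precompute" the aggregate once, on a schedule, instead of at query time.
revenue_by_month_product = (
    fact_orders.groupby(["month", "product"], as_index=False)["order_amount"]
    .sum()
    .rename(columns={"order_amount": "net_revenue"})
)

# A dashboard query then becomes a cheap lookup against the stored aggregate.
print(revenue_by_month_product[revenue_by_month_product["month"] == "2024-02"])
```

It is faster, but only because someone already decided which dimensions and measures matter, which is the modeling work the tools were supposed to make unnecessary.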

The BI vendors are certainly not the only ones to blame for our mess. There was the age of big data and Hadoop, which was “it,” and now we are in the age of cloud-based data lakes and hybrid on-premises/cloud core business applications. 

What some thought would help us was a bet on the younger generations entering the workforce. They would make the “business” tech-savvy and capable of self-service. To a degree, that is true and has helped, but being unafraid to adopt new technology is a far cry from having professional training and expertise in data modeling, architecture, and data transformation. 

Will the Data Mesh Save Us? 

Data mesh is interesting to think about in this context. One of its primary pillars is decentralizing data engineering and analytics from the center to business-aligned domains. It essentially says to turn the mess over to each domain and have them produce data products that form an “agreement” for sharing. My fundamental issue with it is the same problem we started with in the days of dimensional data warehouses. Data modeling, architecture, and transformation constitute a professional discipline that requires specific training and experience. There is no free lunch. We can’t self-service it to the business, re-organize it into decentralized domains, or embed it in fancy tool capabilities. 
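For readers newer to the data mesh vocabulary, that “agreement” is usually described as a data contract the owning domain publishes with its data product. A minimal sketch, assuming a hypothetical sales domain and made-up field names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataContract:
    """A hypothetical, minimal data-product contract published by a domain."""
    product: str                     # name of the data product
    owner: str                       # the business-aligned domain accountable for it
    schema: dict[str, str]           # column name -> type the domain promises to keep stable
    freshness_sla_hours: int         # how stale the data is allowed to be
    quality_checks: tuple[str, ...]  # checks the domain runs before publishing

orders_contract = DataContract(
    product="orders_daily",
    owner="sales-domain",
    schema={"order_id": "string", "order_date": "date", "net_revenue": "decimal(18,2)"},
    freshness_sla_hours=24,
    quality_checks=("order_id is unique", "net_revenue >= 0"),
)
```

Writing the contract down is the easy part; someone in that domain still has to know how to model and transform the data behind it, which is the point above.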

What About a Return to Defining Business Processes? 

Back to metrics stores, and in case it’s not perfectly obvious, I don’t think they’re “the thing.” I don’t see them providing the one definition of unified metrics for the enterprise. 

I would like to see a return to the days when we paid attention to business processes and business process modeling. It was a thing at one time, and people understood that you modeled the processes of the entire business and then created data models to satisfy its needs and requirements. 

Somewhere along the way, as we moved to packaged applications and shifted away from in-house application development, that discipline was lost, and it became expected that the “data people” would become responsible for identifying and satisfying the business requirements. 

I would like to see a move back to business-led requirements definition and management. Instead of a rehash of OLAP, how about a metric management platform that makes the business responsible for defining its metrics and holds data and analytics professionals accountable for delivering on them? This would be a healthier and more productive use of business resources than trying to turn them into data professionals. 
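To sketch what that split of responsibility could look like, the example below keeps the business-owned definition separate from the data-team-owned implementation. The field names and the review_metric helper are assumptions for illustration, not a description of any existing product.

```python
from dataclasses import dataclass

@dataclass
class MetricDefinition:
    """Business-owned part: what the metric means and what target it serves."""
    name: str
    business_owner: str
    plain_language_definition: str
    target: str

@dataclass
class MetricImplementation:
    """Data-team-owned part: how, where, and how reliably the metric is delivered."""
    metric_name: str
    data_owner: str
    source_tables: list[str]
    transformation: str        # pointer to the pipeline or model that produces it
    delivery_status: str       # e.g., "in development", "certified", "deprecated"

def review_metric(defn: MetricDefinition, impl: MetricImplementation) -> str:
    """A toy accountability check: is the business-defined metric actually delivered?"""
    if defn.name != impl.metric_name:
        return f"{defn.name}: no implementation registered"
    return (
        f"{defn.name}: defined by {defn.business_owner}, "
        f"delivered by {impl.data_owner} ({impl.delivery_status})"
    )

churn_defn = MetricDefinition(
    name="customer_churn_rate",
    business_owner="VP Customer Success",
    plain_language_definition="Share of customers active last quarter with no orders this quarter.",
    target="< 5% per quarter",
)
churn_impl = MetricImplementation(
    metric_name="customer_churn_rate",
    data_owner="analytics engineering",
    source_tables=["fact_orders", "dim_customer"],
    transformation="churn_quarterly pipeline",  # hypothetical pipeline reference
    delivery_status="certified",
)

print(review_metric(churn_defn, churn_impl))
```

The business side never touches SQL or pipelines here; it owns the definition and the target, and the data professionals own everything required to deliver it.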

Silver Lining 

The possible silver lining is what’s happening with the Data Engineer role. Enterprises now recognize it as an official role with a career path. I accept this as an admission that data work is hard, and not just anyone with a tool can do it. 

We also see a new generation of more tightly integrated tooling supporting the role. In fact, Databricks just announced last week an entire slew of enhanced features for data engineers, including, yep, a metrics store extension to its Unity Catalog. It’s too early to know whether and how it may be more useful than a rehash of OLAP, but at least it’s more tooling for the experts. 

My hope is that we are finally getting back to trusting the experts, maturing the discipline of data engineering, and giving them the right tools to move at the speed of the business. 

John Wills

John Wills is the Founder & Principal of Prentice Gate Advisors and is focused on advisory consulting, writing, and speaking about data architecture, data culture, data governance, and cataloging. He is the former Field CTO at Alation, where he focused on research & development, catalog solution design, and strategic customer adoption. He also started Alation’s Professional Services organization and is the author of Alation’s Book of Knowledge, implementation methodology, data catalog value index, bot pattern, and numerous implementation best practices. Prior to Alation, he was VP of Customer Success at Collibra, where he had responsibility for building and managing all post-sales business functions. In addition, he authored Collibra’s first implementation methodology. John has 30+ years of experience in data management with a number of startups and service providers. He has expertise in data warehousing, BI, data integration, metadata, data governance, data quality, data modeling, application integration, data profiling, and master data management. He is a graduate of Kent State University and holds numerous architecture certifications, including ones from IBM, HP, and SAP.
