Remember the Sears catalog? Hundreds of pages of clothes quickly flipped past to get to the toys! Who didn’t spend countless hours trying to find the perfect suggested gift for their next birthday or holiday?
In an Amazon and eBay world, thinking about old-school catalogs seems so quaint. You opened a paper volume, manually flipped through enormous lists of options, and then spent a long time finding the right answer.
How is this possibly the best way to help people find the data they need today?
It’s not.
To be fair, today’s data catalogs aren’t printed on paper. They are electronic and enabled with much more robust search abilities than indexes in the back. The lists are still there though, and I don’t know if I’ve seen a data catalog that is anywhere near as enjoyable or well-curated as the old Sears catalog was. There also aren’t any drum sets, which is a pretty big miss if you ask me.
The main issue with data catalogs is not that they are useless—finding and understanding data to work with is important! But these functions are only part of the solution. Once we have learned everything about what we want to do, we still need to DO SOMETHING!
In the Sears catalog days, when we wanted to place an order, we either had to call the 800 number or send back that weirdly folded envelope stapled into the middle pages. In today’s data catalog world, we need to bounce over to another application outside the data catalog to create a visualization, do analysis, or otherwise serve a data-driven function that results in data value to the organization. This is the dream, anyway.
This 1980s-inspired approach to finding and working with data is antiquated. Data management folks tend to be separate from data engineering, who tend to be separate from applications development, but that is no excuse. Data catalogs are really an incomplete answer to a much larger data lifecycle problem. We got stuck at this incomplete result because it was more convenient to keep our working silos than to address the whole problem.
Before we discuss the right answer, let’s first understand what is missing. Data catalogs typically provide a mostly decoupled human interface to contextual information about the data assets and related systems of an organization. The best data catalogs combine both business and technical metadata and serve a wide variety of users with a wider variety of use cases. It’s like listing the adjectives related to the nouns of data.
Some data catalogs can refresh some of their metadata through automated data crawlers and the like, hence the “mostly” above. Rarely do today’s data catalogs do an effective job of capturing changes to data assets and systems without a healthy amount of human effort. As a result, data catalogs usually fall behind reality in some ways before they even make it to production, let alone stay up to date afterwards. Strike one!
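To make “falling behind reality” concrete, here is a minimal sketch of the kind of check an automated crawler might run: compare the schema the catalog has on file against the schema the live source reports. Everything here is hypothetical and for illustration only: the `SchemaDrift` class, the `detect_drift` function, and the sample table are my own assumptions, not any particular catalog product’s API.

```python
from dataclasses import dataclass


@dataclass
class SchemaDrift:
    """Differences between what the catalog says and what the source has."""
    missing_from_catalog: dict  # columns in the source that are undocumented
    missing_from_source: dict   # columns documented but no longer present
    type_changes: dict          # column -> (catalog_type, source_type)

    @property
    def is_stale(self) -> bool:
        # Any difference at all means the catalog entry needs attention.
        return bool(
            self.missing_from_catalog
            or self.missing_from_source
            or self.type_changes
        )


def detect_drift(catalog_schema: dict, source_schema: dict) -> SchemaDrift:
    """Compare two {column_name: type_name} mappings (hypothetical format)."""
    added = {c: t for c, t in source_schema.items() if c not in catalog_schema}
    removed = {c: t for c, t in catalog_schema.items() if c not in source_schema}
    changed = {
        c: (catalog_schema[c], source_schema[c])
        for c in catalog_schema.keys() & source_schema.keys()
        if catalog_schema[c] != source_schema[c]
    }
    return SchemaDrift(added, removed, changed)


# Made-up example: the catalog entry was written before the table changed.
catalog_entry = {"order_id": "int", "customer": "str", "total": "float"}
live_table = {"order_id": "int", "customer_id": "int", "total": "decimal"}

drift = detect_drift(catalog_entry, live_table)
print("stale:", drift.is_stale)
print("undocumented columns:", drift.missing_from_catalog)
print("vanished columns:", drift.missing_from_source)
print("type changes:", drift.type_changes)
```

A check like this only covers technical metadata. The business descriptions attached to a renamed column still need a human, which is where the effort piles up.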
Data catalogs are generally standalone affairs, requiring people who want the information in the catalog to use the catalog directly. Security and access controls can be addressed through groups, single sign-on (SSO), and other techniques with a little effort, but there’s a bigger problem: we now need to make potential data catalog users aware that the data catalog exists, and then train them to use it! A system that neither creates the data nor does the analysis is not exactly what most sane people want to spend a lot of time learning. A big strike two!
Finally, the piece that seals their fate is that data catalogs are still mired in the “look for something, then review the list” nonsense of 1980s paper catalogs. The data catalog contains context and descriptions, but then leaves it to the human mind to decide and then pick up the phone and call 1-800-GIMME-THAT-DATA. This is just bananas. Strike three!
We need to find a way to deliver much more than current data catalogs, and I see this reasonably happening in one of two ways:
Option 1 – Lose the Interface
Data catalogs have a lot of useful descriptions. If we can make those available inside the other data tools we already use, we amplify our abilities with those tools without the friction that comes with today’s standalone applications, which require marketing, training, design, etc. The benefit is that we keep doing much the same catalog work we do today, but deliver it through APIs and other background connectors so that we ask far less of the user.
The downside is that many of the data applications we would want to overlay or feed information to likely won’t play nicely, so we may not be able to achieve everything we want. At least we are getting the data catalog away from being something standalone and putting it alongside the data capabilities that really matter!
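Here is a rough sketch of what “losing the interface” could look like in practice: the catalog becomes a lookup that other tools call in the background, so descriptions show up next to the data instead of in a separate application. The `CATALOG` dictionary, the function names, and the table and column names are all invented for illustration. A real catalog would sit behind an API or connector rather than an in-memory dictionary.

```python
# Hypothetical stand-in for a catalog service, keyed by (table, column).
CATALOG = {
    ("sales.orders", "order_id"): "Unique identifier for each order",
    ("sales.orders", "total"): "Order total in US dollars, tax included",
}


def lookup_description(table, column, catalog=CATALOG):
    """Return the catalog description for a column, or None if undocumented."""
    return catalog.get((table, column))


def annotate_columns(table, columns, catalog=CATALOG):
    """What a BI or notebook tool might call to label the columns it displays."""
    return [
        {
            "column": column,
            "description": lookup_description(table, column, catalog)
            or "(not yet documented)",
        }
        for column in columns
    ]


# The user never opens the catalog; the tool pulls the context for them.
annotated = annotate_columns("sales.orders", ["order_id", "total", "coupon_code"])
for row in annotated:
    print(f"{row['column']}: {row['description']}")
```

Note that the undocumented column is surfaced rather than hidden. Gaps in the catalog become visible exactly where people are working, which is a nudge to fix them.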
Option 2 – Expand from Data Catalog to Knowledge Engine
What data catalogs do, even today, is important; it just isn’t enough. So why not expand the footprint entirely and cover everything in the data lifecycle after data creation? That means truly integrating the informative context from the data catalog alongside real data capabilities! It’s this combination of insights and computational horsepower that will lead us to knowledge and the ability to transform our businesses for the better with data!
The benefits are clear. We would gain much clearer understanding of where data is and how to use it, with less friction in the process of making it happen. We would not have to remember that the data catalog is a thing that we need to learn how to use and navigate separately. Additionally, this would allow us to eventually leverage more powerful machine learning and artificial intelligence techniques to identify relationships amongst data assets—things that today are arduously manual processes.
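As a taste of what automating relationship discovery might involve, here is a deliberately simple sketch. It is not machine learning, just set overlap (Jaccard similarity) between sampled column values, used to suggest which columns might be related. The function names, the 0.5 threshold, and the sample data are all assumptions made up for this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of distinct values two columns have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_relationships(columns, threshold=0.5):
    """columns maps 'table.column' to sample values; returns likely pairs."""
    suggestions = []
    for (name_a, values_a), (name_b, values_b) in combinations(columns.items(), 2):
        score = jaccard(values_a, values_b)
        if score >= threshold:
            suggestions.append((name_a, name_b, score))
    # Strongest candidates first.
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


# Made-up samples from three hypothetical tables.
samples = {
    "orders.customer_id": [1, 2, 3, 4],
    "customers.id": [1, 2, 3, 4, 5],
    "products.sku": ["A1", "B2"],
}

related = suggest_relationships(samples)
for name_a, name_b, score in related:
    print(f"{name_a} <-> {name_b} (overlap {score:.2f})")
```

A person would still confirm each suggestion, but reviewing a short ranked list beats hunting for the relationships by hand.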
The challenge here is that it will be hard, especially with our siloed personnel and self-imposed blinders that have us saying things like “that’s not my job” or “I’m not technical.” It’s time to recognize that it isn’t about the work we are comfortable doing. It is about the organizations we are here to improve with the data capabilities we must create.
Let’s be more willing to question these tools being sold to us—and if our job is to build the tools, let’s think more about what truly benefits our customers instead of what is easiest to monetize. And until next time, go make an impact!