It was a pleasure to attend a data warehousing event in Austin, Texas, in early December to take a few courses and to meet with colleagues and practitioners in data management.
One of the interesting conversations I had was with a data architect who wondered whether he should continue to focus on using entity-relationship data models to design star schemas, with their facts and dimensions. He didn’t put his question in these terms, but he was asking, in effect: does the data warehouse still have a place in a modern corporate information architecture?
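For readers who haven’t built one: a star schema pairs a central fact table of measurements with surrounding dimension tables of descriptive attributes. Here is a minimal sketch using Python’s built-in sqlite3 module; the table and column names are my own illustrative choices, not anyone’s production design.

```python
# A minimal star schema sketch, built with Python's built-in sqlite3 module.
# Table and column names (dim_date, fact_sales, etc.) are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables hold descriptive attributes.
    CREATE TABLE dim_date (
        date_key    INTEGER PRIMARY KEY,
        full_date   TEXT,
        month       TEXT,
        year        INTEGER
    );
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        name        TEXT,
        category    TEXT
    );
    -- The fact table holds measures, keyed to the dimensions.
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,
        revenue     REAL
    );
""")
```

Reporting queries then join the fact table to whichever dimensions a given report needs.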
I thought William McKnight gave a good answer to this question in his conference course, “Introduction to NoSQL for Those Used to SQL.” Early in the course he showed a diagram he labeled “The No-Reference Architecture.” It showed just about every data management technology and technique in current use, including graph, columnar, in-memory, and other NoSQL databases, data stream processing, and data in the cloud, but also operational and analytical SQL databases, master data management, a data integration layer, data warehouses, and data marts. Our data management landscape has grown richer with new technologies and new processing techniques, but these new techniques supplement the old ones rather than replacing them.
There is a shift in the status of the data warehouse, though. It no longer rules the roost as the centerpiece of analytics and reporting; it now shares that role with more dynamic graph databases and stream processing. But there will still be a need for many of the more static reports that give managers visibility into operations in the parts of the business that remain stable. After all, even Amazon’s giant retail operation doesn’t fundamentally change, even though the products being sold change all the time. Online giant Facebook, which gave the world the NoSQL Cassandra database, famously announced at the 2013 TDWI Conference in Chicago its intention to develop SQL-based business data reporting.
Outside of analytics and reporting, the newer technologies are making new operations possible. Customers now expect their retailers to suggest products they might also want to buy; information providers are expected to return search results that reflect what the user means by a search and not just pages with similar words; and social networking sites are expected to deduce relationships using data brought together from many diverse sources.
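To make the first of those expectations concrete, here is a toy sketch of co-purchase recommendation. The order data is invented, and real recommenders are far more sophisticated, but the core idea of counting what sells together is the same.

```python
# A toy co-purchase recommender: count how often products appear together
# in the same order, then suggest the most frequent companions.
# The order data below is invented for illustration.
from collections import Counter
from itertools import combinations

orders = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "butter"},
]

# Count how often each pair of products appears in the same order.
pair_counts = Counter()
for order in orders:
    for a, b in combinations(sorted(order), 2):
        pair_counts[(a, b)] += 1

def suggest(product, top=3):
    """Products most often bought alongside the given one."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == product:
            scores[b] += n
        elif b == product:
            scores[a] += n
    return [p for p, _ in scores.most_common(top)]

print(suggest("bread"))  # ['milk', 'eggs', 'butter']
```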
Even though these new technologies are crowding the data management space, very few of the old technologies will fall into disuse. As a result, data professionals face higher demands to know an ever-larger mix of technologies and their proper uses. We have already seen an increase in specialization in our profession, from a time when everyone was a “programmer,” to a world with data architects, system architects, data analysts, data scientists, computational linguists, and on and on. We’ve seen this evolution toward specialization before, in the medical world. As medical knowledge has grown, so has specialization, and now there’s practically a different doctor for every organ. I remember when I tore a ligament in my knee, I was warned not to take advice from the back doctor at the orthopedic practice. “He’s the best back doctor there is,” they told me, “but he doesn’t know anything about knees.”
We are seeing the same hyper-specialization in information technology, and it’s neither good nor bad; it’s just necessary. No one can know it all, and more needs to be known every day. So if you’re a data warehouse specialist, know when to refer operational document storage questions to the right NoSQL specialist, but still take on those traditional business reporting needs. You might want to learn about graph data analytics and how it could add to the insights your warehouse can produce, as the sketch below suggests, but you probably shouldn’t even consider replacing your star schemas with a graph.
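Here is one hedged sketch of what “graph analytics adding to warehouse insights” might look like: project (customer, product) pairs from a fact table into a graph, and look for clusters of customers linked by shared purchases. The data and names are invented, and networkx is just one of several graph libraries that could play this role.

```python
# A sketch of graph analytics supplementing (not replacing) a star schema:
# project warehouse sales rows into a customer-product graph and find
# clusters of customers connected by shared purchases. Data is invented.
import networkx as nx

# (customer, product) pairs, as might be extracted from a fact table
sales = [
    ("alice", "laptop"), ("alice", "mouse"),
    ("bob",   "mouse"),  ("bob",   "keyboard"),
    ("carol", "blender"),
]

g = nx.Graph()
g.add_edges_from(sales)

# Each connected component is a cluster of customers plus the products
# that link them, an insight the star schema stores but does not
# surface directly.
for component in nx.connected_components(g):
    print(sorted(component))
# ['alice', 'bob', 'keyboard', 'laptop', 'mouse']
# ['blender', 'carol']
```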
Data modeling shifts its place in the new landscape, too: from a purely prescriptive tool, dictating what we design, to a descriptive tool, enabling us to visualize data in whatever schema it arrives with. There’s a bit of an interesting paradox in that most of our analysis of unstructured data is a search for the structure, the repeated patterns, that is present there. It sure would be nice to be able to visualize the data that comes to us with no evident structure, and to visualize the structure we’ve found in some subsets of that data. The data modeling tool Hackolade (hackolade.com) is beginning to deliver such capabilities for a variety of NoSQL databases. Key to such tools will be a graphical notation that can express both data structures and data instances, and that can represent arrays and nested types, things that the Concept and Object Modeling Notation (COMN) can do but traditional entity-relationship (E-R) models cannot.
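As a small illustration of that paradox, here is a sketch of the kind of structure discovery such a tool performs: scan a batch of JSON-like documents and summarize which field paths occur, how often, and with what types, including the arrays and nested objects that E-R notation struggles to represent. The documents are invented, and this is a deliberate simplification, not Hackolade’s actual algorithm.

```python
# Discover the repeated structure in "unstructured" documents: count
# each field path and the types seen there, including arrays and
# nested objects. The documents below are invented for illustration.
from collections import defaultdict

docs = [
    {"name": "Ann", "tags": ["new", "vip"], "address": {"city": "Austin"}},
    {"name": "Raj", "tags": ["new"]},
    {"name": "Mei", "address": {"city": "Chicago", "zip": "60601"}},
]

def summarize(docs, prefix=""):
    """Count occurrences of each field path and the types seen there."""
    counts = defaultdict(int)
    types = defaultdict(set)
    for doc in docs:
        for key, value in doc.items():
            path = f"{prefix}{key}"
            counts[path] += 1
            types[path].add(type(value).__name__)
            if isinstance(value, dict):
                # Recurse into nested objects, extending the path.
                sub_counts, sub_types = summarize([value], prefix=path + ".")
                for p, n in sub_counts.items():
                    counts[p] += n
                for p, t in sub_types.items():
                    types[p] |= t
    return counts, types

counts, types = summarize(docs)
for path in sorted(counts):
    print(f"{path}: seen {counts[path]}x as {sorted(types[path])}")
# address: seen 2x as ['dict']
# address.city: seen 2x as ['str']  ... and so on
```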
So, if you’re a data warehousing specialist, make room for the new specialists. They probably won’t take any business from you, because they’ll be serving the data needs of users who were formerly served by no one. And if you are one of the new NoSQL types, make sure you learn the basics of data warehousing from professionals already established in the field. They have valuable insights about data management, some learned the hard way. If you can learn a few universal truths from them and figure out how to apply those principles in your work, you will save yourself and your customers millions of dollars that would otherwise be spent racing down blind alleys.
This monthly blog talks about data architecture and data modeling topics, focusing especially, though not exclusively, on the non-traditional modeling needs of NoSQL databases. The modeling notation I use is the Concept and Object Modeling Notation, or COMN (pronounced “common”), which is fully described in my book, NoSQL and SQL Data Modeling (Technics Publications, 2016). See http://www.tewdur.com/ for more details.