Also in 2011, Philip Howard at Bloor Group proclaimed, “The EDW is dead. Period. Like a dodo. Like Monty Python’s parrot.” More recently, Stephen Smith’s July post on The Demise of the Data Warehouse generated a lot of interest and many comments.
“The reports of my death have been greatly exaggerated.”
Yet, a 2015 survey conducted by Dimensional Research and cited by Snowflake Computing shows that:
- 99% of respondents see their data warehouse as important for business operations
- 70% are increasing their investment in data warehousing
I believe that the data warehouse is alive despite the many declarations of death. Nearly every enterprise has a data warehouse and many have more than one. So why do we see so many premature obituaries? They derive from the many struggles of data warehousing. Note that I say the data warehouse is alive. I don’t say “alive and well.” Data warehousing faces many challenges, but let’s not confuse challenged with terminal.
In technical fields, we’re sometimes too quick to see new technologies as replacements for older ones. COBOL, for example, was declared dead in the mid-1980s. Yet COBOL has a role today in healthcare for 60 million patients daily, 95% of ATM transactions, and more than 100 million lines of code at the IRS and Social Security Administration alone. And it’s not only COBOL. Just last week I read an article declaring death for the popular programming language Ruby. In 2013, SQL was declared dead, yet thousands of SQL job postings can be found on the web today.
So, please read these proclamations of data warehouse demise with a healthy dose of skepticism. The data warehouse is alive but it faces many challenges. It doesn’t scale well, it has performance bottlenecks, it can be difficult to change, and it doesn’t work well for big data. It certainly hasn’t lived up to the promises of the past. Smith is right that the single version of the truth still eludes us.
Data warehouses still meet the information needs of people and continue to provide value. Many people use them, depend on them, and don’t want them to be replaced with a data lake. Data lakes serve analytics and big data needs well. They offer a rich source of data for data scientists and self-service data consumers. But not all data and information workers want to become self-service consumers. Many – perhaps the majority – continue to need well-integrated, systematically cleansed, easy-to-access relational data that includes a large body of time-variant history. These people are best served by a data warehouse.
The data warehouse needs to be modernized. Migrating to the cloud resolves many data warehousing challenges. Scalability and elasticity are well-known cloud benefits. Cloud data warehousing also brings the benefits of managed infrastructure, cost savings, rapid deployment, and fast processing. Data warehousing expectations also need to be reset. The data warehouse will never be the one-size-fits-all data repository, and the single version of the truth will continue to elude us. With realistic expectations, however, data warehouses become an integral part of a comprehensive data management and information services architecture.
It is not my intent to discount or dismiss the views of others. Michael Hiskey is a smart man and an innovative thinker. Stephen Smith is a smart man and every conversation with him is thought-provoking. I’ve not met Philip Howard but I assume from his research and writing that he has similar qualities.
Smith’s vision of DL + MDM is a good beginning. I think it needs to be extended to become DL + MDM + DW. Architectural views may vary: warehouse inside the lake vs. warehouse alongside the lake, for example. Regardless of architecture, all three approaches – DL, MDM, and DW – contribute value and capabilities to an adaptable and comprehensive data management strategy.
Don’t discount or decommission your existing data warehouse. Don’t relegate it to the legacy junk pile. You need it, but you need to modernize it. And while planning for modernization, think about next-generation MDM too.