I consider Bill Inmon a seer. He predicted (and also shaped) the popularity of the data warehouse in the 1990s. I remember, in 1996, driving into Manhattan with a coworker and entering a packed ballroom to hear Bill Inmon speak about something called a “data warehouse.” Over 2,000 people came to hear him, and I took one of the last seats, right in the front row. He talked about the key principles for designing and architecting a useful data warehouse, principles that I applied shortly after that presentation.
In 2011, Bill wrote Building the Unstructured Data Warehouse, where he emphasized the value of introducing text into analytics and reporting. Shortly after this book was published, the big data craze went into full swing, with text receiving most of the spotlight.
Fast forward to today. Data Lake Architecture is Bill’s latest book. In a very straightforward and conversational style, similar to his groundbreaking work with the data warehouse and later with unstructured data, he covers the principles for a sound data lake.
Data Lake Architecture is for those who need an explanation of a data lake and need best practices on how to design and architect one. Bill defines the data lake as the place where big data is stored. He provides advice on steering clear of the data garbage dump and building a useful repository for analytics. He covers data lake architecture, including the roles of the five data ponds:
- Analog pond. Storage and analysis of mechanically generated measurements from sources such as electronic eyes, manufacturing control machines, log or journal tapes, and periodic metering.
- Application pond. Storage and analysis of data from one or more applications, similar to the data warehouse environment.
- Textual data pond. Storage and analysis of text and the surrounding metadata to make sense of this text.
- Raw data pond. Where data first enters the data lake, similar to the staging area of a data warehouse.
- Archival data pond. Storage of data whose probable useful life has diminished but which might be needed for analysis at some future point in time.
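To make the relationship between the ponds a little more concrete, here is a minimal Python sketch. It is an illustration only, not code from the book: the `route` function, the `kind` labels, and the `still_useful` flag are all hypothetical names, and the idea that data moves from the raw pond into one of the other ponds once it is classified is a simplifying assumption.

```python
from enum import Enum


class Pond(Enum):
    """The five data ponds named in the list above."""
    RAW = "raw"
    ANALOG = "analog"
    APPLICATION = "application"
    TEXTUAL = "textual"
    ARCHIVAL = "archival"


# Hypothetical mapping from a record's kind to its pond.
# The "kind" labels are illustrative assumptions, not terms from the book.
_KIND_TO_POND = {
    "measurement": Pond.ANALOG,
    "application": Pond.APPLICATION,
    "text": Pond.TEXTUAL,
}


def route(kind: str, still_useful: bool = True) -> Pond:
    """Pick a pond for a record that has already landed in the raw pond."""
    if not still_useful:
        # Data whose probable useful life has diminished goes to the archive.
        return Pond.ARCHIVAL
    # Unrecognized data stays in the raw pond until it is classified.
    return _KIND_TO_POND.get(kind, Pond.RAW)
```

For example, `route("text")` would select the textual data pond, while `route("measurement", still_useful=False)` would select the archival data pond.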
Bill covers the steps to build each of the ponds and their place in the big data infrastructure. He also discusses how users such as the data scientist would make business decisions based on data lake data. He gives an overview of the different types of analysis possible, including statistical analysis and textual disambiguation. Many examples illustrate the principles throughout the book.
Data Lake Architecture is the first book we are publishing in an audio format! That’s right: learn about data lakes while cruising the interstates. Now that is the life. You will also find the book in print format, which, although not ideal for those long car rides, will be just the format for relaxing at the beach this summer. And don’t forget it is also available in your favorite e-book formats (Kindle, Apple, and Google). Enjoy!
By the way, Bill Inmon will be speaking at Data Modeling Zone (www.DataModelingZone.com), October 17-19 in Portland, Oregon. His first session will be on Crossing the Unstructured Barrier (http://datamodelingzone.com/crossing-the-unstructured-barrier/) and his second on Taxonomies and Ontologies (http://datamodelingzone.com/taxonomies-and-ontologies/).
Our next release is outside the world of IT: it is about being in the moment in stressful situations. I don’t want to take away all of the surprise, but it has to do with firefighters! Until the next column!