In this book, William Kent explores what goes on in the mind of a data modeler who tries to represent the real world with a limited set of symbols. When we think of information, we think of it as fitting into neatly categorized buckets. But according to Kent, “Information in its real essence is probably too amorphous, too ambiguous, too subjective, too slippery and elusive, to ever be pinned down precisely by the objective and deterministic processes embodied in a computer”.
If information is so amorphous and elusive, how is the data modeler expected to pin it down to a few concrete entities? Is there a standard set of rules that data modelers can follow? The answer is ‘No’. Hence, different data modelers can make different choices about the same reality. In fact, the same data modeler can make different choices about the same reality at different times and in different contexts. And all these models might be correct; it depends on the context and the requirements. The choices that we make as data modelers are often arbitrary, but many modelers are not aware of this simple fact.
Once you accept this premise, you will see the value of the book: it gives you a set of questions that make you think deeply about representing information in a data model. I hope that you read the previous sentence correctly: Kent raises questions, but he does not often give answers. Data models are at best poor approximations of the complex and messy real world. So, the more you think about ‘how you think’, the better equipped you will be to make informed choices about representing a real-world concept in a data model. There are no simple answers to many of the questions that Kent raises in this book, but just being aware of these questions can make you appreciate the challenges in this endeavor and the limitations of our own approaches to data modeling.
In the past, I have been frustrated many times when I encountered some strange behavior of data. Reading this book made me realize that these challenges are everywhere and that they will crop up in every project at one time or another. Kent makes us sit up and take notice by challenging even the most fundamental entities that we take for granted: employee, phone, warehouse, parts, etc. Reading Kent’s questions about these very familiar entities will make you realize how many assumptions we make whenever we think of one of them.
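To make this concrete, here is a minimal sketch of my own (it is not taken from the book, and every class and function name in it is hypothetical) showing how two reasonable models of "employee" and "phone" can give different answers to the same simple question. I am assuming one model that treats a phone number as an attribute of an employee, and another that treats a phone as an entity in its own right.

```python
from dataclasses import dataclass, field
from typing import List


# Model A (hypothetical): a phone is just an attribute of an employee.
@dataclass
class EmployeeA:
    name: str
    phone: str


# Model B (hypothetical): a phone is an entity that employees can share.
@dataclass(frozen=True)
class Phone:
    number: str


@dataclass
class EmployeeB:
    name: str
    phones: List[Phone] = field(default_factory=list)


def count_phones_a(employees: List[EmployeeA]) -> int:
    # Model A can only count distinct number strings.
    return len({e.phone for e in employees})


def count_phones_b(employees: List[EmployeeB]) -> int:
    # Model B counts distinct phone entities across all employees.
    return len({p for e in employees for p in e.phones})


desk = Phone("555-0100")
mobile = Phone("555-0199")

# The same office, described twice: Ana and Raj share a desk phone,
# and Ana also carries a mobile that Model A has no place to record.
model_a = [EmployeeA("Ana", "555-0100"), EmployeeA("Raj", "555-0100")]
model_b = [EmployeeB("Ana", [desk, mobile]), EmployeeB("Raj", [desk])]

print(count_phones_a(model_a))
print(count_phones_b(model_b))
```

Neither model is wrong. Each one encodes a different assumption about what a "phone" is, and that assumption only becomes visible when someone asks a question the model was not designed to answer.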
If I had to describe the impact this book had on me in one sentence, I would say that it has made me humble. Data modelers are known to argue passionately for the correctness of their own models, and I have to admit that I am one of them. In the past, I have argued about why certain models I had produced were the best. Had I read this book then, I would have been more conscious of the fact that many of my choices were arbitrary, and it would have helped me to understand other viewpoints better. Steve Hoberman recounts an example in his commentary about a data modeler standing up on a chair to argue that his model was correct. If that modeler reads this book, he might end up changing his perspective completely.
The only issue I had with the book was that the writing style was a bit dry and the terminology was sometimes outdated. However, Steve Hoberman has bridged the gaps with his commentary. Whenever I felt something was a bit vague, he pitched in with examples. In addition, he shares a lot of insights based on his real-world experience.
If you work on any data integration exercise, whether as a data modeler or as a business analyst, you will benefit a lot from understanding the typical challenges of describing reality, which this book examines thoroughly. If you understand these challenges, you will approach your next project very differently: you will understand that defining the requirements is actually more than 50% of the data modeling effort. So, you will invest a lot more time in making sure that you avoid many of the problems that Kent describes. And more importantly, you will be more at peace with yourself as you encounter the strangeness of real-world data.