Book Review: UML and Data Modeling

I have always looked forward to a new book by David Hay, ever since I discovered his first (Data Model Patterns: Conventions of Thought) in the mid-1990s at the Borders bookshop in Pentagon City. I was making one of my regular trips across the Atlantic to visit data modelling colleagues working with the US Army. Most of these visits were extremely frustrating. My team was responsible for data management for the British Army and, to put it bluntly, we did not see eye to eye with our American colleagues over how to approach data modelling. This was unhelpful, because we would need to be interoperable if (or, perhaps I should say, when) both armies deployed together on operations. How refreshing it was, therefore, to find an American author who not only thought like us but even used the same data modelling notation as we did.

This latest book does more than it says on the cover. David sets out to show how, with a few tweaks to both the notation and the thought processes, you can use UML to develop a model that represents the business information needs of an organisation, which the data modelling community normally calls a conceptual data model. David recognizes not only that the thought processes underlying the two modelling techniques, entity-relationship modelling and object-oriented modelling, are fundamentally different, but also that those thought processes matter far more than the notation used. So the book starts with two introductions: one for data modellers and the other for UML modellers. He then explains the techniques for developing what he now calls an architectural data model using UML. Along the way he justifies the change of name from conceptual data model to architectural data model, discusses aesthetics and best practice, and gives a worked example.

All of this, with a foreword and preface, fits into 123 pages. But the book is 233 pages long, so what is in the other 110 pages? There is the expected extensive glossary, comprehensive bibliography and index, but together they take up only forty pages. The remainder of the book comprises just two appendices. The first, of just two pages, is a brief summary of the approach introduced in the book. Then there is the magnificent Appendix B, in which David explains how we got into this mess (my word, not his) in the first place, with the data modellers in one camp and the UML/object modellers in another. He does this by tracing the history of procedural and object-oriented programming alongside a history of data and database architecture, emphasizing that the differences between them have given us what is commonly called an ‘impedance mismatch’. Having dealt with the history, he then turns to the future, the Semantic Web, and briefly explains the impact this will have on object-oriented system development and on the management of data.

You really do get a lot for your money with this book. You get a well-thought-out and sensible approach to modelling business information requirements using UML, based on thorough research (David has recently been working with the Object Management Group to develop its Information Management Metamodel). In Appendix B you also get a helpful history of the information systems industry. That history is worthy of a book in its own right, and I have never before seen it brought together in one place.

I am glad I bought this book. Everybody who has an interest in data or object-orientation ought to have a copy on their bookshelf.
