Authors: Joyce Bischoff, Ted Alexander and many others
Publisher: Prentice Hall (ISBN 0-13-577370-9)
Data warehousing has had a profound impact on information technology and data administration, far beyond the building of reporting databases. Certainly, improved decision-making capabilities
continue to be the driving force behind many data warehouses. But warehouse projects have been known to affect almost every other aspect of IT within the corporation as well: data quality
efforts, data design efforts, client/server efforts, data modeling efforts, and so on.
The book, Data Warehouse: Practical Advice from the Experts (Prentice Hall, 1997), touches on many aspects of corporate data administration as it advises readers on
proven practices for completing data warehouse projects. Data Warehouse, authored and compiled by Joyce Bischoff and Ted Alexander with help from many recognized names, is an
exceptional book that I found easy to read, even while skipping back and forth among the twenty-eight chapters. This is a book that will provide a novice or an experienced warehouse builder with reliable advice
on a chapter-by-chapter basis.
Ms. Bischoff begins the introduction in chapter one by writing about ‘any information, anywhere, anytime’ and the formation of an ‘architecture that is needed’. This sets
the tone for chapters that touch deeply on the topics of data architecture and data availability, from experts including, but by no means limited to, Sid Adelman, Peter Brooks, Bischoff herself,
Paul Hessinger, Jack Sweeney and Richard Yevich. In total, Ms. Bischoff and Mr. Alexander present chapters from, and acknowledge the participation of, more than twenty individuals. In addition, the
foreword contains an enthusiastic recommendation by John Zachman.
Data Warehouse: Practical Advice from the Experts is divided into six sections: Getting Started, Planning the Warehouse, Data: The Critical Issue, Design and
Implementation, Data Warehouse Administration, and Warehouse Trends. Each section contains three to ten related chapters, arranged in the approximate order in which the reader can expect to
address issues when building a data warehouse. The bulk of the chapters fall under Getting Started and Design and Implementation, but some of the chapters that I found most interesting
were included in the Data: The Critical Issue section.
A review of this book, section by section and chapter by chapter, would be entirely too long. I have chosen to detail the Data: The Critical Issue section to give you an idea of the type of
information you can expect from reading this book. This section includes chapters by Sid Adelman (Data Quality), Dave Gleason (Meta data), Jack Sweeney (The Role of a Directory/Catalog
a.k.a. the Repository), and Mr. Gleason again (Data Transformation).
In one of his two chapters in this book, Sid Adelman offers a list of sixteen indicators of data quality, a way to assess existing data quality both in general and as it
pertains to the data warehouse, and paragraphs on the impact and cost of poor-quality data. The chapter continues by helping the reader determine which data should be improved, the steps to data
‘purification’, and the categories of tools to consider when concentrating on data quality. This is a lot of information to cover in one chapter, and the author covers it succinctly by
listing the issues as bullets with brief descriptions.
In his first chapter, Dave Gleason offers three simple scenarios in which meta data can answer many questions about the data in the data warehouse. The author describes
the lifecycle of meta data, where meta data is located, what it takes to maintain meta data, and how it can be integrated with data access tools. In his second chapter, Mr. Gleason covers the
fundamentals of data transformation, data conversion, cleansing and scrubbing, and field-level mapping. The author continues with how to implement data transformation manually and through tool
automation. Again, these chapters condense material that could easily fill volumes.
In his chapter, Jack Sweeney writes about the challenges and the components of an active meta data directory/catalog (repository). In this brief chapter, Mr. Sweeney
offers considerations for repository administration, meta data access, physical data movement, event management, and the development of APIs to intelligently link meta data to warehouse data.
My opinion of the advice of the experts, or of its ‘do-ability’ in the corporate environments in which I have worked, is not important. In almost every situation, I agreed with what the
authors stated or found myself relating the material to my own experiences. I can state that I learned a lot about the IT issues to consider when building a data warehouse by reading this book. The
advice of the experts furthered my knowledge of the field in each and every chapter.
One nice thing about this book is that the chapters do not have to be read in any particular order. If the reader is already up to their ankles in a warehouse project, there is benefit in
starting with the chapter that offers advice on the topic of most interest. Most chapters are easy to read; however, I did find myself re-reading sections that did
not sink in the first time. Readers may find numerous statements philosophical by nature, but these statements consistently strive for what should be considered ‘best
practices’.
When John Zachman states in the foreword that this book ‘is clearly the result of quality thinking’ and ‘a realistic statement of the state of the art in data warehouse’, I
believe that he is accurately describing what the reader can expect from the book. This is a book that I am very happy to have as a weapon in my arsenal when it comes to topics related to data warehousing and data
administration.