Data Warehousing, Data Mining & OLAP

Authors: Alex Berson and Stephen J. Smith
Publisher: McGraw-Hill (ISBN 0-07-006272-2)

Data Warehousing, Data Mining, & OLAP, written by Alex Berson and Stephen J. Smith (Computing McGraw-Hill, 1997), treats data delivery as a top priority in business computing today. In the
foreword, the authors identify the three areas of data warehousing that the book covers: 1) bringing the data needed to enhance traditional information presentation technologies into a
single source, 2) supporting online analytical processing (OLAP), and 3) data mining, the newest data delivery engine.

The book is divided into five parts: Foundation, Data Warehousing, Business Analysis, Data Mining, and Data Visualization and Overall Perspective. Each part goes into considerable detail, moving
from the general to the specific, and each contains at least five long chapters.

The Foundation section begins by introducing the data warehouse, then gives an overview of client/server architectures and presents parallel processors and cluster systems. It continues with
distributed database management systems and offers a separate overview of each major client/server RDBMS environment: Oracle, Informix, Sybase, IBM's DB2, and Microsoft SQL Server. This
section lays a solid foundation in warehousing technology by detailing hardware architectures, multiprocessing architectures, and RDBMS features.

The second section, Data Warehousing, begins by describing the components of a data warehouse and the process of building one. It covers mapping the warehouse to parallel processing
architectures, selecting database schemas for decision support, and extracting, cleaning, and transforming data, and it describes metadata as a key component in supporting knowledge workers.
The chapters go into great detail, discussing tool requirements and surveying vendor solutions tool by tool.

The Business Analysis section begins by dividing reporting and query tools into categories: reporting tools, managed query tools, executive information system (EIS) tools, OLAP tools, and data
mining tools. The authors discuss the need for developing reporting applications and then review many of the best-known reporting and query tools on the market today. The chapters in this
section also explain OLAP (what it is and why it is necessary), introduce patterns and models for business analysis, explain different types of statistical analysis, and touch briefly on
expert systems and artificial intelligence.

The fourth section, Data Mining, introduces the topic by discussing its motivation, how to measure its effectiveness, and the difference between discovery and prediction. The first chapter
in this section describes the state of the data mining industry and compares current technologies with those of the recent past. The remaining chapters cover decision trees, neural
networks, genetic algorithms, and rule induction. The section closes by helping the reader select and use the right tools.

The final section, Data Visualization and Overall Perspective, pulls together the material from the previous sections and assumes the reader already understands what they covered. It focuses
on "putting it all together" by discussing scalable solutions, the data warehouse market, and the costs and benefits of data warehousing, and it closes with Berson and Smith's views on what is
coming (and may already be here) in data delivery. These views cover distributed warehouses, the Internet and intranets for information delivery, object-relational databases, and very large
databases (VLDBs).

The appendixes supplement the material in the chapters described above. They include a detailed glossary of the business and technical terms used in the book, a section on improving return on
investment (ROI), Dr. E.F. Codd's twelve guidelines for OLAP, and the Data Warehousing Institute's ten mistakes for data warehousing managers to avoid.

With Data Warehousing, Data Mining, & OLAP, Alex Berson and Stephen J. Smith have delivered an important reference for anyone developing data warehouses today. The book provides a level of
detail that is hard to find in one place. Because it introduces, defines, and details every aspect of data delivery, and covers the tools currently on the market in depth, it will be a valuable
tool and reference guide for anyone responsible for delivering data to the corporation.



About Robert S. Seiner

Robert S. (Bob) Seiner is the publisher of The Data Administration Newsletter and has been since it was introduced in 1997, providing valuable content for people who work in Information & Data Management and related fields. The newsletter is known for its timely and relevant articles, columns, and features from thought-leaders and practitioners. Seiner and the newsletter were recognized by DAMA International for significant and demonstrable contributions to the Information and Data Resource Management industries. Seiner is the President and Principal of KIK Consulting & Educational Services, a data and information management consultancy that he started in 2002, providing practical and cost-effective solutions in the disciplines of data governance, data stewardship, metadata management, and data strategy. Seiner is a recognized industry thought-leader, has consulted with and educated many prominent organizations nationally and globally, and is known for his unique approach to implementing data governance. His book “Non-Invasive Data Governance: The Path of Least Resistance and Greatest Success” was published in late 2014. Seiner speaks often at the industry’s leading conferences and presents a monthly webinar series titled “Real-World Data Governance” with DATAVERSITY.