Author: ROBERT GROTH
Publisher: PRENTICE HALL PTR (ISBN 0-13-756412-0)
This new book on the hot technology of data mining is the third release in “The Data Warehousing Institute Series from Prentice Hall PTR”. DWI has made a valuable and timely choice of subjects
for its first book to branch out into the various data-warehouse-related disciplines. Although, strictly speaking, data mining is really only indirectly related to data warehousing, Groth devotes
appropriate attention to how the two technologies can and do work well together.
As Groth points out in this book, the “heart” of the data mining process is data preparation, and the primary benefit data warehousing brings to data mining is in this area. Business professionals
who may be tempted (either independently or with the assistance of a consultant) to assume that quick benefits can be derived by pointing a data mining “tool” at a bunch of customer or
transaction data should heed Groth’s words before proceeding any further. Nuggets found in this book, such as “[Data preparation] can take the bulk of the time in the data mining process”, prove
that the author has been in the trenches (the mines?), and can help to set reasonable expectations for anyone embarking on this new frontier.
The book strikes an excellent balance between technical background and business application, covering everything from the theoretical basis of what goes on inside data mining software (neural networks, linear
regression, confusion matrices) all the way up to how real products can be used to solve real business problems. Groth reviews the methods and processes followed in a typical mining project, and
presents several case studies in assorted industries. Especially enlightening are detailed descriptions of three representative data mining software products (the vendor of one, DataMind, happens
to provide Groth’s day job), the data mining marketplace and trends, and intelligently categorized listings of many vendors of data mining and related software.
A well-intentioned bonus is a CD containing demo versions of the three data mining products described in the text. There are some complicating factors, however: the demo versions of DataMind and
Predict function under Microsoft Office 95, but not with Office 97. The demo of KnowledgeSeeker supplied by the vendor to the publisher is an expired version, so it won’t work anywhere. I was able
to request and receive a demo version of DataMind from the vendor, which works fine under Office 97. The DataMind demo, which has a 100-record limit, was enjoyable and informative to work with. The
interface is very intuitive, and the test data provided can be edited to create different “training sets”. This capability, coupled with DataMind’s features for explaining its results, provided a
highly interactive way to get an idea of how the program “ticks”.
There are a few areas, however, where the excellent overall quality of the book is compromised in the details. For example, “data diagrams”, which describe examples of related data structures used
for mining, appear similar to IDEF1X data models, but cardinalities are often confusing at best. (A Sales_Transaction has many Customers? A Customer has one Sales_Transaction? We hope not.) Also,
the strong correlation between the amount of data available for building (“training”) a model in a data mining study and the quality of the resulting model was apparently not worth a mention.
Then there is one of this reviewer’s pet peeves: the dreaded “withdrawl” from a checking account. Even the spell checker in MS Word catches that one.
But all in all, Data Mining: A Hands-on Approach for Business Professionals is an excellent introductory resource for the topic. Its timely coverage of techniques, issues, trends, vendors and
products should prove quite worthwhile for business professionals evaluating the potential of data mining, as well as for the technical professionals who support them.