Authors: Michael Berry and Gordon Linoff
Publisher: John Wiley and Sons, Inc.
ISBN: 0-471-17980-9
If Robert Groth’s Data Mining: A Hands-on Approach for Business Professionals (reviewed in TDAN 5.0) is the introductory text for data mining, Data Mining Techniques: for Marketing, Sales,
and Customer Support by Michael Berry and Gordon Linoff is the text for the second semester. At 444 pages, this book is a logical next step for those who are seriously interested in the topic and
are considering, or perhaps embarking on, their first data mining project.
The depth of coverage of this book is impressive. The bulk of the text is devoted to detailed explanations of the various types of data-mining algorithms, including memory-based reasoning, link
analysis, decision trees, neural networks, and all the way to genetic algorithms. Being a database professional rather than a statistician or mathematician, I had despaired of ever grasping the
detailed nature of, for example, neural networks. (Life did go on nonetheless.) But the authors are able to gingerly steer the reader through such potentially dense topics in a straightforward
manner, using understandable examples and case studies, and avoiding buzzwords and jargon unless and until the terms are clearly explained. I may not be able to single-handedly write a neural
network yet, but, if pressed, I could now at least describe the difference between a transfer function and a combination function.
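For readers who want that distinction in concrete terms, here is a minimal sketch of a single neural-network unit. It is not taken from the book: it assumes the common choices of a weighted sum as the combination function and a logistic (sigmoid) curve as the transfer function, and the function names and sample values are purely illustrative.

```python
import math


def combination(inputs, weights, bias):
    """Combination function: merge all inputs into one number.

    Assumption: a weighted sum plus a bias term, the most common choice.
    """
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def transfer(value):
    """Transfer function: map the combined value to the unit's output.

    Assumption: the logistic (sigmoid) curve, which squashes any real
    number into the open interval (0, 1).
    """
    return 1.0 / (1.0 + math.exp(-value))


def unit_output(inputs, weights, bias):
    """A unit's activation is the transfer of the combination."""
    return transfer(combination(inputs, weights, bias))


if __name__ == "__main__":
    # Illustrative values only: two inputs with hand-picked weights.
    print(unit_output([1.0, 2.0], [0.5, -0.25], 0.0))
```

The point of the split is that the combination function decides how the inputs are merged, while the transfer function decides how that merged value is turned into an output; either one can be swapped without touching the other.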
Berry and Linoff have been practitioners in the field of data mining for a number of years (they have a Web site at www.data-miners.com), and readers will reap the benefits of their practical
experience. A concise section on statistical terms provides definitions and examples of statistical functions used both in data mining and in pre-processing of data prior to building models. Case
studies describe some of the more worldly difficulties the authors have come to grips with in practice, such as filling in incomplete data definitions, translating data from EBCDIC to ASCII, and
converting source data defined by a COBOL OCCURS clause into a historical time-series data structure that a mining tool can understand. And “Practical Issues” describes the challenges
of running the completed model in a true production environment once the model-building is done.
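To give a flavor of the pre-processing chores mentioned above, the following sketch decodes an EBCDIC record and unrolls a repeating group into one row per period. It is an illustration, not the authors' procedure. The record layout (a 5-character customer id followed by three 4-digit amounts, as an OCCURS 3 TIMES group might define) is hypothetical, the amounts are assumed to be unsigned display digits rather than packed decimals, and the code page is assumed to be cp037, one of several EBCDIC variants.

```python
def decode_record(raw: bytes) -> str:
    """Translate one EBCDIC record to text.

    Assumption: the source uses code page 037 (EBCDIC US/Canada), which
    Python ships as the built-in "cp037" codec.
    """
    return raw.decode("cp037")


def unroll_occurs(record: str, id_width: int = 5,
                  field_width: int = 4, occurrences: int = 3):
    """Turn a repeating group into (customer_id, period, amount) rows.

    Hypothetical layout: the id comes first, followed by `occurrences`
    fixed-width unsigned numeric fields, one per time period.
    """
    customer_id = record[:id_width]
    rows = []
    for i in range(occurrences):
        start = id_width + i * field_width
        amount = int(record[start:start + field_width])
        rows.append((customer_id, i + 1, amount))
    return rows


if __name__ == "__main__":
    # Build a sample EBCDIC record from made-up values.
    raw = ("C0001" + "0100" + "0200" + "0300").encode("cp037")
    for row in unroll_occurs(decode_record(raw)):
        print(row)
```

Each output row carries its own period number, which is the "long" shape most mining tools expect for time-series input, as opposed to the "wide" shape in which all periods sit side by side in a single record.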
Other sections of the book serve to place data mining in perspective relative to other techniques for exploiting data. “Data Mining and the Corporate Data Warehouse” reviews the common ground
between the two disciplines, with common-sense yet uncompromising advocacy of metadata and the value of detail data over summary data. A perhaps unexpected inclusion is an entire chapter devoted
to OLAP, explaining the differences between this technology and data mining, and the pros and cons of both. Apparently there is enough confusion in some camps that the authors felt compelled to clear
it up. The delineation is not difficult: OLAP is a reporting technology and, unlike data mining, is not intended to discover patterns in underlying data or to develop predictions from those patterns.
Perhaps the most practical chapters are left for the back of the book. “Choosing the Right Tool for the Job” explains which data-mining techniques are best suited to which types of business
applications. One perhaps surprising example is that standard statistical techniques are applicable to a large extent to all common data mining tasks: classification, estimation, prediction,
affinity grouping, clustering and description. The authors also provide what they call “The Data Mining Report Card”, describing how well each technique supports each type of task. “What To Look
For In A Data Mining Software Package” can provide a strong basis for development of a request for proposal for data mining vendors (of which there are literally dozens). In addition to the more
common software evaluation questions dealing with platform, scalability and such, less obvious criteria such as multiple levels of user interface and range of techniques are addressed here.
In short, Data Mining Techniques: for Marketing, Sales, and Customer Support is quite readable, even for those without a statistical or AI background, and is definitely recommended for anyone
wanting to embark on more than a casual study of this timely subject.