When is an answer “good enough”?

There are many areas of data science and AI where we need to be satisfied with an answer that is not perfect and yet still provides business value. A data scientist’s problems are often too complex to solve with straightforward statistics. That’s where heuristics excel. In fact, the entire field of heuristics is about producing quick solutions that are not perfect but still good enough: a heuristic is a problem-solving approach for reaching a non-optimal yet still useful solution.

**Zacharias Voulgaris, PhD** has written many books on data science, machine learning, and artificial intelligence, but this is the first book completely dedicated to heuristics. The book’s goal is to allow the reader to “become proficient in using heuristics within the data science pipeline to produce higher quality results in less time.”

Voulgaris illustrates how heuristics can help you solve challenging problems through simple examples and real-life situations. Readers learn to apply Jaccard Similarity and a variant, the F1 score, Entropy, Ectropy, Area Under Curve, Particle Swarm Optimization, and Genetic Algorithms (along with GA variants). Beyond presenting the known and lesser-known heuristics available today, he examines how you can create your own through a simple and functional framework.

The book contains five parts. Part I provides an overview of heuristics and covers each of the various types of heuristics. Part II focuses on data-oriented heuristics and how they apply specifically to data science problems. Part III explains optimization-oriented heuristics and how they solve challenging optimization problems. Part IV is all about designing and implementing your own heuristics to help with particular problems. Finally, Part V contains additional topics on heuristics, such as transparency and limitations.

Chapter summaries reinforce the key points of each chapter, a glossary explains important terms within our field, and several appendices contain reference material on heuristics and related programming tools. I like that several chapters expand on the material with a code notebook (.jl files) in the Julia language.
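
To give a feel for what a heuristic metric looks like in practice, here is a minimal sketch of two of the metrics the book covers, Jaccard similarity and the F1 score. It is not taken from the book: the book's notebooks are in Julia, while this sketch uses Python, and the function names, the `positive` parameter, and the handling of empty inputs are all assumptions made for illustration.

```python
from typing import Hashable, Iterable, Sequence


def jaccard_similarity(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Size of the intersection divided by size of the union of two collections."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty collections count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


def f1_score(y_true: Sequence, y_pred: Sequence, positive=1) -> float:
    """F1 score (harmonic mean of precision and recall) for one class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        # Assumed convention: no positives anywhere yields a score of 0.
        return 0.0
    return 2 * tp / denominator


if __name__ == "__main__":
    print(jaccard_similarity({1, 2, 3}, {2, 3, 4}))
    print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Neither function offers a guarantee of optimality or a statistical test; each simply measures something useful for a particular kind of problem, which is the sense in which such metrics are heuristics.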

Here is an excerpt from Chapter 2 of the book, used with permission from the publisher, Technics Publications:

*This chapter will explore
heuristics as metrics and algorithms and examine some important considerations.
This can lay the foundations for the chapters that follow, where we’ll look at
heuristics in more detail and, in particular, applications related to both data
science and AI. Feel free to consult the glossary if you
need more information on any heuristics terms. Also, consider specific examples
in your work or experience where this information applies. This can help cement
what’s described here and make it your own.*

*Heuristics as Metrics*

*Heuristics as metrics
are an ideal option for modern data science work since they are the best of
both worlds, combining conventional analytics with data-driven analysis. In fact, it is the cornerstone of
the data-driven paradigm, while at the same time, they borrow a lot from the
model-driven paradigm underlying conventional analytics.*

*However, heuristics as
metrics are not supported by theory. They are just methods that work for a
particular problem and measure what we need to gauge. That is what makes them
relatively easy to apply. A heuristic metric is a powerful tool, but only when
applied to problems relevant to that metric in terms of scope, which we’ll
cover later in this book.*

*Additionally,
heuristics as metrics are the most obvious application of these tools, even if
many people take them for granted. There are a few well-known heuristic
metrics, such as the F1 score for assessing classifier performance in relation
to a given class, or the various similarity metrics that show us the similarity
between two data points. These metrics are quite popular, and people don’t
really think about them very much because of their frequent use. That’s why
they often don’t even use the term heuristics to describe them.*

*Heuristics as metrics
are commonly used in data science work, though they aren’t limited to data
science. Still, it makes more sense to use them with actual data than with the
outputs of arbitrary functions, as in abstract AI applications such as
optimization. In any case, they can be very valuable and, in some cases, a
flexible tool. Of course, their flexibility depends on how well you know them
and your skills as a problem-solver.*

*Although heuristic
metrics are somewhat similar to conventional statistics, they are a very
powerful tool in the data-driven paradigm. So, whenever you’re using machine
learning and machine learning-related methods, you will have to deal with
heuristics in one way or another.*

*Since heuristics are
underused, there is much untapped potential. Heuristics as metrics, in
particular, have a lot of room for growth and evolution.*

*Heuristics as Algorithms*

*Heuristics as
algorithms are more popular than heuristics as metrics, at least for mainstream
data science and AI. This is because heuristics as algorithms
have been around longer, and therefore, there are many applications in both of
these fields. In fact, it is likely that AI would have never evolved much
without the use of algorithm-related heuristics. The same goes for data-driven data
science (particularly machine learning), which relies greatly on some of these
heuristics-based methods.*

*What’s more,
heuristics as algorithms are ideal for complex problems. Such problems may
involve high dimensionality, many restrictions, or anything that would require
a lot of computational power to handle analytically. Perhaps that’s why
heuristics algorithms are particularly useful for optimization problems and AI in general, even though this
kind of heuristic can apply to all sorts of scenarios involving processes. In
all cases, a heuristics algorithm needs to be relatively flexible and scalable
for the heuristic to be useful.*

*For example, you can
use a heuristics as algorithms approach to find the best variables to use in a
data set for feature engineering and other scenarios where you deal with
Natural Processing (NP) problems. You can even view the common
clustering algorithms used today as this type of heuristic algorithm. Also,
because heuristics algorithms are ingrained in our understanding of the
data-driven approach, many people incorrectly think of
this type of heuristic when they hear this term. However, in academia, where people
are more aligned with this way of thinking, the term heuristic has a very
particular meaning in this kind of work, usually related to metrics and methods
for facilitating the solution of a complex problem.*

*Heuristics as
algorithms are a more challenging area and require a good understanding of the
problem. Maybe that’s why most practitioners do not choose them when figuring
out the solution to a new problem. After all, developing a new heuristic is not
a simple matter, as we’ll discuss shortly. Still, it’s often necessary,
especially when the problem is too difficult or too computationally expensive
to solve otherwise.*

*Just like in other kinds of heuristics, heuristics as algorithms has a great deal of untapped potential, especially with research. So, there are lots of theoretical methods in the research sector. For example, when it comes to optimization, new algorithmic heuristics often make their way to papers as potential ways to tackle very challenging problems.*