Eric Siegel’s “The AI Playbook” is a crucial guide, offering important insights for data professionals and their internal customers on effectively leveraging AI within business operations. The book, which comes out on February 6th, distills its advice into six steps:
— Determine the value
— Establish a prediction goal
— Establish evaluation metrics
— Prepare the data
— Train the model
— Deploy the model
In this article, I’ll unpack Eric Siegel’s recommendations for managing data projects. While seasoned data practitioners may find many of them familiar, they serve as valuable anchors for best practices and should help the next generation, including those who are experimenting with generative AI, absorb the lessons learned by vanguard CIOs, CDOs, and data practitioners.
Determine the Value
Siegel emphasizes in the book the need to define a clear value proposition for data initiatives and products. This means determining upfront what the data combined with data models will predict and how these will specifically enhance business operations and customer offerings. Concrete objectives, such as reducing customer churn or cutting manufacturing costs by specific percentages, are vital. Furthermore, organizations must outline the strategies for achieving these business outcomes.
Establish the Prediction Goal
For data practitioners, establishing a detailed prediction goal is crucial because effective data initiatives sit at the confluence of technology and business strategy. Siegel believes this goal must be specified in detail, and doing it well demands collaboration and input from business leaders. The aim is to translate business objectives into clear technical actions.
It’s also essential to set achievable expectations for precision: defining exactly what will be predicted and what will be done with each prediction. Data practitioners must enlist business leaders to weigh in. Done well, this exercise translates the business intention into well-defined requirements for technical execution.
Establish Evaluation Metrics
Once what machine learning will predict is defined, data practitioners should shift their focus to the quality of the model’s predictions. Benchmarking a model’s effectiveness doesn’t require understanding its mechanics. Siegel recommends measuring performance with lift (a factor quantifying how much a model’s predictions improve on random guessing) or cost (which weighs the impact of false positives and false negatives), rather than accuracy alone (the rate of correct predictions).
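To make lift concrete, here is a minimal sketch of my own (the churn numbers are made up, not from the book): lift compares the positive rate among the cases a model flags to the overall positive rate, i.e., what random guessing would achieve.

```python
def lift(y_true, y_flagged):
    """y_true: list of 0/1 outcomes; y_flagged: list of 0/1 model flags."""
    flagged = [t for t, f in zip(y_true, y_flagged) if f]
    baseline_rate = sum(y_true) / len(y_true)   # positive rate under random guessing
    flagged_rate = sum(flagged) / len(flagged)  # positive rate among model-selected cases
    return flagged_rate / baseline_rate

# Hypothetical example: 100 customers, 10 churners overall; the model flags 20
# customers, 6 of whom actually churn -> lift = (6/20) / (10/100) = 3x.
y_true = [1] * 6 + [0] * 14 + [1] * 4 + [0] * 76
y_flag = [1] * 20 + [0] * 80
print(round(lift(y_true, y_flag), 2))  # 3.0
```

A lift of 3 means contacting the model-selected customers finds churners three times as often as contacting customers at random.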
In the pre-big data era, decisions were often made on gut feeling or simple linear extrapolation from backward-facing data. With more, and more current, data available, lift makes sense for comparison since it measures the improvement a model provides over random guessing. Metrics are crucial for assessing both the training and operational phases of a model. The value of an imperfect prediction lies in its utility, such as predicting customer behavior in marketing campaigns. Siegel shows here a profitability curve illustrating that neither contacting every lead nor contacting none is optimal; instead, it’s about finding the profitable middle ground.
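The profitable-middle-ground idea can be sketched in a few lines. All the numbers below are hypothetical (my illustration, not Siegel’s curve): each contact costs $2, each conversion earns $40, and leads are ranked by model score so conversion probability falls as we go deeper into the list.

```python
COST_PER_CONTACT = 2.0   # assumed cost of reaching one lead
REVENUE_PER_SALE = 40.0  # assumed revenue from one conversion

def profit_curve(conversion_probs):
    """Cumulative expected profit from contacting the top k leads, for every k."""
    curve, total = [], 0.0
    for p in conversion_probs:
        total += p * REVENUE_PER_SALE - COST_PER_CONTACT
        curve.append(total)
    return curve

# Made-up conversion probabilities, declining with model rank:
probs = [0.30, 0.25, 0.20, 0.15, 0.10, 0.06, 0.04, 0.03, 0.02, 0.01]
curve = profit_curve(probs)
best_k = max(range(len(curve)), key=lambda k: curve[k]) + 1
print(best_k)  # 6 -- contacting only the top 6 of 10 leads maximizes profit
```

Contacting everyone loses money on the low-probability tail; contacting no one earns nothing; the peak of the curve sits in between, which is exactly the point of the profitability curve.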
Prepare the Data
Next, Siegel emphasizes the primacy of data over algorithms, highlighting that the main return on investment comes from data. It’s long been understood that data is the cornerstone of predictive strength. Siegel shares an important point — existing data wasn’t created for machine learning or generative AI, suggesting that data preparation is often overlooked and undervalued. Building on this, there’s a call for organizations to resolve data issues and undertake what Stephanie Woerner at MIT-CISR refers to as “industrialization” of data. Firms that achieve this “combine data collected from customer interactions and elsewhere to become a single source of truth that anyone with permission in the firm can use for decision making.”
Siegel underscores the importance of data in driving machine learning outcomes, indicating that most of a project’s effort lies in data preparation, typically the realm of data engineers. According to Jennifer Redmon, “New data science graduates have a false sense of security that the data they receive will be sound.” Effective data prep aims to create comprehensive datasets that are extensive (long), encompassing a wide array of representative scenarios, detailed (wide), providing rich information across variables, and well-organized (labeled).
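As a quick sanity check on the “long, wide, and labeled” criteria, here is a minimal sketch (my illustration, not Siegel’s; the churn columns are hypothetical) that profiles a dataset before modeling:

```python
def dataset_report(rows, label_key="churned"):
    """rows: list of dicts, one per example; label_key names the outcome column."""
    n_rows = len(rows)                        # long: enough representative examples
    n_cols = len(rows[0]) if rows else 0      # wide: rich information per example
    labeled = sum(1 for r in rows if r.get(label_key) is not None)  # labeled outcomes
    return {"rows": n_rows, "columns": n_cols, "labeled": labeled}

rows = [
    {"tenure_months": 12, "support_calls": 3, "churned": 1},
    {"tenure_months": 48, "support_calls": 0, "churned": 0},
    {"tenure_months": 7,  "support_calls": 5, "churned": None},  # missing label
]
print(dataset_report(rows))  # {'rows': 3, 'columns': 3, 'labeled': 2}
```

In practice this kind of profiling is the first pass of data preparation: it surfaces missing labels and thin coverage before any model is trained.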
Train the Model
Siegel highlights the importance of validating a model’s sensibility and debugging it before data practitioners and business users draw lessons from the data. Here, it is critical to determine what you can learn from the data: does the model produce unexpected outcomes? Siegel aims, at the same time, to demystify data science for novices, making it accessible to business users. However, the book could be clearer on how traditional data science models differ from generative AI models, which may have distinct considerations and applications.
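One simple sensibility check, sketched below with made-up numbers (my illustration, not a method from the book), is to confirm on held-out data that the trained model actually beats a trivial majority-class baseline:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_baseline(y_train):
    """Returns a predictor that always outputs the most common training label."""
    majority = max(set(y_train), key=y_train.count)
    return lambda xs: [majority] * len(xs)

# Hypothetical held-out labels and model predictions:
y_train = [0, 0, 0, 1, 0, 1, 0, 0]
y_valid = [0, 1, 0, 0, 1, 0]
model_preds = [0, 1, 0, 0, 1, 1]

baseline_preds = majority_baseline(y_train)(y_valid)
model_acc, base_acc = accuracy(y_valid, model_preds), accuracy(y_valid, baseline_preds)
print(model_acc > base_acc)  # True -- the model outperforms always guessing "no"
```

A model that cannot beat this baseline is producing the kind of unexpected outcome Siegel warns about, and the debugging should happen before business users start drawing conclusions from it.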
Deploy the Model
Siegel emphasizes that deploying a model — moving it from development to practical use — demands organizational commitment, involving not just executives but also operational staff. As CIOs I know would insist, data leaders must ensure staff are prepared for the changes that come with new digital processes. Resistance to change is a significant hurdle; thus, altering legacy processes is crucial. As noted by the CIO of the American Cancer Society, buy-in from all levels, especially those directly affected, is essential. At the end of the day, digital transformation changes what someone does or doesn’t do in their day. Deployment, if well planned and executed, can be the most substantial phase of a data project, altering daily routines and workflows.
Parting Words
Siegel’s book serves as a reminder of the key stages in a data-centric digital transformation. Every successful project should go through the steps suggested by Siegel. While initial planning is vital, engaging mid-level stakeholders is equally important to ensure buy-in throughout the organization. The example of Kodak illustrates the risks of resistance from middle management, even when a company holds valuable patents, as in the case of digital cameras. For data practitioners, adhering to Siegel’s outlined steps is advisable for successful data transformation. For those who want to hear Siegel speak, he will be a keynote speaker at #CIOChat Live 2024!