I very much enjoyed reading The Enrichment Game by Doug Needham. Doug has spoken many times at our Data Modeling Zone conferences over the years, and as I read the book, I could hear him talking in his distinctive descriptive and conversational style.
The Enrichment Game describes how to improve data quality and data usability by combining “pieces,” processes, and people. It requires taking a more holistic (program, not project) perspective and also anticipating future business requirements.
Doug uses several analogies in this book to get his points across. I reread the one on Agile several times because it makes so much sense, and I have long been trying to find a simple way to explain why Agile needs data management. This analogy is the answer. He describes a boxer who can only be successful if he combines agility with stability. I won’t give more away – read the book! But stability represents data management – well done on this analogy!
With permission from the publisher, here is a subset of the introduction to this book – enjoy!
What is meant by the phrase “enriching data”? According to Lexico, the word enrichment means: “the action of improving or enhancing the quality or value of something” (Oxford n.d.).
According to the book Infonomics (Laney 2018), data should be treated as an asset that could be added to a company’s balance sheet. Like other assets, data can be enriched to add value to the organization as a whole. This book is about the methods, techniques, and people involved while enriching data for an organization to use.
A software application written to add value to a consumer’s life does not and cannot capture all of the data that will prove useful later. An application’s performance will suffer if it stores all interaction data from the user for all time, so some weeding out of data must occur. The app does capture most of the necessary data, but questions will arise during the application’s lifetime that the application itself cannot answer.
Some questions are simple: Is this a new user? How many interactions has the application had with this user? Is this a frequent user? These types of questions are relatively easy to answer as long as all of the right data is captured, such as timestamps for interactions.
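As a minimal sketch of how timestamps alone can answer these simple questions, consider the Python below. The log layout, the `summarize_users` helper, and the thresholds for "new" and "frequent" are all assumptions made for illustration; they are not taken from the book.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical interaction log: (user_id, timestamp) pairs.
interactions = [
    ("alice", datetime(2024, 1, 1, 9, 0)),
    ("alice", datetime(2024, 1, 2, 9, 30)),
    ("alice", datetime(2024, 1, 3, 18, 15)),
    ("bob", datetime(2024, 1, 3, 12, 0)),
]


def summarize_users(events, now, new_within=timedelta(days=1), frequent_min=3):
    """Answer the three simple questions for every user in the log.

    new_within and frequent_min are illustrative thresholds (assumptions).
    """
    counts = Counter(user for user, _ in events)

    # Earliest timestamp seen for each user.
    first_seen = {}
    for user, ts in events:
        if user not in first_seen or ts < first_seen[user]:
            first_seen[user] = ts

    return {
        user: {
            "interactions": counts[user],
            "is_new": now - first_seen[user] <= new_within,
            "is_frequent": counts[user] >= frequent_min,
        }
        for user in counts
    }


summary = summarize_users(interactions, now=datetime(2024, 1, 3, 23, 59))
for user, facts in summary.items():
    print(user, facts)
```

Nothing outside the application's own data is needed here, which is what separates these questions from the harder ones that follow.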
Some questions are more difficult: What browser is the consumer using? What device is the user connecting from? In what ways is this user similar to other users? The answers to these questions must be found by enriching the data.
The process of enriching data makes simple data more thorough. This thorough data, by its nature, is both more interesting and more informative.
The sources of enriching a single application’s data are limited only by the imagination. Some examples of other data sources that could be used to enrich data from a single application are:
- Application logs
- Other applications built by the company
- Third-party applications like Salesforce or other Customer Relationship Management (CRM) software
- Statistical population data for the user’s ZIP code
- Social media data that the user may interact with
- Third-party data sources like credit rating agencies
- Other data brokers
Combining this data together makes each interaction the user has with a company part of a universe of data that knowledge workers can explore to look for patterns. This universe of data is called enriched data.
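One way to picture this combination is as a lookup-and-merge step. The sketch below is only illustrative: the record fields, the demographic values, and the `enrich` helper are invented placeholders, and a real platform would more likely do this with database joins or a data pipeline tool.

```python
# Hypothetical application records (all values are invented placeholders).
purchases = [
    {"user_id": "u1", "zip": "40202", "item": "flashlight"},
    {"user_id": "u2", "zip": "99999", "item": "tent"},
]

# Hypothetical enrichment sources, keyed by ZIP code and by user id.
zip_demographics = {"40202": {"median_age": 36, "area_type": "city"}}
crm_accounts = {"u1": {"segment": "outdoor"}}


def enrich(purchase, demographics, crm):
    """Return a copy of the purchase with any matching context merged in.

    Records with no match in a source are passed through unchanged, so
    enrichment never drops application data.
    """
    enriched = dict(purchase)
    enriched.update(demographics.get(purchase["zip"], {}))
    enriched.update(crm.get(purchase["user_id"], {}))
    return enriched


enriched = [enrich(p, zip_demographics, crm_accounts) for p in purchases]
for row in enriched:
    print(row)
```

The first record picks up demographic and CRM attributes, while the second has no match in either source and stays as the application captured it. If two sources used the same field name, the later `update` would overwrite the earlier one, so a real implementation would need a naming convention for enriched fields.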
At a base level, knowledge workers can produce reports showing the various important metrics the company uses. Other knowledge workers, like data scientists, use this enriched data to identify new patterns, new use cases, and new opportunities.
Enrichment drives insight.
Insight drives innovation.
Once you change the way humans and machines learn from the data, you change how the data can be used.
We will discuss the various “game pieces,” which are sources of data used to enrich application data. Next, we will enumerate the types of other data used to enrich application data and the methods for summarizing data. Then we will examine the different types of data workers and their roles in the Enrichment Game. We will cover the movement and placement of data and the legal implications of moving data around within the enterprise. We will discuss what to do with all of this data once you have it. Finally, we will discuss whether you, your organization, and your customers are ready to play the Enrichment Game.
It all starts with knowing the pieces on the board and how they interact.
This book gives an overview of how all the pieces fit together rather than an in-depth look at any piece individually. Every topic mentioned in this book has volumes written about it already. A few Google searches with some of the key terms will give you more than enough detailed information to keep you reading for quite some time. I will limit the details of many of these topics and only give a brief overview of them. But I will share some of my experiences, both where best practices were followed and the outcome was positive, and where best practices were ignored and the outcome was less than positive.
Our goal is to explain why data, data engineers, data governance policies, data operations personnel, and the tools they need to do their job effectively need to come together in a particular way to meet the needs of the business. This book will help you understand why these people, processes, procedures, and tools are needed, in what sequence they are needed, and how to bring all these things together. Enriching the data that already exists within the enterprise shows the maturity of an organization. It also shows the maturity of the leadership tasked with creating an enriched platform.
When enriching data, it’s important not to fall into the trap of spurious correlations, or connections between things that appear to have a strong correlation but really have nothing to do with each other. In other words, correlation does not mean causation. For instance, from 1999 to 2010, the number of people who drowned after falling out of a fishing boat correlates with the marriage rate in Kentucky (Vigen n.d.). But it would be foolish to assume that either of these things caused the other or that the correlation has any meaningful significance. The same can be true of data generated by apps.
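To see how easily this happens, the sketch below computes a Pearson correlation coefficient for two series of numbers. The series are invented for illustration (they are not Vigen's figures), and the names are hypothetical; the only thing they share is a downward trend over time.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Two invented yearly series with no causal link; both simply decline.
app_crashes_per_week = [10, 9, 9, 8, 7, 7, 6, 5]
store_visits_thousands = [50, 47, 46, 41, 38, 36, 33, 30]

r = pearson(app_crashes_per_week, store_visits_thousands)
print(f"r = {r:.3f}")
```

The coefficient comes out very close to 1 even though nothing connects the two series. A shared trend is enough to produce a strong correlation, which is why an enriched attribute should be checked for a plausible mechanism before it is trusted.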
Enriching data provides additional value by showing more contextual information around a particular event or transaction. However, the enriched data should be more useful than it was without the enrichment. Does knowing which phase the moon was in while someone bought a flashlight at their local supermarket have any predictive value? It might, if the reason for purchasing the flashlight was related to a power outage that recently occurred, and the person who bought the flashlight worked for a search and rescue operation. A full moon gives a search and rescue operation much more light than a new moon. You may not be able to anticipate search and rescue needs or even power outages. But if your store is the main supplier for the needs of a community, knowing the phase of the moon may be useful for having some items readily available and easy to find in your store.
One piece of additional contextual information in isolation may not be useful, but enriching data from multiple sources to get a detailed picture that indicates why someone made a purchase or used your software could be quite valuable in anticipating the needs of consumers.
Enriched contextual information about your data provides additional insight into the use of that data by your users.
Many companies have a detailed idea of an ideal customer or customer persona for different situations. These customer personas were identified through survey data and optional questionnaires on the websites. Your company markets to certain personas. What are all the attributes you have identified for your ideal customer persona(s)? Is your ideal customer male or female? Are they a college student, or an empty-nester? Do they live in a city, suburb or rural area? How do they use your products? Do they purchase items regularly, or do they only purchase items to prepare for a trip or an adventure? Does your application capture all of these attributes? How can you enrich the data you have to match data to your customer persona?
The difficulty I have seen with using these personas is that since a persona is an archetype of what a customer would look and act like, no actual purchases could be tied back to a customer persona. For example, at one company where I worked, they had an ideal persona for whom they created marketing material. She was a 30-something married professional mom of two children. Our application did not collect information on how many children our customers had. Also, we did not collect information on marital status or age. We could derive some of this information based on the purchase patterns, but the data in each application we were using only contained a portion of the persona information.
Relating purchase patterns, delivery addresses, items purchased, survey data, demographic data for the delivery location, and other things got us closer to being able to say, “Persona 1 made these types of purchases,” and “Persona 2 made these other types of purchases.”
Only by enriching the raw data from each application with data from our other supporting applications could we verify our persona assumptions and even tweak the persona definition based on usage patterns. No data from any individual application gave us enough visibility to the customers’ needs to relate purchase patterns to our personas. Only the fully enriched set of data could begin to give us insights into our personas.
The Enrichment platform creates a dedicated place for internal analysis and the opportunity to create new and additional data products derived from an application or group of applications that your business uses to interact with consumers.