Fit-for-purpose data has been a foundational concept of Data Governance for as long as I’ve been in the field, so that’s 10-15 years now. Most data quality definitions take fit-for-purpose as a given. For example, John Ladley, in his article for CIO, “Ensuring the quality of ‘fit for purpose’ data,” wrote:
“Our topic for this installment is data quality, which we can simply define as data ‘fit for purpose.’”[1]
As a slight variation, there’s the definition offered by the revered “Data Management Body of Knowledge” (the DMBoK), published by the Data Management Association (DAMA), and shared by Dataversity:
“…the planning, implementation, and control of activities that apply quality management techniques to data, in order to assure it is fit for consumption and meets the needs of data consumers.”[2]
Inherent in this preoccupation with fit-for-purpose is the assumption that data consumers know what they need, and that data producers must assure the data meets the requirements the consumers communicate. This can’t help but remind me of the difficulties I’ve often met in the traditional waterfall method of system development. First come the business requirements, then the technical specs, then development, and then the technology team presents the users with the application, ready for UAT, and they say, “What? This wasn’t what I wanted!” I have had personal experience where data quality improvement efforts focused on the consumer deciding what they need first, leaving the producers waiting for requirements while the clock ticks toward whatever date is set for achieving the consumers’ quality goals. Almost invariably, once the requirements are ready, the producers scramble to make the necessary improvements, and yet the resulting data is still not satisfactory.
True to the title of my column, Through the Looking Glass, I want to flip the whole concept of fit-for-purpose-based data quality on its head and explore the concept of “Supply-side Data Governance.” I will begin by returning to the DMBoK definition mentioned above.
“Consumption,” etymologically speaking, is a much more interesting word than “purpose” (c. 1300, purpus, “intention, aim, goal; object to be kept in view; proper function for which something exists”).[3] Consumption’s original meaning (late 14c., “wasting of the body by disease; wasting disease, progressive emaciation”[4]) explains how it became a synonym for tuberculosis. Even its usage in the DAMA data quality definition derives from a meaning with darker overtones: “act of consuming, the using up of material, destruction by use.”[5]
More relevant to this discussion, consumption also brings us into the realm of economics, as it has a specific definition and usage in the “dismal science.” As the Corporate Finance Institute puts it:
“Consumption is defined as the use of goods and services by a household. It is a component in the calculation of the Gross Domestic Product (GDP). Macroeconomists typically use consumption as a proxy of the overall economy.”[6]
CFI goes on to note that “the study of consumption theory has helped economists formulate numerous theories such as the Law of Demand, the Consumer Surplus concept, and the Law of Diminishing Marginal Utility.”
From here, it’s not a giant leap to the great economic debate of the last fifty-plus years, “supply-side economics” vs. “demand-side economics.” Those of us who remember the early 1980s (here I go dating myself again) can recall that this became far more than a battle of economists over dueling theories. The country faced high, persistent inflation, which the Fed fought by raising interest rates to a level we have difficulty conceiving of today. The Indeed Editorial Team offers a concise comparison of these competing approaches,[7] and for a deeper dive into this topic, I recommend James D. Gwartney’s fine article “Supply-Side Economics.”[8] Gwartney begins his article with the two different but related ways the term “supply-side economics” is used. The second usage, “how changes in marginal tax rates influence economic activity,” is not relevant here (although it sure would trigger a lively political discussion!), but the first usage is certainly on point:
“Some use the term to refer to the fact that production (supply) underlies consumption and living standards. In the long run, our income levels reflect our ability to produce goods and services that people value. Higher income levels and living standards cannot be achieved without expansion in output.”
Let’s take a shot at rephrasing this to make it relevant to data governance. “The supply of high-quality, accessible, complete data underlies its consumption and value to users. In the long run, our ability to maximize the value of our data reflects our ability to produce data that users value. Data-driven business value, revenue generation, and analytical insights cannot be achieved without the expansion of data output.”
I have come to think that another term for “fit-for-purpose” data quality is “demand-side data quality,” as the needs of the data consumers, the demand side of the equation, dictate what the data producers, the “supply side,” provide.
I will describe a scenario to illustrate how Supply-side Data Governance might work in practice. Before I start, however, a brief aside.
Some readers may think Supply-Side Data Governance sounds suspiciously Field of Dreams-ish. In her blog for teradata.com, Monica Woolmer writes, “I’m glad to report that the focus is no longer on the technology first; at last, the belief in the old idiom from ‘if you build it, they will come’ (pinched from the 1989 classic film Field of Dreams) has all but disappeared.”[9]
Now, Field of Dreams is one of my favorite movies (and never leaves me dry-eyed by the end), but I’m not suggesting that building a technology solution first, like a master data management system, and expecting success to follow is Supply-Side Data Governance. Rather, my experience shows that data providers, so familiar with the data they create, generate, or collect for consumers, don’t need to wait for their downstream consumers to figure out their requirements before setting in place beneficial governance, quality checks, controls, and pipelines.
Back to the scenario. Let’s say you manage the data for a line of business within an organization, and your transaction data is critically important to a host of other internal consumers, as well as your external customers.
Following “traditional” Demand-Side Data Governance would mean your consumers would each need to painstakingly collect their requirements for fit-for-purpose data, and you would need to translate those into the actions necessary to assure the data meets those requirements.
But you are going to try something different. You know (or should know) what your organization needs from this data. And you also have a basic idea of general checks and controls to build, because these are consistent across data sets: row count variations, for example. You can establish accountabilities and responsibilities for this data, based on whatever data governance role framework you have in place. Also, you already are painfully aware of certain deficiencies in the data, so there is no need to wait for your consumers to address those!
You implement your data governance approach by building data quality checks, establishing remediation procedures, defining critical business terms, and setting access controls based on content (is there PII, etc.?).
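To make the row count example concrete, here is a minimal sketch in Python of a check a producer could put in place without waiting for anyone's requirements. The function name, the 20% tolerance, and the sample counts are all my own illustrative assumptions, not a prescribed standard; your own thresholds would come from knowing your data.

```python
from statistics import mean


def row_count_check(history, today, tolerance=0.2):
    """Compare today's row count to the average of recent loads.

    `tolerance` is a hypothetical starting threshold (20%); it is the
    kind of value consumers may later ask you to tighten or loosen.
    """
    if not history:
        raise ValueError("need at least one prior load to compare against")
    baseline = mean(history)
    if baseline == 0:
        raise ValueError("baseline row count is zero; cannot compute deviation")
    # Relative deviation of today's load from the recent average.
    deviation = abs(today - baseline) / baseline
    return {
        "baseline": baseline,
        "deviation": deviation,
        "passed": deviation <= tolerance,
    }


# Illustrative numbers only: three recent loads and a suspiciously small one.
recent_loads = [10_000, 10_200, 9_800]
result = row_count_check(recent_loads, today=7_500)
print(result["passed"])
```

A failed check would feed whatever remediation procedure you have defined; the point is only that the producer can build and run it on day one.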
Then, you go out to your consumers with a sample data set, cleansed based on the rules and processes you’ve put in place, with a business glossary and data dictionary, and a detailed explanation of your data quality rules.
Now, naturally your consumers will come back with more requests. They may disagree with your rules and thresholds. They could define certain elements differently. There may be calculations and transformations they want you to perform. You collect their feedback through an agile structure, and you iterate through sprints. At each point, you are creating “working data,” and refining the requirements. You are dynamically creating “fit-for-purpose” data through partnership with your consumers.
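One way to picture this iteration is to treat each quality rule as something that carries its own revision history, so every sprint's feedback is recorded against the rule it changed. The sketch below is a hypothetical structure in Python; the class, the rule name, the thresholds, and the consumer name are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class QualityRule:
    """A data quality rule whose threshold can be revised sprint by sprint."""
    name: str
    threshold: float
    # Each entry records (sprint, previous_threshold, requested_by).
    history: list = field(default_factory=list)

    def revise(self, sprint, new_threshold, requested_by):
        # Keep the old value so the change can be explained or rolled back.
        self.history.append((sprint, self.threshold, requested_by))
        self.threshold = new_threshold


# The producer starts with its own best guess (hypothetical 5% null rate)...
rule = QualityRule(name="max_null_rate_customer_id", threshold=0.05)
# ...and a consumer asks for a stricter limit after seeing the sample data.
rule.revise(sprint=1, new_threshold=0.01, requested_by="finance-reporting")
print(rule.threshold, rule.history)
```

The producer's first threshold is what gets the conversation started, and the history shows how the rule became “fit-for-purpose” over time.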
Meanwhile, you create or acquire new data, and the cycle begins again. What’s different is that the Data Supplier, closer to the data than the consumers, drives the process through iteration, not waterfall.
We’ve just peered into the Looking Glass from a “fit-for-purpose” perspective. Note that I haven’t said to abandon that concept. Instead, I think Supply-side Data Governance, as strange as that term may sound (and with its mixed connotations, depending on how those of us who remember the early 80s feel about that period), can lead to faster and more meaningful achievement of working data than a demand-side approach where, all too often, the data consumers find out that what they thought they wanted is not what they need.
[1] Ladley, John, “Ensuring the quality of ‘fit for purpose’ data,” October 17, 2016, CIO, https://www.cio.com/article/236174/ensuring-the-quality-of-fit-for-purpose-data.html
[2] Knight, Michelle, “What is Data Quality,” March 9, 2022, Dataversity, https://www.dataversity.net/what-is-data-quality/#
[3] Online Etymology Dictionary, https://www.etymonline.com/word/purpose
[4] Online Etymology Dictionary, https://www.etymonline.com/word/consumption
[5] Ibid, https://www.etymonline.com/word/consumption
[6] CFI Team, “Consumption,” January 22, 2022, Corporate Finance Institute, https://corporatefinanceinstitute.com/resources/knowledge/economics/consumption/
[7] Indeed Editorial Team, “Supply-Side Economics vs. Demand-Side Economics: Definitions and Examples,” January 5, 2021, Indeed.com, https://www.indeed.com/career-advice/career-development/supply-side-vs-demand-side
[8] Gwartney, James D., “Supply-Side Economics,” Econlib.org, https://www.econlib.org/library/Enc/SupplySideEconomics.html
[9] Woolmer, Monica, “Within data and analytics, the ‘If you build it, they will come’ mentality is finally dead”, September 20, 2017, teradata.com, https://www.teradata.com/Blogs/Within-data-and-analytics,-the-If-you-build-it,-t