Over the past few months, my team in Castlebridge and I have been working with clients delivering training to business and IT teams on data management skills like data governance, data quality management, data modelling, and metadata management. Many of these teams were ‘blended’, combining staff from across the traditional divide (as an aside: this is exactly the approach that should be taken if we look at models like Rik Maes’ Amsterdam Model).
A big chunk of our training with clients focusses on the “why” of data in the organisation. This is because if we don’t understand the “why”, the business and data strategy alignment, it’s very hard if not impossible to prioritise the what, when, and who of our efforts. After all, Simon Sinek tells us in his book, Start with Why, that the “why” needs to come before the “what” and “how” when we are trying to influence human behaviour. This prompted me to think again on a simple question: Why do we collect and manage data and information?
A quick look through some online sources gave me several possible answers. LetsTalkScience tells me it’s “to preserve it for later use.”[1] The Council for Quality and Leadership gives me 12 reasons why data is important, but doesn’t actually explain why.[2] A variety of other sites gave me great (but predictable) examples of why data is important to organisations, such as understanding customers and identifying waste and opportunities for improvement.
I’m sorry… but YAWN!!!!!!!!!!!!!! They are all great examples of the “WHAT” of data— the “what we can do with this stuff when we have it” question. But none of these really get to the “why”. And to change human behaviours we need to be able to start with “why”.
So… why do we capture, store, process, share, and present information and data? Being a data quality geek I did a quick fishbone diagram on a sheet of paper and did a 5 Whys analysis (if I find time over the coming weeks I hope to transfer it to an electronic form that is actually legible by others). Why do we preserve data for later use? Why does data help us understand customers? Why? Why? Why?
Based on that rough analysis, the simple why of “Why do we manage data?” is this:
We record and manage data to enable communication about things to allow an action or an outcome and to share meaning.
This is an imperfect summary and I hope to refine it further. But it will do for now as it encompasses the communication of information between people, the communication between a machine and a person, and the communication between a machine and a machine.
Over the past 20 years, I’ve been involved in a number of certification development initiatives in the data management space, from the IQInternational/IAIDQ’s Information Quality Certified Practitioner certification to the DAMA CDMP. I decided to look back at my notes from those efforts to see if there were any insights to be gleaned if I applied the new “Why” perspective. When I looked at the weighting of knowledge and skill criteria in the design of exams and syllabuses, an interesting trend emerged:
The knowledge and skill domains that are most important in data management are the domains that relate either to managing and improving communication of abstract concepts or deal with fixing or preventing the impacts of poor communication about and with data.
Bluntly: depending on how you want to slice it, around 70% of the questions in the CDMP certification relate to things that are done to support communication or which require effective communication. Within the IAIDQ’s IQCP certification, specific knowledge and skills around communications and communication methods accounted for at least 42 out of 192 identified knowledge and skill elements in the role of a data quality professional. A thing that wasn’t mentioned was proficiency in any particular technology or tool set. Knowing how to drive a data quality software tool is a “how” not a “why”.
Whether it is the creation of internal communication structures for data issues as part of data governance, the creation of what Graham Witt calls the “narrative supported by a picture” that is your data model, or the codifying of definitions as part of metadata and master data management, what we are ultimately trying to do is improve the quality of communication of, about, and with data. We want to remove noise from that communication so that there is a meeting of minds between the sender and the receiver, and so that the right understanding and actions can occur.
As my brain was trying to distract me from the looming deadlines for my chapter updates for the 2nd edition of the data ethics book I co-wrote a few years ago, I pondered what this all might mean for the data literacy discussion. If the “why” of all things data is to enable better communication, what does that mean for how we should be thinking about the questions of data literacy and data acumen? What competences do we need to develop in what contexts?
A lot of the discussion around data literacy compares it to the model of literacy and numeracy in traditional education. Being able to write your name means you have a basic level of functional literacy. But, when we look back into pre-history, we find evidence of recorded symbolic communication of things dating back to at least 6000 BCE. That’s over eight thousand years of people recording something to impart a meaning or communicate a fact or insight. Whether it is the Jiahu symbols of China or the horned man deep in the caves of Lascaux in France, we’ve been scribbling things down to tell each other things for a large chunk of human existence.
The metadata associated with these prehistoric markings is lost to time so we don’t have a clear understanding of what they mean or why they were recorded. But for someone, at some point in time, there was a “why”.
That why came before the how of systematic writing as we know it in more modern times. Before the creation of formal tooling and structures for encoding and imparting meaning. Back in the dim mists of time, the first data manager started with “why”.
The lesson here is important: if we don’t consider the audience for our communication, our “why” might get lost. The ancient humans who created the first proto-writing symbolic records probably had a good understanding of their “who”: who were they communicating to, and would they be able to understand it? Perhaps there were small tribes of shamans who wandered around helping translate symbols and reorder them into better structures to help improve communication of meaning. Was this the birth of data management consultants?
Today, we need to consider who our who is, because communication is a two-way process. As part of the skill of communicating we need to be removing noise, addressing issues of different mental models of how the world works, perception bias, and more. And this is, ultimately, what data management professionals need to do on a day-to-day basis. And this is why it is a terminal mistake for data project teams to dive into the data without first understanding the “Why”.
As part of shaping the message of “Why we manage data” in our organisations, we need to consider who the audience is for that message, both internally and externally, and ensure we are expressing the right Why in the right way.
At the IRMUK EDBIA conference in November 2022, my colleague Sue Geuens and I taught a full day course on communications skills for data professionals to help people find the “why” and communicate the “why” and to tell the data story in their organisations. At the end of the day, one of the delegates asked an interesting question:
“Why does this not feature on the CDMP exam, even though it’s in the DMBOK?”
Why indeed.
[1] https://web.archive.org/web/20221202082941/https://letstalkscience.ca/educational-resources/learning-strategies/recording-data
[2] https://web.archive.org/web/20221202082519/https://www.c-q-l.org/resources/guides/12-reasons-why-data-is-important/