I’ve been thinking a lot about data mapping lately. I know, weird, right? With analytics, AI, cloud, and the rest competing for attention, why would anyone dwell on that? What’s even stranger is that I’ve been thinking about its impact on data leaders. For clarity’s sake, I’m not talking about geographic maps with data points; I’m referring to the process of defining the requirements for how one data structure should be transformed into another. Most think of this kind of data mapping as tactical work, not strategic enough to warrant the attention of busy leaders, but I beg to differ.
What got me started on this was observing the work of my clients, medium to large enterprises with a wide array of initiatives spanning various technologies and architectures. When I informally analyzed the objectives of many of their meetings, I realized that much of the focus was on understanding and agreeing on how data is mapped between data structures.
Some of us who are older think of an Excel spreadsheet with source and target columns when we think about data mapping, and a few of those still show up in my client meetings. More often, though, the focus is a pipeline, notebook, SQL query, report, or some other artifact. That obscures the fact that, in most cases, the core issue is a discussion about how the data ended up the way it is, from the way it was. In essence, it’s an exercise in communication: aligning on the requirements, the rules, and how they were implemented.
How did I conclude this is a strategic issue that deserves a data leader’s time? In a nutshell, two things are at the heart of succeeding with data: quality risk and people cost.
Risks and Costs Associated with Poor Data Mapping
My experience has been that the most significant quality risk and the highest costs related to data lie at the boundary where one data structure is converted and transformed into another. The risk is that misunderstood requirements cause data inaccuracies and confusion even when nothing seems to break. Some call it data not being fit for purpose. This undermines trust and confidence, which are difficult to earn and even more difficult to win back. But what does data quality have to do with data mapping? Two things:
First, when the data is mapped from one data structure to another, it is a golden opportunity to compare the values and identify quality problems.
Second, it’s a chance to define how those data quality problems will be handled. This lowers the risk of poor-quality data, but more importantly, it shows that the data team uses a disciplined approach and can communicate and set consumer expectations.
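To make those two points concrete, here is a minimal sketch in Python. The field names, the rules, and the agreed handling (load invalid dates as null and flag them for review) are all illustrative assumptions, not a prescribed standard; the point is that the mapping boundary is where a quality problem is both detected and given its agreed treatment.

```python
from datetime import date

# Minimal sketch of a mapping step that both detects and handles a quality
# problem at the boundary. All field names and rules are illustrative.

def map_customer(src: dict) -> tuple[dict, list[str]]:
    """Map a source row to the target structure, collecting quality issues."""
    issues = []

    # Rule 1: cust_id -> customer_id, with leading zeros stripped.
    customer_id = src["cust_id"].lstrip("0")

    # Rule 2: dob -> birth_date, must be a valid ISO date.
    # Agreed handling (an assumption here): load as null and flag for review.
    try:
        birth_date = date.fromisoformat(src["dob"]).isoformat()
    except ValueError:
        issues.append(f"invalid dob {src['dob']!r}; birth_date set to null")
        birth_date = None

    target = {"customer_id": customer_id, "birth_date": birth_date}
    return target, issues

row, problems = map_customer({"cust_id": "00042", "dob": "1985-13-02"})
print(row)       # {'customer_id': '42', 'birth_date': None}
print(problems)  # ["invalid dob '1985-13-02'; birth_date set to null"]
```

Notice that the rule and its exception handling are explicit and reviewable, which is exactly what sets consumer expectations.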
From a people cost perspective, I started doing simple math, multiplying the number of people in client meetings by the number of meetings. It quickly became apparent that poor data mapping is a drag on the data leader’s budget. The challenge is that it’s subtle: there is no “data mapping” line in the budget. Still, I can almost guarantee that part of the reason your data architects, analysts, report developers, and others are maxed out is that they spend a significant part of their time trying to tease out how data changed from what it was to what it is, and whether it’s correct.
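Here is that simple math as a back-of-envelope sketch. Every number is hypothetical, and the loaded hourly rate is my own added assumption, not a benchmark.

```python
# Back-of-envelope cost of mapping confusion. All numbers are hypothetical.
people_per_meeting = 6
meetings_per_week = 4      # meetings spent untangling how data was mapped
hours_per_meeting = 1
loaded_hourly_rate = 100   # USD; an assumption, not a benchmark

weekly_cost = (people_per_meeting * meetings_per_week
               * hours_per_meeting * loaded_hourly_rate)
print(f"~${weekly_cost:,}/week, ~${weekly_cost * 48:,}/year")
# ~$2,400/week, ~$115,200/year -- with no 'data mapping' line in the budget
```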
Dissecting the Essence of Data Mapping
Fundamentally, data mapping is about communication: shared understanding, agreement, and the implementation of that agreement. That’s pretty abstract, so I will elaborate. A data mapping, by definition, starts with the ‘from’ or ‘source’ data structure. Someone owns that data structure and, as such, has a vested interest in it as well as a deep understanding of it. I am using ‘data structure’ as an umbrella term for table, file, spreadsheet, JSON document, message, etc.
Mappings end with the data in a defined target data structure. Once again, it has an owner, who presumably understands the requirements it must support, including the required data. And someone plays the role of facilitator: a person who understands both sides and can document the rules for how the data must be changed from source to target. That is the essence of data mapping, but it’s not as simple as it sounds; it requires excellent listening, questioning, validation, and technical skills.
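A minimal explicit mapping spec might look like the sketch below. The fields, rules, and owners are illustrative assumptions; the essentials are the source, the target, the transformation rule, and who is accountable for each side.

```python
# A minimal explicit mapping spec: source field, target field, rule.
# All names, rules, and owners are illustrative.
CUSTOMER_MAPPING = [
    # (source field, target field,   transformation rule)
    ("cust_id",  "customer_id",  "strip leading zeros"),
    ("fname",    "first_name",   "trim whitespace; title case"),
    ("dob",      "birth_date",   "parse as ISO date; null and flag if invalid"),
    ("country",  "country_code", "pass through; must be ISO 3166-1 alpha-2"),
]

MAPPING_ROLES = {
    "source_owner": "CRM team",        # owns and understands the 'from' structure
    "target_owner": "Analytics team",  # owns the 'to' structure and its requirements
    "facilitator":  "Data architect",  # documents and validates the rules
}
```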
Implicit and Explicit Data Mapping
Data mapping happens whether you want it to or not. It’s amusing when someone says something to me like, “It’s a bunch of documents. We just get to work.” Their not-so-subtle point being: it’s a waste of time and you are full of it. It’s amusing because they seem to think they can avoid data mapping. So, where does their data get mapped? In the code, or the code equivalent, of their pipelines. And how easy is that to understand and untangle? How many people have the skills to do that? How easy is it to show the mapping to a requirements expert who needs to validate it? You see my point.
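Here is what that implicit mapping often looks like in practice, as a hypothetical pandas snippet. Every rename, cast, and filter below is a mapping decision, but a requirements expert would have to read the code to validate any of it.

```python
import pandas as pd

# Implicit mapping buried in pipeline code (file names and fields hypothetical).
df = pd.read_csv("customers.csv")
df["cust_id"] = df["cust_id"].astype(str).str.lstrip("0")  # mapping decision
df["dob"] = pd.to_datetime(df["dob"], errors="coerce")     # silently nulls bad dates
df = df.rename(columns={"cust_id": "customer_id", "dob": "birth_date"})
df = df[df["country"].isin(["US", "CA"])]                  # undocumented scope decision
df.to_parquet("customers_clean.parquet")
```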
I’ve also had discussions about data products and data mapping. Data products don’t magically eliminate the need for data mappings. The creators of a data product need to do mapping to produce it. Consumers of the data product need to do data mapping to use it.
I’m not saying that all data mapping needs to be done in a document and then implemented in code. Sometimes that is overkill, and it’s OK to just “get to work.” What I’m saying is that everyone needs to realize that each time data is extracted from a data structure, crosses a system boundary, and is deposited into another data structure, data mapping has been performed, either explicitly or implicitly. The organization needs standards that force thoughtful consideration of the criticality of the system, its complexity, who will maintain it, and other factors.
Data Leader’s Playbook
If you accept or are pondering my assertion that data mapping is critical, I suggest you consider taking these actions to raise your team’s game.
1. Define Data Mapping
Get your direct reports together and agree on a definition of data mapping, a statement of its purpose, and its value. Yes, it’s elementary, but it’s critical.
2. Create a Data Mapping Policy
Create a high-level policy statement that clarifies when a mapping is required and how it is to be maintained.
3. Create Data Mapping Standards
Create a standards document that explains the types of data mapping artifacts and mechanisms that should be used in specific scenarios. Yes, there can be several, ranging from formal Excel mappings to less formal code-level markup (see the sketch below).
Data mapping also touches technologies such as lineage, API management, catalogs, governance, and privacy. These should eventually be included in the standards, but don’t get stuck trying to figure it all out at the beginning. Just get moving and start with a simple baseline standard.
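One way to support the “less formal code-level markup” end of the spectrum is a comment convention like the one below. The @map/@rule tags and the mapping-document IDs are invented for illustration; what matters is a consistent, greppable annotation that links code back to agreed rules.

```python
import pandas as pd

df = pd.DataFrame({"cust_id": ["00042"], "dob": ["1985-02-13"]})  # stand-in data

# @map source=crm.customers.cust_id target=dw.dim_customer.customer_id
# @rule strip leading zeros (mapping doc MAP-1042; ID illustrative)
df["customer_id"] = df["cust_id"].str.lstrip("0")

# @map source=crm.customers.dob target=dw.dim_customer.birth_date
# @rule invalid dates become null and are flagged for steward review (MAP-1043)
df["birth_date"] = pd.to_datetime(df["dob"], errors="coerce")
```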
4. Add Mapping Terms and Examples to Your Term Glossary
Define the specific data mapping terms and add them, along with best-practice examples, to your team’s term glossary.
5. Create and Deliver Data Mapping Training
Data mapping is not taught in school, so unless someone has worked for a consultancy or has had an excellent old-school boss, they have never learned it. To teach everyone about data mapping, create a short training course supported by videos and self-paced learning materials.
6. Enforce Project Data Mapping Validation
Require the review and approval of data mappings by project stakeholders as a formal milestone. This ensures requirements have been interpreted correctly before the development cycle continues.
7. Implement Data Mapping Links
You should already be logging the execution of data movement and transformation; when you do, you know when there is an issue and can raise it. To save the team time, add a simple pointer from each raised issue back to the data mapping that governs the affected data.
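A sketch of what that pointer could look like; the logger name, the mapping URL, and the counts are all hypothetical.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("pipeline.customer_load")

# Hypothetical pointer to the governing mapping document.
MAPPING_DOC = "https://wiki.example.com/mappings/MAP-1043"

bad_dates = 17  # stand-in count of rows whose dob failed to parse
if bad_dates:
    log.warning(
        "%d rows had unparseable dob and were nulled per mapping rule; see %s",
        bad_dates, MAPPING_DOC,
    )
```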
Conclusion
Data mapping may not be the sexiest topic, but it is the common denominator for lowering data quality risk and eliminating cost nightmares associated with rework.
An organization without guidance will find its own way, and that usually means endless hours of people trying to unwind where data came from, how it was transformed, and why.
Data mapping is important and foundational enough that data leaders should eliminate poor mapping as a drag on their organization’s effectiveness and free up more time for analytics and AI.