I have been fortunate to spend the majority of my career, over 22 years, as a data quality professional. I know more now than when I began, and I often wish I had learned some lessons earlier. In celebration of Granite Falls’ decade of providing data quality and governance solutions, let me share ten of these lessons.
Communication is Real Work
1) I used to think that communication was this thing I had to do that got in the way of my real data quality work. After all, wasn’t it more important to be profiling the data, getting to root causes, and implementing improvements? I would grumble to myself, “Do you want me to do the work or do you want me to talk about it?” A great “aha!” moment for me was when I realized that communication is part of the work. Communication is as essential to meeting my goals as any of the other activities. My manager at the time said, “Danette, I understand what you are trying to do but you have to give me the words so I can take that message to others.” I realized that I had to figure out how to better express myself and my ideas. If I could clearly voice my thoughts, then someone else could more easily understand and accept the message. They could then share those ideas with others, and so on. My reach could be much greater with many people discussing those ideas. I’ve spent a good part of my career improving my ability to communicate, and I continue to work on those skills.
Recognizing High Quality Data and Information
2) I can now better articulate how to recognize high quality data and information. If we have high quality data or information, we can find the data or information that we need (we can get to it and access it); it is available when we need it (it is timely and not late); it includes everything we need (nothing is missing); it is secure (it is safe from unauthorized access and manipulation); we understand it (we can interpret it); it is correct (it is an accurate reflection of what is happening or what did happen in the real world). Because of all these things we trust the information when we get it, and we can use it with confidence. The trust in and use of data and information is the whole point of the work that we do. So, what is the difference between data and information? I see data as facts or items that are of interest to the organization, and information as those facts in context. I decided to avoid the deep theoretical arguments about the differences between them, and I often use the words interchangeably. However, I do distinguish between them when the distinction: 1) helps me take better action when managing the data and information at various points along its life cycle, and 2) improves communication. If I am talking with someone whose concern is what they see on reports, I may use “information.” If I am talking databases and architecture, then “data” could be more appropriate. What I am committed to is using the words that are most meaningful to my audience so we can take more effective action.
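Two of the characteristics above, completeness (nothing is missing) and timeliness (not late), lend themselves to simple automated checks. The following is a minimal Python sketch, assuming a hypothetical customer record, an assumed list of required fields, and an assumed 30-day freshness window; none of these names or thresholds come from the text. It is meant only to show that some characteristics can be tested mechanically, while others, correctness especially, need evidence from outside the dataset.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


# Hypothetical record type; the field names are illustrative only.
@dataclass
class CustomerRecord:
    customer_id: str
    name: Optional[str]
    email: Optional[str]
    last_updated: datetime


# Assumed business rule: these fields must be populated for a record to be "complete".
REQUIRED_FIELDS = ("customer_id", "name", "email")

# Assumed business rule: data older than this window is considered "late".
DEFAULT_MAX_AGE = timedelta(days=30)


def missing_fields(record: CustomerRecord) -> list[str]:
    """Completeness: return the required fields that are None or an empty string."""
    return [f for f in REQUIRED_FIELDS if not getattr(record, f)]


def is_timely(record: CustomerRecord, now: datetime,
              max_age: timedelta = DEFAULT_MAX_AGE) -> bool:
    """Timeliness: True if the record was updated within the allowed window."""
    return now - record.last_updated <= max_age


def check_record(record: CustomerRecord, now: datetime) -> dict:
    """Run the two mechanical checks.

    Accuracy, security, and interpretability are not covered here; they need
    other evidence, such as comparison with the real world or access reviews.
    """
    missing = missing_fields(record)
    return {
        "complete": not missing,
        "missing": missing,
        "timely": is_timely(record, now),
    }


if __name__ == "__main__":
    now = datetime(2024, 1, 31)
    rec = CustomerRecord("C-001", "Ada", None, datetime(2024, 1, 21))
    print(check_record(rec, now))
    # prints {'complete': False, 'missing': ['email'], 'timely': True}
```

A record with no email that was updated ten days earlier is reported as incomplete but timely. Passing `now` in explicitly, rather than reading the clock inside the checks, keeps the results repeatable.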
The Information Life Cycle
3) I learned about the universal resource life cycle from Larry English (Plan, Acquire, Maintain, Dispose, Apply).[i] He taught me that information is a resource and is essential to performing business processes and achieving business objectives, just as money, inventory, facilities, and people are resources. Every resource should be properly managed throughout its life cycle in order to get the full use and benefit from it. I modified the names of the life cycle phases slightly from Larry’s original, added the “Store and Share” phase, and coined the acronym POSMAD[ii] as an easy reminder of the fundamental phases of the information life cycle: Plan for the resource, Obtain the resource, Store and Share the resource, Maintain the resource, Apply the resource, and Dispose of the resource. The more I work with data, the more I understand that knowledge of the life cycle is crucial to addressing the issues that arise over the course of information’s life.
The Information Environment
4) The information environment is just as important as the information life cycle. There are four key components affecting information quality throughout its life cycle: data, processes, people/organizations, and technology. (This is one place where the distinction between data and information helps us take better action.) The “information environment” takes account of those four key components, in addition to others that for the sake of space I won’t mention here. It’s a bit like a data ecosystem. Using the background of the information life cycle and the components of the information environment, we see relationships we had not seen before. We see causes and effects. The beauty of a data quality approach is the ability to pull together all of these different elements and find new solutions, make better decisions, and offer a fresh perspective. We develop solutions that previously were not clear because we are looking at things in a more holistic fashion.
The Role of Tools
5) Too often people think, “If we just get the right tool, all our data quality problems will be taken care of.” The right tools are an essential component of the information environment, but they are not the end of the story. Thinking the right tool will take care of all your data quality problems is like believing the right x-ray machine will make you healthy. We do need the x-ray machine to peer into the body and give us information on which to act for our health. But we also need the technicians who run the machine, doctors who analyze the results, patients who are willing to take action on their doctor’s advice, and support staff to get approval from the insurance company to pay for the test so we can have it taken in the first place. With information, we have to understand how and where to use the appropriate tools. We need to implement good processes around them. So while we can’t do our work without the right tools, data quality is not only about the tools.
Programs, Projects, and Operational Processes
6) I see programs, projects, and operational processes as fundamental means of putting data quality and governance into practice. They are different, yet complementary.[iii] An organization needs to make use of all these methods because each plays a role in creating, improving, and sustaining high quality data and information within an organization. I have learned that many people know only about projects and operational processes; for instance, they think data quality is “done” once a data cleansing project is complete. Many people are unfamiliar with the idea of an ongoing foundational program that provides services, knowledge, skills, tools, and techniques related to data quality and governance that can be utilized by multiple projects and operational processes. In my experience, without the glue of a program, it is impossible to sustain data quality over an extended period of time.
Ten Steps to Data Quality
7) One thing that has shaped my own career is the development of my methodology: Ten Steps to Quality Data and Trusted Information™ (Ten Steps™).[iv] The Ten Steps draw on much that I learned from others with data quality expertise and on my own experience. They were developed as a practical, actionable guide to assess, improve, and create information and data quality: Step 1 – Define Business Need and Approach, Step 2 – Analyze Information Environment, Step 3 – Assess Data Quality, Step 4 – Assess Business Impact, Step 5 – Identify Root Causes, Step 6 – Develop Improvement Plans, Step 7 – Prevent Future Data Errors, Step 8 – Correct Current Data Errors, Step 9 – Implement Controls, and Step 10 – Communicate Actions and Results. This is not a linear process, but a highly iterative, pick-and-choose approach in which the applicable steps, activities, and techniques are applied based on need. It is an approach that I and others have found helpful, both for conceptual knowledge of data quality and for practical application.
8) Early in my career I was passionate about data quality. After all these years, I still am. However, at the time I didn’t understand why everyone else didn’t get it. I knew it was important to have high quality data, but had a hard time verbalizing why. I would request funding, people, or support to address data quality needs. Managers would ask, “Why is data quality important? What impact do data quality and governance have on the organization? On me and my responsibilities? Why do they matter?” I tried not to show my irritation, but I was feeling prickly inside. Another great “aha” moment for me was the realization that when managers or executives asked those questions, they were just doing their jobs. If that was the case, what was my job? My job was to be able to answer those questions! The irritation went away. I could then put the energy that had gone into being annoyed and frustrated into figuring out how to answer those questions, a much more productive use of my time and energy.
It Takes a Village
9) You have probably heard the old saying that it takes a village to raise a child. Well, it also takes a village to create, manage, and ensure high quality data and information. As mentioned, there are many different phases in the information life cycle. Managing information through all of them is complex, but doable when everyone involved knows they must work together. If we are doing our job well, we understand that every person and organization that touches the data has an impact on the quality – for good or ill. We understand that those in both the business and IT sides of the house have a contribution to make. We include our business partners inside and outside of our organization. We value the vendors who have invested the time, money, and people to provide the tools that make our jobs easier. We appreciate all of them, make use of their knowledge, and appropriately include them in the work.
The Small World of Data Quality
10) Most of us are familiar with the phrase “It’s a Small World” made famous by the Disney song of the same name. We hear it often enough that it feels trite, but I have learned to appreciate the real meaning behind it. I have taught and presented internationally about data quality over the years, but initially only in English and to audiences where English was well understood. The first time I taught and presented to a non-English-speaking audience using translators was in Brazil, where the attendees heard my words through their headsets with simultaneous translation into Portuguese. I believed in what I was sharing, but was unsure how it would translate cross-culturally. I saw that the core concerns of data quality transcended language, culture, and country. It was exciting to realize that no matter the background of those I was working with, the things I was bringing to the table were fundamental, and could be applied to situations anywhere in the world. Data quality became, in my mind, a unifying force. Brazil taught me that. I continue to present in other countries, sometimes with translation into other languages. I like to joke that data quality people will be the ones who bring peace to the world because we all speak the same language: data quality. I have made many friends and acquaintances abroad, and have gained an appreciation for the depth and unity of the international data quality community. It truly is a small world.
So what’s next?
Changing the Cocktail Party Conversation
I have a vision for future conversations at gatherings with non-data people. When I say that I am a data quality professional, people will nod with immediate understanding. They will know what that means, in the same way that people generally know what it means to be an accountant, attorney, or engineer. When my vision becomes reality, it will mean that the management of data has been woven into the fabric of our society. People will understand that data and information are important, and that it takes expertise, knowledge, education, and training to manage them appropriately. There will be hundreds or thousands of degree programs in this field at universities around the world. Today, every organization knows that there needs to be a management system around money (it is called Finance) and people (it is called Human Resources). No organization can be successful without effectively managing these resources. In the future, I hope every organization will know that it also needs a management system around data and information to be successful.[v] One of the missions of my career is to bring us closer to that day. Do you think we can make it in the next ten years?
You’ve heard some of my lessons and visions. What lessons have you learned? What visions do you have? Feel free to share!
[i] See Improving Data Warehouse and Business Information Quality by Larry English (John Wiley & Sons, 1999), pp. 200-209.
[ii] See Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information™ by Danette McGilvray (Morgan Kaufmann, 2008).
[iii] See Danette’s chapter “Data Quality Programs and Projects” in Handbook of Data Quality: Research and Practice, edited by Shazia Sadiq (Springer, 2013), and “Data Quality and Governance in Projects: Knowledge in Action,” a Cutter Consortium Executive Report by Danette McGilvray and Masha Bykin, at http://www.cutter.com/offers/dataqual.html.
[iv] See Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information™ by Danette McGilvray (Morgan Kaufmann, 2008).
[v] See Data Driven: Profiting From Your Most Important Business Asset by Thomas C. Redman (Harvard Business Press, 2008).