
I’m sure that you know an interesting story about dating. Finding that right person is not easy and the journey often includes many missteps along the way. As interesting as such stories might be, it is not the dating I want to talk about in this issue. If you are reading this article, then I am assuming you are a data person with some data responsibilities or accountabilities.
Having worked with data for many years, I find that almost all data people share some common sentiments. Perhaps the most prevalent is that if you are going to acquire and use data, it should be of high quality. It would be very hard to find someone in the data world who would disagree. Data needs to be of high quality. And now in the age of AI, the concerns about the quality of data used to build AI models are even greater. But that raises the question: What is data quality and how do we measure it?
The immediate answer that most of us would give is that high-quality data has fitness for use, or is fit for purpose. While this is a nice, succinct answer, it immediately leads to another question: What does fitness mean? The fitness-for-use answer comes from the total quality management (TQM) movement that revolutionized manufacturing at the end of the last century. It is a quote from Joseph Juran, one of the founders of the TQM movement.
But what most people don’t realize is that Juran was talking about the final product or service being built. Fitness means you are building a product that customers want, a product or service that creates value. Take a data product such as a language model. We could ask: Is the language model doing what we intended, such as composing individually tailored email messages to our customers that result in responses? If it is not, then why not? That might lead us to ask whether the data used to build or train the model has some deficiencies, i.e., is not of high quality. At this point, on the input side, we get to the measurable features of the data and a different definition of data quality.
These features of data are what we call data quality dimensions. I discussed some of the different dimensional frameworks and specific dimensions in more detail in my August 2025 article, “Is Your Data Quality Management Practice Ready for AI.” Based on this reasoning, another and sometimes more useful definition of data quality is data that meets requirements expressed in terms of data quality dimensions. This is in keeping with the ISO 9001 standard, which defines quality as meeting requirements, so an alternate definition to Juran’s is “Data Quality is meeting data requirements.”
One of the dimensions of data quality is consistent representation, and a lack of it is one of the most prevalent deficiencies in the data held in organizations. Data is said to have consistent representation in a specific domain when a given data value always has the same meaning, and conversely, a given meaning is always represented by the same data value. Data consistency is aligning data syntax with data semantics. For example, in the domain of U.S. addresses, the data value “AR” means the state of Arkansas when using the ISO 3166 Standard for representing the names of countries and their subdivisions. Conversely, when referencing the state of Arkansas in this domain, the data value should always be “AR.” This is a measurable requirement.
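To make "measurable" concrete, here is a minimal Python sketch of how conformance to such a requirement might be scored. The function name `state_code_conformance`, the sample values, and the short list of valid codes are all hypothetical and chosen only for illustration; a real check would load the complete code list for the domain.

```python
from collections import Counter

# Illustrative subset of two-letter state codes (an assumption for this
# sketch); a real check would use the complete list for the domain.
VALID_STATE_CODES = {"AR", "AZ", "CA", "NY", "TX"}


def state_code_conformance(values):
    """Return (conformance rate, counts of each nonconforming value)."""
    nonconforming = Counter(v for v in values if v not in VALID_STATE_CODES)
    total = len(values)
    conforming = total - sum(nonconforming.values())
    rate = conforming / total if total else 1.0
    return rate, nonconforming


# Hypothetical state values as they might appear in an address table.
sample = ["AR", "Ark", "AR", "ar", "TX", "Ark."]
rate, bad = state_code_conformance(sample)
print(f"{rate:.0%} of values conform")
print(dict(bad))
```

In this made-up sample, half of the values conform, and the tally of nonconforming values shows which variant spellings would need to be standardized.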
By the way, I have a question here. If I feed a language model only address data where there was 100% conformance with ISO 3166 state codes, then how would the model know to correct the state value “Ark” to “AR” if it had never seen this older abbreviation? Perhaps the data quality dimension that most model builders are most interested in is accuracy, not validation. For more discussion on this point, please see my article “Data Validation – Data Accuracy Imposter or Assistant?”
So now, let’s turn our attention back to dating. The ISO 8601 Standard for date and time representations for information interchange requires that calendar dates be represented in the format YYYY-MM-DD. For example, 2026-01-10 to represent January 10, 2026, instead of 1/10/2026 in U.S. format or 10/1/2026 in European format. The ISO format is very clear and unambiguous.
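As a small illustration of that ambiguity, the Python sketch below converts slash-separated dates to the ISO form using the standard library's `datetime` module. The helper name `to_iso_date` is hypothetical, and the sketch assumes the caller already knows which convention the input follows, which is exactly the information a slash-separated date fails to carry.

```python
from datetime import date, datetime


def to_iso_date(text, fmt):
    """Parse text with an explicitly stated format and return YYYY-MM-DD."""
    return datetime.strptime(text, fmt).date().isoformat()


# The same day written under the two conventions mentioned above.
us = to_iso_date("1/10/2026", "%m/%d/%Y")   # month first
eu = to_iso_date("10/1/2026", "%d/%m/%Y")   # day first

# The U.S. string read under the day-first convention is a different day.
misread = to_iso_date("1/10/2026", "%d/%m/%Y")

print(us, eu, misread)

# An ISO 8601 string needs no format hint to be read back.
assert date.fromisoformat(us) == date(2026, 1, 10)
```

Both correctly labeled inputs come out as 2026-01-10, while the misread one becomes 2026-10-01. Once a date is in the ISO form, no side knowledge about the writer's convention is needed.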
Despite this being one of the simplest and easiest data standards to use, I rarely see any data people following this standard in their day-to-day activities. Even though they might expect their data team to do it, they don’t do it themselves. When signing a document, writing a memo, writing a report, or giving a PowerPoint presentation, how hard can it be to express the dates in the ISO format?
You might ask, “Why would you want to do that?” My answer: To make a clear statement that you are a data person; that you believe in data, data quality, and data standards. Who knows, maybe doing this could start a great conversation about data standards, data requirements, and data quality, which in turn could even lead to a date and the start of a great relationship. It has certainly happened to me. I mean the starting-the-conversation part, not the dating thing. Please consider adopting the habit of following the ISO 8601 standard for dates to show your commitment to data quality.
