Published in TDAN.com January 2005
What do you think of when you hear the word “red”? I would venture that, given a room full of people, we would get a room full of descriptive answers. Certainly some of these answers
might be literal, such as a description of the specific characteristics of some shade of the color red, while some might refer to more figurative ideas, such as “warmth,”
“love,” “danger,” “stop,” etc. Moreover, the precision required of a definition varies with the context in which the concept is used.
For example, if we were considering painting our office’s walls, there would be a significant difference among red, maroon, berry red, and barn red. But if you replaced the red in a
traffic light with any one of these shades, people would still stop when that color is lit.
Through this simple exploration of the disparity in perception of this relatively simple concept, we can see the potential difficulty when we try to nail down more complicated concepts that occur
in various business contexts. It is the context and the application that dictate whether the physical string representation of a concept indicates two separate objects or just one. And while people
can discern both the similarity and the distinction between two things (despite the words we use to refer to those things), computers have a more difficult time making the distinction.
Consider this: if the CEO of your organization asked you on the spot to tell him how many customers your company has, would you be able to tell him? I can’t count the number of times in training
seminars that I have asked people to provide a definition of “customer” and immediately heard numerous groans reflecting the time invested and the frustration they have experienced
struggling with this exact issue back at the office. And that is just for one concept!
This notion, regarding semantics and naming, becomes a big deal once one attempts to consolidate data, whether through some kind of Master Reference Data, Enterprise Information Integration, or
Customer Data Integration project, or any time data is to be exchanged, and it is a significant issue that must be addressed as part of a metadata strategy. The core principle revolves around the
ability to distinguish between terms used to represent business concepts and terms used to portray manifestations of those business concepts.
About two years ago, I wrote an article for TDAN about data value domains, and since our company’s focus is on information quality, accurately defining and modeling data value domains as a part
of a framework for validating information against business rules is a prime activity in which we are engaged with our customers. Consequently, we spend a lot of time looking for good ways to
describe and deploy data domains within a reasonable metadata context.
One good treatment of data value domains is provided within the ISO/IEC 11179 Metadata Registries standard (see http://metadata-stds.org/11179), which seeks to refine an approach to data value metadata and attempts to characterize the differences and similarities
between data elements, value domains, data element concepts, and conceptual domains (all definitions are taken from Part 1 of the 11179 Standard):
- A data element is a “fundamental unit of data (that an) organization creates, manages, and disseminates.”
- A value domain is the “set of permissible (valid) values (for a data element).”
- A data element concept is “the concept of which data elements form its extension, without reference to a specific value domain.”
- A conceptual domain is “the concept of which value domains form its extensions, without reference to a specific value domain.”
To illustrate these ideas, a data element is an indivisible object bound to a representation (possibly incorporating a value domain, data type, unit of measure, and a format specification), such as
a “Country Code,” a “Street Address,” or a “Product Identifier.” A data element concept refers to the perception of the data element. For example,
“Employee Compensation” may evoke the understanding of an amount of money given to an employee, but remains a data element concept because it is not specified in terms of a currency or
a time period over which the employee is paid, or whether that amount is paid as salary, bonus, holiday turkeys, or other benefits.
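One way to picture this distinction is the following minimal Python sketch. It is an illustration only, not part of the ISO/IEC 11179 standard: the class names, field names, and the two sample data elements ("Annual Salary (USD)" and "Monthly Bonus (EUR)") are assumptions introduced here. It shows a single data element concept that becomes a data element only once it is bound to a representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataElementConcept:
    """An idea such as "Employee Compensation", with no representation attached."""
    name: str
    definition: str


@dataclass(frozen=True)
class Representation:
    """How values are physically expressed (all fields are illustrative)."""
    data_type: str
    unit_of_measure: Optional[str] = None
    format_spec: Optional[str] = None


@dataclass(frozen=True)
class DataElement:
    """A data element concept bound to one specific representation."""
    name: str
    concept: DataElementConcept
    representation: Representation


compensation = DataElementConcept(
    name="Employee Compensation",
    definition="An amount of money given to an employee",
)

# Two hypothetical data elements that share the concept but differ in representation.
annual_salary = DataElement(
    name="Annual Salary (USD)",
    concept=compensation,
    representation=Representation("decimal", unit_of_measure="USD per year", format_spec="0.00"),
)
monthly_bonus = DataElement(
    name="Monthly Bonus (EUR)",
    concept=compensation,
    representation=Representation("decimal", unit_of_measure="EUR per month", format_spec="0.00"),
)
```

Both data elements point back to the same concept, yet a value of 5000.00 means something quite different in each, which is exactly why the concept alone is not enough to interpret the data.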
Similarly, there is a difference between a value domain and a conceptual domain. The notion of World Countries is a conceptual domain, consisting of the set of world countries (Afghanistan,
Albania, ..., Zimbabwe). The domain is conceptual, though, because the representation of permissible values is not specified. In contrast, we might have multiple value domains associated
with the conceptual domain of World Countries:
- Full Names (“AFGHANISTAN”, “ALBANIA”, ..., “ZIMBABWE”)
- ISO 2-Character Country Codes (“AF”, “AL”, ..., “ZW”)
- ISO 3-Character Country Codes (“AFG”, “ALB”, ..., “ZWE”)
- ISO 3-Digit Country Codes (“004”, “008”, ..., “716”)
Each value within each of the four value domains has a one-to-one mapping to a value in the conceptual domain. And there are certainly other value domains that could be mapped to the conceptual
domain, such as graphic images detailing the shape and borders of each country. What is important is that as long as we know the context associated with a specific value (i.e., its value domain and the associated
conceptual domain), we can determine what that value represents. In other words, if we see the value “716” and we know that the value domain is ISO 3-Digit Country Codes
and the conceptual domain is World Countries, then we know the value refers to the concept of “Zimbabwe,” excluding any other meaning.
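The lookup described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: only the three countries named in the text are included, and the dictionary keys (`full_name`, `iso_alpha2`, `iso_alpha3`, `iso_numeric`) and the function names `resolve` and `translate` are hypothetical labels chosen here, not names drawn from the standard.

```python
# Conceptual domain: the countries themselves, independent of any representation.
# Only three members are listed, for illustration.
WORLD_COUNTRIES = {"Afghanistan", "Albania", "Zimbabwe"}

# Value domains: each maps its permissible values to a member of the conceptual domain.
VALUE_DOMAINS = {
    "full_name": {"AFGHANISTAN": "Afghanistan", "ALBANIA": "Albania", "ZIMBABWE": "Zimbabwe"},
    "iso_alpha2": {"AF": "Afghanistan", "AL": "Albania", "ZW": "Zimbabwe"},
    "iso_alpha3": {"AFG": "Afghanistan", "ALB": "Albania", "ZWE": "Zimbabwe"},
    "iso_numeric": {"004": "Afghanistan", "008": "Albania", "716": "Zimbabwe"},
}


def resolve(value: str, value_domain: str) -> str:
    """Return the concept that `value` represents within the named value domain."""
    if value_domain not in VALUE_DOMAINS:
        raise ValueError(f"unknown value domain: {value_domain!r}")
    mapping = VALUE_DOMAINS[value_domain]
    if value not in mapping:
        raise ValueError(f"{value!r} is not a permissible value in {value_domain!r}")
    return mapping[value]


def translate(value: str, source: str, target: str) -> str:
    """Convert a value between value domains by going through the shared concept."""
    concept = resolve(value, source)
    for candidate, meaning in VALUE_DOMAINS[target].items():
        if meaning == concept:
            return candidate
    raise ValueError(f"{concept!r} has no representation in {target!r}")


country = resolve("716", "iso_numeric")
alpha2 = translate("716", "iso_numeric", "iso_alpha2")
```

Note that the bare string "716" cannot be resolved without naming its value domain. The same string passed with the wrong domain is rejected rather than silently misread, which is the kind of validation rule this metadata makes possible.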
This method of documenting data domain metadata reflects an approach of successive refinement of concepts as a way to more effectively provide both a precise and an accurate definition and
delineation of data value domains and their corresponding contexts. Trying to identify the difference between the concept (e.g., countries) and the numerous ways of representing instances of each
concept provides value in determining how different kinds of business rules are to be applied in different data environments. In instances where some of our clients have been stymied by the
definition process, our company has had some success in using the ISO/IEC 11179 notions to introduce metadata concepts and guide people through the metadata process.
Copyright © 2004 Knowledge Integrity, Inc.