In the previous post, we discussed the emerging role of standards and the business drivers that influence how organizations approach adopting them. In this post, we focus on the details of the standards themselves.
Let’s first have a look at what the standards say. I tend to view ISO 8000 as an umbrella standard because it is inextricably linked to two other standards: 1) a standard for how your data should be described (ISO 22745), and 2) a standard for how the information about your data should be exposed to data-sharing partners (ISO 11179). Together, these standards address how organizations manage the information about their metadata: in other words, the metadata about the metadata.
Think about it: when you are pulling together data for a report or an analytical project, there are two questions you need to answer: 1) what data do I need? and 2) do I have “good” data? Both questions are directly impacted by these standards. The first is answered because your data is well labelled: its meaning is communicated because it is fully described using terms that carry context. The second is answered because you now have a comprehensive definition of what “good” means to run your data quality process against.
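To make that second question concrete, here is a minimal sketch of what “running a data quality process against a definition of good” can look like in practice. The field names and rules below are illustrative, not drawn from any of the standards.

```python
# A minimal sketch: validating a record against a machine-readable
# specification of "good". Field names and rules are illustrative.

SPEC = {
    "product_id": {"required": True, "type": str},
    "category":   {"required": True, "type": str},
    "unit_price": {"required": True, "type": float},
}

def quality_issues(record: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the record passes."""
    issues = []
    for field, rules in SPEC.items():
        if field not in record:
            if rules["required"]:
                issues.append(f"missing required field: {field}")
            continue
        if not isinstance(record[field], rules["type"]):
            issues.append(f"{field}: expected {rules['type'].__name__}")
    return issues

print(quality_issues({"product_id": "SKU-123", "category": "Clothing"}))
# -> ['missing required field: unit_price']
```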
A brief description of these standards is provided below. Note that these are comprehensive standards, each with many chapters (in ISO-speak, “parts”). The information below focuses on key elements that impact the supply chain discussion. The business case for each organization will drive which concepts are adopted and the degree to which the full scope of the standards is implemented.
ISO 8000 :: The Metadata About My Data Dictionary
ISO 8000 exists to enable organizations to share data with the knowledge that they are sharing high quality data. Specifically, the focus is on supply chain master data, although there is no reason the standard should not be applied more broadly.
Within the supply chain context, the use case centers on supply chain managers:
- Describing in detail what they require in a product;
- Communicating those requirements to a supplier (internal or external); and,
- Receiving information from the supplier.
The standard is focused on how organizations:
- Systematically package and communicate information about their data, making data portable (see the sketch after this list)
- Create metadata that enables evaluation of conformance with a set of specifications
- Ensure that data is labelled completely and correctly with respect to: Syntax, Semantic Encoding, Conformance to requirements, Provenance, and Accuracy
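As a rough illustration of the first and third points, here is a minimal sketch, assuming a JSON-style payload, of how a single master data value might travel with the metadata ISO 8000 calls for. The field names are invented for illustration and are not taken from the standard itself.

```python
# A sketch of a "portable" master data value: the value travels with its
# syntax, semantic encoding, provenance, and accuracy metadata.
# All field names and identifiers below are illustrative.

portable_record = {
    "value": "18/10 stainless steel",
    "syntax": "free text, en-GB",                     # how the value is formatted
    "semantic_encoding": "dict:0161#material-grade",  # hypothetical dictionary concept id
    "provenance": {
        "source": "supplier-master",
        "recorded_by": "buyer-org-42",                # hypothetical organization id
        "recorded_at": "2020-06-01T09:30:00Z",
    },
    "accuracy": {"verified": True, "method": "supplier attestation"},
}

# A receiving partner can now check conformance against its own requirements
# before loading the value, rather than guessing at its meaning afterwards.
```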
ISO 22745 :: The Data Dictionary Described by ISO 8000
This standard describes the information that should make up a data dictionary in order to support the shareable data required in the use cases above. These dictionaries are referred to as “Open Technical Dictionaries,” as they are expected to be used, or at least available, across all supply chain participants.
A critical aspect of this standard is its requirement for unique concept identifiers that create unambiguous, language-independent descriptions of individuals, organizations, locations, goods, services, processes, rules and regulations.
If we use the retail environment as an example, “Concept Identifiers” would identify linked vocabularies organized around retail product categories. Take clothing, for example. To describe clothing products, one might want to label them as either “summer wear” or “winter wear”; likewise, it would be useful to know whether they were men’s clothes or women’s clothes. Vocabularies organized in this manner are known as ontologies. This article by Kurt Cagle in Forbes provides an excellent overview of ontologies and why they are important.
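As a sketch of the idea, an ontology can be represented as a linked vocabulary in which each concept points to a broader parent concept. The concept names below are illustrative, not taken from any published dictionary.

```python
# A minimal sketch of an ontology as a linked vocabulary: each concept points
# to its broader parent, so "winter wear" is knowable as a kind of clothing.
# Concept names are illustrative.

ONTOLOGY = {
    "summer wear":    "clothing",
    "winter wear":    "clothing",
    "menswear":       "clothing",
    "womenswear":     "clothing",
    "clothing":       "retail product",
    "retail product": None,  # root
}

def broader_concepts(term: str) -> list[str]:
    """Walk up the hierarchy from a term to the root concept."""
    chain = []
    parent = ONTOLOGY.get(term)
    while parent:
        chain.append(parent)
        parent = ONTOLOGY.get(parent)
    return chain

print(broader_concepts("winter wear"))  # -> ['clothing', 'retail product']
```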
The process of using concept identifiers from an external open technical dictionary is a form of semantic encoding compliant with ISO 8000.
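Here is a minimal sketch of that encoding step, assuming an invented dictionary and identifier scheme; real open technical dictionaries assign their own identifiers.

```python
# A sketch of semantic encoding: each human-readable label on a record is
# mapped to a language-independent concept identifier resolved from an
# external dictionary. The identifiers below are invented for illustration.

OPEN_DICTIONARY = {
    "clothing":    "OTD#0001",
    "summer wear": "OTD#0002",
    "winter wear": "OTD#0003",
    "menswear":    "OTD#0004",
    "womenswear":  "OTD#0005",
}

def encode(labels: list[str]) -> list[str]:
    """Map human-readable labels to dictionary concept identifiers."""
    return [OPEN_DICTIONARY[label.lower()] for label in labels]

# A wool overcoat described without relying on any one language:
print(encode(["Clothing", "Winter wear", "Menswear"]))
# -> ['OTD#0001', 'OTD#0003', 'OTD#0004']
```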
ISO 11179 :: The Registry That Holds the Concept Identifiers Required in Your Dictionary
This standard is broader in scope than simply holding concept identifiers. One can think of a Registry as a metadata repository that holds the linked vocabularies used to semantically describe the data in the data dictionary. Each dictionary entry is linked to a term in the Registry so that the entry can be fully described “in context.” For example, a supply chain manager for a retail store might have a product category called “Formal China.” In this instance, “China” would be linked to a concept called “Tableware.” If the retail store also handled clothing, there might also be a “Formal China” entry referring to attire used by Chinese people for formal occasions. In this case, “China” would be linked to a concept term in the Registry called “Country.” Using this approach, logic can be coded into data management software to execute master data rules that incorporate semantic knowledge.
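Here is a minimal sketch of that disambiguation at work. The registry link is modelled as a simple lookup keyed on the dictionary entry; the department, category, and concept names are all illustrative.

```python
# A sketch of the "Formal China" example: each data dictionary entry is
# linked to a Registry concept, so software can tell the two senses of
# "China" apart and apply the right master data rules. Names are illustrative.

DICTIONARY = {
    # (department, category) -> linked Registry concept
    ("Housewares", "Formal China"): "registry:Tableware",
    ("Apparel",    "Formal China"): "registry:Country",
}

def dishwasher_safe_required(department: str, category: str) -> bool:
    """An example master data rule that fires only for tableware concepts."""
    return DICTIONARY.get((department, category)) == "registry:Tableware"

print(dishwasher_safe_required("Housewares", "Formal China"))  # -> True
print(dishwasher_safe_required("Apparel", "Formal China"))     # -> False
```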
A key concept in ISO 11179 is that the Registry holding the data is exposed and can be accessed by third parties. Registries can be maintained and published by a buyer, a vendor, or a third party (generally an industry association or a governing body).
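As a sketch of what third-party access could look like, assume the Registry is published over HTTP with a JSON lookup endpoint. The URL and response shape below are hypothetical; ISO 11179 does not mandate a particular interface.

```python
# A sketch of third-party registry access over HTTP. The endpoint and
# response shape are hypothetical; real registries define their own.

import json
from urllib.request import urlopen

def lookup_concept(concept_id: str) -> dict:
    """Fetch a concept definition from a published metadata registry."""
    url = f"https://registry.example.org/concepts/{concept_id}"  # hypothetical URL
    with urlopen(url) as response:
        return json.load(response)

# A buyer, vendor, or auditor can resolve the same identifier to the same
# definition, which is what makes the shared vocabulary trustworthy.
```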
So What Is Holding Us Back?
On the face of it, these ideas all make sense. We should clearly adopt these practices now, as they will improve the flow of high quality information up and down our supply chains. The challenge is that data labelled or curated to this level often requires a significant upgrade in capabilities. Data classification, the management of hierarchically linked data vocabularies, and the management and governance discipline to make all this work are likely to require an evolution in data management maturity. I return to my comment in the last post: be incremental. Build out a roadmap that aligns not only with your value proposition, but also with your ability to evolve your data management and governance operational model. There is no point investing in tools if you cannot apply your new capability to your supply chain operations.
For many, building out these data classification and vocabulary management capabilities presents the biggest unknown. The next post addresses the evolving role of machine learning in simplifying and automating the management of data quality within the supply chain.