How many words are there in the English language? Oxford Dictionaries answers this question with an emphatic “it depends”: it depends on how you count them. The number varies from 171,476 (the number of full entries in the second edition of the 20-volume Oxford English Dictionary) to a quarter of a million, to about three-quarters of a million if you count different senses of a word as different words.
With so many words to choose from, there ought to be a distinct word for every task, right? It turns out that this is not so: we routinely use one word for multiple purposes. According to an NPR interview with author Simon Winchester, the word “run” has 645 meanings! As another example, take the word “set”, one that we really can’t live without in our daily data lives. I counted 95 definitions for this word in Webster’s Online Dictionary.
These are extreme examples, but it turns out that almost any ordinary English word in daily use has multiple definitions.
So then, why is it that our business glossaries support exactly one definition for any given term? If natural languages, like English, depend on multiple meanings for ordinary words, then limiting a business term to exactly one definition is, well, unnatural.
Not long ago I was collecting business glossaries across a company in order to integrate and synchronize them, in the interests of tightening up our definitions of homegrown terms, and especially improving communication between groups that often have difficulty understanding each other. Wouldn’t it be great, we thought, if the customer support people and the technical staff could understand each other?
Thus, I came across these two definitions of “service”:
(Architecture glossary): a self-contained piece of software that performs a single function with a specified outcome
(Customer support glossary): don’t say “service”; say “product”
It would clearly be impossible to synchronize these two definitions into one entry in the glossary. We can’t tell the technical staff to drop “service” from their vocabulary, and we can’t train customer support personnel to tell customers that a “service” is a piece of software.
Take another favorite term in business glossaries: “customer”. Most people who’ve worked in the data profession for any length of time know the holy wars often fought over the meaning of this word. Here’s the secret to ending the fighting: Support multiple definitions. In one financial services firm, the retail brokerage business decided that, for sales purposes, a person should continue to be considered an active customer up through 12 months after the last transaction they conducted; after that, they would be considered inactive. That was lovely, until the anti-money-laundering (AML) team came along. They had to scan all active customers’ transactions, but for AML purposes an “active customer” was anyone who had conducted a transaction in the last 13 months! Which group was wrong? Neither. Both were right. They just needed a mechanism that would allow them to record both definitions of “active customer”, and to choose between them for the context at hand.
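To make the idea of choosing a definition by context concrete, here is a minimal Python sketch. The context names (`sales`, `aml`), the function names, and the treatment of the exact boundary day are my own assumptions for illustration; the story above does not say how either team handled a transaction exactly 12 or 13 months old.

```python
from datetime import date

# Hypothetical context names; the month windows come from the two
# definitions of "active customer" described above.
ACTIVE_WINDOW_MONTHS = {"sales": 12, "aml": 13}


def months_between(earlier: date, later: date) -> int:
    """Return the number of whole calendar months from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months


def is_active_customer(last_transaction: date, today: date, context: str) -> bool:
    """Apply the definition of "active customer" that belongs to `context`.

    Assumption: a customer whose last transaction is a full window
    (or more) in the past is treated as inactive.
    """
    window = ACTIVE_WINDOW_MONTHS[context]
    return months_between(last_transaction, today) < window


# The same customer, judged under both definitions.
last_txn = date(2023, 1, 10)
today = date(2024, 2, 5)
print(is_active_customer(last_txn, today, "sales"))
print(is_active_customer(last_txn, today, "aml"))
```

The point of the sketch is that neither window is hard-coded as "the" definition: the caller has to say which context applies, just as a reader of the glossary would.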
That is why I set up our Wiki-based enterprise glossary so that, just like a natural-language dictionary, each term could have multiple numbered definitions. Two entirely different definitions for “service” sat side-by-side. A person who had heard the term “service” used in an unfamiliar way could go to the glossary, see both definitions, and pretty quickly figure out which meaning was relevant.
By numbering the definitions, we could have distinct URLs referencing particular definitions, so that documentation could hyperlink to exactly the right sense of the term it was using. This is very similar to the way an ontology (a representation of knowledge) provides an IRI (think of it as a URL) that distinctly represents a particular meaning of an object or a predicate that it is using. In fact, it paves the way for business glossaries and ontologies to support each other.
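As a rough illustration of numbered definitions with their own URLs, consider the sketch below. It is not how our Wiki was implemented; the base URL, the class names, and the `#number` fragment scheme are all hypothetical, chosen only to show the shape of the idea.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import quote

# Hypothetical base address for the glossary.
BASE_URL = "https://glossary.example.com"


@dataclass
class Definition:
    context: str  # the group or realm in which this sense applies
    text: str


@dataclass
class Term:
    name: str
    definitions: List[Definition] = field(default_factory=list)

    def add_definition(self, context: str, text: str) -> int:
        """Append a definition and return its 1-based number."""
        self.definitions.append(Definition(context, text))
        return len(self.definitions)

    def url(self, number: int) -> str:
        """Return a URL that points at one numbered sense of the term."""
        if not 1 <= number <= len(self.definitions):
            raise IndexError(f"{self.name!r} has no definition {number}")
        return f"{BASE_URL}/{quote(self.name)}#{number}"


service = Term("service")
architecture = service.add_definition(
    "Architecture",
    "a self-contained piece of software that performs a single function "
    "with a specified outcome",
)
support = service.add_definition("Customer support", "don't say 'service'; say 'product'")

print(service.url(architecture))
print(service.url(support))
```

Definitions are only ever appended in this sketch, so a number, once assigned, keeps pointing at the same sense. That stability is what lets documentation link to a particular definition safely.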
So, then, back to the question I posed above: Why does all business glossary software that I am aware of at this moment support only one definition per term? My guess is that it stems from a misguided belief that the way to get to a common understanding of business systems, and a common vocabulary, is to insist that each term have only one meaning. This is a vain hope, as both experience and natural language show. It is far better to record every known meaning of a business term. Then the glossary can be used to:
- Support every legitimate use of a term and eliminate the confusion when one term really does have multiple meanings
- Document any banned vocabulary and the realms in which it is banned (like “service” in the customer support area)
- Identify two side-by-side definitions of a term that are so close in meaning that they could be collapsed into one
Now, if business glossary vendors caught this vision, they could deliver more valuable products to their customers and even create linkage between business glossaries and ontologies—a powerful combination indeed!
This monthly blog talks about data architecture and data modeling topics, focusing especially, though not exclusively, on the non-traditional modeling needs of NoSQL databases. The modeling notation I use is the Concept and Object Modeling Notation, or COMN (pronounced “common”), and is fully described in my book, NoSQL and SQL Data Modeling (Technics Publications, 2016). See http://comn.dataversity.net/ for more information.