The following is an excerpt from Bonnie’s new book Business Metadata, to be published by Morgan Kaufmann in 2007, co-authored with Bill Inmon and Lowell Fryman.
How well are ideas in the “knowledge base” represented so they can be found? And how easy is it to express a search in a way that a computer can locate exactly the type of information you are
looking for? This is also a form of communication, though often overlooked.
Are Most Searches Successful?
Here are some quick facts that IDC research found:
- The typical knowledge worker spends 15-35% of their time searching
- 50% or less of all searches are successful
- Only 40% of survey respondents reported that they were able to find what they needed on their corporate intranet [Feldman, 2001]
Note that the IDC study was conducted in 2001 and again in 2003. A more recent survey, conducted in 2005 by the Center for Media Research, found the following:
- The average worker spends 30% of their time on search
- Cost of $18,000 each year per employee in lost productivity
- Average of $5.4B lost hours for typical US corporation [BEA]
Find-ability Makes a Difference
The ability to find information is therefore not just a “nice to have”; it drastically affects the bottom line. How can it be improved? Since this book is
about Business Metadata and not about search tools, we focus next on the impact of business metadata on the search experience.
How Business Metadata can Help Search
Improve File and Web Page Names
One of the most basic ways of improving the find-ability of information on the internet is to improve the names of documents; file names are a very important piece of business metadata. A wonderful
short article by Kevin Hannon of InfoCurators [Hannon, 2004] illustrates how searches miss their target because of the nonsensical names given to files. Such ridiculous,
meaningless names make the files virtually invisible on the corporate intranet. Here are a few examples that he cites, but I’m sure we all have a few of our own that are just as crazy:
||Acme’s Annual Report|
||Technical field document that is really a pdf|
||Safety requirements for a laser product|
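As a rough illustration of the point about file names, here is a minimal Python sketch that flags names carrying little descriptive meaning. The file names, the three-letter word rule and the `min_words` threshold are all assumptions made for this example; they are not Hannon's examples or his method.

```python
import re
from pathlib import PurePath


def descriptive_words(filename: str) -> list[str]:
    """Return the alphabetic words of 3+ letters found in a file name's stem."""
    stem = PurePath(filename).stem
    # Digits, underscores, hyphens and dots all act as separators.
    tokens = re.split(r"[^A-Za-z]+", stem)
    return [t.lower() for t in tokens if len(t) >= 3]


def looks_meaningless(filename: str, min_words: int = 2) -> bool:
    """Flag names with fewer than `min_words` descriptive words (assumed threshold)."""
    return len(descriptive_words(filename)) < min_words


# Hypothetical file names, for illustration only.
for name in ["acme-annual-report-2006.pdf", "doc0417_v2.pdf", "x7q.pdf"]:
    print(name, "-> flag for renaming" if looks_meaningless(name) else "-> ok")
```

This check is deliberately crude: it would also flag a run-together name such as "AnnualReport.pdf", so a real review of file names would still need a human eye.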
Know Who the Searchers Are
Hannon provides an example of naming web pages and illustrates how whether a page is returned depends on who is doing the searching. He shows that using the company name as a prefix in a web page
name will keep the page from coming up on most search engines, because the beginning of the name carries the most weight. The example he uses is a cancer information site. Only doctors would append the term
“disease” after the name of the ailment, so including this term in the page name will increase hits for doctors doing searches but decrease hits for searchers who are not medical practitioners. If the aim of
the website is to provide information to the medical community, then this naming technique will probably work just fine. However, if the goal is to provide information to the public at large and to
disease sufferers and their families, perhaps this term should be left off the page name.
As we have seen, the title is a very valuable piece of business metadata. Even the individual words that are the components of the title are business metadata. It could be said that the file name
or article/web page title may be the most important piece of business metadata because it represents the primary way that things are found. However, second to this is how documents or files can be
classified: what is the topic of the file or article? What topics are related to it? Can I find not just the specific article I’m looking for, but related ones? The next section illustrates how
this can be done.
To most people, the word “taxonomy” (if not confused with “taxidermy”, the stuffing of animals) means the classification of the animal kingdom. This is perhaps the most famous example of a taxonomy, but
it fails to illustrate how useful taxonomies can be for finding information in the modern, internet world. A taxonomy is a classification system that can guide how web pages are organized on a
website, and can help customers navigate the site quickly and pinpoint the items they’d like to purchase.
For example, say I’d like to purchase a purple skirt. I go to a department store’s website and first choose the section called Women; see Figure 1. Next, I pick
Skirts. Notice that “Skirts” is not a featured item the way Dresses, Coats, Suits or Jeans are. Oh well. I then click on Skirts on the left-hand side; see Figure 2. Next, the Skirts part of the
online catalog is displayed; see Figure 3. There are some user-controlled options, but color is not one of them. A really good website would allow me to type “purple skirt” in the search window
and would bring up all the skirts that have purple in them. When I did this on the Macy’s site, it brought back two choices that sure looked like pink to me, and not purple!
As we have seen in the Macy’s example, the taxonomy organizes items by their classification. It enables navigation that begins with a broad term and lets the user get more and more
specific, moving through the categories until the precise item is located.
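The broad-to-specific navigation described here can be pictured as walking down a tree. The sketch below is only an illustration: the category names are a made-up slice of a department-store catalog, not the actual Macy’s taxonomy, and `navigate` is a hypothetical helper.

```python
# A tiny, hypothetical slice of a department-store taxonomy.
# Each key is a category; its value holds the sub-categories beneath it.
TAXONOMY = {
    "Women": {
        "Dresses": {},
        "Skirts": {},
        "Jeans": {},
    },
    "Men": {
        "Suits": {},
    },
}


def navigate(taxonomy: dict, path: list[str]) -> list[str]:
    """Follow `path` from broad to specific and return the choices offered next."""
    node = taxonomy
    for depth, term in enumerate(path):
        if term not in node:
            parent = " > ".join(path[:depth]) or "(top level)"
            raise KeyError(f"{term!r} is not a category under {parent}")
        node = node[term]
    return sorted(node)


print(navigate(TAXONOMY, []))         # choices on the home page
print(navigate(TAXONOMY, ["Women"]))  # choices after clicking Women
```

Each click narrows the set of choices, which is why the wording and ordering of the top-level terms matter so much to the searcher.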
Zach Wahl, internationally known taxonomy guru, makes the distinction clear between a strict, formal taxonomy (like classification of the animal kingdom or pharmaceuticals) and a business taxonomy.
The purpose of a business taxonomy is to support website navigation and to facilitate “findability”. Mr. Wahl says:
able to understand both the terms and the hierarchy of the taxonomy and react to it in a meaningful and consistent manner. If this is done effectively, the end user will receive a powerful
‘findability’ tool, enabling them to discover information through browsing the taxonomy and view information in an intuitive and consistent manner. [Wahl]
We discussed earlier that you should know who your searchers are. This is key in designing any taxonomy. However, since the web invites an endless variety of users from all walks of life, the
taxonomy must be understandable to all.
Lowest Common Denominator Factor
The earlier example of page names for a medical site, where both doctors and patients or family members search for information about diseases, highlights an important factor to
keep in mind when designing taxonomies: the Lowest Common Denominator. Instead of having separate taxonomies for distinct communities that take into account specific technical vocabulary
differences, try using the simplest terms that everyone will understand. More from Zach Wahl:
the designers should identify the “lowest common denominator” of user types and build using terms and topics that will immediately resonate with them [Wahl].
Simple is Best
Business taxonomies used for searching should be simple. The top list of categories should be very broad and in easy-to-understand language for all users. Whereas a technical hierarchy might go as
many as 12 levels deep, Mr. Wahl points out that business taxonomies usually consist of about 8 top-level categories and each category has no more than 3 sub-categories. Remember, too many clicks
chase the users away! Mr. Wahl sums this up by saying
There is no universal, “right” way to build a taxonomy. There are, however, principles or guidelines that can help, and the most universal guideline is to know your search community. A taxonomy
should also always be considered “under construction” and the search experience should continually be monitored and improved over time. If people are frustrated by the search experience on a
company’s intranet, it will not be used, and valuable time will be wasted as employees run around asking each other for documents when they could have found them more quickly online.
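Because a taxonomy is always “under construction”, it can help to re-check its shape from time to time against the rough size figures mentioned earlier (about 8 top-level categories, no more than 3 sub-categories each). The sketch below is one possible reading of those figures, not a rule from Mr. Wahl; the function names, the default thresholds and the sample categories are assumptions for illustration.

```python
def depth(node: dict) -> int:
    """Number of category levels at or below this point (0 for an empty dict)."""
    return 1 + max(depth(child) for child in node.values()) if node else 0


def shape_warnings(taxonomy: dict, max_top_level: int = 8, max_children: int = 3) -> list[str]:
    """List the places where the taxonomy exceeds the assumed size guidelines."""
    warnings = []
    if len(taxonomy) > max_top_level:
        warnings.append(
            f"{len(taxonomy)} top-level categories (guideline: about {max_top_level})"
        )

    def walk(name: str, children: dict) -> None:
        if len(children) > max_children:
            warnings.append(
                f"{name} has {len(children)} sub-categories (guideline: at most {max_children})"
            )
        for child_name, grandchildren in children.items():
            walk(f"{name} > {child_name}", grandchildren)

    for name, children in taxonomy.items():
        walk(name, children)
    return warnings


# Hypothetical intranet taxonomy, for illustration only.
intranet = {
    "HR": {"Benefits": {}, "Policies": {}, "Hiring": {}, "Training": {}},
    "Finance": {"Expenses": {}},
}
print(depth(intranet))           # levels a user may have to click through
print(shape_warnings(intranet))  # categories that have grown too wide
```

A warning here is a prompt for the people who govern the taxonomy to take a look, not proof that something is wrong.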
Correctly Categorizing Documents for Search
Where does a document belong in the search hierarchy? Who decides? A document may be authored or owned by one division of the company but perceived as being part of another functional area, for
search purposes. For example, Hannon [Hannon, 2005] makes the point that Tuition Reimbursement policies may be set by Finance but are considered related to HR, so the average searcher would expect to
find tuition reimbursement guidelines under the HR category. The same goes for something like Affirmative Action policies, which may be written by the legal department but should be searchable under HR.
Therefore, the department that originates a document may not always be the appropriate place for it to reside for search.
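One way to picture this is to record the owning department and the search category as two separate pieces of metadata. The following sketch assumes a hypothetical `DocumentRecord` structure; the field names are illustrative and do not come from Hannon or from any particular search product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentRecord:
    title: str
    owning_department: str  # who authored or maintains the document
    search_category: str    # where searchers expect to find it


DOCUMENTS = [
    DocumentRecord("Tuition Reimbursement Policy", "Finance", "HR"),
    DocumentRecord("Affirmative Action Policy", "Legal", "HR"),
]


def browse(category: str, documents: list[DocumentRecord]) -> list[str]:
    """Return the titles a searcher sees when browsing one category."""
    return [d.title for d in documents if d.search_category == category]


print(browse("HR", DOCUMENTS))       # both policies appear under HR
print(browse("Finance", DOCUMENTS))  # nothing is filed here for search
```

Keeping the two fields apart lets ownership stay with Finance or Legal while the searcher still finds both policies under HR.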
The question of who decides where documents belong is one for Governance.
Governance and Taxonomy
Zach Wahl points out that it is critical to have a cross-disciplinary team assisting in taxonomy creation [Wahl]. Every class of user type should be represented. This ensures that all types of
users’ needs will be considered in the construction of the taxonomy. This team can form the governance body for the taxonomy. The main point is to get user involvement and buy-in from the business
so they have some ownership and don’t write off enterprise search failure as an “IT thing”.
Governance also helps decide where documents belong, and what classifications are appropriate. The author usually thinks he or she has a good idea of where the document belongs, but he or she may
not understand how a user might go about looking for the document. Tools can be very helpful in deciphering what documents are about.
Documents that are consistently being mis-categorized are signs of a faulty, confusing taxonomy. For this reason, the website’s search metrics should be monitored periodically. If there are many
abandoned searches, this could be an indicator that the taxonomy is hindering, not helping, the search process.
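As a sketch of what monitoring search metrics might look like, the Python below computes an abandonment rate from a simple search log. The log format, and the definition of an “abandoned” search as one where no result was clicked, are assumptions made for this example; real search tools record and define this differently.

```python
from collections import Counter

# Hypothetical log entries: (query text, whether the user clicked any result).
SEARCH_LOG = [
    ("tuition reimbursement", False),
    ("Tuition Reimbursement", False),
    ("annual report", True),
    ("laser safety", False),
]


def abandonment_rate(log: list[tuple[str, bool]]) -> float:
    """Share of searches where no result was clicked (assumed definition)."""
    if not log:
        return 0.0
    abandoned = sum(1 for _, clicked in log if not clicked)
    return abandoned / len(log)


def most_abandoned(log: list[tuple[str, bool]], top: int = 3) -> list[tuple[str, int]]:
    """Queries abandoned most often, ignoring differences in letter case."""
    counts = Counter(query.lower() for query, clicked in log if not clicked)
    return counts.most_common(top)


print(f"{abandonment_rate(SEARCH_LOG):.0%} of searches abandoned")
print(most_abandoned(SEARCH_LOG, top=1))
```

Queries that are abandoned again and again are good candidates for the governance team to review, since they may point to documents filed under a category searchers do not expect.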
Business metadata plays a very important part in enabling enterprise search. Any descriptive information about files or data that is in the language of the business is business metadata.
The name of a file is therefore a key piece of business metadata, as is any classification scheme such as a taxonomy.
- Hannon, Kevin. “Enterprise Taxonomies: One Large or Many Small?” Information Curators, 2005. http://colab.cim3.net/file/work/SICoP/ontac/meeting/2005-10-05/single_vs_multiple_taxonomies_hannon.pdf
- Hannon, Kevin. “Metadata is Money”. InfoCurators, 2004. Available on request from www.infocurators.com.
- IDC, European Knowledge Management Fact Book, IDC #21511, January, 2000.
- Wahl, Zach, “Masterclass: Business taxonomy, Part I”, Inside Knowledge, Available by subscription: http://www.ikmagazine.com/