Business Metadata and Enterprise Search

Published in TDAN.com January 2007

The following is an excerpt from Bonnie’s new book Business Metadata, to be published by Morgan Kaufmann in 2007, co-authored with Bill Inmon and Lowell Fryman.

How well are ideas in the “knowledge base” represented so they can be found? And how easy is it to express a search in a way that a computer can locate exactly the type of information you are
looking for? This is also a form of communication, though often overlooked.


Are Most Searches Successful?

Here are some quick facts that IDC research found:

  • The typical knowledge worker spends 15-35% of their time searching
  • 50% or less of all searches are successful
  • Only 40% of survey respondents reported that they were able to find what they needed on their corporate intranet [Feldman, 2001]

It should be noted that the IDC study was conducted in 2001 and again in 2003. A more recent survey was conducted in 2005 by the Center for Media Research and found the following:

  • The average worker spends 30% of their time on search
  • Cost of $18,000 each year per employee in lost productivity
  • Average of $5.4B lost hours for typical US corporation [BEA]


Find-ability Makes a Difference

The ability to find information is therefore not just a “nice to have”; it drastically affects the bottom line. How can it be improved? Since this book is
about Business Metadata and not about search tools, we focus next on the impact of business metadata on the search experience.


How Business Metadata can Help Search


Improve File and Web Page Names

One of the most basic ways of improving the find-ability of information on the internet is to improve the names of documents; file names are a very important piece of business metadata. A wonderful
short article by Kevin Hannon of InfoCurators [Hannon, 2004] illustrates how searches miss their target when files have nonsensical names. Such
meaningless names make the files virtually invisible on the corporate intranet. Here are a few examples that he cites, but I’m sure we all have a few of our own that are just as crazy:

  • Acme covermth (actually Acme’s Annual Report)
  • Microsoft Word-006702_1.doc (actually a technical field document that is really a pdf)
  • Microsoft Word-06682_0.doc (actually safety requirements for a laser product)
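
To make the point concrete, here is a minimal Python sketch of how an intranet team might flag file names that say nothing about their contents. The function name is_descriptive, the regular expression, and the three-word threshold are all assumptions made for illustration; none of them comes from Hannon's article.

```python
import re

# Hypothetical heuristics; tune them to the names seen on your own intranet.
# Matches names a word processor generates by default, e.g. "Microsoft Word-006702_1".
AUTO_GENERATED = re.compile(r"^microsoft word\s*-\s*[\d_]+$", re.IGNORECASE)
MIN_WORDS = 3  # assumed minimum number of real words in a "descriptive" name


def is_descriptive(file_name):
    """Return True if the name looks like it says what the document is about."""
    # Drop the extension, if any, so ".doc" or ".pdf" is not counted as a word.
    stem = file_name.rsplit(".", 1)[0] if "." in file_name else file_name
    if AUTO_GENERATED.match(stem.strip()):
        return False
    # Count runs of two or more letters as words; digits and single letters are ignored.
    words = re.findall(r"[A-Za-z]{2,}", stem)
    return len(words) >= MIN_WORDS
```

A check like this could run when a document is published, so that a name such as Microsoft Word-006702_1.doc is caught before the file becomes invisible to search.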


    Know Who the Searchers Are

    Hannon provides an example of naming web pages and illustrates how whether a page is returned depends on who is doing the searching. He shows that using the company name as a prefix in a web page
    name keeps the page from coming up on most search engines, because the beginning of the name is primary. The example he uses is a cancer information site. Only doctors would append the term
    “disease” after the name of the ailment, so including this term in the page name will increase hits for doctors doing searches but decrease hits for non-medical searchers. If the aim of
    the website is to provide information to the medical community, then this naming technique will probably work just fine. However, if the goal is to provide information to the public at large and to
    disease sufferers and their families, perhaps this term should be left off the page name.


    Classification

    As we have seen, the title is a very valuable piece of business metadata. Even the individual words that are the components of the title are business metadata. It could be said that the file name
    or article/web page title may be the most important piece of business metadata because it represents the primary way that things are found. However, second to this is how documents or files can be
    classified: what is the topic of the file or article? What topics are related to it? Can I find not just the specific article I’m looking for, but related ones? The next section illustrates how
    this can be done.


    Taxonomy

    To most people, the word “taxonomy” (if not confused with “taxidermy”, the stuffing of animals) means the classification of the animal kingdom. This is perhaps the most famous example of a taxonomy, but
    it fails to illustrate how, in this modern internet world, taxonomies can be useful in searching for information. A taxonomy is a classification system that can guide how web pages are organized on a
    website, and can help customers navigate the site and pinpoint items they’d like to purchase quickly and easily.

    For example, say I’d like to purchase a purple skirt. I go to a department store’s website and first choose the section called Women; see Figure 1. Next, I pick
    Skirts. Notice that “Skirts” is not a featured item the way Dresses, Coats, Suits or Jeans are. Oh well. I then click on Skirts on the left-hand side; see Figure 2. Next, the Skirts part of the
    online catalog is displayed; see Figure 3. There are some user-controlled options, but color is not one of them. A really good website would allow me to type “purple skirt” in the search window
    and would bring up all the skirts that have purple in them. When I did this on the Macy’s site, it brought back two choices that sure looked like pink to me, and not purple!

    Figure 1. Highest Level Hierarchy: Women
    Figure 2. Selecting “Skirts” on left side of screen
    Figure 3. Browsing Skirts at Macy’s

    As we have seen in the Macy’s example, the taxonomy organizes the items by their classification. It enables navigation by beginning with a broad term and letting the user get more and more
    specific, moving through the categories until the precise item is located.
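
The broad-to-specific navigation described above can be sketched as a walk through a nested structure. This is only an illustration in Python: Women, Skirts, Dresses, Coats, Suits and Jeans come from the example, while the Men branch and the browse function are hypothetical.

```python
# A toy taxonomy. Each key is a category; its value holds the sub-categories.
# The "Men" branch is made up so that the top level has more than one choice.
TAXONOMY = {
    "Women": {
        "Dresses": {},
        "Coats": {},
        "Suits": {},
        "Jeans": {},
        "Skirts": {},
    },
    "Men": {
        "Shirts": {},
    },
}


def browse(taxonomy, path):
    """Follow `path` down from the top level and return the choices shown next."""
    node = taxonomy
    for category in path:
        if category not in node:
            raise KeyError(f"No category {category!r} at this level")
        node = node[category]
    return sorted(node)


# Start broad, then narrow down one click at a time.
top_level = browse(TAXONOMY, [])            # categories on the home page
under_women = browse(TAXONOMY, ["Women"])   # choices after clicking Women
```

Note that color appears nowhere in this hierarchy, which is why browsing alone cannot get me to a purple skirt; that takes either another level in the taxonomy or a search on item attributes.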


    Business Taxonomy

    Zach Wahl, an internationally known taxonomy guru, draws a clear distinction between a strict, formal taxonomy (like the classification of the animal kingdom or pharmaceuticals) and a business taxonomy.
    The purpose of a business taxonomy is to support website navigation and to facilitate “findability”. Mr. Wahl says:

    …a successful business taxonomy must be designed for intuitive browsing by end users. Design at every stage of the business taxonomy must, therefore, consider whether the average user will be
    able to understand both the terms and the hierarchy of the taxonomy and react to it in a meaningful and consistent manner. If this is done effectively, the end user will receive a powerful
    ‘findability’ tool, enabling them to discover information through browsing the taxonomy and view information in an intuitive and consistent manner. [Wahl]

    We discussed earlier that you should know who your searchers are. This is key in designing any taxonomy. However, since the web invites an endless variety of users from all walks of life, the
    taxonomy must be understandable to all.


    Lowest Common Denominator Factor

    The earlier example of a medical site searched by both doctors and patients or family members highlights an important factor to
    keep in mind when designing taxonomies: the lowest common denominator. Instead of maintaining separate taxonomies for distinct communities, each with its own technical vocabulary,
    try using the simplest terms that everyone will understand. More from Zach Wahl:

    …the business taxonomy must be explained with simple terminology that avoids jargon or technical complexity that could confuse potential users. When considering the terms for a business taxonomy,
    the designers should identify the “lowest common denominator” of user types and build using terms and topics that will immediately resonate with them [Wahl].


    Simple is Best

    Business taxonomies used for searching should be simple. The top list of categories should be very broad and in easy-to-understand language for all users. Whereas a technical hierarchy might go as
    many as 12 levels deep, Mr. Wahl points out that business taxonomies usually consist of about 8 top-level categories, each with no more than 3 sub-categories. Remember, too many clicks
    chase the users away! Mr. Wahl sums this up by saying:

    In other words, the business taxonomy sacrifices detail for usability and consistency [Wahl].

    There is no universal, “right” way to build a taxonomy. There are, however, principles or guidelines that can help, and the most universal guideline is to know your search community. A taxonomy
    should also always be considered “under construction”, and the search experience should be continually monitored and improved over time. If people are frustrated by the search experience on a
    company’s intranet, it will not be used, and valuable time will be wasted as employees run around asking each other for documents they could have found more quickly online.
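
The size guideline mentioned in this section (about 8 top-level categories, no more than 3 sub-categories each) is easy to check automatically. The Python sketch below is one possible way to do so, assuming the taxonomy is held as a simple mapping from category to a list of sub-categories; the function name and the exact limits are assumptions, and the limits are a rule of thumb rather than a hard standard.

```python
MAX_TOP_LEVEL = 8      # "about 8 top-level categories"
MAX_SUBCATEGORIES = 3  # "no more than 3 sub-categories" per category


def taxonomy_warnings(taxonomy):
    """Return readable warnings for a {category: [sub-categories]} mapping."""
    warnings = []
    if len(taxonomy) > MAX_TOP_LEVEL:
        warnings.append(
            f"{len(taxonomy)} top-level categories (guideline: about {MAX_TOP_LEVEL})"
        )
    for category, subcategories in taxonomy.items():
        if len(subcategories) > MAX_SUBCATEGORIES:
            warnings.append(
                f"{category}: {len(subcategories)} sub-categories "
                f"(guideline: at most {MAX_SUBCATEGORIES})"
            )
    return warnings
```

Because a taxonomy is always under construction, a check like this is most useful when it runs every time a category is added, so the governance team sees when the structure starts to sprawl.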


    Correctly Categorizing Documents for Search

    Where does a document belong in the search hierarchy? Who decides? A document may be authored or owned by one division of the company but perceived as being part of another functional area for
    search purposes. For example, Hannon [Hannon, 2005] points out that Tuition Reimbursement policies may be set by Finance but are considered related to HR, so the average searcher would expect to
    find tuition reimbursement guidelines under the HR category. The same goes for Affirmative Action policies, which may be written by the legal department but should be searchable under HR.
    Therefore, the department that originates a document is not always the right place to file it for search.
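
One way to picture this is to record the owning department and the search category as two separate pieces of business metadata. The Python sketch below does that for the two policies just mentioned; the Document class, its field names, and the find_by_category function are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    owner: str            # department that authors or owns the document
    search_category: str  # where searchers expect to find it


DOCUMENTS = [
    Document("Tuition Reimbursement Policy", owner="Finance", search_category="HR"),
    Document("Affirmative Action Policy", owner="Legal", search_category="HR"),
]


def find_by_category(documents, category):
    """Return titles filed under `category`, regardless of who owns them."""
    return [doc.title for doc in documents if doc.search_category == category]
```

Browsing the HR category returns both policies even though neither is owned by HR, while browsing Finance returns nothing from this list.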

    Deciding where documents belong is a matter for Governance.


    Governance and Taxonomy

    Zach Wahl points out that it is critical to have a cross-disciplinary team assisting in taxonomy creation [Wahl]. Every type of user should be represented. This ensures that all types of
    users’ needs will be considered in the construction of the taxonomy. This team can form the governance body for the taxonomy. The main point is to get user involvement and buy-in from the business
    so they have some ownership and don’t write off enterprise search failure as an “IT thing”.

    Governance also helps decide where documents belong, and what classifications are appropriate. The author usually thinks he or she has a good idea of where the document belongs, but he or she may
    not understand how a user might go about looking for the document. Tools can be very helpful in determining what documents are about.

    Documents that are consistently mis-categorized are a sign of a faulty, confusing taxonomy. For this reason, the website’s search metrics should be monitored periodically. If there are many
    abandoned searches, this could be an indicator that the taxonomy is hindering, not helping, the search process.
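
As a rough illustration of such a metric, the Python sketch below computes the share of abandoned searches from a search log. The log format, a list of (query, clicked_a_result) pairs, is an assumption; real search tools expose this information in their own ways.

```python
def abandonment_rate(search_log):
    """Fraction of searches that ended without the user opening any result.

    `search_log` is a list of (query, clicked_a_result) pairs (assumed format).
    """
    if not search_log:
        return 0.0
    abandoned = sum(1 for _query, clicked in search_log if not clicked)
    return abandoned / len(search_log)


# Made-up sample data for illustration only.
sample_log = [
    ("tuition reimbursement", True),
    ("purple skirt", False),
    ("annual report", False),
    ("safety requirements", True),
]
rate = abandonment_rate(sample_log)
```

Tracking this figure over time, rather than looking at a single reading, is what shows whether changes to the taxonomy are helping or hindering searchers.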


    Conclusion

    Business metadata plays a very important part in enabling enterprise search. Any descriptive information about files or data, which is in the language of the business, is business metadata.
    Therefore, the name of the file is a key piece of business metadata, as is any classification scheme like a taxonomy.



    References

    • Hannon, Kevin. “Metadata is Money”. InfoCurators, 2004. Available on request from www.infocurators.com.
    • IDC. European Knowledge Management Fact Book. IDC #21511, January 2000.
    • Wahl, Zach. “Masterclass: Business taxonomy, Part I”. Inside Knowledge. Available by subscription: http://www.ikmagazine.com/


    About Bonnie O'Neil

    Bonnie O'Neil is a Principal Computer Scientist at the MITRE Corporation, and is internationally recognized on all phases of data architecture including data quality, business metadata, and governance. She is a regular speaker at many conferences and has also been a workshop leader at the Meta Data/DAMA Conference, and others; she was the keynote speaker at a conference on Data Quality in South Africa. She has been involved in strategic data management projects in both Fortune 500 companies and government agencies, and her expertise includes specialized skills such as data profiling and semantic data integration. She is the author of three books including Business Metadata (2007) and over 40 articles and technical white papers.
