This is a follow-up article to last quarter’s “A Step Ahead: Categories – Boon or Bane.”
There’s an old cliché that says, “To a hammer, everything looks like a nail.” I think this is very true of data and how we view it. It’s difficult to get out of our box and view data differently when we’re used to our own paradigm and data worldview. When we explore the details of the data, we may butt up against the limits of our thinking, and if we are not careful, we’ll ignore those limits and keep going. In this column, I’m going to look a little closer at how we deal with paradigm mismatches and propose a solution. Can more than one paradigm peacefully coexist?
One of my least favorite topics in categorizing data is the notion of hierarchy. A hierarchy really represents a point of view. The conflict of hierarchy vs. relationships can be seen in the object/relational database debate of several years ago. Relational was the paradigm of the data modeler and object-oriented was that of the developer. The two ways of viewing data clashed: Objects are not tables/entities. Objects are inherently hierarchical, and in relational theory there is no good way to represent a hierarchy. The interesting thing is that relational theory came about from a rebellion against hierarchical databases, which were extremely difficult to modify. If you wanted to add an attribute, you would have to redesign the whole structure.
What is the issue? With hierarchies, there is no way to relate two entities other than through the hierarchy itself. With relational, there’s no nice way to implement a hierarchy. There is a way that the two can possibly coexist, with graphs or ontologies, but that is a discussion for another column. This column will look at a few challenges that go along with hierarchies and things to watch out for.
What Hierarchies Are Good For
Hierarchies show up in all sorts of category structures, such as organizational charts, product lines and website designs. Hierarchies often imply inheritance, which enables attribute values to be passed down from the parent to the children. A product hierarchy is a good example of this. A parent category might specify that the attribute “number of wheels” has the value 4, and every sub-category within this parent category then has the same number of wheels.
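The inheritance described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Category class, its get method, and the category names are hypothetical, and only the “number of wheels is 4” value comes from the example in the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Category:
    """A node in a product hierarchy with at most one parent (hypothetical)."""
    name: str
    parent: Optional["Category"] = None
    attributes: dict = field(default_factory=dict)

    def get(self, key):
        """Look up an attribute here, then walk up through the ancestors."""
        node = self
        while node is not None:
            if key in node.attributes:
                return node.attributes[key]
            node = node.parent
        raise KeyError(key)


# Hypothetical category names, for illustration only.
four_wheeled = Category("Four-wheeled vehicles", attributes={"wheels": 4})
sedans = Category("Sedans", parent=four_wheeled)
compact_sedans = Category("Compact sedans", parent=sedans)

# Neither child sets "wheels", so both inherit the parent's value.
print(sedans.get("wheels"))          # 4
print(compact_sedans.get("wheels"))  # 4
```

Only the parent stores the value, so changing it in one place changes what every descendant reports.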
Accounting uses this type of structure to break down various costs. A department store may ask how much was sold in each department: Housewares, Women’s Clothing, Makeup and Beauty, and so on. Those of us who designed data warehouses back in the day remember this well. You might have departments and then sub-departments. Within Housewares, what portion of sales did cookware represent? This type of analysis was critical for the sales predictions that dictated inventories and store shelf design.
When to Use Caution!
Hierarchies are also used for website design and knowledge management. Websites often use taxonomies as their backbone. A taxonomy is a system of classification that imposes a hierarchy. A common example is the taxonomy of organisms.
There are, however, two different types of hierarchies: Exclusive and Inclusive. In an Exclusive hierarchy, an individual element can belong to only one parent. In an Inclusive hierarchy, an individual element can belong to more than one parent. Object-oriented people call the Inclusive case “multiple inheritance,” and it can cause difficulties; some object-oriented systems do not allow it.
Here’s how it can get messy.
Suppose the website is for a women’s clothing line. The company sells tops, dresses, skirts, and pants. There are sub-categories that further break out into child categories; for example, pants may be subdivided into shorts, capris, and trousers (long pants) and skirts may be classified by length: mini, midi, and full-length.
But what do you do with “skorts”? Skorts are shorts that look like skirts. They can be classified as either.
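The Exclusive/Inclusive distinction can be checked mechanically. The sketch below is a minimal illustration, assuming the hierarchy is stored as a mapping from each child to the set of its parents; the function names is_exclusive and multi_parent_elements are hypothetical, and the category names are the ones from the clothing example.

```python
# Child -> set of parents, using the category names from the clothing example.
parents = {
    "shorts": {"pants"},
    "capris": {"pants"},
    "trousers": {"pants"},
    "mini": {"skirts"},
    "midi": {"skirts"},
    "full-length": {"skirts"},
}


def multi_parent_elements(parents):
    """Return the elements that belong to more than one parent, sorted by name."""
    return sorted(child for child, ps in parents.items() if len(ps) > 1)


def is_exclusive(parents):
    """A hierarchy is Exclusive when no element has more than one parent."""
    return not multi_parent_elements(parents)


print(is_exclusive(parents))  # True

# Classifying skorts under both shorts and skirts makes the hierarchy Inclusive.
with_skorts = {**parents, "skorts": {"shorts", "skirts"}}
print(is_exclusive(with_skorts))           # False
print(multi_parent_elements(with_skorts))  # ['skorts']
```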
The taxonomy designer must always ask: “What is the purpose of the hierarchy?” If it is to assist the shopper browsing the website and maximize the findability of an item, you want to allow the item to be found in the maximum number of places on the site. However, if it is to aggregate the sales and present total sales per department, there is the potential that the skort sales will be counted twice: once as skirts and once as shorts. For that purpose you must use an Exclusive hierarchy, which counts each individual item in one and only one category.
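To make the double-counting risk concrete, here is a small sketch that totals sales per category when an item is placed under more than one category. The sales figures, item names, and the function totals_by_category are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sales figures, for illustration only.
sales = {"denim shorts": 100, "midi skirt": 150, "skort": 50}

# Inclusive placement: the skort is listed under both shorts and skirts.
placements = {
    "denim shorts": ["shorts"],
    "midi skirt": ["skirts"],
    "skort": ["shorts", "skirts"],
}


def totals_by_category(sales, placements):
    """Add each item's sales to every category the item is placed in."""
    totals = defaultdict(int)
    for item, amount in sales.items():
        for category in placements[item]:
            totals[category] += amount
    return dict(totals)


totals = totals_by_category(sales, placements)
print(totals)                # {'shorts': 150, 'skirts': 200}
print(sum(totals.values()))  # 350, although only 300 was actually sold
print(sum(sales.values()))   # 300
```

The category totals overstate actual sales by exactly the skort amount, because that amount was added once per parent.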
Last quarter’s column discussed the conundrum of sorting papers: Because it is a physical reality, each paper can exist in one and only one folder. The case of counting sales for products that can show up in multiple categories is a similar issue. What do you do? I think the solution is the reverse of the paper filing problem. With paper, the solution is usually to abstract up: Create a more general category and store the papers this way. With product accounting, the solution is probably the opposite: Abstract down and create more granular categories. For example, most clothing stores have “tops” and “bottoms,” the latter divided up into pants, shorts, skirts, and skorts, with skorts being their own category. However, for findability on the website, you’ll want skorts to be found under both shorts and skirts, as well as in their own category of skorts.
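One way to sketch this split is to give each product exactly one accounting category and, separately, any number of browse categories. The Product class, its field names, the helper functions, and the sales figures below are hypothetical and meant only as an illustration of the idea, not a prescribed design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    name: str
    accounting_category: str      # exactly one: used for sales totals
    browse_categories: frozenset  # one or more: used for findability


catalog = [
    Product("denim shorts", "shorts", frozenset({"shorts"})),
    Product("midi skirt", "skirts", frozenset({"skirts"})),
    # Skorts get their own accounting category but appear in three places on the site.
    Product("skort", "skorts", frozenset({"skorts", "shorts", "skirts"})),
]

# Hypothetical sales figures, for illustration only.
sales = {"denim shorts": 100, "midi skirt": 150, "skort": 50}


def find(catalog, category):
    """Names of all products a shopper sees when browsing a category."""
    return sorted(p.name for p in catalog if category in p.browse_categories)


def sales_totals(catalog, sales):
    """Total sales per accounting category; each product counts exactly once."""
    totals = {}
    for p in catalog:
        key = p.accounting_category
        totals[key] = totals.get(key, 0) + sales.get(p.name, 0)
    return totals


print(find(catalog, "shorts"))       # ['denim shorts', 'skort']
print(find(catalog, "skirts"))       # ['midi skirt', 'skort']
print(sales_totals(catalog, sales))  # {'shorts': 100, 'skirts': 150, 'skorts': 50}
```

The browse side is Inclusive and the accounting side is Exclusive, so the skort is findable in several places while the category totals still add up to actual sales.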
This hierarchy problem used to show up in lots of data warehouses. When you are designing categories and using a hierarchical structure, you must make sure you specify whether the hierarchy is Inclusive or Exclusive.
This is one of the many reasons why I don’t particularly care for hierarchies. Another reason is that a lot of data doesn’t fit nicely into a hierarchy. I like to say: “Not everything is a hierarchy!” This is why I like graphs. Maybe I can be talked into writing another article about why graphs help with this problem!
©2024 The MITRE Corporation. ALL RIGHTS RESERVED. Approved for Public Release; Distribution Unlimited. Public Release Case Number 23-01095-5.