A Step Ahead: Categories – Boon or Bane?

Do you love or hate organizing papers and objects in your home?

For those who hate it, why do you hate it?

I like the results of organizing: reduction of clutter and ease of finding things. But I hate the process of organizing. The reason usually comes down to the fact that an object or paper can live in only one place at a time. Suppose you are making folders to organize papers. Which folder does this paper go in? What if the paper’s topic fits in more than one folder? Which one do I pick? If one of those folders already exists, do I automatically put the paper there, even though a new folder might fit it better? What if the paper fits in three categories at once? And another problem: What if it doesn’t fit anywhere at all? Do you create a new folder, or do you file it under “miscellaneous”? And then there’s the bother of having to create a new folder. What a pain!

Welcome to my world!

The problem here, of course, is physics. Does all this go away in the digital world? Can I put something in two or more folders? Yes: I can keep the file in one place and put shortcuts to it in the others. In data modeling terms, I can relate an entity to multiple entities in a relational model or a graph database, but not so well in a hierarchical one, such as an object-oriented class hierarchy, where each item typically has a single parent.
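To make the relational point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (paper, folder, paper_folder) and the helper functions are hypothetical, chosen only for illustration. The junction table paper_folder is what lets one paper "live" in several folders at once.

```python
import sqlite3


def build_filing_db():
    """Create an in-memory database with a many-to-many paper/folder model."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE paper  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE folder (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        -- Junction table: one row per (paper, folder) pairing.
        CREATE TABLE paper_folder (
            paper_id  INTEGER NOT NULL REFERENCES paper(id),
            folder_id INTEGER NOT NULL REFERENCES folder(id),
            PRIMARY KEY (paper_id, folder_id)
        );
        """
    )
    return conn


def file_paper(conn, title, folder_names):
    """Store one paper and link it to every folder named in folder_names."""
    paper_id = conn.execute(
        "INSERT INTO paper (title) VALUES (?)", (title,)
    ).lastrowid
    for name in folder_names:
        # Create the folder only if it does not exist yet.
        conn.execute("INSERT OR IGNORE INTO folder (name) VALUES (?)", (name,))
        folder_id = conn.execute(
            "SELECT id FROM folder WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO paper_folder (paper_id, folder_id) VALUES (?, ?)",
            (paper_id, folder_id),
        )
    return paper_id


def folders_for(conn, title):
    """Return the alphabetized names of all folders that hold the given paper."""
    rows = conn.execute(
        """
        SELECT f.name
        FROM folder f
        JOIN paper_folder pf ON pf.folder_id = f.id
        JOIN paper p ON p.id = pf.paper_id
        WHERE p.title = ?
        ORDER BY f.name
        """,
        (title,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling file_paper(conn, "Dental bill", ["Medical", "Taxes"]) stores the paper once and links it to both folders, so nobody has to decide which single folder "wins."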

But the fact that many things don’t naturally go in one and only one category is an interesting problem. The book “How to Think” by Alan Jacobs has a chapter on lumping and splitting where he discusses this problem. The book explains that there are natural “lumpers,” those who are predisposed to using currently existing categories, and “splitters,” those who prefer to split and create new categories.

I’m going to introduce an idea that most data modelers will recognize immediately: Consider the level of abstraction used in organizing. It’s a spin on the book that came out quite a few years ago called “Everything Is Miscellaneous.” I’m not proposing that you have one folder for all papers; that is chaos. Instead, go to a higher level of abstraction than you normally might: Create more general categories. For example, instead of having a folder for each service provider, generalize up. Instead of a folder for Dr. Smith, have one for all medical-related papers. That folder would hold not only bills but also explanations of benefits and other medical-related paperwork.

Along with this comes the question, “Which papers do I actually store, and which ones do I want to digitize?” Let’s face it: scanning individual bills is a real drag, although some folks I know like to do it. Instead, ask which papers you will actually need to refer to later, and digitize those. My life got a lot simpler when I created one folder for all medical bills and scanned only the ones for which the FSA (flexible spending account for medical expenses) requested substantiation. It was a lot easier to scan on demand the few that were needed.

And then, of course, there is the option to “opt in” to paperless delivery, where the provider sends you all correspondence digitally. I wish all providers offered this! Unfortunately, not all do.

We all organize things (and concepts) differently, based on our experiences, training, and environment. There is not only the problem of things that could go in more than one place, but also the problem of changing concepts and names. Companies frequently change hands and get bought out by a “Pac-Man” company. Using more abstract or general categories, as discussed above, helps with this: when my internet provider keeps getting bought out by another company and changing its name, that does not have to affect which folder I put its papers in. You can generalize further by using a category of “Utilities” to cover everything from internet to electricity.
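As a rough sketch of that idea, the folder can be keyed on the kind of service rather than on the company's name. Everything here is hypothetical and chosen only for illustration: the Paper class, the GENERAL_CATEGORY mapping, the provider names, and the “Miscellaneous” fallback.

```python
from dataclasses import dataclass

# Hypothetical mapping from a kind of service to a general folder.
GENERAL_CATEGORY = {
    "doctor": "Medical",
    "explanation of benefits": "Medical",
    "internet": "Utilities",
    "electricity": "Utilities",
}


@dataclass
class Paper:
    provider: str  # company or person named on the paper; may change over time
    service: str   # what the paper is about; stable across buyouts and renames


def folder_for(paper):
    """Pick a folder from the service type only, ignoring the provider name."""
    return GENERAL_CATEGORY.get(paper.service.lower(), "Miscellaneous")
```

With this arrangement, Paper("OldNet", "internet") and Paper("NewNet", "internet") both land in “Utilities,” so a buyout or rename changes nothing about where the paper is filed.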

Categories are used in data everywhere. The next article in this series will delve into taxonomies and hierarchies, touching specifically on inclusive versus exclusive hierarchies. Stay tuned!

Author Biographies

Bonnie O’Neil is a Principal Data Management Engineer at The MITRE Corporation and is a well-known expert on all phases of data architecture including data catalogs, data quality, business metadata and governance; she has assisted both Fortune 500 companies and government agencies in data management projects for over 30 years. She is a regular speaker and workshop/tutorial leader at many conferences and was the keynote speaker at a conference on Data Quality in South Africa. She is the author of numerous articles in TDAN as well as The Data Catalog: Sherlock Holmes Data Sleuthing for Analytics, published in March 2020, which is her fourth book.

Judy Gerber is an Information Systems Engineer at The MITRE Corporation. She is an experienced practitioner and consultant for data management, IT Service Management, and governance. She has a master’s degree in library science, post-graduate training in Information Science, and certificates as an ITIL® v4 Management Professional, ITIL® v3 Expert, PMP, and Six Sigma. She enjoys guiding government and commercial executives and practitioners in negotiating the complex interfaces between technology and information, data, and knowledge.
