There is a great deal of talk in our industry about the importance of common, standard data semantics and language, and about the value they bring.
However, I think one of the greatest obstacles to achieving this is what I call data ‘mine’ing.
I am not talking about ‘data mining’ meaning, “the process of collecting, searching through, and analyzing a large amount of data in a database, as to discover patterns or relationships.” [i]
I am talking about data ‘mine’ing, defined as “acting in a manner where this data is mine.” [ii] Another definition may be “behaving in a way where one claims exclusive ownership over information and there is resistance to seeing data the same way, sharing, or integrating this data.” You know, like in the Finding Nemo video clips where the seagulls are all yelling, ‘mine, mine, mine, …’
For example, individual applications usually define their terms and definitions based on their specific needs. In my experience, application teams often resist standardizing terms and definitions because they are focused on delivering their project, with their data, on time, within budget, and to their requirements. So when a data science project (or any other effort) is working with its data, and a data governance steward tries to help the project use standardized semantics for customers, products, parts, or other terms, there is often conflict.
Our information technology industry continues to create ‘data silos’ because of this prevalent data ‘mine’ing mindset, and the problem keeps growing.
What can we do?
Let’s look at this first from a technical point of view and then from a human behavior perspective.
Technical Perspective: Universal Data Semantics
What are ‘Universal Data Semantics’?
Let’s start with the term ‘universal.’ The word ‘universal’ is often defined as ‘holistic’ or ‘generally applicable.’ For example, dictionary.com defines it as “of, relating to, or characteristic of all or the whole” or “applicable everywhere or in all cases; general.” [iii]
‘Semantics’ deals with understanding the meaning of things. Therefore, Universal Data Semantics has to do with having holistic and generally applicable meaning for data. In other words, data is commonly and consistently defined.
Based on decades of experience, I have found that there are ‘universal’ concepts, ideas, and ways of structuring, maintaining, passing, and using data that are based on common language, standardized designs, and consistent data semantics (i.e., Universal Data Semantics).
Technically, if we build systems using similar semantics and a shared understanding, those systems are more likely to integrate. Many IT systems, however, seem to be designed with a great variety of semantics, and we would benefit from using ‘universal’ data semantics instead. In today’s world, most systems resemble the ‘Tower of Babel’ story: we speak many different languages and use widely differing semantics, so many applications are not integrated. This creates chaos, makes systems difficult to maintain, and leads to misunderstandings and unpredictable or erroneous results.
The following are some examples of Universal Data Semantics.
Every project I have worked on over the decades has dealt with people, organizations, or other types of parties (parties may also include automated agents) that play various roles, such as customer, supplier, employee, partner, and so on. These are the actors that do things. A great deal of information is maintained about these parties, such as contact information, objectives, relationships, and demographics.
There is a foundational principle about Universal Data Semantics: ‘Even though they exist, people argue about them.’ For example, people may say, ‘Yes, all organizations have parties, but we have so many different words for this and some want to call them actors, entities, associates, business partners, individuals, businesses, or any number of other terms.’ While we may use different words or terms, we could agree that there is a ‘universal’ concept for this important aspect of systems dealing with the entities that ‘do’ things, whether we call them parties, actors, entities, or something else. If we can agree on this, we can start classifying and structuring data related to this major data area, domain, or category.
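For readers who think in code, here is a minimal sketch, assuming a simple in-memory Python model, of how a single ‘party’ concept can play many roles instead of living in separate customer, supplier, and employee structures. The names `Party`, `PartyType`, `add_role`, and `plays` are hypothetical illustrations, not a published standard model.

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical sketch: names are illustrative, not a standard party model.


class PartyType(Enum):
    PERSON = "person"
    ORGANIZATION = "organization"
    AUTOMATED_AGENT = "automated_agent"


@dataclass
class Party:
    """Hypothetical 'universal' party: anyone or anything that does things."""

    party_id: str
    name: str
    party_type: PartyType
    # Roles such as "customer" or "supplier" hang off one party record.
    roles: set[str] = field(default_factory=set)

    def add_role(self, role: str) -> None:
        # Normalize so "Customer" and "customer " count as the same role.
        self.roles.add(role.strip().lower())

    def plays(self, role: str) -> bool:
        return role.strip().lower() in self.roles


# One organization can be both a customer and a supplier without duplicate records.
acme = Party("P-001", "Acme Corp", PartyType.ORGANIZATION)
acme.add_role("Customer")
acme.add_role("supplier")
print(acme.plays("customer"), acme.plays("employee"))  # True False
```

Keeping roles on one party record means a company that is both a customer and a supplier is described once, which is one concrete way a ‘universal’ view can reduce duplicate and conflicting data.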
Another ‘universal’ concept is that for an organization to exist, it must offer something. We may think of this as a ‘product’ data area, which includes the different offerings, prices, costs, and other data related to those offerings. Again, we may use different words for this type of data, calling it a product, offering, item, solution, service, good, or something else. There may be subtleties in these definitions and semantics that have large impacts on the quality of the data being produced and used. For example, products may be goods (tangible items), services (work being done), or solutions (combinations of goods and/or services). While there are a great number of ways to refer to this data area, a major portion of systems involves data on what is being offered (i.e., a Universal Data Semantic).
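The goods/services/solutions distinction can likewise be sketched, again as a hypothetical Python model rather than any standard: `Good`, `Service`, and `Solution` all specialize one `Product`, and a solution simply bundles other products. The class and field names (such as `weight_kg` and `hours`) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from decimal import Decimal

# Hypothetical sketch: class and field names are illustrative assumptions.


@dataclass
class Product:
    """Base record for anything an organization offers."""

    product_id: str
    name: str
    price: Decimal


@dataclass
class Good(Product):
    """A tangible item."""

    weight_kg: Decimal = Decimal("0")


@dataclass
class Service(Product):
    """Work being done."""

    hours: Decimal = Decimal("0")


@dataclass
class Solution(Product):
    """A combination of goods and/or services offered together."""

    components: list[Product] = field(default_factory=list)

    def component_total(self) -> Decimal:
        # Sum of the separate component prices, e.g. to compare with the bundle price.
        return sum((c.price for c in self.components), Decimal("0"))


laptop = Good("G-1", "Laptop", Decimal("1200"), weight_kg=Decimal("1.8"))
setup = Service("S-1", "On-site setup", Decimal("150"), hours=Decimal("2"))
bundle = Solution("X-1", "Laptop with setup", Decimal("1299"), components=[laptop, setup])
print(bundle.component_total())  # 1350
```

Because every offering is still a `Product`, code that lists offerings or prices can treat them uniformly, while the subtypes preserve the subtleties that affect data quality.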
A core aspect of business is that parties make commitments, often involving products, and there is a great deal of information on these commitments which may also be called orders, agreements, or contracts. This is another example of a ‘universal’ concept in business.
As we progress through most organizations’ processes, there are many other ‘universal’ concepts. After a commitment is made, it is naturally delivered; this may involve logistics and shipping information or, alternatively, work that is performed to fulfill the commitment. Other ‘universal’ business concepts include requests for payment (invoices), payments, accounting data, human resources data, and information systems data.
Different industries may refer to these ‘universal’ concepts differently. For instance, the ‘commitment’ data area in insurance may be about ‘policy’ information, while a professional services organization may refer to it as ‘engagement’ information.
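The idea that many local and industry-specific words point to the same ‘universal’ data area can be sketched as a simple glossary lookup. This is a minimal Python sketch under assumed names (`UNIVERSAL_AREAS`, `to_universal`); the entries are taken from the examples in this article, and a real organization would presumably maintain such a mapping through its data governance process rather than hard-coding it.

```python
# Hypothetical glossary: local or industry-specific term -> shared "universal" data area.
UNIVERSAL_AREAS: dict[str, str] = {
    # Party: the actors that do things
    "party": "party", "actor": "party", "entity": "party", "associate": "party",
    "business partner": "party", "individual": "party", "business": "party",
    # Product: what is being offered
    "product": "product", "offering": "product", "item": "product",
    "solution": "product", "service": "product", "good": "product",
    # Commitment: what parties agree to
    "commitment": "commitment", "order": "commitment", "agreement": "commitment",
    "contract": "commitment",
    "policy": "commitment",  # e.g. insurance
    "engagement": "commitment",  # e.g. professional services
}


def to_universal(term: str) -> str:
    """Return the agreed universal data area for a local term."""
    key = term.strip().lower()
    if key not in UNIVERSAL_AREAS:
        # Deliberately fail instead of guessing: an unmapped term needs a
        # conversation between the teams involved, not a silent default.
        raise KeyError(f"no agreed universal data area for {term!r}")
    return UNIVERSAL_AREAS[key]


print(to_universal("Policy"), to_universal("Engagement"))  # commitment commitment
```

Raising an error for unknown terms is a design choice: it turns a new or disputed term into a prompt for a collaborative discussion, rather than letting each application quietly invent its own meaning.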
What I just described is a very small subset of ‘universal’ data semantics and concepts, at a high level, just to illustrate that there are ‘universal’ constructs that could help our integration efforts. There are many more examples of ‘universal’ data semantics; if you would like to see more, please refer to the many other books and publications that I (or others) have written over the decades on this topic. [iv]
The point is that if we could collaborate and develop systems around ‘universal’ data semantics, it would create tremendous value through consistent ways of collecting, storing, passing, and using data, and of interfacing applications.
Imagine a world where there is a common language for systems and we work collaboratively with similar semantics, making system interfaces and designs so much more understandable, maintainable, and usable.
But is this possible?
Behavioral Perspective: The Human Factor and Zen
The greatest barrier to this vision of ‘Universal Data Semantics’ is that we, as people, see ourselves as separate.
Data silos come from people silos. This is the root of the systems integration issue.
Thus, when we see ourselves as separate, the above technical argument falls apart. People gravitate to their own way of seeing things, and people want to be right: ‘Oh, this is your idea, and I want to use my idea or someone else’s idea.’ Two major sources of differing semantics are different levels of generalization and different terminology. For example, some argue that this ‘universal’ concept of ‘party’ is too general and should be more specific, for example categorized as information about people and organizations. Others say it should be even more specific, with separate data areas for customers, suppliers, partners, employees, and so on. Still others disagree on terms; for example, some want to use the term supplier while others prefer a different term.
It is appropriate to appreciate that there are different perspectives. Yes, we are separate beings and we see things differently, but we are also completely interconnected and interdependent.
Zen and many other philosophies have attempted to shed light on the illusion that we are separate. Zen (a Japanese word derived, via the Chinese ‘Chan’, from the Sanskrit ‘dhyāna’, usually translated as ‘meditation’) offers practices for deconstructing our egos and moving us toward collaboration.
Likewise, many scientists have attempted to dispel the false assumptions of separation. For instance, Albert Einstein wrote:
“A human being is a part of the whole, called by us ‘Universe’, a part limited in time and space. He experiences himself, his thoughts and feelings as something separated from the rest, a kind of optical delusion of his consciousness. This delusion is a kind of prison for us, restricting us to our personal desires and to affection for a few persons nearest to us. Our task must be to free ourselves from this prison by widening our circle of compassion to embrace all living creatures and the whole of nature in its beauty.”
And just as this optical delusion of separation has caused pain in so many aspects of life, it creates chaos, disintegration, and inefficiency in our information systems.
So, how can we move toward working more collaboratively and in an integrated fashion, using ‘Universal Data Semantics’ so that we speak a common language?
I have seen some progress in organizations that work on the human factors of collaboration. These efforts include applying organizational change management principles, frameworks, and tools for creating a shared purpose, vision, and values; understanding motivations; developing trust and transparency; improving communication skills; and managing conflict effectively.
The bottom line is that it is useful to have candidate ‘universal’ data semantics to draw on, and to cultivate consistent, standardized semantics, taxonomies, and data constructs. What is just as important, however, is that we deeply realize that we are not islands, that we are interconnected, and that we shift our intentions, actions, and outcomes from data ‘mine’ing to data ‘ours’ing.
So, I will leave you with this challenge. The next time you are involved in a discussion about common data semantics, share your perspective, deeply appreciate alternative perspectives, and then let go of any ownership, righteousness, or data ‘mine’ing about your own perspective, so that together you can come to an understanding of the ‘Universal Data Semantic’.
[ii] ‘Data “Mining” to Data “Ours”ing’, July 2007, https://9d013f80-4cf4-4ebd-bcd6-b644f452c6e3.filesusr.com/ugd/592292_4958fd78593d49e5b330fdeb7ed445f3.pdf