[This is the third of three articles discussing the new/old ideas of semantics and ontology and how they affect the way we analyze data. (See Part One: An
Introduction to Ontologies, and Part Two: Converting Data Model Entity Classes And Attributes To OWL.) The first article introduced the main concepts.
The second showed an example of the first steps in converting a data model to the web ontology language, OWL. Specifically, it was concerned with defining entity classes (classes) and attributes
(datatype properties). This one completes the job by addressing relationships (object properties).]
Introduction: Our story so far…
As presented in Part One: An Introduction to Ontologies, ontology in ancient Greek philosophy was the branch of
metaphysics “concerned with identifying, in the most general terms, the kinds of things that actually exist.” In modern language, an ontology is simply a catalogue of the things known to exist
- in a domain of interest (a particular context),
- with rules governing
  - how terms can be combined into valid statements, and
  - how sanctioned inferences can be made.
In that article, we learned that a data model
- is a kind of ontology
- that begins by defining categories of data
- with rules established (directly or indirectly) for collecting data in those categories
- and that is organized for presentation to humans.
Moreover, we learned that a document in an ontology language
- also represents an ontology
- but it begins by identifying instances of actual data,
- it then classifies those instances,
- and it is organized for processing by computers, so that they can make inferences from it.
So, the purpose and approach to developing data models and ontologies in ontology languages are different. Data models (and the databases derived from them) are concerned with filtering data coming
in, while the tools that manipulate ontology languages are concerned with manipulating whatever data they find and putting out inferred conclusions. The fact is, however, that the underlying
structures are largely the same, and they can be mapped from one to the other. That is what we will do here.
The ontology languages currently in use are based on XML. [For the nickel tour of enough XML to understand this article, see part one of this series, http://www.tdan.com/view-articles/5025/.] Each consists of a pre-defined set of XML tags. The languages have evolved over time, with each generation
built on the previous one. The first was the Resource Description Framework, otherwise known as “RDF”. This provided the simple ability to describe sentences as
subjects, predicates, and objects, without the ability to support any constraints or inferences. The second, the Web Ontology Language or “OWL”, is more extensive, and
will be the subject of this paper.
There are in fact three flavors of OWL: OWL Lite, OWL DL, and OWL Full. Describing the distinctions among them is beyond the scope of this paper. We will here be using OWL DL.
(You may wonder why the “Web Ontology Language” has the acronym “OWL”. It seems that in Winnie the Pooh, Owl imagines that his name is spelled “WOL”, until his friends correct him. Here, the
World Wide Web Consortium (W3C) decided simply to start with the correct spelling.)
Figure 1 shows an updated version of the taxonomy from the previous article, Part Two: Converting Data Model Entity Classes And Attributes To OWL. In that
article we converted the entity classes and attributes to OWL. In this case we have added some relationships. The approach used here is particularly powerful in dealing with a company’s semantics,
because each relationship is a simple assertion about the business. In this example, we have three relationships and six roles:
- Each city must be located in one and only one state; each state may be the location of one or more cities.
- Each county must be located in one and only one state; each state may be the location of one or more counties.
- Each state must be located in one and only one country; each country may be the location of one or more states.
(Yes, this example is United States-centric. “Province” fits the example just as well.)
Note that there are several constraints implied here:
- City, county, state, and country are disjoint. That is, “Springfield” cannot be both a city and a state.
- A city or a county must each be located in at least one state, and each state must be located in at least one country.
- A city or county referred to cannot be located in more than one state referred to. Similarly, a state referred to cannot appear in more than one country referred to. (“Referred to” means that
a name shared by two cities (like “Springfield”) could be in different states. The same city cannot be in more than one.)
- A city (or a county) must therefore be located in one and only one country.
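The first of these constraints, disjointness, is one that a data model takes for granted but OWL has to be told. A minimal sketch of how it might be stated in OWL RDF/XML follows. The namespace `http://example.org/geography` and the upper-case class identifiers are assumptions made for illustration, not taken from the original model file.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the example.org namespace and the class IDs are hypothetical. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://example.org/geography">

  <!-- owl:disjointWith is symmetric, so each pair is stated only once. -->
  <owl:Class rdf:ID="CITY">
    <owl:disjointWith rdf:resource="#COUNTY"/>
    <owl:disjointWith rdf:resource="#STATE"/>
    <owl:disjointWith rdf:resource="#COUNTRY"/>
  </owl:Class>

  <owl:Class rdf:ID="COUNTY">
    <owl:disjointWith rdf:resource="#STATE"/>
    <owl:disjointWith rdf:resource="#COUNTRY"/>
  </owl:Class>

  <owl:Class rdf:ID="STATE">
    <owl:disjointWith rdf:resource="#COUNTRY"/>
  </owl:Class>

  <owl:Class rdf:ID="COUNTRY"/>

</rdf:RDF>
```

With these statements in place, a reasoner would report an inconsistency if the same individual were asserted to be both a city and a state.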
There are in fact two different ways to represent relationships in OWL. The first defines the relationship as an object property with the classes it relates hard-coded as characteristics of
the relationship. The second simply defines it as a free-standing object property that can be re-used.
Relationships – Version 1
The first approach to relationships includes the definition of the two related classes in its definition. As with attributes (datatype properties), an object property
must first be defined before it can be used. In Protégé, this is done on the properties tab, as shown in Figure 2.
As with datatype properties, a new one is specified in the explorer tree on the left (not shown here) and a name is then specified. (Good practice also requires a comment.) Because this property
will not be re-used, the name should include the target class. In this example, this will be a property of state, so the name is “located_in_COUNTRY”. “State” is then specified as the
property’s domain and country is specified as its range.
Note that “Functional” is checked. This means that each instance of the first class (in this case the Domain STATE) may be related to no more than one instance of the relationship’s target
class (in this case the Range “COUNTRY”). This is equivalent to a maximum cardinality constraint of “1”.
Move now to the Class tab, as shown in Figure
In this case, the objectProperty “located_in_COUNTRY” is already shown as a property of state.
The XML for this property is:
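A minimal sketch of how such a property is typically serialized in OWL RDF/XML is given here. It is an illustration under stated assumptions rather than the exact Protégé output: the namespace `http://example.org/geography` and the comment text are hypothetical, while the property and class names follow those used in the text.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the example.org namespace and comment text are hypothetical. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://example.org/geography">

  <owl:Class rdf:ID="STATE"/>
  <owl:Class rdf:ID="COUNTRY"/>

  <owl:ObjectProperty rdf:ID="located_in_COUNTRY">
    <!-- "Functional" checked: at most one COUNTRY per STATE. -->
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#FunctionalProperty"/>
    <!-- The two related classes are fixed in the property itself. -->
    <rdfs:domain rdf:resource="#STATE"/>
    <rdfs:range rdf:resource="#COUNTRY"/>
    <rdfs:comment>Each state is located in a country.</rdfs:comment>
  </owl:ObjectProperty>

</rdf:RDF>
```

Because the domain and range are part of the property definition, the property cannot be re-used for, say, cities located in states.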
Note that by making the object property “functional” we are constraining it to apply to only one country for each state. An alternative way to do this is to define a
restriction explicitly on any class that is its domain. This is a strange way to assign properties of properties: it involves asserting that the class is a sub-type of the class of all
things that have that property. Specifically, the class is defined as a sub-type of a restriction.
In this case, STATE is a sub-class of two RESTRICTION classes, where one RESTRICTION consists of the objectProperty “located_in_COUNTRY” and minimum cardinality (for that objectProperty) of
1, while the second RESTRICTION consists of the objectProperty “located_in_COUNTRY” and maximum cardinality (for that objectProperty) of 1.
The resulting definition of STATE, then, looks like this:
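A minimal sketch of such a definition in OWL RDF/XML follows, with one restriction for the minimum cardinality and one for the maximum. The namespace `http://example.org/geography` is an assumption for illustration; the cardinality datatype follows the OWL specification, and a given tool may write it differently.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the example.org namespace is hypothetical. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://example.org/geography">

  <owl:Class rdf:ID="COUNTRY"/>

  <owl:ObjectProperty rdf:ID="located_in_COUNTRY">
    <rdfs:domain rdf:resource="#STATE"/>
    <rdfs:range rdf:resource="#COUNTRY"/>
  </owl:ObjectProperty>

  <owl:Class rdf:ID="STATE">
    <!-- STATE is a sub-class of "things located in at least one country". -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#located_in_COUNTRY"/>
        <owl:minCardinality
            rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</owl:minCardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
    <!-- STATE is a sub-class of "things located in at most one country". -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#located_in_COUNTRY"/>
        <owl:maxCardinality
            rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</owl:maxCardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>

</rdf:RDF>
```

When the minimum and maximum are the same, OWL also offers `owl:cardinality`, which would collapse the two restrictions into one.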
Relationships – Version 2
The second way to define a relationship does not require that it be linked to a range. This means that “located_in” is simply defined as an objectProperty that can be reused.
Before starting with this approach, however, note that when two or more entity classes are related to a single instance of another entity class, logically, this is equivalent to saying that the two
constitute a third class that is the “union” of the two. This is defined as follows:
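A minimal sketch of a union class in OWL RDF/XML is given here, using city and county as the two classes related to state. The class name `CITY_OR_COUNTY` and the namespace `http://example.org/geography` are hypothetical, chosen only for illustration.

```xml
<?xml version="1.0"?>
<!-- Sketch only: CITY_OR_COUNTY and the example.org namespace are hypothetical. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://example.org/geography">

  <owl:Class rdf:ID="CITY"/>
  <owl:Class rdf:ID="COUNTY"/>

  <!-- Every individual that is a CITY or a COUNTY belongs to this class. -->
  <owl:Class rdf:ID="CITY_OR_COUNTY">
    <owl:unionOf rdf:parseType="Collection">
      <owl:Class rdf:about="#CITY"/>
      <owl:Class rdf:about="#COUNTY"/>
    </owl:unionOf>
  </owl:Class>

</rdf:RDF>
```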
Now this union class can participate in relationships.
As before, the relationship (object property) is considered a property of one of the classes. Here, object properties are treated like the cardinality constraints just described: the class is
asserted to be a sub-class of the class of all things that have that object property. This takes a little doing to get your head around, so I’ll say it again.
A class that has a particular object property is considered to be a sub-class of the class of all things that have that object property. If a wheel is part of a bicycle, wheel is considered to be
a sub-class of all objects that are part of a bicycle. Note that this includes handlebars if they have the same “part of” relationship that wheels have.
The second approach to relationships is not currently supported by Protégé, so we’ll have to code it from scratch.
In the XML, as before, first the relationship (object property) is defined in isolation:
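A minimal sketch of a free-standing property definition in OWL RDF/XML follows. The namespace `http://example.org/geography` and the comment text are assumptions for illustration; the property name follows the text.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the example.org namespace and comment text are hypothetical. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://example.org/geography">

  <!-- No rdfs:domain or rdfs:range: the property is not tied to any class. -->
  <owl:ObjectProperty rdf:ID="located_in">
    <rdfs:comment>One thing is geographically located in another.</rdfs:comment>
  </owl:ObjectProperty>

</rdf:RDF>
```

Leaving out the domain and range is what makes the property re-usable: each class that uses it says for itself which values it accepts.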
Then we define the class city as one of those kinds of things that are located somewhere. Specifically, it is a sub-class of a restriction: the restriction that is defined by the property
(“onProperty”) “located_in” and that takes all its values from (“allValuesFrom”) state. That is, for each city, all possible values for the object of “located_in” are
contained in the class state.
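A minimal sketch of this class definition in OWL RDF/XML is given here. The namespace `http://example.org/geography` is an assumption for illustration; the class and property names follow the text.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the example.org namespace is hypothetical. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://example.org/geography">

  <owl:Class rdf:ID="STATE"/>
  <owl:ObjectProperty rdf:ID="located_in"/>

  <owl:Class rdf:ID="CITY">
    <!-- CITY is a sub-class of "things whose located_in values are all states". -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#located_in"/>
        <owl:allValuesFrom rdf:resource="#STATE"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>

</rdf:RDF>
```

Note that `allValuesFrom` on its own does not require a city to have any “located_in” value at all. The “one and only one” part of the relationship would still need a cardinality restriction on the same property.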
Notice that in Figure 4, we’ve added the entity class physical site, which must be located in one city and in one postal area. If we are going to convert this to OWL, we’ll have to use the second
approach to relationships, which will allow us to re-use the “located in” relationship.
That is, we will define physical site to be a sub-type first of all classes that have the object property “located in” and which take all their values for this property from city, and then to be
a sub-type of all classes that have the object property “located in” and which take all their values for this property from postal area.
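A minimal sketch of the definition just described, in OWL RDF/XML, follows. The identifiers `PHYSICAL_SITE` and `POSTAL_AREA` and the namespace `http://example.org/geography` are assumptions for illustration.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the class IDs and the example.org namespace are hypothetical. -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://example.org/geography">

  <owl:Class rdf:ID="CITY"/>
  <owl:Class rdf:ID="POSTAL_AREA"/>
  <owl:ObjectProperty rdf:ID="located_in"/>

  <owl:Class rdf:ID="PHYSICAL_SITE">
    <!-- First restriction: located_in values come from CITY. -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#located_in"/>
        <owl:allValuesFrom rdf:resource="#CITY"/>
      </owl:Restriction>
    </rdfs:subClassOf>
    <!-- Second restriction: located_in values come from POSTAL_AREA. -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#located_in"/>
        <owl:allValuesFrom rdf:resource="#POSTAL_AREA"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>

</rdf:RDF>
```

One caution about this sketch: a reasoner reads two `allValuesFrom` restrictions on the same property together, so every “located_in” value of a physical site would have to be both a city and a postal area. If those two classes are declared disjoint, a single restriction whose `allValuesFrom` is the union of the two classes may come closer to the intent.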
This paper, even in three chapters, has not come close to describing OWL in all its variations. It does prove that a data model can be expressed in OWL, although constraints assumed by data
modelers must be described explicitly. Partly for this reason, OWL can describe things that data modeling cannot. It can describe a world where these constraints do not apply. This is useful for
making inferences from a large body of data, but it is not useful for building commercial systems to support operations.
Indeed, it would be nice if OWL could express more constraints than the ones we have asked of it here. One shortcoming of data modeling is that it is not expressive enough to describe many business rules,
which are additional constraints upon the data structure. Such constraints are the opposite of the “open world assumption”.
Note that as expressive as OWL is to the computer, it is pretty hard going for human beings. In discussing the structural assertions that describe a company, the graphic approach afforded by data
modeling is the clear winner. A utility that could then convert the model to OWL and take advantage of inference engines’ ability to make inferences from what was agreed to would be a very useful
thing indeed. It is unlikely that anyone will invent a utility to go the other way.
 G. Kemmerling, Philosophical Dictionary. http://www.philosophypages.com/dy/o.htm#onty.
* As shown here, the XML is relatively well-organized and readable. In the actual file, be warned that Protégé puts the chunks in thoroughly
random order. It is not a pretty picture.