Published in TDAN.com July 2006
[This is the second of what now seem to be three articles discussing the new/old ideas of semantics and ontology and how they affect the way we analyze data. The first article introduced the main concepts, while this one will show an example of the first steps in converting a data model to the web ontology
language, OWL. Specifically, it is concerned with defining entity classes (classes) and attributes (datatype properties). The next issue will cover relationships.]
Introduction: Our story so far…
As presented in part one of this series, an ontology in ancient Greek philosophy was the branch of metaphysics “concerned with identifying, in the most general terms, the kinds of
things that actually exist.”[1] In modern language, an ontology is simply a catalogue of the things known to exist
- in a domain of interest (a particular context),
- with rules governing
  - how terms can be combined into valid statements, and
  - how sanctioned inferences can be made.
In that article, we learned that a data model
- is a kind of ontology
- that begins by defining categories of data
- with established (directly or indirectly) rules for collecting data in those categories
- and that is organized for presentation to humans.
Moreover, we learned that a document in an ontology language
- also represents an ontology
- but it begins by identifying instances of actual data,
- it then classifies those instances,
- and it is organized for processing by computers, so that they can make inferences from it.
The points of view for developing data models and for developing ontologies in ontology languages are also different. Data models (and the databases derived from them) are concerned with filtering data
coming in, while the tools that manipulate ontology languages are concerned with manipulating whatever data they find and putting out conclusions. The fact, however, is that the underlying
structures are largely the same, and they can be mapped from one to the other. That is what we will do here.
The ontology languages currently in use are based on XML.[2] Each consists of a pre-defined set of XML tags. The languages have evolved over time, with each generation built on the previous
one. The first was the resource description framework, otherwise known as “RDF”. This provided the simple ability to describe sentences as subjects, predicates, and
objects, without the ability to support any constraints or inferences. This was expanded to “RDF Schema” to provide for the distinction between classes and instances. Finally, the
web ontology language, or “OWL”,[3] adds extensive features for drawing inferences and will be the subject of this paper.[4]
Starting Point: A Taxonomy
One point to make first: a taxonomy is a particular kind of ontology. In the ontology world, anything can be related to anything else on a many-to-many
basis. A taxonomy is a special case where each member of a particular (sub-type) class is, by definition, also a member of a single parent (super-type)
class. All attributes of the super-type class are also attributes of every sub-type class in question, although additional attributes can distinguish one sub-type class from another.
An example of a taxonomy, represented as a Barker/SSADM data model, is shown in Figure 1.
Figure 1: Taxonomy
Here each box represents an entity class (simply “class” in the ontology world), while a box’s being inside another box means that the entity class it represents is a sub-type of the entity
class represented by the outer box. In this example, the concept of geographic location (a bounded area on the Earth) is subdivided as
follows:
- geopolitical area – a geographic location in two dimensions whose boundaries are defined by law or treaty.
  - city – the boundaries of a municipality, incorporated under state law.
  - county (or parish) – the boundaries of a formal division of a state.
  - state (or province) – the boundaries of a formal sub-division of a country.
  - country – the recognized boundaries of an independent nation.
- administrative area – a geographic location in two dimensions whose boundaries are defined by a company or other organization.
  - postal area – an administrative area whose boundaries are defined by a national post office.
  - geographic telephone area – an administrative area, defined by a telephone company consortium and identified by an area code that is defined by geography.
- natural area – a geographic area in two dimensions whose boundaries are defined by natural phenomena, such as a lake boundary or a habitat.
- surveyed area – a geographic area in two dimensions whose boundaries are set by direct measurement.
- other geographic area – a geographic area in two dimensions that is not one of the above.
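The same hierarchy can be written down as a simple parent map. The following is a minimal Python sketch, not part of the original model: the underscored identifiers are an assumed naming convention, and the placement of city, county, state, and country under geopolitical area (and of the postal and telephone areas under administrative area) is inferred from the definitions above.

```python
# Super-type (parent) of each class in the Figure 1 taxonomy.
# Identifiers are assumed; the nesting is inferred from the definitions.
PARENT = {
    "geopolitical_area": "geographic_location",
    "administrative_area": "geographic_location",
    "natural_area": "geographic_location",
    "surveyed_area": "geographic_location",
    "other_geographic_area": "geographic_location",
    "city": "geopolitical_area",
    "county": "geopolitical_area",
    "state": "geopolitical_area",
    "country": "geopolitical_area",
    "postal_area": "administrative_area",
    "geographic_telephone_area": "administrative_area",
}


def ancestors(cls):
    """Return the super-types of cls, nearest parent first."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def is_subtype_of(cls, candidate):
    """True if candidate is a direct or indirect super-type of cls."""
    return candidate in ancestors(cls)


print(ancestors("city"))
```

Because each class has exactly one parent, walking up the map always ends at the root, which is what makes this a taxonomy rather than a general many-to-many ontology.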
To convert this to an ontology for computer processing, it is necessary to render it in a particular form of XML. There are several software packages available for creating OWL ontologies.
Most noteworthy for being free to anyone who wants it is Protégé (http://protege.stanford.edu/download/release/full/).[5]
Others, such as Cerebra, are a little easier to manipulate, but they cost money, and once you understand what’s going on, Protégé works just fine.
Defining Classes
The first step in converting the data model to OWL is to define the classes and their relationships to each other.
The Protégé screen for specifying classes and sub-classes is shown in Figure 2.
In the upper left is an explorer-like tree that simply lists the classes and their sub-classes. Note that the ultimate super-class for all ontologies is “owl:Thing”.
Note the classes derived from the above model listed in hierarchical fashion.
Protégé creates two files:
- The .prj file that contains control information about the Protégé session.
- The .owl file that contains XML describing the ontology.
Figure 2: Protégé Classes
The XML file begins with a standard set of header information:
The <rdf:RDF> tag encompasses the entire ontology. The </rdf:RDF> tag is at the end of the script. This definition begins with the definition of the namespaces that define the
“Geography” ontology as well as RDF, OWL, and other elements of the languages themselves. The <owl:Ontology> tag specifies that this document is, in fact, an ontology.
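As a rough illustration, a header of this kind might look like the following sketch. It is embedded in a short Python script only so that it can be checked for well-formedness with the standard library. The base URI http://example.org/Geography.owl is a made-up placeholder, and the exact set of namespace declarations that Protégé writes may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical base URI for the "Geography" ontology (an assumption).
BASE = "http://example.org/Geography.owl"

HEADER_ONLY = f"""<?xml version="1.0"?>
<rdf:RDF
    xmlns="{BASE}#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="{BASE}">
  <owl:Ontology rdf:about=""/>
  <!-- class and property definitions go here -->
</rdf:RDF>
"""

# Parsing succeeds only if the document is well-formed XML.
root = ET.fromstring(HEADER_ONLY)
child_tags = [child.tag for child in root]
print(root.tag)
print(child_tags)
```

ElementTree reports tags in `{namespace}name` form, so the root appears as the `RDF` element in the RDF namespace and its only child as `Ontology` in the OWL namespace.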
Next are the class definitions. First geographic location with its associated comment:
Note that these and all other definitions discussed in this paper are inside the tags above. Also note that geographic_location, like all classes
that are not otherwise specified as a sub-type, must be defined at least as a sub-type of thing.
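A sketch of such a class definition follows, again wrapped in Python so that it can be parsed and inspected. The comment text is taken from the definition of geographic location given earlier; the use of rdf:ID and the explicit subClassOf reference to owl:Thing follow common OWL RDF/XML practice and are assumptions about what Protégé actually writes.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {
    "rdf": RDF,
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF_ID = "{%s}ID" % RDF
RDF_RESOURCE = "{%s}resource" % RDF

DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="geographic_location">
    <rdfs:comment>A bounded area on the Earth.</rdfs:comment>
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
  </owl:Class>
</rdf:RDF>
"""

root = ET.fromstring(DOC)
cls = root.find("owl:Class", NS)

class_id = cls.get(RDF_ID)
comment = cls.find("rdfs:comment", NS).text
super_type = cls.find("rdfs:subClassOf", NS).get(RDF_RESOURCE)
print(class_id, "is a sub-type of", super_type)
```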
Next come the sub-types of geographic_location (for example, geopolitical area):
This is followed by[6] definitions
of the classes administrative_area, natural_area, surveyed_area, and other_geographic_area. Also included
are the sub-types of each of these sub-types, such as city:
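The sub-type definitions might look roughly like the sketch below, which shows geopolitical_area and city and then reads the sub-type links back out with Python. The comment texts come from the definitions listed earlier; the identifiers and the `#name` style of reference are assumptions.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {
    "rdf": RDF,
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF_ID = "{%s}ID" % RDF
RDF_RESOURCE = "{%s}resource" % RDF

DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="geopolitical_area">
    <rdfs:comment>A geographic location in two dimensions whose
      boundaries are defined by law or treaty.</rdfs:comment>
    <rdfs:subClassOf rdf:resource="#geographic_location"/>
  </owl:Class>
  <owl:Class rdf:ID="city">
    <rdfs:comment>The boundaries of a municipality, incorporated
      under state law.</rdfs:comment>
    <rdfs:subClassOf rdf:resource="#geopolitical_area"/>
  </owl:Class>
</rdf:RDF>
"""

root = ET.fromstring(DOC)

# Map each class to the super-type named in its subClassOf element.
parents = {}
for cls in root.findall("owl:Class", NS):
    sup = cls.find("rdfs:subClassOf", NS)
    parents[cls.get(RDF_ID)] = sup.get(RDF_RESOURCE).lstrip("#")

print(parents)
```

Each definition is self-contained, so the order in which the classes appear in the file does not affect the hierarchy that is recovered.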
First Constraint: “Disjoint”
OWL by default does not assume any constraints. To define something as a sub-type does imply that it shares attributes with its super-types, but there is no requirement either that the
sub-types be exhaustive (every instance of a super-type must be an instance of a sub-type) or that they be disjoint (no instance of a super-type can be an instance of more than one sub-type).
In fact, there is no assumption of disjointness between any pair of classes. It is possible to specify this explicitly, however.
The Protégé screen for doing this is shown in Figure 3.
Figure 3: Protégé Disjointedness
In the lower right corner is a section on “Disjoints”. You begin by selecting a class in the explorer tree (in this case geopolitical_area), and
pressing the second button from the left in the “Disjoint” box. This adds it to the list shown. You can simply add other classes using the
same button, or you can press the third button from the left to select “all siblings”. It then asks if you want all the siblings of this sub-type to be disjoint from each other or only
disjoint from this class. Answer the question, and all the siblings are added (in this case, natural_area, other_geographic_location, and
surveyed_area).
The disjoint constraint is then added to the XML for the class definition. For example:
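A hedged sketch of such a definition, using the standard owl:disjointWith element, is given below. The sibling class names are taken from the sub-types of geographic_location named earlier; the exact list and ordering that Protégé writes are assumptions.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {
    "rdf": RDF,
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF_RESOURCE = "{%s}resource" % RDF

DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="geopolitical_area">
    <rdfs:subClassOf rdf:resource="#geographic_location"/>
    <owl:disjointWith rdf:resource="#administrative_area"/>
    <owl:disjointWith rdf:resource="#natural_area"/>
    <owl:disjointWith rdf:resource="#surveyed_area"/>
    <owl:disjointWith rdf:resource="#other_geographic_area"/>
  </owl:Class>
</rdf:RDF>
"""

root = ET.fromstring(DOC)
cls = root.find("owl:Class", NS)

# Collect the classes that geopolitical_area is declared disjoint with.
disjoint_with = [
    d.get(RDF_RESOURCE).lstrip("#")
    for d in cls.findall("owl:disjointWith", NS)
]
print(disjoint_with)
```

Each owl:disjointWith element states that no individual can be a member of both classes at once.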
Datatype Properties (Attributes)
The next step is to add attributes to the model. Back in Figure 1, attributes were
shown for geographic location (“ID”, “Name”, “Description”), country (“Telephone Country Code”),
and postal area (“Postal (ZIP) Code”). With OWL, attributes are called datatype properties, and they are first defined
independently of any classes they might belong to.
Note, however, that in our example, “Name” is a reserved word, so it cannot be a datatype property. For this reason, we will have to use “Geographic_Location_Name”, which is not a bad
choice. Also, parentheses are not allowed, so we will simply call it “Postal Code”, and annotate in the comment that in the United States it’s called a “ZIP” code.
In Protégé, there is a separate tab for defining properties, and a portion of it is shown in Figure 4. Note that in addition to the properties we have added, there is a set of
properties that Protégé makes available by default: “broader”, “CUI”, etc. These are discussed in the advanced course.
Figure 4: Defining Properties
Datatype properties show up in XML, looking like this:
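A minimal sketch of two such definitions follows. The identifier Postal_Code (with an underscore) and the xsd:string range are assumptions made for illustration; the comment on Postal_Code reflects the ZIP code annotation mentioned above.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {
    "rdf": RDF,
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF_ID = "{%s}ID" % RDF

DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:DatatypeProperty rdf:ID="Geographic_Location_Name">
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="Postal_Code">
    <rdfs:comment>In the United States, this is called a "ZIP" code.</rdfs:comment>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
</rdf:RDF>
"""

root = ET.fromstring(DOC)
props = root.findall("owl:DatatypeProperty", NS)

names = [p.get(RDF_ID) for p in props]
# At this stage the properties are not yet attached to any class.
has_domain = [p.find("rdfs:domain", NS) is not None for p in props]
print(names, has_domain)
```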
Figure 5 shows Protégé back on the Classes tab, where properties are assigned to individual classes. This is done in the lower right corner, in the “Properties” box. The
third button from the left allows you to assign classes as “domains” for the properties you previously defined on the other tab.
This in turn updates each property definition in the XML, adding a line identifying a domain for the property:
(Note that the same datatype property may be assigned to more than one domain class.)
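A sketch of the updated property definitions, each with a single rdfs:domain line, might look like this. As before, Postal_Code and the xsd:string range are assumed names and types, and only the single-domain case is shown.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {
    "rdf": RDF,
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF_ID = "{%s}ID" % RDF
RDF_RESOURCE = "{%s}resource" % RDF

DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:DatatypeProperty rdf:ID="Geographic_Location_Name">
    <rdfs:domain rdf:resource="#geographic_location"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="Postal_Code">
    <rdfs:domain rdf:resource="#postal_area"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
</rdf:RDF>
"""

root = ET.fromstring(DOC)

# Map each datatype property to the class named as its domain.
domains = {
    p.get(RDF_ID): p.find("rdfs:domain", NS).get(RDF_RESOURCE).lstrip("#")
    for p in root.findall("owl:DatatypeProperty", NS)
}
print(domains)
```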
It is important to understand that the word domain here has a very different meaning than what we are used to in the relational world. Where, in relational terms, a
domain is a set of constraints on an attribute or column, here it refers to the class that a datatype property applies to.
Again, there are no constraints on these properties by default. If you want to add the relational constraint that the property may have only one value per instance of the class, press the fifth button
from the left in the “Properties” section, which will bring up a window to allow you to specify the “maximum cardinality” of each property. This causes the symbol “plus or
minus” followed by “1” to appear under the property. It also means that an entry is added to the “Asserted Conditions” box.
Figure 5: Property Examples
Note that constraints only apply to properties as they are used for classes. Specifically, they show up in the XML for the class definition, not in the property definitions. For
example:
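The following is a rough sketch of such a class definition, using the standard owl:Restriction, owl:onProperty, and owl:maxCardinality elements. The xsd:nonNegativeInteger datatype follows the OWL specification; the datatype and layout that Protégé actually writes may differ, so treat the details as assumptions.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {
    "rdf": RDF,
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF_RESOURCE = "{%s}resource" % RDF

DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="geographic_location">
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#Geographic_Location_Name"/>
        <owl:maxCardinality
            rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger"
            >1</owl:maxCardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>
"""

root = ET.fromstring(DOC)
cls = root.find("owl:Class", NS)

# Named super-types are subClassOf elements that carry rdf:resource.
named_supers = [
    s.get(RDF_RESOURCE)
    for s in cls.findall("rdfs:subClassOf", NS)
    if s.get(RDF_RESOURCE) is not None
]

# Anonymous super-types are restrictions nested inside subClassOf.
restrictions = [
    (r.find("owl:onProperty", NS).get(RDF_RESOURCE),
     int(r.find("owl:maxCardinality", NS).text))
    for r in cls.findall("rdfs:subClassOf/owl:Restriction", NS)
]
print(named_supers, restrictions)
```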
Here the class geographic_location, in addition to being a sub-type of the generic OWL class thing, is also
a sub-type of a generic class called restriction. That is, all instances of geographic_location are
also instances of things that are governed by a specified set of restrictions (in this case, all things with a maximum cardinality of “1” for the property). This makes sense when you think about it, but it’s a
little hard to get your brain around initially. The definition of each restriction involved includes specifying the property being restricted, “onProperty” (in this case,
“Geographic_Location_Name”), and the restriction itself, “maxCardinality” (in this case, “1”).
Note that bringing up the class postal area so that it can receive the property “Postal Code” shows that it already inherits the properties of
geographic location. See Figure 6.
Figure 6: Adding Postal Code
[This seems to be enough for one article. Part three, in the next issue, will discuss relationships.]
[1] G. Kemmerling, Philosophical Dictionary. http://www.philosophypages.com/dy/o.htm#onty.
[2] For the nickel tour of enough XML to understand this article, see part one of this series, http://www.tdan.com/i036ht04.htm.
[3] You may wonder why the “Web Ontology Language” has the acronym “OWL”. It seems that in Winnie the Pooh, Owl imagines that his name is spelled “WOL”, until his
friends correct him. Here, the World Wide Web Consortium (W3C) decided simply to start with the correct spelling.
[4] There are in fact three flavors of OWL: OWL Lite, OWL DL, and OWL Full. Describing the distinctions among them is beyond the scope of this paper. We will here be using OWL
DL.
[5] Note: You have the choice of downloading the Java run-time environment, if you don’t already have it, or not. During installation, also, be sure to specify
“Basic+OWL”.
[6] Well, ok, it isn’t “followed by” at all. Protégé has a way of completely scrambling class and other definitions. Each definition is complete from lead tag to
ending tag, but as units, sequence is not significant, so Protégé does not worry about it. Moreover, even after laboriously re-sequencing things the first time you open and save an
ontology, it is again scrambled. I know, this is supposed to be only for computers to read. But people do have to read it to understand and evaluate it. For purposes of this exposition, I have
re-sequenced things.