|
Data Modeling, RDF, & OWL - Part Two: Converting Data Model Entity Classes and Attributes to OWL
Published: July 1, 2006
Published in TDAN.com July 2006
[This is the second of what now seem to be three articles discussing the new/old ideas of semantics and ontology and how they affect the way we analyze data. The first article introduced the main concepts, while this one shows an example of the first steps in converting a data model to the web ontology language, OWL. Specifically, it is concerned with defining entity classes (classes) and attributes (datatype properties). The next issue will cover relationships.]
Introduction: Our story so far...
As presented in part one of this series, ontology, in ancient Greek philosophy, was the branch of metaphysics "concerned with identifying, in the most general terms, the kinds of things that actually exist."[1] In modern language, an ontology is simply a catalogue of the things known to exist
In that article, we learned that a data model
Moreover, we learned that a document in an ontology language
Moreover, the points of view for developing data models and for developing ontologies in ontology languages are different. Data models (and the databases derived from them) are concerned with filtering data coming in, while the tools that manipulate ontology languages are concerned with manipulating whatever data they find and putting out conclusions. The fact, however, is that the underlying structures are largely the same, and they can be mapped from one to the other. That is what we will do here.
The ontology languages currently in use are based on XML.[2] Each consists of a pre-defined set of XML tags. The languages have evolved over time, with each generation built on the previous one. The first was the resource description framework, otherwise known as "RDF". This provided the simple ability to describe sentences as subjects, predicates, and objects, without the ability to support any constraints or inferences. This was expanded to "RDF Schema" to provide for the distinction between classes and instances. Finally, the web ontology language, or "OWL",[3] adds extensive features for drawing inferences, and will be the subject of this paper.[4]
Starting Point: A Taxonomy
One point that should be made is that a taxonomy is a particular kind of ontology. In the ontology world, anything can be related to anything else on a many-to-many basis. A taxonomy is a special case where each member of a particular (sub-type) class is, by definition, also a member of a single parent (super-type) class. All attributes of the super-type class are also attributes of every one of its sub-type classes, although additional attributes can distinguish one sub-type class from another. An example of a taxonomy, represented as a Barker/SSADM data model, is shown in Figure 1.
Here each box represents an entity class (simply "class" in the ontology world), while a box's being inside another box means that the entity class it represents is a sub-type of the entity class represented by the outer box. In this example, the concept of geographic location (a bounded area on the Earth) is subdivided as follows:
To convert this to an ontology for computer processing, it is necessary to render it in a particular form of XML. There are several software packages available for creating OWL ontologies. Most noteworthy for being free to anyone who wants it is Protégé (http://protege.stanford.edu/download/release/full/).[5] Others, such as Cerebra, are a little easier to manipulate, but they cost money, and once you understand what's going on, Protégé works just fine.
Defining Classes
The first step in converting the data model to OWL is to define the classes and their relationships to each other. The Protégé screen for specifying classes and sub-classes is shown in Figure 2. In the upper left is an explorer-like tree that simply lists the classes and their sub-classes. Note that the ultimate super-class for all ontologies is "owl:Thing". Note also that the classes derived from the above model are listed in hierarchical fashion. Protégé creates two files:
The XML file begins with a standard set of header information:
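For readers without the original listing at hand, the following is a minimal sketch of what such a header typically looks like in OWL's RDF/XML syntax. The RDF, RDF Schema, XML Schema, and OWL namespace URIs are the standard W3C ones; the http://www.example.org/Geography.owl address is a placeholder assumed here for illustration, not the URI Protégé actually generated.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the example.org URIs are placeholders assumed for illustration -->
<rdf:RDF
    xmlns="http://www.example.org/Geography.owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://www.example.org/Geography.owl">
  <!-- Declares that this document is an ontology -->
  <owl:Ontology rdf:about=""/>
  <!-- Class and property definitions go here -->
</rdf:RDF>
```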
The <rdf:RDF> tag encompasses the entire ontology. The closing </rdf:RDF> tag is at the end of the script. The opening tag begins with the namespace declarations that identify the "Geography" ontology as well as RDF, OWL, and other elements of the languages themselves. The <owl:Ontology> tag specifies that this document is, in fact, an ontology. Next are the class definitions. First, geographic location with its associated comment:
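A class definition of this kind might look roughly as follows. This is an illustrative sketch rather than Protégé's exact output: the example.org base URI is a placeholder, and the comment text borrows the description of geographic location given earlier in this article.

```xml
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://www.example.org/Geography.owl">
  <!-- xml:base above is a placeholder URI assumed for this sketch -->
  <owl:Class rdf:ID="geographic_location">
    <rdfs:comment>A bounded area on the Earth.</rdfs:comment>
    <!-- A top-level class is declared as a sub-type of owl:Thing -->
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
  </owl:Class>
</rdf:RDF>
```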
Note that these and all other definitions discussed in this paper are inside the tags above. Also note that geographic_location, like all classes that are not otherwise specified as a sub-type, must be defined at least as a sub-type of thing. Next come the sub-types of geographic_location (for example, geopolitical area):
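As a hedged sketch, a sub-type such as geopolitical_area can be written by pointing its subClassOf element at the parent class instead of at owl:Thing. The example.org base URI is again a placeholder assumed for illustration.

```xml
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://www.example.org/Geography.owl">
  <!-- xml:base above is a placeholder URI assumed for this sketch -->
  <owl:Class rdf:ID="geopolitical_area">
    <!-- "#geographic_location" refers to the class defined in the same ontology -->
    <rdfs:subClassOf rdf:resource="#geographic_location"/>
  </owl:Class>
</rdf:RDF>
```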
This is followed by[6] definitions of the classes administrative_area, natural_area, surveyed_area, and other_geographic_area. Also included are the sub-types of each of these sub-types, such as city:
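A lower-level class follows the same pattern. In the sketch below, city is placed under geopolitical_area purely as an assumption for illustration, since the parent is not stated in the text; the example.org base URI is likewise a placeholder.

```xml
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://www.example.org/Geography.owl">
  <!-- xml:base above is a placeholder URI assumed for this sketch -->
  <owl:Class rdf:ID="city">
    <!-- Assumption: the parent class of city is geopolitical_area -->
    <rdfs:subClassOf rdf:resource="#geopolitical_area"/>
  </owl:Class>
</rdf:RDF>
```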
First Constraint: "Disjoint"
OWL by default does not assume any constraints. To define something as a sub-type does imply that it shares attributes with its super-types, but there is no requirement either that the sub-types be exhaustive (every instance of a super-type must be an instance of a sub-type) or that they be disjoint (no instance of a super-type can be an instance of more than one sub-type). In fact, there is no assumption of disjointness between any pair of classes. There is the ability to specify this explicitly, however. The Protégé screen for doing this is shown in Figure 3.
In the lower right corner is a section on "Disjoints". You begin by selecting a class in the explorer tree (in this case geopolitical_area), and pressing the second button from the left in the "Disjoint" box. This adds it to the list shown. You can simply add other classes using the same button, or you can press the third button from the left to select "all siblings". It then asks if you want all the siblings of this sub-type to be disjoint from each other or only disjoint from this class. Answer the question, and all the siblings are added (in this case, natural_area, other_geographic_location, and surveyed_area). The disjoint constraint is then added to the XML for the class definition. For example:
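A minimal sketch of such a class definition, using OWL's disjointWith element and the sibling classes named in the paragraph above, might read as follows. The example.org base URI is a placeholder assumed for illustration, and the exact layout Protégé writes may differ.

```xml
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://www.example.org/Geography.owl">
  <!-- xml:base above is a placeholder URI assumed for this sketch -->
  <owl:Class rdf:ID="geopolitical_area">
    <rdfs:subClassOf rdf:resource="#geographic_location"/>
    <!-- One disjointWith element per sibling class -->
    <owl:disjointWith rdf:resource="#natural_area"/>
    <owl:disjointWith rdf:resource="#surveyed_area"/>
    <owl:disjointWith rdf:resource="#other_geographic_location"/>
  </owl:Class>
</rdf:RDF>
```

If the "all siblings disjoint from each other" option is chosen, each sibling class would carry a corresponding set of disjointWith elements as well.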
Datatype Properties (Attributes)
The next step is to add attributes to the model. Back in Figure 1, attributes were shown for geographic location ("ID", "Name", "Description"), country ("Telephone Country Code"), and postal area ("Postal (ZIP) Code"). In OWL, attributes are called datatype properties, and they are defined independently of any classes they might belong to. Note, however, that in our example, "Name" is a reserved word, so it cannot be a datatype property. For this reason, we will have to use "Geographic_Location_Name", which is not a bad choice. Also, parentheses are not allowed, so we will simply call it "Postal Code", and annotate in the comment that in the United States it's called the "ZIP" code. In Protégé, there is a separate tab for defining properties, and a portion of it is shown in Figure 4. Note that in addition to the properties we have added, there is a set of properties that Protégé makes available by default ("broader", "CUI", etc.). These are discussed in the advanced course.
Figure 4: Defining Properties
Datatype properties show up in XML, looking like this:
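As an illustrative sketch, two of the datatype properties could be declared as shown below. Several details are assumptions made for this example: the example.org base URI is a placeholder, the string range is a guess at a sensible datatype, and "Postal Code" is written as Postal_Code because an XML identifier cannot contain a space.

```xml
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://www.example.org/Geography.owl">
  <!-- xml:base above is a placeholder URI assumed for this sketch -->
  <owl:DatatypeProperty rdf:ID="Geographic_Location_Name">
    <!-- Assumption: names are plain strings -->
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="Postal_Code">
    <rdfs:comment>In the United States this is called the "ZIP" code.</rdfs:comment>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
</rdf:RDF>
```

Note that neither property mentions a class at this point; the association with classes is made separately.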
Figure 5 shows Protégé back on the Classes tab, where properties are assigned to individual classes. This is done in the lower right corner, in the "Properties" box. The third button from the left allows you to assign classes as "domains" for the properties you previously defined on the other tab. This in turn updates each property definition in the XML, adding a line identifying a domain for the property:
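A hedged sketch of a property definition after a domain has been assigned is shown here. The example.org base URI and the string range are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://www.example.org/Geography.owl">
  <!-- xml:base above is a placeholder URI assumed for this sketch -->
  <owl:DatatypeProperty rdf:ID="Geographic_Location_Name">
    <!-- The domain line names the class the property applies to -->
    <rdfs:domain rdf:resource="#geographic_location"/>
    <!-- Assumption: names are plain strings -->
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
  </owl:DatatypeProperty>
</rdf:RDF>
```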
(Note that the same datatype property may be assigned to more than one domain class.) It is important to understand that the word domain here has a very different meaning from the one we are used to in the relational world. Where, in relational terms, a domain is a set of constraints on an attribute or column, here it refers to the class that a datatype property applies to. Again, there are no constraints on these properties. If you want to add the relational constraint that the property may have only one value per instance of the class, press the fifth button from the left in the "Properties" section, which will bring up a window to allow you to specify the "maximum cardinality" of each property. This causes the symbol "plus or minus" plus "1" to appear under the property. It also means that an entry is added to the "Asserted Conditions" box.
Note that constraints only apply to properties as they are used for classes. Specifically, they show up in the XML for the class definition, not in the property definitions. For example:
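The sketch below shows one way such a constrained class definition can be written in OWL's RDF/XML syntax, limiting Geographic_Location_Name to at most one value per geographic_location. The example.org base URI is a placeholder assumed for illustration, and the datatype chosen for the cardinality value follows the OWL reference rather than any particular tool's output.

```xml
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://www.example.org/Geography.owl">
  <!-- xml:base above is a placeholder URI assumed for this sketch -->
  <owl:Class rdf:ID="geographic_location">
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
    <!-- Second super-type: an anonymous class defined by a restriction -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#Geographic_Location_Name"/>
        <owl:maxCardinality
            rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</owl:maxCardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>
```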
Here the class geographic_location, in addition to being a sub-type of the generic OWL class thing, is also a sub-type of a generic class called restriction. That is, all instances of geographic_location are also instances of things that are governed by a specified set of restrictions (in this case, all things with a maximum cardinality of "1" for the property). This makes sense when you think about it, but it's a little hard to get your brain around it initially. The definition of each restriction involved includes specifying the property being restricted, "onProperty" (in this case, "Geographic_Location_Name"), and the restriction itself, "maxCardinality" (in this case, "1"). Note that bringing up the class postal area so that it can receive the property "Postal Code" shows that it already inherits the properties of geographic location. See Figure 6.
[This seems to be enough for one article. Part Three in the next issue will discuss relationships.]
[1] G. Kemmerling, Philosophical Dictionary. http://www.philosophypages.com/dy/o.htm#onty.
[2] For the nickel tour of enough XML to understand this article, see part one of this series, http://www.tdan.com/i036ht04.htm.
[3] You may wonder why the "Web Ontology Language" has the acronym "OWL". It seems that in Winnie the Pooh, Owl imagines that his name is spelled "WOL", until his friends correct him. Here, the World Wide Web Consortium (W3C) decided simply to start with the correct spelling.
[4] There are in fact three flavors of OWL: OWL Lite, OWL DL, and OWL Full. Describing the distinctions among them is beyond the scope of this paper. We will here be using OWL DL.
[5] Note: You have the choice of downloading the Java run-time environment, if you don't already have it, or not. Also, during installation, be sure to specify "Basic+OWL".
[6] Well, ok, it isn't "followed by" at all. Protégé has a way of completely scrambling class and other definitions. Each definition is complete from lead tag to ending tag, but as units, sequence is not significant, so Protégé does not worry about it. Moreover, even if you laboriously re-sequence things, the next time you open and save the ontology it is scrambled again. I know, this is supposed to be only for computers to read. But people do have to read it to understand and evaluate it. For purposes of this exposition, I have re-sequenced things.
David C. Hay - In the information industry since the days of punched cards, paper tape and teletype machines, Dave has been producing data models to support strategic and requirements planning for more than twenty years. He has worked in a variety of industries, including, among others, banking, clinical pharmaceutical research, and all aspects of oil production and processing.
He is the founder and President of Essential Strategies, Inc., a fourteen-year-old consulting firm dedicated to helping clients define corporate information architecture, identify requirements, and plan strategies for the implementation of new systems. Dave is the author of the books Data Model Patterns: Conventions of Thought and Requirements Analysis: From Business Views to Architecture. His new book, Data Model Patterns: A Metadata Map, is a comprehensive schema of metadata from many different perspectives. He has also spoken at numerous international and local DAMA conferences, Oracle user group conferences, and many others.
He can be reached at dch@essentialstrategies.com, (713) 464-8316, or via his company's website at http://www.essentialstrategies.com. |