Data management’s history as the blending of business management and information technology makes it an unlikely candidate to have anything to do with the worlds of linguistics and philosophy. In recent years, however, companies and their systems have become so complex that the task of retrieving coherent information from various parts of an enterprise has become challenging, to say the least. The time has come to examine other disciplines for help. A particularly noteworthy issue is confusion in semantics—the fact that different parts of an organization often use terms in inconsistent ways. Thus, pulling up a coherent view of an organization has become progressively more difficult.
The time has come to address semantics straight on. Bring in the linguists and the philosophers!
The Semantic Web
Tim Berners-Lee invented the World Wide Web in the early 1990s.1 He was pleased enough with its phenomenal success, but he was still dissatisfied: The Web allows anyone to retrieve “pages” from anywhere in the world, and the pages can be tagged to put them into categories so that they are easier to find, but the computer cannot see inside the pages and make use of their contents. As an alternative, he imagined a worldwide “database” in which the contents of all these pages would be directly searchable. The idea was that the computer network would be able to deal with the “semantics” of the pages. He was looking for the facilities to manage a global ontology.
The group that he and his colleagues formed to manage the World Wide Web was called the World Wide Web Consortium (or W3C). It expanded its charter to include the creation of a framework for expressing ontologies, and statements about objects consistent with these ontologies. This framework is called the Semantic Web. Note that the Semantic Web is but a mechanism for expressing ontologies; the W3C does not itself create the ontologies. Various other groups are creating both core ontologies and domain-specific ontologies in various application areas. (Your humble author hopes one day to convert the models in Enterprise Model Patterns: Describing the World2 into a Semantic Web ontology.)
“The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, where the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases that are connected not by wires but by being about the same thing.”3
As with conceptual entity/relationship diagrams, the techniques used to describe and manage the Semantic Web also are methods for representing the world. To an entity/relationship modeler, however, they are very strange, and take some getting used to. They are both more expressive and less expressive than entity/relationship diagrams.
There are three components that form the basis for the Semantic Web:
- First, every term of interest is given a named home in the cybernetic world. Each describes a resource. It has an identified place (a “uniform resource identifier” or “URI”) on the World Wide Web.
- Second, all terms are presented in what are called “triples”—simple sentences in the form of < subject > < predicate > < object >. This is a form analogous to the form used to name relationships in the Barker/Ellis approach to entity/relationship modeling.
- Third, a succession of languages adds specific terms for describing the semantics of these terms.
- The Resource Description Framework (RDF) sets the basis for defining words in terms of the web and structuring them in the form of triples (the first and second points above). Its contribution to an ontological language is a term for one thing’s being “an example of” another thing. That is, the second thing is the type of the first thing.
- RDF Schema then adds the concept of class, explicitly distinguishing between classes (or types) of things and instances of those classes.
- Web Ontology Language (OWL) adds, among other things, the ability to recognize that properties of a class may be either attributes of that class or relationships with other classes. Other predicates expand the ability to generate inferences.
Resource Description Framework (RDF)4
The Resource Description Framework (RDF) is the foundation of the Semantic Web. RDF is built on the premise that any statement can ultimately be represented as a “triple”, in the form:
< subject > < predicate > < object >
Note that in its purest form, there is no difference between the way instances and classes are handled, although both subjects and predicates must be identified resources. The object can be either an identified resource, or a literal.
For example, both of these could be legitimate RDF statements:
- esi:Stephen esi:ownerOf auto:TeslaRoadster.5 (instances)
- esi:Person esi:ownerOf esi:Automobile. (classes)
(The prefixes “esi:” and “auto:” are described further below.)
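To make the shape of these statements concrete, here is a minimal Python sketch (not part of any RDF toolkit) that stores each statement as a plain (subject, predicate, object) tuple. The names are the abbreviated forms used above, kept as strings; a real RDF library would use full URIs and would distinguish resources from literals with dedicated types.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: <subject> <predicate> <object>."""
    subject: str
    predicate: str
    obj: str  # a resource name here; in real RDF the object may also be a literal


# Instance-level and class-level statements have exactly the same shape.
statements = {
    Triple("esi:Stephen", "esi:ownerOf", "auto:TeslaRoadster"),  # instances
    Triple("esi:Person", "esi:ownerOf", "esi:Automobile"),       # classes
}


def objects_of(graph, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}


print(objects_of(statements, "esi:Stephen", "esi:ownerOf"))
```

Nothing in the tuple itself says whether a name denotes an individual or a class; that distinction has to come from additional vocabulary.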
In one nod to technology, resource names have no spaces. They use “camel case”: for subjects and objects, every word is capitalized; for predicates, only the second and subsequent words are capitalized.
There is one profound characteristic of the Semantic Web: every term displayed constitutes a resource, directly available on the World Wide Web. Specifically, the name of the resource includes the location of its definition in an ontology that is a site on the Web. This guarantees a single definition for every term used (its resource definition). It does this by associating every term with a “Uniform Resource Identifier” (URI).6 The URI is built on the Uniform Resource Locator (URL) that identifies the site where the term is defined: typically, it is that URL extended with the name of the term.
As an example, we could imagine that somewhere in cyberspace, there lurks an ontology with the URL http://esi.literal.vocabulary.com/terms. This website would have paragraphs to define “Stephen”, “automobile”, and “ownerOf”. Another site, http://autonames.com might be the place to go for a definition of “TeslaRoadster.” To refer to Stephen, then, with his full semantic web name, you would say http://esi.literal.vocabulary.com/terms#Stephen. The assertions made above would look like this:
This scheme of URIs then allows the three terms in any triple to be from different ontologies (anywhere in the world). It is the responsibility of the modeler to deal with the fact that there may be subtle differences between the definition of “widget” in one ontology and its definition in another ontology, but the ability to choose between them explicitly is incredibly powerful.
Because the structures shown above are very difficult to read, the URI can be abbreviated into what is called a qname or qualified name. In this case, we replace the namespace of the URIs, e.g., http://esi.literal.vocabulary.com/terms#, by a short prefix “esi”. That means that a term such as http://esi.literal.vocabulary.com/terms#Stephen can be represented simply as “esi:Stephen”. Similarly, http://autonames.com could be abbreviated “auto”. This yields “auto:TeslaRoadster”. Thus, the full RDF representation of the sentences above would be:
- esi:Stephen esi:ownerOf auto:TeslaRoadster.
- esi:Person esi:ownerOf esi:Automobile.
Note that both the subject and the predicate must be identifiable resources. The object may be either a resource, or it could simply be a literal, like “April 9, 2007.”
Note that a language can define certain expressions that make triples more expressive. The Resource Description Framework (RDF) is such a language. For example:
- rdf:type – means “is an example of”, or, more specifically, “is of type”
- auto:TeslaRoadster rdf:type esi:Automobile.
Each of these websites, with its collection of unduplicated terms, is called in the trade a namespace. For purposes of this article, we can assume that http://esi.literal.vocabulary.com/terms# is the default namespace, which allows us to abbreviate “esi:” further to a bare colon. Thus, we have:
- :Stephen rdf:type :Person.
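The abbreviation rules can be sketched in a few lines of Python. The prefix table is an assumption for illustration: it maps “esi” and the default (empty) prefix to http://esi.literal.vocabulary.com/terms#, following the example above, and maps “auto” to a hypothetical http://autonames.com# namespace, since the text does not say exactly how that site separates its terms. The “rdf” entry is the actual RDF namespace.

```python
# Hypothetical prefix table; only the "rdf" namespace is the real one.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "esi": "http://esi.literal.vocabulary.com/terms#",
    "": "http://esi.literal.vocabulary.com/terms#",  # default namespace, written ":Stephen"
    "auto": "http://autonames.com#",                 # assumed "#" separator
}


def expand(qname: str, prefixes=PREFIXES) -> str:
    """Turn a qname such as 'esi:Stephen' or ':Stephen' into a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"not a qname with a known prefix: {qname!r}")
    return prefixes[prefix] + local


for name in (":Stephen", "rdf:type", ":Person"):
    print(name, "->", expand(name))
```

Because “esi” and the empty prefix share a namespace, “esi:Stephen” and “:Stephen” expand to the same URI, which is the whole point of the default namespace.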
You could, for example, say:
- :Stephen rdf:type :Charlie
It doesn’t make any sense, but you could say it.
It doesn’t make sense because of the implication in the sentences above that the object of the predicate “rdf:type” was in fact a class, not an individual. The problem is that RDF by itself has no ability to identify classes of things.
RDF Schema, an extension to the Resource Description Framework, among other things adds terms that allow us to define classes explicitly, such as rdfs:Class and rdfs:subClassOf.
This in turn allows us to specify, for example:
- :Person rdf:type rdfs:Class
- :Automobile rdf:type rdfs:Class
- :AutomobileBrand rdf:type rdfs:Class
Note that in entity/relationship notations, what is here called “class” is called “entity type”.
That is, “:Person” is an example of (of type) “rdfs:Class”, as are “:Automobile” and “:AutomobileBrand”.
Moreover, we can now define:
- :SportsCar rdf:type rdfs:Class
- :SportsCar rdfs:subClassOf :Automobile
This is how we represent the “sub-type” that was in the entity/relationship diagram above. In addition, we can assert that:
- :TeslaRoadster rdf:type :SportsCar.
…which allows us to infer
- :TeslaRoadster rdf:type :Automobile.
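The inference from :SportsCar to :Automobile follows the RDF Schema rule that membership in a sub-class implies membership in its super-class. The Python sketch below applies that rule repeatedly until nothing new appears, which also covers chains of sub-classes. It is a toy forward-chainer over string tuples, assumed for illustration, not a complete RDFS reasoner.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(triples):
    """Repeatedly add (x rdf:type Super) for each (x rdf:type Sub) and (Sub rdfs:subClassOf Super)."""
    graph = set(triples)
    while True:
        new = {
            (x, RDF_TYPE, sup)
            for (x, p1, sub) in graph if p1 == RDF_TYPE
            for (s, p2, sup) in graph if p2 == SUBCLASS_OF and s == sub
        } - graph
        if not new:  # nothing left to infer
            return graph
        graph |= new


graph = infer_types({
    (":SportsCar", RDF_TYPE, "rdfs:Class"),
    (":SportsCar", SUBCLASS_OF, ":Automobile"),
    (":TeslaRoadster", RDF_TYPE, ":SportsCar"),
})
print((":TeslaRoadster", RDF_TYPE, ":Automobile") in graph)  # True
```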
RDF Schema also extends RDF to allow more explicit manipulation of properties (the predicates of triples).
Among other things, we can now say that a property has one or more sub-properties.
For example, suppose we use the properties “:forTheExhilarationOf” and “:exhilaratedBy” in the statements:
- :SportsCar :forTheExhilarationOf :Person.
- :Person :exhilaratedBy :SportsCar.8
…then we can also assert
- :exhilaratedBy rdfs:subPropertyOf :ownerOf.
That is, anyone who is exhilarated by a particular sports car must also be the owner of that sports car. Note that in Figure 1, above, the notation did not allow us to assert that. The fact that relationships can have sub-types is a concept not treated at all in most entity/relationship notations (although it is treated in UML, of all places).
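The sub-property rule works for predicates the way the sub-class rule works for classes: whenever a triple uses :exhilaratedBy, RDF Schema lets us infer the same triple with :ownerOf. A minimal Python sketch, with the same toy string-tuple representation assumed throughout:

```python
SUBPROPERTY_OF = "rdfs:subPropertyOf"


def infer_superproperties(triples):
    """Repeatedly add (s Super o) for each (s Sub o) and (Sub rdfs:subPropertyOf Super)."""
    graph = set(triples)
    while True:
        supers = {(sub, sup) for (sub, p, sup) in graph if p == SUBPROPERTY_OF}
        new = {(s, sup, o) for (s, p, o) in graph for (sub, sup) in supers if p == sub} - graph
        if not new:
            return graph
        graph |= new


graph = infer_superproperties({
    (":exhilaratedBy", SUBPROPERTY_OF, ":ownerOf"),
    (":Stephen", ":exhilaratedBy", ":TeslaRoadster"),
})
print((":Stephen", ":ownerOf", ":TeslaRoadster") in graph)  # True
```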
A second feature of RDF Schema is that it provides a way to identify and constrain the subject and object as playing specific roles relative to a predicate.
Specifically, the subject can be defined to be the domain of a property (predicate), and the object can be defined to be the range of the property. That is:
- :forTheExhilarationOf rdfs:domain :SportsCar
- :forTheExhilarationOf rdfs:range :Person.
This means that the predicate “forTheExhilarationOf” can only have instances of :SportsCar as its subject, and it can only have instances of :Person as its object.
In other words, all usages of the expression “for the exhilaration of” must be about people and sports cars. If you observe that :TeslaRoadster is :forTheExhilarationOf :Stephen, then :TeslaRoadster must be a :SportsCar, and :Stephen must be a :Person. Alternatively, if :Dingbat is :forTheExhilarationOf :Stephen, then :Dingbat must be an example of a :SportsCar.
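The :Dingbat example can be reproduced with a small Python sketch of the domain and range rules (again a toy over string tuples, not a real reasoner). Note that nothing is rejected: the rules draw conclusions about the subject and object rather than validating them.

```python
def infer_from_domain_range(triples):
    """(p rdfs:domain C) types every subject of p as C; (p rdfs:range C) types every object."""
    graph = set(triples)
    domains = {(p, c) for (p, rel, c) in graph if rel == "rdfs:domain"}
    ranges = {(p, c) for (p, rel, c) in graph if rel == "rdfs:range"}
    inferred = set()
    # One pass is enough here: the only new triples use rdf:type,
    # which has no domain or range declared in this toy graph.
    for (s, p, o) in graph:
        inferred |= {(s, "rdf:type", c) for (prop, c) in domains if prop == p}
        inferred |= {(o, "rdf:type", c) for (prop, c) in ranges if prop == p}
    return graph | inferred


graph = infer_from_domain_range({
    (":forTheExhilarationOf", "rdfs:domain", ":SportsCar"),
    (":forTheExhilarationOf", "rdfs:range", ":Person"),
    (":Dingbat", ":forTheExhilarationOf", ":Stephen"),
})
print((":Dingbat", "rdf:type", ":SportsCar") in graph)  # True
```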
Thus begins the facility to draw inferences. We’ll be able to do more when we get to more specific kinds of properties in the Web Ontology Language.
Web Ontology Language (OWL)9 10
We’ve already seen some semantic elements from conceptual entity/relationship modeling in RDF and RDFS. Specifically:
- An entity type is a class.
- A sub-type is a sub-class.
- A relationship direction is a property.
- An attribute is a property.
In order to fully represent our entity/relationship model in the Semantic Web, however, we need some features from the Web Ontology Language (OWL).
Properties – DatatypeProperty
Merriam-Webster has two definitions of “property”:
- “a: a quality or trait belonging and especially peculiar to an individual or thing.” (That is, an “attribute” to us data modelers.)
- “b: an effect that an object has on another object or on the senses.” (That is, a “relationship” to another object, to us data modelers.)
OWL is where we encounter the specific kinds of properties that represent attributes and relationships. That is, DatatypeProperty (corresponding to definition “a”, above) is a specific kind of property for describing an attribute.
ObjectProperty, on the other hand (corresponding to definition “b”, above), is a specific kind of property for describing a relationship. When combined with rdfs:domain and rdfs:range, described above, OWL provides a very specific way of dealing with entity/relationship attributes and relationships, with a twist.
You see, the OWL modeler looks at both of these “properties” of a subject class in very different ways from the E/R modeler.
First of all, the entity/relationship modeler tends to begin with entity classes, and then looks to find their attributes and relationships. Among other things, this means the same attribute and/or relationship name can show up more than once. For example, each time you encounter the need for an attribute “name”, you can add it.
In the semantic world, on the other hand, a property is defined first, in terms of the thing it is a property of. That is, you may start by specifying a DatatypeProperty (attribute) like “name,” and then you look to see what class it is a name of. There can only be one. (Declaring a second domain would not offer an alternative; it would mean that every subject of the property belongs to both classes.) Similarly, you can have an ObjectProperty like “anExampleOf”, and then ask what classes are linked by that property. There can only be one set.
So, from our entity/relationship model (repeated below as Figure 1), let’s take note of the fact that the entity type Automobile has the attribute “VIN” (Vehicle Identification Number). To implement this in OWL, you can assert:
- :Automobile rdf:type rdfs:Class.
- :VIN rdf:type owl:DatatypeProperty;
  rdfs:domain :Automobile;
  rdfs:range xsd:string.
For an owl:DatatypeProperty, the domain is the class the attribute is of, and the range describes a data type for the attribute (in this case, “string”). (Standard data types are described in the xsd: namespace.)
By using this approach, you are asserting that it is the nature of the DatatypeProperty :VIN to be an attribute of :Automobile. Among other things, this means that, should you find a statement giving :Cleveland a :VIN (that is, a triple with :Cleveland as its subject and :VIN as its predicate)…
…then you can infer that “:Cleveland” must be an instance of the class :Automobile, since, by definition, :VIN cannot be an attribute of anything else.
Note that in the E/R model that is Figure 1, the attribute “Year” is in both Automobile and Automobile Brand. In OWL this would not work as intended: giving a single :Year property both classes as its domain would imply that anything with a year is both an :Automobile and an :AutomobileBrand. In fact, in this case, it would probably be better to explicitly make “Year acquired” an attribute of Automobile and “Model Year” an attribute of Automobile Brand. Thus:
- :YearAcquired rdf:type owl:DatatypeProperty;
  rdfs:domain :Automobile.
What this means is that any time :YearAcquired is used as a predicate, its subject must be an :Automobile.
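To see why a shared “Year” attribute causes trouble, the Python sketch below applies the same domain rule to two versions of the model. The individual :Cleveland comes from the text; the year value is a hypothetical placeholder. With one :Year property carrying two domains, the rule concludes that :Cleveland is both an :Automobile and an :AutomobileBrand; with separate properties, each class keeps its own attribute.

```python
def types_from_domains(triples):
    """Apply the rdfs:domain rule once: (p rdfs:domain C) and (s p o) => (s rdf:type C)."""
    domains = {(p, c) for (p, rel, c) in triples if rel == "rdfs:domain"}
    return {(s, "rdf:type", c) for (s, p, _o) in triples for (prop, c) in domains if prop == p}


# E/R style: one shared :Year property declared for both classes.
shared = {
    (":Year", "rdfs:domain", ":Automobile"),
    (":Year", "rdfs:domain", ":AutomobileBrand"),
    (":Cleveland", ":Year", "2010"),  # hypothetical value
}

# Separate properties, one per class.
separate = {
    (":YearAcquired", "rdfs:domain", ":Automobile"),
    (":ModelYear", "rdfs:domain", ":AutomobileBrand"),
    (":Cleveland", ":YearAcquired", "2010"),  # hypothetical value
}

print(sorted(types_from_domains(shared)))
print(sorted(types_from_domains(separate)))
```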
Properties – ObjectProperties
There are three relationships shown in the entity/relationship model in Figure 1, each with two properties. In OWL, each of these (in each direction) is an owl:ObjectProperty. The first (“each Automobile may be owned by one and only one Person”) is represented by the triples:
- :ownedBy rdf:type owl:ObjectProperty;
  rdfs:domain :Automobile;
  rdfs:range :Person.
That is, “:ownedBy” is an example of an “owl:ObjectProperty” that links :Automobile with :Person.
The property going in the opposite direction could be described by a similar structure. But there is an alternative: the OWL property owl:inverseOf disposes of all that verbiage:
- :ownerOf owl:inverseOf :ownedBy.
Among other things, from this triple, you can infer:
- :ownerOf rdf:type owl:ObjectProperty;
  rdfs:domain :Person;
  rdfs:range :Automobile.
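The effect of owl:inverseOf can be sketched in Python as well. Under the same toy representation (an assumption, not a real OWL reasoner), the function swaps domain and range between a property and its inverse and flips every instance-level statement; the individuals :Cleveland and :Stephen are reused from the text for illustration. It does not attempt the other inferences OWL allows, such as typing :ownerOf as an owl:ObjectProperty.

```python
def infer_inverses(triples):
    """Apply owl:inverseOf: swap domain/range between inverses and flip instance statements."""
    graph = set(triples)
    pairs = {(p, q) for (p, rel, q) in graph if rel == "owl:inverseOf"}
    pairs |= {(q, p) for (p, q) in pairs}  # owl:inverseOf works in both directions
    new = set()
    for (p, q) in pairs:
        for (s, rel, o) in graph:
            if s == q and rel == "rdfs:range":
                new.add((p, "rdfs:domain", o))  # range of the inverse = domain
            elif s == q and rel == "rdfs:domain":
                new.add((p, "rdfs:range", o))   # domain of the inverse = range
            elif rel == p:
                new.add((o, q, s))              # (s p o) => (o q s)
    return graph | new


graph = infer_inverses({
    (":ownedBy", "rdf:type", "owl:ObjectProperty"),
    (":ownedBy", "rdfs:domain", ":Automobile"),
    (":ownedBy", "rdfs:range", ":Person"),
    (":ownerOf", "owl:inverseOf", ":ownedBy"),
    (":Cleveland", ":ownedBy", ":Stephen"),
})
print((":Stephen", ":ownerOf", ":Cleveland") in graph)  # True
```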
The relationship between Automobile and Automobile Brand can similarly be represented by:
- :anExampleOf rdf:type owl:ObjectProperty;
  rdfs:domain :Automobile;
  rdfs:range :AutomobileBrand.
- :embodiedIn owl:inverseOf :anExampleOf.
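Note that in this case, the relationship name “anExampleOf” can only be about an Automobile and an Automobile Brand. This is because a domain or range declaration applies to every use of the property. You cannot also use :anExampleOf to say that a :ProductInstance is an example of a :ProductType: a reasoner would then conclude that the product instance is an :Automobile and the product type is an :AutomobileBrand.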
For entity/relationship modelers, this is a serious problem.
The idea that attribute and relationship names should be so restricted comes hard to an entity/relationship modeler. (Especially one who uses model patterns a lot.) As it happens there is a solution. Warning: The solution will really bend your brain. Yes, it’s time to learn something new.
A full explanation is beyond the scope of this article. The short version is that, in order to reuse a property name, you do not give the property a global domain and range. Instead, you identify the class of all things whose values for that property come from a particular class (an OWL restriction), and then you make the entity type in question a sub-class of that restriction. Thus, for example, you can assert that “:Automobile” is rdfs:subClassOf the class of all the things in the universe whose :anExampleOf values all come from (owl:allValuesFrom) the class “:AutomobileBrand”. Similarly, “:ProductInstance” is rdfs:subClassOf the class of all things in the universe whose “:anExampleOf” values all come from the class “:ProductType”.
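The effect of such a restriction can be approximated in Python. In this simplified sketch (one class per individual, one restriction per class, no sub-class chains), each restriction records a property and the class its values must come from; the individuals :SomeBrand, :Widget42, and :WidgetModel are hypothetical. Knowing that an individual belongs to a class that is a sub-class of the restriction lets us infer the type of every value of that property, so :anExampleOf can be shared by :Automobile and :ProductInstance without either one contaminating the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllValuesFrom:
    """A simplified owl:Restriction: every value of on_property comes from values_class."""
    on_property: str
    values_class: str


# Each entity type is a sub-class of a different restriction on the same property.
subclass_of = {
    ":Automobile": AllValuesFrom(":anExampleOf", ":AutomobileBrand"),
    ":ProductInstance": AllValuesFrom(":anExampleOf", ":ProductType"),
}


def infer_value_types(facts, types):
    """Given (s, p, o) facts and an {individual: class} map, infer the types of property values."""
    inferred = set()
    for (s, p, o) in facts:
        restriction = subclass_of.get(types.get(s))
        if restriction is not None and restriction.on_property == p:
            inferred.add((o, "rdf:type", restriction.values_class))
    return inferred


facts = {
    (":Cleveland", ":anExampleOf", ":SomeBrand"),
    (":Widget42", ":anExampleOf", ":WidgetModel"),
}
types = {":Cleveland": ":Automobile", ":Widget42": ":ProductInstance"}
print(sorted(infer_value_types(facts, types)))
```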
Yes, that’s for another article.
There is nothing in any entity/relationship diagramming notation that precludes using an ontological approach to naming relationships. Information Engineering, UML, and even IDEF1X can all adopt the discipline developed by Harry Ellis and Richard Barker for naming relationships. The objective of this approach, however, is not to design a database but to describe the world as it exists. The frame of mind of people using notations for database design often precludes them from taking this ontological approach. In the context of database design or object-oriented programming, such ontological considerations are unnecessary.
To create models that can be the basis for a corporate ontology, however, requires a different frame of mind. (Ontology, after all, was the branch of ancient Greek philosophy that asked the question: what exists?) Moreover, if you have that frame of mind and are prepared to create such an ontology, you will be even better equipped to move into the brave new world of the Semantic Web.12 13
Creating an ontological information model is but a first step. To delve all the way into the Semantic Web’s approach to modeling requires yet another, even more radical, adjustment to the way you think about the “things that exist.”
Note that the Semantic Web is but a mechanism for creating and publishing ontologies. The ontologies themselves will be created by others.
As Yogi Berra might say, “The future just ain’t what it used to be…”
- Tim Berners-Lee. 2000. Weaving the Web (New York: Harper Collins).
- David C. Hay. 2011. Enterprise Model Patterns: Describing the World. (New Jersey: Technics Publications, Inc.)
- W3C. Semantic Web Activity: What is the Semantic Web. Retrieved from http://www.w3.org/2001/sw July 19, 2012.
- W3C. 2004. RDF Semantics: W3C Recommendation 10 February 2004. Retrieved from: http://www.w3.org/TR/2004/REC-rdf-mt-20040210/ July 19, 2012.
- Note that in RDF, spaces in phrases are eliminated. This approach has come to be called “camel case notation” and means that second and subsequent words are capitalized. Both class names and individual names start with an initial capital as well, while property names do not. (See below for the definition of “property”.)
- Originally, this is a generalization of the World Wide Web’s navigation system using the “Uniform Resource Locator” (URL). One problem with this is that it can only deal with a limited subset of ASCII characters. For this reason, the “Internationalized Resource Identifier” (IRI), based on the much larger Unicode character set, will be used more in the future. That will deal with Chinese, Arabic, and Polish.
- W3C. 2004. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation 10 February 2004. (Section 4). Retrieved from: http://www.w3.org/TR/rdf-schema/ on July 19, 2012.
- Be aware that cardinality constraints can be represented when we get to OWL. They are beyond the scope of this article, however.
- W3C. 2004. OWL Web Ontology Language Reference: W3C Recommendation 10 February 2004. Retrieved from http://www.w3.org/TR/2004/REC-owl-ref-20040210/, July 19, 2012.
- The acronym is not derived from the French or Polish or some other language’s rendering of “Web Ontology Language”. Tradition has it that it is from Winnie the Pooh. It seems Owl got in trouble for misspelling his name as “WOL”. So, the W3C wanted to be sure they weren’t held up for the same kinds of criticism. Well, that’s what they say, anyway…
- A note about punctuation: Here you have three “triples,” but each has “:VIN” as the subject. The semicolons at the end of the first and second lines each indicate that the subject will be reused in the subsequent line. Otherwise, all triples end with a period (.).
- The definitive text on the subject is Dean Allemang and Jim Hendler. 2011. Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL, Second Edition. (Boston: Morgan Kaufmann).
- For an earlier and lengthier description of these and related issues, see:
  - David C. Hay. 2006. “Data Modeling, RDF, & OWL – Part One: An Introduction To Ontologies”, The Data Administration Newsletter. http://tdan.com/view-articles/5025. April 1, 2006.
  - David C. Hay. 2006. “Data Modeling, RDF, & OWL – Part Two: Converting Data Model Entity Classes and Attributes to OWL”, The Data Administration Newsletter. http://tdan.com/view-articles/5001. July 1, 2006.
  - David C. Hay. 2006. “Data Modeling, RDF, & OWL – Part Three: Converting Data Model Relationships to OWL”, The Data Administration Newsletter. http://tdan.com/view-articles/4594. October 1, 2006.