Why Think About Theory?
Business users of information management systems often regard semantics as an abstract science with little bearing on their day-to-day problems. Yet applying research from
the academic study of meaning can yield metadata entity names and definitions that convey their meaning to users immediately. Meaningful names can
increase efficiency and reduce the effort needed to transfer information through and beyond the enterprise.
Terminology research applies linguistic theory to classifying things in the real world. An introduction to some basic terminology principles will help explain how naming conventions can be
formed and then applied to develop well-formed names.
Semantics and Names
Naming of data entities (data elements, value domains, attributes, etc.) in a rational and organized way is an integral part of the metadata management of an organization. Artificial intelligence
researchers try to develop languages that have restricted vocabularies, rules and constraints so that their meanings may be easily interpreted by both machine and human intelligence. These are
called controlled languages. Naming conventions are sets of such rules applied to names. Natural language names have evolved under many influences, so any particular name may or
may not describe the thing named. The goal for metadata naming, by contrast, is maximum clarity and transparency of meaning, combined with concision, with minimal effort of interpretation by the end user.
Concept systems consist of sets of concepts ordered according to the relationships among them [ISO 108]. These can be as simple as ordered lists or keywords, or as complicated as taxonomies and
ontologies. A data model is the most common example of a concept system used by data managers. It is the primary source of the components used to form rational data entity names.
A concept is a unit of knowledge created by a unique combination of characteristics [ISO 108]. There are two types of concept:
- A general concept corresponds to two or more objects that form a group by reason of common properties;
- An individual concept corresponds to only one object.
Relationships among concepts in concept systems provide clues for structuring names, and these clues may then be codified in naming conventions. Some of the relationships defined in ISO 1087-1 are:
Hierarchical relation – a relation between two concepts, which is either generic or partitive.
- Generic: the definition of one concept includes that of the other and at least one additional distinguishing characteristic (also known as an IS-A relationship – e.g., an employee is a
- Partitive: one of the concepts constitutes the whole and the other a part of that whole (also known as a PART-OF relationship – e.g., a street name is part of a mailing address).
Associative relation – a relation between two concepts having a non-hierarchical thematic connection by virtue of experience – e.g., “cost” and “amount.”
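As a rough illustration of how the relations above might be recorded, here is a minimal Python sketch. The class and method names (`Relation`, `ConceptSystem`, `relate`, `related`) are hypothetical and come from neither ISO 1087-1 nor any existing library. The "Contract Employee" example is invented for illustration; the other two examples are taken from the text.

```python
from dataclasses import dataclass, field
from enum import Enum


class Relation(Enum):
    GENERIC = "IS-A"            # hierarchical: narrower concept IS-A broader concept
    PARTITIVE = "PART-OF"       # hierarchical: a part is PART-OF a whole
    ASSOCIATIVE = "ASSOCIATED"  # non-hierarchical thematic connection

    @property
    def is_hierarchical(self) -> bool:
        return self in (Relation.GENERIC, Relation.PARTITIVE)


@dataclass
class ConceptSystem:
    # Each entry is a (source concept, relation, target concept) triple.
    relations: list = field(default_factory=list)

    def relate(self, source: str, relation: Relation, target: str) -> None:
        self.relations.append((source, relation, target))

    def related(self, source: str, relation: Relation) -> list:
        """Return the concepts that `source` points to through `relation`."""
        return [t for s, r, t in self.relations if s == source and r is relation]


system = ConceptSystem()
system.relate("Street Name", Relation.PARTITIVE, "Mailing Address")  # from the text
system.relate("Cost", Relation.ASSOCIATIVE, "Amount")                # from the text
system.relate("Contract Employee", Relation.GENERIC, "Employee")     # hypothetical

print(system.related("Street Name", Relation.PARTITIVE))
```

Storing relations as explicit triples keeps the relation type available later, when a naming convention needs to decide, for example, whether a whole should precede its part in a name.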
A designation is the representation of a concept by a sign which denotes it [ISO 108]. Two ways to categorize a designation are shown in Figure 1:
Designation by kind. This designation sub-type consists of three entities that can be used in a process to develop well-formed names:
- A term is a verbal designation of a general concept in a specific subject field (“Employee”).
- An appellation is a verbal designation of an individual concept (“French”).
- A symbol is a visual representation of a concept (“$”).
The three parts of designation by kind can serve as building blocks, guiding the development of semantic rules for constructing names that convey meaning to human users as part of a naming
convention. Together with rules governing the relationships among the components and the appearance of the names, they can be used to form names that express information
about the data in a grammar that is simpler than natural language but still understandable. Ideally, the names resemble summaries of the formal definition of the information being
Designation by intended use. Designators of this sub-type may not consist of names that are meant to convey meaning to a human user. Their primary use is to identify, locate or
refer to a piece of data for use by software or other automated service. As such, they may be cryptic or unintelligible to a naïve user.
Figure 1: Designation and Its Components
This structure can be adapted to the process of developing metadata entity names. In this article, the term name refers to any result of the application of a process involving the
three parts of designation by kind.
When naming classes of objects, terms for general concepts are preferred. Appellations and symbols are used as part of a name in combination with one or more terms, when a name
contains more than just a term. Appellations may be used to name individual concepts. The use of symbols as sole name components should be avoided.
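The guidance in the preceding paragraph can be expressed as a small check. The sketch below is only one possible reading of those rules. The names `Kind`, `Component`, and `check_name` are hypothetical, and so is the assumption that a name for a class of objects must contain at least one term.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    TERM = "term"                # verbal designation of a general concept
    APPELLATION = "appellation"  # verbal designation of an individual concept
    SYMBOL = "symbol"            # visual representation of a concept


@dataclass(frozen=True)
class Component:
    text: str
    kind: Kind


def check_name(components, names_class=True):
    """Return a list of rule violations; an empty list means no problem was found.

    `names_class` marks names intended for classes of objects (an assumed flag).
    """
    if not components:
        return ["a name needs at least one component"]
    problems = []
    kinds = {c.kind for c in components}
    if kinds == {Kind.SYMBOL}:
        problems.append("symbols should not be the sole components of a name")
    if names_class and Kind.TERM not in kinds:
        problems.append("a name for a class of objects should contain a term")
    return problems


good = [Component("Employee", Kind.TERM), Component("French", Kind.APPELLATION)]
bad = [Component("$", Kind.SYMBOL)]

print(check_name(good))
print(check_name(bad))
```

Returning a list of violations rather than raising on the first one lets a registry steward see every problem with a candidate name at once.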
The relationships defined above determine how the components of a name relate to one another, and they inform the semantic and syntactic rules of a naming convention.
Enterprises that have developed a data model have a major tool for developing a rational system of names, because the model provides a firm basis for collecting and organizing metadata. The components of a
traditional data model may be translated into meaningful information, and the semantic information they contain may be collected from anywhere in an enterprise’s area of interest. Names can then be
developed using the components of the model as building blocks of name parts.
Using a model for metadata, such as the conceptual metamodel depicted in Figure 2, users can store metadata about classifying, naming, identifying, defining, and registering information in order to
make it understandable and shareable. Data about sources, usages, and derivation of information can be stored in a readily accessible form. This metamodel is the basis for the registry
described in the standard ISO/IEC 11179, Information technology – Metadata registries [ISO 111].
Using a conceptual metamodel allows relationships among differing representations and value sets of the same information to be mapped together in one place. This is useful, for instance, for
tracing XML objects generated for interchange back to their original usage, and for documenting other usages of that information within an organization. Such tracing information tends to get lost
because XML structures focus on data syntax rather than on semantics or other kinds of metadata. The recorded information can then be used to avoid redundancy and reprocessing of information.
The metamodel components can be used in the development of entity names. A structure is developed in which higher-level component names are used to construct the lower-level names. Relationships
among the components are reflected in the names, contributing to rationalization of name development and understandability.
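To make the idea of building lower-level names from higher-level component names concrete, the following sketch composes names in two steps. It assumes the component types "object class," "property," and "representation" from the ISO/IEC 11179 vocabulary; whether Figure 2 shows exactly these components is an assumption. The helper `compose_name`, the ordering of parts, and the space separator are hypothetical choices rather than rules from the standard.

```python
def compose_name(*components: str, separator: str = " ") -> str:
    """Join non-empty name components in the order given."""
    parts = [c.strip() for c in components if c and c.strip()]
    if not parts:
        raise ValueError("a name needs at least one component")
    return separator.join(parts)


# Higher-level components (illustrative values).
object_class = "Employee"
prop = "Last Name"
representation = "Text"

# The lower-level names reuse the higher-level names as building blocks.
concept_name = compose_name(object_class, prop)
element_name = compose_name(concept_name, representation)

print(concept_name)
print(element_name)
```

Because each lower-level name embeds the name of its parent component, the relationship between the two remains visible in the name itself.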
Figure 2: Conceptual Metamodel
The proliferation of names has many causes. Each application of data has a unique set of requirements and restrictions that constrain the name used in that application. Determining the semantics of
names is part of a broader issue of getting computers to “understand” meaning.
Since a name is a non-unique form of identification for a metadata entity, a unique identifier must also be associated with the registry entry. This is one of several means by which a populated
metamodel can maintain a complete set of metadata, including all names in all contexts of applications in which the entity is used, and all sources and targets of an entity used in data
Polysemy. Except within a controlled namespace, there is no guarantee of name uniqueness. Thus the possibility that two or more different data entities may use the same name must
also be accounted for and controlled in the registry.
Synonymy. In a metadata registry, one name may be designated as the “enterprise name,” derived by describing the content of a metadata entity in a structured way, using a set
of rules, i.e., by application of a formalized naming convention. Other names for the same data entity may occur in any context. For example, these may be:
- Software system names
- Programming language names
- Report header names
- Data interchange (e.g., XML) names
- Names in other natural languages
These names may have been formed and used with varying levels of rigor. The ability to collect and display all names used for any one metadata entity is a major strength of the metadata registry. The
process of deriving names from concept systems and arranging semantic components with a naming convention forms a set of consistent, meaningful enterprise names. Names from other contexts, which
may or may not have been formed with naming conventions and therefore may have little or no semantic content, are collected and related to the enterprise name, thus contributing in a valuable way
to enterprise data management.
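The registry behaviour described above (a unique identifier per entry, one enterprise name, and any number of context-specific names) can be sketched as follows. This is a simplified illustration under stated assumptions, not the ISO/IEC 11179 registry model. The class names, identifiers such as "DE-001", and context labels are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    identifier: str       # unique within the registry
    enterprise_name: str  # formed with the naming convention
    # Maps a context label (e.g. "xml", "report") to the name used there.
    names_by_context: dict = field(default_factory=dict)

    def all_names(self) -> set:
        return {self.enterprise_name, *self.names_by_context.values()}


class Registry:
    def __init__(self):
        self._entries = {}

    def register(self, entry: RegistryEntry) -> None:
        if entry.identifier in self._entries:
            raise ValueError(f"duplicate identifier: {entry.identifier}")
        self._entries[entry.identifier] = entry

    def synonyms(self, identifier: str) -> set:
        """All names recorded for one entity (synonymy)."""
        return self._entries[identifier].all_names()

    def shared_names(self) -> dict:
        """Names used by more than one entity (the polysemy case)."""
        owners = defaultdict(set)
        for entry in self._entries.values():
            for name in entry.all_names():
                owners[name].add(entry.identifier)
        return {name: ids for name, ids in owners.items() if len(ids) > 1}


registry = Registry()
registry.register(RegistryEntry(
    "DE-001", "Employee Last Name", {"xml": "empLastName", "report": "Name"}))
registry.register(RegistryEntry(
    "DE-002", "Product Name", {"report": "Name"}))

print(registry.synonyms("DE-001"))
print(registry.shared_names())
```

Keying entries by identifier rather than by name is what allows the same name to appear under several entities without ambiguity, while every name stays traceable to its enterprise name.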
Applying the principles developed by the terminology research community lets us relate the meanings of objects to the development of their names in a structured way. A name that conveys information
about a business object makes applications easier to understand across an organization, provided that all usages can be mapped to a name anyone can understand and that names are developed
using sets of rules anyone can apply.
- [ISO 704] ISO 704:2000, Terminology work – Principles and methods, International Organization for Standardization, Geneva.
- [ISO 108] ISO 1087-1:2000, Terminology work – Vocabulary – Part 1: Theory and application, International Organization for Standardization, Geneva.
- [ISO 111] ISO/IEC 11179:2003, Information technology – Metadata registries (MDR) – Parts 1-6, International Organization for Standardization, Geneva. Available for