Using Levels of Abstraction to Name Data Elements

Introduction

Naming conventions for data elements are part of the data administrator's toolset. A naming convention is a collection of rules that, when applied to data, yields a set of data elements named in a
logical and standardized way. These names concisely inform the user about the contents of the data value domain (the set of possible valid values for a data element) and about the usage of the
data element [ISO P]. When gathered into a repository or data registry, this collection of metadata helps users use and reuse data efficiently while maximizing understanding of information both
within and outside their organization.

The International Standard ISO 11179, Information Technology – Specification and Standardization of Data Elements, describes a set of rules for developing naming conventions, together with
standards for data classification, attribution, definition and registration. Part 5 of this standard is Naming and Identification Principles for Data Elements [ISO 5]. This article is based on the
information in that standard.

Types of Names

Data elements are ideally the result of a process of development, involving several levels of abstraction. Levels progress from the most general (conceptual) to the most specific (physical). The
objects at each level are called data element components; their names become name components. Using the Zachman Framework, for instance, the highest levels of definition are contained in the
business view; development progresses to the implemented system level.

Components are defined and combined differently at each level. Each component contributes its name, or part of its name, to the final products. The rules by which these component names are combined
are a data element naming convention. Also, one data element may have many names depending on context of use. Naming conventions must reflect this multiplicity.

After the conceptual components are developed by a process of specification from the highest conceptual level, a representation term is assigned, which may in turn be derived from a structure set or
process. Components are envisioned as a set of building blocks that can be assembled into data elements, and serve to ensure that the end product, the total set of data elements, is as discrete and
complete as possible.

Names derived in this way serve as the primary means of identification for elements external to systems that process them. However, within physical systems, names are subject to constraints imposed
by software limitations. Other names may be used by reports or EDI files. Provision for identification of synonymous names is made through sets of name-context pairs in the element description.

Since many names may be associated with a single data element, it is important to also use a unique identifier, usually in the form of a number, to distinguish each data element from any other. ISO
11179-5 discusses assigning this identifier at the International registry level. Both the identifier and at least one name are considered necessary to comply with ISO 11179-5. Each organization
should decide the form of identifier best suited to its individual requirements.

Levels of Abstraction

Name development begins at the conceptual level (see Figure 1). At this stage, a set of concepts exists as entities or objects (called object classes), which, with the assignment of properties,
become data element concepts (DECs). An object class represents an idea, abstraction or thing in the real world, such as tree or country. A property is something that describes all objects in the
class, such as height or identifier.

Each of these components has its own name. When applied to data element names, these are called object class term and property term. DECs are named by combining the object class term and the
property term. From the examples above, we can form the DECs tree height and country identifier. DECs also contain conceptual domains, which are composed of value meanings. These value meanings are
defined but do not have a specific form of representation (Figure 2).

The next step in forming data element names takes place at the logical level. A complete logical data element must include a form of representation for the values in its data value domain (the set
of possible valid values of a data element). The representation term describes the data element’s representation class. The representation class is equivalent to the class word of the prime/class
naming convention many data administrators are familiar with. For example, name, code, and measure can be applied to the DECs above to produce tree height measure, country identifier name and
country identifier code.
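As a rough illustration of how names accumulate across the conceptual and logical levels, the following Python sketch composes the example names given above. The class and attribute names are illustrative assumptions and are not defined by ISO 11179.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    """Conceptual level: an object class paired with a property."""
    object_class_term: str
    property_term: str

    @property
    def name(self) -> str:
        # A DEC is named by combining the object class term and the property term.
        return f"{self.object_class_term} {self.property_term}"


@dataclass(frozen=True)
class LogicalDataElement:
    """Logical level: a DEC plus a representation term."""
    concept: DataElementConcept
    representation_term: str

    @property
    def name(self) -> str:
        # The representation term is appended after the DEC name.
        return f"{self.concept.name} {self.representation_term}"


tree_height = DataElementConcept("tree", "height")
country_identifier = DataElementConcept("country", "identifier")

logical_names = [
    LogicalDataElement(tree_height, "measure").name,
    LogicalDataElement(country_identifier, "name").name,
    LogicalDataElement(country_identifier, "code").name,
]
print(logical_names)
```

Note that one DEC (country identifier) yields two logical data elements, one per representation term.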

Figure 1. Components of a Data Element

Figure 2. Levels of Abstraction

Notice that identifier name and identifier code are somewhat redundant. A naming convention could include a rule that eliminates this redundancy by allowing the property term to be dropped in such
cases. The property would still exist as part of the inheritance structure of the data element, but it would not be visible in the data element name. Such a rule makes concise names easier to
achieve.
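One possible way to express such a rule is sketched below in Python. The set of property/representation pairs treated as redundant is an assumption made for this example; each organization would decide its own list.

```python
# Assumed for illustration: pairs of (property term, representation term)
# that an organization has judged redundant when they appear together.
REDUNDANT_PAIRS = {("identifier", "name"), ("identifier", "code")}


def logical_name(object_class_term, property_term, representation_term):
    """Build a logical name, hiding the property term when it is redundant."""
    parts = [object_class_term]
    if (property_term, representation_term) not in REDUNDANT_PAIRS:
        parts.append(property_term)
    parts.append(representation_term)
    return " ".join(parts)


print(logical_name("country", "identifier", "code"))
print(logical_name("tree", "height", "measure"))
```

The property is still passed to the function, which mirrors the point that it remains part of the data element even when it does not appear in the name.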

Some logical data elements can be considered generic elements. These are data elements that have a well-established data value domain and are recognized at the organizational level or above as
useful and shared among several systems. Country name and country code are both potential candidates for designation as generic elements. ISO standard 3166, Codes for the representation of names of
countries, presents a well-established reference list of country names and codes.

Note that this is the highest level at which true data elements, by the definition of ISO 11179, appear: they have an object class, a property, and a representation.

The next level of data element development is the application level. Typically, a data element will be customized to an application by subsetting its data value domain or narrowing the definition
(or both) to include only those values of interest to the application. Changes in the name to reflect this will be accomplished by addition of qualifier terms to the logical name. For example, if
an application of Country name were to list all the countries a certain organization had trading agreements with, the application data element would be called Trading partner country name. The data
value domain would consist of a subset of countries listed in ISO 3166. Note that the qualifier term trading partner is itself an object class. This could be expressed as a hierarchical
relationship in the data model.
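The customization step can be sketched in Python as follows. The function name, the four-country excerpt of the value domain, and the choice of trading partners are all assumptions made for illustration.

```python
# A small excerpt of ISO 3166 two-letter codes and country names,
# used here only as a stand-in for the full logical value domain.
COUNTRY_NAMES = {"CA": "Canada", "FR": "France", "JP": "Japan", "MX": "Mexico"}


def customize(logical_name, domain, qualifier, keep):
    """Derive an application-level name and a subset of the value domain."""
    unknown = set(keep) - set(domain)
    if unknown:
        # An application domain may only narrow the logical domain.
        raise ValueError(f"values outside the logical domain: {sorted(unknown)}")
    app_name = f"{qualifier} {logical_name}"
    app_domain = {code: name for code, name in domain.items() if code in keep}
    return app_name, app_domain


# Hypothetical set of trading partners for some organization.
app_name, app_domain = customize(
    "country name", COUNTRY_NAMES, "Trading partner", {"CA", "MX"}
)
print(app_name)
print(app_domain)
```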

The last type of name is the physical name. Physical names are the names that actually appear in database table column headers, file descriptions, EDI transaction file layouts, and so on. They
will have abbreviations and possibly other accommodations to the restrictions of a particular software system, and they may also carry additional information about their origin or format. For
example, trd-ptnr-3166-Eng-name may appear in an EDI transaction file. (Expanded, this name would read Trading partner ISO 3166 English name.)
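A minimal Python sketch of deriving a physical name from its expanded form is shown below. The abbreviation table is hypothetical: it is inferred from the single example above, including the assumption that the word ISO is dropped, and a real system would maintain its own controlled abbreviation list.

```python
# Hypothetical abbreviation table; an empty string means the word is dropped.
ABBREVIATIONS = {
    "trading": "trd",
    "partner": "ptnr",
    "english": "Eng",
    "iso": "",
}


def physical_name(expanded_name, separator="-"):
    """Abbreviate each word where a rule exists and join with a separator."""
    parts = []
    for word in expanded_name.split():
        short = ABBREVIATIONS.get(word.lower(), word)
        if short:  # skip words that the table says to drop
            parts.append(short)
    return separator.join(parts)


print(physical_name("Trading partner ISO 3166 English name"))
```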

In a registry, each of the above names, and name components, will always be paired with a context attribute. This will serve to identify the source or usage of the name or name component. One
registry entry will serve to gather all the names of each data element, and allow users to trace all appearances of each data element wherever it occurs, no matter what name it is using at the
time.
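The idea of one registry entry gathering every name of a data element can be sketched in Python as below. The class names, the identifier values, and the context labels are assumptions for this example and do not reflect any particular registry product.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """One data element: a unique identifier plus its name-context pairs."""
    identifier: int
    name_contexts: set = field(default_factory=set)  # {(name, context), ...}


class Registry:
    def __init__(self):
        self._entries = {}

    def add_name(self, identifier, name, context):
        """Attach a name, paired with its context, to a data element."""
        entry = self._entries.setdefault(identifier, RegistryEntry(identifier))
        entry.name_contexts.add((name, context))

    def names_of(self, identifier):
        """Return all name-context pairs recorded for one data element."""
        return sorted(self._entries[identifier].name_contexts)

    def find(self, name):
        """Return identifiers of every element that uses this name anywhere."""
        return sorted(
            ident
            for ident, entry in self._entries.items()
            if any(n == name for n, _ in entry.name_contexts)
        )


registry = Registry()
registry.add_name(1001, "Country name", "logical model")
registry.add_name(1002, "Trading partner country name", "application")
registry.add_name(1002, "trd-ptnr-3166-Eng-name", "EDI transaction file")

print(registry.find("trd-ptnr-3166-Eng-name"))
print(registry.names_of(1002))
```

Keying the entries on the identifier rather than on a name is what lets a data element be traced no matter which of its names is in use.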

Principles of Naming Conventions

We have seen that components of data elements have names. Combining these names in a specific way, that is, by following the naming rules, gives data elements standardized names. These rules
will vary depending on the requirements of each organization developing data elements, but the basic principles for developing rule sets are constant.

There are three kinds of rules that form a complete naming convention:

Semantic rules are based on the components of data elements described above;
Syntax rules prescribe the arrangement of components within a name;
Lexical rules concern the language-related aspects of names.

While the following naming convention is oriented to the development of application-level names, the rule set may be adapted to the development of names at any level.

An Example Naming Convention

This naming convention is adapted from Annex A of ISO 11179-5.

Semantic Rules

These are rules based on the meaning of name components.

Object class terms are based on the names of object classes, which are found in data models (entities) or object models (object classes).
Property terms are based on the names of properties, which are found in data models (attributes) or object models (properties).
Qualifiers may be added as needed to describe the data element and make it unique within a specified context. These qualifiers may be based on sub-entities of the object class that forms the
object class term.
The representation of the data value domain of the data element is described by the representation term. The representation term is taken from the controlled list. (See below for an example
representation term list.)
One and only one representation term shall be present.
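The last two semantic rules lend themselves to an automated check. The Python sketch below assumes that a name has already been broken into (role, term) pairs; that input shape and the function name are assumptions, and the controlled list is the example representation term list given later in this article.

```python
# Example controlled list, taken from the representation term list in this article.
REPRESENTATION_TERMS = {
    "Amount", "Average", "Count", "Code", "Date", "Measure",
    "Name", "Number", "Quantity", "Rate", "Text", "Time",
}


def check_representation(components):
    """Check a list of (role, term) pairs against the representation rules.

    Returns a list of problem descriptions; an empty list means no problem found.
    """
    problems = []
    reps = [term for role, term in components if role == "representation"]
    if len(reps) != 1:
        problems.append(
            f"expected exactly one representation term, found {len(reps)}"
        )
    for term in reps:
        if term.capitalize() not in REPRESENTATION_TERMS:
            problems.append(f"'{term}' is not in the controlled list")
    return problems


print(check_representation([
    ("object class", "Tree"),
    ("property", "Height"),
    ("representation", "Measure"),
]))
```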

Syntax Rules

These rules specify the arrangement of name components.

The object class term occupies the leftmost position in the name, unless it is the subject of a qualifier term.
Qualifier terms precede the component qualified. The order of qualifiers must not be used to differentiate data element names.
The property term follows the object class term.
The representation term occupies the rightmost position.
If a word in any term is deemed redundant with another word, one occurrence will be deleted.
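The syntax rules above can be sketched as a small assembly function in Python. The function and parameter names are assumptions, and the redundancy rule is simplified here to mean an exact repeat of a word (ignoring case); deciding that two different words are redundant would need human judgment.

```python
def assemble_name(object_class, property_term, representation,
                  object_qualifiers=(), property_qualifiers=()):
    """Arrange name components in the order the syntax rules prescribe."""
    # Qualifiers precede the component they qualify; the object class term
    # comes first unless it is itself qualified.
    terms = [*object_qualifiers, object_class, *property_qualifiers]
    if property_term:
        terms.append(property_term)  # the property term follows the object class term
    terms.append(representation)     # the representation term is rightmost

    # Simplified redundancy rule: keep only the first occurrence of a word.
    seen = set()
    words = []
    for word in " ".join(terms).split():
        key = word.lower()
        if key not in seen:
            seen.add(key)
            words.append(word)
    return " ".join(words)


print(assemble_name("Tree", "Height", "Measure"))
print(assemble_name("Country", "", "Name", object_qualifiers=("Trading Partner",)))
print(assemble_name("Employee", "Name", "Name"))
```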

Lexical Rules

These rules determine the standard “look” of names.

Nouns are used in singular form; verbs, if any, are in the present tense.
No special characters are allowed.
All words are separated by spaces.
All words are in mixed case.
Abbreviations, acronyms, and initialisms are allowed.
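Some of the lexical rules can be checked mechanically, as in the Python sketch below. It reads "mixed case" as each word beginning with a capital letter (as in Employee Address Group), which is an assumption; the rule about singular nouns and present-tense verbs is not checked because it needs linguistic knowledge. The function name is also an assumption.

```python
import re

# Letters and digits only, with words separated by single spaces.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9]+( [A-Za-z0-9]+)*")


def check_lexical(name):
    """Return a list of lexical problems found in a name (empty if none)."""
    problems = []
    if not _NAME_PATTERN.fullmatch(name):
        problems.append(
            "only letters, digits, and single spaces between words are allowed"
        )
    for word in name.split():
        # Assumed reading of "mixed case": words start with a capital letter.
        # Words that start with a digit, such as 3166, are accepted as they are.
        if word[0].isalpha() and not word[0].isupper():
            problems.append(f"'{word}' does not begin with a capital letter")
    return problems


print(check_lexical("Employee Address Group"))
print(check_lexical("employee_address"))
```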

Representation Term List

Representation terms must be strictly controlled. Their definitions should allow the user to easily decide which term is most appropriate for each data element. This list of representation terms
and definitions has been updated from the Class Word list in Guide on Data Entity Naming Conventions[NEWT].

Amount – Monetary quantity.
Average – Numeric value representing an arithmetic mean.
Count – Non-monetary numeric value arrived at by counting.
Code – A system of valid symbols that substitute for longer values.
Date – Calendar date.
Measure – A record of the dimensions, capacity, or amount (non-monetary) of an object.
Name – A designation for an object.
Number – A number associated with an object, used as an identifier.
Quantity – Non-monetary numeric value not arrived at by counting.
Rate – A quantity or amount considered in relation to another quantity or amount.
Text – An unformatted descriptive field.
Time – Time of day or duration.

In addition to these representation terms for data elements, one more term for group elements is appropriate:

Group – Indicates a designation for a set of data elements that have relationships to each other. For example: Employee Address Group.

This article is a contribution of the National Institute of Standards and Technology, not subject to copyright in the United States.

References

[ISO 5] ISO/IEC International Standard 11179-5, Information technology – Specification and standardization of data elements, Part 5: Naming and identification principles for data elements,
International Organization for Standardization, Geneva, January, 1996.

[ISO P] ISO/IEC PDTR 15452, Information Technology – Specification of Data Value Domains, August, 1998.

[NEWT] Newton, Judith, Guide on Data Entity Naming Conventions, NIST Special Publication 500-149, Gaithersburg, MD, October, 1987.


Judith Newton
