Attribute & Column Naming Standards: The Zachman Framework Solves A Problem

History

Naming standards for data are important in maintaining consistency of structure across large systems.

IBM’s “Of’ language” standard named attributes and columns by starting with a “class word” (and more specifically a letter of the alphabet) representing a kind of data, such as “t” for
“text”, “a” for “amount”, or “c” for “code”. They then optionally added one or more modifiers. They ended the attribute or column with a “prime word” representing the entity of which
the attribute was a property. The components of the name were separated by punctuation marks, representing connecting words, such as prepositions and conjunctions. Each punctuation mark had a
different meaning, with a percent sign meaning “of”, an asterisk meaning “on”, and so forth.

Since then, the National Institute of Standards and Technology (NIST) has developed a “Reverse Of” standard, based on the standards developed at AT&T and many other companies. This reverses
the IBM order of components ¾ leading with the prime word and ending with the class word, instead of vice versa. They also standardized the base class word list, expanding the class words to
more than a single character. They also eliminated the variety of punctuation marks and insisted on either underscores, hyphens, or blanks.

The Problem

Although development of this standard has been a benefit to the system development community, several problems remain that are not handled by it.

  • The class word may be redundant.
  • The prime word is redundant in data models.
  • Technology imposes constraints on format (e.g. name length, allowed characters, and case sensitivity) for database columns.

In the first case, the class word is either unnecessary because an attribute is well understood without it or because it duplicates the meaning of a modifier. A typical example of the first
situation might be Product_Description_Text, where it is reasonable to assume that a description will always be text. An example of the second might be Customer_ID_Number, where either ID or Number
alone would seem to be sufficient.

In the second case, the prime word is always redundant in a logical data model because the attribute is always shown in the context of the entity about which it is a fact. This is true whether it
is displayed on a data model or in a report.

The third problem is that various technologies impose a variety of constraints on fields, columns, and element names. These constraints include size (the worst problem), disallowed characters (such
as punctuation marks), lack of case sensitivity, and disallowed spaces. These constraints are in conflict with the desire to create an understandable name.

All of the above can occur in the same company and cause animosity between the business users, data modelers, data administrators, database administrators, and programmers.

The first step in resolving these issues is to recognize that not one but several sets of standards are required. Standards appropriate to a data model are not the same as standards appropriate to
a database design. Very different perspectives are involved.

The key to understanding these perspectives is to use the Zachman Framework for Enterprise Architecture. It describes the constituencies and their differing needs.

John Zachman specifically recognized six perspectives on the systems development effort. As described in previous columns in this space, he recognized:

Scope (Ballpark view): Definition of the enterprise’s direction and business purpose. This is an industry view, concerned with the things that define the nature and purpose of the
business. This is necessary to establish the context for any system development effort.

Model of the business (Owner’s view): This defines – in business terms – the nature of a segment of the business, including its structure, functions, organization, and so forth.

Model of the information system (Architect’s view): This defines the information system necessary to support the business described in step 2. This will be similar to the Business
Model, but in more rigorous information system terms. Where row two described business functions, for example, as perceived by the people performing them, row three describes them specifically as
transformations of data. Where row two described all the things of interest to the enterprise, row three describes those things about which the organization wishes to collect and maintain
information and begins to describe that information.

Technology model (Designer’s view): This describes how technology may be used to address the information processing needs identified in the previous rows. Here relational databases are chosen over
network ones (or vice versa), kinds of languages are selected and program structures are defined, user interfaces are described, and so forth.

Detailed representations (Builder’s view): This is a view of the program listings, database specifications, networks, and so forth that constitute a particular system. These are all expressed in
terms of particular technology specification languages.

Functioning system: Finally, a system is implemented and made part of an organization.

Mr. Zachman also recognized that at each row, six kinds of things were being viewed: data, functions, locations, people and organizations, timing, and motivation. The standards discussed in this
article concern the data column.

The Solution – A Standard by Row

What this means is that a different set of naming standards apply to each row. In general, the standards in the upper rows emphasize readability and making the language of the model accessible to
the world at large, while the standards for the lower rows are more concerned with conciseness and database development and use.

Row 1

The purpose of naming standards in row one is to promote understanding. Here naming is entirely in industry-wide generic terms. When describing things at the scope level, we do not use:

  • abbreviations,
  • prime words (an attribute only appears in the context of its entity),
  • class words unless they are naturally understood terms, or
  • arbitrary length or punctuation limits

Row 2 – Model of the Business

As with row one, naming in the business owner’s view is for the purpose of promoting understanding. The language to be used must be familiar to the business person. In general, only the owner’s
business specific terms are used, rather than industry terms. Aliases for the same term, as used by different areas of the business, are used but kept track of. At this row, there are no
abbreviations nor are the names artificially constrained in their length. Again, no prime words or class words are used.

Row 3 – Model of the Information System (The Logical Data Model)

In row three, we see the first effort at structural standardization. Here terms are standardized for use across the business. These should still be common English terms, but they are rigorously
defined and their definitions are intended to be widely accepted. There are still no abbreviations and no prime words. (Again, attributes are always in the context of their entities.) For the first
time, the class word is required at the end. Spaces still appear between words and there is no punctuation. The length of the name may be mildly constrained, depending on the tools used.

Row 4 –Technology Model (Generic Data Base Design)

Row Four presupposes the selection of a kind of database technology ¾ relational, object-oriented, hierarchical, network, or other. For purposes of this paper, we are going to assume that
relational technology has been selected and that the entities and attributes from Row Three are converted into tables and column definitions.

The standards applied at this row begin to reflect the technology that has been chosen. Among other things, the standards insure that it is possible to keep track of an attribute as it is used in
tables other than the one derived from the entity it came from.

Since spaces are not allowed in column names, these are replaced by a standard punctuation mark, such as an underline. Since column name length may be limited by relational database management
systems, this limit is reflected in the standard.

A prime word is required at the beginning of each name. Even though the column definition is part of a database design, the prime word is not the table name. It is still the entity name. This is
useful because the column may be moved or copied into tables other than the one derived from the column’s attribute. For example, foreign key columns, pointing to a column in a different table,
are prefixed by reference to the entity of the attribute from which the target column was derived. Similarly, if the database design is de-normalized, and a column is moved to a table other than
the one originally specified, keeping the entity name makes clear where it came from and precisely what it means.

All words in the column name may be abbreviated, but these should be according to a standard list of abbreviations. .

Row 5 – Detailed representations (Builder’s view)

In Row Five, the format of a column name is constrained by the specific technology. For example, the column length allowed by DB2 is only 18 characters, while that allowed by Sybase is 32. The
class word added in Row Three and the prime word added in Row 4 remain. The objective here is to deal with technology constraints while preserving as much as possible from row 4 standard.

The names in this row may be subject to extensive abbreviation, use of acronyms, and use of codes.

Row 6 – The Database

The standards in Row Six are, in general, derived from the standard in Row Five.

Example

As an example, consider a customer’s dress size. In Rows One and Two, we can imagine the attribute “Dress Size” in the customer entity. In Row Three, this becomes “Dress Size Code”, adding the
class word, and the entity has been generalized to party. Row Four turns the attribute into a column, and appends the Prime word, producing “Party_Dress_Size_Code”. Row Four columns are also
without spaces. In Row Five, hardware constraints require us to abbreviate the column name to “PRTY_DRSS_SZ_CD. The production system in Row Six uses the same name as Row Five.”

Tool Issues

Currently CASE tool support and enforcement of standards is limited. In general, the tools will let logical model names be anything you want, although some do constrain the length unnecessarily.
Since the criteria for quality there is effective use of English, it’s not likely the tools will ever be able to help.

Some tools can convert the logical model to a physical model and append entity name abbreviations where appropriate. None recognize the difference between Rows Two and Three or between Rows Four
and Five, however.

Conclusion

The Zachman Framework for Enterprise Architecture is a powerful vehicle for identifying the reasons for conflict. It turns out that you argue with each other so vehemently over policy because you
look at the world from different perspectives. Recognize that and learn to translate them between levels and it becomes easier for everyone to work together.

Share

submit to reddit
Top