Published in TDAN.com January 2001
Now that your organization has successfully developed its repository conceptual model and selected its CASE tools, you’re probably anxious to start using your applications-development environment (ADE). Although the technical environment may be ready for developing new applications, you must now establish procedures and standards to ensure that the repository is populated in a manner that promotes object reuse and facilitates application maintenance. This article discusses the standards and procedures issues crucial to successfully managing an application portfolio through a repository, focusing specifically on names and naming standards.
Many organizations strive to develop an easy-to-maintain application portfolio. One way to achieve this goal is to establish an inventory of reusable application components in which redundancy is either tightly controlled or eliminated. However, it’s very difficult to reuse something if you can’t find it. Therefore, carefully selecting concise names that clearly represent each application component is the key to reusability.
An application component’s name should fulfill two important functions in managing information. First, it should serve as the primary search key for locating the repository objects to which it refers. Second, it should provide some indication of the application component’s purpose and meaning when the repository isn’t available to supply the definition. For example, when a programmer encounters a field within a program, the field’s name should allow the programmer to discern its purpose from the terms included in the name, the abbreviations used for those terms, and the context of the field’s use in the code.
Unfortunately, both the richness of the English language and the flexibility of most programming languages work against us when we try to develop a minimal set of application components. English provides many different words for the same concept. For instance, consider the following definition: “A person or organization that purchases the goods or services offered by another person or organization.” One department may say this definition describes a customer, while another may prefer the term client. The challenge for information management is recognizing that customer and client are names for the same object, which should be described in the repository only once.
Programming languages offer an even greater potential for defining multiple names for the same data element. A programmer can choose any set of names within the scope of an individual program. Data can pass between two programs without requiring that both programs use the same name for it, as long as the corresponding names refer to fields with the same displacement (offset) and physical length. All too often, however, these separate names are implemented in the repository as different objects, rather than as aliases for the same application component.
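To make the point about displacement and length concrete, here is a minimal Python sketch that uses the standard `struct` module as a stand-in for a fixed-length COBOL record layout. The layout, field sizes, and every field name are hypothetical. Program A and program B never share a field name; they agree only on the offset and length of each field, which is exactly why a repository that catalogs names alone will record them as unrelated objects.

```python
import struct

# Hypothetical fixed-length record: 10-byte ID, 12-byte name, 8-byte date (YYYYMMDD).
# Both programs agree only on this layout (offsets and lengths), not on field names.
RECORD_LAYOUT = "10s12s8s"


def program_a_write(emp_id: str, emp_name: str, emp_term_date: str) -> bytes:
    """Program A builds the record using its own local field names."""
    return struct.pack(
        RECORD_LAYOUT,
        emp_id.ljust(10).encode("ascii"),
        emp_name.ljust(12).encode("ascii"),
        emp_term_date.ljust(8).encode("ascii"),
    )


def program_b_read(record: bytes) -> dict:
    """Program B reads the same bytes under completely different local names."""
    worker_no, worker_nm, end_dt = struct.unpack(RECORD_LAYOUT, record)
    return {
        "worker_no": worker_no.decode("ascii").rstrip(),
        "worker_nm": worker_nm.decode("ascii").rstrip(),
        "end_dt": end_dt.decode("ascii").rstrip(),
    }


record = program_a_write("E001", "SMITH", "20001231")
print(program_b_read(record))
```

Here emp_term_date and end_dt are the same data element; a repository should record one as an alias of the other rather than as two separate objects.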
We’ve now seen how many different names can refer to a given application component. Conversely, one name may refer to several different application components: the name customer may refer to a subject area, an entity, and a DB2 table. Unfortunately, you must deal with this proliferation of synonyms and homonyms when managing the information asset.
One way to deal with this problem is to assign a name that uniquely identifies the application component within the repository. You should structure this name in a manner that optimizes the chance
of finding the object when the repository is searched using terms that are the same as or closely related to those included in this name. I’ll refer to this special name as the repository
name.
Naming standards provide a consistent approach to assigning names, which helps an organization control the proliferation of redundant application components and alias names. The standards define the terms that can be included in a name, the rules for constructing names, and the rules for abbreviating terms to meet the length constraints imposed by the repository, CASE tools, programming languages, and DBMSs.
The selection of terms included in a repository object’s name is extremely important because it’s the primary defense against duplicate application components in the ADE. Moreover, the careful selection and standardization of these terms provides a benefit beyond their use in the repository name: the terms also represent the organization’s vocabulary. Communication among business departments will improve as aliases and homonyms are identified, cross-referenced, and published across the enterprise.
Ironically, some alias terms are necessary because it may be impossible to change your company’s entire business vocabulary. However, if customer and client are linked as
potential aliases, repository name-searching routines can be developed to retrieve objects with repository names that include either term. For example, if end, finish, and
terminate are cross-referenced, you can retrieve objects named end-date, finish-date, and terminate-date when you search the repository using the terms “end”
and “date.”
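The alias-aware search just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a repository product’s API: repository names are assumed to be hyphen-separated terms, and the alias groups and the `expand`/`search` helpers are hypothetical.

```python
# Hypothetical alias groups: every term in a group is cross-referenced with the others.
ALIAS_GROUPS = [
    {"customer", "client"},
    {"end", "finish", "terminate"},
]


def expand(term: str) -> set:
    """Return the term together with all of its cross-referenced aliases."""
    for group in ALIAS_GROUPS:
        if term in group:
            return group
    return {term}


def search(repository_names: list, *terms: str) -> list:
    """Return repository names that contain every search term or one of its aliases.

    Names are assumed to be hyphen-separated terms, as in end-date.
    """
    wanted = [expand(t.lower()) for t in terms]
    hits = []
    for name in repository_names:
        name_terms = set(name.lower().split("-"))
        # Each search term must match at least one term in the name.
        if all(name_terms & group for group in wanted):
            hits.append(name)
    return hits


names = ["end-date", "finish-date", "terminate-date", "start-date", "customer-name"]
print(search(names, "end", "date"))
```

Searching with “end” and “date” returns end-date, finish-date, and terminate-date but not start-date, and a search for “client” finds customer-name.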
A common naming standard for attributes and data items is the prime word/modifier/fact/class word approach, which strives to achieve a name that satisfies the “name for clarity” objective.
This approach has the following rules:
- A. Develop a concise, descriptive definition for the related application component.
- B. From the definition, extract business terms that best describe the object.
- C. Order the terms according to the rules outlined in your organization’s naming standards.
- D. Abbreviate the repository name to meet the length requirements of the target information resource dictionary system (IRDS).
Facts are the primary subjects of the definition. Prime words name the entity to which the fact refers. Modifiers further clarify the fact, and class words identify the kind of data (for example, name, identifier, quantity, or amount). These terms can often be identified by answering a series of questions:
- What is being defined? Fact;
- To whom or what does the fact apply or belong? Prime word;
- Which fact is being defined? Modifier;
- Which class of information is being defined? Class word.
As an example, consider the following definition for an attribute: “The date on which a person’s employment ended.” You can extract the relevant business terms by answering the following
important questions:
- What is being defined? End-Date;
- Which end-date is being defined? Employment-end-date;
- Whose employment-end-date is being defined? Person-employment-end-date;
- What class of information is person-employment-end-date? Date (which is already identified as part of the fact).
From this example, you can see that an appropriate repository name for this attribute is person-employment-end-date.
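Once the terms have been identified, ordering them (step C) is mechanical. The sketch below assumes lowercase, hyphen-separated names and the prime word, modifiers, fact, class word order described above; the function name and the rule of skipping a class word the fact already contains (as date is already part of end-date) are illustrative assumptions, not part of any published standard.

```python
def build_repository_name(prime_word: str, modifiers: list, fact: str, class_word: str) -> str:
    """Assemble a name in prime word / modifier / fact / class word order.

    The class word is appended only when the fact does not already end with it,
    as with the fact end-date, which already carries the class word date.
    """
    terms = [prime_word, *modifiers, fact]
    if fact.split("-")[-1] != class_word:
        terms.append(class_word)
    return "-".join(term.lower() for term in terms)


print(build_repository_name("person", ["employment"], "end-date", "date"))
```

For the example definition this yields person-employment-end-date, matching the name derived by hand.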
Even if naming standards are followed religiously, however, you may still introduce redundant application components. The main culprit is the first naming standards rule: Develop the definition. Different people may create different definitions for the same object. Each definition may be as valid as the one that led to the name person-employment-end-date, yet lead to an entirely different name. For example, the definitions “A date that identifies the last work day for which a person was paid by the company” and “The date on which the employee’s employment terminated” describe the same fact as “The date on which a person’s employment ended.” Therefore, your naming standards must include procedures for selecting which name will be assigned as the repository name and which names will be maintained as local aliases.
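Part of such a procedure can be automated when candidate names differ only by alias terms. The Python sketch below assumes one possible policy, which is hypothetical: the form built entirely from preferred vocabulary terms becomes the repository name, and any differently worded candidate is recorded as a local alias of it. Definitions worded with entirely different terms, such as the “last work day for which a person was paid” example, would still need human review.

```python
# Hypothetical term-level aliases: each alias maps to the preferred vocabulary term.
PREFERRED_TERM = {"terminate": "end", "finish": "end", "client": "customer"}


def canonical(name: str) -> str:
    """Rewrite every term in a hyphenated name as its preferred vocabulary term."""
    return "-".join(PREFERRED_TERM.get(term, term) for term in name.lower().split("-"))


class NameRegistry:
    """Assumed policy: the all-preferred-terms form becomes the repository name,
    and any differently worded candidate is kept as a local alias of it."""

    def __init__(self) -> None:
        self.repository_names = set()
        self.local_aliases = {}

    def register(self, candidate: str) -> str:
        repo_name = canonical(candidate)
        self.repository_names.add(repo_name)
        if candidate.lower() != repo_name:
            self.local_aliases[candidate.lower()] = repo_name
        return repo_name


registry = NameRegistry()
registry.register("person-employment-end-date")
registry.register("person-employment-terminate-date")
print(registry.repository_names)
print(registry.local_aliases)
```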
Once a repository name has been assigned to an application component, it may have to be abbreviated to meet the length constraints imposed by the IRDS that supports your company’s repository. In a perfect world, this type of limitation wouldn’t be required. However, for some reason, many IRDS implementors have a strong attachment to name-length limits of around 30 characters. Even some of the popular business modeling CASE tools force us to name business entities, attributes, and processes in fewer than 35 characters. This restriction has forced us to impose standard abbreviations at even earlier stages of the analysis life cycle.
You can use standard abbreviations to derive, from a meaningful name such as the repository name, names that meet the length constraints imposed by programming languages. You must establish an abbreviation for each term in your organization’s standard vocabulary list. The abbreviation should uniquely identify its associated term, yet still be easy to recognize when encountered in a name.
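The uniqueness requirement can be checked mechanically. In this Python sketch the abbreviation table and both helper functions are hypothetical; it reports any abbreviation assigned to more than one term and shows that, when abbreviations are unique, an abbreviated name can be expanded back to the full repository name.

```python
# Hypothetical abbreviation table: standard vocabulary term -> standard abbreviation.
ABBREVIATIONS = {
    "person": "prsn",
    "employment": "emplmt",
    "end": "end",
    "date": "dt",
    "customer": "cust",
    "amount": "amt",
}


def find_clashes(table: dict) -> list:
    """Report abbreviations assigned to more than one term as (abbr, first, second)."""
    seen = {}
    clashes = []
    for term, abbr in table.items():
        if abbr in seen:
            clashes.append((abbr, seen[abbr], term))
        else:
            seen[abbr] = term
    return clashes


def expand_name(abbreviated: str, table: dict) -> str:
    """Recover the full name; this works only because each abbreviation is unique."""
    reverse = {abbr: term for term, abbr in table.items()}
    return "-".join(reverse[part] for part in abbreviated.lower().split("-"))


print(find_clashes(ABBREVIATIONS))
print(expand_name("prsn-emplmt-end-dt", ABBREVIATIONS))
```

Adding a term such as custodian with the abbreviation cust would be reported as a clash with customer.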
When applications were developed primarily in COBOL, one set of standard abbreviations, four to six characters long, was sufficient. However, with SQL becoming the predominant language for accessing data, a second set of shorter abbreviations is required to accommodate the 18-character length limit imposed by many relational DBMSs.
You can employ several alternative techniques to abbreviate names. However, it’s important to select one approach and use it consistently. One approach is to abbreviate only when necessary. In
other words, use the full repository name unless it’s too long to meet the length constraints of the target environment. If this situation occurs, the terms in the name should be selectively
abbreviated in some predefined order (for example, first class word, then prime word, then modifiers, then fact). A second technique is to abbreviate always; that is, consistently abbreviate the
terms in the name even if the full repository name meets length constraints.
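Both techniques can be sketched in Python. The sketch assumes each term in the name is tagged with its role and uses the predefined order given above (class word, then prime word, then modifiers, then fact); the abbreviation table, role tags, and function names are hypothetical, and a single table stands in for the separate COBOL and SQL abbreviation sets.

```python
# Hypothetical abbreviation table and name; each term is tagged with its role.
ABBREVIATIONS = {"person": "prsn", "employment": "emplmt", "end": "end", "date": "dt"}
NAME = [("person", "prime"), ("employment", "modifier"), ("end", "fact"), ("date", "class")]

# Predefined order from the text: class word, then prime word, then modifiers, then fact.
ABBREVIATION_ORDER = ["class", "prime", "modifier", "fact"]


def abbreviate_always(parts: list, table: dict) -> str:
    """Abbreviate every term, whether or not the full name fits."""
    return "-".join(table.get(term, term) for term, _ in parts)


def abbreviate_when_necessary(parts: list, table: dict, max_len: int) -> str:
    """Abbreviate one role at a time, in the predefined order, until the name fits."""
    terms = [term for term, _ in parts]
    for role in ABBREVIATION_ORDER:
        if len("-".join(terms)) <= max_len:
            break
        for i, (term, term_role) in enumerate(parts):
            if term_role == role:
                terms[i] = table.get(term, term)
    name = "-".join(terms)
    if len(name) > max_len:
        raise ValueError(f"{name!r} still exceeds {max_len} characters")
    return name


print(abbreviate_when_necessary(NAME, ABBREVIATIONS, 30))
print(abbreviate_when_necessary(NAME, ABBREVIATIONS, 18))
print(abbreviate_always(NAME, ABBREVIATIONS))
```

With a 30-character limit, abbreviating only when necessary keeps person-employment-end-date intact, while with the 18-character limit it produces prsn-emplmt-end-dt, the same result that abbreviating always gives regardless of the limit. Hyphens are kept here for readability; an SQL target would typically need underscores instead.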
For naming standards to be effective, they must be implemented through the repository-based ADE. Each CASE tool that introduces new repository objects must be able to enforce naming standards when the object is created and to search the repository for potential duplicates of the object.
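As a rough sketch of the check such a tool might run when an object is created, the Python below makes several assumptions, all hypothetical: names are hyphen-separated, every term must come from the standard vocabulary, the last term must be a class word, and a new name is a potential duplicate if it matches an existing name once alias terms are replaced by preferred terms. The vocabulary tables and function names are illustrative only.

```python
# Hypothetical standard vocabulary, class words, and preferred-term aliases.
CLASS_WORDS = {"date", "name", "identifier", "quantity", "amount"}
VOCABULARY = {"person", "employment", "end", "terminate", "customer"} | CLASS_WORDS
PREFERRED_TERM = {"terminate": "end"}


def canonical(name: str) -> str:
    """Rewrite each term as its preferred vocabulary term."""
    return "-".join(PREFERRED_TERM.get(t, t) for t in name.lower().split("-"))


def check_new_name(name: str, existing_names: list) -> list:
    """Return a list of problems; an empty list means the name may be created."""
    problems = []
    terms = name.lower().split("-")
    unknown = [t for t in terms if t not in VOCABULARY]
    if unknown:
        problems.append("terms not in standard vocabulary: " + ", ".join(unknown))
    if terms[-1] not in CLASS_WORDS:
        problems.append("name does not end with a class word")
    duplicates = [n for n in existing_names if canonical(n) == canonical(name)]
    if duplicates:
        problems.append("potential duplicate of: " + ", ".join(duplicates))
    return problems


existing = ["person-employment-end-date"]
print(check_new_name("person-employment-terminate-date", existing))
print(check_new_name("customer-nickname", existing))
```

The first call flags person-employment-terminate-date as a potential duplicate of the existing repository name; the second reports both an unapproved term and a missing class word.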
I envision a PC-style terminate-and-stay-resident (TSR) program that all CASE products could use to enforce proper name construction. Without this automated support, companies won’t be able to achieve the level of application component reusability that’s so important in implementing easily maintainable systems.
Previously published in Database Programming & Design in January 1992.
Still as relevant as ever. 🙂 Terry, thanks for allowing me to pull this from the vault.