Numerous concepts have been introduced into the dialog of data architecture and design over the past quarter century. Whether the concepts are those of Martin, Codd, Simsion, Yourdon, Date,
DeMarco, Booch, Gane or Chisholm, the results have been either good or bad – simply dependent upon who worked on the project. However, as automated systems within an enterprise become more
numerous and larger, the one characteristic that continually becomes more pervasive is the incompatibility of data and database designs across applications.
In fact, incompatibility is frequently the outcome when people lose contact with one another as the result of forming a new group. It does not take long for new groups to head off in their own
direction; it can occur in minutes, never mind what happens after months or years. The need for autonomy is usually explained as necessary in order to get anything done in a reasonable amount of
time. As a result, numerous vocabularies, data structures, and information architectures are created with each new project.
The good news, however, is that there is an approach that can bring people on different projects, with disparate vocabularies and different paradigms, together such that for the first time you can
avoid getting different data models from modelers when modeling the same things.
As you may have already guessed, the vocabulary itself is not the key. A number of companies have attempted to consolidate their vocabulary to a specific set of terms. However, no matter
how well these terms may be defined, the ability to create disparate database designs is more formidable than a vocabulary can address.
We are also not talking about the types of differences that would result from having different knowledge, business priorities, or modeling style, as style can be addressed by rigorous standards.
The differences that are created every day in data models arise from how we choose words and conceptualize abstractions: aye, there’s the rub.
“The answer resides in how we form abstractions and the level of abstraction that we choose.”
For example, if we look at the two major types of automation systems that exist, there are control systems, which live in the tangible world of physical objects and operate mechanical equipment
(such as a B-2 bomber, which is far too complicated to fly without a computer), and information systems, which cross into the world of intangible ideas and concepts.
Between these two types of automation systems, the disciplines of design and automation appear to go in completely separate ways. From a process perspective, these two types of automation systems
truly belong to distinct paradigms having about a hundred fundamental differences between them (which is a different and exciting story).
From a data modeling perspective, however, control and information systems are not fundamentally different; they just abstract things differently. More precisely, in control systems we do not have
to abstract things at all; whereas in information systems, we abstract many things at many levels of abstraction, for both process and data.
From a data modeling perspective, control systems inherit concrete physical objects that occasionally differ in their names, while information systems inherit ambiguous language-based names that
can vary widely from one individual to the next, not only with their vocabulary, but with the way in which each individual abstracts an idea before labeling it with a name.
Existing engineering disciplines, such as normalization, reorganize attributes to minimize the redundancy of data values, but offer no ability to alter an abstraction, never mind render a consistent
set of abstractions among multiple disparate ones.
As a result, what have the great minds of architecture and design been missing? Although the answer has been elusive for a number of decades, the solution is surprisingly simple.
In the final analysis, it doesn’t even matter what particular name we assign a given thing, as long as we perform a set of basic steps to ensure that our abstractions are consistent and
correct before we begin the data normalization process. As such, the Four Rules of Abstraction are extremely straightforward.
The Rules of Abstraction
1st Abstract Form (1AF) – self dependence
Objects having business synonyms with the same business definition should be combined into the most representative term denoting the synonyms (e.g., Patron, Applicant, Client, Consumer, Purchaser and
Customer).
No matter what particular vocabulary we elect to use, it will be helpful to begin by managing the plethora of words and terms to make the task easier.
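As a rough illustration of 1AF, the sketch below maps each business synonym to one representative term. The synonym table, the choice of "Customer" as the representative term, and the function names are assumptions made for this example only.

```python
# Hypothetical synonym table: every business synonym that shares one
# business definition maps to a single representative term (1AF).
CANONICAL_TERM = {
    "patron": "Customer",
    "applicant": "Customer",
    "client": "Customer",
    "consumer": "Customer",
    "purchaser": "Customer",
    "customer": "Customer",
}


def to_canonical(term):
    """Return the representative term for a business synonym."""
    cleaned = term.strip()
    # Terms with no registered synonym are returned unchanged.
    return CANONICAL_TERM.get(cleaned.lower(), cleaned)


def consolidate(entity_names):
    """Collapse entity names to their distinct representative terms, keeping first-seen order."""
    result = []
    for name in entity_names:
        canonical = to_canonical(name)
        if canonical not in result:
            result.append(canonical)
    return result
```

Calling `consolidate(["Patron", "Client", "Invoice", "purchaser"])` leaves two objects to model, `Customer` and `Invoice`, rather than four.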
2nd Abstract Form (2AF) – time dependence
Objects having discrete business definitions that represent the same underlying object at different points in time should be combined into the most representative term denoting the object (e.g.,
prospect, lead, customer, former customer, deceased customer).
No matter where the object exists in time, the object is the same; the object simply has a different type or status associated with it to designate its placement within the life cycle. There are
also many situations where the object may exist in multiple places within the life cycle simultaneously. For example, an existing customer for one product may be a prospect for a second product
and a former customer for a third.
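One possible way to picture 2AF is a single object that carries a life-cycle status instead of a separate object per stage. In the sketch below the class names, the status values, and the decision to track status per product are illustrative assumptions, not a prescribed design.

```python
from dataclasses import dataclass, field
from enum import Enum


class LifecycleStatus(Enum):
    """Life-cycle positions that 2AF folds into one object (illustrative list)."""
    PROSPECT = "prospect"
    LEAD = "lead"
    CUSTOMER = "customer"
    FORMER_CUSTOMER = "former customer"
    DECEASED_CUSTOMER = "deceased customer"


@dataclass
class Customer:
    """One object for every life-cycle stage; the stage is just a status."""
    name: str
    # Assumption: status is tracked per product, so the same customer can
    # occupy several life-cycle positions at the same time.
    status_by_product: dict = field(default_factory=dict)

    def set_status(self, product, status):
        self.status_by_product[product] = status

    def statuses(self):
        """Return the distinct life-cycle positions this customer currently holds."""
        return set(self.status_by_product.values())
```

A customer with a `CUSTOMER` status for one product, `PROSPECT` for another, and `FORMER_CUSTOMER` for a third is still one `Customer` occurrence.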
3rd Abstract Form (3AF) – essential dependence
Objects having discrete business definitions that are analogous in some way to one another should be combined if the business attributes that uniquely identify them are the same (e.g., a
long-term treasury and a common stock uniquely identified by a CUSIP number).
No matter how different the objects may appear, when a combination of business attributes can be used to uniquely identify one occurrence of one object from one occurrence of another object, then
the abstraction may be considered valid. Of course, the attributes used to uniquely identify the objects must already have been accepted as attributes within the business community and industry.
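A minimal sketch of 3AF follows, assuming that the CUSIP is the only identifying attribute and that a simple type attribute is enough to tell the combined objects apart. The class and function names are hypothetical, and the identifier values used with it should be treated as placeholders rather than real CUSIP numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Security:
    """Combined object for instruments that share one unique business identifier."""
    cusip: str          # the shared identifying attribute
    security_type: str  # e.g. "long-term treasury" or "common stock"
    description: str


def build_index(securities):
    """Index securities by CUSIP, rejecting duplicates.

    If the identifier really is unique across the combined types,
    no two occurrences can collide, whatever their security_type.
    """
    index = {}
    for security in securities:
        if security.cusip in index:
            raise ValueError(f"duplicate CUSIP: {security.cusip}")
        index[security.cusip] = security
    return index
```

Because both kinds of instrument are found through the same key, a lookup by CUSIP never needs to know in advance which kind it will return.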
4th Abstract Form (4AF) – accidental dependence
Objects that combine other objects that do not share the same set of business attributes that uniquely identify them must be separated into their discrete individual objects (e.g., vendor,
insurance policyholder, and insurance policy beneficiary do not represent an appropriate abstraction).
The unique business identifier for a vendor may be its standardized business name and standardized primary business address, while for the policyholder (the customer) it may be the standardized
person name and standardized primary residential address, and for the beneficiary it may be the standardized person name, standardized birth date, and gender.
In this example, none of the unique business identifiers can be used to match one occurrence of one type of object with another. As a result, there is no reliable way to perform matching to
determine that the owner of a vendor may also be a major customer and/or a beneficiary at the same time, which can only contribute to confusion in the resulting operational database that is
modeled in such a poorly abstracted form.
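The 4AF check can be sketched as a comparison of identifying attribute sets. The attribute names below paraphrase the example above, and the dictionary and function names are assumptions made for illustration.

```python
# Hypothetical identifying attribute sets, paraphrasing the example above.
IDENTIFIERS = {
    "vendor": frozenset({"business_name", "business_address"}),
    "policyholder": frozenset({"person_name", "residential_address"}),
    "beneficiary": frozenset({"person_name", "birth_date", "gender"}),
}


def violates_4af(members, identifiers=IDENTIFIERS):
    """Return True if a combined object mixes members whose identifying attribute sets differ."""
    return len({identifiers[member] for member in members}) > 1


def split_by_identifier(members, identifiers=IDENTIFIERS):
    """Group members so that each group shares one identifying attribute set."""
    groups = {}
    for member in members:
        groups.setdefault(identifiers[member], []).append(member)
    return list(groups.values())
```

Under these assumptions, a single object combining vendor, policyholder, and beneficiary fails the check, and `split_by_identifier` returns three separate groups, one per object.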
Although the rules of abstraction may not be appropriate for individual user conceptual data models, where it is more important to capture the precise language of the business user, they do become
appropriate for a unified conceptual model and essential for a logical data model.
Once the rules of abstraction have been applied, tasks like mapping data elements to fields in metadata repositories and columns in databases become an easy exercise.
More importantly, however, the rules of abstraction can pave the way toward meeting many of the challenges we face, such as the discipline of software reuse and even SOA.
Put another way, using the rules of abstraction within information systems is a way to bring information systems one step closer to the tangible world of control systems, where
architecture and design are significantly easier in either an object-oriented or SOA approach.
In any event, please let me know if you enjoyed this article. Corrections, enhancements and suggestions are always welcome and are requested.