Defining Keys

2006 Published in TDAN.com July 2006


Introduction

We all know that keys are essential to relational structures. It is crucial that we be able to differentiate between one instance and another. Consider the following:

Person
Name
Jim Smith

Name here is inadequate as a key; consider what happens when we add additional instance with the same name:

Person
Name
Jim Smith
Jim Smith

How do we tell the differences between the two? Can we assume that we can use occurrence? The second instance was inserted before the first (just to be perverse). In a relational structure you
cannot dependent on occurrence to differentiate between instances.

The solution here is to:

  • Identify ‘natural keys’ or
  • Develop ‘surrogate keys’


Natural Keys

A natural key is a collection of attributes that basically insure that an instance is uniquely identified in all instances. Natural keys can be hard to come by. An attribute is an inherent
characteristic. For example the only inherent characteristics that uniquely identify a person are biometric in nature (e.g. DNA signatures, fingerprints etc). Unfortunately, the biometric
technology is not mature enough to use these characteristics as natural neys. As a consequence, Person entities are almost always assigned a surrogate key.

Lets us then examine an entity that has readily identifiable Natural Keys; Automobile..

ISO 3779 mandates a structure for Vehicle Identification Numbers to be used world wide to uniquely identify Automobiles. The mandated structure for a VIN is:

World Manufacturer Identifier
Vehicle Descriptor Section
Vehicle Identifier Section

The Automobile entity would look like:

Automobile
World Manufacturer Identifier
(Primary Key Attribute) Vehicle Descriptor Section
(Primary Key Attribute) Vehicle Identifier Section
(Primary Key Attribute) Model Body Style Engine Displacement Color
3VW 23G4X2 10984736 Cabroliet 2 door 1400 Red
9BW 23G4X2 10984736 Cabroliet 2 door 1400 Red

Use of the ISO VIN structure permits us to identify 2 occurrences of an Automobile that are otherwise exactly identical.

Natural Keys Selection Rules

  • Must ensure uniqueness
  • Cannot be null in whole or in part
  • Must have an invariant value
  • Must have an invariant format

Must ensure uniqueness

The previous discussion dealt with why occurrences of attributes must be unique. To re-iterate, we have to have a mechanism that allows us to differentiate between two otherwise identical
occurrences.

Cannot be null in whole or in part

In a relational structure, a null value equates to an ‘unknown’ value. If one allows key attributes to assume null values, once the value for the instance becomes ‘known’, then a the
possibility of a resulting duplicate exists.

Must have an invariant value

Once a value is assigned to a primary key attribute, it cannot change. Typically RDBMS’s do not handle changes in key values well. Furthermore, changes in primary key values will cause cascading
change to foreign key values in related tables. Consider the following

Invoice
Invoice Number
(Primary Key Attribute) Invoice Date
123
3/28/2006

Invoice Line
Invoice Number
(Primary Key Attribute)
(Foreign Key to Invoice)
Product Number
(Primary Key Attribute)
(Foreign Key to Invoice)
Quantity

Let us change the value for the Invoice instance from 123 to 456. In order to maintain the relation integrity, we have to change the foreign key reference in Invoice Line. Not that invoice number
is part of a concatenated primary key. As a result of changing the value of the invoice number foreign key reference in Invoice Line, we also have to change any foreign key references to Invoice
Line and so on and so forth.

Must have an invariant format

The comments made regarding invariant values also apply to the format for the primary key attribute.


Surrogate Keys

A ‘surrogate key’ is an arbitrary artificial attribute / column. In tables they quite often are implemented as identity columns

So when does one use a ‘surrogate key’? There are two circumstances where the use of a ‘surrogate key’ is required:

  • No suitable natural key for the entity exits or
  • A concatenated key is so lengthy that performance is adversely affected.

No suitable natural key exists

Let us consider our Person entity again.

Person
Name
Jim Smith
Jim Smith

As mentioned previously, Person does not typically have a suitable natural key attribute. So what we do is assign a arbitrary attribute to the entity to ensure that we will always have a unique
value.

Person
Person Id
(Primary Key)
(Identity)
Name
1
Jim Smith
2
Jim Smith

Too Long a concatenated key

Let us assume that an entity has a concatenated key {CK1, CK2, CK3, … CK255}. A table implemented with 255 columns in its primary key index would exhibit unacceptably poor performance in most
RDBMS’s. So quite often a ‘surrogate key’ column is created for the table and the concatenated key becomes a unique alternate index. At what point a surrogate key becomes necessary varies from
RDBMS to RDBMS. The decision to define a surrogate key column should belong to the DBA who is better equipped to evaluate the performance issues.

Copyright © Maximum Business Solutions

Share

submit to reddit

About James O'Brien

James O'Brien is a IT consultant with over 30 years of experience in the industry. For the last 10 years he has specialized in Data Architecture. His clients included numerous fortune 500 clients both in the United States, Canada and Europe. As a certified product instructor he taught classes in Data Modeling and Model Management all over the United States.

Top