Using Check Constraints to Simulate Domains

All of the popular RDBMS products provide table check constraints: Oracle, Informix, DB2, Sybase, and Microsoft SQL Server. Check constraints enable enhanced data integrity without requiring
procedural logic (such as in stored procedures and triggers). Let’s examine the basics of relational table check constraints.

A constraint is basically a restriction placed upon the data values that can be stored in a column or columns of a table. Of course, RDBMS products have provided several different types of
constraints for a long time now. This includes referential constraints (to define primary and foreign keys) and unique constraints (to prohibit duplicates).

Check constraints place specific data value restrictions on the contents of a column through the specification of a Boolean expression. The expression is explicitly defined in the table DDL and is
formulated in much the same way that SQL WHERE clauses are formulated. Any attempt to modify the column data (i.e. during INSERT and UPDATE processing) will cause the expression to be evaluated. If
the modification conforms to the Boolean expression, the modification is permitted to continue. If not, the statement will fail with a constraint violation.

This functionality is great for simulating the relational concept of a domain. A domain is basically the set of valid values that a column or data type can take on. Check constraints only simulate
domains, though, because there are other features provided by domains that are not provided by check constraints. One such feature is that columns pooled from separate domains should not be
permitted to be compared or operated on by expressions requiring the same type of data (such as arithmetic). For domains to truly be supported the DBMS should support both check constraints and
user-defined data types with strong type checking. This prohibits allowing ridiculous operations, such as comparing IQ to shoe size or adding French francs to Euros.

Forming Check Constraints

Check constraints are written using recognizable SQL syntax. This makes them easy to implement for anyone who has even a passing familiarity with SQL. The check constraint usually consists of two
components: a constraint name and a check condition.

The constraint name is an SQL identifier and is used to reference or identify the constraint. The same constraint name can not be specified more than once for the same table. If a constraint name
is not explicitly coded, the DBMS typically will create a unique name automatically for the constraint.

The check condition defines the actual constraint logic. The check condition can be defined using any of the basic predicates

as well as BETWEEN, IN, LIKE, and NULL. Furthermore, AND and OR can be used to string conditions together.

There are, however, restrictions on how check constraints are formulated. These restrictions differ from DBMS to DBMS, but typical restrictions include:

  • Limitations on the entire length of the check condition.
  • Other tables may not be accessed in the check condition.
  • Limited SQL options are permitted usually resulting in subselects, column functions, host variables, and other DBMS-specific features to be prohibited in a check constraint.
  • One of the operands (usually the first) of the check constraint must be the name of a column contained in the table.
  • The other operand (usually the second) must be either another column name in the same table or a constant value.
  • If the second operand is a constant, it must be compatible with the data type of the first operand. If the second operand is a column, it must be the same data type as the first column
    specified.

Check Constraint Examples

Check constraints enable the DBA or database designer to specify more robust data integrity rules directly into the database. Consider the following example:

The CREATE statement for the EMP table contains three different check constraints:

  1. The name of the first check constraint for the EMP table is CHECK_EMPNO. It is defined on the EMPNO column. The constraint ensures that the EMPNO column can contain values that range from 100
    to 25000 (instead of the domain of all valid integers).
  2. The second check constraint for this table is on the EMP_TYPE column. This is an example of an unnamed constraint. Though this is possible, it is not recommended. It is best to always provide
    an explicit constraint name in order to ease identification and administration. This specific constraint restricts the values that can be placed into EMP_TYPE as: ‘TEMP’, ‘FULLTIME’, and
    ‘CONTRACT'; no other values would be accepted.
  3. The last check constraint on this table is named CHECK_SALARY. It effectively ensures that no employee can be entered with a salary of more than $50,000. (Now who would want to work there?)

Column vs. Table Level Constraints

The check constraint examples depicted thus far have shown column-level constraints. However, check constraints also may be coded at the table-level. A column-level check constraint is defined in
the DDL immediately after the column. Appropriately enough, a table-level check constraint is defined after all of the columns of the table have already been defined.

It is quite common for business rules to require access to multiple columns within a single table. When this situation occurs, it is wise to code the business rule into a check constraint at the
table-level, instead of at the column level. Of course, any column-level check constraint can also be defined at the table-level, as well. In terms of functionality, there is no difference between
an integrity constraint defined at the table-level and the same constraint defined at the column-level.

Let’s augment our sample table DDL to add two table-level check constraints:

The CREATE statement for the EMP table has been modified to contain two table-level check constraints having the following ramifications:

  1. The name of the first table-level check constraint for the EMP table is COMM_VS_SALARY. This constraint will ensure that no employee can earn more commission than salary.
  2. The second table-level check constraint is named COMM_BONUS. This constraint will ensure that every employee either earns a commission or a bonus (or possibly, both).

Check Constraint Benefits

So what are the benefits of check constraints? The primary benefit is the ability to enforce business rules directly in each database without requiring additional application logic. Once defined,
the business rule is physically implemented and can not be bypassed. Check constraints also provide the following benefits:

  • Because there is no additional programming required, DBAs can implement check constraints without involving the application programming staff. This effectively minimizes the amount of code that
    must be written by the programming staff. With the significant application backlog within most organizations, this can be the most crucial reason to utilize check constraints.
  • Check constraints provide better data integrity. As check constraints are always executed whenever the data in the column upon which they are defined is to be modified, the business rule is not
    bypassed during ad hoc processing and dynamic SQL. When business rules are enforced using application programming logic instead, the rules can not be checked during ad hoc processes.
  • Check constraints promote consistency. Because they are implemented once, in the table DDL, each constraint is always enforced.
  • Constraints written in application logic, on the other hand, must be executed by each program that modifies the data to which the constraint applies. This can cause code duplication and
    inconsistent maintenance resulting in inaccurate business rule support. Typically check constraints coded in DDL will outperform the corresponding application code.

The overall impact of check constraints will be to increase application development productivity.

Check Constraints, NULLs, and Defaults

An additional consideration for check constraints is the relational NULL. Any nullable column also defined with a check constraint can be set to null. When the column is set to null, the check
constraint evaluates to unknown. Because null indicates the lack of a value, the presence of a null will not violate the check constraint.

Additionally, most DBMS products provide the ability to specify defaults for table columns. The DBMS may provide system-defined defaults (pre-defined and automatically set by the DBMS) and
user-defined defaults. When a row is inserted or loaded into the table and no value is specified for the column, the column will be set to the value that has been identified in the column default
specification. For example, we could define a default for the EMP_TYPE column of our sample EMP table as follows:

If a row is inserted without specifying an EMP_TYPE, the column will default to the value, ‘FULLTIME’.

A problem can arise when using defaults with check constraints. Most DBMS products do not perform semantic checking on constraints and defaults. The DBMS, therefore, will allow the DBA to define
defaults that contradict check constraints. Furthermore, it is possible to define check constraints that contradict one another. Care must be taken to avoid creating this type of problem.

Examples of contradictory constraints are depicted below:

CHECK (EMPNO > 10 AND EMPNO <9)

In this case, no value is both greater than 10 and less than 9, so nothing could ever be inserted.

EMP_TYPE CHAR(8) DEFAULT ‘NEW’
CHECK (EMP_TYPE IN (‘TEMP’, ‘FULLTIME’, ‘CONTRACT’)),x

In this case, the default value is not one of the permitted EMP_TYPE values according to the defined constraint. No defaults would ever be inserted.

CHECK (EMPNO > 10)
CHECK (EMPNO >= 11)

In this case, the constraints are redundant. No logical harm is done, but both constraints will be checked, thereby impacting the performance of applications that modify the table in which the
constraints exist.

Other potential semantic problems could occur if the constraints contradicts a referential integrity DELETE or UPDATE rule, if two constraints are defined on the same column with contradictory
conditions, or if the constraint requires that the column be NULL, but the column is defined as NOT NULL.

Other Potential Hazards

The manner in which the DBMS enforces constraints differs from product to product. For example, certain utilities, such as the data loading may be permitted to bypass constraint checking. If this
is the case, constraint violations may occur that will need to be corrected after the load finishes. This may be simple if the DBMS provides a utility such as the DB2 CHECK utility to examine data
values and discard constraint violations. However, a utility of this type is not provided with every DBMS.

Other implementation differences exist from DBMS to DBMS, but the basic functionality of the check constraint remains the same: to validate data values that are being stored in the database.

Summary

Check constraints provide a very powerful vehicle for supporting business rules in the database. They can be used to simulate relational domains. Because check constraints are non-bypassable, they
provide better data integrity than corresponding logic programmed into the application. It is a wise course of action to use check constraints in your database designs to support data integrity,
domains, and business rules in all of your relational database applications.

Share

submit to reddit

About Craig Mullins

Craig S. Mullins is a data management strategist and principal consultant for Mullins Consulting, Inc. He has three decades of experience in the field of database management, including working with DB2 for z/OS since Version 1. Craig is also an IBM Information Champion and is the author of two books: DB2 Developer’s Guide and Database Administration:The Complete Guide to Practices and Procedures. You can contact Craig via his website.

Top