I always enjoy reading Scott Ambler’s books and articles on Agile Development, and as a DBA, I’m particularly interested in his “agile” approach to data administration (as expressed, for
example, in his book Agile Database Techniques). However, there is an important technique for creating truly agile databases that I haven’t seen in his writings; one which is well-known
to DBAs, but little understood in the application development community – the use of database views to create a layer of abstraction between the normalized relational tables and the applications
that use the data.
Let’s start with what Mr. Ambler calls the “object-relational impedance mismatch”. What does this mean, and why does it exist? Data structures in a relational database are designed and built as
a normalized association of base tables. This is necessary to ensure data integrity, eliminate data redundancy, improve the performance of database updates, and promote the reuse of data by
multiple applications. But most application objects (class objects) aren’t normalized; they see data in more generalized terms. Take, for example, the Karate School data model in Scott’s May 2005
article in Software Development Magazine (incidentally, I have to quibble with his terminology here; I think this is a logical data model, not a physical one). The model shows the data
split into several tables, such as PaymentType, BeltAttempt, and MembershipHold. But it’s safe to say that the application objects aren’t set up this way; I would venture to guess there are only
two object classes here – a Student class and a Payment class. The “impedance mismatch” results from trying to map the non-normalized view of data in the class model to the normalized structures
in the data model.
Developers find having to work with normalized data structures awkward and time-consuming, since it involves coding complex SQL queries that join data from multiple tables. But storing data in
normalized form is necessary, for the reasons mentioned above. There’s another reason as well: “refactoring” non-normalized data structures into normalized ones after the fact is always
extremely difficult and labor-intensive, and sometimes isn’t even possible (because data in non-key fields must be “refactored” into key fields, and data in these fields may have missing or
Now here’s the good news: you don’t have to work with the normalized tables! In fact, it isn’t necessary for applications (and developers) to even be aware of their existence. It is
very easy (at least for DBAs) to create an abstraction layer of joined views on top of the normalized data tables, putting the base tables completely “behind the scenes”, and out of sight.
A view is a database object that looks and functions exactly like a table, except that it can contain data from multiple base tables. Data in a view can be reformatted and renamed (there’s a way
around those pesky DBA naming standards!), creating a data object that can map exactly to an object class – hence, no more “impedance mismatch”!
Figure 1 shows a simplified example of a normalized data structure for a timekeeping and task scheduling application. Figure 2 shows a view that merges data from these tables into
a single virtual “table”. Figure 3 shows the SQL code that creates the view (Note: the work of creating the views should be done by DBAs when creating the database, based on the data
requirements of the class model, before the database is turned over to developers). As you can see, many of the “columns” (fields) in the view have been renamed, and transformations have been
applied to some of the data formats (for example, the OTApproved field has been changed from a binary bit to a “Yes” or “No” text field). The database identifier keys have been masked
in the view, and replaced with actual data values (e.g., ProjectID has been replaced with ProjectDescription). The various components of the EmployeeName
(EmployeeFirstName and EmployeeLastName) have been joined together. Figure 4 shows the mapping from an EmployeeTask object class to the new
EmployeeTasks view; note the disappearance of the “impedance mismatch”!
Views create a layer of abstraction in the database that enables applications (and developers) to avoid having to deal with the “nuts and bolts” of the actual implementation and maintenance of
the data (probably the greatest single source of friction between developers and data people). In object terms, views provide a means of encapsulating data access without introducing coupling
(i.e., denormalization) or diminishing cohesion in the database. They have numerous other advantages as well:
- A query plan is created in the database for each view; the joins are pre-optimized for maximum speed of access. I’ve been able to improve the performance of applications significantly by
replacing complex joins in SQL code with references to views.
- Views will make use of indexes that have been created on the underlying base tables. You can also create indexes directly on views (although the DBMS vendors need to do a better job of
supporting this feature).
- You can create multiple application-specific views of the same data, enabling a single database to easily support multiple applications and other reuse of the data (such as business analysis,
CRM, and quality improvement).
- Applications need no longer be constrained by database naming standards and conventions.
- Different groups of users can have different views of the data for increased security. For example, in a Human Resources application, data pertaining to employee salaries and employee training
can be encapsulated into separate views, with different access permissions for each.
- Queries on views are much simpler to write (no joins!); stored procedures also become much easier to write. Support for data encryption is improved (since most encryption tools work through
columns on views, not base tables). Views provide a means of cleansing, reformatting, and transforming legacy data without the expense of data re-engineering, promoting the reuse of legacy data.
- Views can be used to reformat relational data into hierarchical (XML) format (see Figure 5), making it easier to create XML-based applications.
Most importantly, views provide a means for making databases truly “agile”. With views, you can change the way application objects see, and work with, data on the fly, without the time and
expense of costly “refactoring” efforts. If you want to add, remove, or rename a column, change a column format, or bring in new data from another source, it’s as easy as editing and recompiling
the view (or creating a new view, if you don’t want to interfere with other developers and applications). Applications can be migrated over time from one view to another, with both views supported
in the database during the deprecation (transition) period.
Of course there are some issues, at least for DBAs, in this approach: views have to be written and maintained (possibly written by DBAs and maintained by developers?), and all major database tools
(including the vendors’) are table-based. You can easily generate DDL to create tables, indexes and triggers from data models and create change scripts to implement table changes in multiple
environments, but vendor (and tool) support for views is minimal, which makes working with views more labor-intensive (and costly).
Views have other limitations as well: although you can combine local data with data in other databases (called foreign views), you can’t update local data through views that reference non-local
data (why not, I wonder?), nor can you create primary key / foreign key relationships involving views (again, why not?).
Another major issue for DBAs is the problem of updating the base tables in the database through updates on the views. Where I work, the general approach to database updating involves the use of
stored procedures (what we call fundamental stored procedures, or FSPs), the code for which can be automatically generated from database tables, and I am currently working on extending this
capability to views. We also created a .NET component to encapsulate the ADO.NET code required to invoke these procedures, in order to make their use easier for developers.
It is also possible to handle the table updates directly from updates on the views, through the use of what are called Instead-Of triggers. Instead-Of triggers are pieces of SQL code that function
like stored procedures, except that they are invoked automatically whenever an INSERT, UPDATE, or DELETE command is executed against a view. Unlike normal database triggers, Instead-Of triggers
actually do the work of performing the database update (that’s why they’re called “instead-of” triggers), and they can take the data from the view and split it into separate updates for each of
the base tables involved. Instead-Of triggers can also handle transaction management (making sure that all table updates are either committed or rolled back as a single unit of work), concurrency
control (handling multiple updates on the same view from multiple users), and error reporting (throwing exceptions back to the application if a database error occurs), the same way our FSPs do.
However, these Instead-Of triggers have to be coded manually; currently, we do not have an automated means of generating this code. This involves more DBA time and work (which can impact
development schedules and budgets), and introduces the possibility of coding errors. The trigger code must also be manually maintained over time, to match changes to the views and/or base tables.
However, this does at least transfer the work of dealing with the “object-relational impedance mismatch” from the developers to the DBAs, where it belongs.
In spite of the issues and obstacles involved, I firmly believe that the use of views in databases affords the best hope of bringing developers and DBAs together, facilitating the application
development process, and bridging the object-relational gap. I encourage developers and DBAs everywhere to seriously consider this approach.