Conquering the Logical-Physical Divide – August 2009

In my previous article1, I described an approach to database design and development that depends on twocentral tenets:

A separation between the requirements-oriented activity of logical data modeling and the design-oriented activity of physical database design (in other words, recognizing that the logical data model and the physical database design are two separate artifacts, created for two different reasons).
The concept of a Virtual Data Layer (VDL), a collection of virtual data objects (such as views, functions, stored procedures and triggers) that abstracts and encapsulates data from the underlying (normalized) tables.

The intent of this approach is to avoid creating a database that is tightly coupled to a specific application (or an application that is tightly coupled to a database schema). This tightcoupling of application code to the (normalized) database schema can lead to the following problems:

The application has to do a lot of work in order to map the data properties of application objects to the normalized database schema (the so-called “object-relational impedance mismatch”).
The database schema can’t be changed without breaking the application.
It becomes difficult to extend the database design to support new application functionality.
The business fails to realize the economic value of data reuse.

When I’ve presented this approach in articles and talks, I’ve had to respond to a certain number of objections (after all, there is no perfect approach in an imperfect world!). Theseobjections can be sorted out into two general areas of concern:

Concerns about the advisability of putting “intelligence” into the database.
Advice that technology now exists to perform data virtualization within the application (using, for example, Microsoft’s LINQ and Entity Framework, Java’s JAXB, XMLBeans, etc.), rather than in the database.

My response to both of these concerns can be summarized in one of my often-quoted maxims: Let the database do the data work! In other words, make use ofthe resources on the database server (memory, CPU, etc.) and the capabilities of the DBMS (e.g., the query optimizer) to split your application workload and improve both the performance and thescalability of your application (the N-tier approach). Taking this approach:

Improves performance by taking advantage of the DBMS’s ability to pre-define query plans and table joins (and optimize them for maximum efficiency).
Helps ensure data integrity, by enforcing business data rules at the database level (rather than trusting the application code to enforce these constraints over time).
Reduces application development and maintenance costs, by making it unnecessary to replicate and maintain database code across all the applications that use a particular database. For example, by enforcing primary key (PK) and foreign key (FK) constraints in the database, applications don’t have to worry about enforcing these constraints consistently in the code.
Reduces network bandwidth (and improves both performance and scalability), by doing data retrieval and filtering on the database server, and sending the application only the data it actually needs.
Helps ensure that database updates are done consistently and that considerations of concurrency (making sure that two different users can’t update the same record at the same time) and transaction management (making sure that all updates related to a single unit of business work are either committed to the database or rolled back in their entirety) are managed correctly.
Helps ensure data security by restricting access to database objects (views, tables, functions, stored procedures) to defined roles (or groups). This also helps defend against hacking and SQL injection attacks.
Improves the maintainability of applications by making it easier to identify and resolve errors and performance issues related to database code. Most DBMSs provide many tools (such as Microsoft’s SQL Profiler, Database Tuning Advisor, and SQL Query Analyzer) to help quickly and easily resolve such issues.
Makes it easier to change code and data interfaces “on the fly”, in response to problems or changing application requirements. It is much easier to refactor a view, function, or procedure (in most cases) than it is to refactor application code or a database schema!

With all these advantages to consider, why would anyone not want to put data-related code on the database server? One objection I’ve heard is thatremoving all intelligence (code) from the database makes it easier to port it to a different DBMS. However, I’m not sure that this is really the problem we should be trying to solve! Moving from oneDBMS to another is not that difficult, and the various DBMS vendors are more than happy to provide tools, techniques and consulting help to assist with moving from a competitor’s product. The realissue, I think, is whether you regard a database as a business asset, or as simply an application “persistence” mechanism. In either case, you have to think about what makes the mostsense economically. If you think of a database as a business asset, then the economic problem lies in making data easily reusable, and in making databases economically viable. As I’ve pointed outbefore, a database that supports only a single application is probably costing the company more money to design, build, and maintain than it is deriving in economic value (i.e., most companies areprobably losing money on their databases!). The only way to derive significant economic value from data is to continually find ways of reusing data to drive innovation and create value. You can’t dothat (at least, not economically) when the data is tightly coupled to a single application, or when you’re relying on applications to ensure the quality and business relevance of the data. Askanyone who’s ever worked on a data warehouse or BI project!

However, if you regard the database as merely a persistence mechanism for an application, you have a different economic problem. DBMSs (most of them, at least) are quite expensive, and a databasethat supports only a single application (and whose data isn’t, or can’t be, reused) will have almost zero ROI. This may be OK for, say, a web application using an open-source databaseproduct like MySQL; but one of the groups I’ve talked to, from a company that was adamant about not putting any “intelligence” into their databases, was using Oracle! That’s alot of money to spend for a simple “persistence engine”!

In our organization, we work hard to maximize the economic value of our databases at the lowest possible cost. For example, we can auto-generate the database schema (including all business ruleconstraints, indexes, triggers, etc.) and database views directly from our modeling tool. We also have a program that auto-generates CRUD (Create, Read, Update, and Delete) stored procedures from thedatabase schema; these procedures can be used by any application and ensure both transactional integrity and concurrency with minimal locking. This eliminates the cost of writing, testing, andmaintaining application code (in every application that uses the database) to perform these functions, and helps ensure data integrity (and reusability) at minimal cost.

The bottom line is, it makes more sense to do the data work (at least, most of it) in the DBMS. By enabling this work to be done quickly and easily, at low cost, in a way that makes data easy toconsume and reuse, DBAs and other data professionals can add significant business value to their organizations.

NOTE: I’d like to make this a dialogue, so please feel free to email questions, comments and concerns to me. Thanks for reading!

End Notes:

Larry Burns, The Logical-Physical Divide, published May 1, 2009 in TDAN, The Data Administration Newsletter.

MenuMenu

Share this post

Larry Burns