Agile Data Design August 2013

As I mentioned in my article No SQL and the “Data Dump”, I’ve been having numerous conversations with our developers about enforcing domain values in our databases. I’ve told the story of how some of our developers requested an “Entity Attribute”-type data structure to store data in. However, when I told them we would be using a regex validation trigger to enforce domain values on this data, the request was withdrawn. We went back to a traditional structure in which each attribute was defined as a separate column in the database table.I also get lots of requests to store all data, even dates and amounts, in nullable varchar (text) fields. The developer argument is that all their application needs to do is to store and display the data, and doesn’t care what the domain of the data is. My argument is always that the business needs (or may need) this data for other purposes, and needs it to be in a form that is both correct and consistent with the business view and definition of the data.

The latest installment in the Great Domain Debate surfaced just last week: I’m now being asked to allow developers to enforce data validation constraints in their applications, rather than enforcing them in the database. Their argument is that database constraints represent a potential failure point for the application that is outside their control. Having data constraints coded in the application would give them more control over how data validation errors are detected and handled.

My initial response was that this is precisely why we want data constraints defined and enforced in the database. What if the application is coded to simply ignore any data validation error that occurs, and write the data to the database anyway?

My other objections to doing data validation in the application (only) and not in the database are as follows:

  1. Application code can quickly become outdated, or degrade over time as people work on it. Code that works correctly in one version of the application may not work correctly in the next version.
  2. There would be no data validation for updates that were done outside the application (e.g., ad hoc or batch updates).
  3. Applications (and application developers) are more properly concerned with application functionality, not with data quality. No developer in the history of IT has ever been recognized or rewarded for his or her contributions to data quality! My book1 gives an example of an application that failed because critical data requirements were not captured, even though the functional requirements were captured correctly. The bottom line is that data quality concerns need to be addressed by people who are concerned (and knowledgeable) about data quality.
  4. When projects run short of time and money, and project scope needs to be reduced, what’s more likely to be cut – application functionality or data quality?
  5. All applications that update the database would have to have the same data validation code. This code would have to be tested for all applications, and changes to the code would have to be synchronized (and retested) for all applications. For this reason (as I point out in my book), putting the data validation in the database actually saves developers time and effort, and speeds up projects!

As noted at the beginning of this article, the basic premise of these requests is flawed. It assumes that the only purpose of a database is to provide a repository for application-specific data (a persistence store, in developer terms). As my company’s data architect, I hold fast the basic premise of data management – that all organizations have data and information needs that lie outside the boundaries of specific applications (e.g., market analysis and segmentation, process improvement, supply chain management, customer relationship management, quality improvement and so on). This means that our data must be meaningful (corresponding to a business meaning and purpose), correct (having a value that corresponds to a valid business state), and accessible (quickly and easily, to authorized users).

This begs the question, though, of what degree of domain definition needs to be enforced in the database. Do we need to validate all possible values of an attribute? To use an example from my previous article, if you’ve defined a License Plate Number attribute, and the possible values are of the form ABC123 (three capital letters followed by three numeric digits), do you need to validate incoming values using a regex pattern?

My stance on this is that the level of validation you must do in the database depends on the business definition of the attribute. If the business definition of the attribute requires that values be constrained to a fixed domain of values, then you can enforce this definition in the database using a code table, rule, or check constraint. If the business definition of the attribute requires that all valid values match a business-defined pattern, then you can enforce that using a regex validation trigger. Otherwise, you only need to enforce the standard attribute definitions of data type, data length, and nullability.

To put it another way, imagine a business user looking at the data for this attribute. What values will this user expect to see? What (erroneous) values would violate the user’s understanding of the business meaning of this attribute?

If you care about the shareability and reusability of your enterprise data, do not entrust the quality of your data to the applications that create it. Make sure that proper data validation constraints are defined and enforced in the database.

NOTE: I’d like to make this a dialogue, so please feel free to email questions, comments and concerns to me at Larry_Burns@comcast.net. Thanks for reading!

References:

  1. Burns, Larry. Building the Agile Database (Technics Publications, 2011).

Share

submit to reddit

About Larry Burns

Larry Burns has worked in IT for more than 25 years as a database administrator, application developer, consultant and teacher. He holds a B.S. in Mathematics from the University of Washington and a Masters degree in Software Engineering from Seattle University.  He currently works for a Fortune 500 company as a database consultant on numerous application development projects, and teaches a series of data management classes for application developers.  He was a contribut0r to DAMA International’s Data Management Body of Knowledge (DAMA-DMBOK), and is a former instructor and advisor in the certificate program for Data Resource Management at the University of Washington in Seattle.  You can contact him at Larry_Burns@comcast.net.

Top