Data-Driven and Agile Methods: Can We Get Along?

“Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be
obvious.”

Frederick P. Brooks Jr.
The Mythical Man-Month: Essays on Software Engineering
 

Anyone in our profession who has not yet read this classic book should stop reading this article immediately and order the book from Amazon. Brooks was none other than the project manager for a
modest IBM mainframe software product originally known as OS/360, the ancestor of MVS.

Brooks wrote this more than thirty years ago. Fast forward to TDAN’s tenth anniversary in 2007, and “flowcharts” could be replaced with “state-transition diagrams, use case
diagrams, activity diagrams, sequence diagrams” and so on.

This twenty-first century has also seen some competition – if not outright contentiousness – between data management principles and latest-and-greatest application development practices
such as Agile methods. The Agile Manifesto (www.agilemanifesto.org) espouses the following principles (with editorial elaborations by the author):

  • Early, frequent, and continuous delivery of valuable [data and enabling] software
  • Assimilation of changing requirements, even late in development
  • Business people and developers working together daily
  • Working [data and enabling] software as the primary measure of progress
  • Sponsors, developers, and users able to maintain a constant pace indefinitely [i.e., consistency, repeatability and predictability]
  • Continuous attention to technical excellence and good design [of data and enabling software]
  • Simplicity – the art of maximizing the amount of work [e.g., accidentally complex data and software] not done – is essential

Anyone who has built an application using Microsoft Access (come on, admit it, you have) has more than likely applied data-driven methods¹. One can develop screens and reports in Access
without having first defined the underlying tables; but based on the development “flow” built into Access, that would seem backwards and would actually be significantly more difficult.
Steps in developing an Access application usually go somewhat like this:

  1. Create the database tables based on the data required for the application.
  2. Build queries over the database tables.
  3. Build screens and reports over the queries.

It really is that simple, for a basic, stand-alone application. Chalk one up for supporting the Agile requirement for early delivery of [data and enabling] software.
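
The same three steps can be sketched outside Access. What follows is a minimal illustration only, assuming Python's built-in sqlite3 module as a stand-in for Access; the table, view and function names (product, order_line, order_line_detail, order_report) and the sample rows are hypothetical, not taken from any real application.

```python
import sqlite3

# Hypothetical miniature schema, used only to illustrate the three steps.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Step 1: tables based on the data the application requires.
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        unit_price REAL NOT NULL
    );
    CREATE TABLE order_line (
        order_id   INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        quantity   INTEGER NOT NULL
    );
    -- Step 2: a query (here, a view) over the tables.
    CREATE VIEW order_line_detail AS
        SELECT ol.order_id, p.name, ol.quantity
        FROM order_line AS ol
        JOIN product AS p ON p.product_id = ol.product_id;
""")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, "Chai", 18.0), (2, "Tofu", 23.25)])
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                 [(100, 1, 10), (100, 2, 4)])


def order_report(order_id):
    """Step 3: a 'report' built over the query, not over the base tables."""
    rows = conn.execute(
        "SELECT name, quantity FROM order_line_detail "
        "WHERE order_id = ? ORDER BY name",
        (order_id,),
    ).fetchall()
    return [f"{name}: {quantity}" for name, quantity in rows]


print("\n".join(order_report(100)))
```

The report never touches the base tables directly, so a change to the tables only has to be absorbed by the view.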

Things get a little more complicated when scaling up to a true enterprise-class application. Even before we reach implementation-platform, security and deployment issues, the earliest
complicating factors relate to business rules and data logistics².

It is quite advantageous to think of all business rules as data constraints, either mandatory or conditional³, that can be specified as part of a data model. Ron Ross, Barbara von Halle,
David Hay and Malcolm Chisholm have written extensively and convincingly on how the conventional entity-relationship metamodel can be extended to capture business rules. Regarding business rules as
purely data constraints allows rules to be specified very precisely and clearly integrated with “structural” data specifications. Adding rules/constraints to our example Access
application as they are discovered supports the Agile goal of frequent and continuous successive deliveries of valuable [data and enabling] software, as well as the goal of responsive assimilation
of changing requirements.
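
To make the distinction between mandatory and conditional constraints concrete, here is a small sketch, again assuming Python's sqlite3 module rather than Access. The customer_order table, its columns, and the rule "a shipped order must carry a shipped date" are invented for illustration; the helper accepts() is likewise hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_order (
        order_id     INTEGER PRIMARY KEY,
        -- Mandatory constraints: these hold for every row.
        customer     TEXT NOT NULL,
        status       TEXT NOT NULL CHECK (status IN ('open', 'shipped')),
        shipped_date TEXT,
        -- Conditional constraint: IF status is 'shipped' THEN a date is required.
        CHECK (status <> 'shipped' OR shipped_date IS NOT NULL)
    )
""")


def accepts(customer, status, shipped_date):
    """Return True if the database accepts the row, False if a rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer_order (customer, status, shipped_date) "
                "VALUES (?, ?, ?)",
                (customer, status, shipped_date),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(accepts("Acme", "open", None))        # satisfies every rule
print(accepts("Acme", "shipped", None))     # violates the conditional rule
```

Because both kinds of rule live in the table definition, a newly discovered rule becomes one more constraint on the table rather than a change to every screen that writes to it.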

And how would we be discovering these rules/constraints over time? By means of another Agile principle – business people and developers working together daily, specifically by collaborating
on iterative walkthroughs of an evolving prototype of our Access application. The order-entry screen shown in Figure 1 is an example of how rules can be captured and implemented directly in the
application as data constraints.

Figure 1: Rules Captured and Implemented as Data Constraints

The rule might be stated as “each order line must reference a product included in the current product list.” This rule is, of course, implemented initially as a database referential
constraint; but it can then be further defined, directly in the database, by constraining the foreign key column in the Order Line table to a value list taken from the Product table. This same
value list is then available to any screen utilizing the Order Line table.

Other column value constraints include, for example, calculations defined in queries/views, as shown in Figure 2⁴:

Figure 2: Calculations Defined in Queries/Views
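
In code form, a calculation defined in a view might look like the sketch below. It assumes Python's sqlite3 module rather than Access, and the order_line columns, the order_line_extended view, and the extended-price formula are illustrative assumptions rather than a transcription of the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_line (
        order_id   INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        unit_price REAL    NOT NULL,
        quantity   INTEGER NOT NULL,
        discount   REAL    NOT NULL DEFAULT 0
    );
    -- The calculation is defined once, in the view: the dependent column
    -- extended_price is a function of the three independent columns.
    CREATE VIEW order_line_extended AS
        SELECT order_id, product_id, unit_price, quantity, discount,
               unit_price * quantity * (1 - discount) AS extended_price
        FROM order_line;
    INSERT INTO order_line VALUES
        (100, 1, 18.0, 10, 0.0),
        (100, 2, 20.0,  4, 0.25);
""")


def extended_prices(order_id):
    """Read the derived values; no screen or report recomputes them."""
    return conn.execute(
        "SELECT product_id, extended_price FROM order_line_extended "
        "WHERE order_id = ? ORDER BY product_id",
        (order_id,),
    ).fetchall()


print(extended_prices(100))
```

Every screen or report that reads the view sees the same derived value, so the calculation cannot drift between one part of the application and another.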

Data entry screens that include field validations and calculations (defined in the database, of course) support the Agile goal of using working [data and enabling] software as the primary measure
of progress.

Using these and other data-driven methods, sponsors, developers and users are able to maintain a constant pace indefinitely – enabling projects to achieve consistency, repeatability and
predictability.

Another Agile principle is that of favoring working software over comprehensive documentation. In a data-driven application such as our example, we require no use case diagrams, activity diagrams,
sequence diagrams – just an enhanced data model. It’s all in there. Other data-driven tools even enable text definitions from the data model to be displayed as help text on screens
– an example of comprehensive documentation actually becoming working software. (I haven’t yet been able to figure out how to do this with Access, but I’m sure
it’s possible.)
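
One way the help-text idea could work is sketched below. This is not how Access or any particular data-driven tool does it; it assumes Python's sqlite3 module and a hypothetical data_dictionary table that holds the text definitions from the data model alongside the data itself, with a hypothetical help_text() lookup that a screen would call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical dictionary table: one definition per column in the model.
    CREATE TABLE data_dictionary (
        table_name  TEXT NOT NULL,
        column_name TEXT NOT NULL,
        definition  TEXT NOT NULL,
        PRIMARY KEY (table_name, column_name)
    );
    INSERT INTO data_dictionary VALUES
        ('order_line', 'quantity',
         'Number of units of the product ordered on this line.'),
        ('order_line', 'discount',
         'Fraction of the unit price deducted, from 0 to 1.');
""")


def help_text(table_name, column_name):
    """Return the model's definition for a field, or '' if none is recorded."""
    row = conn.execute(
        "SELECT definition FROM data_dictionary "
        "WHERE table_name = ? AND column_name = ?",
        (table_name, column_name),
    ).fetchone()
    return row[0] if row else ""


# A screen would show this next to the field it describes.
print(help_text("order_line", "quantity"))
```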

The little Access example here (the Northwind database that comes with the product) is, of course, somewhat tongue-in-cheek. But I hope to have demonstrated that not only is it possible for
data-driven principles to be reconciled with those espoused by Agile enthusiasts, but also that these data-driven methods are agile in their own right. And if – as Fred Brooks suggests
– the usefulness of multiple models is questionable above and beyond an enhanced data model, we’ve upheld the Agile preferences for working software and maximizing the work not done,
and at the same time embedded the essential data documentation in the application where it can achieve the greatest payback.

Endnotes:

  1. Before going any further we should probably dispense with some irrelevancies. Of course, MS Access is not taken seriously by either developers or data professionals as an application
    development platform, primarily because it does not typically produce scalable applications. But the reasons for this are purely technical, having nothing whatsoever to do with its implicit
    underlying development method.
  2. A topic for at least one other article (if not a book).
  3. Several business rules classification schemes have been put forward by the folks who write extensively on the subject. I would argue that anything considered a business rule that is actionable
    in any relevant way by a computer system can be classified as either a mandatory or conditional data constraint on a dependent variable or relation.
  4. A calculation (or derivation) being a constraint upon the possible values taken by a dependent (output) variable, defined as a function of independent (input) variables.


About Bill Lewis

Bill is a Data Architect with IBM Global Business Solutions. His more than 25 years of information technology experience span the financial services, energy, health care, software and consulting industries. In addition to his current specializations in data management, metadata management and business intelligence, he has been a leading-edge practitioner and thought leader on topics ranging from software development tools to IT architecture. He has contributed to numerous online and print publications, and is the author of Data Warehousing and E-Commerce. He can be reached at lewisw@us.ibm.com.
