Conquering the Logical-Physical Divide (Feb 2010)

I’ve received good news (of a sort) and bad news regarding our current Agile project: the good news is that we’ve found someone other than me to do the logical data modeling (see my previous article). The qualifier is that this will be the same person who is in charge of all our business intelligence (BI) and master data management (MDM) initiatives, so getting a sufficient amount of her time is going to be a challenge. The bad news is that the project has been having a hard time getting the requirements nailed down, and we still don’t have a data model! Without a data model, we don’t have a database; without a database, the developers can’t code. After weeks of meetings, enough of the functional specifications have emerged to enable the developers to mock up some screens; now we’re trying to backtrack from the functional specs to come up with the data model.

Many of you are probably thinking that this is putting the cart before the horse, and you are absolutely correct. Our current situation stands in stark contrast to the last Agile project I worked on. In that project, we were able to come to a sufficient understanding of the business data domain during the initial requirements workshops (and a couple of meetings with some domain experts afterward). I was able to produce a first draft of the logical data model during the initial pre-development Sprint [1]; that model then became the basis for all subsequent application analysis and design sessions during the development Sprints.

Many developers would probably shake their heads at this; but, in point of fact, driving the development process from the logical data model worked very well for us on that project. Far from being a useless design artifact, or an example of “Big Design Up Front,” the logical data model was used as a basis for discussion to make sure that everyone was talking about the same thing, using the same terms. It also became the means to document assumptions and decisions that were made in the course of the analysis and design sessions. Yes, these assumptions and decisions changed over the course of the project, but the logical data model enabled us to answer questions like:

Why did we think this was needed?

Why did we decide to do it this way?

We used the entity and attribute comment fields in our ERwin data model to capture these assumptions and decisions, along with the entity and attribute (business) definitions, which user story (or stories) they applied to, and sample data. We then used a homegrown application to extract this data, put it into document form, and write it to a PDF file. We posted each iteration of this document file (along with a PDF of the current data model) on the project’s SharePoint site so that people could easily print them out, or project them during meetings and discussions. We also made sure that printed copies were always available in the team room.
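The extraction step described above can be sketched roughly as follows. This is only an illustration, not our homegrown application: it assumes the model metadata has already been exported from the modeling tool into simple in-memory records, and the names Entity, Attribute and render_dictionary are hypothetical. It produces plain text rather than a PDF.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    """One attribute of an entity (hypothetical record, not an ERwin type)."""
    name: str
    definition: str
    sample_data: str = ""
    notes: List[str] = field(default_factory=list)  # assumptions and decisions


@dataclass
class Entity:
    """One entity of the logical model (hypothetical record)."""
    name: str
    definition: str
    user_stories: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)  # assumptions and decisions
    attributes: List[Attribute] = field(default_factory=list)


def render_dictionary(entities: List[Entity]) -> str:
    """Render the model metadata as plain text, one section per entity."""
    lines: List[str] = []
    # Sort by name so each iteration of the document is easy to compare.
    for entity in sorted(entities, key=lambda e: e.name):
        lines.append(f"ENTITY: {entity.name}")
        lines.append(f"  Definition: {entity.definition}")
        if entity.user_stories:
            lines.append(f"  User stories: {', '.join(entity.user_stories)}")
        for note in entity.notes:
            lines.append(f"  Note: {note}")
        for attr in entity.attributes:
            lines.append(f"  ATTRIBUTE: {attr.name}")
            lines.append(f"    Definition: {attr.definition}")
            if attr.sample_data:
                lines.append(f"    Sample data: {attr.sample_data}")
            for note in attr.notes:
                lines.append(f"    Note: {note}")
        lines.append("")  # blank line between entities
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    customer = Entity(
        name="Customer",
        definition="A party that buys goods or services.",
        user_stories=["US-12"],
        notes=["Assumed one billing address per customer."],
        attributes=[
            Attribute("Customer Name", "Legal name of the customer.", "Acme Corp"),
        ],
    )
    print(render_dictionary([customer]))
```

Keeping the assumptions and decisions next to the definitions they affect is the point of the exercise: the generated document can then answer the "why" questions without anyone having to open the modeling tool.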

The difference between these two projects, I think, is that the previous project involved existing and well-understood business processes, while the current project is designed to support a new business process that, as yet, is not very well understood. This is actually the sort of project that the Agile methodology is designed for; the development of the application design and implementation is supposed to progress iteratively as the team’s understanding of the underlying business process grows. This project, however, has an inflexible delivery date, and the time spent trying to develop and understand the business process is severely impacting our schedule. This puts an even greater pressure on our group to get a first-cut data model (and database) delivered as quickly as possible.

Working on an Agile project is probably more difficult for data modelers than for DBAs. DBAs are more used to working iteratively, even on traditional “waterfall” projects (I’m not aware of any DBA who has implemented one and only one version of an application database!). However, the instinct of a data modeler is to “stop the presses” until all aspects of a business process are completely understood and correctly modeled. This will cause conflicts with a development team that may have only two or three weeks to implement some subset of application functionality.

It’s important, I think, for data modelers in particular to understand a couple of things: first, there has to come a point in each Sprint where the team as a whole says, “We don’t understand everything, but we understand enough to be able to begin the current set of user stories.” As I say in my book, “There are two ways to do anything – do nothing until you completely understand the problem, or do something with the understanding you have, and let your understanding of the problem develop as you solve it.” Those of you who do carpentry or landscaping (like I do) will understand what I’m talking about.

The second thing to understand is that the logical data model (as opposed to the database or the application code) is comparatively easy to refactor (i.e., change). It doesn’t take much time (in a tool such as ERwin) to add or delete entities, move or copy attributes, redraw relationship lines, etc. This makes it easy (and relatively painless) to treat the logical data model as a living document, and increases its usability as an application development artifact.

Being willing to develop the data model iteratively, instead of entirely up front, reduces the risk of having the application design get out in front of the data design (i.e., putting the cart before the horse). The data model, as a living and iteratively developing artifact, becomes much more useful to the dev team as a basis for discussion, analysis, design, decision making, and implementation.

By approaching data modeling in an Agile fashion, and by demonstrating the value of the data model as an application development artifact to the project team, data modelers can greatly increase their value to the development effort and help ensure that the data design drives the application design, not the other way around.

NOTE: I’d like to make this a dialogue, so please feel free to email questions, comments and concerns to me. Thanks for reading!

End Note

  1. The initial Sprint of an Agile project is sometimes, but not always, referred to as “Sprint 0.” Regardless, this first Sprint is usually dedicated to defining the application architecture, agreeing on development standards and tools, and creating the development environment. Since little or no actual development work occurs during this Sprint, it provides an excellent opportunity to produce a first cut of the conceptual and logical data models.



About Larry Burns

Larry Burns has worked in IT for more than 25 years as a database administrator, application developer, consultant and teacher. He holds a B.S. in Mathematics from the University of Washington and a Master's degree in Software Engineering from Seattle University. He currently works for a Fortune 500 company as a database consultant on numerous application development projects, and teaches a series of data management classes for application developers. He was a contributor to DAMA International’s Data Management Body of Knowledge (DAMA-DMBOK), and is a former instructor and advisor in the certificate program for Data Resource Management at the University of Washington in Seattle. You can contact him at