Tales & Tips from the Trenches: Agile Principles in Data Projects, Part 2

The Agile Data Disconnect

Why do data practitioners, and especially data modelers, typically run away from Agile?

In traditional projects that use Relational Database Management Systems (RDBMSs), the data model is the foundation upon which the software is built: developers need a place to store the data. Because the data model is the foundation, a data model change forces what programmers call “refactoring”: a cascade of changes rippling through the code. Developers are unhappy with refactoring because the functionality often remains exactly the same, yet because the underlying data model has changed, they still have to change their code. They may perceive the structural change as unnecessary.
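A tiny illustration of the effect (the table, columns, and functions are invented for this sketch): splitting one `name` column into `first_name`/`last_name` changes nothing the user can see, yet every piece of code that read the old column must still be rewritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada Lovelace')")

# Application code written against the original model.
def full_name_v1(conn, customer_id):
    row = conn.execute(
        "SELECT name FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    return row[0]

# The modelers split `name` into two columns. The functionality is
# identical, but every query touching `name` must be rewritten --
# the "refactoring" developers complain about.
conn.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")
conn.execute("ALTER TABLE customer ADD COLUMN last_name TEXT")
conn.execute("UPDATE customer SET first_name = 'Ada', last_name = 'Lovelace'")

def full_name_v2(conn, customer_id):
    row = conn.execute(
        "SELECT first_name, last_name FROM customer WHERE id = ?",
        (customer_id,),
    ).fetchone()
    return f"{row[0]} {row[1]}"

print(full_name_v2(conn, 1))  # same answer as v1, different code
```

The output of both versions is identical; only the plumbing changed, which is precisely why developers see the work as overhead.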

Data model changes thus have a domino effect, or, for an even more visual analogy, a Jenga effect. Remove a block near the base of the Jenga tower, and the whole structure comes crashing down! See Figure 1.

Figure 1. Pulling the Jenga block from the bottom of the structure [2]


Ironically, the RDBMS is much more flexible than the data structures of yesteryear. Hierarchical databases like IBM’s Information Management System (IMS) were extremely difficult to change: a change to the structure meant the whole application had to change. Relational structures are more resilient, and it is easier to isolate a change in an RDBMS than in a hierarchical database. Even so, a model change still results in code changes.

Additionally, there’s the developer/data modeler wall, also called the object-relational impedance mismatch. Developers typically don’t like relational data models very much because their programs are written in an object-oriented language such as Java. They have to “translate” the relational model into objects to write their code. This adds extra time to their task, so it is easy to see why they aren’t favorably disposed toward data models.
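A minimal sketch of that “translation” step (all table, class, and function names here are hypothetical): a relational join returns flat rows, while the developer’s code wants nested objects, so someone must write the bridging code an ORM would otherwise provide.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class Order:
    order_id: int
    item: str

@dataclass
class Customer:
    customer_id: int
    name: str
    orders: list = field(default_factory=list)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customer VALUES (1, 'Acme Corp');
    INSERT INTO orders VALUES (10, 1, 'widget'), (11, 1, 'gadget');
""")

# The join yields one flat row per order; the object model wants one
# Customer holding a list of Orders. Hand-assembling the latter from
# the former is the impedance mismatch in miniature.
def load_customer(conn, customer_id):
    rows = conn.execute("""
        SELECT c.id, c.name, o.id, o.item
        FROM customer c JOIN orders o ON o.customer_id = c.id
        WHERE c.id = ?
        ORDER BY o.id
    """, (customer_id,)).fetchall()
    customer = Customer(rows[0][0], rows[0][1])
    for _, _, order_id, item in rows:
        customer.orders.append(Order(order_id, item))
    return customer

c = load_customer(conn, 1)
print(c.name, [o.item for o in c.orders])
```

In real projects an ORM automates this mapping, but the conceptual gap between rows and objects remains, and it is where the two camps talk past each other.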

Because they see the world so differently from data modelers, object-oriented developers often minimize the importance of understanding data nuances. Data is often an afterthought for developers. They are heard in scrum meetings saying things like:

  • “Oh yeah! We probably need some data here…”
  • “We can add it anytime…”
  • “It’s just data…”

However, as most data modelers know, designing data “on the fly” usually doesn’t work very well; it never seems to be right the first time. Often modelers must “break all the data modeling rules” to put data in at the last minute. Data modelers are thus forced into making changes they would never make under normal circumstances, and would rather not be recognized as their author. There must be a better way for data modelers and developers to peacefully coexist in an Agile world.

Stay Ahead

One of the reasons this occurs is that data modeling on Agile projects is often not completed in advance. Many of these headaches can be eliminated by performing data modeling ahead of coding: stay a sprint ahead. How many sprints ahead? Many of my colleagues agree with the advice: “stay as many sprints ahead as you can!” Some say one; most say two or three. This provides a buffer for discussion and for “trying out” the data model: either allow developers to create dummy data and take the data model out for a test drive, or at least discuss it and see whether it fits as many cases as they can think of.

One way to do this is to bake data modeling into Sprint Planning. Try to model in chunks, perhaps in three-month iterations. Have data visualization be the main focus for the sprint(s). Discuss the model requirements first; then the model can be built and, as mentioned above, perhaps a developer can test it. But if this is done, the team must stay focused: the model should not be used by developers until it has been discussed. A project I was involved in had a very astute logical modeler who was extremely good at developing a comprehensive logical model to try out ideas first. “What if” scenarios can be tested against such a model (no code, just whiteboard discussion): Does it handle the data correctly? Is the data connected to other data properly? This kind of up-front planning helps avoid the trap of having to “fix” the model later, but it requires discipline: the developers must wait until the model has been discussed. Staying a few sprints ahead is therefore a good idea.
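One concrete way a developer might take a model “out for a test drive” a sprint early (the schema below is a made-up example, not from any project mentioned here): load dummy data and check that the model’s constraints actually handle the what-if cases raised in discussion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this enabled
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(id)
    );
""")

# Dummy data for the test drive.
conn.execute("INSERT INTO department VALUES (1, 'Engineering')")
conn.execute("INSERT INTO employee VALUES (100, 'Grace', 1)")

# What-if scenario: can an employee exist without a valid department?
# The model should reject it -- and if it doesn't, we have found a
# gap before any application code was written against the model.
try:
    conn.execute("INSERT INTO employee VALUES (101, 'Elliot', 99)")
    print("model gap: orphan employee allowed")
except sqlite3.IntegrityError:
    print("model holds: orphan employee rejected")
```

The same exercise works on a whiteboard, but a few lines of throwaway code make the model’s behavior under each scenario unambiguous.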

Module Dependencies

Earlier in my career, I worked on a very large Agile project: a multi-year re-engineering effort. The client was developing a very large software system to house all the data for their major operations, the very heart of their business. The system was scheduled to take five years to build; I joined midstream, two and a half years into the project.

This particular project consisted of many modules, and quite a few turned out to depend in some way on other modules. No architecture planning had been done at the outset, nor any dependency analysis. In addition, several versions were in play at the same time, each with a different version of the data model. Keeping track of the versions proved very difficult, and making a change to the data model was extremely complex. The developers, eager to make their changes, forged ahead, forcing model changes to be made without understanding all the downstream dependencies. The data model could not keep up, and instead of ensuring that we modeled the changes properly, we had to retrofit the model without regard to preserving integrity.
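The dependency analysis that project skipped need not be expensive. As a sketch (the module names and their dependencies are invented), even a small script using Python's standard-library `graphlib` can order modules by dependency and flag the cycles that make model changes so hard to sequence:

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical module -> depends-on mapping, gathered up front.
deps = {
    "billing":   {"customer", "pricing"},
    "pricing":   {"product"},
    "customer":  set(),
    "product":   set(),
    "reporting": {"billing", "customer"},
}

try:
    # A topological order is a safe sequence for rolling out a
    # data model change: dependencies before dependents.
    order = list(TopologicalSorter(deps).static_order())
    print("safe change order:", order)
except CycleError as err:
    # A cycle means no safe order exists -- exactly the situation
    # that demands planning before touching the model.
    print("cyclic dependency, plan before changing the model:", err.args[1])
```

Had a map like this existed, a proposed model change in `customer` would immediately reveal that `billing` and `reporting` are downstream of it.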

This project turned out to be unsustainable. What can we learn from this? Are some projects with many interdependencies not suitable for Agile?

There are Agile methodologies that help alleviate some of these challenges. One example is SAFe, the Scaled Agile Framework. SAFe bases development on nine principles, two of which address dealing with interdependencies:

  • Principle #2: Apply Systems Thinking: “…problems faced in the workplace were a result of a series of complex interactions that occurred within the systems the workers used…it’s a system problem that requires systems thinking.”[3] Systems thinking helps visualize the overall context of the value chain, so those interactions can be accounted for in software construction.
  • Principle #7: Apply cadence, synchronize with cross-domain planning: “…Synchronization causes multiple perspectives to be understood, resolved and integrated at the same time.” My colleagues who have used SAFe say that its proactivity helped immensely in reconciling interdependencies within projects. This methodology “bakes in” dependency planning. Therefore, if you are developing a very large, interdependent system, consider using SAFe.

I have worked on many Agile projects, none of which used SAFe. I was gratified to hear that this framework has been engineered to address this apparent shortcoming of the Agile approach. Practitioners who have used SAFe report success and testify that large projects with interdependencies can be managed effectively with it. Using a methodology that manages dependencies can have a big effect on the data model: it allows dependencies to be built into the model proactively instead of reactively.


We have discussed the benefits and promise of Agile frameworks and methodologies, and why data practitioners are concerned about whether data can effectively be incorporated into projects managed with an Agile approach. We have shared a few instances where data modelers have a difficult time with Agile, and have offered some ideas to prevent these difficulties. We have discussed the merits of the data modeling effort staying one or more sprints ahead, and how this proactive thinking can help troubleshoot model changes before the developers use them – baking data modeling into sprint planning. We also referenced one specific Agile methodology called SAFe, which is an effective approach for managing module dependencies.

What’s Next: Agile Data Modeling Techniques

Even if you can stay several sprints ahead, changes will come. The next column in this series will discuss various data modeling methods that can be used in Agile projects to respond to change more easily.


[1] The author’s affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended to convey or imply MITRE’s concurrence with, or support for, the positions, opinions or viewpoints expressed by the author.

[2] Cartoons are governed by Creative Commons License. See https://creativecommons.org/licenses/

[3] SAFe Principles: http://www.scaledagileframework.com/safe-lean-agile-principles/



Approved for Public Release; Distribution Unlimited, Case # 17-1619.

©2017 The MITRE Corporation. All Rights Reserved.




About Bonnie O'Neil

Bonnie O'Neil is a Principal Computer Scientist at the MITRE Corporation, and is internationally recognized on all phases of data architecture including data quality, business metadata, and governance. She is a regular speaker at many conferences and has also been a workshop leader at the Meta Data/DAMA Conference, and others; she was the keynote speaker at a conference on Data Quality in South Africa. She has been involved in strategic data management projects in both Fortune 500 companies and government agencies, and her expertise includes specialized skills such as data profiling and semantic data integration. She is the author of three books including Business Metadata (2007) and over 40 articles and technical white papers.