Pragmatic Data

Please welcome William Brooks and his new quarterly column. This is the first in the series, titled Pragmatic Data, which will address how to solve data issues in a pragmatic way.

Data Modeling has always been Agile

Or rather, “Good data modeling has always been Agile.”

Much has been said and written about the demilitarized zone between data modeling and Agile software development. Over the last decade or so, adherents of one approach or the other have often insisted on the superiority of their preferred approach. A number of very smart people have studied Agile development and data modeling and proposed ways to reconcile them. I have spent the past couple of years with some of them developing and implementing an SDLC to bring data artifacts to Scrum. And yet, data modelers and Agile teams still seem at odds, or at war, or at an awkward family event.

What is getting in the way?

Sure, there are technical challenges working with normalized data structures in object-oriented languages – the “Object-Relational Impedance Mismatch.” And yes, Agile methodologists sometimes cast data models as costly design artifacts to be avoided. But there is more to the conflict than complicated joins, budgets, and bottlenecks of either performance or process. Misperceptions and poor practices from both architects and agilistas quickly strip away the value of the modeling process. Cohesive teams divide and turn down separate paths; technical debt develops, modelers struggle to be heard, and data begins to corrode away behind beautiful facades.

A few years ago, Scott Ambler lamented how few books about agile data were available (ironically, of course, since Ambler's original work on the topic is excellent, and his lament appeared in the foreword to Building the Agile Database, by Larry Burns). Ambler, Burns, David Hay, and other authors have proposed well-tested approaches to the challenges of refactoring databases, bridging object-oriented code to relational databases, and combining entity-relationship diagrams (ERDs) with UML object models. Why, then, do developers still so frequently see data modeling as a bottleneck? Data architects and DBAs have proven for decades that solid data models can improve flexibility and reduce long-term cost. Why do Agile teams still get bogged down in technical debt that good data modeling could prevent?

Bluntly put, it’s because many developers, not to mention DBAs, modelers, and architects, have forgotten that data modeling is – and has always been – an agile (and Agile) process. Project teams that are using Agile still often expect a data model to appear at the beginning of a project and be perfect and unchanging, finding it easier to justify a workaround than a database refactoring effort. Many modelers are driven to create a fully attributed, normalized, defined data model of the entire project scope before a database table is ever created. A retrospective is in order. Let’s reexamine what good data modeling practice looks like. Let’s look at why Agile works. And if we have to, let’s change our approach – just a bit – to get the best of both iterative development and thorough analysis.

The Agile Manifesto holds that interactions are valued over processes and tools, and that collaboration is valued over contract negotiation. Data modeling, like software development, can certainly be done in a back room, with the door closed, and still turn out an artifact that DBAs can turn into a database. Good data modeling, on the other hand, is (like true Agile development) a team effort. A good data modeler leads a dialogue with, and often between, the people who really know how the business works. The boxes and lines of a logical data model provide both validation and a common point of reference. Saying aloud what each relationship means when it's implemented in code (or in real people's actual jobs) fuels discussion. If the entities in a data model don't match the real world, the modeler should adjust the model until they do, not fight to redefine the business in the form of a model. I'm not saying that a modeler shouldn't seek to disambiguate terms and call out inconsistencies in vocabulary. There is value delivered when a model is accurate, but we also deliver value when a model is wrong and the ensuing "enthusiastic debate" elicits requirements that the product owners would have taken for granted and the designer would have missed. When the whole team is actually in the room (and they should be), a model on the wall can evolve as quickly as the conversation.

Good data modeling delivers software. Not lines of code or apps, but real, working tools for communicating the way the real world is (or should be). Agile methodologies recognize that the only way to ensure a viable product is frequent delivery of real, working software to the people who will actually have to use it. Agile accepts (embraces!) the fact that there is no realistic way to deliver an application that gets everything right after months or years of hunched shoulders and headphones. Instead, Agile teams break the product down into small, manageable chunks that can be discussed quickly, developed rapidly, and demonstrated independently, constantly engaging with the product owner and eventually building to at least a minimum viable product (MVP).

What constitutes a viable model can vary depending upon where a project is in its lifecycle. Alec Sharp, a business process and facilitation expert, is a proponent of using conceptual modeling for requirements elicitation. Karen Lopez, a data modeling expert well known in the data modeling and DBA communities, has shown that focused logical modeling can reveal business rules that matter far beyond database design. Good data modeling does not seek to deliver a complete, fully attributed logical model first and only then stop for validation. Instead, it delivers just enough scope and detail to be viable, validates that foundation, and then builds on it.

By using a common visual vocabulary, such as Barker or IDEF1X notation, and consistently using modeling software to automate routine tasks and make models easily accessible, teams can turn data models into more than just artifacts to throw away. In Agile methodologies, whiteboard sketches are often considered "good enough" documentation. However, contrary to popular interpretation, "Agile" does not mean "no documentation." Although sticky notes and whiteboards are often used to keep track of projects (e.g., tasks in Kanban, backlog and burndown in Scrum), teams often use purpose-built tools to turn those throwaway, hard-to-share artifacts into long-term requirements documentation and to generate metrics they can use to improve their performance. Data modeling of some kind is taking place even without a data modeler involved, but whiteboard sketches and Microsoft Visio diagrams make iterative modeling difficult and reuse next to impossible. Even when the physical database design differs sharply from the logical model, both become far more valuable when they are stored in the same library and follow the same design conventions across projects and teams.

All that's left is to make more modelers available to use the tools and perform the logical design. Agile methodologists (Scrum practitioners, in particular) strongly recommend that the entire team, including the product owner, be 100% dedicated to the project, and they rightly see most processes external to that team as impediments to forward progress. Of course, dedicated data modeling resources are often easier to ask for than to actually get funded. Agile recognizes that it is nearly impossible to staff every role on a project team with a dedicated resource, and suggests instead that team members be "generalizing specialists." A great Java developer can also be a good Unix admin. A strong SQL developer might also need to write C# well enough to get by in a pinch. Don't have anyone who knows Angular? Train a few of your existing developers. The same should be true for data modeling.

While there are sometimes dozens of Scrum teams in a shop that uses that methodology, it would be rare for the same firm to have dozens of full-time data modelers it can dedicate to projects. When one of those teams must wait for the lone data architect to become available, data modeling becomes an impediment, or worse, gets done poorly or skipped altogether. With standard conventions and a standard development environment, we should be able to train existing team members as modelers, or at least as modelers good enough to get by. Just as with any programming language or framework, some projects will need only "good enough" modelers, and others will still need dedicated data modelers or data architecture specialists. And like any other specialists, they should offer more than one skill. Data modelers often bring DBA, SQL, visualization, or analytical skills, to name a few. Don't have standards yet? Get some! Modeling in one-off departmental tools? Standardize! Some of your developers just aren't able to do both? Get new developers. Or become one.

Training and deputizing developers as data modelers (or becoming a developer) may seem like a leap, but it's exactly the sort of leap required for data modeling to continue providing its well-established benefits in this rapidly changing environment. Democratizing data modeling and pivoting from command-and-control to guidance will broaden the reach of your practice, improve code and database quality, and bring transparency to database design. And if that seems like a tough sell, just wait for my next column, about where big data should live.



About William Brooks

Bill Brooks has been modeling, managing and integrating data since 1995, beginning at CID Associates developing application databases, then at Children's Hospital Boston as manager of the Decision Support Systems Group. He managed data integration before becoming Enterprise Data Architect for MFS Investment Management. Bill is now Chief Data Architect at Mercer, where he is developing a firm-wide data architecture practice. Bill's background includes traditional relational database design, data warehouse design and implementation, and enterprise application integration using a variety of ETL, message broker and service bus approaches.

  • Richord1

    The challenges with data models have little to do with the methodology used to design the data model and more to do with the lack of Data Literacy. Data modeling is driven from a technical rather than a sociotechnical perspective. Data is designed to optimize the database technology rather than a human's ability to gain knowledge from the data.
    Data is the language of the organization, yet we continue to view it as a technical construct. We wouldn't try to write a book without being literate, but modelers design databases without understanding the semantics and pragmatics of data. Arguing whether Agile is a better process has not improved the quality of data. It's time to change the conversation, look at why we continue to build handicapped databases, and discuss the fundamentals of data. Design data first and you will create fountains of knowledge rather than silos of trash.

    • William Brooks

      You hit the nail on the head, Richord1.

      In finance, businesses solve this problem by giving financial management a C-level office, formalizing its standards and requirements globally, and staffing its support within the organization with well-trained specialists. Then they train and monitor to ensure that everyone from senior leaders to line managers to AR accountants understands how the basics apply to them in their day-to-day jobs. Doing the same for data literacy sure makes sense to me!

  • tfeltz

    Great metaphor, Richord1:
    Design data first and you will create fountains of knowledge rather than silos of trash.