MDD and Analysis vs. Code Models
In the first article, I introduced the approach to application development called Domain-Driven Design (DDD), discussed some of the Data Management concerns with this approach, and described how a well-constructed data model can add value to a DDD project by helping to create the Ubiquitous Language that defines the Bounded Context in which application development occurs.
In the second article, I discussed the importance of modeling and persisting data at as high a level as possible, so that services don’t have to make multiple calls to different subdomains to retrieve fragments of a canonical data entity and then translate that data into a usable form for the application.
In this article, I’d like to explore the specifics of implementing data entities and attributes (or, if you prefer, objects and object properties) in a physical database or persistence store.
The first point I’d like to make is that there needs to be a distinction between the logical view of data (the logical model that describes the business domain, business rules, and the Ubiquitous Language) and the physical view of data (the physical model that describes how the data will be physically persisted in the data store).[i] Most developers conflate the two, and most books on application development, if they mention data modeling at all, are referring only to the modeling of the physical persistence stores, not the modeling of the data itself.
Even when object developers acknowledge the possibility of two different models (Eric Evans mentions “Explanatory Models” vs. “Implementation Models,” and Scott Millett uses the terms “Analysis” and “Code” models), their texts make it clear that they regard these as tightly coupled. Millett, for example, says this:
DDD treats the analysis and code models as one. This means that the technical code model is bound to the analysis model through the shared UL (Ubiquitous Language). A breakthrough in the analysis model results in a change to the code model. A refactoring in the code model that reveals deeper insight is again reflected in the analysis model and mental models of the business.[ii]
This is wrong. The logical data model should never change unless something in the business domain (such as the Ubiquitous Language, the scope of the business domain, business rules or our understanding of them, etc.) changes. The physical (implementation) model changes when application data requirements change.
Not only is Millett’s view incorrect, it is also not Agile. Having to refactor the logical data model in response to every application code change is a sure-fire way to bog down a development project (I’ve seen this happen!). Moreover, tightly coupling the logical and physical (or analysis and code) models results in databases (or physical persistence stores) that are application-specific (or subdomain-specific) and cannot be used for any other purpose. I’ve already talked about some of the issues surrounding subdomain-specific data stores, including the effort and expense of supporting organization-wide uses of the data (such as data analytics, reporting, and process improvement) and the difficulty of managing and supporting these disparate data stores.
Nevertheless, the two models must be linked through the Ubiquitous Language, and both must reflect the team’s shared understanding of the business domain and its business rules. It should never be possible to physically persist or represent data in ways that violate the business’s understanding of the data. We also need to be able to implement changes to the physical model (and the persistence store) quickly, in an Agile manner, in response to changes in either application requirements or business requirements.
To do this, we make use of a technique called Model-Driven Development (MDD). In MDD, models are used not only to discuss, understand, and agree on a solution to a problem; they are also used to implement at least a portion of the solution.[iii] This means that our data modeling tool must be able to switch quickly between logical and physical views of the model, perform the necessary logical-to-physical transformations (e.g., collapsing supertype/subtype structures), and generate the DDL code needed to create the physical persistence structures, data constraints, keys, etc. My data modeling tool contains a macro language that I can use to “massage” the generated code into DDL that meets all of our organization’s standards and naming conventions and will run on any database without manual editing.
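To illustrate the kind of transformation and generated DDL involved, here is a minimal sketch of a supertype/subtype structure collapsed into a single table. The table and column names are hypothetical, and the exact DDL your tool generates will of course differ:

```sql
-- Hypothetical result of collapsing a Party supertype with Person and
-- Organization subtypes into one physical table. A discriminator column
-- identifies the subtype, and a CHECK constraint keeps subtype-specific
-- attributes consistent with it.
CREATE TABLE Party (
    PartyID   INT          NOT NULL PRIMARY KEY,
    PartyType CHAR(1)      NOT NULL
        CONSTRAINT CK_Party_Type CHECK (PartyType IN ('P', 'O')),
    FirstName VARCHAR(50)  NULL,   -- Person subtype only
    LastName  VARCHAR(50)  NULL,   -- Person subtype only
    LegalName VARCHAR(100) NULL,   -- Organization subtype only
    CONSTRAINT CK_Party_Subtype CHECK (
        (PartyType = 'P' AND LastName IS NOT NULL AND LegalName IS NULL)
     OR (PartyType = 'O' AND LegalName IS NOT NULL AND LastName IS NULL)
    )
);
```

The logical model still shows Party, Person, and Organization as separate entities; only the physical model (and the DDL generated from it) reflects the collapsed structure.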
In Domain-Driven Design, it is assumed that each subdomain has its own persistence store, which, in most organizations, translates into a separate dedicated database for each subdomain. But this is not very practical, as it increases storage and support costs for whichever team has to manage the databases (in most organizations, databases are managed and supported by a central DBA group, not by the individual application development teams). And, as I’ve already pointed out, canonical data needs to be persisted at a higher level (in its own database, say, or on the ESB Hub) rather than in a subdomain data store.
One way to address this in DBMSs such as Oracle or Microsoft SQL Server is to create a single database for the application and use a separate schema name for the subdomain-specific data. For example, in one application, we created separate schemas in the database for different company Divisions (including Corporate) and then we used the ‘dbo’ schema for the domain data that wasn’t Division-specific. Application teams can be given update access to the schemas while the DBAs manage the database as a whole.
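A hedged SQL Server sketch of that arrangement, using hypothetical schema and role names:

```sql
-- Hypothetical setup: one database, one schema per subdomain, with shared
-- domain data in dbo. CREATE SCHEMA must run in its own batch in SQL Server.
CREATE SCHEMA Claims AUTHORIZATION dbo;
GO
CREATE SCHEMA Sales AUTHORIZATION dbo;
GO

-- Give each application team update rights on its own schema only, plus
-- read access to the shared dbo schema; the DBA group manages the
-- database as a whole.
CREATE ROLE ClaimsAppTeam;
GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA::Claims TO ClaimsAppTeam;
GRANT SELECT ON SCHEMA::dbo TO ClaimsAppTeam;
GO
```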
Another important point is that application data (class) objects should never be tightly coupled to a database schema. This creates a situation where a change to the database schema can break the application, and a change to the application can break the database. It’s a good idea to insulate the application data layer from the database schema using what I call the Virtual Data Layer (VDL).[iv] In relational databases, you can use database objects such as views, stored procedures, user-defined datatypes (including table-valued datatypes) and user-defined functions (including table-valued functions) to create the VDL. Another option to consider is a data virtualization product such as Cisco’s Composite or Denodo.
Figure: The Data Services Stack, showing the Virtual Data Layer (VDL)
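To make the VDL idea concrete, here is a minimal sketch of an insulating view; the schema, table, and column names are hypothetical:

```sql
-- Hypothetical VDL view: the application reads vdl.CustomerSummary rather
-- than the base tables. If the underlying tables are later split, renamed,
-- or restructured, only the view definition changes; the shape the
-- application sees stays the same.
CREATE VIEW vdl.CustomerSummary
AS
SELECT  c.CustomerID,
        c.CustomerName,
        a.City,
        a.PostalCode
FROM    dbo.Customer        AS c
JOIN    dbo.CustomerAddress AS a
        ON  a.CustomerID  = c.CustomerID
        AND a.AddressType = 'Primary';
```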
When using an ORM (Object/Relational Mapping) tool such as Microsoft’s Entity Framework, I recommend mapping the application object class to a view in the database, rather than to a table or set of tables. This provides better performance (the query logic is resolved and optimized on the database server) and helps insulate the object class from any changes in the underlying database schema. Similarly, use stored procedures to implement the class methods, including data persistence and updates. This is much more efficient than relying on the ORM-generated SQL.
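As a sketch of the persistence side, a class’s save method could call a procedure like the following instead of issuing ORM-generated SQL; the procedure and table names are hypothetical:

```sql
-- Hypothetical persistence procedure an ORM-mapped class method could call.
-- It updates the customer row if it exists and inserts it otherwise, in a
-- single round trip to the database.
CREATE PROCEDURE dbo.SaveCustomer
    @CustomerID   INT,
    @CustomerName VARCHAR(100)
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE dbo.Customer
    SET    CustomerName = @CustomerName
    WHERE  CustomerID = @CustomerID;

    IF @@ROWCOUNT = 0
        INSERT INTO dbo.Customer (CustomerID, CustomerName)
        VALUES (@CustomerID, @CustomerName);
END;
```

Entity Framework can then be configured to call this procedure for updates rather than emitting its own UPDATE statements.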
This brings me to my final point: as I mentioned earlier, a lot of developers rail against what they call “Intelligence” in databases, preferring that all logic be coded in the application.[v] But this is counter-productive in a number of respects. First, it drastically reduces both the performance and scalability of the application. Rather than bringing a whole mass of data across the network to the application server for processing, it’s much more efficient to process that data on the database server and send the application only the data it actually needs. For example, for the Warranty application I worked on, I created stored procedures that handled tasks such as identifying the warranties that could be sold with a specific make and model of product with particular options, and identifying the warranties that could apply to a particular type of product defect. The developers tried to implement these functions in application code, but the code couldn’t be made to perform acceptably. In the database, these complex procedures executed in a second or two.
The advantage of putting this sort of code in the database rather than the application is that relational DBMSs can process large sets of records very quickly. In application languages such as Java, the same work typically means fetching the records through a cursor (or result set) and processing them one record at a time, which is very inefficient. One application I saw that used Entity Framework was so inefficient that it required its own dedicated database server; its database couldn’t share a server with other databases!
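To make the contrast concrete, here is a hedged, set-based sketch (with hypothetical table and column names) of the kind of work the database engine can do in a single statement, work that row-at-a-time application code would do one record and one round trip at a time:

```sql
-- Hypothetical set-based operation: discount every active marine-product
-- warranty in one statement. The engine processes the whole set on the
-- server; no rows travel to the application tier and back.
UPDATE w
SET    w.Price = w.Price * 0.90
FROM   dbo.Warranty AS w
JOIN   dbo.Product  AS p
       ON p.ProductID = w.ProductID
WHERE  p.ProductLine    = 'Marine'
  AND  w.ExpirationDate >= GETDATE();
```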
It also makes sense, from a business point of view, to specify complex data-related domain rules, constraints, and processes in the database, to ensure that data cannot be persisted or represented in ways that violate the business’s understanding of the data. You don’t necessarily want to entrust this to application developers, who may not understand or appreciate the importance of these rules. Also, this code can be generated automatically from the data model, whereas in the application it would have to be hand-coded and tested, increasing development time.
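As a simple illustration (again with hypothetical names), a rule such as “a warranty must expire after it takes effect” can be declared once in the database, and it is exactly the kind of constraint a modeling tool can generate from the model:

```sql
-- Hypothetical business rule generated from the data model: every warranty
-- row must expire after its effective date, regardless of which application
-- or service writes the row.
ALTER TABLE dbo.Warranty
    ADD CONSTRAINT CK_Warranty_Dates
        CHECK (ExpirationDate > EffectiveDate);
```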
Finally, putting data-related code in the database rather than in the application makes maintenance and support of both the database and the application easier. Performance issues, for example, are much easier to troubleshoot in the database than in the application. It’s also much quicker to refactor and deploy database code than application code.
To sum up what we’ve covered so far:
- Use a logical data model to assist project Stakeholders (including developers, business analysts, domain experts and project managers) in understanding the business domain and business data requirements (including data rules and constraints), defining the Bounded Context of the application and creating the Ubiquitous Language used for project discussions and agreements.
- Model data entities at as high a level as possible and then bring them down into the Bounded Context analysis model as needed. Use data model patterns and reuse data model entities and attributes as much as possible.
- Associate the logical data model with a physical model that can be used to auto-generate the DDL needed to create physical persistence structures (and their associated rules and constraints) in an Agile fashion. The logical data model should change only in response to changes in the business domain; the physical model changes in response to changes in application requirements.
- Persist data at as high a level as possible, so that applications and services can easily get a coherent view of the data without requiring endless translations. Master data should be maintained in an MDM Repository. Domain and Canonical data can be managed in a single dedicated database and/or a single persistence store on your ESB Hub.
- Use data virtualization (the VDL) to insulate application class objects from your database schema. Avoid tight coupling of application code to the database.
- Perform data-intensive processing on the database server, not in the application, for greater performance and scalability.
Keeping these principles in mind will help Data professionals add value to whatever project they’re working on, regardless of which application development methodology (Agile, DDD, XP, Waterfall, etc.) is being used.
[i] Burns, Larry. Building the Agile Database (New Jersey: Technics Publications LLC, 2011), Chapter 5.
[ii] Millett, Scott. Patterns, Principles and Practices of Domain-Driven Design (Indianapolis, IN: John Wiley and Sons, 2015), p. 11.
[iii] Burns, Larry. Data Model Storytelling (New Jersey: Technics Publications LLC, 2021), pp. 135-136.
[iv] Building the Agile Database, pp. 108-118.
[v] Building the Agile Database, pp. 81-84.