This article attempts to answer two questions about database design:
- Where do we start?
- How do we know the design is correct?
The answers lie in something called a context model. A context model is a model that shows how IT applications fit into the context of the people and the organization they serve. Context models are
sometimes called enterprise architecture models, sometimes high-level design models and sometimes conceptual models. I like the name “context model” because it says what it is.
But first – what’s a model? A model is a simplified representation of something real or imaginary. Even if you don’t use a modeling tool or haven’t written anything down that describes the IT
application’s context, you still have a context model, but it is in your head. You must have a simplified mental model of how IT applications and databases work in the real world because the only
alternative is to memorize every line of code.
Models are used primarily to help others understand the thing you are modeling. This is obviously true for scale models of ships or buildings but is just as true for models of IT systems. Models
have another use which, to my mind, is just as important. They can be used to test and verify the design. For instance, a model of a plane can be tested in a wind tunnel; one of my aims is to build
the equivalent of wind tunnels for IT systems.
Unfortunately, modeling IT systems in general, and context modeling in particular, has been rather unsatisfactory. One problem is that IT models don’t have the property of scaling in and scaling out. We
miss the ability to change the scale and see high-level (wide area of view, little detail), medium-level and low-level (narrow area of view, much detail) views of the same thing. The nearest I can
find is a hierarchy of process flow. At the highest level – business functional areas. Next down – business processes. Next down – user dialogue message flows. Next down – message flow between
software components. Bottom level – code. The trouble is that this concentrates on processing only, which is like a map that shows only railway lines: useful if that’s all you are interested in. We seem
doomed in IT to have many different kinds of diagram showing different aspects of the system at different levels of detail. This is partly addressed by having a common data store so the same
information in different diagrams has one source. However, the data must be structured according to the logic of the system being modeled rather than the structure of the diagrams; otherwise
we never get real data sharing between the different views.
I have tried to organize the multiplicity of diagrams using a framework of models, which is shown in figure 1. Context modeling is the highest level. The logical model takes the context model and
completes it with detailed logical data structure and processing rules. At the same level as the logical model is the application architecture – the arrangement of application services, databases
and tiered structure – and the technical architecture – the choice of technology, security and systems management design. At the bottom is the detailed design. The context model is crucial to the
whole picture – without it, it’s hard to see how the pieces fit together.
Today we do context modeling using business process modeling (for instance using UML activity diagrams or BPMN) supplemented by use cases to flesh out the detail of the tasks. In the past it was
dominated by data flow diagrams. This brings us to the second problem with modeling: all these techniques are poor at modeling data. While IT applications do control
workflow and occasionally make a few business decisions, they overwhelmingly do little more than store and provide information. The fact that today’s context modeling misses this aspect of the
system is, to my mind, a crucial failing. While they sometimes show data flow between activities they rarely distinguish between shared persistent data (data stored in a database) and message data.
Furthermore, they are poor at modeling the dependencies between data flow and control flow (control flow being how the steps in a business process are ordered).
So I have come up with a form of context modeling called box bag modeling. To explain it, imagine that behind your screens are demons (hot from a physicist’s thought experiment) or house elves (if
you are a Harry Potter fan), one demon for each task. The context model would explain how the demons cooperate to implement the application and the logical model would be the demons’ exact orders.
To have demons model persistent data we give them the ability to create information boxes and bags. An information box is simply a container with a label so other demons can find it later.
Information bags hold information and can be placed into information boxes. For instance in an order processing system we may have a box for an order entry and bags for:
- Order details – used by the create package activity.
- Financial details – used by the billing activity.
- Location details – used by the delivery activity.
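The order example can be sketched in a few lines of Python. This is only an illustration of the idea, not a definition of the technique: the class names `InformationBox` and `InformationBag`, the `created_by` and `used_by` fields, the label `order-1001`, and the assumption that the order entry activity creates all three bags are mine, chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class InformationBag:
    """Holds information; its internal structure is left open at this level."""
    name: str
    created_by: str                                    # activity that provides the bag (assumed)
    used_by: list[str] = field(default_factory=list)   # activities that consume the bag


@dataclass
class InformationBox:
    """A labeled container so that other activities (demons) can find it later."""
    label: str
    bags: dict[str, InformationBag] = field(default_factory=dict)

    def put(self, bag: InformationBag) -> None:
        """Place a bag into this box, keyed by the bag's name."""
        self.bags[bag.name] = bag

    def bags_for(self, activity: str) -> list[str]:
        """Names of the bags that the given activity uses."""
        return [b.name for b in self.bags.values() if activity in b.used_by]


# One box per order entry, holding the three bags from the list above.
order_box = InformationBox(label="order-1001")
order_box.put(InformationBag("order details", "order entry", ["create package"]))
order_box.put(InformationBag("financial details", "order entry", ["billing"]))
order_box.put(InformationBag("location details", "order entry", ["delivery"]))

print(order_box.bags_for("billing"))
```

Note that the bags carry no field definitions here. At the context level the sketch records only who provides and who consumes each bag, which is the sharing the model is meant to expose.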
In the context model, the structure of the data in the information bags is unspecified because the purpose of the context model is to show that sharing exists rather than the details. The bag’s
data structure is defined in the logical model. In a sense the bag’s data structure is the result of a negotiation between the providers of the information bag and the consumers of the bag. While
boxes look rather like rows in a table, bags can have complex data structures so design techniques like normalization are needed to complete the logical database design. In the final design, boxes
and bags can best be thought of as views on the database or databases.
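As a rough illustration of bags as views, the following sketch uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names and sample rows are all hypothetical; a real logical design would come out of the normalization work described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized tables (hypothetical logical design).
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total_cents INTEGER NOT NULL
    );
    CREATE TABLE order_line (
        order_id INTEGER NOT NULL REFERENCES orders(order_id),
        product  TEXT NOT NULL,
        quantity INTEGER NOT NULL
    );

    -- Each bag becomes a view over the normalized tables.
    CREATE VIEW order_details_bag AS
        SELECT order_id, product, quantity FROM order_line;
    CREATE VIEW financial_details_bag AS
        SELECT o.order_id, c.name, o.total_cents
        FROM orders o JOIN customer c USING (customer_id);
    CREATE VIEW location_details_bag AS
        SELECT o.order_id, c.name, c.address
        FROM orders o JOIN customer c USING (customer_id);

    INSERT INTO customer VALUES (1, 'A. Buyer', '1 Example Street');
    INSERT INTO orders VALUES (1001, 1, 2500);
    INSERT INTO order_line VALUES (1001, 'widget', 2);
""")

# The delivery activity sees only its own bag, not the underlying tables.
location = conn.execute(
    "SELECT name, address FROM location_details_bag WHERE order_id = ?", (1001,)
).fetchone()
print(location)
```

The customer's name appears in two bags but is stored once, which is the kind of mapping between bags and information structures that the logical model has to work out.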
Note that because of the complex mapping between boxes and bags and the underlying information structures, building the logical database model provides essential feedback on the correctness of the box bag model, and
consequently on the feasibility of the business processes. This puts logical database modeling into a much more pivotal role in application design than is common today.
A third issue with modeling as practiced today is the lack of analysis tools. One reason for the lack of analysis is that most models are single-view models – for example, they define a system for
data or for process but not both. Analysis is most fruitful when you look at the interaction between elements. In a context model I can analyze the relationships between users, activities, data,
external parties and resources.
For a model to be susceptible to analysis it must have clear, unambiguous meaning. The box bag model comes with a set of (evolving) criteria for correctness. For example, in a process diagram, if
you note that an “order entry” process is followed by a “billing” process then a criterion for correctness is that the billing process must be executed once and once only for each order.
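A criterion of this kind is precise enough to check mechanically. The sketch below is one possible way to do so in Python, assuming we have a list of order ids produced by order entry and a log with one entry per execution of billing; the function name `check_exactly_once` and both inputs are hypothetical.

```python
from collections import Counter


def check_exactly_once(order_ids, billing_log):
    """Report violations of 'billing runs once and once only for each order'.

    order_ids   -- ids of the orders produced by the order entry activity
    billing_log -- one order id per execution of the billing activity
    """
    known = set(order_ids)
    runs = Counter(billing_log)  # a missing key counts as zero runs
    return {
        "never_billed": sorted(o for o in known if runs[o] == 0),
        "billed_more_than_once": sorted(o for o in known if runs[o] > 1),
        "billed_without_order": sorted(o for o in runs if o not in known),
    }


report = check_exactly_once([1, 2, 3], [1, 1, 3, 9])
print(report)
```

In this run order 2 was never billed, order 1 was billed twice, and billing ran for an order 9 that order entry never produced. An empty report would mean the criterion holds for the sample.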
There are many benefits to box bag modeling but I want to touch on just two. The first is that it aids the analysis of error recovery, in particular what happens to the data when something fails. A
box bag model has a fighting chance of answering this question because we have a high-level view of activity inputs and outputs. As an aside, defining the recovery action is a great deal easier if only
one activity creates a bag and no other activity updates it thereafter. Undoing an activity then amounts to deleting the boxes and bags the activity created.
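The recovery rule can be illustrated with a toy in-memory store, assuming bags are written once and never updated. The class `BoxBagStore`, its methods, and the run ids such as `billing#1` are hypothetical names for this sketch only.

```python
class BoxBagStore:
    """Toy in-memory store that remembers which activity run created what."""

    def __init__(self):
        self.boxes = {}    # box label -> {bag name -> contents}
        self.created = {}  # run id -> list of ("box", label) or ("bag", label, name)

    def create_box(self, run_id, label):
        self.boxes[label] = {}
        self.created.setdefault(run_id, []).append(("box", label))

    def create_bag(self, run_id, label, name, contents):
        if name in self.boxes[label]:
            # Assumption of this sketch: a bag is created once and never updated.
            raise ValueError("bag already exists and bags are write-once")
        self.boxes[label][name] = contents
        self.created.setdefault(run_id, []).append(("bag", label, name))

    def undo(self, run_id):
        """Undo an activity run by deleting what it created, newest first."""
        for entry in reversed(self.created.pop(run_id, [])):
            if entry[0] == "bag":
                _, label, name = entry
                self.boxes.get(label, {}).pop(name, None)
            else:
                self.boxes.pop(entry[1], None)


store = BoxBagStore()
store.create_box("order-entry#1", "order-1001")
store.create_bag("order-entry#1", "order-1001", "order details", {"product": "widget"})
store.create_bag("billing#1", "order-1001", "invoice", {"total_cents": 2500})

store.undo("billing#1")  # billing failed: remove only what billing created
print(store.boxes)
```

Because no activity updates another activity's bag, undoing billing needs no knowledge of earlier values. If updates were allowed, the store would also have to keep before-images to restore.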
A second benefit is designing integration with existing systems. By building a box bag model for an existing application you can easily see what needs to change and, perhaps more importantly, what
can stay as it is without further analysis.
So, to answer the two questions I posed at the start of this article: how do you start a database design? Build a context model. How do you know a database design is correct? You know it’s correct
if it supports the context model, the logical model and the criteria for correctness for these two models.