Data-Driven Molecular Specifications, Part 1

Many readers may be too young to remember the TV adventures of Rocky and Bullwinkle, and the little mustachioed janitor sweeping up after the parade behind the closing credits. Mike Rowe’s Dirty Jobs may be a more current metaphor. In any case, data management professionals often seem relegated to a similar cleanup role in the current business IT world: encouraged to do design – but only for data, and only if it is “persisted”; reverse-engineering data models and data dictionaries after the fact; attempting to chase down data lineage; and so on.

Business analysts and software architects have largely retained the privilege of specifying and designing the actual assignment of values to variables, and data management professionals have had little alternative but to abide by this separation of responsibility – many being quite content with the status quo.

A consequence of this state of affairs is that, in contrast to more mature data management disciplines, functional specifications and their software implementations continue to be predicated on unquestioned assumptions. Data-manipulation operations remain the center of attention, relegating data to the role of a mere by-product. Both specifications and programs are assumed to be expressible with sufficient precision only through the medium of narrative text – known in other quarters as unstructured data. While there have been sporadic attempts at other approaches – structured programming, executable UML, business rules – mainstream software development remains a practice of writing specs, followed by writing code – not much different from writing novels and articles.

The goal of Data-Driven Application Engineering (DDAE) is to provide a practical and beneficial alternative to narrative application composition by enabling the construction of complete, modular applications from metadata, rather than the reverse. This can be achieved by describing data as specifically as possible, and at the same time expressing functions in the most abstract manner possible.

The building blocks of DDAE are simple and few: Types, Variables and Operations. For expressing Types and Variables, DDAE draws directly from Chris Date’s and Hugh Darwen’s The Third Manifesto, and TDAN readers are encouraged to pursue the details of these subjects there. In this article, we’ll dig deeper into DDAE’s treatment of operations.

Operations, Conditions and Gates (not Bill)

An operation is the fundamental unit of both process specifications and executable modules. An operation of any kind exists for no reason other than to assign a value to a variable. The variable undergoing this assignment is known as a dependent variable because its value depends on the operation. An operation also includes one or two independent variables, and of course an operator. For example, in the operation:

A + B = C

A and B are independent variables, C is the dependent variable, and plus is the operator. (Some operators, such as MAX, MIN, SUM and AVG, require only one independent variable. The second is effectively NULL.)
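As a minimal sketch, and not DDAE's own notation, an operation can be held as a plain data record: one dependent variable, one or two independent variables, and an operator. The class `Operation`, its field names, the operator table and the variables `ORDER_AMOUNTS` and `TOTAL` are all hypothetical. A single-input aggregate such as SUM leaves the second independent variable as `None`, standing in for the NULL described above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical operator table. Binary operators use both inputs; aggregates
# such as SUM and MAX use only the first (a collection) and ignore the second.
OPERATORS: dict[str, Callable] = {
    "PLUS": lambda a, b: a + b,
    "TIMES": lambda a, b: a * b,
    "SUM": lambda a, _: sum(a),
    "MAX": lambda a, _: max(a),
}


@dataclass(frozen=True)
class Operation:
    dependent: str                       # variable that receives the value
    operator: str                        # key into OPERATORS
    independent_1: str                   # first input variable
    independent_2: Optional[str] = None  # None stands in for the NULL second input

    def apply(self, env: dict) -> None:
        """Read the independent variables from env and assign the dependent one."""
        a = env[self.independent_1]
        b = None if self.independent_2 is None else env[self.independent_2]
        env[self.dependent] = OPERATORS[self.operator](a, b)


env = {"A": 2, "B": 3, "ORDER_AMOUNTS": [10, 20, 5]}
Operation("C", "PLUS", "A", "B").apply(env)            # A + B = C
Operation("TOTAL", "SUM", "ORDER_AMOUNTS").apply(env)  # one independent variable
print(env["C"], env["TOTAL"])  # 5 35
```

The record holds only names; values live in a separate environment, which keeps the specification itself pure metadata.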

That’s the easy part; not too much controversy there. The alert reader may wonder at this point how conditions (i.e., constraints, business rules) fit into this scheme. Because conditions result in the assignment of a value to a variable, like any other kind of Operation, DDAE treats conditions as Operations.

The operator of a condition is always a predicate, such as EQ, GT, LT, GE or LE, that specifies how its two independent variables should be compared. A condition’s dependent variable is always of type Boolean (off/on, 0/1, yes/no, true/false).
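A hedged sketch of that idea: a condition has the same shape as any other operation, except that its operator is one of the predicates listed above and its dependent variable always receives a Boolean. The predicates are mapped onto Python's standard `operator` module; the class `Condition` and the variable names are hypothetical.

```python
import operator
from dataclasses import dataclass

# The five predicates named in the text, mapped onto Python's comparison functions.
PREDICATES = {
    "EQ": operator.eq,
    "GT": operator.gt,
    "LT": operator.lt,
    "GE": operator.ge,
    "LE": operator.le,
}


@dataclass(frozen=True)
class Condition:
    dependent: str      # Boolean variable that receives the result
    operator: str       # one of the PREDICATES keys
    independent_1: str
    independent_2: str

    def apply(self, env: dict) -> None:
        compare = PREDICATES[self.operator]
        env[self.dependent] = bool(compare(env[self.independent_1], env[self.independent_2]))


env = {"A": 4, "B": 10}
Condition("A_LE_B", "LE", "A", "B").apply(env)
Condition("A_EQ_B", "EQ", "A", "B").apply(env)
print(env["A_LE_B"], env["A_EQ_B"])  # True False
```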

In conventional narrative programming languages, the Boolean dependent variable of a condition is almost never explicitly named.[1] In contrast, DDAE requires that this variable be explicitly declared (i.e., named). This allows it to be subsequently referenced in a consistent manner by other operations, including Gates, another type of Operation.
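The contrast can be seen in a few lines of Python. In the first function the comparison's Boolean result is anonymous and exists only inside the `if`; in the second, a sketch of the naming discipline described above, the result is first assigned to a named variable that later steps reference. The variable names `UNITS`, `ON_HAND`, `UNITS_AVAILABLE` and `ACTION` are hypothetical.

```python
# Conventional style: the Boolean produced by the comparison is anonymous and
# lives only inside the if statement; nothing else can refer to it.
def conventional(units: int, on_hand: int) -> str:
    if units <= on_hand:
        return "ship"
    return "backorder"


# Named style (a sketch of the DDAE idea): the comparison is assigned to an
# explicitly named Boolean variable, which later steps reference by name.
def named(env: dict) -> dict:
    env["UNITS_AVAILABLE"] = env["UNITS"] <= env["ON_HAND"]
    env["ACTION"] = "ship" if env["UNITS_AVAILABLE"] else "backorder"
    return env


print(conventional(3, 5))                                    # ship
print(named({"UNITS": 7, "ON_HAND": 5})["UNITS_AVAILABLE"])  # False
```

Because `UNITS_AVAILABLE` now exists as data, it can be inspected, logged, or combined with other conditions without re-evaluating the comparison.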

A Gate[2] also has a dependent variable of type Boolean. It has two mandatory independent variables, both of which are of type Boolean, and an operator which is either additive (OR) or multiplicative (AND).
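A gate can be sketched the same way: two named Boolean inputs, one named Boolean output, and an AND or OR operator. The class `Gate` and the variable names are hypothetical, and the type check simply enforces the rule that both independent variables are Boolean.

```python
from dataclasses import dataclass

# A gate combines two named Booleans: AND is "multiplicative", OR is "additive".
GATE_OPERATORS = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
}


@dataclass(frozen=True)
class Gate:
    dependent: str      # Boolean variable that receives the result
    operator: str       # "AND" or "OR"
    independent_1: str  # must name a Boolean variable
    independent_2: str  # must name a Boolean variable

    def apply(self, env: dict) -> None:
        a, b = env[self.independent_1], env[self.independent_2]
        if not (isinstance(a, bool) and isinstance(b, bool)):
            raise TypeError("both independent variables of a gate must be Boolean")
        env[self.dependent] = GATE_OPERATORS[self.operator](a, b)


env = {"X": True, "Y": False}
Gate("X_AND_Y", "AND", "X", "Y").apply(env)
Gate("X_OR_Y", "OR", "X", "Y").apply(env)
print(env["X_AND_Y"], env["X_OR_Y"])  # False True
```

The labels "multiplicative" and "additive" come from Boolean algebra: on the values 0 and 1, AND behaves like multiplication (1 × 0 = 0) and OR like addition, with 1 + 1 = 1.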

Tinker Toys

So what are the benefits of this exercise in abstraction, requiring the explicit declaration of these “extra” dependent variables for conditions and gates? Declaring them allows all operations, conditions and gates (shown in the columns below) to be specified using exactly the same simple pattern. The pattern for all includes exactly four variables that are connected by an operator (shown in the rows below):

This pattern enables a precise, consistent, unambiguous framework for expressing business requirements that is at the same time detailed and scalable.
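The sketch below illustrates the uniform pattern; it does not reproduce the table itself. A single record shape serves operations, conditions and gates, and one evaluator handles all three by looking up the operator. The class `Spec`, the function `run` and all variable names are hypothetical. The optional `enabled_by` field is an assumption about the fourth variable, modeled on the later examples in which conditions and an AND gate are connected to an operation.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# One operator table covering all three kinds of operation.
OPERATORS = {
    # ordinary operations
    "PLUS": operator.add,
    "TIMES": operator.mul,
    # conditions (predicates): Boolean result
    "EQ": operator.eq, "GT": operator.gt, "LT": operator.lt,
    "GE": operator.ge, "LE": operator.le,
    # gates: Boolean inputs, Boolean result
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
}


@dataclass(frozen=True)
class Spec:
    """One row of a hypothetical specification; the same shape serves
    operations, conditions and gates."""
    dependent: str
    operator: str
    independent_1: str
    independent_2: str
    enabled_by: Optional[str] = None  # ASSUMPTION: Boolean that must be True for the spec to fire


def run(specs: list[Spec], env: dict) -> dict:
    """Evaluate specs in the order given (the caller supplies a dependency order)."""
    for s in specs:
        if s.enabled_by is not None and not env[s.enabled_by]:
            continue  # precondition not met: the dependent variable is left unassigned
        env[s.dependent] = OPERATORS[s.operator](env[s.independent_1], env[s.independent_2])
    return env


specs = [
    Spec("C_GT_ZERO", "GT", "C", "ZERO"),                # condition
    Spec("BOTH_OK", "AND", "C_GT_ZERO", "FLAG"),         # gate
    Spec("D", "TIMES", "C", "C", enabled_by="BOTH_OK"),  # operation
]
print(run(specs, {"C": 3, "ZERO": 0, "FLAG": True})["D"])  # 9
```

Evaluation order is left to the caller here; a fuller implementation would presumably sort the specifications by their variable dependencies before running them.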

It should be straightforward to build code generators to output conventional programs – and even compilers to output binary executables – to consume input captured in this simple, consistent pattern. Applications built by combining modules conforming to this standard would exhibit a very high degree of modularity, interconnectivity, reusability, traceability, predictability, consistency and transparency in both specification and implementation.
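As a rough illustration of that claim, and not a description of any existing DDAE tool, the generator below turns a list of specifications (dependent variable, operator, two independent variables) into Python source text, then compiles it with the built-in `exec`. The operator-to-syntax table, `Spec` and `generate` are hypothetical names; targeting another language would mean swapping the table and the line template.

```python
from dataclasses import dataclass

# Hypothetical mapping from operator names to Python operator syntax.
PY_SYNTAX = {
    "PLUS": "+", "MINUS": "-", "TIMES": "*", "DIVIDE": "/",
    "EQ": "==", "GT": ">", "LT": "<", "GE": ">=", "LE": "<=",
    "AND": "and", "OR": "or",
}


@dataclass(frozen=True)
class Spec:
    dependent: str
    operator: str
    independent_1: str
    independent_2: str


def generate(specs: list[Spec], func_name: str = "compute") -> str:
    """Emit the source of a Python function that performs each assignment in order."""
    lines = [f"def {func_name}(v):"]
    for s in specs:
        op = PY_SYNTAX[s.operator]
        lines.append(f"    v[{s.dependent!r}] = v[{s.independent_1!r}] {op} v[{s.independent_2!r}]")
    lines.append("    return v")
    return "\n".join(lines)


source = generate([Spec("C", "PLUS", "A", "B"), Spec("C_GT_10", "GT", "C", "TEN")])
print(source)

namespace: dict = {}
exec(source, namespace)  # compile and load the generated function
result = namespace["compute"]({"A": 4, "B": 7, "TEN": 10})
print(result["C"], result["C_GT_10"])  # 11 True
```

A real generator would more likely write the source to a file than `exec` it in-process; `exec` is used here only to show that the generated output runs.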

This modular pattern is easy to visualize, and visualizations of this type have been used for many years in other disciplines. The image below (courtesy …) shows a 3D model of tetrahedral molecular geometry.[3] The labeled arrows point to parts of the model analogous to parts of the operation pattern we’ve just described.


Examples

We can refer to the earlier TDAN article “Data Lineage: The Next Generation” for an example. The exhibits in that article show that the value of ORDER_LINE_NET_AMOUNT is dependent on PRODUCT and ORDER_LINE. If we drill down to the data element level of detail in our specification, this operation becomes evident:

Think back to the molecular model in the above picture, and visualize the boxes in the figure below as atoms, and the arrows as molecular bonds. Each arrow represents a variable which has “roles” of dependent variable and independent variable. The condition Boolean is shown in yellow.
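Because each arrow is a variable that can be the dependent variable of one operation and an independent variable of another, a set of such specifications can be walked mechanically, which is one way the lineage discussed in the earlier article could be derived. The sketch below assumes hypothetical specifications: apart from ORDER_LINE_NET_AMOUNT and ORDER_LINE_UNITS_COUNT, the variable names and the gross/discount breakdown are invented for illustration, and `roles` and `lineage` are hypothetical helpers.

```python
from collections import defaultdict
from typing import NamedTuple


class Spec(NamedTuple):
    dependent: str
    operator: str
    independent_1: str
    independent_2: str


# HYPOTHETICAL specs: apart from ORDER_LINE_NET_AMOUNT and ORDER_LINE_UNITS_COUNT,
# these names are invented for illustration.
SPECS = [
    Spec("ORDER_LINE_GROSS_AMOUNT", "TIMES", "ORDER_LINE_UNITS_COUNT", "PRODUCT_UNIT_PRICE_AMOUNT"),
    Spec("ORDER_LINE_NET_AMOUNT", "MINUS", "ORDER_LINE_GROSS_AMOUNT", "ORDER_LINE_DISCOUNT_AMOUNT"),
]


def roles(specs: list[Spec]) -> dict:
    """For each variable (arrow), list the specs it is dependent of and independent in."""
    result = defaultdict(lambda: {"dependent_of": [], "independent_in": []})
    for i, s in enumerate(specs):
        result[s.dependent]["dependent_of"].append(i)
        for v in (s.independent_1, s.independent_2):
            result[v]["independent_in"].append(i)
    return dict(result)


def lineage(var: str, specs: list[Spec]) -> set:
    """Return every variable that `var` depends on, directly or transitively."""
    producer = {s.dependent: s for s in specs}
    upstream, stack = set(), [var]
    while stack:
        s = producer.get(stack.pop())
        if s is None:
            continue  # source variable: no operation assigns it
        for v in (s.independent_1, s.independent_2):
            if v not in upstream:
                upstream.add(v)
                stack.append(v)
    return upstream


print(roles(SPECS)["ORDER_LINE_GROSS_AMOUNT"])
# {'dependent_of': [0], 'independent_in': [1]}
print(sorted(lineage("ORDER_LINE_NET_AMOUNT", SPECS)))
```

In molecular terms, ORDER_LINE_GROSS_AMOUNT is a bond between two atoms: it leaves the first operation as a dependent variable and enters the second as an independent one.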

In the diagram below, another condition is added which compares ORDER_LINE_UNITS_COUNT to PRODUCT_UNITS_ON_HAND_COUNT. This and the previous condition are connected to the operation with an AND gate. The AND gate is shown in green, and its Boolean dependent variable is explicitly named, imaginatively, PRECONDITIONS_MET.
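The arrangement just described can be sketched in a few lines, under stated assumptions. Only ORDER_LINE_NET_AMOUNT, ORDER_LINE_UNITS_COUNT, PRODUCT_UNITS_ON_HAND_COUNT and PRECONDITIONS_MET come from the text. The first condition (PRODUCT_IS_ACTIVE), the choice of LE as the comparison, the price variable, the multiplication used for the net amount and all sample values are hypothetical, since the diagrams themselves are not reproduced here.

```python
import operator

# Each spec: (dependent, operator, independent_1, independent_2).
OPS = {
    "TIMES": operator.mul,
    "EQ": operator.eq,
    "LE": operator.le,
    "AND": lambda a, b: a and b,
}

PRECONDITIONS = [
    # HYPOTHETICAL first condition
    ("PRODUCT_IS_ACTIVE", "EQ", "PRODUCT_STATUS_CODE", "ACTIVE_STATUS_CODE"),
    # second condition from the text; LE is an assumed predicate
    ("UNITS_AVAILABLE", "LE", "ORDER_LINE_UNITS_COUNT", "PRODUCT_UNITS_ON_HAND_COUNT"),
    # the AND gate with its explicitly named Boolean
    ("PRECONDITIONS_MET", "AND", "PRODUCT_IS_ACTIVE", "UNITS_AVAILABLE"),
]
# HYPOTHETICAL formula for the operation itself
NET_AMOUNT = ("ORDER_LINE_NET_AMOUNT", "TIMES", "ORDER_LINE_UNITS_COUNT", "PRODUCT_UNIT_PRICE_AMOUNT")


def evaluate(env: dict) -> dict:
    """Evaluate the conditions and gate, then the operation only if the gate is True."""
    for dep, op, a, b in PRECONDITIONS:
        env[dep] = OPS[op](env[a], env[b])
    if env["PRECONDITIONS_MET"]:  # the gate's named Boolean controls the operation
        dep, op, a, b = NET_AMOUNT
        env[dep] = OPS[op](env[a], env[b])
    return env


order = {
    "PRODUCT_STATUS_CODE": "A", "ACTIVE_STATUS_CODE": "A",
    "ORDER_LINE_UNITS_COUNT": 3, "PRODUCT_UNITS_ON_HAND_COUNT": 10,
    "PRODUCT_UNIT_PRICE_AMOUNT": 2.5,
}
print(evaluate(dict(order))["ORDER_LINE_NET_AMOUNT"])  # 7.5
```

If the order line asks for more units than are on hand, PRECONDITIONS_MET evaluates to False and ORDER_LINE_NET_AMOUNT is simply never assigned.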

This article has described how rigorous data definition and operation abstraction can increase the quality and precision of requirements specifications. Future articles on Data-Driven Application Engineering will describe a meta-model showing how this pattern fits into DDAE overall, a user interface for specifying application requirements based on this pattern, and the process of transforming these requirements models into executables.


  1. There are some notable exceptions. In the Perl language, a program can be written to emulate a single condition, and its Boolean result can then be referenced by name (specifically the name of the program itself) in other programs. Also, in COBOL, conditions can be named (also known as “88-level data items”), e.g., PERFORM UNTIL THE-COWS-COME-HOME.
  2. The label is taken from the concept of the logic gate.
  3. In a tetrahedral molecular geometry an atom [i.e., operation] is located at the center with four substituents [i.e., variables] that are located at the corners of a tetrahedron. (apologies to Wikipedia)

