This article contains material from the book Principles of Data-Oriented Application Engineering, currently in progress.
What is Data-Oriented Application Engineering?
Surveys of software development organizations relentlessly deliver the message that an acceptable project success rate still eludes us. Model-driven methods such as the Object Management Group’s
Model Driven Architecture (MDA) promise improvements in this state of affairs, but have yet to achieve widespread adoption, and could fall prey to the same lack of impact experienced by CASE (computer-aided software engineering) tools back in the preceding century.
An approach that could yield greater success would be not only model-driven, but data-model-driven. Applying a data-oriented approach to application development projects has many advantages as an
alternative to other contemporary application development methods. The term Application Engineering acknowledges this as well as earlier attempts to bring a level of discipline and rigor to the
practice of transforming business requirements into computer applications.
A data-oriented approach would utilize a combination of current and potential techniques and tools. It would deliberately seek opportunities to take advantage of the durable and self-organizing
properties of data, and to extend these properties throughout the application development process. Potential benefits would include accelerated analysis and development, more straightforward
integration, higher maintainability and lower cost after deployment–all of which translate to quicker and greater return on investment.
Why Data Orientation: The Business View
Data is not a by-product of business computer applications–or as they are better known, enterprise-class applications. On the contrary, enterprise-class applications are by-products of the data
they manage. Furthermore, enterprise-class applications do not actually automate business processes–they only automate the processing of data. (Our profession used to be called “Data
Processing”. Things aren’t that different today.) The fundamental nature of enterprise-class applications is significantly different from video games, robotics and personal productivity software;
they’re much more like digitized filing cabinets.
In reality, the only business requirements that absolutely, unconditionally must be satisfied by a business application are data requirements. And the self-organizing properties of a set of
data–functional dependencies and other specific types of fixed and conditional constraints–determine most, if not all, of the business rules required in an application using that data. So it
follows that given a truly comprehensive, tool-supported data modeling technique, enterprise-class applications could conceivably be built based exclusively on an extended data model.
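The claim that functional dependencies are properties of the data itself, discoverable and checkable, can be illustrated with a short sketch. The table and column names below are hypothetical, chosen only for illustration:

```python
# Verify a candidate functional dependency X -> Y in a set of rows:
# the dependency holds if no value of X is associated with more than one value of Y.
def fd_holds(rows, determinant, dependent):
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        val = tuple(row[col] for col in dependent)
        if seen.setdefault(key, val) != val:
            return False  # same determinant value, different dependent value
    return True

orders = [
    {"order_id": 1, "customer": "Acme", "region": "East"},
    {"order_id": 2, "customer": "Acme", "region": "East"},
    {"order_id": 3, "customer": "Bolt", "region": "West"},
]
# In this sample, customer -> region holds.
print(fd_holds(orders, ["customer"], ["region"]))  # True
```

A dependency discovered this way is exactly the kind of constraint that, once recorded in the model, becomes a business rule the application must enforce.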
It is widely recognized that business processes and technology change much more rapidly than data does. Very similar data is used by the business before and after deployment of an application, but
very often by the time an application goes to production, the business processes it emulates have changed significantly, and the technology on which it is based is a generation behind. The
responsiveness of a business to its changing marketplace is significantly enhanced if its processes are embedded to the minimum extent possible in application software and technical platforms.
Data, in contrast to business processes, does not need to be translated by design or architecture into some different digital language in order to be operated on by a computer. A focus on data
rather than software also enables more direct transition and traceability from requirements through deployment.
Why Data Orientation: The Technology View
With the growing adoption of service-oriented architectures, a data-oriented approach to application engineering is an idea whose time has returned. Data drives the services that drive SOA. And the
potential of data orientation is not limited to service-oriented architectures. A data-oriented approach is arguably the most important reason for the widespread success of data warehousing.
Partitioning and delineating the boundaries of the “functional” aspects of an enterprise-class application–components, services, and other types of executables–can be a problematic and to a
significant extent arbitrary endeavor. In contrast, defining the boundaries of data is much more precise, since we have the tools of functional dependency and constraints at our disposal. Basing
the boundaries of the computer-based equivalents of business processes on the data they process removes much of this ambiguity. Ambiguity decreases quality and productivity; precision increases both.
Extending Data Models
Fully data-oriented application engineering will require a more comprehensive data modeling vernacular than what is currently available.
A data model is not just a means of arriving at a database design–it is the comprehensive specification of the data requirements of a problem domain. When developing a business application, a data model is not just one of many artifacts produced–it is the primary artifact, to which any and all other artifacts should be directly related. Data management professionals realize this intuitively,
but at this point in time the vast majority of application developers and software architects would probably disagree.
It could be argued that the proliferation of model types outside of data models has come about because of the limitations of data modeling tools and practice. If most or all of the specifications
essential for generating an application can be stated in a single model type, or even a small number of highly integrated model types, less time will be spent on disparate and transitory
documentation activities. This objective, incidentally, is consistent with the Agile Manifesto value statement of “Working software over comprehensive documentation”. It is an example of how a
data-oriented approach can help to attain development agility without compromising discipline and quality.
Much conventional “wisdom” has accumulated regarding the nature of data models, and this has seriously impeded their advancement. Significant but judicious extensions would be required to enable data models to serve as the foundation of data-driven applications. Developing a viable data-oriented methodology will require substantial advancement and inter-connection of areas such as the following:
- Removing the arbitrary and artificial distinctions between data and (meta)data.
- Removing boundaries between form (syntax) and content (semantics).
- Leveraging late-bound declarative specifications (see Johnston reference).
- De facto standardization of a metamodel for derived-data specifications, perhaps using the CWM expressions metamodel, reverse Polish notation and/or other such techniques.
- Expressing business rules as direct data model extensions. According to Dave Hay, a (conventional) data model can show two and a half of the four kinds of business rules. Barbara von Halle’s
term for a data model extended with rules is “rule-enriched logical data model.”
- Solutions for expressing complex constraints, primarily conditional constraints such as “state transitions” and inter-element constraints.
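As a rough illustration of the last point, a conditional constraint such as a set of permitted state transitions could itself be stored as data in the extended model and enforced by a generic check, rather than coded as branching logic. The states below are hypothetical:

```python
# Permitted state transitions stored as data rather than hand-coded conditionals.
ALLOWED = {
    ("draft", "submitted"),
    ("submitted", "approved"),
    ("submitted", "rejected"),
    ("rejected", "draft"),
}

def transition_allowed(current, proposed):
    # A generic validator: the rule content lives entirely in the ALLOWED data.
    return (current, proposed) in ALLOWED

print(transition_allowed("draft", "submitted"))  # True
print(transition_allowed("draft", "approved"))   # False
```

Changing the business rule then means changing a row of data, not a line of code.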
Ontology research may also have concepts and solutions to offer to help extend the boundaries of the current data modeling idiom.
Constructing a Data-Driven Application
Let’s consider how a data-driven application would be developed, utilizing data-model-driven analysis, design and generation tools.
Any application development effort is an investment in enhancing the value of a subset of an enterprise’s total data resource. So it follows that a data architect would function as the technical
project lead on a data-oriented application engineering effort.
To obtain early and active stakeholder participation, a data-oriented project could be initiated by doing enough data modeling to allow model-based tools to generate initial screens. Construction
would consist of generating concrete artifacts from abstract data models, having users evaluate those artifacts, changing the models in response to their feedback, and circling back to the first step. The
extended data model and corresponding application would grow through iterative refinement and extension.
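The generate-evaluate-refine loop just described can be sketched in miniature: an initial screen specification derived mechanically from a data model fragment. The model structure, attribute names, and widget mapping here are illustrative assumptions, not a proposed metamodel:

```python
# Derive an initial data-entry screen spec from a (hypothetical) data model fragment.
model = {
    "entity": "Customer",
    "attributes": [
        {"name": "customer_id", "type": "int", "required": True},
        {"name": "name", "type": "string", "required": True},
        {"name": "credit_limit", "type": "decimal", "required": False},
    ],
}

WIDGETS = {"int": "number-field", "string": "text-field", "decimal": "number-field"}

def screen_from_model(entity_model):
    # Every field on the generated screen traces directly back to a model attribute.
    return {
        "title": f"Edit {entity_model['entity']}",
        "fields": [
            {"label": a["name"].replace("_", " ").title(),
             "widget": WIDGETS[a["type"]],
             "required": a["required"]}
            for a in entity_model["attributes"]
        ],
    }

spec = screen_from_model(model)
print(spec["title"])      # Edit Customer
print(spec["fields"][0])  # first generated field
```

Because the screen is regenerated from the model on every iteration, user feedback is applied to the model, not to throwaway screen code.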
In one way or another, all enterprise-class application design is data design: determining where the data comes from, where it goes to, and what happens to it on the way. In a data-oriented
approach, data provisioning accounts for how and where the data within scope will be sourced. Data logistics defines where it goes afterward.
Specifications for systems configuration–execution scaffolding–should be available in the form of a business-domain-neutral, reusable operational platform model. This type of model can be
expressed, for example, using IBM’s Architecture Description Standard. Merging this model with the data-driven application model would create the equivalent of the MDA’s Platform Specific Model,
or PSM (see Figure 1 below).
Regardless of whether the data being processed is in memory, on disk or wire, local or remote, it should be specified in a similar manner in the extended data model. The data provisioned from a
local database today could be sourced through an integration hub tomorrow. The operational platform model provides the current physical details at any point in time.
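This separation, one logical data specification with the operational platform model supplying the current physical source, can be sketched as a simple lookup. The source functions below are placeholders, not real integration code:

```python
# One logical data specification; the operational platform model decides
# where the data physically comes from at any point in time.
def local_db_source(element):
    return f"SELECT {element} FROM local schema"  # placeholder for a database read

def integration_hub_source(element):
    return f"request {element} from hub"          # placeholder for a hub request

# Operational platform model: logical element -> current physical source.
platform_model = {"customer_balance": local_db_source}

def provision(element):
    return platform_model[element](element)

print(provision("customer_balance"))
# Re-platforming is a change to the platform model, not to the application:
platform_model["customer_balance"] = integration_hub_source
print(provision("customer_balance"))
```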
Much of the effort in designing a contemporary enterprise-class application goes into dividing it into manageable chunks and distributing the results. Layers or tiers result from horizontal partitioning; components result from vertical partitioning. (Think of cellular mitosis: cells dividing to form more complex organisms.) Creating any such partition in application software increases
the complexity at that point by a factor of five. One component becomes two, with two interfaces and some sort of transport between, even if both happen to be concurrently memory-resident. The more
interfaces, the more data element instances that need to be semantically reconciled. The more a given component is reused, the more semantic mappings are required.
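The factor-of-five observation can be made concrete with a small count: each binary partition replaces one component with two components, two interfaces, and a transport. Illustrative arithmetic only:

```python
# Each binary partition turns one component into: 2 components + 2 interfaces
# + 1 transport = 5 artifacts where there was 1.
def artifacts_after_partitions(n_partitions):
    components, interfaces, transports = 1, 0, 0
    for _ in range(n_partitions):
        components += 1   # one component becomes two
        interfaces += 2   # one interface on each side of the cut
        transports += 1   # some transport between them, even if in-memory
    return components + interfaces + transports

print(artifacts_after_partitions(1))  # 5
```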
The optimal component partitioning and distribution at any point in time could very possibly be derived by computer, based on the extended data model combined with the operational platform model.
Executable units and their interfaces could be generated dynamically and precisely. In this way, when the technology and/or topology of the operational platform model changes, components could be
re-partitioned and re-generated without causing mapping errors or affecting the extended data model.
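One conceivable way a tool could derive such boundaries is to cluster attributes by their determinants, so that each functional-dependency group becomes a candidate unit. A rough sketch under that assumption, with hypothetical attribute names:

```python
# Cluster attributes by their determinant to suggest candidate component boundaries.
# Each entry: attribute -> the attribute(s) that functionally determine it.
dependencies = {
    "customer_name": ("customer_id",),
    "credit_limit": ("customer_id",),
    "order_date": ("order_id",),
    "order_total": ("order_id",),
}

def candidate_components(deps):
    groups = {}
    for attribute, determinant in deps.items():
        groups.setdefault(determinant, []).append(attribute)
    return groups

for determinant, attrs in candidate_components(dependencies).items():
    print(determinant, "->", sorted(attrs))
```

A real tool would combine this with the operational platform model before committing to a partitioning, but the point stands: the grouping is computed from the model, not drawn by hand.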
Production deployment of a data-driven application could be accomplished more by incremental assimilation into the business than by a big bang implementation. Deployment units would be packaged and
rolled out when an acceptable value threshold is reached. “Use cases” would then happen as users interact with data during assimilation. As the application is extended and enhanced, new and
modified deployment units would be generated and rolled out.
References
- Software Factories: Assembling Applications with Patterns, Models, Frameworks, and Tools, Greenfield, Short, Cook, Kent. Wiley 2004
- IBM Systems Journal Volume 45 Number 3, 2006, “Model-Driven Software Development”
- Code Generation in Action, Herrington. Manning Publications 2003
- Executable UML, Mellor and Balcer. Addison-Wesley Professional 2002
- The Agile Manifesto, www.agilemanifesto.org
- “Modeling Business Rules: What Data Models Do” David C. Hay, TDAN January 2004
- Business Rules Applied, Barbara von Halle, Wiley 2002