Published in TDAN.com January 2002
[The following is an excerpt from the book titled Data Warehousing and E-Commerce by William J. Lewis, copyright 2001, Prentice Hall Publishers.]
Introduction
In many areas, e-commerce applications and conventional data management techniques converge. However, several areas of contradiction and controversy exist, where rather than converging, the paths
of e-commerce and data management apparently diverge.
The most significant areas of controversy and divergence have been labeled “the object-data divide”. This apparent schism results from the contrast between object-oriented standards widely
employed in e-commerce development, and longer-established standards grounded largely on the relational database model and its relatives and derivatives.
What is the appropriate role of the data resource relative to e-commerce functions and software? Is data a resource in its own right, or is it simply a byproduct, a result of functions and
software? What are truly the “best” practices for capturing requirements for data and its interfaces with functions and software, and implementing these requirements?
In this article, we’ll examine the nature of the object-data divergence, its effects on e-commerce application development in theory and practice, and some potential steps that can be taken toward
convergence of best practices.
To a great extent, standards for e-commerce software development have been driven by the distributed computing requirements of e-commerce applications on the World Wide Web. The Java language is
particularly well suited for such distributed applications because of its cross-platform capabilities.
The widespread adoption of the Java language has, in turn, driven widespread adoption of object-oriented software development practices. Distributed computing is also driving a resurgence in the
adoption of CORBA (Common Object Request Broker Architecture) standards for distributed object computing, the most significant instance of which is seen in the Java 2 Enterprise Edition, or J2EE.
The growth of object-oriented software development has also influenced standards for analysis and design techniques, evidenced in the creation and growing adoption of the Unified Modeling Language,
or UML.
The great majority of e-commerce solutions — and Web-based solutions, by definition — are distributed computing systems, making use of multiple layers of hardware and software. Many challenges
must be successfully addressed and solved to provide distributed computing systems that perform in a consistent, reliable manner, including:
- Transaction management
- Session management
- Inter-process communication
- Security
- Directory and naming services
Distributed computing environments require a reliable, integrated and complete set of these infrastructure services, implemented in a manner that is independent of hardware and operating-system
software. Distributed computing environments, in addition to their complexities of realization, also impose significant challenges on the development and deployment of software solutions.
Software developers should ideally be able to focus on providing solutions for the business problems and opportunities within any development effort and be isolated as much as possible from the
hardware and software-service intricacies attendant to a multi-tiered execution platform. Providing this isolation is one of the primary goals of e-commerce, or “enterprise,” software platform
standards – primarily J2EE and the Distributed Component Object Model (DCOM).
The Sun J2EE and Microsoft DCOM standards are both firmly grounded in object-oriented programming techniques. And since J2EE is quite representative of current e-commerce software development as a
whole, and J2EE and DCOM are very similar at a high level, the remainder of this overview focuses on J2EE and the Java language.
Java technology has been developed with the goals of scalability, reliability and platform independence for the realization environment. Java technology also provides software developers with a
level of isolation from hardware and operating system platforms.
Java was developed to conform to the widely-accepted characteristics of an object-oriented language. In general, an object-oriented language is one that supports encapsulation, polymorphism,
inheritance and instantiation. A fundamental goal of languages and development environments that implement these concepts is to facilitate the expedient development and deployment of reusable
software modules.
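As a rough illustration of these four characteristics, the following Java sketch may help. All class names and fee rules are hypothetical and chosen only for the example; none of them come from the J2EE standards.

```java
// Hypothetical names throughout; a sketch of the four concepts only.
abstract class Payment {
    // Encapsulation: the amount is private and reachable only through methods.
    private final long amountCents;

    Payment(long amountCents) {
        if (amountCents < 0) {
            throw new IllegalArgumentException("amount must not be negative");
        }
        this.amountCents = amountCents;
    }

    long amountCents() {
        return amountCents;
    }

    // Each subclass supplies its own fee rule.
    abstract long feeCents();

    long totalCents() {
        return amountCents + feeCents();
    }
}

// Inheritance: both subclasses reuse the state and behavior of Payment.
class CardPayment extends Payment {
    CardPayment(long amountCents) {
        super(amountCents);
    }

    @Override
    long feeCents() {
        return amountCents() * 3 / 100; // assumed 3% fee, integer arithmetic
    }
}

class BankTransfer extends Payment {
    BankTransfer(long amountCents) {
        super(amountCents);
    }

    @Override
    long feeCents() {
        return 25; // assumed flat fee
    }
}

class PaymentDemo {
    public static void main(String[] args) {
        // Instantiation: objects are created from the classes.
        Payment[] payments = { new CardPayment(10_000), new BankTransfer(10_000) };
        // Polymorphism: the same call dispatches to each subclass's feeCents().
        for (Payment p : payments) {
            System.out.println(p.getClass().getSimpleName() + " total: " + p.totalCents());
        }
    }
}
```

Note that the data (the amount) is reachable only through the methods of its class, which is exactly the arrangement the rest of this article questions.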
These are the important concepts underlying today’s emerging e-commerce software architecture standards. So, one may ask, where is the data?
The Software-Data Interface: Entity Beans and Stored Procedures
An Enterprise Java Bean, or EJB, is a server-side, distributable, transactional software component conforming to a specified interface format and set of functions. An Entity Bean is a specialized
type of EJB that, within the Java world, generally correlates to a relational DBMS table and its access functions. Entity-bean EJBs are the standard Java representation for the software-data
interface.
The code of an entity bean can contain variables that correspond to the columns of a relational DBMS table, and functions (“methods”) that implement the various types of read and write accesses
required for the table. These functions can utilize Java Database Connectivity (JDBC) calls, SQL for Java (SQLJ) calls, or both.
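The shape described above can be sketched in plain Java. This is not an actual entity bean, which would implement the javax.ejb.EntityBean interface and run inside an EJB container; it is an ordinary class that uses only the JDBC interfaces in java.sql. The CUSTOMER table, its columns, and the class and method names are all assumptions made for the illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Assumed table: CUSTOMER(CUSTOMER_ID INTEGER PRIMARY KEY, NAME VARCHAR).
class CustomerRecord {
    static final String INSERT_SQL = "INSERT INTO CUSTOMER (CUSTOMER_ID, NAME) VALUES (?, ?)";
    static final String SELECT_SQL = "SELECT NAME FROM CUSTOMER WHERE CUSTOMER_ID = ?";
    static final String UPDATE_SQL = "UPDATE CUSTOMER SET NAME = ? WHERE CUSTOMER_ID = ?";
    static final String DELETE_SQL = "DELETE FROM CUSTOMER WHERE CUSTOMER_ID = ?";

    // Variables that correspond to the columns of the table.
    private final int customerId;
    private String name;

    CustomerRecord(int customerId, String name) {
        this.customerId = customerId;
        this.name = name;
    }

    int getCustomerId() { return customerId; }
    String getName() { return name; }
    void setName(String name) { this.name = name; }

    // Write access: insert this object's state as a new row.
    void create(Connection con) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(INSERT_SQL)) {
            ps.setInt(1, customerId);
            ps.setString(2, name);
            ps.executeUpdate();
        }
    }

    // Read access: returns null when no row has the given key.
    static CustomerRecord load(Connection con, int customerId) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(SELECT_SQL)) {
            ps.setInt(1, customerId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? new CustomerRecord(customerId, rs.getString("NAME")) : null;
            }
        }
    }

    // Write access: copy this object's current state back to its row.
    void store(Connection con) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(UPDATE_SQL)) {
            ps.setString(1, name);
            ps.setInt(2, customerId);
            ps.executeUpdate();
        }
    }

    // Write access: delete the row that this object represents.
    void remove(Connection con) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(DELETE_SQL)) {
            ps.setInt(1, customerId);
            ps.executeUpdate();
        }
    }
}
```

Running the access methods requires a JDBC driver and a database containing the assumed table; the class itself compiles against the standard Java library alone.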
JDBC is a Java derivation of the earlier Microsoft Open Database Connectivity (ODBC) specification, and specializes in supporting DBMS-independent, dynamic (e.g., ad hoc) data access.
SQLJ is a newer proposed standard, syntactically much closer to conventional SQL, and specialized for static data access, where the access request can be compiled and stored in the
database, enabling potentially faster performance.
The stored procedure capabilities of earlier DBMS versions required that procedures be coded in proprietary and/or platform-dependent languages. DBMS vendors recognized Java as a means to provide additional
platform independence for their customers. Vendors have begun embedding a Java Virtual Machine (JVM) in new releases of their products, enabling Java modules to be developed and executed under
control of the database.
This, in turn, enables the development of Java stored procedures, compiled and optimized prior to runtime, that can significantly enhance data access performance and also are portable across the
various tiers of a distributed e-commerce application.
The Conceptualization Perspective: The UML
Now that we’ve looked at some important current developments in e-commerce software standards, let’s take an “architectural” step back into project scoping and analysis – and the
conceptualization perspective. Why retreat through the software lifecycle, from the solution perspective to the conceptualization perspective? Because this direction follows that of current
object-oriented (OO) standards development activity.
The predominant current standards for the solution and realization perspectives of e-commerce solutions are direct descendants of object-oriented concepts and languages. Given this progression of
events, it’s not surprising that object-oriented concepts have been extrapolated backward in the software lifecycle into the conceptualization perspective. The currently predominant
object-oriented method for conceptualizing and documenting software requirements is the Unified Modeling Language, or UML.
The independence of data and function, one of the fundamental principles of data-oriented, relational thinking, is antithetical to object orientation. From an OO viewpoint, there is no concept of
data existence, independent of function, within any IT architectural perspective from conceptualization through realization.
On the contrary, the existence of data (variables) is dependent on the functions (methods) that operate on that data and that are grouped together with it within one or more classes. The notion of a
class is the fundamental building block of object orientation. In this view, data is simply a “persistent class.”
Persistent data forms the infrastructure of a business organization’s information assets. It exists independently of processing. All information-processing resources can be constructed in
relation to this infrastructure – because it is persistent, i.e., long-lasting.
Whereas one of the primary objectives of object orientation is reuse of software, data is, without a doubt, one of the most highly-reusable assets of a business organization. Businesses ignore the
benefits of enhanced data reusability at their peril.
Many other established analysis and development approaches do indeed consider the data resource of a business independent of the functions to which it interfaces. The most widely-known of these
include the relational model, the Entity-Relationship Approach, and the Information Engineering methodology.
The Function-Data Interface: Where’s the CRUD?
A number of side effects result from marginalizing the independent existence of the data resource, mostly to the detriment of precision of specifications. These include ambiguity in specifications
for data resources themselves, and ambiguity in defining interfaces among data and between data and functions.
Because there is no recognition of data independent from function, there is consequently no recognition of a data-function interface. An interface is a boundary between two independent things, and
if data and function are not independent, the notion of a boundary between them has no meaning.
But, as Ross Perot used to say, listen here: the most serious consequence of the marginalization of data resources is the loss of data reusability. “Data-oriented” analysis and design methods –
currently out of fashion – stress the creation of a common, independent data resource, reusable by any and all sanctioned functionality. Data management methodologies are predicated on the
optimization of data independence: “Here’s the data – go ahead and use it as you see fit.”
In conventional OO approaches, including the original UML, any part of the data resource of a business, whether a logical entity or a relational table, is “just” a persistent class, i.e., a set of
objects that for one reason or another happens to need to hang around for a while. An even more dogmatic OO position that has sometimes been taken is that persistent data is nothing but “legacy”
– an artifact left over from an earlier historical epoch.
Conventional OO, in subsuming data within function in pursuit of its prime objective, software reuse, actually tends to promote a disparate data resource, fragmented among multiple
function-oriented classes.
Why would any organization deliberately avoid opportunities to maximize the usability of one of its most valuable resources? Real-world experience teaches otherwise. OO and UML are evolving and
adapting. They must do so, because e-commerce systems, like the vast majority of other business systems, are accountable for treating the corporate data asset in a responsible manner.
For any data entity, there are just four basic “methods,” or data-function interfaces: create, retrieve, update and delete (CRUD). New terminology may have been concocted – “instantiate,”
“consume,” “emit,” “destroy,” etc. – for whatever reason, but in essence there is nothing any more complex than these.
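The four interfaces can be written down as a small Java interface. This is a minimal sketch: the names CrudStore and InMemoryStore are hypothetical, and the in-memory map merely stands in for a table that a real implementation would reach through SQL.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

// Hypothetical interface: the four basic data-function interfaces for one entity.
interface CrudStore<K, V> {
    void create(K key, V value);   // fails if the key already exists
    Optional<V> retrieve(K key);   // empty when the key is absent
    void update(K key, V value);   // fails if the key does not exist
    void delete(K key);            // fails if the key does not exist
}

// In-memory stand-in for a table; a real implementation would issue SQL.
class InMemoryStore<K, V> implements CrudStore<K, V> {
    private final Map<K, V> rows = new HashMap<>();

    @Override
    public void create(K key, V value) {
        if (rows.containsKey(key)) {
            throw new IllegalStateException("duplicate key: " + key);
        }
        rows.put(key, value);
    }

    @Override
    public Optional<V> retrieve(K key) {
        return Optional.ofNullable(rows.get(key));
    }

    @Override
    public void update(K key, V value) {
        requireExisting(key);
        rows.put(key, value);
    }

    @Override
    public void delete(K key) {
        requireExisting(key);
        rows.remove(key);
    }

    private void requireExisting(K key) {
        if (!rows.containsKey(key)) {
            throw new NoSuchElementException("no such key: " + key);
        }
    }
}
```

Any richer business function, such as placing an order or changing an address, would be composed from calls to these four operations.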
Any and all other functions that interface with the data resource do so through one or more of these. Venerable modeling tools, such as data-flow diagrams and CRUD matrices, have been successfully
utilized within venerable methodologies such as IE and others. These tools serve to specifically elicit and document requirements for these data interfaces, and as deliverables, can be transformed
into data access specifications in design.
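For readers who have not worked with one, a CRUD matrix simply records, for each pairing of a function and a data entity, which of the four accesses the function performs. The Java sketch below shows one possible representation. The class, the method names, and the sample functions and entities are assumptions for illustration and are not taken from any particular methodology.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;

enum Access { CREATE, RETRIEVE, UPDATE, DELETE }

// Hypothetical representation: function name -> entity name -> set of accesses.
class CrudMatrix {
    private final Map<String, Map<String, EnumSet<Access>>> cells = new LinkedHashMap<>();

    // Records that a function performs the given accesses on an entity.
    void record(String function, String entity, Access first, Access... rest) {
        cells.computeIfAbsent(function, f -> new LinkedHashMap<>())
             .put(entity, EnumSet.of(first, rest));
    }

    // True when the matrix says the function performs this access on the entity.
    boolean uses(String function, String entity, Access access) {
        Map<String, EnumSet<Access>> row = cells.get(function);
        EnumSet<Access> cell = (row == null) ? null : row.get(entity);
        return cell != null && cell.contains(access);
    }

    // Renders one cell with the conventional letters, or "-" when it is empty.
    String cell(String function, String entity) {
        StringBuilder sb = new StringBuilder();
        for (Access a : Access.values()) {
            if (uses(function, entity, a)) {
                sb.append(a.name().charAt(0));
            }
        }
        return sb.length() == 0 ? "-" : sb.toString();
    }
}
```

Recording that a hypothetical "Place Order" function creates ORDER rows and retrieves and updates CUSTOMER rows would give the cells "C" and "RU" respectively, which is the form in which such matrices are usually drawn.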
Agreement across data and object specialists that there are “entity classes” or “persistent classes” that correlate exactly one-to-one with data model entities and normalized relations, and are
identified and designated by the use of the same methodologies, will go far to get things moving. Precise specification of the data-function interface within the UML can then begin with
accommodation of concepts such as these, which have been proven successful over many years of application and database development.
The Data-Data Interface, or Data Rules
Business rules can be transformed directly into data integrity constraints. Data integrity constraints are data-data interfaces that limit the relationships allowed, either concurrently or
successively, between various parts of the data resource (e.g., values, domains, attributes, relations).
Types of data integrity constraints include referential integrity constraints, domain integrity constraints and entity integrity constraints. Object Constraint Language, or OCL, has been developed
as a UML extension for specifying constraints. Not surprisingly, due to its object-oriented lineage, the OCL specifies constraints within the province of function and software, rather than data.
Constraints, like data, are persistent by definition – they endure over time. Constraints also should be enforced consistently across, and be shareable by, all functions operating on data within
their scope. It follows then that in whatever perspective they may appear, they should be related directly to the data that they regulate. Database stored procedures, for example, meet these
criteria, and are highly appropriate mechanisms for specifying and implementing data constraints.
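The three constraint types named above can be illustrated with a small Java sketch. In a relational DBMS they would normally be declared on the tables themselves, as primary key, foreign key, and check constraints; the class below only imitates that arrangement in memory, so that every caller passes through the same checks. The entity names and the 1 to 999 quantity range are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory data resource holding customers and their orders.
class OrderData {
    private final Map<Integer, String> customers = new HashMap<>();      // customerId -> name
    private final Map<Integer, Integer> orderCustomer = new HashMap<>(); // orderId -> customerId
    private final Map<Integer, Integer> orderQuantity = new HashMap<>(); // orderId -> quantity

    void addCustomer(int customerId, String name) {
        // Entity integrity: every customer has a unique identifier.
        if (customers.containsKey(customerId)) {
            throw new IllegalStateException("duplicate customer id " + customerId);
        }
        customers.put(customerId, name);
    }

    void addOrder(int orderId, int customerId, int quantity) {
        // Entity integrity: every order has a unique identifier.
        if (orderCustomer.containsKey(orderId)) {
            throw new IllegalStateException("duplicate order id " + orderId);
        }
        // Referential integrity: an order must refer to an existing customer.
        if (!customers.containsKey(customerId)) {
            throw new IllegalStateException("unknown customer id " + customerId);
        }
        // Domain integrity: quantity must lie in the assumed range 1..999.
        if (quantity < 1 || quantity > 999) {
            throw new IllegalArgumentException("quantity out of range: " + quantity);
        }
        orderCustomer.put(orderId, customerId);
        orderQuantity.put(orderId, quantity);
    }

    void removeCustomer(int customerId) {
        // Referential integrity also restricts deletes: no order may be orphaned.
        if (orderCustomer.containsValue(customerId)) {
            throw new IllegalStateException("customer " + customerId + " still has orders");
        }
        customers.remove(customerId);
    }

    int orderCount() {
        return orderCustomer.size();
    }
}
```

Because the rules live with the data rather than in each calling function, no function can bypass them, which is the property the paragraph above attributes to database stored procedures.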
Much thought is currently being devoted to the topic of rules and constraints. In addition to work on OCL by OO proponents, database and meta-data specialists such as C.J. Date and Ron Ross
continue to speak and write a great deal on the subject. (Date has gone so far as to propose that “rules are data”.)
If these efforts were to begin converging toward a single standard rather than to continue being approached from diverse positions, the results would be of much more benefit to information
technology practitioners in general.
Resistance is Futile
In the world of e-commerce application development, resisting UML or Java is likely to be as successful as stepping in front of a moving bus. Given the rapid proliferation of object-oriented
approaches, and the shortcomings of conventional OO in fully addressing the data resource and its interfaces, is there light at the end of the tunnel or not?
What the Unified Modeling Language originally unified were three approaches that were very similar in the first place, all being object-oriented and software-focused. A more ambitious and
participatory unification of best practices of multiple modeling approaches could result in a comprehensive framework for IT architecture.
UML, because of its wide marketing, visibility, acceptance and utilization, could indeed evolve into the foundation for such a comprehensive framework. Data architecture methods and deliverables
are currently active areas for enhancement in the UML. “Persistent” business data has been successfully analyzed, designed and deployed on computers for several decades.
Much progress has been made during that time on techniques to help assure that computer-stored data effectively meets the requirements of those who create and use the data. There is a significant
body of accumulated and successfully-tested knowledge and experience within the database field. Entity-relationship modeling works. Declarative business-rule modeling works. And above all, the
relational model works.
What is not needed at this time is yet another way to skin the cat. What is needed is accommodation within both the “object camp” and the “relational camp” of the existing successful techniques
of each.
There is evidence that, faced with the persistent (pun intended) challenges of effectively addressing “persistence” within a conventional object paradigm, the OO ship is indeed slowly turning.
Data is infiltrating object-orientation. Can the “object-data divide” be crossed? Absolutely – through acknowledgement, acceptance and cross-training between the “everything-is-an-object” camp
and the “data-is-all-that-matters” camp.