This is the seventh in a series of articles from Amit Bhagwat.
Abstract
Data modeling is no doubt one of the most important and challenging aspects of developing, maintaining, augmenting and integrating typical enterprise systems. More than 90% of the functionality of enterprise systems is centered on creating, manipulating and querying data. It therefore stands to reason that individuals managing enterprise projects should leverage data modeling to execute their projects successfully and deliver systems that are not only capable and cost-effective but also maintainable and extendable. A project manager is involved in a variety of tasks including estimation, planning, risk evaluation, resource management, monitoring & control, delivery management, etc. Virtually all of these activities are influenced by the evolution of the data model and may benefit from taking it as the primary reference. This series of articles by Amit Bhagwat goes through the links between data modeling and various aspects of project management. Having explained the importance of the data model in the estimation process, taken an overview of various estimation approaches, presented an illustrative example for them, considered the importance of intermediate and derived data and the effect of denormalization / normalization, and interpreted the generalization relationship from a relational DB perspective, the series now proceeds to look at the data with an OODB approach.
A Recap
In the first article[1] of this series, we established data-operation to be the principal function of most enterprise systems and inferred that the data structure associated with a system should prove an effective starting point for estimating its development.
In the next two articles[2] [3] we took a simple example to illustrate the function-based estimation approach and its simplified derivation, the data-based approach, highlighting the importance of considering only the data owned by the system and of using the data-based approach only as a pre-estimate / quick-check. We continued with this example to illustrate the effect of intermediate and derived data[4], and that of denormalization / normalization[5], on estimation.
In the last article[6] we endeavored to interpret the inheritance relationship, which is at the heart of the OO paradigm, from a relational perspective. The
conclusions were:
1. The object paradigm is functionality-oriented and promotes data economy and encapsulation.
2. Relational databases can interpret generalization through a variety of mechanisms, including composition, containment and transformation.
Agenda
With the progress of the object paradigm over the last decade[7], serious thought has been given to persisting objects transparently, i.e. without the developer needing to interpret persistent data in relational / hierarchical terms. Given the proven ability of relational theory and the inertia of industrial data held in RDBMS and earlier file-based systems, the transition to OODB has been slow. Only in the last few years have we seen popular RDBMS vendors accommodate object concepts and make their DBMS products ‘Object-Relational’. Thankfully, OODBs capable of taking a typical industrial data-load are now available, and it is only a matter of time before persistent data starts replicating functional objects. It is therefore the right time to discuss OODB concepts and their implications for project effort, and therefore for estimation. We shall begin to do so here, using the example we have been working on so far for illustrative purposes. In the present article I shall outline some OODB concepts; in the next article I shall take them through a full-scale estimation exercise.
Before we proceed, it will be useful to have, for ready reference, a view of the important data elements in our illustrative example.
Data of an object
An object is essentially a run-time concept whose sole purpose of existence is to deliver certain functionality in the context of the overall objectives of the system. To achieve its objectives, the object may have data associated with it. This data falls under two categories:

- Intrinsic properties: These properties are owned by the object and indicate its state. Well-designed objects have the ability to change these properties in response to stimuli from their environment, and thus in turn change their behavior in accordance with these properties (i.e. in accordance with the state of the object). These properties may or may not be persistent. Intrinsic properties are of value in understanding and manipulating the state of the object.

- Extrinsic properties: These properties are used in referencing the object and are usually not changed by the object. They tend to be persistent and give identity to the object. The identity may be in the form of a reference to a link[8] (indicated by references to instances of the classes at the association ends, if the object depicts an instance of an association class), a reference to another object (if the object depicts an instance of a subordinate class), or a unique reference among its sibling objects (analogous to a primary key in the relational world). Extrinsic properties are important in accessing and destroying the object, and come into being in the process of object creation.
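To make this distinction concrete, here is a minimal sketch in Java (the article prescribes no particular language, and all names and types here are illustrative assumptions), anticipating the Borrowing class from our example:

```java
import java.util.Date;

// A minimal illustrative sketch, not a prescribed design. Borrowing
// depicts an instance of the association between a Borrower and a
// Borrowable Item.
public class Borrowing {

    // Extrinsic properties: set once at creation and not changed by the
    // object itself; they give the object its identity by referencing the
    // instances at the association ends.
    private final String borrowerId;
    private final String borrowableItemId;

    // Intrinsic property: owned by the object and indicating its state;
    // null while the item is still out, set when the item is returned.
    private Date returnedDate;

    public Borrowing(String borrowerId, String borrowableItemId) {
        this.borrowerId = borrowerId;
        this.borrowableItemId = borrowableItemId;
    }

    // Behavior follows state: the object answers from its own data.
    public boolean isBorrowed() {
        return returnedDate == null;
    }
}
```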
Intrinsic Properties and Constraints
Now consider Borrowing and its child classes in our example. Looking at these from the perspective of a relational implementation, we had some debate about whether or not to consider Borrowing an abstract concept and therefore treat Present and Past Borrowing as separate entities in a denormalized form. We also pondered over what to make of the ‘mutually exclusive’ relationship between Present and Past Borrowing. Whereas we chose the normalized entity structure (and hence the entity Borrowing attached to the entity Fine) for the purpose of estimation, we might well have chosen the denormalized form in implementation, from a performance perspective.
Look at the same set of entities from an OO perspective and many results become far more intuitive.
The first point to resolve is whether to inherit concrete Present Borrowing and Past Borrowing classes from the Borrowing class, leaving the Borrowing class itself abstract, or whether to instantiate the Borrowing class directly. The answer is straightforward enough: as a Borrowing, Past or Present, gets its identity in the context of being a Borrowing, Borrowing itself becomes the instantiated class.
We next observe that if a Borrowing is a Current Borrowing, it locks the unique Borrowable Item concerned and counts towards the Borrowed Items of the Borrower. However, it essentially does not compute a Fine while in its Borrowed state.
In other words, a Borrowing is associated with an intrinsic property IsBorrowed (meaning, of course, is currently borrowed). This property imposes the following constraints:

- While IsBorrowed is true, the unique Borrowable Item concerned is locked against further borrowing.
- While IsBorrowed is true, the Borrowing counts towards the Borrowed Items of the Borrower.
- While IsBorrowed is true, no Fine is computed.
You may be tempted to go ahead and create another object-set – Borrower’sBorrowedItems – and lock a member of that set with the Borrowing when, for that Borrowing, IsBorrowed = true. However, this may be overkill. Remember that n, the maximum number of items borrowable by a Borrower (or, potentially, by a particular type of Borrower), is an independent system setting, and is therefore not a useful basis for a separate object-set subordinate to Borrower, particularly when Borrowing itself is in effect doing exactly the same thing, as the sketch below suggests.
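Continuing the illustrative Java (all names are assumptions), the quota question can be answered by counting the Borrower’s current Borrowings, with n held as a plain system setting and no subordinate object-set created:

```java
import java.util.Collection;

// Illustrative only: no Borrower'sBorrowedItems object-set is created;
// the Borrowings themselves answer the quota question.
public class BorrowingQuota {

    // n: the maximum number of items borrowable by a Borrower - an
    // independent system setting, not a subordinate object-set.
    private final int n;

    public BorrowingQuota(int n) {
        this.n = n;
    }

    // A further borrowing is allowed only while the count of this
    // Borrower's Borrowings with IsBorrowed = true remains below n.
    public boolean mayBorrow(Collection<Borrowing> borrowingsOfBorrower) {
        long current = borrowingsOfBorrower.stream()
                                           .filter(Borrowing::isBorrowed)
                                           .count();
        return current < n;
    }
}
```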
It may be desirable to pool Borrowing objects with IsBorrowed = true for efficiency of access. There will be a finite pool of such objects, whose maximum size is less than or equal to the total count of Borrowable Items (the extreme condition, reached when all Borrowable Items are borrowed). This pooling behavior may be set at design time by putting an additional constraint on Borrowing:

- While IsBorrowed is true, the Borrowing belongs to the pool of current Borrowings; when IsBorrowed ceases to be true, the Borrowing leaves the pool.

This is preferable to creating two separate entities, Current and Past Borrowing, and trying to pluck every returned Borrowing from the Current Borrowing table and dump it into the Past Borrowing table, as RDB efficiency enthusiasts might have it.
To conclude, assigning a single intrinsic property IsBorrowed, and associating object constraints with this property, takes care of:

- locking the unique Borrowable Item concerned;
- counting the Borrowing towards the Borrowed Items of the Borrower;
- deferring Fine computation until return;
- pooling currently borrowed Borrowings for efficiency of access.

Of course, if it is decided to pool all currently borrowed Borrowings, the IsBorrowed property itself may be defined as a transient derived property: it is associated with pooled Borrowing objects whose other property, Returned Date, is not set.
Hence the cascade of constraints for a Borrowing can become:

If Returned Date is not set then:

- the Borrowing belongs to the pool of current Borrowings (and IsBorrowed is derived as true);
- the unique Borrowable Item concerned is locked against further borrowing;
- the Borrowing counts towards the Borrowed Items of the Borrower.

When Returned (i.e. Returned Date is set):

- the Borrowing leaves the pool (and IsBorrowed ceases to hold);
- the Borrowable Item is released for further borrowing;
- control passes to objects such as Fine / TotalFine, which compute any fine due.
Also note here that, when need be, the execution of constraints set against a change in the state of a Borrowing hands control over to other objects such as Fine / TotalFine, and these in turn react to the stimulus by changing their state (or constructing themselves).
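A minimal sketch of this cascade, continuing the illustrative Java from above (the Fine class, the overdue rule and all names are assumptions made for this sketch, not the article’s prescribed design):

```java
import java.util.Date;

// Illustrative only: a Fine constructs itself in reaction to a Borrowing
// changing state.
class Fine {
    final long daysOverdue;

    Fine(long daysOverdue) {
        this.daysOverdue = daysOverdue;
    }
}

// A variant of the earlier Borrowing sketch, showing the constraint
// cascade around Returned Date.
public class PoolableBorrowing {

    private final Date dueDate;
    private Date returnedDate; // intrinsic: null while still borrowed

    public PoolableBorrowing(Date dueDate) {
        this.dueDate = dueDate;
    }

    // IsBorrowed as a transient derived property: it holds exactly while
    // Returned Date is not set.
    public boolean isBorrowed() {
        return returnedDate == null;
    }

    // When Returned: Returned Date is set, the derived IsBorrowed ceases
    // to hold (so the object drops out of any pool keyed on it), and
    // control is handed over - here, a Fine constructs itself if the item
    // is overdue.
    public Fine markReturned(Date returnedDate) {
        this.returnedDate = returnedDate;
        if (returnedDate.after(dueDate)) {
            long msLate = returnedDate.getTime() - dueDate.getTime();
            return new Fine(msLate / (24L * 60 * 60 * 1000)); // whole days
        }
        return null; // returned in time: no fine
    }
}
```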
Function-based Approach with a Difference
Let’s go back to the second article in this series. There we introduced the function-based analysis technique for obtaining unadjusted function points (UFP), a quantity that, with appropriate adjustment for the complexity of the implementation, leads to the function point index (FPI) – a value corresponding to the effort estimate associated with the project creating that functionality. We first enumerated the so-called ‘transactions’ that the system was to perform. We then found the outputs, inputs and entities associated with the transactions, and churned them through a formula which recognized that validating and accepting an input is over twice as effort-intensive to develop as producing an output, and that manipulating an entity is about thrice as effort-intensive to develop as validating and accepting an input. We later briefly touched upon a short-cut function-based approach and the much faster so-called data-based approach, an assumption-riddled abridged version of the function-based approach.
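The weight ratios described here match the Mark II function point weights; assuming those weights (an assumption made for this recap – the earlier articles should be consulted for the exact figures used), the formula takes the form

$$\mathrm{UFP} = 0.58\,N_{\mathrm{input}} + 1.66\,N_{\mathrm{entity}} + 0.26\,N_{\mathrm{output}}$$

where $0.58 / 0.26 \approx 2.2$ (an input over twice as effort-intensive as an output) and $1.66 / 0.58 \approx 2.9$ (an entity about thrice as effort-intensive as an input).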
In the function-based approach at the root of these techniques, we considered the level of data operation involved in each and every transaction of the system, as far as we could perceive it. We then assumed that, except for a degree of commonality and reuse made possible by modular code, the data manipulation associated with each transaction would have to be considered as separate development effort in its own right and context.
The OO paradigm too is essentially function-based; however, here individual functional units hold the data that they need in order to deliver their services. This does not mean data redundancy. It simply means collaboration between contributing objects, a clear demarcation of functionality across them, and thus the clear responsibility of individual objects for the particular set of data associated with their functionality. A good OO design also involves a clear dependency structure across objects and follows design-by-contract. This means that a ‘transaction’ no longer amounts to accessing and potentially modifying the data that it works with, with the prospect that another transaction may be doing much the same thing. Instead, such a transaction is distributed across the objects that are functionally responsible for working on the relevant data. The set of transactions therefore does not involve as much development effort as would be associated with developing individual modules that manipulate the same data in the same way under different transaction contexts. In short, by delegating work and the associated data-ownership to individual objects, it is possible to reduce the total amount of development effort, as the sketch below suggests. I shall delve deeper into this in the context of our example, in the next article.
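As a closing sketch (hypothetical transaction names, reusing the illustrative PoolableBorrowing above), two transactions that would each have re-implemented the same data manipulation in a module-per-transaction design instead delegate it to the one object responsible:

```java
import java.util.Date;

// Illustrative only: both transactions delegate the state change (and the
// Fine cascade) to the object that owns the data, rather than each
// manipulating shared tables in its own module.
class ReturnAtDeskTransaction {
    Fine execute(PoolableBorrowing b, Date now) {
        return b.markReturned(now); // delegation, not duplication
    }
}

class ReturnViaDropBoxTransaction {
    Fine execute(PoolableBorrowing b, Date boxEmptiedAt) {
        // Same responsibility, exercised through the same object - the
        // development effort behind markReturned is incurred once.
        return b.markReturned(boxEmptiedAt);
    }
}
```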
Conclusions
In this article we touched upon the fundamental classification of data associated with objects, the way objects encapsulate the data necessary for discharging their responsibilities, the way constraints may be applied to objects to ensure desired behavior, and the way objects can collaborate to avoid repeated development effort producing similar functionality. Some important conclusions were:

1. Intrinsic properties are owned by an object and indicate the state of the object.
2. Extrinsic properties give an object its identity and are instrumental in accessing the object, from its creation through to its destruction.
What’s Next
In the next article I shall go through a functional distribution and estimation exercise in the context of OODB.
[1] Amit Bhagwat – Data Modeling & Enterprise Project Management, Part 1: Estimation – TDAN (Issue
26)
[2] Amit Bhagwat – Data Modeling & Enterprise Project Management, Part 2: Estimation Example – The
Function-based Approach – TDAN (Issue 27)
[3] Amit Bhagwat – Data Modeling & Enterprise Project Management, Part 3: Estimation Example – The Data-based
Approach – TDAN (Issue 28)
[4] Amit Bhagwat – Data Modeling & Enterprise Project Management, Part 4: Estimation – Considering Derived
& Intermediate Data – TDAN (Issue 30)
[5] Amit Bhagwat – Data Modeling & Enterprise Project Management, Part 5: Estimation – Considering Effect of
Denormalization / Normalization – TDAN (Issue 31)
[6] Amit Bhagwat – Data Modeling & Enterprise Project Management, Part 6: Estimation – Interpreting
Generalization with Relational Approach – TDAN (Issue 32)
[7] By the way, the first recognized OO language – Simula – dates back to 1967, though OO has attained the status of de facto programming standard, largely due to its excellent maintainability characteristics, over the last decade or so.
[8] Here the word link is used in the OO sense, to mean an instance of an association.