This is the sixth in a series of articles from Amit Bhagwat.
Abstract
Data modeling is no doubt one of the most important and challenging aspects of developing, maintaining, augmenting and integrating typical enterprise systems. More than 90% of the functionality of enterprise systems is centered on creating, manipulating and querying data. It therefore stands to reason that individuals managing enterprise projects should leverage data modeling to execute their projects successfully and deliver systems that are not only capable and cost-effective but also maintainable and extendable. A project manager is involved in a variety of tasks including estimation, planning, risk evaluation, resource management, monitoring & control, delivery management, etc. Virtually all of these activities are influenced by the evolution of the data model and may benefit from taking it as the primary reference. This series of articles by Amit Bhagwat goes through the links between data modeling and various aspects of project management. Having explained the importance of the data model in the estimation process, taken an overview of various estimation approaches, presented an illustrative example for them, and considered the importance of intermediate and derived data and the effect of denormalization / normalization, we proceed to interpreting generalization – the most important relationship presented by the object paradigm.
A Recap
In the first article[1] of this series, we established data-operation to be the principal function of most enterprise systems and inferred that the data structure associated with a system should prove an effective starting point for estimating its development.
In the next two articles[2] [3], we took a simple example to illustrate the function-based estimation approach and its simplified derivation, the data-based approach, highlighting the importance of considering only the data owned by the system and of using the data-based approach only as a pre-estimate / quick-check.
In the two most recent articles[4] [5], we continued with the illustrative example and considered the effect of intermediate and derived data, and that of denormalization / normalization, on estimation. The conclusions were:
- Quantities that are important to business logic must be counted in the estimation process, whether or not they form part of the final persistent data structure and whether or not they are fundamental.
- For estimation purposes, entities, attributes and relationships are considered in their logical sense.
- The process followed for data-based estimation assists in transaction discovery, leading to more complete & accurate function-based estimation and potential system re-scoping in good time.
- A denormalized data structure can give an inaccurate and significantly reduced UFP count.
- FPA is based on considering each separately definable concept, and the relationships between such concepts, explicitly.
- As FPA is based on transactions likely to be performed on separately definable concepts, data elements defined purely to serve as holders-of-relationships do not count as separate entities.
- As classical FPA was established in the procedural programming paradigm, it requires refinement when applied in the OO paradigm.
Agenda
It is the last of these conclusions that we'll begin developing here, with specific reference to the generalization relationship in the context of relational data representation. We'll explore the fundamental principles of the object paradigm and the interpretation of generalization – an important object paradigm concept – in the relational paradigm.
The Object Paradigm
Some of the principles that the Object Paradigm is based on are:
- An object need exist if, and only if, it has a functional role in delivering the overall objectives of the system. The lifetime of the object is dictated by its utility to the system.
- An object should own and encapsulate the data it requires to perform its role in the system.
- Persisting a set of data associated with an object is a system run-time consideration. An object and its associated data are deemed to be in existence so long as the object is required to render its services to the system, and so long as a data element associated with the object is of value to the object in rendering those services.
- An object may inherit elements of its service behavior from one or more other objects, directly or through one or more intermediate generations. Such data should be inherited as is necessary to support the inherited service behavior.
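As a minimal Python sketch of these principles, using names from the case study revisited below (method and attribute details are mine and purely illustrative):

    class Borrowing:
        """Exists only to render a service; owns and encapsulates the data
        that its service behavior requires."""
        def __init__(self, date_due):
            self._date_due = date_due          # reached only via behavior

        def is_overdue(self, today):
            return today > self._date_due

    class PastBorrowing(Borrowing):
        """Inherits service behavior, and the data supporting it, from
        Borrowing; adds only what its own role requires (an optional fine)."""
        def __init__(self, date_due, fine=None):
            super().__init__(date_due)
            self._fine = fine

        def fine_due(self):
            return self._fine or 0.0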
In simpler words:
The object paradigm denies data for its own sake and considers data as encapsulated within an object for rendering its functionality. It therefore fundamentally inverts the argument that systems exist to manipulate data, rephrasing it as: systems exist to serve, and data may exist where incidental to the service. It may therefore be argued that the object paradigm questions the validity of the principles on which FPA is based. Indeed, many object-based estimation techniques do not directly relate to data at all, and are therefore outside the scope of the present series. Some information about them may be obtained elsewhere, including in some of my writing referred to in the first article of this series.
However, it cannot be denied that in the context of typical enterprise systems, data, particularly when derived and intermediate data are included, does reflect what functionality is being designed and developed, and therefore the amount of effort needed to develop that functionality. Data therefore continues to stand as a fairly reliable tool for estimating the effort involved in developing function-intensive systems whose services may be equated to the data they produce / manipulate.
The object paradigm also opens multiple approaches to data interpretation and persistence. The approaches available, among others (such as the hierarchical DB), include the OODB approach, the relational DB approach and the hybrid ORDB approach. At present, the relational DB appears to be the most popular approach, though DBMS vendors are increasingly providing object-mapping utilities, bringing their databases under the hybrid category. Given that FPA is based on the functionality developed in the system, and given that data may be represented and stored in different configurations for the development of similar functionality, we need to be able to interpret the data in the different configurations represented through these database approaches. We shall endeavor to do that in the context of the case we have been following through the last four articles. In the present article, I'll discuss the relational representation of object data involving the generalization relationship, and relate this to the estimates we have produced in previous articles.
Relational Representation of Object Data
The pure relational approach does not explicitly recognize generalization. It does, however, implicitly interpret it as composition or containment, which are the same thing looked at from opposite sides.
As we were developing our case, we began by writing down the requirements of our subsystem, identifying the data elements important to it, and then representing these elements and the relationships between them diagrammatically, as in fig. 1.
Here we represented a separate entity referred to as Past Borrowing, inherited from Borrowing. This has notional existence in our functionality as a separate entity, in that it has a different multiplicity relationship with Borrower compared to its sibling object Current Borrowing. It also has an optional relationship with Fine, which Current Borrowing does not have.
When we began considering the data-based approach and understood that we needed to consider only the data owned by the functionality under consideration, we came up with fig. 2 to base our estimation on.
In the last article, while considering the effect of denormalization on data-based estimation, we came up with a denormalized view of data owned by our subsystem, as in fig. 3.
Fig. 3: after Fine is denormalized into Past Borrowing [6]
Here we acknowledged that if we failed to relate a Past Borrowing to its parent / sibling entity, we would get a reduction of ~18% in the estimate. What happens if we acknowledge the ‘becomes’ relationship between Current and Past Borrowing and treat it as equivalent to other relationships? We are then considering the situation represented by fig. 4.
Fig. 4: acknowledging the relationship between Current & Past Borrowing
We thus have:
E = 3, R = 2 & A = 4 + 4 + 5 = 13
Therefore UFP = (1.42 x 13 x (1 + 2/3)) + (8.58 x 3) + (13.28 x 2)
= 30.77 + 25.74 + 26.56
= 83.07 ~ 83
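As a quick check, the arithmetic can be scripted. Below is a minimal Python sketch of the UFP formula used throughout this series; the function and parameter names are mine, not part of any FPA standard.

    def ufp(e, r, a):
        """Data-based UFP estimate: e entities, r relationships, a attributes."""
        return 1.42 * a * (1 + r / e) + 8.58 * e + 13.28 * r

    print(round(ufp(3, 2, 13), 2))  # 83.07, i.e. ~83, as for fig. 4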
On the face of it, this estimate is very much within the acceptable range, just over 5% higher. However, it is still an underestimate, as is characteristic of denormalized data. In practical terms, we are applying a higher effort in transferring all data from the entity Current Borrowing into a newly created Past Borrowing (which may hold additional data related to Fine) and then deleting the Current Borrowing. Even if we chose the two entities to be Borrowing and Past Borrowing, as in fig. 3 but with the relationship between the two acknowledged, we would still come up with the same effort estimate, which is still an underestimate, as functionally we are doing everything as in the case of fig. 4, except deleting Current Borrowing (as here the entity is simply Borrowing).
If an estimate is required for this sort of scenario, where one Borrowing entity must disappear while creating another, I would consider using the model as in fig. 2, but with the Borrowing entity repeated to represent the existence of a separate Current Borrowing entity, which must disappear as the Past Borrowing is created. This would mean:
E = 4, R = 3 & A = 4 + 4 + 2 + 4 = 14
Therefore UFP = (1.42 x 14 x (1 + 3/4)) + (8.58 x 4) + (13.28 x 3)
= 34.79 + 34.32 + 39.84
= 108.95 ~ 109
which is ~40% higher than in the case that does not involve the creation and deletion of an extra entity (Current Borrowing).
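Running the same helper sketched earlier confirms this arithmetic:

    print(round(ufp(4, 3, 14), 2))  # 108.95, i.e. ~109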
Interpretation
Implementing the data model in its normalized form, as in fig. 2, amounts to interpreting the generalization relationship as composition, which is to say that a Borrowing may exist on its own and may or may not have a Fine, but a Fine must have exactly one Borrowing associated with it for its existence.
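A minimal sketch of this composition mapping, using SQLite from Python; the table and column names are hypothetical, as the figures are not reproduced here:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Borrowing (
            borrowing_id INTEGER PRIMARY KEY,  -- may exist on its own
            borrower_id  INTEGER NOT NULL,
            date_out     TEXT,
            date_due     TEXT
        );
        CREATE TABLE Fine (
            fine_id      INTEGER PRIMARY KEY,
            -- a Fine must have exactly one Borrowing to exist
            borrowing_id INTEGER NOT NULL
                REFERENCES Borrowing(borrowing_id),
            amount       REAL
        );
    """)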
Implementing the data model as in fig. 3, but with the relationship between Borrowing and Past Borrowing acknowledged, amounts to interpreting the generalization relationship as containment or extension. In this case, it may be better represented in a data-economical form where Past Borrowing simply takes as its primary key a foreign key from Borrowing, and ‘contains’ or ‘uses’ Borrowing while ‘extending’ it.
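Continuing the same sketch, the data-economical form can be rendered as follows (names again hypothetical):

    con.executescript("""
        CREATE TABLE Past_Borrowing (
            -- the primary key doubles as a foreign key: Past Borrowing
            -- 'contains' / 'extends' Borrowing rather than repeating it
            borrowing_id INTEGER PRIMARY KEY
                REFERENCES Borrowing(borrowing_id),
            fine_amount  REAL   -- Fine denormalized in; may be NULL
        );
    """)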
Implementing the data model as in fig. 4 amounts to interpreting the generalization relationship as a transformation between two sibling objects, where the existence of an abstract parent is established only by the similarity of the sibling objects involved.
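Under this interpretation, the ‘becomes’ relationship is a transaction that transfers all data from one sibling table to the other and then deletes the original row. A self-contained sketch, with hypothetical names and sample values:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Current_Borrowing (
            borrowing_id INTEGER PRIMARY KEY,
            borrower_id  INTEGER NOT NULL,
            date_out     TEXT,
            date_due     TEXT
        );
        CREATE TABLE Past_Borrowing (
            borrowing_id INTEGER PRIMARY KEY,
            borrower_id  INTEGER NOT NULL,
            date_out     TEXT,
            date_due     TEXT,
            fine_amount  REAL    -- optional Fine, held by Past Borrowing only
        );
    """)
    con.execute("INSERT INTO Current_Borrowing VALUES (42, 7, '2004-01-05', '2004-01-19')")

    # the 'becomes' relationship: copy all data across, then delete the original
    with con:
        con.execute("""INSERT INTO Past_Borrowing
                       SELECT borrowing_id, borrower_id, date_out, date_due, NULL
                       FROM Current_Borrowing WHERE borrowing_id = 42""")
        con.execute("DELETE FROM Current_Borrowing WHERE borrowing_id = 42")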
Conclusions
In this article we touched upon the essence of the object paradigm and the interpretation of generalization – the most important relationship in this paradigm – in the context of the relational database paradigm. The conclusions are:
- The object paradigm is functionality-oriented and promotes data economy and encapsulation.
- Relational databases can interpret generalization through a variety of mechanisms, including composition, containment and transformation.
What’s Next
We now have a basic understanding of the various approaches to interpreting the generalization relationship in the relational DB paradigm. This allows us to conduct FPA on relational data depicting inheritance and / or sibling relationships.
In the next article we shall explore the possibility of estimating efforts when the data is represented in OODB form.
[1] Amit Bhagwat – Data Modeling & Enterprise Project Management, Part 1: Estimation – TDAN (Issue 26)
[2] Amit Bhagwat – Data Modeling & Enterprise Project Management, Part 2: Estimation Example – The Function-based Approach – TDAN (Issue 27)
[3] Amit Bhagwat – Data Modeling & Enterprise Project Management, Part 3: Estimation Example – The Data-based Approach – TDAN (Issue 28)
[4] Amit Bhagwat – Data Modeling & Enterprise Project Management, Part 4: Estimation – Considering Derived & Intermediate Data – TDAN (Issue 30)
[5] Amit Bhagwat – Data Modeling & Enterprise Project Management, Part 5: Estimation – Considering Effect of Denormalization / Normalization – TDAN (Issue 31)
[6] The square brackets used for certain attributes within the data structures newly created in this article are intended to draw readers’ attention to the attributes that are repositioning themselves among the entities.