This is the fourth in a series of articles from Amit Bhagwat.
Data modeling is undoubtedly one of the most important and challenging aspects of developing, maintaining, augmenting and integrating typical enterprise systems. More than 90% of the functionality of enterprise systems is centered on creating, manipulating and querying data. It therefore stands to reason that those managing enterprise projects should leverage data modeling to execute their projects successfully and to deliver systems that are not only capable and cost-effective but also maintainable and extendable. A project manager is involved in a variety of tasks, including estimation, planning, risk evaluation, resource management, monitoring and control, and delivery management. Virtually all of these activities are influenced by the evolution of the data model and may benefit from taking it as the primary reference. This series of articles by Amit Bhagwat examines the associations between data modeling and various aspects of project management. Having explained the importance of the data model in the estimation process, taken an overview of various estimation approaches and presented illustrative examples of them, this article addresses potential confusion arising from derived and intermediate data.
In the first article of this series, we established data operation as the central function of most enterprise projects and concluded that the data structures associated with a system would prove an effective starting point for the estimation process. We also discussed the temporal and accuracy implications of analysis-time and design-time estimates and briefly considered interpreting estimates for specific funding styles.
In the next article, we took a simple case to illustrate the function-based estimation approach. We presented the case, itemized the functions involved and then atomized them into transactions. We next analyzed how each transaction manipulated entities, and then converted the transaction data to an Unadjusted Function Point (UFP) count that can serve as the basis for various estimation and resourcing calculations. We also briefly discussed a shortcut function-based technique.
In the last article, we continued with the case considered in the previous article and illustrated the quicker but less accurate data-based approach. A few important points were noted there:
Having illustrated the two approaches to Function Point Analysis (FPA), I hinted in the last article that we would next cover the impact of the following on estimation:
To keep the reader focused and to allow the many concepts these topics present to be assimilated, it may be prudent to confine ourselves in this article to the first point, i.e. intermediate and derived data.
We’ll continue to use the example of the book-lending facility at a public library that we developed over the previous two articles. Our discussion will also touch upon the value of logical models to estimation and the value of the data-based approach to detailed function-based analysis.
Before we begin, it will be useful to have at hand a view of the important data elements, i.e. the entities owned by our subsystem. These are provided in Figures 1 and 2 below.
Intermediate & Derived Data
In our subsystem, let’s consider a requirement that if the Total Fine Amount is less than a system setting value X (which is set by another subsystem), the fine charged is zero; otherwise the fine charged is the Total Fine Amount rounded to the nearest whole dollar. (This also means that X automatically has a minimum meaningful value of half a dollar, since any amount below half a dollar rounds to zero anyway.) What impact does this additional requirement have on the estimate?
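The fine rule can be sketched in Python. This is a minimal sketch. It assumes amounts are held as `Decimal` dollars and that "nearest whole dollar" rounds half-dollars up (`ROUND_HALF_UP`), which the requirement does not specify. The function and parameter names are hypothetical. The second loop shows why any X below half a dollar behaves exactly like X = $0.50.

```python
from decimal import Decimal, ROUND_HALF_UP

def fine_charged(calculated_total: Decimal, threshold_x: Decimal) -> Decimal:
    """Fine actually charged for one Return (hypothetical helper).

    Totals below the system setting X are waived; otherwise the total is
    rounded to the nearest whole dollar. Rounding half-dollars up is an
    assumption: the requirement does not say how ties are broken.
    """
    if calculated_total < threshold_x:
        return Decimal("0")
    return calculated_total.quantize(Decimal("1"), rounding=ROUND_HALF_UP)

if __name__ == "__main__":
    x = Decimal("2.00")  # illustrative value for the system setting X
    for total in ("1.75", "2.00", "3.49", "3.50"):
        print(total, "->", fine_charged(Decimal(total), x))
    # A threshold below $0.50 gives the same results as $0.50, because
    # totals under half a dollar round to zero anyway.
    for total in ("0.10", "0.49", "0.50"):
        print(total, fine_charged(Decimal(total), Decimal("0.30")),
              fine_charged(Decimal(total), Decimal("0.50")))
```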
First, consider what changes there are to the data structure. We have a new system setting X, which may not be completely owned by our subsystem but which is nonetheless read in the context of our subsystem’s functionality.
Then we have two Total Fine quantities:
We know that Fcalc is the total of all fines for a particular Borrower ID for a particular Return on a Return Date. (I know what some of you are thinking: if you have exceeded the return date by a small margin, you can potentially avoid paying any fine at all, simply by returning the items separately throughout the day. I personally will be happy to waive your fine so long as the delay is slight and hasn’t inconvenienced other library users, though I bet I have set some librarians thinking!) From a data-economy viewpoint, therefore, we can do without separately storing Fcalc in the database. (We will, however, need an ID-locking process to define a Return, which could be replicated from timestamps on Returned Date; this of course means that Returned Date is actually Returned Date-time.) On the other hand, storing Fcalc achieves a significant process economy if this data is going to be referred to repeatedly.
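To make the grouping concrete, here is a minimal sketch that derives Fcalc on the fly. It sums the individual fines per Borrower ID and Return, and identifies a Return by its Returned Date-time as suggested above. The `Fine` record, its fields and the value of X are hypothetical. The example also reproduces the "return items separately" loophole. Two returns on the same day each stay below X, although their combined total of $2.30 would not.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal

@dataclass(frozen=True)
class Fine:
    """One fine on one returned item (hypothetical record layout)."""
    borrower_id: str
    returned_at: datetime  # Returned Date-time, which identifies the Return
    amount: Decimal

def fcalc_per_return(fines):
    """Sum fines per (Borrower ID, Return) without storing the totals."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for f in fines:
        totals[(f.borrower_id, f.returned_at)] += f.amount
    return dict(totals)

if __name__ == "__main__":
    x = Decimal("2.00")  # system setting X (value assumed for illustration)
    morning = datetime(2024, 5, 1, 9, 30)
    evening = datetime(2024, 5, 1, 17, 45)
    fines = [
        Fine("B42", morning, Decimal("1.20")),
        Fine("B42", evening, Decimal("1.10")),
    ]
    for (borrower, returned_at), total in fcalc_per_return(fines).items():
        status = "waived" if total < x else "charged"
        print(borrower, returned_at.isoformat(), total, status)
```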
Fact is likewise redundant data, in that it can easily be calculated from X and Fcalc.
So which would you store, say, in the entity Total Fine in our example?
I am not sure that there is any right or wrong answer here, though I would consider storing Fact, as this is the Total Fine Amount of greatest interest to the business. Those who follow this approach will have one Total Fine Amount, Fact, saved with the entity Total Fine; to them, Fcalc will be a transient intermediate quantity. For those who save Fcalc instead, Fact will be a transient derived quantity. For those who store both, both will be persistent derived quantities, whereas for those who prefer to calculate Fact on the fly, both will be transient derived quantities. The number of attributes associated with the entity Total Fine will therefore vary with the line of thinking, and so will the run-time performance of the system. However, since the system is ultimately responsible for calculating the total fine and, from it, the payable fine, the programmer will have to write the entire algorithm in all four cases. It is therefore prudent to treat both Fcalc and Fact as attributes of the entity Total Fine for estimation purposes.
In transaction terms, if we need to implement an algorithm y = f(x), where f is simple (as in most business systems), we need to account for the effort of writing this algorithm as the effort of reading x and writing y, whether x and y exist in the database or only in transient memory.
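The four storage choices can be laid out side by side. In this minimal sketch, all names are hypothetical, a plain dict stands in for the Total Fine row, and half-dollars are assumed to round up. Each strategy differs only in which of Fcalc and Fact it persists. The same y = f(x) computation runs in every case, and the logical attributes produced are always Fcalc and Fact.

```python
from decimal import Decimal, ROUND_HALF_UP

# Which Total Fine quantities each strategy persists (hypothetical labels).
STRATEGIES = {
    "store Fact only": {"Fact"},
    "store Fcalc only": {"Fcalc"},
    "store both": {"Fcalc", "Fact"},
    "compute on the fly": set(),
}

def total_fine(fine_amounts, threshold_x, strategy, store):
    """y = f(x): derive Fcalc and Fact, then persist only what the strategy asks.

    `store` stands in for the Total Fine row; here it is a plain dict.
    """
    # Intermediate quantity: the calculated total of the individual fines.
    fcalc = sum(fine_amounts, Decimal("0"))
    # Derived quantity: waived below X, else rounded (half-up assumed).
    if fcalc < threshold_x:
        fact = Decimal("0")
    else:
        fact = fcalc.quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    values = {"Fcalc": fcalc, "Fact": fact}
    for attr in STRATEGIES[strategy]:
        store[attr] = values[attr]
    return values

if __name__ == "__main__":
    fines, x = [Decimal("1.20"), Decimal("1.10")], Decimal("2.00")
    for name in STRATEGIES:
        row = {}
        values = total_fine(fines, x, name, row)
        print(f"{name}: persisted {sorted(row)}, logical {sorted(values)}")
```

Only the contents of `row` change from strategy to strategy. The algorithm, and therefore the effort of writing it, does not.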
Impact of Potentially Transient Data
In the last article, we calculated UFP based on E = 3, R = 2 and A = 11 (the numbers of entities, relationships and attributes respectively). Now, adding one attribute to account for the two Fine quantities associated with the entity Total Fine, i.e. A = 12, we get:
UFP = (1.42 × 12 × (1 + 2/3)) + (8.58 × 3) + (13.28 × 2)
= 28.40 + 25.74 + 26.56
= 80.70 ≈ 81
That’s an addition of about 3 UFP, in rounded figures, caused by the added functionality.
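The arithmetic above can be checked with a short script. The formula is read off the worked figures: UFP = 1.42 × A × (1 + R/E) + 8.58 × E + 13.28 × R, with E entities, R relationships and A attributes. The function name is hypothetical.

```python
def data_based_ufp(entities: int, relationships: int, attributes: int) -> float:
    """Unadjusted Function Points from the data-based formula used above."""
    return (1.42 * attributes * (1 + relationships / entities)
            + 8.58 * entities
            + 13.28 * relationships)

if __name__ == "__main__":
    before = data_based_ufp(entities=3, relationships=2, attributes=11)
    after = data_based_ufp(entities=3, relationships=2, attributes=12)
    print(f"before: {before:.2f}, after: {after:.2f}, increase: {after - before:.2f}")
```

Run as is, it prints about 78.33 before and 80.70 after. The unrounded increase is therefore about 2.37 UFP, and the figure of 3 UFP is the difference between the rounded totals of 78 and 81.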
In functional terms, we have one additional transaction that deals with an input (X: although it is typically obtained from a system setting, it is too trivial to be considered an entity and rather deserves the status of an input), an entity and an output.
We thus add 0.58 + 1.66 + 0.26 = 2.50 to our old estimate of 78.78 (from the second article of this series), giving 81.28 ≈ 81.
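The function-based increment can be reproduced the same way. The weights 0.58, 1.66 and 0.26 are the ones applied above. They are the Mk II FPA weights per input data element, entity reference and output data element. The function and constant names are hypothetical.

```python
# Weights per input data element, entity reference and output data element
# (the Mk II FPA weights applied in the text).
W_INPUT, W_ENTITY, W_OUTPUT = 0.58, 1.66, 0.26

def transaction_ufp(inputs: int, entities: int, outputs: int) -> float:
    """Unadjusted Function Points contributed by one logical transaction."""
    return W_INPUT * inputs + W_ENTITY * entities + W_OUTPUT * outputs

if __name__ == "__main__":
    old_estimate = 78.78  # function-based estimate from the second article
    added = transaction_ufp(inputs=1, entities=1, outputs=1)
    new_estimate = old_estimate + added
    print(f"added {added:.2f}, new {new_estimate:.2f}, "
          f"increase {100 * added / old_estimate:.1f}%")
```

With these figures, the script reports an addition of 2.50 UFP, a new total of 81.28 and an increase of about 3.2%.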
Using either approach, we find the estimate going up by roughly 3 to 4% to implement the additional feature.
We observe here that when both Fcalc and Fact are important in the logic, they contribute to the estimate, whether or not they are stored physically. This illustrates that in estimation we must consider all data that logically belongs to the system, irrespective of whether it physically resides in persistent storage or, for that matter, whether it is afforded a separate variable in the algorithm. In other words, for those troubled by the logical-physical dilemma, the message is loud and clear: as estimation is fundamentally function-based, stick to logical data for the purpose of estimation.
In terms of the steps followed in estimation, it is useful, as well as easier and quicker, to locate Fcalc and Fact and associate them with Total Fine simply by scanning the requirement document for nouns and their adjective qualifiers. This activity precedes, and leads to, the discovery and understanding of transactions. An elaborate transactional analysis that projects the role of the data (as a contributor to a transaction) in system functionality can then follow. In other words, the typical early steps of the data-based approach make the function-based approach easier, quicker and more accurate. Data-based estimation is therefore not only a quick reckoner but also a useful tool for an estimator who is only moderately familiar with the system functionality and who needs a way of becoming familiar with what the system does before proceeding with the function-based treatment.
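A very crude first pass of the noun scan can be automated. The sketch below uses an assumed heuristic for illustration, not a step prescribed in this series. It collects runs of two or more capitalized words, relying on this series' convention of capitalizing domain terms such as Total Fine Amount and Returned Date. True noun and adjective detection would need part-of-speech tagging, which is not shown. The function and constant names are hypothetical.

```python
import re

# Runs of two or more capitalized words, e.g. "Total Fine Amount" or "Borrower ID".
# Single capitalized words (such as "X") are deliberately not matched.
TERM_PATTERN = re.compile(r"\b[A-Z][A-Za-z]*(?:\s+[A-Z][A-Za-z]*)+")

# Capitalized words that usually start a sentence rather than name data.
LEADING_STOPWORDS = {"A", "An", "The", "If", "When", "Then", "Otherwise", "In", "On"}

def candidate_data_items(requirement_text: str) -> list[str]:
    """Return multi-word capitalized terms, in order of first appearance."""
    found = []
    for match in TERM_PATTERN.finditer(requirement_text):
        words = match.group().split()
        # Drop sentence-initial words such as "If" that attach to a term.
        while words and words[0] in LEADING_STOPWORDS:
            words.pop(0)
        term = " ".join(words)
        if len(words) >= 2 and term not in found:
            found.append(term)
    return found

if __name__ == "__main__":
    text = ("If Total Fine Amount is less than a system setting value X, the fine "
            "charged for a Borrower ID on a Returned Date is zero; otherwise the "
            "Total Fine Amount is rounded to the nearest dollar.")
    print(candidate_data_items(text))
```

The output is only a list of candidates for the estimator to review. Terms such as X, which a human reader would pick up, still have to be added by hand.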
- Quantities that are important to business logic must be counted in the estimation process, whether or not they form part of the final persistent data structure and whether or not they are fundamental.
- For estimation purposes, entities, attributes and relationships are considered in their logical sense.
- The process followed for data-based estimation, apart from proving a quick reckoner, also assists in transaction discovery. This in turn leads to more complete and accurate function-based estimation and, where necessary, timely re-scoping of the system.
Our next focus of attention will be the effect of Denormalization, Normalization and Generalization of data on estimation. Towards the end, I will also comment on refining the estimate in step with refining the data structure.
In the meantime, I would like to suggest a small activity for you. Spot an instance of looser coupling attained through a multiplicity relationship between certain entities discussed in our example. This multiplicity relationship can be refined in the context of the system functionality presented to us. See if you can locate it. I’ll be using it as an example when I discuss data structure refinement.
[i] Amit Bhagwat, "Data Modeling & Enterprise Project Management, Part 1: Estimation", TDAN (Issue 26).