Applying Dimensional Analysis to Business Intelligence Systems

Published in TDAN.com April 2002

Most of us study bits of elementary Physics as we get through our schooling. A few of us find it fascinating; a few others reckon it too idealistic, with bodies continuing to move forever purely by
divine miracle. Many more find it awfully boring, with so many laws to remember and equations to derive. Indeed, very few of us go on to have much to do with Physics as a practical science.
Yet a great many principles and practices that originated in Physics have value in virtually every sphere of our life, and the information industry is among the greatest beneficiaries of
Physics.

Back in my school days I was much amused by the concept of dimensional analysis. Its greatest merit was that it let us perceive most quantities and their units as derived from a small number of
Fundamental Quantities. In short, it meant that whereas we often make life complicated by assuming quantities that we do not fully understand or relate, the universe is essentially built
around only a handful of elementary ones. Conversely, our mastery of everything rests on our mastery of manipulating that handful of elementary things. From that moment onwards I
liked Physics for being the simplest of sciences. Later on, as I studied Chemistry, I found the same concept repeated: everything was made up of a few chemical elements. The two together
emphasized that complex-looking things are essentially made of simple bits, and that how much sense we make of the seemingly complex world depends on our ability to combine simple things in
proper quantities and context.


Why have derived quantities?

The question may be broken into two:

  • Why reckon in quantities that turn out to be derivable?
  • Why derive these quantities?


Why reckon in quantities that turn out to be derivable?

This first question is not that difficult to answer, for there are many of these quantities that we actually ‘feel’. Take the example of our routine of traveling to and from the office (not
all of us have an e-office with flexi-hours or, further, virtual hours, however often we may wonder: why not?). Ultimately, the objective is to get from one set of latitude and
longitude, i.e. position, to another. This corresponds to a set distance following the shortest available route (e.g. a straight line for good crows on a flat earth); but since we have to do this in
such time as to make the most of our morning, avoid a boring crawl and still meet the office schedule, we are a lot more concerned about the speed that we end up traveling at. Indeed, we are
often willing to travel extra distance if it means better speed and thus a better chance of making the most of our morning. So speed, though derived, is more important to us than distance.

Further, imagine a route full of ups and downs, bottlenecks and fast carriageways, where the overall average speed of travel is a bit higher than on a flat country road in ‘good-enough’
conditions and with modest traffic. Are we quite certain that we shall always take the path allowing the greater average speed and thus perhaps a bit less time? On that path the car has to spend a
fair amount of extra energy accelerating after every bottleneck and burning extra gas on every uphill slope. We also need to spend a lot more of our own energy fiddling with the pedals and
being extra vigilant. Perhaps a bit more distance, a bit more time and a bit less speed are all fine if they mean considerably lower acceleration / retardation and thus lower usage of energy. So
energy consumption and acceleration / retardation, though derived a level deeper, are more important in our consideration.

We can go on and on thus, but the two examples above adequately demonstrate why we are often more concerned about quantities that turn out to be derived. To manage our day, our Information
System seeks derived quantities rather than fundamental ones. We have just seen a real-life example of what drives the simplest of Decision Support Systems: appropriate quantities, however
intricately derived they may be, that can be measured and mapped most directly to our end objectives. In our example, the goal is to make the most of our morning drive, and this we achieve by
measuring the energy consumed by the car and by us, which we try to keep to a practical minimum.

The next question is – how do we measure energy? We seldom measure it in its own units such as joules (one wonders how many of us have heard of these), kWh or horsepower-hours. In the context of a
car we measure consumption as mileage (distance per unit volume of fuel), brake wear (width lost per unit time / distance) and service frequency (as incidences per unit time, or maybe as the ratio of
down time to operational time). Those few of us who care enough will probably also measure emission of pollutants (as mass per unit distance traveled?). Further, whereas there are ways of converting
most of these quantities into “running cost”, we are not always satisfied by that simple conversion, since frequent servicing of our beloved car, more frequent fuel stops and
environmental impact have implications (that may be made tangible with some thinking) beyond money. We thus often tend to watch their trends separately and then work out the best option. All of us
keep track of these derived quantities to make the decision that appears most appropriate to us. So commonplace and often subconscious is this reckoning that we would ridicule anyone
who called it our “Route Management Intelligence System”. But just add to this problem the bits and pieces that are specific to individual road users in our area, and aggregate
their metrics. What we suddenly have is the Traffic Management System of the area, a system that is beyond any doubt complex in itself and linked intricately with those of other geographical areas.
To describe this system we’ll readily volunteer much of the jargon available in the world. This time around, therefore, we fully appreciate the enormity of what our
individual little exercises with the derived quantities add up to, and we’ll readily cut our heart out for anyone who could solve this for us. (In fairy tales they still give away half the kingdom and
the princess’s hand!)


Why derive these quantities?

Probably this question will become less pressing, or indeed less of a question, after the exercise that we have gone through. There is no doubt that we need to do our metrics in the most appropriate
quantities, and that to make the most of them we need to understand their relation to other quantities. Debate may still be rife as to which quantities should be fundamental. Probably this debate
will continue eternally. Many will suggest that fundamental quantities Are Fundamental: they come by intuition, as, say, distance and time do in our example. I know however from the same Physics,
where all this started, that intuition is neither universal nor unique.

Back in those good old days when the intelligentsia were supposed to wear the weirdest of attire and hairstyles, perhaps to make thinking by far the easiest thing in life, there existed two
“Physics”: the Electric Physics and the Magnetic Physics. The electrical folks reckoned that Charge was the fundamental quantity, whereas the magnetic folks insisted that Current (which
later turned out to be flow of charge per unit time) was The Fundamental Quantity. By the time the two branches merged to form Electromagnetic Physics, the “magnetic” folks had far
outstripped the electric folks in their influence on industry. So, much against the intuition of a modern physicist, Current continued to be a fundamental quantity and Charge came to be derived from
it. Whereas theorists use issues like these to get at each other’s throats from time to time, in practice the arrangement works well, since even today we, Physicists or Realists (!), measure
current a lot more than we do charge, making charge effectively a phantom commodity.

So, in the end, all that can be said about fundamental quantities is that it is good if they are tangibly thought of and expressed, are measured fairly comfortably and accurately in their own
terms, and provide building blocks for other quantities that are measured / expressed with them as constituents. They then, by allowing quantities to be derived from them, provide an effective way
of relating these derived quantities with each other.


Dimensional Analysis

Perhaps many of you have begun wondering – all this may be well and good, but where the hell does the title of this article come from? Well, it comes from everything that we have been talking about
so far. In our little exercise with the car, we talked about distance (or length) and time as the fundamental quantities. We next talked about speed, i.e. distance (change of position) per unit
time, and acceleration / retardation, i.e. change in speed per unit time. Further, to keep matters simple, we considered energy without considering the mass of the laden vehicle, i.e. effectively,
we talked about energy per unit mass.

As we proceeded with the argument, things up to acceleration were all simple, since speed was the time derivative of distance and acceleration the time derivative of speed. Further, talking about
energy, we realized that the greater the acceleration, the more energy is needed. But we had no way of telling whether doubling acceleration meant doubling energy consumption and further, what
matters to us most, whether doubling energy consumption means doubling the expenditure. Is it not obvious? No, it isn’t. Indeed, those of you who have studied the current and power ratings of an
electrical gadget may have noted that, for a given resistance, if the current doubles the power quadruples, and so does the amount that we shell out while using it. Put in general terms, expenditure
goes as the square of the current drawn by the appliance. Now that’s frightening, and so it is for Enterprises whose fortunes depend on the answer to one question: Which other quantities, in what way
and in what proportion, change the quantities that matter most to them?
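
To spell out the appliance example, here is a short worked relation. It assumes an appliance of fixed resistance R, which the argument above leaves implicit; the symbols P, I and R are introduced here only for illustration.

```latex
P = I^{2} R
\quad\Longrightarrow\quad
\frac{P_{2}}{P_{1}} = \left(\frac{I_{2}}{I_{1}}\right)^{2} = 2^{2} = 4
\qquad \text{when } I_{2} = 2 I_{1} \text{ and } R \text{ is unchanged.}
```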

Physics answers this question through dimensional analysis. As Physics does, let’s represent distance (length) by L, mass by M and time by T. Speed is distance per unit time, i.e. $L \div T$,
which may also be represented in power notation as $L^{1}T^{-1}$. Speed is therefore proportional to $[L^{1}T^{-1}]$. This is spoken in Physics as speed has the dimensions $[L^{1}T^{-1}]$. Likewise, acceleration /
retardation will have dimensions $[L^{1}T^{-2}]$, the force necessary to produce this change of speed will have dimensions of product of $[M^{1}]$ and $[L^{1}T^{-2}]$, i.e. $[M^{1}L^{1}T^{-2}]$. Further, the energy needed to do
the work of applying this force over a distance $[L^{1}]$ has dimensions $[M^{1}L^{2}T^{-2}]$ and the power of the vehicle, i.e. ability to do work in unit time, has dimensions $[M^{1}L^{2}T^{-3}]$.
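
The bookkeeping in this paragraph can be sketched in a few lines of Python. This is only a minimal illustration, not something from the original exercise: the `Dim` class and the variable names are hypothetical, and a dimension is stored simply as the exponents of M, L and T.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dim:
    """Exponents of mass (M), length (L) and time (T) for a quantity."""
    mass: int = 0
    length: int = 0
    time: int = 0

    def __mul__(self, other):
        # Multiplying two quantities adds their exponents.
        return Dim(self.mass + other.mass,
                   self.length + other.length,
                   self.time + other.time)

    def __truediv__(self, other):
        # Dividing two quantities subtracts their exponents.
        return Dim(self.mass - other.mass,
                   self.length - other.length,
                   self.time - other.time)

    def __pow__(self, n):
        # Raising a quantity to a power multiplies its exponents.
        return Dim(self.mass * n, self.length * n, self.time * n)

    def __str__(self):
        return f"[M^{self.mass} L^{self.length} T^{self.time}]"


MASS, LENGTH, TIME = Dim(mass=1), Dim(length=1), Dim(time=1)

speed = LENGTH / TIME
acceleration = speed / TIME
force = MASS * acceleration
energy = force * LENGTH
power = energy / TIME

for name, dim in [("speed", speed), ("acceleration", acceleration),
                  ("force", force), ("energy", energy), ("power", power)]:
    print(f"{name:12s} {dim}")
```

Because the dimensions are plain values, a relation such as "energy is mass times speed squared" can be checked directly with `energy == MASS * speed ** 2`.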

If you are still tuned in, you are likely to wonder – we did all this ourselves, so where have the dimensions that we so painstakingly derived been put to any use? Wherever you like! E.g. you
want to know what will happen to energy consumption if you drive at double the speed by slamming the gas hard. We are talking about energy $[M^{1}L^{2}T^{-2}]$, i.e. $[M^{1}][L^{2}T^{-2}]$, i.e. $[M^{1}][L^{1}T^{-1}]^{2}$, and
speed $[L^{1}T^{-1}]$. The answer is plain: energy consumption goes up as the square of speed, so it is quadrupled when we double the speed by slamming down the gas. Hardly worth the
wastage! You also see that, unlike with speed, energy consumption goes up in proportion to the loading $[M^{1}]$ of the vehicle and not to its square. So qualitatively, speeding does
far greater damage to our economics than loading does.
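
The same reasoning can be written as a small sketch: given the exponents with which a quantity depends on its inputs, the factor of change follows from the factors applied to those inputs. The function `scale_factor` and the dictionary layout are assumptions made for illustration only.

```python
def scale_factor(exponents, multipliers):
    """Factor by which a quantity changes when its inputs are multiplied.

    exponents:   e.g. {"mass": 1, "speed": 2} means quantity ~ mass^1 * speed^2
    multipliers: e.g. {"speed": 2.0} means speed is doubled, the rest unchanged
    """
    factor = 1.0
    for name, exponent in exponents.items():
        factor *= multipliers.get(name, 1.0) ** exponent
    return factor


# Energy scales as [M^1][L^1 T^-1]^2, i.e. mass times speed squared.
ENERGY = {"mass": 1, "speed": 2}

print("double the speed:", scale_factor(ENERGY, {"speed": 2.0}))  # 4.0
print("double the load: ", scale_factor(ENERGY, {"mass": 2.0}))   # 2.0
```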

Now try and remember your miserable self, restlessly preparing for the driving theory test, trying desperately to remember the stopping distance of a car at different speeds. It was easy to follow
that the total distance is the sum of the “thinking” distance, i.e. the distance traveled while your reflexes got round to slamming the brakes, and the braking distance, covered while you and the
brakes were doing the best possible. Whereas this sum was easy, remembering these two distances for at least 6 speeds was near impossible. But did you need to remember them all?

For thinking distance, we are talking about distance $[L^{1}]$, i.e. $[L^{1}T^{-1}][T^{1}]$, at a specific speed $[L^{1}T^{-1}]$ and reflex time $[T^{1}]$. One look at the dimensions and straight comes the answer – the
thinking distance varies directly as both the speed and the reflex time. So double the speed and, for the same attentiveness on your part, the distance doubles. Likewise, braking distance $[L^{1}]$, i.e.
$[L^{2}T^{-2}][L^{-1}T^{2}]$, i.e. $[L^{1}T^{-1}]^{2}[L^{1}T^{-2}]^{-1}$, varies inversely as braking retardation $[L^{1}T^{-2}]$ and directly as the square of speed $[L^{1}T^{-1}]$. Thus double the speed and the braking distance quadruples.
The important point driven home is that whereas there is no excuse for not being attentive behind the wheel, higher speed is far more disastrous.
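
A short sketch shows why the table of stopping distances need not be memorized. Dimensional analysis gives only the proportionalities; the factor of one half in the braking distance comes from the standard constant-deceleration relation v^2 = 2as. The function names are hypothetical, and the reflex time and retardation in the usage loop are assumed values chosen purely for illustration.

```python
def thinking_distance(speed, reflex_time):
    """Distance covered before braking starts: [L T^-1] * [T] = [L]."""
    return speed * reflex_time


def braking_distance(speed, retardation):
    """Distance covered while braking at constant retardation: v^2 / (2a)."""
    return speed ** 2 / (2.0 * retardation)


def stopping_distance(speed, reflex_time, retardation):
    """Total stopping distance in metres for SI inputs."""
    return thinking_distance(speed, reflex_time) + braking_distance(speed, retardation)


# Assumed values, for illustration only: 0.7 s reflex time, 6.5 m/s^2 retardation.
for speed in (10.0, 20.0, 40.0):  # metres per second
    total = stopping_distance(speed, reflex_time=0.7, retardation=6.5)
    print(f"{speed:5.1f} m/s -> {total:6.1f} m")
```

Doubling the speed doubles the first term and quadruples the second, whatever values are assumed for the reflex time and the retardation.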

As manager of the Enterprise of driving your car, you have just made two important decisions based totally on dimensional analysis:

  1. For fuel economy, you should remove unnecessary load from your vehicle, but far more importantly you should not slam the gas and speed up too far.
  2. For a safe drive, you should be alert while behind the wheel, but once again, you should be extremely careful about speed, as the positive effect of the best reflexes cannot compensate for
    the negative effect of speeding.


How does this fit with BIS?

If you have worked in an environment where jargon such as MIS, DSS, BIS and OLAP has been within earshot, then you have probably also heard about data warehouses and perhaps know a lot about them.
Further, the word dimension will not be unknown to you, and perhaps you have done a substantial amount of cubing yourself. In the next two sections, we’ll recap what
fundamental things organizations look for in their Business Intelligence Systems and the way they tend to approach their solution, before pointing out what they miss by working the way
they do.


What do organizations look for in their BIS?

In all honesty, there is no all-encompassing answer to this. The closest that we can come to generalizing is that they look for metrics of their “performance”, a concept that usually has
facets like efficiency, effectiveness, growth, market dominance, stability etc. Many of these are perceived as change in various quantities translatable into a monetary equivalent, over various
scoping factors such as period of time, geography, population, etc. and competitive factors such as competitor popularity, standard of living, etc. The organizations are therefore looking first to
determine which facts are critical to their success (their CSFs, to throw in another piece of jargon) and, having perceived or at least narrowed down their choice of important facts, they go about relating
them to other variables in the environment to find the movement of these variables that best serves the organizations’ interest.


How do they go about this?

They go about this in just the way they think. They first determine the facts that are important to them. They then list the variables that they consider to influence those facts. Next they wait and wait
and wait and, if they are not doomed in the meanwhile, they compile a chart full of the fact(s) that matter to them, with values of the variable(s) that may have had anything to do with the facts. They
then group each variable at different granularities and match the fact(s) against each to see if, by any chance, a fact shows a trend with change in the value of the variable. If it appears to, they make
further business decisions based on this and hope that these will prove to be in the right direction, if not necessarily accurate. Whether the physical arrangement of data is redundantly
multidimensional (typically for frequent and complex analysis of a modest size of data) or relational (for modest querying and analysis of large and / or legacy data) or a mix of these, and whether
or not the dimensions are solely and independently defined by granularity, this type of data view may logically be likened to an asterisk, centred on fact data and viewed through
different dimensions, as shown in Fig.1. We need a reliable and sufficient quantity of concurrent data to be able to map facts to dimensions before this data can tell us the obvious. Even then the
best it can do is provide us with trends rather than prophecies.
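
As a rough sketch of this classical approach, the snippet below averages one fact over each value of a dimension, one dimension at a time. The function name, the record layout and the commute figures are all invented for illustration; they are not data from any real system.

```python
from collections import defaultdict


def average_fact_by(records, dimension, fact):
    """Average a fact over each distinct value of one dimension."""
    groups = defaultdict(list)
    for record in records:
        groups[record[dimension]].append(record[fact])
    return {value: sum(facts) / len(facts) for value, facts in groups.items()}


# Invented observations, for illustration only.
commutes = [
    {"route": "A", "weekday": "Mon", "minutes": 30},
    {"route": "A", "weekday": "Fri", "minutes": 34},
    {"route": "B", "weekday": "Mon", "minutes": 28},
    {"route": "B", "weekday": "Fri", "minutes": 40},
]

# One ray of the asterisk per dimension: the same fact viewed through each.
for dimension in ("route", "weekday"):
    print(dimension, average_fact_by(commutes, dimension, "minutes"))
```

Each view is computed independently of the others, which is exactly why this style of analysis yields trends per dimension but says nothing about how the dimensions relate to one another.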





What do they miss out?

What the method explained above misses out on is relating the quantities that make up different dimensions analytically (rather than empirically). The method therefore carries the hazard of
actually representing one dimension twice. Worse still, the method has little success in predicting the collective effect of multiple dimensions on the system. Neither does it allow the dimensions to
be extrapolated to simpler quantities on one side and to more complex, but meaningful, derived quantities on the other. It therefore rarely hits on the set of dimensions that will best predict
facts.
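
One way to catch a dimension that is represented twice is to write each candidate quantity as exponents of the base quantities and test for linear dependence among those exponent vectors. The sketch below does this with exact fractions; the function name and the sample quantities are assumptions for illustration, and it presumes that each quantity can be expressed as a product of powers of the base quantities.

```python
from fractions import Fraction


def split_redundant(quantities):
    """Separate quantities into dimensionally independent and redundant ones.

    quantities maps a name to its exponents over the base quantities,
    e.g. (M, L, T). A quantity is redundant if its exponent vector is a
    linear combination of those already kept.
    """
    basis = []  # (pivot index, reduced row) for each quantity kept so far
    kept, redundant = [], []
    for name, exponents in quantities.items():
        row = [Fraction(e) for e in exponents]
        # Eliminate the pivot of every row already in the basis.
        for pivot, basis_row in basis:
            if row[pivot] != 0:
                factor = row[pivot] / basis_row[pivot]
                row = [r - factor * b for r, b in zip(row, basis_row)]
        pivot = next((i for i, r in enumerate(row) if r != 0), None)
        if pivot is None:
            redundant.append(name)  # nothing new left after elimination
        else:
            basis.append((pivot, row))
            kept.append(name)
    return kept, redundant


# Exponents over (M, L, T); average speed is distance divided by time.
candidates = {
    "distance": (0, 1, 0),
    "time": (0, 0, 1),
    "average_speed": (0, 1, -1),
    "fuel_mass": (1, 0, 0),
}

kept, redundant = split_redundant(candidates)
print("independent:", kept)
print("redundant:  ", redundant)
```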


The Car Driving Enterprise


With classical BIS

Let us now assume that our Car Driving Enterprise did not have the benefit of dimensional analysis. What best could it have come up with following conventional BIS methods? It would have first
reckoned the facts that matter most to it, i.e. time of travel, comfort of travel and cost of travel. Further, whereas perhaps it would have perceived at some point that comfort of travel had a lot
to do with avoiding frequent and eccentric use of the pedals, the measurement of this being difficult to imagine without thinking of dimensions and derived quantities, this fact would have been
left out.

Now to get the best time of travel, it would first have been necessary to get ‘statistically significant data’. The car would thus have been driven for a few months on each candidate
route. This would have given the competitive time directly, though it would have meant arriving late (or, mostly, too early) at the office until a trend in this time was obtained, i.e. doing
business in a variety of wrong / inefficient ways to find the right / efficient way. What about extrapolating the results if the place of work should change? Well, that is beyond our capacity. The
best that could have been done was to hire a consultancy and provide it with our data of time taken and distance traveled, so that it could work out the average speeds, draw a graph of these against the
time taken and give us an “intelligent guess” as to which average route speed, beyond what minimum approach distance in, say, a typical urban road pattern, is likely to
provide the least time. Such a conclusion would have been too specific, would have cost us a fortune and would still have left the usual bit of statistical error and uncertainty hanging around.

Going likewise, the best that we could have done about energy consumption would have been to work out mileage per fuel tank on different routes after a good deal of meticulous observation and then
work out the miles on each route and thus find out only the most direct cost – the cost of fuel for each route.

Thus, our metrics fail to give us any guess on one of our Critical Success Factors and provide rough guesses for the other two that may not be applied universally. All this is due to a failure to
understand that the extent and frequency of the acceleration – retardation cycles needed on the route, i.e. the rate of change of speed over time or distance, was the ultimate fact that we were looking for;
something that we could easily have related dimensionally. In this case, it was not a fundamental quantity, neither was it tangible. It was, though, definitely derivable and could be mapped most
directly to the path of travel. Our data model for the classical BIS approach thus logically looks like Fig.2, having many quantities for fewer predictions and without a definite relation between
some of them.





With Dimensional Analysis

First, as we saw in our little exercise above, we did not actually need to load our car differently or drive at different average speeds to gather data and create nice charts and trend
graphs in order to conclude that loading, but more importantly speeding, is bad for our fuel economy. Further, our little deduction on braking indicated that we need more braking retardation if the speed is
higher or the braking distance allowed is shorter. Similarly, we need acceleration to get back to cruising speed after braking, and more of it if we want to regain the speed in the shortest possible time.
Thus, it is brought home that roads whose driving conditions change remarkably and frequently along their length, and which thus need high amounts of acceleration / retardation $[L^{1}T^{-2}]$, end up
proportionately heavy on energy $[M^{1}L^{2}T^{-2}]$, i.e. $[M^{1}L^{1}][L^{1}T^{-2}]$. Thus, we have the necessary conclusion ready without any delay or need for data gathering or analysis. If we must provide
the value, and not just the trend or proportionality, of the energy consumed, all we need is to measure the acceleration and retardation that the car produces against the distance that it travels – a job for a
device as simple as a spring-loaded plotter on a drum connected to a wheel / driving shaft / odometer of the car. Know the total time of our journey, and this simple measurement with a bit of
dimensional analysis tells us everything that there is to be told about the enterprise. E.g. the frequency and level of its extreme points give us an indication of the extra energy gone into
acceleration and braking, which corresponds to extra fuel consumed and brake wear caused, which in turn correspond to the jerks and pedalling strain experienced by the driver. We thus have all three
facts that we set out to investigate. Our data model logically looks like Fig.3, having fewer quantities to measure and with all the predictions that we need, along with the fact that these are
related.
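
The single measurement described here can be mimicked with a sampled speed trace. The sketch below sums the kinetic energy per unit mass that has to be put back in after every slow-down. It is a simplification under stated assumptions: energy shed in braking is treated as lost, and drag and rolling resistance are ignored. The function name and both traces are invented for illustration.

```python
def acceleration_energy_per_kg(speeds):
    """Energy per unit mass (J/kg) spent on speeding up along a trace.

    speeds is a sequence of speed samples in m/s. Each rise from v1 to v2
    costs (v2^2 - v1^2) / 2 per kilogram; drops in speed recover nothing.
    """
    total = 0.0
    for before, after in zip(speeds, speeds[1:]):
        if after > before:
            total += 0.5 * (after ** 2 - before ** 2)
    return total


# Two invented traces that start and end at rest and peak at 15 m/s.
smooth_route = [0, 15, 15, 15, 15, 15, 0]
stop_go_route = [0, 15, 0, 15, 0, 15, 0]

print("smooth: ", acceleration_energy_per_kg(smooth_route), "J/kg")
print("stop-go:", acceleration_energy_per_kg(stop_go_route), "J/kg")
```

The route with three acceleration – retardation cycles needs three times the energy per unit mass of the smooth one at the same top speed, which is the proportionality the argument above arrives at without any data gathering.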





Generalized view of dimensional analysis

Thus, through our example, we have noted that dimensional analysis:

  1. Puts seemingly complex quantities down into fundamental quantities
  2. Allows derived quantities to be related to and expressed through others
  3. Establishes proportionality rather than just trend of variation of one quantity with others
  4. Zeroes in on the quantities most important for establishing facts
  5. Narrows down business metrics
  6. Reduces burden of data analysis
  7. Provides higher accuracy of analysis
  8. Provides more accurate predictions

It therefore greatly reduces the efforts of data gathering for a greenfield BIS. It also reduces the effort of data analysis for an evolving BIS.


What next?

Having theorized all that is Bold and Beautiful, and driven the point home through the simplest example, it’s time that we did more work on real-life business problems and streamlined our
methods. Having seen its contribution, over centuries, to complex analysis problems in Physics, I see no impediment to the successful use of dimensional analysis in similar real-life business
situations in a typical enterprise BIS. Start applying it to your business problems right away.

The typical steps will be:

  1. Have your business definition ready
  2. Find the quantities for which you need the trends (let’s call them Ts)
  3. Analyse the composition of each T in terms of the simpler quantities present in it (call them Ss). Let it not bother you whether they are absolutely fundamental, since fundamental is often a
    paradigm-specific term.
  4. Establish level of influence of the Ss on Ts.
  5. Establish Dimensions of Ts (do not expect them to come as clean and definite as in our example) in terms of Ss.
  6. Iterate if necessary to establish dimensionality more accurately
  7. Work with combinations of the Ss that will themselves make measurable quantities (call them Ms) and will together work out to establish Ts
  8. If you have a data warehouse / data-mart, locate these Ms in the data available to you
  9. Devise a technique for measuring Ms to establish / check your trend
  10. Analyse Ms and extrapolate to establish trends in Ts
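
For steps 5 and 6, where the dimensions of a business quantity will rarely come out clean, one possible way of estimating an exponent from data is to fit a straight line to the logarithms of T and S. The sketch below is only one such approach, not a prescribed method; the function name is hypothetical, and the sample values are generated from the braking-distance relation so that the expected exponent is known.

```python
import math


def estimate_exponent(s_values, t_values):
    """Least-squares slope of log(T) against log(S), i.e. n in T ~ S^n.

    Both sequences must contain only positive values.
    """
    xs = [math.log(s) for s in s_values]
    ys = [math.log(t) for t in t_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic check: braking distance generated as v^2 / (2a) with an assumed a.
speeds = [10.0, 15.0, 20.0, 25.0, 30.0]
braking = [v ** 2 / (2 * 6.0) for v in speeds]

exponent = estimate_exponent(speeds, braking)
print(f"estimated exponent of speed: {exponent:.3f}")
```

With real business data the fitted exponent will be fractional and noisy; iterating, as step 6 suggests, amounts to refitting as more observations or better-chosen Ss become available.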

Feel free to send me cases relevant to your enterprise and we may be able to solve some of them in coming issues, while further perfecting our methods.

This article was also published in the March 2002 issue of the Journal of Conceptual Modeling.
www.inconcept.com/jcm

About Amit Bhagwat

Amit Bhagwat is an information architect and visual modeling enthusiast, in the thick of object oriented modeling and its application to information systems. He has developed a specialised interest in applying techniques from pure sciences to data modeling to get the best out of MIS / BIS. He is an active member of the precise UML group and happens to be the brain father of Projection Analysis - http://www.inconcept.com/JCM/June2000/bhagwat.html - and Event Progress Analysis - http://www.tdan.com/special003.htm - techniques. He also maintains contents for the celebrated Cetus Links - http://www.cetus-links.org - in Architecture and Design areas. He shares a variety of other interests including photography, poetry and sociological studies. Explore some of his work at: http://www.geocities.com/amit_bhagwat/
