Orthogonal is a term computer science borrowed from traditional mathematics. It originally meant the property of being at right angles to each other, but was generalized to mean the property of
things that are mutually independent. In computer science, it generally is used to refer to programming languages that permit nesting of expressions without regard to context.
Okay, that was hard to read, but it is easier to explain with an example. Back in the old days, when we lived in trees, ate our children and wrote in FORTRAN II, that language allowed only seven
forms of subscripts on an array. Let V = an unsigned integer variable and C = any positive integer constant, the allowed formats were:
V
C
V+C
V-C
C*V
C*V+C
C*V-C
They had to be written in exactly those seven patterns and nothing else would work. The reason for this was that the early compilers were written to use IBM hardware that had registers that could
directly translate the FORTRAN code into machine instructions for computing physical addresses.
Later when we lived in caves, ate other people’s children and wrote in better languages, a subscript could be any numeric expression that returns an integer. Obviously, it has to be an integer
in the appropriate range, but the error message is about the subscript range and not the format of the expression.
Orthogonality is a good thing, if for no other reason than it saves you from learning context-sensitive rules for every situation. It is easy to say “use anything that returns an integer
here” than the seven subscript forms, even though there are an infinite number of ways to get an integer.
However, programmers who come to SQL from procedural languages retain old habits. When they write a stored procedure, they organize it the way they did in a procedural language. The optimizers for
procedural languages are designed to work with programs built from the classic structured programming flow of control constructs (IF-THEN-ELSE, BEGIN-END, WHILE-DO and others). They know how to
arrange stack and register usage, clean up math and do assignment statements best.
SQL is not like that, even in the procedural parts of the language (SQL/PSM, PL/SQL, Informix 4GL, T-SQL et al). All the work went into the query optimizer. Since that is where the “heavy
lifting” gets done, it makes sense. Thus, the more work you can get done in ONE SQL statement, the better your procedure will perform.
This is a general statement, and I am sure that you can find an exception. But let me take an example, based on a problem posted on an SQL newsgroup. The table and the problem have been changed to
make them funnier and simpler.
PROBLEM: Write a procedure to manage your decadent lifestyle by budgeting for wild parties in three categories (wine, women and song). Each party is given a simple sequential number for an
identifier. Each category has a non-negative amount of funding in each party. The schema is pretty simple, with the only tricky part being the “no_gaps_in_party_nbr”
constraint.
CREATE TABLE PartyFunding
(party_nbr INTEGER DEFAULT 1 NOT NULL PRIMARY KEY
CHECK (party_nbr > 0),
wine_amt DECIMAL(8,2) DEFAULT 0.00 NOT NULL
CHECK (wine_amt >= 0.00),
women_amt DECIMAL(8,2) DEFAULT 0.00 NOT NULL
CHECK (women_amt >= 0.00),
song_amt DECIMAL(8,2) DEFAULT 0.00 NOT NULL
CHECK (song_amt >= 0.00),
CONSTRAINT no_gaps_in_party_nbr
CHECK ((SELECT COUNT(*) FROM PartyFunding)
= (SELECT MAX(party_nbr) FROM PartyFunding));
When we budget a party, we can have allocated funds and unallocated funds. The allocated funds for party (n) are assigned to their appropriate vice. The unallocated funds for party (n) are
proportioned in same ratio as in party (n-1) in all three categories. Again, this might easier to see with an example.
INSERT INTO PartyFunding
(party_nbr, wine_amt, women_amt, song_amt)
VALUES (1, 40.00, 60.00, 20.00);
The first party spent 1/3 on wine, 1/2 on women and 1/5 on song from the available $120.00 budget. Given some figures for the second party that also includes a few bucks to be spread around, how do
you do it?
Most people presented with this problem want to do it in multiple steps. First INSERT the budgeted items, and then do some math with the unallocated funds and finally update the table. Here is a
skeleton procedure without error handling:
CREATE PROCEDURE BudgetParty
(IN in_party_nbr INTEGER,
IN in_wine_amt DECIMAL(8,2),
IN in_women_amt DECIMAL(8,2),
IN in_song_amt DECIMAL(8,2),
IN in_unallocated_amt DECIMAL(8,2))
BEGIN ATOMIC
–local working variables
DECLARE bonus_wine_ratio DECIMAL(8,2);
DECLARE bonus_women_ratio DECIMAL(8,2);
DECLARE bonus_song_ratio DECIMAL(8,2);
–insert initial allocations, clean out nulls
INSERT INTO PartyFunding
VALUES (in_party_nbr, COALESCE(in_wine_amt, 0.00),
COALESCE(in_women_amt, 0.00), COALESCE(in_song_amt, 0.00));
–compute bonus ratios
SET bonus_wine_ratio
= (SELECT wine_amt /(wine_amt + women_amt + song_amt)
FROM Party_Funding
WHERE party_nbr = in_party_nbr -1);
SET bonus_women_ratio
= (SELECT women_amt /(wine_amt + women_amt + song_amt)
FROM Party_Funding
WHERE party_nbr = in_party_nbr -1);
SET bonus_song_ratio
= (SELECT song_amt /(wine_amt + women_amt + song_amt)
FROM Party_Funding
WHERE party_nbr = in_party_nbr -1);
–update the current party
UPDATE PartyFunding
SET wine_amt = wine_amt + (unallocated_amt * bonus_wine_ratio),
women_amt = women_amt + (unallocated_amt * bonus_women_ratio),
song_amt = song_amt + (unallocated_amt * bonus_song_ratio)
WHERE party_nbr = in_party_nbr;
END BudgetParty;
Comments:
- Whenever you see local working variables in an SQL stored procedure, be alert. While not always true, you will find that they are being used to hold a calculation, a flag or something that
could have gone directly into the statement using the variable. - There is nothing special about the INSERT statement. It is just using the parameters as input.
- In this example, the local variables are holding the allocation ratios from the previous party. The first thought that programmers have when they see three statements with common
sub-expressions is to factor them out and compute them only once, so we have the total budget from the previous party as a variable to use as a divisor for the three categories. A good procedural
optimizer would do this for us, in fact. But it is not certain in proprietary SQL procedural languages. The point is not to clean up the procedural code, but to get rid of it. - The final step is updating the party budget. This means we have touched the same row twice. There is another heuristic for you. You want to touch a table once, if possible. You want to
touch a row in a table once, and it is almost always possible.
When asked if there was any way to improve this procedure, we got two answers. One was to replace “IF in_wine_amt IS NULL THEN in_wine_amt := 0.00 END IF;” with COALESCE(), as I
already did. The second suggestion was to create a new local variable and assign it the total budget amount of the previous part to avoid recomputing it over and over.
SET prev_total_amt
= (SELECT P1.wine_amt + P1.women_amt + P1.song_amt)
FROM Party_Funding P1
WHERE P1.party_nbr = in_party_nbr -1);
But the way to look at this problem is that all we are trying to insert is a new row. This is the top down view of things. That ought to tell us that we need just one insertion statement. Here is a
second skeleton program:
CREATE PROCEDURE BudgetParty
(IN in_party_nbr INTEGER,
IN in_wine_amt DECIMAL(8,2),
IN in_women_amt DECIMAL(8,2),
IN in_song_amt DECIMAL(8,2),
IN in_unallocated_amt DECIMAL(8,2))
INSERT INTO PartyFunding
VALUES
(in_party_nbr,
COALESCE(in_wine_amt, 0.00)
+ (unallocated_amt
* (SELECT P1.wine_amt
/(P1.wine_amt + P1.women_amt + P1.song_amt)
FROM Party_Funding P1
WHERE P1.party_nbr = in_party_nbr -1)),
COALESCE(in_women_amt, 0.00)
+ (unallocated_amt
* (SELECT P1.women_amt
/(P1.wine_amt + P1.women_amt + P1.song_amt)
FROM Party_Funding P1
WHERE P1.party_nbr = in_party_nbr -1)),
COALESCE(in_song_amt, 0.00)
+ (unallocated_amt
*(SELECT P1.song_amt
/(P1.wine_amt + P1.women_amt + P1.song_amt)
FROM Party_Funding P1
WHERE P1.party_nbr = in_party_nbr -1))
);
Comments:
This procedure has folded everything into one statement with nested expressions and function calls. We don’t have to worry about setting a transaction level so that multiple statements are not
a problem. In the actual program, the COALESCE () calls I used to get rid of NULL arguments had been assignment statements inside separate IF-THEN-ELSE statements. There was no way the SQL optimizer
could get to them.
Likewise, the computations are now inside the INSERT statement where the SQL optimizer can get to them. Will it see the similar SELECT sub-expressions and optimize them in some way to make one query?
Probably, but you will want to look at the execution plan to be sure. If nothing else, you have saved the storage allocation and assignment overhead of local variables.