I tend to come into a database from the SQL queries. Mr. Kopac works from the other direction. He looks at what is available from the system without going to the internals of a coded module. We often arrive at the same conclusions, but from different directions.
He hit on many general rules that apply to any other SQL product.
At the SQL statement level, I do not care about the customer wait time for a web application. I just want the right answer. I am so inside the box that I do not even want to know about what the front end is! But the client needs to know that 15 seconds responses will move the potential customer to click over to the next competitor’s website. Timeouts, lock waits, deadlocks, lock escalations and other performance snags can be caused by bad schemas and SQL. But I have to find where to look for this bad SQL.
Trust But Verify
Another system level problem is a table reorganization (aka compression, defrag, garbage collection, etc). This is generally done on the weekend so that on Monday the tables are physically smaller and often are put into some physical sorted order. This makes the queries run on Monday and Tuesday faster, but by Wednesday things are slowing down. The reason is simple: the tables are being modified (insert, update, delete) but the reorganization was done with zero percent free space (aka fill factor, etc.) in the tables and perhaps zero percentage free space in the indexes. Every time the table changes the system has to handle a data page overflow condition!
Another tag line of Mr. Kopac’s is “Are there leftover pizza crusts in your backyard?” which refers to schema objects that should have been removed (or never created in the first place), but still exist. Look for empty tables or for tables that have been seldom accessed. Another crust are indexes that are not used, but still have to be maintained.
This happens when non-RDBMS programmers created tables in their code to mimic scratch tapes and then never get rid of them. Instead of writing a single SQL statement with derived table expressions, they do the job step by step, just like we did with magnetic tape files. Another give-away are tables with generated names that usually include a sequence number or an ordinal date. The ordinal date is how IBM used to label tapes with a “yy-ddd” field in the label.
TRIGGERS are for Guns and Bombs
When you look at the schema information tables and see too many triggers, you need to do re-write the code.
What is Too Much?
- We want low index cardinality. Low does not mean zero because every table will have a primary key.
- Over five indexes per table is a really bad sign. In particular, single column indexes are suspect. If you reduce the number of overall indexes and change single column indexes to composite indexes, you will have fewer overall locks.
- Lack of clustering or sorted indexes are a bad sign. This will force sorting and table scans that could be avoided.
- Indexes are simply not used at all. There is an old programming proverb that there is nothing more permanent than a temporary fix. Nobody is sure that it is safe to remove it, so it stays. Hints (pragmas, set locks, etc) are even worse.
- If your Indexing has “Include Column” option you should be using it. This allows non-index values to be carried by the index. It is very useful for look-up tables. You ought to look at converting old indexes to the new option.
- If your indexes can allow a reverse scan, enable it. It can only help.
- Get rid of redundant and duplicate indexes. This is easy to do from the schema information table because there will be some way to show all the columns in each index. In particular, look for overlapping column lists
Ghosts from Enhancements Past
Death by ‘Itty Bitty’ Tables
- Small volatile tables are most often scratch tapes in SQL disguise. They are constantly being loaded, emptied or dropped. They need to be replaced with a derived table expression in the SQL statements.
- Small stable tables are most often look up tables that are used in REFERENCES clauses. They need to be replaced with CHECK (.. IN (..)) that can be optimized. For example, a table with the 54 two-letter state abbreviation codes is very stable. Nobody wants to join the Union. The storage space for a short IN() list is nothing today and many SQL products will optimize the search by building a hash table or binary tree when they reach a certain size.
- Application programmers will put configuration data in a small table and load it into the application. Mr. Kopec’s example was a system with a ten row read-only table that was loaded by almost every Java application, thousands of times per day. This data needs to be put in the application side and not the database.
Death by ‘Fat’ Tables
You often do not need all the data from a table. The Pareto distribution, 80/20 rule or Sturgeon’s Law says a relatively small subset of the data is most used. For example, 80% of your business comes from 20% of your clients. The most active part of the database will be today’s work, etc.
You can partition your database and keep the most active parts in faster physical storage. You can design history tables and add Views for the current data. You can archive data and get it out of the database. You can purge data that you no longer need and which might be a source of legal problems.
Using SQL for the Wrong Reasons: It was Nice
Using “SELECT * FROM ..;” in production code is both dangerous and expensive. Many SQLs will expand the * into a column list at compile time. The execution plan is now set and will stay that way until a recompile. Gee, I sure hope the base tables do not add, drop or move columns because you will never be sure what the SELECT list is today.
While not always wrong, options like “FETCH FIRST zillion ROWS” or “OPTIMIZE FOR zillion ROWS” are usually a waste. Will any humans actually look at zillions of rows in a result set on his screen? If you really need all that data, it is usually cheaper to spool all the data to a file and give it to a report writer, statistical analysis package or some tool that can handle zillions of rows.
It was nice to get out of my little box and look at things from the outside for once. Maybe I will learn a new programming language before I die.Nah!