Diseases are named either after the doctor who discovered the disease (Alzheimer’s, Crohn’s) or after the unfortunate patient in whom the disease or disorder was first diagnosed (there is an Eastern European curse, “May you have a disease named after you.”). Baron von Munchausen was unfortunate enough to have given his name to a debilitating disorder. His disorder was psychological: he believed he had a variety of diseases and in fact displayed their symptoms, but with no underlying illness. His legacy is Munchausen Syndrome.
A more complex and unfortunate syndrome is Munchausen by Proxy, in which a parent causes his or her child to become ill or injured so that the manifestation can then be treated medically. The parent becomes the center of attention as the child is cured and all is set right again. A more benign version of Munchausen by Proxy is a form of neglect (malnutrition, no immunizations, …) that will most probably require medical care eventually.
An extreme example of Munchausen by Proxy in IT was the case of a DBA who set database parameters so that when volumes exceeded an anticipated threshold, performance became intolerable. This DBA would then come in and fix the problem, and was heaped with accolades that he otherwise never received. He had set up the problem and was now getting the same attention as the parent in Munchausen by Proxy.
Gerald Hodge, the President of HLS Technologies in Sugar Land, Texas, specializes in anticipating performance problems in the DB2 world. He points out that most organizations seem to have no interest in keeping performance problems from happening, only in fixing them once the problems occur. This is a form of benign neglect, somewhat akin to not immunizing your children.
Performance service level agreements, for example “two seconds or less for 90% of the transactions,” are the norm for large organizations that recognize the heavy costs of terrible response time.
Hodge observed that “The main problem with service level agreements is that they have no economic support. If they are not adhered to, someone may lose his bonus, but no money is put in play to solve the problem.” This goes to the heart of the matter: few organizations are proactive about poor performance; most simply wait until response time becomes unacceptable.
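An SLA of the “two seconds or less for 90% of the transactions” form can be checked mechanically against measured response times. The sketch below is a minimal illustration in Python; the sample numbers are invented for the example, not measurements from any real system.

```python
def meets_sla(response_times, threshold_s=2.0, percentile=0.90):
    """Return True if at least `percentile` of the measured
    response times are at or under `threshold_s` seconds."""
    if not response_times:
        return True  # nothing measured, nothing violated
    within = sum(1 for t in response_times if t <= threshold_s)
    return within / len(response_times) >= percentile

# Illustrative sample: 9 of 10 transactions finish within 2 seconds,
# so the "2 seconds for 90%" SLA is just met.
sample = [0.4, 1.1, 0.9, 1.8, 0.7, 1.5, 0.3, 1.9, 1.2, 6.0]
print(meets_sla(sample))  # True
```

Note that a single very slow outlier (the 6.0 above) does not break a percentile SLA, which is exactly why percentiles, rather than averages or maximums, are the usual basis for such agreements.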
Hodge also addressed threshold management: “If the service level is subsecond, then tuning should start well before the [subsecond] threshold is reached.” There are early warning systems that can alert the installation to impending doom long before the users notice anything. Only some organizations have established service level agreements for data warehouse ad hoc queries, since the access paths and the resources used can be highly variable. Some have instead established benchmarks for selected predefined queries, and the response times of these queries are monitored.
Data warehouses have their own set of performance problems, somewhat different from those in operational systems. When left to fester through benign neglect, they can lead to the failure of the data warehouse just the same.
Let us take a closer look at some of these problems.
Data warehouses are becoming far too large. In many shops, a mature data warehouse exceeds the operational systems in size.
Queries and reports are not as evenly distributed as they are in operational systems. For example, for financial and sales data warehouse queries, the activity is usually concentrated around
month-end and the beginning of the month.
Query volumes are difficult to predict. Users have a difficult time estimating how many queries they will run. In fact, if the data warehouse is successful, users will generate far more queries
than they might have predicted. Interesting results will raise new questions that need to be answered and more queries will be generated.
It may even be difficult to get a good estimate of the number of users. As the data warehouse becomes more popular and the word spreads, more and more users will log on and try it out with, at first, simple queries. These simple queries may quickly grow into long, complicated and resource-intensive queries. If database access and user activity are not monitored and analyzed, these users can remain incognito for quite a while.
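Surfacing those incognito users is a matter of aggregating whatever per-query accounting the DBMS provides. The sketch below assumes a hypothetical log of (user, CPU-seconds) pairs; the field layout and the 60-second threshold are both inventions for illustration, not any particular DBMS’s actual format.

```python
from collections import defaultdict

# Hypothetical query log: (user, cpu_seconds) pairs.
query_log = [
    ("alice", 0.2), ("bob", 45.0), ("alice", 0.3),
    ("bob", 120.0), ("carol", 1.1), ("bob", 88.0),
]

def heavy_users(log, cpu_threshold_s=60.0):
    """Total CPU consumed per user; flag anyone over the threshold
    so resource-intensive users surface before they cause trouble."""
    totals = defaultdict(float)
    for user, cpu in log:
        totals[user] += cpu
    return sorted(u for u, t in totals.items() if t > cpu_threshold_s)

print(heavy_users(query_log))  # ['bob']
```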
The access patterns of ad hoc queries are difficult to predict. A new query may access data for which no index has been created, and the result may be a long-running sequential scan with some expensive joins.
The load/refresh/update time often exceeds the allowed window. This has been one of the major performance problems with the data warehouse as the number of source files increases, as volumes get larger, as data cleansing becomes more extensive, and as timeliness requirements become more stringent. This is also a reason why hardware capacity planning for a data warehouse is difficult and cannot be considered as definitive as capacity planning for an operational system.
Poor performance does not appear all at once. The problems gradually grow and magnify their impact on the system. The timing of when these problems will surface is predictable, but only if the performance of the system is measured. A surprisingly small number of data warehouse installations bother to measure performance; the issue is usually a lack of management awareness, concern and attention.
The following steps should be considered to keep your data warehouse from becoming ill.
Sell management on the need for good performance in the data warehouse. This may seem superfluous, as we assume that management would always recognize the need. Usually, management will give it lip service but will not allocate the right resources to make it happen. Sell them on the idea of being proactive and eliminating performance problems before they become apparent to the users. You may also consider what some organizations have done: quantify the cost of poor performance and communicate that cost to management.
Establish the role of a person responsible for data warehouse performance, usually a DBA. This person would work closely with people responsible for the hardware, the operating system and the
network.
Work with your performance management tool vendors, query/report tool vendors and DBMS vendors to provide a robust monitoring capability. You want to know who is generating the queries and reports, how many they are generating, the CPU time, the number of I/Os, the time of day the query/report executes, what data is being accessed, the access path, the size of the result set and the response time.
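The metrics listed above amount to one monitoring record per query or report. A minimal sketch of such a record is below; the field names are illustrative assumptions, not any vendor’s actual monitoring schema.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class QueryStats:
    """One monitoring record per query/report, covering the metrics
    discussed in the text. All names here are illustrative."""
    user: str
    submitted_at: datetime
    cpu_seconds: float
    io_count: int
    tables_accessed: list
    access_path: str        # e.g. "index scan" or "table scan"
    result_rows: int
    response_seconds: float

rec = QueryStats("analyst1", datetime(2003, 5, 1, 9, 15),
                 cpu_seconds=4.2, io_count=15000,
                 tables_accessed=["SALES_FACT"],
                 access_path="index scan",
                 result_rows=320, response_seconds=6.8)
print(rec.user, rec.io_count)
```

Once records like this are collected, the rest of the recommendations (SLAs, threshold alerts, finding heavy users) become straightforward queries over the monitoring data itself.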
Establish a series of service level agreements (SLAs) for performance, even if they are only internal IT agreements. These would include SLAs for benchmark queries and reports in three tiers: 1. Simple, 2. Medium and 3. Complex. These benchmark queries would have known characteristics for CPU time and number of I/Os. Do not try to establish an SLA for undefined ad hoc queries, as there is no way to know whether they will generate ten I/Os or ten million I/Os.
In the class that trains the users responsible for generating ad hoc queries, include a module on performance that explains the use of indexes, how to avoid Cartesian products, the impact of asking for more than what is needed, the use of metadata and the use of existing canned queries/reports.
Establish a canned query/report library and assign responsibility for its administration. This responsibility would include an acceptance and performance test prior to a query/report being added to the library. Be sure to communicate to all users that such a library is available, and describe all the queries/reports in it. This will significantly improve use of the library.
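The acceptance-and-performance gate for the canned library can be reduced to a simple rule: a query/report enters the library only after its results have been verified and its measured cost fits a budget. The sketch below illustrates that gatekeeping logic; the checks and the CPU budget are placeholders for a real review process.

```python
def accept_into_library(library, name, passed_results_check,
                        measured_cpu_s, cpu_budget_s):
    """Admit a query/report into the canned library only if it
    passed its acceptance test (correct results) and its
    performance test (measured CPU within budget)."""
    if passed_results_check and measured_cpu_s <= cpu_budget_s:
        library[name] = {"cpu_s": measured_cpu_s}
        return True
    return False

lib = {}
print(accept_into_library(lib, "monthly_sales", True, 8.5, 10.0))  # True
print("monthly_sales" in lib)  # True
print(accept_into_library(lib, "runaway_join", True, 400.0, 10.0))  # False
```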
Establish lines of communication between the power users and the person responsible for data warehouse performance. The power users would signal their intention to access new data or to access existing data in a different pattern. This would give the person assigned to data warehouse performance time to estimate the impact of the new query and determine whether anything should be done even before it is run for the first time.
Be sure the database subsystem (DB2, Oracle, etc.) gets the proper attention. A DBMS has a variety of tuning knobs: choices for sizes and numbers of pools, initialization parameters, free space management, indexes, partitioning, disk configuration and data set placement, all of which have a marked impact on performance. There are ways to encourage efficient access paths, and those paths are entirely dissimilar between the data warehouse and OLTP. Since the two environments have such different characteristics, well-tuned and well-configured DBMS subsystems will differ significantly between their data warehouse and OLTP instances.
Most large organizations have people or even whole departments devoted to controlling performance. They are responsible for estimating performance, capacity planning, monitoring performance and
tuning and correcting the problems. They work closely with other members of the organization who are directly responsible for the system, the database and the network. Some of these organizations
are now focusing on the performance of the data warehouse.
A healthy data warehouse can remain healthy, but only if IT takes proactive steps to monitor, measure and tune the data warehouse databases. In addition, there needs to be a mindset, and standards (SLAs), for excellent performance, with action triggered before the complaints start coming in from the users.