Published in TDAN.com October 2000
General John B. Sedgwick
The general’s last words at the Battle of Spotsylvania, 1864.
This article will present the types of failures that have been experienced by various data warehouse projects. There is disagreement over the failure rate of data warehouse projects. Rather than
contribute to the debate, we will detail the types of situations that could be characterized as failures and leave it to the reader to decide whether they truly constitute failure.
1. The Project Is Over Budget
Depending on how much the actual expenditures exceeded the budget, the project may be considered a failure. The cause may have been an overly optimistic budget or the inexperience of those
calculating the estimate. The inadequate budget might be the result of not wanting to tell management the bitter truth about the costs of a data warehouse.
Unanticipated and expensive consulting help may have been needed. Performance or capacity problems, more users, more queries or more complex queries may have required additional hardware or extra
effort to resolve. The project scope may have been extended without a corresponding change in the budget. Unforeseen circumstances such as delays caused by hardware problems, software problems, user
unavailability, changes in the business or other factors may have resulted in additional expenses.
2. Slipped Schedule
Most of the factors listed in the preceding section could also have contributed to the schedule not being met, but the major reason for a slipped schedule is the inexperience or optimism of those
creating the project plan. In many cases, managers wanting to “put a stake in the ground” set the schedule by choosing an arbitrary delivery date in the hope of giving project managers something to
shoot for. The schedule then becomes a deadline with no real reason for a fixed delivery date. In those cases the schedule is usually established without input from the people who know how long the
data warehouse tasks actually take, and without the benefit of a project plan. Without a project plan that details the tasks, dependencies and resources, it is impossible to arrive at a realistic
date by which the project should be completed.
3. Functions and Capabilities Not Implemented
The project agreement specified certain functions and capabilities. These would have included what data to deliver, the quality of the data, the training given to the users, the number of users,
the method of delivery (e.g., web-based), service level agreements (performance and availability), pre-defined queries, etc. If important functions and capabilities were not realized or were
postponed to subsequent implementation phases, these would be indications of failure.
4. Unhappy Users
If the users are unhappy, the project should be considered a failure. Unhappiness is often the result of unrealistic expectations. Users were expecting far more than they got. They may have been
promised too much or there may have been a breakdown in communication between IT and the user. IT may not have known enough to correct the users’ false expectations, or may have been afraid
to tell them the truth. We often observe situations where the user says jump, and IT is told to say “how high?” Also, the users may have believed the vendors’ promises of grand capabilities and
grossly optimistic schedules.
Furthermore, users may be unhappy about the cleanliness of their data, response time, availability, usability of the system, anticipated function and capability, or the quality and availability of
support and training.
5. Unacceptable Performance
Unacceptable performance has often been the reason that data warehouse projects are cancelled. Data warehouse performance should be explored for both the query response time and the
extract/transform/load time.
Any characterization of good query response time is relative to what is realistic and to what the user finds acceptable. If the user was expecting sub-second response time for queries that join
two multi-million-row tables, that expectation would lead the user to say that performance was unacceptable. In this example, good performance should have been measured in minutes, not fractions of
a second. The user needs to understand what to expect. Even though a data warehouse query may require executing millions of instructions and accessing millions of rows of data, there are limits to
what the user should be expected to tolerate. We have seen queries whose response time is measured in days. With few exceptions, this is clearly unacceptable.
As data warehouses get larger, the extract/transform/load (ETL) process will take longer, sometimes as long as days. This will reduce the availability of the data warehouse to the users. Database
design, architecture, hardware configuration, database tuning and the ETL code (whether an ETL product or hand-written code) will all significantly affect ETL performance. As ETL processing time
increases, each of these factors has to be evaluated and adjusted. In some cases the service level agreement for availability will also have to be adjusted. Without such adjustments, the ETL
processes may not complete on time, and the project would be considered a failure.
6. Poor Availability
Availability has two components: scheduled availability (the days per week and the number of hours per day the system is meant to be up) and the percentage of time the system is actually accessible
during those scheduled hours. Availability failure is usually the result of the data warehouse being treated as a second-class system. Operational systems usually demand availability service level
agreements, and the performance evaluations and bonus plans of IT staff who work in operations and systems often depend on reaching high availability percentages. If the same standards are not
applied to the data warehouse, problems will go unnoticed and the response to problems will be casual, untimely and ineffective.
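The two components of availability can be expressed as a short calculation. In the sketch below, the scheduled days and hours define the denominator, and only outages that fall inside scheduled hours reduce the percentage. The class and function names, the 6-day by 18-hour schedule, the outage figure, and the 99% target are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScheduledAvailability:
    """Scheduled availability: when the warehouse is supposed to be up."""
    days_per_week: int
    hours_per_day: float

    def scheduled_hours(self, weeks: int) -> float:
        return self.days_per_week * self.hours_per_day * weeks


def availability_pct(schedule: ScheduledAvailability, weeks: int,
                     outage_hours: float) -> float:
    """Percentage of scheduled hours the system was actually accessible.

    outage_hours should count only outages that fell inside scheduled hours.
    """
    scheduled = schedule.scheduled_hours(weeks)
    if scheduled <= 0:
        raise ValueError("no scheduled hours in the period")
    accessible = max(scheduled - outage_hours, 0.0)
    return 100.0 * accessible / scheduled


def meets_sla(actual_pct: float, target_pct: float) -> bool:
    """Compare measured availability against the agreed target."""
    return actual_pct >= target_pct


if __name__ == "__main__":
    schedule = ScheduledAvailability(days_per_week=6, hours_per_day=18)
    pct = availability_pct(schedule, weeks=4, outage_hours=21.6)
    print(f"{pct:.1f}% available; meets 99% SLA: {meets_sla(pct, 99.0)}")
```

Tracking a number like this against a stated target is what operational systems routinely do. Applying the same measurement to the warehouse is one way to keep it from being treated as second-class.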
7. Inability to Expand
If a robust architecture and design are not part of the data warehouse implementation, any significant increase in the number of users, the number of queries or the complexity of queries may
exceed the capabilities of the system. If the data warehouse is successful, there will also be demand for more data, for more detailed data and, perhaps, for more historical data to perform
extended trend analysis, e.g., five years of monthly data.
8. Poor Quality Data/Reports
If the data is not clean and accurate, the queries and reports will be wrong, in which case users will either make the wrong decisions or, if they recognize that the data is wrong, will mistrust
the reports and not act on them. Users may spend significant time validating the report figures, which in turn hurts their productivity and puts the value of the data warehouse in question.
9. Too Complicated for Users
Some tools are too difficult for the target audience. Just because IT is comfortable with a tool and its interfaces, it does not follow that all the users will be as enthusiastic. If the tool is
too complicated, the users will find ways to avoid it, such as asking other people in their department or asking IT to run a report for them. This nullifies one of the primary benefits of a data
warehouse: empowering users to develop their own queries and reports.
10. Project Not Cost Justified
Every organization should cost-justify its data warehouse projects. Justification includes an evaluation of both the costs and the benefits. When the benefits were actually measured after
implementation, they may have turned out to be much lower than expected, or they may have arrived much later than anticipated. The actual costs may have been much higher than the estimated costs. In
fact, the costs may have exceeded both the tangible and intangible benefits.
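Comparing estimated and actual figures shows how a justified project can turn into an unjustified one. The sketch below computes undiscounted net benefit and the first year in which cumulative benefits cover cumulative costs. It counts only tangible benefits and ignores discounting, which a real justification would likely include. The function names and all dollar figures are hypothetical.

```python
from typing import Optional, Sequence


def _check(costs: Sequence[float], benefits: Sequence[float]) -> None:
    if len(costs) != len(benefits):
        raise ValueError("costs and benefits must cover the same years")


def net_benefit(costs: Sequence[float], benefits: Sequence[float]) -> float:
    """Total tangible benefits minus total costs over the period (undiscounted)."""
    _check(costs, benefits)
    return sum(benefits) - sum(costs)


def payback_year(costs: Sequence[float],
                 benefits: Sequence[float]) -> Optional[int]:
    """First year (1-based) in which cumulative benefits cover cumulative costs."""
    _check(costs, benefits)
    cumulative = 0.0
    for year, (cost, benefit) in enumerate(zip(costs, benefits), start=1):
        cumulative += benefit - cost
        if cumulative >= 0:
            return year
    return None  # never paid back within the evaluation period


if __name__ == "__main__":
    # Hypothetical figures in thousands of dollars for years 1 through 3.
    estimated = ([500, 100, 100], [0, 400, 400])
    actual = ([700, 150, 150], [0, 200, 300])
    for label, (costs, benefits) in (("estimated", estimated), ("actual", actual)):
        print(label, net_benefit(costs, benefits), payback_year(costs, benefits))
```

With these made-up numbers, the estimate pays back in year 3. The actuals, with higher costs and smaller, later benefits, never pay back within the period.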
11. Management Does Not Recognize the Benefits
In many cases, organizations do not measure the benefits of the data warehouse or do not properly report those benefits to management. Project managers, and IT as a whole, are often shy about
publicizing their accomplishments. Sometimes they may not know how to report on their progress or on the impact the data warehouse is having on the organization. The project managers may believe that
everyone in the organization will automatically know how wonderfully IT performed and will recognize the data warehouse for the success that it is. They are wrong. In most cases, if management is
not properly briefed on the data warehouse, it will not recognize the benefits and will be reluctant to continue funding something it does not appreciate.
Conclusion
There are many ways for a data warehouse project to fail. The project can be over budget, the schedule may slip, critical functions may not be implemented, the users could be unhappy and the
performance may be unacceptable. The system may not be available when the users expect it, the system may not be able to expand to support more functions or users, the data and the reports built on
it may be of poor quality, the interface may be too complicated for the users, the project may not be cost justified and management might not recognize the benefits of the data warehouse. By knowing
the types of failures others have experienced, you are in a position to avoid them. You must know what risks to anticipate with the data warehouse if you are going to deal with those risks and head
them off before they sink your project. The most important activity of a project manager is picking the right people and avoiding those who can and will hurt the project.
This article is excerpted from the chapter on Data Warehouse Risks in Data Warehouse Project Management, Sid Adelman and Larissa Moss, Addison Wesley Longman, 2000.