Critical Success Factors for Data Warehouse Engineering Part 2

Published in TDAN.com April 2000

Articles in this series – Part 1, Part 2


CSF: Data Warehouse Technology

Only after the data warehouse architecture has been defined should an enterprise begin selecting and implementing its data warehouse technology. Otherwise, there is a high probability that the
technology will not support enterprise requirements. Further, if the enterprise data warehouse solution is designed for a specific technology it will be difficult, if not impossible, to change
technologies as requirements change and as technologies improve and mature. There are more and more technologies available to support enterprise data warehousing. They can be conveniently grouped
into user interfaces, warehouse engines, hardware platforms, system software, and security.

Data Warehouse users get useful information from the data warehouse and data marts through user interfaces. It is these user interfaces that have the most impact on how effective and useful the
data warehouse will be perceived. Therefore, users must be actively involved in selecting their own interface to the data warehouse. Two primary criteria for selecting an effective user interface
are ease of use and performance. For ease of use, most enterprises turn to graphical user interfaces. For performance, developers must ensure that the hardware/software platform fully supports and
is optimized for every chosen user interface. The most important selection criteria for user interfaces are the information needs and the level of computer literacy of potential users who will
retrieve the information they need from the data warehouse. The following data warehouse user categories are based on levels of literacy and information needs:
Information Systems Challenged – data warehouse users who are hopelessly lost when it comes to information systems. In management roles they rely on their secretaries or assistants to retrieve
information for them. These users need an extremely easy to use and highly graphical interface or standard queries and reports with a limited number of parameters.
Variance Oriented – users who are focused on the variances in numbers over time. These users mainly want a set of standard reports that they can generate or receive periodically so that they can
perform their analyses.
Number Crunchers – users who are spreadsheet aficionados. They will take whatever data are available and refine it, re-categorize it and derive their own numbers for analyzing and managing the
enterprise. Their needs can best be met by providing a spreadsheet extract output format for any reports or ad hoc queries provided.
Technically Oriented – users who are either already familiar with computers or have sufficient motivation to learn and use everything they can get their hands on. These people want to have complete
control over the way they retrieve and format information. They are often business or systems analysts who have moved into an enterprise function. They want to have all of the tools the data
warehouse development staff uses.
Most enterprises have all of these categories of individuals. This makes it advisable to provide each type of data warehouse user interface.
The final user interface criterion is that it supports the access meta data designed for the data warehouse. If a user interface is easy to use, allows all potential users to get the information
they need in the format they need, and does it in an acceptable amount of time, it is the right interface.
Once the information requirements and meta data for a data warehouse have been identified and documented, user interfaces have been designed, and the data warehouse structure has been selected, a
data warehouse engine that will support the data warehouse and all access approaches should be selected. Key issues include capability for loading information into the data warehouse, implementing
access control (security) mechanisms and support for one or more user interface tool sets. The architecture, performance requirements, and overall size of the data warehouse will determine software
requirements. For example, a data warehouse that includes data marts will require not only relational technology, but also multidimensional access and a client/server architecture.
The selection of one or more hardware platforms involves answering the following questions: How much data will be in the data warehouse and how much can the platform economically accommodate? How
scaleable is the platform? Is it optimized for data warehouse performance? Will the platform support the software selected for the data warehouse? How many users will simultaneously access the data
warehouse? Will their queries be simple or complex? These are the most important criteria for selecting hardware to support a data warehouse. In answering these questions it is important to
consider all hardware platform characteristics; not just CPU speed and disk capacity, but memory capacity and the input/output system capabilities as well. I/O capacity is often the most critical
to overall data warehouse performance. While increasing the number of servers can usually increase memory and CPU capacity, increasing I/O capacity is not as simple. Nevertheless, it is vital that
the hardware platform(s) supporting a data warehouse have sufficient capacity. This often requires multiple, independent I/O channels or busses.
Data warehouse capacity planning is not an exact science. Underestimating is the rule rather than the exception. Some experts advise doubling initial estimates of hardware requirements because data
warehouse users and query complexity increases exponentially over the first few months after initial data warehouse implementation. Even with sufficient initial capacity, it is critical to choose
scaleable systems to support inevitable but hard-to-quantify future growth.
Despite the many vagaries of data warehousing and the relative youth of the field, early adopters and vendors agree on a few general rules when estimating server capacity.
Small databases, simple queries — LAN (local area network) servers, with a single I/O bus are appropriate for data marts where the database is under 5GB.
Medium to large databases, more complex queries — Response time is faster on SMP (symmetric multiprocessing) systems than it is on uniprocessors, and they tend to be more cost effective the
systems where nothing is shared. Large amounts of memory reduce outside seek time during queries, speeding performance when querying large databases. Conventional wisdom suggests that SMP machines
begin to exceed capacity between 500GB and the low terabyte range. Good performance for a medium-sized database also requires at least two I/O channels. As the size of the database and complexity
of queries grows, more I/O channels are needed to maintain performance.
Very large databases, very complex queries — very large data warehouses (up to 5 terabytes) require clusters of SMP servers or MPP (massively parallel processor) servers. The platforms with the
best performance for very large databases, excluding MPP’s tend to be the ones with a large number of I/O channels.
Huge databases, extremely complex queries — data warehouses that exceed 10 terabytes may need the processing power and I/O channels provided by mainframe systems.

 



Concurrent with hardware selection is the selection of system software to support the data warehouse. The operating systems must support the selected user interfaces, data warehouse structure and
warehouse engine.

Data warehouse security includes both user access security and physical data security. A data warehouse is a read-only source of enterprise information; therefore developers need not be
concerned with controlling create, update and delete capabilities through access security. But, developers do need to address the trade off between protecting a valuable corporate asset against
unauthorized access and making the data accessible to anyone within the enterprise who can put it to good use. The best solution is to allow everyone in the enterprise to have access to the
enterprise measure definitions and derivations, but only allow access to the underlying detailed data on an approved, need-to-know basis. Developers also need to provide sufficient data
security
, through backup, off-site storage, replication, fault-tolerant and/or redundant hardware, etc., to protect the data from loss due to power failures, equipment malfunction, sabotage,
and so on.


CSF: Information Quality

The single most important success factor for data warehousing is the quality of information provided to data warehouse users. Data in the data warehouse must be of the highest possible quality. It
must be accurate, relevant, complete, and concise. It must be timely and current. It must be presented in a way that is clear and understandable. A data warehouse that contains trusted, strategic
information, becomes a valuable enterprise resource for decision makers at all organizational levels. If it’s users discover that it contains bad data, the data warehouse will be ignored and will
fail. Worse, if it contains bad data, but its users never find out and make decisions based upon the data, it is possible that the enterprise will fail.


CSF: Development Infrastructure

The most ignored critical success factor is the one that can have the greatest impact. In order to consistently design, develop, and implement a data warehouse, an enterprise must have a
development infrastructure that uses best practices and techniques. The elements of this infrastructure include project teams, methodology, and
tools.

In addition to consistent management commitment and sponsorship and data warehouse user involvement, there is another critical enterprise culture element. The teams that will be actually designing,
developing, implementing, and managing the enterprise data warehouse must have certain characteristics. They must understand the importance of strategic information. They must be able to analyze
and document business requirements in business language. They must be dedicated to the data warehousing project. They must have sufficient resources. They must practice effective project
management. Every team member must have appropriate skills, knowledge and experience (see below), be sufficiently familiar with the enterprise development methodology, and be able to effectively
use the enterprise data warehouse engineering tool set.
Enterprises that consistently produce quality information systems rigorously use a full life cycle development methodology. Such a methodology is characterized by a sequence of interrelated steps
beginning with determining business requirements and resulting in system design, development, and implementation. The Software Engineering Institute, which established the industry-standard,
software development capability maturity model (CMM), declares that a methodology is an absolute necessity in order to be an effective software developer.
Having a strategically-driven, customer-focused, information-centric, model-based, disciplined, rigorous, and repeatable methodology is absolutely essential for successful data warehouse
engineering.
A data warehouse is too complex and too massive to be developed using manual methods. Development tools such as Computer Aided Systems Engineering (CASE) tools, repositories and fourth/fifth
generation programming languages are useful for data warehouse engineering. In addition, there are several Executive Information System (EIS) and Decision Support System (DSS) tools that can help
with data warehouse access. There are also many special purpose data warehouse tools including middleware and data integration/transformation tools.
Some combination of these tools are necessary to quickly and effectively develop and maintain a data warehouse. The specific tool set will an enterprise uses will depend upon its data warehousing
needs. No matter what tools are used, it is important that the tools work together and that they can be used within the enterprise’s chosen technology environment.
A specialized set of skills and knowledge is required to efficiently develop a data warehouse. They include experience with online analytical processing (OLAP) tools and systems integration; strong
technical background with emphasis on operating systems, data bases, decision support tools, user interfaces and client-server; high conceptual level of relational theory; strong communication
(speaking and writing) skills; and the ability to interact with everyone in an organization from office workers to the CEO. The necessary skills and knowledge may be acquired by hiring experienced
consultants, or by training internal staff. The most effective approach is for consultants to begin development while helping internal staff become skilled so that the enterprise eventually becomes
self-sufficient.

Summary

Data warehouse engineering is not like normal application development. Its scope is broader, its visibility is greater, its user community is larger, and it is more prone to failure.

Before beginning a data warehouse project, an enterprise should evaluate whether it has adequately addressed the critical success factors for data warehouse engineering.

Management
Potential Users
Strategic Plan
Performance Measures
Enterprise Data
Information Systems
Enterprise Technology
Data Model
Meta data
Components
Structure
User Interface(s)
Warehouse Engine(s)
Hardware Platform(s)s
System Software
Security
Project Teams
Methodology
Development Tools
Skills & Knowledge

Articles in this series – Part 1, Part 2

Copyright © 1999 Visible Systems Corporation

Share

submit to reddit

About Alan Perkins

Alan Perkins has been an executive consultant for a Big Four firm, senior enterprise architecture consultant for a large international IT solution provider, vice president of consulting for a software engineering tools company, general manager of a high-tech consulting firm, vice president of system R&D for a virtual corporation, chief information officer for a technology transfer consortium, chief solutions architect for a repository technology, systems analyst on the White House staff, and director of the U.S. Army Data Processing School in Germany. He has provided information and enterprise management consulting to numerous companies, associations and government agencies.

In his SRA role as Principal Enterprise Architecture Consultant, Alan guides and advises clients in order to help them achieve both alignment of IT with business and convergence of IT and business strategies to improve readiness, reliability and resilience. Alan is also leading the internal effort to implement a world-class sustainable career development program for SRA enterprise architects. He may be contacted by e-mail at alan_perkins@sra.com.

Top