Published in TDAN.com April 2000
Articles in this series – Part 1, Part 2
CSF: Data Warehouse Technology
Only after the data warehouse architecture has been defined should an enterprise begin selecting and implementing its data warehouse technology. Otherwise, there is a high probability that the
technology will not support enterprise requirements. Further, if the enterprise data warehouse solution is designed for a specific technology it will be difficult, if not impossible, to change
technologies as requirements change and as technologies improve and mature. There are more and more technologies available to support enterprise data warehousing. They can be conveniently grouped
into user interfaces, warehouse engines, hardware platforms, system software, and security.
data warehouse will be perceived. Therefore, users must be actively involved in selecting their own interface to the data warehouse. Two primary criteria for selecting an effective user interface
are ease of use and performance. For ease of use, most enterprises turn to graphical user interfaces. For performance, developers must ensure that the hardware/software platform fully supports and
is optimized for every chosen user interface. The most important selection criteria for user interfaces are the information needs and the level of computer literacy of potential users who will
retrieve the information they need from the data warehouse. The following data warehouse user categories are based on levels of literacy and information needs:
information for them. These users need an extremely easy to use and highly graphical interface or standard queries and reports with a limited number of parameters.
perform their analyses.
enterprise. Their needs can best be met by providing a spreadsheet extract output format for any reports or ad hoc queries provided.
control over the way they retrieve and format information. They are often business or systems analysts who have moved into an enterprise function. They want to have all of the tools the data
warehouse development staff uses.
they need in the format they need, and does it in an acceptable amount of time, it is the right interface.
data warehouse engine that will support the data warehouse and all access approaches should be selected. Key issues include capability for loading information into the data warehouse, implementing
access control (security) mechanisms and support for one or more user interface tool sets. The architecture, performance requirements, and overall size of the data warehouse will determine software
requirements. For example, a data warehouse that includes data marts will require not only relational technology, but also multidimensional access and a client/server architecture.
scaleable is the platform? Is it optimized for data warehouse performance? Will the platform support the software selected for the data warehouse? How many users will simultaneously access the data
warehouse? Will their queries be simple or complex? These are the most important criteria for selecting hardware to support a data warehouse. In answering these questions it is important to
consider all hardware platform characteristics; not just CPU speed and disk capacity, but memory capacity and the input/output system capabilities as well. I/O capacity is often the most critical
to overall data warehouse performance. While increasing the number of servers can usually increase memory and CPU capacity, increasing I/O capacity is not as simple. Nevertheless, it is vital that
the hardware platform(s) supporting a data warehouse have sufficient capacity. This often requires multiple, independent I/O channels or busses.
warehouse users and query complexity increases exponentially over the first few months after initial data warehouse implementation. Even with sufficient initial capacity, it is critical to choose
scaleable systems to support inevitable but hard-to-quantify future growth.
systems where nothing is shared. Large amounts of memory reduce outside seek time during queries, speeding performance when querying large databases. Conventional wisdom suggests that SMP machines
begin to exceed capacity between 500GB and the low terabyte range. Good performance for a medium-sized database also requires at least two I/O channels. As the size of the database and complexity
of queries grows, more I/O channels are needed to maintain performance.
best performance for very large databases, excluding MPP’s tend to be the ones with a large number of I/O channels.
warehouse engine.
Data warehouse security includes both user access security and physical data security. A data warehouse is a read-only source of enterprise information; therefore developers need not be
concerned with controlling create, update and delete capabilities through access security. But, developers do need to address the trade off between protecting a valuable corporate asset against
unauthorized access and making the data accessible to anyone within the enterprise who can put it to good use. The best solution is to allow everyone in the enterprise to have access to the
enterprise measure definitions and derivations, but only allow access to the underlying detailed data on an approved, need-to-know basis. Developers also need to provide sufficient data
security, through backup, off-site storage, replication, fault-tolerant and/or redundant hardware, etc., to protect the data from loss due to power failures, equipment malfunction, sabotage,
and so on.
CSF: Information Quality
The single most important success factor for data warehousing is the quality of information provided to data warehouse users. Data in the data warehouse must be of the highest possible quality. It
must be accurate, relevant, complete, and concise. It must be timely and current. It must be presented in a way that is clear and understandable. A data warehouse that contains trusted, strategic
information, becomes a valuable enterprise resource for decision makers at all organizational levels. If it’s users discover that it contains bad data, the data warehouse will be ignored and will
fail. Worse, if it contains bad data, but its users never find out and make decisions based upon the data, it is possible that the enterprise will fail.
CSF: Development Infrastructure
The most ignored critical success factor is the one that can have the greatest impact. In order to consistently design, develop, and implement a data warehouse, an enterprise must have a
development infrastructure that uses best practices and techniques. The elements of this infrastructure include project teams, methodology, and
tools.
developing, implementing, and managing the enterprise data warehouse must have certain characteristics. They must understand the importance of strategic information. They must be able to analyze
and document business requirements in business language. They must be dedicated to the data warehousing project. They must have sufficient resources. They must practice effective project
management. Every team member must have appropriate skills, knowledge and experience (see below), be sufficiently familiar with the enterprise development methodology, and be able to effectively
use the enterprise data warehouse engineering tool set.
beginning with determining business requirements and resulting in system design, development, and implementation. The Software Engineering Institute, which established the industry-standard,
software development capability maturity model (CMM), declares that a methodology is an absolute necessity in order to be an effective software developer.
engineering.
generation programming languages are useful for data warehouse engineering. In addition, there are several Executive Information System (EIS) and Decision Support System (DSS) tools that can help
with data warehouse access. There are also many special purpose data warehouse tools including middleware and data integration/transformation tools.
needs. No matter what tools are used, it is important that the tools work together and that they can be used within the enterprise’s chosen technology environment.
technical background with emphasis on operating systems, data bases, decision support tools, user interfaces and client-server; high conceptual level of relational theory; strong communication
(speaking and writing) skills; and the ability to interact with everyone in an organization from office workers to the CEO. The necessary skills and knowledge may be acquired by hiring experienced
consultants, or by training internal staff. The most effective approach is for consultants to begin development while helping internal staff become skilled so that the enterprise eventually becomes
self-sufficient.
Summary
Data warehouse engineering is not like normal application development. Its scope is broader, its visibility is greater, its user community is larger, and it is more prone to failure.
Before beginning a data warehouse project, an enterprise should evaluate whether it has adequately addressed the critical success factors for data warehouse engineering.
Potential Users
Performance Measures
Information Systems
Enterprise Technology
Meta data
Components
Structure
Warehouse Engine(s)
Hardware Platform(s)s
System Software
Security
Methodology
Development Tools
Skills & Knowledge
Articles in this series – Part 1, Part 2
Copyright © 1999 Visible Systems Corporation