Author: Ralph Kimball
The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses is must reading for anyone interested in data warehousing. This classic book is the basis of one of the two major streams of data warehousing thought. The book (and accompanying CD) provides database design principles and techniques for creating a data warehouse as well as many sample dimensional data models. The design principles are woven in through the text and listed in an appendix. Key concepts include dimensional modeling, star schema, slowly-changing dimensions, and hierarchies.
The Toolkit focuses on the star schema (“stars”); stars are the pervasive design construct in the dimensional modeling approach to data warehousing. The star is simple enough to be understood by end-users, and it offers performance advantages when implemented in a relational database system.
A star schema consists of a single “fact” tables and several dimension tables. The fact table records the (generally numeric) measures the business is interested in tracking such as the amount sold in a sales transaction. The “dimension” tables provide context for the facts. They tell, for example, which store sold what product, and when the product was sold. The primary key of the fact table is a composite key made up of the foreign keys from the dimension tables.
Dimensions contain “hierarchies”—higher level columns that classify or contain lower-level columns. In the time dimension, for example, a year contains quarters, quarters contain months, and months contain weeks. A similar kind of relationship exists in the product dimension in the category, brand, and product columns. These kinds of “rollups” allow the business community to see measures at various levels of abstraction. Other common hierarchies include geography and organizations.
A Sales transaction is only one opportunity to measure and build a star schema. Each transaction in a business provides an opportunity to measure performance through a star schema. Other examples include Warehouse Inventory, Delivery Status, and Shipments. These transactions constitute part of the supply side of the value chain. But what is a “value chain”?
The Toolkit gives examples and stars for common value chains. Each business participates at some point in a value chain—a set of activities to design, produce, market, sell, and support a product or service. The value chain is sometimes split up into a “demand” side and a “supply” side. The supply side faces suppliers and is concerned with managing the acquisition and availability of goods (including purchase orders, delivery of products, inventory at the plant and warehouse, and operations). The demand side faces the customer and is concerned with product inventory and sales. Each activity in the value chain is an opportunity to measure business performance with a star schema.
There are nine “decision points” in the database (data warehouse) design; the decision points are driven by “the needs of the user community and the realities of the data.” The first decision point is choosing a business activity to model, such as purchase orders or returns. Data collected in performing the activity will become a fact table. The next decision points elaborate on the fact table, including choosing the level of granularity, fact table dimensions, columns, and columns in the dimension tables. Other design decisions involve how to track slowly-changing dimensions, aggregations, archiving requirements, and how often the data is needed. Members of the data warehousing team interview end-users and legacy system DBAs to “gather evidence” to make the nine decisions. The book provides helpful interview questions.
A Galaxy of Stars
Other topics in the book deal with the rationale for aggregates and how to store them, loading the database, query tools, the load cycle, and recommendations for enhancements to database systems for data warehousing. Appendices provide the design principles for a dimensional data warehouse, a dimensional warehouse checklist, a glossary, and information about the application on the CD. One of the really practical features of the book is the sample star schemas on the CD. The CD provides stars for Retail, Insurance, Financial Services, Subscription Services, Telecommunications, Manufacturing, among others.
The book is an excellent resource for data warehousing, and it makes many excellent points: Normalized tables don’t work very well for reporting. Star schemas are simple to understand.
Interviewing end-users is essential for gathering requirements. Most business reports are really a combination of queries. Understanding the concepts and techniques in the Toolkit will deepen your understanding of data warehousing and the technology that supports it.