METAbits – July 2002

Published in July 2002

Data Warehouse Design: Principles Persist, Architectures Adapt

Designing a data warehouse (DW) environment is not nearly as straightforward as it was 5-10 years ago. Basic approaches had architects extracting data periodically from transaction databases and
master files, performing some degree of precalculation or preintegration, inserting it into a database, then slapping a query/reporting tool on top of it. DW architects should continue to respect
the basic tenets of data warehousing (e.g., the subject-oriented, time-variant, non-volatile, and historical nature of the data model), while enabling exceptions and variations of analytic
architectures to meet specific analytic needs.
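As a rough illustration of the basic approach described above (periodic extract, some precalculation, load, then query), here is a minimal sketch using Python's built-in sqlite3 module. The table names (orders, daily_sales), the columns, and the sample rows are hypothetical and chosen only for illustration; they do not come from the original research.

```python
import sqlite3

# Hypothetical transaction database standing in for an operational source.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_date TEXT, product TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2002-07-01", "widget", 10.0),
        ("2002-07-01", "widget", 15.0),
        ("2002-07-02", "gadget", 7.5),
    ],
)

# Hypothetical warehouse table holding precalculated, time-stamped history.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE daily_sales (order_date TEXT, product TEXT, total REAL)"
)


def load_daily_sales(src, dw):
    """Extract orders, precalculate daily totals, and append them to the warehouse."""
    rows = src.execute(
        "SELECT order_date, product, SUM(amount) "
        "FROM orders GROUP BY order_date, product"
    ).fetchall()
    dw.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", rows)
    dw.commit()
    return len(rows)


loaded = load_daily_sales(source, warehouse)

# The "query/reporting tool" step, reduced to a single query.
report = warehouse.execute(
    "SELECT order_date, product, total FROM daily_sales ORDER BY order_date"
).fetchall()
print(loaded, report)
```

The sketch only appends to the warehouse table, in keeping with the non-volatile, historical tenets mentioned above. A real environment would also need to track which periods have already been loaded.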

Synchronized Swimming: Metadata and Data Quality

The ability of enterprises to consume (and thereby leverage) their information assets to positively affect business performance is predicated on their ability to understand and trust the
information. Attention paid to metadata management and data quality can significantly improve the value of information to an enterprise. Furthermore, our research indicates that superior
metadata and data quality efforts feed off each other, and therefore should be synchronized.

Ranking DW Technology Selection Criteria

In our latest data warehouse (DW) study, technology performance ranks first among buyers (75% ranking it “critical”), followed closely by scalability (60%), and cost of ownership
(48%). Implementation time is noted as either “important” or “critical” by 86% of buyers, second only to performance (94%). Of little interest to buyers (ranking the criteria as “less
important” or “unimportant”) are whether the technology is industry-specific (28%), includes packaged business content (37%), and what its market share is (48%). Surprisingly, customizability is
seen as critical by only 15% of buyers – those not deluded by “out-of-the-box” expectations. Through 2004, business content/function, architecture, and demonstrable ROI will increase in
importance as competitiveness, integration, and financial prudence become enterprise watchwords.

Data Integration: Niche Guys Finish Last

With the escalation of data management issues (e.g., increasing velocity of data through information supply chains, increasing variety of data sources to be integrated, increasing volume of data to
be managed), enterprises must adopt data integration solutions that accommodate a broader set of enterprise needs. Niche data integration tools with singular abilities (e.g., data warehouse
populating, application-level integration, database replication, data synchronization, virtual data integration) will survive (if not thrive) through 2004/05. However, by 2006/07, hybrid
data integration technologies that provide a single mapping interface and multiple data integration modes will dominate.

Measuring Information Management Maturity

Information is becoming recognized as a critical business currency that drives business performance and enhances business partnerships. Information is thereby understood as an important corporate
asset. Enterprises must adopt a method for gauging their “information maturity” – i.e., how well they capture, manage, and leverage information to achieve organizational goals.

Data Quality Maturity

Level 5: Optimized

Organizations at the apex of data quality (DQ) maturity (Level 5) consider information an enterprise asset (not merely an IT asset), treating it much in the same way as financial and material
assets. DQ is an ongoing strategic enterprise initiative with demonstrable ROI. Fringe DQ characteristics are continuously measured and monitored, data is enriched in real time,
and complex relationships within and among business allies are captured. Unstructured information is subject to DQ controls, and data is tagged with quality indicators. Data is sufficiently
high-quality for confident real-time business process automation across business processes, and DQ-related data administration governance is automated.

Level 4: Managed

Penultimate DQ maturity (Level 4) is indicated when information is perceived as a critical component of the IT portfolio and talked about as an enterprise asset. DQ has become a principal IT
function. DQ is regularly measured and monitored for accuracy, completeness, and integrity at an enterprise level, across systems. DQ is linked to business issues and process
performance. Most cleansing/standardization functions are performed where data is generated. DQ functions are built into major business applications, enabling confident operational decision making,
and DQ-related policies are well established and regulated.

Level 3: Proactive

Moderate data quality (DQ) maturity is achieved when information is perceived as a genuine fuel for business performance and business analysts feel DQ issues most acutely. DQ is
now part of the IT charter, and major DQ issues are documented but not well quantified. Data cleansing is typically performed downstream (e.g., by departmental IT or in the data warehouse) by
commercial DQ software, where record-based batch cleansing (e.g., name/address), identification/matching, deduplication, and standardization are performed. These processes mend data sufficiently
for strategic and tactical decision making. DQ-oriented data administration guidelines have been issued but are not monitored or enforced.

Level 2: Reactive

At Level 2 data quality (DQ) maturity, decisions and transactions are often questioned due to suspicion or knowledge of DQ issues. Application developers implement
simple edits/controls to standardize data formats, and some manual or homegrown batch cleansing is performed at a departmental/application level within the application database. Employees perceive
information as a lens to greater business process understanding and improvement, but data quality throughout the enterprise may be sufficient only to perform high-level strategic decision making.
Those most affected by DQ issues are field or service personnel who rely on access to correct operational data to perform their roles effectively.

Level 1: Aware

Organizations at Level 1 (lowest) data quality (DQ) maturity have some awareness that DQ problems affect business execution or decision making. They have no formal initiatives to cleanse data, and
DQ is addressed on an ad hoc basis by individuals with pressing needs (e.g., generating a clean mailing list or a disposable data extract through custom coding). The organization tends to ignore
the problem or hope it will go away when new/upgraded systems are installed. Information is perceived as an occasionally interesting application byproduct, and often customers,
partners, and suppliers are more annoyed by DQ issues within the enterprise than employees themselves.
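To make two of the practices named in the levels above more concrete, the following is a minimal sketch of record-based batch cleansing (standardization, matching, and deduplication, as in Level 3) and of a simple completeness measure (one of the dimensions monitored at Level 4). The field names (name, postcode), the exact-match key, and the sample records are assumptions made for illustration only; commercial DQ software uses far richer matching rules.

```python
import re


def standardize(record):
    """Normalize formatting so equivalent records compare equal."""
    name = re.sub(r"\s+", " ", record.get("name") or "").strip().title()
    postcode = (record.get("postcode") or "").replace(" ", "").upper()
    return {"name": name, "postcode": postcode}


def deduplicate(records):
    """Standardize each record and keep the first one per (name, postcode) key."""
    seen = set()
    unique = []
    for rec in map(standardize, records):
        key = (rec["name"], rec["postcode"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def completeness(records, field):
    """Return the fraction of records with a non-empty value for the field."""
    if not records:
        return 0.0
    filled = sum(1 for rec in records if rec.get(field))
    return filled / len(records)


# Hypothetical batch: the first two rows describe the same person.
raw = [
    {"name": "  jane   DOE ", "postcode": "ab1 2cd"},
    {"name": "Jane Doe", "postcode": "AB12CD"},
    {"name": "John Smith", "postcode": None},
]

clean = deduplicate(raw)
print(clean)
print(completeness(clean, "postcode"))
```

Tracking a measure such as completeness over time, rather than cleansing once, is the kind of shift the higher maturity levels describe.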

Lack of a Scorecard for Information Management Capabilities

Only by measuring information maturity can organizations hope to establish appropriate programs, policies, architecture, and infrastructure to manage and apply information better.
While sketchy information management excellence assessment criteria abound (e.g., Baldrige National Quality Program), our research exposes the lack of a detailed, stratified scorecard for
information-related concepts.

Generating Generally Accepted Information Principles

Through 2002/03, we believe recognizing information as part of the IT portfolio will prompt leading organizations to fashion custom indicators and a scorecard for measuring data quality. By
2004/05, a standard information maturity model/index will be published that considers key concepts such as data quality, information architecture, information governance,
information usage, metadata, and information infrastructure/operations. And by 2006/07, leading organizations will submit to regular independent information audits or information management
certification processes.

Five-Year Data Warehouse Trends

Through 2002/03, data warehouse (DW) users will demand real-time access to many data sources, but will also come to realize that balancing information latency with decision cycles can save
infrastructure costs. By 2004/05, infrastructure components (e.g., middleware, DBMS), not just front-end functionality (e.g., portals), will enable self-service data warehousing for access
and analysis of data not explicitly designed into the DW environment. An explosion of unstructured information (e.g., text, graphics, audio, video), along with the palpable business need to access
it and integrate it with structured data, will engender a new class of infrastructure components and architecture alternatives by 2006/07.

Analytic Architecture Alchemy

Analytic architectures increasingly must support larger user communities, incorporate a greater variety of information sources, and accommodate robust analytic functions. Building
scalability and flexibility into information architectures to handle emerging analytic requirements requires piecing together an expanding assortment of analytic architecture concepts and

Business Information Training

IT organizations should provide periodic information exploitation training to business users, educating them on the data available for analysis (usually data warehouse structure). This education
should cover what data is available/not available, what it means, where it derives from, how it can be accessed/related/sliced, what its characteristics are (e.g., update frequency, volume,
quality, privacy, security), and examples of how it can be used. Business information training can expand information’s value, accelerate decision-making, empower business users,
and bridge IT/business gaps.

Spurious, Lagging Financial Indicators Limit Business Performance

Particularly in the current climate of financial reporting mistrust, enterprises must develop and leverage performance measures representing key business concerns (e.g., customers,
products/services, suppliers, employees, marketplaces) rather than merely money. Non-financial performance indicators can breed optimized decisions, improved decision-making
coordination (especially in distributed enterprises), and more timely decisions when regularly monitored (vs. lagging financial indicators).

Used by permission of Doug Laney, META Group, Inc.
Copyright 2002 © META Group, Inc.



About Doug Laney

Doug Laney, Vice President and Service Director of Enterprise Analytics Strategies for META Group, is an experienced practitioner and authority on business performance management solutions, information supply chain architecture, decision support system project methodology, consulting practice management, and data warehouse development tools. Prior to joining META Group in February 1999, he held positions with Prism Solutions as a consulting practice director for its Central US and Asia Pacific regions, as a methodology product manager, and as a consultant to clients in Latin America. With data warehouse solution involvement in dozens of projects, his field experience spans most industries. Mr. Laney's career began at Andersen Consulting, where he advanced to managing batch technical architecture design/development projects for multimillion-dollar engagements. He also spent several years in the artificial intelligence field, leading the development of complex knowledgebase and natural language query applications. Mr. Laney holds a B.S. from the University of Illinois.