Published in TDAN.com July 2002
Data Warehouse Design: Principles Persist, Architectures Adapt
Designing a data warehouse (DW) environment is not nearly as straightforward as it was 5-10 years ago. Basic approaches had architects extracting data periodically from transaction databases and
master files, performing some degree of precalculation or preintegration, inserting the results into a database, then slapping a query/reporting tool on top. DW architects should continue to respect
the basic tenets of data warehousing (e.g., the subject-oriented, integrated, time-variant, and non-volatile nature of the data model) while enabling exceptions to, and variations of, analytic
architectures to meet specific analytic needs.
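To make this basic approach concrete, the following minimal sketch (in Python) walks one cycle of the classic pattern: periodically extract transactions, precalculate a summary, and load time-stamped results into a warehouse table for a query/reporting tool to read. The table and column names (orders, daily_sales) are illustrative assumptions, not references to any particular system.

```python
import sqlite3
from datetime import date

def run_basic_etl(source_db: str, warehouse_db: str) -> None:
    """One cycle of the classic periodic extract/precalculate/load pattern."""
    src = sqlite3.connect(source_db)
    dw = sqlite3.connect(warehouse_db)

    # Extract: pull today's transactions from the operational system
    # (assumes the source holds an illustrative `orders` table).
    rows = src.execute(
        "SELECT product_id, amount FROM orders WHERE order_date = ?",
        (date.today().isoformat(),),
    ).fetchall()

    # Precalculate: aggregate transaction detail into daily totals.
    totals: dict = {}
    for product_id, amount in rows:
        totals[product_id] = totals.get(product_id, 0.0) + amount

    # Load: append summarized, time-stamped rows. Rows are only ever
    # added, never updated, keeping the store non-volatile and
    # time-variant per the basic DW tenets.
    dw.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales "
        "(snapshot_date TEXT, product_id TEXT, total REAL)"
    )
    dw.executemany(
        "INSERT INTO daily_sales VALUES (?, ?, ?)",
        [(date.today().isoformat(), p, t) for p, t in totals.items()],
    )
    dw.commit()
    src.close()
    dw.close()
```

A query/reporting tool then needs nothing more than read access to daily_sales, which is both the simplicity and the limitation of the basic approach.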
Synchronized Swimming: Metadata and Data Quality
An enterprise's ability to consume (and thereby leverage) its information assets to improve business performance is predicated on its ability to understand and trust that
information. Attention paid to metadata management and data quality can significantly improve the value of information to an enterprise. Furthermore, our research indicates that superior
metadata and data quality efforts feed off each other, and therefore should be synchronized.
Ranking DW Technology Selection Criteria
In our latest data warehouse (DW) study, technology performance ranks first among buyers (75% rating it “critical”), followed by scalability (60%) and cost of ownership
(48%). Implementation time is rated either “important” or “critical” by 86% of buyers, second only to performance (94%). Of little interest to buyers (who rate these criteria “less
important” or “unimportant”) are industry specificity (28%), packaged business content (37%), and vendor market share (48%). Surprisingly, customizability is
seen as critical by only 15% of buyers (those not deluded by “out-of-the-box” expectations). Through 2004, business content/function, architecture, and demonstrable ROI will increase in
importance as competitiveness, integration, and financial prudence become enterprise watchwords.
Data Integration: Niche Guys Finish Last
With the escalation of data management issues (e.g., increasing velocity of data through information supply chains, increasing variety of data sources to be integrated, increasing volume of data to
be managed), enterprises must adopt data integration solutions that accommodate a broader set of enterprise needs. Niche data integration tools with singular abilities (e.g., data warehouse
populating, application-level integration, database replication, data synchronization, virtual data integration) will survive (if not thrive) through 2004/05. However, by 2006/07, hybrid
data integration technologies that provide a single mapping interface and multiple data integration modes will dominate.
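The distinction between niche and hybrid tools can be illustrated with a short sketch, assuming Python; this is a hypothetical design, not any vendor's actual interface. The key property is that one declared mapping drives multiple integration modes, here a bulk DW-populating run and a record-at-a-time replication/synchronization run.

```python
from typing import Callable, Iterable

# The mapping is declared once: target field -> function of a source record.
Mapping = dict[str, Callable[[dict], object]]

customer_mapping: Mapping = {
    "customer_id": lambda r: r["id"],
    "full_name":   lambda r: f"{r['first']} {r['last']}".strip().title(),
    "country":     lambda r: r.get("country", "US").upper(),
}

def transform(record: dict, mapping: Mapping) -> dict:
    """Apply one mapping to one source record."""
    return {field: fn(record) for field, fn in mapping.items()}

def batch_mode(records: Iterable[dict], mapping: Mapping) -> list:
    """DW-populating mode: transform a full extract in one pass."""
    return [transform(r, mapping) for r in records]

def streaming_mode(event: dict, mapping: Mapping,
                   apply: Callable[[dict], None]) -> None:
    """Replication/synchronization mode: the same mapping is applied
    to each change event as it arrives; `apply` writes to the target."""
    apply(transform(event, mapping))

# The same mapping serves both modes:
extract = [{"id": 1, "first": "ada", "last": "lovelace"}]
print(batch_mode(extract, customer_mapping))
# [{'customer_id': 1, 'full_name': 'Ada Lovelace', 'country': 'US'}]
```

The architectural point is that mappings are authored once, while the execution mode (bulk, event-driven, or virtual) is selected per integration need.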
Measuring Information Management Maturity
Information is becoming recognized as a critical business currency that drives business performance and enhances business partnerships. Information is thereby understood as an important corporate
asset. Enterprises must adopt a method for gauging their “information maturity” – i.e., how well they capture, manage, and leverage information to achieve organizational goals.
Data Quality Maturity
Level 5: Optimized
Information is valued and managed among the enterprise's most important assets. DQ is an ongoing strategic enterprise initiative with demonstrable ROI. Fringe DQ characteristics are continuously measured and monitored, data is enriched in real time,
and complex relationships within and among business allies are captured. Unstructured information is subject to DQ controls, and data is tagged with quality indicators. Data quality is
sufficient for confident real-time automation across business processes, and DQ-related data administration governance is automated.
Level 4: Managed
DQ is established as a managed, enterprise-wide function. DQ is regularly measured and monitored for accuracy, completeness, and integrity at an enterprise level, across systems, and is linked to business issues and process
performance. Most cleansing/standardization functions are performed where data is generated. DQ functions are built into major business applications, enabling confident operational decision making,
and DQ-related policies are well established and regulated.
Level 3: Proactive
DQ is now part of the IT charter, and major DQ issues are documented but not well quantified. Data cleansing is typically performed downstream (e.g., by departmental IT or in the data warehouse) by
commercial DQ software, where record-based batch cleansing (e.g., name/address), identification/matching, deduplication, and standardization are performed (a simplified sketch of these operations
follows the model below). These processes mend data sufficiently for strategic and tactical decision making. DQ-oriented data administration guidelines have been issued but are not monitored or enforced.
Level 2: Reactive
Applications employ simple edits/controls to standardize data formats, and some manual or homegrown batch cleansing is performed at a departmental/application level within the application database. Employees perceive
information as a lens to greater business process understanding and improvement, but data quality throughout the enterprise may be sufficient only for high-level strategic decision making.
Those most affected by DQ issues are field or service personnel who rely on access to correct operational data to perform their roles effectively.
Level 1: Aware
DQ needs are met on an ad hoc basis by individuals with pressing needs (e.g., generating a clean mailing list or a disposable data extract through custom coding). The organization tends to ignore
the problem or hope it will go away when new/upgraded systems are installed. Information is perceived as an occasionally interesting application byproduct, and customers,
partners, and suppliers are often more annoyed by DQ issues within the enterprise than employees themselves.
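As a simplified illustration of the Level 3 operations described above (record-based batch standardization, identification/matching, and deduplication), consider the following sketch, assuming Python; the field names and matching rule are illustrative assumptions, and commercial DQ software applies far more sophisticated logic.

```python
import re

def standardize(record: dict) -> dict:
    """Record-based standardization of name/address fields."""
    rec = dict(record)
    rec["name"] = " ".join(rec.get("name", "").split()).title()
    addr = rec.get("address", "").lower().replace(".", "")
    addr = re.sub(r"\bst\b", "street", addr)   # normalize street suffixes
    addr = re.sub(r"\bave\b", "avenue", addr)
    rec["address"] = " ".join(addr.split()).title()
    return rec

def match_key(record: dict) -> tuple:
    """Crude identification/matching: same name + address = same entity."""
    return (record["name"].lower(), record["address"].lower())

def dedupe(records: list) -> list:
    """Batch deduplication: keep the first record per match key."""
    seen, survivors = set(), []
    for rec in map(standardize, records):
        key = match_key(rec)
        if key not in seen:
            seen.add(key)
            survivors.append(rec)
    return survivors

rows = [
    {"name": "jane  DOE", "address": "123 Main St."},
    {"name": "Jane Doe",  "address": "123 main street"},
]
print(dedupe(rows))  # one survivor: Jane Doe / 123 Main Street
```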
Lack of a Scorecard for Information Management Capabilities
Only by measuring information maturity can organizations hope to establish appropriate programs, policies, architecture, and infrastructure to manage and apply information better.
While high-level criteria for assessing information management excellence abound (e.g., the Baldrige National Quality Program), our research exposes the lack of a detailed, stratified scorecard for
information-related concepts.
Generating Generally Accepted Information Principles
Through 2002/03, we believe recognizing information as part of the IT portfolio will prompt leading organizations to fashion custom indicators and a scorecard for measuring data quality. By
2004/05, a standard information maturity model/index will be published that considers key concepts such as data quality, information architecture, information governance,
information usage, metadata, and information infrastructure/operations. And by 2006/07, leading organizations will submit to regular independent information audits or information management
certification processes.
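A custom scorecard of the kind predicted here could begin as simply as the following sketch, which scores the six concepts named above on the five-level scale of the maturity model; the weights are illustrative assumptions, not published benchmarks.

```python
# Dimensions from the concepts above; weights are illustrative assumptions.
WEIGHTS = {
    "data_quality": 0.25,
    "information_architecture": 0.20,
    "information_governance": 0.15,
    "information_usage": 0.15,
    "metadata": 0.15,
    "infrastructure_operations": 0.10,
}

def maturity_index(scores: dict) -> float:
    """Weighted average of per-dimension scores, each rated 1-5 to
    mirror the five-level maturity model."""
    assert set(scores) == set(WEIGHTS), "score every dimension"
    assert all(1 <= s <= 5 for s in scores.values())
    return sum(WEIGHTS[d] * s for d, s in scores.items())

# Example: Proactive (3) on quality and metadata, Reactive (2) elsewhere.
print(maturity_index({
    "data_quality": 3, "information_architecture": 2,
    "information_governance": 2, "information_usage": 2,
    "metadata": 3, "infrastructure_operations": 2,
}))  # -> 2.4
```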
Five-Year Data Warehouse Trends
Through 2002/03, data warehouse (DW) users will demand real-time access to many data sources, but will also come to realize that balancing information latency with decision cycles can save
infrastructure costs. By 2004/05, infrastructure components (e.g., middleware, DBMS), not just front-end functionality (e.g., portals), will enable self-service data warehousing for access
and analysis of data not explicitly designed into the DW environment. An explosion of unstructured information (e.g., text, graphics, audio, video), along with the palpable business need to access
it and integrate it with structured data, will engender a new class of infrastructure components and architecture alternatives by 2006/07.
Analytic Architecture Alchemy
Analytic architectures increasingly must support larger user communities, incorporate a greater variety of information sources, and accommodate robust analytic functions. Building
scalability and flexibility into information architectures to handle emerging analytic requirements requires piecing together an expanding assortment of analytic architecture concepts and
components.
Business Information Training
IT organizations should provide periodic information exploitation training to business users, educating them on the data available for analysis (usually the data warehouse's structure and content). This education
should cover what data is available/not available, what it means, where it derives from, how it can be accessed/related/sliced, what its characteristics are (e.g., update frequency, volume,
quality, privacy, security), and examples of how it can be used. Business information training can expand information’s value, accelerate decision-making, empower business users,
and bridge IT/business gaps.
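One lightweight way to package such training is a per-subject-area fact sheet that IT maintains and walks users through; the fields below simply mirror the list in this section, and the names and sample values are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class SubjectAreaFactSheet:
    """What business users should be taught about each DW subject area."""
    name: str
    description: str             # what the data means
    source_systems: list         # where it derives from
    update_frequency: str        # e.g., "nightly batch"
    approximate_volume: str      # e.g., "40M rows, 3 years of history"
    known_quality_issues: list = field(default_factory=list)
    privacy_security_notes: str = ""
    access_paths: list = field(default_factory=list)  # how it can be queried/sliced
    example_uses: list = field(default_factory=list)

sales = SubjectAreaFactSheet(
    name="Retail Sales",
    description="Line-item sales by product, store, and day.",
    source_systems=["POS transaction DB"],
    update_frequency="nightly batch",
    approximate_volume="40M rows, 3 years of history",
    known_quality_issues=["returns posted up to 7 days late"],
    privacy_security_notes="No customer identifiers below region level.",
    access_paths=["reporting tool", "ad hoc SQL on SALES_FACT"],
    example_uses=["weekly same-store sales trend"],
)
```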
Spurious, Lagging Financial Indicators Limit Business Performance
Particularly in the current climate of financial reporting mistrust, enterprises must develop and leverage performance measures representing key business concerns (e.g., customers,
products/services, suppliers, employees, marketplaces) rather than merely money. When regularly monitored, non-financial performance indicators can breed optimized decisions, improved coordination
of decision making (especially in distributed enterprises), and more timely decisions than lagging financial indicators allow.
Used by permission of Doug Laney, META Group, Inc.
Copyright 2002 © META Group, Inc.