Business intelligence (BI) is the activity of monitoring the key business processes (KBPs) of an enterprise for key performance indicators (KPIs) of its business activities measured across various
dimensions of the business environment. Such intelligence helps the enterprise make appropriate changes to stay responsive to customer requirements, the forces of competition, and the changing
constraints of the business ecosystem. This agility in responding to environmental stimuli largely determines whether an entity survives.
Data mining is the activity of identifying interesting patterns that are not obvious. These patterns could be associations between various dimensional values. Some could be
classifications and clusterings that spot commonalities between entities having apparently unrelated dimensional values. Others could be sequences, periods, and time intervals between
activities, or outliers and predictions based on past data. Data mining adds to the arsenal of business intelligence techniques.
In this paper, we apply this concept of business intelligence for analyzing the data architecture of an enterprise. First we identify the processes and the measures in each of them. Next, we identify
the various possible dimensions. Finally, we list the probable benefits that could be achieved by organizing and analyzing this data architecture model repository.
KEY BUSINESS PROCESSES, KEY PERFORMANCE INDICATORS AND THE MEASURES NEEDED
In our architecture dimensional model, we need to list all the key processes that we would like to monitor, the measures in these processes, and the formula to calculate the KPI from these
measures. Later, we will inquire what kind of intelligence can be derived from monitoring the dimensions and measures, and their change observed over other dimensions and time. To identify the key
performance indicators of data architecture, we can look to the service level agreements for quality parameters. These parameters address the non-functional requirements driving the design of data
The key processes to monitor could include, but are not limited to:
- Designing data architecture for a new enterprise, such as a startup working on domain-specific high-performance chip design.
- Designing data architecture for a new industry or domain, like Second Life.
- Extending an existing data architecture for a new user community (such as children below 10 years of age).
- Data quality across significant business processes.
- Designing data architecture for non-functional requirements (NFRs).
A representative list of the measures in these processes could include, but are not limited to:
- Total cost of ownership for data architecture implementation.
- Level of encryption for data at rest (tape and database disks) and data on the move (over the network, and in database memory and applications).
- Data threat detection and prevention mechanisms and attacks identified.
- Reliability (mean time between failures), recovery (mean time to recover), performance [(number of seconds taken – number of seconds per SLA) * (Weightage based on the severity of the use
- Performance in terms of backup & recovery, encryption, compression, replication, extract – transform – load (ETL) window.
- Cost of change, measured in terms of time, effort in man-hours, monetary cost, and the risk of change.
- How the competition is doing in comparison.
- User community reached, as a measure of accessibility. Mobile and Internet connectivity could act as sub-dimensions, as could accessibility for differently abled people and the various classes of
users, including those external to the enterprise. A related measure is the ability to leverage user participation and interest in shaping the direction of the enterprise or even the industry.
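The performance measure in the list above is cut off in the source, so the following Python sketch shows only one plausible reading of it. It assumes that the overrun of each use case (seconds taken minus seconds allowed by the SLA) is multiplied by a severity weight and that the products are summed; the summation, the class and function names, and the sample figures are all hypothetical. The availability ratio MTBF / (MTBF + MTTR) is the standard steady-state formula and is not stated in this paper.

```python
from dataclasses import dataclass


@dataclass
class UseCaseTiming:
    name: str
    seconds_taken: float
    seconds_per_sla: float
    severity_weight: float  # assumption: larger means a more critical use case


def weighted_sla_overrun(timings):
    """Sum of (seconds taken - seconds per SLA) * severity weight.

    A positive total means the SLA is being missed overall; a negative
    total means there is headroom. Summing across use cases is an assumption.
    """
    return sum(
        (t.seconds_taken - t.seconds_per_sla) * t.severity_weight
        for t in timings
    )


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability from mean time between failures and mean time to recover."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical sample measurements.
timings = [
    UseCaseTiming("nightly ETL load", 5400.0, 3600.0, 2.0),
    UseCaseTiming("dashboard query", 2.0, 3.0, 1.0),
]
print(weighted_sla_overrun(timings))
print(availability(990.0, 10.0))
```

A signed total lets a fast use case offset a slow one; if that is undesirable, each overrun could be clamped at zero before weighting.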
MULTIDIMENSIONALITY OF DATA ARCHITECTURE
A traditional data warehouse model uses the business dimensional model to analyze its business. For analyzing data architecture, we want to understand the multidimensionality of an enterprise
data architecture solution along with a similar understanding of the problems it solves. In the following sub-sections we discuss the various classes of dimensions and their interrelationships. There
are classifications such as dimensions and sub-dimensions, co-existent and mutually exclusive dimensions of an enterprise at any instant of time, static and mutable dimensions, and overlapping
and disjoint sub-dimensions. Dimensions also change over time, assuming varying levels of significance.
Static or Fixed Dimensions and Mutable Dimensions
Certain dimensions are fixed from the birth of an enterprise or an enterprise’s architecture due to the domain or the nature of data. Some examples are geographical applications that use
spatial data types, and spatio-temporal applications that use spatial and temporal data types. Because of the nature of their application, these data types are fixed and the probability that they
will change is minuscule. An example from another domain could be the place and date of birth, or the first transaction date, of a customer. These are static dimensions that will never change. But
there are other dimensions that are mutable and could change over the lifetime of the enterprise. For instance, the latency of the data available for analytics gets reduced with the advent of
technology and the competitive business environment, forcing everyone to get the latest information for accurate decision making.
To give an example from another domain, the gender of an employee in a human resource application is a static dimension, while the relative employee performance rank, education, skill sets, role and
designation are all slowly changing dimensions. Though facts are what is largely monitored, business intelligence is often derived by measuring the rate of change of a dimension. For example,
“How many promotions did an employee get during the past three years?” measures the rate of change of the employee’s designation over time. The rate of change of any measure or
dimension with respect to some other measure or dimension can yield interesting results, as in CRM. In the telecom industry, how many times a customer has changed mobile phone talk-time plans in
the last quarter could say something about the customer’s happiness, or predict churn.
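The rate-of-change idea can be illustrated with a minimal Python sketch. It assumes the history of a slowly changing dimension attribute is kept as date-ordered (effective date, value) pairs; the function name, the plan names and the dates are hypothetical and do not come from any system described in this paper.

```python
from datetime import date


def count_changes(history, start, end):
    """Count value changes whose effective date falls within [start, end].

    `history` is a list of (effective_date, value) pairs sorted by date,
    that is, the recorded history of one dimension attribute.
    """
    changes = 0
    # Compare each entry with its predecessor; the first entry is not a change.
    for (_, prev_value), (eff_date, value) in zip(history, history[1:]):
        if value != prev_value and start <= eff_date <= end:
            changes += 1
    return changes


# Hypothetical talk-time plan history of one customer.
plan_history = [
    (date(2023, 11, 5), "Basic 100"),
    (date(2024, 1, 12), "Talk 300"),
    (date(2024, 2, 20), "Basic 100"),
    (date(2024, 3, 28), "Unlimited"),
]

changes_last_quarter = count_changes(
    plan_history, date(2024, 1, 1), date(2024, 3, 31)
)
print(changes_last_quarter)
```

The same function applies to the promotions example by passing a designation history and a three-year window.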
Dimensions and Sub-Dimensions (both overlapping and disjoint) of Data Architecture
Within each dimension, there could be many independent perspectives. For example, we have grouped the data administrators’ and data users’ sub-dimensions under a single stakeholders dimension.
Sub-dimensions could be overlapping, like domain and data type, or disjoint, like data administrators and data users.
Significance of Dimensions and Abstraction Levels of Models
Though the dimensions could be many for a particular data architecture, a few of them assume prominence and have a larger say over data architecture decisions. Various abstractions could be modeled
depending on the significance of the dimensions portrayed in that particular model. So a model at a very high level would have probably two or three dimensions that predominantly decide the
architecture. As we go to more detailed abstractions, the other subtle dimensions could be shown with appropriate interactions.
Co-Existent and Mutually Exclusive Dimensions at any Point in Time
Examples of co-existent dimensions are data types and domains. Another example could be enterprise data that is structured. There are also dimensions whose values do not co-exist: for example, a
data architecture for an open source product would not also fall under a defense security classification.
Single and Multi-Valued Dimensions at any Point in Time
The domain of a data architecture is always single valued. Product or project data architecture is also a single-valued dimension. Some dimensions are multi-valued, like types of administrators and types of
users of data. Another example is data type (structured, unstructured) and nature of data (master, transaction, metadata, audit, reference, and external).
Time Variant Polymorphism of the Dimensions
The architectural dimensions are a nebulous bunch of axes that come into play, depending on the severity of the situation that is faced. Here the model emerges out of the concerns of that particular
scenario. The meta-model consists of a mix of all these dimensions and has certain dimensions and their
relationships prominent in the context of the current situation.
For example, a product could be successful within a particular enterprise. It might then gain such widespread popularity among the user community and regulatory bodies that it is prescribed for the
entire industry. Now the scalability and security dimensions assume more importance than the functionality that the product was originally trying to achieve. The architecture will now be measured
against a different class of dimensions that have assumed prominence.
Dimensions and Sub-Dimensions
In this sub-section, we see the more common dimensions which are found in a typical enterprise. For a list of possible dimensions and a detailed discussion with examples, please use reference
- Requirements Classification View (Functional and Non-Functional).
- Data Types.
  - Structured (Hierarchical, Network, Object Oriented, Relational, Graph, Semantic).
  - Unstructured (Multi-media: Video, Image, Audio).
  - Semi-Structured or Extensible Structured (XML, Text, Document).
  - Domain Specific (Geo-Spatial, Temporal, Spatio-Temporal, Retail Domain, Data Architecture Domain).
- Stakeholders or People.
  - Data Administrators’ / Stewardship View (Planners’ / Custodians’ / Trustees’ view).
  - Data Users’ View (Strategic-Tactical-Operational, Internal-External, Owners-Users, Creators-Consumers).
- Layering of Data Architecture.
  - Horizontal Data Flow Layer from Data Source to Data Users.
  - Vertical Layered Architectures that depict the static view of Data.
- Domain View (Retail, Health Care and Insurance, Biological Life Sciences, Weather, Banking and Capital Markets).
- Nature of Data (Master, Metadata, Audit, Transaction, Lookup, External Data).
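As a rough illustration, the dimension and sub-dimension hierarchy could be held in a small tree structure like the Python sketch below. The `Dimension` class, its `multi_valued` flag and the `walk` helper are hypothetical names introduced here; the flag values follow the earlier discussion of single and multi-valued dimensions, and only part of the list is encoded.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    name: str
    multi_valued: bool = False  # can hold several values at one point in time
    sub_dimensions: list = field(default_factory=list)

    def walk(self, depth=0):
        """Yield (depth, dimension) pairs in depth-first order."""
        yield depth, self
        for sub in self.sub_dimensions:
            yield from sub.walk(depth + 1)


data_types = Dimension(
    "Data Types",
    multi_valued=True,
    sub_dimensions=[
        Dimension("Structured"),
        Dimension("Unstructured"),
        Dimension("Semi-Structured"),
        Dimension("Domain Specific"),
    ],
)
stakeholders = Dimension(
    "Stakeholders",
    multi_valued=True,
    sub_dimensions=[Dimension("Data Administrators"), Dimension("Data Users")],
)
domain = Dimension("Domain View")  # single valued per the text

# Print each hierarchy with indentation that reflects nesting depth.
for dim in (data_types, stakeholders, domain):
    for depth, node in dim.walk():
        print("  " * depth + node.name)
```

A tree keeps sub-dimensions attached to their parent; overlapping sub-dimensions that belong to more than one parent would need a graph instead.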
SIGNIFICANCE OF DATA ARCHITECTURE WAREHOUSE
Yes, you have read the title of this section correctly. It is not data warehouse architecture, but a warehouse for data architecture models. Data architectures can be represented by
architectural domain types and stored by periodically collecting snapshots, or a snapshot of them during their significant life cycle activities, or every change that is made on them as a change
management induced transaction.
This is similar to monitoring the various types of facts (transaction, periodic snapshot, and accumulating snapshot) in a normal commercial enterprise.
The use of such a warehouse would be to know:
- What (Subject / Entity) has happened?
- When (Temporal)?
- Where (Geo/Spatial)?
- Why (Analytic Logic)?
- Who (People) was involved in it?
- What might happen in the future (Predictive)?
- What other data (read requirement, architecture) is related to this occurrence, and how did it happen?
- What if, other things remaining the same, some of the causal factors changed (What if)?
- Data mining used to find:
b. What are the target domains for which an architecture style suits best?
c. What are the sequences of incidents that occur over the life cycle of particular data architecture?
d. Applications in
i. Targeting user training by data stewards, customer relationship management (CRM) systems to measure data users’ satisfaction, cross-selling data services, users’ segmentation for
ii. Forecasting for database capacity planning, customer retention for data stewards to measure their users’ satisfaction, comparative data architecture in the industry used by
iii. Fraud detection for data security compromises in various data architectures.
ii. Database segmentation for designing distributed databases (decide on fragments and allocations based on users and the complexity of their queries).
iii. Interrelationships between non-functional requirements (scalability and security, security and performance).
iv. Intrusion detection in regular data access patterns.
This would let one query the warehouse by defining requirements, with the architectural choices narrowing down as the requirement specification progresses. At the end, a clustering
algorithm could return all related architectures and rank them on whichever dimensions we have chosen, according to the weightage we have provided. Importantly, we can see the time-variant nature
of similar architectures, trace their evolution, and use the past to extrapolate the future.
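The narrowing and ranking steps can be sketched in Python as follows. This is a simplified stand-in, not the clustering algorithm the text envisions: it filters on exact dimension matches and then orders the survivors by a weighted sum of scores. The repository records, dimension names, score values and weights are all hypothetical.

```python
def narrow(architectures, requirements):
    """Keep architectures whose dimension values match every stated requirement."""
    return [
        a for a in architectures
        if all(a["dimensions"].get(k) == v for k, v in requirements.items())
    ]


def rank(architectures, weights):
    """Order architectures by the weighted sum of their scores, best first."""
    def score(a):
        return sum(w * a["scores"].get(dim, 0.0) for dim, w in weights.items())
    return sorted(architectures, key=score, reverse=True)


# Hypothetical repository entries.
repository = [
    {"name": "A", "dimensions": {"domain": "retail", "data_type": "structured"},
     "scores": {"scalability": 0.9, "security": 0.4}},
    {"name": "B", "dimensions": {"domain": "retail", "data_type": "structured"},
     "scores": {"scalability": 0.5, "security": 0.9}},
    {"name": "C", "dimensions": {"domain": "banking", "data_type": "structured"},
     "scores": {"scalability": 0.8, "security": 0.8}},
]

# Each added requirement narrows the candidate set further.
candidates = narrow(repository, {"domain": "retail"})
# Weighting security three times as heavily as scalability.
ranked = rank(candidates, {"scalability": 1.0, "security": 3.0})
print([a["name"] for a in ranked])
```

Changing the weights reorders the same candidates, which is how giving one dimension more prominence would alter the recommended architecture.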
INTERCONNECTIONS AT INTERSECTS OF THE DIMENSIONAL MODEL
Some of the inter-connects that we have observed are:
- Master (design time data model) and transaction (for example, daily set of ETL processing metadata) data sub-types in metadata.
- Distributed database techniques used to design master data repositories and metadata repositories.
- Data mining used to validate data model or metadata.
- Data profiling and data mining for achieving data quality.
- Data quality techniques for refactoring out master data hub from a data warehouse.
- Replication for data warehouse’s ETL, mobile database synchronization in disconnected computing, distributed databases, performance and network load reduction by building redundancy,
designing database recovery structures.
These examples indicate that a pattern used in one context can be reused to solve a similar problem in another context, as when distributed database architecture is used in
designing metadata repositories that could themselves be distributed. Insofar as human perception allows, such patterns could be documented. But many more such patterns abound in the various
architectures that are present all over the world. A data architecture repository effort could bring all of them into one place, and analysis and mining techniques could find many more such patterns that
might not meet the human eye. So far, no literature is available on this area of mining architecture for automatic discovery of patterns, and finding
solutions for architectural problems by matching attributes between problems and solutions repositories. This approach is innovative and could be taken further for implementation by further
We have done a critical analysis of the data organization problem in the context of various environments that affect the selection and design of a solution to it. We have also proposed a
repository mechanism to store, match and analyze the various problems and solutions with respect to their various dimensionalities. We observe behaviors of these solutions in terms of their key
measures that will be vital to improving their services to the functioning enterprise.
Over a period of time, we would be able to see the evolution, interesting patterns and classifications that emerge out of this data capture of the architecture’s behavior. Apart from serving as a
repository of problems and solutions and a means of analyzing them, it would also let us match solutions to problems through non-obvious relationships, by dropping some dimensions and giving more
significance to others. This is how we see an enterprise being able to respond in an agile manner to the impulses it receives from changes in its external environment. This we
call the intelligence of our business of data architecture. More than the technique, the realization that such a vast amount of intelligence is available for having an edge over the competition is
the first step of an intelligent architecture-enabled enterprise.
The first two authors Sundara_rajan and Anupama_Nithyanand are grateful to their mentor and third author, S V Subrahmanya, Vice President at E-Commerce Research Labs for seeding and nurturing this
idea, and Dr. T.S. Mohan, Principal Researcher, Dr. Ravindra Babu Tallamraju, Principal Researcher, and Dr. Sakthi Balan Muthiah, Manager-Research at E-Commerce labs at Education & Research,
Infosys Technologies Limited, for their extensive reviews and expert guidance in articulating these ideas. The authors would like to thank all their colleagues and the participants of the authors’
training and knowledge sharing sessions at Infosys Technologies Limited, who contributed positively to these ideas.
The authors would like to acknowledge and thank the authors and publishers of referenced papers and textbooks, which have been annotated at appropriate sections of this paper, for making available
their invaluable work products which served as excellent reference to this paper. All trademarks and registered trademarks used in this paper are the properties of their respective owners /
- Dan Sullivan, Document Warehousing and Text Mining: Techniques for Improving Business Operations, Marketing and Sales, John Wiley & Sons, Inc.
- David C. Hay, Data Model Patterns: A Metadata Map, Morgan Kaufmann, 2006.
- Barry Devlin, Data Warehouse: From Architecture to Implementation, 1996.
- Ralph Kimball and Margy Ross, Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd Edition, John Wiley & Sons, 2002.
- W. H. Inmon, Building the Data Warehouse, Wiley, 2005.
- “The Zachman Framework for Enterprise Architecture”, Zachman Institute for Framework Architecture (www.zifa.com, www.zachmaninternational.com).
- Kamran Parsaye and Mark Chignell, Intelligent Database Tools & Applications, John Wiley & Sons, 1993.
- Peter Cabena, Pablo Hadjinian, et al., Discovering Data Mining: From Concept to Implementation, Prentice Hall, 1997.