This article describes the steps needed to build a robust, cost-effective, and efficient data-centric organization. It begins by detailing the current state of organizations where IT departments
are mostly application-centric and data receives less emphasis. Such an environment can lead to a crippled decision-making process; large overhead associated with multiple data sources and
dirty data; and an inability to leverage data, its definition, and its consistency across the organization. The article then outlines the benefits of a data-centric state and the necessary
approach—encompassing both processes and systems—to transition an organization into a data-centric one. We also examine the best practices in building a data-centric organization from
day one and how IT can support such a goal.
Our main objective is to identify a needed paradigm shift for organizations regarding data, its importance, and its value. The approach we describe benefits organizations tired of spending millions
of dollars and years of effort to solve specific, discrete problems using silver-bullet applications that, unfortunately, do not deliver on their promises.
The data-centric approach provides significant savings by eliminating redundant hardware and software maintenance costs. It also offers executives a 360-degree view of the business so they can
increase sales, reduce operational costs, and attain higher profits.
Many executives and decision makers are empowered with handheld computers that instantly deliver the latest news, weather, sports, music, and more from across the globe. They
can be constantly connected and call all their business associates regardless of their physical location. They use their laptops from anywhere in the world to perform their daily work.
Even with such sophisticated devices at their disposal, how many of them have a complete picture of their business? How many office workers, executives, decision makers, sales people, call center
personnel, doctors, or engineers have instant access to all the data they need to make better business decisions?
Why is it that we can receive so much data from around the globe in an instant, but we cannot have access to all the data we need to make our working lives easier and achieve optimum efficiency
within our own organization?
The reason is a lack of the right focus and processes within information management departments and lines of business. We spend significant time worrying about hardware and applications and less time
on data and information. Let’s begin by exploring the state of data within an organization.
The Current State of Data
After almost four decades of relying on information systems as a main component of our day-to-day business, IT departments are still struggling to capture and disseminate timely, accurate, and
consistent information to their business users—users who are faced with systems that provide incomplete views of their ever-changing business.
Data consumers in these organizations struggle for access to such information. For example, not having a complete view of a customer base is a main challenge for most large organizations. Several
customer service representatives I interviewed have only a limited picture of products sold to a client or need to connect to several systems to access it. In one case, a parts engineering and
manufacturing company had designed and built the same part 19 times because it did not know that the part was already built for other clients. (Abai et al., 2005) Imagine the waste that results
from not identifying the correct information when you need it.
In many organizations, data is stale. Data consumers cannot reach or find it in a timely manner, and it loses its value. In some organizations, for example, when the report is finally delivered to
its consumers, it is outdated and does not provide any valuable information. Another such problem occurs in information brokerage companies that supply the latest organizational knowledge regarding
the C-level executives of specific companies to their clients; too often they provide outdated information and lose valuable business because of inaccuracies.
In a large number of organizations, the main problem is the quality of data. Dirty data is one of the major problems leading to bad decisions across all organizations. At a 2002 Information Quality
Conference, a telecom company was revealed to have recovered over $100 million in “scrap and rework” costs, a bank $60 million, and a government agency $28.8 million. (Abai et al., 2005)
Another organizational issue is data consistency. I have personally observed different data consumers at several locations within the same company ask for the same data but receive different
Another major issue is that organizations may fail to harvest their unstructured data assets. IBM estimates that 85 percent of data in organizations is unstructured. (Reimer, 2003) We define
unstructured as residing in word processing, spreadsheet, e-mail, video, audio, XML, PDF, and other document files. Few organizations have made the necessary investment in technological and human
resources to centralize the information stored within such files.
Reasons for Disarray
There are several reasons for data being in a state of disarray in organizations. In this article, we focus on the most important reasons:
Application-Centric IT Organizations
In general, information technology departments within organizations are application-centric rather than data-centric. Although the names
of such departments vary from data processing to information technology to information management, the focus has always been on the delivery of applications, not the delivery of timely, consistent,
and clean information to their clients, the business-line information consumers.
IT organizations continue to spend millions of dollars on “silver bullet” applications to solve their business problems, yet these applications continue to fail because they focus on the application
and not the associated data. Many ERP, CRM, SCM, and data warehousing applications have failed—and millions of dollars have been lost—due to the mishandling of the associated data. In
addition, many organizations seem to build redundant systems that ultimately perform the same tasks. We have all seen organizations that have implemented CRM packages (from various vendors) for
each of their departments without implementing a standard throughout the organization.
Why such an emphasis on applications and so little on the data that is the lifeblood of those applications? The reason is simple: for many data processing individuals, the art has always been in
building applications. For them, data is a byproduct of the application. Although over 90 percent of an organization’s applications only capture and disseminate data—arguably, we do not
process data anymore—the number of data professionals in most organizations is minimal (they are principally database administrators and not data architects). What is the difference?
A database administrator is focused on modeling and creating tables and ensuring the efficiency and consistency of the DBMS in his/her charge. On the other hand, a data architect is responsible for
the health of data throughout the organization, has expertise in generating organizational metadata repositories, ensures use of the correct data integration methods, and works
to reduce and (if possible) eliminate data redundancy across the organization.
Many larger organizations have implemented the concepts of data stewards and enterprise data management (EDM) departments, but most have not provided them with the training, processes, budget, and
resources to monitor their projects. Such teams have developed standards that are not followed by other teams and have not created the governance processes that are invaluable in creating a
data-centric enterprise. In short, they set policies but have no executing power.
As technology has matured, application-centric IT organizations have improved their infrastructure and development environments, but most have failed to focus on the importance of data. Consider
one example: the buzz around service-oriented architecture (SOA). Although SOA is an important step towards a data-centric organization, it is missing an important element—data. I have yet to
see an SOA model that addresses data problems such as timeliness, cleanliness, and consistency. As in any other application-centric IT organization, it is just assumed that data is timely, clean,
and consistent.
Mergers and Acquisitions
Not all the blame should be directed at IT organizations. In today’s business environment, mergers and acquisitions (M&A) are standard elements of doing business. An agile company should
be able to take the opportunity to buy, merge, or sell some of its assets to capture a larger market share and increase its profit margin.
Arguably, business and IT operate at different speeds, so M&A activity turns an organization’s data into a complex problem. Consider a supermarket chain that merges with its local
competitor, and imagine the specific master data that needs to be accessed now from several heterogeneous systems.
In addition to having to merge all business processes (including accounting, vendor management, marketing, sales, and distribution) and overcome cultural inertia, they now have to merge the
technology as well. The IT team will have to merge point-of-sale, ERP, and SCM systems, as well as the operational data store and data warehousing systems. In addition, several critical in-house systems built
for limited utilization will also have to be merged. This can lead to a long period during which data is not consistently and accurately retrieved from both organizations and, therefore, is
delivered either stale or dirty.
I have personally analyzed a situation like this where, four years after the merger was complete, IT departments were still working as complete, independent silos and the business line department
had to hire several “temporary” staff members to manually capture data from one company’s systems and enter it into the other company’s applications. Meanwhile, in the same
organization, I have seen marketing, sales, strategic planning, and other departments basing their decisions on partial information as their application systems lacked the full business picture.
A different and perhaps even greater challenge occurs when an organization decides to sell or swap some of its business with another. IT personnel in the selling company must decouple the
information existing in each of the systems associated with the portion of the “sold” business, and staff at the purchasing company must identify a location for the merged data. It is
not uncommon to see active data associated with a sold business in an organization’s data stores years after such a sale has been completed.
Lack of Central Data Governance Policies
Data governance is the set of guidelines, policies, and processes implemented in an organization to address ownership, security, privacy, and compliance issues related to data. It allows companies to
assess the value of data and the impact associated with its mismanagement in much the same way the mismanagement of security, privacy, and compliance can be calculated in real dollars.
Data governance is an important step missing in many organizations. Larger organizations have implemented some policies, but smaller organizations totally lack a sound data governance strategy.
Business Mistrust of IT
Often, especially in the larger organizations, the business line end users do not trust the IT department. Reasons vary, including:
- Personality and turf clashes between IT and business executives
- IT departments are often late and over budget in meeting business requirements
- IT departments often do not collaborate with business-line users throughout the development process, and business users feel left out of the process and eventually do not receive what they
have asked for
- IT departments are not agile enough for business departments
Regardless of the reason, the end result of such mistrust is that business departments often go beyond the IT department’s authority and hire their own internal or external teams to
deliver their technology needs. Though such an approach can provide short-term positive results, it may put the organization as a whole in jeopardy where data is concerned. Since these teams follow
the business line leadership and not the IT department, systems are typically developed in a vacuum as isolated silos. As a result, all the established concepts of standards, policies, and governance
are bypassed for the sake of delivering the product.
These days, outsourcing of development projects (both onshore and offshore) is becoming a norm in the marketplace. Most companies, as a cost-cutting measure, have either started or are considering
As Mark Lutchen of PricewaterhouseCoopers notes, “Twenty to 25 percent of large IT outsourcing projects fail within two years, and 50 percent fail within five years. Far too many outsourcing
initiatives miss the opportunity to return the full value potential to the client organization.” (2005)
I propose that a large number of such outsourcing companies have the technology needed to build the required applications, but lack the overall knowledge of the client organization, its standards,
and its policies. Again in this case, the outsourcing company works in a vacuum and under tremendous pressure to gain a profit. As a result, the advancement of the long-term goals of the client
organization often becomes an afterthought.
Conversely, the client IT organizations in many instances have not established the necessary change management and governance framework to ensure that such outsourcing projects are in line with the
organizational goals and standards and conform to the long term goals of the organization.
The Characteristics of a Data-Centric Organization
To discuss how to build a data-centric organization, we first need to define its characteristics.
Data Is Considered an Organizational Asset
Business and IT departments alike understand the value of data as an organizational asset. They understand that data has value, depreciates over time, and needs to be protected. From the boardroom
to the project room, everyone is informed and fully grasps the importance of the data to the organization. It might not be reflected on a balance sheet, but it will be evident in the culture of the
organization.
IT and Business Work in Unison
The problems between business and IT departments have been resolved, and the two follow a detailed collaborative methodology for the delivery of sound IT projects. To reach that point, IT needs to
be more agile and business needs to be more involved in the IT activities.
Data Quality Is Priority One
Data quality has the highest level of priority. Data quality issues are dealt with not through a process, but at the source. Data is managed, integrated, and disseminated uniformly and consistently
to ensure its highest quality.
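One way to read "at the source" is that a record is checked at the moment it is captured, before it reaches any data store. The sketch below illustrates that idea only. The record fields (`customer_id`, `name`, `email`, `postal_code`) and the rules applied to them are hypothetical; real rules would come from the organization's own data standards.

```python
import re
from dataclasses import dataclass

# Hypothetical rule: a very loose shape check for e-mail addresses.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class CustomerRecord:
    customer_id: str
    name: str
    email: str
    postal_code: str


def validate_at_source(record):
    """Return a list of problems; an empty list means the record is clean."""
    problems = []
    if not record.customer_id.strip():
        problems.append("customer_id is missing")
    if not record.name.strip():
        problems.append("name is missing")
    if not EMAIL_PATTERN.match(record.email):
        problems.append("email is malformed")
    if not record.postal_code.strip():
        problems.append("postal_code is missing")
    return problems


def capture(record, store):
    """Append the record to the store only if it passes validation."""
    problems = validate_at_source(record)
    if not problems:
        store.append(record)
    return problems
```

A capturing application would call `capture` and show the returned problems to the person entering the data, so that the correction happens where the data originates rather than in a later cleanup step.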
Data Is Disseminated Uniformly throughout the Organization
Every consumer of a specific data subject can see a holistic view of that data subject from a single source. In other words, all data consumers need to access the same “service” for
retrieving all the information associated with a specific subject regardless of their geographical and/or organizational location within the organization. For example, if information is needed
about a specific customer, the user has access to the service history, the products purchased, and so on from the same call, rather than having to connect to several systems.
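The customer example can be sketched as a single service that hides which system holds which piece of data. This is a minimal illustration, not a reference design. The three lookup functions stand in for real back-end systems, and their names and sample values are invented for the example.

```python
# Hypothetical back-end systems, each holding part of the customer picture.
def crm_lookup(customer_id):
    return {"name": "Acme Corp", "address": "1 Main St"}


def orders_lookup(customer_id):
    return {"products_purchased": ["widget", "gasket"]}


def support_lookup(customer_id):
    return {"service_history": ["replaced gasket"]}


class CustomerService:
    """Single entry point that assembles one view of a customer."""

    def __init__(self, sources):
        # sources: mapping of system name to a lookup function
        self._sources = sources

    def get_customer(self, customer_id):
        view = {"customer_id": customer_id}
        for lookup in self._sources.values():
            # The consumer never learns which system supplied which field.
            view.update(lookup(customer_id))
        return view


service = CustomerService(
    {"crm": crm_lookup, "orders": orders_lookup, "support": support_lookup}
)
customer = service.get_customer("C-1001")
```

Because consumers depend only on `get_customer`, the systems behind it can be consolidated or replaced later without changing the callers.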
Steps in Transforming to a Data-Centric Organization
It would be much easier to build a data-centric organization from inception. Reality, however, is quite different. Throughout the decades that many organizations have existed, the emphasis has been
placed on expanding the business and expanding the role of IT departments within the organization, not on the data.
What does it take to turn an existing organization into a data-centric one? We discuss some of the most important steps here.
It Starts at the Top
High-level IT and business line executives must want to collaborate to take the organization toward data-centricity. They must want to stop the waste and deliver an environment that could save the
shareholders millions of dollars annually by avoiding redundant data and processes. It starts in the boardroom and trickles down to the lowest level managers.
Build an Evangelical IT Department
Successful businesses are run like tight ships. Policies, procedures, standards, and methodologies are documented and discussed, and team members are trained based on those policies and standards.
Employees are expected to follow them and make sure they are implemented.
IT departments should be no different. CIOs and other IT executives are responsible for setting the course. They should empower their senior technical staff to set the technical standards and
methodologies for developing new applications and retiring old ones. These policies and methodologies should then be communicated throughout the organization and all members of the IT department
(including tier-one vendors) should become advocates of such policies and communicate them to all team members, stakeholders, and sub-contractors. Should these procedures need to be bypassed, the
exceptions should be noted and documented for business continuity reasons, to allow for continuous modification of the standards and policies.
As part of the evangelical nature of this process, we must build enterprise architecture (EA) and enterprise data management (EDM) groups to set the policies and governance methodologies associated
with application and data architecture. Members of these two groups are respectively responsible for setting architecture and data goals, policies, standards, and governance throughout the entire
organization. They are also responsible for ensuring that these standards and policies are enforced.
Govern Application Development Activities
Regardless of whether application development is outsourced or conducted in-house, the IT department must set policies for monitoring application development and full lifecycle quality assurance.
Starting in the inception phase of a specific application project, a member of each of the EA and EDM groups must be assigned to the project to ensure that the short- and long-term goals of the organization
(from the standpoint of data) are followed. The EA and EDM team members monitor and review several applications simultaneously for conformance, and they continue to evangelize the future plan of
the organization to the project managers, architects, and DBAs within those applications.
A data-centric organization is all about measuring its achievements and successes. It is equally interested in its failures and missed opportunities. It should be able to collect data associated
with each of its plans, and to measure it based on thresholds, benchmarks, and service level agreements (SLAs) defined when those plans are established. It is only then that a successful
organization can identify problems and take actions to get back on track.
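As a rough illustration of measuring a plan against thresholds agreed when the plan was set, the sketch below compares measured values with targets and reports which were met, missed, or never measured. The metric names and threshold values are hypothetical and chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass
class Target:
    metric: str
    threshold: float
    higher_is_better: bool


def evaluate(targets, measured):
    """Classify each target as 'met', 'missed', or 'not measured'."""
    results = {}
    for target in targets:
        value = measured.get(target.metric)
        if value is None:
            # A target nobody measures is itself a finding.
            results[target.metric] = "not measured"
        elif target.higher_is_better:
            results[target.metric] = "met" if value >= target.threshold else "missed"
        else:
            results[target.metric] = "met" if value <= target.threshold else "missed"
    return results


# Hypothetical targets defined when the plan was established.
targets = [
    Target("report_delivery_hours", 24.0, higher_is_better=False),
    Target("records_passing_validation_pct", 99.0, higher_is_better=True),
    Target("duplicate_customer_pct", 1.0, higher_is_better=False),
]
measured = {"report_delivery_hours": 30.0, "records_passing_validation_pct": 99.4}
status = evaluate(targets, measured)
```

Here the report delivery target is missed, the validation target is met, and the duplicate rate shows up as not measured, which is exactly the kind of gap this section argues an organization should be able to see.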
Minimize Data Redundancy
As part of its policies, the IT department must set goals to reduce data redundancy within the organization. The new hot topics in data these days are the concepts of data integration
(DI)—specifically customer data integration and master data management. These topics focus on reducing data’s redundancy. An imperative objective for an IT department should be: no ETL.
The amount of extraction, transformation, and loading performed on data must be minimized because of resulting data latency, the cost associated with ETL, and data pollution possibilities. Instead,
IT must take advantage of techniques such as enterprise information integration (EII) to access data as needed.
One Subject, One Source
Expanding on a previous point, an organization should plan to have one data source for one subject area. Depending on your organization, you have basic subject areas: customer, vendor, partner,
products (and/or services), finance, and so on. IT and business teams should identify all important subject areas and build one data store for each.
Of course, this is easier said than done. Organizations today will have many redundant data stores. How do we transition to, and achieve, this “single source for a subject” goal?
- Identify all subject domains. Both IT and business teams must agree on the organization’s most important subject areas.
- Plan the transition according to the order of importance. Large organizations can normally take on more than one project simultaneously.
- When starting on a subject area, identify all systems that currently contribute to and/or consume its data directly or indirectly. You need to take a complete inventory of data access including
the type of access, frequency, detailed level of data, historical needs, etc. During this process, systems might be found that perform the same or similar tasks. Identify possible targets for
- Identify business users who are owners of this subject. They will exist across departments, organizations, and companies. Determine who is responsible for every piece of data. Arrange meetings
to identify short- and long-term goals of each user group. This way, the system will ultimately be built towards a future need.
- Create a transition plan. This plan could be executed in months or years depending on the budget, the need, the cost savings, and the agility of the organization. In this plan, identify systems
that need to be retired. Also, identify systems that will be used as the “transitional source” until the future source is built. As part of the plan, identify technologies and processes
that need to be replaced in order to keep the systems operating during the transition time and yet build the groundwork for utilizing the transitional source.
- Start becoming a service-oriented organization. Create common organizational or enterprise-level services that will access a subject area and deliver information from all systems that support
that subject. For example, if you need all the information about a customer (name, address, and purchase history), the consuming system should not care where the information is located. It should
call the common service to deliver the information.
- Build the future source data store. This might involve multiple systems (e.g., current and historical), but consuming systems should not be affected. They should utilize enterprise-level
services that will deliver all information about the subject in one call.
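The inventory step in the list above lends itself to a small data structure. The sketch below is one possible shape for it, under the assumption that each inventory entry records a system, a subject area, the type of access, and its frequency. The system names are invented, and a real inventory would also carry the level of detail and historical needs mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAccess:
    system: str
    subject: str
    access_type: str  # "contributes" or "consumes"
    frequency: str    # for example "real-time" or "daily"


def contributors_by_subject(inventory):
    """Map each subject to the sorted list of systems that contribute to it."""
    contributors = defaultdict(set)
    for entry in inventory:
        if entry.access_type == "contributes":
            contributors[entry.subject].add(entry.system)
    return {subject: sorted(systems) for subject, systems in contributors.items()}


def redundant_subjects(inventory):
    """Return the subjects that have more than one contributing system."""
    return {
        subject: systems
        for subject, systems in contributors_by_subject(inventory).items()
        if len(systems) > 1
    }


# Hypothetical inventory for two subject areas.
inventory = [
    DataAccess("sales_crm", "customer", "contributes", "real-time"),
    DataAccess("support_crm", "customer", "contributes", "real-time"),
    DataAccess("billing", "customer", "consumes", "daily"),
    DataAccess("erp", "vendor", "contributes", "daily"),
]
overlaps = redundant_subjects(inventory)
```

In this example the customer subject has two contributing systems, so it would be flagged for review when planning the move to a single source, while the vendor subject already has one.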
Organizations’ data continues to grow at an exponential rate. According to a 2003 study by the University of California at Berkeley, about 5 exabytes of analog and digital information were
produced worldwide in 2002. (Lyman, 2003) That’s a data explosion equivalent to half a million new libraries the size of the print collection of the Library of Congress.
We need organizations geared up to harness the power of data and channel it toward their short- and long-term business goals. A data-centric organization can support the strategies needed to
deliver the vision of the information age.
Abai, M., S. Adelman, and L. Moss. Data Strategy, Boston: Addison-Wesley Int’l, 2005.
Lutchen, Mark, IT Effectiveness Practice, PricewaterhouseCoopers, “IT Outsourcing: The Importance of Retaining a Strong Management Capability,” PricewaterhouseCoopers Executive
Perspectives, 2005. See http://www.pwc.com/extweb/execpers.nsf/
Lyman, Peter et al., 2003, “How Much Information 2003,” UC Berkeley. For detailed study, please see www.sims.berkeley.edu/research/projects/how-much-info-2003
Reimer, James, PhD, at “IBM Enterprise Content Management and Portal Solutions,” June 9, 2003.