Data Strategy Introduction

Published in TDAN.com April 2005

Please note: This is an excerpt from Data Strategy by Sid Adelman, Larissa Moss and Majid Abai, published by Addison Wesley. The book should be available in early July, 2005.

“There is nothing more difficult to take in hand, more perilous to conduct, or more uncertain in its success, than to take the lead in the introduction of a new order of things.” Niccolo Machiavelli, The Prince, 1532

A Chief Financial Officer (CFO) was approached by the CEO and asked for an accounting of the company’s financial assets. The CFO gave a vague response that revealed little knowledge of the corporate bank accounts, little idea of what was in each account, and no idea of the status of accounts receivable. The Board of Directors asked the CEO about the intended use of the corporate assets and was told “there is no plan for their use.” The CFO and the CEO were soon pursuing new personal interests.

Data is a primary asset in IT, yet if CIOs were asked about the assets under their control, most would be forced to respond that there are no plans for the use of this primary asset. They would have to admit that there is no inventory of data, that little is known about what data is in which database, that they have no idea about the quality of the data, and that “there is no plan for the productive use of this asset.” Interestingly, the turnover in CIOs is also high, albeit not for the same reason.

Current Status in Contemporary Organizations

Very few organizations, large or small, have a well-defined data strategy. If asked, some will point you to dusty and outdated volumes of database standards, usually geared specifically to their Database Management System (DBMS). The more advanced organizations will have a subset of standards and perhaps a documented strategy covering portions of what an overall strategy should include.

In most organizations, the value of data is not well understood. Data is considered the province of the department that creates it, and that department often guards it jealously under the guise of data ownership, which, to the mind of the department head, means hiding the data from everyone else in the organization. Some of the more astute people in your organization may recognize that data and the information derived from it translate into real power, and that control of that data can be a steppingstone to advancement.

A data strategy is usually addressed piecemeal. A company will launch an effort to choose its preferred DBMS or will attack a database performance problem when response time becomes excessive. Rarely do organizations work from the big picture, and as a result they sub-optimize solutions, introduce programs that may have a deleterious effect on the overall enterprise, create inconsistencies that require major interfacing efforts, or develop systems that cannot be easily integrated.

As application packages or enterprise resource planning (ERP) systems are introduced, they bring with them a plethora of diverse standards, naming conventions, codes, and DBMS platforms. Modifying these packages to conform to the organization’s standards is unthinkable, and so the data Tower of Babel becomes even more unmanageable.

There are also organizations that make terrible choices of their underlying technology infrastructure. For example, if a platform (operating system and DBMS) does not scale or will not support the organization’s service level agreements, projects will fail, usually after millions have been spent, years have been wasted, and credibility and careers have been destroyed. There is an interesting fallacy which holds that once the failure has occurred and the guilty (and some of the innocent) have been sacrificed, the organization will have learned its lesson, will never make the same mistakes (or any other mistakes), and will be assured of success the second time around. There are countless instances of the second or third attempt also failing, either by repeating the same mistakes or by making new and equally catastrophic decisions. The absence of a sound data strategy provides fertile ground for yet another disaster. The voice of reason is often ignored, those who understand the potential problems are discouraged from voicing their opinions, and anyone who disagrees with a well-intentioned but unviable architecture is labeled “not a team player.”

Most organizations have rogue databases running rogue and stealth applications. These often run in Excel or Access, but they may also run on SQL Server or some other DBMS. The existence of these applications is often unknown (and purposely undisclosed) to IT. They came into existence because a department needed an application and its users were unwilling to wait for IT to deliver, because the department wanted total control, or because it did not want to conform to IT’s standards. These departments either developed the applications themselves or hired an outside consultant to write them. The choice of platform, DBMS, and development language was usually determined by the skills of whoever wrote the application. There was almost no thought given to conforming to organizational standards and certainly none to integrating with other applications.

Why a Data Strategy Is Needed

Working without a data strategy is analogous to a company allowing each department and each person within each department to develop their own financial chart of accounts. This empowerment would allow each person in the organization to choose their own numbering scheme. Existing charts of accounts would be ignored as each person exercised his or her own creativity. Even to those of us who don’t wear green eye shades, the resulting chaos is obvious and easy to predict.

The chaos without a data strategy is not as obvious, but the indicators abound: dirty data, redundant data, inconsistent data, inability to integrate, poor performance, terrible availability, little accountability, users who are increasingly dissatisfied with the performance of IT, and the general feeling that things are out of control.

Without a data strategy, the IT people within the organization have no guidelines for making the decisions that are absolutely crucial to the success of the IT organization. In addition, the absence of a strategy gives a blank check to those who want to pursue their own agendas, including attachment to certain technologies or Machiavellian aspirations of power. This includes those who want to try a new DBMS, new (often unproven) technologies, and new tools that may or may not be appropriate. This type of environment provides no checks or validation for those who might be pursuing a strategy that has no hope of success.

A data strategy should result in the development of systems with less risk and a much higher success rate. It should also result in much higher quality systems. A data strategy provides a chief technology officer (CTO) and CIO with a rationale to counter arguments for immature technology and for data management approaches that are inconsistent with existing strategies.

Value of Data as an Organizational Asset

Organizations have data on their customers and suppliers, transactional data that captures the heart of the business (purchases, sales, customer calls, and other activities), and financial data. This data has value, which means it is an asset that is just as important as, if not more important than, the buildings, parts inventory, accounts receivable, and equipment of the organization. Some organizations that sell their data do carry it as an asset on their books, and it may be considered intellectual capital. When a company is evaluated for acquisition, evaluated as a merger candidate, or appraised by Wall Street, the notion of a going concern includes the value of its data as an asset. The Europeans have already adopted the notion of “fair-value accounting,” and the United States is supposed to adopt this method of accounting by the end of 2006. This means that organizations will be accounting for previously unrecognized assets and placing those assets on their books. An asset has future value, and an organization’s data certainly has future value, but this value has rarely been reflected on a company’s books. However, it is lurking there, sometimes in the guise of sound management, excellent technology, or goodwill, and sometimes in the price of the stock. It will be difficult to value this asset properly, especially data that has been in the organization for some time.

Why do we need to assign a value to data, given that an organization cannot exist without the data that supports its applications? Because budgets and resources are limited, and the person responsible for data may find it difficult to obtain the needed budget and the right people. By demonstrating the business value of data, that person should find the budget and the right staff easier to acquire.

New federal regulations require the CEO and the CFO to certify the accuracy of what is reported to the investment community and to governmental regulators. Such certification would be far better founded if the organization had a strong and viable data strategy.

Vision and Goals of the Enterprise

The vision of a data strategy that fits your organization has to conform to the overall strategy of IT, which in turn must conform to the strategy of the business. Therefore, the vision should conform to and support where the organization will want to be in five years.

Every organization, be it a publicly traded company, a private for-profit company, a non-profit organization, a governmental agency, or a quasi-governmental entity, has strategic goals. At a minimum, for-profit companies want to increase profits, and most organizations include customer service and a high level of financial integrity among their goals.

Your goals will be unique because your organization is unique. Arriving at these goals and gaining consensus will be time consuming and will cause some dissension, but this is not a task to avoid; it must be addressed. Without the goals and the principles that follow, there will be little direction for the decisions you will be making.

The following is a set of primary IT goals. These goals should not be accepted without extensive discussion and the requisite commitment needed to make them work. None of these goals comes free; there is a cost to each of them.

  1. Deliver systems faster
  2. Improve the relationship with the business community
  3. Improve the quality of delivered systems
  4. Reduce the risk and reduce the failures in application development
  5. Reduce costs
  6. Attract and retain good people
  7. Increase the productivity of application development
  8. Provide controls for the IT environment
  9. Improve the productivity of business people who use the data

Support of the IT Strategy

The target data environment does not exist in a vacuum. It is relevant only within the context of an overall IT strategy. This strategy, in turn, has been established to support the business of the organization, both for the existing environment and for the future plans of the business. Some organizations have been unable to pursue business opportunities because IT was unable to support them in a timely manner.

While we will be discussing the data environment, it is critical to recognize that a supporting infrastructure or architecture must be in place to be able to support a robust data environment.

Components of a Data Strategy

There are a number of key components in a data strategy, and each aspect is covered by a chapter in this book:

  1. Data integration
  2. Data quality
  3. Metadata
  4. Data modeling
  5. Organizational roles and responsibilities
  6. Performance and measurement
  7. Security and privacy
  8. DBMS selection
  9. Business intelligence
  10. Unstructured data
  11. Data categorization – categorizing data on the basis of performance requirements, security requirements, and availability requirements

Data Environment Assessment

Orienteering (the British call it Cunning Running) features cross-country running with a map and a compass. When you learn to use a map and compass, the first exercise is to determine where you are on the map. Without this knowledge, you are absolutely lost. So it is with establishing a target data environment under the umbrella of a data strategy: first you must determine where you are today. This includes an assessment of your existing DBMSs, internal skills, culture, and legacy systems. Table 1 is a Data Environment Assessment Questionnaire that should give you a first analysis of your existing data environment.

How is this going to happen?

The only way a data strategy will be developed and then implemented is with the very strong support of the CIO and of business executives, who must be educated on the criticality and impact of a good data strategy. It will also require that business managers understand and concur with the goals and with the process; their cooperation is essential. Finally, it will require a small team of dedicated people (dedicated means full time, with no other duties) who will concentrate on building the data strategy and then on selling and implementing it. The ideal team will be composed of the chief technology officer (CTO), who will lead the project or at least be a major contributor to it, a data administrator, a strong DBA, and a business analyst who comes from the business side. All these people should have a strong track record of accomplishment and a good relationship with key members of both IT and the business. They should also be intimately familiar with the business and the data that supports the organization. The team will have to sell the strategy and should be authorized both to create the strategy and to implement it. Again, this means strong support from the CIO, senior directors, and business executives, and this support must be communicated to everyone involved. A data strategy may attract many would-be assassins, and the only thing that will keep them at bay is their understanding that the strategy will happen.

Table 1 Data Environment Assessment Questionnaire

The purpose of this questionnaire is to assist your organization in gathering basic information about its current data environment.

1. Organization

1.1 Data Administration (DA)
1.1.1 How many people are there in data administration?
1.1.2 What are their skill levels and what kind of training have they had?
1.1.3 What is the charter of the department?
1.1.4 Is DA centralized or decentralized (are the DAs assigned to specific lines of business)?
1.1.5 Where does DA report in the organization?
1.1.6 Does DA set naming standards? If not, who does? Are naming standards enforced? How are they enforced?
1.1.7 Does DA keep track of all synonyms and homonyms in the system? How do they track them? What do they do to prevent new synonyms and homonyms from being created?
1.1.8 Is DA responsible for metadata, or do you have a metadata administration function?
1.2 Database Administration (DBA)
1.2.1 How many people are on the DBA staff?
1.2.2 What are their skill levels and what training have they had?
1.2.3 What is the charter of the department?
1.2.4 Which DBMSs does the department support?
1.2.5 Is the DBA staff centralized or decentralized?
1.2.6 Where does DBA report in the organization?
1.2.7 Do the DBAs or do the programmers design the databases?

2. Application Production Environment

2.1 Mainframe
2.1.1 What DBMSs are used?
2.1.2 Which applications use which databases?
2.1.3 Which are operational, which are decision support?
2.2 Midrange
2.2.1 What DBMSs are used?
2.2.2 Which applications use which databases?
2.2.3 Which are operational, which are decision support?
2.3 Servers
2.3.1 What DBMSs are used?
2.3.2 Which applications use which databases?
2.3.3 Which are operational, which are decision support?
2.4 PC workstations
2.4.1 What DBMSs are used?
2.4.2 Which applications use which databases?
2.4.3 Which are operational, which are decision support?
2.4.4 Is data stored in spreadsheets? Are they shared? Who maintains them?
2.4.5 Are spreadsheets being used for operational or decision support purposes?
2.4.6 How many people in the organization have access to those spreadsheets?
2.5 Is there a strategic plan for using your DBMSs?
2.6 Are there plans to move legacy databases to relational DBMSs?
2.7 How much redundant data is in the systems today? How do you measure the redundancy? Are there plans to reduce data redundancy?
2.8 What problems has redundancy created? How do you measure the problems?
2.9 How integrated are the databases? Are there plans for future integration?
2.10 Are there plans for the use of subject databases?
2.11 Is there an accurate inventory of data? If so, what mechanisms are in place to maintain this inventory? Who maintains this inventory? The DAs? The metadata administrators?

3. Application Development Environment

3.1 Is there a standard for the “buy versus build” decision?
3.2 Which application packages are being used? How extensively? What databases or access methods are associated with these application packages? (example: DB2 with SAP, Oracle with PeopleSoft)
3.3 Are industry data models being employed? Are these logical, physical or both?
3.4 Do your application vendors provide you with their data models (database views)? If so, how are they being used?
3.5 Do you use logical data models in your application RFPs to clarify your requirements to the vendors?
3.6 Application Delivery – Analysis Phase
3.6.1 What experience does the staff have with data analysis?
3.6.2 What experience does the staff have with data modeling? Is their experience limited to database design or does it include business modeling?
3.6.3 Are these activities part of the System Development Life Cycle (SDLC) methodology? Are these activities separate from design activities?
3.7 Application Delivery – Design Phase
3.7.1 What experience does the staff have with database design?
3.7.2 What experience does the staff have with data modeling in terms of logical database design?
3.7.3 Are these activities part of the SDLC methodology? Are these activities separate from analysis activities?
3.7.4 Are data modeling tools used in the design process? Which ones? How are they used?
3.7.5 How are the data models mapped (logical data model from analysis to physical data model from design)?
3.8 Application Delivery – Production Phase
3.8.1 Is DDL generated automatically? How? What tools are used?
3.8.2 Are separate data request modules used? Where are the reusable modules stored? Do all programmers know about them and use them?
3.8.3 Are physical data models used to reflect the physical DB design? How are they used?
3.9 Application Delivery – Maintenance Phase
3.9.1 Are you reverse engineering from legacy databases when you enhance existing systems?
3.9.2 Are tools being used to reverse engineer? What tools are used?

4. Metadata

4.1 Is metadata an integral part of application development?
4.2 What role does metadata play in your organization?
4.3 What metadata tool (if any) is being used?
4.4 Who uses your metadata?
4.4.1 Who inputs the metadata?
4.4.2 Who extracts the metadata?
4.4.3 What training is given to metadata users?
4.4.4 Who controls the metadata?
4.5 How accurate and current is the information in your metadata repository?
4.6 Are there plans for changes in the use of metadata?
4.7 What additional capabilities do you require to make metadata significant and productive?
4.8 Is metadata retrieved and displayed as part of a database query?
4.9 Is business metadata mapped to technical, process, and usage metadata? Where is it mapped? Who does the mapping?
4.10 Is business metadata, as it pertains to data, mapped and synchronized with the logical data models?
4.11 Is technical metadata, as it pertains to data, mapped and synchronized with the physical data models and the corresponding DDL?
4.12 Is process metadata, as it pertains to processes, mapped and synchronized with the various tools that use it (ETL, OLAP, data mining, etc.)?
4.13 Is usage metadata collected? How is it captured?

5. Distributed Data

5.1 Describe the current and planned strategy for both centralized and decentralized data.
5.2 Will data be distributed to support your clients’ desktops, etc.?
5.3 How are distributed databases being managed? Who controls the data redundancy? How is it controlled?
5.4 How does data naming remain centralized to avoid creation of synonyms and homonyms? Does DA play a role in controlling the naming process? If not, who does?
5.5 How is metadata managed in the distributed environment? Is metadata distributed as well? Is the distribution by data subject-area or by user department? Is the distribution accomplished through mirrored metadata repositories or through XML-enabled metadata retrieval directly from tools?
5.6 Who maintains the meta-meta model over the distributed metadata implementation?

6. Data Quality

6.1 How would you characterize the quality of your data?
6.1.1 How important is the data? Do you know which data is critical to the organization? Which data is important but not absolutely critical? Which data is relatively insignificant?
6.1.2 Is the data available when your users need it?
6.1.3 How timely is the data?
6.1.4 How accurate is the data?
6.1.5 How complete is the data?
6.1.6 How many overloaded data fields exist (fields that store values of different domains; example: Master-Code A, B, C, D where the values A and B describe a customer and C and D describe a product)?

7. Enterprise Issues

7.1 Is data viewed as a vital corporate resource? Who in your organization has this opinion?
7.2 Does your organization have data owners and data stewards? Do you know who they are?
7.3 Is there an existing data strategy that guides the application development process? Who is responsible for creating and maintaining the data strategy?
7.4 Are data standards in place (naming, security, database recovery)? Who creates them? Who enforces them?
7.5 Do you have a target application development environment (where you want to be in three years)? What is it? Whose vision is it? (CIO, CTO, DBA…)?
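Question 6.1.6 can be partly answered by profiling the data itself. The following Python sketch is a minimal illustration under stated assumptions, not a method from the book: it assumes you can supply, for a given field, a hypothetical lookup table mapping each stored code value to the business domain it describes, and it flags the field as overloaded when its values span more than one known domain. The Master-Code mapping simply restates the example in question 6.1.6; the function names are illustrative only.

```python
# Hypothetical lookup: which business domain each Master-Code value describes,
# restating the example in question 6.1.6 (A and B describe a customer,
# C and D describe a product).
MASTER_CODE_DOMAINS = {"A": "customer", "B": "customer", "C": "product", "D": "product"}


def field_domains(values, value_domains):
    """Group the distinct values stored in one field by the domain they describe.

    Values missing from the lookup are grouped under "unknown".
    """
    found = {}
    for value in values:
        domain = value_domains.get(value, "unknown")
        found.setdefault(domain, set()).add(value)
    return found


def is_overloaded(values, value_domains):
    """A field is overloaded if its values describe more than one known domain."""
    known = set(field_domains(values, value_domains)) - {"unknown"}
    return len(known) > 1


if __name__ == "__main__":
    column = ["A", "C", "B", "D", "A", "X"]  # sample values, for illustration only
    print(field_domains(column, MASTER_CODE_DOMAINS))
    print(is_overloaded(column, MASTER_CODE_DOMAINS))  # True: customer and product codes share one field
```

Values missing from the lookup table are reported under "unknown" rather than counted as a separate domain, so an incomplete mapping does not by itself flag a field; those unknown values are still worth reviewing, since they may point to yet another undocumented domain.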

About Sid Adelman

Sid is a Principal at Sid Adelman & Associates, an organization specializing in planning and implementing data warehouses and in establishing effective data architectures and strategies.