The Capability Maturity Model (CMM), published by the Software Engineering Institute (SEI), is a well-established, defined model that characterizes the software development maturity of organizations based on their practices and procedures. However, it does not address the maturity of organizations with regard to the manner in which data is managed. I have found that the five levels of the SEI CMM can be mapped to a data perspective based on the manner in which data is stored, managed, and maintained. For more information on the SEI CMM please review the SEI web site at http://www.sei.cmu.edu.
The following outline maps the five levels of maturity to the manner in which organizations manage data, effectively outlining the structure of a Capability Maturity Model for the management of data.
Level 1 – The Initial Level
The level 1 organization has no strict rules or procedures regarding data management. Data may exist in multiple files and databases, in multiple formats (known and unknown), and may be stored redundantly across multiple systems (under different names and using different data types). There is no apparent method to the madness and few, if any, attempts have been made to catalog what exists. Changes are made “on the fly” as they are requested by program development. If a centralized data management group exists, it functions merely to apply the change requests as they are needed. Frequently there is no central data management group; instead, new data structures are created, and changes are made, by the development groups requiring them (or perhaps by systems programmers who know nothing about the definition of the data).
The quality of data in level 1 organizations depends on the skills of the technical programmer analysts and coders. Level 1 organizations will take on monumental tasks with little knowledge of their impact, causing project cancellations or, even worse, completed systems with severely corrupted data and/or invalid reports. As a rough estimate, approximately 30% to 50% of organizations operate at Level 1.
Level 2 – The Repeatable Level
To move from level 1 to level 2 an organization must begin to adhere to a data management policy. The policy should dictate how and when data structures are created, changed, and managed. Although level 2 organizations follow some sort of management policy, they have usually yet to institutionalize the policy. Instead, they rely on a central person or group to understand the issues and implement the data structures of the organization reliably and consistently. This manifests itself by the creation of a database administration function.
The success of level 2 organizations depends on the skills of the DBAs charged with managing the “technical” aspects of data. Although the differences between the business and technical aspects of data are usually (though not always) understood at some level, there is less effort made to document and capture the business meaning of data. Little (or no) differentiation between the logical and physical models of data is made. Level 2 organizations will begin to institute database administration practices such as managed schema change (maintaining records of the change) and reactive performance monitoring and tuning (whoever screams the loudest gets the attention). Approximately 15% to 20% of organizations operate at Level 2.
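To make “managed schema change (maintaining records of the change)” concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table name schema_change_log, the function apply_change, and its parameters are hypothetical names chosen for illustration; a real DBA group would more likely rely on a dedicated change management tool.

```python
import sqlite3
from datetime import datetime, timezone


def apply_change(conn, change_id, ddl, requested_by):
    """Apply one DDL statement and record it in a change log table.

    Returns True if the change was applied, False if a change with the
    same change_id was already recorded (so nothing is applied twice).
    All names here are illustrative assumptions, not a standard.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_change_log ("
        "change_id TEXT PRIMARY KEY, ddl TEXT, "
        "requested_by TEXT, applied_at TEXT)"
    )
    already_applied = conn.execute(
        "SELECT 1 FROM schema_change_log WHERE change_id = ?", (change_id,)
    ).fetchone()
    if already_applied:
        return False

    conn.execute(ddl)  # the schema change itself
    conn.execute(
        "INSERT INTO schema_change_log VALUES (?, ?, ?, ?)",
        (change_id, ddl, requested_by, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return True
```

The point of the sketch is only that every change passes through one function that leaves a record behind, which is the step that separates level 2 from the “on the fly” changes of level 1.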
Level 3 – The Defined Level
Organizations that have successfully moved from level 2 to level 3 on the data capability maturity scale have documented and established a data management policy as a core component of their application development lifecycle. The policy is enforced and testing is done to ensure that data quality requirements are being met. Level 3 organizations typically understand the business meaning of data and have created a data administration function to augment the database administration function. Level 3 organizations have a stated policy that “data is treated as a corporate asset,” even if they do not entirely understand what that means.
The success of the level 3 organization typically depends on the interaction between the DA and DBA functions and the proper utilization of tools. Although level 1 and level 2 organizations may have tools at their disposal, they usually do not apply them consistently or correctly (sometimes they linger as “shelf-ware”). Tools are used by level 3 organizations to create data models, to automate DBA steps initiated by level 2 organizations (e.g., schema migrations), and to begin proactively monitoring and tuning database performance. Approximately 10% to 15% of organizations operate at Level 3.
Level 4 – The Managed Level
An organization can move to level 4 only when it institutes a managed meta data (data about data) environment. This enables the data management group (DA and DBA) to catalog and maintain meta data for corporate data structures. It also provides the application development and end-user staffs access to what data exists where within the organization (along with definitions, synonyms, homonyms, etc.). The data management group is involved (at some level) in all development efforts to assist them in the cataloging of meta data and reduction of redundant data elements (in logical models always; in physical models as appropriate for performance and project requirements). Level 4 organizations have begun to do data audits to gauge production data quality.
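As a rough illustration of what cataloging meta data might look like, the following Python sketch models a tiny in-memory catalog that records where a data element lives and resolves synonyms to one canonical name. The classes DataElement and MetadataCatalog, their fields, and the sample location strings are all hypothetical; a production repository would be a persistent, multi-user product, and homonym handling is deliberately left out here.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """One cataloged data element (illustrative structure only)."""
    name: str                 # canonical business name
    definition: str           # business meaning of the element
    locations: list = field(default_factory=list)  # where it is stored
    synonyms: set = field(default_factory=set)     # other names in use


class MetadataCatalog:
    """Minimal in-memory catalog with case-insensitive synonym lookup."""

    def __init__(self):
        self._elements = {}
        self._alias_index = {}

    def register(self, element):
        self._elements[element.name] = element
        # Index the canonical name and every synonym to the same element.
        for alias in {element.name, *element.synonyms}:
            self._alias_index[alias.lower()] = element.name

    def lookup(self, term):
        """Return the element known by this name or synonym, else None."""
        canonical = self._alias_index.get(term.lower())
        return self._elements.get(canonical) if canonical else None
```

A developer searching for CUST_NO would be pointed to the canonical customer_id element and its known locations, which is how such a catalog helps reduce redundant data elements.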
The success of the level 4 organization depends on the buy-in of upper management to support the “data is a corporate asset” maxim. This involves treating data as the organization treats its other assets (personnel, finances, buildings, finished goods, etc.). Advanced tools are utilized to manage meta data (repository), data quality (transformation engines) and databases (agent-based monitors, centralized consoles for heterogeneous database administration, etc.). Approximately 5% to 10% of organizations operate at Level 4.
Level 5 – The Optimizing Level
The level 5 organization uses the practices evolved in levels 1 through 4 to continually improve data access, data quality, and database performance. No change is ever introduced into a production data store without it first being scrutinized by the data management organization and documented within the meta data repository. Level 5 organizations are continually trying to improve the process of data management (example: using data modeling tools and the repository to generate physical schemas instead of using them only as documentation and meta data capture tools). Fewer than 5% of organizations operate at Level 5.
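The idea of generating physical schemas from a model, rather than documenting them after the fact, can be sketched in a few lines of Python. The type mapping LOGICAL_TO_PHYSICAL, the function generate_ddl, and the logical type names are assumptions made for this example; real modeling tools use far richer models and target-specific type rules.

```python
# Hypothetical mapping from logical attribute types to physical SQL types.
LOGICAL_TO_PHYSICAL = {
    "identifier": "INTEGER",
    "text": "VARCHAR(255)",
    "amount": "DECIMAL(12,2)",
    "date": "DATE",
}


def generate_ddl(entity, attributes, primary_key):
    """Build a CREATE TABLE statement from a simple logical model.

    entity      -- table name
    attributes  -- list of (attribute_name, logical_type) pairs
    primary_key -- name of the attribute that is the primary key
    """
    columns = []
    for name, logical_type in attributes:
        sql_type = LOGICAL_TO_PHYSICAL[logical_type]  # KeyError if unmapped
        suffix = " PRIMARY KEY" if name == primary_key else ""
        columns.append(f"    {name} {sql_type}{suffix}")
    return f"CREATE TABLE {entity} (\n" + ",\n".join(columns) + "\n);"
```

Because the physical schema is derived from the model, the model cannot drift out of date, which is the process improvement the level 5 example describes.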
Synopsis
This short article merely attempts to map the five levels of the CMM to the manner in which data is managed within organizations. Of course, a more exhaustive treatment is required that thoroughly maps the procedures, policies, and characteristics of organizations at each level of the model.