Published in TDAN.com October 2002
In many organizations, analysis continues to be an area requiring constant justification. I.T. organizations are constantly swimming against the flow of hype directed at business users by
suppliers of I.T. products, who have promised reduced development time, working prototypes, rapid application development and design, and so on for at least the past two decades. Even the new crop of
object-oriented products and methodologies has failed to realistically represent the true effort required for a medium to large I.T. project.
As data professionals, we have come to realize the importance of a solid data strategy at an enterprise level and consistent data modelling at a project level. Communicating this importance to
those who hold the purse-strings can be a frustrating exercise in our current corporate environment of short-term profits and shareholder appeasement.
In many cases, relating data modelling directly to the “bottom line” may be the only way to communicate the importance of this activity to our sponsors and clients. Most of us have written our data
modelling handbooks, implemented data model reviews, assigned data stewards and generally marketed the data architecture concept. How many of us have attempted a cost/benefit analysis for these
initiatives, or have been able to show where our organization has saved money or time as a result of them?
We need measurements to which actual costs (or time savings) can be applied, so that cost savings can be computed over time. When someone comes to us and asks “How much money will the company
save by doing a data model?”, we need to be able to provide some sort of answer.
Of course, not all data model quality measurements can be directly related to time and money savings. It could be argued that accuracy, consistency and completeness are important quality
characteristics that may not be related to savings in time or cost. It is feasible, however, that over the lifetime of a model these characteristics will indeed contribute to savings, because they
determine how readily the analysis can be re-used.
Many of the metrics described in this article are related to savings in time, both during the modelling process and during subsequent analysis efforts. Direct cost savings are more related to
subsequent re-use of previous analysis and are described in the relevant section.
Measuring Data Model Value
The quality of any deliverable is a function of how well the deliverable meets its intended purpose. In the case of a data model, answers to the following questions are needed in order to create
the metrics to be used to assess the quality of the data model deliverable:
- What is a data model?
- Why is it created?
- What is it used for?
- What is its life expectancy?
The value of a deliverable is measured either directly, based on the performance of the deliverable, or indirectly, based on time or dollars saved in the use of the deliverable.
I will assume that quality and value are related: a higher quality deliverable will demonstrate higher earnings or contribute to time or dollars saved, whereas a lower quality
deliverable may never realize its originally intended potential.
What is a data model?
For the purpose of this discussion, a data model is a representation of the requirements related to the retention of persistent objects.
Why is a data model created?
Data modelling is a specialized form of analysis that is used to gather requirements specific to the retention of persistent objects. As is the case with most forms of analysis, the high-level
reasons for creating a data model are to:
- Ensure the requirements have been met
- Provide a vehicle to assist in the retention of information related to the requirements (metadata)
- Provide a consistent interface for discussion between the client and the analysts
What will a data model be used for?
Any sort of analysis has as its “reason for being” the goal of creating a deliverable with long-term value to the organization. If the deliverable is, for example, a snack, there
is not much use in analysing beyond the decision of which product to select from the vending machine. If the final deliverable is a new organization-wide customer relationship management system, a
thorough process needs to be followed for:
- Collecting requirements
- Validating requirements
- Designing the various modules of the system, taking into account the often unknown longer-term effects of a design
A deliverable may have many uses, depending on the specific situation. In the case of a data model, uses include:
- Estimating the scope of a project or sub-project
- Evaluating recommended solutions
- Designing an OLTP or data warehouse database
- Assessing current data quality
- Creating a metadata repository
How does all of this relate to data model value?
The central themes in this discussion are focused on three characteristics of an analysis-related deliverable:
- Complete representation of requirements
- Retention of collected information
- Consistent interface
Complete representation of requirements
This is the area where the “completeness” of the model is assessed. If the model is complete, it is assumed to include all of the known requirements and to have passed multiple review iterations.
As data models become more generic (or refinement progresses to higher levels of normalization) and the use of patterns when creating a model increases, reviewing the model to assess the support of
business rules becomes more difficult. The review process now becomes:
- Review the definitions with the client to ensure business rules have been collected and accurately documented. Make sure the client knows they are the owners of the definitions.
- Review data types, attribute sizes, domains and valid values with the client to ensure business rules are accurately reflected.
- Review cardinality and optionality with the client to ensure business rules are accurately reflected. It may not be possible to perform this process by walking the client through the
entity/relationship diagram, depending on the level of abstraction that has been used.
- Review the data model with the data architect to ensure standards have been followed and appropriate information has been re-used (no wheels re-invented).
- “Blue-sky” review with the client and the data architect to ensure the model will withstand changes to business requirements or the business environment. This review is potentially the most
difficult and should include client representatives who have a strategic vision.
The measures of quality for the completeness theme are:
- Number of definitions the client takes ownership of. If the client is willing to assume responsibility for the maintenance of the definitions, then it is safe to assume the definitions are
complete and accurate.
- Number of modifications to the model after each review. This is more of a rolling “how well is the modelling process going” measure than an end-state measure of how complete the model is. A
lower number of post-review modifications is an indicator of a higher degree of completeness.
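The post-review modification measure above can be tracked with a few lines of code. This is a minimal sketch; the function names and the example counts are illustrative assumptions, not part of the original article:

```python
# Track how many modifications each review iteration triggered.
# A falling count across iterations suggests the model is
# converging toward completeness.

def modifications_trend(mods_per_review):
    """Return the change in modification count between successive reviews."""
    return [later - earlier
            for earlier, later in zip(mods_per_review, mods_per_review[1:])]

def is_converging(mods_per_review):
    """True if no review produced more changes than the one before it."""
    return all(delta <= 0 for delta in modifications_trend(mods_per_review))

# Example: 14 changes after the first review, 6 after the second,
# 2 after the third -> the trend indicates increasing completeness.
history = [14, 6, 2]
```

A rising count in any iteration would flag the model for closer attention before sign-off.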
Retention of collected information
This area is one of the most important facets of an analysis deliverable and the area which should have the most direct impact on costs.
Meeting the current deliverable is only part of the picture. The ability to re-use the collected information is where actual cost-savings can be realized. In the case of a data model, particularly
one which has an enterprise-wide focus, providing the ability to pull needed entities from an existing model or the ability to refer to previously collected definitions when creating a data mart
will save a tremendous amount of analysis effort for subsequent projects.
The measures of quality for the retention theme are:
- Number of times portions of a model are referenced (on a web page, for example). If the model has been published (as all models should be) and the repository information is easily accessible, the
“number of hits” on each entity (for example) can be a gauge of the usefulness of the originally collected information.
- Number of entities re-used in subsequent projects. This is as much a measure of the quality of the original analysis (and potentially design) as it is a measure of the amount of re-use. Cost
savings for this measure can be calculated based on a “days per entity” number. Total time savings (and related cost savings) would be equal to the “days per entity” multiplied by the number of
entities re-used.
- Time to market for projects. Assuming we were able to re-use an existing database for a second application, the time savings could simply be “days per entity” multiplied by the number of
tables in the existing database.
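The “days per entity” arithmetic above can be sketched as follows. The daily rate and the counts are illustrative assumptions; each organization would substitute its own figures:

```python
def reuse_savings(days_per_entity, entities_reused, cost_per_day):
    """Estimate analysis days and dollars avoided by re-using existing entities.

    days_per_entity: average analysis effort originally spent per entity
    entities_reused: entities (or tables) pulled from an existing model
    cost_per_day:    loaded daily cost of an analyst
    """
    days_saved = days_per_entity * entities_reused
    return days_saved, days_saved * cost_per_day

# Example: 1.5 analysis days per entity, 20 entities re-used,
# at an assumed $800 per analyst day.
days, dollars = reuse_savings(1.5, 20, 800)
```

For the time-to-market variant, the same function applies with the count of tables in the re-used database in place of `entities_reused`.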
Consistent interface
A model serves several purposes. It is first a representation of one or more realities: the current, the future or some combination. This is why it is referred to as a “model”. It is also
a communication tool and a repository of current thinking and, hopefully, strategic (future) direction.
A data model has a limited number of constructs, which include the entity/relationship diagram primitives (boxes and lines) and the specific categories of metadata, such as definitions, datatypes,
domains and valid values.
The limited number of constructs is the data model’s greatest strength because it is much easier to impose consistency when there are a limited number of objects that need to be consistent. It is
also easier to impose a structure on the objects and to use the limited number of objects as a communication tool.
Data models (and the actual data) will out-live the application for which the model was originally developed. Throughout the lifetime of the data, related data models (and metadata) will be used as
references by anyone requiring access to the data or making modifications to applications which access the data. In cases such as data warehouse or data mart projects, where data may be retrieved
from multiple sources, many data models may be referred to. A consistent style across all data models will help project teams find required information quickly.
Style includes standards for diagramming methodology and metadata. As is the case with any useful reference material, diagrams must be neat and easily readable. Definitions should include
consistent headings and sub-headings where required. Standards should be concise, readily available and easy to adhere to.
The measures of quality for the consistency theme are:
- Review time by entity. The time required to review each entity (or definition) should decrease as the reviewers become familiar with the consistent style of the model. A side benefit to
following a consistent style is that subsequent projects will be able to accurately reflect the amount of time required to review a data model in project plans based on the results of past reviews.
- Amount of time spent during subsequent referral to the model. Just as the number of times the model is subsequently referenced is a measure of the retention theme, the amount of time spent when
referencing a specific portion of the model is a measure of the consistency. If the model has followed a consistent interface, subsequent users of the model should be able to find the required
information quickly.
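The review-time measure can be computed per session and compared across sessions. A minimal sketch, where the session figures (minutes spent, entities covered) are illustrative assumptions:

```python
def review_time_per_entity(total_minutes, entities_reviewed):
    """Average review time per entity for one review session."""
    return total_minutes / entities_reviewed

# Three successive review sessions over the same model: if the style is
# consistent, per-entity review time should fall as reviewers become
# familiar with it.
sessions = [(120, 10), (90, 10), (60, 10)]  # (minutes, entities reviewed)
rates = [review_time_per_entity(minutes, n) for minutes, n in sessions]
```

A steady or rising per-entity rate across sessions would suggest the model’s style is not yet consistent enough to speed up familiarized reviewers.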
Describing and implementing a series of processes to ensure quality, such as various stages of model review, while important, provides no actual measurement of the value of the model(s).
A quality assurance program needs to include the expectations for a deliverable and some way of quantifying the level of fit between the deliverable and the expectations. In the case of a data
model, the expectations are the current and future (known and unknown) requirements, and the metrics (for value) are the time savings which can be derived from the various review and re-use
processes described above.
Relating these metrics to cost and time savings can serve as a performance yardstick and an important tool as we strive to implement data architecture initiatives.
The author would like to thank the authors of MEASURING THE QUALITY OF MODELS, John C. Claxton & Peter A. McDougall, whose article provided inspiration for MEASURES OF DATA MODEL VALUE.