Data Integrity in a New Light

Published in TDAN.com April 2005

One of the main areas of responsibility for any data steward is the enforcement of data integrity. Most data administration texts define data integrity as “attention to the consistency, accuracy
and correctness of data stored in a database or other electronic file” (Watson, R., “Data Management”, Wiley, 2000). Commonly, data integrity refers to the validity of data in all its incarnations
(electronic, paper, etc.). This approach is primarily reactive: it focuses on the rules used to create and store data, so that the “right” value is created and stored for each data element.

However, data stewardship in its most robust and active form should be concerned with much more than simply enforcing rules for creating and storing “right” data values. A steward is
responsible for the proper use and welfare of the assets under his or her stewardship (Brackett, M. H., “Data Resource Quality”, Addison-Wesley, 2000). If stewards are to fulfill these
responsibilities, perhaps we should adopt an alternative definition of “data integrity”: one whose goal is the use and presentation of data without bias, refusing to allow data to support one
point of view to the exclusion of any competing view.

Webster’s dictionary defines “integrity” as “firm adherence to a code, a set of moral values, honesty and incorruptibility”. Therefore, “data integrity” can be defined as “using data according to
a code or set of values, with honesty”. Data stewards can serve as the foundation of this new approach to data integrity through their oversight of the data rules, values and access to the data in
their areas of responsibility. By teaching data stewards about the ethical and honest uses of the data in their areas, organizations can formulate and articulate the acceptable uses of their
data, and the consequences of unacceptable uses. Some commonly accepted prohibitions on the use of data, taken from Cornell University’s Information Use in the 21st Century project, include:

  • Do not use information (even if authorized to access it) to support actions by which individuals might profit (e.g., a change in salary, title, or similar administrative category). Do not
    disclose information about individuals without prior supervisor authorization.
  • Do not engage in what might be termed “administrative voyeurism” (e.g., tracking the pattern of salary raises; determining the source and/or destination of telephone calls or Internet
    protocol addresses; exploring race and ethnicity indicators; tracking internal stock purchases), unless authorized to conduct such analyses for stated business efforts.
  • Do not circumvent the nature or level of data access given to others by providing access or data sets that are broader than those available to them via their own approved levels of access
    (e.g., providing a company-wide data set of human resource information to a coworker who only has approved access to a single human resource department), unless authorized.
  • Do not facilitate another’s illegal access to the company’s administrative systems, or compromise the integrity of the systems’ data, by sharing your passwords or other information.

The ethical use of data also applies to the creation, analysis and interpretation of data in reports and similar documents, especially those drawn from a data warehouse. Presenting data
from a predetermined view (e.g., looking for evidence to support an already-chosen outcome or result) is a common activity in organizations, even in organizations that espouse the traditional
definition of “data integrity”. These reports are written by analysts who can be considered “custodians” of data, defined as “anyone who has access to, receives or dispenses data” (Watson,
2000). As custodians, these analysts should be expected to present data in an unbiased format, so that the final recipients of the results or report can draw their own conclusions. Giving the
final users of the data the opportunity and freedom to make decisions and form conclusions from unbiased data should be a goal of all data stewards, in transactional systems as well as in
decision support and data warehousing systems.

Why is the impartial presentation of data an issue for data warehouse data stewards? Data drawn from a data warehouse can be combined in ways that traditional systems do not anticipate, since the
dimensional view of data in a data warehouse allows users and analysts to relate previously unrelated values. These new relationships can result in a presentation of data that is slanted toward
or against a particular outcome, and this bias may be invisible to the eventual report reader.

Allowing analysts to display data impartially is considered a core integrity value in many organizations that have adopted Peter Block’s “Organizational Stewardship” approach, since it gives the
report reader (i.e., the final user) the opportunity to use the data as he or she wishes, without the need to filter the analysis through a distant analyst’s prism. Block’s view of stewardship can be
succinctly defined as “giving order to the dispersion of power, moving choice, resources and control to the edges of the organization where actual activity occurs” (Block, P., “Stewardship”,
Berrett-Koehler, 1993). Recently, some investment firms have used this unbiased approach to overcome the impression that their analysts had presented past results to external customers in
less-than-impartial terms. The impartial presentation of the facts, together with the opportunity for the final user to draw fact-based conclusions, could become an objective in the performance
measurement of data warehousing and other decision support systems.

However, attempts to present data without bias can be taken to an extreme, resulting in analysts abdicating their responsibility to provide advice and guidance in interpreting complicated
data. “Impartial” does not have to mean “without any attempt at clarification or examination”, since many uses of data require some interpretation and deduction to be actionable. Analysts, and
data custodians in general, should strive for a balance between the raw presentation of data and the tendency to slant the presentation to serve a predetermined outcome or decision. Data
stewards can help analysts and other custodians develop this balanced approach by drafting guidelines for data integrity that include this impartial yet advisory presentation of data. Stewards
can lead this effort by educating all members of the organization about the need to use data with integrity, and by facilitating discussions on how to present unbiased yet examined analyses of
the organization’s data to internal and external customers.

In conclusion, adopting a new definition of “data integrity” could expand awareness of the need for active data stewardship within organizations and within the data warehouse. Data stewards
can foster this approach to data integrity by communicating that data can be presented and used impartially, and by exhibiting the principles of true data integrity in their development of
standards, definitions and guidelines for data usage.

About Anne Marie Smith, Ph.D.

Anne Marie Smith, Ph.D., is an acclaimed data management professional, consultant, author and speaker in the fields of enterprise information management, data stewardship and governance, data warehousing, data modeling, project management, business requirements management, IS strategic planning and metadata management. She holds a doctorate in Management Information Systems, and is a certified data management professional (CDMP), a certified business intelligence professional (CBIP), and holds several insurance certifications.

Anne Marie has served on the board of directors of DAMA International and on the board of the Insurance Data Management Association. She is a member of the MIS faculty of Northcentral University and has taught at several universities. As a thought leader, Anne Marie writes frequently for data and information management publications on a variety of data-oriented topics. She can be reached through her website at http://www.alabamayankeesystems.com and through her LinkedIn profile at http://www.linkedin.com/in/annemariesmith.
