Data Retention – More Value, Less Filling

 

Published in TDAN.com October 2004


The shift from owner to steward

In the past 15 years the volume of data under management by many organizations has grown exponentially. But that’s not the only story. The relative ownership of the data has shifted as well.
Organizations acquired transactional, contact, billing and accounting information on the assumption that they had sole rights to it. They now face the fact that in large part they
are mere stewards of this information, accountable to shareholders, federal and state regulatory agencies, individual customers and internal knowledge workers.

Data management practices originate with business objectives. Data retention practices are driven by data retention policy, established through a legal department in response to data retention
regulations. Storage management considerations come into play based on the type and volume of content required to support the organization. Organizations have seen rapid growth in areas such as
content management, email and other non-relational data, all of which can be governed by data retention rules. Content management growth rates in e-commerce, insurance claims management and
other data-intensive areas have been as high as 100% per year. One company I work with has seen an annual doubling of content-managed information per transaction for the last four years. Once
acquired, this data must be retained, housed and fed until it can be disposed of in compliance with retention policy and regulations. Our original desire to keep everything has come back to haunt
us.


The Challenge

Regulatory rules related to data retention have evolved into a central part of many IT shops’ data management practices. Current rules related to SEC filings, HIPAA and insurance claims affect
virtually all organizations in their financial, HR, environmental or direct business activities. On the horizon is a fresh group of regulatory requirements related to consumer information and
customer contact. The management of these processes has moved from a minor sub-activity to a full-blown organization responsible for data retention oversight, audit and monitoring, with a similar
escalation in associated costs.

 


Less is more

In data retention risk management, less is more. Reducing the total volume of data under management is a key ingredient. Atomic-level, lower-value data that has already been summarized and integrated
for analysis purposes needs to be disposed of as early as possible. Eliminating it does not impact analytics, because the BI-related information has already been summarized or harvested from it. It also
shortens disaster recovery and backup and recovery windows and reduces operational overhead.
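
As a rough illustration of why early disposal need not hurt analytics, the sketch below summarizes atomic rows into monthly totals and then drops the atomic rows older than a cutoff. The row layout, the function names and the sample figures are all hypothetical and chosen only for this example.

```python
from collections import defaultdict
from datetime import date


def summarize_by_month(rows):
    """Roll atomic rows up into one total per calendar month."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["day"].strftime("%Y-%m")] += row["cents"]
    return dict(totals)


def dispose_atomic(rows, cutoff):
    """Keep only the atomic rows dated on or after the cutoff."""
    return [row for row in rows if row["day"] >= cutoff]


# Hypothetical atomic transaction rows (amounts in cents).
rows = [
    {"day": date(2004, 6, 3), "cents": 1000},
    {"day": date(2004, 6, 17), "cents": 500},
    {"day": date(2004, 7, 2), "cents": 250},
]

summary = summarize_by_month(rows)            # harvested before disposal
remaining = dispose_atomic(rows, cutoff=date(2004, 7, 1))
```

The monthly summary is computed before disposal and is unaffected by it, while the volume of atomic data left to back up and recover shrinks.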


Data Lifecycle

The data lifecycle of any data element, entity, record or subject area can be captured and used at the metadata level. This operational metadata describes the specific retention rules that apply to the data.
The retention process then interprets this metadata as its governing rules. The data lifecycle incorporates the major events in the life of the data, such as:

  • Create – The actual creation of the record
  • Update – Modification of the record
  • Distribute – Share data with other services
  • Archive – Physical or logical repositioning of the record
  • Dispose – Physical removal of the data record
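
One way to picture lifecycle rules held as metadata is the minimal sketch below. It assumes a very simple rule shape (an age in days at which a record is archived and an age at which it is disposed of); the class names, field names and the day counts used in any example are hypothetical, not taken from a regulation or product.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class LifecycleEvent(Enum):
    """The five lifecycle events listed above."""
    CREATE = "create"
    UPDATE = "update"
    DISTRIBUTE = "distribute"
    ARCHIVE = "archive"
    DISPOSE = "dispose"


@dataclass(frozen=True)
class RetentionMetadata:
    """Hypothetical operational metadata for one subject area."""
    subject_area: str
    archive_after_days: int
    dispose_after_days: int


def due_event(meta, created_on, today):
    """Return the ARCHIVE or DISPOSE event that is due, or None if neither is."""
    age_days = (today - created_on).days
    if age_days >= meta.dispose_after_days:
        return LifecycleEvent.DISPOSE
    if age_days >= meta.archive_after_days:
        return LifecycleEvent.ARCHIVE
    return None
```

For instance, with `RetentionMetadata("claims", 365, 2555)`, a record created on 1 January 2000 would be due for archiving by mid-2001 and for disposal by 2008.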

 

Additional lifecycle events may be required based on specific industry or regulatory needs. The rules behind these processes are not always static and may contradict one another. In litigation, halt
destruct orders will supersede regulatory retention periods. The implemented rules must work in concert, with precedence assigned to each rule so that conflicts can be resolved.
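
The precedence idea can be sketched as follows. This is only an illustration under assumed conventions: each rule carries a numeric precedence (higher wins), and a rule with no disposal date means "do not dispose". The rule names and numbers are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str
    precedence: int               # assumption: the higher number wins
    dispose_on: Optional[date]    # None means disposal is blocked


def governing_rule(rules):
    """Pick the single rule that governs when several apply."""
    return max(rules, key=lambda rule: rule.precedence)


def may_dispose(rules, today):
    """True only if the governing rule allows disposal on this date."""
    if not rules:
        return False  # conservative choice: no rule, no disposal
    rule = governing_rule(rules)
    return rule.dispose_on is not None and today >= rule.dispose_on


regulatory = Rule("regulatory-retention", precedence=10, dispose_on=date(2011, 10, 1))
hold = Rule("halt-destruct-order", precedence=100, dispose_on=None)
```

With only the regulatory rule in play, `may_dispose` allows disposal once the retention date has passed; adding the halt destruct order blocks it regardless of the date.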


Audit and Reporting

Data retention requirements need to include some form of historical tracking of the data lifecycle events. This can be as simple as a catalog of disposal dates or a more complete form that catalogs
the incremental lifecycle events. This audit component can also demonstrate a well-thought-out data management strategy integrated with the comprehensive lifecycle events. The reporting services
can be used in-house to find the current retention status of a given piece of data or for external and audit reporting services. The ability to audit and report on the current lifecycle status of
data can become very significant when an entire class of data comes under review. The ability to provide rapid access to the status of transactions from a data retention standpoint can fend off
unnecessary investigation and discovery in sensitive data areas.
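
A minimal sketch of such an audit component might look like the following: an append-only trail of lifecycle events that can answer "what is the current status of this record?" and "what was disposed of in this period?". The entry fields, class names and event strings are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AuditEntry:
    record_id: str
    event: str          # e.g. "create", "archive", "dispose"
    occurred_on: date
    rule_name: str      # the retention rule that drove the event


class AuditTrail:
    """Append-only history of lifecycle events (in-memory for illustration)."""

    def __init__(self):
        self._entries = []

    def record(self, entry):
        self._entries.append(entry)

    def history(self, record_id):
        """All events for one record, oldest first."""
        matches = [e for e in self._entries if e.record_id == record_id]
        return sorted(matches, key=lambda e: e.occurred_on)

    def current_status(self, record_id):
        """The most recent lifecycle event, or None if the record is unknown."""
        events = self.history(record_id)
        return events[-1].event if events else None

    def disposed_between(self, start, end):
        """Record ids disposed of within the inclusive date range."""
        return [e.record_id for e in self._entries
                if e.event == "dispose" and start <= e.occurred_on <= end]
```

The simple catalog of disposal dates described above corresponds to `disposed_between`; the fuller form corresponds to `history`.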


Solutions

An implementation of the data retention process needs to provide several key services:

  1. A standard data retention lifecycle covering, at a minimum, all data subject to the organization’s data retention policy.
  2. A rules-based engine that can determine the retention status of a piece of data based on the data lifecycle.
  3. An audit service that retains information about retention-regulated data as it transitions through its lifecycle.
  4. An audit service that retains information about the data retention rules enforced at a given time.
  5. A reporting service that can leverage information from the audit service for analysis of a specific piece of data or a collection of data.
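
Service 4 in the list above, knowing which rule was enforced at a given time, can be sketched with effective-dated rule versions. The example assumes a rule is nothing more than a retention period in years; the class name and the sample values are hypothetical.

```python
from bisect import bisect_right
from datetime import date


class RuleHistory:
    """Effective-dated versions of one retention rule."""

    def __init__(self):
        self._effective = []   # effective-from dates, kept sorted
        self._years = []       # retention period for each version

    def add_version(self, effective_from, retention_years):
        """Register a rule version that takes effect on the given date."""
        i = bisect_right(self._effective, effective_from)
        self._effective.insert(i, effective_from)
        self._years.insert(i, retention_years)

    def in_force(self, on):
        """Retention period enforced on a date, or None before the first version."""
        i = bisect_right(self._effective, on)
        return self._years[i - 1] if i else None


history = RuleHistory()
history.add_version(date(2003, 1, 1), 7)   # versions may arrive out of order
history.add_version(date(2000, 1, 1), 3)
```

Keeping old versions rather than overwriting them lets an auditor confirm that a disposal carried out in 2002 followed the rule in force in 2002, not the rule in force today.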

The Integrated Data Retention Process

Several solutions are currently available for implementing a complete data retention process. These include:

  • Manual Development – Manual scripting of various processes to identify the current retention status of each data row under management.
  • Commercial Off The Shelf (COTS) solutions – Most of the major storage vendors have data retention offerings. These offerings tend to focus on “content management” oriented
    data and have avoided the complexities of some relational data challenges. Most of these solutions are bundled as part of an underlying storage solution. They do offer audit trail support, and to
    varying degrees, the rules retention engine. There are several off the shelf solutions that are not specifically tied to storage or content management. These include Outerbay Technology and
    Princeton Softech. Both of these solutions are stand-alone and offer a rules based / data lifecycle data retention strategy.
  • ETL and EAI Tools – Data retention can be viewed through an ETL engine as the reverse of a data loading or data acquisition process. It is therefore possible to use these core ETL/EAI
    components as our rules engines and develop the required audit and reporting services separately. By the nature of these tools, they can represent at least a partial data retention solution for
    those organizations using them for data loading purposes.
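
To make the "reverse of a data load" idea concrete, here is a small sketch that uses Python's built-in `sqlite3` module in place of an ETL tool. It extracts rows older than a cutoff from a live table, loads them into an archive table and deletes them from the source in one transaction. The table names, columns and sample rows are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a live table and an archive table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE claims (id INTEGER PRIMARY KEY, created_on TEXT, payload TEXT)")
    conn.execute(
        "CREATE TABLE claims_archive (id INTEGER PRIMARY KEY, created_on TEXT, payload TEXT)")
    conn.executemany(
        "INSERT INTO claims VALUES (?, ?, ?)",
        [(1, "1996-03-01", "a"), (2, "2001-08-15", "b"), (3, "2004-09-30", "c")])
    conn.commit()
    return conn


def archive_old_rows(conn, cutoff):
    """Move rows created before cutoff (ISO date string) to the archive table.

    Both statements run in one transaction, so a failure leaves the
    live table untouched. Returns the number of rows moved.
    """
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT INTO claims_archive (id, created_on, payload) "
            "SELECT id, created_on, payload FROM claims WHERE created_on < ?",
            (cutoff,))
        moved = conn.execute(
            "DELETE FROM claims WHERE created_on < ?", (cutoff,)).rowcount
    return moved
```

The audit and reporting services would still need to be built separately, for example by writing one audit entry per moved row inside the same transaction.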


Summary

For years we have focused on acquiring and making data available for the various user communities. Due to regulatory requirements and the cost of retention, new strategies need to be developed that
ensure that data can be managed and disposed of in a traceable and orderly fashion. Data retention solutions should include the following components:

  1. Documented data retention rules agreed upon by senior management and legal.
  2. A rules retention engine capable of examining various types of relevant information and developing a disposition.
  3. Detailed audit and reporting services capable of tracking and reporting on changes in each record’s data lifecycle.
  4. Mechanisms for automating the management of lifecycle status changes, archiving and disposal processes.

The data retention process should be viewed as an enterprise service shared by the various information management groups. This integrated approach will ensure a uniform and consistent application
of the rules and processes governed by the data retention requirements.

Please send comments to John at Jmurphy1@mindspring.com.

References

http://www.hipaadvisory.com/regs/
http://www.sec.gov/about/laws.shtml
http://www.sec.gov/rules/interp/34-47806.htm
http://www.princetonsoftech.com/index.asp

About John Murphy

John is a 1975 graduate of Bridgewater State College, Bridgewater, Massachusetts. Following a brief career as a public and private school teacher, John went to work for Core Laboratories as a geo technician operating early computerized well logging units on the Gulf Coast for companies such as Gulf, Exxon and BP. John then joined the R&D staff of Teleco Oilfield Services, a subsidiary of Southern Natural Gas, forming their first data integration and analysis department while building early relational analytical data models, integrating drilling, formation and production data. In the late 80’s John worked as a consultant to the Department of the Army in building the Department of the Army Data Dictionary and the Department of Defense Data Repository System, two early metadata repositories. John also worked with the Defense Information Systems Agency (DISA) on data standardization, data modeling and enterprise data development practices.

John became an independent consultant in 1992. From there, John applied his knowledge of metadata, data architecture and data standardization to developing enterprise data design and management practices at companies such as Qwest Communications, Jeppeson Sanders Flight Information Systems, Interactive Video Enterprises, the Federal Aviation Administration, Cigna Health Care, Safeco Insurance, Marriott International and Ford Motor Corporation. Mr. Murphy provided design and architectural support for several large-scale initiatives including the Canadian ISPR migration, the Mexican National Retirement Systems (Processar) and early internet marketing ventures with Pacific Telesis. John has developed several e-marketing models for view / visit and navigational analysis along with wireless call switch analysis. Both forms of analysis focus on data clean-up and reduction. John also developed several data visualization and analytical processes for the rapid identification and analysis of data anomalies.

Through the remainder of the 90’s and to the present, John has continued consulting in the areas of data warehousing, database architecture, data standardization, data modeling and data migration for companies such as AT&T Broadband, USBank, Marconi Communication, Cigna Health Care and SUN Computers. John’s recent work has focused on data cleansing and standardization based on detailed metadata modeling.
