The shift from owner to steward
In the past 15 years the volume of data under management by many organizations has grown exponentially. But that’s not the only story. The relative ownership of the data has shifted as well.
Organizations acquired transactional, contact, billing and accounting information on the assumption that they held sole rights to it. They now face the fact that in large part they
are mere stewards of this information, accountable to shareholders, federal and state regulatory agencies, individual customers and internal knowledge workers.
Data Management practices originate with business objectives. Data retention practices are driven by data retention policy established through a legal department in response to data retention
regulations. Storage management considerations come into play based on the type and volume of content required to support the organization. Organizations have seen rapid growth in areas such as
content management, email and other non-relational data, which can be governed by data retention rules. Content management growth rates in the areas of e-commerce, insurance claims management and
other data-intensive areas have been as high as 100% per year. One company I work with has seen an annual doubling of content-managed information per transaction for the last four years. Once
acquired, this data must be retained, housed and fed until it can be disposed of in compliance with retention policy and regulations. Our original desire to keep everything has come back to haunt
us.
The Challenge
Regulatory rules related to data retention have evolved into a central part of many IT shops' data management practices. Current rules related to SEC filings, HIPAA and insurance claims affect
virtually all organizations in their financial, HR, environmental or direct business activities. On the horizon is a fresh group of regulatory requirements related to consumer information and
customer contact. The management of these processes has moved from a minor sub-activity to a full-blown organization responsible for data retention oversight, audit and monitoring, with a similar
escalation in associated costs.
Less is more
In data retention risk management, less is more. Reduction in the total volume of data under management is a key ingredient. Atomic-level, lower-value data that has been summarized and integrated
for analysis purposes needs to be disposed of as early as possible. This elimination does not impact analytics, because the BI-related information has already been summarized or harvested. It also
reduces the time needed for disaster recovery and for backup and recovery, and lowers operational overhead.
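As a rough illustration of summarizing first and disposing second, the sketch below uses Python's built-in sqlite3 module. The table names (transactions, daily_summary), the columns and the cutoff date are all hypothetical, chosen only for this example; a real implementation would take its cutoff from the retention policy.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical atomic and summary tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (id INTEGER PRIMARY KEY, txn_date TEXT, amount REAL)"
    )
    conn.execute(
        "CREATE TABLE daily_summary "
        "(txn_date TEXT PRIMARY KEY, txn_count INTEGER, total_amount REAL)"
    )
    conn.executemany(
        "INSERT INTO transactions (txn_date, amount) VALUES (?, ?)",
        [("2003-01-05", 10.0), ("2003-01-05", 15.0),
         ("2003-02-01", 7.5), ("2004-06-30", 20.0)],
    )
    conn.commit()
    return conn


def summarize_and_dispose(conn, cutoff_date):
    """Roll atomic rows older than cutoff_date into daily totals, then delete them.

    Dates are ISO strings (YYYY-MM-DD), so plain string comparison orders them.
    Returns the number of atomic rows removed.
    """
    with conn:  # single transaction: the summary and the disposal commit together
        conn.execute(
            """
            INSERT INTO daily_summary (txn_date, txn_count, total_amount)
            SELECT txn_date, COUNT(*), SUM(amount)
            FROM transactions
            WHERE txn_date < ?
            GROUP BY txn_date
            """,
            (cutoff_date,),
        )
        cur = conn.execute(
            "DELETE FROM transactions WHERE txn_date < ?", (cutoff_date,)
        )
    return cur.rowcount
```

Running the summary and the delete in one transaction means the atomic rows are never removed unless their totals were captured, which is the property the analytics depend on.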
Data Lifecycle
The data lifecycle of any data element, entity, record or subject area can be captured and used at the metadata level. This operational metadata describes specific retention rules related to data.
This metadata is then interpreted as the governing rules for the retention process. Data lifecycle incorporates the major events of the data such as:
- Create – The actual creation of the record
- Update – Modification of the record
- Distribute – Share data with other services
- Archive – Physical or logical repositioning of the record
- Dispose – Physical removal of the data record
Additional lifecycle events may be required based on specific industry or regulatory needs. The rules behind these processes are not always static and may be contradictory. In litigation, halt
destruct orders will supersede regulatory retention periods. The implemented rules must work in concert, with precedence assigned among rules in case of conflict.
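One way to picture retention rules held as metadata, with precedence deciding conflicts, is the minimal Python sketch below. The RetentionRule fields, the numeric priority convention (higher wins) and the day counts are assumptions made for illustration; they are not taken from any regulation or product.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    RETAIN = "retain"
    ARCHIVE = "archive"
    DISPOSE = "dispose"
    HOLD = "hold"  # a halt destruct order is in force


@dataclass(frozen=True)
class RetentionRule:
    """Hypothetical retention metadata attached to a class of records."""
    name: str
    priority: int                              # higher number wins on conflict
    archive_after_days: Optional[int] = None
    dispose_after_days: Optional[int] = None
    hold: bool = False                         # True for a litigation hold


def retention_status(created, today, rules):
    """Return the status dictated by the highest-priority applicable rule."""
    if not rules:
        return Status.RETAIN  # assumption: keep the record when no rule applies
    rule = max(rules, key=lambda r: r.priority)
    if rule.hold:
        return Status.HOLD
    age_days = (today - created).days
    if rule.dispose_after_days is not None and age_days >= rule.dispose_after_days:
        return Status.DISPOSE
    if rule.archive_after_days is not None and age_days >= rule.archive_after_days:
        return Status.ARCHIVE
    return Status.RETAIN
```

For example, a record old enough to be disposed of under a regulatory rule stays on HOLD as long as a higher-priority hold rule is in the list passed to retention_status. Removing the hold rule restores the regulatory outcome without any change to the record itself.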
Audit and Reporting
Data retention requirements need to include some form of historical tracking of the data lifecycle events. This can be as simple as a catalog of disposal dates or a more complete form that catalogs
the incremental lifecycle events. This audit component can also demonstrate a well-thought-out data management strategy integrated with the comprehensive lifecycle events. The reporting services
can be used in-house to find the current retention status of a given piece of data or for external and audit reporting services. The ability to audit and report on the current lifecycle status of
data can become very significant when an entire class of data comes under review. The ability to provide rapid access to the status of transactions from a data retention standpoint can fend off
unnecessary investigation and discovery in sensitive data areas.
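The two forms of tracking described above, a simple catalog of disposal dates and a fuller history of incremental lifecycle events, can be sketched with an append-only log. The class and field names below are hypothetical, and the rule_version field is an assumption added to show how the rule set in force at each event might be recorded.

```python
from dataclasses import dataclass
from datetime import datetime

# Lifecycle events named in the Data Lifecycle section.
EVENTS = ("create", "update", "distribute", "archive", "dispose")


@dataclass(frozen=True)
class LifecycleEvent:
    record_id: str
    event: str
    at: datetime
    rule_version: str  # assumed label for the rule set in force at the time


class AuditLog:
    """Append-only history of lifecycle events (in memory, for illustration)."""

    def __init__(self):
        self._events = []

    def record(self, record_id, event, at, rule_version):
        if event not in EVENTS:
            raise ValueError(f"unknown lifecycle event: {event}")
        self._events.append(LifecycleEvent(record_id, event, at, rule_version))

    def history(self, record_id):
        """Full incremental history for one record, in insertion order."""
        return [e for e in self._events if e.record_id == record_id]

    def current_status(self, record_id):
        """Most recent event for the record, or None if it was never seen."""
        events = self.history(record_id)
        return max(events, key=lambda e: e.at).event if events else None

    def disposal_catalog(self):
        """The simple form: record id mapped to its disposal timestamp."""
        return {e.record_id: e.at for e in self._events if e.event == "dispose"}
```

A production audit service would persist these events in durable, tamper-evident storage. The in-memory list here only shows the shape of the data and the two kinds of query.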
Solutions
The solution to the data retention process implementation needs to focus on several key services:
- A standard data retention lifecycle for at least all the data subject to the data retention policy of the organization.
- A rules based engine that can determine the retention status of a piece of data based on the data lifecycle.
- An audit service that retains information about data subject to retention regulation as it transitions through its lifecycle.
- An audit service that retains information about the data retention rules enforced at a given time.
- A reporting service that can leverage information from the audit service for analysis on a specific piece of data or a collection of data.
Several solutions are currently available for implementing a complete data retention process. These include:
- Manual Development – Manual scripting of various processes to identify the current retention status of each data row under management.
- Commercial Off The Shelf (COTS) solutions – Most of the major storage vendors have data retention offerings. These offerings tend to focus on “content management” oriented
data and have avoided the complexities of some relational data challenges. Most of these solutions are bundled as part of an underlying storage solution. They do offer audit trail support, and to
varying degrees, the rules retention engine. There are several off the shelf solutions that are not specifically tied to storage or content management. These include Outerbay Technology and
Princeton Softech. Both of these solutions are stand-alone and offer a rules based / data lifecycle data retention strategy.
- ETL and EAI Tools – Data retention can be viewed through an ETL engine as the reverse of a data loading or data acquisition process. It is therefore possible to use these core ETL/EAI
components as our rules engines and develop the required audit and reporting services separately. By the nature of these tools, they can represent at least a partial data retention solution for
those organizations using them for data loading purposes.
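The idea of retention as a data load run in reverse can be sketched in a few lines of Python. The function names, the row layout and the choice of fields kept in the archive are assumptions for illustration; they do not represent the behavior of any particular ETL or EAI product.

```python
from datetime import date


def extract(source, cutoff):
    """Extract step: select rows whose retention period has lapsed."""
    return [row for row in source if row["created"] < cutoff]


def transform(row):
    """Transform step: keep only the fields the archive needs (assumed set)."""
    return {"id": row["id"], "created": row["created"].isoformat()}


def unload(source, archive, cutoff):
    """Load step in reverse: write to the archive, then remove from the source.

    Mutates both lists in place and returns the number of rows moved.
    """
    expired = extract(source, cutoff)
    archive.extend(transform(row) for row in expired)
    expired_ids = {row["id"] for row in expired}
    source[:] = [row for row in source if row["id"] not in expired_ids]
    return len(expired)
```

The audit and reporting services would sit alongside a pipeline like this, recording each row moved, since the ETL components themselves supply only the rules-driven movement of data.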
Summary
For years we have focused on acquiring and making data available for the various user communities. Due to regulatory requirements and the cost of retention, new strategies need to be developed that
ensure that data can be managed and disposed of in a traceable and orderly fashion. Data retention solutions should include the following components:
- Documented data retention rules agreed upon by senior management and legal.
- A rules retention engine capable of examining various types of relevant information and developing a disposition.
- Detailed audit and reporting services capable of tracking and reporting on the changes in the record's data lifecycle.
- Mechanisms for automating the management of lifecycle status changes and the archiving and disposal processes.
The data retention process should be viewed as an enterprise service shared by the various information management groups. This integrated approach will ensure a uniform and consistent application
of the rules and processes governed by the data retention requirements.
Please send comments to John at Jmurphy1@mindspring.com.
References
http://www.hipaadvisory.com/regs/
http://www.sec.gov/about/laws.shtml
http://www.sec.gov/rules/interp/34-47806.htm
http://www.princetonsoftech.com/index.asp