8 Ways to Discover Personal Data for GDPR

As many companies around the globe struggle to meet the new GDPR requirements, we are all receiving a flurry of opt-in emails asking for permission to continue to process Personal Data.

Unfortunately, one of the major difficulties encountered by organizations engaged in the consent process is completing the seemingly simple task of locating and categorizing Personal Data held across the enterprise. Often this is referred to as an Information Audit or Data Readiness exercise. For those who have been using large ERP or CRM application packages to store and process data, this challenge is even more acute.

Silwood Technology recently conducted research into five of the largest and most widely used information application packages. This revealed that the job facing organizations who are still trying to locate Personal Data in the coming weeks is significant. In SAP there are more than 900,000 fields, 140,000 in JD Edwards, and 100,000 in Microsoft Dynamics AX 2012 that may (or may not) contain personal information that requires detection and risk assessment. In short, businesses that are not well-advanced in data discovery or are undertaking manual discovery processes will not be ready on time for GDPR.

Many organizations are addressing the Personal Data challenge through software such as Information or Data Catalogue solutions within their overall Governance or Compliance program. These often incorporate some form of scanner or crawler that connects to many sources, identifies the metadata, and imports it automatically. Others may be using spreadsheets or other home-grown solutions to try to record Personal Data locations and understand how data flows through their organization.

These solutions can be very effective for some IT systems. However, they will not be as successful for organizations running enterprise CRM or ERP applications from SAP, Oracle, Salesforce, Microsoft, or other large application packages unless they incorporate specialist discovery software designed for the task. This is due to the size, complexity, and level of customization of the underlying data landscape of these systems as evidenced by the research.

Here we explore the eight main strategies available for organizations identifying Personal Data when starting the GDPR compliance process. Unfortunately, many are extremely time consuming and rely on extensive manual interrogation of databases and systems, a luxury that is not available to any enterprise that is not well advanced in the process.

1. Looking For Documentation

Looking for documentation may seem a natural first port of call when trying to understand how to find Personal Data items in an application. However, even if the data models do exist in this static way, they will be of only limited use in anything but smaller, perhaps home-grown applications with simple data structures.

For those with large-scale ERP and CRM packages, navigating documentation to find individual tables and attributes from amongst thousands will be a significant challenge. In addition, any useful information cannot be shared easily with other tools because it has to be re-keyed.

2. Manual Investigation

This typically involves someone tasked with scouring the relational database (RDBMS) system catalogue for any information that might provide clues as to what data the tables contain, what attributes and fields they include, and, crucially, how the tables relate to one another.

This is a perfectly acceptable approach for small database systems, where a package is limited in scope or has been developed in-house, but is very labor-intensive in larger systems that have many tables which do not have useful business names or descriptions.
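For a small database, a manual catalogue review can start from a simple listing of every table and column. The following is a minimal sketch only, assuming a SQLite database; the demo tables (including the deliberately cryptic `ZC01` with columns `CNR`, `NM1`, and `TL1`) are invented for illustration, and other RDBMS products expose their catalogues through different views.

```python
import sqlite3


def list_columns(conn):
    """Return (table, column, declared_type) for every user table."""
    rows = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
        for _cid, name, col_type, *_rest in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            rows.append((table, name, col_type))
    return rows


def build_demo_db():
    """Create an in-memory database with invented tables (illustration only)."""
    conn = sqlite3.connect(":memory:")
    # A cryptic, package-style table: nothing in the names says "customer".
    conn.execute("CREATE TABLE ZC01 (CNR TEXT PRIMARY KEY, NM1 TEXT, TL1 TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, CNR TEXT, amount REAL)"
    )
    return conn


for table, column, col_type in list_columns(build_demo_db()):
    print(f"{table}.{column} ({col_type})")
```

Even in this tiny example, the listing alone gives no hint that `NM1` might hold a customer name, which is why the approach becomes so labor-intensive as the number of tables grows.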

3. Turning to Application or Technical Specialists

Specialists are likely to have the most familiarity with the application and its underlying data model. They are also most likely to have access to any technical tools that are provided by software vendors, which can be used to locate the information required. However, their knowledge of the business context of a request for Personal Data may not necessarily be complete, and such specialists are often busy and in very short supply.

4. Hiring External Consultants

Another common approach is to engage external consultants. They may provide an expert resource; however, there can be a significant cost, and consultants need time to familiarize themselves with the data landscape and its customizations. In addition, relying on them can contribute to lower in-house knowledge levels in the long term.

5. Metadata Driven Software Approach

Using software to identify the metadata associated with Personal Data across an organization’s IT ecosystem can make the discovery process considerably faster and more effective. Many data catalogue and governance products have facilities to connect to source systems and import their metadata directly so that it can be investigated more fully. Automating this process reduces the opportunity for error as there is only very limited manual intervention.

On its own, this approach is far less effective for large CRM and ERP systems because of the size, complexity, and level of customization of their data landscapes. There are, however, a few advanced self-service metadata discovery tools that provide a view into the metadata of these packages and allow users to navigate it, search for Personal Data attributes, and subset them into appropriate categories. That information can then be shared with Data Catalogue or Governance products, or even used with Excel.

Metadata-based solutions can accelerate Personal Data discovery considerably, especially when compared to entirely manual or semi-automated processes.
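To make the idea concrete, the sketch below flags candidate Personal Data attributes by matching column names and descriptions against a list of terms. It is an illustration under stated assumptions rather than how any particular catalogue product works: the term list, the function name `flag_candidates`, and the sample metadata are all hypothetical.

```python
import re

# Hypothetical term list; a real exercise would agree this with the
# people responsible for data protection.
PERSONAL_DATA_TERMS = ("name", "email", "phone", "address", "birth", "passport")


def flag_candidates(columns, terms=PERSONAL_DATA_TERMS):
    """Return (table, column, matched_term) for columns whose name or
    description mentions one of the terms."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    flagged = []
    for table, column, description in columns:
        match = pattern.search(f"{column} {description}")
        if match:
            flagged.append((table, column, match.group(0).lower()))
    return flagged


# Invented sample metadata: (table, column, business description)
metadata = [
    ("customer", "email_address", ""),
    ("ZC01", "NM1", "Customer name"),
    ("ZC01", "TL1", ""),  # cryptic name and no description
    ("orders", "amount", "Order value"),
]

for table, column, term in flag_candidates(metadata):
    print(f"{table}.{column} matched '{term}'")
```

The cryptic `TL1` column is missed because it has neither a meaningful name nor a description, which illustrates why metadata matching depends on having business descriptions available for packaged applications.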

6. Internet Search

Using the internet to locate Personal Data attributes is only really of any value when the data models are in a format that can be published, either by vendors or by customers. It would not make much sense to publicly exhibit the data models of one’s own in-house developed system.

However, it is possible to find metadata definitions, for example for well-known social media platforms, and occasionally data models from popular ERP and CRM packages, which might point you in the direction of the Personal Data you are seeking. This is often seen as a viable, low-cost option, but it is labor intensive and of questionable accuracy.

There are also risks. The published information is unlikely to represent the system as implemented by the seeker either through version differences or individual customizations. In addition, it is often necessary to ask a technical specialist to interpret the model and augment it with relevant information from the application itself.

7. Best Guess and Hypothesis Testing

When faced with the problem of Personal Data discovery, many companies use guesswork or hypothesis testing methods to try to find the tables and attributes they need. They rely on data observation, insight, and finding an appropriate starting point from which to launch a search. This strategy can be frustrating, time consuming, and potentially inaccurate.

8. Turning to Software Vendors

Data Modelling tools offer a good solution for finding Personal Data based on their ability to reverse engineer an RDBMS and create a data model from the tables, fields, and relationships they find. From there an analyst can try to find the items needed for GDPR.
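As a rough illustration of what reverse engineering relationships involves, the sketch below reads declared foreign keys from a database catalogue. It assumes SQLite and an invented two-table schema, and the function name is hypothetical; commercial modelling tools do considerably more than this.

```python
import sqlite3


def reverse_engineer_relationships(conn):
    """Return (child_table, child_column, parent_table, parent_column)
    for every declared foreign key."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    relationships = []
    for table in tables:
        # PRAGMA foreign_key_list yields
        # (id, seq, table, from, to, on_update, on_delete, match)
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            relationships.append((table, row[3], row[2], row[4]))
    return relationships


def build_demo_schema():
    """Invented schema for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, full_name TEXT)")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customer(id), "
        "amount REAL)"
    )
    return conn


for child, column, parent, parent_column in reverse_engineer_relationships(
    build_demo_schema()
):
    print(f"{child}.{column} -> {parent}.{parent_column}")
```

This only works where relationships are declared in the database itself. A schema that lacks such declarations yields an empty list, leaving the analyst to infer relationships by other means.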

Data Profiling software can also be useful since it provides the ability to look at data formats to determine if they are likely to contain Personal Data. Sometimes this uses a form of machine learning or other analysis techniques to surface what may be relevant.
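A simple form of format-based profiling can be sketched as follows. This is a minimal example under stated assumptions, not a description of any vendor's product: the two regular expressions are deliberately loose, the 80% threshold is arbitrary, and the function name `profile_column` is hypothetical.

```python
import re

# Deliberately loose patterns for illustration; real profiling tools use
# far richer rules and, in some cases, statistical or ML techniques.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE = re.compile(r"^\+?[\d\s().-]{7,20}$")


def profile_column(values, threshold=0.8):
    """Return 'email' or 'phone' if at least `threshold` of the non-empty
    values look like that format, otherwise None."""
    non_empty = [
        str(v).strip() for v in values if v is not None and str(v).strip()
    ]
    if not non_empty:
        return None
    for label, pattern in (("email", EMAIL), ("phone", PHONE)):
        hits = sum(1 for value in non_empty if pattern.match(value))
        if hits / len(non_empty) >= threshold:
            return label
    return None


# Invented sample values for three columns
print(profile_column(["a@example.com", "b@example.org", None, ""]))
print(profile_column(["+44 20 7946 0000", "020 7946 0001"]))
print(profile_column(["12.50", "99.00", "7.25"]))
```

Format checks like these produce false positives (a long numeric identifier can look like a phone number), so the results are best treated as candidates for review rather than confirmed Personal Data.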

ERP and CRM package vendors do have tools which can be used by technical specialists for more traditional database and metadata tasks.

However, the particular challenge of trying to locate Personal Data in large packaged CRM and ERP applications is not adequately met using these approaches. This is because of the lack of meaningful metadata in their database schemas, the size and complexity of the data model, and the number of attributes to be investigated.

Summary

For large organizations really struggling to find Personal Data, it is too late to employ consultants or redirect staff away from their normal tasks to get through the work. Instead, it is time to look to software tools to automate as much of the Personal Data discovery work as possible.

Of course, Personal Data discovery is just one step towards GDPR compliance, but it is a vital one. It is also worth remembering that GDPR compliance is not a one-time event; maintaining compliance will be an essential business process in the future. A manual, intensive approach today may simply delay the inevitable data reckoning – can you afford to ignore the future?

Roland Bullivant