Imagine a coworker logs onto the company’s HR portal to check his benefits and can see your salary! Don’t worry, that won’t happen, because these applications are built on years of business-process expertise. Those processes determine who can see and edit which data. But once that data is brought into a data lake or a data warehouse, setting up data access governance becomes a mighty challenge. Read on to learn how to solve it.
Organizations must protect data to prevent adverse events while still making it available for informed decision-making. The mechanism of providing access to the right data to the right people in a timely manner is called data access governance.
What is Data Access Governance?
Users can access data through a variety of channels: through an application’s UI, directly from a database or data warehouse, or in some cases, while the data is still in transit. Data access governance enables organizations to control, protect, and audit data use to maintain and ensure privacy. It also protects your company’s proprietary information and intellectual property.
As organizations strive for increased analytics, providing data to various project teams, executives, and analysts in a format they can consume is critical. However, there are many common challenges that arise when granting this access and complex reasons why data isn’t freely available.
Data Access Challenges Before Data Platforms
In an organization, data generally exists in separate systems that serve different purposes. For example, an organization may use multiple sales tools, such as Salesforce and SalesLoft, for different use cases. These separate systems are generally called data silos. Silos make it difficult for users to find the data they need when they don’t know which application holds it. Getting access to information within an application or warehouse often takes weeks or even months. These delays grind projects to a halt and slow innovation.
Data Volume and Technological Limitations
Sometimes, applications can only display a certain volume of data, so the user interface may restrict how much of a data set is visible. For example, an application might display hundreds of records, but the aggregated data for a particular project could amount to millions or even billions of records.
To fix these limitations, users must consult the database administrator. Generally, the database administrator has to obfuscate all confidential data before granting access, but without an automation process in place, this process is very labor-intensive.
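As a rough illustration of what automating that obfuscation step could look like, here is a minimal Python sketch. The column names, the sample rows, and the choice of a truncated SHA-256 digest are all assumptions made for this example; a production setup would more likely rely on salted or keyed hashing, tokenization, or the masking features of the database itself.

```python
import hashlib

# Hypothetical set of confidential columns; a real list would come from your own policies.
CONFIDENTIAL_COLUMNS = {"salary", "ssn"}


def mask_value(value: str) -> str:
    """Replace a value with a short one-way digest (illustration only, not a security recommendation)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def obfuscate_rows(rows, confidential=CONFIDENTIAL_COLUMNS):
    """Return copies of the rows with confidential columns masked; the input is left untouched."""
    return [
        {col: mask_value(str(val)) if col in confidential else val for col, val in row.items()}
        for row in rows
    ]


# Made-up sample data for the sketch.
rows = [
    {"employee": "Ana", "department": "Sales", "salary": 82000},
    {"employee": "Bill", "department": "Marketing", "salary": 76000},
]
masked = obfuscate_rows(rows)
```

In this sketch the non-confidential columns pass through unchanged, so analysts can still group by department while the salary values are no longer readable.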
We all know the power of data, but it takes a great deal of effort, care, and consideration to keep data clean and organized. So, it should come as no shock that some application owners are hesitant to hand over their optimized data to other teams.
Power struggles over data access can create tremendous problems in an organization, because in cases like this, supporting documentation and information around the data are not provided. This challenge serves to remind us that fostering a collaborative culture is just as important as encouraging data access practices.
Rise of Data Platforms
To overcome siloed data and technological challenges, companies started to create data platforms, generally data lakes or warehouses. To populate a data lake, you move all of the data from various applications using big data technology. In contrast, you move only selected, critical data for specific use cases into a data warehouse. As organizations create data warehouses and lakes, various access challenges arise.
Data Access Management Challenges after Data Platforms
Complex Access Permissions Management
Organizations can use data lakes or similar platforms to aggregate curated data and overcome siloed data and technological limitations. However, it’s not easy to transfer all permissions from all applications into a data lake.
Role-based permissions are designed for each application, often with years of iteration behind them. In practice, combining all of these permissions in a data lake or warehouse is incredibly challenging.
Privacy Compliance and Regulatory Oversight
Organizations must uphold privacy compliance regulations and information security practices so that they can identify areas of risk and implement additional measures to protect confidential data. Many regulatory bodies outside of an organization impose laws around personal data, with hefty fines for non-compliance. These laws require data protection and are one reason why access to PII can’t be given universally.
Data Discoverability Challenges
As modern data platforms host huge volumes of data from multiple sources, it becomes very difficult to find the right data source.
Why Are Traditional Techniques Not Enough for Data Protection in the Modern Era?
Traditionally, users access data via applications or through a self-service portal. Applications generally have well-defined policies, but for self-service, data is manually curated and moved to a data warehouse or data lake. Afterwards, access to the data is divided among various roles and managed by role management tools like Okta and Active Directory.
Groups are formed to identify individuals who share the access requirements needed to carry out their roles in the organization. Access is granted through group membership: once you are assigned to a group, access opens up in bulk. Anything not covered through this method goes to ad hoc workflows.
However, ad hoc access is often not well managed. Users who don’t have access don’t know what to ask for or whom to ask. Generally, IT has a form where users can request access to datasets they discovered through emails or by searching through individual applications. Users typically end up requesting access to a whole area, or access equal to that of another individual.
Modern Data Access Governance
Here at OvalEdge, we see an emerging trend for automated data access management through policies developed by data governance. The modern method of data access management enables you to tackle the most persistent data access management challenges with a full-circle approach.
Modern data access expands the traditional method to allow for automation, discoverability, and streamlined ad hoc workflows. The process works like this: You need to build a data catalog, classify the data into various groups, design access policies based on classification, and utilize ad hoc workflows for requests that reside outside of a classification’s parameters. Access is managed through policies automatically applied at the data layer.
Centralize Metadata in a Data Catalog
The first step is to create a centralized catalog of data assets. A data catalog leverages metadata for easy discoverability without exposing the actual data. Users can search and learn about the data in the ecosystem from many vantage points and, when needed, request access; the request is routed to the allocated workflow for a quick turnaround. It’s easy, automated, and scalable.
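To make the idea of searching metadata rather than data concrete, here is a small Python sketch. The `CatalogEntry` fields, the sample entries, and the `search_catalog` and `request_access` helpers are hypothetical names invented for this illustration; they do not describe any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata about a data asset; the underlying rows are never stored here."""
    name: str
    source: str
    description: str
    tags: set = field(default_factory=set)


# Made-up catalog contents for the sketch.
CATALOG = [
    CatalogEntry("hr.salaries", "HR portal", "Employee compensation by year", {"hr", "pii"}),
    CatalogEntry("sales.opportunities", "CRM", "Open and closed sales opportunities", {"sales"}),
]


def search_catalog(term, catalog=CATALOG):
    """Return names of assets whose name, description, or tags match the search term."""
    term = term.lower()
    return [
        entry.name
        for entry in catalog
        if term in entry.name.lower()
        or term in entry.description.lower()
        or term in entry.tags
    ]


def request_access(user, asset):
    """Create a pending access request that a workflow could route to an approver."""
    return {"user": user, "asset": asset, "status": "pending"}


matches = search_catalog("compensation")
ticket = request_access("bill", matches[0])
```

Here a user who searches for "compensation" learns that `hr.salaries` exists and can file a request, without ever seeing a salary value.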
Best Practices for Data Classification
The next step is to classify your data, which can be challenging. Firstly, there are many different applications to consider, and within these applications are numerous complex processes and policies.
Secondly, your classification needs to support comprehensive access policies that solve many pain points. Not only must you navigate various individual applications, but you also need to address compliance, security, and other concerns.
To enable organizations to achieve this, we’ve developed a series of best practices for data classification. The first step is to divide your data horizontally.
This step is relatively easy as horizontal classifications are based on various business functions, such as Sales, HR, Marketing, and Finance. If you don’t already have a set naming convention in place, it’s quite simple to develop one.
The next step is to classify the data vertically. This part is a little more challenging. Vertical classification seeks to divide the data into various classification categories, including Ownership, Unclassified Data, PII, Confidential, Super Secret, and Public.
As there is so much data to define, AI is used to supply the vertical classifications based on the existing horizontal classifications and predefined attributes.
Every classification should have an identified access owner, and the position should be appropriate for that classification. The owner is responsible for setting the parameters of the access policies.
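The horizontal and vertical steps, plus the access owner, can be sketched in a few lines of Python. This is a deliberately simple rule-based stand-in, not the AI-driven classification described above: the schema prefixes, the column-name hints, and the owner titles are all assumptions chosen for the example.

```python
# Hypothetical mapping from schema prefix to business function (horizontal classification).
HORIZONTAL_BY_PREFIX = {"hr": "HR", "sales": "Sales", "mkt": "Marketing", "fin": "Finance"}

# Hypothetical column-name hints used as a stand-in for AI-based vertical classification.
PII_HINTS = ("ssn", "email", "phone", "birth")
CONFIDENTIAL_HINTS = ("salary", "revenue", "margin")

# Hypothetical access owners; each classification should have exactly one.
ACCESS_OWNERS = {"PII": "privacy officer", "Confidential": "finance lead"}


def classify_column(table, column):
    """Assign a horizontal and a vertical classification to one column."""
    prefix = table.split(".", 1)[0]
    horizontal = HORIZONTAL_BY_PREFIX.get(prefix, "Unassigned")

    name = column.lower()
    if any(hint in name for hint in PII_HINTS):
        vertical = "PII"
    elif any(hint in name for hint in CONFIDENTIAL_HINTS):
        vertical = "Confidential"
    else:
        vertical = "Unclassified Data"

    return {"horizontal": horizontal, "vertical": vertical, "owner": ACCESS_OWNERS.get(vertical)}


result = classify_column("hr.employees", "email_address")
```

Columns that match no hint fall into "Unclassified Data" with no owner, which is a useful signal that a person still needs to review them.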
Configure and Enforce Policies Based on Classifications
A data governance committee can frame access policies for each classification group and collect them in a robust access policy framework. You will most likely also need tools to configure this policy framework in a data warehouse or data lake.
Policies will focus on roles and the specific permissions they afford. For example, a Sales Rep might have permission to access only the metadata of unclassified data but have complete access to data classified as PII. You can write as many policies as are required for the different roles in your organization.
One way to keep track of access policies is to make an access matrix that shows which roles can interact with which classifications. The access matrix displays your organization’s access policies and adds transparency for who can access what data. Bill from marketing can access data labeled ‘marketing general audience,’ ‘marketing limited,’ ‘sales general audience,’ and any other classifications the matrix indicates.
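An access matrix like this maps naturally onto a nested lookup table. The Python sketch below is one possible shape for it; the role names, the "metadata" and "data" permission levels, and the `can_access` helper are assumptions for illustration, with the classification labels borrowed from the marketing example above.

```python
# Hypothetical access matrix: role -> classification -> highest permission granted.
ACCESS_MATRIX = {
    "marketing analyst": {
        "marketing general audience": "data",
        "marketing limited": "data",
        "sales general audience": "data",
    },
    "sales rep": {
        "sales general audience": "data",
        "marketing general audience": "metadata",
    },
}

# Permission levels ordered from least to most access.
LEVELS = {"none": 0, "metadata": 1, "data": 2}


def can_access(role, classification, needed="data"):
    """Check whether a role holds at least the needed permission on a classification."""
    granted = ACCESS_MATRIX.get(role, {}).get(classification, "none")
    return LEVELS[granted] >= LEVELS[needed]


bill_can_read = can_access("marketing analyst", "marketing limited")
```

Anything missing from the matrix defaults to no access, so a new role or classification stays locked down until a policy is written for it.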
Ad Hoc Workflow for Continuous Classification
New files and tables are created every day, bringing in more unclassified data to your organization. Because of this volume, you need an ad hoc workflow to identify these new tables, files, and reports to send to the appropriate person to confirm the classifications.
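A minimal sketch of such a workflow in Python might compare the assets currently in the platform against those already classified and create a review task for each gap. The asset names, the reviewer mapping, and the default reviewer are hypothetical values for this example.

```python
def find_unclassified(assets, classifications):
    """Return assets that have no classification yet, in a stable sorted order."""
    return sorted(asset for asset in assets if asset not in classifications)


def build_review_tasks(assets, classifications, reviewers, default_reviewer="data governance team"):
    """Create one review task per unclassified asset, routed by the asset's domain prefix."""
    tasks = []
    for asset in find_unclassified(assets, classifications):
        domain = asset.split(".", 1)[0]
        tasks.append({"asset": asset, "reviewer": reviewers.get(domain, default_reviewer)})
    return tasks


# Made-up inputs for the sketch.
assets = ["hr.salaries", "sales.leads_2024", "tmp.export_17"]
classifications = {"hr.salaries": "Confidential"}
reviewers = {"hr": "HR data owner", "sales": "sales data owner"}

tasks = build_review_tasks(assets, classifications, reviewers)
```

Run on a schedule, a check like this keeps newly created tables and files from sitting unclassified; assets with no known domain fall back to a default reviewer.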
Every organization is different. They are at different places in access governance with unique handling, policies, and procedures already in place. Use the methods outlined in this article to gain a general idea of how data access governance can be implemented from scratch and introduce what makes sense for the unique needs of your organization.