Data Governance Metrics Tracking

The Provider Data Business Problem

At NYU Langone Health System, the role of provider is vital not only to patients, but to the enterprise as a whole. A provider is an employee who delivers healthcare to patients, including physicians, nurses, and allied health professionals, among others.  For the purposes of this paper, the term provider refers to a physician.  Providers are essential to business functions throughout the enterprise, as is maintaining accurate information about them.

There are numerous attributes associated with providers that are used for credentialing, human resources, physician referrals, electronic health records, and public find-a-doctor web resources, to name a few.  At NYU Langone Health, provider data exists in several systems, which feed a master data management system.  The accuracy of these data elements is critical for internal operations, as well as for interactions with external agencies and customers.  Occasionally master and reference data attributes are inaccurate or require updating.  Because these attributes are employed across the enterprise, it is not always clear to end users where source updates are required when a quality issue arises.


Master data management (MDM) is a method or system of combining data from multiple sources to create a central reference point for authoritative data.  The master data management system at NYU Langone Health contains a provider domain.  This domain consumes provider data from six sources, as outlined in the table below:

Each source feeds the MDM system, refreshing data daily.  The MDM system uses a matching algorithm to combine records for the same provider into one golden record, deemed the single source of truth.  Once matched, the golden records are made available for consumption in a database: the Provider Operational Data Store (PODS).
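The match-and-merge idea can be illustrated with a minimal pandas sketch. The feed names, the `npi` match key, and the "first non-null value wins" survivorship rule below are all hypothetical stand-ins for the actual matching algorithm, which is not described in detail here.

```python
import pandas as pd

# Hypothetical extracts from two of the six source feeds; "npi" stands
# in for whatever key the real matching algorithm resolves providers on.
credentialing = pd.DataFrame({
    "npi": ["100", "200"],
    "name": ["Dr. Ada Lee", "Dr. Sam Roy"],
    "specialty": ["Cardiology", None],
})
hr = pd.DataFrame({
    "npi": ["100", "200"],
    "name": ["Ada Lee, MD", "Sam Roy, MD"],
    "specialty": [None, "Neurology"],
})

# Stack the feeds, then keep the first non-null value per provider --
# a crude survivorship rule standing in for the real matching logic.
stacked = pd.concat([credentialing, hr], ignore_index=True)
golden = stacked.groupby("npi", as_index=False).first()
print(golden)  # one golden record per provider
```

Real MDM matching is fuzzier than a simple key join (it must reconcile name variants, for example), but the output shape is the same: one golden record per provider, assembled from multiple sources.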


One consumer of the provider master data is the NYU Langone Health public website.  The find-a-doctor page has a search function that allows consumers and potential patients to search for providers by name, specialty, and other criteria.  Each provider has an individual profile outlining their credentials and specifications (Figure 1).  The data on these profiles originate from multiple sources, resulting in an involved data flow (Figure 2).


Figure 1: A provider profile on NYU Langone Health’s find-a-doctor website.


Figure 2: Flow of data from source system to web profiles.

The Provider Change Management Process

The Department of Enterprise Data Quality (DEDQ) at NYU Langone Health is responsible for facilitating enterprise data governance.  To ensure that changes to provider data are addressed correctly and completely, the DEDQ team has established a change management methodology.  The processes and policies that comprise this methodology ensure that enterprise master and reference data attributes are accurately updated in a consistent and standardized manner.

End users, including providers and the many other employees who interact with them, submit change requests to update provider data attributes.  Requests are often submitted to update a data element in one location or application; however, the attribute in question may have enterprise-wide impact, so verification and coordination across multiple authoritative data sources are required.

At NYU Langone Health, a standardized process has been put in place to address source changes to provider data (Figure 3).  This provider data change management process involves coordination between the DEDQ team, the stakeholders requesting the data changes, and the data stewards who regularly work with the data.


Figure 3: High level diagram of the provider change management process.


NYU Langone Health uses FrontRange HEAT service management software to track IT-related issues.  The DEDQ team uses this software to triage and track tickets requesting updates to provider data.  Most frequently, requests to update provider data arise from physicians and their staff viewing the public NYU Langone find-a-doctor web profiles.  These profiles display data about providers for patients searching for physicians, such as credentialing, education, practice locations, languages spoken, and clinical specialty.  These data elements feed the website from authoritative data sources that have been designated as part of the provider master data domain.  Anyone requesting a change or update to provider data submits a request to the NYU Langone Health web team, which then assigns the tickets requiring a source data update to the DEDQ team.  Requests are also submitted through a service catalog form on the internal employee website; these likewise generate a ticket within the FrontRange HEAT application for the DEDQ team to address.


The DEDQ team has collaborated with all data owners and stakeholders involved with authoritative data sources for provider data.  Service level agreements have been established and agreed upon with all data owners.  Pertinent data change request information is compiled by the DEDQ team and sent to the appropriate data steward group on a weekly basis.  Data owners have agreed to process the requests and report back to the DEDQ team within five days of receipt.  If a request cannot be completed as initially requested, the reason is indicated.  The DEDQ team then verifies the data change and notifies the person who initially submitted the change request.

Metric Capture and Reporting Process with Trifacta and Tableau

Development of the Metric Capture and Reporting Process

In order to track progress and identify opportunities for improvement within the provider data change management process, several metrics have been developed for distribution on a monthly basis.  Initially, the metrics were produced manually by exporting raw data from the FrontRange HEAT system into Excel.  In an effort to make the process more efficient, Trifacta data wrangling software was explored.  Raw ticket data in its original form required cleansing and preparation prior to metric calculation.  With Trifacta, the DEDQ team developed an automated methodology to quickly and efficiently prepare the data for calculation.  A recipe was developed in Trifacta whereby, each month, the updated raw data is swapped in for automatic combination, standardization, and preparation.  In other words, Trifacta wrangles the raw data into a ready-to-use data set. With this new process, data preparation can be completed within seconds. Instead of using pivot tables to show our metrics, the DEDQ team uses Trifacta's capabilities to transform the data into a structure that is usable for visualization in Tableau.

Process Details


While working on tickets, we add tags to the summary field to categorize the data element that each request asks to change. This allows for much quicker counting of the tickets once we begin the metrics process. We start with exports of the ticket data from the ticketing system. We then import them into Trifacta to create a single, unified dataset to work with. To begin cleansing the data, we filter out any duplicate entries: since we are combining tickets opened in a given month with tickets closed that month, there are bound to be some repeats. Next we extract the tags added while working on the tickets into a new column as an array, and create new rows from the arrays so that each row has only a single tag in this new column (Figures 4 and 5).
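The recipe steps above (combine exports, drop repeats, extract tags, split arrays into rows) can be sketched in pandas. The file contents, ticket numbers, and bracketed tag syntax below are hypothetical; only the sequence of operations mirrors the Trifacta recipe described in the text.

```python
import pandas as pd

# Hypothetical monthly exports; the real files come from FrontRange HEAT.
opened = pd.DataFrame({
    "ticket": ["T1", "T2"],
    "summary": ["Update address [location][phone]", "Fix degree [education]"],
})
closed = pd.DataFrame({
    "ticket": ["T2", "T3"],
    "summary": ["Fix degree [education]", "New photo [photo]"],
})

# Combine the opened and closed exports, then drop repeated tickets --
# the first cleansing steps of the recipe.
tickets = pd.concat([opened, closed], ignore_index=True)
tickets = tickets.drop_duplicates(subset="ticket")

# Extract the bracketed tags into an array column, then split the arrays
# into rows so each row carries a single tag (the list_ch_types column
# shown in the figures).
tickets["list_ch_types"] = tickets["summary"].str.findall(r"\[([^\]]+)\]")
tickets = tickets.explode("list_ch_types", ignore_index=True)
print(tickets)
```

A ticket with two tags (T1 above) becomes two rows after the explode step, which is why later duration calculations must deduplicate by ticket number.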


Figure 4: The data set mid-transformation, showing intermediary steps of tag extraction, column list_ch_types.


Figure 5: The data set showing finalized tag extraction, column list_ch_types.

Using a lookup table, we are able to add the source system in which each update type is made. We are also able to read the data in other columns to properly tag tickets worked by a colleague outside our team who makes similar changes, so those tickets are included in our metrics. We also want to determine how long our tickets stay open, so, using a DATEDIF function, we calculate the duration of each ticket, from which we derive the minimum, maximum, and average durations. However, since we split the arrays of tags into rows, some tickets now appear multiple times. To remove these duplicates from the average duration calculation, we use a window function to compare the ticket numbers, and only use unique ticket numbers in the average. The last steps in Trifacta are to create counts of the number of tickets opened and closed that month for each tag. The final dataset is then exported so it can be used in Tableau visualizations (Figure 6).
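The duration and counting logic can be sketched as follows; the dates and tags are invented, and `drop_duplicates` stands in for the recipe's window-function deduplication, but the arithmetic matches the steps described above.

```python
import pandas as pd

# Hypothetical exploded ticket data: T1 appears twice because it carried
# two tags, so its duration would be double-counted without deduplication.
df = pd.DataFrame({
    "ticket": ["T1", "T1", "T2"],
    "opened": pd.to_datetime(["2018-09-01", "2018-09-01", "2018-09-05"]),
    "closed": pd.to_datetime(["2018-09-06", "2018-09-06", "2018-09-08"]),
    "list_ch_types": ["location", "phone", "education"],
})

# Equivalent of the DATEDIF step: duration of each ticket in days.
df["duration_days"] = (df["closed"] - df["opened"]).dt.days

# Like the recipe's window function, consider each ticket number only
# once before computing statistics, so multi-tag tickets don't skew them.
unique = df.drop_duplicates(subset="ticket")
print(unique["duration_days"].min(),
      unique["duration_days"].max(),
      unique["duration_days"].mean())

# Per-tag counts of distinct tickets feed the monthly summary tables.
counts = df.groupby("list_ch_types")["ticket"].nunique()
print(counts)
```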


Figure 6: The data set after transformation in Trifacta.


Trifacta and Tableau have partnered to bring the most out of data wrangling and visualization, and the DEDQ team takes advantage of this partnership to create visualizations for provider data change management metrics. Trifacta has the capability to discover, structure, clean, enrich, and validate the data for downstream visualization and analysis.  Trifacta allows us to save the wrangled results as Tableau Data Extracts (TDE), a file format Tableau can read directly.  We are then able to quickly analyze, visualize, and share with data stewards the monthly metrics for provider change management (Figure 7).  Metric visualizations as displayed in a Tableau dashboard can be seen below (Figures 8, 9, 10).


Figure 7: Overview of data flow for the metric development process.


Figure 8: Metric chart developed in Tableau demonstrating the total number of tickets opened and closed on a monthly basis.


Figure 9: Metric table developed in Tableau demonstrating the total number of tickets opened and closed by source system, on a monthly basis.


Figure 10: Metric table developed in Tableau demonstrating the total number of tickets opened and closed by data element, on a monthly basis.

Summary and Outlook

Business Value & Lessons Learned

Metric development in Trifacta and display in Tableau have allowed us to quickly and easily gain insight into the processing time of provider data change requests.  Automating the cleansing of raw ticket data through Trifacta has drastically decreased the time it takes to generate the metrics each month.  These metrics have also allowed us to demonstrate a significant decrease in the time it takes to close change request tickets compared to before the change management process was established, and to share such insights with the data owners and stakeholders involved.

After generating the metrics month after month, it became apparent from the Summary by Master Data Elements metric table that location attributes consistently had the highest volume of tickets, more so than any other attribute type.  This trend gave us reason to take a deeper look into what could be causing such high numbers of location attribute changes.  The investigation made clear that there was no standard location data model in use at the enterprise level, which triggered a project to create a metadata repository for location data and to develop data quality routines and scorecards to further address location-related data quality issues.  These efforts were made possible by the insights provided by the change management metrics developed in Trifacta and Tableau.

Future Considerations


Recently, Trifacta launched a web browser version of their Wrangler tool, so users no longer need to download the desktop application; instead they sign in through the browser and upload the files to be wrangled. The functionality is the same as the desktop version, but the desktop version is no longer receiving updates and will become inaccessible at some point in the future. The most notable change between the two versions is the toolbar at the top of the tool. In the desktop version, the only way to add steps to a recipe is to press the New Step button and type what you want to do; now, many of the features can also be accessed via toolbar buttons.


Figure 11: The Trifacta Cloud toolbar.


Up to this point, the metrics developed with Trifacta and displayed with Tableau have been distributed via email, with screenshots of the dashboards, to all pertinent data stewards and stakeholders on a monthly basis.  The DEDQ team plans to explore utilizing Tableau Server to set up automated email sends with embedded dashboards.


Trifacta allows for automated cleansing and preparation of the raw ticket data used for metric calculation and visual display in Tableau.  Quick and efficient metric calculation for the provider data change management process at NYU Langone Health allows the DEDQ team to regularly share process insights with data owners and other stakeholders.

NYU Langone Health System

This paper was written by Christina DaSilva, Jonathan Hess, Rosanny Duran, Kelly Gracia and James Smith of the Department of Enterprise Data Quality at NYU Langone Health System.