Reference Data often struggles to be governed as an enterprise data asset. Many organizations still view Reference Data as an individual application problem, and thus it is often an application afterthought. The Data Governance team has to ask questions such as:
- Who is accountable to source, maintain, and distribute reference data?
- Are the domain data values isolated to one application, or are they common across multiple applications and business units?
- What are the implications for Data Consumer usage, management reporting, analytics/Big Data, and regulatory reporting?
- Does management responsibility sit with a business unit, Finance, an application, the CIO, or, as is more common now, the CDO?
For most organizations the answer is “it depends.” That is not a good answer to give your audit team or industry regulators. Today Reference Data should be considered a critical enterprise data asset, governed for the enterprise as a whole rather than for an individual business unit or application.
Clarity is Important
Like all critical data assets, having a solid description of the language and terminology is important. First, what do I mean by the term Reference Data? Reference Data is data that defines or creates a context for other data. For example, Reference Data helps to identify characteristics of a Customer, such as what industry the Customer is in, what country the Customer’s account was established in, where the Customer lives, where we send the Customer’s bills, and the gender of the Customer.
I can remember when we did not have the concept of Reference Data (yeah, I’m that experienced but let’s not say old). We created the concept so we no longer had to change the application code anytime a new valid value was needed to manage business processes, or even when the business rules for data content validation changed. Reference Data is used in the business rules for the capture and management of data content: in the “dropdown boxes,” in the data quality rules, in the validation of data contents, in data integration aggregations (data warehouse), and in the “roll-ups” and aggregations for Analytics and reporting, including Big Data analytics.
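To make the idea concrete, here is a minimal sketch of valid values held as data rather than written into application logic. The status codes, descriptions, and function names are hypothetical and only illustrate the pattern; a real implementation would load the values from a governed source.

```python
# Hypothetical reference data set: valid values live in data, not in code.
# In practice this would be loaded from a governed table, file, or service.
ACCOUNT_STATUS_CODES = {
    "OPEN": "Account is open and active",
    "SUSP": "Account is suspended",
    "CLSD": "Account is closed",
}


def is_valid_code(code: str, reference_set: dict[str, str]) -> bool:
    """Return True when the code is one of the reference set's valid values."""
    return code in reference_set


def dropdown_options(reference_set: dict[str, str]) -> list[tuple[str, str]]:
    """Build (code, description) pairs for a dropdown box, sorted by code."""
    return sorted(reference_set.items())


# Adding a new valid value is a data change; the functions above are untouched.
ACCOUNT_STATUS_CODES["DORM"] = "Account is dormant"
```

The same value set feeds both the validation rule and the dropdown box, so a new value only has to be introduced in one place.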
The underlying issue may be that Reference Data is not always considered to be business data. Many IT and business operations individuals still view Reference Data as application-level metadata. That is not, and never has been, the case.
Some examples of Reference Data used to manage business rules and data validations include:
- Places – Location Data (Geographic and Political data)
  - Continent, Region, Country, State/Province, County, City, & Postal Code
  - Census Tract, Economic Designation, Political Designation, Zoning Codes, …
- Things
  - Monetary – Currency Types & Codes, Exchange Codes, Transfer Codes, & Account Type
  - System or enterprise specific objects – GL Accounts, Cost Codes, Industry Code, & Organization Hierarchy
- People
  - Customer Type, Contact Type, Account Status Code, Risk Rating, Occupation Type
  - Counter-Party, Supply Partner, Industry Code, Risk Ratings, Relationship Codes
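Several of these examples are hierarchical (City within State/Province within Country, for instance), and that is what makes “roll-ups” possible. The sketch below is one possible way to represent such a hierarchy as child-to-parent links and aggregate amounts up through it. The locations, amounts, and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical parent links for a location hierarchy (child -> parent).
LOCATION_PARENT = {
    "Austin": "Texas",
    "Dallas": "Texas",
    "Toronto": "Ontario",
    "Texas": "United States",
    "Ontario": "Canada",
    "United States": "North America",
    "Canada": "North America",
}


def ancestors(value: str, parent_of: dict[str, str]) -> list[str]:
    """Walk up the hierarchy from value to the root, nearest parent first."""
    chain = []
    seen = {value}
    while value in parent_of:
        value = parent_of[value]
        if value in seen:  # guard against a cyclic hierarchy definition
            raise ValueError(f"cycle detected at {value!r}")
        seen.add(value)
        chain.append(value)
    return chain


def roll_up(amounts: dict[str, float], parent_of: dict[str, str]) -> dict[str, float]:
    """Aggregate leaf-level amounts to every level above them."""
    totals = defaultdict(float)
    for leaf, amount in amounts.items():
        totals[leaf] += amount
        for level in ancestors(leaf, parent_of):
            totals[level] += amount
    return dict(totals)


# Made-up city-level amounts rolled up to state, country, and continent.
sales_by_city = {"Austin": 100.0, "Dallas": 50.0, "Toronto": 25.0}
totals = roll_up(sales_by_city, LOCATION_PARENT)
```

If two teams maintain different parent links for the same locations, the same input amounts produce different totals, which is why the hierarchy itself has to be governed as part of the Reference Data.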
Challenges of Reference Data
The governance of Reference Data has a number of challenges in most organizations. Let’s see which of these you can relate to.
- Lack of accountability or ownership as a data asset. “Just make up something for the application to work.”
- Confusion and/or conflict over accountability and sourcing responsibilities. “My business unit and budget should not be responsible for managing and supplying this data to the rest of the enterprise; you need it, you go get it.”
- Perception that Reference Data is only an IT problem to resolve. “Just add the value to the dropdown box.”
- Reference Data never changes so we don’t need a management infrastructure and processes. “We just need to get through QA testing and it never changes anyway.”
- We all know what it means so why define it. “It’s just a loan application status code.”
- No Data Catalog or Sharing Agreements for Data Consumers to know what is available and how to use it. “Go look at the dropdown box and the application user manual done in 1994.”
- Reference Data has no relationships; it is flat with no dependencies or hierarchies. “All the metadata values are independently entered by the coders.”
- Very little (let’s be honest it’s none) data quality testing is done on Reference Data prior to its implementation in production. “It’s just a select box value.”
- There is a lack of Data Governance processes over the life-cycle of Reference Data. “What life-cycle; it’s not business data.”
- Reference Data values used across different business units or in Analytics and reporting can be different than those used in operational applications. “My hierarchies are better and more accurate than yours; but mine are for G/L accounting.”
Wonderful, So What Steps Do We Take to Govern Reference Data?
As always, we can’t start the “12 step program” until we accept that we have an issue and we want to resolve that issue. Both recognition and desire to fix the issue are critical. Nothing can actually happen until we have the desire to fix the issue. I suggest that it is the responsibility of the CDO or Data Governance team to raise the issue and drive recognition that a solution and infrastructure must be established. Acceptance of the issue and the drive to fix it are often considered an organizational or cultural change challenge. Those concepts and actions are not new to most organizations, so leverage what has worked in the past. The solutions for Data Governance issues like this are generally driven by a cross-functional Data Governance Committee. Awareness and communication across the organization are critical to resolve the challenges. The following is my 12 step program.
- I suggest that the Data Governance principles state that “all Reference Data will comply with the policies and operating practices of Data Governance” (yes, let’s start with the desire that “all” is an enterprise asset). Generally, we see the principles simply stated, such as “all Reference Data will be managed as an enterprise asset similar to the management of our data content.” The policy level can then allow for some Reference Data to be managed in a centralized infrastructure while certain Reference Data is managed in a federated infrastructure. The Reference Data Policy can then define and describe the expectations and standards for Reference Data.
- Accountability and Stewardship roles and responsibilities should be identified and accepted. This will be by the type of Reference Data. For example, your Commercial business unit may be designated as the accountable party, steward, and authoritative source for NAICS codes. They will assume the roles and responsibilities associated with NAICS codes.
- Harmonized business terms, definitions, and understanding for Reference Data will be established across the enterprise and managed in the Business Glossary. Each type of Reference Data should be identified in your Business Glossary along with the valid value set and business rules. Examples include Country Code, Currency Code, and Standard Industry Classification Code. The processes for identifying, harmonizing, and managing business terms are complex, but I’m keeping it to one step in this list.
- Periodic data profiling is used to validate the Business Term and Business Definition of Reference Data. Data profiling should be used to validate the valid value set identified for the Business Term. The Business Term Name and Definition should be applicable for the value set implemented.
- Each type of Reference Data will have life-cycle management processes and infrastructure roles and responsibilities defined. The Reference Data acquisition processes, maintenance, timing and architecture must be well defined and communicated. One and only one source should exist for each type of Reference Data. It is fine if your architecture is centralized for some Reference Data and federated for other data.
- Authorized Sources and Authorized Distribution processes and systems must be identified (preferably in a Data Catalog). People need to know the definition, availability, where to source, and how to source the data they need.
- Data Quality management is used to ensure the defined valid values and business rules in the Business Glossary are physically implemented in the Reference Data Management infrastructure. A Data Quality dashboard is important for the Data Consumer community. Remember this dashboard will be limited in scope to the Reference Data, not the data content occurrences using the Reference Data.
- Data Consumers are identified and included in the accountability roles and responsibilities. This step is an ongoing process. The number of Data Consumers should expand as your Reference Data usage grows. Teams will come and go, but the accountability for the usage of specific Reference Data should not change. The accountability of Data Consumers’ usage of data is often overlooked.
- Data Sharing Agreements should be established for consistent Reference Data usage, including the processes for aggregations and “roll-up” hierarchies. Hierarchies must be considered a component of the Reference Data, as applicable. Two different valid value sets for hierarchies for the same Reference Data must be considered as two different concepts and managed separately. Otherwise we have semantic confusion and reporting differences. Different Business Terms for the different hierarchies are likely important to identify.
- Issue management and issue escalation processes are needed for Reference Data Change Management. However, Reference Data does not need an issue management process different than the rest of your data. Use the processes that you have for the issue management for all data.
- Reference Data must be available to all authorized Data Consumers, either individuals or applications. Multiple distribution formats are likely necessary (such as Web service, XML or flat file distribution).
- While Reference Data may not have specific security or data protection requirements, you will likely have Data Retention standards applicable for each source. The Data Governance team can help identify and resolve these requirements.
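The data profiling and Data Quality steps above can be sketched as a simple comparison between the values actually found in a data set and the valid value set recorded in the Business Glossary. This is only an illustration under stated assumptions: the function name, the report fields, and the sample country codes are hypothetical, and a real program would typically rely on a profiling or data quality tool.

```python
from collections import Counter


def profile_against_valid_values(observed, valid_values):
    """Compare values observed in data with the glossary's valid value set."""
    counts = Counter(observed)
    valid = set(valid_values)
    conforming = sum(n for value, n in counts.items() if value in valid)
    return {
        # Values found in the data that the glossary does not list.
        "unexpected": {v: n for v, n in counts.items() if v not in valid},
        # Glossary values that never appear in the data.
        "unused": sorted(valid - set(counts)),
        # Share of observed records carrying a valid value.
        "conformance": conforming / max(len(observed), 1),
    }


# Hypothetical glossary value set and a made-up sample of observed codes.
glossary_country_codes = {"US", "CA", "GB"}
observed_codes = ["US", "CA", "US", "UK", "us"]
report = profile_against_valid_values(observed_codes, glossary_country_codes)
```

A result like this can feed a Data Quality dashboard: unexpected values point to either a data defect or a glossary definition that needs updating, and unused values may indicate a value set that no longer matches the business.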
Reference Data management is far more important and complex in most organizations as we consider the Big Data analytics, management/audit reporting and regulatory reporting environments we are operating under today. The Reference Data used in our operational applications has to be integrated with the data aggregation and hierarchies used in our reporting applications. Most of those are very separate applications today. Great Reference Data Management and Governance is critical to enable the alignment, as well as reduce the cost of business operations. Your Business Glossary will be an effective enabler of your Reference Data program. It is OK, stay calm and allow your Business Glossary to prosper.