Hands down, one of the most frequent observations when walking the data factory at different clients is the excessive use of spreadsheets for data collection and purification. These spreadsheets are part of a critical data enrichment process for getting reports out the door on time. However, these same spreadsheets represent a significant control problem, exposing a company to the problems inherent in inaccurate data.
It’s time to address the elephant in the room and stop avoiding this conversation. We understand the reluctance to acknowledge data gaps, especially after spending millions of dollars on new transactional systems or data warehouse projects. We often hear, “The new system was supposed to eliminate the need for a spreadsheet.” Welcome to the world of business and technology project management.
The solution requires controlled data enrichment outside of a company’s transactional systems. In fairness to people and their spreadsheets, it simply is not practical to provide an automated or transactional source for all data given the complexity of business processes and vast availability of data. In most cases, we are talking about the missing data sources for 2 to 5 percent of a company’s data. The trouble is that this data often represents a high risk of financial reporting errors.
Technologists attempt to solve this problem with master data management (MDM) systems. MDM is a powerful tool, assuming the integration is seamless throughout the organization. More often, a well-designed MDM solution tackles one specific data problem, such as customer or product masters. In this case, holistic integration with other systems or warehouses falls short, leading to stove-piped data flows and ultimately an increase in the use of spreadsheets. Here the cure is worse than the disease.
As Lean Governance practitioners, we propose a solution to reduce bad data risk, one that transforms your raw data assets into finished goods (reports and models) with the least amount of time, effort, and resources. We have previously written about the value of an integrated data governance framework, one that completely defines the relationship between a business and its data as a major metadata puzzle. The result is total awareness of the governance parameters across the business and its data factories.
This integrated governance framework encompasses the complete definition of enterprise governance including organizational structure, information, data assets, business glossaries, business process, security, data loss prevention, and data quality controls. The real opportunity here is to add a very simple yet powerful technology and process component to this framework stack — a data enrichment module built around the core governance metadata that defines a company.
Data needs to be enriched or supplemented outside of production systems. Anyone who disputes this need only walk through their data factory to see the control gaps that exist. MS Access and MS Excel remain two of the most widely used data processing tools in the industry. Data enrichment can involve very complex user-developed applications. Companies have annual goals (often not met) to eliminate a specific number of end-user computing (EUC) applications or user-defined applications (UDAs) that are designed for data enrichment, collection, or processing.
A data enrichment facility incorporated into an overall governance framework should be deployed in the context of required SOX or other controls. If the overall implementation cannot be designed to be under production control, it will be of little use. Locking away a spreadsheet on a secured drive does little to minimize overall data risk given the reality of the chain of control over the file.
Data enrichment can be a business tool that goes beyond traditional applications. There are times when IT needs to enrich data across the organization with domain codes, tags, or other taxonomy groupings. We have eliminated countless spreadsheets using tactical data enrichment, allowing organizations to certify their business processes in a control framework.
The last area concerns auditability and access control. Answer the key questions: 1) Who changed what, and when? 2) What were the previous values? and 3) Which business user has access to alter a certain data domain? To achieve Lean Governance, you want to retain lineage across your entire data factory. Sorry to say, lineage usually stops with a spreadsheet. When loading a transactional system or data warehouse, your data enrichment facility becomes the production data source.
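To make the three audit questions concrete, here is a minimal in-memory sketch of what a controlled enrichment store might record. It is an illustration only, not a description of any particular product: the names `EnrichmentStore`, `AuditEntry`, `set_value`, and `history`, along with the sample users and the `product_taxonomy` domain, are all hypothetical. A production facility would persist this in a database under the same change controls as any other production system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class AuditEntry:
    """One immutable record of a change: who, what, when, and the prior value."""
    user: str
    domain: str
    key: str
    old_value: Optional[Any]
    new_value: Any
    changed_at: datetime


@dataclass
class EnrichmentStore:
    """Hypothetical enrichment store with per-domain access control and an audit trail."""
    # Maps a data domain to the set of users allowed to alter it (question 3).
    permissions: Dict[str, Set[str]]
    values: Dict[Tuple[str, str], Any] = field(default_factory=dict)
    audit_log: List[AuditEntry] = field(default_factory=list)

    def set_value(self, user: str, domain: str, key: str, new_value: Any) -> AuditEntry:
        # Reject the change if the user is not authorized for this domain.
        if user not in self.permissions.get(domain, set()):
            raise PermissionError(f"{user} may not alter domain {domain!r}")
        # Capture the previous value before overwriting it (question 2).
        old_value = self.values.get((domain, key))
        self.values[(domain, key)] = new_value
        # Record who changed what, and when (question 1).
        entry = AuditEntry(user, domain, key, old_value, new_value,
                           datetime.now(timezone.utc))
        self.audit_log.append(entry)
        return entry

    def history(self, domain: str, key: str) -> List[AuditEntry]:
        """Return every recorded change for one enriched value, oldest first."""
        return [e for e in self.audit_log if e.domain == domain and e.key == key]


# Example: only "alice" may maintain the (assumed) product taxonomy domain.
store = EnrichmentStore(permissions={"product_taxonomy": {"alice"}})
store.set_value("alice", "product_taxonomy", "SKU-1", "hardware")
store.set_value("alice", "product_taxonomy", "SKU-1", "accessories")
```

Because every change passes through `set_value`, the lineage that normally stops at a spreadsheet continues through the enrichment step: `store.history("product_taxonomy", "SKU-1")` returns both changes with their prior values, and an attempt by any other user to alter that domain raises `PermissionError`.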
It’s time to address the elephant in the room. Walk your organization’s data factory looking for spreadsheets used for data enrichment. Bring them into full view with an eye on compliance and controls. Challenge your governance technology team to incorporate data enrichment into your organization’s governance framework. If they give you that puzzled look — like my old beagle with her head cocked sideways — be ready for a frank discussion concerning a new approach to data and information governance.
Note: I happen to love elephants and have a deep respect for them. Their size, which makes them so amazing, is likely the reason they have been used to describe something so big in the room that no one can miss.