Many organizations are focusing their data management and “data as an asset” governance programs on improving data for analytical purposes. Several of my present clients are following this trend. One problem they face is that their data has gone unattended for many years, which has led to lower-than-desired levels of data quality and an inability to integrate the data cost-effectively. Improved analytical capabilities begin with quality data that is defined, produced, and used effectively.
The quality of the data available for analytics is a sore subject for many organizations. Every organization wants to predict customer behavior, improve the efficiency and effectiveness of its supply chain, reduce production costs, or pursue whatever business use of “good data” matters most to it. It’s all in the data.
– – – – –
This column is the first in a three-part series on what it takes to achieve “good data.” The activities required to achieve it break down simply and logically into three areas, which I have mentioned in past columns. In this column, I focus on the first of those areas. The three areas are:
- Improving Data Definition
- Improving Data Production
- Improving Data Usage
Improving data definition is a vital part of improving overall data discipline. I am starting the series here because data definition is the underlying determinant of data quality in the production and usage phases, as I will explain as the series goes along. Watch for the columns on production and usage in the next few issues of TDAN.com Magazine.
Improving Data Definition
Start with Business Glossaries, Data Dictionaries, and Metadata Management
These three items are past, present, and future industry buzzwords. Organizations create these metadata (data about data) resources to record business terminology, database descriptions, and end-to-end knowledge of the data their information systems contain, all to improve knowledge and understanding of data and data-related assets. There are many webinars, white papers, and articles on what to include in these resources.
To take advantage of these resources, specific people must be given responsibility for consistently and methodically capturing, validating, sharing, and maintaining information about the data our organizations use to operate the business, report accurately, make good decisions, and provide improved analytical capabilities grounded in knowledge and understanding of the data being analyzed.
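To illustrate what “capture, validate, share, and maintain” can mean for a single metadata record, here is a minimal sketch of a combined glossary/dictionary entry with a simple completeness check. The field names (`term`, `definition`, `steward`, `source_table`, `source_column`, `last_reviewed`), the example term, and the validation rules are all hypothetical assumptions for illustration, not a standard or any product’s schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DictionaryEntry:
    """Hypothetical metadata record linking a business term to where it is stored."""
    term: str                       # business glossary term
    definition: str                 # agreed business definition
    steward: Optional[str]          # person accountable for the definition
    source_table: Optional[str]     # data dictionary: physical table (assumed name)
    source_column: Optional[str]    # data dictionary: physical column (assumed name)
    last_reviewed: Optional[date] = None

    def problems(self) -> List[str]:
        """Return the gaps that would keep this entry from being published."""
        issues = []
        if not self.definition.strip():
            issues.append("missing definition")
        if not self.steward:
            issues.append("no accountable steward")
        # A physical mapping needs both the table and the column, or neither.
        if bool(self.source_table) != bool(self.source_column):
            issues.append("incomplete physical mapping")
        if self.last_reviewed is None:
            issues.append("never reviewed")
        return issues


# Hypothetical entry: defined and mapped, but nobody owns or reviews it yet.
entry = DictionaryEntry(
    term="Active Customer",
    definition="A customer with at least one order in the last 12 months.",
    steward=None,
    source_table="crm.customer",
    source_column="status_cd",
)
print(entry.problems())
```

The point of the check is the responsibility described above: an entry with no steward or no review date is exactly the kind of metadata that drifts out of date once nobody is accountable for maintaining it.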
Follow Data Modeling Best Practices
Wikipedia tells us that data modeling is a process used to define and analyze the data requirements needed to support business processes within the scope of an organization’s information systems. Data modeling is also a process of relating data to other data, bringing data together to solve business problems and to address operational needs. Data modeling tools make it possible to capture these requirements as models and turn those models into the physical data stores that become the backbone of the business.
To take advantage of what data models can provide, people must be responsible for consistently and methodically modeling the organization’s data. Organizations that follow an Agile approach to system development, software package implementation, and enterprise transformation must be convinced that good data results from a disciplined approach to defining data, one that still fits their Agile methods. There is work to be done on this.
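To make “relating data to other data” and “turning requirements into physical data stores” concrete, the sketch below expresses a tiny logical model (two hypothetical entities, a customer and its orders, in a one-to-many relationship) as physical tables in an in-memory SQLite database using Python’s standard `sqlite3` module. The table and column names are assumptions for illustration; a data modeling tool would typically generate comparable DDL from its own model.

```python
import sqlite3

# Hypothetical logical model: each entity maps to its column definitions.
# The REFERENCES clause records the customer-to-order relationship.
MODEL = {
    "customer": [
        "customer_id INTEGER PRIMARY KEY",
        "customer_name TEXT NOT NULL",
    ],
    "customer_order": [
        "order_id INTEGER PRIMARY KEY",
        "customer_id INTEGER NOT NULL REFERENCES customer(customer_id)",
        "order_date TEXT NOT NULL",
    ],
}


def build_physical_store(model: dict) -> sqlite3.Connection:
    """Generate a CREATE TABLE statement per entity and apply it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    for table, columns in model.items():
        conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    return conn


conn = build_physical_store(MODEL)
conn.execute("INSERT INTO customer VALUES (1, 'Acme Corp')")
conn.execute("INSERT INTO customer_order VALUES (10, 1, '2024-01-15')")
try:
    # An order for a customer that does not exist violates the modeled relationship.
    conn.execute("INSERT INTO customer_order VALUES (11, 99, '2024-01-16')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

Because the relationship is declared in the model rather than left to application code, the physical store itself rejects data that breaks the definition, which is one way definition quality carries forward into production quality.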
Build Out Data Catalogs
Data catalogs are not as well defined or as widely adopted across the industry as glossaries, dictionaries, and metadata. Data catalogs often contain information about how data is being used across the organization. Simply stated, a data catalog focuses on the data that is readily available to your business communities. A catalog can describe what is available in data reporting tools (for example, lists of available reports) or document the data in the data warehouse, data marts, data lakes, or wherever else you store data for people to consume.
To take advantage of data catalogs, people must consistently and methodically document the data that is being made available for business consumption. Data catalogs can become a valuable resource for people who repeatedly say they need better access to data. I am guessing you hear this in your organization.
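A data catalog entry documents data that is already available for consumption and tells people where to find it. Below is a minimal sketch, assuming a hypothetical in-memory catalog whose entries record a dataset or report name, its location (a report, a warehouse table, or a data lake path), an owner, a description, and tags. A simple keyword search shows how that documentation can answer “where can I get this data?” Real catalog products have their own schemas and search features; nothing here reflects a specific tool, and all names and locations are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    name: str                  # dataset or report name
    location: str              # where consumers find it (hypothetical locations)
    owner: str                 # who to ask about the data
    description: str
    tags: Tuple[str, ...] = ()


CATALOG: List[CatalogEntry] = [
    CatalogEntry("Monthly Sales Summary", "reports/sales/monthly", "Sales Ops",
                 "Sales totals by region and month.", ("sales", "report")),
    CatalogEntry("dw.fact_shipment", "warehouse: dw.fact_shipment", "Supply Chain BI",
                 "One row per shipment leg with carrier and cost.",
                 ("supply chain", "warehouse")),
    CatalogEntry("raw_web_clicks", "lake: /raw/web/clicks/", "Digital Analytics",
                 "Unprocessed clickstream events.", ("customer", "lake")),
]


def search(catalog: List[CatalogEntry], keyword: str) -> List[str]:
    """Return names of entries whose name, description, or tags mention the keyword."""
    kw = keyword.lower()
    return [
        e.name
        for e in catalog
        if kw in e.name.lower()
        or kw in e.description.lower()
        or any(kw in t.lower() for t in e.tags)
    ]


print(search(CATALOG, "shipment"))
```

The search is only as good as the documentation behind it: an entry with an empty description or no tags simply never shows up, which is why consistent, methodical documentation matters more than the search mechanism.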
Conclusion
In a recent TDAN.com piece, I wrote about organizations suffering from the Data Flu and about the symptoms of “sick” data, meaning data that is not fit for purpose. In this column, I have started to simplify some of the steps required to turn average data into “good data” for analytical purposes. The list I have included covers activities associated with good data definition, but it only scratches the surface of the activities that will lead to improved analytical capabilities.
Two consistent themes run through each item in the data definition list. The first is the need to execute and enforce authority over the management of data, meaning that the activities to build and provide these data resources must become part of how organizations operate. The second is the need to formalize accountability for the actions of governing data. Data Governance, or, as many organizations now put it, managing “data as an asset,” is becoming widely accepted as the only way to achieve “good data.”