The management of data is very similar to the management of one’s health. One similarity became very apparent to me recently: if you do not take care of your personal health, poor health has a way of catching up with you. Once poor health becomes an issue, it can take a long time, or even forever, to get past it, depending on how far south your health has gone. As for me, I am still young (young enough), and I have been given a second chance to become healthier.
But what about your data? If you do not take care of your organization’s data, then your organization’s data health can go south as well. And if you have not been taking good care of your data’s health for a long time, you may have a serious data health issue on your hands, one that calls for more than small changes to your data behavior here and there. Very poor data health might call for major surgery, such as the re-engineering of your organization’s entire data infrastructure.
Gauge Your Organization’s Data Health
Just like when you go to the doctor, the first thing that must take place when gauging your organization’s data health is an evaluation of your present state. This must always take place before prescribing a remedy. There are several ways to evaluate your organization’s data health. Consider these three ways of evaluating your present condition:
Assess Best Practice
Organizations must take a “Ready, Aim, Fire” approach to gauging their data health, which requires the definition of best practices for each of the data disciplines that will be assessed. The steps to assess the organization include defining why the best practice makes sense for the organization, observing and recording present practices that can be leveraged to support the best practice, observing and recording opportunities for the organization to improve its data health, articulating the gap between the present practice and the best practice, and defining the risk associated with the gap. Once this assessment and analysis are complete, the results can be used to determine the appropriate path to achieving the best practices.
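To make the assessment steps concrete, here is a minimal sketch in Python of how each assessed best practice could be recorded and then ranked by risk. The field names, the three-level risk scale, and the sample entries are all assumptions made up for illustration; they do not come from any standard or from a real assessment.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical risk scale; substitute whatever scale your organization uses.
RISK_ORDER = {"high": 0, "medium": 1, "low": 2}

@dataclass
class PracticeAssessment:
    """One best practice assessed for one data discipline."""
    discipline: str
    best_practice: str
    rationale: str                                           # why the practice makes sense here
    leverage: list[str] = field(default_factory=list)        # present practices to build on
    opportunities: list[str] = field(default_factory=list)   # observed room to improve
    gap: str = ""                                            # present practice vs. best practice
    risk: str = "low"                                        # risk of leaving the gap open

def prioritize(assessments: list[PracticeAssessment]) -> list[PracticeAssessment]:
    """Order assessments so the riskiest gaps come first."""
    return sorted(assessments, key=lambda a: RISK_ORDER[a.risk])

# Illustrative entries only.
assessments = [
    PracticeAssessment(
        discipline="Metadata management",
        best_practice="Critical data elements have business definitions",
        rationale="Analysts lose time reconciling conflicting definitions",
        leverage=["Finance keeps its own glossary"],
        opportunities=["Extend the glossary to sales data"],
        gap="Only finance data is defined",
        risk="medium",
    ),
    PracticeAssessment(
        discipline="Data quality",
        best_practice="Data is validated at every entry point",
        rationale="Bad records are expensive to repair downstream",
        gap="Manual entry screens have no validation",
        risk="high",
    ),
]

for item in prioritize(assessments):
    print(f"[{item.risk}] {item.discipline}: {item.gap}")
```

Keeping the rationale, the gap, and the risk together in one record means the path toward the best practice can be argued from the same evidence that was gathered during the assessment.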
Use Industry Models
There are several industry models that can be used to evaluate your present data health. These models provide a basis for comparison against data management disciplines as defined by industry specialists. Three industry models that I suggest for your consideration are:
- CMMI Institute – Data Management Maturity (DMM) Model
http://cmmiinstitute.com/data-management-maturity
- EDM Council – DCAM
https://www.edmcouncil.org/dcam
- DAMA International – Data Management Body of Knowledge (DMBOK)
http://dama.org/content/body-knowledge
Ask Your Customers
Lastly, I suggest that you ask your business and technical stakeholders what they think of how you enable them through each data discipline. Many organizations survey their business and technical communities of interest regarding how they are doing and where they can be more effective. Typical surveys focus on customer satisfaction, value add, and return on investment.
Change the Way You Define Data
Your organization’s data health depends on how well you define, produce and use your data. Improvements in data management are typically tied to one or more of these actions. I will use these three actions to focus on ways you can change your data habits.
Model Your Data
Data modeling is an important discipline that is becoming a lost art. Data modeling was the healthy heart of data when I got started in the data management field, and there was (and still is) good reason for that. Organizations that logically and physically model their data have well-defined data and data structures, resulting in strong definitional metadata management and efficient and effective database design. The data modeling discipline is evolving as organizations extend into new technologies and methods including NoSQL databases, Big Data sources, and the Agile project management methodology.
Manage Your Metadata
Metadata management, or the discipline of managing what you know about your data, is extremely important, and provides relevance to the main information nervous system of the organization. It is impossible to have strong data health without a focus on managing metadata. Organizations struggle to leverage their data investments because of a lack of emphasis on building and maintaining business glossaries and data dictionaries. You don’t have to implement a centralized metadata repository to provide positive benefits, but it certainly does not hurt.
Involvement in Agile
The Agile project methodology is being followed by organizations that are looking to deliver high quality projects quickly, effectively, and incrementally. Organizations typically select this methodology for high profile projects and for projects that would otherwise take a long time to complete, emphasizing speed over good data health. Data lies at the heart of these projects (e.g. ERP transformations, system consolidations, analytical transformations), which makes attention to the quality of the data that much more important. Organizations evaluating their data health must look at the relationship between the management of data and the management of Agile projects to find common ground that devotes the appropriate amount of time and resources to the data.
Change the Way You Produce Data
The era of Big Data calls for organizations to improve their data health and increase their ability to find and use all forms of data, including the growing number of structured and unstructured data sources. Organizations are finding ways to produce new and better data from old data every time they integrate data sources to solve a problem. The rate of data growth is astonishing. This growth makes it obvious that organizations evaluating ways to change data habits should focus on data production as an area that will lead to better data health.
Assess Data Sources
The ability to leverage ownership – or better said, stewardship of the data – is very important to the governance of data. Data sources that are important enough to manage and utilize for business decision making and improving operations are also important enough to be well-documented and understood. Organizations that are changing their data habits should put time and effort into assessing existing data sources for what they know and do not know so that they can fill in the missing pieces and improve their data health.
Control Entry Points
Letting bad data into systems is never a good idea. In fact, if we could stop this from happening it would solve all our data quality problems. So where do all the data quality problems come from? The control (or lack thereof) of the entry point of data is an important contributor to the organization’s data health. Whether that entry point is manual entry, data transformation, or new data sources, it is important for an organization to evaluate how well it manages data entry points when considering how to improve data quality, and thus data health.
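As a rough illustration of controlling an entry point, the Python sketch below checks each incoming record against a few rules before admitting it, and sets aside rejected records together with the reasons. The record layout (`customer_id`, `email`, `signup_date`) and the rules themselves are hypothetical; real rules would come from your own data definitions.

```python
import re
from datetime import date

# Deliberately simple pattern for illustration; not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_customer(record: dict) -> list:
    """Return the list of rule violations for one record (empty means clean)."""
    errors = []
    if not str(record.get("customer_id") or "").strip():
        errors.append("customer_id is required")
    if not EMAIL_PATTERN.match(str(record.get("email") or "")):
        errors.append("email is not well formed")
    try:
        date.fromisoformat(str(record.get("signup_date") or ""))
    except ValueError:
        errors.append("signup_date is not a valid ISO date (YYYY-MM-DD)")
    return errors

def admit(records: list) -> tuple:
    """Split records into those admitted and those rejected with reasons."""
    accepted, rejected = [], []
    for record in records:
        errors = validate_customer(record)
        if errors:
            rejected.append((record, errors))
        else:
            accepted.append(record)
    return accepted, rejected

incoming = [
    {"customer_id": "C1", "email": "a@example.com", "signup_date": "2024-01-15"},
    {"customer_id": "", "email": "not-an-email", "signup_date": "2024-02-30"},
]
accepted, rejected = admit(incoming)
for record, errors in rejected:
    print(record, "->", "; ".join(errors))
```

Recording why a record was rejected, rather than silently dropping it, gives data stewards something to act on at the source of the problem.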
Manage Data Quality
Data quality can also be evaluated using the same three actions – definition, production and usage. Improvement to data health can involve improvement in the quality of data definition as discussed in the prior section of this article, improvement through quality data production as was just mentioned, and improvement in the quality of data usage through improved access, understanding, and protection. Organizations are making changes to their data quality habits by applying data governance – the execution and enforcement of authority – to all three data actions. This is because they realize that data quality is a key contributor to data health.
Change the Way You Use Data
Some organizations choose to focus on data usage habits first when looking to gain management’s support and conviction for data governance. Protecting sensitive data is easy for management to understand because they recognize this action as being “not optional.” The same holds true for following regulatory guidelines. Management knows that the rules associated with protection and regulations are being dictated to them. However, the use of data for analytical purposes is a conscious decision, often made by management, that requires serious data health and discipline. Assessing your habits associated with data usage, including how well the data is understood, classified and protected, is critical to improving the value your organization gets from using its data.
Improve User Understanding
Improving the organization’s metadata management capabilities is one of the keys to improving the organization’s understanding of the data. Organizations often begin by focusing on business glossaries and data dictionaries because they represent the understanding of a small subset of important data such as data in their business intelligence, data warehousing, and master data environments. Therefore, this is a good place to start improving understanding by focusing on metadata health associated with the most important data. The development of a rich metadata management repository requires resources similar to those needed to build a data warehouse. Evaluate what your stakeholders need to know about the data, compared to what is already recorded and made available, to improve your organization’s data health.
Classify Your Data
These last two aspects of changing the way you use your data are tightly related. Before you can claim perfect health around protecting sensitive data, it is important to classify your data (e.g. highly classified, sensitive, public) based on the rules that must be followed for each of the classification levels. Data must be classified first, and the rules associated with handling data for each classification must be defined, communicated to the masses, followed, and enforced to satisfy a core component of governing your organization’s data.
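One way to picture the link between classification levels and handling rules is the small Python sketch below. The three levels mirror the examples named above; the specific rules, the column names, and the choice to treat unclassified data as the strictest level are assumptions for illustration, not a recommended policy.

```python
from enum import IntEnum

class Classification(IntEnum):
    PUBLIC = 1
    SENSITIVE = 2
    HIGHLY_CLASSIFIED = 3

# Hypothetical handling rules for each classification level.
HANDLING_RULES = {
    Classification.PUBLIC: {"encrypt_at_rest": False, "mask_in_reports": False},
    Classification.SENSITIVE: {"encrypt_at_rest": True, "mask_in_reports": False},
    Classification.HIGHLY_CLASSIFIED: {"encrypt_at_rest": True, "mask_in_reports": True},
}

# Hypothetical classification of a few data elements.
COLUMN_CLASSIFICATION = {
    "product_name": Classification.PUBLIC,
    "email": Classification.SENSITIVE,
    "ssn": Classification.HIGHLY_CLASSIFIED,
}

def rules_for(column: str) -> dict:
    """Look up handling rules for a column.

    Assumption: anything not yet classified is handled at the strictest level
    until someone classifies it.
    """
    level = COLUMN_CLASSIFICATION.get(column, Classification.HIGHLY_CLASSIFIED)
    return HANDLING_RULES[level]

for column in ("product_name", "email", "ssn", "notes"):
    print(column, rules_for(column))
```

The point of the sketch is the order of operations: the rules hang off the classification level, so a data element cannot be handled correctly until it has been classified.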
Protect Sensitive Data
Data protection is a concern of every organization. Whether the data is personally identifiable information (PII), protected health information (PHI), intellectual property (IP) or personal data that falls under the European Union’s General Data Protection Regulation (GDPR), the rules associated with protecting sensitive data are forever changing, and part of your data health depends on how well you protect your data. The importance of changing your habits around protecting sensitive data cannot be overstated; it is a requirement for every organization.
Summary
In this article, I presented an easy way to look at changing the data habits of your organization. Organizations should focus on how well they define, produce and use data, the three primary actions that can be taken with data, as they evaluate their present data health and work toward good health in the future. As I mentioned in the introduction, data health echoes people’s personal health. If health problems are ignored, they tend to grow more and more difficult to resolve. Let’s all raise a glass and wish for better health all around.