This online publication, The Data Administration Newsletter (TDAN.com), will celebrate its twentieth anniversary in the coming months of 2017. Twenty years is a long time. Think about where you were in life twenty years ago. Think about where your career was, and how far data and information systems have come in becoming integral to nearly every part of life. It’s all in the data.
Technology growth has been a big part of the data explosion. I pride myself on staying somewhat technically savvy, but I also recognize that there are many technologies that I know little or nothing about. Relational databases are still being used while NoSQL and Big Data databases increasingly dominate the landscape. Data Warehouses and Business Intelligence reporting have been joined by complex data mining algorithms, the internet, the cloud, Bluetooth and the transfer of data between devices, wireless services, and the Internet of Things. Data and information have overtaken every aspect of our lives.
Twenty years ago, organizations struggled to manage their data and database environments. To manage the data environment back then meant to focus on the definition of the data through data modeling and relational databases, to focus on the production of the data through system and software development and controlled edits, and to focus on the usage of data by allowing access only to approved people.
The More Things Change …
Some things have not changed. In fact, … the same three simple actions (definition, production and usage) are still the ONLY actions being taken on data today. However, the complexities and the technologies associated with defining, producing and using data HAVE changed significantly. And some organizations are slipping further into the data management discipline void. This is a problem. This is where Data Governance often leads the way out of the darkness.
Let’s take a brief look at how actions have changed over twenty years:
Data Definition
Data definition before meant that your organization focused on data modeling and that metadata management, data dictionaries and business glossaries were important. Back then, knowing what data existed in each system, the definition of that data, and the lineage and linkage between data (logical and physical) was the key to building well-understood data capabilities, and it meant that you had a chance at improving data quality.
Data definition today means the same thing. However, today’s organization is required to work with data sources that are constantly being redefined. Not only is there a dramatically increased number of data sources, but these sources also take the form of structured and unstructured data that is often unstandardized and poorly understood.
Observation: The same problems exist as existed twenty years ago. The complexities are much greater, and a concentrated effort to improve data definition is an important part of an organization’s data governance effort.
Data Production
Data production before meant that the information systems being developed played a key role in controlling how data was produced. The screen edits and the quality checks on data integration transfers were integral to evaluating the data you were receiving, and exceptions were marked and managed.
Data production today means the same thing. Well … almost. The problem is that data sources are more plentiful, arrive in more formats, and lack proper metadata while entering the organization at breakneck speed. Human data errors continue to plague data production streams, and businesses have become dependent on data sources that are unreliable, leading to data quality problems.
Observation: The same problems exist as existed twenty years ago. The complexities, again, are much greater as organizations embrace big data and associated technologies. The devices that run our lives, including mobile devices and anything associated with the internet, are some of the fastest growing data producers. Organizations governing their data as a valued asset must focus on managing the production of the data they depend on.
Data Usage
Data usage before was rather binary. Either a person was given access to the data or they were not. There were rules associated with who could see what and how data could be shared. Departments were set up to manage appropriate access. Occasionally organizations would be audited to demonstrate that they knew who they were giving data access to. Data breaches occurred and information security was growing in importance. People used data as they deemed appropriate.
Data usage today means exactly the same thing, except that the whole concept of using data has exploded. There are many more people watching. The rules are more plentiful as regulators ask for more, and more specific, data. Customers are demanding more data, and that their data be protected. Your own company plans to analyze the data and make better decisions from it. All of these demands are growing by leaps and bounds every single day.
How people are using the data they can access is changing too. The people who want to use the data for your supposed benefit are being heavily scrutinized. The people who want to use your data to your detriment or for their own gain are finding more and more ways of destroying livelihoods based on what they can learn about you.
Observation: Again, the same problems exist as existed twenty years ago. The complexities, both technological and ethical, are greater than ever and are calling for increased levels of governance. The world of privacy, protection, regulatory control and compliance, to name a few, will require that organizations have auditable forms of data governance to stay in business.
The Same Problems Exist as Twenty Years Ago
What have we accomplished in the past twenty years? For one thing, the importance of governing data to operate an effective, efficient, and profitable business is recognized now more than ever. Increasing numbers of organizations recognize that data must be managed and governed as a valued enterprise asset.
Organizations are hiring Chief Data Officers at an increasing rate, and it appears that more money than ever is being allocated to manage the data environments within these organizations. Even so, there are still great opportunities for improvement, even in the best-managed data environments.
The staples of data management must still be addressed. Organizations that are governing their data must focus on how the data is defined, produced, and used, and they must address the complexities that are new and constantly evolving in order to get the most out of their data.
Heck, twenty years ago we were just getting started paying for the data we used. Now we pay for data in many more ways than one. It’s all in the data.