Data collection is getting more dispersed and voluminous every day. Corporate entities are constantly merging or splitting, and new lines of business are created in order to meet changing business challenges.
Enterprises create and collect information from a variety of data sources, which may include websites, mobile devices, customers, vendors, and numerous other sources. These data points are indispensable for creating a complete, detailed view of the business.
Analysis of the collected data could expose hidden opportunities for competitive advantage, or pending trouble that will grow if not addressed quickly. Data is only valuable if you have a way to make sense of it. Organizations should understand where data is located, what it means, how it moves, when it is used, and who uses it. It is important to address these data management functions as part of an overall data strategy.
What is the Data Called?
When communicating between parties it is important to speak the same language or have a way to translate. Business has its own lexicon while technology has another. Let’s talk about names for a minute. Creating a name is fundamental to being able to find or locate data. From a business perspective, a name is a business term. Business should agree on a common name for data elements that are important to the business. For example, birth date, birthday, and date of birth all refer to the same idea. Agreeing on one term makes connecting data simpler. The same could be said for information technology. Technology has some constraints imposed by software. Data storage software may, for example, limit names to a maximum of 30 characters. It is common to abbreviate words to accommodate the limitation. Just like business terms, a common abbreviated name helps simplify connections. For example, customer could be abbreviated as CUST or CSTMR. Searching for the location of an instance of customer is easier if the abbreviation is standardized.
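As a rough illustration of a standardized abbreviation list, the Python sketch below turns a business term into a physical name and checks it against the 30-character example limit. The abbreviation table and the function name physical_name are assumptions made up for this example; a real list would come from the organization's own naming standard.

```python
# Hypothetical abbreviation standard; only CUST comes from the text above.
STANDARD_ABBREVIATIONS = {
    "customer": "CUST",
    "birth": "BRTH",
    "date": "DT",
    "number": "NBR",
}

MAX_NAME_LENGTH = 30  # the example storage limit discussed above


def physical_name(business_term: str) -> str:
    """Turn a business term into a standardized physical name."""
    words = business_term.lower().split()
    # Words without a standard abbreviation are kept whole, in upper case.
    parts = [STANDARD_ABBREVIATIONS.get(word, word.upper()) for word in words]
    name = "_".join(parts)
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"{name!r} exceeds {MAX_NAME_LENGTH} characters")
    return name


print(physical_name("customer birth date"))  # CUST_BRTH_DT
```

Because every system derives names from the same table, a search for CUST finds every instance of customer, which is the point of standardizing the abbreviation.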
How Do You Describe Data?
A consistent name will help find data elements, but it does not help with understanding. The glue that binds names is meaning, and meaning comes from definitions. Clear, consistent definitions go a long way in establishing a common understanding. What is the difference between a customer, client, and consumer? It all depends on the meaning. Names that differ but have the same meaning are called synonyms. Synonyms can cause a great deal of confusion. Defining things helps root out the synonyms or clarify the nuances that make the names different. The opposite problem is homographs, often called homonyms. Homographs are words that are spelled the same but have different meanings. The difference in meaning is usually based on a difference in context.
Context is everything when it comes to definitions. Here is a simple example: the business term ‘order’ could be defined as “a confirmed request by one party to another to acquire goods or services.” From the Sales perspective, it means ‘revenue’ but from the Purchasing perspective it means ‘expense’. The two meanings are not just inconsistent; they are opposites. This is when a Business Glossary comes in handy. The Business Glossary helps with meaning but also provides some structure. An Order could be classified as a Sales Order or a Purchase Order with references to related terms like Contract or Agreement. This structure is known as a taxonomy.
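A minimal Python sketch of how a glossary might resolve synonyms to one agreed term and hold a simple taxonomy. The Term and Glossary classes, and the definitions given for Sales Order, Purchase Order, and Birth Date, are hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    name: str
    definition: str
    parent: Optional[str] = None  # broader term in the taxonomy
    synonyms: list[str] = field(default_factory=list)


class Glossary:
    def __init__(self) -> None:
        self._terms: dict[str, Term] = {}
        self._aliases: dict[str, str] = {}

    def add(self, term: Term) -> None:
        key = term.name.lower()
        self._terms[key] = term
        for synonym in term.synonyms:
            self._aliases[synonym.lower()] = key

    def lookup(self, name: str) -> Term:
        """Resolve a name or any of its synonyms to the agreed term."""
        key = name.lower()
        return self._terms[self._aliases.get(key, key)]

    def children(self, name: str) -> list[str]:
        """List the narrower terms classified under the given term."""
        wanted = name.lower()
        return sorted(
            t.name for t in self._terms.values()
            if t.parent is not None and t.parent.lower() == wanted
        )


glossary = Glossary()
glossary.add(Term("Order", "A confirmed request by one party to another "
                           "to acquire goods or services."))
# Illustrative definitions, not taken from any real glossary.
glossary.add(Term("Sales Order", "An order received from a customer; revenue.",
                  parent="Order"))
glossary.add(Term("Purchase Order", "An order placed with a vendor; expense.",
                  parent="Order"))
glossary.add(Term("Birth Date", "The date on which a person was born.",
                  synonyms=["birthday", "date of birth"]))

print(glossary.lookup("Date of Birth").name)  # Birth Date
print(glossary.children("Order"))  # ['Purchase Order', 'Sales Order']
```

Splitting Order into two child terms lets each context keep its own definition, so the Sales and Purchasing meanings no longer collide under one name.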
A collection of technical data descriptions is known as a data dictionary. The dictionary is oriented towards the structure of the data. It answers questions like “how many numbers or letters will fit in this field” or “what word or number choices are allowed in this field”. For example, an ‘order status’ could have choices like back ordered, pending approval, or delivered. A good set of standards can help in moving data around later.
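To make the idea concrete, here is a small Python sketch of a data dictionary used to check values. The field names, the length limits, and the validate function are assumptions for illustration; only the three order status choices come from the example above.

```python
# Hypothetical data dictionary: one entry per field, describing its structure.
DATA_DICTIONARY = {
    "order_status": {
        "max_length": 20,
        "allowed_values": {"back ordered", "pending approval", "delivered"},
    },
    # None means any value is allowed, subject to the length limit.
    "customer_name": {"max_length": 50, "allowed_values": None},
}


def validate(field_name: str, value: str) -> list[str]:
    """Return a list of problems; an empty list means the value conforms."""
    spec = DATA_DICTIONARY[field_name]
    problems = []
    if len(value) > spec["max_length"]:
        problems.append(
            f"{field_name}: longer than {spec['max_length']} characters"
        )
    allowed = spec["allowed_values"]
    if allowed is not None and value not in allowed:
        problems.append(f"{field_name}: {value!r} is not an allowed value")
    return problems


print(validate("order_status", "delivered"))  # []
print(validate("order_status", "shipped"))
```

The second call reports that 'shipped' is not an allowed value, which is the kind of check that becomes useful when data is moved between systems later.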
How Can Data Be Used?
Business rules determine who can use data, where, and how. Business is governed by a wide variety of rules. Business rules can be external, like government legislation, or internal, like best practices. Rules are especially important when it comes to identifying and securing private and personal data. Most businesses want to reduce their exposure to risk while maximizing revenue and minimizing expenses. Control over who has access to data, where it is stored, and how it should be distributed helps reduce the risk.
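One way such a rule might look once it is written down for software is sketched below in Python. The classifications, role names, and the can_read function are all hypothetical; real rules would come from legislation and internal policy.

```python
# Hypothetical classification of data elements.
CLASSIFICATION = {
    "birth_date": "personal",
    "order_status": "internal",
}

# Hypothetical roles and the classes of data each may read.
ROLE_PERMISSIONS = {
    "marketing_analyst": {"internal"},
    "hr_specialist": {"internal", "personal"},
}


def can_read(role: str, element: str) -> bool:
    """Apply the access rule: a role may read only the classes granted to it."""
    # Unclassified elements are treated as personal, the most restricted class.
    classification = CLASSIFICATION.get(element, "personal")
    return classification in ROLE_PERMISSIONS.get(role, set())


print(can_read("hr_specialist", "birth_date"))      # True
print(can_read("marketing_analyst", "birth_date"))  # False
```

Defaulting unknown elements and unknown roles to no access is a deliberate choice in this sketch, since it keeps the risk exposure low when the classification is incomplete.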
How Does Data Get Created?
I previously mentioned that governance should be applied to who has access to data. Control should also be exercised over the creation of data. The proliferation of duplicate data in various areas of the business adds to the risk exposure and decreases trust in data accuracy. Data about business subjects should be created and managed closest to where it originates.
For example, data about customers should be created and managed in the system where the customer first becomes known to the enterprise. Usually this is a Customer Relationship Management (CRM) system. All other systems that need customer information, such as Billing or Marketing, can access it as read only. The system that creates and manages the data is sometimes known as the System of Record (SOR). The SOR becomes the trusted source for data relating to that business subject.
How Does Data Move?
Capturing data in the SOR gives us data points in time, that is, an accumulation of business facts that tell you what happened on a certain business day. Business wants to monitor how things are happening over time. In addition, they want to augment SOR data with external data to see trends and make predictions. Data moves from where it is captured, the source, to where it can be analyzed.
Even the best data governance program cannot account for data created outside the enterprise’s control. In these instances, data needs to be examined against quality standards. Quality standards might refer to the actual data or the structure of the data. If the standard is not met, the data needs to be modified to conform to the standard.
This is a process commonly known as Extract, Transform, and Load (ETL). Extract is the process of copying relevant data from the source or sources. Transform is the process that modifies the data. Load is the process of inserting the conforming data into the target data storage. The ETL process is a technical process carried out by software. The transformation process is governed by rules and standards.
The business rules, defined in the language of the business, are codified into software instructions that act upon the data or the structure of data, also known as metadata. A common example would be the transformation of dates. Businesses have many occurrences of dates like hire date, last update date, birthdate, and so on. Businesses in the United States format dates as month/day/year while Europe uses the more logical day/month/year format. Imagine the confusion when confronted with the date 4/3/19. It could be April third or March fourth. Once transformation is complete the data is suitable to be used by the business for analysis. Moving data for analysis is just one example of why data moves.
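The date example can be sketched as a tiny ETL step in Python. The source names, the hire_date field, and the choice of ISO 8601 (YYYY-MM-DD) as the target standard are assumptions for illustration, not a prescribed implementation.

```python
from datetime import datetime

# Hypothetical sources and the date convention each one uses.
SOURCE_DATE_FORMATS = {
    "us_hr": "%m/%d/%y",  # month/day/year
    "eu_hr": "%d/%m/%y",  # day/month/year
}


def transform(row: dict, source: str) -> dict:
    """Rewrite hire_date in the unambiguous ISO 8601 form."""
    parsed = datetime.strptime(row["hire_date"], SOURCE_DATE_FORMATS[source])
    return {**row, "hire_date": parsed.date().isoformat()}


def run_etl(sources: dict) -> list:
    """Extract rows from each source, transform them, load them into a list."""
    target = []
    for source, rows in sources.items():           # extract
        for row in rows:
            target.append(transform(row, source))  # transform, then load
    return target


loaded = run_etl({
    "us_hr": [{"employee": "A", "hire_date": "4/3/19"}],
    "eu_hr": [{"employee": "B", "hire_date": "4/3/19"}],
})
print(loaded[0]["hire_date"])  # 2019-04-03
print(loaded[1]["hire_date"])  # 2019-03-04
```

The same text, 4/3/19, lands in the target as two different dates because the rule applied depends on the source, which is exactly the knowledge the transformation rules have to capture.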
Data can migrate from one technology platform to another. Technology is always changing. At one time data was stored in a single place called the mainframe. Technology advanced and networks became prevalent, so data was stored in many places. Now the internet has advanced to the point where data can be stored in virtual cloud locations.
Managing the Complexities
We just examined the various data management contexts. It is evident that data touches a lot of areas, which creates a rather complex web of interconnections and extra-connections. How can we identify which process creates data, what the data is called, how it is described, where it is stored, who uses it, and what rules govern its use? The simple answer is software tools. What kind of tool is best suited to this task? That is the more difficult question. Let’s look at three possibilities.
Data modeling tools have been around for a long time and for good reason. Data modeling tools excel at graphically depicting data structures and the rules that connect them. They can collect business definitions as well as build data dictionaries. Certain data modeling tools can create maps between data fields. Data modeling tools are not so good at documenting the business processes that create, read, update, or delete (CRUD) data.
Enterprise Architecture (EA) tools take a broad look at the big picture. EA tools can connect the people with the process to the software systems that capture, manipulate, and store data. They can show data elements that appear on reports, computer screens, and web forms. EA tools are not as good as data modeling tools at capturing data structure and the physical storage of data.
Data Governance/Cataloging tools are metadata connectors. DG tools can create the connections between the data dictionary and the business glossary. They are very good at documenting the business rules as well as the transformation rules. They can show who is responsible for what data. They fall short when it comes to showing the CRUD processes.
There is one thing they all have in common, and that is the ability to generate an impact and lineage analysis view. Data lineage and impact analysis track the life cycle of data as it is taken in, processed, and output by the system. This functionality provides visibility into the data flow and simplifies tracing errors back to their sources or down to their targets.
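Conceptually, lineage and impact analysis are two walks over the same graph of data flows: one upstream, one downstream. The Python sketch below uses a made-up set of flows and function names to show the idea; it is not how any particular tool implements it.

```python
from collections import deque

# Hypothetical flows: each key feeds the data stores listed as its value.
FLOWS = {
    "web_form": ["crm"],
    "crm": ["billing", "warehouse"],
    "billing": ["warehouse"],
    "warehouse": ["sales_report"],
    "sales_report": [],
}


def reachable(start: str, edges: dict) -> set:
    """Breadth-first walk returning every node reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def impact(node: str) -> set:
    """Downstream targets affected by a change to node."""
    return reachable(node, FLOWS)


def lineage(node: str) -> set:
    """Upstream sources that feed node, found by reversing the flows."""
    reverse = {}
    for source, targets in FLOWS.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return reachable(node, reverse)


print(sorted(impact("crm")))         # ['billing', 'sales_report', 'warehouse']
print(sorted(lineage("warehouse")))  # ['billing', 'crm', 'web_form']
```

An error found in sales_report can be traced back with lineage, while a planned change to crm can be assessed with impact before it is made.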
Summary
Data creation and collection are not going to slow down anytime soon. Aligning data connections within the organization is going to be a daunting task. Data tools provide a method to understand where data is located, what it means, how it moves, and where it is used. Data is only valuable if you have a way to make sense of it.