Tips to Create Effective Data Quality Rules

You know you can rule the quality of your data. So why don’t you? Read on to learn simple but effective tips on how to make superb data quality possible.

The bad news: low-quality data costs organizations an average of $15 million per year, according to Gartner’s Data Quality Market Survey.

The good news: your business can avoid such losses by improving the quality of its data. Here, we share five tips on how to do this by creating effective data quality rules.

What are data quality rules and why do you need them?

Data quality rules are the requirements that businesses set for their data. These requirements serve two interdependent objectives:

  1. To define the format the data should comply with and the dependencies that should exist among data elements.
  2. To serve as a reference against which a business can measure and check the quality of its data.

An example to make it clearer

Let’s say you are planning to launch an email marketing campaign. To optimize your efforts and costs, you decide to sift out invalid emails. To do that, you create a few simple rules:

  • An email must contain the ‘@’ sign.
  • ‘@’ must be used only once.
  • Apart from the ‘@’ sign, an email may consist only of letters, digits, and non-alphabetic characters, such as !, #, $, %, &, ', *, +, -, /, =, ?, ^, _, `, and {.
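The three rules above are simple enough to express in a few lines of code. The Python sketch below is one possible way to apply them, not a complete email validator. The function name `is_valid_email` and the sample addresses are made up for illustration, and the code additionally allows the dot (.), which the list above does not name explicitly; we assume it because every domain name contains one.

```python
import string

# Characters the third rule allows besides '@': ASCII letters, digits, and the
# special characters listed in the text. The dot is an ASSUMPTION added here,
# because every domain name contains one.
ALLOWED_CHARS = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{" + ".")


def is_valid_email(email: str) -> bool:
    """Return True if `email` passes the three simple rules from the example."""
    # Rules 1 and 2: the '@' sign must appear, and only once.
    if email.count("@") != 1:
        return False
    # Rule 3: every other character must come from the allowed set.
    return all(ch in ALLOWED_CHARS for ch in email.replace("@", ""))


# Made-up sample addresses for illustration.
emails = [
    "jane.doe@example.com",
    "jane.doe.example.com",  # no '@'
    "jane@doe@example.com",  # '@' used twice
    "jane doe@example.com",  # a space is not an allowed character
]
valid = [e for e in emails if is_valid_email(e)]
print(valid)  # ['jane.doe@example.com']
```

Because the rules are deliberately simple, this function would still accept a bare `@`; catching cases like that takes additional rules.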

By sending messages only to verified emails, a business can improve its communication with customers and better assess the campaign’s success in terms of open rates, click-through rates, and other metrics.

However, this set of email data quality rules is by no means exhaustive. For example, if you run targeted marketing campaigns, it makes sense to add one more rule, which will help you avoid an embarrassing ‘Dear N/A’ or ‘Dear …’ at the start of your message:

  • The ‘Customer’s full name’ field associated with an email must not be null.

If you also want to verify the spelling of the name to address the customer properly, you can add extra rules:

  • Customer’s full name must consist only of letters and the spaces that separate its parts; no other characters are allowed.
  • Only the first letter of the customer’s first name, middle name (if any), and surname must be capitalized.
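Here is a minimal Python sketch of the three customer-name rules (not null, letters only, capitalization). The function name `check_customer_name` is hypothetical, and treating blank values and the placeholder "N/A" as null is our assumption, based on the ‘Dear N/A’ example.

```python
from typing import List, Optional


def check_customer_name(full_name: Optional[str]) -> List[str]:
    """Return the customer-name rules that `full_name` violates (empty list = passes)."""
    # Not null: treating blank strings and the placeholder "N/A" as null is an
    # ASSUMPTION, so that neither 'Dear N/A' nor 'Dear ...' slips through.
    if full_name is None or full_name.strip() in ("", "N/A"):
        return ["name is missing"]

    violations = []
    # Letters only, plus the spaces that separate first, middle, and last names.
    if not all(ch.isalpha() or ch == " " for ch in full_name):
        violations.append("name contains characters other than letters and spaces")
    # Capitalization: each part starts with a capital letter; the rest is lower case.
    parts = full_name.split()
    if not all(part[0].isupper() and part[1:] == part[1:].lower() for part in parts):
        violations.append("only the first letter of each name part may be capitalized")
    return violations


for name in ["Anna Maria Smith", None, "anna SMITH", "Anna Sm1th"]:
    print(repr(name), check_customer_name(name))
```

Python’s `str.isalpha()` accepts letters from any alphabet, so a name like ‘José Núñez’ passes. As written, though, the capitalization rule would also flag legitimate names such as ‘McDonald’, which is worth keeping in mind when you formulate rules like this.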

Tips to create effective data quality rules

I hope the tips below will bring you one step closer to setting up an effective data quality management process.

1. Involve subject matter experts from various departments

Different departments have diverging priorities. To devise comprehensive data quality rules, you should identify all the relevant subject matter experts and carefully integrate their requirements. If you don’t look at your data from different departments’ perspectives, you may undermine all your data management efforts. For example, a bank’s Lending Department will consider the loan amount, loan expiry date, and monthly interest rate critical data. It will likely consider the prefix that goes with a customer’s name, or the branch a customer visited, not worth filling in. The bank’s Marketing Department, by contrast, won’t be much interested in loan-related information but will find customer data highly important for communicating efficiently and creating a single customer view.
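One way to make sure no department’s perspective gets lost is to merge every department’s requirements into a single map before you write the rules. The Python sketch below illustrates the idea with the bank example; the field names, the `DEPARTMENT_REQUIREMENTS` dictionary, and the helper functions are all hypothetical.

```python
from typing import Dict, Set

# HYPOTHETICAL fields each department of the bank considers mandatory.
DEPARTMENT_REQUIREMENTS: Dict[str, Set[str]] = {
    "Lending": {"loan_amount", "loan_expiry_date", "monthly_interest_rate"},
    "Marketing": {"customer_name", "name_prefix", "branch_visited"},
}


def merge_requirements(requirements: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map every required field to the set of departments that need it."""
    owners: Dict[str, Set[str]] = {}
    for department, fields in requirements.items():
        for field in fields:
            owners.setdefault(field, set()).add(department)
    return owners


def missing_fields(record: Dict[str, object], owners: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return required fields that are empty in `record`, with the departments affected."""
    return {field: depts for field, depts in owners.items() if record.get(field) in (None, "")}


owners = merge_requirements(DEPARTMENT_REQUIREMENTS)
# A loan record that would satisfy the Lending Department on its own.
loan_record = {
    "loan_amount": 25000,
    "loan_expiry_date": "2030-01-31",
    "monthly_interest_rate": 0.5,
    "customer_name": "Anna Smith",
}
# Flags 'name_prefix' and 'branch_visited' as missing for Marketing.
print(missing_fields(loan_record, owners))
```

Checked against the Lending Department’s requirements alone, this record would look complete; the merged map reveals the gaps that matter to Marketing.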

2. Be moderate with the number of rules

In the fight for improved data quality, it’s important not to get carried away and end up with too many rules. Every rule takes computing power and time to run, so an excessive number of rules can significantly reduce your system’s performance: checking a field against 100 rules means ten times as many checks as checking it against 10. So aim for a good balance between thoroughness and performance.

3. Favor a step-by-step approach

You don’t have to create rules that cover all your data, nor do you have to solve the issue in one go. Classify your data and identify the critical elements that require your immediate attention and maximum effort. For example, to run targeted marketing campaigns, the ‘Customer name’, ‘Date of birth’, and ‘Email’ fields are critical, while the home address can be considered good-to-know extra information. After you finish with the critical data, you can move on to second-priority data, and so on. In short, think of it as a relay race, not a sprint, and act accordingly.
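The step-by-step approach can be sketched as priority tiers: rules run only for fields up to the tier you are currently working on, and you widen the scope later. In this minimal Python sketch, the tier numbers, field names, and the single "not empty" rule per field are assumptions chosen for illustration.

```python
from typing import Callable, Dict, List

# HYPOTHETICAL priority tiers for a targeted marketing campaign:
# tier 1 holds critical fields, tier 2 holds good-to-know information.
FIELD_PRIORITY: Dict[str, int] = {
    "customer_name": 1,
    "date_of_birth": 1,
    "email": 1,
    "home_address": 2,
}


def not_empty(value: object) -> bool:
    """A simple completeness rule: the value must be present and non-blank."""
    return value not in (None, "")


# One illustrative rule per field; real rule sets would differ per field.
RULES: Dict[str, Callable[[object], bool]] = {field: not_empty for field in FIELD_PRIORITY}


def check_up_to_tier(record: Dict[str, object], max_tier: int) -> List[str]:
    """Apply rules only to fields whose tier is <= max_tier; return the failing fields."""
    return [
        field
        for field, tier in FIELD_PRIORITY.items()
        if tier <= max_tier and not RULES[field](record.get(field))
    ]


customer = {"customer_name": "Anna Smith", "date_of_birth": "1990-04-12", "email": "anna@example.com"}
print(check_up_to_tier(customer, max_tier=1))  # [] (critical data passes)
print(check_up_to_tier(customer, max_tier=2))  # ['home_address']
```

Moving to the next stage of the relay then means raising `max_tier` and adding rules for the newly covered fields, without touching the rules that already work.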

4. Treat each field of the database individually and create rules accordingly

There are multiple data quality characteristics, and your task is to decide which ones are most appropriate for a particular field. Take ‘Employee full name’ and ‘Employee contact number’ as an example. The first field contains critical information, while the second doesn’t. So ‘Employee full name’ should satisfy completeness, uniqueness, and accuracy requirements, while ‘Employee contact number’ should satisfy accuracy and orderliness requirements. These characteristics should be reflected in data quality rules, for example:

  • Employee full name must not be N/A (to ensure completeness).
  • Each ‘Social security number’ must correspond to only one ‘Employee full name’ (to ensure uniqueness).
  • Employee full name must include at least one space and otherwise consist only of letters; no digits or other characters are allowed (to ensure accuracy and completeness).
  • Employee contact number must contain only digits, apart from the ‘+’ sign, space, and hyphens required by its format (to ensure accuracy and orderliness).
  • Employee contact number must be in the format +1 NXX-NXX-XXXX, where N is a digit from 2 through 9 and X is a digit from 0 through 9 (to ensure accuracy and orderliness).
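The per-field rules above translate fairly directly into code. The Python sketch below is one possible implementation: the record keys (`ssn`, `full_name`, `contact_number`), the function names, and the sample employees are hypothetical. Per-record rules and the cross-record uniqueness rule are kept in separate functions, because uniqueness can only be checked across the whole data set.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# +1 NXX-NXX-XXXX, where N is a digit from 2 through 9 and X is any digit.
PHONE_FORMAT = re.compile(r"\+1 [2-9][0-9]{2}-[2-9][0-9]{2}-[0-9]{4}")


def field_violations(employee: Dict[str, str]) -> List[str]:
    """Check one HYPOTHETICAL employee record against the per-field rules."""
    violations = []
    name = employee.get("full_name") or ""
    # Completeness: the name must be present and must not be N/A.
    if name.strip() in ("", "N/A"):
        violations.append("full_name: missing")
    # Accuracy: at least one space, otherwise only letters.
    elif " " not in name or not all(ch.isalpha() or ch == " " for ch in name):
        violations.append("full_name: must be letters with at least one space")
    # Accuracy and orderliness: digits only, laid out as +1 NXX-NXX-XXXX.
    if not PHONE_FORMAT.fullmatch(employee.get("contact_number") or ""):
        violations.append("contact_number: not in +1 NXX-NXX-XXXX format")
    return violations


def ssn_conflicts(employees: List[Dict[str, str]]) -> List[str]:
    """Uniqueness: return SSNs that correspond to more than one full name."""
    names_by_ssn: Dict[str, Set[str]] = defaultdict(set)
    for employee in employees:
        names_by_ssn[employee["ssn"]].add(employee.get("full_name") or "")
    return [ssn for ssn, names in names_by_ssn.items() if len(names) > 1]


# Made-up sample records for illustration.
employees = [
    {"ssn": "123-45-6789", "full_name": "Anna Smith", "contact_number": "+1 212-555-0147"},
    {"ssn": "123-45-6789", "full_name": "John Brown", "contact_number": "+1 112-555-0147"},
    {"ssn": "987-65-4321", "full_name": "N/A", "contact_number": "2125550147"},
]
for employee in employees:
    print(employee["full_name"], field_violations(employee))
print(ssn_conflicts(employees))  # ['123-45-6789']
```

Note that the phone pattern uses `[0-9]` rather than `\d`, because in Python `\d` also matches non-ASCII digits in string patterns.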

5. Decide between centralized and local storage for your data quality rules

If you are a large business with multiple business lines, you should decide whether to store the rules centrally or locally with each business unit. Each approach has its advantages and disadvantages. For example, with complete centralization, you’ll have a standard approach to capturing a customer’s name no matter which of your businesses they interacted with. With decentralized data management, you get more flexibility, as each unit focuses only on the data rules relevant to its own business line.
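To make the trade-off more concrete, here is a minimal Python sketch that stores the same two rules either in one central registry or in per-unit registries. The unit names (`lending`, `marketing`), the rule functions, and the registry layout are hypothetical and only illustrate the two options described above.

```python
from typing import Callable, Dict

Rule = Callable[[object], bool]


def is_full_name(value: object) -> bool:
    """Letters-only name with at least two parts, e.g. 'Anna Smith'."""
    return isinstance(value, str) and len(value.split()) >= 2 and value.replace(" ", "").isalpha()


def is_positive_amount(value: object) -> bool:
    """A loan amount must be a number greater than zero."""
    return isinstance(value, (int, float)) and value > 0


# Centralized: a single registry that every business unit validates against,
# so a customer's name is captured the same way everywhere.
CENTRAL_RULES: Dict[str, Rule] = {"customer_name": is_full_name, "loan_amount": is_positive_amount}

# Local: each HYPOTHETICAL unit keeps only the rules relevant to its business line.
LOCAL_RULES: Dict[str, Dict[str, Rule]] = {
    "lending": {"loan_amount": is_positive_amount},
    "marketing": {"customer_name": is_full_name},
}


def validate(record: Dict[str, object], rules: Dict[str, Rule]) -> Dict[str, bool]:
    """Apply each rule to its field and report pass/fail per field."""
    return {field: rule(record.get(field)) for field, rule in rules.items()}


record = {"customer_name": "Anna", "loan_amount": 25000}
print(validate(record, CENTRAL_RULES))           # {'customer_name': False, 'loan_amount': True}
print(validate(record, LOCAL_RULES["lending"]))  # {'loan_amount': True}
```

Under central storage, the incomplete name is caught no matter which unit entered the record; under local storage, the lending unit’s leaner rule set lets it through, which is the price of its flexibility.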

Let’s quickly recap

If you decide to manage the quality of your data, you’ll inevitably have to deal with data quality rules. Here, we shared five ideas on how to make these rules effective:

  1. Consider the input from different departments.
  2. Don’t create too many rules, or you may hurt your system’s performance.
  3. You don’t have to do everything in one go.
  4. Choose individual data quality characteristics for each field of the database.
  5. Make a centralized vs. local storage decision for data quality rules.

I hope these tips will prove helpful on your path to effective data quality management.


About Irene Makaranka

Irene Makaranka is a Data Analytics Researcher at ScienceSoft, an IT consulting and software development company headquartered in McKinney, Texas. With a focus on business intelligence, big data and data science, Irene explores trends, technologies, key challenges and solutions in the world of data analytics.
