Data Quality: what it is, why adopt it and how to apply it


According to the Global Data Management Community (DAMA), Data Quality consists of “the planning, implementation and control of the activities that apply quality management techniques to data, in order to assure it is fit for consumption and meets the needs of data consumers”.

But why should a company adopt a Data Quality system?

According to Gartner, poor data quality destroys business value: “Recent research shows organizations estimate the average cost of poor data quality at $10.8 million per annum. This number is likely to rise as business environments become increasingly digitalized and complex” (Gartner, “5 Steps to Build a Business Case for Continuous Data Quality Assurance”, 20 April 2020, Saul Judah, Alan D. Duncan, Melody Chien, Ted Friedman).

Data are at the base of every business process. That is why the quality of the data collected, stored and used inevitably impacts the business of today and tomorrow. In other words, poor data quality destroys business value, because data leads to the information that constitutes business knowledge and insights; these, in turn, bring competitive advantages and secure market positioning. Data can be compared to the foundation of a building: only if it is solid can we expect the building to withstand even earthquakes.

Dates of birth in 2190, sequences of identical VAT numbers, addresses with only the street name indicated – these are but a few of the anomalies found in companies’ databases. A wrong address can result in failing to contact an existing or potential customer and thus generate a loss. If incorrect data is used to determine a risk profile, yet other consequences follow. Even more dangerous is creating reports for the company’s management from incorrect data: these may lead to “distorted” strategic decisions and impact the organization’s financial performance.
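Checks for anomalies of this kind can be expressed in a few lines of code. The sketch below is purely illustrative (the record fields and rules are assumptions, not part of any specific product):

```python
from datetime import date

# Hypothetical customer records; field names are illustrative.
records = [
    {"vat": "IT00000000000", "birth_date": date(2190, 1, 1), "address": "Via Roma"},
    {"vat": "IT12345678901", "birth_date": date(1980, 5, 2), "address": "Via Roma 10, Torino"},
]

def find_anomalies(rec):
    """Return human-readable descriptions of the anomalies found in a record."""
    issues = []
    if rec["birth_date"] > date.today():
        issues.append("birth date in the future")
    if len(set(rec["vat"][2:])) == 1:  # e.g. IT00000000000: all digits identical
        issues.append("VAT number is a repeated digit")
    if not any(ch.isdigit() for ch in rec["address"]):
        issues.append("address lacks a house number")
    return issues

for rec in records:
    print(rec["vat"], find_anomalies(rec))
```

The first record trips all three checks; the second passes them all.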

Besides, all this erodes employees’ trust in the data, undermining its credibility and use.

To be competitive, it is essential to build a data quality verification system that ensures data reliability for the expected business use while respecting process cut-off times. It should also provide well-constructed diagnostics and remove structural anomalies.

What are the application modes and the major control criteria?

Creating a Data Quality system is a long-term task. If we went into the detail of every phase, we would end up writing a book, not an article. However, we can give a summary of some key steps:  

  • defining a company policy with the “rules of the game” for all the actors involved;
  • setting up a driving environment to identify the data present in the different process phases; these data undergo appropriate transformation and control phases using a system of rules, including rules expressed in natural language (technical rules, such as data format verification; business rules, such as checking that a repaid loan has a zero balance; or reconciliation rules, applied after the data to be compared have been appropriately normalized);
  • maintaining the system in full operation and monitoring data quality trends using a set of indicators;
  • taking action, where needed, to remove the detected anomalies and improve the structure;
  • expanding towards new destinations of use.
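The mix of technical and business rules mentioned above can be sketched as a minimal rule system. Rule names, record fields and the VAT format are assumptions made for illustration only:

```python
import re

# Technical rule: does the value respect the expected syntax?
technical_rules = {
    "vat_format": lambda r: bool(re.fullmatch(r"IT\d{11}", r["vat"])),
}

# Business rule: a repaid loan must have a zero balance.
business_rules = {
    "repaid_loan_zero_balance": lambda r: r["status"] != "repaid" or r["balance"] == 0,
}

def run_controls(record):
    """Return the names of the rules the record violates."""
    all_rules = {**technical_rules, **business_rules}
    return [name for name, check in all_rules.items() if not check(record)]

loan = {"vat": "IT12345678901", "status": "repaid", "balance": 150.0}
print(run_controls(loan))  # → ['repaid_loan_zero_balance']
```

Keeping each rule as a named, independent predicate makes it easy to report which control an anomalous record failed.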

The most time-consuming part is probably the definition of control systems. They verify whether the data complies with certain criteria, produce results and make it possible to intercept anomalous data. In 2013, DAMA UK (DAMA-DMBOK, Chapter 13) identified six dimensions towards which technical and business controls should converge:

  • Completeness: the percentage of stored data with respect to the potential 100%;
  • Uniqueness: no instance (thing) of an entity should be registered more than once based on how it is identified;
  • Timeliness: the level to which the data represent the reality at the requested time;
  • Validity: the data are valid if compliant with the syntax (format, type, range) of their definition;
  • Accuracy: the level at which the data correctly describe the corresponding “real world” object or event;
  • Consistency: the absence of difference when comparing two or more representations of a “thing” against a definition.

Naturally, the above criteria are only a selection of a broader set of Data Quality criteria known in literature.
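Three of the six dimensions lend themselves to simple ratio measures. The following is a toy sketch on an invented dataset (field names and the email pattern are illustrative assumptions):

```python
import re

# Toy dataset: one missing email, one duplicate id, one malformed email.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "b@example.com"},  # duplicate id
    {"id": 3, "email": "not-an-email"},
]

def completeness(rows, field):
    """Share of rows where the field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)

def uniqueness(rows, key):
    """Share of rows whose key value occurs exactly once."""
    values = [r[key] for r in rows]
    return sum(values.count(v) == 1 for v in values) / len(rows)

def validity(rows, field, pattern):
    """Share of populated values matching the expected syntax."""
    populated = [r[field] for r in rows if r[field] is not None]
    return sum(bool(re.fullmatch(pattern, v)) for v in populated) / len(populated)

print(completeness(rows, "email"))  # 0.75
print(uniqueness(rows, "id"))       # 0.5
print(validity(rows, "email", r"[^@\s]+@[^@\s]+\.\w+"))  # 2 of 3 populated values
```

Timeliness, accuracy and consistency generally require an external reference (a clock, a “real world” source, a second representation) and cannot be measured from a single table alone.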

How to structure Data Quality metrics and indicators?

A Data Quality system cannot operate correctly, let alone improve its performance, without a set of available measures. You cannot improve what you cannot measure. A metric system must reflect the basic information needs:

  • it is good practice to select a few key measures and focus reporting activities on them. If it is true that “what cannot be measured cannot be managed”, it is also true that “measurements cost”;
  • in general, metrics should be as systematically unified as possible, that is, cohesive and connected to one another via a logical scheme. Always look for consistency in metrics’ terms and definitions;
  • the system must be balanced. That is, it must include various types and perspectives, weighted for representativeness;
  • it is a good practice to subdivide the metrics into groups and related types;
  • the purpose of metrics in a Data Quality system is not to measure people’s productivity or to stimulate competition between people/offices, but to measure the quality of the product (the data) and of the processes. For example: instead of measuring the number of validated streams per day, it is better to measure the number of streams without errors. Measuring people’s performance can be tempting, yet it is one of the most damaging things for a quality initiative. In the end, the only acceptable performance measurement is at working-group level;
  • a metric should always be validated empirically in a variety of contexts before being published.
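The product-oriented metric suggested above (streams without errors, rather than streams validated per person) might look like this; the stream names and counts are invented for the example:

```python
# Sketch: measure the quality of the product (the data streams),
# not the productivity of the people who validate them.
daily_streams = [
    {"name": "customers", "errors": 0},
    {"name": "loans", "errors": 3},
    {"name": "payments", "errors": 0},
]

def error_free_ratio(streams):
    """Fraction of validated streams that contained no errors."""
    clean = sum(s["errors"] == 0 for s in streams)
    return clean / len(streams)

print(f"{error_free_ratio(daily_streams):.0%}")  # prints 67%
```

Tracking this ratio over time gives a trend indicator for the quality system itself, independent of who ran the validations.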

Why is Irion EDM a valuable technological support for an effective Data Quality system?

The number of data scopes to control, and the number of controls to manage, qualify, measure and periodically run, continues to grow. It is therefore vital for a quality system to rely on technological tools that automate the most challenging phases, such as the periodic execution of controls, the evaluation of quality metrics and the production of reports.

Irion has carried out many projects in this field and has developed a platform that reduces preparation time and accelerates the application of control procedures, in full compliance with company policies. Examples?

  • powerful control engines that perform 2.5 million controls per minute, verifying over 60 million records;
  • a flexible and collaborative Data Quality Governance system that supports the interaction between different data specialists;
  • an effective system to manage poor data quality issues and remediation;
  • a module that allows already tested metrics to be adopted, or any type of indicator to be defined, calculated and analyzed on any type of business process;
  • automations that generate technical rules based on metadata in a smart way and in a few seconds.

Related links

Want to know more?

Delve deeper into the subject:

Data Governance
Get to know and govern the business data assets in order to sustain the Enterprise Value

by Mauro Tuvo

Data Governance
Data Lineage Scope

by Mauro Tuvo

Irion EDM In-Depth
Reducing time and costs in Enterprise Data Management projects

by Giovanni Scavino