What is Augmented Data Quality and how does it work?

What is Augmented Data Quality?

Augmented Data Quality means applying advanced functions to automate some Data Quality (DQ) processes with the help of “active metadata” and technologies such as Artificial Intelligence and Machine Learning.

Many DQ tasks can be automated: for example, profiling, data matching, automatic linking between entities, merging, cleansing, monitoring, automatic alignment between business and IT control rules, and troubleshooting anomalies or poor-quality warnings. Governing data means creating and maintaining the conditions that allow an organization to have the necessary data when needed, to ensure they are complete and accurate, and thus to maximize the benefits of their use. But what if the information is unreliable? If the data are erroneous, what are the consequences for decision-making?
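As a rough, hypothetical illustration of what automated profiling can look like, here is a minimal sketch in Python using pandas; the metrics and the profile function are our own illustrative choices, not the method of any specific product:

```python
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Compute basic per-column quality metrics a DQ tool could monitor automatically."""
    return pd.DataFrame({
        "completeness": 1 - df.isna().mean(),  # share of non-null values per column
        "uniqueness": df.nunique() / len(df),  # distinct values over row count
    })

customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "email": ["a@x.com", None, "c@x.com", "c@x.com"],
})
print(profile(customers))  # low completeness/uniqueness on "email" hints at quality issues
```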

Augmented Data Quality aims to ensure reliable, high-quality data, which is vital to organizations. It also aims to cut down on manual tasks in DQ practices, reducing human intervention in favor of automated workflows and, consequently, saving time and resources.

Augmented Data Quality: AI & ML-based rule recommendation

Talk by Gabriele Seno, Irion Product Evangelism & Advisory Leader, at the DataManagement4Value webinar organized by Politecnico di Milano,
May 13, 2021

How does it work?

Information has always been fundamental for businesses. But while data value increasingly becomes a factor of competitive advantage, the exponential growth of available data makes it difficult to identify the data that are useful at a given time and for a certain purpose. It also becomes challenging to understand where data originate and who is accountable for them, to verify their reliability and freshness, and to find out which regulatory requirements they must comply with.

In this ecosystem, according to Gartner, it is possible to enable Augmented Data Quality in three specific “areas”: 

  • Discovery. These features exploit the potential of active metadata and reference data in distributed environments with a large number of internal and external data assets: in the cloud (or, if necessary, multi-cloud) or on premises. They include techniques to find where data reside, to classify, for instance, sensitive data for privacy purposes (automatically detecting them from their own characteristics, as in the first sketch after this list), or to reveal correlations between data residing in different sources.
  • Suggestion. Starting from metadata, it is possible to profile business terms so as to suggest automatic enrichment of the Data Catalog, propose connecting certain attributes to a specific entity, suggest remediation actions for detected anomalies (learning from user behavior), identify potential lineage connections between business process entities, propose data control rules, or exploit information that can be deduced from application logs.
  • Automation. Many common practices can be automated. These include correcting anomalies above a certain confidence threshold (as in the second sketch after this list) and applying rules to certain data types, such as data that are sensitive for privacy purposes. For example, a verbalization engine can drastically reduce the time needed to write up-to-date documentation on rules and control procedures, while ensuring consistency between business and technical rules in the event of an inspection.
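To make the Discovery point more concrete, here is a minimal sketch of how a column of sensitive data could be detected from its own characteristics; the regular expression, the 0.8 threshold, and the classify_column helper are simplified assumptions (real tools combine many classifiers with active metadata):

```python
import re

EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$")

def classify_column(values: list[str]) -> str:
    """Flag a column as sensitive if most of its non-empty values look like emails."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "unknown"
    hits = sum(bool(EMAIL_RE.match(v)) for v in non_empty)
    return "sensitive/email" if hits / len(non_empty) > 0.8 else "generic"

print(classify_column(["a@x.com", "b@y.org", "c@z.net"]))  # -> sensitive/email
```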
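Similarly, for the Automation point, applying fixes only above a certain confidence threshold could be triaged as in the following sketch; the Suggestion structure and the 0.9 threshold are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    record_id: int
    field: str
    proposed_value: str
    confidence: float  # e.g. produced by an ML model

CONFIDENCE_THRESHOLD = 0.9  # illustrative value; would be tuned per organization

def triage(suggestions: list[Suggestion]) -> tuple[list[Suggestion], list[Suggestion]]:
    """Apply high-confidence fixes automatically; route the rest to a human steward."""
    auto = [s for s in suggestions if s.confidence >= CONFIDENCE_THRESHOLD]
    review = [s for s in suggestions if s.confidence < CONFIDENCE_THRESHOLD]
    return auto, review

fixes, queue = triage([
    Suggestion(1, "email", "a@x.com", 0.97),  # applied automatically
    Suggestion(2, "email", "b@y.org", 0.55),  # sent to human review
])
```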

But let us go into further detail with a practical example. As Gartner shows in the diagram below, in a typical DQ process it is possible to identify, in every phase, some “actions” that can be automated.

[Figure: Typical Data Quality cycle with Augmented Data Quality capabilities]

All these considerations show that an effective, advanced DQ tool, capable of verifying through controls that data comply with technical and business requirements, should be complemented with a Data Governance tool, i.e. a metadata management system. The latter manages the “identity card” of the company’s data: all the business entities (semantics, ownership, impacted processes, quality and retention rules, etc.) and technical entities (formats, originating applications, physical controls, etc.) that characterize them, their attributes, and their mutual relations. These two components are closely interconnected, just as Data Quality and Data Governance are two inseparable disciplines that support each other. The Irion EDM platform offers the tools for both Data Quality and Data Governance, integrated in a single environment. It ensures flexibility and scalability, adapts to the context, and is future-proof.
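As a sketch of what such an “identity card” might contain, the structure below pairs business and technical metadata for a data asset; the field names are our own illustrative choices, not the Irion EDM schema:

```python
from dataclasses import dataclass, field

@dataclass
class DataAssetCard:
    """Illustrative "identity card" pairing business and technical metadata."""
    name: str
    # Business entities
    semantics: str                        # business definition of the asset
    owner: str                            # accountable business owner
    quality_rules: list[str] = field(default_factory=list)
    retention_policy: str | None = None
    # Technical entities
    physical_format: str | None = None    # e.g. "relational table", "parquet"
    source_application: str | None = None
    physical_controls: list[str] = field(default_factory=list)

card = DataAssetCard(
    name="customer_master",
    semantics="Golden record of active customers",
    owner="CRM business unit",
    quality_rules=["email must be valid", "customer_id must be unique"],
    physical_format="relational table",
    source_application="CRM",
)
```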

Want to know more?

We will provide you with illustrative examples of how other organizations have already started their transformation.

CONTACT US
