What does metadata-driven mean in practice?

For many of us, the word “metadata” evokes something abstract, diaphanous, evanescent. Yet we know that metadata, properly managed, is a powerful tool for making things happen. Turning the concept into something concrete is the basis of a metadata-driven operating model. Applied to data management disciplines, a metadata-driven approach means steering the data management system through a governance model made up of entities, attributes, relationships, rules and other metadata.


How does it work in practice? Take data quality. We can implement a control system by writing a series of programs, each of which performs one or more data tests. Because no descriptive information accompanies the checks, the only way to understand which checks have been applied to which data, and what they do, is to read each control’s code or the related documentation. In addition, each control may be implemented in a different programming style, with its own method for detecting and recording results. In a metadata-driven approach, the governing and executing components are decoupled. The metadata system describes the characteristics and execution rules of the controls: which data is involved, the formula or algorithm to be run, the quality criteria to be verified, the frequency and mode of execution, how outcomes are collected and recorded, and so on.


Based on some of this metadata, a data quality engine performs the checks; other metadata is used by a system that measures data quality indicators. This brings significant benefits: controls are standardized, because they are all performed by one general-purpose engine working from a single set of metadata, and the control system as a whole becomes easier to govern.
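
As a rough illustration of this decoupling, the sketch below keeps the description of each control in a metadata record and lets one generic function execute all of them. Everything in it is an assumption made for the example: the `ControlMetadata` fields, the `CHECK_TYPES` registry, the control identifiers and the sample rows do not come from any specific product.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical registry of reusable test types (names are illustrative).
CHECK_TYPES: dict[str, Callable[[Any, dict], bool]] = {
    "not_null": lambda value, params: value is not None and value != "",
    "in_range": lambda value, params: value is not None
    and params["min"] <= value <= params["max"],
}


@dataclass
class ControlMetadata:
    """Governing side: describes a control without implementing it."""
    control_id: str
    dataset: str
    field: str
    check_type: str    # key into CHECK_TYPES
    params: dict       # parameters of the formula or algorithm
    threshold: float   # minimum pass rate for the quality criterion to be met


def run_control(control: ControlMetadata, rows: list[dict]) -> dict:
    """Executing side: one generic engine that runs any described control."""
    check = CHECK_TYPES[control.check_type]
    passed = sum(1 for row in rows if check(row.get(control.field), control.params))
    pass_rate = passed / len(rows) if rows else 1.0
    # Every control records its outcome in the same shape.
    return {
        "control_id": control.control_id,
        "dataset": control.dataset,
        "field": control.field,
        "checked": len(rows),
        "passed": passed,
        "pass_rate": pass_rate,
        "met": pass_rate >= control.threshold,
    }


# Made-up sample data and control definitions.
customers = [
    {"email": "a@example.com", "age": 34},
    {"email": "", "age": 131},
    {"email": "c@example.com", "age": 58},
    {"email": None, "age": 27},
]
controls = [
    ControlMetadata("DQ-001", "customers", "email", "not_null", {}, 0.9),
    ControlMetadata("DQ-002", "customers", "age", "in_range", {"min": 0, "max": 120}, 0.7),
]
results = [run_control(control, customers) for control in controls]
```

Adding a control in this sketch means adding a metadata record, not writing a new program, and because all outcomes share one structure, a separate component could compute quality indicators from `results` without knowing how each check works.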


This operating model can also be generalized to other data management disciplines. A data discovery engine can look for the physical fields that represent a tax code or an email address in computer systems, using discovery rules (e.g. regular expressions) that are registered as attributes of the corresponding entities in the metadata system. A data masking engine can apply the most appropriate masking techniques to a table or a database; those techniques are registered as rules in the metadata system together with the coordinates of the data to be pseudonymized, and so on.
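
A minimal sketch of the discovery case might look like the following. It assumes that each entity stores its rule in a hypothetical `discovery_regex` attribute and that the engine receives sample values per physical field. The two patterns are deliberately simplified, and the 16-character tax code shape is only an illustrative assumption, not a validated format.

```python
import re

# Hypothetical metadata: each entity carries its discovery rule as an attribute.
ENTITY_METADATA = {
    "Email Address": {"discovery_regex": r"[^@\s]+@[^@\s]+\.[^@\s]+"},
    # Illustrative shape only: 6 letters, 2 digits, letter, 2 digits, letter, 3 digits, letter.
    "Tax Code": {"discovery_regex": r"[A-Z]{6}[0-9]{2}[A-Z][0-9]{2}[A-Z][0-9]{3}[A-Z]"},
}


def discover(columns, entity_metadata, min_match_ratio=0.8):
    """Return (entity, physical_field, match_ratio) for fields that fit a rule."""
    findings = []
    for column_name, samples in columns.items():
        values = [value for value in samples if value]  # ignore empty samples
        if not values:
            continue
        for entity, attributes in entity_metadata.items():
            pattern = re.compile(attributes["discovery_regex"])
            matches = sum(1 for value in values if pattern.fullmatch(value))
            ratio = matches / len(values)
            if ratio >= min_match_ratio:
                findings.append((entity, column_name, ratio))
    return findings


# Made-up sample values keyed by physical field coordinates.
sampled_columns = {
    "crm.customers.contact": ["anna@example.com", "luca@example.org"],
    "crm.customers.cf": ["RSSMRA85T10A562S", "VRDLGU90A01H501X"],
    "crm.customers.city": ["Rome", "Milan"],
}
findings = discover(sampled_columns, ENTITY_METADATA)
```

The engine itself knows nothing about emails or tax codes. Supporting a new entity means registering another rule in the metadata, and a masking engine could be driven the same way from rules that name a technique and the fields it applies to.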


Of course, such an approach places rather stringent requirements on the tooling. A metadata-driven solution must have two types of components: the metadata system, and the engines that implement the data management techniques (masking, profiling, discovery, quality, ingestion, aggregation, reporting, etc.). These components must be connected seamlessly, not least because in some cases the engines generate new metadata (for example, the relationship between a “Tax Code” entity and the physical fields that represent it in IT systems) that enriches the overall data governance model.
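
The feedback loop between engines and the metadata system can be pictured with a small sketch. The `MetadataStore` class, its method names and the field coordinates below are hypothetical, chosen only to show an engine writing relationships back into the model so that another engine can read them.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Toy metadata system holding entity-to-physical-field relationships."""
    represented_by: dict[str, set[str]] = field(default_factory=dict)

    def register_relationship(self, entity: str, physical_field: str) -> bool:
        """Record that a physical field represents an entity; True if it is new."""
        known_fields = self.represented_by.setdefault(entity, set())
        is_new = physical_field not in known_fields
        known_fields.add(physical_field)
        return is_new

    def fields_for(self, entity: str) -> list[str]:
        """Return the physical fields currently linked to an entity, sorted."""
        return sorted(self.represented_by.get(entity, set()))


store = MetadataStore()
# A relationship declared by hand in the governance model.
store.register_relationship("Tax Code", "crm.customers.cf")

# Relationships reported by a discovery engine (made-up findings).
engine_findings = [
    ("Tax Code", "billing.invoices.fiscal_id"),
    ("Tax Code", "crm.customers.cf"),
]
new_links = [finding for finding in engine_findings
             if store.register_relationship(*finding)]

# Another engine, such as masking, can now ask the store where to act.
masking_targets = store.fields_for("Tax Code")
```

Here only the previously unknown field counts as new metadata, and the masking step picks it up without any change to its own code, which is the kind of connection between components that the approach depends on.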

Based on this and other principles, we have developed our EDM platform for truly useful and future-proof data management and governance.