What does metadata-driven mean in practice?


For many of us, the word metadata evokes something abstract, translucent, evanescent. Yet we know that well-managed metadata is a powerful enabling tool. In practice, metadata becomes operational through a metadata-driven model. Applied to Data Management disciplines, a metadata-driven approach means driving a Data Management system through a governance model made of entities, attributes, relationships, and rules: in short, of metadata.


How does it work in practice? Take data quality. We can implement a control system as a series of programs, each testing one or more data items. Without accompanying information, however, we cannot tell which controls have been applied to which data, or what each one does, unless we read its code or its documentation. Moreover, each control may be written in a different programming style and may detect and record its results in a different way. A metadata-driven approach decouples the governing component from the executive one. The governing component is a metadata system that describes the characteristics of each control and the rules for executing it: the data involved, the formula or algorithm to execute, the quality criterion to verify, the mode and frequency of execution, how results are captured and recorded, and so on.
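To make the idea concrete, here is a minimal sketch of control metadata expressed as declarative records. The `ControlDefinition` schema, its field names, and the sample controls are hypothetical, chosen only to mirror the characteristics listed above; the point is that the question "which controls apply to which data, and what do they do?" can be answered by querying metadata rather than reading code.

```python
from dataclasses import dataclass, field

@dataclass
class ControlDefinition:
    """One data quality control, described as metadata (hypothetical schema)."""
    control_id: str
    table: str                       # data involved: the table...
    columns: tuple                   # ...and the column(s) under test
    rule_type: str                   # algorithm to execute, e.g. "not_null", "regex", "range"
    params: dict = field(default_factory=dict)  # parameters of the formula
    threshold: float = 1.0           # quality criterion: minimum share of rows that must pass
    schedule: str = "daily"          # mode and periodicity of execution
    result_sink: str = "dq_results"  # where results are registered

# Hypothetical catalog of controls held by the metadata system
CATALOG = [
    ControlDefinition("DQ-001", "customers", ("customer_id",), "not_null"),
    ControlDefinition("DQ-002", "customers", ("email",), "regex",
                      {"pattern": r"[^@\s]+@[^@\s]+\.[^@\s]+"}, threshold=0.98),
    ControlDefinition("DQ-003", "orders", ("amount",), "range",
                      {"min": 0, "max": 100_000}, schedule="hourly"),
]

def controls_for(catalog, table):
    """Answer 'which controls apply to this data?' from metadata alone."""
    return [c for c in catalog if c.table == table]

def describe(control):
    """Human-readable summary of what a control does, derived from its metadata."""
    return (f"{control.control_id}: {control.rule_type} on "
            f"{control.table}.{', '.join(control.columns)} "
            f"(pass >= {control.threshold:.0%}, {control.schedule})")

for c in controls_for(CATALOG, "customers"):
    print(describe(c))
```

Nothing in these records executes anything; they only describe. Execution is the job of a separate engine that reads them.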


Some of this metadata drives a data quality engine, the executive component, which performs the controls. Other metadata feeds a system that measures data quality indicators. The advantages are significant: controls become standardized, because a single generalized tool executes them from a single set of metadata, and the control system as a whole becomes easier to govern.
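A generalized engine of this kind can be sketched in a few lines, assuming a small library of reusable rule types. The rule names (`not_null`, `regex`, `range`), the result dictionary, and the "share of controls passed" indicator are illustrative assumptions, not a description of any specific product. What the sketch shows is that every control, whatever its rule, is executed and recorded the same way.

```python
import re
from dataclasses import dataclass, field

@dataclass
class ControlDefinition:
    """Control metadata (hypothetical schema, reduced to what the engine needs)."""
    control_id: str
    column: str
    rule_type: str
    params: dict = field(default_factory=dict)
    threshold: float = 1.0  # minimum pass ratio for the control to be "OK"

# Hypothetical rule library: each rule_type maps to one generic, reusable check.
RULES = {
    "not_null": lambda v, p: v is not None and v != "",
    "regex": lambda v, p: v is not None and re.fullmatch(p["pattern"], str(v)) is not None,
    "range": lambda v, p: v is not None and p["min"] <= v <= p["max"],
}

def run_control(control, rows):
    """Execute one control and return its result in a uniform format."""
    check = RULES[control.rule_type]
    passed = sum(1 for row in rows if check(row.get(control.column), control.params))
    ratio = passed / len(rows) if rows else 1.0
    return {
        "control_id": control.control_id,
        "checked": len(rows),
        "passed": passed,
        "pass_ratio": ratio,
        "status": "OK" if ratio >= control.threshold else "KO",
    }

def quality_indicator(results):
    """A simple indicator: the share of controls whose status is OK."""
    if not results:
        return 1.0
    return sum(r["status"] == "OK" for r in results) / len(results)

rows = [
    {"customer_id": 1, "email": "a@example.com", "age": 34},
    {"customer_id": 2, "email": "not-an-email", "age": 27},
    {"customer_id": 3, "email": None, "age": 130},
]
controls = [
    ControlDefinition("DQ-001", "customer_id", "not_null"),
    ControlDefinition("DQ-002", "email", "regex",
                      {"pattern": r"[^@\s]+@[^@\s]+\.[^@\s]+"}, threshold=0.9),
    ControlDefinition("DQ-003", "age", "range", {"min": 0, "max": 120}, threshold=0.6),
]

results = [run_control(c, rows) for c in controls]
print([r["status"] for r in results], round(quality_indicator(results), 2))
# prints: ['OK', 'KO', 'OK'] 0.67
```

In this design, adding a control means adding a metadata record rather than writing a new program, as long as its rule type already exists in the library.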


This operational model generalizes to other Data Management disciplines. A data discovery engine can locate the physical fields that hold a taxpayer’s number or an email address by applying discovery rules (e.g., regular expressions) registered as attributes of the corresponding entities in the metadata system. A data masking engine can apply the most suitable masking techniques to a table or a database, using rules registered in the metadata system together with the coordinates of the data to pseudonymize. And so on.


Naturally, this approach places strong requirements on tooling. A metadata-driven solution needs two types of components: the metadata system, and the engines that apply Data Management techniques (masking, profiling, discovery, quality, ingestion, aggregation, reporting, etc.). These components must be seamlessly connected, not least because in some cases the engines generate new metadata (for example, the relation between a “Taxpayer’s Number” entity and the fields that represent it in the IT systems). This new metadata in turn enriches the overall data governance model.
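The loop described here, where one engine writes metadata that another engine consumes, can be sketched with a toy in-memory store. The `MetadataStore` class, the entity-to-technique masking rules, and the entity-to-field relations (hard-coded below as a discovery engine might report them) are hypothetical. Pseudonymization is implemented here as a keyed HMAC-SHA256 hash, which is one common choice among several.

```python
import hashlib
import hmac

class MetadataStore:
    """Toy metadata system (hypothetical): masking rules and entity-field relations."""
    def __init__(self):
        self.masking_rules = {}  # entity -> masking technique
        self.relations = set()   # (entity, table, column) links produced by engines

    def add_relations(self, triples):
        """Engines such as discovery write new metadata back here."""
        self.relations.update(triples)

    def columns_for(self, table):
        """Map each column of a table to the entity it represents."""
        return {col: ent for ent, tbl, col in self.relations if tbl == table}

def pseudonymize(value, key):
    # Keyed hash: the same input always yields the same token; not reversible without the key
    return hmac.new(key, str(value).encode(), hashlib.sha256).hexdigest()[:16]

TECHNIQUES = {
    "pseudonymize": pseudonymize,
    "redact": lambda value, key: "***",
}

def mask_table(store, table, rows, key):
    """Masking engine: decides what to mask using only metadata, never hard-coded columns."""
    plan = store.columns_for(table)
    masked = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for column, entity in plan.items():
            technique = store.masking_rules.get(entity)
            if technique and new_row.get(column) is not None:
                new_row[column] = TECHNIQUES[technique](new_row[column], key)
        masked.append(new_row)
    return masked

store = MetadataStore()
store.masking_rules = {"Taxpayer's Number": "pseudonymize", "Email": "redact"}
# Relations as a discovery engine might report them (hypothetical values)
store.add_relations([("Email", "crm.customers", "contact"),
                     ("Taxpayer's Number", "crm.customers", "tax_id")])

rows = [{"id": 1, "contact": "ann@example.com", "tax_id": "ABC-123456789"}]
out = mask_table(store, "crm.customers", rows, key=b"demo-key")
print(out)
```

Once discovery has registered a new relation, the masking engine covers the newly found field without any change to its code.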

We built our EDM platform on this and other principles, to deliver data management and governance that is truly useful and future-proof.