If you don’t have a metadata management program or initiative in place, there are plenty of reasons to consider investing in one. The following four main roles of metadata offer a glimpse into the benefits of managing it:
1. Classification
Data has many characteristics by which it can be grouped or classified. Why should you care? Because these categories allow you to organize and manage it. Data can be classified by any of the following:
- Subject – Ex: financial data, student data, fundraising data, health data, product data, etc.
- Usage – Ex: transactional, analytical, regulatory, etc.
- Time – Ex: live and current data, historical, predictive, etc.
- Content – Ex: geo-spatial data, machine data, structured vs unstructured data, etc.
- Scope – Ex: enterprise, external, departmental, master data, etc.
Managing data by classification or group allows you to apply the same standards, procedures, and processes, and to assign the same data stewards and owners. However, the same data can fall under multiple groups, which adds another layer of complexity to how it should be managed. For example, the metadata for a single data set might indicate that it is transactional, falls under GDPR, is enterprise-wide, and is live, unstructured health data. Usually these groups can be placed within a hierarchy to determine which classification takes precedence over the others.
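To make the idea of a classification hierarchy concrete, here is a minimal Python sketch. The precedence order, the labels, and the `governing_classification` helper are all assumptions made up for illustration; your own hierarchy will depend on your organization's policies.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical precedence order: labels earlier in the list win
# when a data set carries several classifications at once.
PRECEDENCE = [
    "regulatory",
    "health",
    "enterprise",
    "transactional",
    "live",
    "unstructured",
]


@dataclass
class Dataset:
    name: str
    classifications: set[str] = field(default_factory=set)


def governing_classification(dataset: Dataset) -> str | None:
    """Return the highest-precedence classification the data set carries."""
    for label in PRECEDENCE:
        if label in dataset.classifications:
            return label
    return None  # no known classification applies


visits = Dataset(
    "clinic_visits",
    {"transactional", "regulatory", "enterprise", "live", "unstructured", "health"},
)
print(governing_classification(visits))  # regulatory
```

A flat ordered list is the simplest possible hierarchy. A real program might need a tree or per-policy rules, but the principle of resolving conflicts through an agreed order stays the same.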
2. Description
Describing the data helps you understand both its logical and physical aspects. The description should include:
- Data meaning – Business definitions, data modeling entities and attributes
- Data structure – Description of data objects (entities, tables, records, etc.), their logical groupings and relationships
- Data content – The types of data such as date, currency, text, number, etc.
- Data values – What values are allowed, what reference data is available, what patterns or value ranges should it follow, what constraints should it meet, etc.
- Data lineage – What is the data source, how was the data created, derived, and/or calculated, how was it transformed, etc.
Without this description you will be treading water when collecting data, integrating it with internal or external systems, maintaining it, or deriving useful information from it.
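As an illustration, the five descriptive elements above could be captured in a single record per data attribute. The sketch below is one possible shape, not a standard: the `ColumnDescription` class, its field names, and the sample postal code attribute are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDescription:
    name: str
    meaning: str          # data meaning: the business definition
    structure: str        # data structure: where the attribute lives
    content_type: str     # data content: date, currency, text, number, ...
    allowed_pattern: str  # data values: regular expression a value must match
    lineage: str          # data lineage: where the data comes from

    def accepts(self, value: str) -> bool:
        """Check a candidate value against the allowed pattern."""
        return re.fullmatch(self.allowed_pattern, value) is not None


# Hypothetical example attribute.
postal_code = ColumnDescription(
    name="postal_code",
    meaning="Mailing postal code of the donor",
    structure="donor address table, one row per address",
    content_type="text",
    allowed_pattern=r"[A-Z]\d[A-Z] \d[A-Z]\d",
    lineage="Captured from the online donation form",
)

print(postal_code.accepts("V6T 1Z4"))  # True
print(postal_code.accepts("12345"))    # False
```

Keeping the value rule next to the definition and lineage means the same record can support both documentation and basic data quality checks.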
3. Guidance
Metadata can guide any technical or business user to the data they need, through search engines or other processes. This guidance metadata can comprise:
- Keywords – This could be any metadata described so far
- Taxonomies – Yet another example of how classification helps
- Date/time stamps – Usually automatically added at the table or row level
- Associated reports, processes, people – Knowing where data surfaces, who the data users or data stewards are, and how data is captured and transformed could serve as a good starting point for finding what you need
- Synonyms, aliases, related terms
Providing your stakeholders with the guidance to find the data they need for reporting, analyzing, testing, prototyping, troubleshooting, etc., saves time and makes better use of available resources.
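A small sketch of how guidance metadata could drive a search is shown below. The `CATALOG` contents, the data set names, and the `find_datasets` function are invented for this example; a real data catalog would offer far richer search.

```python
from __future__ import annotations

# Hypothetical catalog: each data set carries keywords, synonyms, and a steward.
CATALOG = {
    "donor_gifts": {
        "keywords": {"fundraising", "donation"},
        "synonyms": {"gift", "contribution"},
        "steward": "Advancement Office",
    },
    "student_enrolment": {
        "keywords": {"student", "registration"},
        "synonyms": {"enrollment", "admission"},
        "steward": "Registrar",
    },
}


def find_datasets(term: str) -> list[str]:
    """Return data set names whose keywords or synonyms match the search term."""
    needle = term.strip().lower()
    return sorted(
        name
        for name, meta in CATALOG.items()
        if needle in meta["keywords"] | meta["synonyms"]
    )


print(find_datasets("Contribution"))  # ['donor_gifts']
print(find_datasets("enrollment"))    # ['student_enrolment']
```

Note how the synonyms let a user who searches for "contribution" land on a data set that was only tagged with "donation", which is exactly the kind of time saving described above.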
4. Control
Metadata can provide the knowledge needed to determine which data should be controlled and which controls should be enforced on it. These constraints typically stem from:
- Regulatory compliance & internal policies
- Retention & archival
- Privacy & security
- Service levels & business requirements
- Technical requirements
These controls help ensure compliance with internal and external rules and regulations, policies, and business and technical requirements.
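As a simple illustration of a control driven by metadata, the sketch below decides whether a record is due for archival based on its subject classification and creation date. The retention periods and the `must_archive` function are assumptions for the example only and do not reflect any actual regulation.

```python
from __future__ import annotations

from datetime import date

# Hypothetical retention policy, in days, keyed by subject classification.
RETENTION_DAYS = {
    "financial": 7 * 365,
    "marketing": 2 * 365,
}


def must_archive(subject: str, created: date, today: date) -> bool:
    """Return True when the record is older than its retention period."""
    limit = RETENTION_DAYS.get(subject)
    if limit is None:
        return False  # no retention rule is defined for this subject
    return (today - created).days > limit


print(must_archive("marketing", date(2020, 1, 1), date(2023, 1, 1)))  # True
print(must_archive("financial", date(2020, 1, 1), date(2023, 1, 1)))  # False
```

The rule itself lives in the policy table, while the metadata (subject and creation date) tells you which rule applies to which data.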
Conclusion
These metadata items are not mutually exclusive. From the examples above you might have already identified how taxonomy helps with the classification role, as well as providing control and guidance. A single metadata item can serve multiple roles and it is this fact that increases its value.
Are you managing your metadata? What benefits have you found from managing your metadata?
Article originally written by George Firican and published on https://www.lightsondata.com/
About the Author
George Firican is the Director of Data Governance and Business Intelligence at the University of British Columbia, which is ranked among the top 20 public universities in the world. His passion for data led him towards award-winning program implementations in the data governance, data quality, and business intelligence fields. Due to his desire for continuous improvement and knowledge sharing, he founded LightsOnData, a website which offers free templates, definitions, best practices, articles and other useful resources to help with data governance and data management questions and challenges.
He also has over twelve years of project management and business/technical analysis experience in the higher education, fundraising, software and web development, and e-commerce industries.