DataOps is a combination of collaborative practices and key data management techniques. It focuses on improving communication, integration and automation of data streams between a company’s data managers and data users.
DataOps is a portmanteau of “Data” and “Operations”. It originates from the DevOps discipline (Software Development & IT Operations) but adapts its principles to the typical problems of data analysis.
DataOps emerged in the wake of agile methodologies. It aims to provide new working and collaboration models that produce value for the company faster. DataOps uses technology to automate data design, management and delivery with appropriate levels of governance, and it leverages metadata to improve data usability and value in dynamic environments.
How does it work?
In Data & Analytics projects, the longer the time between the initial definition and the moment the solution goes live, the more numerous and significant the changes to the requirements become. These changes often force a complete rework of the data pipeline, because the different roles in the team lack a shared view of the dependencies and artifacts they produce.
Moreover, the volumes of data that companies handle keep growing. The DataOps manifesto highlights key principles for handling these situations and making processes more efficient: collaboration (continuous feedback between stakeholders), data quality, agile-style frequent releases, traceability, and Data Governance rules designed to deliver the data pipeline to the right people, at the right time, from any source.
But let us go into detail. A recent Gartner paper (Introducing DataOps Into Your Data Management Discipline, 31 October 2019, Ted Friedman & Nick Heudecker) recaps the main challenges that Data & Analytics leaders face in implementing and executing modern data-intensive solutions. It also suggests a way to establish a management model that can overcome these challenges. The main characteristics of this paradigm are:
- Increased frequency of releases — users appreciate rapid releases of new features, and frequent releases make it easier to manage requirement changes both during development and throughout the solution’s life cycle;
- Test automation — reducing human involvement in testing (e.g. non-regression testing) speeds up releases;
- Metadata management and versioning — the increased frequency and number of releases requires a version control system. Moreover, each version of a data-intensive solution expresses its variations through metadata; making this metadata available to all the roles involved in the data pipeline enables effective, shared change management;
- Constant monitoring — continuous tracing of how the pipeline functions and is used, which makes it possible to identify and address malfunctions as well as opportunities to improve functionality and performance;
- Collaboration between all roles — constant communication between all the actors involved, based on the metadata available in a collaborative environment, is essential for fast releases of quality components.
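As a concrete illustration of the test-automation point above, here is a minimal sketch of a non-regression check on a pipeline step. The `clean_orders` transformation and its expectations are hypothetical examples, not part of any specific DataOps toolkit:

```python
# Hypothetical example of an automated non-regression test on a data
# pipeline step. The transformation and the expected results below are
# illustrative assumptions, not from any specific product.

def clean_orders(rows):
    """Drop rows with a missing customer ID and normalize amounts to floats."""
    return [
        {**row, "amount": float(row["amount"])}
        for row in rows
        if row.get("customer_id")
    ]

def test_clean_orders():
    raw = [
        {"customer_id": "C1", "amount": "19.90"},
        {"customer_id": None, "amount": "5.00"},  # invalid: should be dropped
    ]
    cleaned = clean_orders(raw)
    assert len(cleaned) == 1              # the invalid row was removed
    assert cleaned[0]["amount"] == 19.90  # the amount is now numeric

test_clean_orders()
print("non-regression checks passed")
```

Checks like this, run automatically on every release, are what allows the human involvement in testing to shrink without sacrificing quality.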
Therefore, to enable DataOps practices you need to:
- be able to extract data from a myriad of sources (wherever they reside, in the cloud or on premises),
- align them in a flexible metadata catalog (where data can be easily accessed, tagged, recorded, enriched and shared),
- automate testing,
- enable continuous monitoring.
Together, these capabilities allow organizations to effectively orchestrate their existing data management systems, boost their performance, and thus simplify the work of Data Engineers, Data Scientists, Data Analysts, Data Stewards, Data Owners, Data Users and other roles.
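The metadata catalog mentioned above can be pictured as a registry that keeps every version of a data asset, so that changes stay traceable and discoverable. The sketch below is a deliberately simplified, hypothetical model; the class and field names are illustrative assumptions:

```python
# Hypothetical sketch of a versioned metadata catalog entry. Names and
# fields are illustrative assumptions, not a real product's API.

from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetVersion:
    name: str      # logical dataset name, e.g. "orders"
    version: int   # incremented on every release
    source: str    # where the data was extracted from
    tags: tuple    # searchable labels for discovery and sharing

class Catalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Keep every version so changes remain traceable over time.
        self._entries.setdefault(entry.name, []).append(entry)

    def latest(self, name):
        return max(self._entries[name], key=lambda e: e.version)

catalog = Catalog()
catalog.register(DatasetVersion("orders", 1, "crm_db", ("sales",)))
catalog.register(DatasetVersion("orders", 2, "crm_db", ("sales", "gdpr")))
print(catalog.latest("orders").version)  # → 2
```

In a real environment this registry would also carry schema, lineage and quality metrics, and would be shared across all the roles in the pipeline rather than held in memory.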
All these capabilities are fundamental to extracting value from the company’s data assets at the pace, and with the quality, that our age demands. These are the principles that have always guided the evolution of Irion EDM, at the service of our customers.
Want to know more?
We will provide you with illustrative examples of how other organizations have already started their transformation.