Data Integration: what is it? And what is it for?

Data Integration

According to DAMA, Data Integration “describes the processes related to the movement and consolidation of data within and between data stores, applications and organizations”.

To put it simply, these are all the actions needed to unify various data sources and create a single, shared view of a given process. Most organizations, with their hundreds or thousands of databases and archives, are driven primarily by the need to move these assets efficiently. Yet in the age of digital transformation, efficient data transfer is not enough. It is just as necessary to manage structured data (internal or coming from external sources) and unstructured data streams (e.g. from social media), which pour in from seemingly endless sources. Data integration consolidates this data into a consistent form, physical or virtual, that meets the usage requirements of all business applications and processes.

Data integration is fundamental for a variety of reasons:

  • managing, processing, comparing and enriching different types of data for advanced analysis
  • keeping data secure and compliant with regulations, in the required format and at the right time
  • reducing the costs and complexity of solution management, unifying systems and improving collaboration
  • finding latent patterns and relationships between different sources
  • mapping Data Lineage
  • migrating data or unifying systems in the event of mergers.

Data Integration is an essential prerequisite for Data Warehousing, Data Management, Business Intelligence and Big Data management. It used to be normal for IT departments to create separate data silos for each department. Today, the rise of Big Data and the Cloud calls for a more modern architectural configuration.

Big Data tends to integrate different types of data, including:

  • structured data stored in databases,
  • unstructured text in documents or files,
  • other unstructured types such as audio, video and streaming.

Yet it is now clear that the value of Big Data comes not so much from its volume as from correlating its diverse sources, types and formats. Heterogeneous data management, data integration and data governance remain challenges that many organizations face on a daily basis, though not always in the optimal way.

How to apply Data Integration?

As mentioned above, there are many techniques for integrating different data types. The most common in recent decades has been the ETL (Extract, Transform, Load) method, while ELT performs the last two activities in reverse order to take advantage of the target system’s processing power.

There are three phases of ETL:

Phase 1 – Extraction: selecting the required data from one or more sources. The extracted data is then organized in a physical data store.
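For illustration only, here is a minimal sketch of the extraction phase in Python, assuming a hypothetical CSV export of orders and a local SQLite database standing in for the staging store; the file, table and column names are invented for the example.

import csv
import sqlite3

def extract_to_staging(source_path, staging_db):
    """Read rows from a CSV export and land them, unchanged, in a staging table."""
    conn = sqlite3.connect(staging_db)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_orders "
        "(order_id TEXT, customer TEXT, amount TEXT, order_date TEXT)"
    )
    with open(source_path, newline="", encoding="utf-8") as f:
        rows = [(r["order_id"], r["customer"], r["amount"], r["order_date"])
                for r in csv.DictReader(f)]
    conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    conn.close()
    return len(rows)

# e.g. extract_to_staging("orders.csv", "staging.db")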

Phase 2 – Transformation: transforming the data based on a set of rules to fit the data warehouse model or operational needs. Typical examples of transformations include format changes, data concatenation, elimination of null values to avoid possible erroneous results during analysis, or changing the order of data elements or records to fit a set pattern.
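As a rough illustration of such rules (continuing the hypothetical orders example from the extraction sketch; the field names, date format and sort key are assumptions, not a prescription):

from datetime import datetime

def transform(rows):
    """Apply typical transformations: null elimination, format changes, concatenation, reordering."""
    cleaned = []
    for r in rows:
        # Null elimination: skip records that would distort downstream analysis.
        if not r.get("amount") or not r.get("order_date"):
            continue
        cleaned.append({
            # Format changes: source date string to ISO date, amount string to a number.
            "order_date": datetime.strptime(r["order_date"], "%d/%m/%Y").date().isoformat(),
            "amount": float(r["amount"]),
            # Concatenation: build a single business key from two source fields.
            "order_key": r["customer"] + "-" + r["order_id"],
        })
    # Reordering: sort the records to fit the pattern expected by the warehouse model.
    return sorted(cleaned, key=lambda rec: rec["order_date"])

In a real pipeline these rules are normally driven by the data warehouse model or operational requirements rather than hard-coded.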

Phase 3 – Loading: storing or physically representing the result of the transformations in the target system. There are two types of loading. One is full batch mode, in which the data is completely rewritten and replaces the previous load. The other is periodic incremental mode, which identifies what has changed since the previous load and inserts only those changes into the data warehouse.
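The two modes might be sketched as follows, again on the hypothetical orders example, with SQLite standing in for the data warehouse; the dw_orders table and its columns are assumed to exist.

import sqlite3

def load_full_refresh(conn, rows):
    """Batch mode: completely rewrite the target table so it replaces the previous load."""
    conn.execute("DELETE FROM dw_orders")
    conn.executemany(
        "INSERT INTO dw_orders (order_key, order_date, amount) VALUES (?, ?, ?)",
        [(r["order_key"], r["order_date"], r["amount"]) for r in rows],
    )
    conn.commit()

def load_incremental(conn, rows, last_load_date):
    """Incremental mode: insert only what has changed since the previous load."""
    delta = [r for r in rows if r["order_date"] > last_load_date]
    conn.executemany(
        "INSERT INTO dw_orders (order_key, order_date, amount) VALUES (?, ?, ?)",
        [(r["order_key"], r["order_date"], r["amount"]) for r in delta],
    )
    conn.commit()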

Over time, however, this approach has shown certain limitations:

  • increasing complexity in orchestrating the transformation paths,
  • because it forces a detailed, procedural description of the process, it prevents optimizations of the processing, whether based on the current data distribution or in response to software improvements,
  • it is not self-sufficient in terms of functional capability and often has to rely on external support systems,
  • the need to use other tools, in uncoordinated ways, to define view tables and various support infrastructures,
  • cost and implementation-time overruns,
  • reduced computing power,
  • increased maintenance and change-management costs,
  • the impossibility of running parallel and coordinated development and test cycles,
  • the near-total impossibility of documenting and tracing processes, to the detriment of lineage and repeatability requirements,
  • it repeatedly moves significant volumes of data from staging areas to processing servers and back. Instead of applying the processing logic where the data is stored, it moves gigabytes of data to where the functional transformations are performed.

ELT is an emerging approach designed to overcome these drawbacks of ETL. The order of the phases changes to Extraction, Loading, Transformation: the transformations occur after the data has been loaded onto the target system, often as part of the same process. In essence, ELT instantiates the original data on the target system as raw data that can then be used in other processes; the transformations are subsequently performed within the target system itself. This approach has become more common with the proliferation of Big Data environments, where the ELT process loads the Data Lake.
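To make the contrast with ETL concrete, here is a minimal ELT sketch under the same hypothetical orders example, with SQLite standing in for the target system; in a real Big Data environment the transformation step would run on the warehouse or Data Lake engine itself.

import csv
import sqlite3

def elt(source_path, target_db):
    """Load raw data into the target first, then transform it where it is stored."""
    conn = sqlite3.connect(target_db)
    # Load: land the records untouched, so the raw data stays available for other processes.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(order_id TEXT, customer TEXT, amount TEXT, order_date TEXT)"
    )
    with open(source_path, newline="", encoding="utf-8") as f:
        conn.executemany(
            "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
            [(r["order_id"], r["customer"], r["amount"], r["order_date"])
             for r in csv.DictReader(f)],
        )
    # Transform: run the cleaning and shaping logic inside the target system itself.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dw_orders AS "
        "SELECT customer || '-' || order_id AS order_key, "
        "       order_date, "
        "       CAST(amount AS REAL) AS amount "
        "FROM raw_orders "
        "WHERE amount IS NOT NULL AND order_date IS NOT NULL"
    )
    conn.commit()
    conn.close()

The design point is the one described above: the logic runs where the data already resides, instead of shipping large volumes back and forth to a separate transformation server.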

This change in the order of the phases brings certain benefits. The most important ones are:

  • it quickly analyzes large data pools and requires less maintenance
  • it is a less expensive process, as it requires less time to load the data
  • it facilitates project management as the data is loaded and transformed in smaller batches
  • it uses the same hardware for processing and storage thus reducing the additional hardware cost
  • it can process both semi-structured and unstructured data

What makes Irion EDM a unique platform for effective management of Data Integration projects with large volumes of data?

Irion EDM is not a procedural ETL system. Its declarative approach is “disruptive” compared with old, traditional systems. Years of experience in mission-critical and data-intensive contexts have led to the development of advanced technologies that overcome the limits of those systems:

  • Irion EDM uses a technology called DELT® (Declarative, Extract, Load and Transform) that goes beyond ELT: besides reversing the order of the phases, the entire process follows the declarative model.
  • Irion EDM is a Metadata Driven platform: it uses the power of metadata. Not only can you find, identify and catalog metadata using advanced ingestion and translation techniques; the platform also makes it more useful by turning passive metadata into active metadata.
  • Thanks to the EasT® (Everything as a Table) technology, each dataset used in the processing is virtually exposed as if it were a table (or a set of tables). The platform implicitly runs all the transformations needed to properly map data available in any format (CSV, Excel, XML, Cobol, DB, Web Services, API, SAP, etc.).
  • Thanks to IsolData® (another proprietary technology), the data processed by the application modules does not persist in the system but is managed automatically and in a codeless way. IsolData reflects Irion EDM’s ability to isolate in a dedicated workspace everything needed (input, output and temporary data) to run a single processing unit of a solution.
  • The Irion EDM platform can connect to a wide range of sources. Hundreds of connectors are available for a variety of information and application structures: old and modern, structured and unstructured, on-premises and multi-cloud. In addition, custom connectors can be developed using the platform’s powerful built-in functions. All the data available in the different sources is accessible from the modules as virtual tables.
  • Irion EDM automatically coordinates multiple teams working simultaneously on the same project. It is designed for business analysts, IT technicians and data officers, with dedicated features for each role.
  • There is no need to learn a new language to use Irion EDM: hands-on experience with SQL, or even knowledge acquired at school, is enough.
  • and much more…

Want to know more?

We will provide you with illustrative examples of how other companies have already started their transformation.