Data Integration: ETL vs. ELT

December 21, 2010 – 4:06 pm by Subhankar Das

In the last few years, the rapid increase in data volume has forced companies to have a robust and scalable data integration strategy. There are many data integration tools available that support various techniques for integrating data. Before choosing the right tool, the key question that needs to be addressed is “Where will the data integration work be done, inside or outside the data warehouse?”

The choices above are better known as Extract Transform Load (ETL) and Extract Load Transform (ELT) in data warehousing parlance. In this article, we will explore these two approaches and discuss the pros and cons to consider in order to choose the right strategy and tool for your needs.

Option 1: Extract Transform Load – ETL

ETL stands for Extract Transform and Load. The diagram below illustrates the ETL process:

 Picture1

A few years back, ETL was the preferred data integration strategy because databases then did not have the processing power to handle complex transformation. In the case of ETL, data is moved to an intermediate platform where the transformation is done before loading the data into the warehouse. Instead of manually coded ETL systems, advanced ETL tools like Informatica PowerCenter, IBM InfoSphere, and DataStage are leveraged for this transformation.

PROS:

  • No transformation workload is required on the data warehouse server
  • Separate ETL hardware for scaling and load balancing
  • Choice of many third party ETL tools is available
  • ETL systems are independent of source and target systems
  • Modular and structured ETL design with reusable components handling specific operations
  • Row by row processing, leading to easier integration with rules engine and data quality tools
  • ETL tools provide visual IDE with built-in data cleansing and transformation functions along with graphical data mapping and lineage
  • Only the relevant data is included in the extraction process, thereby avoiding transformation of unnecessary data

CONS:

  • Additional hardware investment is needed for ETL engine
  • Extra cost of building ETL system or licensing ETL tool
  • Reduced performance of row-based approach
  • Specialized skills and learning curve required for implementing ETL tool
  • Reduced flexibility due to dependency on ETL tool vendor
  • Data needs to travel across one more layer before it lands into Datamart

Option 2: Extract Load Transform – ELT

ELT stands for Extract Load and Transform. The following diagram illustrates the ELT process:

Picture2
 

With the advent of hardware and new database technology (e.g. mpp systems, columnar db, flash memory), Extract Load Transform (ELT) is now a viable option. ELT uses the native data warehouse tools (e.g. sqlloader, fast load, multi load) to optimize loading, and then structured query language (SQL) is leveraged for the bulk transformation of data. Synposis, one of the data integration tools, is built using ELT architecture, which offers superior performance and scalability compared to traditional ETL products. Even industry standard ETL tools like Informatica added an ELT option in Power Center 8 (Power Center 8 Pushdown Optimization), which suggests that this approach is gaining momentum in the marketplace.

PROS:

  • Better performance leveraging database technology
  • Leverages DBMS engine hardware for scalability
  • Simple transformation specification using SQL
  • Utilizes existing skill set
  • Less network traffic due to data movement
  • Extract and Load processes are isolated from Transform, helping make the process more manageable
  • Data cleansing is done at the staged area, thereby ensuring only the checked data is loaded for transformation
  • Changes to ETL routines involve less cost, time and risk, since the processes are independent

CONS:

  • Limited tools available with full support for ELT
  • Loss of detailed run-time monitoring statistics and data lineage
  • Loss of modularity due to set based design for performance
  • Transformations would utilize Database resources potentially impacting BI-reporting performance

To summarize, both approaches have their own advantages and disadvantages, so a hybrid approach is often suggested to distribute the load between the DBMS server and the ETL server. Hence you may see various combinations of ETL and ELT being used, for example TELT, ETLT, TETLT. Ultimately, you may not need to choose between the two strategies, as most of the major ETL tools vendors have already started providing support for ELT architecture.

Contributors: Prashant Bhavaraju, Sujit Sahoo, Sekar Satagopan

  1. One Response to “Data Integration: ETL vs. ELT”

  2. Hello Subhankar,

    I was referred to your article by the site: westbury-it.com, where I accidentally came across this topic. I just wanted to say that the content is of excellent value.

    Thank you.

    By Information Technology on Mar 22, 2011

Post a Comment