A Proposed Model for ETL Process and Testing
ETL stands for Extract, Transform and Load. Comprehensive testing of a data warehouse at every point throughout the ETL process is becoming increasingly important as more data is collected and used for strategic decision-making. In the ETL process, data is extracted from an OLTP (Online Transaction Processing) database, transformed to match the data warehouse schema, and loaded into the data warehouse database.
There are mainly three steps in the ETL process:
- In the first step, data is mapped between the source and target databases (data warehouse and data mart).
- The second step is cleansing the source data in the staging area.
- The third step is transforming the cleansed source data and loading it into the target system.
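The three steps above can be sketched in code. This is a minimal illustration using plain Python dictionaries in place of real databases; all table and column names here are hypothetical.

```python
# Step 1: mapping between source columns and target (warehouse) columns.
COLUMN_MAPPING = {"cust_id": "customer_id", "cust_nm": "customer_name"}

def map_columns(row):
    """Rename source columns to their target names."""
    return {COLUMN_MAPPING.get(k, k): v for k, v in row.items()}

# Step 2: cleanse source data in the staging area.
def cleanse(row):
    """Replace empty strings with None so nulls are standardized."""
    return {k: (None if v == "" else v) for k, v in row.items()}

# Step 3: transform cleansed rows and load them into the target.
def load(rows, target):
    for row in rows:
        target.append(cleanse(map_columns(row)))

staging = [{"cust_id": 1, "cust_nm": ""}]
warehouse = []
load(staging, warehouse)
print(warehouse)  # [{'customer_id': 1, 'customer_name': None}]
```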
Why ETL Is Required: As data sources change, the data warehouse must be periodically updated. An ETL system consists of three consecutive functional steps: extraction, transformation, and loading.
Extract, Transform, Load is the process that enables businesses to consolidate their data while moving it from place to place, i.e. moving data from source systems into the data warehouse. The data can arrive from any source, e.g. XML, flat file, database, Excel, report, or another data warehouse.
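Because data can arrive in many formats, extraction code typically normalizes each source into a common row shape. Here is a small sketch using Python's standard library, with hypothetical inline samples standing in for a flat file and an XML feed:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: a CSV flat file and an XML feed.
csv_data = "id,amount\n1,100\n2,250\n"
xml_data = "<orders><order id='3' amount='75'/></orders>"

def extract_csv(text):
    """Extract rows from CSV text as a list of dicts."""
    return [dict(r) for r in csv.DictReader(io.StringIO(text))]

def extract_xml(text):
    """Extract each child element's attributes as a dict."""
    return [dict(el.attrib) for el in ET.fromstring(text)]

# Both sources land in the same common row shape.
rows = extract_csv(csv_data) + extract_xml(xml_data)
print(rows)
# [{'id': '1', 'amount': '100'}, {'id': '2', 'amount': '250'},
#  {'id': '3', 'amount': '75'}]
```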
Also, as the business changes, the data warehouse system needs to change in order to maintain its value as a tool for decision makers; as a result, the ETL also changes and evolves. The ETL processes must therefore be designed for ease of modification. For a data warehouse project to succeed, the ETL process should be well written and well documented.
ETL testing is often initiated as a result of mergers and acquisitions, compliance and regulatory requirements, data consolidation, and the increased reliance on data-driven decision making. It is commonly performed either manually or with the help of a tool (a functional testing tool, an ETL tool, or proprietary utilities).
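A common manual ETL check is to reconcile row counts and a simple column aggregate between source and target after a load. The sketch below uses plain lists of dicts where, in practice, the inputs would be SQL query results; the table contents are hypothetical.

```python
# Hypothetical source and target result sets after a load.
source_rows = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]
target_rows = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]

def reconcile(source, target, measure):
    """Return (counts_match, totals_match) for a quick load validation."""
    counts_match = len(source) == len(target)
    totals_match = (sum(r[measure] for r in source)
                    == sum(r[measure] for r in target))
    return counts_match, totals_match

# Both checks pass when the load preserved all rows and values.
print(reconcile(source_rows, target_rows, "amount"))  # (True, True)
```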
Extract – extracting the data from the source system and making it available for further processing. Transform – applying the business logic, as specified by the business, to the data derived from the sources. Load – loading the data into the final warehouse after the above two processes are complete.
Terms used in ETL testing:
Source System: A database, application, file, or other storage facility from which the data in a data warehouse is derived.
Mapping: The definition of the relationship and data flow between source and target objects.
Metadata: Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate the scripts that build and populate the data warehouse. A repository contains metadata.
Staging Area: A place where data is processed before entering the data warehouse.
Cleansing: The process of resolving inconsistencies and fixing anomalies in source data, typically as part of the ETL process, e.g. making identifiers unique or converting null values into a standardized form. This is one of the most important steps, as it ensures the quality of data in the data warehouse.
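The two cleansing examples just mentioned, standardizing null representations and making identifiers unique, can be sketched as follows. The null tokens and sample rows are hypothetical.

```python
# Values that different sources use to represent "no data".
NULL_TOKENS = {"", "N/A", "null", None}

def standardize_nulls(row):
    """Convert all null-like values to a single standard form (None)."""
    return {k: (None if v in NULL_TOKENS else v) for k, v in row.items()}

def deduplicate(rows, key):
    """Make identifiers unique by keeping the first row seen for each key."""
    seen, unique = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique

staging = [{"id": 1, "city": "N/A"}, {"id": 1, "city": "Pune"}]
cleansed = deduplicate([standardize_nulls(r) for r in staging], "id")
print(cleansed)  # [{'id': 1, 'city': None}]
```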
Transformation: The process of manipulating data; any manipulation beyond copying is a transformation. In this process, a set of rules is applied to transform data from source to target. Examples include cleansing, aggregating, and integrating data from multiple sources.
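Aggregating and integrating data from multiple sources is a typical transformation. In this illustrative sketch, sales rows from two hypothetical source systems are combined and summed per product:

```python
from collections import defaultdict

# Hypothetical sales rows from two different source systems.
sales_a = [{"product": "pen", "qty": 3}, {"product": "book", "qty": 1}]
sales_b = [{"product": "pen", "qty": 2}]

def aggregate(*sources):
    """Integrate rows from multiple sources and sum quantities per product."""
    totals = defaultdict(int)
    for source in sources:
        for row in source:
            totals[row["product"]] += row["qty"]
    return dict(totals)

print(aggregate(sales_a, sales_b))  # {'pen': 5, 'book': 1}
```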
Loading: The process of moving copied or transformed data from a source into a data warehouse.
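A loading step can be sketched with an in-memory SQLite database standing in for the warehouse; the table name and transformed rows below are hypothetical.

```python
import sqlite3

# Rows produced by the transform step, ready to load.
transformed = [(1, "pen", 5), (2, "book", 1)]

# An in-memory SQLite database stands in for the target warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (id INTEGER, product TEXT, qty INTEGER)")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", transformed)
conn.commit()

# Verify the load by counting rows in the target table.
count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(count)  # 2
```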
Target System: A database, application, file, or other storage facility into which the transformed source data is loaded in a data warehouse.
Posted on July 22, 2014, in Agile Testing, Manual Testing and tagged ETL, ETL Process, ETL testing, Extract Transform Load, Meenakshi Gangwar, Mindfire Solutions, QA Engineer, Sample ETL Process, Terms used in ETL testing, Why ETL Required.