This post has already been read 239 times!
So you need to perform simple Extract Transform Load (ETL) from different databases to a data warehouse to perform some data aggregation for business intelligence. There are alot of ETL products out there which you felt is overkilled for your simple use case.
Here I am going to walk you through on how to Extract data from mysql, sql-server and firebird, Transform the data and Load them into sql-server (data warehouse) using python 3.6.
Before we begin, let’s setup our project directory:
python_etl |__main.py |__db_credentials.py |__variables.py |__sql_queries.py |__etl.py
Aside from having
sql-server, mysql and firebird installed, we also need the below 3 python modules, run
pip install [module_name] and we are ready to go.
Setup Database Credentials and Variables
Setup a variable to store the data warehouse database name in
datawarehouse_name = 'your_datawarehouse_name' Setup all your source databases and target database connection strings and credentials in your
db_credentials.pyas shown below. Store the config as
listso we can iterate through many databases later.
sql_queries.py, here is where you will store all your sql queries for extracting from source databases and loading into your target database (data warehouse)
As we are dealing with different data platforms, we can use different syntax for each data platform by separate the queries according to the database type.
Extract Transform Load
etl.py import the following python modules and variables to get started.
Here we will have two methods,
etl_process() is the method to establish database source connection according to the database platform, and call the
etl() method, first it will run the extract query, store the sql data in the variable
data, and insert it into target database which is your data warehouse. Transformation of data can be done by manipulating the
data variable which is of type
Putting it All Together
main.py we can loop through all credentials and perform the etl for all databases.
Import all relevant variables and methods:
# variables from db_credentials import datawarehouse_db_config, sqlserver_db_config, mysql_db_config, fbd_db_config from sql_queries import fbd_queries, sqlserver_queries, mysql_queries from variables import *
# methods from etl import etl_process
In your terminal hit
python main.py and voila, you have just build a etl using pure python script.