Putting the ETL parts together: the idea behind this function is to iterate over lists of different tables and extract, transform and load each one into the database.

def ETL(table_collection: list):
    global count
    list_len = len(table_collection)
    for pair in table_collection:
        print("Loading data...")
        data = data_extract(pair, freq='daily', key=quandl_api_key)
        print(pair + " extracted from API")
        data = data_tranform(data)
        print(pair + " transformed")
        data_load(table_name=pair, data=data, conn=conn)
        print(pair + " loaded into SQL db")
        count += 1
    print("{count} of {list_len} tables loaded".format(count=count, list_len=list_len))

Main Function

In the main function, I usually just include the behavioural aspects I want to perform, e.g. using the main function to perform the ETL and then analysis on selected instruments from Quandl. I can load or refresh the datasets within the database, or, as in the next section, choose to extract data from my database into a dataframe and plot it through a function:

df = sql_to_df(conn, 'select MULTPL_SHILLER_PE_RATIO_MONTH.Date, MULTPL_SHILLER_PE_RATIO_MONTH.Value as SP500_PE_RATIO, NASDAQOMX_XQC.Index_Value as NASDAQ_Price from MULTPL_SHILLER_PE_RATIO_MONTH INNER JOIN NASDAQOMX_XQC ON MULTPL_SHILLER_PE_RATIO_MONTH.Date=NASDAQOMX_XQC."Trade Date"')

For the visuals, albeit an interim solution, I produce Matplotlib time-series charts, with Seaborn for correlation analysis, and I have created functions that accept multiple time-series datasets. This was the more meticulous part of the process; most of the time was spent here putting the visuals together, and they have been designed to look simple yet professional, and a lot less scientific. Being familiar with what the data represents, and with the actual data itself, is also an advantage. One caveat to my solution to date: you will have to know, or search for, the name of the indicator within Quandl's database.
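The sql_to_df helper used above is not shown in the snippets. Below is a minimal sketch of such a function, assuming it simply wraps pandas.read_sql_query over the SQLite connection and indexes the result by its Date column for plotting; the name and signature come from the call above, but the body is my assumption, not the author's implementation.

```python
import sqlite3

import pandas as pd


def sql_to_df(conn, query: str) -> pd.DataFrame:
    """Run a SQL query against the connection and return a DataFrame."""
    df = pd.read_sql_query(query, conn)
    # Assumption: the time-series tables carry a 'Date' column that should
    # become a DatetimeIndex so Matplotlib can plot the series directly.
    if "Date" in df.columns:
        df["Date"] = pd.to_datetime(df["Date"])
        df = df.set_index("Date")
    return df
```

With a DatetimeIndex in place, df.plot() (or the custom charting functions described above) can render the joined series without further preparation.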
The structure of my approach to the project is the following: create user-defined functions to handle each stage, i.e. extract, transform, load, database query and charts. All of the data is derived from Quandl's API, including data from the Yale Department of Economics, and I will look at a range of series from Federal Reserve Economic Data (FRED) at the Federal Reserve Bank of St. Louis.

[screenshot: SQLiteStudio]

Building Blocks - Project imports

import pandas as pd
import sqlite3
import quandl
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates
import matplotlib.ticker as ticker
import matplotlib.font_manager as fm
import matplotlib.patches as mpatches
from matplotlib import gridspec
from datetime import date, datetime, timezone

Extract

There are two parts required in order to create the function that connects to the API. After creating an account, it is as simple as collecting an API token, which can then be used to access the data.

def data_extract(pair: str, freq: str, key: str):
    quandl.ApiConfig.api_key = key  # register the token with the Quandl client
    return quandl.get(pair, collapse=freq)

Transform

The object returned from Quandl's API is converted into a pandas DataFrame. The transformation function is fairly simple, as the data does not need to be refined or cleaned for this example; for sanity reasons, spaces in the column headers are replaced with underscores prior to loading the data into the SQLite database.

def data_tranform(data) -> pd.DataFrame:
    df = pd.DataFrame(data)
    df.rename(columns=lambda x: x.replace(" ", "_"), inplace=True)
    return df

Load

This stage is again fairly simple, as the pandas module provides base connectivity to SQL databases. The commit() function commits the changes to the database.

def data_load(table_name: str, data: pd.DataFrame, conn):
    # TODO: update/append new values only, rather than a full overwrite
    data.to_sql(name=table_name.replace("/", "_"), con=conn, if_exists='replace')
    conn.commit()

The ETL parts are then put together into the single ETL function shown earlier.
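The functions above reference module-level names (conn, quandl_api_key, count) that are created elsewhere. Below is a sketch of that setup plus a main() driver, under the assumption that the Quandl codes contain a "/" (which is why data_load strips it from the table names); the database file name, the API key placeholder and the main() function are mine, not the author's.

```python
import sqlite3

# Module-level state assumed by the ETL functions in this post.
conn = sqlite3.connect("quandl_data.db")  # hypothetical local database file
quandl_api_key = "YOUR_API_KEY"           # token collected from your Quandl account
count = 0                                 # running total of tables loaded

# Example Quandl codes, reconstructed from the table names in the post;
# the '/' is replaced by '_' when each table is written to SQLite.
tables = ["MULTPL/SHILLER_PE_RATIO_MONTH", "NASDAQOMX/XQC"]


def main():
    ETL(tables)   # extract, transform and load every listed instrument
    conn.close()
```

Running main() would refresh every listed dataset in one pass; adding analysis or plotting calls after ETL(tables) keeps all of the behavioural steps in one place, as described above.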
Running this locally will help me keep the project alive: even if you can manage to get free compute and storage, it is almost certainly for a limited period of time or of limited capability, so running locally also serves as an alternative to paying for cloud compute resources. All of this could of course be achieved on a cloud platform; however, for this project I have opted to keep and manage my data locally. I have included some snippets of code to give an idea of how I have pieced it all together. It is a fairly straightforward solution, and is simply illustrative of how easily ETL can be achieved with Python and the various modules available.