# Breathe in Breathe out

Repository created for the paper "Dynamic Neural Assimilation: a Deep Learning and Data Assimilation model for Air Quality predictions". The air quality data used in our calculations can be found here.

*(Class diagram of the repository's main classes)*

- The src folder contains the source code implementation of the classes shown in the diagram above.
- The hyperparameter search folder contains the parallelised hyperparameter optimization Python program, the bash script used to run the optimization, and the optimization results.
- The industrial data folder contains the number of industries in Italy used in our analysis for the years 2007, 2010 and 2013.

## Tutorial

The Pipeline class is the final workflow developed in the project; however, other classes such as DNA_regressor, LSTM_regressor and Dataset can still be used separately, without involving the Pipeline class.

### Example of LSTM_regressor training without the Pipeline

```python
# Dataset and LSTM_regressor are implemented in the src folder
# Prepare the input dataset, where df is a pandas DataFrame of historical modelled values
dataset = Dataset(df, n_input=22, scaled_cols_oth=None)

regressor = LSTM_regressor(dataset, n_units=[40, 40], inp_shape=5)
regressor.fit(epochs=20)
regressor.predict()
```
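A minimal sketch of how the input DataFrame `df` above could be prepared, assuming the historical modelled values are stored in a CSV file (the `hist_data.csv` name simply mirrors the Pipeline example below and is only illustrative):

```python
import pandas as pd

# Load the historical modelled values into a DataFrame (file name is illustrative)
df = pd.read_csv('hist_data.csv')

# Quick sanity check of the data before constructing the Dataset object
print(df.shape)
print(df.head())
```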


### Example of Pipeline execution

A Pipeline object requires three files as input: historical observed and modelled values for the LSTM and DNA training, the latest observed values for the final data assimilation, and an industrial data file for the correlation measures. The default parameters are currently chosen to match our dataset. A full pipeline execution then looks something like this:

```python
# LSTM and DNA model configurations
lstm_config = {'seq_length': 13, 'neurons': [40, 40], 'lr': 0.01}
dna_config = {'seq_length': 16, 'neurons': [45, 20], 'lr': 0.01}

# Initialisation can also be done with pretrained models by passing them as parameters
pipeline = Pipeline('hist_data.csv', 'obs_data.csv', 'industries.csv', lstm_config, dna_config)

pipeline.train_pipeline(50, 50)        # LSTM and DNA model training
pipeline.generate_model_predictions()  # LSTM and DNA predictions
pipeline.generate_assimilations()      # Optimal Interpolation assimilations
pipeline.plot_corr()                   # correlation calculation and plotting
```


## Dependencies

The main dependencies are tensorflow, numpy, sklearn, pandas and, if hyperparameter optimization is needed, keras-tuner.
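All of these packages are available from PyPI; a typical setup might look like the following (exact versions are not pinned in this repository, so adjust as needed):

```bash
pip install tensorflow numpy scikit-learn pandas keras-tuner
```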