From c623385d0c83f5746e69cdb55b656430050ed950 Mon Sep 17 00:00:00 2001
From: jatkinson1000 <109271713+jatkinson1000@users.noreply.github.com>
Date: Mon, 19 Feb 2024 17:43:13 +0000
Subject: [PATCH 1/3] Create .gitignore

Added the default python .gitignore to the repo to keep repo and git usage clean
---
 .gitignore | 160 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 160 insertions(+)
 create mode 100644 .gitignore

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..68bc17f
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,160 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+# For a library or package, you might want to ignore these files since the code is
+# intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+# However, in case of collaboration, if having platform-specific dependencies or dependencies
+# having no cross-platform support, pipenv may install dependencies that don't work, or not
+# install all needed dependencies.
+#Pipfile.lock
+
+# poetry
+# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+# This is especially recommended for binary packages to ensure reproducibility, and is more
+# commonly ignored for libraries.
+# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+#poetry.lock
+
+# pdm
+# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+# in version control.
+# https://pdm.fming.dev/#use-with-ide
+.pdm.toml
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+# and can be added to the global gitignore or merged into this file. For a more nuclear
+# option (not recommended) you can uncomment the following to ignore the entire idea folder.
+#.idea/

From b926a75091e3b072a57a110071299036633e49de Mon Sep 17 00:00:00 2001
From: jatkinson1000 <109271713+jatkinson1000@users.noreply.github.com>
Date: Mon, 19 Feb 2024 17:44:46 +0000
Subject: [PATCH 2/3] Update .gitignore with patterns for *venv/ and venv*/

---
 .gitignore | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.gitignore b/.gitignore
index 68bc17f..6b864c8 100644
--- a/.gitignore
+++ b/.gitignore
@@ -127,6 +127,8 @@ venv/
 ENV/
 env.bak/
 venv.bak/
+venv*/
+*venv/
 
 # Spyder project settings
 .spyderproject

From 8dad9bfe23a979ad889503636643225d21964901 Mon Sep 17 00:00:00 2001
From: Surbhi Goel <62606832+surbhigoel77@users.noreply.github.com>
Date: Mon, 18 Mar 2024 16:11:11 +0000
Subject: [PATCH 3/3] Update readme (#14)

* Updating README
* Updating readme
* Updated readme
* Rebased of main- resolving conflicts
* Added training details
* Added info on normalisation of input
* Clubbed architecture, dataset, training under model description
* Added reference paper name and authors
* Added info on batching of input data
* Added usage instructions - repo cloning
* Added usage instructions - installing packages
* Added license
* Removed the unused subheadings & moved git repo cloning to installations
* Updated the folder structure

---------

Co-authored-by: Surbhi Goel
---
 README.md | 81 +++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 58 insertions(+), 23 deletions(-)

diff --git a/README.md b/README.md
index 635c061..ee234ad 100644
--- a/README.md
+++ b/README.md
@@ -1,36 +1,71 @@
-# newCAM-Emulation
-This is a DNN written with PyTorch to Emulate the gravity wave drag (GWD, both zonal and meridional ) in the WACCM Simulation.
+# Overview
+The repository contains the code for a machine learning model that emulates the climatic process of gravity wave drag (GWD, both zonal and meridional). The model is part of a parameterization scheme in which smaller, highly dynamical climatic processes are emulated using neural networks.
+Gravity waves, also called buoyancy waves, are formed due to the displacement of air in the atmosphere instigated by different physical mechanisms, such as moist convection, orographic lifting, shear instability etc. These waves can propagate both vertically and horizontally through the lift and drag mechanism respectively. This ML model focuses on the drag component of gravity waves.
-# DemoData
-Sample output data from CAM.
-It is 3D global output from the mid-top CAM model, on the original model grid.
-
-However, the demo data here is one very small part of the CAM output due to storage limit of Github. NN trained on this Demodata will not work.
+The long-term goal of the model is to be coupled with a larger Fortran-based numerical weather prediction model called the Mid-top CAM Model (Community Atmosphere Model).
+https://www.cesm.ucar.edu/models/cam.
 # Installing
-
-Clone this repo and enter it.\
-Then run:
-```
-pip install .
-```
-to install the neccessary dependencies.\
+1. Change your current working directory to the location where you want to clone the repository, then run
+   ```bash
+   git clone git@github.com:DataWaveProject/newCAM_emulation.git
+   ```
+   to clone via ssh, or
+   ```bash
+   git clone https://github.com/DataWaveProject/newCAM_emulation.git
+   ```
+   to clone via https
+2. Then run the command below to install the necessary dependencies:
+   ```
+   pip install .
+   ```
 It is recommended this is done from inside a virtual environment.
-# data loader
-load 3D CAM data and reshaping them to the NN input.
-# Using a FNN to train and predict the GWD
-train.py train the files and generate the weights for NN.
+# Model Description
+
+## Architecture
+The machine learning model is a Feed Forward Neural Network (FFNN) with 10 hidden layers and 500 neurons in
+each layer. The activation used at each layer is a Sigmoid Linear Unit (SiLU) activation function.
+
+## Dataset
+The dataset available in the `Demodata` folder is sample output data from CAM. It is 3D global output from the mid-top CAM model, on the original model grid. The demo data here is one very small part of the CAM output and is for demonstration purposes only.
+
+- Input variables: pressure levels, latitude, longitude
+
+- Output variables: zonal drag force, meridional drag force
+
+The data has been split in a ratio of 75:25 into training and validation sets. The input variables have been normalised using their mean and standard deviation before being fed to the model for training. Normalisation gives all the inputs similar ranges and distributions, preventing variables with large numerical scales from dominating the predictions.
+
+## Training
+The model is trained on the demo data using the script `train.py`. The optimiser used is an `Adam` optimiser with a `learning rate` of 0.001. The data is divided into 128 batches for faster training and efficient memory usage and is run on the model for 100 `epochs`. The training includes an `early stopping` mechanism that helps prevent overfitting of the model. The loss in making the predictions is quantified in the form of an `MSE` (mean squared error).
+
+## Repository Layout
+The `Demodata` folder contains the demo data used to train and test the model
+
+The `newCAM_emulation` folder contains the code that is required to load data, train the model and make predictions, which is structured as follows:
+> `train.py` - train the model
+
+> `NN-pred.py` - predict the GWD using the trained model
+
+> `loaddata.py` - load the data and reshape it to the NN input
-NN-pred.py load the weights and do prediction.
+> `model.py` - define the NN model
-# Coupling ? future work
-replace original GWD scheme in WACCM with this emulator.
+## Usage Instructions
+To use the repository, the following steps are required:
+1. For example, to run the `train.py` script to train the model, run the command below:
+   ```bash
+   python3 train.py
+   ```
-a. the emulator can be trained offline
+### Reference Paper:
-b. training the emulator online
+**Data Imbalance, Uncertainty Quantification, and Generalization via Transfer Learning in Data-driven Parameterizations: Lessons from the Emulation of Gravity Wave Momentum Transport in WACCM.**
+ *Authors: Y. Qiang Sun and Hamid A. Pahlavan and Ashesh Chattopadhyay and Pedram Hassanzadeh and Sandro W. Lubis and M. Joan Alexander and Edwin Gerber and Aditi Sheshadri and Yifei Guan*
+https://arxiv.org/pdf/2311.17078.pdf
+### License:
+The repository is licensed under MIT License - see the [LICENSE](LICENSE) file for details.
\ No newline at end of file
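
Note on the README's Dataset and Training sections: the 75:25 split, the mean/standard-deviation normalisation and the early stopping mechanism are described only in prose. The sketch below illustrates one way these steps could look, using only the Python standard library. It is not taken from `train.py` or `loaddata.py`: the function names, the `patience` and `min_delta` parameters, and the choice to compute the statistics from the training split alone are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def split_train_val(samples, train_fraction=0.75):
    """Split a sequence into training and validation parts (75:25 by default)."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


def fit_normaliser(train_column):
    """Return (mean, std) of one input variable.

    Assumption: statistics are computed from the training values only.
    """
    mean = fmean(train_column)
    std = pstdev(train_column)
    if std == 0:
        # A constant column would otherwise cause a division by zero.
        std = 1.0
    return mean, std


def normalise(column, mean, std):
    """Apply z-score normalisation: (x - mean) / std."""
    return [(x - mean) / std for x in column]


class EarlyStopping:
    """Signal a stop once the validation loss has failed to improve
    for `patience` consecutive epochs (hypothetical parameters)."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `fit_normaliser` would be called once per input variable (pressure level, latitude, longitude), the same `(mean, std)` pair would be reused for the validation data, and `EarlyStopping.step` would be called with the validation MSE at the end of each epoch.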