
Generalising the training module to accommodate varying input #20

Open
surbhigoel77 wants to merge 24 commits into main
Conversation

@surbhigoel77 (Collaborator) commented Mar 4, 2024

Closes #19

In order to accommodate data coming from multiple sources, we need to replace the hardcoded inputs with variables whose values are extracted from the dataset at run time.

Files that need updating:

  1. loaddata.py
  2. Model.py
  3. train.py

In train.py, the .nc data files are read and the variables are extracted and normalised. This part should be taken out of train.py and kept in a separate preprocessing.py file, so that train.py stays common across data sources. train.py can then be converted into a module instead, and we can also add a test notebook to the repo. A sketch of the proposed split is below.
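A minimal sketch of what the proposed split might look like (the function name, variable handling, and use of xarray here are assumptions for illustration, not the final API):

    # preprocessing.py -- hypothetical sketch of the proposed split
    import numpy as np
    import xarray as xr

    def load_and_normalise(nc_path, variable_names):
        """Read a .nc file and return each variable normalised to zero mean, unit variance."""
        ds = xr.open_dataset(nc_path)
        data = {}
        for name in variable_names:
            arr = ds[name].values.astype(np.float64)
            data[name] = (arr - arr.mean()) / arr.std()
        return data

train.py would then import from preprocessing.py rather than reading the .nc files itself.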

@surbhigoel77 self-assigned this Mar 4, 2024
@surbhigoel77 added labels: enhancement (New feature or request), Python-repo (Part of the python NN repo) Mar 4, 2024
@surbhigoel77 marked this pull request as draft March 4, 2024 18:19
@surbhigoel77 (Collaborator, Author) commented Mar 4, 2024

In Model.py, the architecture of the NN is defined. The input/output layers of the NN are hardcoded to the number of input/output variables:

  • 93 vertical levels
  • 7 input variables that change with vertical levels
  • 4 input variables that do not change with vertical levels
  • 2 output variables that change with vertical levels

Assuming that the number of inputs will differ across the three sources, we need to generalise the code in Model.py as well (to allow a varying number of input/output nodes for different sources); see the worked example below.
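For reference, with the hardcoded dimensions above the layer sizes work out as follows (a worked example for illustration; the variable names match those used later in the PR):

    ilev = 93      # vertical levels
    in_ver = 7     # input variables that vary with level
    in_nover = 4   # input variables constant across levels
    out_ver = 2    # output variables that vary with level

    input_nodes = in_ver * ilev + in_nover   # 7 * 93 + 4 = 655
    output_nodes = out_ver * ilev            # 2 * 93 = 186

Generalising means computing these sizes from the dataset rather than fixing them in the layer definitions.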

@surbhigoel77 (Collaborator, Author) commented
We can include the normalisation of the variables in the neural network architecture instead of in train.py.
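One way to do this (a sketch only; the class name and constructor signature are assumptions, not the PR's API) is to register the normalisation statistics as buffers on the model, so they are saved and loaded with the weights:

    import torch
    from torch import nn

    class NormalisedNet(nn.Module):
        """Sketch: fold input normalisation into the network itself."""

        def __init__(self, mean, std, net):
            super().__init__()
            # Buffers are stored in the state_dict alongside the weights
            # but are not updated by the optimiser.
            self.register_buffer("mean", torch.as_tensor(mean, dtype=torch.float64))
            self.register_buffer("std", torch.as_tensor(std, dtype=torch.float64))
            self.net = net

        def forward(self, x):
            return self.net((x - self.mean) / self.std)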

@jatkinson1000 linked an issue Mar 18, 2024 that may be closed by this pull request
@surbhigoel77 marked this pull request as ready for review July 30, 2024 09:46
@jatkinson1000 (Member) left a comment

This is looking much better @surbhigoel77, with a few comments on there.
There also appear to be a number of conflicts with the main branch, so perhaps a rebase is needed (though I suggest testing this on a new branch).

The big thing that is missing is documentation on how to use it, which would be nice to allow handing over to collaborators for future re-use. Perhaps #14 can help?

The other thing to consider is some CI, though this requires our collaborators understanding how to run formatters and linting. And I don't think there are any tests yet...?
Was there any way to check that the same results are being generated after the refactor?

Comment on lines +96 to +136
# if model is not None:
# # torch.save(model.state_dict(), 'conv_torch.pth')
# torch.save(model.state_dict(), 'trained_models/weights_conv')
Member:

Is this dead code, or should it be wrapped in a conditional of some sort, or replaced by a comment?
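If it is meant to be kept, one option (a sketch of the suggestion, reusing the path from the commented-out lines; assumes torch is already imported in train.py) would be:

    # Save the trained weights only if training actually produced a model.
    if model is not None:
        torch.save(model.state_dict(), "trained_models/weights_conv")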

Member:

Unsure as to its purpose, ask @yqsun91.

Comment on lines +23 to +68
layers = []
input_size = in_ver * ilev + in_nover
for _ in range(hidden_layers):
layers.append(nn.Linear(input_size, hidden_size, dtype=torch.float64))
layers.append(nn.SiLU())
input_size = hidden_size
layers.append(nn.Linear(hidden_size, out_ver * ilev, dtype=torch.float64))
self.linear_stack = nn.Sequential(*layers)
Member:

A brief comment to summarise what this code is doing for those used to writing it in the previous layout might be useful here.
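For example, something along these lines (a suggested commented version of the quoted hunk, not code from the PR itself):

    # Build the hidden stack dynamically instead of hardcoding each layer:
    # the first Linear maps the flattened inputs (per-level variables times
    # the number of levels, plus the level-independent variables) to
    # hidden_size; each later iteration maps hidden_size to hidden_size.
    layers = []
    input_size = in_ver * ilev + in_nover
    for _ in range(hidden_layers):
        layers.append(nn.Linear(input_size, hidden_size, dtype=torch.float64))
        layers.append(nn.SiLU())
        input_size = hidden_size
    # Final layer maps back down to the per-level outputs.
    layers.append(nn.Linear(hidden_size, out_ver * ilev, dtype=torch.float64))
    self.linear_stack = nn.Sequential(*layers)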

Member:

This file is looking much better; the net is much cleaner and the docstrings really help with understanding what is going on.




class EarlyStopper:
Member:

It's not immediately clear to me why this should be a class rather than a function, and whether it belongs here with the Model, or if it would be better moved to train.py with the training loop.

Member:

@surbhigoel77 agrees this should be a single function with inputs, and moved to train.py. A sketch of the function form follows.
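A sketch of what the function form might look like (the signature and the mutable-dict state handling are assumptions; the PR may choose differently):

    def early_stop(val_loss, state, patience=5, min_delta=0.0):
        """Return True once val_loss has not improved for `patience` checks.

        `state` is a dict carried between calls (best loss and a counter),
        replacing the attributes of the old EarlyStopper class.
        """
        if val_loss < state.get("best_loss", float("inf")) - min_delta:
            state["best_loss"] = val_loss
            state["counter"] = 0
        else:
            state["counter"] = state.get("counter", 0) + 1
        return state["counter"] >= patience

In the training loop this would be used as `state = {}` before the loop and `if early_stop(val_loss, state): break` inside it.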

newCAM_emulation/NN_pred.py (outdated comment; resolved)
dim_NNout = int(out_ver * ilev)
x_train = np.zeros([dim_NN, Ncol])
y_train = np.zeros([dim_NNout, Ncol])
target_var = ['UTGWSPEC', 'VTGWSPEC']
Member:

You have made the input names a variable input, but hardcoded the output variables.
Is it worth making both variable?
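i.e. something along these lines (a hypothetical helper; the name is illustrative, and the default preserves the current behaviour):

    import numpy as np

    def build_training_arrays(ds, input_vars, target_vars=("UTGWSPEC", "VTGWSPEC")):
        """Stack the named variables into x/y arrays instead of hardcoding the targets."""
        x = np.stack([np.asarray(ds[name]) for name in input_vars])
        y = np.stack([np.asarray(ds[name]) for name in target_vars])
        return x, y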

Member:

I think this presents a much cleaner overview of the development, with details abstracted away into modules.

for t in range(epochs):
if t % 2 ==0:
print(f"Epoch {t+1}\n-------------------------------")
def train_loop(dataloader, model, loss_fn, optimizer):
Member:

Ambiguous name?
This seems to be training a single epoch, i.e. a single iteration of the full training loop?
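A possible rename to make the scope explicit (illustrative only; the body is a standard training pass, not the PR's exact code):

    # Hypothetical rename: one full pass over the dataloader, i.e. one epoch.
    def train_one_epoch(dataloader, model, loss_fn, optimizer):
        for x, y in dataloader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()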

if early_stopper.early_stop(val_loss):
print("BREAK!")
if early_stopper.early_stop(val_loss, model):
# print("BREAK!")
Member:

Dead code?
Or, more likely, should we be writing some output to tell the user that training has finished because the early stopping criterion was met?
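For example (a sketch of the suggestion, reusing the names from the quoted code):

    # Report why training stopped, then exit the epoch loop.
    if early_stopper.early_stop(val_loss, model):
        print(f"Early stopping at epoch {t+1}: validation loss stopped improving.")
        break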

Comment on lines +106 to +107
ilev : int
Number of vertical levels.
Member:

Missing some docs for input variables.
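e.g. extending the numpydoc block along these lines (parameter names taken from the code quoted elsewhere in this PR; the descriptions are assumptions):

    Parameters
    ----------
    ilev : int
        Number of vertical levels.
    in_ver : int
        Number of input variables that vary with vertical level.
    in_nover : int
        Number of input variables that do not vary with vertical level.
    out_ver : int
        Number of output variables that vary with vertical level.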

Labels: enhancement (New feature or request), Python-repo (Part of the python NN repo)
Projects: none yet
Development: successfully merging this pull request may close "Generalise the training module"
2 participants