Adam tv detection #16

Open
wants to merge 47 commits into base: main

Conversation

adam-peaston-SC
Contributor

@adam-peaston-SC adam-peaston-SC commented Sep 6, 2023

tv-detection & tv-segmentation

Description

This PR adds model training scripts for the "tv-detection" and "tv-segmentation" sections of the validated configurations.

Model / backbone details

MaskRCNN: ResNet101
RetinaNet: ResNet101
Deeplabv3: Mobilenet_v3_large
FCN: ResNet101

Training parameters

MaskRCNN: nodes=11, batch_size=2 (eff. 132), lr=0.02
RetinaNet: nodes=11, batch_size=2 (eff. 132), lr=0.01
Deeplabv3: nodes=11, batch_size=2 (eff. 132), lr=0.01
FCN: nodes=11, batch_size=4 (eff. 264), lr=0.02
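
The effective batch sizes listed above are consistent with the usual data-parallel arithmetic: nodes × GPUs per node × per-GPU batch size. The number of GPUs per node is not stated in this PR; the value 6 below is an assumption inferred from 132 / (11 × 2), and the same value also reproduces the 264 reported for FCN. A minimal sketch (the function and constant names are illustrative only):

```python
# ASSUMPTION: GPUs per node is not given in the PR; 6 is inferred from
# the reported effective batch size 132 = 11 nodes * 6 GPUs * batch_size 2.
GPUS_PER_NODE = 6


def effective_batch_size(nodes: int, per_gpu_batch: int,
                         gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Total samples per optimizer step across all data-parallel workers."""
    return nodes * gpus_per_node * per_gpu_batch


# (nodes, per-GPU batch_size) as listed under "Training parameters".
configs = {
    "MaskRCNN": (11, 2),
    "RetinaNet": (11, 2),
    "Deeplabv3": (11, 2),
    "FCN": (11, 4),
}

for name, (nodes, batch_size) in configs.items():
    print(f"{name}: effective batch size = {effective_batch_size(nodes, batch_size)}")
```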

Results

Gists of training logs:
MaskRCNN: https://gist.github.com/adam-peaston-SC/e5e5f3dbd1469bf8d7bd0e8d41f471ed
RetinaNet: https://gist.github.com/adam-peaston-SC/5d476bf2beafe9724c3cf21f339110bc
Deeplabv3: https://gist.github.com/adam-peaston-SC/8e09722761712c5f9badf68ec4f8831b
FCN: https://gist.github.com/adam-peaston-SC/730da428954ee6b4299c270946f33bdd

Things done

  • Atomic saving - checkpoints are written atomically to avoid corruption.
  • Checkpointing - the model can save a checkpoint and resume from it.
  • Completed a full run
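
For readers unfamiliar with the atomic-saving pattern mentioned in the list above, here is a generic sketch of one common approach: write to a temporary file in the same directory, then rename it over the target. This is not taken from the scripts in this PR. The helper names `atomic_save` and `load_checkpoint` are hypothetical, and `pickle` stands in for whatever serializer the training scripts actually use, so that the example runs with the standard library only.

```python
import os
import pickle
import tempfile


def atomic_save(state: dict, path: str) -> None:
    """Write `state` to `path` so a reader never sees a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomically swaps the new file into place
    except BaseException:
        # On any failure, remove the partial temp file and leave the old checkpoint intact.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path: str, default: dict) -> dict:
    """Return the saved state, or `default` when no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


# Usage: resume from the last completed epoch, saving after each one.
with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "checkpoint.pkl")
    state = load_checkpoint(ckpt, default={"epoch": 0})
    for epoch in range(state["epoch"], 3):
        state = {"epoch": epoch + 1}  # a real run would also store model/optimizer state
        atomic_save(state, ckpt)
    resumed = load_checkpoint(ckpt, default={"epoch": 0})
    print(resumed)
```

If the process is interrupted mid-write, only the temporary file is affected, so the previous checkpoint remains loadable on resume.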

@StrongChris
Contributor

Please remove the WIP from the readme next to the seg and detect model listings

@StrongChris
Contributor

I ran both of the detection models overnight and they don't appear to be making progress every epoch. Can we please ensure that if progress IS being made, it is being printed out? And if progress is NOT being made, please instrument the startup with printouts of the various steps it needs to go through before making progress and how long they take. e.g. imports, process group setup, model construction, dataset construction, dataloader construction, loading from checkpoint, ddp setup etc.
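
One possible way to add the requested startup instrumentation is a small timing context manager wrapped around each phase. This is only a sketch: the `timed` helper and the `timings` dictionary are hypothetical names, and the two phase bodies are placeholders for the real model and dataset construction code.

```python
import time
from contextlib import contextmanager

# Collected durations, keyed by step name, in the order the steps ran.
timings = {}


@contextmanager
def timed(step: str):
    """Print when a startup step begins and how long it took when it ends."""
    start = time.perf_counter()
    print(f"[startup] {step} ...", flush=True)
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[step] = elapsed
        print(f"[startup] {step} done in {elapsed:.3f}s", flush=True)


# Placeholder bodies: the real script would build the model, dataset,
# dataloader, process group, and so on inside these blocks.
with timed("model construction"):
    model = object()

with timed("dataset construction"):
    dataset = list(range(1000))
```

Printing with `flush=True` helps the messages appear promptly in job logs even when stdout is buffered, which makes it easier to tell a slow startup step from a hung one.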

@StrongChris
Contributor

I will review and run the segmentation fixes today too.

@StrongChris StrongChris removed their request for review September 21, 2023 01:42
@StrongFennecs
Collaborator

StrongFennecs commented Sep 21, 2023

Hi @adam-peaston-SC, can you do a general cleanup:

  • delete the forked monai (put it in a separate repo if you want to fix it or apply some patches)
  • delete the lora/hello world examples, or open new PRs for them
  • delete the brats v0 example (you said it's just an attempt to get cycling working; v2 is the proper run)
  • move the brats v2 example out to a new fork and create a PR for that

@StrongFennecs
Collaborator

Please delete the logs file.

@StrongFennecs
Collaborator

How's this going?

3 participants