
Updated Tuner #26

Merged: 31 commits into dev from fix/tuner, Jun 13, 2024
Conversation

klemen1999 (Collaborator) commented:
  • Updated Optuna requirements to a newer version compatible with pytorch-lightning>=2.0
  • Added a continue_existing_study: bool = True parameter to the config. If True (the default), the Tuner reuses an existing study with the given name; if the name already exists and this is set to False, it throws an error.
  • Added logging to the Tuner that reports the best study parameters at the end
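The continue_existing_study behavior can be sketched roughly as follows. This is an illustrative stand-in with a simple in-memory registry, not the actual Tuner code; in practice the Tuner would delegate study creation to Optuna (whose create_study accepts a load_if_exists flag for the same purpose):

```python
# Hypothetical sketch of the continue_existing_study logic; names are
# illustrative, and the dict below stands in for Optuna's storage backend.
_existing_studies: dict[str, dict] = {}

def get_or_create_study(name: str, continue_existing_study: bool = True) -> dict:
    if name in _existing_studies:
        if not continue_existing_study:
            # Mirrors the new config option: refuse to silently reuse a study.
            raise ValueError(f"Study '{name}' already exists")
        return _existing_studies[name]  # resume the existing study
    study = {"name": name, "trials": []}
    _existing_studies[name] = study
    return study
```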

github-actions bot commented May 13, 2024

Test Results

6 files, 6 suites, 37m 35s ⏱️
57 tests: 25 ✅ 25 💤 7 ❌
342 runs: 178 ✅ 150 💤 14 ❌

For more details on these failures, see this check.

Results for commit 88b5eab.

♻️ This comment has been updated with latest results.

@tersekmatija (Collaborator) left a comment:

Let's see if we can nest better than recursively in MLFlow, otherwise looks good.

@kozlov721 kozlov721 added the fix Fixing a bug label May 15, 2024
@kozlov721 (Collaborator) left a comment:

LGTM

@klemen1999 (Collaborator, Author) commented:

Tuning runs are now nested together under a common parent experiment and can be compared more easily in MLflow. The parent tracker is only used to report the hyperparameters that are currently the best in the whole study. CC: @tersekmatija
(screenshots: nested tuning runs in the MLflow UI)

```python
    **tracker_params,
)
if self.parent_tracker.is_mlflow:
    run = self.parent_tracker.experiment["mlflow"].active_run()
```
Collaborator commented:

Will parent run remain active long enough for this to not cause any issues?

Collaborator Author replied:

Yes, this run stays active until the very end, when the whole study finishes. Basically, every trial in the study is its own training run (as if we ran luxonis_train train ...), and every trial gets its own tracker. To nest them together we need one top-level tracker, the parent tracker. In the backend, MLflow keeps a stack of all active runs, and when nest=True it binds a new run to the previous run on the stack. So this top-level tracker is only used for binding all trials together and for logging the best hyperparameters in the whole study: for ease of use, to quickly get the results of the study.
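The stack behavior described above can be mimicked in a few lines. This is a toy mock of MLflow's active-run stack (not MLflow's actual implementation): starting a run with nested=True attaches it to whichever run is currently on top of the stack, which is why the parent tracker must stay active for the whole study:

```python
# Toy model of MLflow's active-run stack; all names are illustrative.
_run_stack: list[dict] = []

def start_run(name: str, nested: bool = False) -> dict:
    # A nested run binds to the run currently on top of the stack, if any.
    parent = _run_stack[-1] if (nested and _run_stack) else None
    run = {"name": name, "parent": parent["name"] if parent else None}
    _run_stack.append(run)
    return run

def end_run() -> None:
    _run_stack.pop()

parent_run = start_run("study")              # the Tuner's parent tracker
trial_0 = start_run("trial_0", nested=True)  # binds to "study"
end_run()                                    # trial ends; "study" stays active
trial_1 = start_run("trial_1", nested=True)  # also binds to "study"
```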

```python
)
if self.parent_tracker.is_mlflow:
    run = self.parent_tracker.experiment["mlflow"].active_run()
    self.parent_run_id = run.info.run_id
```
Collaborator commented:

Is this used somewhere? What is this set to if MLFlow is not used?

Collaborator Author replied:

If MLflow is not used, this first tracker will still report the best hyperparameters, but nesting won't be done because it is not supported by e.g. Tensorboard. So every trial will have its own Tensorboard logs with no connection between them. That's why MLflow is recommended when tuning.
The whole part in this if block seems like it is not actually needed, but if I remove it then nesting is not done for some reason. Looking into it, and I'll update you.

Collaborator replied:

Ok, sounds good, not critical. I think it looks good; maybe we just add a comment in the code noting this is required for MLflow to work correctly, then we can merge.

Collaborator Author replied:

I figured it out: the way the tracker is written, it creates the actual MLflow run only once it is interacted with. Since in our case we need the parent tracker to be created first (to then nest all children runs under it), we need to keep this line in the code. I added a note to the code.
CC: @kozlov721 I think we can merge now

Collaborator replied:

Is this specific to "mlflow" then, or should this be present for other underlying trackers as well? @klemen1999

@klemen1999 (Collaborator Author) commented on May 20, 2024:

It's the case for all trackers. LuxonisTracker is structured very similarly to the other default PL loggers (e.g. MLFlowLogger), where the actual objects are created when the .experiment property is first called. This ensures that LuxonisTracker integrates nicely with PL, but we have to be mindful of it when using the tracker like this. Although I don't think there will be many cases where one would create just a placeholder tracker, like we are doing for this specific case, so it shouldn't cause too many problems.
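The lazy-creation pattern described above, shared with PL's built-in loggers, looks roughly like this. The class and attribute names are illustrative, not the real LuxonisTracker internals:

```python
# Sketch of the lazy `.experiment` pattern used by Lightning-style loggers:
# the backing run/client is only created on first property access, which is
# why the Tuner has to touch `.experiment` to force the parent run to exist.
class LazyTracker:
    def __init__(self, run_name: str):
        self.run_name = run_name
        self._experiment = None  # nothing created yet

    @property
    def experiment(self):
        if self._experiment is None:
            # In a real tracker this would create the MLflow run, etc.
            self._experiment = {"run_name": self.run_name, "active": True}
        return self._experiment

tracker = LazyTracker("parent")
created_early = tracker._experiment is not None  # False: still a placeholder
_ = tracker.experiment                           # first access forces creation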

Collaborator replied:

So should we always call this, instead of only on:

```python
if self.parent_tracker.is_mlflow:
```

Collaborator Author replied:

MLflow is the only tracker this nesting applies to, which is why I'm doing it only for MLflow.

Collaborator replied:

Sounds good, I think we can proceed with the merge in this case.

@kozlov721 kozlov721 changed the title Fix/tuner Updated Tuner May 16, 2024
@klemen1999 (Collaborator, Author) commented:

I've added a graceful stop for LuxonisTrackerPL via the finalize() function, which is automatically called when the PL process finishes. This should remove the need for an explicit mlflow.end_run in #30. @kozlov721
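A minimal sketch of what such a hook can look like, modeled on Lightning's Logger.finalize(status) interface (which PL calls when training ends); the class and attribute names here are hypothetical, not the real LuxonisTrackerPL:

```python
# Illustrative graceful-stop hook in the shape of PL's Logger.finalize(status).
class TrackerSketch:
    def __init__(self):
        self.run_active = True
        self.final_status = None

    def finalize(self, status: str = "success") -> None:
        # Idempotent: only the first call closes the run, so a later explicit
        # end-run (like mlflow.end_run in #30) becomes unnecessary.
        if self.run_active:
            self.run_active = False
            self.final_status = status

tracker = TrackerSketch()
tracker.finalize("success")
```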

github-actions bot commented May 23, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
|-------|---------|----------|-----------|--------|
| 4923  | 3787    | 77%      | 0%        | 🟢     |

New Files

No new covered files...

Modified Files

| File                           | Coverage | Status |
|--------------------------------|----------|--------|
| luxonis_train/core/tuner.py    | 81%      | 🟢     |
| luxonis_train/utils/config.py  | 95%      | 🟢     |
| luxonis_train/utils/tracker.py | 50%      | 🟢     |
| TOTAL                          | 76%      | 🟢     |

updated for commit: 6a9bceb by action🐍

CaptainTrojan and others added 20 commits June 2, 2024 00:27
…put_sources, which tells the LuxonisModel which loader sub-elements it wants to load, and LoaderConfig has an images_name parameter which identifies the image-like input among the sub-elements for compatibility with visualizers etc.
…to code structure requirements. Added missing tests for evaluation, export, and inference.
@kozlov721 kozlov721 merged commit bb9b01d into dev Jun 13, 2024
6 checks passed
@kozlov721 kozlov721 deleted the fix/tuner branch June 13, 2024 03:15
@kozlov721 kozlov721 mentioned this pull request Jul 30, 2024
kozlov721 added a commit that referenced this pull request Oct 9, 2024
Co-authored-by: Martin Kozlovsky <[email protected]>
Co-authored-by: Michal Sejak <[email protected]>
Co-authored-by: GitHub Actions <[email protected]>
Labels: fix (Fixing a bug)

6 participants