get_performance_dictionary doesn't provide the desired metric #48

Open
andres-fr opened this issue Jun 23, 2022 · 1 comment
@andres-fr

The interface of the function is

def get_performance_dictionary(
    optimizer_path, mode="most", metric="valid_accuracies", conv_perf_file=None
):

But despite providing e.g. "valid_accuracies", the function sometimes returns the "test_accuracies" instead.
The explanation can be found in the following line of code (permalink to the dev branch):

metric = "test_accuracies" if "test_accuracies" in sett.aggregate else "test_losses"

This line overrides the metric provided by the user in all cases, making the metric parameter redundant.
A proposed fix is to delete this line, or alternatively to remove the metric parameter from the function. I personally think the former is more meaningful, since it gives the end user more flexibility. A minimal sketch of what I have in mind is included below.
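
For illustration, a minimal sketch of the fix, keeping only the names metric and sett.aggregate from the existing code (the helper name is hypothetical): the user-supplied metric is respected, and we only fall back to the corresponding losses if it was never recorded for the setting.

def _resolve_metric(sett, metric="valid_accuracies"):
    # Hypothetical helper: keep the user-supplied metric if it was recorded
    # for this setting, otherwise fall back to the corresponding loss-based
    # metric, mirroring the existing accuracies -> losses fallback.
    if metric in sett.aggregate:
        return metric
    return metric.replace("accuracies", "losses")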

andres-fr added the 🆕 Status: New (New Issue) and 🐛 Type: Bug (Something isn't working) labels on Jun 23, 2022
@fsschneider (Owner)

I think the issue is a little bit more complicated.

As the docstring says, the metric that is passed to get_performance_dictionary determines "how to decide the best setting". This can currently be influenced by the user, e.g. by ranking hyperparameter settings by valid_accuracies or train_losses.

The line you quoted determines which metric is used to report the "performance". This is indeed currently hard-coded to be either test_accuracies or test_losses. However, it does not affect the ranking, only which metric is reported as performance.

In general, with DeepOBS we very much encourage users to report test_accuracies as the performance measure, so hard-coding it doesn't sound too bad.
If we want to change it, we should have two parameters controlling the behavior of the analysis part, something like ranking_metric and performance_metric. This would require more thorough changes than just removing the line you quoted. A rough sketch of such an interface follows below.
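
For illustration, here is a rough sketch of how such a two-metric interface could look. The parameter names follow the suggestion above; the body is indicative only and omits the actual analysis logic.

def get_performance_dictionary(
    optimizer_path,
    mode="most",
    ranking_metric="valid_accuracies",     # decides which hyperparameter setting is "best"
    performance_metric="test_accuracies",  # decides which metric is reported as performance
    conv_perf_file=None,
):
    """Sketch only: rank settings by ranking_metric, report performance_metric."""
    # The hard-coded line quoted above would then become something like:
    # metric = (performance_metric if performance_metric in sett.aggregate
    #           else performance_metric.replace("accuracies", "losses"))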
