The interface of the function includes a metric parameter. But, despite providing e.g. "valid_accuracies", the function sometimes returns the "test_accuracies" instead.
The explanation can be found in the following line of code (permalink to the dev branch): DeepOBS/deepobs/analyzer/analyze.py, line 532 in 9782c0b.
This line overrides the metric provided by the user in all cases, making the parameter redundant.
A proposed fix is to delete this line, or to remove the metric parameter from the function. I personally think the former is more meaningful, since it provides more flexibility to the end user.
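For illustration, here is a minimal, self-contained sketch of the behaviour described above. Everything in it is hypothetical: the function name only mimics get_performance_dictionary, and the signature, variable names, and overriding line are assumptions paraphrasing the permalinked code, not the verbatim DeepOBS source.

```python
# Hypothetical sketch of the behaviour described above, not the DeepOBS source.
# The overriding line mirrors the pattern reported in the issue: the user's
# `metric` argument is replaced by a hard-coded test metric.

def get_performance_dictionary_sketch(output, metric="valid_accuracies"):
    # `metric` is supposed to select which curve is reported ...
    # ... but a hard-coded assignment (like the permalinked line) overrides it:
    metric = "test_accuracies" if "test_accuracies" in output else "test_losses"
    return {"metric": metric, "value": max(output[metric])}


# Even when explicitly asking for validation accuracy, test accuracy is returned:
output = {"valid_accuracies": [0.70, 0.80], "test_accuracies": [0.65, 0.75]}
print(get_performance_dictionary_sketch(output, metric="valid_accuracies"))
# {'metric': 'test_accuracies', 'value': 0.75}
```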
I think the issue is a little bit more complicated.
As the docstring says, the metric that is passed to get_performance_dictionary determines "how to decide the best setting". Currently, this can be influenced by the user, e.g. ranking hyperparameters by valid_accuracies or train_losses.
The line you quoted determines which metric is used to report the "performance". This is currently indeed hard-coded to be either test_accuracies or test_losses. However, it does not affect the ranking, only which metric is reported as the performance.
In general, with DeepOBS we very much encourage users to report test_accuracies as the performance measure, so hard-coding it doesn't sound too bad.
If we want to change it, we should have two parameters controlling the behavior of the analysis part, something like ranking_metric and performance_metric (a rough sketch follows below). This would require more thorough changes than just removing the line you quoted.
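To make that concrete, here is a rough sketch of what such a two-parameter interface could look like. Only the names ranking_metric and performance_metric come from the comment above; the signature and selection logic are assumptions, not the current DeepOBS API.

```python
# Hypothetical sketch of the proposed split; not the current DeepOBS API.

def get_performance_dictionary_sketch(
    settings,                               # per-setting dicts of recorded curves
    ranking_metric="valid_accuracies",      # decides which setting is "best"
    performance_metric="test_accuracies",   # decides which number is reported
):
    # Rank hyperparameter settings by the user-chosen ranking metric
    # (for loss metrics one would minimise instead of maximise) ...
    best = max(settings, key=lambda s: max(s[ranking_metric]))
    # ... and report the (possibly different) performance metric of that setting.
    return {performance_metric: max(best[performance_metric])}


settings = [
    {"valid_accuracies": [0.78, 0.81], "test_accuracies": [0.74, 0.77]},
    {"valid_accuracies": [0.70, 0.85], "test_accuracies": [0.68, 0.80]},
]
print(get_performance_dictionary_sketch(settings))
# {'test_accuracies': 0.8}
```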