Add option to configure for multi objective optimization #531
Conversation
Optimization metrics are the metrics the AutoML framework should optimize for; evaluation metrics are the metrics the final model is evaluated on. Optimization metrics are automatically also used as evaluation metrics, but additional evaluation metrics may be defined.
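A minimal sketch of that relationship, assuming plain lists of metric names (the variable names are illustrative, not the benchmark's actual API):

```python
# Metrics the AutoML framework is asked to optimize towards.
optimization_metrics = ["logloss", "acc"]
# Extra metrics that are only computed on the final predictions.
evaluation_metrics = ["balacc"]

# The final model is scored on the union, with optimization metrics first.
metrics_to_score = optimization_metrics + [
    m for m in evaluation_metrics if m not in optimization_metrics
]
print(metrics_to_score)  # ['logloss', 'acc', 'balacc']
```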
@eddiebergman You are one of the people who requested this feature, how do you feel about these changes? Do they work for you?
Yes, this is along the lines of what was requested before. I'm not sure a warning is enough, as typically this gets lost in all the other logs produced and leads to false conclusions when comparing frameworks. I would even lean more towards explicitly raising an error in this case. I'm not sure how to handle this flexibly though. One option is to have a … . To be fair, as long as the documentation is quite clear on this, it should be okay.

One other thing: in terms of parsing the csv, it's a bit easier when things are not kept as a tuple. I believe during our own hack of it, we had something like a column per metric, i.e. …
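A hedged sketch of what such handling in an integration script could look like, with a hypothetical `strict` switch for the raise-an-error behaviour discussed above (none of these names are the benchmark's actual API):

```python
import logging

log = logging.getLogger(__name__)

def resolve_metrics(optimization_metrics, supports_moo, strict=False):
    """Hypothetical helper: fall back to single-objective optimization when
    the framework does not support multi-objective optimization."""
    if len(optimization_metrics) > 1 and not supports_moo:
        msg = ("Framework does not support multi-objective optimization; "
               f"optimizing only towards {optimization_metrics[0]!r}.")
        if strict:           # the stricter 'raise instead of warn' option
            raise ValueError(msg)
        log.warning(msg)     # the behaviour currently proposed in this PR
        return optimization_metrics[:1]
    return optimization_metrics
```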
I believe the actual file still also has per-metric columns (if not, I need to add it). The result and metric columns have two functions here: communicating which metric was actually optimised towards (often some auxiliary metrics are calculated even if they are not optimised for), and providing a consistent results column regardless of optimization metric (so you can refer to one results column whether the task is regression, binary, or multiclass classification). At the very least, I think it is prudent to keep the information about which metrics were optimised towards in the file. I am open to suggestions on a format that is easier to work with (I haven't had too many difficulties with tuples myself, though I will admit it is not elegant).
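For what it's worth, a small sketch of how the tuple-valued columns can be parsed with pandas (the column names and values below are made up for illustration):

```python
import ast

import pandas as pd

# Toy stand-in for the results file; real files have more columns.
df = pd.DataFrame({
    "task":   ["binary_task", "regression_task"],
    "metric": ["('auc', 'logloss')", "('rmse',)"],
    "result": ["(0.91, 0.35)", "(12.4,)"],
})

# Turn the stringified tuples back into Python tuples.
for col in ("metric", "result"):
    df[col] = df[col].apply(ast.literal_eval)

# 'metric' now records which metrics were optimized towards, while the
# per-metric columns (not shown here) can be used directly for analysis.
```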
Though every script currently only optimizes towards one metric, I will ask the framework authors to update accordingly.
Two things to note on this:

Infinite log_loss

If primary … If …

Decision Thresholds

How are we allowing scoring in the case …? Which of the following is true: …
For example, AutoGluon currently uses threshold 0.5 / max class proba for all metrics (not ideal for 'f1'), but in v0.8 this could change to 2 or 3. The decision made in this PR for the benchmark logic would inform how I implement it in AutoGluon.
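On the infinite `log_loss` point: an infinite score arises when a model assigns probability zero to the true class, and a common mitigation is to clip predicted probabilities away from 0 and 1 (scikit-learn's `log_loss` has historically done this with a small epsilon). A small illustrative sketch, not tied to any particular framework:

```python
import numpy as np

def clipped_log_loss(y_true, proba_pos, eps=1e-15):
    """Binary log loss with probabilities clipped away from 0 and 1 so a
    single confident-but-wrong prediction cannot make the score infinite."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(proba_pos, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
```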
I am not sure this is a problem, …
I am not entirely sure yet. I would propose the AutoML framework 'is expected' to calibrate internally w.r.t. … . In general, there is a larger unsolved problem here: when the AutoML framework is tasked with MOO, it will produce a Pareto front of solutions (well, in many cases with multiple solutions, anyway). It is unclear how to evaluate this Pareto front. However, I still think it is useful to take the first step here; it makes for a much easier starting point when people (researchers) want to experiment with MOO.
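To make the Pareto-front point concrete, here is a minimal sketch of extracting the non-dominated candidates from a set of per-candidate scores (purely illustrative; it says nothing about how the benchmark would then score the front):

```python
import numpy as np

def pareto_front(scores):
    """Return indices of non-dominated candidates, assuming higher is better
    for every objective (flip signs for minimised objectives)."""
    scores = np.asarray(scores, dtype=float)
    keep = []
    for i, p in enumerate(scores):
        dominated = np.any(np.all(scores >= p, axis=1) & np.any(scores > p, axis=1))
        if not dominated:
            keep.append(i)
    return keep

# e.g. candidates scored on (accuracy, negated inference time):
print(pareto_front([[0.90, -10.0], [0.80, -2.0], [0.85, -1.0], [0.70, -5.0]]))
# -> [0, 2]
```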
If you do plan to evaluate metrics that are sensitive to the decision threshold (f1, balanced_accuracy), please let me know which ones. I may try to add adaptive decision thresholds in AutoGluon v0.8, and would prioritize it if it is part of the upcoming benchmark evaluation. (Without threshold adjustment, AutoGluon would do poorly on metrics such as … .)
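A rough sketch of what such threshold adaptation could look like (an illustration, not AutoGluon's actual implementation): tune the decision threshold on held-out validation predictions and reuse it at predict time.

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_threshold(y_true, proba_pos, metric=f1_score):
    """Pick the decision threshold that maximizes a threshold-sensitive
    metric such as f1 on validation predictions."""
    grid = np.linspace(0.05, 0.95, 19)
    scores = [metric(y_true, (proba_pos >= t).astype(int)) for t in grid]
    return float(grid[int(np.argmax(scores))])
```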
The upcoming evaluation will stick with the same evaluation metrics: …
The `metric` column is now updated to be `optimization_metrics`, which is a character-separated field listing each metric. We removed the `result` column, since it would otherwise contain tuples and its data duplicates the individual metric columns (except for the convenience of having a 'higher is better' column, which we now lose).
Just pushed some more changes; I think this is pretty much where I would leave it for this PR. The new result file and summary drop the `result` column.
For the minimal overview printed to the console, only those metrics which were optimized towards for any of the shown tasks are visible. The loss of the `result` column is slightly annoying, as it always held a "higher is better" score and was named identically regardless of task, but it didn't really work too well with multi-objective optimization (as cells would contain tuples). It requires slightly more post-processing to visualize the results now, but that isn't much compared to the overall work needed to produce meaningful tests/plots. We can add a small module that does most of the work.
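A hedged sketch of what such a post-processing helper could look like: split the character-separated `optimization_metrics` field (a comma separator is assumed here) and pull the matching per-metric column back into a single results column. Whether a sign flip is still needed for lower-is-better metrics is left aside, and the column names are assumptions about the new file layout rather than its exact schema.

```python
import pandas as pd

def primary_results(df: pd.DataFrame) -> pd.Series:
    """For each row, return the value of the per-metric column that matches
    the first entry in the optimization_metrics field."""
    def pick(row):
        first = row["optimization_metrics"].split(",")[0].strip()
        return row[first]
    return df.apply(pick, axis=1).rename("result")
```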
Alternative idea: only report (or additionally report) the … .

I'd hate to revert back to requiring bug-prone manual knowledge of whether a result is higher-is-better. I'd also generally prefer that the … . I would be ok with removing the result column if we convert to using … .

An alternative is having a function / logic in AutoMLBenchmark to convert the new results format to the old format and/or the old format to the new format, to reduce bugs when comparing results obtained with the old format (such as the AMLB 2022 results) with those obtained in the new format.
FWIW, I think the original logic mentioned in the PR description that used tuples for …
Previously, tasks could have multiple metrics defined, e.g. `metric: [acc, balacc, logloss]`, but this was interpreted as "optimize towards the first element, and evaluate results on all metrics". This PR instead moves to a more explicit model: each task now has `optimization_metrics` and `evaluation_metrics`.

The `optimization_metrics` define which metrics should be forwarded to the AutoML framework to be used during optimization. If an AutoML framework does not support multi-objective optimization, the integration script should issue a warning but proceed with single-objective optimization towards the first metric in the list.

The `evaluation_metrics` define any additional metrics which should be calculated on the produced predictions. The model will always also be evaluated on `optimization_metrics` (there is no need to put a metric in both lists). `evaluation_metrics` are optional.

The score summary will now contain tuples under the `result` and `metric` columns.

TODO: