2024-10-09-hirzel24a.md

---
title: Training and Cross-Validating Machine Learning Pipelines with Limited Memory
openreview: 4LkaPSHUQQ
abstract: 'While automated machine learning (AutoML) can save human labor in finding well-performing pipelines, it often suffers from two problems: overfitting and using excessive resources. Unfortunately, the solutions are often at odds: cross-validation helps reduce overfitting at the expense of more resources; conversely, preprocessing on a separate compute cluster and then cross-validating only the final predictor saves resources at the expense of more overfitting. This paper shows how to train and cross-validate entire pipelines on a single moderate machine with limited memory by using monoids, which are associative, thus providing a flexible way for handling large data one batch at a time. To facilitate AutoML, our approach is designed to support the common sklearn APIs used by many AutoML systems for pipelines, training, cross-validation, and several operators. Abstracted behind those APIs, our approach uses task graphs to extend the benefits of monoids from operators to pipelines, and provides a multi-backend implementation. Overall, our approach lets users train and cross-validate pipelines on simple and inexpensive compute infrastructure.'
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: hirzel24a
month: 0
tex_title: Training and Cross-Validating Machine Learning Pipelines with Limited Memory
firstpage: 13/1
lastpage: 25
page: 13/1-25
order: 13
cycles: false
bibtex_author: Hirzel, Martin and Kate, Kiran and Mandel, Louis and Shinnar, Avraham
author:
- given: Martin
  family: Hirzel
- given: Kiran
  family: Kate
- given: Louis
  family: Mandel
- given: Avraham
  family: Shinnar
date: 2024-10-09
address:
container-title: Proceedings of the Third International Conference on Automated Machine Learning
volume: 256
genre: inproceedings
issued:
  date-parts:
  - 2024
  - 10
  - 9
pdf:
extras:
---
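
The abstract's core idea can be sketched in a few lines. An operator's training state forms a monoid: it has an associative combine operation and an identity element. Such an operator can be trained one batch at a time, and cross-validation can reuse per-fold work instead of rescanning the data.

The Python below is a minimal illustration under our own assumptions. It is not the authors' implementation or API. The class `MeanVar`, the constant `IDENTITY`, and the helpers `lift`, `fit_batched`, and `cross_validate_states` are hypothetical names. The state tracks the mean and population standard deviation of a single feature, merged with the standard pairwise (parallel) variance update. It was chosen only because it is the kind of statistic a scaler such as sklearn's `StandardScaler` needs.

```python
import math
import statistics
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class MeanVar:
    """Hypothetical monoid state: count, mean, and M2 (sum of squared
    deviations) of one numeric feature. combine() is associative and
    MeanVar() is its identity, so batches can be merged in any grouping."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def combine(self, other: "MeanVar") -> "MeanVar":
        n = self.n + other.n
        if n == 0:
            return self  # identity combined with identity
        delta = other.mean - self.mean
        # Pairwise (parallel) update of mean and M2.
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return MeanVar(n, mean, m2)

    def std(self) -> float:
        # Population std (ddof=0), the convention StandardScaler uses.
        return math.sqrt(self.m2 / self.n)


IDENTITY = MeanVar()


def lift(batch: Sequence[float]) -> MeanVar:
    """Summarize one batch; only this batch has to be in memory."""
    if not batch:
        return IDENTITY
    mean = sum(batch) / len(batch)
    return MeanVar(len(batch), mean, sum((x - mean) ** 2 for x in batch))


def fit_batched(batches: Iterable[Sequence[float]]) -> MeanVar:
    """Fold batch summaries left to right; batches may be a lazy stream."""
    return reduce(MeanVar.combine, (lift(b) for b in batches), IDENTITY)


def cross_validate_states(folds: List[Sequence[float]]) -> List[MeanVar]:
    """For each fold i, the state trained on all folds except i.

    Each fold is summarized once; the k training states reuse those
    summaries instead of rescanning the k - 1 training folds."""
    per_fold = [lift(f) for f in folds]
    return [
        reduce(MeanVar.combine,
               (s for j, s in enumerate(per_fold) if j != i),
               IDENTITY)
        for i in range(len(per_fold))
    ]


data = [float(x) for x in range(1, 21)]
batches = [data[i:i + 5] for i in range(0, len(data), 5)]  # 4 folds of 5

state = fit_batched(iter(batches))
print(state.n, state.mean)  # 20 10.5
print(math.isclose(state.std(), statistics.pstdev(data)))  # True

cv_states = cross_validate_states(batches)
print([s.n for s in cv_states])  # [15, 15, 15, 15]
```

Because `combine` is associative, the per-fold summaries could equally be merged in a tree or on separate workers. The grouping affects only floating-point rounding, not the mathematical result.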