Changes between December 5th and December 11th
What's Changed
Empirical Study
- Add implementation and tests for `FTTransformer` by @KarelZe in #74. Adds a tuneable implementation of the `FTTransformer` from https://arxiv.org/abs/2106.11959. Most of the code is based on the authors' code published by Yandex. Wrote additional tests and made the code work with our hyperparameter search (see the FT-Transformer sketch after this list).
- Add implementation and tests for `TabNet` by @KarelZe in #75. `TabNet` is another transformer-based architecture, published in https://arxiv.org/abs/1908.07442, and the last model to be implemented. Code is based on a popular PyTorch implementation (see the TabNet sketch after this list). Made it work with our hyperparameter search and training pipeline and wrote additional tests.
- Add tests for all objectives by @KarelZe in #76. All training objectives, which define the hyperparameter search space and training procedure, now have tests.
- Add intermediate results of `TabTransformer` and `CatBoostClassifier` by @KarelZe in #71. Results as discussed in the last meeting with @CaroGrau.
- Accelerate models with `datapipes` and `torch.compile()` by @KarelZe in #64. Tested how the new features (`datapipes` and `torch.compile()`) could be used in my project (see the datapipes sketch after this list). Still too early, as discussed in the meeting with @CaroGrau.
- Make calculations data parallel by @KarelZe in #77. All models can now be trained on multiple GPUs in parallel, which should speed up training considerably. BwHPC provides up to four GPUs that we can use. For gradient boosting, features are split among devices; for neural nets, batches are split (see the multi-GPU sketch after this list).
- Add pruning support for Bayesian search by @KarelZe in #78. I added support to prune unsuccessful trials in our Bayesian search, which should speed up training and help to find better solutions faster (see the pruning sketch after this list). In addition to the loss, the accuracy is now also reported for all neural nets. Moreover, I integrated early stopping into the gradient boosting models, which should help to increase performance. I also widened the hyperparameter search space for gradient-boosted trees, which should help to find better solutions. This still has to be verified with large studies on the cluster.
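
As a rough illustration of the FT-Transformer work in #74, the sketch below wires a tuneable FT-Transformer into an Optuna search. It uses the authors' public `rtdl` package as a stand-in for our own implementation and random tensors instead of the real data set; the parameter names and search ranges are illustrative assumptions, not the exact ones used in the project.

```python
# Minimal sketch, not the project code: tune an FT-Transformer with Optuna.
# Assumes the authors' public `rtdl` package and uses random stand-in data.
import optuna
import torch
import torch.nn.functional as F
from rtdl import FTTransformer

# Random stand-in data (binary classification on 10 numerical features).
x_train, y_train = torch.randn(512, 10), torch.randint(0, 2, (512,)).float()
x_val, y_val = torch.randn(128, 10), torch.randint(0, 2, (128,)).float()


def objective(trial: optuna.Trial) -> float:
    model = FTTransformer.make_baseline(
        n_num_features=x_train.shape[1],
        cat_cardinalities=None,                                   # no categorical features here
        d_token=trial.suggest_int("d_token", 64, 256, step=8),    # keep divisible by n_heads=8
        n_blocks=trial.suggest_int("n_blocks", 1, 4),
        attention_dropout=trial.suggest_float("attention_dropout", 0.0, 0.5),
        ffn_d_hidden=trial.suggest_int("ffn_d_hidden", 64, 512),
        ffn_dropout=trial.suggest_float("ffn_dropout", 0.0, 0.5),
        residual_dropout=trial.suggest_float("residual_dropout", 0.0, 0.2),
        d_out=1,
    )
    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=trial.suggest_float("lr", 1e-5, 1e-3, log=True),
        weight_decay=trial.suggest_float("weight_decay", 1e-6, 1e-3, log=True),
    )

    model.train()
    for _ in range(10):                                           # a few full-batch epochs
        optimizer.zero_grad()
        loss = F.binary_cross_entropy_with_logits(model(x_train, None).squeeze(-1), y_train)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():                                         # validation accuracy as objective
        accuracy = ((model(x_val, None).squeeze(-1) > 0).float() == y_val).float().mean()
    return accuracy.item()


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
```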
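
In the same spirit, here is a minimal sketch of `TabNet` training. It assumes the widely used `pytorch-tabnet` package is the "popular PyTorch implementation" referenced in #75 and runs on random stand-in data; all settings are illustrative only.

```python
# Minimal sketch, not the project code: TabNet via the pytorch-tabnet package
# (assumed to be the "popular PyTorch implementation"); toy data and settings.
import numpy as np
from pytorch_tabnet.tab_model import TabNetClassifier

rng = np.random.default_rng(42)
X_train = rng.normal(size=(2048, 15)).astype(np.float32)
y_train = rng.integers(0, 2, 2048)
X_valid = rng.normal(size=(512, 15)).astype(np.float32)
y_valid = rng.integers(0, 2, 512)

clf = TabNetClassifier(n_d=8, n_a=8, n_steps=3, gamma=1.3, lambda_sparse=1e-3)
clf.fit(
    X_train,
    y_train,
    eval_set=[(X_valid, y_valid)],
    eval_metric=["accuracy"],      # track validation accuracy
    max_epochs=50,
    patience=10,                   # early stopping on the validation metric
    batch_size=1024,
)
print(clf.predict(X_valid)[:5])    # predicted class labels
```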
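
For #64, the sketch below shows the shape of what was tried: a `torchdata` datapipe for the input pipeline and `torch.compile()` for the model. The model, data, and pipeline steps are toy placeholders rather than the project's actual code, and both features require recent (at the time, pre-release) PyTorch/torchdata versions.

```python
# Minimal sketch, not the project code: torchdata datapipes + torch.compile()
# on a toy model and random data.
import torch
from torchdata.datapipes.iter import IterableWrapper

# Wrap in-memory samples in a datapipe, then shuffle and batch them.
samples = [(torch.randn(10), torch.randint(0, 2, (1,)).float()) for _ in range(1000)]
pipe = IterableWrapper(samples).shuffle().batch(32)

model = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
model = torch.compile(model)       # compiles the forward pass on first call (PyTorch >= 2.0)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

for batch in pipe:
    x = torch.stack([features for features, _ in batch])
    y = torch.stack([label for _, label in batch])
    loss = torch.nn.functional.binary_cross_entropy_with_logits(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```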
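
The multi-GPU setup from #77 roughly follows the pattern below: CatBoost partitions the work across the listed GPUs at the feature level, while `torch.nn.DataParallel` scatters each batch across devices. Device strings and model sizes are illustrative; this is not the actual training code.

```python
# Minimal sketch, not the project code: multi-GPU training for both model families.
import torch
from catboost import CatBoostClassifier

# Gradient boosting: CatBoost trains on GPUs 0-3 and splits the work by feature.
gbm = CatBoostClassifier(task_type="GPU", devices="0-3", iterations=1000, verbose=0)

# Neural nets: DataParallel replicates the model and splits each batch across GPUs.
net = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
if torch.cuda.device_count() > 1:
    net = torch.nn.DataParallel(net)           # one chunk of every batch per GPU
net = net.to("cuda" if torch.cuda.is_available() else "cpu")
```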
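
Finally, a minimal sketch of the pruning idea from #78: the objective reports validation accuracy after every epoch so Optuna's pruner can stop unpromising trials early. The toy model, data, and search ranges are placeholders; the gradient-boosting objectives additionally rely on the library's built-in early stopping (e.g. CatBoost's `early_stopping_rounds`).

```python
# Minimal sketch, not the project code: trial pruning in an Optuna (TPE) search.
import optuna
import torch
import torch.nn.functional as F

# Random stand-in data for a binary classification task.
x_train, y_train = torch.randn(2048, 20), torch.randint(0, 2, (2048,)).float()
x_val, y_val = torch.randn(512, 20), torch.randint(0, 2, (512,)).float()


def objective(trial: optuna.Trial) -> float:
    hidden = trial.suggest_int("hidden", 32, 256)
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    model = torch.nn.Sequential(
        torch.nn.Linear(20, hidden), torch.nn.ReLU(), torch.nn.Linear(hidden, 1)
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    for epoch in range(30):
        optimizer.zero_grad()
        loss = F.binary_cross_entropy_with_logits(model(x_train).squeeze(-1), y_train)
        loss.backward()
        optimizer.step()

        # Report accuracy (besides tracking the loss) so the pruner can act per epoch.
        with torch.no_grad():
            accuracy = ((model(x_val).squeeze(-1) > 0).float() == y_val).float().mean().item()
        trial.report(accuracy, step=epoch)
        if trial.should_prune():
            raise optuna.TrialPruned()
    return accuracy


study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.TPESampler(seed=42),       # Bayesian search
    pruner=optuna.pruners.MedianPruner(n_warmup_steps=5),
)
study.optimize(objective, n_trials=25)
```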
Writing
- Add questions for this week by @KarelZe in #70
- Connect and expand notes by @KarelZe in #65. Was able to slightly decrease the pile of papers. However, I also found several new ones, like the Linformer paper (https://arxiv.org/abs/2006.04768).
Other Changes
- Bump google-auth from 2.14.1 to 2.15.0 by @dependabot in #66
- Bump fastparquet from 2022.11.0 to 2022.12.0 by @dependabot in #69
Outlook
- Finalize notes on decision trees / gradient boosting. Prepare the first draft.
- Update table of contents.
- Go back to EDA. Define new features based on papers. Revise existing ones based on KDE plots.
- Create a notebook to systematically study feature transformations / scaling, e.g., log transform and robust scaling (see the scaling sketch after this list).
- Study learning curves for gradient boosting models and transformers with default configurations. Verify the settings for early stopping.
- Perform adversarial validation more thoroughly. Answer questions like: Which features drive the difference between the training and test set? What role does time play? What would happen if problematic features were excluded? (See the adversarial-validation sketch after this list.)
- Increase test accuracy by 4 %.
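
For the planned notebook on feature transformations, pipelines along these lines could be compared systematically. The column split and the log1p + robust-scaling combination are illustrative assumptions, not fixed choices.

```python
# Minimal sketch: candidate feature transformations with scikit-learn.
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, RobustScaler

skewed_cols, other_cols = [0, 1], [2, 3]   # hypothetical column groups

preprocess = ColumnTransformer(
    [
        ("log_robust", make_pipeline(FunctionTransformer(np.log1p), RobustScaler()), skewed_cols),
        ("robust", RobustScaler(), other_cols),
    ]
)

X = np.abs(np.random.default_rng(0).normal(size=(100, 4)))   # toy, non-negative data
X_transformed = preprocess.fit_transform(X)
```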
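
The adversarial-validation item could start from a sketch like the one below: label rows by origin (train vs. test), fit a classifier, and read the cross-validated AUC and feature importances as indicators of distribution shift. Data and model choice are illustrative only.

```python
# Minimal sketch: adversarial validation on toy data with a deliberate shift.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 5))
X_test = rng.normal(loc=0.3, size=(1000, 5))   # shifted to mimic train/test drift

X = np.vstack([X_train, X_test])
is_test = np.concatenate([np.zeros(len(X_train), dtype=int), np.ones(len(X_test), dtype=int)])

clf = GradientBoostingClassifier()
auc = cross_val_score(clf, X, is_test, cv=5, scoring="roc_auc").mean()
print(f"adversarial AUC: {auc:.2f}")           # ~0.5 means train and test look alike

clf.fit(X, is_test)
print("feature importances:", clf.feature_importances_.round(3))  # drivers of the shift
```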
Full Changelog: v0.2.4...v0.2.5