
Changes between December 5th and December 11th

@KarelZe released this 11 Dec 16:05
5fac116

What's Changed

Empirical Study βš—οΈ

  • Add implementation and tests for FTTransformer 🦾 by @KarelZe in #74. Adds a tuneable implementation of the FTTransformer from https://arxiv.org/abs/2106.11959. Most of the code is based on the authors' code published by Yandex. Wrote additional tests and made the code work with our hyperparameter search.
  • Add implementation and tests for TabNet 🧠 by @KarelZe in #75. TabNet is another transformer-based architecture published in https://arxiv.org/abs/1908.07442 and the last model to be implemented. πŸŽ‰ Code is based on a popular PyTorch implementation. Made it work with our hyperparameter search and training pipeline and wrote additional tests.
  • Add tests for all objectives 🎯 by @KarelZe in #76. All training objectives defining the hyperparameter search space and training procedure now have tests.
  • Add intermediate results of TabTransformer and CatBoostClassifier 🐈 by @KarelZe in #71. Results as discussed in the last meeting with @CaroGrau.
  • Accelerate models with datapipes and torch.compile() πŸš• by @KarelZe in #64. Tested how the new features (datapipes and torch.compile()) could be used in my project. Still too early, as discussed in the meeting with @CaroGrau; see the torch.compile() sketch after this list.
  • Make calculations data parallel πŸ›£οΈ by @KarelZe in #77. All models can now be trained on multiple GPUs in parallel, which should speed up training considerably. BwHPC provides up to four GPUs that we can use. For gradient boosting, features are split among devices; for neural nets, batches are split (see the data-parallel sketch after this list).
  • Add pruning support for Bayesian search πŸ§ͺ by @KarelZe in #78. I added support for pruning unsuccessful trials in our Bayesian search, which should help with training and with finding better solutions faster (see the pruning sketch after this list). In addition to the loss, the accuracy is now also reported for all neural nets. Moreover, I integrated early stopping into the gradient boosting models, which should help increase performance, and widened the hyperparameter search space for gradient-boosted trees so that better solutions can be found. Still have to verify with large studies on the cluster.
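
Aside on #64: a minimal sketch of how torch.compile() would hook into an existing model, assuming plain PyTorch (>= 2.0); the tiny model and dummy batch below are placeholders, not project code.

```python
import torch
import torch.nn as nn

# Placeholder classifier standing in for one of the project's models.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

# torch.compile() returns an optimized wrapper around the module;
# the surrounding training loop stays unchanged.
compiled_model = torch.compile(model)

x = torch.randn(32, 20)      # dummy batch: 32 samples, 20 features
logits = compiled_model(x)   # first call compiles, later calls reuse the compiled graph
```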
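
Aside on #77: a minimal sketch of the two parallelism settings, assuming torch.nn.DataParallel for the neural nets (batch-wise split) and CatBoost's GPU options for gradient boosting; the model, data, and device string are illustrative placeholders, not the project's configuration.

```python
import torch
import torch.nn as nn
from catboost import CatBoostClassifier

device = "cuda" if torch.cuda.is_available() else "cpu"

# Neural nets: DataParallel replicates the module on every visible GPU and
# splits each incoming batch along the first dimension.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)

# Gradient boosting: CatBoost distributes training over the listed GPUs
# (here GPUs 0-3, matching the four GPUs available per BwHPC node).
clf = CatBoostClassifier(task_type="GPU", devices="0:1:2:3", iterations=1000)
```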
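
Aside on #78: a minimal sketch of trial pruning, assuming Optuna as the search library; the objective, metric, and parameter range are stand-ins, not the project's real search space.

```python
import optuna


def objective(trial: optuna.Trial) -> float:
    # Hypothetical hyperparameter; the real search space is much wider.
    lr = trial.suggest_float("learning_rate", 1e-3, 0.3, log=True)
    accuracy = 0.0
    for epoch in range(50):
        # Stand-in for one epoch of training that returns validation accuracy.
        accuracy = min(1.0, accuracy + 0.1 * lr)
        trial.report(accuracy, step=epoch)  # report the intermediate metric
        if trial.should_prune():            # the pruner stops unpromising trials early
            raise optuna.TrialPruned()
    return accuracy


study = optuna.create_study(direction="maximize",
                            pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=20)
```

Early stopping for the gradient boosting models is analogous: pass a validation set and an early-stopping patience to fit() so training halts once the validation metric stops improving.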

Writing πŸ“–

  • Add questions for this week πŸ“ by @KarelZe in #70
  • Connect and expand notes πŸ‘©β€πŸš€ by @KarelZe in #65. Was able to slightly decrease the pile of papers. However, I also found several new ones, like the Linformer paper (https://arxiv.org/abs/2006.04768).

Other Changes

Outlook πŸ’ͺ

  • Finalize notes on decision trees / gradient boosting. Prepare the first draft.
  • Update table of contents.
  • Go back to EDA. Define new features based on papers. Revise existing ones based on KDE plots.
  • Create a notebook to systematically study feature transformations / scaling, e.g., log transform and robust scaling (see the scaling sketch after this list).
  • Study learning curves for gradient boosting models and transformers with default configurations. Verify the settings for early stopping.
  • Perform adversarial validation more thoroughly. Answer questions like: Which features drive the difference between the training and test set? What role does time play? What would happen if problematic features were excluded? (See the adversarial-validation sketch after this list.)
  • Increase test accuracy by 4 %.
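
A minimal sketch of the kind of comparison the transformation/scaling notebook is meant for, assuming scikit-learn and pandas; the feature below is synthetic and purely illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler

# Synthetic, heavy-tailed stand-in for a real feature (e.g. a size-like quantity).
rng = np.random.default_rng(0)
df = pd.DataFrame({"size": rng.lognormal(mean=5.0, sigma=1.5, size=1_000)})

# Log transform compresses the right tail; log1p stays defined at zero.
df["size_log"] = np.log1p(df["size"])

# Robust scaling centers on the median and scales by the IQR, so outliers
# influence the result far less than with standard scaling.
df["size_robust"] = RobustScaler().fit_transform(df[["size"]]).ravel()

print(df.describe())
```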
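
On the adversarial-validation item above, a minimal sketch of the standard recipe: label training rows 0 and test rows 1, fit a classifier to separate them, and inspect the ROC AUC and feature importances. All data here is synthetic; the real study would use the project's train/test split.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(42)

# Synthetic stand-ins for the real train/test feature matrices; f1 drifts.
train = pd.DataFrame({"f1": rng.normal(0.0, 1.0, 1_000), "f2": rng.normal(0.0, 1.0, 1_000)})
test = pd.DataFrame({"f1": rng.normal(0.5, 1.0, 1_000), "f2": rng.normal(0.0, 1.0, 1_000)})

# Origin label: 0 = train, 1 = test. The classifier tries to tell them apart.
X = pd.concat([train, test], ignore_index=True)
y = np.concatenate([np.zeros(len(train)), np.ones(len(test))])

clf = GradientBoostingClassifier()
proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
print("ROC AUC:", roc_auc_score(y, proba))  # ~0.5 means train and test look alike

# Feature importances reveal which features drive the train/test difference.
print(pd.Series(clf.fit(X, y).feature_importances_, index=X.columns))
```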

Full Changelog: v0.2.4...v0.2.5