Changes between December 12th and December 18th
What's Changed
Empirical Study
- Add `clnv` results by @KarelZe in #82. Adds results for the CLNV method, as discussed in the meeting with @CaroGrau (a hedged sketch of the rule follows after this list).
- Add learning curves for CatBoost by @KarelZe in #83. Helps to detect overfitting/underfitting. Learning curves are now also logged/tracked (see the sketch after this list).
- Improve accuracy [~1.2 %] by @KarelZe in #79. Most of the time was spent on improving the accuracy of the first model (gbm). Planned to improve by 4 %; achieved an improvement of 1.2 % compared to the previous week. Obtaining this improvement required a deep dive into gradient boosting, the catboost library, and a bit of feature engineering. Roughly 1/3 of the improvement in accuracy comes from improved feature engineering, 1/3 from early stopping, and 1/3 from larger ensembles, fine-grained quantization, and sample weighting (see the sketch after this list). I tried to link the quantization used in gradient boosting to the quantile transformation from feature engineering, but it didn't work out. Did some sanity checks, such as comparing the implementation with `lightgbm`, a time-consistency analysis, and an updated adversarial validation (also sketched below).
- Also spent quite a bit of time researching feature engineering techniques, focusing on features that cannot be synthesized by neural nets or tree-based approaches.
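For reference, a minimal sketch of the CLNV rule as I understand it (Chakrabarty, Li, Nguyen, and Van Ness, 2007): the quote rule is applied to trades priced in the outer 30 % of the bid-ask spread, the tick rule everywhere else. Column and function names are hypothetical, not the project's actual code.

```python
import numpy as np
import pandas as pd


def clnv(trades: pd.DataFrame) -> pd.Series:
    """Hedged sketch of the CLNV trade classification rule.

    Assumes columns `price`, `bid`, `ask`, and `price_change`
    (trade price minus the last differing trade price).
    Returns +1 for buys, -1 for sells, NaN if undecidable.
    """
    spread = trades["ask"] - trades["bid"]
    upper = trades["ask"] - 0.3 * spread  # top 30 % of the spread
    lower = trades["bid"] + 0.3 * spread  # bottom 30 % of the spread

    # Quote rule for trades close to the ask (buys) or close to the bid (sells).
    side = pd.Series(np.nan, index=trades.index)
    side[(trades["price"] >= upper) & (trades["price"] <= trades["ask"])] = 1
    side[(trades["price"] <= lower) & (trades["price"] >= trades["bid"])] = -1

    # Tick rule for the middle of the spread and for prices outside the quotes.
    tick = np.sign(trades["price_change"])
    return side.fillna(tick.replace(0, np.nan))
```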
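The learning curves from #83 can be produced with CatBoost's built-in per-iteration evaluation; roughly along these lines (data and parameter values are placeholders, not the tuned setup):

```python
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy data as a stand-in for the actual feature set.
X, y = make_classification(n_samples=5_000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

# Validation metrics are recorded every iteration, so a widening train/validation
# gap (overfitting) or jointly poor scores (underfitting) become visible.
model = CatBoostClassifier(iterations=500, eval_metric="Accuracy", verbose=100)
model.fit(X_train, y_train, eval_set=(X_val, y_val))

curves = model.get_evals_result()
train_curve = curves["learn"]["Accuracy"]
val_curve = curves["validation"]["Accuracy"]
# These per-iteration curves can then be plotted or logged to the tracker.
```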
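The early stopping, fine-grained quantization, and sample weighting mentioned in #79 map onto standard CatBoost options; a hedged sketch (values are illustrative, not the settings actually used):

```python
import numpy as np
from catboost import CatBoostClassifier, Pool
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

# Hypothetical sample weights, e.g. to emphasise more recent observations.
weights = np.linspace(0.5, 1.0, num=len(y_train))

train_pool = Pool(X_train, y_train, weight=weights)
val_pool = Pool(X_val, y_val)

model = CatBoostClassifier(
    iterations=2_000,      # larger ensemble
    border_count=254,      # finer-grained feature quantization
    eval_metric="Accuracy",
    verbose=200,
)
# Early stopping: keep the iteration with the best validation score.
model.fit(
    train_pool,
    eval_set=val_pool,
    early_stopping_rounds=100,
    use_best_model=True,
)
```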
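Adversarial validation, used as one of the sanity checks, boils down to training a classifier to separate train rows from test rows; an AUC close to 0.5 suggests the two feature distributions look alike. A minimal sketch with stand-in data frames (`train_df` and `test_df` are hypothetical):

```python
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.model_selection import cross_val_score

# Stand-ins for the real train/test feature frames.
rng = np.random.default_rng(42)
cols = [f"f{i}" for i in range(5)]
train_df = pd.DataFrame(rng.normal(size=(1_000, 5)), columns=cols)
test_df = pd.DataFrame(rng.normal(size=(1_000, 5)), columns=cols)

# Label rows by origin and train a classifier to tell them apart.
X = pd.concat([train_df, test_df], ignore_index=True)
y = np.r_[np.zeros(len(train_df)), np.ones(len(test_df))]

clf = CatBoostClassifier(iterations=200, verbose=0)
auc = cross_val_score(clf, X, y, cv=3, scoring="roc_auc").mean()
print(f"adversarial AUC: {auc:.3f}")  # ~0.5 means train and test look alike
```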
Writing
- Add reworked TOC and drafts by @KarelZe in #80, as requested by @CaroGrau.
- Draft for chapters on trees, ordered boosting, and imputation by @KarelZe in #81. Continued research and drafting chapters on decision trees, gradient boosting, and feature scaling and imputation. Requires more work, e.g., the derivation of the loss function in gradient boosting for classification was more involved than I expected (see the sketch after this list). The draft is not as streamlined as it could be.
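For context, the step that turned out more involved than expected is the pseudo-residual in gradient boosting for binary classification; in the usual notation this is the negative gradient of the binomial log-loss with respect to the raw score (my sketch, not the draft's notation):

```latex
% Binomial log-loss on the raw score F(x), with y in {0, 1} and p = sigma(F(x)).
L\bigl(y, F(x)\bigr) = -\Bigl[\, y \log p + (1 - y)\log(1 - p) \,\Bigr],
\qquad p = \sigma\bigl(F(x)\bigr) = \frac{1}{1 + e^{-F(x)}}

% Pseudo-residual fitted by the next tree: the negative gradient w.r.t. F(x).
-\frac{\partial L\bigl(y, F(x)\bigr)}{\partial F(x)} = y - p
```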
Outlook
- Focus on drafting only the chapters on gradient boosting, basic transformer architectures, and specialized architectures.
- Train transformers until the meeting with @CaroGrau, but spend no time optimizing/improving them.
Full Changelog: v0.2.5...v0.2.6