diff --git a/reports/Content/main-summary.tex b/reports/Content/main-summary.tex
index b4ed747c..fdff213d 100644
--- a/reports/Content/main-summary.tex
+++ b/reports/Content/main-summary.tex
@@ -56,9 +56,9 @@ \section{Results}
-\section{Disucssion}
+\section{Discussion}
 
-Advancements in classical trade classification have been fueled by relying more complex decision boundaries, e.g., by fragmenting the spread \autocites{ellisAccuracyTradeClassification2000}{chakrabartyTradeClassificationAlgorithms2007} or by assembling multiple heuristics \autocite{grauerOptionTradeClassification2022}. It is thus likely, that the outperformance of our \gls{ML} estimators is due to the more complex, learned decision boundaries.
+Advancements in classical trade classification have been fueled by drawing on more complex decision boundaries, e.g., by fragmenting the spread \autocites{ellisAccuracyTradeClassification2000}{chakrabartyTradeClassificationAlgorithms2007} or by assembling multiple heuristics \autocite{grauerOptionTradeClassification2022}. It is thus likely that the outperformance of our \gls{ML} estimators is due to the more complex, learned decision boundaries.
 
-The results sharply contradict those of \textcite[][]{ronenMachineLearningTrade2022}, who benchmark random forests and \glspl{FFN} for trade classification in the equity and bond market and find clear dominance of the tree-based approach. First, unlike \gls{FFN}, the FT-Transformer is tailored to learn on tabular data through being a non-rotationally-invariant learner. Second, our data preprocessing and feature engineering are adapted to the requirements of neural networks. Without these measures, tree-based approaches excel due to their robustness in handling skewed and missing data.
+The strong results of Transformers sharply contradict those of \textcite[][]{ronenMachineLearningTrade2022}, who benchmark random forests and \glspl{FFN} for trade classification in the equity and bond markets and find clear dominance of the tree-based approach. First, unlike \glspl{FFN}, the FT-Transformer is tailored to learning on tabular data, as it is a non-rotationally-invariant learner. Second, our data preprocessing and feature engineering are adapted to the requirements of neural networks. Without these measures, tree-based approaches excel due to their robustness in handling skewed and missing data.
 
-An explanation as to why pre-training improves performance on \gls{ISE} but not \gls{CBOE} trades, may be found in the pre-training data and setup. It is conceivable, that pre-training encodes exchange-specific knowledge, such as trading regimes. Trades used for pre-training are recorded at the \gls{ISE} only and are repeatedly shown to the model. While our pre-training objective is stochastic with different features being masked in each step, past research has shown that repeatedly presenting the same tokens in conjunction with a small-sized pre-training dataset, can degrade performance on the downstream classification task. For instance, \textcite[][]{raffelExploringLimitsTransfer2020} document in the context of language modeling that a high degree of repetition encourages memorization in the transformer, but few repetitions are not harmful.
+An explanation as to why pre-training improves performance on \gls{ISE} but not \gls{CBOE} trades may be found in the pre-training data and setup. It is conceivable that pre-training encodes exchange-specific knowledge, such as trading regimes. Trades used for pre-training are recorded at the \gls{ISE} only and are repeatedly shown to the model. While our pre-training objective is stochastic, with different features being masked in each step, past research has shown that repeatedly presenting the same tokens in conjunction with a small pre-training dataset can degrade performance on the downstream classification task. For instance, \textcite[][]{raffelExploringLimitsTransfer2020} document in the context of language modeling that a high degree of repetition encourages memorization in the transformer, whereas few repetitions are not harmful.
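
For reviewers: the "stochastic pre-training objective" mentioned in the last paragraph can be sketched as below. This is a minimal illustration assuming a PyTorch setup; `mask_features`, the mask token, and the masking rate of 0.15 are hypothetical choices for exposition, not the implementation used in the paper.

```python
# Illustrative sketch (not the paper's implementation): stochastic feature
# masking for masked-feature pre-training on tabular data. Each call draws
# a fresh random mask, so the model sees a different corruption pattern of
# the same trades in every step.
import torch


def mask_features(x: torch.Tensor, mask_token: float = 0.0, p: float = 0.15):
    """Randomly mask a fraction ``p`` of the entries in a batch of tabular
    features ``x`` of shape (batch, n_features). Returns the corrupted
    input and the boolean mask marking the entries to reconstruct."""
    mask = torch.rand_like(x) < p      # fresh random mask every step
    x_corrupt = x.clone()
    x_corrupt[mask] = mask_token       # replace masked entries
    return x_corrupt, mask


# Usage: the reconstruction loss is computed on masked positions only.
x = torch.randn(32, 10)                # toy batch with 10 features
x_corrupt, mask = mask_features(x)
# loss = ((model(x_corrupt) - x)[mask] ** 2).mean()
```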