diff --git a/docs/option_trade_classification.md b/docs/option_trade_classification.md
index 0306004..c30f117 100644
--- a/docs/option_trade_classification.md
+++ b/docs/option_trade_classification.md
@@ -8,7 +8,7 @@ This tutorial aims to reproduce plots from a working paper by Grauer et al. [-@g
 
 There's a lot going on.🥵
 
-To match the author's description, we first set up `layers`. We use the `tclf` implementation of the [tradesize](https://github.com/KarelZe/tclf/blob/main/src/tclf/classical_classifier.py#L336), [quote](https://github.com/KarelZe/tclf/blob/main/src/tclf/classical_classifier.py#L154), and [depth rule](https://github.com/KarelZe/tclf/blob/main/src/tclf/classical_classifier.py#L362C1-L363C1), as well as [reverse tick test](https://github.com/KarelZe/tclf/blob/main/src/tclf/classical_classifier.py#L137). The subset named "ex" refers to exchange-specific data, "best" to the NBBO and "all" for inter-exchange level data. Identical to the paper, the reverse tick test is applied at the inter-exchange level, due to the devastating results of tick-based algorithms at the exchange level. The authors perform random classification on unclassified trades, hence we choose `strategy="random"`.
+To match the author's description, we first set up `layers`. We use the `tclf` implementation of the [tradesize](rules.md#trade-size-rule), [quote](rules.md#quote-rule), and [depth rule](rules.md#depth-rule), as well as [reverse tick test](rules.md#reverse-tick-test). The subset named "ex" refers to exchange-specific data, "best" to the NBBO and "all" for inter-exchange level data. Identical to the paper, the reverse tick test is applied at the inter-exchange level, due to the devastating results of tick-based algorithms at the exchange level. The authors perform random classification on unclassified trades, hence we choose `strategy="random"`.
 
 ```python
 from tclf.classical_classifier import ClassicalClassifier
@@ -45,7 +45,7 @@ clf.fit(X)
 >>> ValueError: Expected to find columns: ['ask_best', 'ask_size_best', 'bid_best', 'bid_size_best', 'trade_price', 'trade_size']. Check the naming/presence of columns. See: https://karelze.github.io/tclf/naming_conventions/
 ```
 
-The calculation of the [depth rule](https://github.com/KarelZe/tclf/blob/main/src/tclf/classical_classifier.py#L362C1-L363C1) requires the columns `ask_{subset}`, `bid_{subset}`, and `trade_price`, as well as `ask_size_{subset}`, `bid_size_{subset}` and `trade_size`. The columns `BEST_ASK`, `BEST_BID`, `TRADE_PRICE`, and `TRADE_SIZE` are renamed to match our naming conventions of `ask_{subset}`, `bid_{subset}`, `trade_price`, and `trade_size`.
+The calculation of the [depth rule](rules.md#depth-rule) requires the columns `ask_{subset}`, `bid_{subset}`, and `trade_price`, as well as `ask_size_{subset}`, `bid_size_{subset}` and `trade_size`. The columns `BEST_ASK`, `BEST_BID`, `TRADE_PRICE`, and `TRADE_SIZE` are renamed to match our naming conventions of `ask_{subset}`, `bid_{subset}`, `trade_price`, and `trade_size`.
 
 As there is no `{ask/bid}_size_best` at the NBBO level (`subset="best"`), I copy the columns from the trading venue. This allows us to mimic the author's decision to filter for mid-spread at the NBBO level, but classify by the trade size relative to the ask/bid size at the exchange.