sync readme.md with index.md

KarelZe · Dec 4, 2023 · 599e356
1 parent 1040f8c
Showing 3 changed files with 95 additions and 32 deletions.
diff --git a/README.md b/README.md
@@ -1,41 +1,100 @@
-![GitHubActions](https://github.com/karelze/tclf//actions/workflows/tests.yaml/badge.svg)
-![Codecov](https://codecov.io/gh/karlze/tclf/branch/master/graph/badge.svg)
+# Trade classification for python 🐍
 
-# tclf 💸
+`tclf` is a [`scikit-learn`](https://scikit-learn.org/stable/)-compatible implementation of trade classification algorithms to classify financial markets transactions into buyer- and seller-initiated trades.
 
-[`scikit-learn`](https://scikit-learn.org/stable/)-compatible implementation of popular trade classification algorithms to classify financial markets transactions into buyer- and seller-initiated trades.
+The key features are:
 
-## Algorithms
+* **Easy**: Easy to use and learn.
+* **Sklearn-compatible**: Compatible to the sklearn API. Use sklearn metrics and visualizations.
+* **Feature complete**: Wide range of supported algorithms. Use the algorithms individually or stack them like LEGO blocks.
 
-- Tick test
-- Quote rule
-- LR algorithm
-- EMO rule
-- CLNV rule
-- Depth rule
-- Tradesize rule
+## Installation
+```console
+$ pip install .
+---> 100%
+Successfully installed tclf-0.0.0
+```
+
+## Minimal Example
 
-## Usage
+Let's start off simple: classify all trades by the quote rule and all other trades, which cannot be classified by the quote rule, randomly.
 
+Create a `main.py` with:
 ```python
->>> X = pd.DataFrame(
-... [
-...     [1.5, 1, 3],
-...     [2.5, 1, 3],
-...     [1.5, 3, 1],
-...     [2.5, 3, 1],
-...     [1, np.nan, 1],
-...     [3, np.nan, np.nan],
-... ],
-... columns=["trade_price", "bid_ex", "ask_ex"],
-... )
->>> y = pd.Series([-1, 1, 1, -1, -1, 1])
->>> clf = ClassicalClassifier(layers=[("quote", "ex")], strategy="const")
->>> clf.fit(X, y)
-ClassicalClassifier(layers=[('quote', 'ex')], strategy='const')
->>> pred = clf.predict_proba(X)
+import numpy as np
+import pandas as pd
+
+from tclf.classical_classifier import ClassicalClassifier
+
+X = pd.DataFrame(
+    [
+        [1.5, 1, 3],
+        [2.5, 1, 3],
+        [1.5, 3, 1],
+        [2.5, 3, 1],
+        [1, np.nan, 1],
+        [3, np.nan, np.nan],
+    ],
+    columns=["trade_price", "bid_ex", "ask_ex"],
+)
+y = pd.Series([1, 1, 1, 1, 1, 1])
+
+clf = ClassicalClassifier(layers=[("quote", "ex")], strategy="random")
+clf.fit(X, y)
+probs = clf.predict_proba(X)
+print(probs)
+```
+Run your script with
+```console
+python main.py
 ```
-A detailled documentation is available [here](https://KarelZe.github.io/tclf/).
+In this example, input data is available as a pd.DataFrame/Series with columns conforming to our [naming conventions](naming_conventions.md).
+
+The parameter `layers=[("quote", "ex")]` sets the quote rule at the exchange level and `strategy="random"` specifies the fallback strategy for unclassified trades. The true label `y` is not used in classification and only for API consistency by convention.
+
+## Advanced Example
+Often it is desirable to classify both on exchange level data and nbbo data. Also, data might only be available as a numpy array. So let's extend the previous example by classifying using the quote rule at exchange level, then at nbbo and all other trades randomly.
+
+```python hl_lines="6  16 17 20"
+import numpy as np
+from sklearn.metrics import accuracy_score
+
+from tclf.classical_classifier import ClassicalClassifier
+
+X = np.array(
+    [
+        [1.5, 1, 3, 2, 2.5],
+        [2.5, 1, 3, 1, 3],
+        [1.5, 3, 1, 1, 3],
+        [2.5, 3, 1, 1, 3],
+        [1, np.nan, 1, 1, 3],
+        [3, np.nan, np.nan, 1, 3],
+    ]
+)
+y_true = np.array([-1, 1, 1, -1, -1, 1])
+features = ["trade_price", "bid_ex", "ask_ex", "bid_best", "ask_best"]
+
+clf = ClassicalClassifier(
+    layers=[("quote", "ex"), ("quote", "best")], strategy="const", features=features
+)
+clf.fit(X, y_true)
+
+y_pred = clf.predict(X)
+print(accuracy_score(y_true, y_pred))
+```
+In this example, input data is available as np.arrays with both exchange (`"ex"`) and nbbo data (`"best"`). We set the layers parameter to `layers=[("quote", "ex"), ("quote", "best")]` to classify trades first on subset `"ex"` and remaining trades on subset `"best"`. Additionally, we have to set `ClassicalClassifier(..., features=features)` to pass column information to the classifier.
+
+Like before, column/feature names must follow our [naming conventions](naming_conventions.md).
+
+## Supported Algorithms
+
+- (Rev.) Tick test
+- Quote rule
+- (Rev.) LR algorithm
+- (Rev.) EMO rule
+- (Rev.) CLNV rule
+- Depth rule
+- Tradesize rule
 
 ## References
 

diff --git a/docs/index.md b/docs/index.md
@@ -1,5 +1,8 @@
 # Trade classification for python 🐍
 
+![GitHubActions](https://github.com/karelze/tclf//actions/workflows/tests.yaml/badge.svg)
+![Codecov](https://codecov.io/gh/karlze/tclf/branch/master/graph/badge.svg)
+
 `tclf` is a [`scikit-learn`](https://scikit-learn.org/stable/)-compatible implementation of trade classification algorithms to classify financial markets transactions into buyer- and seller-initiated trades.
 
 The key features are:
@@ -48,7 +51,7 @@ Run your script with
 ```console
 python main.py
 ```
-In this example, input data is available as a pd.DataFrame/Series with columns conforming to our [naming conventions](naming_conventions.md).
+In this example, input data is available as a pd.DataFrame/Series with columns conforming to our [naming conventions](https://karelze.github.io/tclf/naming_conventions/).
 
 The parameter `layers=[("quote", "ex")]` sets the quote rule at the exchange level and `strategy="random"` specifies the fallback strategy for unclassified trades. The true label `y` is not used in classification and only for API consistency by convention.
 
@@ -84,7 +87,7 @@ print(accuracy_score(y_true, y_pred))
 ```
 In this example, input data is available as np.arrays with both exchange (`"ex"`) and nbbo data (`"best"`). We set the layers parameter to `layers=[("quote", "ex"), ("quote", "best")]` to classify trades first on subset `"ex"` and remaining trades on subset `"best"`. Additionally, we have to set `ClassicalClassifier(..., features=features)` to pass column information to the classifier.
 
-Like before, column/feature names must follow our [naming conventions](naming_conventions.md).
+Like before, column/feature names must follow our [naming conventions](https://karelze.github.io/tclf/naming_conventions/).
 
 ## Supported Algorithms
 

diff --git a/mkdocs.yml b/mkdocs.yml
@@ -17,6 +17,7 @@ edit_uri: ""
 nav:
   - Home: index.md
   - API reference: reference.md
+  - Naming conventions: naming_conventions.md
 
 markdown_extensions:
   - toc:
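
For readers of this commit who are unfamiliar with the layered setup described in the README (`layers=[("quote", "ex"), ("quote", "best")]` plus a fallback strategy), the idea can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not `tclf`'s implementation: the helper names `quote_rule` and `layered_quote_rule` are hypothetical, the sample data is made up and contains no crossed quotes, and how `tclf` treats crossed quotes or trades at the midpoint is not confirmed here.

```python
import numpy as np


def quote_rule(price, bid, ask):
    """Classify trades against the quote midpoint (hypothetical helper).

    Returns 1.0 (buy), -1.0 (sell), or nan where no decision is possible.
    """
    mid = 0.5 * (bid + ask)              # nan if either quote is missing
    side = np.full(price.shape, np.nan)
    side[price > mid] = 1.0              # above midpoint: buyer-initiated
    side[price < mid] = -1.0             # below midpoint: seller-initiated
    return side


def layered_quote_rule(X, seed=0):
    """Quote rule on exchange quotes, then on NBBO quotes, then a random guess."""
    price, bid_ex, ask_ex, bid_best, ask_best = X.T
    side = quote_rule(price, bid_ex, ask_ex)

    # Second layer: only trades the first layer left unclassified.
    unclassified = np.isnan(side)
    side[unclassified] = quote_rule(price, bid_best, ask_best)[unclassified]

    # Fallback: assign the remaining trades at random.
    unclassified = np.isnan(side)
    rng = np.random.default_rng(seed)
    side[unclassified] = rng.choice([-1.0, 1.0], size=unclassified.sum())
    return side.astype(int)


# Columns: trade_price, bid_ex, ask_ex, bid_best, ask_best (illustrative data)
X = np.array(
    [
        [1.5, 1, 3, 2, 2.5],            # decided by the exchange quotes
        [2.5, 1, 3, 1, 3],              # decided by the exchange quotes
        [1.0, np.nan, 1, 1, 3],         # exchange bid missing, falls through to NBBO
        [3.0, np.nan, np.nan, 1, 3],    # no exchange quotes, falls through to NBBO
        [2.0, 1, 3, 1, 3],              # at the midpoint in both layers, random
    ]
)
pred = layered_quote_rule(X)
print(pred)
```

Each layer only touches trades that earlier layers left unclassified, which is what makes the layers stackable. The last trade sits exactly at the midpoint in both layers, so only the fallback can label it.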