Undraft references #75
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,59 @@ | ||
;; # Machine learning | ||
|
||
;; author: Carsten Behring | ||
;; Preface: machine learning models in Noj | ||
;; | ||
;; latest update: 05.10.2024 | ||
;; ML models in Noj are available as different plugins to the | ||
;; `metamorph.ml` library. | ||
|
||
;; The `metamorph.ml` library itself has no models (except for a linear regression model), | ||
;; but it contains the various functions to "train" and "predict" based on data. | ||
|
||
;; Models are available via Clojure wrappers of existing ML libraries. | ||
;; These are currently part of Noj: | ||
|
||
^{:kindly/hide-code true | ||
:kindly/kind :kind/hiccup} | ||
(->> [ | ||
[ "Tribuo" "scicloj.ml.tribuo"] | ||
[ "Smile" "scicloj.ml.smile"] | ||
[ "Xgboost4J" "scicloj.ml.xgboost"] | ||
[ "scikit-learn" "sklearn-clj"] | ||
] | ||
(map (fn [[library wrapper]] | ||
[:tr | ||
[:td library] | ||
[:td wrapper] | ||
])) | ||
(into [:table [:tr [:th "Library" ] [:th "Clojure Wrapper"]]])) | ||
|
||
|
||
;; These libraries do not expose a separate function for each model they contain. | ||
;; Instead of functions per model, `metamorph.ml` gives each model a | ||
;; unique key, the `:model-type`, which needs to be given when calling | ||
Review comment: It would be good to consistently use backticks for keywords. |
||
;; `metamorph.ml/train`. | ||
;; | ||
;; The model libraries register their models under these keys when their main ns | ||
;; is `require`d (the model keys are printed on screen as they get registered). | ||
Review comment: Some periods (.) are missing at the end of sentences. |
||
;; So we cannot provide cljdoc for the models, as they do not have corresponding functions. | ||
;; | ||
;; Instead, the last chapters of the Noj book provide a complete list | ||
;; of all models (and their keys), including the parameters they take, with descriptions. | ||
;; For some models this reference documentation also contains code examples. | ||
;; It can be used to browse or search for models and their parameters. | ||
Review comment: Very nice |
||
|
||
;; The Tribuo plugin and its models are special in this respect. | ||
;; It only contains two model types as keys, | ||
;; namely `:scicloj.ml.tribuo/classification` and `:scicloj.ml.tribuo/regression`. | ||
;; The model as such is encoded in the same way as the Tribuo Java library does this, | ||
;; namely as a map of all Tribuo components in place, of which one is the model, | ||
;; the so-called "Trainer", which is always needed and has a certain `:type`, the model class. | ||
;; | ||
;; The reference documentation therefore lists all "Trainers" and their names, including parameters. | ||
;; It also lists all other "Configurables" which could be referred to in a component map. | ||
|
||
|
||
|
||
;; ML tutorial | ||
Review comment: Should this be a heading? |
||
;; In this tutorial we will train a simple machine learning model | ||
;; in order to predict the survival of Titanic passengers given | ||
;; their data. | ||
|
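The key-based lookup described in the preface above can be illustrated with a minimal plain-Clojure sketch. It does not use `metamorph.ml`; the names `registry`, `register-model!`, `train`, `predict` and the `:sketch/mean-regressor` key are hypothetical and only show the general idea of models being registered under a `:model-type` key instead of being exposed as one function per model.

```clojure
;; A plain-Clojure sketch; it does NOT use metamorph.ml.
;; All names here (registry, register-model!, train, predict) are hypothetical.

(def registry (atom {}))

(defn register-model!
  "Stores a train and a predict function under a model key."
  [model-type train-fn predict-fn]
  (swap! registry assoc model-type {:train-fn train-fn :predict-fn predict-fn})
  model-type)

(defn train
  "Looks up the :model-type in the registry and calls its train function."
  [data {:keys [model-type] :as options}]
  (if-let [{:keys [train-fn]} (get @registry model-type)]
    {:options options
     :model-data (train-fn data options)}
    (throw (ex-info "Unknown :model-type" {:model-type model-type}))))

(defn predict
  "Finds the predict function via the :model-type stored in the trained model."
  [data {:keys [options model-data]}]
  (let [{:keys [predict-fn]} (get @registry (:model-type options))]
    (predict-fn data model-data)))

;; A toy model that always predicts the mean of the training targets.
(register-model!
 :sketch/mean-regressor
 (fn [rows _options]
   {:mean (/ (reduce + (map :y rows)) (count rows))})
 (fn [rows {:keys [mean]}]
   (mapv (constantly mean) rows)))

(def model
  (train [{:x 1 :y 2} {:x 2 :y 4}]
         {:model-type :sketch/mean-regressor}))

(predict [{:x 3} {:x 4}] model) ;; => [3 3]
```

In this sketch a plugin would call `register-model!` when its namespace is loaded, which mirrors the statement that model keys become available once the main ns of a model library is `require`d.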
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,11 +10,17 @@ | |
[noj-book.utils.render-tools-sklearn] | ||
[scicloj.sklearn-clj.ml])) | ||
|
||
;;## Sklearn model reference | ||
Review comment: Could you use a top-level heading like in other chapters? This is important for the table of contents to appear correctly on the main page. |
||
|
||
;;Below we find all sklearn models with their parameters and the original documentation. | ||
Review comment: We need more context here:
|
||
;; | ||
;;The parameters are given as Clojure keys in kebab-case. As the document texts are | ||
;;imported from Python, they refer to the Python spelling of the parameter. | ||
;; | ||
;;But the translation between the two should be obvious. | ||
Review comment: Also, I suggest the initial paragraphs will be above the |
||
|
||
;;## Sklearn model reference - DRAFT 🛠 | ||
|
||
;;## Example: logistic regression | ||
;;Example: logistic regression | ||
|
||
(def ds (dst/tensor->dataset [[0 0 0] [1 1 1] [2 2 2]])) | ||
|
||
|
@@ -66,14 +72,6 @@ | |
|
||
|
||
|
||
;;Below all models are listed with their parameters and the original documentation. | ||
;; | ||
;;The parameters are given as Clojure keys in kebab-case. As the document texts are | ||
;;imported from Python, they refer to the Python spelling of the parameter. | ||
;; | ||
;;But the translation between the two should be obvious. | ||
|
||
|
||
|
||
;;## :sklearn.classification models | ||
^:kindly/hide-code | ||
|
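The naming convention mentioned in this chapter (kebab-case Clojure keys versus the Python spelling used in the imported documentation) can be illustrated as follows. This is a sketch under the assumption that the translation is a plain swap between underscores and hyphens; the helper names `python->clj-key` and `clj-key->python` are hypothetical and not part of sklearn-clj.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helpers for illustration, not part of sklearn-clj.
;; Assumption: the translation only swaps "_" and "-".

(defn python->clj-key
  "Turns a Python parameter name into a kebab-case Clojure keyword."
  [python-name]
  (keyword (str/replace python-name "_" "-")))

(defn clj-key->python
  "Turns a kebab-case Clojure keyword into the Python parameter name."
  [k]
  (str/replace (name k) "-" "_"))

(python->clj-key "max_iter")      ;; => :max-iter
(clj-key->python :fit-intercept)  ;; => "fit_intercept"
```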
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,9 +12,17 @@ | |
|
||
|
||
|
||
;; ## Smile classification models reference | ||
Review comment: More context is needed, as in the Sklearn chapter. |
||
;; In the following we have a list of all model keys of Smile classification models | ||
;; including parameters. | ||
;; They can be used like this: | ||
|
||
(comment | ||
(ml/train df | ||
{:model-type <model-key> | ||
:param-1 0 | ||
:param-2 1})) | ||
|
||
;; ## Smile classification models reference - DRAFT 🛠 | ||
|
||
|
||
(render-key-info :smile.classification) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,7 +6,20 @@ | |
[scicloj.ml.smile.projections] | ||
[noj-book.utils.render-tools :refer [render-key-info]])) | ||
|
||
;; ## Smile other models reference - DRAFT 🛠 | ||
;; ## Smile other models reference | ||
Review comment: More context is needed, like the above. |
||
;; In the following we have a list of all model keys of Smile model-like | ||
;; algorithms including parameters. | ||
;; They can be used in the same way as other models: | ||
(comment | ||
(ml/train df | ||
{:model-type <model-key> | ||
:param-1 0 | ||
:param-2 1})) | ||
|
||
;; Some do not support `ml/predict` and are defined as `unsupervised` learners. | ||
;; Clustering and PCA are in this group. | ||
|
||
|
||
;; ## Smile manifolds | ||
|
||
^:kindly/hide-code | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,7 +13,16 @@ | |
^:kindly/hide-code | ||
(require '[scicloj.ml.smile.regression]) | ||
|
||
;; ## Smile regression models reference - DRAFT 🛠 | ||
;; ## Smile regression models reference | ||
Review comment: More context is needed, like the above. |
||
;; In the following we have a list of all model keys of Smile regression models | ||
;; including parameters. | ||
;; They can be used like this: | ||
|
||
(comment | ||
(ml/train df | ||
{:model-type <model-key> | ||
:param-1 0 | ||
:param-2 1})) | ||
|
||
^:kindly/hide-code | ||
(render-key-info :smile.regression) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,6 +9,7 @@ | |
[scicloj.ml.smile.metamorph :as smile-mm] | ||
[scicloj.ml.smile.nlp :as nlp] | ||
[scicloj.ml.smile.projections :as projections] | ||
[scicloj.ml.smile.clustering :as clustering] | ||
[tablecloth.api :as tc] | ||
[tech.v3.dataset :as ds] | ||
[tech.v3.dataset.categorical :as ds-cat] | ||
|
@@ -17,6 +18,8 @@ | |
[tech.v3.dataset.print])) | ||
|
||
|
||
|
||
|
||
^:kindly/hide-code | ||
(defn docu-fn [v] | ||
(let [m (meta v)] | ||
|
@@ -29,7 +32,7 @@ | |
(kind/md "----------------------------------------------------------")])))) | ||
|
||
|
||
;; ## Transformer reference - DRAFT 🛠 | ||
;; ## Transformer reference | ||
Review comment: More context is needed, like the above. We cannot expect the reader to know what Transformers are, even if that was mentioned in a paragraph many chapters ago. This is especially important, since Transformers have different meanings nowadays. |
||
|
||
(docu-fn (var nlp/count-vectorize)) | ||
|
||
|
@@ -378,4 +381,7 @@ data | |
;; able to predict well the material from the 2 PCA components. | ||
|
||
;; It even seems that the reduction to 2 dimensions removes | ||
;; too much information for predicting the material with any type of model. | ||
;; too much information for predicting the material with any type of model. | ||
|
||
|
||
(docu-fn (var clustering/cluster)) |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,10 +4,28 @@ | |
[clojure.java.classpath] | ||
[clojure.reflect] | ||
[scicloj.ml.tribuo] | ||
[noj-book.utils.tribuo-render-tools :refer [trainer-infos all-non-trainer render-configurables]])) | ||
[noj-book.utils.tribuo-render-tools :refer [trainer-infos all-non-trainer render-configurables]] | ||
[scicloj.kindly.v4.kind :as kind] | ||
[scicloj.metamorph.ml :as ml])) | ||
|
||
|
||
;; ## Tribuo reference - DRAFT 🛠 | ||
;; ## Tribuo reference | ||
Review comment: More context is needed, like the above. |
||
;; The following is a reference for all Tribuo trainers. | ||
;; They can be used as the model specification in `ml/train` via the `:type` | ||
;; of the Tribuo trainer: | ||
(comment | ||
(ml/train | ||
ds | ||
{:model-type :scicloj.ml.tribuo/classification | ||
:tribuo-components [{:name "random-forest" | ||
:type "org.tribuo.classification.dtree.CARTClassificationTrainer" | ||
:properties {:maxDepth "8" | ||
:useRandomSplitPoints "false" | ||
:fractionFeaturesInSplit "0.5"}}] | ||
:tribuo-trainer-name "random-forest"})) | ||
|
||
;; There is also a reference for all non-trainer components of Tribuo. | ||
;; These could potentially be used in Tribuo model specs as well. | ||
|
||
; ### Tribuo trainer reference | ||
^:kindly/hide-code | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,6 +5,15 @@ | |
[noj-book.utils.render-tools :refer [render-key-info]])) | ||
|
||
|
||
;; ## Xgboost model reference - DRAFT 🛠 | ||
;; ## Xgboost model reference | ||
Review comment: More context is needed, like the above. What is XGBoost, etc. |
||
;; In the following we have a list of all model keys of Xgboost models | ||
;; including parameters. | ||
;; They can be used like this: | ||
(comment | ||
(ml/train df | ||
{:model-type <model-key> | ||
:param-1 0 | ||
:param-2 1})) | ||
|
||
^:kindly/hide-code | ||
(render-key-info :xgboost) |
Review comment:
This is a bit abstract.
How are :fit and :transform different from a certain notion of "fit" and "transform"?
I think this paragraph leaves the reader confused.
Review comment:
I have no idea how to express the concept of the metamorph context in simple words.
Review comment:
It is the concept of "keeping state" while still having pure functions,
combined with "transformers" which might (!) behave differently in the "two passes" over a pipeline,
in "train" and "predict" (= :fit and :transform).
Review comment:
combined with the concept that existing dataset->dataset functions can be "lifted automatically by a macro" to become "context"->"context" functions, so we don't need to rewrite tablecloth to become "metamorph compliant".
This is "not simple".
Review comment:
combined with the fact that we can avoid pipelines as well,
or split it into a dataset transformation pipeline (= using a normal threading macro)
and a metamorph pipeline in most cases.
Review comment:
thanks
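The ideas from this thread (state kept in a context map while the steps stay pure, two passes in `:fit` and `:transform` mode, and lifting of data->data functions) can be sketched in plain Clojure. This is only an illustration and does not use the metamorph library: `lift`, `mean-center` and `run-pipeline` are hypothetical names defined here, lifting is done with an ordinary function instead of a macro, the step id is passed in by hand, and the data is a plain vector of numbers instead of a dataset.

```clojure
;; A plain-Clojure sketch of the idea; it does NOT use the metamorph library.
;; All names below (lift, mean-center, run-pipeline) are hypothetical.

(defn lift
  "Turns a data->data function into a context->context function."
  [f & args]
  (fn [ctx]
    (update ctx :metamorph/data #(apply f % args))))

(defn mean-center
  "A stateful step: learns the mean in :fit mode and reuses it in
  :transform mode. The learned state is stored in the context under id."
  [id]
  (fn [{:metamorph/keys [data mode] :as ctx}]
    (case mode
      :fit (let [mean (/ (reduce + data) (count data))]
             (-> ctx
                 (assoc id {:mean mean})
                 (assoc :metamorph/data (mapv #(- % mean) data))))
      :transform (let [mean (get-in ctx [id :mean])]
                   (assoc ctx :metamorph/data (mapv #(- % mean) data))))))

(defn run-pipeline
  "Threads a context map through the steps in order."
  [steps ctx]
  (reduce (fn [c step] (step c)) ctx steps))

(def steps
  [(lift (partial mapv inc))   ; behaves the same in both passes
   (mean-center :center-1)])   ; behaves differently in :fit and :transform

;; First pass ("train"): the mean is learned and stored in the context.
(def fitted
  (run-pipeline steps {:metamorph/data [1 2 3]
                       :metamorph/mode :fit}))

;; Second pass ("predict"): the fitted context is reused with new data.
(def transformed
  (run-pipeline steps (assoc fitted
                             :metamorph/data [10 20]
                             :metamorph/mode :transform)))

(:metamorph/data fitted)       ;; => [-1 0 1]
(:center-1 fitted)             ;; => {:mean 3}
(:metamorph/data transformed)  ;; => [8 18]
```

Every step is a pure function from context to context; the "state" is just a value that travels inside the map from the first pass to the second.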