Undraft references #75

Draft
wants to merge 2 commits into
base: main
10 changes: 10 additions & 0 deletions notebooks/noj_book/automl.clj
@@ -257,6 +257,16 @@ ctx-after-train
;; So now we can add more operations to the pipeline,
;; and nothing else changes, for example drop columns.

;; While most metamorph compliant operations behave the same in
Member

This is a bit abstract.

How are :fit and :transform different from a certain notion of "fit" and "transform"?

I think this paragraph leaves the reader confused.

Member Author

I have no idea how to express the concept of the metamorph context in simple words.

Member Author

It is the concept of "keeping state" while still having pure functions,
combined with "transformers" which (might!) behave differently in the "two passes" over a pipeline,
in "train" and "predict" (= :fit and :transform).

Member Author

@behrica, Dec 4, 2024

combined with the concept that existing dataset->dataset functions can be "lifted automatically by a macro" to become "context"->"context" functions, so we don't need to rewrite tablecloth to become "metamorph compliant"

This is "not simple"

Member Author

combined with the fact that we can avoid pipelines as well,
or split them into a dataset transformation pipeline (= using a normal threading macro)
and a metamorph pipeline in most cases.
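
The ideas in this thread (state kept in a context map, pure context->context functions, lifted dataset functions, and steps that behave differently in :fit and :transform) can be sketched in plain Clojure. This is only an illustration under stated assumptions: it does not use the metamorph library, `lift`, `center` and `run-pipeline` are hypothetical helpers, and a plain vector of numbers stands in for a dataset. Only the key names `:metamorph/data`, `:metamorph/mode` and `:metamorph/id` follow the convention metamorph uses.

```clojure
;; Plain-Clojure sketch, no metamorph dependency.
;; All function names here are hypothetical.

(defn lift
  "Turns a data->data function into a context->context function."
  [f]
  (fn [ctx]
    (update ctx :metamorph/data f)))

(defn center
  "A 'transformer': learns the mean in :fit, reuses it in :transform."
  [ctx]
  (let [id   (:metamorph/id ctx)
        data (:metamorph/data ctx)]
    (case (:metamorph/mode ctx)
      :fit       (let [mean (/ (reduce + data) (count data))]
                   (-> ctx
                       (assoc id {:mean mean}) ; state stored under the step id
                       (assoc :metamorph/data (mapv #(- % mean) data))))
      :transform (let [mean (get-in ctx [id :mean])]
                   (assoc ctx :metamorph/data (mapv #(- % mean) data))))))

(defn run-pipeline
  "Threads a context through the steps; each step sees a stable id (its index)."
  [steps ctx]
  (reduce (fn [ctx [idx step]]
            (step (assoc ctx :metamorph/id idx)))
          ctx
          (map-indexed vector steps)))

;; The lifted step behaves the same in both modes; `center` does not.
(def steps [(lift (partial mapv inc)) center])

;; First pass ("train"): the mean is computed and stored in the context.
(def fitted
  (run-pipeline steps {:metamorph/mode :fit
                       :metamorph/data [1 2 3]}))

;; Second pass ("predict"): the stored mean is reused on new data.
(def predicted
  (run-pipeline steps (assoc fitted
                             :metamorph/mode :transform
                             :metamorph/data [10])))

(:metamorph/data fitted)    ;; => [-1 0 1]
(:metamorph/data predicted) ;; => [8]
```

Every function stays pure: the "state" is just data that travels inside the context map from the :fit pass to the :transform pass.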

Member

thanks

;; :fit and :transform, there are some which do behave differently.
;; They have a certain notion of "fit" and "transform".
;;
;; They are therefore called "transformer" and are listed in the
;; "Transformer reference"
;; at the end of the Noj book.
;;
;; Some transformers exist as well as model and can be used with
;; function `ml/model`


;; ## Automatic ML with `metamorph.ml`
54 changes: 52 additions & 2 deletions notebooks/noj_book/ml_basic.clj
@@ -1,9 +1,59 @@
;; # Machine learning

;; author: Carsten Behring
;; Preface: machine learning models in Noj
;;
;; latest update: 05.10.2024
;; ML models in Noj are available as different plugins to the
;; `metamorph.ml` library.

;; The `metamorph.ml` library itself has no models (except for a linear regression model),
;; but it contains the various functions to "train" and "predict" based on data.

;; Models are available via Clojure wrappers of existing ML libraries.
;; These are currently part of Noj:

^{:kindly/hide-code true
:kindly/kind :kind/hiccup}
(->> [
[ "Tribuo" "scicloj.ml.tribuo"]
[ "Smile" "scicloj.ml.smile"]
[ "Xgboost4J" "scicloj.ml.xgboost"]
[ "scikit-learn" "sklearn-clj"]
]
(map (fn [[library wrapper]]
[:tr
[:td library]
[:td wrapper]
]))
(into [:table [:tr [:th "Library" ] [:th "Clojure Wrapper"]]]))


;; These libraries do not have any functions for the models they contain.
;; Instead of functions per model, `metamorph.ml` has the concept of each model having a
;; unique `key`, the :model-type , which needs to be given when calling
Member

It would be good to consistently use backticks for keywords
:model-type, etc.

;;`metamorph.ml/train`
;;
;; The model libraries register their models under these keys, when their main ns
;; is `require`d. (and the model keys get printed on screen when getting registered)
Member

Some periods (.) are missing at the end of sentences.

;; So we cannot provide cljdoc for the models, as they do not have corresponding functions.
;;
;; Instead we provide in the last chapters of the Noj book a complete list
;; of all models (and their keys) incl. the parameters they take with a description.
;; For some models this reference documentation contains as well code examples.
;; This can be used to browse or search for models and their parameters.
Member

Very nice


;; The Tribuo plugin and its models are special in this.
;; It only contains 2 model types as keys,
;; namely :scicloj.ml.tribuo/classification and :scicloj.ml.tribuo/regression.
;; The model as such is encoded in the same way as the Tribuo Java library does this,
;; namely as a map of all Tribuo components in place, of which one is the model,
;; the so called "Trainer", always needed and having a certain :type, the model class.
;;
;; The reference documentation therefore lists all "Trainer" and their name incl. parameters
;; It lists as well all other "Configurable" which could be referred to in a component map.



;; ML tutorial
Member

Should this be a heading?

;; In this tutorial we will train a simple machine learning model
;; in order to predict the survival of titanic passengers given
;; their data.
18 changes: 8 additions & 10 deletions notebooks/noj_book/sklearn_reference.clj
@@ -10,11 +10,17 @@
[noj-book.utils.render-tools-sklearn]
[scicloj.sklearn-clj.ml]))

;;## Sklearn model reference
Member

Could you use a top-level heading like in other chapters?
# and not ##.

This is important for the table of contents to appear correctly on the main page.


;;Below we find all sklearn models with their parameters and the original documentation.
Member

We need more context here:

  • Link to the ml_basic chapter for background.
  • Explain what sklearn is, link to the sklearn docs, and remind the reader that Noj contains sklearn-clj, which is a Clojure wrapper of Sklearn.

;;
;;The parameters are given as Clojure keys in kebab-case. As the document texts are
;;imported from python they refer to the python spelling of the parameter.
;;
;;But the translation between the two should be obvious.
Member

Also, I suggest placing the initial paragraphs above the ns definition, so that the chapter can begin in a clearer way.


;;## Sklearn model reference - DRAFT 🛠

;;## Example: logistic regression
;;Example: logistic regression

(def ds (dst/tensor->dataset [[0 0 0] [1 1 1] [2 2 2]]))

@@ -66,14 +72,6 @@



;;Below all models are listed with their parameters and the original documentation.
;;
;;The parameters are given as Clojure keys in kebab-case. As the document texts are
;;imported from python they refer to the python spelling of the parameter.
;;
;;But the translation between the two should be obvious.



;;## :sklearn.classification models
^:kindly/hide-code
10 changes: 9 additions & 1 deletion notebooks/noj_book/smile_classification.clj
@@ -12,9 +12,17 @@



;; ## Smile classification models reference
Member

More context is needed, as in the Sklearn chapter.
We should explain what Smile is, mention this is Smile v2, etc.

;; In the following we have a list of all model keys of Smile classification models
;; including parameters.
;; They can be used like this:

(comment
(ml/train df
{:model-type <model-key>
:param-1 0
:param-2 1}))

;; ## Smile classification models reference - DRAFT 🛠


(render-key-info :smile.classification)
15 changes: 14 additions & 1 deletion notebooks/noj_book/smile_others.clj
@@ -6,7 +6,20 @@
[scicloj.ml.smile.projections]
[noj-book.utils.render-tools :refer [render-key-info]]))

;; ## Smile other models reference - DRAFT 🛠
;; ## Smile other models reference
Member

More context is needed, like the above.

;; In the following we have a list of all model keys of Smile model-like
;; algorithms including parameters.
;; They can be used in the same way as other models:
(comment
(ml/train df
{:model-type <model-key>
:param-1 0
:param-2 1}))

;; Some do not support `ml/predict` and are defined as `unsupervised` learners.
;; Clustering and PCA are in this group.


;; ## Smile manifolds

^:kindly/hide-code
11 changes: 10 additions & 1 deletion notebooks/noj_book/smile_regression.clj
@@ -13,7 +13,16 @@
^:kindly/hide-code
(require '[scicloj.ml.smile.regression])

;; ## Smile regression models reference - DRAFT 🛠
;; ## Smile regression models reference
Member

More context is needed, like the above.

;; In the following we have a list of all model keys of Smile regression models
;; including parameters.
;; They can be used like this:

(comment
(ml/train df
{:model-type <model-key>
:param-1 0
:param-2 1}))

^:kindly/hide-code
(render-key-info :smile.regression)
10 changes: 8 additions & 2 deletions notebooks/noj_book/transformer_references.clj
@@ -9,6 +9,7 @@
[scicloj.ml.smile.metamorph :as smile-mm]
[scicloj.ml.smile.nlp :as nlp]
[scicloj.ml.smile.projections :as projections]
[scicloj.ml.smile.clustering :as clustering]
[tablecloth.api :as tc]
[tech.v3.dataset :as ds]
[tech.v3.dataset.categorical :as ds-cat]
@@ -17,6 +18,8 @@
[tech.v3.dataset.print]))




^:kindly/hide-code
(defn docu-fn [v]
(let [m (meta v)]
@@ -29,7 +32,7 @@
(kind/md "----------------------------------------------------------")]))))


;; ## Transformer reference - DRAFT 🛠
;; ## Transformer reference
Member

More context is needed, like the above.

We cannot expect the reader to know what Transformers are, even if that was mentioned in a paragraph many chapters ago.

This is especially important, since Transformers have different meanings nowadays.


(docu-fn (var nlp/count-vectorize))

@@ -378,4 +381,7 @@ data
;; able to predict well the material from the 2 PCA components.

;; It even seems, that the reduction to 2 dimensions removes
;; too much information for predicting of the material for any type of model.
;; too much information for predicting of the material for any type of model.


(docu-fn (var clustering/cluster))
22 changes: 20 additions & 2 deletions notebooks/noj_book/tribuo_reference.clj
@@ -4,10 +4,28 @@
[clojure.java.classpath]
[clojure.reflect]
[scicloj.ml.tribuo]
[noj-book.utils.tribuo-render-tools :refer [trainer-infos all-non-trainer render-configurables]]))
[noj-book.utils.tribuo-render-tools :refer [trainer-infos all-non-trainer render-configurables]]
[scicloj.kindly.v4.kind :as kind]
[scicloj.metamorph.ml :as ml]))


;; ## Tribuo reference - DRAFT 🛠
;; ## Tribuo reference
Member

More context is needed, like the above.

;;The following is a reference for all Tribuo trainers.
;; They can be used as the model specification in `ml/train` on the :type
;; of the tribuo trainer
(comment
(ml/train
ds
{:model-type :scicloj.ml.tribuo/classification
:tribuo-components [{:name "random-forest"
:type "org.tribuo.classification.dtree.CARTClassificationTrainer"
:properties {:maxDepth "8"
:useRandomSplitPoints "false"
:fractionFeaturesInSplit "0.5"}}]
:tribuo-trainer-name "random-forest"}))

;; There is as well a reference on all non-trainer components of Tribuo.
;; These could potentially as well be used in Tribuo model specs.

; ### Tribuo trainer reference
^:kindly/hide-code
11 changes: 10 additions & 1 deletion notebooks/noj_book/xgboost.clj
@@ -5,6 +5,15 @@
[noj-book.utils.render-tools :refer [render-key-info]]))


;; ## Xgboost model reference - DRAFT 🛠
;; ## Xgboost model reference
Member

More context is needed, like the above.

What is XGBoost, etc.

;; In the following we have a list of all model keys of Xgboost models
;; including parameters.
;; They can be used like this:
(comment
(ml/train df
{:model-type <model-key>
:param-1 0
:param-2 1}))

^:kindly/hide-code
(render-key-info :xgboost)