diff --git a/docs/index.html b/docs/index.html index 2c738e7..e546e29 100644 --- a/docs/index.html +++ b/docs/index.html @@ -289,7 +289,12 @@

1 Preface

:git/sha "c7a7240"}

Note that we are using git coordinates at the moment, in order to expose a few relevant features of the underlying libraries that have not been released yet.

Status: Most of the underlying libraries are stable. The experimental parts are marked as such. For some of the libraries, we use a branch for an upcoming release. The main current goal is to provide a clear picture of the direction the stack is heading in, expecting most of it to stabilize soon.

-

Near term plan - till the end of October 2024

* Work on stabilizing the upcoming releases of the underlying libraries.
* Keep documenting core ideas of the underlying libraries and ways to combine them in typical workflows.
* Keep making the docs generate automatic tests using kindly/check.

+

Near term plan - till the end of October 2024

+

1.1 Existing chapters in this book:

@@ -444,10 +444,10 @@

_unnamed [4 5]:

-+-+ @@ -462,31 +462,31 @@

  :term              :statistic   :estimate   :p.value    :std.error
- :sales             24.50466654  7.54319957  0.0000000   0.30782706
+ :sales             22.62882064  8.01586402  0.00000000  0.35423251
- :youtube           13.53420578  0.02123244  0.0000000   0.00156880
+ :youtube           10.49300383  0.01908187  0.00000000  0.00181853
- :facebook          4.43385792   0.04147090  0.0000196   0.00935323
+ :facebook          2.89330761   0.03115180  0.00447662  0.01076685
- :youtube*facebook  18.57505074  0.00085682  0.0000000   0.00004613
+ :youtube*facebook  17.10327854  0.00092831  0.00000000  0.00005428
@@ -497,14 +497,14 @@

(-> evaluations flatten first :test-transform :metric)
-
1.3122729270606561
+
1.097119133397347

\(R^2\)

(-> evaluations flatten first :test-transform :other-metrices first :metric)
-
0.9521743826158947
+
0.9761354566466389

\(RMSE\) and \(R^2\) of the interaction model are slightly better.

These results suggest that the model with the interaction term is better than the model that contains only main effects. So, for this specific data, we should go for the model with the interaction term.
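For readers who want to see what the two metrics compared above measure, here is a minimal sketch in plain Clojure. It assumes the usual textbook definitions of \(RMSE\) and \(R^2\). The helper names `rmse` and `r-squared` and the toy data are hypothetical and are not part of metamorph.ml or fastmath; the book's own `loss/rmse` and `fmstats/r2-determination` may differ in implementation details.

```clojure
;; Hypothetical helpers for illustration only; not the library functions
;; used in the book.

(defn- sum-of-squares
  "Sum of squared differences between paired elements of xs and ys."
  [xs ys]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) xs ys)))

(defn rmse
  "Root mean squared error between observed and predicted values."
  [observed predicted]
  (Math/sqrt (/ (sum-of-squares observed predicted)
                (count observed))))

(defn r-squared
  "Coefficient of determination: 1 - SS_res / SS_tot."
  [observed predicted]
  (let [mean   (/ (reduce + observed) (count observed))
        ss-res (sum-of-squares observed predicted)
        ss-tot (sum-of-squares observed (repeat mean))]
    (- 1.0 (/ ss-res ss-tot))))

;; Made-up toy data, unrelated to the marketing dataset in the book.
(def observed  [10.0 12.0 14.0 16.0])
(def predicted [11.0 11.0 15.0 15.0])

(rmse observed predicted)      ; every residual is +-1, so RMSE is 1.0
(r-squared observed predicted) ; SS_res = 4, SS_tot = 20, so R^2 is about 0.8
```

A lower \(RMSE\) and a higher \(R^2\) on the held-out split both point the same way, which is the pattern the interaction model shows relative to the additive model.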

diff --git a/docs/noj_book.visualizing_correlation_matrices.html b/docs/noj_book.visualizing_correlation_matrices.html index e97f1bb..daed3e9 100644 --- a/docs/noj_book.visualizing_correlation_matrices.html +++ b/docs/noj_book.visualizing_correlation_matrices.html @@ -546,7 +546,7 @@

Note the slider control and the tooltips.

Here is an example with an actual correlation matrix.

diff --git a/docs/search.json b/docs/search.json index 88b2e97..73b84f1 100644 --- a/docs/search.json +++ b/docs/search.json @@ -4,7 +4,7 @@ "href": "index.html", "title": "Scinojure Documentation", "section": "", - "text": "1 Preface\nScinojure (“Noj”) is an entry point to the Clojure stack for data & science.\nIt collects a few of the main libraries and documents common ways to use them together.\nSource:\nDeps:\nNote we are using git coordinates at the moment, in order to expose a few relevant features of the current underlying libraries, which are unreleased yet.\nStatus: Most of the underlying libraries are stable. The experimental parts are marked as such. For some of the libraries, we use a branch for an upcoming release. The main current goal is to provide a clear picture of the direction the stack is going towards, expecting most of it to stabilize soon.\nNear term plan - till the end of October 2024 * Work on stabilizing the upcoming releases of the underlying libraries. * Keep documenting core ideas of the underlying librares and ways to combine them in typical workflows. * Keep making the docs generate automatic tests using kindly/check.", + "text": "1 Preface\nScinojure (“Noj”) is an entry point to the Clojure stack for data & science.\nIt collects a few of the main libraries and documents common ways to use them together.\nSource:\nDeps:\nNote we are using git coordinates at the moment, in order to expose a few relevant features of the current underlying libraries, which are unreleased yet.\nStatus: Most of the underlying libraries are stable. The experimental parts are marked as such. For some of the libraries, we use a branch for an upcoming release. 
The main current goal is to provide a clear picture of the direction the stack is going towards, expecting most of it to stabilize soon.\nNear term plan - till the end of October 2024", "crumbs": [ "1  Preface" ] @@ -189,7 +189,7 @@ "href": "noj_book.automl.html#the-metamorph-pipeline-abstraction", "title": "7  AutoML using metamorph pipelines", "section": "", - "text": "(require '[scicloj.metamorph.ml :as ml]\n '[scicloj.metamorph.core :as mm]\n '[tablecloth.api :as tc])\n\n\n\n(def titanic ml-basic/numeric-titanic-data)\n\n\n\n(def splits (first (tc/split->seq titanic)))\n\n\n(def train-ds (:train splits))\n\n\n(def test-ds (:test splits))\n\n\n\n\n(def my-pipeline\n (mm/pipeline\n (ml/model {:model-type :metamorph.ml/dummy-classifier})))\n\n\n\nmy-pipeline\n\n\n#function[clojure.core/partial/fn--5908]\n\n\n\n\n\n(def ctx-after-train\n (my-pipeline {:metamorph/data train-ds\n :metamorph/mode :fit}))\n\n\nctx-after-train\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\nGroup: 0 [711 4]:\n\n\n\n:sex\n:pclass\n:embarked\n:survived\n\n\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n1.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n1.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n1.0\n\n\n...\n...\n...\n...\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n0.0\n1.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n1.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n\n\n\n\n\n\n:metamorph/mode :fit#uuid \"7bfd5db1-572a-4d62-9c5d-c46b508a16fa\" {:model-data {:majority-class 0.0, :distinct-labels (1.0 0.0)}, :options {:model-type :metamorph.ml/dummy-classifier}, :id #uuid \"6797a3d7-1171-44de-a7f4-31d7f3971fda\", :feature-columns [:sex :pclass :embarked], :target-columns [:survived], :target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table 
{\"no\" 0, \"yes\" 1}, :src-column :survived, :result-datatype :float64}}, :scicloj.metamorph.ml/unsupervised? nil}}\n\n(keys ctx-after-train)\n\n\n(:metamorph/data\n :metamorph/mode\n #uuid \"7bfd5db1-572a-4d62-9c5d-c46b508a16fa\")\n\n\n\n(vals ctx-after-train)\n\n(Group: 0 [711 4]:\n\n\n\n:sex\n:pclass\n:embarked\n:survived\n\n\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n1.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n1.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n1.0\n\n\n...\n...\n...\n...\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n0.0\n1.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n1.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n\n:fit\n{:model-data {:majority-class 0.0, :distinct-labels (1.0 0.0)},\n :options {:model-type :metamorph.ml/dummy-classifier},\n :id #uuid \"6797a3d7-1171-44de-a7f4-31d7f3971fda\",\n :feature-columns [:sex :pclass :embarked],\n :target-columns [:survived],\n :target-categorical-maps\n {:survived\n {:lookup-table {\"no\" 0, \"yes\" 1},\n :src-column :survived,\n :result-datatype :float64}},\n :scicloj.metamorph.ml/unsupervised? 
nil}\n)\n\n\n\n(def ctx-after-predict\n (my-pipeline (assoc ctx-after-train\n :metamorph/mode :transform\n :metamorph/data test-ds)))\n\n\nctx-after-predict\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\n_unnamed [178 1]:\n\n\n\n:survived\n\n\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n...\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n\n\n\n\n\n\n:metamorph/mode :transform\n\n\n\n\n\n\n\n\n#uuid \"7bfd5db1-572a-4d62-9c5d-c46b508a16fa\"\n\n\n\n{\n\n\n:feature-columns [:sex :pclass :embarked]\n\n\n:target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {\"no\" 0, \"yes\" 1}, :src-column :survived, :result-datatype :float64}}\n\n\n:target-columns [:survived]\n\n\n:scicloj.metamorph.ml/unsupervised? nil\n\n\n\n\n\n\n\n\n\n:scicloj.metamorph.ml/feature-ds\n\n\n\nGroup: 0 [178 3]:\n\n\n\n:sex\n:pclass\n:embarked\n\n\n\n\n1.0\n1.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n1.0\n2.0\n1.0\n\n\n1.0\n3.0\n0.0\n\n\n1.0\n1.0\n2.0\n\n\n1.0\n2.0\n2.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n0.0\n1.0\n2.0\n\n\n1.0\n3.0\n0.0\n\n\n...\n...\n...\n\n\n0.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n1.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n1.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n1.0\n0.0\n\n\n0.0\n3.0\n2.0\n\n\n\n\n\n\n\n\n\n:model-data {:majority-class 0.0, :distinct-labels (1.0 0.0)}\n\n\n:id #uuid \"6797a3d7-1171-44de-a7f4-31d7f3971fda\"\n\n\n\n\n\n\n\n\n\n:scicloj.metamorph.ml/target-ds\n\n\n\nGroup: 0 [178 1]:\n\n\n\n:survived\n\n\n\n\n1.0\n\n\n1.0\n\n\n1.0\n\n\n0.0\n\n\n1.0\n\n\n1.0\n\n\n0.0\n\n\n1.0\n\n\n0.0\n\n\n0.0\n\n\n...\n\n\n0.0\n\n\n0.0\n\n\n1.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n1.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n\n\n\n\n\n\n\n:options {:model-type :metamorph.ml/dummy-classifier}\n\n\n}\n\n\n\n\n\n}\n\n\n\n(-> ctx-after-predict :metamorph/data 
:survived)\n\n\n#tech.v3.dataset.column<float64>[178]\n:survived\n[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000...]", + "text": "(require '[scicloj.metamorph.ml :as ml]\n '[scicloj.metamorph.core :as mm]\n '[tablecloth.api :as tc])\n\n\n\n(def titanic ml-basic/numeric-titanic-data)\n\n\n\n(def splits (first (tc/split->seq titanic)))\n\n\n(def train-ds (:train splits))\n\n\n(def test-ds (:test splits))\n\n\n\n\n(def my-pipeline\n (mm/pipeline\n (ml/model {:model-type :metamorph.ml/dummy-classifier})))\n\n\n\nmy-pipeline\n\n\n#function[clojure.core/partial/fn--5908]\n\n\n\n\n\n(def ctx-after-train\n (my-pipeline {:metamorph/data train-ds\n :metamorph/mode :fit}))\n\n\nctx-after-train\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\nGroup: 0 [711 4]:\n\n\n\n:sex\n:pclass\n:embarked\n:survived\n\n\n\n\n0.0\n3.0\n0.0\n1.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n1.0\n2.0\n0.0\n1.0\n\n\n1.0\n1.0\n2.0\n1.0\n\n\n1.0\n3.0\n2.0\n1.0\n\n\n0.0\n3.0\n2.0\n1.0\n\n\n0.0\n2.0\n2.0\n0.0\n\n\n1.0\n3.0\n1.0\n1.0\n\n\n1.0\n2.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n...\n...\n...\n...\n\n\n0.0\n3.0\n0.0\n1.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n2.0\n1.0\n\n\n1.0\n3.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n1.0\n2.0\n0.0\n\n\n0.0\n1.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n1.0\n0.0\n1.0\n\n\n1.0\n3.0\n0.0\n1.0\n\n\n\n\n\n\n\n\n:metamorph/mode :fit#uuid \"76ae3036-e5a4-4b10-a26b-73a9db3022f0\" {:model-data {:majority-class 0.0, :distinct-labels (1.0 0.0)}, :options {:model-type :metamorph.ml/dummy-classifier}, :id #uuid \"ba63aa50-8a52-41de-b443-f4724f751c29\", :feature-columns [:sex :pclass :embarked], :target-columns [:survived], :target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {\"no\" 0, \"yes\" 1}, :src-column :survived, :result-datatype :float64}}, :scicloj.metamorph.ml/unsupervised? 
nil}}\n\n(keys ctx-after-train)\n\n\n(:metamorph/data\n :metamorph/mode\n #uuid \"76ae3036-e5a4-4b10-a26b-73a9db3022f0\")\n\n\n\n(vals ctx-after-train)\n\n(Group: 0 [711 4]:\n\n\n\n:sex\n:pclass\n:embarked\n:survived\n\n\n\n\n0.0\n3.0\n0.0\n1.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n1.0\n2.0\n0.0\n1.0\n\n\n1.0\n1.0\n2.0\n1.0\n\n\n1.0\n3.0\n2.0\n1.0\n\n\n0.0\n3.0\n2.0\n1.0\n\n\n0.0\n2.0\n2.0\n0.0\n\n\n1.0\n3.0\n1.0\n1.0\n\n\n1.0\n2.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n...\n...\n...\n...\n\n\n0.0\n3.0\n0.0\n1.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n2.0\n1.0\n\n\n1.0\n3.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n1.0\n2.0\n0.0\n\n\n0.0\n1.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n1.0\n0.0\n1.0\n\n\n1.0\n3.0\n0.0\n1.0\n\n\n\n:fit\n{:model-data {:majority-class 0.0, :distinct-labels (1.0 0.0)},\n :options {:model-type :metamorph.ml/dummy-classifier},\n :id #uuid \"ba63aa50-8a52-41de-b443-f4724f751c29\",\n :feature-columns [:sex :pclass :embarked],\n :target-columns [:survived],\n :target-categorical-maps\n {:survived\n {:lookup-table {\"no\" 0, \"yes\" 1},\n :src-column :survived,\n :result-datatype :float64}},\n :scicloj.metamorph.ml/unsupervised? 
nil}\n)\n\n\n\n(def ctx-after-predict\n (my-pipeline (assoc ctx-after-train\n :metamorph/mode :transform\n :metamorph/data test-ds)))\n\n\nctx-after-predict\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\n_unnamed [178 1]:\n\n\n\n:survived\n\n\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n...\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n0.0\n\n\n\n\n\n\n\n\n:metamorph/mode :transform\n\n\n\n\n\n\n\n\n#uuid \"76ae3036-e5a4-4b10-a26b-73a9db3022f0\"\n\n\n\n{\n\n\n:feature-columns [:sex :pclass :embarked]\n\n\n:target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {\"no\" 0, \"yes\" 1}, :src-column :survived, :result-datatype :float64}}\n\n\n:target-columns [:survived]\n\n\n:scicloj.metamorph.ml/unsupervised? nil\n\n\n\n\n\n\n\n\n\n:scicloj.metamorph.ml/feature-ds\n\n\n\nGroup: 0 [178 3]:\n\n\n\n:sex\n:pclass\n:embarked\n\n\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n1.0\n2.0\n1.0\n\n\n0.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n3.0\n1.0\n\n\n1.0\n1.0\n0.0\n\n\n0.0\n3.0\n2.0\n\n\n0.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n...\n...\n...\n\n\n1.0\n1.0\n2.0\n\n\n0.0\n2.0\n0.0\n\n\n1.0\n1.0\n0.0\n\n\n0.0\n3.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n0.0\n\n\n1.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n1.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n\n\n\n\n\n\n\n:model-data {:majority-class 0.0, :distinct-labels (1.0 0.0)}\n\n\n:id #uuid \"ba63aa50-8a52-41de-b443-f4724f751c29\"\n\n\n\n\n\n\n\n\n\n:scicloj.metamorph.ml/target-ds\n\n\n\nGroup: 0 [178 1]:\n\n\n\n:survived\n\n\n\n\n1.0\n\n\n0.0\n\n\n1.0\n\n\n0.0\n\n\n1.0\n\n\n0.0\n\n\n1.0\n\n\n0.0\n\n\n0.0\n\n\n1.0\n\n\n...\n\n\n1.0\n\n\n0.0\n\n\n1.0\n\n\n0.0\n\n\n0.0\n\n\n1.0\n\n\n0.0\n\n\n0.0\n\n\n1.0\n\n\n1.0\n\n\n0.0\n\n\n\n\n\n\n\n\n\n:options {:model-type :metamorph.ml/dummy-classifier}\n\n\n}\n\n\n\n\n\n}\n\n\n\n(-> ctx-after-predict :metamorph/data 
:survived)\n\n\n#tech.v3.dataset.column<float64>[178]\n:survived\n[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000...]", "crumbs": [ "Tutorials", "7  AutoML using metamorph pipelines" @@ -200,7 +200,7 @@ "href": "noj_book.automl.html#use-metamorph-pipelines-to-do-model-training-with-higher-level-api", "title": "7  AutoML using metamorph pipelines", "section": "7.2 Use metamorph pipelines to do model training with higher level API", - "text": "7.2 Use metamorph pipelines to do model training with higher level API\nAs user of metamorph.ml we do not need to deal with this low-level details of how metamorph works, we have convenience functions which hide this\nThe following code will do the same as train, but return a context object, which contains the trained model, so it will execute the pipeline, and not only create it.\nIt uses a convenience function mm/fit which generates compliant context maps internally and executes the pipeline as well.\nThe ctx acts a collector of everything “learned” during :fit, mainly the trained model, but it could be as well other information learned from the data during :fit and to be applied at :transform .\n\n(def train-ctx\n (mm/fit titanic\n (ml/model {:model-type :metamorph.ml/dummy-classifier})))\n\n(The dummy-classifier model does not have a lot of state, so there is little to see)\n\ntrain-ctx\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\n_unnamed [889 
4]:\n\n\n\n:sex\n:pclass\n:embarked\n:survived\n\n\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n1.0\n2.0\n1.0\n\n\n1.0\n3.0\n0.0\n1.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n1.0\n0.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n1.0\n\n\n1.0\n2.0\n2.0\n1.0\n\n\n...\n...\n...\n...\n\n\n1.0\n2.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n1.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n1.0\n3.0\n0.0\n0.0\n\n\n0.0\n1.0\n2.0\n1.0\n\n\n0.0\n3.0\n1.0\n0.0\n\n\n\n\n\n\n\n\n:metamorph/mode :fit#uuid \"7e92dd57-4cff-4f7b-a32a-889aef90f94e\" {:model-data {:majority-class 1, :distinct-labels (0.0 1.0)}, :options {:model-type :metamorph.ml/dummy-classifier}, :id #uuid \"c6c83a0e-71d3-4663-be2c-50a036e2a556\", :feature-columns [:sex :pclass :embarked], :target-columns [:survived], :target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {\"no\" 0, \"yes\" 1}, :src-column :survived, :result-datatype :float64}}, :scicloj.metamorph.ml/unsupervised? 
nil}}\nTo show the power of pipelines, I start with doing the simplest possible pipeline, and expand then on it.\nwe can already chain train and test with usual functions:\n\n(->>\n (ml/train train-ds {:model-type :metamorph.ml/dummy-classifier})\n (ml/predict test-ds)\n :survived)\n\n\n#tech.v3.dataset.column<float64>[178]\n:survived\n[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000...]\n\nthe same with pipelines\n\n(def pipeline\n (mm/pipeline (ml/model {:model-type :metamorph.ml/dummy-classifier})))\n\n\n(->>\n (mm/fit-pipe train-ds pipeline)\n (mm/transform-pipe test-ds pipeline)\n :metamorph/data :survived)\n\n\n#tech.v3.dataset.column<float64>[178]\n:survived\n[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000...]", + "text": "7.2 Use metamorph pipelines to do model training with higher level API\nAs user of metamorph.ml we do not need to deal with this low-level details of how metamorph works, we have convenience functions which hide this\nThe following code will do the same as train, but return a context object, which contains the trained model, so it will execute the pipeline, and not only create it.\nIt uses a convenience function mm/fit which generates compliant context maps internally and executes the pipeline as well.\nThe ctx acts a collector of everything “learned” during :fit, mainly the trained model, but it could be as well other information learned from the data during :fit and to be applied at :transform .\n\n(def train-ctx\n (mm/fit titanic\n (ml/model {:model-type :metamorph.ml/dummy-classifier})))\n\n(The dummy-classifier model does not have a lot of state, so there is little to see)\n\ntrain-ctx\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\n_unnamed [889 
4]:\n\n\n\n:sex\n:pclass\n:embarked\n:survived\n\n\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n1.0\n2.0\n1.0\n\n\n1.0\n3.0\n0.0\n1.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n1.0\n0.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n1.0\n\n\n1.0\n2.0\n2.0\n1.0\n\n\n...\n...\n...\n...\n\n\n1.0\n2.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n1.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n1.0\n3.0\n0.0\n0.0\n\n\n0.0\n1.0\n2.0\n1.0\n\n\n0.0\n3.0\n1.0\n0.0\n\n\n\n\n\n\n\n\n:metamorph/mode :fit#uuid \"358900d7-7db0-4859-ad38-b7e20e13b138\" {:model-data {:majority-class 1, :distinct-labels (0.0 1.0)}, :options {:model-type :metamorph.ml/dummy-classifier}, :id #uuid \"12433ffd-0ed8-4d0d-b286-f1a62ed6a197\", :feature-columns [:sex :pclass :embarked], :target-columns [:survived], :target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {\"no\" 0, \"yes\" 1}, :src-column :survived, :result-datatype :float64}}, :scicloj.metamorph.ml/unsupervised? 
nil}}\nTo show the power of pipelines, I start with doing the simplest possible pipeline, and expand then on it.\nwe can already chain train and test with usual functions:\n\n(->>\n (ml/train train-ds {:model-type :metamorph.ml/dummy-classifier})\n (ml/predict test-ds)\n :survived)\n\n\n#tech.v3.dataset.column<float64>[178]\n:survived\n[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000...]\n\nthe same with pipelines\n\n(def pipeline\n (mm/pipeline (ml/model {:model-type :metamorph.ml/dummy-classifier})))\n\n\n(->>\n (mm/fit-pipe train-ds pipeline)\n (mm/transform-pipe test-ds pipeline)\n :metamorph/data :survived)\n\n\n#tech.v3.dataset.column<float64>[178]\n:survived\n[0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000...]", "crumbs": [ "Tutorials", "7  AutoML using metamorph pipelines" @@ -266,7 +266,7 @@ "href": "noj_book.interactions_ols.html#additive-model", "title": "8  Ordinary least squares with interactions", "section": "", - "text": "(def linear-model-config {:model-type :fastmath/ols})\n\n\n(def additive-pipeline\n (mm/pipeline\n {:metamorph/id :model}\n (ml/model linear-model-config)))\n\n\n\n(def evaluations\n (ml/evaluate-pipelines\n [additive-pipeline]\n (tc/split->seq preprocessed-data :holdout)\n loss/rmse\n :loss\n {:other-metrices [{:name :r2\n :metric-fn fmstats/r2-determination}]}))\n\n\n\n\n(-> evaluations flatten first :fit-ctx :model ml/tidy)\n\n\n_unnamed [3 5]:\n\n\n\n:term\n:statistic\n:estimate\n:p.value\n:std.error\n\n\n\n\n:sales\n6.86719675\n3.18088388\n2.43049580E-10\n0.46319976\n\n\n:youtube\n26.35671251\n0.04669554\n0.00000000E+00\n0.00177168\n\n\n:facebook\n18.09196597\n0.18932568\n0.00000000E+00\n0.01046463\n\n\n\n\n\n\n\n(-> evaluations flatten first :test-transform :metric)\n\n\n1.7317173303241642\n\n\n\n(-> evaluations flatten first :test-transform 
:other-metrices first :metric)\n\n\n0.9163376341056388", + "text": "(def linear-model-config {:model-type :fastmath/ols})\n\n\n(def additive-pipeline\n (mm/pipeline\n {:metamorph/id :model}\n (ml/model linear-model-config)))\n\n\n\n(def evaluations\n (ml/evaluate-pipelines\n [additive-pipeline]\n (tc/split->seq preprocessed-data :holdout)\n loss/rmse\n :loss\n {:other-metrices [{:name :r2\n :metric-fn fmstats/r2-determination}]}))\n\n\n\n\n(-> evaluations flatten first :fit-ctx :model ml/tidy)\n\n\n_unnamed [3 5]:\n\n\n\n:term\n:statistic\n:estimate\n:p.value\n:std.error\n\n\n\n\n:sales\n7.48297302\n3.37089670\n9.68780611E-12\n0.45047559\n\n\n:youtube\n25.13895997\n0.04597775\n0.00000000E+00\n0.00182894\n\n\n:facebook\n18.79068352\n0.19490296\n0.00000000E+00\n0.01037232\n\n\n\n\n\n\n\n(-> evaluations flatten first :test-transform :metric)\n\n\n1.7909556496625239\n\n\n\n(-> evaluations flatten first :test-transform :other-metrices first :metric)\n\n\n0.9041582942916887", "crumbs": [ "Tutorials", "8  Ordinary least squares with interactions" @@ -277,7 +277,7 @@ "href": "noj_book.interactions_ols.html#interaction-effects", "title": "8  Ordinary least squares with interactions", "section": "8.2 Interaction effects", - "text": "8.2 Interaction effects\nNow we add interaction effects to it, resulting in this model equation: \\[sales = b0 + b1 * youtube + b2 * facebook + b3 * (youtube * facebook)\\]\n\n(def pipe-interaction\n (mm/pipeline\n (tcpipe/add-column :youtube*facebook (fn [ds] (tcc/* (ds :youtube) (ds :facebook))))\n {:metamorph/id :model} (ml/model linear-model-config)))\n\nAgain we evaluate the model,\n\n(def evaluations\n (ml/evaluate-pipelines\n [pipe-interaction]\n (tc/split->seq preprocessed-data :holdout)\n loss/rmse\n :loss\n {:other-metrices [{:name :r2\n :metric-fn fmstats/r2-determination}]}))\n\nand print it and the performance metrics:\n\n(-> evaluations flatten first :fit-ctx :model ml/tidy)\n\n\n_unnamed [4 
5]:\n\n\n\n\n\n\n\n\n\n\n:term\n:statistic\n:estimate\n:p.value\n:std.error\n\n\n\n\n:sales\n24.50466654\n7.54319957\n0.0000000\n0.30782706\n\n\n:youtube\n13.53420578\n0.02123244\n0.0000000\n0.00156880\n\n\n:facebook\n4.43385792\n0.04147090\n0.0000196\n0.00935323\n\n\n:youtube*facebook\n18.57505074\n0.00085682\n0.0000000\n0.00004613\n\n\n\n\nAs the multiplcation of youtube*facebook is as well statistically relevant, it suggests that there is indeed an interaction between these 2 predictor variables youtube and facebook.\n\\(RMSE\\)\n\n(-> evaluations flatten first :test-transform :metric)\n\n\n1.3122729270606561\n\n\\(R^2\\)\n\n(-> evaluations flatten first :test-transform :other-metrices first :metric)\n\n\n0.9521743826158947\n\n\\(RMSE\\) and \\(R^2\\) of the intercation model are sligtly better.\nThese results suggest that the model with the interaction term is better than the model that contains only main effects. So, for this specific data, we should go for the model with the interaction model.\n\nsource: notebooks/noj_book/interactions_ols.clj", + "text": "8.2 Interaction effects\nNow we add interaction effects to it, resulting in this model equation: \\[sales = b0 + b1 * youtube + b2 * facebook + b3 * (youtube * facebook)\\]\n\n(def pipe-interaction\n (mm/pipeline\n (tcpipe/add-column :youtube*facebook (fn [ds] (tcc/* (ds :youtube) (ds :facebook))))\n {:metamorph/id :model} (ml/model linear-model-config)))\n\nAgain we evaluate the model,\n\n(def evaluations\n (ml/evaluate-pipelines\n [pipe-interaction]\n (tc/split->seq preprocessed-data :holdout)\n loss/rmse\n :loss\n {:other-metrices [{:name :r2\n :metric-fn fmstats/r2-determination}]}))\n\nand print it and the performance metrics:\n\n(-> evaluations flatten first :fit-ctx :model ml/tidy)\n\n\n_unnamed [4 
5]:\n\n\n\n\n\n\n\n\n\n\n:term\n:statistic\n:estimate\n:p.value\n:std.error\n\n\n\n\n:sales\n22.62882064\n8.01586402\n0.00000000\n0.35423251\n\n\n:youtube\n10.49300383\n0.01908187\n0.00000000\n0.00181853\n\n\n:facebook\n2.89330761\n0.03115180\n0.00447662\n0.01076685\n\n\n:youtube*facebook\n17.10327854\n0.00092831\n0.00000000\n0.00005428\n\n\n\n\nAs the multiplcation of youtube*facebook is as well statistically relevant, it suggests that there is indeed an interaction between these 2 predictor variables youtube and facebook.\n\\(RMSE\\)\n\n(-> evaluations flatten first :test-transform :metric)\n\n\n1.097119133397347\n\n\\(R^2\\)\n\n(-> evaluations flatten first :test-transform :other-metrices first :metric)\n\n\n0.9761354566466389\n\n\\(RMSE\\) and \\(R^2\\) of the intercation model are sligtly better.\nThese results suggest that the model with the interaction term is better than the model that contains only main effects. So, for this specific data, we should go for the model with the interaction model.\n\nsource: notebooks/noj_book/interactions_ols.clj", "crumbs": [ "Tutorials", "8  Ordinary least squares with interactions" diff --git a/test/noj_book/automl_generated_test.clj b/test/noj_book/automl_generated_test.clj index ab7f9c7..0e593ec 100644 --- a/test/noj_book/automl_generated_test.clj +++ b/test/noj_book/automl_generated_test.clj @@ -218,14 +218,14 @@ flatten (map (fn* - [p1__65397#] + [p1__65827#] (hash-map :options - (-> p1__65397# :test-transform :ctx :model :options) + (-> p1__65827# :test-transform :ctx :model :options) :used-features - (-> p1__65397# :fit-ctx :used-features) + (-> p1__65827# :fit-ctx :used-features) :mean-accuracy - (-> p1__65397# :test-transform :mean)))) + (-> p1__65827# :test-transform :mean)))) tc/dataset))) @@ -374,9 +374,9 @@ test70 (is ((fn* - [p1__65398#] + [p1__65828#] (-> - p1__65398# + p1__65828# tc/rows (= [[[:sex :pclass :embarked] diff --git a/test/noj_book/ml_basic_generated_test.clj 
b/test/noj_book/ml_basic_generated_test.clj index 4cb906a..b523c07 100644 --- a/test/noj_book/ml_basic_generated_test.clj +++ b/test/noj_book/ml_basic_generated_test.clj @@ -71,12 +71,12 @@ var17 (map (fn* - [p1__65265#] + [p1__65695#] (hash-map :col-name - p1__65265# + p1__65695# :values - (distinct (get titanic p1__65265#)))) + (distinct (get titanic p1__65695#)))) categorical-feature-columns))