diff --git a/docs/noj_book.automl.html b/docs/noj_book.automl.html
index 9dbfd31..1322efd 100644
--- a/docs/noj_book.automl.html
+++ b/docs/noj_book.automl.html
@@ -279,7 +279,7 @@
-author: Carsten Behring
+Author: Carsten Behring
In this tutorial we see how to use metamorph.ml to perform automatic machine learning. By AutoML we mean trying many different models and hyperparameters and relying on automatic validation to pick the best-performing model.
ns noj-book.automl
@@ -328,7 +328,7 @@ ( my-pipeline
-#object[clojure.core$partial$fn__5927 0x50ea5c9d "clojure.core$partial$fn__5927@50ea5c9d"]
+#object[clojure.core$partial$fn__5927 0x7597bcc7 "clojure.core$partial$fn__5927@7597bcc7"]
This function is metamorph compliant, so it takes a map, as in (my-pipeline {}), and returns a map.
But this map cannot be “arbitrary”; it needs to adhere to the metamorph conventions.
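As a rough illustration of what "takes a map and returns a map" means, here is a minimal plain-Clojure sketch. It does not use the metamorph library: `toy-ctx` and `drop-key-step` are hypothetical names, and a vector of maps stands in for a real dataset. Only the `:metamorph/data` and `:metamorph/mode` keys come from the tutorial itself.

```clojure
;; Hypothetical stand-in: a vector of row maps instead of a real dataset.
(def toy-ctx
  {:metamorph/data [{:sex 0.0 :survived 0.0}
                    {:sex 1.0 :survived 1.0}]
   :metamorph/mode :fit})

;; A compliant step takes the ctx map and returns a ctx map.
;; This one only rewrites the data and leaves every other key alone.
(defn drop-key-step [k]
  (fn [ctx]
    (update ctx :metamorph/data
            (fn [rows] (mapv #(dissoc % k) rows)))))

(def toy-result ((drop-key-step :sex) toy-ctx))
```

The step leaves `:metamorph/mode` untouched and only changes what sits under `:metamorph/data`, which is the shape of a "stateless" operation.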
#uuid "2da779d6-e18b-4fd2-aa60-f13ff9134d29" {:model-data {:majority-class 1.0, :distinct-labels (0.0 1.0)}, :options {:model-type :metamorph.ml/dummy-classifier}, :id #uuid "4649cb2e-22d9-4b08-bcea-16df5f84911c", :feature-columns [:sex :pclass :embarked], :target-columns [:survived], :target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {"no" 0, "yes" 1}, :src-column :survived, :result-datatype :float64}}, :scicloj.metamorph.ml/unsupervised? nil}
}
+:metamorph/mode :fit
#uuid "fc7de623-27ba-49b7-b044-8c5b6c5a0271" {:model-data {:majority-class 1.0, :distinct-labels (0.0 1.0)}, :options {:model-type :metamorph.ml/dummy-classifier}, :id #uuid "636ad0f4-4c49-4d42-a887-f947b9a288c5", :feature-columns [:sex :pclass :embarked], :target-columns [:survived], :target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {"no" 0, "yes" 1}, :src-column :survived, :result-datatype :float64}}, :scicloj.metamorph.ml/unsupervised? nil}
}
The ctx contains a lot of information, so I only show its top-level keys:
(keys ctx-after-train)
(:metamorph/data
 :metamorph/mode
- #uuid "2da779d6-e18b-4fd2-aa60-f13ff9134d29")
This context map has the “data”, the “mode” and a UUID for each operation (we had only one in this pipeline).
{:model-data {:majority-class 1.0, :distinct-labels (0.0 1.0)},
:options {:model-type :metamorph.ml/dummy-classifier},
- :id #uuid "4649cb2e-22d9-4b08-bcea-16df5f84911c",
+ :id #uuid "636ad0f4-4c49-4d42-a887-f947b9a288c5",
:feature-columns [:sex :pclass :embarked],
:target-columns [:survived],
:target-categorical-maps
@@ -690,7 +690,7 @@
(:metamorph/data
 :metamorph/mode
- #uuid "2da779d6-e18b-4fd2-aa60-f13ff9134d29")
+ #uuid "fc7de623-27ba-49b7-b044-8c5b6c5a0271")
For the dummy-model we do not see a trained-model, but it “communicates” the majority class from the training data for use in prediction. So the dummy-model has “learned” the majority class from its training data.
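The "learn during :fit, reuse during :transform" idea can be sketched in plain Clojure. This is not the dummy-classifier implementation: `majority-step` and the toy rows are hypothetical. It also assumes that the pipeline normally supplies a per-step id under `:metamorph/id`; the id is set by hand here, as a keyword rather than the UUIDs seen in the output.

```clojure
;; Plain-Clojure sketch of a stateful step; names and data are hypothetical.
(defn majority-step [target]
  (fn [{:metamorph/keys [id data mode] :as ctx}]
    (case mode
      ;; :fit -> learn the most frequent target value, keep it under this step's id
      :fit (assoc ctx id {:majority-class
                          (key (apply max-key val (frequencies (map target data))))})
      ;; :transform -> overwrite the target with the value learned during :fit
      :transform (let [m (get-in ctx [id :majority-class])]
                   (assoc ctx :metamorph/data
                          (mapv #(assoc % target m) data))))))

(def step (majority-step :survived))

(def fitted
  (step {:metamorph/id   :step-1   ; assumption: normally injected by the pipeline
         :metamorph/mode :fit
         :metamorph/data [{:survived 1.0} {:survived 1.0} {:survived 0.0}]}))

(def predicted
  (step (assoc fitted
               :metamorph/mode :transform
               :metamorph/data [{:survived nil} {:survived nil}])))
```

The learned value lives in the ctx under the step's own key, so nothing is stored globally and the same step function can be reused across independent runs.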
So we can get the prediction result out of the ctx:
@@ -723,7 +723,7 @@
(:metamorph/data
 :metamorph/mode
- #uuid "ebaae731-dd9a-44d2-be0b-9b2010f582fc")
To show the power of pipelines, I start with the simplest possible pipeline and then expand on it.
We can already chain train and test with ordinary functions:
@@ -788,19 +788,19 @@
(mm/pipeline ops-1)
-#object[clojure.core$partial$fn__5927 0x74228714 "clojure.core$partial$fn__5927@74228714"]
+#object[clojure.core$partial$fn__5927 0x4ab1e66e "clojure.core$partial$fn__5927@4ab1e66e"]
(mm/pipeline ops-2)
-#object[clojure.core$partial$fn__5927 0x2c5adb25 "clojure.core$partial$fn__5927@2c5adb25"]
+#object[clojure.core$partial$fn__5927 0x7eee8280 "clojure.core$partial$fn__5927@7eee8280"]
(mm/pipeline ops-3)
-#object[clojure.core$partial$fn__5927 0x68da9bd6 "clojure.core$partial$fn__5927@68da9bd6"]
+#object[clojure.core$partial$fn__5927 0x7bdb1918 "clojure.core$partial$fn__5927@7bdb1918"]
All three can be called as functions taking a dataset wrapped in a ctx.
Pipelines as data are supported as well.
@@ -811,7 +811,7 @@
(mm/->pipeline op-spec)
-#object[clojure.core$partial$fn__5927 0x72cc9983 "clojure.core$partial$fn__5927@72cc9983"]
+#object[clojure.core$partial$fn__5927 0x79747166 "clojure.core$partial$fn__5927@79747166"]
Creating these functions does not execute anything yet; they are functions that can be executed against a context as part of a metamorph pipeline. Execution is triggered like this:
Note the slider control and the tooltips.
Here is an example with an actual correlation matrix.
diff --git a/docs/search.json b/docs/search.json index 0e1f3f2..69f5c45 100644 --- a/docs/search.json +++ b/docs/search.json @@ -200,7 +200,7 @@ "href": "noj_book.automl.html#the-metamorph-pipeline-abstraction", "title": "8 AutoML using metamorph pipelines", "section": "", - "text": "(require '[scicloj.metamorph.ml :as ml]\n '[scicloj.metamorph.core :as mm]\n '[tablecloth.api :as tc])\n\n\n\n(def titanic ml-basic/numeric-titanic-data)\n\n\n\n(def splits (first (tc/split->seq titanic)))\n\n\n(def train-ds (:train splits))\n\n\n(def test-ds (:test splits))\n\n\n\n\n(def my-pipeline\n (mm/pipeline\n (ml/model {:model-type :metamorph.ml/dummy-classifier})))\n\n\n\nmy-pipeline\n\n\n#object[clojure.core$partial$fn__5927 0x50ea5c9d \"clojure.core$partial$fn__5927@50ea5c9d\"]\n\n\n\n\n\n(def ctx-after-train\n (my-pipeline {:metamorph/data train-ds\n :metamorph/mode :fit}))\n\n\nctx-after-train\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\nGroup: 0 [711 4]:\n\n\n\n:sex\n:pclass\n:embarked\n:survived\n\n\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n2.0\n0.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n2.0\n0.0\n\n\n1.0\n3.0\n1.0\n1.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n...\n...\n...\n...\n\n\n0.0\n3.0\n1.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n1.0\n\n\n1.0\n2.0\n0.0\n1.0\n\n\n0.0\n1.0\n2.0\n1.0\n\n\n0.0\n1.0\n2.0\n0.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n2.0\n0.0\n1.0\n\n\n\n\n\n\n\n\n:metamorph/mode :fit#uuid \"2da779d6-e18b-4fd2-aa60-f13ff9134d29\" {:model-data {:majority-class 1.0, :distinct-labels (0.0 1.0)}, :options {:model-type :metamorph.ml/dummy-classifier}, :id #uuid \"4649cb2e-22d9-4b08-bcea-16df5f84911c\", :feature-columns [:sex :pclass :embarked], :target-columns [:survived], :target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {\"no\" 0, \"yes\" 1}, :src-column 
:survived, :result-datatype :float64}}, :scicloj.metamorph.ml/unsupervised? nil}}\n\n\n(keys ctx-after-train)\n\n\n(:metamorph/data\n :metamorph/mode\n #uuid \"2da779d6-e18b-4fd2-aa60-f13ff9134d29\")\n\n\n\n(vals ctx-after-train)\n\n(Group: 0 [711 4]:\n\n\n\n:sex\n:pclass\n:embarked\n:survived\n\n\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n2.0\n0.0\n\n\n0.0\n1.0\n0.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n2.0\n0.0\n\n\n1.0\n3.0\n1.0\n1.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n...\n...\n...\n...\n\n\n0.0\n3.0\n1.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n0.0\n1.0\n\n\n1.0\n2.0\n0.0\n1.0\n\n\n0.0\n1.0\n2.0\n1.0\n\n\n0.0\n1.0\n2.0\n0.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n2.0\n0.0\n1.0\n\n\n\n:fit\n{:model-data {:majority-class 1.0, :distinct-labels (0.0 1.0)},\n :options {:model-type :metamorph.ml/dummy-classifier},\n :id #uuid \"4649cb2e-22d9-4b08-bcea-16df5f84911c\",\n :feature-columns [:sex :pclass :embarked],\n :target-columns [:survived],\n :target-categorical-maps\n {:survived\n {:lookup-table {\"no\" 0, \"yes\" 1},\n :src-column :survived,\n :result-datatype :float64}},\n :scicloj.metamorph.ml/unsupervised? 
nil}\n)\n\n\n\n(def ctx-after-predict\n (my-pipeline (assoc ctx-after-train\n :metamorph/mode :transform\n :metamorph/data test-ds)))\n\n\n(keys ctx-after-predict)\n\n\n(:metamorph/data\n :metamorph/mode\n #uuid \"2da779d6-e18b-4fd2-aa60-f13ff9134d29\")\n\n\n\n\n(-> ctx-after-predict :metamorph/data :survived)\n\n\n#tech.v3.dataset.column<float64>[178]\n:survived\n[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000...]", + "text": "(require '[scicloj.metamorph.ml :as ml]\n '[scicloj.metamorph.core :as mm]\n '[tablecloth.api :as tc])\n\n\n\n(def titanic ml-basic/numeric-titanic-data)\n\n\n\n(def splits (first (tc/split->seq titanic)))\n\n\n(def train-ds (:train splits))\n\n\n(def test-ds (:test splits))\n\n\n\n\n(def my-pipeline\n (mm/pipeline\n (ml/model {:model-type :metamorph.ml/dummy-classifier})))\n\n\n\nmy-pipeline\n\n\n#object[clojure.core$partial$fn__5927 0x7597bcc7 \"clojure.core$partial$fn__5927@7597bcc7\"]\n\n\n\n\n\n(def ctx-after-train\n (my-pipeline {:metamorph/data train-ds\n :metamorph/mode :fit}))\n\n\nctx-after-train\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\nGroup: 0 [711 4]:\n\n\n\n:sex\n:pclass\n:embarked\n:survived\n\n\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n2.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n2.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n...\n...\n...\n...\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n3.0\n1.0\n0.0\n\n\n1.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n1.0\n\n\n1.0\n1.0\n2.0\n1.0\n\n\n0.0\n3.0\n1.0\n0.0\n\n\n1.0\n1.0\n2.0\n1.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n1.0\n3.0\n1.0\n1.0\n\n\n\n\n\n\n\n\n:metamorph/mode :fit#uuid \"fc7de623-27ba-49b7-b044-8c5b6c5a0271\" {:model-data {:majority-class 1.0, :distinct-labels (0.0 1.0)}, :options {:model-type :metamorph.ml/dummy-classifier}, :id #uuid 
\"636ad0f4-4c49-4d42-a887-f947b9a288c5\", :feature-columns [:sex :pclass :embarked], :target-columns [:survived], :target-categorical-maps {:survived #tech.v3.dataset.categorical.CategoricalMap{:lookup-table {\"no\" 0, \"yes\" 1}, :src-column :survived, :result-datatype :float64}}, :scicloj.metamorph.ml/unsupervised? nil}}\n\n\n(keys ctx-after-train)\n\n\n(:metamorph/data\n :metamorph/mode\n #uuid \"fc7de623-27ba-49b7-b044-8c5b6c5a0271\")\n\n\n\n(vals ctx-after-train)\n\n(Group: 0 [711 4]:\n\n\n\n:sex\n:pclass\n:embarked\n:survived\n\n\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n2.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n2.0\n0.0\n1.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n3.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n...\n...\n...\n...\n\n\n0.0\n3.0\n0.0\n0.0\n\n\n0.0\n2.0\n0.0\n0.0\n\n\n0.0\n3.0\n1.0\n0.0\n\n\n1.0\n3.0\n0.0\n0.0\n\n\n1.0\n3.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n1.0\n\n\n1.0\n1.0\n2.0\n1.0\n\n\n0.0\n3.0\n1.0\n0.0\n\n\n1.0\n1.0\n2.0\n1.0\n\n\n1.0\n1.0\n0.0\n1.0\n\n\n1.0\n3.0\n1.0\n1.0\n\n\n\n:fit\n{:model-data {:majority-class 1.0, :distinct-labels (0.0 1.0)},\n :options {:model-type :metamorph.ml/dummy-classifier},\n :id #uuid \"636ad0f4-4c49-4d42-a887-f947b9a288c5\",\n :feature-columns [:sex :pclass :embarked],\n :target-columns [:survived],\n :target-categorical-maps\n {:survived\n {:lookup-table {\"no\" 0, \"yes\" 1},\n :src-column :survived,\n :result-datatype :float64}},\n :scicloj.metamorph.ml/unsupervised? 
nil}\n)\n\n\n\n(def ctx-after-predict\n (my-pipeline (assoc ctx-after-train\n :metamorph/mode :transform\n :metamorph/data test-ds)))\n\n\n(keys ctx-after-predict)\n\n\n(:metamorph/data\n :metamorph/mode\n #uuid \"fc7de623-27ba-49b7-b044-8c5b6c5a0271\")\n\n\n\n\n(-> ctx-after-predict :metamorph/data :survived)\n\n\n#tech.v3.dataset.column<float64>[178]\n:survived\n[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000...]", "crumbs": [ "Tutorials", "8 AutoML using metamorph pipelines" @@ -211,7 +211,7 @@ "href": "noj_book.automl.html#use-metamorph-pipelines-to-do-model-training-with-higher-level-api", "title": "8 AutoML using metamorph pipelines", "section": "8.2 Use metamorph pipelines to do model training with higher level API", - "text": "8.2 Use metamorph pipelines to do model training with higher level API\nAs user of metamorph.ml we do not need to deal with this low-level details of how metamorph works, we have convenience functions which hide this.\nThe following code will do the same as train, but return a context object, which contains the trained model, so it will execute the pipeline, and not only create it.\nIt uses a convenience function mm/fit which generates compliant context maps internally and executes the pipeline as well.\nThe ctx acts a collector of everything “learned” during :fit, mainly the trained model, but it could be as well other information learned from the data during :fit and to be applied at :transform .\n\n(def train-ctx\n (mm/fit titanic\n (ml/model {:model-type :metamorph.ml/dummy-classifier})))\n\n(The dummy-classifier model does not have a lot of state, so there is little to see)\n\n(keys train-ctx)\n\n\n(:metamorph/data\n :metamorph/mode\n #uuid \"ebaae731-dd9a-44d2-be0b-9b2010f582fc\")\n\nTo show the power of pipelines, I start with doing the simplest possible pipeline, and expand then on it.\nWe can already chain train and test with usual 
functions:\n\n(->>\n (ml/train train-ds {:model-type :metamorph.ml/dummy-classifier})\n (ml/predict test-ds)\n :survived)\n\n\n#tech.v3.dataset.column<float64>[178]\n:survived\n[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000...]\n\nthe same with pipelines\n\n(def pipeline\n (mm/pipeline (ml/model {:model-type :metamorph.ml/dummy-classifier})))\n\n\n(->>\n (mm/fit-pipe train-ds pipeline)\n (mm/transform-pipe test-ds pipeline)\n :metamorph/data :survived)\n\n\n#tech.v3.dataset.column<float64>[178]\n:survived\n[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000...]", + "text": "8.2 Use metamorph pipelines to do model training with higher level API\nAs user of metamorph.ml we do not need to deal with this low-level details of how metamorph works, we have convenience functions which hide this.\nThe following code will do the same as train, but return a context object, which contains the trained model, so it will execute the pipeline, and not only create it.\nIt uses a convenience function mm/fit which generates compliant context maps internally and executes the pipeline as well.\nThe ctx acts a collector of everything “learned” during :fit, mainly the trained model, but it could be as well other information learned from the data during :fit and to be applied at :transform .\n\n(def train-ctx\n (mm/fit titanic\n (ml/model {:model-type :metamorph.ml/dummy-classifier})))\n\n(The dummy-classifier model does not have a lot of state, so there is little to see)\n\n(keys train-ctx)\n\n\n(:metamorph/data\n :metamorph/mode\n #uuid \"03fc4a16-6977-4018-adb6-6173f6cf13fd\")\n\nTo show the power of pipelines, I start with doing the simplest possible pipeline, and expand then on it.\nWe can already chain train and test with usual functions:\n\n(->>\n (ml/train train-ds {:model-type 
:metamorph.ml/dummy-classifier})\n (ml/predict test-ds)\n :survived)\n\n\n#tech.v3.dataset.column<float64>[178]\n:survived\n[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000...]\n\nthe same with pipelines\n\n(def pipeline\n (mm/pipeline (ml/model {:model-type :metamorph.ml/dummy-classifier})))\n\n\n(->>\n (mm/fit-pipe train-ds pipeline)\n (mm/transform-pipe test-ds pipeline)\n :metamorph/data :survived)\n\n\n#tech.v3.dataset.column<float64>[178]\n:survived\n[1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000...]", "crumbs": [ "Tutorials", "8 AutoML using metamorph pipelines" @@ -222,7 +222,7 @@ "href": "noj_book.automl.html#create-metamorph-compliant-functions", "title": "8 AutoML using metamorph pipelines", "section": "8.3 Create metamorph compliant functions", - "text": "8.3 Create metamorph compliant functions\nAs said before, a metamorph pipeline is composed of metamorph compliant functions / operations, which take as input and output the ctx. 
There are three ways to create those.\nThe following three expressions create the same metamorph compliant function\n\nimplementing a metamorph compliant function directly via anonymous function\n\n\n(def ops-1\n (fn [ctx]\n (assoc ctx :metamorph/data\n (tc/drop-columns (:metamorph/data ctx) [:embarked]))))\n\n\nusing mm/lift which does the same as 1.\n\n\n(def ops-2 (mm/lift tc/drop-columns [:embarked]))\n\n\nusing a name-space containing lifted functions\n\n\n(require '[tablecloth.pipeline])\n\n\n(def ops-3 (tablecloth.pipeline/drop-columns [:embarked]))\n\nAll three create the same pipeline op and can be used to make a pipeline\n\n(mm/pipeline ops-1)\n\n\n#object[clojure.core$partial$fn__5927 0x74228714 \"clojure.core$partial$fn__5927@74228714\"]\n\n\n(mm/pipeline ops-2)\n\n\n#object[clojure.core$partial$fn__5927 0x2c5adb25 \"clojure.core$partial$fn__5927@2c5adb25\"]\n\n\n(mm/pipeline ops-3)\n\n\n#object[clojure.core$partial$fn__5927 0x68da9bd6 \"clojure.core$partial$fn__5927@68da9bd6\"]\n\nAll three can be called as function taking a dataset iwrapped in a ctx\nPipeline as data is as well supported\n\n(def op-spec [[ml/model {:model-type :metamorph.ml/dummy-classifier}]])\n\n\n(mm/->pipeline op-spec)\n\n\n#object[clojure.core$partial$fn__5927 0x72cc9983 \"clojure.core$partial$fn__5927@72cc9983\"]\n\nCreating these functions does not yet execute anything, they are functions which can be executed against a context as part of a metamorph pipeline. 
Executions are triggered like this:\n\n(ops-1 {:metamorph/data titanic})\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\n_unnamed [889 3]:\n\n\n\n:sex\n:pclass\n:survived\n\n\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n1.0\n1.0\n\n\n1.0\n3.0\n1.0\n\n\n1.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n1.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n1.0\n\n\n1.0\n2.0\n1.0\n\n\n...\n...\n...\n\n\n1.0\n2.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n1.0\n1.0\n1.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n\n\n\n\n\n\n}\n\n(ops-2 {:metamorph/data titanic})\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\n_unnamed [889 3]:\n\n\n\n:sex\n:pclass\n:survived\n\n\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n1.0\n1.0\n\n\n1.0\n3.0\n1.0\n\n\n1.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n1.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n1.0\n\n\n1.0\n2.0\n1.0\n\n\n...\n...\n...\n\n\n1.0\n2.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n1.0\n1.0\n1.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n\n\n\n\n\n\n}\n\n(ops-3 {:metamorph/data titanic})\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\n_unnamed [889 3]:\n\n\n\n:sex\n:pclass\n:survived\n\n\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n1.0\n1.0\n\n\n1.0\n3.0\n1.0\n\n\n1.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n1.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n1.0\n\n\n1.0\n2.0\n1.0\n\n\n...\n...\n...\n\n\n1.0\n2.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n1.0\n1.0\n1.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n\n\n\n\n\n\n}\nThe mm/lift function transforms any dataset->dataset function into a ctx->ctx function, while using the metamorph convention, as required for metamorph pipeline operations\nFor convenience tablecloth contains a ns where all dataset->dataset 
functions are lifted into ctx->ctx operations, so can be added to pipelines directly without using lift.\nSo a metamorph pipeline can encapsulate arbitrary transformation of a dataset in the 2 modes. They can be “stateless” (only chaining the dataset, such as drop-columns) or “state-full”, so they store data in the ctx during :fit and can use it in :transform. In the pipeline above, the trained model is stored in this way.\nThis state is not stored globally, but inside the pipeline so this makes pipeline execution “isolated”.\nSo now we can add more operations to the pipeline, and nothing else changes, for example drop columns.", + "text": "8.3 Create metamorph compliant functions\nAs said before, a metamorph pipeline is composed of metamorph compliant functions / operations, which take as input and output the ctx. There are three ways to create those.\nThe following three expressions create the same metamorph compliant function\n\nimplementing a metamorph compliant function directly via anonymous function\n\n\n(def ops-1\n (fn [ctx]\n (assoc ctx :metamorph/data\n (tc/drop-columns (:metamorph/data ctx) [:embarked]))))\n\n\nusing mm/lift which does the same as 1.\n\n\n(def ops-2 (mm/lift tc/drop-columns [:embarked]))\n\n\nusing a name-space containing lifted functions\n\n\n(require '[tablecloth.pipeline])\n\n\n(def ops-3 (tablecloth.pipeline/drop-columns [:embarked]))\n\nAll three create the same pipeline op and can be used to make a pipeline\n\n(mm/pipeline ops-1)\n\n\n#object[clojure.core$partial$fn__5927 0x4ab1e66e \"clojure.core$partial$fn__5927@4ab1e66e\"]\n\n\n(mm/pipeline ops-2)\n\n\n#object[clojure.core$partial$fn__5927 0x7eee8280 \"clojure.core$partial$fn__5927@7eee8280\"]\n\n\n(mm/pipeline ops-3)\n\n\n#object[clojure.core$partial$fn__5927 0x7bdb1918 \"clojure.core$partial$fn__5927@7bdb1918\"]\n\nAll three can be called as function taking a dataset iwrapped in a ctx\nPipeline as data is as well supported\n\n(def op-spec [[ml/model {:model-type 
:metamorph.ml/dummy-classifier}]])\n\n\n(mm/->pipeline op-spec)\n\n\n#object[clojure.core$partial$fn__5927 0x79747166 \"clojure.core$partial$fn__5927@79747166\"]\n\nCreating these functions does not yet execute anything, they are functions which can be executed against a context as part of a metamorph pipeline. Executions are triggered like this:\n\n(ops-1 {:metamorph/data titanic})\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\n_unnamed [889 3]:\n\n\n\n:sex\n:pclass\n:survived\n\n\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n1.0\n1.0\n\n\n1.0\n3.0\n1.0\n\n\n1.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n1.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n1.0\n\n\n1.0\n2.0\n1.0\n\n\n...\n...\n...\n\n\n1.0\n2.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n1.0\n1.0\n1.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n\n\n\n\n\n\n}\n\n(ops-2 {:metamorph/data titanic})\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\n_unnamed [889 3]:\n\n\n\n:sex\n:pclass\n:survived\n\n\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n1.0\n1.0\n\n\n1.0\n3.0\n1.0\n\n\n1.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n1.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n1.0\n\n\n1.0\n2.0\n1.0\n\n\n...\n...\n...\n\n\n1.0\n2.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n1.0\n1.0\n1.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n\n\n\n\n\n\n}\n\n(ops-3 {:metamorph/data titanic})\n\n{\n\n\n\n\n\n\n\n\n:metamorph/data\n\n\n\n_unnamed [889 
3]:\n\n\n\n:sex\n:pclass\n:survived\n\n\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n1.0\n1.0\n\n\n1.0\n3.0\n1.0\n\n\n1.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n0.0\n1.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n1.0\n\n\n1.0\n2.0\n1.0\n\n\n...\n...\n...\n\n\n1.0\n2.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n0.0\n3.0\n0.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n2.0\n0.0\n\n\n1.0\n1.0\n1.0\n\n\n1.0\n3.0\n0.0\n\n\n0.0\n1.0\n1.0\n\n\n0.0\n3.0\n0.0\n\n\n\n\n\n\n\n\n}\nThe mm/lift function transforms any dataset->dataset function into a ctx->ctx function, while using the metamorph convention, as required for metamorph pipeline operations\nFor convenience tablecloth contains a ns where all dataset->dataset functions are lifted into ctx->ctx operations, so can be added to pipelines directly without using lift.\nSo a metamorph pipeline can encapsulate arbitrary transformation of a dataset in the 2 modes. They can be “stateless” (only chaining the dataset, such as drop-columns) or “state-full”, so they store data in the ctx during :fit and can use it in :transform. In the pipeline above, the trained model is stored in this way.\nThis state is not stored globally, but inside the pipeline so this makes pipeline execution “isolated”.\nSo now we can add more operations to the pipeline, and nothing else changes, for example drop columns.", "crumbs": [ "Tutorials", "8 AutoML using metamorph pipelines" diff --git a/notebooks/noj_book/automl.clj b/notebooks/noj_book/automl.clj index a809441..a49c918 100644 --- a/notebooks/noj_book/automl.clj +++ b/notebooks/noj_book/automl.clj @@ -1,6 +1,6 @@ ;; # AutoML using metamorph pipelines -;; author: Carsten Behring +;; Author: Carsten Behring ;; In this tutorial we see how to use `metamorph.ml` to perform automatic machine learning. ;; With AutoML we mean to try lots of different models and hyper parameters and rely on automatic