mlflow-spark-summit-2019 - pyspark

Overview

  • PySpark Decision Tree Classification example
  • Source: train.py and predict.py
  • Experiment name: pyspark

Train

Unmanaged (without mlflow run)

To run train.py directly with its standard main function:

spark-submit --master local[2] train.py --max_depth 16 --max_bins 32
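
train.py presumably follows the standard MLlib decision tree pipeline and logs parameters, metrics, and the fitted model to MLflow. A minimal sketch of that flow, assuming the usual column names, the experiment name from the Overview, and a "spark-model" artifact path (the data path and helper structure are illustrative, not the exact contents of train.py):

import mlflow
import mlflow.spark
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.feature import StringIndexer, VectorIndexer

def train(data_path, max_depth, max_bins):
    spark = SparkSession.builder.appName("DecisionTreeClassification").getOrCreate()
    mlflow.set_experiment("pyspark")  # honored here; ignored when launched with mlflow run
    data = spark.read.format("libsvm").load(data_path)
    train_data, test_data = data.randomSplit([0.7, 0.3])

    with mlflow.start_run():
        mlflow.log_param("max_depth", max_depth)
        mlflow.log_param("max_bins", max_bins)

        # Index labels and categorical features, then fit the decision tree
        label_indexer = StringIndexer(inputCol="label", outputCol="indexedLabel").fit(data)
        feature_indexer = VectorIndexer(inputCol="features", outputCol="indexedFeatures",
                                        maxCategories=4).fit(data)
        dt = DecisionTreeClassifier(labelCol="indexedLabel", featuresCol="indexedFeatures",
                                    maxDepth=max_depth, maxBins=max_bins)
        model = Pipeline(stages=[label_indexer, feature_indexer, dt]).fit(train_data)

        # Evaluate on the held-out split, then log the metric and the model
        predictions = model.transform(test_data)
        evaluator = MulticlassClassificationEvaluator(labelCol="indexedLabel",
                                                      predictionCol="prediction",
                                                      metricName="accuracy")
        mlflow.log_metric("accuracy", evaluator.evaluate(predictions))
        mlflow.spark.log_model(model, "spark-model")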

Using mlflow run

These runs use the MLproject file. For more details, see the MLflow documentation on Running Projects.

Note that mlflow run ignores the set_experiment() call, so you must specify the experiment with the --experiment-id argument.

mlflow run local

mlflow run . -P max_depth=3 -P max_bins=24 --experiment-id=2019

mlflow run github

mlflow run https://github.com/amesar/mlflow-fun.git#examples/pyspark \
  -P max_depth=3 -P max_bins=24 \
  --experiment-id=2019

Predict

See predict.py.
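
Roughly, predict.py loads the model logged by a training run and scores a dataset with it. A minimal sketch, assuming the model was logged under the "spark-model" artifact path used in the training sketch above (the data path is illustrative):

import sys
import mlflow.spark
from pyspark.sql import SparkSession

if __name__ == "__main__":
    run_id = sys.argv[1]
    spark = SparkSession.builder.appName("Predict").getOrCreate()
    data = spark.read.format("libsvm").load("sample_libsvm_data.txt")  # illustrative path

    # Load the Spark ML pipeline model logged by the training run
    model = mlflow.spark.load_model("runs:/{}/spark-model".format(run_id))
    predictions = model.transform(data)
    predictions.printSchema()
    predictions.select("prediction", "indexedLabel", "probability").show(10, False)

To score the model from a specific run, pass its run ID: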

run_id=7b951173284249f7a3b27746450ac7b0
spark-submit --master local[2] predict.py $run_id

Predictions:
root
 |-- label: double (nullable = true)
 |-- features: vector (nullable = true)
 |-- indexedLabel: double (nullable = false)
 |-- indexedFeatures: vector (nullable = true)
 |-- rawPrediction: vector (nullable = true)
 |-- probability: vector (nullable = true)
 |-- prediction: double (nullable = false)

+----------+------------+-----------+
|prediction|indexedLabel|probability|
+----------+------------+-----------+
|0.0       |1.0         |[1.0,0.0]  |
|1.0       |0.0         |[0.0,1.0]  |
|1.0       |0.0         |[0.0,1.0]  |
+----------+------------+-----------+