sklearn-bot

The sklearn-bot automatically runs scikit-learn classifiers on OpenML tasks. It is an improved version of the sklearn-bot that was created for the Parameter IMPortance project.

Usage

Two use cases are currently supported: running on a single task and running on a benchmark suite. We also describe how to retrieve the results from OpenML once they have been uploaded.

Single Task

To run the sklearn-bot on a single task from OpenML, please use the following command:

python examples/run_on_task.py --task_id 5 --openml_server https://www.openml.org/api/v1/ --openml_apikey abcdef --upload_result

The following command line options are accepted:

n_executions: the number of runs to execute before terminating. Defaults to 1000.
task_id: the OpenML task id to run on.
openml_server: the server to upload results to. Because the sklearn-bot is still in beta, results are uploaded to the test server by default; use this option to override that.
openml_apikey: the API key to authenticate yourself with. It can be found on your OpenML profile.
classifier_name: the classifier to run. By default, a random classifier is selected. Choose one from the list returned by sklearnbot.config_spaces.get_available_config_spaces(False).
output_dir: the local directory where results are stored before uploading.
upload_result: by default, the sklearn-bot only stores the runs locally. If this flag is set, the runs are uploaded to the server and the local files are deleted.
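
The options above can be summarized as a command line parser. The following is a minimal sketch, not the actual contents of examples/run_on_task.py: the function name build_parser, the test server URL, and the default for output_dir are assumptions made for illustration, while the default of 1000 runs and the flag semantics follow the descriptions above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (illustration only)."""
    parser = argparse.ArgumentParser(
        description="Run the sklearn-bot on a single OpenML task")
    # Documented default: 1000 runs.
    parser.add_argument("--n_executions", type=int, default=1000)
    parser.add_argument("--task_id", type=int, required=True)
    # Assumption: this URL stands in for the OpenML test server default.
    parser.add_argument("--openml_server", type=str,
                        default="https://test.openml.org/api/v1/")
    parser.add_argument("--openml_apikey", type=str, default=None)
    # None is used here to mean "pick a random classifier".
    parser.add_argument("--classifier_name", type=str, default=None)
    # Assumption: the real default directory is not stated in this README.
    parser.add_argument("--output_dir", type=str, default="sklearnbot_output")
    # Flag: when absent, results are only stored locally.
    parser.add_argument("--upload_result", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--task_id", "5", "--upload_result"])
print(args.task_id, args.n_executions, args.upload_result)
```

Passing --upload_result without a value is enough to switch from local storage to uploading, which matches how the flag is used in the command above.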

Additionally, the sklearn-bot can be run on an OpenML benchmark suite, for example the OpenML100. The sklearn-bot will execute n_executions runs, each time selecting a task at random from the full set of tasks in the suite.

Benchmark suite

To run the sklearn-bot on a benchmark suite from OpenML, please use the following command:

python examples/run_on_study.py --study_id OpenML100 --openml_server https://www.openml.org/api/v1/ --openml_apikey abcdef --upload_result

This script accepts the same command line options as run_on_task, except for task_id. Additionally:

study_id (int or string): the ID of the study on which the sklearn-bot is run.
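
The behavior on a benchmark suite, where each of the n_executions runs picks a task at random from the suite, can be sketched as follows. This is an illustrative sketch only: the function sample_tasks and the task ids are made up, and the real bot would fetch the task list from the study and train a classifier on each sampled task.

```python
import random
from collections import Counter


def sample_tasks(task_ids, n_executions, seed=None):
    """Draw one task id uniformly at random for each execution."""
    rng = random.Random(seed)
    return [rng.choice(task_ids) for _ in range(n_executions)]


# Made-up task ids standing in for the tasks of a benchmark suite.
suite_tasks = [3, 6, 11, 12, 14]
schedule = sample_tasks(suite_tasks, n_executions=1000, seed=0)

# With uniform sampling, every task should receive roughly the same number of runs.
print(Counter(schedule))
```

Because tasks are drawn with replacement, the same task can be selected many times, and a small n_executions may leave some tasks without any run.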

Obtaining results

Usually, the sklearn-bot is run so that the results can be reused in one way or another. Once the results have been stored on OpenML, it is important to be able to retrieve them. Although this is slightly outside the scope of the sklearn-bot, the following command obtains all results that were created using the given search space. Note that this can include results from other people who ran the sklearn-bot, or who happened to run a scikit-learn classifier with hyperparameter settings that fell within the search range.

To obtain results from the sklearn-bot that were uploaded to OpenML (using the --upload_result flag), please use the following command:

python examples/obtain_results.py --study_id OpenML100 --openml_server https://www.openml.org/api/v1/ --scoring predictive_accuracy --classifier_name decision_tree

The following command line options are accepted:

output_directory: The directory where the results are written as an ARFF file. Cache files that allow fast regeneration of the datasets are also stored here.
num_runs: The number of runs per task to obtain. Setting this lower than the number of available runs allows for efficient caching.
study_id: The benchmark suite, which determines the tasks that are included.
scoring: The performance measure to download. Defaults to predictive_accuracy; area_under_roc_curve, f_measure and precision are also sensible options.
normalize (flag): If set, all performance results will be normalized to the interval [0, 1] per task.
openml_server: The server from which the results should be downloaded. Make sure this is the same as the server to which the results were uploaded.
openml_apikey: The API key to authenticate yourself with. It can be found on your OpenML profile. (Although most operations only download results, a single POST request to the server is needed as well.)
classifier_name: The classifier for which the results should be downloaded. Make sure that this is the same as the classifier with which the bot was run. Note that obtaining the results of all classifiers at once is not supported yet.
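
To illustrate what the normalize flag means, here is a minimal sketch of per-task normalization to [0, 1]. It assumes min-max scaling, which this README does not confirm; the function normalize_per_task and the scores are made up for illustration.

```python
def normalize_per_task(scores_by_task):
    """Min-max scale the scores of each task independently to [0, 1]."""
    normalized = {}
    for task_id, scores in scores_by_task.items():
        lowest, highest = min(scores), max(scores)
        span = highest - lowest
        # If all scores of a task are equal, map them to 0.0 to avoid dividing by zero.
        normalized[task_id] = [
            (score - lowest) / span if span > 0 else 0.0 for score in scores
        ]
    return normalized


# Made-up accuracy values for two tasks.
raw_scores = {5: [0.5, 0.75, 1.0], 6: [0.9, 0.9]}
print(normalize_per_task(raw_scores))
```

Scaling per task makes results comparable across tasks of different difficulty, at the cost of discarding the absolute performance level on each task.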

Feature Requests

The following features will gradually be added to the sklearn-bot (contributors are welcome):

Non-static pipelines. Although the current definition of a pipeline is fixed, we aim to add the notion of non-static pipelines. This should be incorporated in a modular way.
More sampling methods. Currently, the sklearn-bot samples uniformly from a set of tasks; it would be great if it could sample according to the number of runs per task present on OpenML.
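
As a rough idea of what the requested sampling method could look like, the sketch below weights each task by the inverse of its existing run count, so that tasks with few runs are picked more often. This is only one possible interpretation of "sampling according to the number of runs per task"; the function sample_task_weighted, the weighting scheme, and the run counts are assumptions and are not part of the sklearn-bot.

```python
import random


def sample_task_weighted(runs_per_task, rng):
    """Pick a task id, favoring tasks that have fewer runs so far."""
    task_ids = list(runs_per_task)
    # Adding 1 keeps the weight finite for tasks that have no runs yet.
    weights = [1.0 / (1 + runs_per_task[task_id]) for task_id in task_ids]
    return rng.choices(task_ids, weights=weights, k=1)[0]


# Made-up run counts: task 3 has no runs yet, task 6 already has many.
runs_per_task = {3: 0, 6: 999}
rng = random.Random(0)
picks = [sample_task_weighted(runs_per_task, rng) for _ in range(1000)]
print(picks.count(3), picks.count(6))
```

In practice the run counts would have to be queried from OpenML before sampling, and they would need refreshing as new runs are uploaded.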

Dependencies

OpenML-Python - Base functionality for connecting with OpenML.
OpenML-Python-Contrib - Not on PyPI yet. Provides convenience functions to obtain the results from OpenML. The bot itself does not rely on this.
Scikit-learn - Version 0.20.0 and up.
ConfigSpace - For defining search spaces.
