- The MLflow API returns uneven error codes, especially when calling the Databricks tracking server (some 404s should be 403s, etc.). MLflow Export Import makes a best effort to recover, to avoid terminating the overall export/import, and to indicate the root cause.
- If the run linked to a registered model version does not exist (has been deleted), the version is not exported, since on import `MlflowClient.create_model_version` requires a run ID.
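  For illustration, a minimal sketch of this guard, assuming a hypothetical `src_version` dict parsed from the export (not the tool's actual internals):

  ```python
  from mlflow.tracking import MlflowClient
  from mlflow.exceptions import MlflowException

  client = MlflowClient()

  def import_version(model_name, src_version):
      """Skip the version if its source run no longer exists."""
      try:
          run = client.get_run(src_version["run_id"])  # raises if the run was deleted
      except MlflowException:
          print(f"WARNING: skipping a version of '{model_name}': its run does not exist")
          return None
      return client.create_model_version(
          name=model_name,
          source=src_version["source"],  # artifact URI of the logged model
          run_id=run.info.run_id,
      )
  ```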
- There are no MLflow API guarantees that model version numbers are preserved. Version numbers are monotonically increasing, read-only integers generated by MLflow (see the sketch below). However, you should be able to get the same sequence of imported version numbers if you:
  - Have not deleted any versions
  - Are exporting all versions
  - Are not using the `use-threads` option
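  To see why, note that `create_model_version` has no parameter for the version number; the server assigns it. A minimal sketch (model name and source URI are hypothetical placeholders):

  ```python
  from mlflow.tracking import MlflowClient

  client = MlflowClient()
  client.create_registered_model("my-model")  # hypothetical name
  mv = client.create_model_version("my-model", source="runs:/<run_id>/model")
  print(mv.version)  # assigned by the server; there is no argument to set it
  ```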
- Run tag values are always exported as a `string`, even if they are an `int`, since `MlflowClient.get_run()` does not return tag type information.
- Custom experiment artifact location importing is not supported. The destination artifact location is generated as a default by MLflow. This is due to a number of non-trivial reasons, especially when importing into Databricks MLflow. A solution is possible, but would require non-trivial effort due to complex semantics.
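  A minimal sketch of the distinction, assuming a hypothetical experiment name: `MlflowClient.create_experiment` does accept an `artifact_location`, but the import tools do not pass the source value through, so the destination server generates its default.

  ```python
  from mlflow.tracking import MlflowClient

  client = MlflowClient()

  # Not done on import: preserving the source location.
  # client.create_experiment("my-exp", artifact_location="s3://bucket/custom/path")

  # What effectively happens: the server chooses its default artifact root.
  exp_id = client.create_experiment("my-exp")
  print(client.get_experiment(exp_id).artifact_location)
  ```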
- Importing from a file-based `--backend-store-uri` implementation is not supported, since it does not have the same semantics as a database-based implementation (e.g. primary key constraints are not respected, the model registry is not implemented, etc.). This is not a limitation of mlflow-export-import but rather of the MLflow file-based implementation, which is not meant for production.
- Nested runs are only supported when you import an experiment. For a run, it is still a TODO.
A Databricks MLflow run is associated with a notebook that generated the model.
There are two types of Databricks notebooks:
- Workspace notebooks. Classical notebooks whose source of truth is a notebook revision (version) stored in the Databricks internal database. Every run has MLflow system tags (starting with `mlflow.databricks.notebook`) that point to the notebook and its revision:
  - `mlflow.databricks.notebookPath`
  - `mlflow.databricks.notebookRevisionID`
- Git Repo notebooks. Git-based (Repos) notebooks whose source of truth is a git repo. The corresponding tags are:
  - `mlflow.databricks.gitRepoUrl`
  - `mlflow.databricks.gitRepoCommit`
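For illustration, a minimal sketch (run ID hypothetical) that inspects these tags to determine which notebook type produced a run:

```python
from mlflow.tracking import MlflowClient

tags = MlflowClient().get_run("<run_id>").data.tags  # hypothetical run ID

if "mlflow.databricks.gitRepoUrl" in tags:  # Git Repo notebook
    print(tags["mlflow.databricks.gitRepoUrl"], tags["mlflow.databricks.gitRepoCommit"])
else:  # workspace notebook
    print(tags.get("mlflow.databricks.notebookPath"),
          tags.get("mlflow.databricks.notebookRevisionID"))
```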
- The notebook revision associated with a run can be exported. It is stored as an artifact in the run's `notebooks` artifact directory.
- You can save the notebook in the supported SOURCE, HTML, JUPYTER and DBC formats.
- Examples: `notebooks/notebook.dbc` or `notebooks/notebook.source`.
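  A minimal sketch, assuming MLflow 2.x, of retrieving the exported notebook from the run's artifacts (run ID hypothetical):

  ```python
  import mlflow

  local_dir = mlflow.artifacts.download_artifacts(
      run_id="<run_id>",          # hypothetical run ID
      artifact_path="notebooks",  # contains e.g. notebook.dbc, notebook.source
  )
  print(local_dir)
  ```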
- Partial functionality due to Databricks REST API limitations.
- The Databricks REST API does not support:
- Importing a notebook with its revision history.
- Linking an imported run with its associated imported notebook revision.
  - The API does, however, allow you to export a notebook revision (undocumented), though it is simply a notebook with one revision.
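  For reference, a minimal sketch of the documented Workspace API export (`GET /api/2.0/workspace/export`), which returns the current revision only; the host, token and notebook path are assumptions:

  ```python
  import base64
  import os
  import requests

  resp = requests.get(
      f"{os.environ['DATABRICKS_HOST']}/api/2.0/workspace/export",
      headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
      params={"path": "/Users/someone@example.com/my-notebook", "format": "SOURCE"},
  )
  resp.raise_for_status()
  notebook_source = base64.b64decode(resp.json()["content"])  # latest revision only
  ```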
- When you import a run, the link in its source `notebookRevisionID` tag will be dead and you cannot access the notebook from the MLflow UI.
- The notebook is exported as a run artifact for convenience.
- As a convenience, the import tools allow you to import the exported notebook into a Databricks workspace directory with the `--dst-notebook-dir` option. See [import-run](README_single.md#Import-run) or [import-experiment](README_single.md#Import-experiment). However, there is no API endpoint to link that notebook to its run.
- You must export a notebook in the SOURCE format for the notebook to be imported.
- When importing an MLflow run or registered model, the semantics of the `user_id` attribute differ between OSS and Databricks MLflow:
  - OSS MLflow. You can override the source `user_id` with whatever value you want.
  - Databricks MLflow. You cannot set the `user_id`; it will be based upon the personal access token (PAT) of the importing user.
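  A minimal sketch of the OSS behavior, assuming a hypothetical experiment name: the `mlflow.user` system tag supplied at run creation becomes the run's `user_id` on OSS MLflow, while Databricks derives it from the caller's PAT instead.

  ```python
  from mlflow.tracking import MlflowClient

  client = MlflowClient()
  exp_id = client.create_experiment("user-id-demo")  # hypothetical experiment
  run = client.create_run(exp_id, tags={"mlflow.user": "original_author"})
  print(client.get_run(run.info.run_id).info.user_id)  # 'original_author' on OSS MLflow
  ```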