
MLflow Export Import Limitations

General Limitations

  • The MLflow API returns inconsistent error codes, especially when calling the Databricks tracking server (some 404s should be 403s, etc.). MLflow Export Import makes a best effort to recover, avoid terminating the overall export/import, and surface the root cause.

  • If the run linked to a registered model version no longer exists (it has been deleted), the version is not exported, since MlflowClient.create_model_version requires a run ID at import time.
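The filtering logic above can be sketched in plain Python. The data structures here are hypothetical stand-ins for what the MLflow client would return; they only illustrate why a deleted run blocks a version from being exported:

```python
# Hypothetical exported version records and the set of run IDs that still
# exist in the tracking server. A version whose run was deleted cannot be
# re-created at import time, because create_model_version needs a live run ID.
exported_versions = [
    {"version": 1, "run_id": "run_a"},
    {"version": 2, "run_id": "run_b"},   # run_b was deleted
    {"version": 3, "run_id": "run_c"},
]
existing_runs = {"run_a", "run_c"}

# Only versions backed by an existing run are importable.
importable = [v for v in exported_versions if v["run_id"] in existing_runs]
```

Version 2 is silently dropped here, which mirrors the behavior described above.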

  • There are no MLflow API guarantees that model version numbers are preserved. Version numbers are monotonically increasing read-only integers generated by MLflow. However, you should be able to get the same sequence of imported version numbers if you satisfy the following conditions:

    • You have not deleted any versions
    • You are exporting all versions
    • You are not using the use-threads option
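A toy model of why these conditions matter: the registry assigns version numbers as a monotonically increasing counter, so a sequential (non-threaded) import of every undeleted version reproduces the original numbering. The `ToyRegistry` class is purely illustrative, not the MLflow implementation:

```python
# Toy stand-in for the model registry's version counter (illustrative only).
class ToyRegistry:
    def __init__(self):
        self.next_version = 1
        self.versions = []

    def create_model_version(self, run_id):
        v = self.next_version          # versions are assigned in call order
        self.next_version += 1
        self.versions.append((v, run_id))
        return v

# Importing all versions sequentially, in source order, yields 1, 2, 3 again.
src_order = ["run_a", "run_b", "run_c"]
dst = ToyRegistry()
new_numbers = [dst.create_model_version(r) for r in src_order]
```

If a source version had been deleted, or threads reordered the calls, the destination numbers would no longer line up with the source.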
  • Run tag values are always exported as strings, even if they are ints, since MlflowClient.get_run() does not return tag type information.
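If you need numeric tag values back after import, one option is a best-effort conversion. The tag names below are hypothetical; the dict stands in for `run.data.tags` as returned by the client:

```python
# Tags come back from MlflowClient.get_run() as strings, even when the
# original values were numeric (tag names here are hypothetical).
exported_tags = {"epochs": "20", "lr": "0.01", "note": "baseline"}

def restore_type(value: str):
    """Best-effort conversion back to int or float; falls back to str."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

restored = {k: restore_type(v) for k, v in exported_tags.items()}
```

This is lossy by nature: a tag that was intentionally the string "20" is indistinguishable from the int 20 once exported.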

  • Custom experiment artifact locations are not supported on import. The imported experiment's artifact location is generated by MLflow as a default. This is due to a number of non-trivial reasons, especially when importing into Databricks MLflow. A solution is possible, but would require non-trivial effort due to complex semantics.

  • Importing from a file-based --backend-store-uri implementation is not supported, since it does not have the same semantics as a database-based implementation (e.g. primary key constraints are not respected, the model registry is not implemented, etc.). This is not a limitation of mlflow-export-import but rather of the MLflow file-based implementation, which is not meant for production.

  • Nested runs are only supported when you import an experiment. For an individual run, nested-run support is still a TODO.

Databricks Limitations

A Databricks MLflow run is associated with a notebook that generated the model.

There are two types of Databricks notebooks:

  • Workspace notebooks. Classical notebooks whose source of truth is a notebook revision (version) stored in the Databricks internal database. Every such run has MLflow system tags (starting with mlflow.databricks.notebook) that point to the notebook and its revision:
    • mlflow.databricks.notebookPath
    • mlflow.databricks.notebookRevisionID
  • Git Repo notebooks. Git-based (Repos) notebooks whose source of truth is a git repo. Runs have tags that point to the repo URL and commit:
    • mlflow.databricks.gitRepoUrl
    • mlflow.databricks.gitRepoCommit
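The tags above can be used to tell which kind of notebook produced a run. A minimal sketch, where `run_tags` is a hypothetical tag dict of the kind found in `run.data.tags`:

```python
# Hypothetical tags from a run produced by a classical workspace notebook.
run_tags = {
    "mlflow.databricks.notebookPath": "/Users/someone/train",
    "mlflow.databricks.notebookRevisionID": "1234567890",
}

def notebook_kind(tags: dict) -> str:
    """Classify a run's source notebook from its Databricks system tags."""
    if "mlflow.databricks.gitRepoUrl" in tags:
        return "repo"
    if "mlflow.databricks.notebookPath" in tags:
        return "workspace"
    return "unknown"
```

Checking the git tag first matters, since a Repos notebook may carry a notebook path as well.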

Workspace Notebooks

Exporting Workspace Notebook Revisions

  • The notebook revision associated with a run can be exported. It is stored as an artifact in the run's notebooks artifact directory.
  • You can save the notebook in the supported SOURCE, HTML, JUPYTER, and DBC formats.
  • Examples: notebooks/notebook.dbc or notebooks/notebook.source.

Importing Workspace Notebooks

  • Partial functionality due to Databricks REST API limitations.
  • The Databricks REST API does not support:
    • Importing a notebook with its revision history.
    • Linking an imported run with its associated imported notebook revision.
  • The API does, however, allow you to export a notebook revision (undocumented), but the result is simply a notebook with one revision.
  • When you import a run, the link in its notebookRevisionID tag will be dead, and you cannot access the notebook from the MLflow UI.
  • The notebook is exported as a run artifact for convenience.
  • As a convenience, the import tools allow you to import the exported notebook into a Databricks workspace directory with the --dst-notebook-dir option. See [import-run](README_single.md#Import-run) or [import-experiment](README_single.md#Import-experiment). However, there is no API endpoint to link that notebook to its run.
  • You must export a notebook in the SOURCE format for the notebook to be imported.

User ID

  • When importing an MLflow run or registered model, the semantics of the user_id attribute differ between OSS and Databricks MLflow.
    • OSS MLflow. You can override the source user_id with whatever value you want.
    • Databricks MLflow. You cannot set the user_id; it is derived from the personal access token (PAT) of the importing user.
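A toy illustration of this difference. Both functions below are hypothetical stand-ins for the two import paths, not real client calls; they only model which value wins:

```python
# OSS MLflow: the importer can carry over the source user_id
# (in practice, via the mlflow.user system tag).
def import_run_oss(tags: dict, src_user: str) -> str:
    tags = dict(tags)
    tags["mlflow.user"] = src_user       # override is honored
    return tags["mlflow.user"]

# Databricks MLflow: the supplied value is ignored; the user_id is
# derived from the PAT of whoever performs the import.
def import_run_databricks(tags: dict, src_user: str, pat_user: str) -> str:
    return pat_user                      # override is ignored; PAT wins

oss_user = import_run_oss({}, "alice")
dbx_user = import_run_databricks({}, "alice", "bob@company.com")
```

The practical consequence is that runs imported into Databricks all appear to belong to the importing user, not the original authors.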