Remote-only packaging of MLMD Python lib
tarilabs committed Jan 29, 2024
1 parent bf5c8d3 commit dfe7252
Showing 10 changed files with 213 additions and 189 deletions.
128 changes: 4 additions & 124 deletions README.md
@@ -1,126 +1,6 @@
A remote-only, gRPC-only MLMD Python client variant.

# ML Metadata
## See also:

[![Python](https://img.shields.io/badge/python%20-3.8%7C3.9%7C3.10-blue)](https://github.com/google/ml-metadata)
[![PyPI](https://badge.fury.io/py/ml-metadata.svg)](https://badge.fury.io/py/ml-metadata)

*ML Metadata (MLMD)* is a library for recording and retrieving metadata
associated with ML developer and data scientist workflows.

NOTE: ML Metadata may be backwards incompatible before version 1.0.

## Getting Started

For more background on MLMD and instructions on using it, see the
[getting started guide](https://github.com/google/ml-metadata/blob/master/g3doc/get_started.md)

## Installing from PyPI

The recommended way to install ML Metadata is to use the
[PyPI package](https://pypi.org/project/ml-metadata/):

```bash
pip install ml-metadata
```

Then import the relevant packages:

```python
from ml_metadata import metadata_store
from ml_metadata.proto import metadata_store_pb2
```
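A minimal connection sketch follows (an assumption for illustration: an MLMD gRPC server is already reachable on `localhost:8080`, as in the testing files further below):

```python
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

# Point the client at a remote MLMD gRPC server (host/port are placeholders).
client_config = metadata_store_pb2.MetadataStoreClientConfig()
client_config.host = 'localhost'
client_config.port = 8080

store = metadata_store.MetadataStore(client_config)
print(store.get_artifact_types())  # smoke test: list the registered types
```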

### Nightly Packages

ML Metadata (MLMD) also hosts nightly packages at
https://pypi-nightly.tensorflow.org on Google Cloud. To install the latest
nightly package, please use the following command:

```bash
pip install --extra-index-url https://pypi-nightly.tensorflow.org/simple ml-metadata
```

## Installing with Docker

This is the recommended way to build ML Metadata under Linux, and is
continuously tested at Google.

Please first install `docker` and `docker-compose` by following the directions:
[docker](https://docs.docker.com/install/);
[docker-compose](https://docs.docker.com/compose/install/).

Then, run the following at the project root:

```bash
DOCKER_SERVICE=manylinux-python${PY_VERSION}
sudo docker-compose build ${DOCKER_SERVICE}
sudo docker-compose run ${DOCKER_SERVICE}
```

where `PY_VERSION` is one of `{38, 39, 310}`.

A wheel will be produced under `dist/`, and can be installed as follows:

```shell
pip install dist/*.whl
```

## Installing from source


### 1. Prerequisites

To compile and use ML Metadata, you need to set up some prerequisites.


#### Install Bazel

If Bazel is not installed on your system, install it now by following [these
directions](https://bazel.build/versions/master/docs/install.html).

#### Install cmake
If cmake is not installed on your system, install it now by following [these
directions](https://cmake.org/install/).

### 2. Clone ML Metadata repository

```shell
git clone https://github.com/google/ml-metadata
cd ml-metadata
```

Note that these instructions will install the latest master branch of ML
Metadata. If you want to install a specific branch (such as a release branch),
pass `-b <branchname>` to the `git clone` command.

### 3. Build the pip package

ML Metadata uses Bazel to build the pip package from source:

```shell
python setup.py bdist_wheel
```

You can find the generated `.whl` file in the `dist` subdirectory.

### 4. Install the pip package

```shell
pip install dist/*.whl
```

### 5. (Optional) Build the gRPC server

ML Metadata uses Bazel to build the C++ binary from source:

```shell
bazel build -c opt --define grpc_no_ares=true //ml_metadata/metadata_store:metadata_store_server
```

## Supported platforms

MLMD is built and tested on the following 64-bit operating systems:

* macOS 10.14.6 (Mojave) or later.
* Ubuntu 20.04 or later.
* Windows 10 or later.
Upstream project: https://github.com/google/ml-metadata
Motivations for this client variant: https://github.com/opendatahub-io/model-registry/blob/main/doc/remote_only_packaging_of_MLMD_Python_lib.md
6 changes: 6 additions & 0 deletions ml_metadata-1.14.0-remote-testing/conn_config.pb
@@ -0,0 +1,6 @@
connection_config {
  sqlite {
    filename_uri: '/tmp/shared/metadata.sqlite.db'
    connection_mode: READWRITE_OPENCREATE
  }
}
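For reference, this text-format file parses as an MLMD `MetadataStoreServerConfig` (presumably handed to the gRPC server, e.g. via its `--metadata_store_server_config_file` flag); a minimal sketch of reading it from Python, assuming the repository-relative path:

```python
from google.protobuf import text_format
from ml_metadata.proto import metadata_store_pb2

# Repository-relative path from this commit; adjust as needed.
with open('ml_metadata-1.14.0-remote-testing/conn_config.pb') as f:
    server_config = text_format.Parse(
        f.read(), metadata_store_pb2.MetadataStoreServerConfig())

print(server_config.connection_config.sqlite.filename_uri)
```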
163 changes: 163 additions & 0 deletions ml_metadata-1.14.0-remote-testing/demo_test.py
@@ -0,0 +1,163 @@
from pprint import pprint

import ml_metadata as mlmd
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

def test_demo():
    # Set up the gRPC client config
    client_connection_config = metadata_store_pb2.MetadataStoreClientConfig()
    client_connection_config.host = 'localhost'
    client_connection_config.port = 8080

    store = metadata_store.MetadataStore(client_connection_config)

    # Create ArtifactTypes, e.g., DataSet
    data_type = metadata_store_pb2.ArtifactType()
    data_type.name = "DataSet"
    data_type.properties["day"] = metadata_store_pb2.INT
    data_type.properties["split"] = metadata_store_pb2.STRING
    data_type_id = store.put_artifact_type(data_type)
    pprint(data_type_id)

    # Create ArtifactTypes, e.g., SavedModel
    model_type = metadata_store_pb2.ArtifactType()
    model_type.name = "SavedModel"
    model_type.properties["version"] = metadata_store_pb2.INT
    model_type.properties["name"] = metadata_store_pb2.STRING
    model_type_id = store.put_artifact_type(model_type)
    pprint(model_type_id)

    # Create a ContextType, e.g., ModelVersion
    model_version_type = metadata_store_pb2.ContextType()
    model_version_type.name = "odh.ModelVersion"
    model_version_type.properties["model_name"] = metadata_store_pb2.STRING
    model_version_type.properties["version"] = metadata_store_pb2.STRING
    model_version_type_id = store.put_context_type(model_version_type)
    pprint(model_version_type_id)

    # Query all registered Artifact types.
    artifact_types = store.get_artifact_types()
    pprint(artifact_types)

    # Create an ExecutionType, e.g., Trainer
    trainer_type = metadata_store_pb2.ExecutionType()
    trainer_type.name = "Trainer"
    trainer_type.properties["state"] = metadata_store_pb2.STRING
    trainer_type_id = store.put_execution_type(trainer_type)
    pprint(trainer_type_id)

    # Query a registered Execution type with the returned id
    [registered_type] = store.get_execution_types_by_id([trainer_type_id])
    pprint(registered_type)

    # Create an input artifact of type DataSet
    data_artifact = metadata_store_pb2.Artifact()
    data_artifact.uri = 'path/to/data'
    data_artifact.properties["day"].int_value = 1
    data_artifact.properties["split"].string_value = 'train'
    data_artifact.type_id = data_type_id
    [data_artifact_id] = store.put_artifacts([data_artifact])
    pprint(data_artifact_id)

    # Query all registered Artifacts
    artifacts = store.get_artifacts()
    pprint(artifacts)

    # Plus, there are many ways to query the same Artifact
    [stored_data_artifact] = store.get_artifacts_by_id([data_artifact_id])
    print(stored_data_artifact)
    artifacts_with_uri = store.get_artifacts_by_uri(data_artifact.uri)
    pprint(artifacts_with_uri)

    artifacts_with_conditions = store.get_artifacts(
        list_options=mlmd.ListOptions(
            filter_query='uri LIKE "%/data" AND properties.day.int_value > 0'))
    pprint(artifacts_with_conditions)

    # Register the Execution of a Trainer run
    trainer_run = metadata_store_pb2.Execution()
    trainer_run.type_id = trainer_type_id
    trainer_run.properties["state"].string_value = "RUNNING"
    [run_id] = store.put_executions([trainer_run])
    pprint(run_id)

    # Query all registered Executions
    executions = store.get_executions_by_id([run_id])
    pprint(executions)

    # Similarly, the same execution can be queried with conditions.
    executions_with_conditions = store.get_executions(
        list_options=mlmd.ListOptions(
            filter_query='type = "Trainer" AND properties.state.string_value IS NOT NULL'))
    pprint(executions_with_conditions)

    # Define the input event
    input_event = metadata_store_pb2.Event()
    input_event.artifact_id = data_artifact_id
    input_event.execution_id = run_id
    input_event.type = metadata_store_pb2.Event.DECLARED_INPUT

    # Record the input event in the metadata store
    store.put_events([input_event])

    # Declare the output artifact of type SavedModel
    model_artifact = metadata_store_pb2.Artifact()
    model_artifact.uri = 'path/to/model/file'
    model_artifact.properties["version"].int_value = 1
    model_artifact.properties["name"].string_value = 'MNIST-v1'
    model_artifact.type_id = model_type_id
    [model_artifact_id] = store.put_artifacts([model_artifact])
    pprint(model_artifact_id)

    # Declare the output event
    output_event = metadata_store_pb2.Event()
    output_event.artifact_id = model_artifact_id
    output_event.execution_id = run_id
    output_event.type = metadata_store_pb2.Event.DECLARED_OUTPUT

    # Submit output event to the Metadata Store
    store.put_events([output_event])

    # Mark the Trainer run as completed
    trainer_run.id = run_id
    trainer_run.properties["state"].string_value = "COMPLETED"
    store.put_executions([trainer_run])

    # Create a ContextType, e.g., Experiment with a note property
    experiment_type = metadata_store_pb2.ContextType()
    experiment_type.name = "Experiment"
    experiment_type.properties["note"] = metadata_store_pb2.STRING
    experiment_type_id = store.put_context_type(experiment_type)

    # Group the model and the trainer run to an experiment.
    my_experiment = metadata_store_pb2.Context()
    my_experiment.type_id = experiment_type_id
    # Give the experiment a name
    my_experiment.name = "exp1"
    my_experiment.properties["note"].string_value = "My first experiment."
    [experiment_id] = store.put_contexts([my_experiment])

    attribution = metadata_store_pb2.Attribution()
    attribution.artifact_id = model_artifact_id
    attribution.context_id = experiment_id

    association = metadata_store_pb2.Association()
    association.execution_id = run_id
    association.context_id = experiment_id

    store.put_attributions_and_associations([attribution], [association])

    # Query the Artifacts and Executions that are linked to the Context.
    experiment_artifacts = store.get_artifacts_by_context(experiment_id)
    pprint(experiment_artifacts)
    experiment_executions = store.get_executions_by_context(experiment_id)
    pprint(experiment_executions)

    # You can also use neighborhood queries to fetch these artifacts and
    # executions with conditions.
    experiment_artifacts_with_conditions = store.get_artifacts(
        list_options=mlmd.ListOptions(
            filter_query='contexts_a.type = "Experiment" AND contexts_a.name = "exp1"'))
    pprint(experiment_artifacts_with_conditions)
    experiment_executions_with_conditions = store.get_executions(
        list_options=mlmd.ListOptions(
            filter_query='contexts_a.id = {}'.format(experiment_id)))
    pprint(experiment_executions_with_conditions)
18 changes: 9 additions & 9 deletions ml_metadata/BUILD
@@ -42,17 +42,17 @@ _public_protos = [
    "//ml_metadata/proto:metadata_store_service_pb2_grpc.py",
]

_py_extension = select({
    ":windows": [
        "//ml_metadata/metadata_store/pywrap:metadata_store_extension.pyd",
    ],
    "//conditions:default": [
        "//ml_metadata/metadata_store/pywrap:metadata_store_extension.so",
    ],
})
# _py_extension = select({
#     ":windows": [
#         "//ml_metadata/metadata_store/pywrap:metadata_store_extension.pyd",
#     ],
#     "//conditions:default": [
#         "//ml_metadata/metadata_store/pywrap:metadata_store_extension.so",
#     ],
# })

sh_binary(
    name = "move_generated_files",
    srcs = ["move_generated_files.sh"],
    data = _py_extension + _public_protos,
    data = _public_protos,
)
23 changes: 5 additions & 18 deletions ml_metadata/metadata_store/metadata_store.py
@@ -28,7 +28,8 @@

from ml_metadata import errors
from ml_metadata import proto
from ml_metadata.metadata_store.pywrap.metadata_store_extension import metadata_store as metadata_store_serialized
# fork of ml-metadata supporting ONLY remote gRPC connection
# from ml_metadata.metadata_store.pywrap.metadata_store_extension import metadata_store as metadata_store_serialized
from ml_metadata.proto import metadata_store_pb2
from ml_metadata.proto import metadata_store_service_pb2
from ml_metadata.proto import metadata_store_service_pb2_grpc
@@ -110,19 +111,7 @@ def __init__(self, config, enable_upgrade_migration: bool = False):
    self._max_num_retries = 5
    self._service_client_wrapper = None
    if isinstance(config, proto.ConnectionConfig):
      self._using_db_connection = True
      migration_options = metadata_store_pb2.MigrationOptions()
      migration_options.enable_upgrade_migration = enable_upgrade_migration
      self._metadata_store = metadata_store_serialized.CreateMetadataStore(
          config.SerializeToString(), migration_options.SerializeToString())
      logging.log(logging.INFO, 'MetadataStore with DB connection initialized')
      logging.log(logging.DEBUG, 'ConnectionConfig: %s', config)
      if config.HasField('retry_options'):
        self._max_num_retries = config.retry_options.max_num_retries
        logging.log(logging.INFO,
                    'retry options is overwritten: max_num_retries = %d',
                    self._max_num_retries)
      return
      raise RuntimeError('Unimplemented. This is a fork of ml-metadata supporting ONLY remote gRPC connection')
    if not isinstance(config, proto.MetadataStoreClientConfig):
      raise ValueError('MetadataStore is expecting either '
                       'proto.ConnectionConfig or '
@@ -220,8 +209,7 @@ def _call_method(self, method_name, request, response) -> None:
      response: a protobuf message, filled from the return value of the method.
    """
    if self._using_db_connection:
      cc_method = getattr(metadata_store_serialized, method_name)
      self._pywrap_cc_call(cc_method, request, response)
      raise RuntimeError('Unimplemented. This is a fork of ml-metadata supporting ONLY remote gRPC connection')
    else:
      grpc_method = getattr(self._metadata_store_stub, method_name)
      try:
@@ -1783,8 +1771,7 @@ def downgrade_schema(config: proto.ConnectionConfig,
  try:
    migration_options = metadata_store_pb2.MigrationOptions()
    migration_options.downgrade_to_schema_version = downgrade_to_schema_version
    metadata_store_serialized.CreateMetadataStore(
        config.SerializeToString(), migration_options.SerializeToString())
    raise RuntimeError('Unimplemented. This is a fork of ml-metadata supporting ONLY remote gRPC connection')
  except RuntimeError as e:
    if str(e).startswith('MLMD cannot be downgraded to schema_version'):
      raise errors.make_exception(str(e), errors.INVALID_ARGUMENT) from e
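Taken together, these changes mean a direct-DB `proto.ConnectionConfig` is rejected at construction time, while the gRPC client path is untouched. A small sketch of the resulting behavior (the endpoint is a placeholder):

```python
from ml_metadata import proto
from ml_metadata.metadata_store import metadata_store

# gRPC client config: the only supported mode in this fork.
store = metadata_store.MetadataStore(
    proto.MetadataStoreClientConfig(host='localhost', port=8080))

# Direct database access is no longer compiled in, so this now raises.
db_config = proto.ConnectionConfig()
db_config.sqlite.filename_uri = '/tmp/shared/metadata.sqlite.db'
try:
    metadata_store.MetadataStore(db_config)
except RuntimeError as err:
    print(err)  # Unimplemented. This is a fork of ml-metadata supporting ONLY remote gRPC connection
```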