-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
* refactor(sdk): added option for custom metric collector for tune in… #2406
base: master
Are you sure you want to change the base?
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
… katlib_client.py Signed-off-by: Prakhar Singhal <[email protected]>
@Electronic-Waste can you review and let me know if any changes are required in this. Thanks a lot! |
@prakhar479 Yes, of course. Thanks for your great effort to Katib! I'll look into this PR in the next few days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/ok-to-test
/rerun-all |
Signed-off-by: Prakhar Singhal <[email protected]>
I have corrected some oversights from my side and need approval for testing. Thanks! |
/rerun-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job @prakhar479 , I left some comments for you!
for using custom metric collectors use "custom_collector" key, | ||
for example, `metrics_collector_config = {"custom_collector": "prometheus"}`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess prometheus
is not a custom collector, since we define it here:
katib/pkg/apis/controller/common/v1beta1/common_types.go
Lines 216 to 220 in 8eb0e86
PrometheusMetricCollector CollectorKind = "PrometheusMetric" | |
DefaultPrometheusPath string = "/metrics" | |
DefaultPrometheusPort int = 8080 | |
CustomCollector CollectorKind = "Custom" |
Here are some relevant resources which may be helpful for you:
https://github.com/kubeflow/katib/blob/master/examples/v1beta1/metrics-collector/custom-metrics-collector.yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, custom_collector
accepts V1Container
as its input, not a str
:
class V1beta1CollectorSpec(object): | |
"""NOTE: This class is auto generated by OpenAPI Generator. | |
Ref: https://openapi-generator.tech | |
Do not edit the class manually. | |
""" | |
""" | |
Attributes: | |
openapi_types (dict): The key is attribute name | |
and the value is attribute type. | |
attribute_map (dict): The key is attribute name | |
and the value is json key in definition. | |
""" | |
openapi_types = { | |
'custom_collector': 'V1Container', | |
'kind': 'str' | |
} | |
attribute_map = { | |
'custom_collector': 'customCollector', | |
'kind': 'kind' | |
} |
Can you add some comments to remind users of this usage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, can you add some e2e tests like this (an example for StdOut
Collector)?
def run_e2e_experiment_create_by_tune( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Electronic-Waste, I have made the appropriate changes in the function comments to reflect the correct usage
for e2e tests, should I modify the existing run-e2e-tune-api.py
test to use metrics_collector_config
or should I create a separate test script for it altogether
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prakhar479 I think modifying the existing run-e2e-tune-api.py
test is better since it's hard to pass the collector config to the test script: https://github.com/kubeflow/katib/blob/master/test/e2e/v1beta1/scripts/gh-actions/run-e2e-tune-api.sh. WDYT👀 @andreyvelich @tenzen-y @johnugeorge
Also a small tip for reference: you can build images you need here
katib/test/e2e/v1beta1/scripts/gh-actions/build-load.sh
Lines 166 to 170 in 8eb0e86
# Testing image for tune function | |
if "$TUNE_API"; then | |
echo -e "\nPulling and building testing image for tune function..." | |
_build_containers "suggestion-hyperopt" "$CMD_PREFIX/suggestion/hyperopt/$VERSION/Dockerfile" | |
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Electronic-Waste for the helpful insights 😄
PTAL👀 @andreyvelich @tenzen-y @johnugeorge when you have time. |
Signed-off-by: Prakhar Singhal <[email protected]>
…ontainer Signed-off-by: Prakhar Singhal <[email protected]>
I have modified the comment on usage of the custom metric param as well e2e test for tune Api. I was a bit confused with placement for these new files and have placed all of them in |
… dummy-collector image Signed-off-by: Prakhar Singhal <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prakhar479 Sorry for my late response. I'm busy with other affairs these two weeks.
I left a few comments for you. Thanks for your greate contribution!
@@ -251,8 +254,8 @@ def tune( | |||
pip_index_url: The PyPI url from which to install Python packages. | |||
metrics_collector_config: Specify the config of metrics collector, | |||
for example, `metrics_collector_config = {"kind": "Push"}`. | |||
Currently, we only support `StdOut` and `Push` metrics collector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may need to tell users about the supported types of MC. So can you re-add this line?
for using custom metric collectors use "custom_collector" key and provide instance of custom V1Container as value, | ||
for example, `metrics_collector_config = {"kind" : "Custom", "custom_collector": <Instance of V1Container>}`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can reorganize these comments and explain each field in metrics_collector_config
? Like
kind: specify the kind of Metrics Collector (currently we support...)
custom_collector: ...
# [3] Create Katib Experiment with 4 Trials and 2 CPUs per Trial. | ||
# [3] Create a dummy metric collector (DOES NOT HAVE A IMAGE) | ||
metric_collector = V1Container( | ||
name="dummy-collector", | ||
image="dummy-collector:latest", | ||
command=["python", "/app/dummy-collector.py"], | ||
args=["--metric-name=result"], | ||
env=[ | ||
client.V1EnvVar(name="EXPERIMENT_NAME", value=exp_name), | ||
client.V1EnvVar(name="EXPERIMENT_NAMESPACE", value=exp_namespace) | ||
] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we can create another function run_e2e_experiment_create_by_tune_custom
to run e2e tests for custom collector, rather than delete original e2e test for StdOut
collector. Then we can run these e2e tests together in this file:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I think its a good idea .Will make this change in a few days
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made corresponding changes to run both e2e tests (one with custom metrics collector and another with default metric collector [StdOut])
/rerun-all |
Signed-off-by: Prakhar Singhal <[email protected]>
Signed-off-by: Prakhar Singhal <[email protected]>
Signed-off-by: Prakhar Singhal <[email protected]>
I have made the neccesary changes that should also solve failing tests. Let me know about any further changes/suggestions @Electronic-Waste @andreyvelich @tenzen-y @johnugeorge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution @prakhar479 . I left some comments for you.
@@ -35,7 +96,7 @@ def objective(parameters): | |||
"b": search.double(min=0.1, max=0.2) | |||
} | |||
|
|||
# [3] Create Katib Experiment with 4 Trials and 2 CPUs per Trial. | |||
# [4] Create Katib Experiment with 4 Trials and 2 CPUs per Trial. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why its numer is 4? I guess 3 is more suitable.
try: | ||
run_e2e_experiment_create_by_tune(katib_client, exp_name, exp_namespace) | ||
exp_name = "tune-example-default-metrics-collector" | ||
logging.info(f"Runnning E2E for Experiment created by tune: {exp_namespace}/{exp_name}") | ||
run_e2e_experiment_create_by_tune_default_metrics_collector(katib_client, exp_name, exp_namespace) | ||
logging.info("---------------------------------------------------------------") | ||
logging.info(f"E2E is succeeded for Experiment created by tune: {exp_namespace}/{exp_name}") | ||
|
||
exp_name = "tune-example-custom-metrics-collector" | ||
logging.info(f"Runnning E2E for Experiment created by tune: {exp_namespace}/{exp_name}") | ||
run_e2e_experiment_create_by_tune_custom_metrics_collector(katib_client, exp_name, exp_namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe splitting these two e2e tests into separate try-catch
clauses is better to identify error :)
@@ -93,4 +162,4 @@ def objective(parameters): | |||
# Delete the Experiment. | |||
logging.info("---------------------------------------------------------------") | |||
logging.info("---------------------------------------------------------------") | |||
katib_client.delete_experiment(exp_name, exp_namespace) | |||
katib_client.delete_experiment(exp_name, exp_namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to delete the former experiment before we run another experiment. Otherwise, we may run into xxx experiment alreay exists
error.
…g and fixed some bugs Signed-off-by: Prakhar Singhal <[email protected]>
@Electronic-Waste I have fixed these issues lmk if anything else is needed. thanks! |
/rerun-all |
@prakhar479 Can you please fix the lint error and the error in tune API? |
Signed-off-by: Prakhar Singhal <[email protected]>
/rerun-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for doing this @prakhar479!
I left a few comments.
cc @kubeflow/wg-automl-leads
`metrics_collector_config`: Specify the configuration for the metrics collector with following keys: | ||
- **kind**: Specify the kind of Metrics Collector. Currently supported values are: | ||
- `StdOut`: Collects metrics from standard output. | ||
- `None`: No metrics collection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out. Can you lead me to where I can find all supported metric collector. For the current comment I had referenced https://github.com/kubeflow/katib/blob/master/pkg/ui/v1beta1/frontend/src/app/models/experiment.k8s.model.ts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, UI also needs to be updated with the latest changes cc @Electronic-Waste
Please ref the official CRDs APIs for Metrics Collector spec: https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1beta1/common_types.go#L207-L227
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Andrey, I'll update UI with the latest changes.
- **custom_collector**: If the `kind` is set to `Custom`, you must provide an instance of a custom `V1Container` as the value. For example: | ||
`metrics_collector_config = {"kind" : "Custom", "custom_collector": <Instance of V1Container>}`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add support for custom_collectors in the followup PRs when we get user requests ?
I feel that Data Scientists who will use tune
API doesn't need such functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry but can you elaborate on this a bit I am not quite sure what this intends since custom metric is already supported by underlying framework for tune function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that until we find use-cases when tune
function needs to be used with Custom
metrics collector, we can introduce it.
For example, I can see the value for File metrics collector, since users can write metrics into specific File during their training script, similar as for TensorFlow events.
However, for custom metrics collector, I can't see a use-cases when it might be useful with tune
function.
Any thoughts @shannonbradshaw @prakhar479 @johnugeorge @tenzen-y ?
|
||
RUN pip install kubernetes | ||
|
||
CMD ["python", "dummy-collector.py"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this container ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This container is meant for serving as a dummy metric collector container intended for e2e test.
@@ -11,8 +12,68 @@ | |||
# The default logging config. | |||
logging.basicConfig(level=logging.INFO) | |||
|
|||
def run_e2e_experiment_create_by_tune_custom_metrics_collector( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need another E2E just to test this functionality.
We already run E2Es using these YAML files for various metrics collectors: https://github.com/kubeflow/katib/tree/master/examples/v1beta1/metrics-collector
I think for your feature, you just need to add unit tests for Katib Client to verify that Experiment has correct specification: https://github.com/kubeflow/katib/blob/master/sdk/python/v1beta1/kubeflow/katib/api/katib_client_test.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! should I then remove the e2e test with custom metric collector from run-e2e-tune-api.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please can you add the unit tests for your functionality: https://github.com/kubeflow/katib/blob/master/sdk/python/v1beta1/kubeflow/katib/api/katib_client_test.py ?
Signed-off-by: Prakhar Singhal <[email protected]>
Signed-off-by: Prakhar Singhal <[email protected]>
… katlib_client.py
Signed-off-by: prakhar479 [email protected]
added
custom_collector
field tometrics_collector_config
in tune to allow for users to specify custom metrics collector for exampleprometheus
fixes #2402