[pull] master from ray-project:master #2324

Merged 23 commits on Aug 25, 2023
aae9135
[tune] Remove the URI parsing code (#38697)
ericl Aug 24, 2023
82d3f35
[Data] Store serialized ParquetFileFragment in _ParquetDatasourceRead…
c21 Aug 24, 2023
2c1e0c1
[Data] Retry open files with expotential backoff (#38773)
c21 Aug 24, 2023
8ace253
[Data] Implement streamed read from Hugging Face Datasets (#38432)
scottjlee Aug 24, 2023
f081050
[core] add serve chaos test (#38700)
rynewang Aug 24, 2023
32db812
make it medium tests. (#38819)
rkooo567 Aug 24, 2023
5d56a6f
Broken down tests. (#38823)
rkooo567 Aug 24, 2023
f6c7dd9
[core] Store task cancellation errors in metadata to suppress unhandl…
stephanie-wang Aug 24, 2023
dd41535
[Serve] Relaxed input Serve config's schemas to allow extra params (#…
alexeykudinkin Aug 24, 2023
d5e324d
[train] New persistence mode: Support the `tune.run(restore="ckpt_pat…
justinvyu Aug 24, 2023
a9e4ce9
Fix test coverage lint error (#38832)
aslonnie Aug 24, 2023
2daa4e5
[train] Bump XGBoost/LightGBM versions (#38828)
krfricke Aug 24, 2023
8ed1cd1
[ci] Reduce train tests+examples parallelism (#38820)
krfricke Aug 24, 2023
692428a
[Ray 2.7 Examples][4/n]Revamp the Lightning `vicuna-13b` DeepSpeed fi…
woshiyyya Aug 24, 2023
d791c73
[serve] Add api ref and telemetry for new multi-app api (#38502)
zcin Aug 24, 2023
cc0b719
copyright: change to 2023 ray authors (#38735)
aslonnie Aug 24, 2023
d7618a0
Add whisper as a dependency for ray-ml-39 image (#38856)
can-anyscale Aug 25, 2023
f5189f4
[serve] Different smoothing factors for upscale and downscale (#38034)
zcin Aug 25, 2023
d44d294
[serve][3/X] Separate app/deployment name: metrics (#38702)
zcin Aug 25, 2023
87942ba
[train] make Trainable storage optional (#38853)
matthewdeng Aug 25, 2023
35738b8
[Serve] fix flaky grpc tests in test_cli_2 (#38841)
GeneDer Aug 25, 2023
cea281c
[Data] Restructure the user-provided optimizer rules file (#38829)
c21 Aug 25, 2023
44e5edc
[train] enable new persistence mode for minimal train tests (#38868)
matthewdeng Aug 25, 2023
58 changes: 57 additions & 1 deletion .buildkite/pipeline.build.yml
Expand Up @@ -220,6 +220,9 @@
# Todo (krfricke): Move mosaicml to train-test-requirements.txt
- pip install "mosaicml==0.12.1"
- DOC_TESTING=1 ./ci/env/install-dependencies.sh
+ # TODO(scottjlee): Move datasets to train/data-test-requirements.txt
+ # (see https://github.com/ray-project/ray/pull/38432/)
+ - pip install "datasets==2.14.0"
- ./ci/env/install-horovod.sh
- ./ci/env/env_info.sh
- bazel test --config=ci $(./scripts/bazel_export_options)
Expand Down Expand Up @@ -383,6 +386,57 @@
- ray job submit --address http://localhost:8265 --runtime-env python/ray/tests/chaos/runtime_env.yaml --working-dir python/ray/tests/chaos -- python potato_passer.py --num-actors=3 --pass-times=1000 --sleep-secs=0.01


+ - label: ":kubernetes: :mending_heart: :ray-serve: serve chaos network delay test"
+ conditions: ["RAY_CI_LINUX_WHEELS_AFFECTED"]
+ instance_size: medium
+ commands:
+ - |
+ cleanup() {
+ if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/build/upload_build_info.sh; fi
+ kind delete cluster
+ }
+ trap cleanup EXIT
+ - ./ci/env/install-minimal.sh 3.8
+ - PYTHON=3.8 ./ci/env/install-dependencies.sh
+ # Specifying above somehow messes up the Ray install.
+ # Uninstall and re-install Ray so that we can use Ray Client.
+ # (Remove thirdparty_files to sidestep an issue with psutil.)
+ - pip uninstall -y ray && rm -rf /ray/python/ray/thirdparty_files
+ - pip install -e /ray/python
+ - echo "--- Setting up local kind cluster."
+ - ./ci/k8s/prep-k8s-environment.sh
+ - ./ci/k8s/prep-helm.sh
+ - echo "--- Building py38-cpu Ray image for the test."
+ - LINUX_WHEELS=1 ./ci/ci.sh build
+ - pip install -q docker
+ - python ci/build/build-docker-images.py --py-versions py38 --device-types cpu --build-type LOCAL --build-base
+ # Tag the image built in the last step. We want to be sure to distinguish the image from the real Ray nightly.
+ - docker tag rayproject/ray:nightly-py38-cpu ray-ci:kuberay-test
+ # Load the image into the kind node
+ - kind load docker-image ray-ci:kuberay-test
+ # Helm install KubeRay
+ - echo "--- Installing KubeRay operator and cluser."
+ - helm repo add kuberay https://ray-project.github.io/kuberay-helm/
+ - helm install kuberay-operator kuberay/kuberay-operator
+ - kubectl wait pod -l app.kubernetes.io/name=kuberay-operator --for=condition=Ready=True --timeout=5m
+ # We are in m4i.xlarge and have 4 cpus. Can't have too many nodes.
+ - helm install raycluster kuberay/ray-cluster --set image.repository=ray-ci --set image.tag=kuberay-test --set worker.replicas=2 --set worker.resources.limits.cpu=500m --set worker.resources.requests.cpu=500m --set head.resources.limits.cpu=500m --set head.resources.requests.cpu=500m --set head.containerEnv[0].name=RAY_SERVE_ENABLE_EXPERIMENTAL_STREAMING --set head.containerEnv[0].value=\"1\"
+ - kubectl wait pod -l ray.io/cluster=raycluster-kuberay --for=condition=Ready=True --timeout=5m
+ - kubectl port-forward --address 0.0.0.0 service/raycluster-kuberay-head-svc 8265:8265 &
+ # Helm install chaos-mesh
+ - echo "--- Installing chaos-mesh operator and CR."
+ - helm repo add chaos-mesh https://charts.chaos-mesh.org
+ - kubectl create ns chaos-mesh
+ - helm install chaos-mesh chaos-mesh/chaos-mesh -n=chaos-mesh --set chaosDaemon.runtime=containerd --set chaosDaemon.socketPath=/run/containerd/containerd.sock --version 2.6.1
+ - kubectl wait pod --namespace chaos-mesh -l app.kubernetes.io/instance=chaos-mesh --for=condition=Ready=True
+ - echo "--- Running the script without faults"
+ - ray job submit --address http://localhost:8265 --runtime-env python/ray/tests/chaos/streaming_llm.yaml --working-dir python/ray/tests/chaos -- python streaming_llm.py --num_queries_per_task=100 --num_tasks=2 --num_words_per_query=100
+ # Now add the delay, rerun the job
+ - kubectl apply -f python/ray/tests/chaos/chaos_network_delay.yaml
+ - echo "--- Running the script with fault of networking delay"
+ - ray job submit --address http://localhost:8265 --runtime-env python/ray/tests/chaos/streaming_llm.yaml --working-dir python/ray/tests/chaos -- python streaming_llm.py --num_queries_per_task=100 --num_tasks=2 --num_words_per_query=100


- label: ":book: Documentation"
commands:
- export LINT=1
Expand Down Expand Up @@ -445,4 +499,6 @@
- TRAIN_MINIMAL_INSTALL=1 ./ci/env/install-minimal.sh
- ./ci/env/env_info.sh
- python ./ci/env/check_minimal_install.py
- - bazel test --config=ci $(./ci/run/bazel_export_options) --build_tests_only --test_tag_filters=minimal python/ray/train/...
+ - bazel test --config=ci $(./ci/run/bazel_export_options) --build_tests_only --test_tag_filters=minimal
+ --test_env=RAY_AIR_NEW_PERSISTENCE_MODE=1
+ python/ray/train/...
6 changes: 4 additions & 2 deletions .buildkite/pipeline.gpu_large.yml
Expand Up @@ -67,11 +67,13 @@
- DOC_TESTING=1 TRAIN_TESTING=1 TUNE_TESTING=1 ./ci/env/install-dependencies.sh
- pip install -Ur ./python/requirements/ml/dl-gpu-requirements.txt
- ./ci/env/install-horovod.sh
- - ./ci/env/env_info.sh
# Test examples with newer version of `transformers`
# TODO(amogkam): Remove when https://github.com/ray-project/ray/issues/36011
# is resolved.
- - pip install transformers==4.30.2
+ # TODO(scottjlee): Move datasets to train/data-test-requirements.txt
+ # (see https://github.com/ray-project/ray/pull/38432/)
+ - pip install transformers==4.30.2 datasets==2.14.0
+ - ./ci/env/env_info.sh
- bazel test --config=ci $(./scripts/bazel_export_options)
--test_tag_filters=doctest,-cpu python/ray/... doc/...

Expand Down
17 changes: 14 additions & 3 deletions .buildkite/pipeline.ml.yml
Expand Up @@ -32,7 +32,7 @@
- label: ":steam_locomotive: Train tests and examples"
conditions: ["NO_WHEELS_REQUIRED", "RAY_CI_TRAIN_AFFECTED"]
instance_size: large
- parallelism: 4
+ parallelism: 3
commands:
- cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/build/upload_build_info.sh; fi }; trap cleanup EXIT
# Todo (krfricke): Move mosaicml to train-test-requirements.txt
Expand Down Expand Up @@ -343,7 +343,7 @@
- label: ":steam_locomotive: :floppy_disk: New persistence mode: Train tests and examples"
conditions: ["NO_WHEELS_REQUIRED", "RAY_CI_TRAIN_AFFECTED"]
instance_size: large
- parallelism: 4
+ parallelism: 3
commands:
- cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/build/upload_build_info.sh; fi }; trap cleanup EXIT
# Todo (krfricke): Move mosaicml to train-test-requirements.txt
Expand Down Expand Up @@ -454,6 +454,9 @@
commands:
- cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/build/upload_build_info.sh; fi }; trap cleanup EXIT
- DATA_PROCESSING_TESTING=1 ARROW_VERSION=12.* ./ci/env/install-dependencies.sh
+ # TODO(scottjlee): Move datasets to train/data-test-requirements.txt
+ # (see https://github.com/ray-project/ray/pull/38432/)
+ - pip install "datasets==2.14.0"
- ./ci/env/env_info.sh
- ./ci/run/run_bazel_test_with_sharding.sh --config=ci $(./ci/run/bazel_export_options) --action_env=RAY_DATA_USE_STREAMING_EXECUTOR=1 --build_tests_only --test_tag_filters=-data_integration,-doctest python/ray/data/...
- ./ci/run/run_bazel_test_with_sharding.sh --config=ci $(./ci/run/bazel_export_options) --action_env=RAY_DATA_USE_STREAMING_EXECUTOR=1 --build_tests_only --test_tag_filters=ray_data,-doctest python/ray/air/...
Expand All @@ -465,6 +468,9 @@
commands:
- cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/build/upload_build_info.sh; fi }; trap cleanup EXIT
- DATA_PROCESSING_TESTING=1 ARROW_VERSION=nightly ./ci/env/install-dependencies.sh
+ # TODO(scottjlee): Move datasets to train/data-test-requirements.txt
+ # (see https://github.com/ray-project/ray/pull/38432/)
+ - pip install "datasets==2.14.0"
- ./ci/env/env_info.sh
- ./ci/run/run_bazel_test_with_sharding.sh --config=ci $(./ci/run/bazel_export_options) --build_tests_only --test_tag_filters=-data_integration,-doctest python/ray/data/...
- ./ci/run/run_bazel_test_with_sharding.sh --config=ci $(./ci/run/bazel_export_options) --build_tests_only --test_tag_filters=ray_data,-doctest python/ray/air/...
Expand All @@ -476,6 +482,9 @@
commands:
- cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/build/upload_build_info.sh; fi }; trap cleanup EXIT
- DATA_PROCESSING_TESTING=1 ARROW_VERSION=12.* ./ci/env/install-dependencies.sh
+ # TODO(scottjlee): Move datasets to train/data-test-requirements.txt
+ # (see https://github.com/ray-project/ray/pull/38432/)
+ - pip install "datasets==2.14.0"
- ./ci/env/env_info.sh
- ./ci/run/run_bazel_test_with_sharding.sh --config=ci $(./ci/run/bazel_export_options) --build_tests_only --test_tag_filters=-data_integration,-doctest python/ray/data/...
- ./ci/run/run_bazel_test_with_sharding.sh --config=ci $(./ci/run/bazel_export_options) --build_tests_only --test_tag_filters=ray_data,-doctest python/ray/air/...
Expand Down Expand Up @@ -510,7 +519,9 @@
- DOC_TESTING=1 INSTALL_HOROVOD=1 ./ci/env/install-dependencies.sh
# TODO (shrekris-anyscale): Remove transformers after core transformer
# requirement is upgraded
- - pip install "transformers==4.30.2"
+ # TODO(scottjlee): Move datasets to train/data-test-requirements.txt
+ # (see https://github.com/ray-project/ray/pull/38432/)
+ - pip install "transformers==4.30.2" "datasets==2.14.0"
- ./ci/env/env_info.sh
- bazel test --config=ci $(./ci/run/bazel_export_options) --build_tests_only --test_tag_filters=-timeseries_libs,-external,-ray_air,-gpu,-post_wheel_build,-doctest,-datasets_train,-highly_parallel doc/...

Expand Down
4 changes: 2 additions & 2 deletions LICENSE
Expand Up @@ -186,7 +186,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.

- Copyright {yyyy} {name of copyright owner}
+ Copyright 2023 Ray Authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
Expand Down Expand Up @@ -447,4 +447,4 @@ Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
- limitations under the License.
+ limitations under the License.
28 changes: 14 additions & 14 deletions dashboard/modules/metrics/dashboards/serve_dashboard_panels.py
Expand Up @@ -139,8 +139,8 @@
unit="replicas",
targets=[
Target(
- expr="sum(ray_serve_deployment_replica_healthy{{{global_filters}}}) by (deployment)",
- legend="{{deployment}}",
+ expr="sum(ray_serve_deployment_replica_healthy{{{global_filters}}}) by (application, deployment)",
+ legend="{{application, deployment}}",
),
],
grid_pos=GridPos(0, 2, 8, 8),
Expand All @@ -152,8 +152,8 @@
unit="qps",
targets=[
Target(
- expr='sum(rate(ray_serve_deployment_request_counter{{route=~"$Route",route!~"/-/.*",{global_filters}}}[5m])) by (deployment)',
- legend="{{deployment}}",
+ expr='sum(rate(ray_serve_deployment_request_counter{{route=~"$Route",route!~"/-/.*",{global_filters}}}[5m])) by (application, deployment)',
+ legend="{{application, deployment}}",
),
],
grid_pos=GridPos(8, 2, 8, 8),
Expand All @@ -165,8 +165,8 @@
unit="qps",
targets=[
Target(
- expr='sum(rate(ray_serve_deployment_error_counter{{route=~"$Route",route!~"/-/.*",{global_filters}}}[5m])) by (deployment)',
- legend="{{deployment}}",
+ expr='sum(rate(ray_serve_deployment_error_counter{{route=~"$Route",route!~"/-/.*",{global_filters}}}[5m])) by (application, deployment)',
+ legend="{{application, deployment}}",
),
],
grid_pos=GridPos(16, 2, 8, 8),
Expand All @@ -178,8 +178,8 @@
unit="ms",
targets=[
Target(
- expr='histogram_quantile(0.5, sum(rate(ray_serve_deployment_processing_latency_ms_bucket{{route=~"$Route",route!~"/-/.*",{global_filters}}}[5m])) by (deployment, le))',
- legend="{{deployment}}",
+ expr='histogram_quantile(0.5, sum(rate(ray_serve_deployment_processing_latency_ms_bucket{{route=~"$Route",route!~"/-/.*",{global_filters}}}[5m])) by (application, deployment, le))',
+ legend="{{application, deployment}}",
),
Target(
expr='histogram_quantile(0.5, sum(rate(ray_serve_deployment_processing_latency_ms_bucket{{route=~"$Route",route!~"/-/.*",{global_filters}}}[5m])) by (le))',
Expand All @@ -197,8 +197,8 @@
unit="ms",
targets=[
Target(
- expr='histogram_quantile(0.9, sum(rate(ray_serve_deployment_processing_latency_ms_bucket{{route=~"$Route",route!~"/-/.*",{global_filters}}}[5m])) by (deployment, le))',
- legend="{{deployment}}",
+ expr='histogram_quantile(0.9, sum(rate(ray_serve_deployment_processing_latency_ms_bucket{{route=~"$Route",route!~"/-/.*",{global_filters}}}[5m])) by (application, deployment, le))',
+ legend="{{application, deployment}}",
),
Target(
expr='histogram_quantile(0.9, sum(rate(ray_serve_deployment_processing_latency_ms_bucket{{route=~"$Route",route!~"/-/.*",{global_filters}}}[5m])) by (le))',
Expand All @@ -216,8 +216,8 @@
unit="ms",
targets=[
Target(
- expr='histogram_quantile(0.99, sum(rate(ray_serve_deployment_processing_latency_ms_bucket{{route=~"$Route",route!~"/-/.*",{global_filters}}}[5m])) by (deployment, le))',
- legend="{{deployment}}",
+ expr='histogram_quantile(0.99, sum(rate(ray_serve_deployment_processing_latency_ms_bucket{{route=~"$Route",route!~"/-/.*",{global_filters}}}[5m])) by (application, deployment, le))',
+ legend="{{application, deployment}}",
),
Target(
expr='histogram_quantile(0.99, sum(rate(ray_serve_deployment_processing_latency_ms_bucket{{route=~"$Route",route!~"/-/.*",{global_filters}}}[5m])) by (le))',
Expand All @@ -235,8 +235,8 @@
unit="requests",
targets=[
Target(
- expr="sum(ray_serve_deployment_queued_queries{{{global_filters}}}) by (deployment)",
- legend="{{deployment}}",
+ expr="sum(ray_serve_deployment_queued_queries{{{global_filters}}}) by (application, deployment)",
+ legend="{{application, deployment}}",
),
],
fill=0,
Expand Down
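
Reviewer note: the doubled braces in the `expr` strings in `serve_dashboard_panels.py` are easier to check once expanded. The sketch below assumes the dashboard factory substitutes `{global_filters}` with Python's `str.format`, which the `{{ ... }}` escaping suggests but this diff does not show. `render_expr` and the `SessionName="demo"` filter value are hypothetical names used only for illustration.

```python
# Hypothetical helper: assumes each panel `expr` is expanded with str.format.
# Under that assumption '{{' and '}}' collapse to literal PromQL braces and
# '{global_filters}' is replaced by the label matchers passed in.
def render_expr(template: str, global_filters: str) -> str:
    return template.format(global_filters=global_filters)


# Two of the updated templates from the diff, copied verbatim.
REPLICA_EXPR = (
    "sum(ray_serve_deployment_replica_healthy{{{global_filters}}}) "
    "by (application, deployment)"
)
QPS_EXPR = (
    "sum(rate(ray_serve_deployment_request_counter"
    '{{route=~"$Route",route!~"/-/.*",{global_filters}}}[5m])) '
    "by (application, deployment)"
)

# Illustrative filter value only; the real one comes from the dashboard config.
FILTERS = 'SessionName="demo"'

if __name__ == "__main__":
    print(render_expr(REPLICA_EXPR, FILTERS))
    print(render_expr(QPS_EXPR, FILTERS))
```

If that assumption holds, each series is now keyed by the `(application, deployment)` pair, so two applications that contain a deployment with the same name are no longer summed together.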