[RayService][Bug] Partial Removal of Deployments in ray-service.sample.yaml's ServeConfigV2 Causes WaitForServeDeploymentReady State

Search before asking
I searched the issues and found no similar issues.
KubeRay Component
ray-operator, Others
What happened + What you expected to happen
In ray-service.sample.yaml, the serveConfigV2 defines deployments for MangoStand, OrangeStand, and PearStand.
If two of these deployments are removed (e.g., keeping only MangoStand and FruitMarket), running kubectl get rayservice shows the state as WaitForServeDeploymentReady, and the service does not reach a ready state.
However, if only one deployment is removed (e.g., keeping two of the three), the service works as expected.
Reproduction script
1. Edit the ray-service.sample.yaml file to remove two of the three deployments in serveConfigV2 (e.g., keep only MangoStand).
2. Apply the updated manifest with kubectl apply -f ray-service.sample.yaml.
3. Run kubectl get rayservice and observe the status remaining in WaitForServeDeploymentReady.
```yaml
# Make sure to increase resource requests and limits before using this example in production.
# For examples with more realistic resource configuration, see
# ray-cluster.complete.large.yaml and
# ray-cluster.autoscaler.large.yaml.
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: rayservice-sample
spec:
  # serveConfigV2 takes a yaml multi-line scalar, which should be a Ray Serve multi-application config. See https://docs.ray.io/en/latest/serve/multi-app.html.
  serveConfigV2: |
    applications:
      - name: fruit_app
        import_path: fruit.deployment_graph
        route_prefix: /fruit
        runtime_env:
          working_dir: "https://github.com/ray-project/test_dag/archive/78b4a5da38796123d9f9ffff59bab2792a043e95.zip"
        deployments:
          - name: MangoStand
            num_replicas: 2
            max_replicas_per_node: 1
            user_config:
              price: 3
            ray_actor_options:
              num_cpus: 0.1
          - name: FruitMarket
            num_replicas: 1
            ray_actor_options:
              num_cpus: 0.1
      - name: math_app
        import_path: conditional_dag.serve_dag
        route_prefix: /calc
        runtime_env:
          working_dir: "https://github.com/ray-project/test_dag/archive/78b4a5da38796123d9f9ffff59bab2792a043e95.zip"
        deployments:
          - name: Adder
            num_replicas: 1
            user_config:
              increment: 3
            ray_actor_options:
              num_cpus: 0.1
          - name: Multiplier
            num_replicas: 1
            user_config:
              factor: 5
            ray_actor_options:
              num_cpus: 0.1
          - name: Router
            num_replicas: 1
  rayClusterConfig:
    rayVersion: '2.9.0' # should match the Ray version in the image of the containers
    ######################headGroupSpecs#################################
    # Ray head pod template.
    headGroupSpec:
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams:
        dashboard-host: '0.0.0.0'
      # Pod template
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.9.0
              resources:
                limits:
                  cpu: 2
                  memory: 4Gi
                requests:
                  cpu: 2
                  memory: 4Gi
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265 # Ray dashboard
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
    workerGroupSpecs:
      # The pod replicas in this group typed worker
      - replicas: 1
        minReplicas: 1
        maxReplicas: 5
        # Logical group name, for this called small-group, also can be functional
        groupName: small-group
        # The `rayStartParams` are used to configure the `ray start` command.
        # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
        # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
        rayStartParams: {}
        # Pod template
        template:
          spec:
            containers:
              - name: ray-worker # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc')
                image: rayproject/ray:2.9.0
                lifecycle:
                  preStop:
                    exec:
                      command: ["/bin/sh", "-c", "ray stop"]
                resources:
                  limits:
                    cpu: "1"
                    memory: "2Gi"
                  requests:
                    cpu: "500m"
                    memory: "2Gi"
```
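For reference, step 1 of the reproduction reduces serveConfigV2 to something like the following sketch (an assumption based on "keep only MangoStand": fruit_app retains a single deployment entry and math_app is removed entirely; the rest of the manifest is unchanged):

```yaml
# Sketch of the reduced serveConfigV2 (illustrative, not the committed sample).
serveConfigV2: |
  applications:
    - name: fruit_app
      import_path: fruit.deployment_graph
      route_prefix: /fruit
      runtime_env:
        working_dir: "https://github.com/ray-project/test_dag/archive/78b4a5da38796123d9f9ffff59bab2792a043e95.zip"
      deployments:
        - name: MangoStand
          num_replicas: 2
          max_replicas_per_node: 1
          user_config:
            price: 3
          ray_actor_options:
            num_cpus: 0.1
```

Note that removing entries from the deployments list only removes the per-deployment option overrides; the deployments themselves are still defined by the application's import_path graph.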
Anything else
We need to investigate why removing two deployments causes the issue while removing only one deployment does not. It seems like there might be a threshold or configuration issue in serveConfigV2.
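As a starting point for that investigation, the readiness condition can be sketched as follows. This is an illustrative model, not KubeRay source code: the operator polls the Ray dashboard's Serve REST API and reports the service ready only once every application is RUNNING, so the question is which application(s) stay stuck when two deployments are removed.

```python
def all_apps_ready(serve_status: dict) -> bool:
    """Illustrative sketch (assumed shape, modeled on GET /api/serve/applications/):
    the RayService is considered ready only when every Serve application
    reports status RUNNING."""
    apps = serve_status.get("applications", {})
    if not apps:
        # No applications deployed yet: not ready.
        return False
    return all(app.get("status") == "RUNNING" for app in apps.values())


# Example payload: one app still deploying keeps the whole service
# in WaitForServeDeploymentReady.
status = {
    "applications": {
        "fruit_app": {"status": "DEPLOYING"},
        "math_app": {"status": "RUNNING"},
    }
}
print(all_apps_ready(status))  # False: fruit_app is still deploying
```

Port-forwarding the head pod's dashboard (8265) and inspecting the per-application statuses there should show whether the stuck state comes from a DEPLOYING/UNHEALTHY application or from the operator's config comparison.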
Are you willing to submit a PR?
Yes I am willing to submit a PR!
CheyuWu changed the title from "[Bug] Partial Removal of Deployments in ray-service.sample.yaml's ServeConfigV2 Causes WaitForServeDeploymentReady State" to "[RayService][Bug] Partial Removal of Deployments in ray-service.sample.yaml's ServeConfigV2 Causes WaitForServeDeploymentReady State" on Nov 20, 2024.