diff --git a/docs/modelserving/v1beta1/transformer/collocation/README.md b/docs/modelserving/v1beta1/transformer/collocation/README.md
index 74245ea8b..faa9cd470 100644
--- a/docs/modelserving/v1beta1/transformer/collocation/README.md
+++ b/docs/modelserving/v1beta1/transformer/collocation/README.md
@@ -15,8 +15,8 @@ KServe by default deploys the Transformer and Predictor as separate services, al
 
 ## Deploy the InferenceService
 
-Since, the predictor and the transformer are in the same pod, they need to listen on different ports to avoid conflict. `Transformer` is configured to listen on port 8000 and 8081
-while, `Predictor` listens on port 8080 and 8082. `Transformer` calls `Predictor` on port 8082 via local socket.
+Since the predictor and the transformer are in the same pod, they need to listen on different ports to avoid a conflict. `Transformer` is configured to listen on ports 8080 (REST) and 8081 (gRPC),
+while `Predictor` listens on port 8085 (REST). `Transformer` calls `Predictor` on port 8085 via the local socket.
 
 Deploy the `Inferenceservice` using the below command.
 ```bash
@@ -28,24 +28,44 @@ metadata:
 spec:
   predictor:
     containers:
-      - name: kserve-container
-        image: kserve/custom-model-grpc:latest
+      - name: kserve-container # Do not change the name; this must be the predictor container
+        image: "pytorch/torchserve:0.9.0-cpu"
         args:
-          - --model_name=custom-model
-          - --grpc_port=8082
-          - --http_port=8080
-
-      - image: kserve/image-transformer:latest
-        name: transformer-container # Do not change the container name
+          - "torchserve"
+          - "--start"
+          - "--model-store=/mnt/models/model-store"
+          - "--ts-config=/mnt/models/config/config.properties"
+        env:
+          - name: TS_SERVICE_ENVELOPE
+            value: kserve
+          - name: STORAGE_URI # Triggers the storage initializer; should only be present in the predictor container
+            value: "gs://kfserving-examples/models/torchserve/image_classifier/v1"
+        resources:
+          requests:
+            cpu: 100m
+            memory: 256Mi
+          limits:
+            cpu: 1
+            memory: 1Gi
+
+      - name: transformer-container # Do not change the container name
+        image: kserve/image-transformer:latest
         args:
-          - --model_name=custom-model
-          - --protocol=grpc-v2
-          - --http_port=8000
+          - --model_name=mnist
+          - --protocol=v1 # protocol of the predictor; used to convert the input to the protocol the predictor supports
+          - --http_port=8080
           - --grpc_port=8081
-          - --predictor_host=localhost:8082
+          - --predictor_host=localhost:8085 # predictor listening port
         ports:
-          - containerPort: 8000
+          - containerPort: 8080
             protocol: TCP
+        resources:
+          requests:
+            cpu: 100m
+            memory: 256Mi
+          limits:
+            cpu: 1
+            memory: 1Gi
 EOF
 ```
 !!! success "Expected output"
@@ -57,8 +77,18 @@ EOF
     Always use the transformer container name as `transformer-container`. Otherwise, the model volume is not mounted to the transformer container
     which may result in an error.
 
+!!! Warning
+    Always name the predictor container `kserve-container`. KServe uses this name internally to identify the
+    predictor. The storage URI should only be present in this container; if it is specified in the transformer
+    container, the InferenceService creation will fail.
+
+!!! Note
+    Currently, collocation support is limited to the custom container spec for the KServe model container.
+
 !!! Note
-    Currently, The collocation support is limited to the custom container spec for kserve model container.
+    In Serverless mode, specifying ports for the predictor will cause the InferenceService creation to fail, because Knative
+    does not support multiple ports. Due to this limitation, the predictor cannot be exposed outside the cluster.
+    For more info, see the [knative discussion on multiple ports](https://github.com/knative/serving/issues/8471).
 
 ## Check InferenceService status
 ```bash
@@ -82,35 +112,34 @@ Now, [determine the ingress IP and ports](../../../../get_started/first_isvc.md#
 ```bash
 SERVICE_NAME=custom-transformer-collocation
-MODEL_NAME=custom-model
+MODEL_NAME=mnist
 INPUT_PATH=@./input.json
 SERVICE_HOSTNAME=$(kubectl get inferenceservice $SERVICE_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
 ```
 
 You can use `curl` to send the inference request as:
 ```bash
-curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" -d $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/$MODEL_NAME/infer
+curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" -d $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict
 ```
 
 !!! success "Expected output"
     ```{ .bash .no-copy }
     * Trying 127.0.0.1:8080...
     * Connected to localhost (127.0.0.1) port 8080 (#0)
-    > POST /v2/models/custom-model/infer HTTP/1.1
+    > POST /v1/models/mnist:predict HTTP/1.1
     > Host: custom-transformer-collocation.default.example.com
     > User-Agent: curl/7.85.0
    > Accept: */*
     > Content-Type: application/json
-    > Content-Length: 105396
+    > Content-Length: 427
     >
-    * We are completely uploaded and fine
     * Mark bundle as not supporting multiuse
     < HTTP/1.1 200 OK
-    < content-length: 298
+    < content-length: 19
     < content-type: application/json
-    < date: Thu, 04 May 2023 10:35:30 GMT
+    < date: Sat, 02 Dec 2023 09:13:16 GMT
     < server: istio-envoy
-    < x-envoy-upstream-service-time: 1273
+    < x-envoy-upstream-service-time: 315
     <
     * Connection #0 to host localhost left intact
-    {"model_name":"custom-model","model_version":null,"id":"d685805f-a310-4690-9c71-a2dc38085d6f","parameters":null,"outputs":[{"name":"output-0","shape":[1,5],"datatype":"FP32","parameters":null,"data":[14.975618362426758,14.036808967590332,13.966032028198242,12.252279281616211,12.086268424987793]}]}
+    {"predictions":[2]}
     ```
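
Reviewer note (not part of the diff): the `curl` command in the updated "Run a prediction" section can also be issued from a short Python script, which may be handy when iterating on the collocated service. This is a minimal sketch, assuming the `requests` package is installed and that `SERVICE_HOSTNAME`, `INGRESS_HOST`, and `INGRESS_PORT` have been resolved exactly as in the README's bash snippets; it simply mirrors the documented V1 `:predict` request.

```python
import json
import os

import requests

# Resolved the same way as in the README's bash snippets (assumed to be set in the environment).
ingress_host = os.environ.get("INGRESS_HOST", "localhost")
ingress_port = os.environ.get("INGRESS_PORT", "8080")
service_hostname = os.environ["SERVICE_HOSTNAME"]  # e.g. custom-transformer-collocation.default.example.com
model_name = "mnist"

# input.json holds the V1 payload, as in the README example.
with open("input.json") as f:
    payload = json.load(f)

url = f"http://{ingress_host}:{ingress_port}/v1/models/{model_name}:predict"
response = requests.post(
    url,
    json=payload,
    headers={"Host": service_hostname},  # route the request through the ingress gateway
    timeout=60,
)
response.raise_for_status()
print(response.json())
```

On success this prints the same body shown in the expected output above, e.g. `{"predictions": [2]}`.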