Update transformer collocation docs for specifying storage uri (#323)
Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
sivanantha321 authored Dec 25, 2023
1 parent 6ef91ca commit 0b9c87c
Showing 1 changed file with 54 additions and 25 deletions.
79 changes: 54 additions & 25 deletions docs/modelserving/v1beta1/transformer/collocation/README.md
@@ -15,8 +15,8 @@ KServe by default deploys the Transformer and Predictor as separate services, al

## Deploy the InferenceService

-Since, the predictor and the transformer are in the same pod, they need to listen on different ports to avoid conflict. `Transformer` is configured to listen on port 8000 and 8081
-while, `Predictor` listens on port 8080 and 8082. `Transformer` calls `Predictor` on port 8082 via local socket.
+Since the predictor and the transformer are in the same pod, they need to listen on different ports to avoid conflicts. `Transformer` is configured to listen on ports 8080 (REST) and 8081 (gRPC),
+while `Predictor` listens on port 8085 (REST). `Transformer` calls `Predictor` on port 8085 via a local socket.
Deploy the `InferenceService` using the command below.

```bash
@@ -28,24 +28,44 @@ metadata:
spec:
  predictor:
    containers:
-      - name: kserve-container
-        image: kserve/custom-model-grpc:latest
+      - name: kserve-container # Do not change the name; this should be the predictor container
+        image: "pytorch/torchserve:0.9.0-cpu"
        args:
-          - --model_name=custom-model
-          - --grpc_port=8082
-          - --http_port=8080
-      - image: kserve/image-transformer:latest
-        name: transformer-container # Do not change the container name
+          - "torchserve"
+          - "--start"
+          - "--model-store=/mnt/models/model-store"
+          - "--ts-config=/mnt/models/config/config.properties"
+        env:
+          - name: TS_SERVICE_ENVELOPE
+            value: kserve
+          - name: STORAGE_URI # This triggers the storage initializer; it should only be present in the predictor container
+            value: "gs://kfserving-examples/models/torchserve/image_classifier/v1"
+        resources:
+          requests:
+            cpu: 100m
+            memory: 256Mi
+          limits:
+            cpu: 1
+            memory: 1Gi
+      - name: transformer-container # Do not change the container name
+        image: kserve/image-transformer:latest
        args:
-          - --model_name=custom-model
-          - --protocol=grpc-v2
-          - --http_port=8000
+          - --model_name=mnist
+          - --protocol=v1 # protocol of the predictor; used to convert the input to the protocol supported by the predictor
+          - --http_port=8080
          - --grpc_port=8081
-          - --predictor_host=localhost:8082
+          - --predictor_host=localhost:8085 # predictor listening port
        ports:
-          - containerPort: 8000
+          - containerPort: 8080
            protocol: TCP
+        resources:
+          requests:
+            cpu: 100m
+            memory: 256Mi
+          limits:
+            cpu: 1
+            memory: 1Gi
EOF
```
!!! success "Expected output"
@@ -57,8 +77,18 @@ EOF
    Always use the transformer container name as `transformer-container`. Otherwise, the model volume is not mounted to the transformer
    container, which may result in an error.

+!!! Warning
+    Always use the predictor container name as `kserve-container`. KServe internally uses this name to identify the
+    predictor. The storage URI should only be present in this container; if it is specified in the transformer
+    container, the isvc creation will fail.
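
A quick way to confirm that the storage URI was picked up from `kserve-container` is to check that the storage initializer ran as an init container of the collocated pod. This is an illustrative sketch, not part of the original docs; the label selector and the `storage-initializer` name are assumptions based on typical KServe deployments.

```bash
# Inspect the pod's init containers; the storage initializer is injected only because
# STORAGE_URI is set on kserve-container (label selector and names are assumptions).
kubectl get pods -l serving.kserve.io/inferenceservice=custom-transformer-collocation \
  -o jsonpath='{.items[0].spec.initContainers[*].name}'
# Typically prints: storage-initializer
```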

-!!! Note
-    Currently, The collocation support is limited to the custom container spec for kserve model container.

+!!! Note
+    Currently, collocation support is limited to the custom container spec for the kserve model container.
+    In Serverless mode, specifying ports for the predictor will result in isvc creation failure, as specifying multiple ports
+    is not supported by Knative. Due to this limitation, the predictor cannot be exposed outside the cluster.
+    For more info, see the [Knative discussion on multiple ports](https://github.com/knative/serving/issues/8471).
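
To sanity-check the port layout described earlier (transformer on 8080/8081, predictor on 8085), you can list the containers and their declared ports in the collocated pod. This sketch is not part of the original docs; the label selector is an assumption based on how KServe usually labels predictor pods.

```bash
# Show each container in the collocated pod together with its declared container ports
# (the label selector is an assumption; adjust to your cluster if needed).
kubectl get pods -l serving.kserve.io/inferenceservice=custom-transformer-collocation \
  -o jsonpath='{range .items[0].spec.containers[*]}{.name}{"\t"}{.ports[*].containerPort}{"\n"}{end}'
# Expect kserve-container and transformer-container, with the transformer exposing port 8080.
```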

## Check InferenceService status
```bash
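# The unchanged status-check lines are collapsed in this diff; a typical check
# (an assumption, following the surrounding KServe docs) is:
kubectl get inferenceservices custom-transformer-collocation
```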
Expand All @@ -82,35 +112,34 @@ Now, [determine the ingress IP and ports](../../../../get_started/first_isvc.md#

```bash
SERVICE_NAME=custom-transformer-collocation
-MODEL_NAME=custom-model
+MODEL_NAME=mnist
INPUT_PATH=@./input.json
SERVICE_HOSTNAME=$(kubectl get inferenceservice $SERVICE_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
```
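
If your cluster uses an Istio ingress gateway exposed through an external load balancer, one common way to set `INGRESS_HOST` and `INGRESS_PORT` is sketched below; this is an assumption-based example, and other environments (NodePort, port-forwarding) are covered in the linked guide.

```bash
# Assumes an Istio ingress gateway exposed through a LoadBalancer service.
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway \
  -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
```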
You can use `curl` to send the inference request as:
```bash
-curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" -d $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/$MODEL_NAME/infer
+curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" -d $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict
```

!!! success "Expected output"
```{ .bash .no-copy }
* Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
-> POST /v2/models/custom-model/infer HTTP/1.1
+> POST /v1/models/mnist:predict HTTP/1.1
> Host: custom-transformer-collocation.default.example.com
> User-Agent: curl/7.85.0
> Accept: */*
> Content-Type: application/json
-> Content-Length: 105396
+> Content-Length: 427
>
* We are completely uploaded and fine
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
-< content-length: 298
+< content-length: 19
< content-type: application/json
-< date: Thu, 04 May 2023 10:35:30 GMT
+< date: Sat, 02 Dec 2023 09:13:16 GMT
< server: istio-envoy
-< x-envoy-upstream-service-time: 1273
+< x-envoy-upstream-service-time: 315
<
* Connection #0 to host localhost left intact
-{"model_name":"custom-model","model_version":null,"id":"d685805f-a310-4690-9c71-a2dc38085d6f","parameters":null,"outputs":[{"name":"output-0","shape":[1,5],"datatype":"FP32","parameters":null,"data":[14.975618362426758,14.036808967590332,13.966032028198242,12.252279281616211,12.086268424987793]}]}
+{"predictions":[2]}
```
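
If the ingress gateway is not reachable from outside the cluster, an alternative is to port-forward it and send the same request to localhost. This is a sketch under assumptions: the gateway service name and namespace below correspond to a default Istio install.

```bash
# Forward the Istio ingress gateway locally (service name/namespace are assumptions).
kubectl -n istio-system port-forward svc/istio-ingressgateway 8080:80 &
# Send the same prediction request through the forwarded port.
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" \
  -d $INPUT_PATH http://localhost:8080/v1/models/$MODEL_NAME:predict
```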
