Update transformer collocation docs for specifying storage uri (#323)
Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
sivanantha321 authored Dec 25, 2023
1 parent 6ef91ca commit 0b9c87c
Showing 1 changed file with 54 additions and 25 deletions.
79 changes: 54 additions & 25 deletions docs/modelserving/v1beta1/transformer/collocation/README.md
@@ -15,8 +15,8 @@ KServe by default deploys the Transformer and Predictor as separate services, al

## Deploy the InferenceService

-Since, the predictor and the transformer are in the same pod, they need to listen on different ports to avoid conflict. `Transformer` is configured to listen on port 8000 and 8081
-while, `Predictor` listens on port 8080 and 8082. `Transformer` calls `Predictor` on port 8082 via local socket.
+Since the predictor and the transformer are in the same pod, they need to listen on different ports to avoid conflicts. `Transformer` is configured to listen on ports 8080 (REST) and 8081 (gRPC),
+while `Predictor` listens on port 8085 (REST). `Transformer` calls `Predictor` on port 8085 via a local socket.
Deploy the `InferenceService` using the command below.

```bash
@@ -28,24 +28,44 @@ metadata:
spec:
  predictor:
    containers:
-      - name: kserve-container
-        image: kserve/custom-model-grpc:latest
+      - name: kserve-container # Do not change the name; this should be the predictor container
+        image: "pytorch/torchserve:0.9.0-cpu"
        args:
-          - --model_name=custom-model
-          - --grpc_port=8082
-          - --http_port=8080
-      - image: kserve/image-transformer:latest
-        name: transformer-container # Do not change the container name
+          - "torchserve"
+          - "--start"
+          - "--model-store=/mnt/models/model-store"
+          - "--ts-config=/mnt/models/config/config.properties"
+        env:
+          - name: TS_SERVICE_ENVELOPE
+            value: kserve
+          - name: STORAGE_URI # This triggers the storage initializer; it should only be present in the predictor container
+            value: "gs://kfserving-examples/models/torchserve/image_classifier/v1"
+        resources:
+          requests:
+            cpu: 100m
+            memory: 256Mi
+          limits:
+            cpu: 1
+            memory: 1Gi
+      - name: transformer-container # Do not change the container name
+        image: kserve/image-transformer:latest
        args:
-          - --model_name=custom-model
-          - --protocol=grpc-v2
-          - --http_port=8000
+          - --model_name=mnist
+          - --protocol=v1 # protocol of the predictor; used to convert the input to the protocol supported by the predictor
+          - --http_port=8080
          - --grpc_port=8081
-          - --predictor_host=localhost:8082
+          - --predictor_host=localhost:8085 # predictor listening port
        ports:
-          - containerPort: 8000
+          - containerPort: 8080
            protocol: TCP
+        resources:
+          requests:
+            cpu: 100m
+            memory: 256Mi
+          limits:
+            cpu: 1
+            memory: 1Gi
EOF
```
!!! success "Expected output"
@@ -57,8 +77,18 @@ EOF
    Always use the transformer container name as `transformer-container`. Otherwise, the model volume is not mounted to the transformer
    container, which may result in an error.

+!!! Warning
+    Always use the predictor container name as `kserve-container`. KServe internally uses this name to identify the
+    predictor. The storage URI should only be present in this container; if it is specified in the transformer
+    container, the isvc creation will fail.
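
A quick way to confirm that the storage URI was picked up from `kserve-container` is to check that the storage initializer ran as an init container of the collocated pod. This is an illustrative sketch, not part of the original docs; the label selector and the `storage-initializer` name are assumptions based on typical KServe deployments.

```bash
# Inspect the pod's init containers; the storage initializer is injected only because
# STORAGE_URI is set on kserve-container (label selector and names are assumptions).
kubectl get pods -l serving.kserve.io/inferenceservice=custom-transformer-collocation \
  -o jsonpath='{.items[0].spec.initContainers[*].name}'
# Typically prints: storage-initializer
```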

-!!! Note
-    Currently, The collocation support is limited to the custom container spec for kserve model container.

+!!! Note
+    Currently, collocation support is limited to the custom container spec for the kserve model container.
+    In Serverless mode, specifying ports for the predictor will result in isvc creation failure, as specifying multiple ports
+    is not supported by Knative. Due to this limitation, the predictor cannot be exposed outside the cluster.
+    For more info, see the [Knative discussion on multiple ports](https://github.com/knative/serving/issues/8471).
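
To sanity-check the port layout described earlier (transformer on 8080/8081, predictor on 8085), you can list the containers and their declared ports in the collocated pod. This sketch is not part of the original docs; the label selector is an assumption based on how KServe usually labels predictor pods.

```bash
# Show each container in the collocated pod together with its declared container ports
# (the label selector is an assumption; adjust to your cluster if needed).
kubectl get pods -l serving.kserve.io/inferenceservice=custom-transformer-collocation \
  -o jsonpath='{range .items[0].spec.containers[*]}{.name}{"\t"}{.ports[*].containerPort}{"\n"}{end}'
# Expect kserve-container and transformer-container, with the transformer exposing port 8080.
```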

## Check InferenceService status
```bash
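# The unchanged status-check lines are collapsed in this diff; a typical check
# (an assumption, following the surrounding KServe docs) is:
kubectl get inferenceservices custom-transformer-collocation
```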
Expand All @@ -82,35 +112,34 @@ Now, [determine the ingress IP and ports](../../../../get_started/first_isvc.md#

```bash
SERVICE_NAME=custom-transformer-collocation
-MODEL_NAME=custom-model
+MODEL_NAME=mnist
INPUT_PATH=@./input.json
SERVICE_HOSTNAME=$(kubectl get inferenceservice $SERVICE_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
```
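
If your cluster uses an Istio ingress gateway exposed through an external load balancer, one common way to set `INGRESS_HOST` and `INGRESS_PORT` is sketched below; this is an assumption-based example, and other environments (NodePort, port-forwarding) are covered in the linked guide.

```bash
# Assumes an Istio ingress gateway exposed through a LoadBalancer service.
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway \
  -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
```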
You can use `curl` to send the inference request as:
```bash
-curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" -d $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/$MODEL_NAME/infer
+curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" -d $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict
```

!!! success "Expected output"
```{ .bash .no-copy }
* Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
-> POST /v2/models/custom-model/infer HTTP/1.1
+> POST /v1/models/mnist:predict HTTP/1.1
> Host: custom-transformer-collocation.default.example.com
> User-Agent: curl/7.85.0
> Accept: */*
> Content-Type: application/json
-> Content-Length: 105396
+> Content-Length: 427
>
* We are completely uploaded and fine
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
-< content-length: 298
+< content-length: 19
< content-type: application/json
-< date: Thu, 04 May 2023 10:35:30 GMT
+< date: Sat, 02 Dec 2023 09:13:16 GMT
< server: istio-envoy
-< x-envoy-upstream-service-time: 1273
+< x-envoy-upstream-service-time: 315
<
* Connection #0 to host localhost left intact
-{"model_name":"custom-model","model_version":null,"id":"d685805f-a310-4690-9c71-a2dc38085d6f","parameters":null,"outputs":[{"name":"output-0","shape":[1,5],"datatype":"FP32","parameters":null,"data":[14.975618362426758,14.036808967590332,13.966032028198242,12.252279281616211,12.086268424987793]}]}
+{"predictions":[2]}
```
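
If the ingress gateway is not reachable from outside the cluster, an alternative is to port-forward it and send the same request to localhost. This is a sketch under assumptions: the gateway service name and namespace below correspond to a default Istio install.

```bash
# Forward the Istio ingress gateway locally (service name/namespace are assumptions).
kubectl -n istio-system port-forward svc/istio-ingressgateway 8080:80 &
# Send the same prediction request through the forwarded port.
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" \
  -d $INPUT_PATH http://localhost:8080/v1/models/$MODEL_NAME:predict
```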
