
Using Deployment Scripts

The scripts require kubectl, with its context set to the namespace where ModelMesh Serving is installed (usually modelmesh-serving). Below are examples of deploying and removing models from the ModelMesh Serving instance using the included shell scripts.
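
For example, the context namespace can be set as follows (a minimal sketch, assuming the default modelmesh-serving namespace):

# Point kubectl at the namespace where ModelMesh Serving is installed
kubectl config set-context --current --namespace=modelmesh-serving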

cd multi_model_tester

Deploying Models

# Deploy 5000 Mnist Onnx models named mnist-onnx-1 to mnist-onnx-5000 at concurrency of 10
./deployNpredictors.sh 10 mnist-onnx 1 5000 deploy_1mnist_onnx_predictor.sh
# Deploy simple string models named simple-string-tf-1000 to simple-string-tf-2000 at concurrency of 30
./deployNpredictors.sh 30 simple-string-tf 1000 2000 deploy_1simple_string_tf_predictor.sh
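
After a deploy run completes, the number of created predictors can be spot-checked with kubectl (a sketch; assumes the Predictor custom resource used by ModelMesh Serving and the mnist-onnx naming from the example above):

# Count predictors whose names start with mnist-onnx-
kubectl get predictors --no-headers | grep -c "^mnist-onnx-"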

Deleting Models

# Remove Mnist Onnx models named mnist-onnx-1 to mnist-onnx-5000 at concurrency of 10
./rmNpredictors.sh 10 mnist-onnx 1 5000 rm_1mnist_onnx_predictor.sh
# Remove simple string models named simple-string-tf-1000 to simple-string-tf-2000 at concurrency of 30
./rmNpredictors.sh 30 simple-string-tf 1000 2000 rm_1simple_string_tf_predictor.sh
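
Deletion can be confirmed the same way, for example by checking that no predictors with the prefix remain (again assuming the Predictor custom resource):

# Should print 0 once all mnist-onnx predictors have been removed
kubectl get predictors --no-headers 2>/dev/null | grep -c "^mnist-onnx-"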

Using the multi_model_tester

The multi_model_tester is written in Go and uses a modified Trunks library that applies a round_robin load-balancing policy and targets dns:///modelmesh-serving:8033.

The tester generates a mix of up to six heterogeneous model inference requests to stress the Inference Request gRPC API of the ModelMesh Serving instance.

Build:

cd multi_model_tester
go build .

Usage:

./multi_model_test -h

Usage of ./multi_model_test:
  -cp uint
    	Number of connections to create. Default: 1 (default 1)
  -debug
    	Sends 1 request and view response. Default: false
  -dur int
    	Test duration in seconds. Default: 1 (default 1)
  -ma string
    	List of different types models separate by space. Available options: SimpleStringTF, MnistSklearn, MushroomXgboost, CifarPytorch, MushroomLightgbm, MnistOnnx
  -npm int
    	Number of model name 1 to npm per model to generate (default 1)
  -qps int
    	Constant Queries Per Second to hold. Default: 1 (default 1)
  -u string
    	Inference Server URL. Default: localhost:8033 (default "localhost:8033")
  -wp uint
    	Number of worker pool. Default: 1 (default 1)

Examples:

The following examples run the tester from inside the same namespace, targeting the cluster DNS address and port of the ModelMesh Serving service.
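
If the tester is run from outside the cluster instead, a kubectl port-forward to the service can be used (a sketch; assumes the default modelmesh-serving service name and gRPC port 8033):

# Forward the ModelMesh Serving gRPC port to localhost
kubectl port-forward svc/modelmesh-serving 8033:8033
# In another shell, point the tester at the forwarded port (the tester's default URL)
./multi_model_test -u localhost:8033 -npm 2000 -ma "SimpleStringTF" -qps 1000 -dur 60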

Target the Simple-String models named simple-string-tf-1 to simple-string-tf-2000 at 1000 queries per second for 60 seconds.

./multi_model_test -u dns:///modelmesh-serving:8033 -npm 2000 -ma "SimpleStringTF" -qps 1000 -dur 60

Target the Simple-String and MnistOnnx models named simple-string-tf-1 to simple-string-tf-2000 and mnist-onnx-1 to mnist-onnx-2000 at 1000 queries per second for 60 seconds.

./multi_model_test -u dns:///modelmesh-serving:8033 -npm 2000 -ma "SimpleStringTF MnistOnnx" -qps 1000 -dur 60

Sometimes it is useful to verify that all of the models can respond. In debug mode, the tester sends inference requests to the Simple-String and MnistOnnx models named simple-string-tf-1 to simple-string-tf-2000 and mnist-onnx-1 to mnist-onnx-2000 sequentially and prints the responses.

./multi_model_test -u dns:///modelmesh-serving:8033 -npm 2000 -ma "SimpleStringTF MnistOnnx" -debug
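
A mixed run that exercises all six supported model types can be generated in the same way (a sketch; assumes models of each type have already been deployed with the names the tester expects):

# Spread 500 QPS across all six model types for 60 seconds
./multi_model_test -u dns:///modelmesh-serving:8033 -npm 100 -ma "SimpleStringTF MnistSklearn MushroomXgboost CifarPytorch MushroomLightgbm MnistOnnx" -qps 500 -dur 60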