diff --git a/docs/get_started/first_isvc.md b/docs/get_started/first_isvc.md
index afd3d8ece..9c9f1504d 100644
--- a/docs/get_started/first_isvc.md
+++ b/docs/get_started/first_isvc.md
@@ -189,6 +189,22 @@ Depending on your setup, use one of the following commands to curl the `InferenceService`:
     curl -v -H "Content-Type: application/json" http://sklearn-iris.kserve-test/v1/models/sklearn-iris:predict -d @./iris-input.json
     ```
+=== "Inference Python Client"
+
+    If you want to use the `InferenceRESTClient` to perform the inference, you can follow the example below.
+    Note that `infer` is a coroutine, so it has to be awaited from inside an async function or an async REPL.
+    ```python
+    from kserve import RESTConfig, InferenceRESTClient
+
+    config = RESTConfig(protocol="v1", retries=5, timeout=30)
+    client = InferenceRESTClient(config)
+    base_url = "http://sklearn-iris.kserve-test"
+    data = {"instances": [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]]}
+    model_name = "sklearn-iris"
+    result = await client.infer(base_url, data, model_name=model_name)
+    print(result)
+    ```
+
 You should see two predictions returned (i.e. `{"predictions": [1, 1]}`). Both sets of data points sent for inference correspond to the flower with index `1`. In this case, the model predicts that both flowers are "Iris Versicolour".

diff --git a/docs/inference_client/doc.md b/docs/inference_client/doc.md
new file mode 100644
index 000000000..f01f3e954
--- /dev/null
+++ b/docs/inference_client/doc.md
@@ -0,0 +1,19 @@

### Inference REST Client API Reference

Class | Method | Description
------------ | ------------- | -------------
[InferenceRESTClient](inference_rest_client.md) | [infer](inference_rest_client.md#infer) | Runs asynchronous inference using the supplied data. |
[InferenceRESTClient](inference_rest_client.md) | [explain](inference_rest_client.md#explain) | Runs asynchronous explanation using the supplied data. |
[InferenceRESTClient](inference_rest_client.md) | [is_server_ready](inference_rest_client.md#is_server_ready) | Checks if the inference server is ready. |
[InferenceRESTClient](inference_rest_client.md) | [is_server_live](inference_rest_client.md#is_server_live) | Checks if the inference server is live. |
[InferenceRESTClient](inference_rest_client.md) | [is_model_ready](inference_rest_client.md#is_model_ready) | Checks if the specified model is ready. |

### Inference GRPC Client API Reference

Class | Method | Description
------------ | ------------- | -------------
[InferenceGRPCClient](inference_grpc_client.md) | [infer](inference_grpc_client.md#infer) | Runs asynchronous inference using the supplied data. |
[InferenceGRPCClient](inference_grpc_client.md) | [is_server_ready](inference_grpc_client.md#is_server_ready) | Checks if the inference server is ready. |
[InferenceGRPCClient](inference_grpc_client.md) | [is_server_live](inference_grpc_client.md#is_server_live) | Checks if the inference server is live. |
[InferenceGRPCClient](inference_grpc_client.md) | [is_model_ready](inference_grpc_client.md#is_model_ready) | Checks if the specified model is ready. |

diff --git a/docs/inference_client/inference_grpc_client.md b/docs/inference_client/inference_grpc_client.md
new file mode 100644
index 000000000..fb5c4e440
--- /dev/null
+++ b/docs/inference_client/inference_grpc_client.md
@@ -0,0 +1,214 @@

# Inference GRPC Client

> InferenceGRPCClient(url, verbose=None, use_ssl=None, root_certificates=None, private_key=None, certificate_chain=None, creds=None, channel_args=None, timeout=60)

This asynchronous client provides methods to communicate with an inference server using the gRPC protocol.
This feature is currently in alpha and may be subject to change.
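
For orientation, here is a minimal construction sketch; the target addresses, certificate path, and timeout below are placeholder values for illustration, not defaults taken from this page:

```python
from kserve import InferenceGRPCClient

# Plain-text channel to a local server (placeholder address).
client = InferenceGRPCClient(url="localhost:8081", timeout=30)

# SSL-enabled channel verified against a PEM CA bundle (placeholder path).
secure_client = InferenceGRPCClient(
    url="grpc.example.com:443",
    use_ssl=True,
    root_certificates="/path/to/ca.pem",
)
```

All methods of the client are coroutines and must be awaited from within an event loop.
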
!!! note
    This client uses a default retry configuration. To override it, explicitly provide 'method_config' in the channel
    options; to disable retries, set the channel option ("grpc.enable_retries", 0).
    The default configuration applies the following retry policy to all methods:
    ```json
    {
      "methodConfig": [
        {
          "name": [{}],
          "retryPolicy": {
            "maxAttempts": 3,
            "initialBackoff": "0.1s",
            "maxBackoff": "1s",
            "backoffMultiplier": 2,
            "retryableStatusCodes": ["UNAVAILABLE"]
          }
        }
      ]
    }
    ```

### Parameters

parameter | Type | Description | Notes
------------ | ------------- | ------------- | -------------
url | str | Inference server URL as a string | required
verbose | bool | A boolean to enable verbose logging. Defaults to False. | optional
use_ssl | bool | A boolean value indicating whether to use an SSL-enabled channel (True) or not (False). If creds is provided, the client will use an SSL-enabled channel regardless of the specified value. | optional
root_certificates | str | Path to the PEM-encoded root certificates file as a string, or None to retrieve them from a default location chosen by the gRPC runtime. If creds is provided, this will be ignored. | optional
private_key | str | Path to the PEM-encoded private key file as a string, or None if no private key should be used. If creds is provided, this will be ignored. | optional
certificate_chain | str | Path to the PEM-encoded certificate chain file as a string, or None if no certificate chain should be used. If creds is provided, this will be ignored. | optional
creds | grpc.ChannelCredentials | A ChannelCredentials instance for secure channel communication. | optional
channel_args | List[Tuple[str, Any]] | A list of key-value pairs (channel_arguments in gRPC Core runtime) to configure the channel. | optional
timeout | float | The maximum end-to-end time, in seconds, the request is allowed to take. By default, the client timeout is 60 seconds. To disable the timeout, explicitly set it to 'None'. | optional

The APIs for InferenceGRPCClient are as follows:

Class | Method | Description
------------ | ------------- | -------------
InferenceGRPCClient | [infer](#infer) | Runs asynchronous inference using the supplied data. |
InferenceGRPCClient | [is_server_ready](#is_server_ready) | Checks if the inference server is ready. |
InferenceGRPCClient | [is_server_live](#is_server_live) | Checks if the inference server is live. |
InferenceGRPCClient | [is_model_ready](#is_model_ready) | Checks if the specified model is ready. |

## infer()

> infer(infer_request, timeout=USE_CLIENT_DEFAULT, headers=None) ~async~

This method sends an inference request to the server and returns the inference response.
It supports asynchronous execution and allows for optional timeout and headers customization.

### Example

```python
from kserve import InferenceGRPCClient, InferRequest

client = InferenceGRPCClient(url="localhost:8081")
infer_request = InferRequest(...)
headers = [("header-key", "header-value")]

response = await client.infer(infer_request, timeout=30, headers=headers)
print(response)
```

### Parameters

Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
infer_request | InferRequest | Inference input data as an InferRequest or ModelInferRequest object | required
timeout | float | The maximum end-to-end time, in seconds, the request is allowed to take. The default value is 60 seconds. To disable the timeout, explicitly set it to 'None'. This will override the client's timeout. Defaults to USE_CLIENT_DEFAULT. | optional
headers | Union[grpc.aio.Metadata, Sequence[Tuple[str, str]], None] | Additional headers to be transmitted with the request. Defaults to None. | optional

### Returns

Return Type: `InferResponse`

Inference output as an InferResponse object

### Raises

`InvalidInput`: If the input format is invalid.

`grpc.RpcError`: For a non-OK-status response.

## is_server_ready()

> is_server_ready(timeout=USE_CLIENT_DEFAULT, headers=None) ~async~

Check if the inference server is ready.

This asynchronous method sends a readiness request to the inference server and returns a boolean indicating
whether the server is ready to handle requests.

### Example

```python
from kserve import InferenceGRPCClient

client = InferenceGRPCClient(...)
is_ready = await client.is_server_ready(timeout=30.0)
if is_ready:
    print("Server is ready to handle requests.")
else:
    print("Server is not ready.")
```

### Parameters

Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
timeout | float | The maximum time, in seconds, allowed for the request to complete. The default value is 60 seconds. To disable the timeout, explicitly set it to 'None'. This value will override the client's default timeout if specified. | optional
headers | Union[grpc.aio.Metadata, Sequence[Tuple[str, str]]] | Additional headers to include in the request. This can be useful for passing metadata such as authentication tokens. | optional

### Returns

Return Type: `bool`

True if the server is ready, False otherwise

### Raises

`grpc.RpcError`: If the server responds with a non-OK status, an RpcError is raised. This can occur due to network
issues or server-side errors.

## is_server_live()

> is_server_live(timeout=USE_CLIENT_DEFAULT, headers=None) ~async~

Check if the inference server is live.

This asynchronous method sends a request to the inference server to check its liveness status.

### Example

```python
from kserve import InferenceGRPCClient

client = InferenceGRPCClient(...)
is_live = await client.is_server_live(timeout=30.0)
if is_live:
    print("Server is live")
else:
    print("Server is not live")
```

### Parameters

Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
timeout | float | The maximum time, in seconds, allowed for the request to complete. The default value is 60 seconds. To disable the timeout, explicitly set it to 'None'. This value will override the client's default timeout if specified. | optional
headers | Union[grpc.aio.Metadata, Sequence[Tuple[str, str]]] | Additional headers to include in the request. This can be useful for passing metadata such as authentication tokens. | optional

### Returns

Return Type: `bool`

True if the server is live, False if the server is not live

### Raises

`grpc.RpcError`: If the server responds with a non-OK status, an RpcError is raised. This can occur due to network
issues or server-side errors.

## is_model_ready()

> is_model_ready(model_name, timeout=USE_CLIENT_DEFAULT, headers=None) ~async~

Check if the specified model is ready.

This asynchronous method sends a request to check the readiness of a model by its name.

### Example

```python
from kserve import InferenceGRPCClient

client = InferenceGRPCClient(...)
is_ready = await client.is_model_ready("my_model")
if is_ready:
    print("Model is ready for inference.")
else:
    print("Model is not ready.")
```

### Parameters

Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
model_name | str | The name of the model to check for readiness. | required
timeout | float | The maximum time, in seconds, allowed for the request to complete. The default value is 60 seconds. To disable the timeout, explicitly set it to 'None'. This value will override the client's default timeout if specified. | optional
headers | Union[grpc.aio.Metadata, Sequence[Tuple[str, str]]] | Additional headers to include in the request. This can be useful for passing metadata such as authentication tokens. | optional

### Returns

Return Type: `bool`

`True` if the model is ready, `False` if the model is not ready

### Raises

`grpc.RpcError`: If the server responds with a non-OK status, an RpcError is raised. This can occur due to network
issues or server-side errors.

diff --git a/docs/inference_client/inference_rest_client.md b/docs/inference_client/inference_rest_client.md
new file mode 100644
index 000000000..fa9aaeb95
--- /dev/null
+++ b/docs/inference_client/inference_rest_client.md
@@ -0,0 +1,264 @@

# Inference REST Client

> InferenceRESTClient(config: RESTConfig = None)

InferenceRESTClient is designed to interact with inference servers that follow the V1 and V2 protocols for model serving.
It provides methods to perform inference, explanation, and health checks on the server and models. This feature is currently in alpha and may be subject to change.

parameter | Description
------------ | -------------
config ([RESTConfig](#restconfig)) | Configuration for the REST client, including server protocol, timeout settings, and authentication. |

Initializes the InferenceRESTClient with the given configuration. If no configuration is provided, a default RESTConfig is used.

### RESTConfig

> RESTConfig(transport=None, protocol="v1", retries=3, http2=False, timeout=60, cert=None, verify=True, auth=None, verbose=False)

Configuration class for REST client settings.

parameter | Type | Description
------------ | ------------- | -------------
transport | httpx.AsyncBaseTransport | Custom transport for HTTP requests. |
protocol | Union[str, PredictorProtocol] | Protocol version, "v1" or "v2"; default is "v1". |
retries | int | Number of retries for HTTP requests, default is 3. |
http2 | bool | Whether to use HTTP/2, default is False. |
timeout | Union[float, None, tuple, httpx.Timeout] | Timeout setting for HTTP requests, default is 60 seconds. |
cert | | SSL certificate to use for the requests. |
verify | Union[str, bool, ssl.SSLContext] | SSL verification setting, default is True. |
auth | | Authentication credentials for HTTP requests. |
verbose | bool | Whether to enable verbose logging, default is False. |

A short construction sketch combining these options appears after the method table below.

The APIs for InferenceRESTClient are as follows:

Class | Method | Description
------------ | ------------- | -------------
InferenceRESTClient | [infer](#infer) | Runs asynchronous inference using the supplied data. |
InferenceRESTClient | [explain](#explain) | Runs asynchronous explanation using the supplied data. |
InferenceRESTClient | [is_server_ready](#is_server_ready) | Checks if the inference server is ready. |
InferenceRESTClient | [is_server_live](#is_server_live) | Checks if the inference server is live. |
InferenceRESTClient | [is_model_ready](#is_model_ready) | Checks if the specified model is ready. |
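
As a sketch of how the RESTConfig options above can be combined (the protocol choice, timeout values, and CA bundle path are illustrative placeholders, not recommendations from this page):

```python
import httpx

from kserve import RESTConfig, InferenceRESTClient

# V2 protocol client with retries, a fine-grained httpx timeout, and
# SSL verification against a custom CA bundle (placeholder path).
config = RESTConfig(
    protocol="v2",
    retries=5,
    timeout=httpx.Timeout(30.0, connect=5.0),
    verify="/path/to/ca-bundle.pem",
)
client = InferenceRESTClient(config)
```

The RESTConfig values apply to every request made through the client; per-request timeouts and headers can still be passed to the individual methods below.
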
## infer()

> infer(base_url, data, model_name=None, headers=None, response_headers=None, is_graph_endpoint=False, timeout=USE_CLIENT_DEFAULT) ~async~

Perform inference by sending a request to the specified model endpoint.

### Example

```python
from kserve import RESTConfig, InferenceRESTClient

config = RESTConfig(protocol="v2", retries=5, timeout=30)
client = InferenceRESTClient(config)
base_url = "https://example.com:443"
data = {"inputs": [{"name": "input_1", "shape": [1, 3], "datatype": "FP32", "data": [1.0, 2.0, 3.0]}]}
model_name = "example_model"
headers = {"Authorization": "Bearer YOUR_TOKEN"}
response_headers = {}
result = await client.infer(base_url, data, model_name=model_name, headers=headers, response_headers=response_headers)
print(result)
```

### Parameters

Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
base_url | Union[httpx.URL, str] | The base URL of the inference server | Required
data | Union[InferRequest, dict] | Input data as an InferRequest object or a dict | Required
model_name | str | The name of the model to be used for inference | Required unless is_graph_endpoint is True
headers | Mapping[str, str] | HTTP headers to include when sending the request | Optional
response_headers | Dict[str, str] | Dictionary to store response headers | Optional
is_graph_endpoint | bool | Flag indicating if the endpoint is a graph endpoint. Default value is False | Optional
timeout | Union[float, None, tuple, httpx.Timeout] | Timeout configuration for the request. Default value is 60 seconds | Optional

### Returns

Return Type: `Union[InferResponse, Dict]`

The inference response, either as an InferResponse object or a dictionary

### Raises

`ValueError`: If model_name is None and not using a graph endpoint.

`UnsupportedProtocol`: If the protocol specified in the configuration is not supported.

`HTTPStatusError`: If the response status code indicates an error.

## explain()

> explain(base_url, model_name, data, headers=None, timeout=USE_CLIENT_DEFAULT) ~async~

Sends an asynchronous request to the model server to get an explanation for the given input data. Only supports the V1 protocol.

### Example

```python
from kserve import RESTConfig, InferenceRESTClient

config = RESTConfig(protocol="v1", retries=5, timeout=30)
client = InferenceRESTClient(config)
base_url = "https://example.com:443"
model_name = "my_model"
data = {"instances": [[1.0, 2.0, 5.0]]}
headers = {"Authorization": "Bearer my_token"}

result = await client.explain(base_url, model_name, data, headers=headers)
print(result)
```

### Parameters

Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
base_url | Union[httpx.URL, str] | The base URL of the model server | Required
model_name | str | The name of the model for which to get an explanation | Required
data | dict | The input data for the model | Required
headers | Mapping[str, str] | HTTP headers to include in the request | Optional
timeout | Union[float, None, tuple, httpx.Timeout] | Timeout configuration for the request | Optional

### Returns

Return Type: `dict`

The explanation response from the model server as a dict.

### Raises

`UnsupportedProtocol`: If the protocol specified in the configuration is not supported.

`HTTPStatusError`: If the response status code indicates an error.

## is_server_ready()

> is_server_ready(base_url, headers=None, timeout=USE_CLIENT_DEFAULT) ~async~

Check if the inference server is ready. Only supports the V2 protocol.

### Example

```python
from kserve import RESTConfig, InferenceRESTClient

config = RESTConfig(protocol="v2", retries=5, timeout=30)
client = InferenceRESTClient(config)
is_ready = await client.is_server_ready("https://example.com:443")
if is_ready:
    print("Server is ready")
else:
    print("Server is not ready")
```

### Parameters

Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
base_url | Union[httpx.URL, str] | The base URL of the model server | Required
headers | Mapping[str, str] | HTTP headers to include in the request | Optional
timeout | Union[float, None, tuple, httpx.Timeout] | Timeout configuration for the request | Optional

### Returns

Return Type: `bool`

- `True`: if the Inference Server is ready
- `False`: if the Inference Server is not ready

### Raises

`UnsupportedProtocol`: If the protocol specified in the configuration is not supported.

`HTTPStatusError`: If the response status code indicates an error.

## is_server_live()

> is_server_live(base_url, headers=None, timeout=USE_CLIENT_DEFAULT) ~async~

Return the liveness status of the inference server.

### Example

```python
from kserve import RESTConfig, InferenceRESTClient

config = RESTConfig(protocol="v2", retries=5, timeout=30)
client = InferenceRESTClient(config)
is_live = await client.is_server_live("https://example.com:443")
if is_live:
    print("Server is live")
else:
    print("Server is not live")
```

### Parameters

Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
base_url | Union[httpx.URL, str] | The base URL of the model server | Required
headers | Mapping[str, str] | HTTP headers to include in the request | Optional
timeout | Union[float, None, tuple, httpx.Timeout] | Timeout configuration for the request | Optional

### Returns

Return Type: `bool`

- `True`: if the Inference Server is live
- `False`: if the Inference Server is not live

### Raises

`UnsupportedProtocol`: If the protocol specified in the configuration is not supported.

`HTTPStatusError`: If the response status code indicates an error.

## is_model_ready()

> is_model_ready(base_url, model_name, headers=None, timeout=USE_CLIENT_DEFAULT) ~async~

Return the readiness status of the specified model.

### Example

```python
from kserve import RESTConfig, InferenceRESTClient

config = RESTConfig(protocol="v2", retries=5, timeout=30)
client = InferenceRESTClient(config)
base_url = "https://example.com:443"
model_name = "my_model"
is_ready = await client.is_model_ready(base_url, model_name)
if is_ready:
    print("Model is ready")
else:
    print("Model is not ready")
```

### Parameters

Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
base_url | Union[httpx.URL, str] | The base URL of the model server | Required
model_name | str | The name of the model to check for readiness | Required
headers | Mapping[str, str] | HTTP headers to include in the request | Optional
timeout | Union[float, None, tuple, httpx.Timeout] | Timeout configuration for the request | Optional

### Returns

Return Type: `bool`

- `True`: if the Model is ready
- `False`: if the Model is not ready

### Raises

`UnsupportedProtocol`: If the protocol specified in the configuration is not supported.

`HTTPStatusError`: If the response status code indicates an error.
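
The per-method examples above call `await` at the top level for brevity; in a script they need to run inside an event loop. A minimal end-to-end sketch, assuming the V1 `sklearn-iris` service from the Get Started guide (the URL and model name are placeholders for your own deployment):

```python
import asyncio

from kserve import RESTConfig, InferenceRESTClient


async def main():
    config = RESTConfig(protocol="v1", retries=3, timeout=30)
    client = InferenceRESTClient(config)
    base_url = "http://sklearn-iris.kserve-test"  # placeholder URL
    model_name = "sklearn-iris"                   # placeholder model name

    # Check readiness before sending traffic.
    if not await client.is_model_ready(base_url, model_name):
        raise RuntimeError(f"Model {model_name} is not ready")

    data = {"instances": [[6.8, 2.8, 4.8, 1.4]]}
    result = await client.infer(base_url, data, model_name=model_name)
    print(result)


asyncio.run(main())
```
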
diff --git a/mkdocs.yml b/mkdocs.yml
index d6670bb56..7b4bca63e 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -109,6 +109,7 @@ nav:
     - Open Inference Protocol API Spec: reference/swagger-ui.md
     - Python Client SDK: sdk_docs/sdk_doc.md
     - Python Runtime Server SDK: python_runtime_api/docs/index.md
+    - Inference Python Client: inference_client/doc.md
   - Developer Guide:
     - How to contribute: developer/developer.md
     - Debugging guide: developer/debug.md