Model loaded via model repository api does not appear after querying it with v2/repository/index endpoint #7066

Open
ogvalt opened this issue Apr 2, 2024 · 20 comments
Assignees
Labels
enhancement New feature or request investigating The developement team is investigating this issue

Comments

ogvalt commented Apr 2, 2024

Description
I've loaded a model via the v2/repository/models/simple/load endpoint.
But when I query the v2/repository/index endpoint, I get [] as the response.

Triton Information
What version of Triton are you using?
2.42.0
Are you using the Triton container or did you build it yourself?
Triton container, version nvcr.io/nvidia/tritonserver:24.01-py3
To Reproduce

  1. I took this model: https://github.com/triton-inference-server/server/tree/main/docs/examples/model_repository/simple
  2. Loaded it with a Python script using tritonclient:
    model_name = "simple"

    config_path = models_repository[model_name]["config"]
    model_path = models_repository[model_name]["model"]

    with open(model_path, "rb") as f:
        model_bytes = f.read()

    json_obj = _pbtxt_to_json(config_path)

    triton_client.load_model(
        model_name=model_name,
        config=json_obj,
        files={
            "file:1/model.graphdef": model_bytes,
        },
    )
  3. Then queried the index:
triton_client.get_model_repository_index()
# returns: []
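
For context, here is a minimal sketch of the request body that a load call with `config` and `files` is expected to produce on the wire. It assumes, per my reading of the model repository extension, that the config travels as a JSON string under `parameters.config` and that each file travels base64-encoded under a `file:<version>/<filename>` key. `build_load_request` is a hypothetical helper for illustration, not part of tritonclient, and the bytes are placeholders.

```python
import base64
import json


def build_load_request(config_json: str, files: dict[str, bytes]) -> dict:
    """Build a JSON body for POST v2/repository/models/<name>/load.

    Assumption: file contents are base64-encoded under keys of the form
    "file:<version>/<filename>", next to a "config" string parameter.
    """
    parameters = {"config": config_json}
    for key, content in files.items():
        if not key.startswith("file:"):
            raise ValueError(f"file key must start with 'file:': {key!r}")
        # JSON cannot carry raw bytes, so encode them as ASCII base64 text.
        parameters[key] = base64.b64encode(content).decode("ascii")
    return {"parameters": parameters}


# Placeholder bytes stand in for the real model.graphdef contents.
body = build_load_request(
    config_json=json.dumps({"name": "simple", "platform": "tensorflow_graphdef"}),
    files={"file:1/model.graphdef": b"\x00\x01\x02"},
)
print(json.dumps(body, indent=2))
```

Nothing in this body refers to a directory under --model-repository, which is the situation this issue is about.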

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
Model mentioned above

Expected behavior
I expect that this code:

triton_client.get_model_repository_index()

will return a response matching this specification:

$repository_index_response =
[
  {
    "name" : $string,
    "version" : $string #optional,
    "state" : $string,
    "reason" : $string
  },
  …
]

https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/protocol/extension_model_repository.html

Contributor

nnshah1 commented Apr 3, 2024

Can you share the corresponding Triton server log?

Contributor

nnshah1 commented Apr 3, 2024

For reference, I was able to do the following locally:

mkdir repro; cd repro
git clone https://github.com/triton-inference-server/server
docker run -it --rm \
  --name triton \
  --gpus all --network host \
   --shm-size=1g --ulimit memlock=-1 \
    -v /tmp:/tmp \
    -v ${PWD}:/workspace \
    -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
    -v ${PWD}/models:/root/models \
    -w /workspace \
     nvcr.io/nvidia/tritonserver:24.01-py3
 tritonserver --model-control-mode=explicit --load-model simple --model-repository=server\
       /docs/examples/model_repository --log-verbose=6 --log-error=1

Expected Output:

<SNIP>
 I0403 05:01:26.112318 155 server.cc:676]
+--------+---------+--------+
| Model  | Version | Status |
+--------+---------+--------+
| simple | 1       | READY  |
+--------+---------+--------+
<SNIP>

and then from a separate shell:

curl --request POST http://localhost:8000/v2/repository/index

Expected Output:

  [{"name":"densenet_onnx"},{"name":"inception_graphdef"},{"name":"simple","version":"1","state":"READY"},{"name":"simple_dyna_sequence"},{"name":"simple_identity"},{"name":"simple_int8"},{"name":"simple_sequence"},{"name":"simple_string"}]
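
To make that output easier to read: entries for models that are present in the repository but not loaded carry only a name, while the loaded model also reports a version and state. A small sketch that splits the response accordingly (`split_by_state` is a hypothetical helper name; the JSON is copied from the output above):

```python
import json

# Index response copied from the curl output above.
index_response = (
    '[{"name":"densenet_onnx"},{"name":"inception_graphdef"},'
    '{"name":"simple","version":"1","state":"READY"},'
    '{"name":"simple_dyna_sequence"},{"name":"simple_identity"},'
    '{"name":"simple_int8"},{"name":"simple_sequence"},{"name":"simple_string"}]'
)


def split_by_state(index):
    """Split index entries into those that report a state and those that do not."""
    with_state = {m["name"]: m["state"] for m in index if "state" in m}
    without_state = [m["name"] for m in index if "state" not in m]
    return with_state, without_state


with_state, without_state = split_by_state(json.loads(index_response))
print(with_state)     # {'simple': 'READY'}
print(without_state)  # the seven models that exist on disk but are not loaded
```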

Author

ogvalt commented Apr 3, 2024

@nnshah1 Sorry, I was in a bit of a hurry and missed some key details.

  1. I'm launching my Triton instance with an empty model repository, so my commands look as follows:
docker run -it --rm \
  --name triton \
  --gpus all --network host \
   --shm-size=1g --ulimit memlock=-1 \
     nvcr.io/nvidia/tritonserver:24.01-py3
 tritonserver --model-control-mode=explicit --model-repository=/home --log-verbose=6 --log-error=1
  2. Then I load the simple model using the tritonclient Python SDK, specifically the tritonclient.http.InferenceServerClient class: the load_model method to load the model and the get_model_repository_index method to query the index.

The idea is that I launch tritonserver without any models at all and then load and unload models as I please.

nnshah1 added the investigating label Apr 8, 2024
Contributor

nnshah1 commented Apr 9, 2024

Can you provide the server logs?

I ran the server without loading the model (but still pointing to the example artifacts):

tritonserver --model-control-mode=explicit --model-repository=server/docs/examples/model_repository --log-verbose=6 --log-error=1

And loaded the example model directly:

import tritonclient

import sys

import tritonclient.http as httpclient

if __name__ == "__main__":

    model_name = "simple"

    try:
        triton_client = httpclient.InferenceServerClient(
            url="localhost:8000", verbose=True
        )
    except Exception as e:
        print("context creation failed: " + str(e))
        sys.exit(1)

    triton_client.load_model("simple")

    triton_client.get_model_repository_index()

And everything worked as expected. Can you check that as a sanity test?

My guess is that there is an error either in the pbtxt-to-JSON conversion or in the way the model bytes are loaded.

If you can share the pbtxt-to-JSON conversion code you are using, I can also check whether the exact steps reproduce on our end.

Author

ogvalt commented Apr 9, 2024

@nnshah1
You are pointing your server to a folder that already contains models.
As far as I understand the documentation, the index returns the list of all models in the repository, loaded or not.

But I expect that a model uploaded to the server via the API should show up when I query the index, regardless of whether it exists in the folder that --model-repository points to.
Please correct me if my expectation is wrong.

To reproduce my case, you need to point to an empty model repository, as I suggested:

tritonserver --model-control-mode=explicit --model-repository=/home --log-verbose=6 --log-error=1

Since I'm running tritonserver in Docker, the /home folder is empty in the container.

My use case: I start a Triton container on some server with an empty model repository and then gradually load or unload models as my needs change.

My code for the pbtxt-to-JSON conversion:

import pathlib

import google.protobuf.message
import google.protobuf.text_format
import google.protobuf.json_format
import tritonclient.grpc as tritongrpcclient

def pbtxt_to_json(filepath: pathlib.Path) -> str:
    with open(filepath, "r") as f:
        json_obj = google.protobuf.json_format.MessageToJson(
            google.protobuf.text_format.Parse(
                f.read(), 
                tritongrpcclient.model_config_pb2.ModelConfig()
            )
        )
    return json_obj

Contributor

nnshah1 commented Apr 9, 2024

@ogvalt I understand your use case. Are there any errors on the server side log when loading the model? Can you confirm that loading the example as above (explicitly from a directory via the client) works as well? I'd like to see at which point things diverge from loading the example model directly from disk and when loading by passing the bits in manually.

Author

ogvalt commented Apr 9, 2024

@nnshah1
Understood, I'm working on running your code. Meanwhile, here is the server log you asked for:

I0409 14:33:20.431590 1 cache_manager.cc:480] Create CacheManager with cache_dir: '/opt/tritonserver/caches'
I0409 14:33:20.569581 1 pinned_memory_manager.cc:275] Pinned memory pool is created at '0x71d448000000' with size 268435456
I0409 14:33:20.569750 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0409 14:33:20.570429 1 server.cc:606] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0409 14:33:20.570442 1 server.cc:633] 
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

I0409 14:33:20.570444 1 model_lifecycle.cc:265] ModelStates()
I0409 14:33:20.570451 1 server.cc:676] 
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+

I0409 14:33:20.601711 1 metrics.cc:877] Collecting metrics for GPU 0: NVIDIA GeForce RTX 2070 with Max-Q Design
I0409 14:33:20.603244 1 metrics.cc:770] Collecting CPU metrics
I0409 14:33:20.603362 1 tritonserver.cc:2498] 
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                           |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                          |
| server_version                   | 2.42.0                                                                                                                                                                                                          |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /home                                                                                                                                                                                                           |
| model_control_mode               | MODE_EXPLICIT                                                                                                                                                                                                   |
| strict_model_config              | 0                                                                                                                                                                                                               |
| rate_limit                       | OFF                                                                                                                                                                                                             |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                       |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                                                        |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                             |
| strict_readiness                 | 1                                                                                                                                                                                                               |
| exit_timeout                     | 30                                                                                                                                                                                                              |
| cache_enabled                    | 0                                                                                                                                                                                                               |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0409 14:33:20.603750 1 grpc_server.cc:2426] 
+----------------------------------------------+---------+
| GRPC KeepAlive Option                        | Value   |
+----------------------------------------------+---------+
| keepalive_time_ms                            | 7200000 |
| keepalive_timeout_ms                         | 20000   |
| keepalive_permit_without_calls               | 0       |
| http2_max_pings_without_data                 | 2       |
| http2_min_recv_ping_interval_without_data_ms | 300000  |
| http2_max_ping_strikes                       | 2       |
+----------------------------------------------+---------+

I0409 14:33:20.604148 1 grpc_server.cc:102] Ready for RPC 'Check', 0
I0409 14:33:20.604164 1 grpc_server.cc:102] Ready for RPC 'ServerLive', 0
I0409 14:33:20.604168 1 grpc_server.cc:102] Ready for RPC 'ServerReady', 0
I0409 14:33:20.604172 1 grpc_server.cc:102] Ready for RPC 'ModelReady', 0
I0409 14:33:20.604176 1 grpc_server.cc:102] Ready for RPC 'ServerMetadata', 0
I0409 14:33:20.604180 1 grpc_server.cc:102] Ready for RPC 'ModelMetadata', 0
I0409 14:33:20.604184 1 grpc_server.cc:102] Ready for RPC 'ModelConfig', 0
I0409 14:33:20.604190 1 grpc_server.cc:102] Ready for RPC 'SystemSharedMemoryStatus', 0
I0409 14:33:20.604194 1 grpc_server.cc:102] Ready for RPC 'SystemSharedMemoryRegister', 0
I0409 14:33:20.604198 1 grpc_server.cc:102] Ready for RPC 'SystemSharedMemoryUnregister', 0
I0409 14:33:20.604203 1 grpc_server.cc:102] Ready for RPC 'CudaSharedMemoryStatus', 0
I0409 14:33:20.604206 1 grpc_server.cc:102] Ready for RPC 'CudaSharedMemoryRegister', 0
I0409 14:33:20.604210 1 grpc_server.cc:102] Ready for RPC 'CudaSharedMemoryUnregister', 0
I0409 14:33:20.604215 1 grpc_server.cc:102] Ready for RPC 'RepositoryIndex', 0
I0409 14:33:20.604222 1 grpc_server.cc:102] Ready for RPC 'RepositoryModelLoad', 0
I0409 14:33:20.604225 1 grpc_server.cc:102] Ready for RPC 'RepositoryModelUnload', 0
I0409 14:33:20.604231 1 grpc_server.cc:102] Ready for RPC 'ModelStatistics', 0
I0409 14:33:20.604236 1 grpc_server.cc:102] Ready for RPC 'Trace', 0
I0409 14:33:20.604244 1 grpc_server.cc:102] Ready for RPC 'Logging', 0
I0409 14:33:20.604256 1 grpc_server.cc:359] Thread started for CommonHandler
I0409 14:33:20.604386 1 infer_handler.h:1185] StateNew, 0 Step START
I0409 14:33:20.604400 1 infer_handler.cc:674] New request handler for ModelInferHandler, 0
I0409 14:33:20.604410 1 infer_handler.h:1309] Thread started for ModelInferHandler
I0409 14:33:20.604522 1 infer_handler.h:1185] StateNew, 0 Step START
I0409 14:33:20.604533 1 infer_handler.cc:674] New request handler for ModelInferHandler, 0
I0409 14:33:20.604542 1 infer_handler.h:1309] Thread started for ModelInferHandler
I0409 14:33:20.604606 1 infer_handler.h:1185] StateNew, 0 Step START
I0409 14:33:20.604615 1 stream_infer_handler.cc:128] New request handler for ModelStreamInferHandler, 0
I0409 14:33:20.604624 1 infer_handler.h:1309] Thread started for ModelStreamInferHandler
I0409 14:33:20.604631 1 grpc_server.cc:2519] Started GRPCInferenceService at 0.0.0.0:8001
I0409 14:33:20.604824 1 http_server.cc:4623] Started HTTPService at 0.0.0.0:8000
I0409 14:33:20.645724 1 http_server.cc:315] Started Metrics Service at 0.0.0.0:8002
I0409 14:33:21.357261 1 http_server.cc:4509] HTTP request: 0 /v2/health/ready
I0409 14:33:21.357323 1 model_lifecycle.cc:265] ModelStates()
I0409 14:33:21.373833 1 http_server.cc:4509] HTTP request: 2 /v2/repository/models/simple/load
I0409 14:33:21.378079 1 model_config_utils.cc:680] Server side auto-completed config: name: "simple"
platform: "tensorflow_graphdef"
max_batch_size: 8
input {
  name: "INPUT0"
  data_type: TYPE_INT32
  dims: 16
}
input {
  name: "INPUT1"
  data_type: TYPE_INT32
  dims: 16
}
output {
  name: "OUTPUT0"
  data_type: TYPE_INT32
  dims: 16
}
output {
  name: "OUTPUT1"
  data_type: TYPE_INT32
  dims: 16
}
default_model_filename: "model.graphdef"
backend: "tensorflow"

I0409 14:33:21.378206 1 model_lifecycle.cc:430] AsyncLoad() 'simple'
I0409 14:33:21.378312 1 model_lifecycle.cc:461] loading: simple:1
I0409 14:33:21.378438 1 model_lifecycle.cc:539] CreateModel() 'simple' version 1
I0409 14:33:21.378647 1 backend_model.cc:502] Adding default backend config setting: default-max-batch-size,4
I0409 14:33:21.378692 1 shared_library.cc:112] OpenLibraryHandle: /opt/tritonserver/backends/tensorflow/libtriton_tensorflow.so
W0409 14:33:21.604963 1 metrics.cc:631] Unable to get power limit for GPU 0. Status:Success, value:0.000000
2024-04-09 14:33:21.658999: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9360] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-09 14:33:21.659028: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-09 14:33:21.659052: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1537] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
I0409 14:33:21.666817 1 tensorflow.cc:2577] TRITONBACKEND_Initialize: tensorflow
I0409 14:33:21.666835 1 tensorflow.cc:2587] Triton TRITONBACKEND API version: 1.17
I0409 14:33:21.666838 1 tensorflow.cc:2593] 'tensorflow' TRITONBACKEND API version: 1.17
I0409 14:33:21.666841 1 tensorflow.cc:2617] backend configuration:
{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}
I0409 14:33:21.667066 1 tensorflow.cc:2683] TRITONBACKEND_ModelInitialize: simple (version 1)
I0409 14:33:21.667443 1 model_config_utils.cc:1902] ModelConfig 64-bit fields:
I0409 14:33:21.667451 1 model_config_utils.cc:1904] 	ModelConfig::dynamic_batching::default_priority_level
I0409 14:33:21.667453 1 model_config_utils.cc:1904] 	ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds
I0409 14:33:21.667455 1 model_config_utils.cc:1904] 	ModelConfig::dynamic_batching::max_queue_delay_microseconds
I0409 14:33:21.667457 1 model_config_utils.cc:1904] 	ModelConfig::dynamic_batching::priority_levels
I0409 14:33:21.667459 1 model_config_utils.cc:1904] 	ModelConfig::dynamic_batching::priority_queue_policy::key
I0409 14:33:21.667461 1 model_config_utils.cc:1904] 	ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds
I0409 14:33:21.667463 1 model_config_utils.cc:1904] 	ModelConfig::ensemble_scheduling::step::model_version
I0409 14:33:21.667465 1 model_config_utils.cc:1904] 	ModelConfig::input::dims
I0409 14:33:21.667467 1 model_config_utils.cc:1904] 	ModelConfig::input::reshape::shape
I0409 14:33:21.667469 1 model_config_utils.cc:1904] 	ModelConfig::instance_group::secondary_devices::device_id
I0409 14:33:21.667471 1 model_config_utils.cc:1904] 	ModelConfig::model_warmup::inputs::value::dims
I0409 14:33:21.667473 1 model_config_utils.cc:1904] 	ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim
I0409 14:33:21.667474 1 model_config_utils.cc:1904] 	ModelConfig::optimization::cuda::graph_spec::input::value::dim
I0409 14:33:21.667476 1 model_config_utils.cc:1904] 	ModelConfig::output::dims
I0409 14:33:21.667478 1 model_config_utils.cc:1904] 	ModelConfig::output::reshape::shape
I0409 14:33:21.667480 1 model_config_utils.cc:1904] 	ModelConfig::sequence_batching::direct::max_queue_delay_microseconds
I0409 14:33:21.667482 1 model_config_utils.cc:1904] 	ModelConfig::sequence_batching::max_sequence_idle_microseconds
I0409 14:33:21.667484 1 model_config_utils.cc:1904] 	ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds
I0409 14:33:21.667486 1 model_config_utils.cc:1904] 	ModelConfig::sequence_batching::state::dims
I0409 14:33:21.667488 1 model_config_utils.cc:1904] 	ModelConfig::sequence_batching::state::initial_state::dims
I0409 14:33:21.667491 1 model_config_utils.cc:1904] 	ModelConfig::version_policy::specific::versions
I0409 14:33:21.667579 1 tensorflow.cc:1833] model configuration:
{
    "name": "simple",
    "platform": "tensorflow_graphdef",
    "backend": "tensorflow",
    "runtime": "",
    "version_policy": {
        "latest": {
            "num_versions": 1
        }
    },
    "max_batch_size": 8,
    "input": [
        {
            "name": "INPUT0",
            "data_type": "TYPE_INT32",
            "format": "FORMAT_NONE",
            "dims": [
                16
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": false
        },
        {
            "name": "INPUT1",
            "data_type": "TYPE_INT32",
            "format": "FORMAT_NONE",
            "dims": [
                16
            ],
            "is_shape_tensor": false,
            "allow_ragged_batch": false,
            "optional": false
        }
    ],
    "output": [
        {
            "name": "OUTPUT0",
            "data_type": "TYPE_INT32",
            "dims": [
                16
            ],
            "label_filename": "",
            "is_shape_tensor": false
        },
        {
            "name": "OUTPUT1",
            "data_type": "TYPE_INT32",
            "dims": [
                16
            ],
            "label_filename": "",
            "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "input_pinned_memory": {
            "enable": true
        },
        "output_pinned_memory": {
            "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "instance_group": [
        {
            "name": "simple",
            "kind": "KIND_GPU",
            "count": 1,
            "gpus": [
                0
            ],
            "secondary_devices": [],
            "profile": [],
            "passive": false,
            "host_policy": ""
        }
    ],
    "default_model_filename": "model.graphdef",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {},
    "model_warmup": []
}
I0409 14:33:21.670116 1 tensorflow.cc:2732] TRITONBACKEND_ModelInstanceInitialize: simple_0 (GPU device 0)
I0409 14:33:21.670231 1 backend_model_instance.cc:106] Creating instance simple_0 on GPU 0 (7.5) using artifact 'model.graphdef'
2024-04-09 14:33:21.674731: I tensorflow/core/platform/cpu_feature_guard.cc:183] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-09 14:33:21.675352: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-04-09 14:33:21.704935: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-04-09 14:33:21.705094: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-04-09 14:33:21.705346: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-04-09 14:33:21.705476: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-04-09 14:33:21.705599: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-04-09 14:33:21.705701: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1883] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5854 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2070 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 7.5
2024-04-09 14:33:21.721025: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:382] MLIR V1 optimization pass is not enabled
I0409 14:33:21.721266 1 backend_model_instance.cc:772] Starting backend thread for simple_0 at nice 0 on device 0...
I0409 14:33:21.721356 1 backend_model.cc:674] Created model instance named 'simple_0' with device id '0'
I0409 14:33:21.721379 1 model_lifecycle.cc:684] OnLoadComplete() 'simple' version 1
I0409 14:33:21.721384 1 model_lifecycle.cc:722] OnLoadFinal() 'simple' for all version(s)
I0409 14:33:21.721387 1 model_lifecycle.cc:827] successfully loaded 'simple'
I0409 14:33:21.721404 1 model_lifecycle.cc:286] VersionStates() 'simple'
I0409 14:33:21.721433 1 model_lifecycle.cc:286] VersionStates() 'simple'
I0409 14:33:21.721844 1 http_server.cc:4509] HTTP request: 2 /v2/models/simple/versions/1/infer
I0409 14:33:21.721859 1 model_lifecycle.cc:328] GetModel() 'simple' version 1
I0409 14:33:21.721865 1 model_lifecycle.cc:328] GetModel() 'simple' version 1
I0409 14:33:21.721919 1 infer_request.cc:131] [request id: <id_unknown>] Setting state from INITIALIZED to INITIALIZED
I0409 14:33:21.721928 1 infer_request.cc:893] [request id: <id_unknown>] prepared: [0x0x71d4100100b0] request id: , model: simple, requested version: 1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 8, priority: 0, timeout (us): 0
original inputs:
[0x0x71d410043b38] input: INPUT1, type: INT32, original shape: [8,16], batch + shape: [8,16], shape: [16]
[0x0x71d4100036a8] input: INPUT0, type: INT32, original shape: [8,16], batch + shape: [8,16], shape: [16]
override inputs:
inputs:
[0x0x71d4100036a8] input: INPUT0, type: INT32, original shape: [8,16], batch + shape: [8,16], shape: [16]
[0x0x71d410043b38] input: INPUT1, type: INT32, original shape: [8,16], batch + shape: [8,16], shape: [16]
original requested outputs:
OUTPUT0
OUTPUT1
requested outputs:
OUTPUT0
OUTPUT1

I0409 14:33:21.721940 1 infer_request.cc:131] [request id: <id_unknown>] Setting state from INITIALIZED to PENDING
I0409 14:33:21.721958 1 infer_request.cc:131] [request id: <id_unknown>] Setting state from PENDING to EXECUTING
I0409 14:33:21.721980 1 tensorflow.cc:2803] model simple, instance simple_0, executing 1 requests
I0409 14:33:21.721986 1 tensorflow.cc:1971] TRITONBACKEND_ModelExecute: Running simple_0 with 1 requests
I0409 14:33:21.722021 1 tensorflow.cc:2223] TRITONBACKEND_ModelExecute: input 'INPUT0' is GPU tensor: false
I0409 14:33:21.722029 1 tensorflow.cc:2223] TRITONBACKEND_ModelExecute: input 'INPUT1' is GPU tensor: false
I0409 14:33:21.731327 1 infer_response.cc:167] add response output: output: OUTPUT0, type: INT32, shape: [8,16]
I0409 14:33:21.731352 1 http_server.cc:1232] HTTP using buffer for: 'OUTPUT0', size: 512, addr: 0x71d2c4053230
I0409 14:33:21.731361 1 tensorflow.cc:2497] TRITONBACKEND_ModelExecute: output 'OUTPUT0' is GPU tensor: false
I0409 14:33:21.731366 1 infer_response.cc:167] add response output: output: OUTPUT1, type: INT32, shape: [8,16]
I0409 14:33:21.731372 1 http_server.cc:1232] HTTP using buffer for: 'OUTPUT1', size: 512, addr: 0x71d2c4028e90
I0409 14:33:21.731377 1 tensorflow.cc:2497] TRITONBACKEND_ModelExecute: output 'OUTPUT1' is GPU tensor: false
I0409 14:33:21.731413 1 http_server.cc:1306] HTTP release: size 512, addr 0x71d2c4053230
I0409 14:33:21.731419 1 http_server.cc:1306] HTTP release: size 512, addr 0x71d2c4028e90
I0409 14:33:21.731430 1 infer_request.cc:131] [request id: <id_unknown>] Setting state from EXECUTING to RELEASED
I0409 14:33:21.731444 1 tensorflow.cc:2555] TRITONBACKEND_ModelExecute: model simple_0 released 1 requests
I0409 14:33:21.731924 1 http_server.cc:4509] HTTP request: 0 /v2/models/simple/versions/1/ready
I0409 14:33:21.731942 1 model_lifecycle.cc:328] GetModel() 'simple' version 1
I0409 14:33:21.732167 1 http_server.cc:4509] HTTP request: 2 /v2/repository/index
I0409 14:33:21.732217 1 model_lifecycle.cc:265] ModelStates()
I0409 14:33:21.776010 1 http_server.cc:4509] HTTP request: 0 /v2/health/ready
I0409 14:33:21.776034 1 model_lifecycle.cc:265] ModelStates()

Contributor

nnshah1 commented Apr 9, 2024

quick update - I believe I'm able to reproduce what you are describing - will investigate -

Author

ogvalt commented Apr 9, 2024

@nnshah1
The logs above were obtained by launching everything my way, with an empty repository.

nnshah1 added the bug label Apr 9, 2024
Author

ogvalt commented Apr 9, 2024

@nnshah1
FYI: I've run your code and got:

POST /v2/repository/models/simple/load, headers {}
{}
<HTTPSocketPoolResponse status=200 headers={'content-type': 'application/json', 'content-length': '0'}>
Loaded model 'simple'
POST /v2/repository/index, headers {}

<HTTPSocketPoolResponse status=200 headers={'content-type': 'application/json', 'content-length': '238'}>
bytearray(b'[{"name":"densenet_onnx"},{"name":"inception_graphdef"},{"name":"simple","version":"1","state":"READY"},{"name":"simple_dyna_sequence"},{"name":"simple_identity"},{"name":"simple_int8"},{"name":"simple_sequence"},{"name":"simple_string"}]')

Sanity test - checked

Contributor

nnshah1 commented Apr 9, 2024

Thanks! Appreciate it. I'm suspecting that since the models get loaded into a temp directory and not /home - there is a difference in how they are listed out in the index. Need to investigate if that is by design or a bug ....
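
While the index does not list models loaded this way, one possible client-side check is to ask about each expected model by name; the server log above shows the per-model ready endpoint being called after the load. The following is only a sketch, not an official workaround: it assumes the client object exposes `is_model_ready(model_name)` as tritonclient.http.InferenceServerClient does, and `probe_models` and `FakeClient` are hypothetical names used for illustration.

```python
from typing import Iterable


def probe_models(client, model_names: Iterable[str]) -> dict[str, bool]:
    """Ask the server about each expected model by name.

    `client` only needs an is_model_ready(model_name) method. A probe that
    raises is recorded as not ready rather than aborting the whole check.
    """
    status = {}
    for name in model_names:
        try:
            status[name] = bool(client.is_model_ready(name))
        except Exception:
            status[name] = False
    return status


class FakeClient:
    """Stand-in for a real Triton client so the sketch runs without a server."""

    def __init__(self, ready):
        self._ready = set(ready)

    def is_model_ready(self, model_name):
        return model_name in self._ready


print(probe_models(FakeClient({"simple"}), ["simple", "missing"]))
# {'simple': True, 'missing': False}
```

The obvious limitation is that the caller must already know which model names to ask about, which is exactly what the index endpoint is supposed to provide.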

Author

ogvalt commented Apr 9, 2024

I'm suspecting that since the models get loaded into a temp directory and not /home - there is a difference in how they are listed out in the index. Need to investigate if that is by design or a bug ....

Looking forward to an answer too.
In any case, it would be great if the index listed every model served by Triton.

nnshah1 self-assigned this Apr 16, 2024
Contributor

nnshah1 commented Apr 16, 2024

@ogvalt I've filed an internal ticket to track this. Let us know if there is a timeline / priority for this.

Author

ogvalt commented Apr 16, 2024

@ogvalt I've filed an internal ticket to track this. Let us know if there is a timeline / priority for this.

It's not urgent, but I hope it won't take months to see a release with this fix.

@nnshah1 nnshah1 added enhancement New feature or request and removed bug Something isn't working labels Apr 22, 2024
@nnshah1
Copy link
Contributor

nnshah1 commented Apr 22, 2024

@ogvalt - we're discussing internally and will get back on ETA.

Contributor

nnshah1 commented Apr 30, 2024

@ogvalt As a temporary workaround, see triton-inference-server/core#340

We still need to finalize the change in behavior, but it's there in case you'd like to see it sooner rather than later.

Author

ogvalt commented Apr 30, 2024

@nnshah1 thanks for the update.

I was wondering what kind of side effects to expect after a dynamically loaded model is unloaded.

Will some amount of RAM or disk space be left occupied, or will everything be completely freed?

Contributor

nnshah1 commented Apr 30, 2024

It will generally depend on the backend and how it handles things. For the Python backend, model instances run in separate processes, so memory would be reclaimed. For in-process backends like TensorFlow and PyTorch, mileage can vary on how quickly, and whether, all memory is reclaimed. For TensorFlow specifically, we have seen memory being held.

Author

ogvalt commented Jul 18, 2024

Just checking, how are things going?

Author

ogvalt commented Nov 3, 2024

@nnshah1 hey, any updates?
