Add support for tgi metrics for vllm #1513

Open · wants to merge 2 commits into master
@@ -52,10 +52,24 @@ ${TEST_NS}= vllm-gpt2
... vllm:request_success_total
... vllm:avg_prompt_throughput_toks_per_s
... vllm:avg_generation_throughput_toks_per_s

... tgi_tokenize_request_tokens_bucket
... tgi_tokenize_request_tokens_count
... tgi_tokenize_request_tokens_sum
... tgi_tokenize_request_input_count_total
... tgi_request_input_count_total
... tgi_request_queue_duration_bucket
... tgi_request_queue_duration_sum
... tgi_queue_size
... tgi_batch_current_size
... tgi_request_input_length_bucket
... tgi_request_input_length_count
... tgi_request_input_length_sum
... tgi_request_generated_tokens_bucket
... tgi_request_generated_tokens_count
... tgi_request_generated_tokens_sum

*** Test Cases ***
Verify User Can Deploy A Model With Vllm Via CLI
Verify User Can Deploy A Model With Vllm And tgi Via CLI
[Documentation] Deploy a model (gpt2) using the vllm runtime and confirm that it's running
[Tags] Tier1 Sanity Resources-GPU RHOAIENG-6264 VLLM
${rc} ${out}= Run And Return Rc And Output oc apply -f ${DL_POD_FILEPATH}
Comment on lines -58 to 75
Contributor:
This test is not using tgi to deploy the model; it's using vLLM. Please revert the name of this TC.

@@ -75,8 +89,8 @@ Verify User Can Deploy A Model With Vllm Via CLI
... inference_type=chat-completions n_times=3 query_idx=8
... namespace=${TEST_NS} string_check_only=${TRUE} validate_response=${FALSE}

Verify Vllm Metrics Are Present
[Documentation] Confirm vLLM metrics are exposed in OpenShift metrics
Verify Vllm And tgi Metrics Are Present
[Documentation] Confirm vLLM and tgi metrics are exposed in OpenShift metrics
Comment on lines +92 to +93
Contributor:
IMHO you should create a separate test to validate that the TGI metrics are present. As far as I understand, you've already encountered the issue where only the vLLM ones get exposed, so it'd be better to separate the two and have clear reporting on which ones (if any) are failing to get exposed.

Member Author:
ack

[Tags] Tier1 Sanity Resources-GPU RHOAIENG-6264 VLLM
Depends On Test Verify User Can Deploy A Model With Vllm Via CLI
${host}= llm.Get KServe Inference Host Via CLI isvc_name=vllm-gpt2-openai namespace=${TEST_NS}
@@ -89,7 +103,7 @@ Verify Vllm Metrics Are Present
Set Suite Variable ${token}
Metrics Should Exist In UserWorkloadMonitoring ${thanos_url} ${token} ${SEARCH_METRICS}
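
The separate test the reviewer suggests above could look roughly like the sketch below. It assumes the tgi_* names are moved out of ${SEARCH_METRICS} into a new ${TGI_SEARCH_METRICS} variable (which does not exist in the suite yet) and that ${thanos_url} and ${token} are available as suite variables from the vLLM test, as the diff shows for ${token}.

Verify Tgi Metrics Are Present
    [Documentation]    Confirm TGI metrics are exposed in OpenShift metrics
    ...    (kept separate from the vLLM check so the report shows which metric family is missing)
    [Tags]    Tier1    Sanity    Resources-GPU    RHOAIENG-6264    VLLM
    Depends On Test    Verify User Can Deploy A Model With Vllm Via CLI
    # ${TGI_SEARCH_METRICS} is a hypothetical list holding only the tgi_* names added in this PR
    Metrics Should Exist In UserWorkloadMonitoring    ${thanos_url}    ${token}    ${TGI_SEARCH_METRICS}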

Verify Vllm Metrics Values Match Between UWM And Endpoint
Verify Vllm And tgi Metrics Values Match Between UWM And Endpoint
Contributor:
This does not work: line 98/112 fetches the metrics to test for by looking at those that start with vllm:, so TGI metrics are not being tested at all here. You will need to either change the logic of Get Vllm Metrics And Values in Helpers.py or develop an equivalent keyword for TGI metrics.
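
A hedged sketch of the second option (an equivalent keyword that is not hard-wired to the vllm: prefix) is shown below. The keyword name, the curl-based scrape, and the bearer-token authentication are illustrative assumptions, not the actual Get Vllm Metrics And Values logic in Helpers.py; it needs only the OperatingSystem and String standard libraries.

*** Keywords ***
Get Metric Lines By Prefix
    [Documentation]    Hypothetical helper: scrape the model /metrics endpoint and return only
    ...    the exposition lines whose metric name starts with the given prefix
    ...    (vllm: for vLLM metrics, tgi_ for TGI metrics).
    [Arguments]    ${host}    ${token}    ${prefix}
    ${rc}    ${out}=    Run And Return Rc And Output
    ...    curl -sk https://${host}/metrics -H "Authorization: Bearer ${token}"
    Should Be Equal As Integers    ${rc}    0    msg=Failed to scrape the metrics endpoint
    # Get Lines Matching Regexp matches whole lines, so the pattern is effectively prefix-anchored
    ${matched}=    Get Lines Matching Regexp    ${out}    ${prefix}.*
    RETURN    ${matched}

The comparison test could then fetch the vllm: and tgi_ families separately and report mismatches per family.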

[Documentation] Confirm the values returned by UWM and by the model endpoint match for each metric
[Tags] Tier1 Sanity Resources-GPU RHOAIENG-6264 RHOAIENG-7687 VLLM
Depends On Test Verify User Can Deploy A Model With Vllm Via CLI