Rearrange metric enablement, so that model metric reporter can procee… #321
Conversation
To elaborate on why this change fixes the GPU metrics labels: GPU metrics are now enabled before the server is initialized (around line 2396).
Thank you for this PR! These look good to me. Adding @rmccorm4 as a reviewer as well, since he is more familiar with these files. Once Ryan is good with these changes, we'll run them through the CI and merge once everything passes.
Hi @ClifHouck, thanks for this contribution! While you have figured out a way to have the existing logic propagate the GPU labels to the generic per-model inference metrics, I wouldn't exactly say this is a bug at the moment. Our per-model metrics are currently aggregated per model, even if technically under the hood they are being tracked per model instance. By introducing these GPU labels for metrics other than the GPU mem/util metrics, it would start to expose the notion of per-model-instance metrics for the case of KIND_GPU models with multiple model instances.

To me there is some drawback to adding this support as-is, because it will introduce some inconsistency in how our metrics are reported and aggregated. With this change, KIND_GPU models will have per-model-instance metrics, but KIND_CPU/KIND_MODEL models will not. Similarly, I think this will also raise the question, for models using multiple GPUs (currently only supported via KIND_MODEL), of why the GPU labels aren't showing the multiple GPUs being used by those model instances.

We have a ticket in our backlog (DLIS-3959) to increase the breakdown to per-model-instance metrics (generically for all model instances, irrespective of device type), but it hasn't been prioritized over other tasks yet. Not exposing the GPU labels for these inference metrics allows the metrics to be aggregated consistently across all cases.

Can you elaborate more on your use case and needs, and on how your proposed changes or our future changes for per-GPU or per-model-instance inference metrics would directly impact you?

Thanks,
Blocking accidental merge while we discuss the above comments.
@rmccorm4 I have to disagree that this is not a bug. Given what you have said, there are at least two here:
I can add a commit to this PR which removes the gathering and applying of GPU UUID information to model metrics. That way we solve both issues outlined above.
I agree that this flow may be a bit unintuitive currently, since the MetricModelReporters are initialized along with the models+model_repository_manager. In fact, if you were to use the …

I agree this should be resolved one way or the other to be consistent, but I think it's something we should take care in changing, and we need to balance it against our current list of priorities. If this behavior is having a significant impact on some workflow or use case, please do let us know. But otherwise I think this is something for us to revisit when we have the bandwidth to do so.
I agree that this code should probably be commented out, with a note that it could be re-applied if per-model-instance metrics are exposed for consistency.
Hi @ClifHouck, thanks for your patience on this, and sorry for the long turnaround time. Upon further reflection, I think applying the GPU labels to the inference request metrics can be useful, providing some insight into request distribution and even helping indirectly detect faulty GPUs.
In hindsight, I realize that you can have multiple model instances per GPU, so this would directly mean the metric is becoming a per-model-instance-level metric. It just happens to coincide with a per-GPU metric in the case of one instance per GPU.
I think this is something we will just have to accept for now, until we rework or improve the instance group feature to better account for and track models that use multiple GPUs per instance. Overall, I want the metric labels to be consistent one way or the other, and not dependent on timing or on the use of dynamic model loading.

I think we should go ahead with this change, but we will need to do some testing of a few things to make sure nothing breaks. It would be great if you're willing to help in adding some test cases; I can help point you in the right direction or to where to start. Otherwise, we can help when we get the cycles as well. Off the top of my head, I'm interested in seeing the following checked via tests:
Lastly, I'm curious how this will impact existing workflows or dashboards that query the metrics today. Given a server today reporting request metrics with no GPU labels such as:
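(The exact example from the original comment isn't reproduced here; the series below is a hypothetical illustration using Triton's nv_inference_request_success counter with made-up values.)

```
nv_inference_request_success{model="simple",version="1"} 100
```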
and then we add the GPU label after this PR on a 2-GPU system, so the metrics output becomes:
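(Again hypothetical; the gpu_uuid values are placeholders.) The same counter would then be split into one series per GPU, roughly like:

```
nv_inference_request_success{gpu_uuid="GPU-aaaa",model="simple",version="1"} 60
nv_inference_request_success{gpu_uuid="GPU-bbbb",model="simple",version="1"} 40
```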
Will we break any existing PromQL queries or workflows looking at these request metrics?
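As a rough sketch (not verified against any particular dashboard): aggregating queries such as the one below should keep returning the same totals, while queries or alerts that select the bare series, e.g. nv_inference_request_success{model="simple"}, would now return one series per GPU instead of one per model.

```
sum by (model, version) (nv_inference_request_success)
```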
CC @chriscarollo from your issue, as this PR pertains to your question.
Rearrange metric enablement, so that model metric reporter can proceed properly.
Addresses triton-inference-server/server#6815
The fix is to enable GPU metrics (assuming they're enabled at compile time and by the user at run time) prior to calling MetricModelReporter::Create. If GPU metrics are not enabled, then MetricModelReporter::GetMetricsLabels will not get/populate the relevant GPU labels.
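As a minimal, self-contained sketch of the ordering issue (the types and functions below are invented stand-ins for illustration, not Triton's actual API): a reporter that snapshots the GPU-metrics state in its constructor only picks up GPU labels if enablement happens before it is created.

```cpp
#include <iostream>
#include <map>
#include <string>

// Illustrative stand-ins only; not the real Triton classes.
namespace demo {

bool gpu_metrics_enabled = false;

void EnableGPUMetrics() { gpu_metrics_enabled = true; }

class MetricReporter {
 public:
  // Labels are captured once, at construction time.
  explicit MetricReporter(const std::string& model)
  {
    labels_["model"] = model;
    if (gpu_metrics_enabled) {
      // Only populated when GPU metrics were enabled *before* creation.
      labels_["gpu_uuid"] = "GPU-xxxx";
    }
  }

  void Print() const
  {
    for (const auto& kv : labels_) {
      std::cout << kv.first << "=" << kv.second << " ";
    }
    std::cout << "\n";
  }

 private:
  std::map<std::string, std::string> labels_;
};

}  // namespace demo

int main()
{
  // Old order: the reporter is created before GPU metrics are enabled,
  // so it never sees a gpu_uuid label.
  demo::MetricReporter created_too_early("model_a");

  demo::EnableGPUMetrics();

  // Rearranged order (analogous to what this PR does in the real code):
  // enable GPU metrics first, then create the reporter.
  demo::MetricReporter created_after_enable("model_b");

  created_too_early.Print();     // model=model_a
  created_after_enable.Print();  // gpu_uuid=GPU-xxxx model=model_b
  return 0;
}
```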