
Add more precise logging for VPA resource recommendations #6723

Merged
k8s-ci-robot merged 7 commits into kubernetes:master from nm/add-min-allowed-log on May 29, 2024


nikimanoledaki
Contributor

@nikimanoledaki nikimanoledaki commented Apr 17, 2024

What type of PR is this?

/kind documentation

What this PR does / why we need it:

This PR adds more precise logging for VPA recommendations.

  • Logs the exact processedRecommendation from the VPA Updater, specifically, the capped and uncapped target resources (CPU/mem).
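This is what separates the two logged values. In VPA, uncappedTarget is the usage-based recommendation before the container's resource policy is applied. target is the value after policy bounds such as minAllowed and maxAllowed are enforced. The sketch below is a simplified illustration, not the recommender's code. capToPolicy is a hypothetical helper, memory is a plain int64 byte count, and a zero bound means "not set". LimitRange handling and any other processing steps are ignored.

```go
package main

import "fmt"

// capToPolicy is a hypothetical, simplified helper: it clamps a usage-based
// (uncapped) value into [minAllowed, maxAllowed]. A zero bound means "not set".
func capToPolicy(uncapped, minAllowed, maxAllowed int64) int64 {
	if minAllowed > 0 && uncapped < minAllowed {
		return minAllowed
	}
	if maxAllowed > 0 && uncapped > maxAllowed {
		return maxAllowed
	}
	return uncapped
}

func main() {
	// Memory in bytes: usage suggests 50 MB, but the policy sets minAllowed to 100 MB.
	uncappedTarget := int64(50_000_000)
	target := capToPolicy(uncappedTarget, 100_000_000, 0)
	fmt.Printf("target=%d uncappedTarget=%d\n", target, uncappedTarget)
}
```

When a policy bound kicks in, the two values differ. That is the case where logging both is most informative.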

Which issue(s) this PR fixes:

Helps to debug #6705

Special notes for your reviewer:

Does this PR introduce a user-facing change?

The VPA Updater logs the processed recommendations, specifically, the target and uncapped target CPU/memory.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 17, 2024
@k8s-ci-robot
Contributor

Welcome @nikimanoledaki!

It looks like this is your first PR to kubernetes/autoscaler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/autoscaler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 17, 2024
@k8s-ci-robot
Contributor

Hi @nikimanoledaki. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot requested a review from kgolab April 17, 2024 12:11
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. area/vertical-pod-autoscaler labels Apr 17, 2024
@k8s-ci-robot k8s-ci-robot requested a review from voelzmo April 17, 2024 12:11
@leonardpahlke
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 18, 2024
@nikimanoledaki
Contributor Author

nikimanoledaki commented Apr 18, 2024

/test

@k8s-ci-robot
Contributor

@nikimanoledaki: The /test command needs one or more targets.
The following commands are available to trigger optional jobs:

  • /test pull-cluster-autoscaler-e2e-azure

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

@voelzmo voelzmo left a comment


Hey @nikimanoledaki thanks for the PR!

A few thoughts:

  • Do you care about all the recommendation values, including the bounds? I guess given the issue at hand, we only really care about target and uncappedTarget (it might be interesting to see what the recommendation would be without minAllowed). So maybe let's remove upperBound and lowerBound here to reduce noise.
  • As the format string you're re-using doesn't have a placeholder for the recommendations, they're appended at the end with this ugly %!(EXTRA ...) stuff. Maybe you could put a proper placeholder in the format string.
  • The simplest solution might be to add a String() method to *RecommendedPodResources, similar to this one (note that this doesn't account for any nil values yet):
func (r *RecommendedPodResources) String() string {
	sb := &strings.Builder{}
	for _, cr := range r.ContainerRecommendations {
		sb.WriteString(fmt.Sprintf("%s: [ target: %sK, %vm; uncappedTarget: %sK, %vm ]\n", cr.ContainerName, cr.Target.Memory().AsDec(), cr.Target.Cpu().MilliValue(), cr.UncappedTarget.Memory().AsDec(), cr.UncappedTarget.Cpu().MilliValue()))
	}
	return sb.String()
}
  • Do we maybe want to add additional logging for that issue? I was integrating "Add logging in case we have minimum memory recommendations" (gardener/autoscaler#299) into our fork to debug why the recommender would give such low recommendations, even when the workload had been running for a long time. In theory, the in-memory model should be backfilled from the VPACheckpoint, so that even a short period without memory samples (caused, e.g., by the KeyError that you observed) should not lead to recommendations below minAllowed.
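The "%!(EXTRA ...)" suffix mentioned in the review comes from Go's fmt package. When an argument has no matching verb, fmt appends it to the output instead of dropping it. Printf-style loggers that format through fmt show the same suffix. Below is a minimal reproduction with a made-up pod name and recommendation string. renderLogLines is a hypothetical helper, not the Updater's logging call.

```go
package main

import "fmt"

// The format strings are kept in variables rather than constants: go vet's
// printf check flags constant format strings with too many arguments, which
// is exactly the mistake this example reproduces on purpose.
var (
	formatWithoutPlaceholder = "Updating pod %s"
	formatWithPlaceholder    = "Updating pod %s, recommendations: %s"
)

// renderLogLines formats the same hypothetical log line without and with a
// placeholder for the recommendations.
func renderLogLines(pod, recs string) (bad, good string) {
	bad = fmt.Sprintf(formatWithoutPlaceholder, pod, recs)
	good = fmt.Sprintf(formatWithPlaceholder, pod, recs)
	return bad, good
}

func main() {
	bad, good := renderLogLines("web-0", "app: [ target: 262144k, 250m ]")
	fmt.Println(bad)  // Updating pod web-0%!(EXTRA string=app: [ target: 262144k, 250m ])
	fmt.Println(good) // Updating pod web-0, recommendations: app: [ target: 262144k, 250m ]
}
```

Adding a verb for every argument is enough to fix it. With constant format strings, go vet reports the mismatch at build time.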

I really appreciate that you're putting work into this!
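The String() suggestion in this review notes that it does not handle nil values yet. One way to do that is sketched below, under explicit assumptions. resourceList is a hypothetical stand-in for corev1.ResourceList, with memory in bytes under "memory" and CPU in millicores under "cpu", so the example needs only the standard library. containerRecommendation mirrors just the fields used here, and the "B" suffix for bytes is this sketch's own choice.

```go
package main

import (
	"fmt"
	"strings"
)

// resourceList is a hypothetical stand-in for corev1.ResourceList:
// memory in bytes under "memory", CPU in millicores under "cpu".
type resourceList map[string]int64

// containerRecommendation mirrors only the fields this sketch needs.
type containerRecommendation struct {
	ContainerName  string
	Target         resourceList
	UncappedTarget resourceList
}

// writeResources appends "label: <bytes>B <millicores>m " to sb. A nil list is
// skipped entirely and zero values are omitted.
func writeResources(sb *strings.Builder, label string, rl resourceList) {
	if rl == nil {
		return
	}
	fmt.Fprintf(sb, "%s: ", label)
	if mem := rl["memory"]; mem != 0 {
		fmt.Fprintf(sb, "%dB ", mem)
	}
	if cpu := rl["cpu"]; cpu != 0 {
		fmt.Fprintf(sb, "%dm ", cpu)
	}
}

// formatRecommendations renders one line per container and tolerates nil lists.
func formatRecommendations(recs []containerRecommendation) string {
	sb := &strings.Builder{}
	for _, cr := range recs {
		fmt.Fprintf(sb, "%s: [ ", cr.ContainerName)
		writeResources(sb, "target", cr.Target)
		writeResources(sb, "uncappedTarget", cr.UncappedTarget)
		sb.WriteString("]\n")
	}
	return sb.String()
}

func main() {
	fmt.Print(formatRecommendations([]containerRecommendation{
		{ContainerName: "app", Target: resourceList{"memory": 262144000, "cpu": 250}},
		{ContainerName: "sidecar"}, // both lists nil
	}))
}
```

Reading a missing key from a Go map, even a nil one, yields the zero value. The explicit nil check therefore only decides whether the label is printed at all.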

@nikimanoledaki
Contributor Author

nikimanoledaki commented May 14, 2024

Hi @voelzmo thank you so much for your review! Apologies for the long delay in getting back to this. Responding to your feedback as soon as possible, first thing tomorrow. I needed to wrap up a few other things and finally made time to get back into this. Thank you for your understanding!

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 15, 2024
@nikimanoledaki
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 27, 2024
@voelzmo
Contributor

voelzmo commented May 27, 2024

This looks very good to me! We now have one remaining linter error, which is why the test-and-verify job is failing. Could you please take care of this:

/home/runner/work/autoscaler/autoscaler/go/src/k8s.io/autoscaler/vertical-pod-autoscaler/pkg/updater/priority/update_priority_calculator.go:168:1: exported method UpdatePriorityCalculator.GetProcessedRecommendationTargets should have comment or be unexported
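The lint message asks for a doc comment on the exported method. The Go convention is a comment placed directly above the declaration that begins with the identifier's name. Below is a minimal illustration under simplifying assumptions. UpdatePriorityCalculator is reduced to an empty stand-in struct. The method also has a simplified, hypothetical signature; the real one takes *vpa_types.RecommendedPodResources.

```go
package main

import (
	"fmt"
	"strings"
)

// UpdatePriorityCalculator is an empty stand-in for the Updater type of the same name.
type UpdatePriorityCalculator struct{}

// GetProcessedRecommendationTargets returns a human-readable summary of the
// processed recommendation targets for the given containers.
func (calc *UpdatePriorityCalculator) GetProcessedRecommendationTargets(containerNames []string) string {
	sb := &strings.Builder{}
	for _, name := range containerNames {
		// This stand-in only lists container names with an empty target section.
		sb.WriteString(name)
		sb.WriteString(": [ ]\n")
	}
	return sb.String()
}

func main() {
	calc := &UpdatePriorityCalculator{}
	fmt.Print(calc.GetProcessedRecommendationTargets([]string{"app"}))
}
```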

@nikimanoledaki nikimanoledaki force-pushed the nm/add-min-allowed-log branch from 9e8f445 to 1800a6e Compare May 27, 2024 15:04
		if cr.Target != nil {
			sb.WriteString("target: ")
			if !cr.Target.Memory().IsZero() {
				sb.WriteString(fmt.Sprintf("%sK ", cr.Target.Memory().AsDec()))
Contributor


I think we need to scale this to get kilobytes; otherwise AsDec() returns the memory value in bytes.

Suggested change
- sb.WriteString(fmt.Sprintf("%sK ", cr.Target.Memory().AsDec()))
+ sb.WriteString(fmt.Sprintf("%dk ", cr.Target.Memory().ScaledValue(resource.Kilo)))

where resource is imported from
"k8s.io/apimachinery/pkg/api/resource"
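Two details of this change are easy to miss. First, resource.Kilo is the decimal scale (10^3), so the lowercase "k" suffix matches Kubernetes quantity notation; "Ki" would mean 1024. Second, the apimachinery documentation describes ScaledValue as returning ceil(q / 10^scale), so byte counts are rounded up rather than truncated. The stdlib-only sketch below mimics that rounding with a hypothetical ceilDiv helper for non-negative values. It is an illustration, not the apimachinery implementation.

```go
package main

import "fmt"

// ceilDiv returns ceil(n / base) for non-negative n. With base 1000 this mirrors
// the rounding documented for resource.Quantity.ScaledValue(resource.Kilo).
func ceilDiv(n, base int64) int64 {
	return (n + base - 1) / base
}

func main() {
	const memBytes = 262_144_000 // 250Mi expressed in bytes
	fmt.Printf("%dk\n", ceilDiv(memBytes, 1000))  // 262144k: decimal kilo, as with resource.Kilo
	fmt.Printf("%dKi\n", ceilDiv(memBytes, 1024)) // 256000Ki: binary kibi, shown for comparison
	fmt.Printf("%dk\n", ceilDiv(1500, 1000))      // 2k: 1500 bytes rounds up
}
```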

Contributor Author


Good point - added this in the latest commit + updated the tests!

@nikimanoledaki nikimanoledaki requested a review from voelzmo May 29, 2024 08:14
Contributor

@voelzmo voelzmo left a comment


/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 29, 2024
@voelzmo
Contributor

voelzmo commented May 29, 2024

@kwiesmueller this looks good from my side and will hopefully help investigating a long-standing issue. Can you please approve? Thanks!

Member

@kwiesmueller kwiesmueller left a comment


Just a nit: the string building could be done without fmt, which might be more efficient (using the string builder directly and strconv).

Otherwise lgtm
/assign @raywainman

func (calc *UpdatePriorityCalculator) GetProcessedRecommendationTargets(r *vpa_types.RecommendedPodResources) string {
	sb := &strings.Builder{}
	for _, cr := range r.ContainerRecommendations {
		sb.WriteString(fmt.Sprintf("%s: ", cr.ContainerName))
Member


As you're already using the string builder here you could write this without fmt.Sprintf to be more efficient.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kwiesmueller, nikimanoledaki, voelzmo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 29, 2024
@k8s-ci-robot k8s-ci-robot merged commit eed68ef into kubernetes:master May 29, 2024
7 checks passed
@kwiesmueller
Member

Oh well, I guess I should have put a hold on this if I cared more about the nit.
This one is on me :#

@kwiesmueller
Member

@nikimanoledaki it's up to you whether you want to follow up with another PR to address the nit.

@nikimanoledaki
Contributor Author

@kwiesmueller thank you - agreed that it would be nice to make it more efficient! Opened a PR: #6883

@nikimanoledaki nikimanoledaki deleted the nm/add-min-allowed-log branch May 29, 2024 15:39
@raywainman raywainman mentioned this pull request Aug 6, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/documentation Categorizes issue or PR as related to documentation. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

6 participants