Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance logging in cases where we have 0 memory recommendations #329

Conversation

plkokanov
Copy link

@plkokanov plkokanov commented Oct 17, 2024

What this PR does / why we need it:
This PR further enhances logging added in #322 in cases where vpa-recommender recommends 0 memory resources. I have chosen to only log cases when recommendations are 0 as these are the recommendations that we have observed when the issue occurs.

Here are the enhancements:

  • The internal histogram representation will be printed when 0 recommendations occur
  • Recommendations of each estimator are printed, not only the last one in the chain
  • Logging is only done when the target estimator recommends 0 memory to reduce amounts of logs - no logs are printed for lower bound and upper bound estimators
  • The histogram and checkpoint will be printed on single lines so that they can be easily identified in plutono and not be mixed up with other log lines

Here's what an example log would look like:

W1017 16:24:46.291415   92393 estimator.go:133] Computed "memory" resources for VPA "foo/bar" were 0 in percentile estimator of "targetEstimator"!
W1017 16:24:46.291842   92393 estimator.go:194] Computed "memory" resources for VPA "foo/bar" were 0 after applying margin in margin estimator of "targetEstimator", they were 0 before that
W1017 16:24:46.291864   92393 estimator.go:212] Computed "memory" resources for VPA "foo/bar" were below minimum in min resource estimator of "targetEstimator"! Computed 0, minimum is 400000000.
W1017 16:24:46.291916   92393 estimator.go:240] The memory histogram for VPA "foo/bar" is empty!
W1017 16:24:46.291998   92393 estimator.go:243] Here's the string representation of the memory histogram for VPA "foo/bar": referenceTimestamp: 0001-01-01 00:00:00 +0000 UTC, halfLife: 24h0m0s; minBucket: 175, maxBucket: 0, totalWeight: 0.0000; %-tile value: ; 0: 0.0000; 5: 0.0000; 10: 0.0000; 15: 0.0000; 20: 0.0000; 25: 0.0000; 30: 0.0000; 35: 0.0000; 40: 0.0000; 45: 0.0000; 50: 0.0000; 55: 0.0000; 60: 0.0000; 65: 0.0000; 70: 0.0000; 75: 0.0000; 80: 0.0000; 85: 0.0000; 90: 0.0000; 95: 0.0000; 100: 0.0000; buckets value; 0: 0.0000; 1: 0.0000; 2: 0.0000; 3: 0.0000; 4: 0.0000; 5: 0.0000; 6: 0.0000; 7: 0.0000; 8: 0.0000; 9: 0.0000; 10: 0.0000; 11: 0.0000; 12: 0.0000; 13: 0.0000; 14: 0.0000; 15: 0.0000; 16: 0.0000; 17: 0.0000; 18: 0.0000; 19: 0.0000; 20: 0.0000; 21: 0.0000; 22: 0.0000; 23: 0.0000; 24: 0.0000; 25: 0.0000; 26: 0.0000; 27: 0.0000; 28: 0.0000; 29: 0.0000; 30: 0.0000; 31: 0.0000; 32: 0.0000; 33: 0.0000; 34: 0.0000; 35: 0.0000; 36: 0.0000; 37: 0.0000; 38: 0.0000; 39: 0.0000; 40: 0.0000; 41: 0.0000; 42: 0.0000; 43: 0.0000; 44: 0.0000; 45: 0.0000; 46: 0.0000; 47: 0.0000; 48: 0.0000; 49: 0.0000; 50: 0.0000; 51: 0.0000; 52: 0.0000; 53: 0.0000; 54: 0.0000; 55: 0.0000; 56: 0.0000; 57: 0.0000; 58: 0.0000; 59: 0.0000; 60: 0.0000; 61: 0.0000; 62: 0.0000; 63: 0.0000; 64: 0.0000; 65: 0.0000; 66: 0.0000; 67: 0.0000; 68: 0.0000; 69: 0.0000; 70: 0.0000; 71: 0.0000; 72: 0.0000; 73: 0.0000; 74: 0.0000; 75: 0.0000; 76: 0.0000; 77: 0.0000; 78: 0.0000; 79: 0.0000; 80: 0.0000; 81: 0.0000; 82: 0.0000; 83: 0.0000; 84: 0.0000; 85: 0.0000; 86: 0.0000; 87: 0.0000; 88: 0.0000; 89: 0.0000; 90: 0.0000; 91: 0.0000; 92: 0.0000; 93: 0.0000; 94: 0.0000; 95: 0.0000; 96: 0.0000; 97: 0.0000; 98: 0.0000; 99: 0.0000; 100: 0.0000; 101: 0.0000; 102: 0.0000; 103: 0.0000; 104: 0.0000; 105: 0.0000; 106: 0.0000; 107: 0.0000; 108: 0.0000; 109: 0.0000; 110: 0.0000; 111: 0.0000; 112: 0.0000; 113: 0.0000; 114: 0.0000; 115: 0.0000; 116: 0.0000; 117: 0.0000; 118: 0.0000; 119: 0.0000; 120: 0.0000; 121: 0.0000; 122: 0.0000; 123: 0.0000; 124: 0.0000; 125: 0.0000; 126: 0.0000; 127: 0.0000; 128: 0.0000; 129: 0.0000; 130: 0.0000; 131: 0.0000; 132: 0.0000; 133: 0.0000; 134: 0.0000; 135: 0.0000; 136: 0.0000; 137: 0.0000; 138: 0.0000; 139: 0.0000; 140: 0.0000; 141: 0.0000; 142: 0.0000; 143: 0.0000; 144: 0.0000; 145: 0.0000; 146: 0.0000; 147: 0.0000; 148: 0.0000; 149: 0.0000; 150: 0.0000; 151: 0.0000; 152: 0.0000; 153: 0.0000; 154: 0.0000; 155: 0.0000; 156: 0.0000; 157: 0.0000; 158: 0.0000; 159: 0.0000; 160: 0.0000; 161: 0.0000; 162: 0.0000; 163: 0.0000; 164: 0.0000; 165: 0.0000; 166: 0.0000; 167: 0.0000; 168: 0.0000; 169: 0.0000; 170: 0.0000; 171: 0.0000; 172: 0.0000; 173: 0.0000; 174: 0.0000; 175: 0.0000
W1017 16:24:46.292089   92393 estimator.go:251] Here's the checkpoint/state for VPA "foo/bar": {"lastUpdateTime":"2024-10-17T13:24:46Z","version":"v3","cpuHistogram":{"referenceTimestamp":null},"memoryHistogram":{"referenceTimestamp":null},"firstSampleStart":null,"lastSampleStart":null}

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
/cc @voelzmo @ialidzhikov

Release note:

Enhance logging in case we have 0 memory recommendations.

@plkokanov plkokanov requested review from unmarshall, rishabh-11 and a team as code owners October 17, 2024 13:48
@gardener-robot gardener-robot added needs/review Needs review size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) labels Oct 17, 2024
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Oct 17, 2024
@ialidzhikov ialidzhikov self-assigned this Oct 21, 2024
Copy link
Member

@ialidzhikov ialidzhikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look fine.

I was wondering how to handle the versioning. We released v1.2.1. Right now the version points to v1.3.0-dev.

const VerticalPodAutoscalerVersion = "1.3.0-dev"

We should rather update to version somehow to make it clear that it is the 2nd patch on top of the v1.2.1 upstream release.

Maybe something like 1.2.1-build.2 or 1.2.1-2. According to the semver regex these are valid semver versions.

Copy link
Member

@voelzmo voelzmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Thanks for the change!

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/review Needs review labels Oct 22, 2024
@plkokanov
Copy link
Author

plkokanov commented Oct 23, 2024

According to the CICD docs the options for bumping the version for the release trait are: bump_major, bump_minor, bump_patch and noop.

For the version trait we have the choices of: noop, finalize, inject-commit-hash, inject-commit-hash-nodash, inject-timestamp, inject-branch-name and use-branch-name. For releases we currently use the finalize option which (I think) removes the suffix from the version - v1.3.0-dev becomes v1.3.0

If we use noop in both places we could maintain the version name ourselves and maybe even automate the bump with the write-vpa-version.sh script. I'll check to see if there are more options tomorrow.

Anyway, I would propose to do it in a separate PR.

@ialidzhikov ialidzhikov merged commit 12de623 into gardener:rel-vertical-pod-autoscaler Oct 24, 2024
5 checks passed
@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label Oct 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reviewed/lgtm Has approval for merging reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) status/closed Issue is closed (either delivered or triaged)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants