-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vtexplain: Suppress duplicate vtexplain
error messages
#15243
Conversation
Signed-off-by: Tyler Coleman <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #15243 +/- ##
==========================================
+ Coverage 67.41% 67.44% +0.02%
==========================================
Files 1560 1560
Lines 192752 193005 +253
==========================================
+ Hits 129952 130163 +211
- Misses 62800 62842 +42 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the bigger question here, why do we log this over and over? As in, the logging itself is only a symptom of an underlying problem that we keep doing the same work over and over and that's causing the real slowdown?
Regardless, we should not use global maps to reduce logging here and see if we can remove a lot of the duplicate work we're doing here.
@@ -773,6 +773,8 @@ func (t *explainTablet) analyzeWhere(selStmt *sqlparser.Select, tableColumnMap m | |||
return inColName, inVal, rowCount, nil, nil | |||
} | |||
|
|||
var alreadyLogged = make(map[string]int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should definitely not use a global for this in this way.
The failure here also shows why it's broken to use a shared global map for this. |
Yes, I'm looking into why we're doing the same work repeatedly in a separate issue. But removing the log lines alone reduces the run time of
I can look into another way to reduce logging - thanks for the tip! |
Going after the root cause instead. |
FWIW, it appears the gRPC version bump from 1.55.0 → 1.59.0 causes a vtexplain slowdown across Vitess versions. This change introduces the repeated |
Description
vtexplain
can generate many duplicate error messages, which can also slow execution (especially for batch jobs). This PR aims to suppress duplicate messages by outputting unique error messages once and only once.The two error messages that tend to repeat are:
vtexplain_vttablet.go:842] vtexplain: invalid column .column_name, tableColumnMap +map[:map[]]
, andthrottler.go:510] Throttler.retryReadAndApplyThrottlerConfig(): error reading throttler config. Will retry in 10s. Err=node doesn't exist: cells/explainCell/CellInfo
.This is a bug fix, and I think the fix should be backported.
Before
After
Related Issue(s)
#15242
Checklist
Deployment Notes
N/A