Fix for redis memory check failure after link flap and also sometimes cpu usage high failure #15732
+46
−26
Description of PR
The Redis memory check result obtained with "redis-cli info memory | grep used_memory_human" is not a stable value. Even on a stable system (BGP converged, no port flapping, etc.), successive readings can differ by more than 0.2M.
The following is CLI output from 202405 and 202205.
202405:
admin@ixre-egl-board30: redis-cli info memory | grep used_memory_human | sed -e 's/.*:\(.*\)M/\1/'
2.64
admin@ixre-egl-board30: redis-cli info memory | grep used_memory_human | sed -e 's/.*:\(.*\)M/\1/'
2.74
admin@ixre-egl-board30: redis-cli info memory | grep used_memory_human | sed -e 's/.*:\(.*\)M/\1/'
2.52
202205:
admin@ixre-egl-board64: redis-cli info memory | grep used_memory_human | sed -e 's/.*:\(.*\)M/\1/'
6.02
admin@ixre-egl-board64: redis-cli info memory | grep used_memory_human | sed -e 's/.*:\(.*\)M/\1/'
6.26
admin@ixre-egl-board64: redis-cli info memory | grep used_memory_human | sed -e 's/.*:\(.*\)M/\1/'
6.14
We can see that 202405 includes some Redis memory optimization and uses less memory than 202205. With this smaller baseline, a 0.2M fluctuation in memory usage can easily exceed the 5% memory usage threshold on 202405.
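To make the arithmetic concrete, the short sketch below takes the six readings from the CLI output above and expresses the spread (max minus min) as a percentage of the mean for each release. The helper name `spread_percent` is made up for this illustration, and "spread relative to the mean" is only a rough proxy for how the test compares readings.

```python
# Readings copied from the CLI output in this description (values in MB).
samples = {
    "202405": [2.64, 2.74, 2.52],
    "202205": [6.02, 6.26, 6.14],
}


def spread_percent(values):
    """Return (max - min) as a percentage of the mean of the readings."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean * 100


for release, values in samples.items():
    print(f"{release}: spread = {spread_percent(values):.1f}% of mean")
```

With these numbers the idle fluctuation stays under 5% on 202205 but is well above 5% on 202405, even though the absolute spread is about the same on both.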
The solution is to use the average Redis memory usage before and after the link flap: query the memory usage 5 times at 5-second intervals and take the mean. The threshold is also raised from 5% to 10%. With this fix, the Redis memory check no longer fails on 202405 after link flap.
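The averaging approach could look roughly like the sketch below. It is not the code from this PR: all function names are hypothetical, the reader is injectable so the logic can be exercised without a running Redis, and it assumes that only memory growth (not shrinkage) counts against the threshold.

```python
import re
import subprocess
import time


def parse_used_memory_mb(info_output):
    """Extract the number from a 'used_memory_human:<n>M' line.

    Only the 'M' unit is handled, matching the outputs shown above.
    """
    match = re.search(r"used_memory_human:([\d.]+)M", info_output)
    if match is None:
        raise ValueError("used_memory_human (in MB) not found in output")
    return float(match.group(1))


def read_redis_memory_mb():
    """Take one reading with the same redis-cli command as in the description."""
    output = subprocess.check_output(["redis-cli", "info", "memory"], text=True)
    return parse_used_memory_mb(output)


def average_redis_memory_mb(read_once=read_redis_memory_mb, samples=5,
                            interval=5, sleep=time.sleep):
    """Average `samples` readings taken `interval` seconds apart."""
    readings = []
    for i in range(samples):
        readings.append(read_once())
        if i < samples - 1:  # no need to wait after the last reading
            sleep(interval)
    return sum(readings) / len(readings)


def memory_within_threshold(before_mb, after_mb, threshold_percent=10.0):
    """Assumption: the check fails only if usage grew by more than the threshold."""
    increase_percent = (after_mb - before_mb) / before_mb * 100
    return increase_percent <= threshold_percent
```

A test would call `average_redis_memory_mb()` once before the link flap and once after it, then pass both averages to `memory_within_threshold`.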
This commit also provides a fix for the intermittent orchagent CPU utilization check failure after link flap. The cause is that in a scaled setup (34k routes), orchagent takes more time to calm down.
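The description does not spell out how the CPU check was changed. One common way to tolerate a process that needs longer to settle is to poll until the reading drops below the threshold or a timeout expires, as in the sketch below. The function name, parameters, and the polling approach itself are assumptions for illustration, not the change made in this PR.

```python
import time


def wait_until_below(read_cpu_percent, threshold, timeout, interval,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll `read_cpu_percent` until it returns a value below `threshold`.

    Returns True as soon as a reading is below the threshold, or False
    if `timeout` seconds pass without that happening.
    """
    deadline = clock() + timeout
    while True:
        if read_cpu_percent() < threshold:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Here `read_cpu_percent` would be a callable that returns the current orchagent CPU utilization; a larger `timeout` gives a scaled setup more time to settle before the check is declared failed.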
Summary:
Fixes #15733
Type of change
Back port request
Approach
What is the motivation for this PR?
Fix test failures
How did you verify/test it?
Ran the OC tests with the fix. No test failures were seen.