-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(query) Option to disable Lucene caching #1709
perf(query) Option to disable Lucene caching #1709
Conversation
new SearcherFactory() { | ||
override def newSearcher(reader: IndexReader, previousReader: IndexReader): IndexSearcher = { | ||
val indexSearcher = super.newSearcher(reader, previousReader) | ||
indexSearcher.setQueryCache(null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to still set the cache to null if shouldCache returns false ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Describe why it is detrimental too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question @sandeep6189 Initially I set the caching policy and disable caching anything but later found setting cache to null also disables cache (at least from unit tests) and also eliminates the check for whether to cache for not. However, I left both changes in.
Hello @whizkido good to see your comments :)
When we profile a FIloDB instance with a large index and which sees a lot of repeating queries, we see the following flame graph (as taken by async profiler)
We see almost 80% stack traces are related to index searches and about 50% of the are related to caching. Also this archive mentions disabling caching which may or may not work in our case (also they mention disabling cache in a different way than what this PR does). The idea is to have that knob to let us disable the caching and then profile again. It might be worse but we can measure and check.
Let me add the above details to PR description too
val indexSearcher = super.newSearcher(reader, previousReader) | ||
indexSearcher.setQueryCache(null) | ||
indexSearcher.setQueryCachingPolicy(new QueryCachingPolicy() { | ||
override def onUse(query: Query): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick question: what does onUse do ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken from Lucene documentation
Callback that is called every time that a cached filter is used. This is typically useful if the policy wants to track usage statistics in order to make decisions.
@@ -284,6 +284,8 @@ class TimeSeriesShard(val ref: DatasetRef, | |||
filodbConfig.getBoolean("memstore.index-faceting-enabled-shard-key-labels") | |||
private val indexFacetingEnabledAllLabels = filodbConfig.getBoolean("memstore.index-faceting-enabled-for-all-labels") | |||
private val numParallelFlushes = filodbConfig.getInt("memstore.flush-task-parallelism") | |||
private val disableIndexCaching = filodbConfig.getBoolean("memstore.disable-index-caching") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we also need this in downsample time series shard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure its needed yet but not ruling out the possibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, but +1 to all of @sandeep6189's questions (let's defer the DS config to a separate PR if that's a less-straightforward change, but the consistency would be nice to have).
Pull Request checklist
Current behavior :
When we profile a FIloDB instance with a large index and which sees a lot of repeating queries, we see the following flame graph (as taken by async profiler)
We see almost 80% stack traces are related to index searches and about 50% of them are related to caching. Also this archive mentions disabling caching which may or may not work in our case (also they mention disabling cache in a different way than what this PR does). The idea is to have that knob to let us disable the caching and then profile again.
New behavior :
We now support flag based enable/disable caching. By default the caching is enabled (existing default)