perf(query) Option to disable Lucene caching #1709

amolnayak311 · 2024-02-01T23:15:30Z

Pull Request checklist

The commit(s) message(s) follows the contribution guidelines ?
Tests for the changes have been added (for bug fixes / features) ?
Docs have been added / updated (for bug fixes / features) ?

Current behavior :

When we profile a FiloDB instance that has a large index and serves many repeated queries, we see the following flame graph (captured with async-profiler):

Almost 80% of the stack traces are related to index searches, and about 50% of them are related to caching. This archive also mentions disabling caching, which may or may not work in our case (it also disables the cache in a different way than this PR does). The idea is to add a knob that lets us disable caching and then profile again.

New behavior :

Caching can now be enabled or disabled with a flag. Caching remains enabled by default, which matches the existing behavior.

sandeep6189 · 2024-02-01T23:18:02Z

core/src/main/scala/filodb.core/memstore/PartKeyLuceneIndex.scala

+        new SearcherFactory() {
+          override def newSearcher(reader: IndexReader, previousReader: IndexReader): IndexSearcher = {
+            val indexSearcher = super.newSearcher(reader, previousReader)
+            indexSearcher.setQueryCache(null)


Do we still need to set the cache to null if shouldCache returns false?

Please also describe why it is detrimental.

Good question, @sandeep6189. Initially I set a caching policy that refuses to cache anything, but I later found that setting the cache to null also disables caching (at least according to the unit tests) and eliminates the check for whether to cache or not. However, I left both changes in.

Hello @whizkido good to see your comments :)

When we profile a FiloDB instance that has a large index and serves many repeated queries, we see the following flame graph (captured with async-profiler):

Almost 80% of the stack traces are related to index searches, and about 50% of them are related to caching. This archive also mentions disabling caching, which may or may not work in our case (it also disables the cache in a different way than this PR does). The idea is to add a knob that lets us disable caching and then profile again. It might be worse, but we can measure and check.

Let me add the above details to the PR description too.
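For readers following this thread, here is a minimal sketch of the two mechanisms discussed above: setting the query cache to null and installing a policy whose shouldCache always returns false. The object name `IndexCaching`, the value `neverCache`, and the helper `searcherFactory` are hypothetical and chosen for illustration; the way the merged change wires the flag into PartKeyLuceneIndex may differ.

```scala
import org.apache.lucene.index.IndexReader
import org.apache.lucene.search.{IndexSearcher, Query, QueryCachingPolicy, SearcherFactory}

// Hypothetical helper, not taken from the PR.
object IndexCaching {
  // Policy that never caches and ignores usage callbacks.
  val neverCache: QueryCachingPolicy = new QueryCachingPolicy {
    override def onUse(query: Query): Unit = ()
    override def shouldCache(query: Query): Boolean = false
  }

  // Returns Lucene's stock factory when caching stays on, otherwise a
  // factory whose searchers have no query cache and a never-cache policy.
  def searcherFactory(disableIndexCaching: Boolean): SearcherFactory =
    if (!disableIndexCaching) new SearcherFactory()
    else new SearcherFactory {
      override def newSearcher(reader: IndexReader, previousReader: IndexReader): IndexSearcher = {
        val searcher = super.newSearcher(reader, previousReader)
        searcher.setQueryCache(null)               // null means matches are never cached
        searcher.setQueryCachingPolicy(neverCache) // redundant once the cache is null, kept as in the diff
        searcher
      }
    }
}
```

Either call alone appears sufficient to stop caching. With the cache set to null, the searcher has nothing to consult, so the policy is presumably never asked.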

sandeep6189 · 2024-02-01T23:18:40Z

core/src/main/scala/filodb.core/memstore/PartKeyLuceneIndex.scala

+            val indexSearcher = super.newSearcher(reader, previousReader)
+            indexSearcher.setQueryCache(null)
+            indexSearcher.setQueryCachingPolicy(new QueryCachingPolicy() {
+              override def onUse(query: Query): Unit = {


Quick question: what does onUse do?

Taken from the Lucene documentation:

Callback that is called every time that a cached filter is used. This is typically useful if the policy wants to track usage statistics in order to make decisions.
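To make the quoted description concrete, the sketch below shows a policy that does use onUse to track usage: it counts how often each query has been seen and only agrees to cache once a threshold is reached. The class name `CountingCachingPolicy` and the parameter `minUses` are hypothetical. This is an illustration only, not what this PR does (the PR's policy never caches), and Lucene already ships `UsageTrackingQueryCachingPolicy` for real use.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.concurrent.TrieMap
import org.apache.lucene.search.{Query, QueryCachingPolicy}

// Hypothetical illustration of a usage-tracking policy.
// Note: the map grows without bound; a production policy would cap it.
final class CountingCachingPolicy(minUses: Int) extends QueryCachingPolicy {
  private val uses = TrieMap.empty[Query, AtomicInteger]

  // Record one more use of this query.
  override def onUse(query: Query): Unit = {
    uses.getOrElseUpdate(query, new AtomicInteger(0)).incrementAndGet()
    ()
  }

  // Cache only queries that have been used at least minUses times.
  override def shouldCache(query: Query): Boolean =
    uses.get(query).exists(_.get() >= minUses)
}
```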

sandeep6189 · 2024-02-01T23:20:22Z

core/src/main/scala/filodb.core/memstore/TimeSeriesShard.scala

@@ -284,6 +284,8 @@ class TimeSeriesShard(val ref: DatasetRef,
                           filodbConfig.getBoolean("memstore.index-faceting-enabled-shard-key-labels")
  private val indexFacetingEnabledAllLabels = filodbConfig.getBoolean("memstore.index-faceting-enabled-for-all-labels")
  private val numParallelFlushes = filodbConfig.getInt("memstore.flush-task-parallelism")
+  private val disableIndexCaching = filodbConfig.getBoolean("memstore.disable-index-caching")


Do we also need this in the downsample time series shard?

Not sure it's needed yet, but I'm not ruling out the possibility.
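As a rough sketch of how the new key behaves with Typesafe Config: a default of false keeps caching enabled, and an override flips it. The object `IndexCachingConfig` and its `defaults` value are hypothetical; the default value shown is inferred from the PR description ("by default the caching is enabled"), and the key path is written relative to filodbConfig as in the diff above.

```scala
import com.typesafe.config.{Config, ConfigFactory}

// Hypothetical helper, not taken from the PR.
object IndexCachingConfig {
  // Assumed default: caching stays enabled unless explicitly disabled.
  val defaults: Config =
    ConfigFactory.parseString("memstore.disable-index-caching = false")

  // Resolve the flag from user overrides, falling back to the assumed default.
  def disableIndexCaching(overrides: Config): Boolean =
    overrides.withFallback(defaults).getBoolean("memstore.disable-index-caching")
}
```

For example, passing `ConfigFactory.parseString("memstore.disable-index-caching = true")` as the overrides would turn caching off for a profiling run, while an empty override leaves the existing behavior unchanged.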

alextheimer

Approved, but +1 to all of @sandeep6189's questions (let's defer the DS config to a separate PR if that's a less-straightforward change, but the consistency would be nice to have).

perf(query) Option to disable Lucene caching

a69819f

amolnayak311 requested a review from alextheimer February 1, 2024 23:15

sandeep6189 approved these changes Feb 1, 2024


alextheimer approved these changes Feb 2, 2024


amolnayak311 merged commit 618eae0 into filodb:develop Feb 2, 2024
1 check passed
