Last value Cache - Increase in outliers beyond 16 threads for 100 series #25562

MaduMitha-Ravi · 2024-11-18T03:36:37Z

Increase in Outliers beyond 16 thread concurrency (Last value cache)

With more than 16 concurrent threads, we observe more outliers, roughly 5x-10x the typical latency, which inflates the P95 numbers.
CPU usage was below 20% and memory consumption was below 25%.
Given the low resource usage, this pattern suggests that some restriction or limit is producing the latency outliers.

Could there be a wait happening on some internal resources?

Evidence

Note: We capture latency (P95 reported) by running the configured number of threads (12, 14, 16, etc.) in the background and collecting metrics from just one of them. This shows how a single user experiences performance under concurrent load. That said, QPS may also have been affected by the observed outliers.
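For anyone trying to reproduce the measurement approach described in the note, here is a minimal sketch in Python. It is not the actual benchmark harness: `measure_under_load`, `p95`, and `count_outliers` are hypothetical names, the `query` callable is a placeholder for whatever issues the request, and the 5x-of-median outlier threshold is an assumption taken from the "5x-10x of the typical latency" observation above.

```python
import statistics
import threading
import time
from typing import Callable, List


def measure_under_load(query: Callable[[], None], threads: int, samples: int) -> List[float]:
    """Run `threads` concurrent callers of `query`, recording latency from one only."""
    stop = threading.Event()

    def background() -> None:
        # Background callers only generate load; their timings are discarded.
        while not stop.is_set():
            query()

    workers = [threading.Thread(target=background, daemon=True) for _ in range(threads - 1)]
    for worker in workers:
        worker.start()

    # The calling thread plays the single "observed user".
    latencies: List[float] = []
    for _ in range(samples):
        start = time.perf_counter()
        query()
        latencies.append(time.perf_counter() - start)

    stop.set()
    for worker in workers:
        worker.join()
    return latencies


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile (integer arithmetic avoids float rounding)."""
    ordered = sorted(values)
    rank = max(1, (95 * len(ordered) + 99) // 100)  # ceil(0.95 * n)
    return ordered[rank - 1]


def count_outliers(values: List[float], factor: float = 5.0) -> int:
    """Count samples at least `factor` times the median (assumed 'typical') latency."""
    typical = statistics.median(values)
    return sum(1 for value in values if value >= factor * typical)


if __name__ == "__main__":
    # Placeholder query: a 1 ms sleep stands in for a real request.
    results = measure_under_load(lambda: time.sleep(0.001), threads=16, samples=200)
    print(f"P95: {p95(results) * 1000:.2f} ms, outliers: {count_outliers(results)}")
```

Because only one thread is timed, the reported P95 reflects a single caller's experience under load, not aggregate throughput, which is why QPS has to be assessed separately.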

hiltontj · 2024-11-18T15:42:10Z

Hey @MaduMitha-Ravi - I'm wondering if we have observed a similar breakdown in performance at higher thread counts when issuing regular queries, i.e., not to the last cache? I want to determine whether this is something systemic or specific to the last cache before digging into what might be wrong in the cache.

MaduMitha-Ravi · 2024-11-18T15:45:45Z

I will do some quick runs and update here. We can modify the issue based on the evidence.

MaduMitha-Ravi · 2024-11-18T21:42:05Z

@hiltontj Your suspicion is right. Outliers also increase with concurrency for regular queries.

pauldix · 2024-11-19T00:36:02Z

@hiltontj We encountered a problem with concurrency in IOx before that required moving query planning off of the IO threadpool and onto the DF threadpool. The PR is influxdata/influxdb_iox#11029, which has pointers to related PRs and issues that are worth reading through.

Basically, we weren't able to take advantage of all the cores of a larger machine because we have two threadpools: one for tokio IO and one for DF query execution. Too much happening in the IO threadpool would cause IO stalls and make it so we couldn't effectively utilize all cores.

Might be the case again, but might not. Thought it was worth highlighting.
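To illustrate the general effect (not the IOx or influxdb3 code, which is Rust on tokio and DataFusion), here is a rough Python analogy. The asyncio event loop stands in for the IO threadpool, a `ThreadPoolExecutor` stands in for the DF threadpool, and `plan_query` is a hypothetical placeholder that simply sleeps to simulate synchronous planning work. The sketch measures how late a 10 ms heartbeat fires while one plan runs, with and without offloading.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def plan_query(duration_s: float) -> str:
    """Hypothetical stand-in for synchronous query planning."""
    time.sleep(duration_s)  # blocks whichever thread calls it
    return "plan"


async def max_heartbeat_lag(offload: bool, plan_s: float = 0.2) -> float:
    """Return the worst extra delay seen by a 10 ms heartbeat while one plan runs."""
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    worst = 0.0

    async def heartbeat() -> None:
        # Represents other IO tasks that need the event loop to stay responsive.
        nonlocal worst
        while not stop.is_set():
            start = time.perf_counter()
            await asyncio.sleep(0.01)
            worst = max(worst, time.perf_counter() - start - 0.01)

    hb = asyncio.create_task(heartbeat())
    await asyncio.sleep(0.05)  # let the heartbeat get going

    if offload:
        # Planning runs on a separate pool; the event loop keeps serving IO.
        with ThreadPoolExecutor(max_workers=1) as pool:
            await loop.run_in_executor(pool, plan_query, plan_s)
    else:
        # Planning runs on the event loop itself and stalls every other task.
        plan_query(plan_s)

    await asyncio.sleep(0.05)
    stop.set()
    await hb
    return worst


if __name__ == "__main__":
    blocked = asyncio.run(max_heartbeat_lag(offload=False))
    offloaded = asyncio.run(max_heartbeat_lag(offload=True))
    print(f"planning on the loop: {blocked * 1000:.0f} ms lag")
    print(f"planning offloaded:   {offloaded * 1000:.0f} ms lag")
```

When planning runs on the loop, the heartbeat is delayed by roughly the full planning time even though the machine is mostly idle, which is the same shape as high tail latency alongside low CPU usage.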

hiltontj · 2024-11-19T14:36:56Z

Thanks for confirming @MaduMitha-Ravi and for the pointer @pauldix - @MaduMitha-Ravi is this a major blocker? If so, I can start looking into it; otherwise, I will dive into this next week once I am through with #25539

MaduMitha-Ravi · 2024-11-19T14:44:58Z

@hiltontj Not a blocker, just a concern. We can take it up next week.

MaduMitha-Ravi · 2024-12-02T20:47:20Z

Working on the re-runs with the latest build (with fix), will update once I am done.

MaduMitha-Ravi · 2024-12-04T03:59:24Z

Results after the merge of the DF threadpool change.

Latencies have spiked compared to the previous experiment
CPU usage has increased significantly, reaching 80% at a concurrency of 4 (previously less than 25%)
Latencies across runs show that variability has been introduced

Note:

Re-runs are in progress to confirm the results.
This performance impact could be an effect of the DF threadpool change or of some other change in the software.

hiltontj · 2024-12-04T14:29:07Z

Thanks for the update @MaduMitha-Ravi - wasn't expecting that, but clearly this warrants more investigation. I will open a separate issue to write out a plan for investigating this further.

Can you provide the command line arguments that are being used to run the influxdb3 serve process?

MaduMitha-Ravi added the v3 label Nov 18, 2024

hiltontj mentioned this issue Nov 29, 2024

fix: plan queries on DF threadpool to not block IO in REST API #25604

Merged

hiltontj added the kind/perf label Dec 2, 2024

hiltontj mentioned this issue Dec 4, 2024

Concurrency issues investigation #25615

Open
