Last value Cache - Increase in outliers beyond 16 threads for 100 series #25562

Open
MaduMitha-Ravi opened this issue Nov 18, 2024 · 9 comments

Comments

@MaduMitha-Ravi

Increase in outliers beyond 16-thread concurrency (last value cache)

  • At concurrency levels above 16 threads, we observe more outliers at roughly 5x-10x the typical latency, which inflates the P95 numbers.
  • CPU usage stayed below 20% and memory consumption below 25%.
  • This pattern suggests some internal restriction or limitation is producing the latency outliers.

Could there be a wait on some internal resource?

Evidence

[three screenshots attached]

Note: We capture latency (the reported P95) by running background threads (12, 14, 16, etc.) to generate concurrent load and collecting metrics from just one of them. This shows how a particular user experiences performance under concurrent load. Given that, QPS could also have been affected by the observed outliers.
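
To make that measurement pattern concrete, here is a minimal sketch of the idea (not the actual benchmark harness; `issue_query`, the thread count, and the iteration count are all placeholders):

```rust
// Minimal sketch of the load pattern described above. `issue_query` stands in
// for the real last-value-cache query; concurrency and iteration counts are
// illustrative only.
use std::thread;
use std::time::{Duration, Instant};

fn issue_query() {
    // Placeholder for an actual query against the last value cache.
    thread::sleep(Duration::from_millis(5));
}

fn p95(mut samples: Vec<Duration>) -> Duration {
    samples.sort();
    let idx = ((samples.len() as f64) * 0.95).ceil() as usize;
    samples[idx.saturating_sub(1).min(samples.len() - 1)]
}

fn main() {
    let concurrency = 16; // background load: 12, 14, 16, ... threads
    let iterations = 1_000;

    // Background threads only generate load; their latencies are not recorded.
    let background: Vec<_> = (0..concurrency - 1)
        .map(|_| {
            thread::spawn(move || {
                for _ in 0..iterations {
                    issue_query();
                }
            })
        })
        .collect();

    // One measured thread: the "particular user" whose P95 gets reported.
    let mut samples = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        let start = Instant::now();
        issue_query();
        samples.push(start.elapsed());
    }

    for handle in background {
        handle.join().unwrap();
    }
    println!("P95 latency: {:?}", p95(samples));
}
```

Because only the one measured thread's samples feed the P95, a handful of 5x-10x outliers on that thread can dominate the reported number even when overall throughput looks healthy.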

@hiltontj
Contributor

Hey @MaduMitha-Ravi - I'm wondering if we have observed a similar breakdown in performance at higher thread counts when issuing regular queries, i.e., not to the last cache? I want to rule out something systemic vs. something specific to the last cache before digging into what might be wrong in the cache.

@MaduMitha-Ravi
Author

I will do some quick runs and update here. We can modify the issue based on the evidence.

@MaduMitha-Ravi
Author

@hiltontj Your suspicion is right. More outliers appear as concurrency increases, for regular queries as well.
[screenshot attached]

@pauldix
Member

pauldix commented Nov 19, 2024

@hiltontj We encountered a problem with concurrency in IOx before that required moving query planning off of the IO threadpool and onto the DF threadpool. The PR is influxdata/influxdb_iox#11029, which has pointers to related PRs and issues that are worth reading through.

Basically, we weren't able to take advantage of all the cores of a larger machine because we have two threadpools: one for tokio IO and one for DF query execution. Too much happening in the IO threadpool would cause IO stalls and make it so we couldn't effectively utilize all cores.

Might be the case again, but might not. Thought it was worth highlighting.
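
For readers not familiar with that split, here is a rough sketch of the two-runtime pattern being described (this is not IOx's actual executor; it assumes a `tokio` dependency with the `rt-multi-thread` feature, and the names and pool sizes are made up):

```rust
// Rough sketch: keep CPU-heavy DataFusion-style work on a dedicated runtime so
// it cannot stall the runtime that services network IO. Names and sizes are
// illustrative; this is not the actual IOx/influxdb3 executor.
use std::sync::Arc;
use tokio::runtime::Builder;

fn plan_and_execute(query: String) -> String {
    // Stand-in for CPU-bound query planning + execution.
    format!("results for {query}")
}

fn main() {
    // Dedicated runtime for CPU-bound work (analogous to the "DF threadpool").
    let df_runtime = Arc::new(
        Builder::new_multi_thread()
            .worker_threads(8)
            .thread_name("df-exec")
            .build()
            .unwrap(),
    );

    // IO runtime: accepts requests and must stay responsive.
    let io_runtime = Builder::new_multi_thread()
        .worker_threads(2)
        .thread_name("io")
        .enable_all()
        .build()
        .unwrap();

    io_runtime.block_on(async {
        let df = Arc::clone(&df_runtime);
        // Hand the query off to the dedicated runtime rather than running it
        // inline on an IO worker, so IO workers stay free to poll sockets.
        let handle = df.spawn_blocking(|| plan_and_execute("SELECT ...".to_string()));
        let results = handle.await.unwrap();
        println!("{results}");
    });
}
```

The symptom reported above (low CPU but large tail latencies) could be consistent with IO workers getting blocked behind CPU-heavy work; moving that work to a separate pool is the usual mitigation for that failure mode.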

@hiltontj
Contributor

Thanks for confirming @MaduMitha-Ravi, and for the pointer @pauldix. @MaduMitha-Ravi, is this a major blocker? If so, I can start looking into it; otherwise, I will dive into this next week once I am through with #25539.

@MaduMitha-Ravi
Author

@hiltontj Not a blocker, just a concern. We can take it up next week.

@MaduMitha-Ravi
Author

Working on the re-runs with the latest build (with the fix); I will update once they are done.

@MaduMitha-Ravi
Author

Results after the DF threadpool change was merged in:

  • Latencies have spiked compared to the previous experiment.
  • CPU usage has increased significantly, reaching 80% at a concurrency of 4 (previously under 25%).
  • Latencies now vary noticeably from run to run.
[four screenshots attached]

Note:

  • Re-runs are in progress to confirm the results.
  • This performance impact could be an effect of the DF threadpool change or of some other change in the software.

@hiltontj
Contributor

hiltontj commented Dec 4, 2024

Thanks for the update @MaduMitha-Ravi - wasn't expecting that, but clearly this warrants more investigation. I will open a separate issue to write out a plan for investigating this further.

Can you provide the command line arguments that are being used to run the influxdb3 serve process?
