Long-lasting queries in cluster may result in Clickhouse server bloat #11

Open
michaelkl opened this issue Nov 18, 2017 · 1 comment

@michaelkl
Contributor

A heavy Clickhouse request may take a very long time to finish. Sometimes it even exceeds the HTTPClient connection timeout. Here is what happens in detail:

  1. We run a long-lasting query on a Clickhouse cluster, connecting like this: Clickhouse.establish_connection(urls: ['host1.lvh.me:8123', 'host2.lvh.me:8123'])
  2. The Clickhouse server starts working hard on the query we issued, but it takes too long.
  3. Clickhouse::Connection::Client#request receives a Faraday::TimeoutError exception.
  4. Clickhouse::Cluster#method_missing retries the same request on another Clickhouse host from pond's pool, with the same result. If the servers are under really heavy load and the request lasts long enough, we run out of servers in the pool and come back to the first one, which is still working on the heavy query!
  5. Go to 1.
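
The loop in the steps above can be reproduced with a small simulation. This is only a sketch: FakeHost, NaiveCluster, FakeTimeout and TooManyQueries are hypothetical stand-ins and are not classes from the clickhouse gem, and the limit of 3 concurrent queries per host is an arbitrary value chosen for illustration.

```ruby
# Hypothetical stand-ins; none of these classes come from the clickhouse gem.
class FakeTimeout < StandardError; end
class TooManyQueries < StandardError; end

class FakeHost
  attr_reader :name, :running

  def initialize(name, max_concurrent)
    @name = name
    @max_concurrent = max_concurrent
    @running = 0
  end

  # The server accepts the query and keeps working on it, but the client
  # gives up waiting, so the query stays counted as running.
  def query(_sql)
    raise TooManyQueries, "#{name} refuses new queries" if @running >= @max_concurrent
    @running += 1
    raise FakeTimeout, "#{name} timed out"
  end
end

class NaiveCluster
  def initialize(hosts)
    @hosts = hosts
    @index = 0
  end

  # Retries on timeout against the next host, wrapping around the pool forever.
  def query(sql)
    loop do
      host = @hosts[@index % @hosts.size]
      @index += 1
      begin
        return host.query(sql)
      rescue FakeTimeout
        next # move on to the next host, even if it is already busy with this query
      end
    end
  end
end

hosts = [FakeHost.new("host1", 3), FakeHost.new("host2", 3)]
cluster = NaiveCluster.new(hosts)

begin
  cluster.query("SELECT heavy")
rescue TooManyQueries => e
  puts e.message
end
hosts.each { |h| puts "#{h.name}: #{h.running} copies running" }
```

A single logical query ends up running on every host as many times as the host allows, and the retry loop only stops once a host refuses further queries.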

As a result, each Clickhouse server in the cluster runs the same query over and over again, which keeps increasing the load.

The bloat ends only when the maximum number of simultaneous queries is reached and Clickhouse refuses to take another query.
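
One conceivable way to bound the damage is to try each host at most once per call and to treat a timeout as final, since the query may still be running on the server. The sketch below illustrates that idea only; it is not a description of any existing fix, and QueryTimeout and query_once_per_host are hypothetical names, not part of the clickhouse gem or Faraday.

```ruby
# Hypothetical stand-in for the HTTP client's timeout error.
class QueryTimeout < StandardError; end

# Tries each host at most once. A timeout is re-raised immediately instead of
# being retried elsewhere, because the timed-out query may still be running.
# Other errors (for example a refused connection) fall through to the next host.
def query_once_per_host(hosts, sql)
  last_error = nil
  hosts.each do |host|
    begin
      return yield(host, sql)
    rescue QueryTimeout
      raise
    rescue StandardError => e
      last_error = e
    end
  end
  raise last_error || ArgumentError.new("no hosts given")
end

attempts = []
begin
  query_once_per_host(%w[host1 host2], "SELECT heavy") do |host, _sql|
    attempts << host
    raise QueryTimeout, "#{host} timed out"
  end
rescue QueryTimeout => e
  puts "gave up after #{attempts.size} attempt(s): #{e.message}"
end
```

The trade-off is that a slow query now fails fast on the client side rather than being silently restarted on every host in the pool.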

@michaelkl
Contributor Author

@archan937 I've made PR #10 to fix this. It's not a perfect solution, but it does the job. Please take a look.
