If using weights, upsync does not properly distribute requests between a large number of backends #289
That's an interesting finding. You mentioned that if you specify weight=1 for every backend, the load is approximately equal. Can you try another weight, e.g. weight=100, for every backend, and see whether the load is still equal? I am wondering whether it is because the weight is 1, or because the weights are equal.
Here's what it looks like with weight=100:
This is the test upstream from the upstream_show output:
I generate the load simply by using:
Is your etcd index changing frequently? upsync watches etcd for changes. You can "stat" the dump file to see how often the upstream list is actually being rewritten.
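A minimal sketch of how one might check this, assuming upsync_dump_path is configured (the path below is hypothetical); upsync rewrites the dump file when the upstream list changes, so its modification time shows how frequently updates arrive:

```sh
# Hypothetical dump path from upsync_dump_path; if the timestamps keep moving,
# upsync is receiving frequent updates from etcd.
watch -n 5 stat /usr/local/nginx/conf/servers/servers_test.conf
```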
And I also noticed something interesting. If I use the include parameter in the config with the upstreams, then balancing with weight=1 is also very uneven:
If I don't use the include parameter,
the result looks like this:
But excluding the include parameter still doesn't solve the problem of uneven balancing if I use weights from 13 to 15.
Every 5 seconds each of the 60 servers PUTs a request with TTL=120 to etcd about its backend. And I get a lot of responses from etcd with status code 401. I don't see the point in checking backends and deleting entries from etcd with some script; upsync is very good if it is possible to store items in etcd with a TTL.
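For illustration, a hedged sketch of the kind of command each server might be running every 5 seconds; the etcd address, key path, and value format are assumptions based on the upsync README, not taken from this thread. The etcd v2 API accepts a ttl parameter, so the key expires if the backend stops refreshing it:

```sh
# Placeholder host, upstream name, and backend address.
curl -X PUT "http://127.0.0.1:2379/v2/keys/upstreams/test/10.0.0.11:9000" \
     -d value='{"weight":13,"max_fails":2,"fail_timeout":10}' \
     -d ttl=120
```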
For testing, please stop those 60 servers from updating etcd and see if the result is different. The TTL=120 is only applicable to the etcd dns interface; the upsync module uses the http interface and gets updates immediately (no TTL). When the upsync module first talks to etcd, it saves the index (from the X-Etcd-Index header in the response), so the next request will use that index, e.g.:
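For context, a sketch of the etcd v2 watch pattern being described here (host and key path are placeholders): the first plain GET returns an X-Etcd-Index header, and the follow-up request passes that value as waitIndex so etcd blocks until something at or after that index changes:

```sh
# First request: note the X-Etcd-Index header in the response.
curl -i "http://127.0.0.1:2379/v2/keys/upstreams/test?recursive=true"

# Follow-up: long-poll for the next change at or after that index.
curl "http://127.0.0.1:2379/v2/keys/upstreams/test?wait=true&recursive=true&waitIndex=8"
```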
Why do I want to use upsync? nginx checks backends according to the proxy_next_upstream values before proxying a user request. Every time there is a disaster in one of the data centers, serving user requests takes longer, because nginx has to try a bunch of backends while searching for an available one. To avoid this problem, the upstream list should always contain only available backends. Checking backends for availability with some script and executing the corresponding command is exactly what I would like to avoid.
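For readers unfamiliar with the retry behavior being described, a minimal sketch of the relevant nginx directives (names and values are illustrative, not the reporter's config): each condition listed in proxy_next_upstream makes nginx move on to another backend, which adds latency when many servers in a failed data center have to be tried in turn:

```nginx
location / {
    proxy_pass http://test;
    # Conditions under which nginx retries the request on the next upstream server.
    proxy_next_upstream error timeout http_502 http_504;
    # Cap the number of servers tried so a data-center outage doesn't stall the request.
    proxy_next_upstream_tries 3;
}
```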
Yes, upsync is exactly for your use case. There are two ways to query etcd: dns or http. TTL is only used for dns queries. upsync uses http to query etcd, so the TTL doesn't matter. upsync watches etcd for any changes in real time, so if you remove a server from etcd it is picked up right away. The problem you have is that you are unnecessarily making frequent updates to etcd; this causes upsync to update its upstream server list frequently, and nginx has to go back to load balancing from the first server.
I don't know what dns has to do with it. I'm just executing
I understand. But how do I get rid of unavailable backends in the upstream? Do I need to develop a script that will monitor the backends and execute the delete command?
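For illustration, a hypothetical cleanup step such a script could run (host and key path are placeholders): deleting the backend's key via the etcd v2 API, after which upsync drops it from the upstream:

```sh
curl -X DELETE "http://127.0.0.1:2379/v2/keys/upstreams/test/10.0.0.11:9000"
```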
The ttl you set via curl is ignored, unless you use etcd dns. Every time you execute that curl, the load-balancing count gets reset, so it starts from server 1 again. That's why you saw uneven distribution. If you do the curl BEFORE you start the test, you shouldn't have a problem, unless you have something else also updating etcd. Is your etcd dedicated to this test? I already explained why waitIndex is used and how it can optimize the query. There are two things you can do for the upstream:
I'm not arguing with you in any way, I'm just describing my observations, as well as a feature in etcd called the TTL key: https://etcd.io/docs/v2/api/#using-key-ttl
This is a very interesting and wonderful module. But it is updated very rarely compared with upsync_module, and there have been cases of problems when compiling it with the latest versions of nginx.
Unfortunately, this idea failed, because etcd sets X-Etcd-Index for the entire instance. Tell me whether the behavior of upsync can somehow be disabled so that it does not use X-Etcd-Index. Someone may need to use waitIndex by default, but I would like not to use it; is this possible?
Using the index is just an optimization. You don't need to "tell" upsync not to use it. If the index is not usable, as in your case, it will automatically fall back to querying etcd without the index. You can ignore the warning about 'index is outdated'. upsync is still working and can detect when your upstream server is gone. Since you are not using the index optimization, you will not be able to detect a gone upstream server immediately. If you want the effective TTL to be 120 seconds, you should set the upsync polling interval accordingly.
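A minimal sketch of the relevant nginx configuration, using the directive names from the upsync README; the host, paths, and timings are placeholders rather than the reporter's actual settings. Without the index optimization, upsync falls back to periodic polling, so upsync_interval bounds how quickly a removed backend is noticed:

```nginx
upstream test {
    # Poll etcd for the server list; upsync_interval sets the polling period,
    # upsync_timeout caps each request.
    upsync 127.0.0.1:2379/v2/keys/upstreams/test upsync_timeout=6m upsync_interval=500ms
           upsync_type=etcd strong_dependency=off;
    # Persist the current server list so it survives reloads (and can be inspected with stat).
    upsync_dump_path /usr/local/nginx/conf/servers/servers_test.conf;

    include /usr/local/nginx/conf/servers/servers_test.conf;
}
```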
I just can't confirm case #71 with etcd version 3 and the v2 API format. Maybe @CallMeFoxie was using etcd version 2.2 at the time.
I only use consul. We need someone else who uses etcd to confirm.
That is not interesting when etcd offers the TTL-key capability. Again, I would use nginx_upstream_check_module if it were updated frequently and compiled with the latest versions of nginx without any problems.
Exactly, but I have a suspicion that the problem is only reproducible with etcd version 2.2.
Now I am confused as to what exactly the issue is. Is it the one in the subject line?
Initially the problem was an uneven distribution of proxied requests between backends. But we have already established that this is due to frequent updates of X-Etcd-Index. Why? Because I add the key with ttl=n to etcd every 5 seconds. I am interested in making sure that unavailable or decommissioned backends don't PUT upstreams to etcd. My script, which regularly PUTs upstreams to etcd, could be replaced with nginx_upstream_check_module, which would just quickly mark 'down=1' for unavailable backends. But as I wrote, this module is rarely updated and has compilation problems with the latest versions of nginx. The only solution to the upsync_module problem in my case is not to use X-Etcd-Index. Moreover, I have tested the case described in issue #71 and could not reproduce the problem that caused X-Etcd-Index to be used in upsync_module.
Hello, guys!
I have 20 nginx upstreams with 60 php-fpm backends each. When I started using upsync, I got high CPU utilization by php-fpm processes on the servers. I tested nginx before and after enabling upsync and got a striking result.
With the standard upstreams configuration:
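For illustration only, since the actual config is not reproduced here: a plain weighted upstream of the shape described in the thread, with per-server weights in the 13-15 range and placeholder addresses (the real setup has 60 backends per upstream):

```nginx
upstream test {
    server 10.0.0.11:9000 weight=13;
    server 10.0.0.12:9000 weight=14;
    server 10.0.0.13:9000 weight=15;
    # ... and so on for the remaining backends
}
```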
I get this distribution of requests between backends:
Now I enable upsync:
I add the upstream entries to etcd, run my test again, and see the following result:
As you can see, nginx with upsync forwards a lot more requests to some servers than to others. If I specify weight=1 for every backend, then the load is approximately equal. But this does not suit me, because under high load I have different CPU and RAM configurations on different servers. I need exactly the weight values that I had without upsync. I have a suspicion that upsync does not handle weights correctly and needs a fix.