
If using weights, upsync does not properly distribute requests between a large number of backends #289

Open
Valeriyy opened this issue Jul 7, 2020 · 18 comments

@Valeriyy commented Jul 7, 2020

Hello, guys!
I have 20 nginx upstreams with 60 php-fpm backends each. When I started using upsync, I saw high CPU utilization by the php-fpm processes on some servers. I tested nginx before and after switching to upsync and got a striking result.
With the standard upstream configuration:

upstream test {
        server 10.0.134.116:8080 weight=15 max_fails=0;
        server 10.0.134.121:8080 weight=13 max_fails=0;
        server 10.0.137.52:8080 weight=14 max_fails=0;
        server 10.0.137.51:8080 weight=14 max_fails=0;
        server 10.0.135.159:8080 weight=14 max_fails=0;
        server 10.0.136.45:8080 weight=14 max_fails=0;
        server 10.0.137.57:8080 weight=15 max_fails=0;
        server 10.0.137.58:8080 weight=15 max_fails=0;
        server 10.0.136.28:8080 weight=14 max_fails=0;
        server 10.0.134.192:8080 weight=15 max_fails=0;
        server 10.0.134.179:8080 weight=13 max_fails=0;
        server 10.0.137.60:8080 weight=15 max_fails=0;
        server 10.0.135.139:8080 weight=14 max_fails=0;
        server 10.0.137.36:8080 weight=14 max_fails=0;
        server 10.0.136.65:8080 weight=14 max_fails=0;
        server 10.0.134.212:8080 weight=13 max_fails=0;
        server 10.0.137.92:8080 weight=14 max_fails=0;
        server 10.0.134.118:8080 weight=15 max_fails=0;
        server 10.0.137.61:8080 weight=14 max_fails=0;
        server 10.0.137.122:8080 weight=14 max_fails=0;
        server 10.0.137.243:8080 weight=13 max_fails=0;
        server 10.0.136.39:8080 weight=14 max_fails=0;
        server 10.0.137.195:8080 weight=13 max_fails=0;
        server 10.0.134.122:8080 weight=13 max_fails=0;
        server 10.0.137.171:8080 weight=13 max_fails=0;
        server 10.0.134.123:8080 weight=13 max_fails=0;
        server 10.0.137.54:8080 weight=15 max_fails=0;
        server 10.0.137.168:8080 weight=13 max_fails=0;
        server 10.0.136.51:8080 weight=14 max_fails=0;
        server 10.0.137.31:8080 weight=14 max_fails=0;
        server 10.0.137.156:8080 weight=14 max_fails=0;
        server 10.0.135.158:8080 weight=14 max_fails=0;
        server 10.0.137.23:8080 weight=14 max_fails=0;
        server 10.0.134.127:8080 weight=13 max_fails=0;
        server 10.0.137.170:8080 weight=13 max_fails=0;
        server 10.0.137.173:8080 weight=13 max_fails=0;
        server 10.0.134.98:8080 weight=14 max_fails=0;
        server 10.0.137.71:8080 weight=14 max_fails=0;
        server 10.0.135.140:8080 weight=14 max_fails=0;
        server 10.0.137.77:8080 weight=14 max_fails=0;
        server 10.0.136.49:8080 weight=14 max_fails=0;
        server 10.0.137.73:8080 weight=14 max_fails=0;
        server 10.0.136.38:8080 weight=14 max_fails=0;
        server 10.0.137.35:8080 weight=14 max_fails=0;
        server 10.0.137.138:8080 weight=14 max_fails=0;
        server 10.0.137.162:8080 weight=14 max_fails=0;
        server 10.0.136.43:8080 weight=14 max_fails=0;
        server 10.0.137.144:8080 weight=14 max_fails=0;
        server 10.0.134.124:8080 weight=13 max_fails=0;
        server 10.0.134.128:8080 weight=13 max_fails=0;
        server 10.0.136.48:8080 weight=14 max_fails=0;
        server 10.0.137.32:8080 weight=14 max_fails=0;
        server 10.0.137.169:8080 weight=13 max_fails=0;
        server 10.0.136.26:8080 weight=14 max_fails=0;
        server 10.0.136.68:8080 weight=14 max_fails=0;
        server 10.0.137.74:8080 weight=14 max_fails=0;
        server 10.0.137.81:8080 weight=14 max_fails=0;
        server 10.0.137.254:8080 weight=14 max_fails=0;
        server 10.0.137.172:8080 weight=13 max_fails=0;
        server 10.0.136.64:8080 weight=14 max_fails=0;
}

server {
        listen 80;

        location / {
                include fastcgi_params;
                fastcgi_pass  test;
                fastcgi_param SCRIPT_FILENAME $document_root/index.php;
        }
}

I get this distribution of requests between backends:

for t in $(for i in {1..10}; do date "+%d/%b/%Y:%H:%M" --date "-$i min"; done); do grep -r "*$t.*GET \/ " /var/log/nginx/access.log | sed -r 's/.*upstream_addr:\s(.*):8080.*/\1/g'; done | sort | uniq -c | sort -nr
     10 10.0.137.60
     10 10.0.137.58
     10 10.0.137.57
      9 10.0.137.54
      9 10.0.134.192
      9 10.0.134.118
      9 10.0.134.116
      8 10.0.137.77
      8 10.0.137.74
      8 10.0.137.73
      8 10.0.137.71
      8 10.0.137.61
      8 10.0.137.52
      8 10.0.137.51
      8 10.0.137.35
      8 10.0.137.32
      8 10.0.137.23
      8 10.0.135.159
      8 10.0.135.158
      8 10.0.135.140
      8 10.0.135.139
      8 10.0.134.98
      7 10.0.137.92
      7 10.0.137.81
      7 10.0.137.36
      7 10.0.137.31
      7 10.0.137.254
      7 10.0.136.43
      7 10.0.136.28
      6 10.0.137.162
      6 10.0.137.156
      6 10.0.137.144
      6 10.0.137.138
      6 10.0.137.122
      6 10.0.136.68
      6 10.0.136.64
      6 10.0.136.51
      6 10.0.136.49
      6 10.0.136.48
      6 10.0.136.39
      6 10.0.136.38
      6 10.0.136.26
      6 10.0.134.124
      6 10.0.134.123
      5 10.0.137.243
      5 10.0.137.195
      5 10.0.137.173
      5 10.0.137.172
      5 10.0.137.171
      5 10.0.137.170
      5 10.0.137.169
      5 10.0.137.168
      5 10.0.136.45
      5 10.0.134.212
      5 10.0.134.179
      5 10.0.134.128
      5 10.0.134.127
      5 10.0.134.122
      5 10.0.134.121

Now enable upsync:

upstream test {
    upsync 127.0.0.1:2379/v2/keys/upsync/test upsync_interval=5s upsync_timeout=5m upsync_type=etcd strong_dependency=off;
    upsync_dump_path /etc/nginx/conf.d/upsync/test.inc;
    include /etc/nginx/conf.d/upsync/test.inc;
}

server {
        listen 80;

        location / {
                include fastcgi_params;
                fastcgi_pass  test;
                fastcgi_param SCRIPT_FILENAME $document_root/index.php;
        }
}

After adding the upstream entries to etcd, I run my test again and see the following result:

for t in $(for i in {1..10}; do date "+%d/%b/%Y:%H:%M" --date "-$i min"; done); do grep -r "*$t.*GET \/ " /var/log/nginx/access.log | sed -r 's/.*upstream_addr:\s(.*):8080.*/\1/g'; done | sort | uniq -c | sort -nr
     45 10.0.137.54
     30 10.0.137.60
     29 10.0.137.57
     23 10.0.134.192
     21 10.0.134.118
     19 10.0.134.116
     17 10.0.137.58
     12 10.0.137.36
     12 10.0.137.35
     10 10.0.137.61
      8 10.0.137.32
      8 10.0.137.23
      7 10.0.137.81
      7 10.0.137.77
      7 10.0.137.74
      7 10.0.137.162
      7 10.0.136.28
      6 10.0.137.92
      6 10.0.135.159
      6 10.0.135.139
      5 10.0.137.144
      5 10.0.135.158
      4 10.0.137.73
      4 10.0.137.71
      4 10.0.137.254
      4 10.0.137.156
      4 10.0.137.122
      4 10.0.136.51
      4 10.0.136.43
      4 10.0.136.26
      3 10.0.137.138
      3 10.0.136.68
      3 10.0.136.48
      3 10.0.136.39
      3 10.0.136.38
      2 10.0.137.52
      2 10.0.137.51
      2 10.0.137.31
      2 10.0.137.173
      2 10.0.137.170
      2 10.0.137.168
      2 10.0.136.65
      2 10.0.136.64
      2 10.0.136.49
      2 10.0.136.45
      2 10.0.135.140
      2 10.0.134.98
      2 10.0.134.212
      2 10.0.134.127
      2 10.0.134.124
      1 10.0.137.171
      1 10.0.137.169
      1 10.0.134.179
      1 10.0.134.123
      1 10.0.134.122
      1 10.0.134.121

As you can see, nginx with upsync forwards far more requests to some servers than to others. If I specify weight=1 for every backend, the load is approximately equal. But that does not suit me, because my servers have different CPU and RAM configurations and run under high load; I need exactly the weights I had without upsync. I suspect that upsync does not handle weights correctly and needs a fix.
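For reference, nginx's default weighted balancer (which the plain upstream block above relies on) is a smooth weighted round-robin: over one full cycle of sum(weights) requests, each peer is picked exactly weight times. The sketch below is illustrative only, not the module's code; it shows that with weights in the 13-15 range the counts converge to the weight ratio, which is what the first log shows. If the peer state is rebuilt mid-cycle, as a reload-like update would do, the cycle restarts from the top and counts skew toward the first peers.

```python
# Illustrative smooth weighted round-robin, the algorithm nginx's default
# upstream balancer uses (not the upsync module's actual code).
from collections import Counter

def smooth_wrr(weights, n):
    """Pick a peer n times; returns a Counter of picks per peer."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        # Each round, every peer gains its weight; the leader is picked
        # and pays back the total, which spreads picks out smoothly.
        for name, w in weights.items():
            current[name] += w
        best = max(current, key=current.get)
        current[best] -= total
        picks.append(best)
    return Counter(picks)

# Over one full cycle (15 + 14 + 13 = 42 requests), each peer is chosen
# exactly `weight` times:
print(smooth_wrr({"a": 15, "b": 14, "c": 13}, 42))
```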

@gfrankliu (Collaborator) commented:

That's an interesting finding. You mentioned that if you specify weight=1 for every backend, the load will be approximately equal. Can you try another weight, e.g. weight=100, for every backend? Will the load still be equal? I am wondering whether the cause is the weight being 1, or the weights being equal.

@Valeriyy (Author) commented Jul 7, 2020

That's an interesting finding. You mentioned that if you specify weight=1 for every backend, the load will be approximately equal. Can you try another weight, e.g. weight=100, for every backend? Will the load still be equal? I am wondering whether the cause is the weight being 1, or the weights being equal.

Here's what it looks like with weight=100:

# for t in $(for i in {1..3}; do date "+%d/%b/%Y:%H:%M" --date "-$i min"; done); do grep -r "*$t.*GET \/ " /var/log/nginx/access.log | sed -r 's/.*upstream_addr:\s(.*):8080.*/\1/g'; done | sort | uniq -c | sort -nr
     40 10.0.137.61
     34 10.0.137.36
     28 10.0.134.123
     25 10.0.135.159
     23 10.0.137.31
     21 10.0.136.26
     18 10.0.137.51
     17 10.0.137.60
     15 10.0.137.54
     14 10.0.134.122
     13 10.0.137.74
     13 10.0.137.162
     13 10.0.137.156
     13 10.0.137.144
     13 10.0.136.68
      9 10.0.134.192
      8 10.0.137.81
      8 10.0.136.48
      8 10.0.134.124
      6 10.0.137.57
      6 10.0.137.32
      6 10.0.134.127
      5 10.0.137.172
      4 10.0.137.71
      4 10.0.136.45
      4 10.0.135.158
      4 10.0.134.98
      3 10.0.136.28
      3 10.0.134.128
      2 10.0.137.195
      1 10.0.137.77
      1 10.0.137.52
      1 10.0.137.35
      1 10.0.137.243
      1 10.0.137.23
      1 10.0.137.171
      1 10.0.137.170
      1 10.0.137.169
      1 10.0.137.168
      1 10.0.137.138
      1 10.0.136.65
      1 10.0.136.64
      1 10.0.136.51
      1 10.0.136.39
      1 10.0.135.139
      1 10.0.134.212

This is the test upstream from the upstream_show output:

Upstream name: test; Backend server count: 60
        server 10.0.136.64:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.172:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.254:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.81:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.74:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.136.68:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.136.26:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.169:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.32:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.136.48:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.134.128:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.134.124:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.144:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.136.43:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.162:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.138:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.35:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.136.38:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.73:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.136.49:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.77:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.135.140:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.71:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.134.98:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.173:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.170:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.134.127:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.23:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.135.158:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.156:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.31:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.136.51:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.168:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.54:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.134.123:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.171:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.134.122:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.195:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.136.39:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.243:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.122:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.61:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.134.118:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.92:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.134.212:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.136.65:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.36:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.135.139:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.60:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.134.179:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.134.192:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.136.28:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.58:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.57:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.136.45:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.135.159:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.51:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.137.52:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.134.121:8080 weight=100 max_fails=0 fail_timeout=10s;
        server 10.0.134.116:8080 weight=100 max_fails=0 fail_timeout=10s;

I generate the load simply by using: seq 400 | parallel --pipe -N 30 --round-robin -j30 parallel -j30 -k curl http://myserver

@gfrankliu (Collaborator) commented:

Is your etcd index changing frequently? upsync watches 127.0.0.1:2379/v2/keys/upsync/test. The long poll returns whenever etcd has index changes, even if your "test" key returns the same list of servers. When this happens, upsync triggers an nginx internal "reload", so the round robin starts from the first server again. Depending on how frequently this happens, you may end up with only the first few servers being used.

You can "stat" the dump file /etc/nginx/conf.d/upsync/test.inc and see if it is indeed updating (timestamp)

@Valeriyy (Author) commented Jul 7, 2020

And I also noticed something interesting. If I use the include parameter in the upstream config, then balancing with weight=1 is also very uneven:

# for t in $(for i in {1..3}; do date "+%d/%b/%Y:%H:%M" --date "-$i min"; done); do grep -r "*$t.*GET \/ " /var/log/nginx/access.log | sed -r 's/.*upstream_addr:\s(.*):8080.*/\1/g'; done | sort | uniq -c | sort -nr
     36 10.0.137.61
     26 10.0.137.36
     26 10.0.134.123
     22 10.0.135.159
     19 10.0.137.31
     17 10.0.136.26
     16 10.0.137.60
     16 10.0.137.54
     16 10.0.137.51
     15 10.0.134.122
     13 10.0.137.74
     13 10.0.136.68
     12 10.0.137.156
     10 10.0.137.144
      9 10.0.137.162
      9 10.0.134.192
      8 10.0.137.81
      8 10.0.137.57
      8 10.0.137.32
      8 10.0.137.172
      8 10.0.136.48
      8 10.0.134.127
      8 10.0.134.124
      7 10.0.136.45
      7 10.0.135.158
      6 10.0.137.71
      6 10.0.134.98
      4 10.0.137.195
      4 10.0.136.64
      4 10.0.136.28
      4 10.0.134.128
      3 10.0.137.35
      3 10.0.137.169
      2 10.0.136.51
      1 10.0.137.77
      1 10.0.137.52
      1 10.0.137.243
      1 10.0.137.23
      1 10.0.137.171
      1 10.0.137.170
      1 10.0.137.168
      1 10.0.137.138
      1 10.0.136.65
      1 10.0.136.39
      1 10.0.135.139
      1 10.0.134.212
      1 10.0.134.179

If I don't use the include parameter:

upstream test {
    upsync 127.0.0.1:2379/v2/keys/upsync/test upsync_interval=5s upsync_timeout=5m upsync_type=etcd strong_dependency=off;
    upsync_dump_path /etc/nginx/conf.d/upsync/test.inc;
}

the result will be like this:

     12 10.0.137.168
     10 10.0.137.73
     10 10.0.137.61
     10 10.0.137.51
     10 10.0.137.144
      9 10.0.137.81
      9 10.0.137.77
      9 10.0.137.57
      9 10.0.136.68
      9 10.0.134.98
      8 10.0.137.92
      8 10.0.137.71
      8 10.0.137.52
      8 10.0.136.28
      8 10.0.136.26
      8 10.0.135.159
      8 10.0.134.192
      8 10.0.134.128
      8 10.0.134.123
      8 10.0.134.121
      7 10.0.137.60
      7 10.0.137.58
      7 10.0.137.36
      7 10.0.137.31
      7 10.0.137.254
      7 10.0.137.195
      7 10.0.137.172
      7 10.0.137.170
      7 10.0.137.169
      7 10.0.137.122
      7 10.0.136.65
      7 10.0.136.49
      7 10.0.136.39
      7 10.0.134.179
      6 10.0.137.74
      6 10.0.137.162
      6 10.0.137.156
      6 10.0.137.138
      6 10.0.136.64
      6 10.0.136.51
      6 10.0.136.45
      6 10.0.135.139
      6 10.0.134.127
      5 10.0.137.54
      5 10.0.137.35
      5 10.0.137.243
      5 10.0.137.23
      5 10.0.134.212
      5 10.0.134.124
      5 10.0.134.118
      5 10.0.134.116
      4 10.0.137.32
      4 10.0.137.171
      4 10.0.136.48
      4 10.0.136.43
      4 10.0.134.122
      3 10.0.137.173
      3 10.0.136.38
      3 10.0.135.140
      2 10.0.135.158

But removing the include parameter still doesn't solve the problem of uneven balancing if I use weights from 13 to 15.

@Valeriyy (Author) commented Jul 7, 2020

Is your etcd index changing frequently? upsync watches 127.0.0.1:2379/v2/keys/upsync/test. The long poll returns whenever etcd has index changes, even if your "test" key returns the same list of servers. When this happens, upsync triggers an nginx internal "reload", so the round robin starts from the first server again. Depending on how frequently this happens, you may end up with only the first few servers being used.

You can "stat" the dump file /etc/nginx/conf.d/upsync/test.inc and see whether it is indeed being updated (check the timestamp).

Every 5 seconds, each of the 60 servers PUTs a request with TTL=120 to etcd about its backend. And I see a lot of responses from etcd with status code 401: {"errorCode":401,"message":"The event in requested index is outdated and cleared","cause":"the requested history has been cleared [487370589/487369756]","index":487371588}. But why does upsync make the request GET /v2/keys/upsync/test?wait=true&recursive=true&waitIndex=487369756 HTTP/1.0? With the request /v2/keys/upsync/test I see all backends of the 'test' upstream in the JSON. Could this affect nginx balancing?
I really need an unavailable server to be unable to PUT information about itself to etcd, which is why I specify the TTL.

I don't see the point in checking backends and deleting entries from etcd with some script. upsync is very good if it is possible to store items in etcd with a TTL.

@gfrankliu (Collaborator) commented:

For testing, please stop those 60 servers from updating etcd and see if the result is different.

The TTL=120 only applies to the etcd dns interface; the upsync module uses the http interface and gets updates immediately (no TTL).

When the upsync module first talks to etcd, it saves the index (from the X-Etcd-Index header in the response), and the next request uses that index, e.g. /v2/keys/upsync/test?wait=true&recursive=true&waitIndex=487369756. This long-poll request waits until etcd has a change (the index grows past the waitIndex). In your case, it seems your etcd changes far too fast. The 401 status code basically says etcd has already changed by more than 1000 events (in your error message, 487371588 - 487369756 > 1000), and since etcd only "remembers" a history of 1000 events, it throws that error. Since the upsync long poll failed, it has to make a normal request (without the waitIndex). You need to stop the unnecessary updates (I think even PUTting the same data to etcd triggers an index increase). Otherwise the upsync module keeps receiving an updated server list (even when nothing changed), and that makes the load balancer go back to the first server again.
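That 401 behavior can be sketched like this (an illustration of the etcd v2 watch window described above, not upsync's code; the 1000-event history size is etcd v2's documented limit):

```python
# etcd v2 keeps only the last 1000 events. A watch from a waitIndex older
# than the retained window is rejected with errorCode 401 (the "index is
# outdated and cleared" error), and the client must fall back to a plain GET.
def watch_response(current_index: int, wait_index: int, history: int = 1000) -> int:
    # Oldest event still retained; matches the 487370589 in the error message.
    oldest_available = current_index - history + 1
    if wait_index < oldest_available:
        return 401   # "The event in requested index is outdated and cleared"
    return 200       # long poll: blocks until the index passes wait_index

# Numbers taken from the error message quoted in this thread:
print(watch_response(487371588, 487369756))  # -> 401
print(watch_response(487371588, 487371500))  # -> 200
```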

@Valeriyy (Author) commented Jul 8, 2020

For testing, please stop those 60 servers from updating etcd and see if the result is different.

The TTL=120 only applies to the etcd dns interface; the upsync module uses the http interface and gets updates immediately (no TTL).

Why do I want to use upsync? nginx checks backends according to the proxy_next_upstream values before proxying a user request. Every time I have a disaster in one of the data centers, serving user requests takes extra time, because nginx has to check a bunch of backends in search of an available one. To avoid this problem, the upstream list should always contain only available backends. Checking backends for availability with some script and executing curl -X DELETE http://$etcd_ip:$port/v2/keys/upstreams/$upstream_name/$backend_ip:$backend_port for unavailable backends is something I don't like.
If I PUT a request with TTL=<short_time> to etcd, then after that short time the item is deleted from etcd, an unavailable server can never publish its backends into the upstreams, and I never spend extra time checking unavailable backends.
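The TTL-based registration described above can be sketched as a toy model (this is not etcd itself; the class and timings are illustrative): live backends keep refreshing their key every 5 seconds, and a dead backend's key simply expires.

```python
# Toy model of etcd v2 key TTLs: each backend re-PUTs its key with ttl=120;
# a backend that stops refreshing drops out of the list once its TTL passes.
class TtlRegistry:
    def __init__(self):
        self._items = {}  # key -> expiry timestamp

    def put(self, key, ttl, now):
        self._items[key] = now + ttl

    def live(self, now):
        return sorted(k for k, exp in self._items.items() if exp > now)

reg = TtlRegistry()
reg.put("10.0.134.116:8080", ttl=120, now=0)
reg.put("10.0.134.121:8080", ttl=120, now=0)

# .116 keeps its 5-second heartbeat going; .121 dies right after t=0.
for t in range(5, 200, 5):
    reg.put("10.0.134.116:8080", ttl=120, now=t)

print(reg.live(now=150))  # -> ['10.0.134.116:8080']
```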

@gfrankliu (Collaborator) commented:

Yes, upsync is exactly for your use case.

There are two ways to query etcd: dns or http. TTL is only used for dns queries. upsync uses http to query etcd, so the TTL doesn't matter. upsync watches etcd for changes in real time, so if you curl -X DELETE to update etcd, upsync will know immediately, unlike a DNS query where we have to wait for the TTL to expire.

The problem you have is that you are unnecessarily making frequent updates to etcd; this causes upsync to update its upstream server list frequently, and nginx has to restart load balancing from the first server.

@Valeriyy (Author) commented Jul 9, 2020

There are two ways to query etcd: dns or http. TTL is only used for dns queries. upsync uses http to query etcd, so the TTL doesn't matter. upsync watches etcd for changes in real time, so if you curl -X DELETE to update etcd, upsync will know immediately, unlike a DNS query where we have to wait for the TTL to expire.

I don't know what dns has to do with it. I'm simply executing curl http://127.0.0.1:2379/v2/keys/upsync/test -XPUT -d value="{\"weight\":14, \"max_fails\":0, \"fail_timeout\":10}" -d ttl=120 on each of the 60 servers. And if upsync made the request /v2/keys/upsync/test instead of /v2/keys/upsync/test?wait=true&recursive=true&waitIndex=487369756, I would not have any problems.

The problem you have is that you are unnecessarily making frequent updates to etcd; this causes upsync to update its upstream server list frequently, and nginx has to restart load balancing from the first server.

I understand. But how do I get rid of unavailable backends in the upstream? Do I need to develop a script that will monitor the backends and execute curl -X DELETE?

@gfrankliu (Collaborator) commented:

The ttl you set via curl is ignored, unless you use etcd dns.

Every time you execute that curl, the load-balancing count gets reset, so it starts from server 1 again. That's why you saw an uneven distribution. If you run the curl BEFORE you start the test, you shouldn't have a problem, unless something else is also updating etcd. Is your etcd dedicated to this test?

I already explained why waitIndex is used and how it can optimize the query.

There are two things you can do for the upstream:

  1. If your upstream is autoscaling, you can run the curl when it scales up or down. You don't need to keep running it in the middle.
  2. You can use the https://github.com/xiaokai-wang/nginx_upstream_check_module module so that nginx checks the health of the upstream servers and removes them automatically when unhealthy.

@Valeriyy (Author) commented:

The ttl you set via curl is ignored, unless you use etcd dns.

I'm not arguing with you in any way, I'm just sharing observations. There is also a feature in etcd called key TTL: https://etcd.io/docs/v2/api/#using-key-ttl
I PUT a key into etcd with a ttl, and after the specified ttl it deletes itself.

2. You can use the https://github.com/xiaokai-wang/nginx_upstream_check_module module so that nginx checks the health of the upstream servers and removes them automatically when unhealthy.

This is a very interesting and wonderful module, but it is updated very rarely compared with upsync_module, and there have been problems compiling it with the latest versions of nginx.
Because upsync_module works with waitIndex in etcd, I think I will use a script to transfer the upstreams JSON from the key with frequently updated data to the key that upsync_module watches. This should avoid 'index is outdated'.
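That transfer script might look like this in outline (illustrative only; `sync_if_changed` and the key name are hypothetical, and a real version would GET/PUT against etcd over HTTP): mirror the server list to the watched key only when the content actually changed, so the watched key sees no needless index churn from the 5-second heartbeats.

```python
# Sketch: copy the server list from a frequently-updated source to the key
# upsync watches, but only when the value actually differs. (In reality the
# mirror dict would be a PUT to /v2/keys/upsync/test via curl or requests.)
def sync_if_changed(source_value: str, mirror: dict, key: str) -> bool:
    """Return True if a write (and thus an etcd index bump) would happen."""
    if mirror.get(key) != source_value:
        mirror[key] = source_value
        return True
    return False

mirror = {}
servers = '{"10.0.134.116:8080": {"weight": 15, "max_fails": 0}}'
print(sync_if_changed(servers, mirror, "upsync/test"))  # first sync -> True
print(sync_if_changed(servers, mirror, "upsync/test"))  # unchanged -> False
```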

@Valeriyy (Author) commented Jul 20, 2020

Because upsync_module works with waitIndex in etcd, I think I will use a script to transfer the upstreams JSON from the key with frequently updated data to the key that upsync_module watches. This should avoid 'index is outdated'.

Unfortunately, this idea failed, because etcd maintains X-Etcd-Index for the entire instance. Tell me whether this behavior of upsync can somehow be disabled so that it does not use X-Etcd-Index. Some people may need waitIndex by default, but I want to opt out of it; is this possible?
And in general, it may make sense to look at the index of specific keys, not just at the global X-Etcd-Index.

@gfrankliu (Collaborator) commented Jul 21, 2020

Using the index is just an optimization. You don't need to "tell" upsync not to use it. If the index is not usable, as in your case, it automatically falls back to querying etcd without the index. You can ignore the warning about 'index is outdated'. upsync is still working and can detect when an upstream server is gone.

Since you are not using the index optimization, you will not detect a gone upstream server immediately. If you want the TTL to be 120 seconds, you should set upsync_interval=120s so that upsync only queries etcd every 120s.
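Concretely, that suggestion amounts to a one-parameter change in the upstream block quoted earlier in this thread (host and paths are from that config):

```nginx
upstream test {
    # upsync_interval raised from 5s to 120s to match the etcd key TTL
    upsync 127.0.0.1:2379/v2/keys/upsync/test upsync_interval=120s upsync_timeout=5m upsync_type=etcd strong_dependency=off;
    upsync_dump_path /etc/nginx/conf.d/upsync/test.inc;
}
```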

@Valeriyy (Author) commented:

I just can't confirm case #71 with etcd version 3 and the v2 format. Maybe @CallMeFoxie was using etcd version 2.2 at the time.

@gfrankliu (Collaborator) commented:

I only use consul. We need someone else who uses etcd to confirm.
My understanding is that etcd3 supports both the v2 and v3 APIs. upsync only supports the v2 API, so it should work with the etcd3 v2 API.

@Valeriyy (Author) commented:

I only use consul. We need someone else who uses etcd to confirm.

It is not so interesting when there is an etcd with the key TTL ability. Again, I would use nginx_upstream_check_module if it were updated frequently and compiled with the latest versions of nginx without any problems.

My understanding is that etcd3 supports both the v2 and v3 APIs. upsync only supports the v2 API, so it should work with the etcd3 v2 API.

Exactly, but I suspect that the problem is only reproducible with etcd version 2.2.

@gfrankliu (Collaborator) commented:

Now I am confused as to what exactly the issue is. Is it the subject, "If using weights, upsync does not properly distribute requests between a large number of backends"? If so, please set upsync_interval=120s to match your etcd TTL of 120s; that should resolve your issue.

@Valeriyy (Author) commented:

Now I am confused as what exactly is the issue.

Initially the problem was an uneven distribution of proxied requests between backends. But we have already established that this is due to frequent updates of X-Etcd-Index. Why? Because I add a key with ttl=n to etcd every 5 seconds. I want to make sure that unavailable or dismantled backends don't PUT upstreams to etcd. My script that regularly PUTs upstreams to etcd could be replaced with nginx_upstream_check_module, which would just quickly mark 'down=1' for unavailable backends. But as I wrote, that module is rarely updated and has compilation problems with the latest versions of nginx. The only solution to the upsync_module problem in my case is not to use X-Etcd-Index. Moreover, I have tested the case described in issue #71 and could not reproduce the problem that caused X-Etcd-Index to be used in upsync_module.
