
nginx-upsync-module-2.1.3 + Tengine 2.3.3: frequent core dumps in production #300

Open
yourchanges opened this issue Feb 17, 2022 · 1 comment

yourchanges commented Feb 17, 2022

Ⅰ. Issue Description

We recently introduced and enabled nginx-upsync-module-2.1.3 ( https://github.com/weibocom/nginx-upsync-module ). Since then, Tengine 2.3.3 has been core dumping frequently in production.

Ⅱ. Describe what happened

Tengine 2.3.3 core dumps frequently in the production environment.

Ⅲ. Describe what you expected to happen

Tengine runs normally, without crashing.

Ⅳ. How to reproduce it (as minimally and precisely as possible)

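This happens under normal production operation. Dozens of core dump files are produced every day, and every one I have inspected crashes at the same location.
Analysis of one core dump file: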

[root@saas1 coredump]# gdb  /usr/local/nginx/sbin/nginx core.599714 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7

Reading symbols from /usr/local/nginx/sbin/nginx...done.
BFD: Warning: /home/coredump/core.599714 is truncated: expected core file size >= 10044428288, found: 104857600.
[New LWP 599714]
Cannot access memory at address 0x7f0e36636128
Cannot access memory at address 0x7f0e36636120
Failed to read a valid object file image from memory.
Core was generated by `nginx: worker process                                          '.
Program terminated with signal 11, Segmentation fault.
#0  ngx_http_upstream_get_peer (rrp=0x29379a0) at src/http/ngx_http_upstream_round_robin.c:642
642	src/http/ngx_http_upstream_round_robin.c: No such file or directory.
(gdb) bt
#0  ngx_http_upstream_get_peer (rrp=0x29379a0) at src/http/ngx_http_upstream_round_robin.c:642
#1  ngx_http_upstream_get_round_robin_peer (pc=<error reading variable: Cannot access memory at address 0x7fff19ac2ea0>, 
    pc@entry=<error reading variable: Cannot access memory at address 0x7fff19ac2ef8>, data=0x29379a0) at src/http/ngx_http_upstream_round_robin.c:532
(gdb) 
(gdb) p rrp
$1 = (ngx_http_upstream_rr_peer_data_t *) 0x29379a0
(gdb) p rrp->peers
$2 = (ngx_http_upstream_rr_peers_t *) 0x38f4730
(gdb) p rrp->peers->peer
$3 = (ngx_http_upstream_rr_peer_t *) 0x2fb0ee0
(gdb) p rrp->peers->peer->next
$4 = (ngx_http_upstream_rr_peer_t *) 0x2fb0df0
(gdb) p rrp->peers->peer->next->next
$5 = (ngx_http_upstream_rr_peer_t *) 0x2fb0d00
(gdb) p rrp->peers->peer->next->next->next
$6 = (ngx_http_upstream_rr_peer_t *) 0x2fb0c10
(gdb) p rrp->peers->peer->next->next->next->next
$7 = (ngx_http_upstream_rr_peer_t *) 0x0
(gdb) 
(gdb) p rrp->peers->peer->down
$8 = 1
(gdb) p rrp->peers->peer->next->down
$9 = 0
(gdb) p rrp->peers->peer->next->next->down
$10 = 1
(gdb) p rrp->peers->peer->next->next->next->down
$11 = 0
(gdb) p rrp->peers->peer->next->next->next->next->down
Cannot access memory at address 0xb0
(gdb) 
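
The chain of `p rrp->peers->peer->next...` commands above amounts to walking a singly linked list until `next` is NULL. The final `Cannot access memory at address 0xb0` is what gdb reports when it reads `down` through that NULL fifth pointer, so by itself it only confirms that the list has four nodes. As a rough illustration, the Python sketch below rebuilds the four-node list from the printed addresses and `down` flags, then applies a simplified smooth weighted round-robin pick of the kind stock nginx uses in `ngx_http_upstream_get_peer`. The `Peer` class and the `build_list`, `walk`, and `pick` helpers are hypothetical stand-ins, the weights are assumed to be 1 because the session does not print them, and nothing here identifies what line 642 of this Tengine build actually dereferences.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Peer:
    """Hypothetical stand-in for ngx_http_upstream_rr_peer_t (only the fields used here)."""
    addr: int                      # address printed by gdb
    down: bool                     # `down` flag printed by gdb
    effective_weight: int = 1      # assumption: weights are not shown in the gdb session
    current_weight: int = 0
    next: Optional["Peer"] = None


def build_list(nodes) -> Optional[Peer]:
    """Link (addr, down) pairs into a singly linked list and return its head."""
    head = None
    for addr, down in reversed(nodes):
        head = Peer(addr=addr, down=bool(down), next=head)
    return head


def walk(head: Optional[Peer]) -> Iterator[Peer]:
    """Yield peers until `next` is None, so a NULL pointer is never dereferenced."""
    peer = head
    while peer is not None:
        yield peer
        peer = peer.next


def pick(head: Optional[Peer]) -> Optional[Peer]:
    """Simplified smooth weighted round-robin: skip down peers, take the largest current_weight."""
    best, total = None, 0
    for peer in walk(head):
        if peer.down:
            continue
        peer.current_weight += peer.effective_weight
        total += peer.effective_weight
        if best is None or peer.current_weight > best.current_weight:
            best = peer
    if best is not None:
        best.current_weight -= total
    return best


# Addresses and down flags copied from the gdb output ($3..$6 and $8..$11).
GDB_PEERS = [(0x2FB0EE0, 1), (0x2FB0DF0, 0), (0x2FB0D00, 1), (0x2FB0C10, 0)]

if __name__ == "__main__":
    head = build_list(GDB_PEERS)
    print("peers:", [hex(p.addr) for p in walk(head)])
    print("picks:", [hex(pick(head).addr) for _ in range(4)])
```

In this model only the two peers with `down = 0` are ever selected. The real function also consults a `tried` bitmap and per-peer failure counters, which are omitted here.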

We recently introduced and enabled nginx-upsync-module-2.1.3 ( https://github.com/weibocom/nginx-upsync-module ).
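
A side note on the gdb session: the BFD warning says the core file on disk is 104857600 bytes (exactly 100 MiB) while gdb expected at least 10044428288 bytes. That likely accounts for some of the `Cannot access memory` errors, and it suggests a core size limit is cutting the dump short; this is an assumption, since the issue does not show the limit settings. A small sketch that extracts the two sizes from the warning, with `parse_truncation` as a hypothetical helper name:

```python
import re

# Warning line copied from the gdb session in this issue.
BFD_WARNING = (
    "BFD: Warning: /home/coredump/core.599714 is truncated: "
    "expected core file size >= 10044428288, found: 104857600."
)


def parse_truncation(line):
    """Return (expected_bytes, found_bytes) from a BFD truncation warning, or None."""
    match = re.search(r"expected core file size >= (\d+), found: (\d+)", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    expected, found = parse_truncation(BFD_WARNING)
    print(f"core on disk: {found / 2**20:.0f} MiB")
    print(f"expected:     {expected / 2**30:.2f} GiB")
    print(f"captured:     {found / expected:.1%} of the full image")
```

A complete core file would let gdb read the frames and variables that currently show as `<error reading variable>`.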

Ⅵ. Environment:

  • Tengine version (use sbin/nginx -V):
/usr/local/nginx/sbin/nginx -V
Tengine version: Tengine/2.3.3
nginx version: nginx/1.18.0
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) 
built with OpenSSL 1.1.1m  14 Dec 2021
TLS SNI support enabled
configure arguments: --prefix=/usr/local/nginx --with-http_stub_status_module --with-http_gzip_static_module --with-http_ssl_module --with-http_v2_module --with-openssl=../openssl-1.1.1m --with-pcre=../pcre-8.43/ --with-zlib=../zlib-1.2.11 --with-http_lua_module --with-luajit-lib=/usr/local/lib/ --with-luajit-inc=/usr/local/include/luajit-2.1/ --with-lua-inc=/usr/local/include/luajit-2.1/ --with-lua-lib=/usr/local/lib/ --with-ld-opt=-Wl,-rpath, --add-module=modules/ngx_http_concat_module --add-module=modules/ngx_http_upstream_session_sticky_module --add-module=modules/ngx_http_reqstat_module --add-module=modules/ngx_http_upstream_check_module --add-module=modules/ngx_http_trim_filter_module --add-module=modules/ngx_http_footer_filter_module --add-module=modules/ngx_http_upstream_consistent_hash_module --add-module=modules/ngx_http_upstream_dynamic_module --add-module=modules/ngx_http_user_agent_module --add-module=modules/ngx_http_upstream_dyups_module --add-module=modules/ngx_http_upstream_vnswrr_module --add-module=../nginx-upsync-module-2.1.3
  • OS (e.g. from /etc/os-release): centos 7
  • Kernel (e.g. uname -a): CentOS Linux release 7.7.1908 (Core)
# uname -an
Linux saas1 3.10.0-1062.18.1.el7.x86_64 #1 SMP Tue Mar 17 23:49:17 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  • Others:

I have also filed an issue with the official Tengine project: alibaba/tengine#1609

@yourchanges (Author)

upstream conf:

upstream oamain {
    server 127.0.0.1:8080;
    upsync 127.0.0.1:8500/v1/kv/upstreams/oa/ upsync_timeout=6m upsync_interval=500ms upsync_type=consul strong_dependency=off;
    upsync_dump_path /usr/local/nginx/conf/proxy/oa.upstream;
    include /usr/local/nginx/conf/proxy/oa.upstream;  
}

# cat /usr/local/nginx/conf/proxy/oa.upstream 
server 192.168.254.54:80 weight=1 max_fails=2 fail_timeout=10s;
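
For anyone trying to reproduce this, the dump file line above can be mapped back to the Consul key that upsync polls. The sketch below assumes the layout described in the nginx-upsync-module README as I understand it: one key per backend under the configured prefix (`upstreams/oa/<ip>:<port>` here), with a JSON value carrying `weight`, `max_fails`, and `fail_timeout` in seconds. `server_line_to_consul` is a hypothetical helper, and the layout should be checked against the module version in use.

```python
import json
import re


def server_line_to_consul(line, prefix="upstreams/oa"):
    """Turn one 'server host:port key=value ...;' line into (kv_key, json_value).

    Assumption: values are integers, optionally followed by a unit suffix
    such as the 's' in fail_timeout=10s, which is treated as seconds.
    """
    tokens = line.strip().rstrip(";").split()
    if len(tokens) < 2 or tokens[0] != "server":
        raise ValueError(f"not a server line: {line!r}")

    backend, params = tokens[1], {}
    for token in tokens[2:]:
        key, sep, value = token.partition("=")
        if not sep or not value:
            raise ValueError(f"unsupported parameter: {token!r}")
        params[key] = int(re.sub(r"[a-z]+$", "", value))

    return f"{prefix}/{backend}", json.dumps(params, sort_keys=True)


if __name__ == "__main__":
    dump_line = "server 192.168.254.54:80 weight=1 max_fails=2 fail_timeout=10s;"
    key, value = server_line_to_consul(dump_line)
    # The pair could then be written with a PUT to /v1/kv/<key> on the Consul agent.
    print(key)
    print(value)
```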
