We are encountering a name resolution issue when adding a server. The nuraft::add_srv() call succeeds, but the subsequent internal asynchronous connection from the leader to the new (follower) server fails with a name resolution error (see the log excerpts below). As a result, the server is unable to join the cluster.
We are using NuRaft on Linux in blocking mode. From the leader we can successfully resolve the follower's host name using name resolution tools. Adding manual entries to /etc/hosts does not resolve the issue.
Are there any settings that allow the nuraft::add_srv() call to be more synchronous, so that the connection is created during the nuraft::add_srv() call itself?
Are there any settings to configure or use an alternate name resolution?
Any guidance or insight is appreciated,
Mark.
LOG EXCERPTS
05/25/2022 10:24:37.597 PID:409 TID:140421944866560 [process_req] Receive a add_server_request message from 0 with LastLogIndex=0, LastLogTerm=0, EntriesLength=1, CommitIndex=0 and Term=0(raft_server.cxx:628)
05/25/2022 10:24:37.605 PID:409 TID:140421944866560 [asio_rpc_client] asio client created: 0x7fb65004ce98(asio_service.cxx:860)
05/25/2022 10:24:37.614 PID:409 TID:140421944866560 [send_req] send req 1000 -> 1002, type join_cluster_request(peer.cxx:44)
05/25/2022 10:24:37.622 PID:409 TID:140421944866560 [send] socket 0x7fb65004ce98 to pemjm-2.policyjobmgr.nbux.svc.cluster.local:2634 is not opened yet(asio_service.cxx:946)
05/25/2022 10:24:37.630 PID:409 TID:140421944866560 [invite_srv_to_join_cluster] sent join request to peer 1002, pemjm-2.policyjobmgr.nbux.svc.cluster.local:2634(handle_join_leave.cxx:134)
05/25/2022 10:24:37.639 PID:409 TID:140421944866560 [process_req] Response back a add_server_response message to 1000 with Accepted=1, Term=1, NextIndex=3(raft_server.cxx:698)
05/25/2022 10:24:38.047 PID:409 TID:140421229422336 [handle_rpc_result] resp of req 1000 -> 1002, type join_cluster_request, failed to resolve host pemjm-2.policyjobmgr.nbux.svc.cluster.local due to error 1, Host not found (authoritative)(peer.cxx:107)
05/25/2022 10:24:38.648 PID:409 TID:140421229422336 [handle_ext_resp_err] receive an rpc error response from peer server, failed to resolve host pemjm-2.policyjobmgr.nbux.svc.cluster.local due to error 1, Host not found (authoritative) 12(raft_server.cxx:1408)
05/25/2022 10:24:38.714 PID:409 TID:140421229422336 [handle_ext_resp_err] retry the request(raft_server.cxx:1448)
05/25/2022 10:24:40.831 PID:409 TID:140421221029632 [on_retryable_req_err] retry the request join_cluster_request for 1002(raft_server.cxx:1460)
05/25/2022 10:24:40.865 PID:409 TID:140421221029632 [send_req] send req 1000 -> 1002, type join_cluster_request(peer.cxx:44)
05/25/2022 10:24:40.881 PID:409 TID:140421221029632 [send_req] rpc local is null(peer.cxx:53)
There is no way to make "add server" synchronous. Instead, it is possible to provide an API for attaching your own custom resolver to NuRaft, so that you can return the IP address for a given host name yourself. Does that make sense to you?
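To make the custom resolver idea concrete, here is a minimal sketch of what the resolving side might look like. It is not NuRaft's API: `resolve_host`, its `overrides` table, and the `numeric_only` flag are hypothetical names used for illustration only. The helper checks a caller-supplied host-to-IP table first and falls back to the POSIX `getaddrinfo()` call otherwise. How such a function would be attached to NuRaft depends on the hook that gets exposed.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <map>
#include <string>

// Hypothetical helper, NOT part of NuRaft: maps a host name to an IP string.
// 1. Look the host up in a caller-supplied override table.
// 2. Otherwise ask the system resolver through getaddrinfo().
// Returns an empty string when the host cannot be resolved.
std::string resolve_host(const std::string& host,
                         const std::map<std::string, std::string>& overrides,
                         bool numeric_only = false) {
    auto it = overrides.find(host);
    if (it != overrides.end()) {
        return it->second;  // static mapping wins over the system resolver
    }

    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // accept IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP, as used by the RPC layer
    if (numeric_only) {
        hints.ai_flags = AI_NUMERICHOST;  // accept IP literals only, no lookup
    }

    addrinfo* res = nullptr;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &res) != 0 || res == nullptr) {
        return "";
    }

    // Convert the first returned address to its textual form.
    char buf[INET6_ADDRSTRLEN] = {0};
    const void* addr = nullptr;
    if (res->ai_family == AF_INET) {
        addr = &reinterpret_cast<sockaddr_in*>(res->ai_addr)->sin_addr;
    } else if (res->ai_family == AF_INET6) {
        addr = &reinterpret_cast<sockaddr_in6*>(res->ai_addr)->sin6_addr;
    }

    std::string out;
    if (addr != nullptr && inet_ntop(res->ai_family, addr, buf, sizeof(buf)) != nullptr) {
        out = buf;
    }
    freeaddrinfo(res);
    return out;
}
```

With a table entry for pemjm-2.policyjobmgr.nbux.svc.cluster.local (pointing at whatever address the follower actually has), the lookup never reaches DNS. An empty return value signals that neither the table nor the system resolver knew the host.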