Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of error case during outbound connection step in nym_network_requester::core crashes process #4375

Open
numbleroot opened this issue Feb 7, 2024 · 1 comment
Assignees

Comments

@numbleroot
Copy link

numbleroot commented Feb 7, 2024

On Nym version nym-binaries-v2023.5-rolo, the top-level process of a gateway with an embedded network requester crashes if the outwards TCP connection upon request by a Nym client results in an error:
https://github.com/nymtech/nym/blob/00600ddeeba4ed453c27bd7ea3b144d3d8560141/service-providers/network-requester/src/core.rs#L460:L477

In my case this happens when the destination domain (e.g., analytics.faw.sky.com) doesn't resolve to an IP address (no A record can be found). However, instead of signalling the requesting Nym client via the mixnet and gracefully returning from the function, the entire gateway process is crashed. I believe this is due to a missing shutdown.mark_as_success();, but I might be wrong, of course.

Relevant log lines of the gateway with embedded network requester:

ERROR nym_network_requester::core          > error while connecting to analytics.faw.sky.com:443: failed to lookup address information: No address associated with hostname
DEBUG TaskClient-NetworkRequester-child    > the task client is getting dropped
TRACE TaskClient-NetworkRequester-child    > Notifying stop on unexpected drop

As a test, I added the following lines after line 474 in above referenced code part:

if err.to_string() == "failed to lookup address information: No address associated with hostname" {
    shutdown.mark_as_success();
}

which is of course overly specific to my error case, but does seem to at least prevent the gateway process from crashing. (shutdown needs to become mut in the function parameters).

I got this inspiration from another place in the core.rs source file, handling a similar situation:
https://github.com/nymtech/nym/blob/00600ddeeba4ed453c27bd7ea3b144d3d8560141/service-providers/network-requester/src/core.rs#L552:L568

In particular, line 566:

shutdown.mark_as_success();

After adding this line, the relevant log parts of the gateway with embedded network requester change to:

ERROR nym_network_requester::core          > error while connecting to analytics.faw.sky.com:443: failed to lookup address information: No address associated with hostname
TRACE TaskClient-NetworkRequester-child    > the task client is getting dropped (but instructed to not signal)

Could it be the case that the effort to gracefully shut down or drop tasks hadn't reached this part of the code just yet and marking shutdown as success is all that is needed here to fix it? Or are more substantial changes needed here?

@numbleroot numbleroot changed the title Handling of error case during outbound connection step in nym_network_requester::core crashed process Handling of error case during outbound connection step in nym_network_requester::core crashes process Feb 7, 2024
@octol octol self-assigned this Feb 7, 2024
@octol
Copy link
Contributor

octol commented Feb 7, 2024

Yeah we had a few of these bugs with the graceful shutdown system, I'll have a look!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants