When RDMAClient::Connect() calls fi_connect and fi_eq_sread, it takes about 18.5s before RDMAServer::GetEvent(VineyardEventEntry& vineyard_entry) receives the message. Why? #2023

Open
hsh258 opened this issue Nov 21, 2024 · 2 comments
hsh258 commented Nov 21, 2024

Describe your problem

When RDMAClient::Connect() calls fi_connect and then fi_eq_sread, about 18.5s pass before RDMAServer::GetEvent(VineyardEventEntry& vineyard_entry) receives the event. That is far too long.
Also, how can I set the client's reconnect interval? Thanks.

Status RDMAClient::Connect() {
  CHECK_ERROR(!fi_connect(ep, fi->dest_addr, NULL, 0), "fi_connect failed.");
  fi_eq_cm_entry entry;
  uint32_t event;
  CHECK_ERROR(
      fi_eq_sread(eq, &event, &entry, sizeof(entry), -1, 0) == sizeof(entry),
      "fi_eq_sread failed.");
  if (event != FI_CONNECTED || entry.fid != &ep->fid) {
    return Status::Invalid("Unexpected event:" + std::to_string(event));
  }
  return Status::OK();
}
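To narrow down where the 18.5s goes, it may help to time fi_connect and fi_eq_sread separately on the client. Note that Connect() passes a timeout of -1 to fi_eq_sread, which in libfabric means "wait indefinitely", so the client simply blocks until an event shows up. The sketch below is only a generic timing wrapper; `ElapsedMs` is a hypothetical helper name and is not part of vineyard or libfabric.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not a vineyard or libfabric API): runs `call` once and
// returns how long it blocked, in whole milliseconds, using a monotonic clock.
inline long long ElapsedMs(const std::function<void()>& call) {
  assert(static_cast<bool>(call));  // an empty std::function would throw
  const auto start = std::chrono::steady_clock::now();
  call();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::milliseconds>(end - start)
      .count();
}

// Assumed usage inside a copy of RDMAClient::Connect(), for diagnosis only:
//   long long connect_ms = ElapsedMs([&] { /* the fi_connect call */ });
//   long long event_ms   = ElapsedMs([&] { /* the fi_eq_sread call */ });
// Logging both values would show which of the two calls accounts for the delay.
```

If almost all of the time is spent in fi_eq_sread, the delay is in connection establishment (provider or network side) rather than in the vineyard code shown here.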

Status RDMAServer::GetEvent(VineyardEventEntry& vineyard_entry) {
  struct fi_eq_cm_entry entry;
  uint32_t event;
  while (true) {
    int rd = fi_eq_sread(eq, &event, &entry, sizeof entry, 500, 0);
    if (rd < 0 && (rd != -FI_ETIMEDOUT && rd != -FI_EAGAIN)) {
      return Status::IOError("fi_eq_sread broken. ret:" + std::to_string(rd));
    }
    if (rd == -FI_ETIMEDOUT || rd == -FI_EAGAIN) {
      if (state == STOPED) {
        return Status::Invalid("Server is stoped.");
      }
      continue;
    }
    if (event == FI_SHUTDOWN) {
      fid_ep* closed_ep = container_of(entry.fid, fid_ep, fid);
      RemoveClient(closed_ep);
      continue;
    }
    vineyard_entry.fi = entry.info;
    vineyard_entry.event_id = event;
    vineyard_entry.fid = entry.fid;
    return Status::OK();
  }
}
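On the reconnect-interval question: the code quoted above does not show any retry loop or interval setting, so whether vineyard exposes one is not clear from this issue. If the caller drives reconnection itself, a fixed-interval retry could be sketched as follows. `RetryWithInterval` and its parameters are hypothetical names for illustration, not a vineyard API, and the approach only works if each attempt returns in bounded time (the -1 timeout in Connect() would have to become a finite value first).

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not a vineyard API): calls `attempt` until it returns
// true or `max_attempts` is reached, sleeping `interval` between failures.
// Returns the number of attempts used on success, or 0 if every attempt failed.
inline int RetryWithInterval(const std::function<bool()>& attempt,
                             int max_attempts,
                             std::chrono::milliseconds interval) {
  assert(max_attempts > 0);
  for (int i = 1; i <= max_attempts; ++i) {
    if (attempt()) {
      return i;
    }
    if (i < max_attempts) {
      std::this_thread::sleep_for(interval);  // the "reconnect interval"
    }
  }
  return 0;
}

// Assumed usage, with `client` being an RDMAClient-like object:
//   int used = RetryWithInterval([&] { return client.Connect().ok(); },
//                                5, std::chrono::milliseconds(200));
```

A fixed interval keeps the behaviour easy to reason about; exponential backoff would be a natural variation if many clients reconnect at once.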

dashanji (Member) commented Dec 4, 2024

/cc @vegetableysm

github-actions bot added the stale label Dec 12, 2024

github-actions bot (Contributor) commented

/cc @sighingnow, this issue/PR has had no activity for a long time; please help review its status and assign people to work on it.
