Worker Error and Training Failure on Running Script #1

Open
zxkyjimmy opened this issue Feb 13, 2024 · 1 comment

@zxkyjimmy

Hi, I attempted to run your script but encountered an unexpected error. The attached screenshot of the terminal output shows that the training process failed to complete.

[Screenshot 2024-02-13 at 19 33 01: terminal output of the failed training run]

Below are the commands that led to the error:

scripts/start-container.sh
tools/quick-run.sh train go az 300 -n go_9x9_az_n200 -conf_str env_board_size=9:actor_num_simulation=200

Please help me understand what went wrong, or provide any guidance on resolving this issue.

Thank you so much for your help.

@moporgic
Contributor

Hi, thanks for your report.

This is a known issue in the zero server: it treats any unexpected incoming message as a worker error.
Because the zero server opens a public port for worker connections, any client (including unwanted ones) can connect to the server and send it a message.

In your case, the unexpected message "_\sFjiS$NqFY:A'*+/." was sent to the server and triggered a false alarm.
It was most likely sent by security software that happened to be scanning for network vulnerabilities while the training was running.

This issue will be fixed in a future release. For now, a workaround is to remove the following lines:

std::string error_message = message;
std::replace(error_message.begin(), error_message.end(), '\r', ' ');
std::replace(error_message.begin(), error_message.end(), '\n', ' ');
shared_data_.logger_.addWorkerLog("[Worker Error] \"" + error_message + "\"");
close();
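Removing these lines stops the server from logging and closing the connection on any unexpected message. A more targeted approach would recognize legitimate worker messages and simply log and ignore everything else. The sketch below illustrates that idea under stated assumptions; it is not the upcoming official fix. The command keywords in `kKnownCommands`, the `Action` enum, and the functions `isKnownWorkerMessage`, `sanitizeForLog`, and `onMessage` are all hypothetical and do not reflect minizero's actual protocol or code.

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// HYPOTHETICAL command keywords a legitimate worker might send.
// Placeholders for illustration only, NOT minizero's actual protocol.
const std::vector<std::string> kKnownCommands = {"Info", "SelfPlay", "Optimization"};

// Returns true if the message starts with one of the known keywords.
bool isKnownWorkerMessage(const std::string& message)
{
    return std::any_of(kKnownCommands.begin(), kKnownCommands.end(),
                       [&](const std::string& cmd) { return message.rfind(cmd, 0) == 0; });
}

// Replaces CR/LF with spaces so the message fits on one log line,
// mirroring the std::replace calls in the snippet above.
std::string sanitizeForLog(std::string message)
{
    std::replace(message.begin(), message.end(), '\r', ' ');
    std::replace(message.begin(), message.end(), '\n', ' ');
    return message;
}

enum class Action { kHandle, kIgnore };

// Hypothetical handler: known commands are handled; anything else
// (e.g. a security scanner's probe) is logged and ignored instead of
// closing the connection and failing the training run.
Action onMessage(const std::string& message)
{
    if (isKnownWorkerMessage(message)) { return Action::kHandle; }
    std::cerr << "[Ignored unknown message] \"" << sanitizeForLog(message) << "\"\n";
    return Action::kIgnore;
}
```

For example, `onMessage("_\\sFjiS$NqFY:A'*+/.")` would print a one-line notice and return `Action::kIgnore`, while a message beginning with one of the placeholder keywords would return `Action::kHandle`. The trade-off is that the server must know its own message vocabulary up front; the blunt workaround of deleting the lines needs no such list.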

P.S. You can rerun the same command and follow the displayed instructions to resume training from the point of interruption.

Hope you enjoy using minizero.
Have a nice day :)
