-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
setting error exit code when exec* errors #65
Conversation
otherwise when exec* would fail, its the error is indicated by errno however no appropriate exit code would be set and the overall subprocess would be marked as successful with exit code of 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
execve
and execv
return only when there's an error, so we don't need to check the return value and the abort()
is unreachable now, right? From exec(3)
:
The exec() functions return only if an error has occurred. The return value is -1, and errno is set to indicate the error.
That is, I believe b2a881d behaves identically to:
static void
subproc_do_exec(subprocess_t *ctx)
{
if (ctx->envp) {
execve(ctx->cmd, ctx->argv, ctx->envp);
}
else {
execv(ctx->cmd, ctx->argv);
}
// Insert some comment here.
fprintf(stderr, "%s: %s\n", ctx->cmd, strerror(errno));
exit(1);
}
So if the behavior is correct, let's write it closer to that?
But was there a particular reason that we previously had abort()
? I see that we have:
Line 589 in b2a881d
signal(SIGABRT, SIG_DFL); |
The default behavior of abort()
is to not perform any cleanup, such as closing open files, flushing buffers, or calling functions registered with atexit
. exit
does those things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since execv
and execve
don't return on success, you could just do:
if (ctx->envp) {
execve(ctx->cmd, ctx->argv, ctx->envp);
} else {
execv(ctx->cmd, ctx->argv);
}
fprintf(stderr, "%s: %s\n",ctx->cmd, strerror(errno));
exit(1);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But also, are you sure that abort()
is resulting in an exit code of 0? I'm naively assuming abort()
is a nim internal function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. found out about this on latest chalk when looking into docker error logs. when docker
is not in PATH
exec*
fails and abort would get marked by switchboard with exit code of 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
directly on dev
branch:
➜ C_INCLUDE_PATH=$PWD/c inim
👑 INim 0.6.1
Nim Compiler Version 2.0.0 [Linux: amd64] at /root/.nimble/bin/nim
nim> import subproc
nim> echo(runCmdGetEverything("docker", @["version"], passthrough = true).getExit())
Client:
Version: 24.0.7
API version: 1.43
Go version: go1.21.3
Git commit: afdd53b4e3
Built: Sun Oct 29 15:42:02 2023
OS/Arch: linux/amd64
Context: default
Server:
Engine:
Version: 24.0.7
API version: 1.43 (minimum version 1.12)
Go version: go1.21.3
Git commit: 311b9ff0aa
Built: Sun Oct 29 15:42:02 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: v1.7.12
GitCommit: 71909c1814c544ac47ab91d2e8b84718e517bb99.m
runc:
Version: 1.1.11
GitCommit:
docker-init:
Version: 0.19.0
GitCommit: de40ad0
0
nim> echo(runCmdGetEverything("foo", @[], passthrough = true).getExit())
Traceback (most recent call last)
/tmp/inim_1711112083.nim(10) inim_1711112083
/root/Code/co/nimutils/nimutils/subproc.nim(475) runCmdGetEverything
/root/Code/co/nimutils/nimutils/subproc.nim(427) runCommand
SIGABRT: Abnormal termination.
0
you can see there is an abort but exit code is 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm naively assuming
abort()
is a nim internal function?
It's the libc abort()
function, right?.
See abort(3)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@viega was there a reason for using abort()
vs exit()
? not sure if there is a way to specify exit code while using abort
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The musl abort
implemention in its src/exit/abort.c
:
#include <stdlib.h>
#include <signal.h>
#include "syscall.h"
#include "pthread_impl.h"
#include "atomic.h"
#include "lock.h"
#include "ksigaction.h"
_Noreturn void abort(void)
{
raise(SIGABRT);
/* If there was a SIGABRT handler installed and it returned, or if
* SIGABRT was blocked or ignored, take an AS-safe lock to prevent
* sigaction from installing a new SIGABRT handler, uninstall any
* handler that may be present, and re-raise the signal to generate
* the default action of abnormal termination. */
__block_all_sigs(0);
LOCK(__abort_lock);
__syscall(SYS_rt_sigaction, SIGABRT,
&(struct k_sigaction){.handler = SIG_DFL}, 0, _NSIG/8);
__syscall(SYS_tkill, __pthread_self()->tid, SIGABRT);
__syscall(SYS_rt_sigprocmask, SIG_UNBLOCK,
&(long[_NSIG/(8*sizeof(long))]){1UL<<(SIGABRT-1)}, 0, _NSIG/8);
/* Beyond this point should be unreachable. */
a_crash();
raise(SIGKILL);
_Exit(127);
}
That's the implementation on musl master
, though. I didn't look yet at what version we end up using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think every version of abort()
I can think of returns an exit code of 127. The issue would probably be in the subprocess monitoring; I think because of the signal, it's learning the process is gone before there is an exit code to reap.
The reason the patch 'works' is because there's no SIGABRT getting passed to the process group leader.
So I'm fine with this as a tactical thing, because doing better in the signal handler is a lot more work, and not a huge priority.
But, it's still the part that is technically not right, so this isn't a general-purpose fix, just a fix the way we call docker.
More specifically, I'm sure I'm using |
otherwise when exec* would fail, its error is indicated by errno however no appropriate exit code would be set and the overall subprocess would be marked as successful with exit code of 0