Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

setting error exit code when exec* errors #65

Merged
merged 3 commits into from
Mar 25, 2024
Merged

setting error exit code when exec* errors #65

merged 3 commits into from
Mar 25, 2024

Conversation

miki725
Copy link
Contributor

@miki725 miki725 commented Mar 21, 2024

otherwise when exec* would fail, its error is indicated by errno however no appropriate exit code would be set and the overall subprocess would be marked as successful with exit code of 0

otherwise when exec* would fail, its the error is indicated by errno
however no appropriate exit code would be set and the overall subprocess
would be marked as successful with exit code of 0
Copy link

@ee7 ee7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

execve and execv return only when there's an error, so we don't need to check the return value and the abort() is unreachable now, right? From exec(3):

The exec() functions return only if an error has occurred. The return value is -1, and errno is set to indicate the error.

That is, I believe b2a881d behaves identically to:

static void
subproc_do_exec(subprocess_t *ctx)
{
    if (ctx->envp) {
        execve(ctx->cmd, ctx->argv, ctx->envp);
    }
    else {
        execv(ctx->cmd, ctx->argv);
    }
    // Insert some comment here.
    fprintf(stderr, "%s: %s\n", ctx->cmd, strerror(errno));
    exit(1);
}

So if the behavior is correct, let's write it closer to that?

But was there a particular reason that we previously had abort()? I see that we have:

signal(SIGABRT, SIG_DFL);

The default behavior of abort() is to not perform any cleanup, such as closing open files, flushing buffers, or calling functions registered with atexit. exit does those things.

Copy link

@drraid drraid Mar 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since execv and execve don't return on success, you could just do:

    if (ctx->envp) {
      execve(ctx->cmd, ctx->argv, ctx->envp);
    } else {
      execv(ctx->cmd, ctx->argv);
    }
    fprintf(stderr, "%s: %s\n",ctx->cmd, strerror(errno));
    exit(1);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But also, are you sure that abort() is resulting in an exit code of 0? I'm naively assuming abort() is a nim internal function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. found out about this on latest chalk when looking into docker error logs. when docker is not in PATH exec* fails and abort would get marked by switchboard with exit code of 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

directly on dev branch:

➜ C_INCLUDE_PATH=$PWD/c inim
👑 INim 0.6.1
Nim Compiler Version 2.0.0 [Linux: amd64] at /root/.nimble/bin/nim
nim> import subproc
nim> echo(runCmdGetEverything("docker", @["version"], passthrough = true).getExit())
Client:
 Version:           24.0.7
 API version:       1.43
 Go version:        go1.21.3
 Git commit:        afdd53b4e3
 Built:             Sun Oct 29 15:42:02 2023
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          24.0.7
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.21.3
  Git commit:       311b9ff0aa
  Built:            Sun Oct 29 15:42:02 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.7.12
  GitCommit:        71909c1814c544ac47ab91d2e8b84718e517bb99.m
 runc:
  Version:          1.1.11
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
0
nim> echo(runCmdGetEverything("foo", @[], passthrough = true).getExit())
Traceback (most recent call last)
/tmp/inim_1711112083.nim(10) inim_1711112083
/root/Code/co/nimutils/nimutils/subproc.nim(475) runCmdGetEverything
/root/Code/co/nimutils/nimutils/subproc.nim(427) runCommand
SIGABRT: Abnormal termination.
0

you can see there is an abort but exit code is 0

Copy link

@ee7 ee7 Mar 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm naively assuming abort() is a nim internal function?

It's the libc abort() function, right?.

See abort(3).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viega was there a reason for using abort() vs exit()? not sure if there is a way to specify exit code while using abort

Copy link

@ee7 ee7 Mar 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The musl abort implemention in its src/exit/abort.c:

#include <stdlib.h>
#include <signal.h>
#include "syscall.h"
#include "pthread_impl.h"
#include "atomic.h"
#include "lock.h"
#include "ksigaction.h"

_Noreturn void abort(void)
{
	raise(SIGABRT);

	/* If there was a SIGABRT handler installed and it returned, or if
	 * SIGABRT was blocked or ignored, take an AS-safe lock to prevent
	 * sigaction from installing a new SIGABRT handler, uninstall any
	 * handler that may be present, and re-raise the signal to generate
	 * the default action of abnormal termination. */
	__block_all_sigs(0);
	LOCK(__abort_lock);
	__syscall(SYS_rt_sigaction, SIGABRT,
		&(struct k_sigaction){.handler = SIG_DFL}, 0, _NSIG/8);
	__syscall(SYS_tkill, __pthread_self()->tid, SIGABRT);
	__syscall(SYS_rt_sigprocmask, SIG_UNBLOCK,
		&(long[_NSIG/(8*sizeof(long))]){1UL<<(SIGABRT-1)}, 0, _NSIG/8);

	/* Beyond this point should be unreachable. */
	a_crash();
	raise(SIGKILL);
	_Exit(127);
}

That's the implementation on musl master, though. I didn't look yet at what version we end up using.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think every version of abort() I can think of returns an exit code of 127. The issue would probably be in the subprocess monitoring; I think because of the signal, it's learning the process is gone before there is an exit code to reap.

The reason the patch 'works' is because there's no SIGABRT getting passed to the process group leader.
So I'm fine with this as a tactical thing, because doing better in the signal handler is a lot more work, and not a huge priority.

But, it's still the part that is technically not right, so this isn't a general-purpose fix, just a fix the way we call docker.

@viega
Copy link
Contributor

viega commented Mar 22, 2024

More specifically, I'm sure I'm using waitpid() with a timeout, and it's coming back immediately when the signal is raised, which would most likely be delivered synchronously with the signal processing. Meanwhile, the program will definitely catch the signal even if in libc, but get to run some code to set the return value. But we're getting notified before that's available anyway.

@miki725 miki725 merged commit 7f2baf3 into dev Mar 25, 2024
2 checks passed
@miki725 miki725 deleted the ms/exit branch March 25, 2024 17:00
miki725 added a commit to crashappsec/n00b that referenced this pull request Nov 1, 2024
miki725 added a commit to crashappsec/n00b that referenced this pull request Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants