-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
atopacctd: don't hang when running in a linux container (it can't succeed though) #30
Open
sourcejedi
wants to merge
6
commits into
Atoptool:master
Choose a base branch
from
sourcejedi:atopacct2
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
I've seen my machine fail fork() sometimes :). I don't need this, it just makes me more comfortable working on this code.
The parent process waited to be killed. The problem is that unexpected termination of the child will cause the parent to hang. This fixes a hang when running atopacctd in a systemd-nspawn container. It now exits immediately with an error. The problem was that netlink_open() calls exit() on failure, without killing the parent process. We also now avoid any possibility of killing an unexpected process, in case the *parent* process is terminated unexpectedly and its PID is re-used. These are two examples of why message-passing with a pipe is preferable to working with kill(). A systemd-specific comment is modified. AFAICT, a service readiness protocol is always necessary when you have another service depending on you (and you do not use socket activation). You might think atopacctd would be a candidate for such... In which case, systemd checking for a pidfile ASAP just provides a sanity check that was missing from historical implementations, highlighting that many daemons were not strictly implementing a service readiness protocol, and hence many boot "sequences" were actually a pile of race conditions. Official systemd propaganda for this is currently available[1]; it explicitly suggests the pipe approach. [1] https://www.freedesktop.org/software/systemd/man/daemon.html I'm not really clear from my reading, as to what extent this applied to atopacctd. For all I know, atop doesn't mind if atopacctd is started after it and atop manages to connect immediately when atopacctd is ready. However IMO this commit is a well-defined approach, and it solves a genuine annoyance.
I tried running atopacctd in my systemd-nspawn container. Let's provide a textual error description, which is easier to look up (google). Before: receive NETLINK family, errno -2 After: get NETLINK family for taskstats: No such file or directory Also clarifies where the failure is: we do have NETLINK, but we don't have the NETLINK family for taskstats. Google won't find this commit message. But for reference, the kernel code which generates this error in ctrl_getfamily() is: if (!res->netnsok && !net_eq(genl_info_net(info), &init_net)) { /* family doesn't exist here */ return -ENOENT; } Amusingly, it is indeed permitted to send TASKSTATS_CMD_ATTR_REGISTER_CPUMASK inside a container which shares the init netns (which of course does not share the pid namespace). Dunno if this would considered an info leak, but at least acct() fails in this environment.
Imaging installing atop in a systemd-nspawn container, enabling atopacctd, and rebooting. atopacctd will fail to start in the container. However, if you try to run it manually, in order to see the error message or to troubleshoot it with `strace`, it will then start failing with a *different* error. /var/run/pacct_shadow.d: File exists This is an unfortunate obstacle to troubleshooting. Let's fix it... Currently we don't have code to clean `pacct_shadow.d/` in the event of atopacctd crashing. And the failure during startup behaves similar to a crash. This commit provides a workaround. If `pacct_shadow.d` already exists, then ignore the mkdir() failure. If it's not a directory like we need, we'll notice a failure later on anyway.
Let's skip on containers here, at least for now. Currently, Linux Containers don't support process accounting (and failed services are painful for Debian-style packages which start the service at install time). Quietly ignoring a service like this, is a big part of how systemd manages to run standard distributions inside a container (e.g. skipping systemd-udev). (The earlier approach with LXC was for the container to configure^Wpatch the distribution into submission... it makes me not inclined to feel very guilty for not having an idea for the _sysvinit_ script). I'm not sure how useful atop is in containers in practice... however it can be used as a cheaper way to inspect atop scripts etc. used by a specific distribution, or developing configuration management to install atop. I believe there's a non-zero population who have used systemd-nspawn as "chroot on steroids."
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
(Summary of commit messages)
This fixes a hang when running atopacctd in a systemd-nspawn container.
It now exits immediately with an error.
The error message is improved.
This is not enough to avoid unpleasantness with the Debian package, since it tries to starts atopacctd immediately (hence apt fails, leaving the package marked as "half-configured"). I'm not 100% sure what more to do, because Linux containers might implement some process accounting scheme later... but for now I've added
ConditionVirtualization=!container
toatopacct.service
.