Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runInLinuxVM: fix for structured attrs #354535

Merged
merged 1 commit into from
Nov 23, 2024

Conversation

Ma27
Copy link
Member

@Ma27 Ma27 commented Nov 8, 2024

Closes #334705
Addresses #205690

The main issue was that the output variable (i.e. $out and friends) didn't exist. I figured the easiest way to add those is to source stdenv here. Given that we build another derivation in this builder, it's pretty likely that stdenv gets pulled already, so I don't expect a real overhead here.

Also, this mounts /build into the VM: this is required to make sure .attrs.json & .attrs.sh are available. Dropped the mount of xchg into /tmp now since it's also part of /build.

cc @ShamrockLee

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

Copy link
Contributor

@ShamrockLee ShamrockLee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for working on this problem!

pkgs/build-support/vm/default.nix Outdated Show resolved Hide resolved
pkgs/build-support/vm/default.nix Outdated Show resolved Hide resolved
@Ma27 Ma27 force-pushed the runInLinuxVM-structured-attrs branch from d633c90 to 487dd26 Compare November 8, 2024 18:03
@Ma27
Copy link
Member Author

Ma27 commented Nov 8, 2024

Thanks for the review, squashed the changes into the previous commit.

Copy link
Contributor

@ShamrockLee ShamrockLee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested with hello made structured built with runInLinuxVM. LGTM!

nix-build --no-out-link --expr "with (import ./. { }); vmTools.runInLinuxVM (hello.overrideAttrs { __structuredAttrs = true; })"

Adding a test under pkgs.tests would be great, but that's not a blocker.

I'm unfamiliar with the implementation of vm.nix-related build helpers. Having another pair of eyes from someone familiar with this topic would be safer.

@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Nov 8, 2024
@wegank wegank added the 12.approvals: 1 This PR was reviewed and approved by one reputable person label Nov 8, 2024
pkgs/build-support/vm/default.nix Outdated Show resolved Hide resolved
pkgs/build-support/vm/default.nix Show resolved Hide resolved
pkgs/build-support/vm/default.nix Outdated Show resolved Hide resolved
@wolfgangwalther
Copy link
Contributor

Because my comments are naturally not on the right lines, when they didn't change, here's what I tested with:

diff --git a/pkgs/build-support/vm/default.nix b/pkgs/build-support/vm/default.nix
index dc5c29fcb733..147a030ac139 100644
--- a/pkgs/build-support/vm/default.nix
+++ b/pkgs/build-support/vm/default.nix
@@ -169,6 +169,7 @@ rec {
 
   stage2Init = writeScript "vm-run-stage2" ''
     #! ${bash}/bin/sh
+    set -euo pipefail
     source /build/xchg/saved-env
     if [ -f "''${NIX_ATTRS_SH_FILE-}" ]; then
       source "$NIX_ATTRS_SH_FILE"
@@ -183,7 +184,6 @@ rec {
     export NIX_BUILD_TOP=/tmp
     export TMPDIR=/tmp
     export PATH=/empty
-    out="$1"
     cd "$NIX_BUILD_TOP"
 
     if ! test -e /bin/sh; then
@@ -240,10 +240,6 @@ rec {
 
 
   vmRunCommand = qemuCommand: writeText "vm-run" ''
-    if [ -f "''${NIX_ATTRS_SH_FILE-}" ]; then
-      source "$NIX_ATTRS_SH_FILE"
-    fi
-    source $stdenv/setup
     export > saved-env
 
     PATH=${coreutils}/bin
@@ -271,8 +267,6 @@ rec {
     ${qemuCommand}
     EOF
 
-    mkdir -p -m 0700 $out
-
     chmod +x ./run-vm
     source ./run-vm

Removing the mkdir line is, what I was not sure about. Not sure about why it is needed in the first place, that is.

@wolfgangwalther
Copy link
Contributor

The mkdir seems to have been added in aa5646f.

@wolfgangwalther
Copy link
Contributor

The mkdir seems to have been added in aa5646f.

I tried to build runInLinuxVM (samba), but it doesn't even build for me.

Having a fix for a very specific package in that place without a comment explaining why... seems entirely wrong. I think we should just remove the mkdir.

@wegank wegank removed the 12.approvals: 1 This PR was reviewed and approved by one reputable person label Nov 17, 2024
Closes NixOS#334705
Addresses NixOS#205690

The main issue was that the output variable (i.e. `$out` and friends)
didn't exist. I figured the easiest way to add those is to source
`stdenv` here. Given that we build another derivation in this builder,
it's pretty likely that `stdenv` gets pulled already, so I don't expect
a real overhead here.

Also, this mounts `/build` into the VM: this is required to make sure
`.attrs.json` & `.attrs.sh` are available. Dropped the mount of `xchg`
into `/tmp` now since it's also part of `/build`.
Copy link
Contributor

@wolfgangwalther wolfgangwalther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't say anything about the temporary directory -> build directory part - I don't know enough about this code to understand the implications of that beyond the fact that it seems to fix things here ;)

Everything else LGTM. I didn't build again, just read through it.

@wegank wegank added the 12.approvals: 1 This PR was reviewed and approved by one reputable person label Nov 23, 2024
@Ma27 Ma27 merged commit 6deea16 into NixOS:master Nov 23, 2024
17 of 18 checks passed
@Ma27 Ma27 deleted the runInLinuxVM-structured-attrs branch November 23, 2024 08:35
@ShamrockLee
Copy link
Contributor

appimage.tests.image-hello-cowsay breaks after this commit, as $out is no longer occupied by runInLinuxVM during preVM, and the rm "$out" line fails.

It's a good change since $out is no longer arbitrarily occupied. Still, it needs a release note entry because of its backward incompatibility.

I should have tested apptainer.tests.image-hello-cowsay before approving this PR. That's my fault.

@ShamrockLee
Copy link
Contributor

In addition, the disk creation functionality with preVM seems to break when __structuredAttrs = true.

@wolfgangwalther
Copy link
Contributor

appimage.tests.image-hello-cowsay breaks after this commit, as $out is no longer occupied by runInLinuxVM during preVM, and the rm "$out" line fails.

I am seeing this failure when building apptainer.tests.image-hello-cowsay:

mounting Nix store...
mounting host's build directory...
starting stage 2 (/nix/store/wmp357hsbipw1hv71q39l24aj50llvnj-vm-run-stage2)
rmdir: failed to remove '/nix/store/7nwhbfr8kvc6rvhbq0xzx2xnz4d9przd-apptainer-image-hello-cowsay.sif': No such file or directory

This doesn't look like "$out is empty".

@ShamrockLee
Copy link
Contributor

ShamrockLee commented Nov 24, 2024

This doesn't look like "$out is empty".

It's not "$out is empty", but "$out does not exist". (Here, "$out" is /nix/store/7nwhbfr8kvc6rvhbq0xzx2xnz4d9przd-apptainer-image-hello-cowsay.sif)

@wolfgangwalther
Copy link
Contributor

Ah ok, misunderstood. So it's this removed line that broke it?

mkdir -p -m 0700 $out

@ShamrockLee
Copy link
Contributor

ShamrockLee commented Nov 24, 2024

Ah ok, misunderstood. So it's this removed line that broke it?

mkdir -p -m 0700 $out

I think so.

It's a good change, as most people would rm $out when using runInLinuxVM, but we must address the backward incompatibility.

@wolfgangwalther
Copy link
Contributor

and the rm "$out" line fails.

Grepping for rm "$out" didn't lead me anywhere until I realized that I needed to be looking for rmdir "$out" instead.

It's a good change, as most people would rm $out when using runInLinuxVM, but we must address the backward incompatibility.

I can only find two uses of runInLinuxVM in nixpkgs. Your's is easily fixed by removing the rmdir. Still have to understand how it's used for docker.

Is it used more often outside of nixpkgs?

@ShamrockLee
Copy link
Contributor

ShamrockLee commented Nov 24, 2024

Is it used more often outside of nixpkgs?

A search across GitHub could inform us about that.

Do you know how to exclude forked repositories from the GitHub code search result?

@wolfgangwalther
Copy link
Contributor

In addition, the disk creation functionality with preVM seems to break when __structuredAttrs = true.

I think that's because of this:

  createEmptyImage = {
    # Disk image size in MiB
    size,
    # Name that will be written to ${destination}/nix-support/full-name
    fullName,
    # Where to write the image files, defaulting to $out
    destination ? "$out"
  }: ''
    mkdir -p ${destination}
    diskImage=${destination}/disk-image.qcow2
    ${qemu}/bin/qemu-img create -f qcow2 $diskImage "${toString size}M"

    mkdir ${destination}/nix-support
    echo "${fullName}" > ${destination}/nix-support/full-name
  '';

This needs $out to be defined.

With my proposal in #354535 (comment) we removed the mkdir line, but also the loading of attrs at the beginning of vmRunCommand, assuming that $out was not needed anymore. We failed to see that eval "$preVM" could do anything, including depending on $out.

So I guess we should add those lines back:

    vmRunCommand = qemuCommand: writeText "vm-run" ''
+    if [ -f "''${NIX_ATTRS_SH_FILE-}" ]; then
+      source "$NIX_ATTRS_SH_FILE"
+    fi
+    source $stdenv/setup

WDYT?

@wolfgangwalther
Copy link
Contributor

Is it used more often outside of nixpkgs?

A search across GitHub could inform us about that.

Do you know how to exclude forked repositories from the GitHub code search result?

I don't know how to exclude forks, but I browsed through the results and randomly picked uses of runInLinuxVM. From what I can see nobody assumes that $out will be created for them - many have mkdir -p $out in their preVM script. Of course, in the preVM script $out was not created previously, yet. So anyone doing magic there needs to create it anyway.

I didn't see anyone removing the existing $out.

@wolfgangwalther
Copy link
Contributor

Still have to understand how it's used for docker.

It's used in runWithOverlay, which does a preVM = vmTools.createEmptyImage. This will create $out as a directory anyway. So this should continue to work as is.

I think the only broken case is indeed apptainer.tests.image-hello-cowsay.

@ShamrockLee
Copy link
Contributor

I don't know how to exclude forks, but I browsed through the results and randomly picked uses of runInLinuxVM. From what I can see nobody assumes that $out will be created for them - many have mkdir -p $out in their preVM script. Of course, in the preVM script $out was not created previously, yet. So anyone doing magic there needs to create it anyway.

I didn't see anyone removing the existing $out.

I see. Thank you for checking it out.

@ShamrockLee
Copy link
Contributor

In addition, the disk creation functionality with preVM seems to break when __structuredAttrs = true.

I think that's because of this:

  createEmptyImage = {
    # Disk image size in MiB
    size,
    # Name that will be written to ${destination}/nix-support/full-name
    fullName,
    # Where to write the image files, defaulting to $out
    destination ? "$out"
  }: ''
    mkdir -p ${destination}
    diskImage=${destination}/disk-image.qcow2
    ${qemu}/bin/qemu-img create -f qcow2 $diskImage "${toString size}M"

    mkdir ${destination}/nix-support
    echo "${fullName}" > ${destination}/nix-support/full-name
  '';

This needs $out to be defined.

With my proposal in #354535 (comment) we removed the mkdir line, but also the loading of attrs at the beginning of vmRunCommand, assuming that $out was not needed anymore. We failed to see that eval "$preVM" could do anything, including depending on $out.

So I guess we should add those lines back:

    vmRunCommand = qemuCommand: writeText "vm-run" ''
+    if [ -f "''${NIX_ATTRS_SH_FILE-}" ]; then
+      source "$NIX_ATTRS_SH_FILE"
+    fi
+    source $stdenv/setup

WDYT?

If it is needed to make preVM work, adding them back is sensible.

@ElvishJerricco
Copy link
Contributor

This broke systemd-boot tests, and probably all tests built with disk images.

@wolfgangwalther
Copy link
Contributor

This broke systemd-boot tests, and probably all tests built with disk images.

Just to double-check, this is with #358705 included, yes?

Can you give me a quick reproducer line what I need to build to see the failing test?

@wolfgangwalther
Copy link
Contributor

nixosTests.systemd-boot.basic fails for me on current master.

@wolfgangwalther
Copy link
Contributor

It does build fine right before this PR.

Relevant log when it succeeds:

[...]
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
loading kernel modules...
mounting Nix store...
mounting host's temporary directory...
starting stage 2 (/nix/store/iv0cw96ybnfncihnamdzhjlni6zr72xy-vm-run-stage2)
hwclock: select() to /dev/rtc0 to wait for clock tick timed out
tune2fs 1.47.1 (20-May-2024)
Setting maximal mount count to -1
Setting interval between checks to 0 seconds
Setting time filesystem last checked to Tue Nov 26 08:13:14 2024
[...]

The same part when it fails:

[...]
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
loading kernel modules...
mounting Nix store...
mounting host's build directory...
starting stage 2 (/nix/store/wmp357hsbipw1hv71q39l24aj50llvnj-vm-run-stage2)
hwclock: select() to /dev/rtc0 to wait for clock tick timed out
[   11.152361] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[...]

This is happening in stage 2.

When I revert the set -euo pipefail we have added there, then it passes again.

In fact when I only remove the set -e here, then it passes again.

This means that the current tests throw an error somewhere - but silently so. Imho, this PR just revealed a different problem. I am investigating further.

@wolfgangwalther
Copy link
Contributor

Ah, it's in the logs already:

hwclock: select() to /dev/rtc0 to wait for clock tick timed out

So, this means that this command is actually returning an error:

    ${util-linux}/bin/hwclock -s

@wolfgangwalther
Copy link
Contributor

It would be great to understand when the hwclock command started to fail.

@Ma27
Copy link
Member Author

Ma27 commented Nov 26, 2024

Thanks for tracking it down, was busy investigating a production issue at work.
Can try to find out what went wrong with the hwclock invocation this ~afternoon.

I think this might be a channel-blocker, so I guess if I don't get to a solution quickly, I'd temporarily remove the set -euo pipefail.

@wolfgangwalther
Copy link
Contributor

I think this might be a channel-blocker, so I guess if I don't get to a solution quickly, I'd temporarily remove the set -euo pipefail.

Alternatively, just swallow the error for hwclock manually with || true temporarily.

@ElvishJerricco
Copy link
Contributor

I'm not 100% that hwclock command is even needed anymore. The commit that added it is over 10 years old and doesn't provide an explanation for why it's there other than a bug in an ancient version of qemu.

@wolfgangwalther
Copy link
Contributor

I'm not 100% that hwclock command is even needed anymore. The commit that added it is over 10 years old and doesn't provide an explanation for why it's there other than a bug in an ancient version of qemu.

+1 for just removing those hacks.

Ma27 added a commit to Ma27/nixpkgs that referenced this pull request Nov 26, 2024
Commit 97ed6b4 broke the systemd-boot
tests (among others) because of the `hwclock -s` invocation. This was
broken for a while, but not noticed because we didn't have a `set -e`
before.

The error

    hwclock: select() to /dev/rtc0 to wait for clock tick timed out

MAY be related to an open QEMU bug[1]: I can't reproduce the error on
aarch64-linux and x86_64-linux with `partitionTableType = "legacy";`.
Also, the issue disappears on x86_64-linux when adding `--directisa`.

However, the invocation was added in f73ff05
10 years ago which didn't give any reasoning or pointer to what KVM bug
this may be. Given that this must have happened on an ancient version,
we agreed on removing it altogether[2].

[1] https://gitlab.com/qemu-project/qemu/-/issues/1762
[2] NixOS#354535 (comment)
@@ -223,11 +229,12 @@ rec {
-nographic -no-reboot \
-device virtio-rng-pci \
-virtfs local,path=${storeDir},security_model=none,mount_tag=store \
-virtfs local,path=/build,security_model=none,mount_tag=sa \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this isn't backwards compatible and creates quite a few annoyances. Right now, it breaks disko, see nix-community/disko#900

Formatting 'main.raw', fmt=raw size=10737418240
qemu-kvm: -virtfs local,path=/build,security_model=none,mount_tag=sa: cannot initialize fsdev 'sa': failed to open '/build': No such file or directory

The other paths in this invocation are configurable, but /build requires me to do a switch_root before. I would like to allow adding a custom prefix in front of this path. I'll submit a PR for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm currently looking into this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I don't have the bandwidth for it this weekend, can take a look at this on ~Monday if needed (please ping me here if that's the case).

iFreilicht added a commit to nix-community/disko that referenced this pull request Nov 30, 2024
Fixes #900

This was caused by NixOS/nixpkgs#354535
originally. The breaking changes introduced there have been resolved by
NixOS/nixpkgs#360413, but one addition survived,
which was the line `source $stdenv/setup`.

Because we used `>` instead of `>>`, `saved-env` was overwritten, so
even with the second PR, the script failed with the following error:

    /nix/store/pw...ykc-vm-run-stage2: line 16: stdenv: unbound variable

Once this and the second PR mentioned above are merged, #903 will be
unblocked.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux 12.approvals: 1 This PR was reviewed and approved by one reputable person
Projects
None yet
Development

Successfully merging this pull request may close these issues.

vmTools.runInLinuxVM fails upon __structuredAttrs = true, complaining mkdir: missing operand
6 participants