
[BUG] Restoring cache appears to be missing files (e.g. libgcc_s.so.1) #44

Open
obreitwi opened this issue Jun 7, 2024 · 9 comments

@obreitwi

obreitwi commented Jun 7, 2024

Describe the bug

When building with a restored cache, the builder cannot find libgcc_s.so.1.

This leads to various errors, such as:

/nix/store/00mg4vlhzmm7gi9bd5v5ydjlgrywpc3n-go-1.22.3/share/go/pkg/tool/linux_amd64/link: running gcc failed: exit status 1
/nix/store/bgcaxhhxswzvmxjbbgvvaximm5hwghz1-binutils-2.41/bin/ld: cannot find -lgcc_s: No such file or directory
collect2: error: ld returned 1 exit status

when building a Go service via gomod2nix, or

closure-paths> python3: error while loading shared libraries: libgcc_s.so.1: cannot open shared object file: No such file or directory
error: builder for '/nix/store/gyb8h4dps65bbk6x6is9mxhg2akdv7vv-closure-paths.drv' failed with exit code 127

when building a docker image via dockerTools.buildLayeredImage.

To Reproduce

(Currently the error occurs in a private repo. I plan to re-create it in a public repo for easier reproduction, but I wanted to report the issue first nonetheless.)

Steps to reproduce would include:

  1. Run the workflow successfully with caching enabled (auto-optimize-store = false, sandbox = true) and observe that all build targets build successfully.
  2. Run the exact same workflow again with minor modifications to the source code or Docker image contents, this time restoring the newly created cache.
  3. Observe one of the errors above, depending on which workflow was triggered.

Expected behavior

Restoring the cache should restore all files; from Nix's point of view, it should make no difference.

Additional context

A workaround is simply disabling the cache for now, but I want to understand what is going on… 😉

@obreitwi obreitwi changed the title [BUG] Restoring caches missing files (e.g. libgcc_s.so.1) [BUG] Restoring cache is missing files (e.g. libgcc_s.so.1) Jun 7, 2024
@obreitwi obreitwi changed the title [BUG] Restoring cache is missing files (e.g. libgcc_s.so.1) [BUG] Restoring cache appears to be missing files (e.g. libgcc_s.so.1) Jun 7, 2024
Munksgaard added a commit to Munksgaard/citest that referenced this issue Jun 9, 2024
@deemp
Collaborator

deemp commented Jun 10, 2024

@obreitwi, please verify that the files are missing because of cache-nix-action. There are two ways:

Files may be missing because an incorrect list of paths to exclude is built:

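import { readdirSync, writeFileSync } from "fs";

// Exclude every entry currently present in /nix/store,
const excludePaths = readdirSync("/nix/store")
  .map(x => `../../../../../nix/store/${x}`)
  .concat(
    // every entry of /nix/var/nix except the db directory,
    readdirSync("/nix/var/nix")
      .filter(x => x != "db")
      .map(x => `../../../../../nix/var/nix/${x}`)
  )
  .concat(
    // and every entry of /nix/var/nix/db except db.sqlite.
    readdirSync("/nix/var/nix/db")
      .filter(x => x != "db.sqlite")
      .map(x => `../../../../../nix/var/nix/db/${x}`)
  );
// Write the list to a temporary file, one pattern per line,
// and pass it to tar via --exclude-from.
const tmp = await cacheUtils.createTempDirectory();
const excludeFromFile = `${tmp}/nix-store-paths`;
writeFileSync(excludeFromFile, excludePaths.join("\n"));
extraTarArgs = ["--exclude-from", excludeFromFile];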
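To see exactly which entries end up on that list, the same logic can be restated as a pure function. This is a minimal sketch, assuming the three directory listings are passed in instead of being read with `readdirSync`; the name `buildExcludePaths` and the sample listings are hypothetical, and the `../../../../../` prefix is copied verbatim from the code above.

```typescript
// Hypothetical pure restatement of the exclude-list logic quoted above.
// Directory listings are parameters, so no real /nix directory is needed.
const PREFIX = "../../../../../";

function buildExcludePaths(
  storeEntries: string[], // stands in for readdirSync("/nix/store")
  varNixEntries: string[], // stands in for readdirSync("/nix/var/nix")
  dbEntries: string[], // stands in for readdirSync("/nix/var/nix/db")
): string[] {
  return storeEntries
    .map((x) => `${PREFIX}nix/store/${x}`)
    .concat(
      varNixEntries
        .filter((x) => x !== "db") // keep the db directory off the list
        .map((x) => `${PREFIX}nix/var/nix/${x}`),
    )
    .concat(
      dbEntries
        .filter((x) => x !== "db.sqlite") // keep db.sqlite off the list
        .map((x) => `${PREFIX}nix/var/nix/db/${x}`),
    );
}

// Invented sample listings, for illustration only.
const excluded = buildExcludePaths(
  ["aaaa-hello-2.12", "bbbb-gcc-13.2.0-lib"],
  ["db", "gcroots", "profiles"],
  ["db.sqlite", "big-lock", "reserved"],
);
console.log(excluded.join("\n"));
```

With these listings, every store path already present locally lands on the list, while the `db` directory and `db/db.sqlite` are the only entries under `/nix/var/nix` that do not.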

It'd be really nice if you could provide a way to reproduce the problem, e.g., with a nixpkgs package.

@morguldir

Here's a failed run with debug logging https://github.com/morguldir/conduwuit/actions/runs/9771677418/job/26975127257

Also, changing to v5.1.0, as @Munksgaard did, fixed it, even when the cache was made with v5.2.1: https://github.com/morguldir/conduwuit/actions/runs/9771781896/job/26975369404

@deemp
Collaborator

deemp commented Aug 23, 2024

@obreitwi, @morguldir, @girlbossceo, @Munksgaard please provide a minimal reproducible example, so that I can debug properly.

@deemp
Collaborator

deemp commented Aug 24, 2024

There may be a problem in the SQL script due to hash collisions, but hash collisions are highly unlikely for SHA-256:

insert into ValidPaths (path, hash, registrationTime, deriver, narSize, ultimate, sigs, ca)
select path,
       hash,
       registrationTime,
       deriver,
       narSize,
       ultimate,
       sigs,
       ca
from ValidPaths2
where hash not in (select hash from ValidPaths);
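To make the filter's semantics concrete, here is a minimal TypeScript model of the `insert ... select` above, with rows reduced to the two columns the `where` clause involves. `Row`, `mergeValidPaths`, and the sample paths and hashes are hypothetical. The model compares against the hashes present in `ValidPaths` before the merge and does not attempt to reproduce every SQLite detail.

```typescript
// Hypothetical row type: only the columns the filter touches.
// The real ValidPaths table also has registrationTime, deriver, narSize, etc.
interface Row {
  path: string;
  hash: string;
}

// Models: insert into ValidPaths ... select ... from ValidPaths2
//         where hash not in (select hash from ValidPaths);
function mergeValidPaths(validPaths: Row[], validPaths2: Row[]): Row[] {
  // Hashes already registered before the merge.
  const existingHashes = new Set(validPaths.map((r) => r.hash));
  // A restored row is kept only if its hash is not already present;
  // its path is never compared.
  const inserted = validPaths2.filter((r) => !existingHashes.has(r.hash));
  return validPaths.concat(inserted);
}

// Invented sample data.
const local: Row[] = [{ path: "/nix/store/aaa-foo", hash: "sha256:111" }];
const restored: Row[] = [
  { path: "/nix/store/aaa-foo", hash: "sha256:111" }, // same path, same hash: skipped
  { path: "/nix/store/bbb-bar", hash: "sha256:111" }, // different path, same hash: also skipped
  { path: "/nix/store/ccc-baz", hash: "sha256:222" }, // new hash: inserted
];
const merged = mergeValidPaths(local, restored);
// merged contains aaa-foo and ccc-baz; bbb-bar is never registered.
console.log(merged.map((r) => r.path));
```

Because the condition tests `hash` rather than `path`, a restored row is skipped whenever any already-registered path has the same hash. In the Nix database that column holds the NAR hash of a path's contents, so this also happens for two distinct store paths with byte-identical contents, not only for a genuine SHA-256 collision. Whether that case plays a role here is not established by this sketch.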

@obreitwi
Author

@obreitwi, @morguldir, @girlbossceo, @Munksgaard please provide a minimal reproducible example, so that I can debug properly.

Really appreciate you looking into it… 👍
In our private repo, we switched to magic-nix-cache as a short-term workaround (which worked immediately), so my suspicion is still on cache-nix-action.
I still want to help and solve the issue, but am currently a bit short on time (vacation season being what it is). Will provide a minimal example as soon as I find the time!

@deemp
Collaborator

deemp commented Aug 27, 2024

my suspicion is still on cache-nix-action.

@obreitwi, thanks for narrowing down the problem space!

we switched to magic-nix-cache as a short-term workaround (which worked immediately)

I created cache-nix-action after I got rate limit errors with magic-nix-cache. The problem is that c-n-a manipulates the stores in a slightly non-conventional way, and I'm not sure whether it does so correctly because I'm not an expert in nix 😅.

Will provide a minimal example as soon as I find the time!

Looking forward to receiving an example! Currently, I'll try to use the @morguldir and @Munksgaard examples (@morguldir, @Munksgaard, thanks for providing them!), though they're:

  • definitely not minimal
  • written in languages that I'm not familiar with
  • not particularly verbose

@deemp
Collaborator

deemp commented Aug 29, 2024

@deemp
Collaborator

deemp commented Aug 29, 2024

I still have questions:

  • How can one determine whether a derivation needs nativeBuildInputs if it builds successfully?
  • Why does the derivation succeed the first time it's built?
  • Why can't your examples build after changes to src?

@Munksgaard

Thank you for your work on this @deemp. I'm currently swamped with other work, but I'll try to re-engage with this issue sometime next week.
