Two-step image build-and-push has erroneous cache hits #10

Open
zsusswein opened this issue Dec 18, 2024 · 5 comments
Comments

@zsusswein
Collaborator

zsusswein commented Dec 18, 2024

EDIT 2024-12-19: Have a clear example today in CDCgov/cfa-epinow2-pipeline#134

I updated the DESCRIPTION, adding {primarycensored}. The first-step dependencies image had a cache hit. This cache hit should not have occurred -- the DESCRIPTION changed, and the cache key is supposed to include the hash of the DESCRIPTION. Then the second-step image build, which runs R CMD check, failed with "Package suggested but not available: ‘primarycensored’", even though {primarycensored} is on CRAN and installs successfully elsewhere.

@gvegayon can you look into this? I'd love to have a solution that isn't manually dropping the cache when this strategy fails.

FYI @dylanhmorris @damonbayer -- I think you might be using this strategy and you might be seeing this too?


I'm a bit belated in opening this issue and don't have a great clear-cut example to share. But I want to open it anyway to track the issue and ping next time it happens.

I've noticed that very occasionally (i.e., twice now) we see an erroneous cache hit in our deployment. In that case, I temporarily dropped the tests for R/azure.R, but R CMD check still failed on one of those dropped tests. I fixed the problem by dropping the cache and re-running, and CI subsequently passed.

This example is hard to track and non-reproducible because it's old and I dropped the cache. Next time I see the problem crop up, I'll update here.
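
For context, the first-step cache check looks roughly like the sketch below. This is not the exact workflow: the step name, step id, and key expression are my reconstruction, and the real workflow may build the branch suffix differently.

# Sketch of the first-step cache check (step name, id, and key expression
# are reconstructions, not the exact workflow).
- name: Check whether the dependencies image needs rebuilding
  id: deps-cache
  uses: actions/cache@v4
  with:
    # The key is meant to change whenever DESCRIPTION changes.
    key: docker-dependencies-${{ runner.os }}-${{ hashFiles('./DESCRIPTION') }}-${{ github.ref_name }}
    path: ./DESCRIPTION
    lookup-only: true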

@amondal2

It might be worth testing this hashFiles function on two different versions of DESCRIPTION to check that it returns different keys. Not sure how to do that easily locally, though: https://github.com/CDCgov/cfa-epinow2-pipeline/actions/runs/12404725539/workflow#L70
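
One low-effort way to test it without a local setup would be a throwaway debug step that just prints the hash, run on two commits with different DESCRIPTION files, something like this (the step name is made up):

# Hypothetical debug step: print the value hashFiles() computes for
# DESCRIPTION. Running it on two commits with different DESCRIPTION files
# should print different values if hashFiles() is behaving.
- name: Print DESCRIPTION hash
  run: echo "DESCRIPTION hash is ${{ hashFiles('./DESCRIPTION') }}"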

@gvegayon
Member

I don't think you are using the action: the workflow file attached to the job does not use CDCgov/cfa-actions. That said, I think the code is the same. I'm unsure about the cache-hit behavior. I re-ran the action, and the cache hit persists, but I don't see the key in the repo's cache list, so it smells like a 🐛! If anyone can confirm this, we should report it to GitHub. I looked for an existing issue about this kind of false positive but couldn't find anything.

@gvegayon
Member

This is what I see from calling gh cache ls --json id,key:

[
  {
    "id": 396,
    "key": "docker-dependencies-Linux-6d19bd7b980701f22c2af8652e06f82ed05ebeefd65b4033dc89fe1c3abecd38-build-main-pool"
  },
  {
    "id": 410,
    "key": "Ubuntu 22.04.5 LTS-R version 4.4.2 (2024-10-31)-1-7f0445647fb5a2aac10e6a6845e489bbe0cff9662f88cc3cfca939c2fd8e1b7f"
  },
  {
    "id": 412,
    "key": "Ubuntu 22.04.5 LTS-R version 4.4.2 (2024-10-31)-1-84b70f73dd13fdbaa431ccc8dcd245057774fde673c98049e605e1f0ede96df8"
  },
  {
    "id": 411,
    "key": "Ubuntu 22.04.5 LTS-R version 4.4.2 (2024-10-31)-1-460bc19305d5154a0091cdf9ecf08b62e3e63297ae0fbad8ddc190de03304291"
  },
  {
    "id": 408,
    "key": "Ubuntu 22.04.5 LTS-R version 4.4.2 (2024-10-31)-1-84b70f73dd13fdbaa431ccc8dcd245057774fde673c98049e605e1f0ede96df8"
  },
  {
    "id": 407,
    "key": "Ubuntu 22.04.5 LTS-R version 4.4.2 (2024-10-31)-1-7f0445647fb5a2aac10e6a6845e489bbe0cff9662f88cc3cfca939c2fd8e1b7f"
  },
  {
    "id": 409,
    "key": "Ubuntu 22.04.5 LTS-R version 4.4.2 (2024-10-31)-1-460bc19305d5154a0091cdf9ecf08b62e3e63297ae0fbad8ddc190de03304291"
  },
  {
    "id": 393,
    "key": "Ubuntu 22.04.5 LTS-R version 4.4.2 (2024-10-31)-1-4c07a1f8fdd193a63c4c9ec0d23873a8a5822196c33e385f05398cd9b830e0aa"
  },
  {
    "id": 406,
    "key": "docker-dependencies-Linux-6d19bd7b980701f22c2af8652e06f82ed05ebeefd65b4033dc89fe1c3abecd38-clarify-impure"
  },
  {
    "id": 397,
    "key": "docker-dependencies-Linux-6d19bd7b980701f22c2af8652e06f82ed05ebeefd65b4033dc89fe1c3abecd38-128-remove-old-action"
  }
]

And this is what I see from the step of that run:

Run actions/cache@v4
  with:
    key: docker-dependencies-Linux-f4bcfb55392f2a994483280942820d4c4affe91f35b437367ba7e4c0b6021444-51-check-gi-density
    lookup-only: true
    path: ./DESCRIPTION
    enableCrossOsArchive: false
    fail-on-cache-miss: false
    save-always: false
  env:
    REGISTRY: cfaprdbatchcr.azurecr.io
    IMAGE_NAME: cfa-epinow2-pipeline
Lookup only - skipping download
Cache found and can be restored from key: docker-dependencies-Linux-f4bcfb55392f2a994483280942820d4c4affe91f35b437367ba7e4c0b6021444-51-check-gi-density

So I have no idea why it reports a hit, since there is no key containing the f4bc* hash in the list of existing caches!

@amondal2

Question -- why do we have lookup-only as true here? Don't we want to download the new image if there are changes?

@gvegayon
Member

The caching step here is just to verify whether we need to rebuild the container image. If the image does need rebuilding, that happens automatically in the follow-up step. lookup-only: true tells actions/cache that we don't care about restoring the files; in other words, we are using actions/cache only to keep track of whether we have to rebuild, and nothing other than the key is being saved.
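
Roughly, the follow-up step is gated on the output of that lookup-only cache step, something like this (a sketch: the step name, the "deps-cache" id, and the build commands are placeholders, not the exact workflow):

# Rebuild and push the dependencies image only when the exact cache key was
# not found. "deps-cache" is assumed to be the id of the lookup-only
# actions/cache step quoted in the run log above.
- name: Build and push dependencies image
  if: steps.deps-cache.outputs.cache-hit != 'true'
  run: |
    # Placeholders for the real build-and-push commands.
    docker build -t "$REGISTRY/$IMAGE_NAME-dependencies" .
    docker push "$REGISTRY/$IMAGE_NAME-dependencies"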
