Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore osx cache #14485

Closed
wants to merge 1 commit into from
Closed

Restore osx cache #14485

wants to merge 1 commit into from

Conversation

r0qs
Copy link
Member

@r0qs r0qs commented Aug 11, 2023

Fix #12925.

@r0qs r0qs marked this pull request as ready for review August 11, 2023 16:54
.circleci/config.yml Outdated Show resolved Hide resolved
Comment on lines 371 to 377
- /usr/local/Cellar/boost
- /usr/local/Cellar/cmake
- /usr/local/Cellar/wget
- /usr/local/Cellar/coreutils
- /usr/local/Cellar/diffutils
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's really safe to do it like this. If you do not restore it all, we end up with some bits coming from cached homebrew and some coming from the image. What are the parts you are not caching?

This does seem to pass tests and gives a nice speedup so I'm a bit torn on this. Caching the whole /usr/local/Cellar and /usr/local/Homebrew like we used to seems like it would be less prone to break in weird ways at some point in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly don't know if it is safe to do this way, but I decided to cache only what we need because if we cache everything inside the Cellar directory, it actually does not save us time. Restoring the cache ends up being slower (for example, see: https://app.circleci.com/pipelines/github/r0qs/solidity/640/workflows/b7416bf3-b054-48a1-ada1-d89c8e047a95/jobs/19678)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we first remove everything from /usr/local/Cellar before restoring? My impression is that the slowdown really comes from printing all those errors (or maybe it's that just handling them goes through some horribly inefficient code path in tar?). It used to be pretty fast back when we used to do that.

Copy link
Member Author

@r0qs r0qs Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we first remove everything from /usr/local/Cellar before restoring?

Hum...not sure if this would be the best way, since it would attempt to install/update all the packages twice in the first run (when there is no cache available). Also, it would cause errors like this: https://app.circleci.com/pipelines/github/ethereum/solidity/30938/workflows/1d8b9f77-a106-47a3-91d7-d561b4cfe8d2/jobs/1375906, i.e. some commands will not be available until we reinstall everything. I would rather just keep it as it is now, i.e. caching and restoring only the packages we need.

My impression is that the slowdown really comes from printing all those errors (or maybe it's that just handling them goes through some horribly inefficient code path in tar?). It used to be pretty fast back when we used to do that.

Yes, I have the same impression.

Copy link
Member

@cameel cameel Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it would cause errors like this

But isn't that because you do not recreate an empty directory after deleting it?

it would "install" the packages in the Cellar twice in the first run

What's the problem with that? It's fine if the first run is longer as long as it's not extremely so. We update macOS dependencies quite rarely so the cache will almost always be present.

Copy link
Member Author

@r0qs r0qs Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it would cause errors like this

But isn't that because you do not recreate an empty directory after deleting it?

No, I was referring to this error here:

/Users/distiller/.bash_profile: line 19: rbenv: command not found
/Users/distiller/.bash_profile: line 22: pyenv: command not found
Running `brew update --auto-update`...

I copied the wrong link in the previous comment: https://app.circleci.com/pipelines/github/ethereum/solidity/30938/workflows/1d8b9f77-a106-47a3-91d7-d561b4cfe8d2/jobs/1375906?invite=true#step-105-0_65

it would "install" the packages in the Cellar twice in the first run

What's the problem with that? It's fine if the first run is longer as long as it's not extremely so. We update macOS dependencies quite rarely so the cache will almost always be present.

Well, I suspect it will be almost twice the time that it is currently, which is around 5 minutes.

Copy link
Member

@cameel cameel Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied the wrong link in the previous comment: https://app.circleci.com/pipelines/github/ethereum/solidity/30938/workflows/1d8b9f77-a106-47a3-91d7-d561b4cfe8d2/jobs/1375906?invite=true#step-105-0_65

Looks like the same link to me, just an extra anchor.

Well, I suspect it will be almost twice the time that it is currently, which is around 5 minutes.

That would be bad if we had to do it on every run, but still acceptable if it only runs rarely.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied the wrong link in the previous comment: https://app.circleci.com/pipelines/github/ethereum/solidity/30938/workflows/1d8b9f77-a106-47a3-91d7-d561b4cfe8d2/jobs/1375906?invite=true#step-105-0_65

Looks like the same link to me, just an extra anchor.

Yes, this is what I meant haha

Well, I suspect it will be almost twice the time that it is currently, which is around 5 minutes.

That would be bad if we had to do it on every run, but still acceptable if it only runs rarely.

Ok, if you think it is acceptable, I will do the change then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is what I meant haha

Ah, it was a link to a specific line in the output? Didn't even know you could do that :) I did not notice and just scrolled immediately to the end of output by habit :)

In any case, isn't .bash_profile only for interactive sessions? Not really sure why it would need rbenv and pyenv there. Is their macOS env really using Ruby and Python installed somewhere in user's profile via those instead of the global interpreters?

I see that homebrew is immediately installing its own portable Ruby. I wonder if it's because of that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is what I meant haha

Ah, it was a link to a specific line in the output? Didn't even know you could do that :) I did not notice and just scrolled immediately to the end of output by habit :)

haha yes, the first two lines of the Install build dependencies step.

In any case, isn't .bash_profile only for interactive sessions?

Yeah, as far as I know it is loaded during login. So I imagine that when the circleci user logins in the container to run the commands it also loads the .bash_profile of that user.

Not really sure why it would need rbenv and pyenv there. Is their macOS env really using Ruby and Python installed somewhere in user's profile via those instead of the global interpreters?

I'm not sure either, as I don't use macOS. But there are some environment variables for pyenv and rbenv that are set by default in the circleci macOS image, i.e. : PYENV_SHELL=bash and RBENV_SHELL=bash.

I see that homebrew is immediately installing its own portable Ruby. I wonder if it's because of that?

Yeah, it could be.

.circleci/osx_install_dependencies.sh Outdated Show resolved Hide resolved
@r0qs r0qs force-pushed the osx-cache branch 2 times, most recently from e8af597 to 81438b1 Compare August 14, 2023 11:54
@r0qs r0qs self-assigned this Aug 21, 2023
@r0qs r0qs force-pushed the osx-cache branch 5 times, most recently from 942c70d to 9a910f7 Compare August 21, 2023 15:00
@cameel
Copy link
Member

cameel commented Aug 21, 2023

I just got a different idea. How about caching not the installed dir but the binary packages that Homebrew downloads? Then it would still install them but that should be quick.

There are some suggestions on how to do that here: How to use Homebrew to install local archive.

Actually, maybe caching only /Users/distiller/Library/Caches/Homebrew already accomplishes that?

@cameel
Copy link
Member

cameel commented Aug 21, 2023

Also, does homebrew touch /usr/local/{bin,sbin,lib,include}? If not, we could restore caching those even if we give up on homebrew.

@r0qs
Copy link
Member Author

r0qs commented Aug 21, 2023

I just got a different idea. How about caching not the installed dir but the binary packages that Homebrew downloads? Then it would still install them but that should be quick.

There are some suggestions on how to do that here: How to use Homebrew to install local archive.

But the homebrew packages are already cached when we cache the /Users/distiller/Library/Caches/Homebrew directory.

static:~ distiller$ brew --cache -s wget
/Users/distiller/Library/Caches/Homebrew/downloads/6468b4ed952a797bf746f9180cc0b8d1e7edb4ab441976c21c254c5948e12aa5--wget-1.21.3.tar.gz

Actually, maybe caching only /Users/distiller/Library/Caches/Homebrew already accomplishes that?

Yes, but it needs the correspondent Cellar directory as well so the brew link will be able to update the PATH for the cached kegs (this was the reason I cached the specific kegs on the Cellar directory):

static:~ distiller$ brew info wget
==> wget: stable 1.21.3 (bottled), HEAD
Internet file retriever
https://www.gnu.org/software/wget/
/usr/local/Cellar/wget/1.21.3 (89 files, 4.2MB) *
  Poured from bottle on 2022-12-07 at 19:43:50
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/wget.rb
License: GPL-3.0-or-later
==> Dependencies
Build: pkg-config ✘
Required: libidn2 ✘, [email protected] ✘
==> Options
--HEAD
	Install HEAD version
==> Analytics
install: 79,265 (30 days), 271,426 (90 days), 462,952 (365 days)
install-on-request: 79,148 (30 days), 270,948 (90 days), 462,042 (365 days)
build-error: 15 (30 days)

But yeah, I'm not sure if this is the correct way to cache a homebrew package, since I'm not a macOS user.

Also, does homebrew touch /usr/local/{bin,sbin,lib,include}? If not, we could restore caching those even if we give up on homebrew.

Yes it does, the binaries/libs/headers are installed there.

static:~ distiller$ which wget
/usr/local/bin/wget

UPDATE: the brew link command recreate the link to the respective files in /usr/local/{bin,lib,include}. So you are right, we don't need to cache those.

Comment on lines 377 to 379
- /usr/local/opt
- /usr/local/share
- /usr/local/var
Copy link
Member Author

@r0qs r0qs Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized that homebrew install more things that if not cached may indeed break the image. You can check it running the brew doctor command in the container.

Warning: Some installed formulae are missing dependencies.
You should `brew install` the missing dependencies:
  brew install ca-certificates gettext gmp icu4c libidn2 libunistring lz4 [email protected] xz zstd

Run `brew missing` for more details.
static:Cellar distiller$ brew missing
boost: icu4c xz lz4 zstd
coreutils: gmp
wget: gettext libunistring libidn2 ca-certificates [email protected]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, the only way that I could make it work was not deleting the whole Cellar directory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is /usr/local/var as volatile as /var? Caching /var would go way beyond what I'd call reasonable :)

Overall, looks like this is not the way to go. Too many assumptions about what Homebrew is doing. We need something more robust.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I added that only for testing, attempting to fix the errors reported by the brew doctor. Upon further look, it turned out that the important directory there is only /usr/local/var/homebrew which contains symlinks to the respective files under the Cellar directory.

@r0qs r0qs force-pushed the osx-cache branch 2 times, most recently from 516c52b to 445a48a Compare August 22, 2023 12:31
@cameel
Copy link
Member

cameel commented Aug 22, 2023

Actually, maybe caching only /Users/distiller/Library/Caches/Homebrew already accomplishes that?
Yes, but it needs the correspondent Cellar directory as well so the brew link will be able to update the PATH for the cached kegs (this was the reason I cached the specific kegs on the Cellar directory):

But I meant only caching the binary packages which Homebrew uses to install things. I.e. we would cache Caches/Homebrew to make sure the packages would not have to be downloaded (which, I am assuming, is the most time consuming part of the installation) and then issue brew install commands to let Homebrew install those packages. We would not touch anything under /usr/local. This would solve any problems we might have with inconsistent state of these installation dirs.

I was just not sure if caching Caches/Homebrew would really make brew install instant. Maybe we need to explicitly download those packages somewhere else and pass them to some Homebrew command that can install them directly.

Seeing the issues you're running into, I really think this is the right way forward here. We could do this for evmone, z3 and jsoncpp as well - basically only cache the wget command.

@cameel
Copy link
Member

cameel commented Aug 22, 2023

So you are right, we don't need to cache those.

But I meant the opposite :) I.e. that if Homebrew does not touch them then we can safely cache them independently, to reenable caching at least for evmone, z3 and jsoncpp, even if we don't manage to do it for Homebrew.

But apparently it does use these dirs? In any case, I'd rather try with the solution I posted above, which should be much less messy.

Also, perhaps we should shelve this for now and go back to it later. I thought it would be useful to try to reenable the cache and see if it works, but we're way beyond that. This was meant to be quick and easy :P

@r0qs
Copy link
Member Author

r0qs commented Aug 23, 2023

But I meant only caching the binary packages which Homebrew uses to install things. I.e. we would cache Caches/Homebrew to make sure the packages would not have to be downloaded (which, I am assuming, is the most time consuming part of the installation) and then issue brew install commands to let Homebrew install those packages. We would not touch anything under /usr/local. This would solve any problems we might have with inconsistent state of these installation dirs.

Right. Just to add more context, the files are installed in the Cellar directory, and brew just create symlinks to it during installation. For instance:

static:~ distiller$ ls -la /usr/local/bin/wget
lrwxr-xr-x  1 distiller  admin  30 Aug 23 09:40 /usr/local/bin/wget -> ../Cellar/wget/1.21.4/bin/wget

The binary packages that homebrew use to install things are in the /Users/distiller/Library/Caches/Homebrew/:

static:~ distiller$ brew --cache wget
/Users/distiller/Library/Caches/Homebrew/downloads/3e0f3235fe3eca0a1afea7053a674a1691e35fd0de2c9330147e4fd4bd543e25--wget--1.21.4.monterey.bottle.tar.gz

I was just not sure if caching Caches/Homebrew would really make brew install instant. Maybe we need to explicitly download those packages somewhere else and pass them to some Homebrew command that can install them directly.

We could try to force install from a local bottle using: brew install --force-bottle <package> however it requires the Cellar files of the respective package:

static:~ distiller$ brew install --force-bottle wget
==> Downloading https://formulae.brew.sh/api/formula.jws.json

==> Downloading https://formulae.brew.sh/api/cask.jws.json

Error: No such file or directory - /usr/local/var/homebrew/linked/wget

Because everything in /usr/local/{bin,sbin,lib,include,var,opt} is a symlink to Cellar:

static:~ distiller$ ls -la /usr/local/var/homebrew/linked/wget
lrwxr-xr-x  1 distiller  admin  27 Aug 23 09:40 /usr/local/var/homebrew/linked/wget -> ../../../Cellar/wget/1.21.4

So I don't see a way to not cache the Cellar directory, which bring us back to my original approach.

Seeing the issues you're running into, I really think this is the right way forward here. We could do this for evmone, z3 and jsoncpp as well - basically only cache the wget command.

Yes, got distracted by the homebrew errors and forgot about those other packages, for those we will need to cache the /usr/local/{bin,lib,include} since those packages are installed there.

@cameel
Copy link
Member

cameel commented Aug 23, 2023

Yes, got distracted by the homebrew errors and forgot about those other packages, for those we will need to cache the /usr/local/{bin,lib,include} since those packages are installed there.

Why not cache only the downloaded archive like I suggested?

We could try to force install from a local bottle using: brew install --force-bottle <package> however it requires the Cellar files of the respective package:

What happens if you try without --force-bottle? According to Pre-download a file for a formula, a simple brew install should be enough to use the bottle from the cache if it's there.

The way I'm suggesting is this:

  • Use some brew command that only downloads the formula (but does not build or install the package) get formulas for all the packages we need.
  • Build a string containing names and versions of all those packages for use as the cache key.
    • This way the cache will be invalidated whenever one of the packages is updated. But not their dependencies, which we can live with.
  • Restore /Users/home/Library/Caches/Homebrew/downloads/ from that cache key if available.
    • If available, brew install will use the packages from there.
    • If not available, brew install will populate the cache dir for us prior to installing, including dependencies as well.
  • Run brew install on all packages to install them.
  • Cache /Users/home/Library/Caches/Homebrew/downloads/ under our cache key for use next time.

@r0qs
Copy link
Member Author

r0qs commented Aug 24, 2023

Yes, got distracted by the homebrew errors and forgot about those other packages, for those we will need to cache the /usr/local/{bin,lib,include} since those packages are installed there.

Why not cache only the downloaded archive like I suggested?

Because evmone, z3 and jsoncpp are not installed via homebrew and we use part of the installation path of z3 to determine whether we should restore the cache (i.e. /usr/local/lib/libz3.a). But I guess I overlooked your previous comment, and reading it again I see that you said to cache the archives of those packages as well. So yeah, we can do it and remove that if from the osx_install_dependencies.sh.

What happens if you try without --force-bottle? According to Pre-download a file for a formula, a simple brew install should be enough to use the bottle from the cache if it's there.

Yes, indeed. However, in all tests that I tried, even if the package exists in the homebrew's cache but it does not exists in the Cellar directory (i.e. it is not restored by the CI cache) the homebrew's cache does not really prevents the download. Not really sure why. The use of --force-bottle was an attempt to force brew to use the local bottle.

The way I'm suggesting is this:

* Use some brew command that only downloads the formula (but does not build or install the package) get formulas for all the packages we need.

I believe that for this we can use the brew fetch <package>. And then, we could cache the /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/<package_formula>.rb for the packages that we install using homebrew. However, I don't know if caching only this directory would break homebrew somehow (e.g. when we update xcode version).

Also, having the formula does not help too much I think, since the formula only specify how the package will be installed and does not really contain the necessary files to install it, see for example the wget formula.

@r0qs r0qs force-pushed the osx-cache branch 2 times, most recently from ec35d2d to 8996ae2 Compare September 5, 2023 13:41
@cameel
Copy link
Member

cameel commented Sep 8, 2023

said to cache the archives of those packages as well

I mean, I think we should cache only the archives. This makes things simpler because you don't have to worry about some random stuff being in the directories we would otherwise have to cache.

Downloading the binaries is the slow part. Installation is quick, so we can just repeat it every time.

I believe that for this we can use the brew fetch <package>. And then, we could cache the /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/<package_formula>.rb for the packages that we install using homebrew.

I'd not cache the formulas. I think they should be redownloaded each time. This is because we need a way to know if cache is up to date or not. We can only know that by looking at what the latest versions of the packages are and I assume that homebrew will by default download the latest version of the formula and we'll see the version inside. If any package has been updated, we invalidate the cache, run installation from scratch and cache it.

But for that I'm assuming homebrew must have some command that just downloads the formula independently of the binary packages. Alternatively, maybe it has a command that tells us the latest version of a formula. If it does not, the last resort would be to download the formula manually with wget or git.

@r0qs r0qs marked this pull request as draft September 14, 2023 09:32
@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Sep 28, 2023
@ethereum ethereum deleted a comment from github-actions bot Sep 28, 2023
@r0qs r0qs removed the stale The issue/PR was marked as stale because it has been open for too long. label Sep 28, 2023
@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Oct 24, 2023
@r0qs r0qs removed the stale The issue/PR was marked as stale because it has been open for too long. label Oct 24, 2023
@ethereum ethereum deleted a comment from github-actions bot Oct 24, 2023
Copy link

github-actions bot commented Nov 8, 2023

This pull request is stale because it has been open for 14 days with no activity.
It will be closed in 7 days unless the stale label is removed.

@github-actions github-actions bot added the stale The issue/PR was marked as stale because it has been open for too long. label Nov 8, 2023
Copy link

This pull request was closed due to a lack of activity for 7 days after it was stale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-due-inactivity stale The issue/PR was marked as stale because it has been open for too long.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CircleCI dependency caching broken on macOS
2 participants