Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set CPUs for soltest.sh based on the number of available cores #14478

Merged
merged 1 commit into from
Aug 9, 2023

Conversation

cameel
Copy link
Member

@cameel cameel commented Aug 7, 2023

Recently, while discussing our soltest parallelisation with @blishko I realized that we always run 3 processes inside each image, regardless of how many cores are available. As a result the available cores are underutilized and soltest jobs do not finish as fast as they potentially could. This PR makes an attempt to fix that.

Results

Memory

One of our concerns was that this might have been set to 3 to avoid exceeding available memory. In my test runs this does not seem to be the case though. Even with this change applied, for most of soltest jobs memory usage peaks at 25%. In some cases it reaches 50%. It's nowhere near 100%.

Timing

Here's the timing comparison between a run on my experimental circleci-config-commands branch (should have the same timing as develop) and an earlier run from this PR:

job before after parallelism machine
t_ubu_clang_soltest 35s 41s 20 2 CPU / 4 GB (medium)
t_archlinux_soltest 1m 0s 50s 20 2 CPU / 4 GB (medium)
t_ubu_soltest_all 8m 11s 8m 26s 50 4 CPU / 8 GB (large)
t_ubu_force_release_soltest_all 9m 40s 7m 42s 50 4 CPU / 8 GB (large)
t_osx_soltest 15m 21s 10m 22s 1 4 CPU / 8 GB (MacOS medium Gen2)
t_win_soltest 25m 46s 24m 24s 1 4 CPU / 8 GB (Windows medium)

Note that on Windows we use a PowerShell script that is not parallelized, so we have just one process in the container. On macOS the job is not parallelized (parallelism = 1), but the script is.

Oddly, the change did not significantly affect t_ubu_soltest_all. I checked and it did spawn 3 processes with 75% CPU use before and does spawn 5 now with 100% utilization. I did not do a precise comparison, but just by eyeballing the resource graph on CircleCI it looks like there's no change in performance. The total running time is influenced by a few slower runs among the 50 while most finish after 3-4 min, and after the change the stragglers are still there and the majority still finishes around the same time (or maybe even a bit later). At least it does not seem to make things worse. In any case, it's hard to tell just from the graph, but hopefully the total running time is actually shorter and saves us some credits.

On macOS there's a big improvement and for that alone this seems worth it.

@cameel cameel requested review from r0qs and blishko August 7, 2023 16:46
@cameel cameel self-assigned this Aug 7, 2023
@blishko
Copy link
Contributor

blishko commented Aug 7, 2023

@cameel , why do you specify number of CPUs for large resource class as 5 in the config when is has 4 CPUs available?

@cameel
Copy link
Member Author

cameel commented Aug 7, 2023

The name of this variable is a bit misleading. It's not meant to literally represent the number of CPUs. More like the number of processes to run on those cores. The usual recommendation is to run N+1 jobs on N cores to saturate them. The idea is that when a process becomes blocked on an I/O operation, the core is basically free and some other process can take it over. I've even seen 1.5N+1 or 2N+1 recommended, but that does not seem necessary here since the graphs already show 100% usage.

I considered renaming it, but I don't really have a good name. We already have "parallelism" and "jobs" used for CircleCI things so naming gets confusing.

I could make it literally mean the number of CPUs and have the script add +1, but I like that it matches the value passed to -j. Makes it easier to see that the value is right.

@blishko
Copy link
Contributor

blishko commented Aug 9, 2023

OK! Thanks for the explanation. From my side, this is OK.
The memory issues I have been experiencing are local to my branch.

Probably someone else should also look at this and approve?

@cameel
Copy link
Member Author

cameel commented Aug 9, 2023

Well, this is just a CI change and it's pretty simple so it does not require a very strict review. I think we can just merge it if you don't see any obvious problems there and approve.

@cameel cameel force-pushed the set-soltest-cpus-in-circleci-config branch from c1170e4 to ec240c6 Compare August 9, 2023 09:12
Copy link
Contributor

@blishko blishko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@cameel cameel enabled auto-merge August 9, 2023 09:15
@cameel cameel merged commit 3edf91a into develop Aug 9, 2023
1 check passed
@cameel cameel deleted the set-soltest-cpus-in-circleci-config branch August 9, 2023 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

2 participants