Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce impact of OPENMP #45040

Open
VinInn opened this issue May 24, 2024 · 15 comments
Open

reduce impact of OPENMP #45040

VinInn opened this issue May 24, 2024 · 15 comments

Comments

@VinInn
Copy link
Contributor

VinInn commented May 24, 2024

as detailed in #44923 cmssw links openmp gcc library and at some point threads are spawn in number of ncpu-1 (for each cmssw thread)

setting export OMP_NUM_THREADS=1 will make openmp not to spawn any thread (as gomp uses the main thread as thread-0)

I suggest to make export OMP_NUM_THREADS=1 default in scram (or to call the equivalnet API function in cmsRun)

@cmsbuild
Copy link
Contributor

cmsbuild commented May 24, 2024

cms-bot internal usage

@cmsbuild
Copy link
Contributor

A new Issue was created by @VinInn.

@sextonkennedy, @rappoccio, @Dr15Jones, @smuzaffar, @makortel, @antoniovilela can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Copy link
Contributor

assign core

@cmsbuild
Copy link
Contributor

New categories assigned: core

@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@fwyzard
Copy link
Contributor

fwyzard commented May 27, 2024

For the record, on the new Bergamo machines (2x256 cores), HLT fails with CMSSW_14_0_7_MULTIARCHS because

libgomp: Thread creation failed: Resource temporarily unavailable

Now I'm testing if export OMP_NUM_THREADS=1 works around the problem.

@fwyzard
Copy link
Contributor

fwyzard commented May 27, 2024

Now I'm testing if export OMP_NUM_THREADS=1 works around the problem.

It does:

Running 4 times over 10300 events with 8 jobs, each with 64 threads, 48 streams and 1 GPUs
  1312.8 ±   0.4 ev/s (10000 events, 99.1% overlap)

@smuzaffar
Copy link
Contributor

cms-sw/cmsdist#9207 proposes to set OMP_NUM_THREADS=1 for cmssw runtime env

@fwyzard
Copy link
Contributor

fwyzard commented May 27, 2024

While it is fine as a temporary measure for the HLT, I'm not sure this is what we should do as a general setting.

A trivial example of what can go wrong:

$ nproc
512

$ export OMP_NUM_THREADS=1

$ nproc
1

@mmusich
Copy link
Contributor

mmusich commented May 27, 2024

cms-sw/cmsdist#9207 proposes to set OMP_NUM_THREADS=1 for cmssw runtime env

mmmh, there are certainly applications (e.g. millepede alignment) that need more than 1 thread, see e.g.:

process.AlignmentProducer.algoConfig.pedeSteerer.pedeCommand = "export OMP_STACKSIZE=20M; MKL_THREADING_LAYER=GNU; export OMP_NUM_THREADS=10; export MKL_NUM_THREADS=10; \

@cms-sw/alca-l2 @henriettepetersen @TomasKello FYI

@smuzaffar
Copy link
Contributor

may be call to omp_set_num_threads() within cmsRun process

-  OMP_NUM_THREADS (if present) specifies initially the number of threads;
-  calls to omp_set_num_threads() override the value of OMP_NUM_THREADS;
-  the presence of the num_threads clause overrides both other values.

@VinInn
Copy link
Contributor Author

VinInn commented May 27, 2024

In my opinion NO application should rely on default settings and should properly taylor the number of threads to the task in hands and the environment (as done for edm). Therefore setting export OMP_NUM_THREADS=1 is perfectly fine in my opinion.
btw: do not understand what nproc has to to with openmp
in any case

[innocent@devfu-c2b04-44-01 ~]$ nproc --all
256
[innocent@devfu-c2b04-44-01 ~]$ export OMP_NUM_THREADS=1
[innocent@devfu-c2b04-44-01 ~]$ nproc --all
256
[innocent@devfu-c2b04-44-01 ~]$ nproc
1

@VinInn
Copy link
Contributor Author

VinInn commented May 27, 2024

For those interested to what others do w/r/t this issue please have a look to this
https://github.com/joblib/threadpoolctl/blob/master/README.md

this is for python, still something similar in C++ could become useful in cmssw

@VinInn
Copy link
Contributor Author

VinInn commented May 27, 2024

instead of nproc

taskset -pc $$
pid 2177350's current affinity list: 0-255
taskset -p $$
pid 2177350's current affinity mask: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

is most probably more reliable
(parsing the list/mask is left as an exercise to the reader)

@makortel
Copy link
Contributor

From the framework perspective

  • We are concerned relying on environment variables would be brittle
  • We would much prefer to completely disallow OpenMP threading being used in shared objects loaded by cmsRun
    • If this is not completely possible, the CMSSW components using OpenMP (directly or indirectly) should explicitly set the number of OpenMP threads to 1
    • We could look into tools that would identify OpenMP dependency and flag it during a PR

Regarding XGBoost

@mmusich
Copy link
Contributor

mmusich commented May 28, 2024

There seems to exist a pat::XGBooster abstraction already in PhysicsTools/PatAlgos, that does the proper XGBoosterSetParam(..., "nthread", "1") call

  • The abstraction should really be moved to its own package, do decouple it and PhysicsTools/PatAlgos dependencies
  • One option for PhotonXGBoostEstimator would be to use this XGBooster abstraction

This seemed relatively easy to do so I gave it a go at #45085. In case the GBRForest implementation is already underway we can discard it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants