Can we use a different number of OpenMP threads per MPI process? #13408
Thanks. I will try it.
We can set a different number of threads per MPI process by calling omp_set_num_threads(my_rank_nthr); see below:
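(The attached code is not reproduced here. A minimal sketch of the idea, with an illustrative rank-to-thread-count mapping, might look like this:)

```c
/* Sketch: choose an OpenMP thread count per MPI rank.
   The mapping from rank to my_rank_nthr is only illustrative. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* e.g., give rank 0 eight threads and every other rank one */
    int my_rank_nthr = (rank == 0) ? 8 : 1;
    omp_set_num_threads(my_rank_nthr);

    #pragma omp parallel
    {
        printf("rank %d, thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```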
Sure, but the question is -- how do you tell Slurm to provide a different number of cores to the MPI processes? Slurm cannot wait until FDS starts to allocate the extra cores to certain MPI processes.
I think this is what we need:
I tried several options on spark, including heterogeneous job submission. The following configuration worked for me (see the attached code and job submission script; a sketch is given after this comment):
The output is as follows:
It appears that the MPI rank (RANK_ID) is assigned sequentially based on the order of processes in the mpirun command. This means that if you know which MPI process requires more threads, you can simply place it at the appropriate position in the mpirun command.
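(The attached script is not reproduced above. A sketch of this kind of submission, assuming Intel MPI -- suggested by the I_MPI_* variables used later in this thread -- with colon-separated argument sets and a per-set -env; the binary name and all counts are placeholders:)

```bash
#!/bin/bash
#SBATCH --ntasks=30        # total cores requested for the job

# Ranks are numbered in the order the argument sets appear in the
# mpirun command, so the process that needs 8 threads becomes rank 0.
mpirun -np 1 -env OMP_NUM_THREADS 8  ./a.out : \
       -np 2 -env OMP_NUM_THREADS 11 ./a.out
```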
How do you know that you have been allocated 30 cores?
I am not sure. Let me check if there is a command to know how many cores are allocated to a job.
You can look at the .log file. For example, in race_test_4.log I get this, with one MPI job pinned to 4 CPUs: [0] MPI startup(): ===== CPU pinning =====
This is what the pinning looks like for the sample case:
I assume that we only get 1 core per MPI process. The OpenMP threads are, I suppose, crammed onto a single core.
Thanks, Jason and Kevin. CPU pinning is a great idea, and I'm currently testing it. I've also added a C code that retrieves the CPU ID at runtime (using the sched_getcpu() system call) to show which CPU a particular thread is executing on. The code is attached; a minimal sketch of the same idea is given after the test list below.
Test 1: 1 MPI, 4 Threads, only 1 CPU requested through ntasks, and I_MPI_PIN_DOMAIN is not set.
Test 2: 1 MPI, 4 Threads, only 1 CPU requested through ntasks, I_MPI_PIN_DOMAIN=omp.
Test 3: 1 MPI, 4 Threads, 4 CPUs requested through ntasks, I_MPI_PIN_DOMAIN=omp.
Test 4: 1 MPI, 8 Threads, 8 CPUs requested through ntasks, I_MPI_PIN_DOMAIN=omp.
Now, on to multiple MPI processes with different thread counts:
Test 6: 3 MPI, 30 total threads, 30 CPUs requested through ntasks, I_MPI_PIN_DOMAIN=omp, but taskset was used to bind the CPUs directly.
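(The CPU-ID probe mentioned above is attached rather than reproduced. A minimal sketch of the same idea, assuming glibc's sched_getcpu(), which requires _GNU_SOURCE:)

```c
/* Sketch: report which CPU each OpenMP thread of each MPI rank runs on. */
#define _GNU_SOURCE
#include <sched.h>   /* sched_getcpu() */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        printf("rank %d, thread %d -> cpu %d\n",
               rank, omp_get_thread_num(), sched_getcpu());
    }

    MPI_Finalize();
    return 0;
}
```

(And a Test 6-style launch with explicit taskset binding might look like this sketch; the CPU ranges are placeholders, and Intel MPI's own pinning may need to be disabled with I_MPI_PIN=off for taskset to take effect:)

```bash
# 3 ranks, 30 CPUs total: 8 + 11 + 11, each rank bound to its own range
mpirun -np 1 -env OMP_NUM_THREADS 8  taskset -c 0-7   ./a.out : \
       -np 1 -env OMP_NUM_THREADS 11 taskset -c 8-18  ./a.out : \
       -np 1 -env OMP_NUM_THREADS 11 taskset -c 19-29 ./a.out
```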
Why not this:
Yes, that's perfectly fine. It simply means you'll need to specify each MPI task individually. When you require a single MPI process with a different number of threads, I thought it would be easier to specify just that process separately; the rest can be grouped, as in the sketch below. :)
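(A sketch of that grouping, with placeholder counts and binary name: only the one process that needs extra threads gets its own argument set.)

```bash
mpirun -np 1  -env OMP_NUM_THREADS 8 ./a.out : \
       -np 29 -env OMP_NUM_THREADS 1 ./a.out
```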
For the moment, this is going to be a "special case". I don't want to work this into qfds.sh, for example. My thought would be to use qfds.sh to create a basic script that can be modified. Then I'd like to see if the detailed chemistry cases can be run by assigning more CPUs to the meshes that need it. |
Agreed. Marcos and I will work on making the chemistry call thread-safe. It seems quite possible. |
Here is a simple Hello World program that uses both OpenMP and MPI. Can we run this with a different number of OpenMP threads per MPI process?
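(The attached program is not reproduced here. A minimal sketch of such a hybrid hello world, where the thread count comes from the OMP_NUM_THREADS environment variable, might be:)

```c
/* Sketch of a hybrid MPI + OpenMP hello world. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nranks;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    #pragma omp parallel
    printf("Hello from rank %d of %d, thread %d of %d\n",
           rank, nranks, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}
```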