
Add kathleen config #280

Merged: 6 commits into main from ic/add-kathleen-config on Aug 22, 2024

Conversation

@ilectra (Collaborator) commented Mar 7, 2024

The sombrero example runs fine. The conquest one doesn't, so the spack environment probably needs tweaking, but the conquest benchmark is itself under development, so I don't want to delay merging this system because of it. Discuss.

@ilectra requested review from giordano and tkoskela on Mar 7, 2024 at 12:17
@ilectra self-assigned this on Mar 7, 2024
@ilectra changed the title from "Ic/add kathleen config" to "Add kathleen config" on Mar 7, 2024
@tkoskela (Member) commented Mar 7, 2024

Yeah, I wouldn't worry too much if a benchmark in a draft PR (#227) doesn't run properly. However, it seemed like there was an issue with OpenMPI?

@ilectra (Collaborator, Author) commented Mar 8, 2024

Can you do a last review and merge if happy with it, @tkoskela and @giordano? Both intel-mpi and OpenMPI work with the current versions in the spack.yaml.

@tkoskela (Member) left a review:

I made some comments but I think I actually need to do my homework and test these things.

Comment on lines 175 to 178:

    'num_cpus': 80,
    'num_cpus_per_core': 2,
    'num_sockets': 2,
    'num_cpus_per_socket': 20,
Member:

I feel like this should represent the physical, not virtual, cores. If a benchmark wants to run on a "full node", it would grab the num_cpus number and request that many cores per node. I assume that with -l threads=2 it will get rejected by the scheduler? I guess I need to go and test this. But since we are disabling hyperthreading by default, I think the processor config should reflect this.
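
For context, a minimal sketch of how a full-node benchmark can consume these values through ReFrame's partition processor info; the test class, system name, and executable below are placeholders, not taken from this PR:

    # Hypothetical sketch: a benchmark sizing itself to a "full node" from
    # the partition's processor topology. If num_cpus counts hardware
    # threads (80), this oversubscribes the 40 physical cores whenever
    # hyperthreading is disabled.
    import reframe as rfm
    from reframe.core.builtins import run_before


    @rfm.simple_test
    class FullNodeCheck(rfm.RunOnlyRegressionTest):
        valid_systems = ['kathleen']       # placeholder system name
        valid_prog_environs = ['*']
        executable = 'hostname'            # placeholder workload

        @run_before('run')
        def set_tasks_from_topology(self):
            proc = self.current_partition.processor
            self.num_tasks_per_node = proc.num_cpus   # 80 as configured above
            self.num_cpus_per_task = 1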

Member:
20 is the number of physical cores; this is the output of lscpu on a compute node:

    $ lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                80
    On-line CPU(s) list:   0-79
    Thread(s) per core:    2
    Core(s) per socket:    20
    Socket(s):             2
    NUMA node(s):          2
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 85
    Model name:            Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz
    Stepping:              7
    CPU MHz:               2500.000
    BogoMIPS:              5000.00
    Virtualization:        VT-x
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              1024K
    L3 cache:              28160K
    NUMA node0 CPU(s):     0-19,40-59
    NUMA node1 CPU(s):     20-39,60-79
    Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities

Are you suggesting we should set num_cpus_per_core=1 and then num_cpus=40?

Member:

> But since we are disabling hyperthreading by default

What do you mean exactly?

Member:

> Are you suggesting we should set num_cpus_per_core=1 and then num_cpus=40?

Yes, I think we should. We do the same e.g. on Archer2 where we set

    'processor': {
        'num_cpus': 128,
        'num_cpus_per_core': 1,
        'num_sockets': 2,
        'num_cpus_per_socket': 64,

even though the number of CPU(s) reported by lscpu is 256, with 2 threads per core.

At least we should be consistent in how we set this across our systems.
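
Following that convention, the Kathleen entry under discussion would become something like the sketch below (2 sockets of 20 physical cores, per the lscpu output above); this is the shape being proposed, not the merged text:

    'processor': {
        'num_cpus': 40,            # physical cores only; lscpu reports 80 hardware threads
        'num_cpus_per_core': 1,    # ignore SMT, matching the Archer2 entry
        'num_sockets': 2,
        'num_cpus_per_socket': 20,
    },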

Comment on the 'threads' resource definition:

    },
    {
        'name': 'threads',
        'options': ['-l threads=2'],  # disable hyperthreading (default). To enable, use threads=1
Member:

Can you do this on the command line? Editing this file to toggle this is not ideal.

@ilectra (Collaborator, Author):

OK, I misunderstood how the resource should be declared and used. How's this:

Suggested change:

    - 'options': ['-l threads=2'],  # disable hyperthreading (default). To enable, use threads=1
    + 'options': ['-l threads={virtual_cpus_per_thread}'],  # default is 2 to disable hyperthreading. To enable, use {'threads': {'virtual_cpus_per_thread': 1}}

@ilectra (Collaborator, Author):

The above {'threads': {'virtual_cpus_per_thread': 1}} is the value of self.extra_resources inside a benchmark that wants to enable hyperthreading.
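
As a concrete reference, here is a minimal sketch of a benchmark opting back into hyperthreading through that resource; the 'threads' resource name and its placeholder come from the config above, while the test class, system name, and executable are illustrative only:

    import reframe as rfm


    @rfm.simple_test
    class HyperthreadedCheck(rfm.RunOnlyRegressionTest):
        valid_systems = ['kathleen']   # placeholder system name
        valid_prog_environs = ['*']
        executable = 'hostname'        # placeholder workload

        # Fills the {virtual_cpus_per_thread} placeholder of the 'threads'
        # resource, so the generated job script gets '-l threads=1' and
        # hyperthreading is enabled for this test only.
        extra_resources = {'threads': {'virtual_cpus_per_thread': 1}}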

Member:

Maybe this should be controlled via use_multithreading? At the moment the SGE scheduler in ReFrame doesn't do anything special about that, but if desired we could do something like https://github.com/reframe-hpc/reframe/blob/2b289c55e1c1015dd7f2dd04fdeac33f3a30305d/reframe/core/schedulers/slurm.py#L199-L202 for Slurm (as long as SGE has some generic enough configuration for this).
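
For illustration, an SGE analogue of that Slurm snippet might look roughly like the sketch below. It is assumption-laden: it presumes the SGE backend assembles a list of '#$' preamble directives the way the Slurm backend assembles '#SBATCH' lines, and that '-l threads=N' is the right knob on this particular SGE setup; neither is verified here.

    # Hypothetical sketch modelled on the linked Slurm code: map ReFrame's
    # use_multithreading flag (exposed on the job as job.use_smt) onto
    # Kathleen's '-l threads=N' scheduler resource.
    def emit_preamble(self, job):
        preamble = [f'#$ -N "{job.name}"']  # ...plus the rest of the real preamble
        if job.use_smt is not None:
            # threads=1 enables hyperthreading on this system; threads=2
            # keeps one process per physical core (the default above).
            threads = 1 if job.use_smt else 2
            preamble.append(f'#$ -l threads={threads}')
        return preamble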

Member:

I agree it would be great if this could be handled by ReFrame and controlled by use_multithreading in a benchmark.

@tkoskela merged commit f41b992 into main on Aug 22, 2024
6 checks passed
@tkoskela deleted the ic/add-kathleen-config branch on August 22, 2024 at 08:52
github-actions bot pushed a commit that referenced this pull request on Aug 22, 2024