Add kathleen config #280
Conversation
Yeah, I wouldn't worry too much if a benchmark in a draft PR (#227) doesn't run properly. However, it seemed like there was an issue with ...
I made some comments but I think I actually need to do my homework and test these things.
benchmarks/reframe_config.py (outdated)
```python
'num_cpus': 80,
'num_cpus_per_core': 2,
'num_sockets': 2,
'num_cpus_per_socket': 20,
```
I feel like this should represent the physical, not virtual, cores. If a benchmark wants to run on a "full node", it would grab the `num_cpus` number and request that many cores per node. I assume with `-l threads=2` it will get rejected by the scheduler? I guess I need to go and test this. But since we are disabling hyperthreading by default, I think the processor config should reflect this.
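To make the concern concrete, this is roughly how a benchmark would grab a full node from the partition's processor config (a sketch only; the test class, partition, and environment names are assumptions, not from this PR):

```python
import reframe as rfm


@rfm.simple_test
class FullNodeCheck(rfm.RunOnlyRegressionTest):
    """Illustrative test that requests one full node."""
    valid_systems = ['kathleen:compute-node']  # partition name assumed
    valid_prog_environs = ['default']
    executable = 'hostname'
    num_tasks = 1  # placeholder; replaced in the hook below

    @run_before('run')
    def request_full_node(self):
        # With 'num_cpus': 80 this requests 80 slots per node, which a
        # queue exposing only the 40 physical cores would reject; with
        # 'num_cpus': 40 it matches what the scheduler hands out.
        self.num_tasks = self.current_partition.processor.num_cpus
        self.num_tasks_per_node = self.num_tasks
```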
20 is the number of physical cores per socket; this is the output of `lscpu` on a compute node:
```
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                80
On-line CPU(s) list:   0-79
Thread(s) per core:    2
Core(s) per socket:    20
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz
Stepping:              7
CPU MHz:               2500.000
BogoMIPS:              5000.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              28160K
NUMA node0 CPU(s):     0-19,40-59
NUMA node1 CPU(s):     20-39,60-79
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities
```
Are you suggesting we should set `num_cpus_per_core=1` and then `num_cpus=40`?
> But since we are disabling hyperthreading by default

What do you mean exactly?
> Are you suggesting we should set `num_cpus_per_core=1` and then `num_cpus=40`?
Yes, I think we should. We do the same e.g. on Archer2 where we set
excalibur-tests/benchmarks/reframe_config.py
Lines 60 to 64 in 3a9f51f
```python
'processor': {
    'num_cpus': 128,
    'num_cpus_per_core': 1,
    'num_sockets': 2,
    'num_cpus_per_socket': 64,
```
even though the number of CPU(s) reported by `lscpu` is 256, with 2 threads per core.
At least we should be consistent in how we set this across our systems.
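Applied to Kathleen, whose `lscpu` output above shows 2 sockets × 20 physical cores, the same convention would give values along these lines (a sketch of the proposal, not the merged config):

```python
'processor': {
    'num_cpus': 40,            # physical cores only, as on Archer2
    'num_cpus_per_core': 1,    # ignore the 2 hyperthreads per core
    'num_sockets': 2,
    'num_cpus_per_socket': 20,
},
```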
benchmarks/reframe_config.py (outdated)
```python
},
{
    'name': 'threads',
    'options': ['-l threads=2'],  # disable hyperthreading (default). To enable, use threads=1
```
Can you do this on the command line? Editing this file to toggle this is not ideal.
ok, I misunderstood how the resource should be declared/used. How's this:
```diff
-'options': ['-l threads=2'],  # disable hyperthreading (default). To enable, use threads=1
+'options': ['-l threads={virtual_cpus_per_thread}'],  # default is 2 to disable hyperthreading. To enable, use {'threads': {'virtual_cpus_per_thread': 1}}
```
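For reference, that resource would sit in the partition's `resources` list like this (a sketch; in ReFrame the `{virtual_cpus_per_thread}` placeholder is filled from a test's `extra_resources`, and the option is only emitted when a test actually requests the `threads` resource):

```python
'resources': [
    {
        'name': 'threads',
        # Placeholder filled from extra_resources; nothing is emitted
        # for tests that don't request the 'threads' resource.
        'options': ['-l threads={virtual_cpus_per_thread}'],
    },
],
```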
The above is the value of `self.extra_resources` inside a benchmark that wants to enable hyperthreading.
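For instance, a minimal opt-in test could look like this (the class name and partition/environment names are illustrative, not from this PR):

```python
import reframe as rfm


@rfm.simple_test
class HyperthreadedBench(rfm.RunOnlyRegressionTest):
    """Illustrative benchmark that opts back into hyperthreading."""
    valid_systems = ['kathleen:compute-node']  # partition name assumed
    valid_prog_environs = ['default']
    executable = 'hostname'
    # Fills the {virtual_cpus_per_thread} placeholder of the partition's
    # 'threads' resource, so the job script gets '-l threads=1'.
    extra_resources = {'threads': {'virtual_cpus_per_thread': 1}}
```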
Maybe this should be controlled via `use_multithreading`? At the moment the SGE scheduler backend in ReFrame doesn't do anything special about that, but if desired we could do something like https://github.com/reframe-hpc/reframe/blob/2b289c55e1c1015dd7f2dd04fdeac33f3a30305d/reframe/core/schedulers/slurm.py#L199-L202 for Slurm (as long as SGE has some generic enough configuration for this).
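To sketch the idea (purely hypothetical; this mirrors how the linked Slurm backend maps the test's `use_multithreading` flag, exposed as `job.use_smt`, onto `--hint=[no]multithread`, and reuses the threads=1/threads=2 convention from this config):

```python
def sge_smt_directive(use_smt):
    """Hypothetical mapping of ReFrame's use_multithreading flag to an
    SGE directive, following this PR's convention: threads=1 enables
    hyperthreading, threads=2 disables it."""
    if use_smt is None:
        return []  # leave the scheduler default untouched
    return [f'#$ -l threads={1 if use_smt else 2}']


print(sge_smt_directive(True))   # ['#$ -l threads=1']
print(sge_smt_directive(False))  # ['#$ -l threads=2']
```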
I agree it would be great if this could be handled by ReFrame and controlled by `use_multithreading` in a benchmark.
The sombrero example runs fine. The conquest one doesn't, so probably the Spack environment needs tweaking; but the conquest benchmark is itself under development, so I don't want to delay merging this system because of it. Discuss.