Power Consumption of Heterogeneous Clusters

Chameleon Cloud is a large-scale, deeply reconfigurable experimental platform built to support computer science systems research. It hosts bare-metal nodes in many configurations and gives users full control of the software stack, including root privileges, kernel customization, and console access.

Chameleon's Heterogeneity

Chameleon supports x86_64 (Intel) and aarch64 (Cavium, QLogic, Fujitsu) CPU architectures as well as Nvidia and AMD GPUs. Specific model information can be found in Chameleon's Hardware Discovery.

Note: Chameleon does not currently host AMD CPUs, but they can be added to the CHI@EVL site of Chameleon Cloud.

Power consumption and capping

Different units of a heterogeneous system may expose power and energy readings in different ways. For example, Intel processors from Sandy Bridge onward provide the RAPL (Running Average Power Limit) interface via the Linux kernel under /sys/devices/virtual/powercap/intel-rapl. For GPUs, Nvidia provides nvidia-smi (System Management Interface). These tools allow power management by interacting with the model-specific registers (MSRs).

Chameleon provides root access, which allows us to use these tools and read the necessary files, such as the intel-rapl files in the kernel's /sys interface.
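
As a minimal sketch of what that looks like in practice (assuming a node that exposes the intel-rapl powercap driver and, optionally, an Nvidia GPU with nvidia-smi on the PATH), both interfaces can be polled from a small script:

```python
# Sketch: poll CPU package energy (RAPL sysfs) and GPU power (nvidia-smi).
# Paths and tool availability depend on the node; root access is assumed.
import subprocess

RAPL_PKG0 = "/sys/class/powercap/intel-rapl/intel-rapl:0"

def read_package_energy_uj(rapl_dir=RAPL_PKG0):
    """Return the cumulative package energy counter in microjoules."""
    with open(f"{rapl_dir}/energy_uj") as f:
        return int(f.read())

def read_gpu_power_w():
    """Query instantaneous GPU board power draw (watts) via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"]
    )
    return [float(line) for line in out.decode().splitlines()]

if __name__ == "__main__":
    print("CPU package 0 energy (uJ):", read_package_energy_uj())
    try:
        print("GPU power draw (W):", read_gpu_power_w())
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi not available on this node")
```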

  • Intel CPUs (Sandy Bridge and later): Intel RAPL, Kernel docs
  • Nvidia GPUs (2011 and later): nvidia-smi
  • ARM: ACPI (unsure about this ???)
  • AMD Family 17h and 19h: RAPL (same register contents, but the MSR numbers differ); 17h support Kernel Patch, 19h support Kernel Patch
  • AMD GPUs: ???
  • FPGAs: ???

A more comprehensive list can be found in this article.

On Chameleon ~ Experiment on an Intel Skylake node using RAPL on Chameleon Cloud

Power monitoring

This experiment demonstrates energy monitoring of an Intel Skylake CPU using RAPL on Chameleon Cloud.

# lscpu
Architecture:          x86_64
CPU(s):                48
Thread(s) per core:    2
Core(s) per socket:    12
Model name:            Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz

This node was leased at the CHI@TACC site and contains two physical CPU packages, package 0 and package 1 (exposed as intel-rapl:0 and intel-rapl:1).

The energy consumption can be obtained by reading a pair of files for each package. For package 0, read the files:

  • /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
  • /sys/class/powercap/intel-rapl/intel-rapl:0/max_energy_uj

and for package N:

  • /sys/class/powercap/intel-rapl/intel-rapl:{N}/energy_uj
  • /sys/class/powercap/intel-rapl/intel-rapl:{N}/max_energy_uj

energy_uj is a counter of energy consumed in microjoules; when this counter reaches the value of max_energy_uj, it wraps around to 0.
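
As a minimal sketch (paths as above, root access assumed), the wraparound can be accounted for when computing the energy consumed between two reads:

```python
# Sketch: energy consumed by package 0 between two reads of energy_uj,
# handling the counter wrapping back to 0 at max_energy_uj.
import time

PKG0 = "/sys/class/powercap/intel-rapl/intel-rapl:0"

def read_uj(name):
    with open(f"{PKG0}/{name}") as f:
        return int(f.read())

max_uj = read_uj("max_energy_uj")
before = read_uj("energy_uj")
time.sleep(10)                      # run the workload of interest here instead
after = read_uj("energy_uj")

# If the counter wrapped, add the part consumed before the wrap.
delta_uj = after - before if after >= before else (max_uj - before) + after
print(f"package 0 consumed {delta_uj / 1e6:.2f} J over 10 s")
```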

The energy plots below were produced by querying these files over a 30 s window while a 10 s stress test ran on 32 of the cores.

[Energy plots]

The power plots were created by sampling the energy counters with a 1 s window:

[Power plot]
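
A sketch of how such power samples could be collected (the package paths follow the listing above; the load can be generated with, e.g., the stress utility in another shell):

```python
# Sketch: sample energy_uj once per second for both packages and convert the
# deltas to watts (uJ -> J, divided by the 1 s window). Run something like
# `stress --cpu 32 --timeout 10` in another shell to reproduce the load.
import time

PKGS = ["/sys/class/powercap/intel-rapl/intel-rapl:0",
        "/sys/class/powercap/intel-rapl/intel-rapl:1"]

def read_uj(path):
    with open(path) as f:
        return int(f.read())

def sample_power(window_s=1.0, duration_s=30):
    maxes = [read_uj(f"{p}/max_energy_uj") for p in PKGS]
    prev = [read_uj(f"{p}/energy_uj") for p in PKGS]
    for _ in range(int(duration_s / window_s)):
        time.sleep(window_s)
        curr = [read_uj(f"{p}/energy_uj") for p in PKGS]
        watts = []
        for e0, e1, mx in zip(prev, curr, maxes):
            delta = e1 - e0 if e1 >= e0 else (mx - e0) + e1   # wraparound
            watts.append(delta / 1e6 / window_s)
        print(" ".join(f"{w:.1f} W" for w in watts))
        prev = curr

if __name__ == "__main__":
    sample_power()
```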

Power capping

Power capping can be performed by writing to the following files:

  • Long term constraint - Package 0: /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_0_power_limit_uw
  • Short term constraint - Package 0: /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_1_power_limit_uw
  • Long term constraint - Package 1: /sys/class/powercap/intel-rapl/intel-rapl:1/constraint_0_power_limit_uw
  • Short term constraint - Package 1: /sys/class/powercap/intel-rapl/intel-rapl:1/constraint_1_power_limit_uw

Each of these files is accompanied by a time_window file giving the window over which the constraint is enforced, e.g. /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_0_time_window_us for the long term constraint of package 0.

You may also find the name of the constraint inside constraint_0_name:

[cc@skylake powerman]$ cat /sys/class/powercap/intel-rapl/intel-rapl:0/constraint_0_name
long_term
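
As a sketch (root required; the limit is written in microwatts and the window in microseconds), a long term cap could be applied to package 0 like this:

```python
# Sketch: apply a long term power cap to package 0 by writing to
# constraint_0_power_limit_uw, e.g. 50 W as in the experiment below.
PKG0 = "/sys/class/powercap/intel-rapl/intel-rapl:0"

def write_value(path, value):
    with open(path, "w") as f:
        f.write(str(value))

def cap_package0(limit_w, window_s=None):
    write_value(f"{PKG0}/constraint_0_power_limit_uw", int(limit_w * 1e6))
    if window_s is not None:
        write_value(f"{PKG0}/constraint_0_time_window_us", int(window_s * 1e6))

cap_package0(50)  # 50 W long term cap on package 0
```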

The plot below shows a 50 W cap applied to package 0 while performing a stress test: [Power plot under a 50 W cap]

Chameleon reserving resources

Chameleon resources are available at multiple sites, e.g., CHI@TACC, CHI@UC, and CHI@Edge. Each site hosts its own resources for projects to use, and resources can be leased at each Chameleon site. The maximum length of a lease is 7 days. You can find more details about reservations here.

Once a reservation is obtained for a device, a bare-metal instance can be launched with a choice of images such as Ubuntu, CentOS, etc. More on launching instances and setting up SSH access can be found here.
