maxvmem is reported incorrectly in qstat -j #36

hoppo · 2024-07-06T17:27:25Z

Hi,

I've noticed something curious about how memory is reported, but perhaps I've missed some configuration somewhere.
We previously used OGS/GE 2011.11p1, and the behaviour there was different from what we are seeing in this version (SGE r191.8.1.9-180-g657a9d1).

We use vmem as a limiting resource.
If a user consumes more vmem than they requested, their job is killed.
However, I have seen jobs that were killed for exceeding their request but whose reported maxvmem (checked with qacct after the job died) was much lower than what was requested.

I checked a running job with top, and it shows the following:

 PID    USER       PR  NI  VIRT    RES     SHR  S   %CPU  %MEM  TIME+   COMMAND                
 197143 xxxxxx     20   0  107.0g  11.3g   4.4g S   5.3   2.2   5:46.06 xxxxxxxxxx.......
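In top, the VIRT and RES columns correspond to the kernel's VmSize and VmRSS counters in /proc/<pid>/status (both given in kB). Below is a minimal Linux-only sketch that reads those counters for a PID, along with the peak values VmPeak (peak virtual size) and VmHWM (peak resident size), so they can be compared with qstat's vmem and maxvmem. The helper name read_vm_fields is hypothetical.

```python
import os
import sys


def read_vm_fields(pid: int) -> dict[str, int]:
    """Hypothetical helper: return VmPeak/VmSize/VmHWM/VmRSS for a PID, in KiB (Linux only)."""
    wanted = {"VmPeak", "VmSize", "VmHWM", "VmRSS"}
    fields: dict[str, int] = {}
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            key, _, rest = line.partition(":")
            if key in wanted:
                # Lines look like "VmRSS:   11837440 kB"; keep the number only.
                fields[key] = int(rest.split()[0])
    return fields


if __name__ == "__main__":
    # Pass the job's PID (e.g. 197143 from the top output); defaults to this process.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for key, kib in read_vm_fields(pid).items():
        print(f"{key:<7} {kib / 1024**2:8.2f} GiB")
```

For the job above, VmSize should show roughly 107 GiB and VmRSS roughly 11.3 GiB, matching top's VIRT and RES.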

When I check the same job with the qstat -j job_id command, I see the following:

 usage         7:            cpu=00:05:04, mem=2570.35349 GB s, io=51.30660 GB, vmem=11.082G, maxvmem=11.108G

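It would seem that qstat (and qacct) are reporting resident memory (RES) as vmem: vmem=11.082G and maxvmem=11.108G are close to top's RES of 11.3g, not to its VIRT of 107.0g.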
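To make the comparison explicit, here is a small sketch that parses the usage line quoted above and divides maxvmem by top's RES and VIRT values. The helpers parse_usage and to_bytes are hypothetical. Treating qstat's G suffix as 1024-based (like top's g) is an assumption; if it were 1000-based, the ratios would shift by about 7% and the conclusion would stay the same.

```python
import re

# The usage line quoted in this issue, copied verbatim.
QSTAT_LINE = (
    "usage         7:            cpu=00:05:04, mem=2570.35349 GB s, "
    "io=51.30660 GB, vmem=11.082G, maxvmem=11.108G"
)

# Assumption: qstat's K/M/G/T suffixes are 1024-based, matching top's k/m/g.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_usage(line: str) -> dict[str, str]:
    """Hypothetical helper: split 'usage N: key=value, ...' into raw value strings."""
    _, _, body = line.partition(":")  # drop the "usage N" prefix (first colon only)
    fields: dict[str, str] = {}
    for item in body.split(","):
        key, sep, value = item.strip().partition("=")
        if sep:
            fields[key] = value.strip()
    return fields


def to_bytes(size: str) -> float:
    """Hypothetical helper: convert a size such as '11.108G' to bytes."""
    match = re.fullmatch(r"([0-9.]+)([KMGT]?)", size.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    number, suffix = match.groups()
    return float(number) * _UNITS[suffix]


if __name__ == "__main__":
    GIB = 1024**3
    usage = parse_usage(QSTAT_LINE)
    maxvmem = to_bytes(usage["maxvmem"])
    top_virt, top_res = 107.0 * GIB, 11.3 * GIB  # values from the top output
    print(f"maxvmem / RES  = {maxvmem / top_res:.2f}")
    print(f"maxvmem / VIRT = {maxvmem / top_virt:.2f}")
```

Run as a script, this prints ratios of 0.98 against RES and 0.10 against VIRT. maxvmem tracks resident memory, not virtual size.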

OS: Ubuntu 22.04

Any help or suggestions would be welcome.
