Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CPU Utilization query reporting > 100% for certain instances #491

Open
chadbaldwin opened this issue Aug 20, 2024 · 1 comment
Open

CPU Utilization query reporting > 100% for certain instances #491

chadbaldwin opened this issue Aug 20, 2024 · 1 comment

Comments

@chadbaldwin
Copy link

Here from SQL Server Community Slack chat:
https://app.slack.com/client/T1LTZ0BQV/C1MS1RA4B

Note: I am not currently a SQLWATCH user, but while doing research I found the query being used by SQLWATCH. Leaving this issue here by request of @marcingminski

I came across a bug regarding this commonly used query:

select
--original PR https://github.com/marcingminski/sqlwatch/commit/b8a8a5bbaf134dcd6afb4d5b9fef13e052a5c164
--by https://github.com/marcingminski/sqlwatch/commits?author=sporri
@date_snapshot_current
, percent_processor_time=convert(real,ProcessUtilization)
, percent_idle_time=convert(real,SystemIdle)
FROM (
SELECT SystemIdle=record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int'),
ProcessUtilization=record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int')
FROM (
SELECT TOP 1 CONVERT(xml, record) AS [record]
FROM sys.dm_os_ring_buffers WITH (NOLOCK)
WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR' collate database_default
AND record LIKE N'%<SystemHealth>%' collate database_default
ORDER BY [timestamp] DESC
) AS x
) AS y
OPTION (keep plan);

(Also used in dbadash and other tools/scripts).

It seems there is an issue when running this query against non-hypervisor instances. The ProcessUtilization value does not appear to be 100-based and I have not yet determined what it actually is based on. Just working on theory at this point. The value starts at 0, like normal, but as SQL load increases, the value quickly surpasses where you would expect it to be.

For example, I have an instance with the following stats:

cpu_count                      : 224
hyperthread_ratio              : 56
max_workers_count              : 7552
scheduler_count                : 224
scheduler_total_count          : 315
affinity_type_desc             : AUTO
virtual_machine_type_desc      : NONE
softnuma_configuration_desc    : ON
process_physical_affinity      : {
                                   {0,ffffffffffffff}
                                   {1,ffffffffffffff}
                                   {2,ffffffffffffff}
                                   {3,ffffffffffffff}
                                 }
socket_count                   : 4
cores_per_socket               : 28
numa_node_count                : 16
container_type_desc            : NONE

When I run the linked query above, I get:

percent_processor_time    : 180
percent_idle_time         : 55

My theory is that it might be multiplied by the number of sections in process_physical_affinity, but that is a complete guess for now.

One solution I have considered is to just make the query "less bad". For example, if virtual_machine_type_desc is NONE then do not use ProcessUtilization, instead use 100-SystemIdle, as SystemIdle does seem to reliably be 100-based. Unfortunately this lumps OS load into there, but that seems better than having nothing or something completely wrong.

@chadbaldwin
Copy link
Author

UPDATE: I guess I've got blinders on and I have been looking at this problem for too long. I don't think there is any need to look at process_physical_affinity. As it seems when virtual_machine_type_desc is NONE then the number of sections in process_physical_affinity is equal to the socket count.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant