We are porting various alerts from Nagios to the Prometheus ecosystem, and we've found one check that is quite useful in Nagios but seems to be missing from the node exporter. It looks at ext filesystems with the `tune2fs -l` command and (basically) greps for the `FS Error count` field.
This should normally be zero, but under certain circumstances (failing disk, filesystem bug, power outage) it will rise. Running `fsck` on the filesystem will fix this (normally a reboot after a power outage runs fsck, but under certain circumstances it might not do so fully).
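For illustration, the "grep for the field" step could be sketched roughly as below. This is not the Nagios check itself, only a minimal sketch: the function names are made up for this example, and it assumes the field is printed as `FS Error count:` followed by an integer and that a missing line means zero errors.

```python
import re
import subprocess

# Matches a line such as "FS Error count:           3" in `tune2fs -l` output.
_ERROR_COUNT_RE = re.compile(r"^FS Error count:\s*(\d+)\s*$", re.MULTILINE)


def parse_fs_error_count(tune2fs_output: str) -> int:
    """Return the error count found in `tune2fs -l` output.

    Assumption: if the field is not printed at all, the count is zero.
    """
    match = _ERROR_COUNT_RE.search(tune2fs_output)
    return int(match.group(1)) if match else 0


def read_fs_error_count(device: str) -> int:
    """Run `tune2fs -l DEVICE` (usually needs root) and parse its output."""
    result = subprocess.run(
        ["tune2fs", "-l", device],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_fs_error_count(result.stdout)
```

An alert would then fire whenever `read_fs_error_count("/dev/sda1")` (device path is only an example) returns something other than zero.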
So I think the node exporter should do this. I've tried to find metrics about this in our node exporters and couldn't find anything under the `node_filesystem_*` namespace. There is `node_filesystem_readonly` and, according to this post, `node_filesystem_device_error` (but I can't see that metric here), but neither of those is the same as the error count.
Am I missing something, or is this missing from the node exporter?
Here's a copy of the check, called `dsa-check-filesystems` here:
right, running tune2fs seemed like an odd idea in the first place, i was hoping for something exactly like that.
the PR you linked to has been merged, so we're getting close? :)
i don't quite understand what it takes to percolate stuff from procfs into the node exporter itself. would we now need a stub to call that `ext4.fs.ProcStat()` thing next? or does procfs need to make a release first?
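In the meantime, one possible stopgap (a sketch under assumptions, not how the node exporter or procfs does it) is to read the per-filesystem `errors_count` file the kernel exposes under `/sys/fs/ext4/<device>/` and write the result where the node exporter's textfile collector picks it up. The metric name `ext4_fs_errors_count` and the function name are hypothetical, chosen only for this example.

```python
from pathlib import Path

# Hypothetical metric name, not an existing node exporter metric.
METRIC = "ext4_fs_errors_count"


def ext4_error_metrics(sysfs_root: str = "/sys/fs/ext4") -> str:
    """Render one Prometheus text-format line per mounted ext4 filesystem.

    Assumption: each mounted filesystem has a directory under sysfs_root
    containing an `errors_count` file holding a single integer.
    """
    lines = []
    for counter in sorted(Path(sysfs_root).glob("*/errors_count")):
        device = counter.parent.name
        try:
            value = int(counter.read_text().strip())
        except (OSError, ValueError):
            # Skip entries that vanish or do not hold a plain integer.
            continue
        lines.append(f'{METRIC}{{device="{device}"}} {value}')
    return "\n".join(lines) + ("\n" if lines else "")
```

The returned text could be written to a `.prom` file in the textfile collector directory from a cron job, which gives an alertable metric without running `tune2fs`.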