Not auto acknowledging alerts? #5
Comments
i'm guessing you have a permissions issue... i should probably add to the documentation that you probably want to run this poller as the
I am running it as the Nagios user, but it still doesn't seem to be updating. Running it in debug mode shows that it can pull up a list of current alerts for the Nagios instance, but no acknowledgement comes through.
How is it matching the acknowledgement back to the Nagios alert? We changed the message sent to PagerDuty slightly to include the incident error message in the title. I'm wondering if the script is not able to match the incident back to Nagios because of that.
oh yes, this mechanism does depend on a few environment macros / fields passed by pagerduty_nagios.pl, in particular HOSTDISPLAYNAME, SERVICEDISPLAYNAME, and SERVICEPROBLEMID. the alerts emitted by pagerduty probably use the former 2, and if you have mutated those by adding data like the status message, then this won't be able to use them to identify/key the correct service in the acknowledgement it sends to the nagios command pipe. if this is your problem, then perhaps instead you could mutate the LASTSERVICESTATE macro, which would not interfere this way.
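To illustrate the matching problem described above, here is a minimal Python sketch. It is not the poller's actual code: `match_service`, `format_ack_command`, the sample host and service names, and the flag values are all hypothetical. The only part taken from Nagios itself is the `ACKNOWLEDGE_SVC_PROBLEM` external command written to the command pipe. The sketch shows an exact-match lookup failing once extra text is appended to the service display name.

```python
import time


def match_service(known, host_display, service_display):
    """Exact-match lookup; returns None when either name was altered."""
    return known.get((host_display, service_display))


def format_ack_command(host, service, author, comment, now=None):
    """Build one ACKNOWLEDGE_SVC_PROBLEM line for the Nagios command pipe."""
    ts = int(time.time() if now is None else now)
    # sticky=1, notify=0, persistent=1 are illustrative flag values only;
    # check the external command reference for your Nagios version.
    return f"[{ts}] ACKNOWLEDGE_SVC_PROBLEM;{host};{service};1;0;1;{author};{comment}\n"


# Hypothetical data: (host display name, service display name) -> Nagios names
known = {("web01", "HTTP"): ("web01", "HTTP")}

hit = match_service(known, "web01", "HTTP")
miss = match_service(known, "web01", "HTTP - CRITICAL: timeout")  # mutated name

line = format_ack_command(*hit, author="pagerduty",
                          comment="acked in PagerDuty", now=1700000000)
print(line, end="")
print("mutated name matched:", miss is not None)
```

With the stock names the lookup succeeds and a command line can be written to the pipe; with the status message appended there is nothing to key on, so no acknowledgement would be sent.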
So I just put the command for the PagerDuty alert back to the stock way and it is still not acknowledging incidents back to Nagios. I don't think it's a problem of the incident title at this point. What should I expect to see in the script output in debug mode?
i think at this point i'd need to see some raw data. could you log into pagerduty and click on the incident number and save the page, and also send me your status.dat, and also the contents of /tmp/pd_ack_to_nagios_ack_poller.last_id or wherever you put that state file. could you share these with me via gist or google drive?
ok i got the incident page and last_id file, but not the status.dat. however, i already see a problem... under the second "Details" table (the one with the grey background), i only see 3 fields. i think this means you might have disabled some environment macros? some people do this for performance reasons: https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/tuning.html i would have thought this would have broken https://github.com/PagerDuty/pagerduty-nagios-pl as well, as it is specifically mentioned in its README.md. can you make sure you have
I turned it back on but it's still behaving the same. Here is a zip of all 3 requested items. https://drive.google.com/file/d/0B_dX7tp_c7k0UTZjejRQOHFtREU/view?usp=sharing
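For anyone verifying the same thing, a minimal Python sketch of checking the config follows. It assumes the relevant directive is `enable_environment_macros` in nagios.cfg (an assumption; the thread itself does not name it). The helper name `environment_macros_enabled` and the sample config text are hypothetical.

```python
def environment_macros_enabled(cfg_text):
    """Return the last enable_environment_macros value as a bool, or None if unset."""
    value = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, val = line.partition("=")
        if sep and key.strip() == "enable_environment_macros":
            value = val.strip() == "1"  # a later occurrence overrides an earlier one
    return value


# Hypothetical config contents; in practice, read your own nagios.cfg
sample = "# tuning\nenable_environment_macros=0\nuse_large_installation_tweaks=0\n"
print(environment_macros_enabled(sample))
```

A result of `None` means the directive is not set in the text that was checked. Nagios only picks up a change to this file after a restart.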
i still don't see the necessary fields in your incident detail page... e.g. in my environment there are almost 200 fields. i assume you restarted nagios after setting
(be careful with this though... might be risky depending on workload or what else is running on your nagios server... and you might need to try several times before you catch a plugin process running) also, i wonder what happens if you're using the embedded perl interpreter...
I ran a while loop using the command you gave and I'm seeing some data being created as Nagios is processing stuff. Enabling the embedded perl interpreter did not help with anything. For instance, the cat command outputs stuff like:
Ok, problem solved. It appears the pagerduty_nagios.pl script wasn't running properly. Once that was fixed, this script started working. Thanks for all your assistance!
great! please comment here if you think the details of your solution are something anyone else could benefit from. best of luck!
I am able to get the script to run with the API key I generated, and it seems to be able to pull current alerts in Nagios when I run it in debug, but the acknowledgement posted on PagerDuty never makes its way back to Nagios. It would be great if I could get some insight on how to make this work, as we'd prefer not to have to open up the Nagios portal on the WAN side.