
Monday, May 7, 2012

debugging nagios remote nrpe commands

Nagios NRPE debugging steps:
  1. run the command manually on the target host
  2. enable debugging in nrpe.cfg and watch syslog
  3. dig in deeper with debug jobs.
Debugging nagios remote nrpe commands can feel very opaque. Normally I find my issue in step 1 of the debug escalation. Today I had to hit all three steps while debugging a test that wrapped around s3cmd. I eventually found that HOME is not set in the environment used to run nrpe commands.
check_nrpe -H 10.7.202.92 -c check_ui_s3_backup                                 
NRPE: Unable to read output
I run my check remotely and receive the dreaded general error "unable to read output." This means the script failed to run and didn't produce any output to STDOUT. STDERR seems to be ignored, even with logging enabled.
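Since only STDOUT makes it back, one way to see the real error is to fold STDERR into STDOUT while debugging. A minimal sketch, assuming a made-up helper called run_with_stderr and a stand-in failing_plugin (neither is part of NRPE or my actual check):

```shell
#!/bin/sh
# Hypothetical debugging helper: run a plugin with STDERR merged into STDOUT,
# so error messages end up in the output that gets read back.
run_with_stderr() {
    "$@" 2>&1
}

# Stand-in for a plugin that fails and only writes to STDERR.
failing_plugin() {
    echo "ERROR: Configuration file not available." >&2
    return 3
}

# The stand-in exits non-zero on purpose, hence the "|| true".
plain_output=$(failing_plugin 2>/dev/null) || true
merged_output=$(run_with_stderr failing_plugin) || true

# Without the wrapper STDOUT is empty; with it the error text is captured.
echo "plain:  '${plain_output}'"
echo "merged: '${merged_output}'"
```

NRPE appears to hand the command line to a shell (the HOME=... prefix in the command definition relies on that), so temporarily appending 2>&1 to the command in the nrpe config should have the same effect.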

Step 1a: go to the server and verify the command being run by nrpe.

[andrew@ip-10-7-202-92]% grep check_ui_s3_backup /etc/nagios/nrpe.d/herbie.cfg
command[check_ui_s3_backup]=HOME=~postgres /usr/lib/nagios/plugins/herbie/check_ui_s3_backup
Step 1b: run the command manually. Here I find that the script fails if I don't have the config file:
[andrew@ip-10-7-202-92]% /usr/lib/nagios/plugins/herbie/check_ui_s3_backup
ERROR: /home/andrew/.s3cfg: No such file or directory
ERROR: Configuration file not available.
ERROR: Consider using --configure parameter to create one.
That should be a simple fix. Find the user running the nrpe command and give them a .s3cfg. Easy-Peasy.
cp .s3cfg ~nagios/
sudo -u nagios -H /usr/lib/nagios/plugins/herbie/check_ui_s3_backup
OK - Last backup 0 days ago.
Ok, it works locally. Recheck it remotely. It fails?!!?! This is where we start the gnashing of teeth and pulling of hair.
[andrew@ip-10-7-203-10]% check_nrpe -H 10.7.202.92 -c check_ui_s3_backup
NRPE: Unable to read output
Step 2: Enable logging in nrpe.cfg, run the remote check and inspect the logs. Surprise, nothing useful.
May  7 19:08:29 ip-10-7-202-92 nrpe[17159]: Connection from 10.7.203.10 port 15286
May  7 19:08:29 ip-10-7-202-92 nrpe[17159]: Host address is in allowed_hosts
May  7 19:08:29 ip-10-7-202-92 nrpe[17159]: Handling the connection...
May  7 19:08:29 ip-10-7-202-92 nrpe[17159]: Host is asking for command 'check_ui_s3_backup' to be run...
May  7 19:08:29 ip-10-7-202-92 nrpe[17159]: Running command: /usr/lib/nagios/plugins/herbie/check_ui_s3_backup
May  7 19:08:29 ip-10-7-202-92 nrpe[17159]: Command completed with return code 3 and output: 
May  7 19:08:29 ip-10-7-202-92 nrpe[17159]: Return Code: 3, Output: NRPE: Unable to read output
May  7 19:08:29 ip-10-7-202-92 nrpe[17159]: Connection from 10.7.203.10 closed.
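The log lines above only show up when NRPE's debug directive is switched on. A small sketch for checking that, assuming the stock debug=0/debug=1 setting and a config path of /etc/nagios/nrpe.cfg (the path is an assumption and the helper name is made up):

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given nrpe config file
# contains an uncommented "debug=1" line.
nrpe_debug_enabled() {
    grep -Eq '^[[:space:]]*debug[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}

# Assumed default location; override with NRPE_CFG if yours differs.
cfg_file=${NRPE_CFG:-/etc/nagios/nrpe.cfg}
if [ -r "$cfg_file" ] && nrpe_debug_enabled "$cfg_file"; then
    echo "debug logging is on"
else
    echo "debug logging is off (or $cfg_file is not readable)"
fi
```

After flipping the setting, nrpe needs a restart before the extra lines appear in syslog.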
Step 3: debug jobs (aka printf aka "Hail Mary" debugging). I create two new nrpe entries and restart nagios-nrpe-server. The first will show me the user running the command and the second will show the environment, using whoami and env respectively.
command[check_ui_test]=whoami
command[check_ui_test2]=env
[andrew@ip-10-7-203-10]% check_nrpe -H 10.7.202.92 -c check_ui_test
nagios
[andrew@ip-10-7-203-10]% check_nrpe -H 10.7.202.92 -c check_ui_test2
NRPE_PROGRAMVERSION=2.12
TERM=screen-256color-bce
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
LANG=en_US.UTF-8
NRPE_MULTILINESUPPORT=1
PWD=/
Yep, the tests are running as the expected user, nagios.

Holy Schmoly! Look at that environment! HOME is not set. Simple enough to fix for my check, but wow was I not expecting that. Also useful to note the minimal PATH and that the working directory is /.
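Knowing this, the failure can be reproduced locally without going through nrpe at all. A rough sketch using env -i to start from an empty environment, assuming a made-up helper called run_like_nrpe; it only sets PATH (copied from the output above) and skips TERM and LANG, so it approximates the nrpe environment rather than matching it exactly:

```shell
#!/bin/sh
# Hypothetical helper: run a command with roughly the environment seen above:
# only PATH set, no HOME, working directory /.
run_like_nrpe() {
    (
        cd / || exit 1
        env -i PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin "$@"
    )
}

# Print the environment a check would see; HOME should be missing.
run_like_nrpe env
```

Running the real check this way as the nagios user, for example with sudo -u nagios env -i PATH=... followed by the plugin path, should fail the same way the remote check did.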

Update the check to explicitly set HOME in the environment:

command[check_ui_s3_backup]=HOME=~nagios /usr/lib/nagios/plugins/herbie/check_ui_s3_backup
Restart nrpe:
[andrew@ip-10-7-202-92]% sudo service nagios-nrpe-server restart
 * Stopping nagios-nrpe nagios-nrpe
   ...done.
 * Starting nagios-nrpe nagios-nrpe
   ...done.
Check:
[andrew@ip-10-7-203-10]% check_nrpe -H 10.7.202.92 -c check_ui_s3_backup
OK - Last backup 0 days ago.
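An alternative to setting HOME in the nrpe command definition would be to have the plugin defend itself. This is not what I did here, just a sketch: ensure_home is a made-up name, and the /var/lib/nagios fallback is an assumption about where the nagios user lives.

```shell
#!/bin/sh
# Hypothetical plugin preamble: make sure HOME is set before calling tools
# (like s3cmd) that look for their config file under it.
ensure_home() {
    fallback=$1
    if [ -z "${HOME:-}" ]; then
        # Look up the current user's home directory in the passwd database.
        HOME=$(getent passwd "$(id -u)" 2>/dev/null | cut -d: -f6)
        # If the lookup yields nothing, use the caller-supplied fallback.
        HOME=${HOME:-$fallback}
    fi
    export HOME
}

ensure_home /var/lib/nagios   # fallback path is an assumption
echo "HOME=$HOME"
```

The upside is that the check behaves the same from a login shell and from nrpe; the downside is that the fix is hidden inside the script instead of being visible in the nrpe config.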