
I have a server with an HP Smart Array hardware RAID controller. To monitor its status, I use cpqarrayd. /etc/default/cpqarrayd contains DAEMON_OPTS="-t localhost:162" so that SNMP traps are sent when something happens. The traps are handled by snmptrapd; /etc/snmp/snmptrapd.conf contains

disableAuthorization yes
traphandle default mailx -s "SNMP Trap" [email protected]

The e-mails received this way contain SNMP traps, but they are not human-readable: it's impossible to tell what they are about or whether they were issued by cpqarrayd. Is it possible to send human-readable e-mails when the RAID status changes?

Solution

The following script placed in cron.hourly:

#!/bin/sh
CCISS_DEVICE=/dev/cciss/c0d1
STATUS_FILE=/var/cciss_vol_status
TMP_FILE=$TMPDIR/status-$$.$RANDOM
mv $STATUS_FILE $TMP_FILE
cciss_vol_status $CCISS_DEVICE >$STATUS_FILE
if ! cmp -s $STATUS_FILE $TMP_FILE ; then
    mailx -s "CCISS status changed" [email protected] <$STATUS_FILE
fi
rm $TMP_FILE
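A slightly more defensive variant of the script above might look like the sketch below. It is not the original author's code: the function name status_if_changed, the state file path and the recipient root are placeholders chosen for illustration. It assumes mktemp is available (it is on Debian) and avoids $RANDOM and $TMPDIR, which may be unset when cron runs a script under a plain /bin/sh.

```shell
#!/bin/sh
# status_if_changed STATE_FILE COMMAND [ARGS...]
# Runs COMMAND, compares its output with the copy saved in STATE_FILE and
# prints the new output only when it differs (or when no copy exists yet).
# Returns 0 if something was printed, 1 if the status is unchanged.
status_if_changed() {
    state_file=$1
    shift
    new_file=$(mktemp) || return 2
    "$@" >"$new_file" 2>&1
    if [ -f "$state_file" ] && cmp -s "$state_file" "$new_file"; then
        rm -f "$new_file"
        return 1    # unchanged: nothing to report
    fi
    cat "$new_file"
    mv "$new_file" "$state_file"
    return 0
}

# Hypothetical use from cron.hourly (path and recipient are placeholders):
# if out=$(status_if_changed /var/lib/cciss_vol_status \
#          cciss_vol_status /dev/cciss/c0d1); then
#     printf '%s\n' "$out" | mailx -s "CCISS status changed" root
# fi
```

As with the original script, the first run has no saved status, so it reports the current state once.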
  • What server model is this? Commented Apr 25, 2014 at 12:18

2 Answers


First, see: How do I get my HP servers to email me when a drive fails?

In short, the HP SNMP Management Agents that are installed as part of the Service Pack for ProLiant or the Management Component Pack (Debian) will provide you with the proper alerts for the system's health. This includes traps for disks, array controller, fans, temperature, power supplies, iLO, NICs, etc.

This is fully supported under Debian. You will find the downloads in the HP Software Delivery Repository.

Two parts to this (configured automatically by the installer):

In your snmpd.conf file:

# Following entries were added by HP Insight Management Agents at
# Thu Mar 18 04:14:43 PDT 2010
dlmod cmaX /usr/lib64/libcmaX64.so

That registers the HP health agents with SNMP.

And the /opt/hp/hp-snmp-agents/cma.conf file:

############################################################
#
# cma.conf: HP Insight Management Agents configuration file
#
############################################################

########################################################################
# trapemail is used for configuring email command(s) which will be
# executed whenever a SNMP trap is generated.
# Multiple trapemail lines are allowed.
# Note: any command that reads standard input can be used. For example:
#             trapemail /usr/bin/logger
#       will log trap messages into system log (/var/log/messages).
########################################################################
trapemail /bin/mail -s 'HP Insight Management Agents Trap Alarm' [email protected]

Typical RAID alert emails will look like:

Trap-ID=3040 Accelerator Board Battery status change, slot number: 1. Battery failed. Status: Failed.. 

or

Trap-ID=3034 Logical Drive Status Change: Slot 1, Drive: 2.Status is now Rebuilding. 

or

Trap-ID=3034 Logical Drive Status Change: Slot 1, Drive: 1.Status is now OK. 
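Because trapemail accepts any command that reads standard input, a small wrapper could put the trap ID into the subject line so alerts are easier to filter. The following is only a sketch: the function name trap_subject, the script path and the recipient root are hypothetical, and it assumes the message contains a Trap-ID=NNNN token as in the samples above.

```shell
#!/bin/sh
# trap_subject MESSAGE
# Builds a mail subject from the first Trap-ID=NNNN token found in MESSAGE.
# Falls back to "unknown" when the message carries no such token.
trap_subject() {
    id=$(printf '%s\n' "$1" |
         sed -n 's/.*Trap-ID=\([0-9][0-9]*\).*/\1/p' |
         head -n 1)
    printf 'HP trap %s\n' "${id:-unknown}"
}

# Hypothetical wrapper body, saved as e.g. /usr/local/bin/hp-trap-mail and
# referenced in cma.conf as:  trapemail /usr/local/bin/hp-trap-mail
# body=$(cat)
# printf '%s\n' "$body" | mailx -s "$(trap_subject "$body")" root
```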

EDIT:

It appears you're having difficulty with a 100-series ProLiant, HP Health agents and Debian. This is a supported solution, but depending on how you've installed and configured the solution, you may have problems. Given that, you can probably just install the cciss_vol_status utility and run a periodic check via cron.

  • Installing hp-health fails for me with Error: No supported management controller found. Probably, dl180 is not supported. What's funny, uninstalling it fails with the same error. Commented Apr 28, 2014 at 6:36
  • @MichaelIvko Please provide the specific server model and generation, plus your OS distribution and version. Commented Apr 28, 2014 at 10:57
  • HP Proliant DL180, Debian Wheezy Commented Apr 28, 2014 at 11:14
  • @MichaelIvko It doesn't matter. See the following link and adapt for your Debian purposes. Commented Apr 28, 2014 at 11:31
  • I have a different error, same as in this unanswered question. There's no segfault, and mcelogd isn't running. And I am wary of relying on a software that cannot even be uninstalled without modifying an initscript by hand. I'll probably have to figure out another solution. Commented Apr 28, 2014 at 12:42

snmptt (SNMP Trap Translator) is a great little tool for this. You can teach it typical OIDs and messages and translate them to some sensible message. Take a look and see if it's any good for your needs.

EDIT: If you don't already have one, download an SNMP MIB for your device and put it in the /usr/share/snmp/mibs directory. Then restart snmpd and snmptrapd.
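For a sense of what a translator has to work with: snmptrapd passes each trap to a traphandle program on standard input, with the sender's host name on the first line, the transport address on the second, and one "OID value" pair per line after that. A minimal formatter, assuming that input layout, could look like this. The function name format_trap and the recipient root are placeholders, and this is a rough sketch rather than a replacement for snmptt.

```shell
#!/bin/sh
# format_trap
# Reads a trap in snmptrapd's traphandle format from stdin and prints a
# labelled summary: sender, transport address, then each variable binding.
format_trap() {
    read -r host
    read -r addr
    printf 'Trap from: %s\n' "$host"
    printf 'Transport: %s\n' "$addr"
    printf '\nVariable bindings:\n'
    while read -r oid value; do
        [ -n "$oid" ] || continue    # skip blank lines
        printf '  %s = %s\n' "$oid" "$value"
    done
}

# Hypothetical handler script body (recipient is a placeholder):
# format_trap | mailx -s "SNMP Trap" root
```

With the vendor MIB loaded, the OIDs and values arriving on stdin should show symbolic names instead of long numeric strings, which is what makes the resulting mail readable.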

  • But how do I know which OIDs and messages correspond to cpqarrayd? Commented Apr 25, 2014 at 11:07
  • Modified my reply a bit; MIB files do provide human readable explanations for OIDs, so if you don't have one already, those custom MIBs sure look like numeric. :D Commented Apr 25, 2014 at 11:16
