Feature #1012

xymon scripts that query /dev/disk/by-id look at the wrong disks when a raid disk is present

Added by thekingofspain almost 7 years ago. Updated almost 6 years ago.

Status: Closed
Start date: 05/06/2017
Priority: Normal
Due date:
Assignee: -
% Done: 100%
Category: -
Spent time: -
Target version: -

Description

The xymon-smart.sh and xymon-hddtmp.sh scripts use the following line to query the disks:

ls /dev/disk/by-id/* | grep -ve '-part' -ve '/wwn' |

The line should be changed to the following to exclude the mdadm raid devices:

ls /dev/disk/by-id/* | grep -ve '-part' -ve '/wwn' -ve '/md' |
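
To illustrate the effect of the extra pattern, here is a minimal sketch that runs the filter over a made-up list of by-id names instead of the real /dev/disk/by-id directory. The function name filter_disk_ids and the sample names are hypothetical; the patterns follow the '/md-' and '/wwn-' form used in the attached scripts.

```shell
#!/bin/sh
# Hypothetical helper: drop partition entries, WWN aliases, and md raid
# entries from a list of /dev/disk/by-id paths read on stdin.
filter_disk_ids() {
    grep -v -e '-part' -e '/wwn-' -e '/md-'
}

# Sample by-id names, made up for illustration only.
sample_ids='/dev/disk/by-id/ata-EXAMPLE_DISK_A
/dev/disk/by-id/ata-EXAMPLE_DISK_A-part1
/dev/disk/by-id/md-name-host:0
/dev/disk/by-id/md-uuid-00000000:00000000:00000000:00000000
/dev/disk/by-id/wwn-0x5000000000000001
/dev/disk/by-id/wwn-0x5000000000000001-part1'

# Only the whole-disk ata entry should survive the filter.
printf '%s\n' "$sample_ids" | filter_disk_ids
```

Without the '/md-' pattern the two md entries would pass through and be handed to smartctl as if they were physical disks.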

In addition, disks used in an mdadm raid are not listed in the subsequent mount query, so
an additional check against /proc/mdstat should be added to catch disks that belong to a raid:

#check if device is directly mounted
if ! mount | grep -q /dev/$DISKDEV
then
    # check if device is mounted by mdadm
    if ! cat /proc/mdstat | grep -q $DISKDEV
    then
        continue
    fi
fi
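
The two-step check can be tried without a raid by feeding it sample text. This is only a sketch: the function name disk_in_use, the device names, and the sample mount and mdstat contents are assumptions made for illustration, and the real scripts read from the mount command and /proc/mdstat directly.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the device appears in the mount listing
# or in the mdstat text. Both are passed in as strings so the sketch runs
# on any machine, with or without a raid.
disk_in_use() {
    _dev=$1
    _mounts=$2
    _mdstat=$3
    # Directly mounted (the device itself or one of its partitions)?
    printf '%s\n' "$_mounts" | grep -q "/dev/$_dev" && return 0
    # Member of an mdadm array?
    printf '%s\n' "$_mdstat" | grep -q "$_dev" && return 0
    return 1
}

# Sample data, made up for illustration only.
sample_mounts='/dev/sdc1 on / type ext4 (rw,relatime)
/dev/md0 on /data type ext4 (rw,relatime)'

sample_mdstat='Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/2] [UU]'

for name in sda sdc sdd
do
    if disk_in_use "$name" "$sample_mounts" "$sample_mdstat"
    then
        echo "$name: checked"
    else
        echo "$name: skipped"
    fi
done
```

In this sample sda is kept because it is a raid member even though it is not mounted itself, sdc is kept because its partition is mounted, and sdd is skipped. Note that both greps are plain substring matches, as in the original check, so a name such as sda would also match sdaa.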

Related issues

related to LinHES - Bug #1013: xymon scripts that query /dev/disk/by-id look at the wron... Closed 05/06/2017
related to LinHES - Bug #1014: xymon disk script skip disks used in raid and list the ra... Closed 05/06/2017

Associated revisions

Revision 168df166
Added by brfransen almost 6 years ago

xymon: closes #1012

History

I tried to attach two files on the submit of this defect and received a web error on the submit. This then caused two other duplicate defects to be created. Attaching my versions of the files below.

xymon-hddtemp.sh
#!/bin/sh

# NOTE: Must be run as root, so you probably need to setup sudo for this.

ls /dev/disk/by-id/* | grep -ve '-part' -ve '/md-' -ve '/wwm- ' |
while read DISK
do
    DISKDEV=`ls -l $DISK | awk -F/ '{print $NF}'`
    DISKNAME=`echo $DISK | awk -F/ '{print $5}' | tr ":" "_"`

    #check if device is optical
    if [[ $DISKDEV == "sr"* ]]
    then
        continue
    fi

    #check if device is mounted
    if ! mount | grep -q /dev/$DISKDEV
    then
        # check if device is used by mdadm
        if ! cat /proc/mdstat | grep -q $DISKDEV
        then
            continue
        fi
    fi

    #check if SMART is disabled and enable
    DRES=`sudo /usr/bin/smartctl -A $DISK`
    if [[ $DRES == *"SMART Disabled. Use option -s with argument 'on'"* ]]
    then
        sudo /usr/bin/smartctl -s on $DISK
        DRES=`sudo /usr/bin/smartctl -A $DISK`
    fi

    hddtemp=`echo "$DRES" | grep Temperature_Celsius | awk '{print $10}'`

    TEMP=": $hddtemp" 
    if [[ $hddtemp == "" ]]
    then
        TEMP="- No Temp Sensor Found" 
        COLOR="4&clear" 
    elif test $hddtemp -gt 55
    then
        COLOR="1&red" 
    elif test $hddtemp -ge 50
    then
        COLOR="2&yellow" 
    else
        COLOR="3&green" 
    fi

    echo "${COLOR} $DISKNAME $TEMP" 

done > /tmp/hddcheck

COLOR=`cat /tmp/hddcheck | awk '{print $1}' | sort | uniq | head -1 | cut -c3-`

# Report status to Xymon Server
$XYMON $XYMSRV "status ${MACHINE}.hddtemp ${COLOR} Hard Drive Temperatures (in &degC)
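
The script above tags each line with a numeric prefix ("1&red", "2&yellow", ...) so that a plain sort puts the most severe color first. The following is a minimal sketch of that idea in isolation; the function names temp_color and worst_color and the sample readings are hypothetical, while the thresholds are the ones used in the script.

```shell
#!/bin/sh
# Hypothetical helper: map a temperature reading to the script's color tags.
temp_color() {
    _t=$1
    if [ -z "$_t" ]; then
        echo "4&clear"      # no temperature sensor found
    elif [ "$_t" -gt 55 ]; then
        echo "1&red"
    elif [ "$_t" -ge 50 ]; then
        echo "2&yellow"
    else
        echo "3&green"
    fi
}

# Hypothetical helper: pick the most severe color from tagged lines on stdin.
# The lowest numeric prefix sorts first; cut then strips the "N&" prefix.
worst_color() {
    awk '{print $1}' | sort | head -1 | cut -c3-
}

# Sample readings, made up for illustration only.
{
    echo "$(temp_color 41) disk_a : 41"
    echo "$(temp_color 52) disk_b : 52"
    echo "$(temp_color '') disk_c - No Temp Sensor Found"
} | worst_color
```

With these sample readings the overall status is yellow, because one disk is in the 50 to 55 range and none is above it.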

xymon-smart.sh
#!/bin/sh

# NOTE: Must be run as root, so you probably need to setup sudo for this.

if test -f /tmp/dres; then rm -f /tmp/dres; fi

ls /dev/disk/by-id/* | grep -ve '-part' -ve '/md-' -ve '/wwn-' |
while read DISK
do
    DISKDEV=`ls -l $DISK | awk -F/ '{print $NF}'`

    #check if device is optical
    if [[ $DISKDEV == "sr"* ]]
    then
        continue
    fi

    #check if device is directly mounted
    if ! mount | grep -q /dev/$DISKDEV
    then
        # check if device is used by mdadm
        if ! cat /proc/mdstat | grep -q $DISKDEV
        then
            continue
        fi
    fi

    DRES=`sudo /usr/bin/smartctl -H -n standby $DISK`
    DCODE=$?

    #check if SMART is disabled and enable
    if [[ $DRES == *"SMART Disabled. Use option -s with argument 'on'"* ]]
    then
        sudo /usr/bin/smartctl -s on $DISK
        DRES=`sudo /usr/bin/smartctl -H -n standby $DISK`
        DCODE=$?
    fi

    DSTBY=$(( $DCODE & 2 ))
    DFAIL=$(( $DCODE & 8 ))
    DWARN=$(( $DCODE & 32 ))

    if test $DSTBY -ne 0
    then
        COLOR="4&clear" 
    elif test $DFAIL -ne 0
    then
        COLOR="1&red" 
    elif test $DWARN -ne 0
    then
        COLOR="2&yellow" 
    else
        COLOR="3&green" 
    fi

    echo "${COLOR} $DISK (/dev/$DISKDEV)" 

    echo "${COLOR} $DISK (/dev/$DISKDEV)" | cut -c2- >>/tmp/dres
    echo "" >>/tmp/dres
    echo "$DRES" | egrep -v "^smartctl|^Copyright|^$|^===" >>/tmp/dres
    echo "-----------------------------------------------------------------------------" >>/tmp/dres
    echo "" >>/tmp/dres
    echo "" >>/tmp/dres
done >/tmp/dcheck

COLOR=`cat /tmp/dcheck | awk '{print $1}' | sort | uniq | head -1 | cut -c3-`

$XYMON $XYMSRV "status ${MACHINE}.smart ${COLOR} SMART Health Check

`cat /tmp/dcheck | cut -c2-`

============================== Detailed status ==============================

`cat /tmp/dres`
" 

rm -f /tmp/dres /tmp/dcheck

exit 0
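
The smart script decodes the smartctl exit status as a bit mask: bit 1 (value 2) is set when the device could not be opened or is in a low-power mode with -n, bit 3 (value 8) when the SMART status check reports a failing disk, and bit 5 (value 32) when attributes have been at or below threshold in the past. A minimal sketch of that mapping, with the hypothetical function name smart_color and hand-picked sample codes instead of a real smartctl call:

```shell
#!/bin/sh
# Hypothetical helper: map a smartctl exit status to the script's color tags,
# testing the same bits in the same order as xymon-smart.sh.
smart_color() {
    _code=$1
    if [ $(( _code & 2 )) -ne 0 ]; then
        echo "4&clear"      # device in standby or could not be opened
    elif [ $(( _code & 8 )) -ne 0 ]; then
        echo "1&red"        # SMART status: disk failing
    elif [ $(( _code & 32 )) -ne 0 ]; then
        echo "2&yellow"     # attributes were below threshold in the past
    else
        echo "3&green"
    fi
}

# Sample exit codes, chosen for illustration; 40 has bits 3 and 5 both set.
for sample in 0 2 8 32 40
do
    echo "$sample -> $(smart_color "$sample")"
done
```

Because the failing-disk bit is tested before the warning bit, a status of 40 is reported as red rather than yellow.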

There is a typo in xymon-hddtmp.sh: the filter should be '/wwn-', not '/wwm-'.

Note: the baseline version of the xymon-hddtmp.sh script did not have the wwn filter, but xymon-smart.sh did.

Updated by brfransen almost 7 years ago

  • Tracker changed from Bug to Feature

Updated by brfransen almost 6 years ago

  • % Done changed from 0 to 100
  • Status changed from New to Closed
