RAID1 Monitor HowTo
Suitable for:  e-smith 4.1.2/Mitel SME5
Author:  Darrell May
Contributor:  

Problem:  You want to enable automatic monitoring of your RAID1 array.

Solution:  dmc-mitel-raidmonitor-0.0.1-5.noarch.rpm


STEP 1:  install the rpm

[root@e-smith]# rpm -ivh dmc-mitel-raidmonitor-0.0.1-5.noarch.rpm


STEP 2:  initialize and activate raidmonitor

[root@e-smith]# /usr/local/bin/raidmonitor -iv

A report similar to the following will print to the display:

...................
RAID Monitor Report
...................

Current /proc/mdstat configuration saved in /root/raidmonitor/mdstat:

Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hdb1[1] hda1[0] 264960 blocks [2/2] [UU]
md0 : active raid1 hdb5[1] hda5[0] 15936 blocks [2/2] [UU]
md1 : active raid1 hdb6[1] hda6[0] 19727680 blocks [2/2] [UU]
unused devices: <none>

Current partition configuration saved in /root/raidmonitor/sfdisk.out:

# partition table of /dev/hda
unit: sectors

/dev/hda1 : start=       63, size=  530082, Id=fd, bootable
/dev/hda2 : start=   530145, size=39487770, Id= 5
/dev/hda3 : start=        0, size=       0, Id= 0
/dev/hda4 : start=        0, size=       0, Id= 0
/dev/hda5 : start=   530208, size=   32067, Id=fd
/dev/hda6 : start=   562338, size=39455577, Id=fd
# partition table of /dev/hdb
unit: sectors

/dev/hdb1 : start=       63, size=  530082, Id=fd, bootable
/dev/hdb2 : start=   530145, size=39487770, Id= 5
/dev/hdb3 : start=        0, size=       0, Id= 0
/dev/hdb4 : start=        0, size=       0, Id= 0
/dev/hdb5 : start=   530208, size=   32067, Id=fd
/dev/hdb6 : start=   562338, size=39455577, Id=fd

Current cron entry is:

15 * * * * root /usr/local/bin/raidmonitor -v

Stopping crond:                                            [   OK   ]
Starting crond:                                            [   OK   ]
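
Before relying on the cron job, it can be worth confirming that the initialization really left both snapshot files in place. The following is a minimal sketch, not part of the raidmonitor package: the function name snapshot_ok is hypothetical, and the only things taken from the report above are the directory /root/raidmonitor and the file names mdstat and sfdisk.out.

```shell
#!/bin/sh
# Hypothetical helper, NOT shipped with raidmonitor.
# Checks that both snapshot files named in the report exist and are non-empty.
# Usage: snapshot_ok [DIRECTORY]
snapshot_ok() {
    dir="${1:-/root/raidmonitor}"    # default location taken from the report
    for f in mdstat sfdisk.out; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f"
            return 1
        fi
    done
    echo "snapshot files present in $dir"
}
```

Calling snapshot_ok with no argument checks /root/raidmonitor; a non-zero return status means raidmonitor -iv should probably be run again.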

STEP 3:  OK, what's next?

For usage, simply run raidmonitor without arguments:

[root@e-smith /root]# /usr/local/bin/raidmonitor
usage: /usr/local/bin/raidmonitor [-iv]
-i initialize raidmonitor
-v verify RAID configuration matches and displays ALARM message if mismatch found
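
To make the -v behaviour concrete, here is a rough sketch of what such a verification could look like. This is an illustration only and is not the source of raidmonitor: the function name check_mdstat is hypothetical, and the real script may be written quite differently. It assumes nothing more than a byte-for-byte comparison of the live /proc/mdstat against the saved snapshot.

```shell
#!/bin/sh
# Illustrative sketch only; the real raidmonitor script may work differently.
# Usage: check_mdstat [CURRENT_FILE] [SNAPSHOT_FILE]
# Prints nothing and returns 0 when the two files are identical.
# Otherwise prints an alarm report and returns 1.
check_mdstat() {
    current="${1:-/proc/mdstat}"
    snapshot="${2:-/root/raidmonitor/mdstat}"

    # cmp -s is silent and only sets the exit status
    if cmp -s "$current" "$snapshot"; then
        return 0
    fi

    echo "ALARM! RAID configuration problem"
    echo
    echo "Current configuration is:"
    echo
    cat "$current"
    echo
    echo "Last known good configuration was:"
    echo
    cat "$snapshot"
    return 1
}

# Possible use from a cron job (assumes a working local mail command):
#   report=$(check_mdstat) || echo "$report" | mail -s "RAID alarm" root
```

Because the comparison is byte-for-byte, any difference at all between the two files raises the alarm, not only a failed drive.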

Let me explain what we are doing here.

  1. /root/raidmonitor/mdstat is a snapshot of your last known good /proc/mdstat RAID config.
  2. raidmonitor now runs as a cron job.  With the cron entry shown above (15 * * * *) it runs once an hour, at 15 minutes past the hour, and compares the current /proc/mdstat against the last known good snapshot in /root/raidmonitor/mdstat.
  3. If there is any change in /proc/mdstat, then your RAID has experienced a problem and root is e-mailed an "ALARM! RAID configuration problem" message.  Here is an example:

ALARM! RAID configuration problem

Current configuration is:

Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hda1[0] 264960 blocks [2/1] [U_]
md0 : active raid1 hda5[0] 15936 blocks [2/1] [U_]
md1 : active raid1 hda6[0] 19727680 blocks [2/1] [U_]
unused devices: <none>

Last known good configuration was:

Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hdb1[1] hda1[0] 264960 blocks [2/2] [UU]
md0 : active raid1 hdb5[1] hda5[0] 15936 blocks [2/2] [UU]
md1 : active raid1 hdb6[1] hda6[0] 19727680 blocks [2/2] [UU]
unused devices: <none>

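As you can see, the current configuration shows that we have lost our second RAID1 IDE drive (hdb): every array reports [2/1] [U_], meaning two members are configured but only one is active, and only the hda partitions are listed.  Note that if we had lost our first drive instead, the status would read [2/1] [_U] and only the hdb partitions would be listed.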
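
The status field can also be checked mechanically. The sketch below is a hypothetical helper (degraded_arrays is not part of raidmonitor) that lists every array whose status contains an underscore. It assumes the single-line /proc/mdstat layout shown in this HowTo, where the [U_] field sits on the same line as the md name; other kernels may lay the file out differently.

```shell
#!/bin/sh
# Illustrative helper, NOT part of raidmonitor.
# Usage: degraded_arrays [FILE]
# Prints the name of every md array whose status field (e.g. [U_] or [_U])
# contains an underscore, i.e. at least one mirror member is missing.
degraded_arrays() {
    file="${1:-/proc/mdstat}"
    # Matches lines such as:
    #   md2 : active raid1 hda1[0] 264960 blocks [2/1] [U_]
    awk '/^md[0-9]+ :/ && /\[[U_]*_[U_]*\]/ { print $1 }' "$file"
}
```

Run against the "Current configuration" in the alarm example above, this would list all three arrays; run against a healthy [UU] configuration it prints nothing.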

On to recovery.....