RAID1 Recovery HowTo

Suitable for: e-smith 4.1.2/Mitel SME5
Author: Darrell May
Contributor:
Problem: You want to easily recover from a RAID1 failure.
Solution: Implement the steps outlined in the RAID1 Monitor HowTo. Next, follow these steps:

STEP 1: Back up your computer!

I cannot stress this point strongly enough. Your first priority on a failed RAID1 system should be to perform an immediate backup. So, DO IT NOW!

[root@myezserver /root]# /sbin/e-smith/backup

STEP 2: Power down, replace the failed drive, power up.

Before we continue, note that for testing purposes only you can completely erase a drive as follows:

[root@myezserver /root]# dd if=/dev/zero of=/dev/hdb

This writes zeroes across the entire /dev/hdb drive and destroys everything on it. Remember, for all command-line entries in this HowTo, to substitute your correct /dev/hdX, where:

/dev/hda = primary master
/dev/hdb = primary slave
/dev/hdc = secondary master
/dev/hdd = secondary slave

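Because every command in this HowTo takes a /dev/hdX name, it can help to double-check which physical position a name refers to before touching the drive. The following is a minimal sketch of the table above as a shell function; the name ide_position is hypothetical and not part of e-smith.

```shell
#!/bin/sh
# ide_position NAME
# Hypothetical helper: map a legacy IDE device name (hda..hdd, with or
# without the /dev/ prefix) to its controller position, following the
# table given in this HowTo. Unknown names are rejected.
ide_position() {
    case "$1" in
        hda|/dev/hda) echo "primary master" ;;
        hdb|/dev/hdb) echo "primary slave" ;;
        hdc|/dev/hdc) echo "secondary master" ;;
        hdd|/dev/hdd) echo "secondary slave" ;;
        *) echo "unknown device: $1" >&2; return 1 ;;
    esac
}
```

For example, ide_position /dev/hdb prints "primary slave", which should match the cable position of the drive you just replaced.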
STEP 3: Recover the partition information and use this information to quickly prepare the replacement drive.

[root@myezserver /root]# cat /root/raidmonitor/sfdisk.out
# partition table of /dev/hda
unit: sectors

/dev/hda1 : start=       63, size=   530082, Id=fd, bootable
/dev/hda2 : start=   530145, size= 39487770, Id= 5
/dev/hda3 : start=        0, size=        0, Id= 0
/dev/hda4 : start=        0, size=        0, Id= 0
/dev/hda5 : start=   530208, size=    32067, Id=fd
/dev/hda6 : start=   562338, size= 39455577, Id=fd
# partition table of /dev/hdb
unit: sectors

/dev/hdb1 : start=       63, size=   530082, Id=fd, bootable
/dev/hdb2 : start=   530145, size= 39487770, Id= 5
/dev/hdb3 : start=        0, size=        0, Id= 0
/dev/hdb4 : start=        0, size=        0, Id= 0
/dev/hdb5 : start=   530208, size=    32067, Id=fd
/dev/hdb6 : start=   562338, size= 39455577, Id=fd

Cut and paste your correct "# partition table of /dev/hdX" section. In my case I am replacing /dev/hdb, so this is the information I need to transfer into a file for quick import:

[root@myezserver /root]# pico hdb.out

Which now contains the following entries, right?:

# partition table of /dev/hdb

[root@myezserver /root]# sfdisk /dev/hdb < hdb.out
New situation:

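As an alternative to cutting and pasting by hand, the section for one drive can be pulled out of the combined dump with awk. This is only a sketch under the assumption that sfdisk.out has the layout shown above (one "# partition table of /dev/hdX" header per drive); the function names extract_table and min_sectors are hypothetical. The second function adds start and size for each partition and reports the largest sum, which is the smallest sector count the replacement drive can have.

```shell
#!/bin/sh
# extract_table FILE DEVICE
# Print only DEVICE's section of a combined sfdisk dump: everything from
# its "# partition table of DEVICE" header up to the next such header.
extract_table() {
    awk -v dev="$2" '
        /^# partition table of / { keep = ($NF == dev) }
        keep { print }
    ' "$1"
}

# min_sectors FILE
# Print the largest start+size found in a dump, i.e. the minimum number
# of sectors a replacement drive needs to hold this partition table.
min_sectors() {
    awk '
        /start=/ {
            line = $0
            gsub(/[,=]/, " ", line)          # "start= 63," -> "start  63 "
            n = split(line, f, " ")
            for (i = 1; i < n; i++) {
                if (f[i] == "start") s = f[i + 1]
                if (f[i] == "size")  z = f[i + 1]
            }
            if (s + z > max) max = s + z
        }
        END { print max + 0 }
    ' "$1"
}
```

Typical use would be extract_table /root/raidmonitor/sfdisk.out /dev/hdb > hdb.out, followed by min_sectors hdb.out. For the /dev/hdb table shown above the result is 40017915 sectors (562338 + 39455577), or about 20.5 GB assuming 512-byte sectors.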
STEP 4: Review your last known good RAID configuration:

[root@myezserver /root]# /usr/local/bin/raidmonitor -v

STEP 5: Add your newly prepared and correctly partitioned hard drive into the RAID1 array. Use the information above as your guide to match each partition to its md device:

[root@myezserver /root]# /sbin/raidhotadd /dev/md2 /dev/hdb1

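Before running raidhotadd it may be worth writing the commands out and comparing them with your last known good configuration. The sketch below only prints commands and never executes them. The function name plan_hotadd and the "mdN:hdXN" pair notation are hypothetical; the only pairing taken from this HowTo is md2 with hdb1, and any other pairs must come from your own raidmonitor output.

```shell
#!/bin/sh
# plan_hotadd PAIR...
# Each PAIR has the form "mdN:hdXN" (for example md2:hdb1). Prints one
# raidhotadd command per pair without running anything, so the list can
# be reviewed first. Malformed pairs abort with a non-zero status.
plan_hotadd() {
    for pair in "$@"; do
        case "$pair" in
            md[0-9]*:hd[a-d][0-9]*) ;;
            *) echo "bad pair: $pair" >&2; return 1 ;;
        esac
        md=${pair%%:*}       # text before the colon, e.g. md2
        part=${pair#*:}      # text after the colon, e.g. hdb1
        echo "/sbin/raidhotadd /dev/$md /dev/$part"
    done
}
```

Running plan_hotadd md2:hdb1 prints the same command shown in this step.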
STEP 6: Use raidmonitor to watch the recovery process. Note that this information will also be e-mailed to root every 15 minutes until the recovery is completed.

[root@myezserver /root]# /usr/local/bin/raidmonitor -v

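The output of raidmonitor is not reproduced in this HowTo, so the following sketch reads the kernel's own /proc/mdstat instead. It assumes the usual mdstat layout, where each array has a member map such as [UU] (both halves up) or [U_] (one half missing or still rebuilding). The function name degraded_arrays is hypothetical.

```shell
#!/bin/sh
# degraded_arrays [FILE]
# Print the name of every md array whose member map (e.g. [U_]) still
# contains "_", meaning a mirror half is missing or not yet in sync.
# FILE defaults to /proc/mdstat; a saved copy can be passed for testing.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # remember the current array
        /\[[U_]+\]/ {
            match($0, /\[[U_]+\]/)           # locate the member map
            if (index(substr($0, RSTART, RLENGTH), "_") > 0)
                print name
        }
    ' "${1:-/proc/mdstat}"
}
```

When degraded_arrays prints nothing, every array reports all members up. A simple wait loop could poll it, for example while [ -n "$(degraded_arrays)" ]; do sleep 60; done.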
STEP 7: Recover and restore the last known good master boot record (MBR) onto the drive you just replaced:

[root@myezserver /root]# /sbin/lilo -C /root/raidmonitor/lilo.conf -b /dev/hdb

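A quick sanity check after writing the MBR is to confirm that the first sector ends with the standard PC boot signature, the bytes 55 aa at offsets 510 and 511. This sketch checks only that signature; it does not prove the LILO boot code itself is correct, so it is no substitute for the boot test in STEP 8. The function name has_boot_signature is hypothetical, and reading a raw device such as /dev/hdb requires root.

```shell
#!/bin/sh
# has_boot_signature TARGET
# Succeeds (exit status 0) if bytes 510-511 of TARGET, a disk device or
# an image file, are 55 aa. Fails otherwise, including when TARGET is
# unreadable or shorter than 512 bytes.
has_boot_signature() {
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}
```

For example: has_boot_signature /dev/hdb && echo "signature present".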
STEP 8: Shut down the server, reboot and test the RAID functions.

If you have the time, you should test the RAID functionality to make sure the server will boot under simulated hard drive failures.
OK, now you can confidently say you're ready for anything. Remember, if anything goes wrong here, you simply reconnect all the hardware, perform a fresh RAID install and then restore from your backup tape. You did perform STEP 1, correct?

STEP 9: When all looks well, re-initialize raidmonitor:

[root@myezserver /root]# /usr/local/bin/raidmonitor -iv

STEP 10: Go have a drink. Job well done ;->