
I have a SAN system with 10 drive slots set up with software RAID10, and all the md0-5 added into a single volume group. The SATA port in slot 10 recently failed and will not accept any drive we put in it. I'm extremely nervous about leaving drive 9 unmirrored. My proposed solution is to add a USB external drive (matching the size and manufacturer of drive #9) to the server and assign it as the RAID1 partner for #9. I realize USB is going to be much slower than SATA, but I am more concerned about data protection than drive speed.

Does anyone see any issues with that plan (other than performance)?

    cat /proc/mdstat
    Personalities : [raid1]
    md4 : active raid1 sdj1[1]
          976759936 blocks [2/1] [U_]

    md3 : active raid1 sdc1[1] sda1[0]
          976759936 blocks [2/2] [UU]

    md2 : active raid1 sdh1[1] sdg1[0]
          976759936 blocks [2/2] [UU]

    md1 : active raid1 sdi1[0] sde1[1]
          976759936 blocks [2/2] [UU]

    md0 : active raid1 sdf1[0] sdb1[1]
          976759936 blocks [2/2] [UU]

  • Not to be an obnoxious pedant, but I'd be extremely hesitant to call a Linux software RAID configuration a "SAN system." Commented Nov 24, 2010 at 16:05

3 Answers


RAID10 is a RAID0 of RAID1 arrays, so you would end up with just one md device in the end, and therefore one physical volume to give to LVM. Like so:

      LV1        LV2
       \__________\___________....
                  |
                  VG
                  |
                  PV
     _____________________MD5_____________________
     /         /          |          \           \
    _MD0_    _MD1_      _MD2_      _MD3_       _MD4_
    /    \   /    \    /     \    /     \     /     \
    D01  D02 D03  D04 D05   D06  D07   D08   D09   D10

What you describe with "all the md0-5 added into a single volume group" sounds like 5 separate RAID1 (or RAID10 - the RAID10 driver essentially acts as RAID1 for arrays of two drives) arrays which you have added to LVM separately, so you have a volume group consisting of 5 physical volumes. Like so:

      LV1        LV2
       \__________\___________....
                  |
     _____________________VG______________________
     /         /          |          \           \
    PV1      PV2        PV3        PV4         PV5
     |        |          |          |           |
    _MD0_    _MD1_      _MD2_      _MD3_       _MD4_
    /    \   /    \    /     \    /     \     /     \
    D01  D02 D03  D04 D05   D06  D07   D08   D09   D10

(This isn't actually RAID10, i.e. RAID-1-then-0; it is RAID-1-then-JBOD.)

Is this the case?

If so, then you could instead just remove PV5 from the volume group, assuming there is enough free space in the system in total and the filesystems you have support being resized (e.g. ext2/3/4 with resize2fs) if needed:

  1. Reduce the filesystems and the logical volumes that contain them until there is at least enough free space in the volume group to hold everything currently allocated on PV5, unless there is already enough free space in the volume group.
  2. Use pvmove to move all blocks allocated to that physical volume by LVM onto the others.
  3. (optional) Use vgreduce to remove PV5 from the volume group.
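The three steps above could be sketched as a shell session. This is a hedged illustration, not a recipe: the volume group name (vg0), logical volume name (lvdata), mount point, and sizes are all assumptions; substitute your own from the output of vgs, lvs and df. It also assumes /dev/md4 is the PV you want to empty and an ext2/3 filesystem, which must be unmounted to shrink.

```shell
# Step 1: shrink the filesystem, then the LV that contains it (order matters).
# All names and sizes below are assumptions for illustration only.
umount /mnt/data                  # ext2/3 cannot be shrunk while mounted
e2fsck -f /dev/vg0/lvdata         # resize2fs insists on a clean filesystem check first
resize2fs /dev/vg0/lvdata 500G    # shrink the FS to comfortably below the target LV size
lvreduce -L 550G /dev/vg0/lvdata  # shrink the LV, keeping it larger than the FS
resize2fs /dev/vg0/lvdata         # grow the FS back to fill the LV exactly

# Step 2: evacuate every allocated extent from the degraded PV.
pvmove /dev/md4

# Step 3 (optional): drop the now-empty PV from the volume group.
vgreduce vg0 /dev/md4
```

Note that pvmove can take a long time on terabyte arrays; running pvs afterwards should show no allocated extents left on /dev/md4.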

Now the broken array is no longer part of the LVM setup. You can add it back once you have fixed the situation and that RAID1 pair is no longer running degraded.
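Re-adding it later is the reverse, sketched here under the same assumptions (vg0 as the volume group name, /dev/md4 as the repaired array):

```shell
# Once the RAID1 pair is rebuilt and no longer degraded, return it to LVM.
# vgreduce leaves the PV label on the device, so no pvcreate is needed.
vgextend vg0 /dev/md4
pvs    # confirm /dev/md4 is listed under vg0 again
```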

To actually answer your question...

Other than the performance issues, which you've already identified, and the chance of a USB drive being accidentally disconnected (which is unlikely if the machine that hosts your SAN is safely out of the way of humans and other disturbances), I see no problem with replacing your disk 10 with one connected via USB.
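For reference, attaching the USB disk to the degraded pair would look roughly like this. md4 and sdj come from the mdstat output in the question; /dev/sdk, the device name the USB disk shows up as, is an assumption - check dmesg after plugging it in.

```shell
# Give the USB disk the same partition layout as its surviving mirror partner.
sfdisk -d /dev/sdj | sfdisk /dev/sdk

# Add the new partition to the degraded array; md starts a resync immediately.
mdadm /dev/md4 --add /dev/sdk1

# Watch the rebuild until md4 shows [2/2] [UU].
cat /proc/mdstat
```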

If the machine that hosts your SAN has a spare PCI or PCI-E slot, I would instead suggest taking that route and adding an extra SATA controller to hang the drive off. If you get a controller that offers five ports (or can fit two cards that offer five or more between them), I would be tempted to split the drives up so that each pair has one drive connected to the motherboard and one connected to the add-on controller. That way your whole array has a better chance of surviving a motherboard controller failure that kills all the drives attached to it (a very rare occurrence, but it could happen).

In either case, if you do have five separate arrays each as a physical volume to LVM (not as one array so one PV in LVM), I would recommend getting the data off the degraded pair at least temporarily unless you can add the replacement drive right now.

(To confirm the layout you have, it would be worth rewording your question and/or adding the output of the commands cat /proc/mdstat, pvs, vgs and lvs.)

  • cat /proc/mdstat Personalities : [raid1] md4 : active raid1 sdj1[1] 976759936 blocks [2/1] [U_] md3 : active raid1 sdc1[1] sda1[0] 976759936 blocks [2/2] [UU] md2 : active raid1 sdh1[1] sdg1[0] 976759936 blocks [2/2] [UU] md1 : active raid1 sdi1[0] sde1[1] 976759936 blocks [2/2] [UU] md0 : active raid1 sdf1[0] sdb1[1] 976759936 blocks [2/2] [UU] Commented Nov 24, 2010 at 18:12
  • Add that to your question, then you can format it nicely and there is less chance of other potential responders missing the detail. Commented Nov 24, 2010 at 18:15
  • I inherited the device so I'm still coming up to speed with how it was originally built, but mdstat shows 5 RAID1 arrays (shown above). lvscan shows a single logical volume with md0-4 combined, which is then associated with a volume group. fdisk -l only shows md0-4, so I suspect it is set up like your scenario #2. Commented Nov 24, 2010 at 18:23
  • Sounds like it. The original creator(s) may have started with one pair in an array and added the other pairs as time went on. Or just created it this way (rather than a RAID0 of RAID1s for better performance in a number of load patterns) to allow arrays to be dropped while degraded as I detail above. Commented Nov 24, 2010 at 19:51

It's RAID 10, so I'd be less concerned about the array's health with one disk dead than I would be about using a USB drive. If it had been RAID 5 it might be a different matter, but I think you'll be fine without a tenth disk until you get around to fixing your controller - so long as you're sorting that out soon, you are right :)

  • The way it is set up, the disks are paired 0+1, 2+3, etc. That leaves disk 9 unpaired, and if it goes down, all the data on that drive would be lost Commented Nov 24, 2010 at 17:00
  • All the SATA ports are on the motherboard, so the only way I see to repair the one that is not working is to replace the motherboard. That's a bit more risk than I'm willing to take at this point. Commented Nov 24, 2010 at 17:05
  • what motherboard is it and why do you think it'd be risky to swap? Commented Nov 24, 2010 at 17:19

I think the performance will not be good at all. Even worse, the USB drive can be unplugged while the system is writing to and/or reading from it.

Can you copy the data on drive #9 to the other mirrored drives?

  • All the disks are paired up, and the overall array is at about 80% capacity. Besides, the RAID1 pairs are grouped together into a single VolumeGroup, so there is no easy way to just extract the data off the drive. Commented Nov 24, 2010 at 17:03
