
A little background to this question first: I am running a RAID-6 in a QNAP TS869L external RAID/NAS system. I started with five 3 TB disks back in the day, and later added another two 3 TB disks to the RAID. The QNAP internals handled the growing and re-syncing, and everything seemed to be perfectly fine.

About two weeks ago, one of the disks (disk #5; disk #2 has gone bad in the meantime as well) failed, and somehow (I have no idea why) disks #1 and #2 were also kicked out of the array. I replaced disk #5, but the RAID didn't start working again.

After some calls to QNAP technical support, they re-created the array (using mdadm --create --force --assume-clean ...), but the resulting array couldn't find a filesystem, and I was kindly referred to a data recovery company that I can't afford.

After some digging through old log files, resetting the disk to factory defaults, etc., I found a few errors that were made during this re-create. I wish I still had some of the original metadata, but unfortunately I don't (I have definitely learned that lesson).

I'm currently at the point where I know the correct chunk size (64K) and metadata version (1.0; the factory default was 0.9, but from what I have read, 0.9 doesn't handle disks over 2 TB, and mine are 3 TB), and I can now find the ext4 filesystem that should be on the disks.

The only variable left to determine is the right disk order!

I started using the description found in answer #4 of "Recover RAID 5 data after created new array instead of re-using", but I am a little confused about what the order should be for a proper RAID-6. RAID-5 is pretty well documented in a number of places, but RAID-6 is much less so.
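What I have in mind, following that answer, is roughly the sketch below. The two candidate orders are placeholders only (the real search has to cover the permutations of the five surviving members across the seven slots, with "missing" for the two removed ones). Every create uses --assume-clean, and the filesystem check is read-only, so the data area itself should not be touched (only the 1.0 superblocks at the end of each partition get rewritten):

#!/bin/sh
# Sketch only: try candidate disk orders for the 7-slot RAID-6 re-create.
# The orders below are placeholders; "missing" stands in for the two
# removed slots (RaidDevice 2 and 4 in the --detail output further down).

CANDIDATE_ORDERS="/dev/sda3 /dev/sdb3 missing /dev/sdd3 missing /dev/sdg3 /dev/sdf3
/dev/sda3 /dev/sdb3 missing /dev/sdd3 missing /dev/sdf3 /dev/sdg3"

echo "$CANDIDATE_ORDERS" | while read -r ORDER; do
    [ -z "$ORDER" ] && continue
    mdadm --stop /dev/md0 2>/dev/null
    # --assume-clean: no initial resync, so parity/data blocks are not rewritten.
    # --run: suppress the "Continue creating array?" confirmation prompt.
    mdadm --create /dev/md0 --run --assume-clean --level=6 --raid-devices=7 \
          --metadata=1.0 --chunk=64 $ORDER
    echo "=== order: $ORDER ==="
    # Read-only check; the correct order should show by far the fewest errors.
    e2fsck -n /dev/md0 2>&1 | tail -5
done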

Also, does the layout, i.e. the distribution of parity and data chunks across the disks, change after growing the array from 5 to 7 disks, or does the re-sync re-organize them the way a native 7-disk RAID-6 would have laid them out?

Thanks


Some more mdadm output that might be helpful:

mdadm version:

[~] # mdadm --version
mdadm - v2.6.3 - 20th August 2007

mdadm details from one of the disks in the array:

[~] # mdadm --examine /dev/sda3
/dev/sda3:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x0
     Array UUID : 1c1614a5:e3be2fbb:4af01271:947fe3aa
           Name : 0
  Creation Time : Tue Jun 10 10:27:58 2014
     Raid Level : raid6
   Raid Devices : 7
  Used Dev Size : 5857395112 (2793.02 GiB 2998.99 GB)
     Array Size : 29286975360 (13965.12 GiB 14994.93 GB)
      Used Size : 5857395072 (2793.02 GiB 2998.99 GB)
   Super Offset : 5857395368 sectors
          State : clean
    Device UUID : 7c572d8f:20c12727:7e88c888:c2c357af
    Update Time : Tue Jun 10 13:01:06 2014
       Checksum : d275c82d - correct
         Events : 7036
     Chunk Size : 64K
     Array Slot : 0 (0, 1, failed, 3, failed, 5, 6)
    Array State : Uu_u_uu 2 failed

mdadm details for the array in the current disk order (my best guess, reconstructed from old log files):

[~] # mdadm --detail /dev/md0
/dev/md0:
        Version : 01.00.03
  Creation Time : Tue Jun 10 10:27:58 2014
     Raid Level : raid6
     Array Size : 14643487680 (13965.12 GiB 14994.93 GB)
  Used Dev Size : 2928697536 (2793.02 GiB 2998.99 GB)
   Raid Devices : 7
  Total Devices : 5
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Jun 10 13:01:06 2014
          State : clean, degraded
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 64K

           Name : 0
           UUID : 1c1614a5:e3be2fbb:4af01271:947fe3aa
         Events : 7036

    Number   Major   Minor   RaidDevice   State
       0       8        3        0        active sync   /dev/sda3
       1       8       19        1        active sync   /dev/sdb3
       2       0        0        2        removed
       3       8       51        3        active sync   /dev/sdd3
       4       0        0        4        removed
       5       8       99        5        active sync   /dev/sdg3
       6       8       83        6        active sync   /dev/sdf3

Output from /proc/mdstat (md8, md9, and md13 are internally used RAIDs holding swap, etc.; the one I'm after is md0):

[~] # more /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md0 : active raid6 sdf3[6] sdg3[5] sdd3[3] sdb3[1] sda3[0]
      14643487680 blocks super 1.0 level 6, 64k chunk, algorithm 2 [7/5] [UU_U_UU]

md8 : active raid1 sdg2[2](S) sdf2[3](S) sdd2[4](S) sdc2[5](S) sdb2[6](S) sda2[1] sde2[0]
      530048 blocks [2/2] [UU]

md13 : active raid1 sdg4[3] sdf4[4] sde4[5] sdd4[6] sdc4[2] sdb4[1] sda4[0]
      458880 blocks [8/7] [UUUUUUU_]
      bitmap: 21/57 pages [84KB], 4KB chunk

md9 : active raid1 sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sda1[0] sdb1[1]
      530048 blocks [8/7] [UUUUUUU_]
      bitmap: 37/65 pages [148KB], 4KB chunk

unused devices: <none>
  • When you ask "does the layout, i.e. distribution of parity and data chunks across the disks, change after the growing of the array from 5 to 7 disks" and then "or does the re-sync re-organize them in such a way a native 7-disk RAID-6 would have been?", it's actually the same question asked twice. The answer is yes and yes: the layout does change when you grow the array, and once the re-sync completes, the layout has been adjusted to be the way a native 7-disk RAID-6 would have been. Commented Jun 11, 2014 at 15:12
  • Are you saying you lost disks #1, #2, and #5 in the array at the same time? Commented Jun 11, 2014 at 15:25
  • First, I lost disk #5; that is what triggered this mess. #1 and #2 somehow got kicked out of the array at the same time, or at least were marked as "missing" by the time I installed the replacement for #5. Both drives were working just fine at the time. In the meantime, #2 has developed some read issues, so it will be replaced soon as well. #1 is perfectly fine, as far as I can tell. Commented Jun 11, 2014 at 18:15
  • That should have been #3 with the read issues in the comment above; that's why #3 and #5 are listed as "missing" in the mdadm output. Commented Jun 11, 2014 at 18:21
  • You said "best guess reconstructed from old log files"? Share the relevant portion of these old log files with us! :-) Commented Jun 12, 2014 at 15:06

1 Answer


I would suggest using the same order as the other arrays, because they were most likely created under the same conditions as the array in question.

Remember to always use --assume-clean on any create - you probably know this well enough, but it is worth re-mentioning.
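For example, a re-create along these lines (the device order shown is just a placeholder; the point is the --assume-clean flag, which stops md from running an initial resync and recalculating parity over your data):

mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=7 \
      --metadata=1.0 --chunk=64 \
      /dev/sda3 /dev/sdb3 missing /dev/sdd3 missing /dev/sdg3 /dev/sdf3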

Ideally, you should actually be working from dd images of the original drives, not the drives themselves. I realise things aren't always ideal :-)
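A sketch of that, assuming you have somewhere with roughly 3 TB of free space per image (the /share/backup path is a placeholder):

# Image each surviving member partition to a file.
for d in sda3 sdb3 sdd3 sdf3 sdg3; do
    dd if=/dev/$d of=/share/backup/$d.img bs=1M conv=noerror,sync
done

# Attach the images as loop devices and run the mdadm experiments
# against /dev/loop1 .. /dev/loop5 instead of the real partitions.
i=1
for d in sda3 sdb3 sdd3 sdf3 sdg3; do
    losetup /dev/loop$i /share/backup/$d.img
    i=$((i + 1))
done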

Finally, if you can, mount with "mount -o ro" for just another level of "don't write to the drives, please" security :-)
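For instance (the mount point is a placeholder, and "noload" is an extra ext4 option beyond plain ro that skips journal replay, so the mount itself cannot write to the device):

mkdir -p /mnt/recovery
mount -t ext4 -o ro,noload /dev/md0 /mnt/recovery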

  • Good point on --assume-clean; so far I have religiously used it when creating the array. As for using the same order as the other disks: md9 and md13 already have different orders (md9: abcdefg, md13: abdgfed), so which one should I pick? I tried running e2fsck -n /dev/md0 in both configurations, and both turned up a TON of checksum errors, etc. Commented Jun 11, 2014 at 18:20
  • It just strikes me that mdadm --monitor normally emails root when the RAID suffers a failure. You should have that email; it contains the original disk layout. cat /var/mail/root or check your email logs? Commented Jun 12, 2014 at 15:03
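A quick way to check for those mails (a sketch; the mailbox path and subject wording are assumptions, since QNAP may deliver the monitor mail somewhere else or not run mdadm --monitor at all):

# mdadm --monitor mails have subjects like "Fail event on /dev/md0:..."
# and normally include a copy of /proc/mdstat from the time of the
# failure, i.e. the original device order.
grep -B 2 -A 20 "event on /dev/md0" /var/mail/root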
