
A little background to this question first: I am running a RAID-6 in a QNAP TS869L external RAID/NAS system. I started with five 3 TB disks back in the day, and later added another two 3 TB disks to the RAID. The QNAP internals handled the growing and re-syncing, and everything seemed to be perfectly fine.

About two weeks ago, one of the disks (disk #5; disk #2 has gone bad in the meantime as well) failed, and somehow (I have no idea why) disks #1 and #2 were also kicked out of the array. I replaced disk #5, but the RAID didn't start working again.

After some calls to QNAP technical support, they re-created the array (using mdadm --create --force --assume-clean ...), but the resulting array couldn't find a filesystem, and I was kindly referred to a data recovery company that I can't afford.

After some digging through old log files, resetting the disk to factory defaults, etc., I found a few errors that were made during this re-create. I wish I still had some of the original metadata, but unfortunately I don't (I have definitely learned that lesson).

I'm currently at the point where I know the correct chunk size (64K) and metadata version (1.0; the factory default was 0.9, but from what I have read, 0.9 doesn't handle disks over 2 TB, and mine are 3 TB), and I can now find the ext4 filesystem that should be on the disks.

The only variable left to determine is the right disk order!

I started using the description found in answer #4 of "Recover RAID 5 data after created new array instead of re-using", but I am a little confused about what the order should be for a proper RAID-6. RAID-5 is pretty well documented in a number of places, but RAID-6 is much less so.
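What I have in mind, following that answer, is roughly the sketch below. The two candidate orders are placeholders only (the real search has to cover the permutations of the five surviving members across the seven slots, with "missing" for the two removed ones). Every create uses --assume-clean, and the filesystem check is read-only, so the data area itself should not be touched (only the 1.0 superblocks at the end of each partition get rewritten):

#!/bin/sh
# Sketch only: try candidate disk orders for the 7-slot RAID-6 re-create.
# The orders below are placeholders; "missing" stands in for the two
# removed slots (RaidDevice 2 and 4 in the --detail output further down).

CANDIDATE_ORDERS="/dev/sda3 /dev/sdb3 missing /dev/sdd3 missing /dev/sdg3 /dev/sdf3
/dev/sda3 /dev/sdb3 missing /dev/sdd3 missing /dev/sdf3 /dev/sdg3"

echo "$CANDIDATE_ORDERS" | while read -r ORDER; do
    [ -z "$ORDER" ] && continue
    mdadm --stop /dev/md0 2>/dev/null
    # --assume-clean: no initial resync, so parity/data blocks are not rewritten.
    # --run: suppress the "Continue creating array?" confirmation prompt.
    mdadm --create /dev/md0 --run --assume-clean --level=6 --raid-devices=7 \
          --metadata=1.0 --chunk=64 $ORDER
    echo "=== order: $ORDER ==="
    # Read-only check; the correct order should show by far the fewest errors.
    e2fsck -n /dev/md0 2>&1 | tail -5
done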

Also, does the layout, i.e. the distribution of parity and data chunks across the disks, change after growing the array from 5 to 7 disks, or does the re-sync re-organize them the way a native 7-disk RAID-6 would have laid them out?

Thanks


Some more mdadm output that might be helpful:

mdadm version:

[~] # mdadm --version
mdadm - v2.6.3 - 20th August 2007

mdadm details from one of the disks in the array:

[~] # mdadm --examine /dev/sda3
/dev/sda3:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x0
     Array UUID : 1c1614a5:e3be2fbb:4af01271:947fe3aa
           Name : 0
  Creation Time : Tue Jun 10 10:27:58 2014
     Raid Level : raid6
   Raid Devices : 7
  Used Dev Size : 5857395112 (2793.02 GiB 2998.99 GB)
     Array Size : 29286975360 (13965.12 GiB 14994.93 GB)
      Used Size : 5857395072 (2793.02 GiB 2998.99 GB)
   Super Offset : 5857395368 sectors
          State : clean
    Device UUID : 7c572d8f:20c12727:7e88c888:c2c357af
    Update Time : Tue Jun 10 13:01:06 2014
       Checksum : d275c82d - correct
         Events : 7036
     Chunk Size : 64K
     Array Slot : 0 (0, 1, failed, 3, failed, 5, 6)
    Array State : Uu_u_uu 2 failed

mdadm details for the array in the current disk order (my best guess, reconstructed from old log files):

[~] # mdadm --detail /dev/md0
/dev/md0:
        Version : 01.00.03
  Creation Time : Tue Jun 10 10:27:58 2014
     Raid Level : raid6
     Array Size : 14643487680 (13965.12 GiB 14994.93 GB)
  Used Dev Size : 2928697536 (2793.02 GiB 2998.99 GB)
   Raid Devices : 7
  Total Devices : 5
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Jun 10 13:01:06 2014
          State : clean, degraded
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 64K

           Name : 0
           UUID : 1c1614a5:e3be2fbb:4af01271:947fe3aa
         Events : 7036

    Number   Major   Minor   RaidDevice   State
       0       8        3        0        active sync   /dev/sda3
       1       8       19        1        active sync   /dev/sdb3
       2       0        0        2        removed
       3       8       51        3        active sync   /dev/sdd3
       4       0        0        4        removed
       5       8       99        5        active sync   /dev/sdg3
       6       8       83        6        active sync   /dev/sdf3

Output from /proc/mdstat (md8, md9, and md13 are internally used RAIDs holding swap, etc.; the one I'm after is md0):

[~] # more /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md0 : active raid6 sdf3[6] sdg3[5] sdd3[3] sdb3[1] sda3[0]
      14643487680 blocks super 1.0 level 6, 64k chunk, algorithm 2 [7/5] [UU_U_UU]

md8 : active raid1 sdg2[2](S) sdf2[3](S) sdd2[4](S) sdc2[5](S) sdb2[6](S) sda2[1] sde2[0]
      530048 blocks [2/2] [UU]

md13 : active raid1 sdg4[3] sdf4[4] sde4[5] sdd4[6] sdc4[2] sdb4[1] sda4[0]
      458880 blocks [8/7] [UUUUUUU_]
      bitmap: 21/57 pages [84KB], 4KB chunk

md9 : active raid1 sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sda1[0] sdb1[1]
      530048 blocks [8/7] [UUUUUUU_]
      bitmap: 37/65 pages [148KB], 4KB chunk

unused devices: <none>
  • When you ask "does the layout, i.e. distribution of parity and data chunks across the disks, change after the growing of the array from 5 to 7 disks" and then "or does the re-sync re-organize them in such a way a native 7-disk RAID-6 would have been?", it's actually the same question asked twice. The answer is yes and yes: the layout does change when you grow the array, and once the re-sync completes, the layout has been adjusted to be the way a native 7-disk RAID-6 would have been. Commented Jun 11, 2014 at 15:12
  • Are you saying you lost disks #1, #2, and #5 in the array at the same time? Commented Jun 11, 2014 at 15:25
  • First, I lost disk #5; that is what triggered this mess. #1 and #2 somehow got kicked out of the array at the same time, or at least were marked as "missing" by the time I installed the replacement for #5. Both drives were working just fine at the time. In the meantime, #2 has developed some read issues, so it will be replaced soon as well. #1 is perfectly fine, as far as I can tell. Commented Jun 11, 2014 at 18:15
  • That should have been #3 with the read issues in the comment above; that's why #3 and #5 are listed as "missing" in the mdadm output. Commented Jun 11, 2014 at 18:21
  • You said "best guess reconstructed from old log files"? Share the relevant portion of these old log files with us! :-) Commented Jun 12, 2014 at 15:06

1 Answer


I would suggest using the same order as the other arrays, because they were most likely created under the same conditions as the array in question.

Remember to always use --assume-clean on any create - you probably know this well enough, but it is worth re-mentioning.
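For example, a re-create along these lines (the device order shown is just a placeholder; the point is the --assume-clean flag, which stops md from running an initial resync and recalculating parity over your data):

mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=7 \
      --metadata=1.0 --chunk=64 \
      /dev/sda3 /dev/sdb3 missing /dev/sdd3 missing /dev/sdg3 /dev/sdf3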

Ideally, you should actually be working from dd images of the original drives, not the drives themselves. I realise things aren't always ideal :-)
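A sketch of that, assuming you have somewhere with roughly 3 TB of free space per image (the /share/backup path is a placeholder):

# Image each surviving member partition to a file.
for d in sda3 sdb3 sdd3 sdf3 sdg3; do
    dd if=/dev/$d of=/share/backup/$d.img bs=1M conv=noerror,sync
done

# Attach the images as loop devices and run the mdadm experiments
# against /dev/loop1 .. /dev/loop5 instead of the real partitions.
i=1
for d in sda3 sdb3 sdd3 sdf3 sdg3; do
    losetup /dev/loop$i /share/backup/$d.img
    i=$((i + 1))
done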

Finally, if you can, mount with "mount -o ro" for just another level of "don't write to the drives, please" security :-)
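For instance (the mount point is a placeholder, and "noload" is an extra ext4 option beyond plain ro that skips journal replay, so the mount itself cannot write to the device):

mkdir -p /mnt/recovery
mount -t ext4 -o ro,noload /dev/md0 /mnt/recovery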

  • Good point on --assume-clean; so far I have religiously used it when creating the array. As for using the same order as the other disks: md9 and md13 already have different orders (md9: abcdefg, md13: abdgfed), so which one should I pick? I tried running e2fsck -n /dev/md0 in both configurations, and both turned up a TON of checksum errors, etc. Commented Jun 11, 2014 at 18:20
  • It just strikes me that mdadm --monitor normally emails root when the RAID suffers a failure. You should have that email; it contains the original disk layout. cat /var/mail/root or check your email logs? Commented Jun 12, 2014 at 15:03
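A quick way to check for those mails (a sketch; the mailbox path and subject wording are assumptions, since QNAP may deliver the monitor mail somewhere else or not run mdadm --monitor at all):

# mdadm --monitor mails have subjects like "Fail event on /dev/md0:..."
# and normally include a copy of /proc/mdstat from the time of the
# failure, i.e. the original device order.
grep -B 2 -A 20 "event on /dev/md0" /var/mail/root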
