39

Folks please help - I am a newb with a major headache at hand (perfect storm situation).

I have three 1 TB HDDs in my Ubuntu 11.04 box configured as software RAID 5. The data had been backed up weekly onto a separate external hard drive until that drive failed completely and was thrown away. A few days back we had a power outage, and after rebooting my box wouldn't mount the RAID. In my infinite wisdom I entered

mdadm --create -f... 

command instead of

mdadm --assemble 

and didn't notice the travesty I had done until after. The command started the array degraded and proceeded to build and sync it, which took ~10 hours. After I got back, I saw that the array was successfully up and running, but the RAID was not.

I mean the individual drives are partitioned (partition type fd), but the md0 device is not. Realizing in horror what I had done, I am trying to find some solutions. I just pray that --create didn't overwrite the entire contents of the hard drives.

Could someone PLEASE help me out with this - the data that's on the drive is very important and unique ~10 years of photos, docs, etc.

Is it possible that specifying the participating hard drives in the wrong order made mdadm overwrite them? When I do

mdadm --examine --scan 

I get something like:

ARRAY /dev/md/0 metadata=1.2 UUID=f1b4084a:720b5712:6d03b9e9:43afe51b name=<hostname>:0

Interestingly enough, the name used to be 'raid' and not the hostname with :0 appended.

Here are the 'sanitized' config entries:

DEVICE /dev/sdf1 /dev/sde1 /dev/sdd1
CREATE owner=root group=disk mode=0660 auto=yes
HOMEHOST <system>
MAILADDR root
ARRAY /dev/md0 metadata=1.2 name=tanserv:0 UUID=f1b4084a:720b5712:6d03b9e9:43afe51b

Here is the output from mdstat:

cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdd1[0] sdf1[3] sde1[1]
      1953517568 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>

fdisk shows the following:

fdisk -l

Disk /dev/sda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000bf62e

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        9443    75846656   83  Linux
/dev/sda2            9443        9730     2301953    5  Extended
/dev/sda5            9443        9730     2301952   82  Linux swap / Solaris

Disk /dev/sdb: 750.2 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000de8dd

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       91201   732572001   8e  Linux LVM

Disk /dev/sdc: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00056a17

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       60801   488384001   8e  Linux LVM

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000ca948

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1      121601   976760001   fd  Linux raid autodetect

Disk /dev/dm-0: 1250.3 GB, 1250254913536 bytes
255 heads, 63 sectors/track, 152001 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/dm-0 doesn't contain a valid partition table

Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x93a66687

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1      121601   976760001   fd  Linux raid autodetect

Disk /dev/sdf: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xe6edc059

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1               1      121601   976760001   fd  Linux raid autodetect

Disk /dev/md0: 2000.4 GB, 2000401989632 bytes
2 heads, 4 sectors/track, 488379392 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 524288 bytes / 1048576 bytes
Disk identifier: 0x00000000

Disk /dev/md0 doesn't contain a valid partition table

Per suggestions, I cleaned up the superblocks and re-created the array with the --assume-clean option, but with no luck at all.

Is there any tool that will help me revive at least some of the data? Can someone tell me what mdadm --create does when it syncs, and how it destroys the data, so I can write a tool to undo whatever was done?

After re-creating the RAID I ran fsck.ext4 /dev/md0, and here is the output:

root@tanserv:/etc/mdadm# fsck.ext4 /dev/md0
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Superblock invalid, trying backup blocks...
fsck.ext4: Bad magic number in super-block while trying to open /dev/md0

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193


Per Shane's suggestion, I tried

root@tanserv:/home/mushegh# mkfs.ext4 -n /dev/md0
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=128 blocks, Stripe width=256 blocks
122101760 inodes, 488379392 blocks
24418969 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=0
14905 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632,
        2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616,
        78675968, 102400000, 214990848

and ran fsck.ext4 with every backup block, but all returned the following:

root@tanserv:/home/mushegh# fsck.ext4 -b 214990848 /dev/md0
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Invalid argument while trying to open /dev/md0

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
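For reference, here is how the full sweep over backup superblocks can be scripted (block numbers copied from the mkfs.ext4 -n output above). One guess at the "Invalid argument" error is a block-size mismatch, so -B 4096, matching the block size mkfs assumed, is included; -n keeps the check read-only. This is just a sketch of what I ran, not a fix in itself.

```python
# Sketch: print an e2fsck command for each backup superblock that
# "mkfs.ext4 -n" reported. -n makes the check read-only; -B 4096 matches
# the 4096-byte block size mkfs assumed (a guess at why plain -b failed
# with "Invalid argument").
backups = [32768, 98304, 163840, 229376, 294912, 819200, 884736,
           1605632, 2654208, 4096000, 7962624, 11239424, 20480000,
           23887872, 71663616, 78675968, 102400000, 214990848]

for b in backups:
    print(f"fsck.ext4 -n -b {b} -B 4096 /dev/md0")
```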

Any suggestions?

Regards!

4
  • 1
    Perhaps one day people may realise why RAID5 is a terrible idea. Until then, 1) people will lose data. 2) We'll get questions like these. Commented Jan 7, 2012 at 11:13
  • 12
    @Tom O'Connor ... because clearly, RAID5 is to blame for user error. :rolleyes: Commented Jan 7, 2012 at 12:04
  • 2
    Hopefully, Shane's answer can save the data, but, again, proof why RAID alone is not best for storage. Need backups too. (but +1 for the question and epic answer that resulted) Commented Jan 8, 2012 at 12:23
  • 4
    I know it gets repeated often, but raid is not a backup solution. The message really needs driving home. Commented Jan 18, 2012 at 7:53

5 Answers

97

Ok - something was bugging me about your issue, so I fired up a VM to dive into the behavior that should be expected. I'll get to what was bugging me in a minute; first let me say this:

Back up these drives before attempting anything!!

You may have already done damage beyond what the resync did; can you clarify what you meant when you said:

Per suggestions I did clean up the superblocks and re-created the array with --assume-clean option but with no luck at all.

If you ran a mdadm --misc --zero-superblock, then you should be fine.

Anyway, scavenge up some new disks and grab exact current images of them before doing anything at all that might do any more writing to these disks.

dd if=/dev/sdd of=/path/to/store/sdd.img 
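The same imaging can be scripted for all three members. A sketch, not part of the original advice; the device names match the asker's array, but the target path is a placeholder, and conv=noerror,sync keeps dd going past read errors:

```python
# Sketch: build the dd commands needed to image each RAID member disk
# before experimenting. Target directory is a placeholder.
def dd_commands(disks, target):
    cmds = []
    for disk in disks:
        img = f"{target}/{disk.rsplit('/', 1)[-1]}.img"
        cmds.append(f"dd if={disk} of={img} bs=4M conv=noerror,sync")
    return cmds

for cmd in dd_commands(["/dev/sdd", "/dev/sde", "/dev/sdf"], "/path/to/store"):
    print(cmd)
```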

That being said.. it looks like data stored on these things is shockingly resilient to wayward resyncs. Read on, there is hope, and this may be the day that I hit the answer length limit.


The Best Case Scenario

I threw together a VM to recreate your scenario. The drives are just 100 MB so I wouldn't be waiting forever on each resync, but this should be a pretty accurate representation otherwise.

Built the array as generically and default as possible - 512k chunks, left-symmetric layout, disks in letter order.. nothing special.

root@test:~# mdadm --create /dev/md0 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdd1[3] sdc1[1] sdb1[0]
      203776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>

So far, so good; let's make a filesystem, and put some data on it.

root@test:~# mkfs.ext4 /dev/md0
mke2fs 1.41.14 (22-Dec-2010)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=512 blocks, Stripe width=1024 blocks
51000 inodes, 203776 blocks
10188 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
25 block groups
8192 blocks per group, 8192 fragments per group
2040 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729

Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 30 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
root@test:~# mkdir /mnt/raid5
root@test:~# mount /dev/md0 /mnt/raid5
root@test:~# echo "data" > /mnt/raid5/datafile
root@test:~# dd if=/dev/urandom of=/mnt/raid5/randomdata count=10000
10000+0 records in
10000+0 records out
5120000 bytes (5.1 MB) copied, 0.706526 s, 7.2 MB/s
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064  /mnt/raid5/randomdata

Ok. We've got a filesystem and some data ("data" in datafile, and 5MB worth of random data with that SHA1 hash in randomdata) on it; let's see what happens when we do a re-create.

root@test:~# umount /mnt/raid5
root@test:~# mdadm --stop /dev/md0
mdadm: stopped /dev/md0
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 21:07:06 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 21:07:06 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 21:07:06 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdd1[2] sdc1[1] sdb1[0]
      203776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>

The resync finished very quickly with these tiny disks, but it did occur. So here's what was bugging me from earlier; your fdisk -l output. Having no partition table on the md device is not a problem at all, it's expected. Your filesystem resides directly on the fake block device with no partition table.

root@test:~# fdisk -l
...
Disk /dev/md1: 208 MB, 208666624 bytes
2 heads, 4 sectors/track, 50944 cylinders, total 407552 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 524288 bytes / 1048576 bytes
Disk identifier: 0x00000000

Disk /dev/md1 doesn't contain a valid partition table

Yeah, no partition table. But...

root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
/dev/md1: clean, 12/51000 files, 12085/203776 blocks

Perfectly valid filesystem, after a resync. So that's good; let's check on our data files:

root@test:~# mount /dev/md1 /mnt/raid5/
root@test:~# cat /mnt/raid5/datafile
data
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064  /mnt/raid5/randomdata

Solid - no data corruption at all! But this is with the exact same settings, so nothing was mapped differently between the two RAID groups. Let's drop this thing down before we try to break it.

root@test:~# umount /mnt/raid5
root@test:~# mdadm --stop /dev/md1

Taking a Step Back

Before we try to break this, let's talk about why it's hard to break. RAID 5 works by using a parity block that protects an area the same size as the block on every other disk in the array. The parity isn't just on one specific disk, it's rotated around the disks evenly to better spread read load out across the disks in normal operation.

The XOR operation to calculate the parity looks like this:

DISK1  DISK2  DISK3  DISK4  PARITY
  1      0      1      1   =   1
  0      0      1      1   =   0
  1      1      1      1   =   0

So, the parity is spread out among the disks.

DISK1   DISK2   DISK3   DISK4   DISK5
DATA    DATA    DATA    DATA    PARITY
PARITY  DATA    DATA    DATA    DATA
DATA    PARITY  DATA    DATA    DATA

A resync is typically done when replacing a dead or missing disk; it's also done on mdadm create to assure that the data on the disks aligns with what the RAID's geometry is supposed to look like. In that case, the last disk in the array spec is the one that is 'synced to' - all of the existing data on the other disks is used for the sync.

So, all of the data on the 'new' disk is wiped out and rebuilt; either building fresh data blocks out of parity blocks for what should have been there, or else building fresh parity blocks.

What's cool is that the procedure for both of those things is the exact same: an XOR operation across the data from the rest of the disks. The resync process in this case may have in its layout that a certain block should be a parity block, and think it's building a new parity block, when in fact it's re-creating an old data block. So even if it thinks it's building this:

DISK1   DISK2   DISK3   DISK4   DISK5
PARITY  DATA    DATA    DATA    DATA
DATA    PARITY  DATA    DATA    DATA
DATA    DATA    PARITY  DATA    DATA

...it may just be rebuilding DISK5 from the layout above.

So, it's possible for data to stay consistent even if the array's built wrong.
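The symmetry between "rebuild a data block" and "rebuild a parity block" can be demonstrated in a few lines; this is just an illustrative sketch of the XOR property, not anything mdadm-specific:

```python
# Sketch: RAID5 parity is a plain XOR across the other disks' blocks.
# Rebuilding a lost data block and computing a parity block are the same
# operation, which is why a wrong-order create can still leave the data
# recoverable.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

d1 = bytes([0b1010_1010] * 4)
d2 = bytes([0b1100_1100] * 4)
d3 = bytes([0b1111_0000] * 4)

parity = xor_blocks([d1, d2, d3])

# "Lose" d2 and rebuild it from the survivors plus parity:
rebuilt = xor_blocks([d1, d3, parity])
assert rebuilt == d2
```

XOR-ing any block of the stripe out of the others works the same way, whether mdadm thinks of it as data or parity.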


Throwing a Monkey in the Works

(not a wrench; the whole monkey)

Test 1:

Let's make the array in the wrong order! sdc, then sdd, then sdb..

root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdc1 /dev/sdd1 /dev/sdb1
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 23:06:34 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 23:06:34 2012
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 23:06:34 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdb1[3] sdd1[1] sdc1[0]
      203776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>

Ok, that's all well and good. Do we have a filesystem?

root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Superblock invalid, trying backup blocks...
fsck.ext4: Bad magic number in super-block while trying to open /dev/md1

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

Nope! Why is that? Because while the data's all there, it's in the wrong order; what was once 512KB of A, then 512KB of B, A, B, and so forth, has now been shuffled to B, A, B, A. The disk now looks like gibberish to the filesystem checker, so it won't run. The output of mdadm --misc -D /dev/md1 gives us more detail; it looks like this:

Number   Major   Minor   RaidDevice State
   0       8       33        0      active sync   /dev/sdc1
   1       8       49        1      active sync   /dev/sdd1
   3       8       17        2      active sync   /dev/sdb1

When it should look like this:

Number   Major   Minor   RaidDevice State
   0       8       17        0      active sync   /dev/sdb1
   1       8       33        1      active sync   /dev/sdc1
   3       8       49        2      active sync   /dev/sdd1

So, that's all well and good. We overwrote a whole bunch of data blocks with new parity blocks this time out. Re-create, with the right order now:

root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 23:11:08 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 23:11:08 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 23:11:08 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
/dev/md1: clean, 12/51000 files, 12085/203776 blocks

Neat, there's still a filesystem there! Still got data?

root@test:~# mount /dev/md1 /mnt/raid5/
root@test:~# cat /mnt/raid5/datafile
data
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064  /mnt/raid5/randomdata

Success!

Test 2

Ok, let's change the chunk size and see if that gets us some brokenness.

root@test:~# umount /mnt/raid5
root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --create /dev/md1 --chunk=64 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 23:21:19 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 23:21:19 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 23:21:19 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Superblock invalid, trying backup blocks...
fsck.ext4: Bad magic number in super-block while trying to open /dev/md1

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

Yeah, yeah, it's hosed when set up like this. But, can we recover?

root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 23:21:51 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 23:21:51 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 23:21:51 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
/dev/md1: clean, 12/51000 files, 12085/203776 blocks
root@test:~# mount /dev/md1 /mnt/raid5/
root@test:~# cat /mnt/raid5/datafile
data
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064  /mnt/raid5/randomdata

Success, again!

Test 3

This is the one that I thought would kill data for sure - let's do a different layout algorithm!

root@test:~# umount /mnt/raid5
root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --layout=right-asymmetric --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 23:32:34 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 23:32:34 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 23:32:34 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdd1[3] sdc1[1] sdb1[0]
      203776 blocks super 1.2 level 5, 512k chunk, algorithm 1 [3/3] [UUU]

unused devices: <none>
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Superblock invalid, trying backup blocks...
Superblock has an invalid journal (inode 8).

Scary and bad - it thinks it found something and wants to do some fixing! Ctrl+C!

Clear<y>? cancelled!

fsck.ext4: Illegal inode number while checking ext3 journal for /dev/md1

Ok, crisis averted. Let's see if the data's still intact after resyncing with the wrong layout:

root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 23:33:02 2012
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 23:33:02 2012
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Sat Jan 7 23:33:02 2012
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
/dev/md1: clean, 12/51000 files, 12085/203776 blocks
root@test:~# mount /dev/md1 /mnt/raid5/
root@test:~# cat /mnt/raid5/datafile
data
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064  /mnt/raid5/randomdata

Success!

Test 4

Let's also just prove real quick that superblock zeroing isn't harmful:

root@test:~# umount /mnt/raid5
root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --misc --zero-superblock /dev/sdb1 /dev/sdc1 /dev/sdd1
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
/dev/md1: clean, 12/51000 files, 12085/203776 blocks
root@test:~# mount /dev/md1 /mnt/raid5/
root@test:~# cat /mnt/raid5/datafile
data
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064  /mnt/raid5/randomdata

Yeah, no big deal.

Test 5

Let's just throw everything we've got at it. All 4 previous tests, combined.

  • Wrong device order
  • Wrong chunk size
  • Wrong layout algorithm
  • Zeroed superblocks (we'll do this between both creations)

Onward!

root@test:~# umount /mnt/raid5
root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@test:~# mdadm --misc --zero-superblock /dev/sdb1 /dev/sdc1 /dev/sdd1
root@test:~# mdadm --create /dev/md1 --chunk=64 --level=5 --raid-devices=3 --layout=right-symmetric /dev/sdc1 /dev/sdd1 /dev/sdb1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdb1[3] sdd1[1] sdc1[0]
      204672 blocks super 1.2 level 5, 64k chunk, algorithm 3 [3/3] [UUU]

unused devices: <none>
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
fsck.ext4: Superblock invalid, trying backup blocks...
fsck.ext4: Bad magic number in super-block while trying to open /dev/md1

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
root@test:~# mdadm --stop /dev/md1
mdadm: stopped /dev/md1

The verdict?

root@test:~# mdadm --misc --zero-superblock /dev/sdb1 /dev/sdc1 /dev/sdd1
root@test:~# mdadm --create /dev/md1 --chunk=512 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@test:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdd1[3] sdc1[1] sdb1[0]
      203776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>
root@test:~# fsck.ext4 /dev/md1
e2fsck 1.41.14 (22-Dec-2010)
/dev/md1: clean, 13/51000 files, 17085/203776 blocks
root@test:~# mount /dev/md1 /mnt/raid5/
root@test:~# cat /mnt/raid5/datafile
data
root@test:~# sha1sum /mnt/raid5/randomdata
847685a5d42524e5b1d5484452a649e854b59064  /mnt/raid5/randomdata

Wow.

So, it looks like none of these actions corrupted data in any way. I was quite surprised by this result, frankly; I expected moderate odds of data loss on the chunk size change, and some definite loss on the layout change. I learned something today.


So .. How do I get my data??

Any information you have about the old system would be extremely helpful. Do you know the filesystem type? Do you have any old copies of your /proc/mdstat with information on drive order, algorithm, chunk size, and metadata version? Do you have mdadm's email alerts set up? If so, find an old one; if not, check /var/spool/mail/root. Check your ~/.bash_history to see if your original create command is in there.

So, the list of things that you should do:

  1. Back up the disks with dd before doing anything!!
  2. Try to fsck the current, active md - you may have just happened to build in the same order as before. If you know the filesystem type, that's helpful; use that specific fsck tool. If any of the tools offer to fix anything, don't let them unless you're sure they've actually found the valid filesystem - when in doubt, leave a comment and ask whether a proposed fix is actually helping or about to nuke data.
  3. Try building the array with different parameters. If you have an old /proc/mdstat, then you can just mimic what it shows; if not, then you're kinda in the dark - trying all of the different drive orders is reasonable, but checking every possible chunk size with every possible order is futile. For each, fsck it to see if you get anything promising.
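Trying the drive orders systematically is easy to script. A sketch (not a tool from the answer): it only prints candidate re-create commands, using the asker's member partitions; the chunk list is an example, and the original array used 512k, so try that first. Run fsck after each attempt, and only on imaged copies if at all possible.

```python
# Sketch: enumerate candidate "mdadm --create --assume-clean" commands for
# every drive order (and, optionally, several chunk sizes). Nothing is
# executed; the commands are just printed for manual, careful use.
from itertools import permutations

devices = ["/dev/sdd1", "/dev/sde1", "/dev/sdf1"]
chunks = [512]  # extend with e.g. 64, 128, 256 only if the original is unknown

candidates = []
for chunk in chunks:
    for order in permutations(devices):
        candidates.append(
            f"mdadm --create /dev/md0 --assume-clean --chunk={chunk} "
            f"--level=5 --raid-devices=3 " + " ".join(order))

for cmd in candidates:  # 6 orders per chunk size for 3 drives
    print(cmd)
```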

So, that's that. Sorry for the novel, feel free to leave a comment if you have any questions, and good luck!

footnote: under 22 thousand characters; 8k+ shy of the length limit

27
  • 9
    That is one amazing answer. Commented Jan 8, 2012 at 8:55
  • 4
    I don't even know what to say... BRAVO!!! Kudos to Shane Madden. I am going to backup the disks and get started with your suggestions. Thanks, no really a big thanks!!! Commented Jan 8, 2012 at 9:55
  • 3
    I just...wow. Brilliant answer. I think the only answer to break the 30,000 character limit is Evan Andersons "How Does Subnetting Work" essay. Commented Jan 8, 2012 at 12:21
  • 4
    Best answer on SF ever as far as I'm concerned. Commented Jan 8, 2012 at 13:24
  • 14
    You, sir, win the internet. Commented Jan 8, 2012 at 20:23
6

I had a similar problem:
after a failure of a software RAID5 array I fired mdadm --create without giving it --assume-clean, and could not mount the array anymore. After two weeks of digging I finally restored all data. I hope the procedure below will save someone's time.

Long Story Short

The problem was caused by the fact that mdadm --create made a new array that was different from the original in two aspects:

  • different order of partitions
  • different RAID data offset

As shown in the brilliant answer by Shane Madden, mdadm --create does not destroy the data in most cases! After finding the partition order and data offset, I could restore the array and extract all data from it.

Prerequisites

I had no backups of RAID superblocks, so all I knew was that it was a RAID5 array on 8 partitions created during installation of Xubuntu 12.04.0. It had an ext4 filesystem. Another important piece of knowledge was a copy of a file that was also stored on the RAID array.

Tools

Xubuntu 12.04.1 live CD was used to do all the work. Depending on your situation, you might need some of the following tools:

a version of mdadm that allows specifying the data offset

sudo apt-get install binutils-dev git
git clone -b data_offset git://neil.brown.name/mdadm
cd mdadm
make

bgrep - searching for binary data

curl -L 'https://github.com/tmbinc/bgrep/raw/master/bgrep.c' | gcc -O2 -x c -o bgrep - 

hexdump, e2fsck, mount and a hexadecimal calculator - standard tools from repos

Start with Full Backup

Naming of device files, e.g. /dev/sda2 /dev/sdb2 etc., is not persistent, so it's better to write down your drives' serial numbers given by

sudo hdparm -I /dev/sda 

Then hook up an external HDD and back up every partition of your RAID array like this:

sudo dd if=/dev/sda2 bs=4M | gzip > serial-number.gz 

Determine Original RAID5 Layout

Various layouts are described here: http://www.accs.com/p_and_p/RAID/LinuxRAID.html
To find how stripes of data were organized on the original array, you need a copy of a random-looking file that you know was stored on the array. The default chunk size currently used by mdadm is 512KB. For an array of N partitions, you need a file of size at least (N+1)*512KB. A jpeg or video is good, as it provides relatively unique substrings of binary data. Suppose our file is called picture.jpg. We read 32 bytes of data at N+1 positions, starting from 100k and incrementing by 512k:

hexdump -n32 -s100k -v -e '/1 "%02X"' picture.jpg ; echo
DA1DC4D616B1C71079624CDC36E3D40E7B1CFF00857C663687B6C4464D6C77D2
hexdump -n32 -s612k -v -e '/1 "%02X"' picture.jpg ; echo
AB9DDDBBB05CA915EE2289E59A116B02A26C82C8A8033DD8FA6D06A84B6501B7
hexdump -n32 -s1124k -v -e '/1 "%02X"' picture.jpg ; echo
BC31A8DC791ACDA4FA3E9D3406D5639619576AEE2E08C03C9EF5E23F0A7C5CBA
...
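The same sampling can be scripted instead of running hexdump by hand. A sketch; the file path is a placeholder, and each returned hex string is what you would feed to bgrep:

```python
# Sketch: read a 32-byte signature from the reference file every 512 KB,
# starting at 100 KB - the same thing the hexdump commands above do.
def signatures(path, count, start=100 * 1024, step=512 * 1024, size=32):
    sigs = []
    with open(path, "rb") as f:
        for n in range(count):
            f.seek(start + n * step)
            sigs.append(f.read(size).hex().upper())
    return sigs
```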

We then search for occurrences of all of these bytestrings on all of our raw partitions, so in total (N+1)*N commands, like this:

sudo ./bgrep DA1DC4D616B1C71079624CDC36E3D40E7B1CFF00857C663687B6C4464D6C77D2 /dev/sda2
sudo ./bgrep DA1DC4D616B1C71079624CDC36E3D40E7B1CFF00857C663687B6C4464D6C77D2 /dev/sdb2
...
sudo ./bgrep DA1DC4D616B1C71079624CDC36E3D40E7B1CFF00857C663687B6C4464D6C77D2 /dev/sdh2
/dev/sdh2: 52a7ff000
sudo ./bgrep AB9DDDBBB05CA915EE2289E59A116B02A26C82C8A8033DD8FA6D06A84B6501B7 /dev/sda2
...
/dev/sdb2: 52a87f000
...

These commands can be run in parallel for different disks. Scan of a 38GB partition took around 12 minutes. In my case, every 32-byte string was found only once among all eight drives. By comparing offsets returned by bgrep you obtain a picture like this:

| offset \ partition | b | d | c | e | f | g | a | h |
|--------------------+---+---+---+---+---+---+---+---|
| 52a7ff000          | P |   |   |   |   |   |   | 1 |
| 52a87f000          | 2 | 3 | 4 | 5 | 6 | 7 | 8 | P |
| 52a8ff000          |   |   |   |   |   |   | P | 9 |

We see a normal left-symmetric layout, which is the default for mdadm. More importantly, now we know the order of partitions. However, we don't know which partition is the first in the array, as they can be cyclically shifted.

Note also the distance between found offsets. In my case it was 512KB. The chunk size can actually be smaller than this distance, in which case the actual layout will be different.

Find Original Chunk Size

We use the same file picture.jpg to read 32 bytes of data at different intervals from each other. We know from above that the data at offset 100k lies on /dev/sdh2, at offset 612k on /dev/sdb2, and at 1124k on /dev/sdd2. This shows that the chunk size is not bigger than 512KB. We now verify that it is not smaller than 512KB. For this we dump the bytestring at offset 356k and look at which partition it sits on:

hexdump -n32 -s356k -v -e '/1 "%02X"' P1080801.JPG ; echo
7EC528AD0A8D3E485AE450F88E56D6AEB948FED7E679B04091B031705B6AFA7A
sudo ./bgrep 7EC528AD0A8D3E485AE450F88E56D6AEB948FED7E679B04091B031705B6AFA7A /dev/sdb2
/dev/sdb2: 52a83f000

It is on the same partition as offset 612k, which indicates that the chunk size is not 256KB. We eliminate smaller chunk sizes in a similar fashion. I ended up with 512KB chunks being the only possibility.

Find First Partition in Layout

Now we know the order of partitions, but we don't know which partition should be the first, and which RAID data offset was used. To find these two unknowns, we will create a RAID5 array with correct chunk layout and a small data offset, and search for the start of our file system in this new array.

To begin with, we create an array with the correct order of partitions, which we found earlier:

sudo mdadm --stop /dev/md126
sudo mdadm --create /dev/md126 --assume-clean --raid-devices=8 --level=5 /dev/sdb2 /dev/sdd2 /dev/sdc2 /dev/sde2 /dev/sdf2 /dev/sdg2 /dev/sda2 /dev/sdh2

We verify that the order is obeyed by issuing

sudo mdadm --misc -D /dev/md126
...
    Number   Major   Minor   RaidDevice State
       0       8       18        0      active sync   /dev/sdb2
       1       8       50        1      active sync   /dev/sdd2
       2       8       34        2      active sync   /dev/sdc2
       3       8       66        3      active sync   /dev/sde2
       4       8       82        4      active sync   /dev/sdf2
       5       8       98        5      active sync   /dev/sdg2
       6       8        2        6      active sync   /dev/sda2
       7       8      114        7      active sync   /dev/sdh2

Now we determine the offsets of the N+1 known bytestrings in the RAID array. I ran a script overnight (the Live CD doesn't ask for a password on sudo :):

#!/bin/bash
echo "1st:"
sudo ./bgrep DA1DC4D616B1C71079624CDC36E3D40E7B1CFF00857C663687B6C4464D6C77D2 /dev/md126
echo "2nd:"
sudo ./bgrep AB9DDDBBB05CA915EE2289E59A116B02A26C82C8A8033DD8FA6D06A84B6501B7 /dev/md126
echo "3rd:"
sudo ./bgrep BC31A8DC791ACDA4FA3E9D3406D5639619576AEE2E08C03C9EF5E23F0A7C5CBA /dev/md126
...
echo "9th:"
sudo ./bgrep 99B5A96F21BB74D4A630C519B463954EC096E062B0F5E325FE8D731C6D1B4D37 /dev/md126

Output with comments:

1st: /dev/md126: 2428fff000 # 1st
2nd: /dev/md126: 242947f000 # 480000 after 1st
3rd:                        # 3rd not found
4th: /dev/md126: 242917f000 # 180000 after 1st
5th: /dev/md126: 24291ff000 # 200000 after 1st
6th: /dev/md126: 242927f000 # 280000 after 1st
7th: /dev/md126: 24292ff000 # 300000 after 1st
8th: /dev/md126: 242937f000 # 380000 after 1st
9th: /dev/md126: 24297ff000 # 800000 after 1st

Based on this data we see that the 3rd string was not found. This means that the chunk holding it, on /dev/sdd2, is used for parity. Here is an illustration of the parity positions in the new array:

| offset \ partition | b | d | c | e | f | g | a | h |
|--------------------+---+---+---+---+---+---+---+---|
| 52a7ff000          |   |   | P |   |   |   |   | 1 |
| 52a87f000          | 2 | P | 4 | 5 | 6 | 7 | 8 |   |
| 52a8ff000          | P |   |   |   |   |   |   | 9 |

Our aim is to deduce which partition to start the array from, in order to shift the parity chunks into the right place. Since parity should be shifted two chunks to the left, the partition sequence should be shifted two steps to the right. Thus the correct layout for this data offset is ahbdcefg:
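The rotation is mechanical; a bash sketch that shifts the discovered order (bdcefgah) two steps to the right:

```shell
# Parity must move two chunks left, so rotate the partition order two right.
order=(b d c e f g a h)   # order discovered with bgrep
steps=2
n=${#order[@]}
rotated=()
for (( i = 0; i < n; i++ )); do
    rotated+=( "${order[(i - steps + n) % n]}" )
done
echo "${rotated[@]}"   # a h b d c e f g -> layout "ahbdcefg"
```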

sudo mdadm --stop /dev/md126
sudo mdadm --create /dev/md126 --assume-clean --raid-devices=8 --level=5 /dev/sda2 /dev/sdh2 /dev/sdb2 /dev/sdd2 /dev/sdc2 /dev/sde2 /dev/sdf2 /dev/sdg2

At this point our RAID array contains the data in the right form. You might be lucky enough that the RAID data offset is the same as it was in the original array, in which case you will most likely be able to mount the partition. Unfortunately, this was not my case.

Verify Data Consistency

We verify that the data is consistent over a strip of chunks by extracting a copy of picture.jpg from the array. For this we locate the offset for the 32-byte string at 100k:

sudo ./bgrep DA1DC4D616B1C71079624CDC36E3D40E7B1CFF00857C663687B6C4464D6C77D2 /dev/md126 

We then subtract 100*1024 from the result and use the obtained decimal value as the skip= parameter for dd. count= is the size of picture.jpg in bytes:

sudo dd if=/dev/md126 of=./extract.jpg bs=1 skip=155311300608 count=4536208 

Check that extract.jpg is the same as picture.jpg.
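The skip arithmetic can be scripted; a sketch using 0x2428fff000 (the 1st-string hit from the scan above) as a stand-in; substitute the value bgrep actually returns on your array:

```shell
# skip = bgrep hit minus the 100 KiB sample offset into picture.jpg.
hit=$(( 0x2428fff000 ))          # stand-in value; use your own bgrep result
skip=$(( hit - 100 * 1024 ))
echo "$skip"                     # 155306582016
# Then extract and compare (commented out; needs the real array):
# sudo dd if=/dev/md126 of=./extract.jpg bs=1 skip=$skip count=$(stat -c%s picture.jpg)
# cmp extract.jpg picture.jpg && echo "strip is consistent"
```

Note that bs=1 makes dd copy a byte at a time, which is slow; when the offset is block-aligned, a larger bs with skip expressed in blocks is much faster.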

Find RAID Data Offset

A side note: the default data offset for mdadm version 3.2.3 is 2048 sectors, but this value has changed over time. If the original array used a smaller data offset than your current mdadm, then mdadm --create without --assume-clean can overwrite the beginning of the file system.

In the previous section we created a RAID array. Check which RAID data offset it uses by examining some of the individual partitions:

sudo mdadm --examine /dev/sdb2
...
    Data Offset : 2048 sectors
...

2048 512-byte sectors is 1MB. Since the chunk size is 512KB, the current data offset is two chunks.
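Spelled out as arithmetic:

```shell
echo $(( 2048 * 512 / 1024 ))          # 1024 KiB = 1 MB data offset
echo $(( 2048 * 512 / (512 * 1024) ))  # 2 chunks of 512 KiB
```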

If at this point you have a two-chunk offset, it is probably small enough, and you can skip this paragraph.

We create a RAID5 array with a data offset of one 512KB chunk. Starting one chunk earlier shifts the parity one step to the left, so we compensate by shifting the partition sequence one step to the left. Hence for a 512KB data offset, the correct layout is hbdcefga. We use a version of mdadm that supports setting the data offset (see the Tools section). It takes the offset in kilobytes:

sudo mdadm --stop /dev/md126
sudo ./mdadm --create /dev/md126 --assume-clean --raid-devices=8 --level=5 /dev/sdh2:512 /dev/sdb2:512 /dev/sdd2:512 /dev/sdc2:512 /dev/sde2:512 /dev/sdf2:512 /dev/sdg2:512 /dev/sda2:512

Now we search for a valid ext4 superblock. The superblock structure can be found here: https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#The_Super_Block
We scan the beginning of the array for occurrences of the magic number s_magic followed by s_state and s_errors. The bytestrings to look for are:

53EF01000100 53EF00000100 53EF02000100 53EF01000200 53EF02000200 

Example command:

sudo ./bgrep 53EF01000100 /dev/md126
/dev/md126: 0dc80438

The magic number starts 0x38 bytes into the superblock, so we subtract 0x38 to calculate the offset and examine the entire superblock:
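The subtraction, spelled out in shell arithmetic with the hit from above:

```shell
# The ext4 magic lives 0x38 bytes into the superblock, so the superblock
# containing the hit at 0x0dc80438 starts at:
printf '0x%X\n' $(( 0x0dc80438 - 0x38 ))   # 0xDC80400
```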

sudo hexdump -n84 -s0xDC80400 -v /dev/md126
dc80400 2000 00fe 1480 03f8 cdd3 0032 d2b2 0119
dc80410 ab16 00f7 0000 0000 0002 0000 0002 0000
dc80420 8000 0000 8000 0000 2000 0000 b363 51bd
dc80430 e406 5170 010d ffff ef53 0001 0001 0000
dc80440 3d3a 50af 0000 0000 0000 0000 0001 0000
dc80450 0000 0000

This seems to be a valid superblock. The s_log_block_size field at 0x18 is 0002, meaning that the block size is 2^(10+2) = 4096 bytes. s_blocks_count_lo at 0x4 is 03f81480 blocks, which is 254GB. Looks good.
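Both decodings can be double-checked with shell arithmetic (field values taken from the hexdump above):

```shell
# s_log_block_size = 2  ->  block size = 2^(10+2) bytes
echo $(( 1 << (10 + 2) ))                      # 4096
# s_blocks_count_lo = 0x03f81480 blocks of 4 KiB, converted to GiB (2^30):
echo $(( 0x03f81480 * 4096 / 1073741824 ))     # 254
```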

We now scan for occurrences of the first bytes of the superblock to find its copies. Note the byte order is flipped compared to the hexdump output:

sudo ./bgrep 0020fe008014f803d3cd3200 /dev/md126
/dev/md126: 0dc80400    # offset by 1024 bytes from the start of the FS
/dev/md126: 15c80000    # 32768 blocks from FS start
/dev/md126: 25c80000    # 98304
/dev/md126: 35c80000    # 163840
/dev/md126: 45c80000    # 229376
/dev/md126: 55c80000    # 294912
/dev/md126: d5c80000    # 819200
/dev/md126: e5c80000    # 884736
/dev/md126: 195c80000
/dev/md126: 295c80000

This aligns perfectly with the expected positions of backup superblocks:

sudo mke2fs -n /dev/md126
...
Block size=4096 (log=2)
...
Superblock backups stored on blocks:
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
    4096000, 7962624, 11239424, 20480000, 23887872
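Each hit can be predicted from the FS start (0xdc80400 minus the 1024-byte ext4 superblock offset, i.e. 0xdc80000) plus the backup block number times the 4096-byte block size; for block 32768:

```shell
# Expected array offset of the backup superblock at FS block 32768:
printf '0x%x\n' $(( 0xdc80000 + 32768 * 4096 ))   # 0x15c80000, as found by bgrep
```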

Hence the file system starts at offset 0xdc80000, i.e. 225792KB from the start of the array. Since we have 8 partitions, of which one holds parity, we divide the offset by 7. This gives an offset of 33030144 bytes on every partition, which is exactly 63 RAID chunks. And since the current RAID data offset is one chunk, we conclude that the original data offset was 64 chunks, or 32768KB. Shifting hbdcefga 63 times to the right gives the layout bdcefgah.
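The chain of divisions, as a sanity check:

```shell
fs_start=$(( 0xdc80000 ))             # FS start, bytes from array start
echo $(( fs_start / 1024 ))           # 225792 KiB
per_part=$(( fs_start / 7 ))          # 8 partitions, 1 parity -> 7 data
echo "$per_part"                      # 33030144 bytes per partition
echo $(( per_part / (512 * 1024) ))   # 63 chunks of 512 KiB
```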

We finally build the correct RAID array:

sudo mdadm --stop /dev/md126
sudo ./mdadm --create /dev/md126 --assume-clean --raid-devices=8 --level=5 /dev/sdb2:32768 /dev/sdd2:32768 /dev/sdc2:32768 /dev/sde2:32768 /dev/sdf2:32768 /dev/sdg2:32768 /dev/sda2:32768 /dev/sdh2:32768
sudo fsck.ext4 -n /dev/md126
e2fsck 1.42 (29-Nov-2011)
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/md126: clean, 423146/16654336 files, 48120270/66589824 blocks
sudo mount -t ext4 -r /dev/md126 /home/xubuntu/mp

Voilà!

  • Excellent walkthrough. One note - 53EF00000100 doesn't seem to be a valid anchor for an EXT4 header. According to ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#The_Super_Block, the two bytes after 53EF can only be 0100, 0200 or 0400. Commented Jul 23, 2016 at 11:43
  • I found this answer years ago, and from time to time, I go back to it as one might go back to read a good book once again. It's my all-time favorite StackExchange answer. Commented Aug 23, 2020 at 20:05

If you are lucky you might have some success with getting your files back with recovery software that can read a broken RAID-5 array. Zero Assumption Recovery is one I have had success with before.

However, I'm not sure if the process of creating a new array has gone and destroyed all the data, so this might be a last chance effort.

  • Thanks a lot Mark. I will give it a try. Do you know if it modifies the drives? If so I will make a disk copy and also try with other tools. Commented Jan 7, 2012 at 8:05
  • @Brigadieren - no, sorry, I'm not familiar enough with the intricacies of how RAID5 works. Commented Jan 7, 2012 at 9:58
  • @Brigadieren According to the mdadm documentation, the create process won't destroy data, just resync - but if it's chosen a geometry that didn't match with your original, then it may have destroyed data with the writing of new parity. If I have some free time later on I might see about re-creating your situation in a VM, to see if I can add any insight. I'd recommend grabbing full copies of the drives before attempting any recovery steps that write to the disks at all - you may want to look into getting extra drives to make copies. Commented Jan 7, 2012 at 18:35
  • I am just not sure what caused the sync - the fact that the array was degraded in the first place (due to power outage) or something else? I wonder if mdadm --create makes any distinction whether I specify the drive order differently than was originally given? Commented Jan 7, 2012 at 21:35
  • @Brigadieren Sync always occurs on create. Commented Jan 8, 2012 at 5:51

I had a similar issue. I formatted and reinstalled my OS/boot drive with a clean install of Ubuntu 12.04, then ran the mdadm --create... command and couldn't mount it.

It said it didn't have a valid superblock or partition.

Moreover, when I stopped the mdadm raid, I could no longer mount the regular device.

I was able to repair the superblock with mke2fs and e2fsck:

root@blackbox:~# mke2fs -n /dev/sdc1
mke2fs 1.42 (29-Nov-2011)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
91578368 inodes, 366284000 blocks
18314200 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
11179 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
    102400000, 214990848

Then ran:

e2fsck -b 32768 -y /dev/sdc1 

That restored the superblock so I could mount and read the drive.
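When at least one superblock is still readable, the same backup list can be obtained read-only with dumpe2fs instead of mke2fs -n (a sketch, using /dev/sdc1 as in the commands above; dumpe2fs needs a usable superblock, while mke2fs -n computes the locations from the device size alone):

```shell
# List primary and backup superblock locations without writing anything.
sudo dumpe2fs /dev/sdc1 2>/dev/null | grep -i 'superblock at'
```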

To get the array working without destroying the superblock or partitions I used build:

mdadm --build /dev/md0 --level=mirror --assume-clean --raid-devices=2 /dev/sdc1 missing 

After verifying the data, I will add the other drive:

mdadm --add /dev/md0 /dev/sdd1 

I'm just updating some of the information given earlier. I had a 3-disk raid5 array working fine when my motherboard died. The array held /dev/md2 as the 1.2TB /home partition and /dev/md3 as the 300GB /var partition.

I had two backups of "important" stuff and a bunch of random things I had grabbed from various parts of the internet that I really should have gone through and selectively dumped. Most of the backups were broken into .tar.gz files of 25GB or less, and a separate copy of /etc was backed up also.

The rest of the filesystem was held on two small raid0 disks of 38GB.

My new machine was similar to the old hardware, and I got it up and running simply by plugging in all five disks and selecting an old generic kernel. So I had five disks with clean filesystems, though I could not be certain the disks were in the right order, and I needed to install a new version of Debian Jessie to be sure I could upgrade the machine when needed and to sort out other problems.

With the new generic system installed on two Raid0 disks, I began to put the arrays back together. I wanted to be sure that I had the disks in the right order. What I should have done was to issue :

mdadm --assemble /dev/md3 -o --no-degraded --uuid=82164ae7:9af3c5f1:f75f70a5:ba2a159a 

But I didn't. It seems that mdadm is pretty smart and given a uuid, can figure out which drives go where. Even if the bios designates /dev/sdc as /sda, mdadm will put it together correctly (YMMV though).

Instead I issued mdadm --create /dev/md2 without the --assume-clean, and allowed the resync on /dev/sde1 to complete. The next mistake I made was to work on /dev/sdc1 instead of the last drive in /dev/md2, /dev/sde1. Anytime mdadm thinks there is a problem, it is the last drive that gets kicked out or re-synced.

After that, mdadm could not find any superblock, and e2fsck -n couldn't either.

After I found this page, I went through the procedure of finding the sequence for the drives (done), checked for valid data (verified 6MB of a 9MB file), got the disks in the right sequence (cde), grabbed the UUIDs of /md2 and /md3 from the old /etc/mdadm.conf, and tried assembling.

Well, /dev/md3 started, and mdadm --misc -D /dev/md3 showed three healthy partitions, and the disks in the right order. /dev/md2 also looked good, until I tried to mount the filesystem.

# mdadm --create /dev/md2 --raid-devices=3 --level=5 --uuid=c0a644c7:e5bcf758:ecfbc8f3:ee0392b7 /dev/sdc1 /dev/sdd1 /dev/sde1
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Wed Feb  3 14:05:36 2016
mdadm: /dev/sdd1 appears to contain an ext2fs file system
    size=585936896K mtime=Thu Jan  1 01:00:00 1970
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Wed Feb  3 14:05:36 2016
mdadm: /dev/sde1 appears to contain an ext2fs file system
    size=585936896K mtime=Thu Jan  1 01:00:00 1970
mdadm: /dev/sde1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Wed Feb  3 14:05:36 2016
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md2 started.

The filesystem refused to mount, and e2fsck couldn't find any superblocks. Further, when checking for superblocks as described above, the total block count found as a880 0076 or 5500 1176 did not match the disk capacity of 1199.79 reported by mdadm. Also, none of the locations of the "superblocks" aligned with the data in the posts above.

I backed up all of /var, and prepared to wipe the disks. To see if it was possible to wipe just /md2 (I had nothing else to lose at this point), I did the following:

root@ced2:/home/richard# mdadm --create /dev/md2 --raid-devices=3 --level=5 --uuid=c0a644c7:e5bcf758:ecfbc8f3:ee0392b7 /dev/sdc1 /dev/sdd1 /dev/sde1
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Wed Feb  3 14:05:36 2016
mdadm: /dev/sdd1 appears to contain an ext2fs file system
    size=585936896K mtime=Thu Jan  1 01:00:00 1970
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Wed Feb  3 14:05:36 2016
mdadm: /dev/sde1 appears to contain an ext2fs file system
    size=585936896K mtime=Thu Jan  1 01:00:00 1970
mdadm: /dev/sde1 appears to be part of a raid array:
    level=raid5 devices=3 ctime=Wed Feb  3 14:05:36 2016
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md2 started.

# mkfs.ext3 /dev/md2
mke2fs 1.42.12 (29-Aug-2014)
Creating filesystem with 292902912 4k blocks and 73228288 inodes
Filesystem UUID: a54e252f-78db-4ebb-b7ca-7dcd2edf57a4
Superblock backups stored on blocks:
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
    102400000, 214990848

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

# hexdump -n84 -s0x00000400 -v /dev/md2
0000400 6000 045d 5800 1175 7799 00df 6ff0 112e
0000410 5ff5 045d 0000 0000 0002 0000 0002 0000
0000420 8000 0000 8000 0000 2000 0000 10d3 56b2
0000430 10d3 56b2 0002 ffff ef53 0001 0001 0000
0000440 0c42 56b2 0000 0000 0000 0000 0001 0000
0000450 0000 0000
0000454

# ./bgrep 00605D0400587511 /dev/md2
/dev/md2: 00000400
/dev/md2: 08000000
/dev/md2: 18000000
/dev/md2: 28000000
/dev/md2: 38000000
/dev/md2: 48000000
/dev/md2: c8000000
/dev/md2: d8000000
/dev/md2: 188000000
/dev/md2: 288000000
/dev/md2: 3e8000000
/dev/md2: 798000000
/dev/md2: ab8000000
etc

All seemed ok, except for the change to the UUID. So after a couple more checks, I wrote 600GB of backed-up data onto /dev/md2. Then I unmounted it and tried to re-mount the drive:

# mdadm --assemble /dev/md2 uuid=c0a644c7:e5bcf758:ecfbc8f3:ee0392b7
mdadm: cannot open device uuid=c0a644c7:e5bcf758:ecfbc8f3:ee0392b7: No such file or directory
mdadm: uuid=c0a644c7:e5bcf758:ecfbc8f3:ee0392b7 has no superblock - assembly aborted

Are you ********* kidding me? What about my 600GB of files?

# mdadm --assemble /dev/md2
mdadm: /dev/md2 not identified in config file.

Ah, easily fixed. I uncommented one line in /etc/mdadm.conf:

# mdadm --assemble /dev/md2
mdadm: /dev/md2 has been started with 3 drives.
# e2fsck -n /dev/md2
e2fsck 1.42.12 (29-Aug-2014)
/dev/md2: clean, 731552/73228288 files, 182979586/292902912 blocks

Yippie!
