
Edit:

The scenario in this wiki, where one drive has a slightly lower event count and another a significantly lower one than the rest of the array, suggests assembling with --force while leaving out the most out-of-date drive, then adding it (or a new one, in case the disk is actually bad) back after the array has assembled in a degraded state.

Would it make sense to do this in my situation, or is it more advisable to attempt a --force assemble with all 4 drives, given that the two out-of-date ones have the same event count?
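A minimal sketch of the two approaches as I understand them (assuming the inactive array is stopped first; device letters as in the output below):

# Option A (wiki suggestion): degraded assembly without one stale drive, then re-add it
mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sdc /dev/sdd /dev/sde   # leave out /dev/sdb
mdadm --manage /dev/md0 --add /dev/sdb                         # resyncs from parity

# Option B: force-assemble with all four members at once
mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sdb /dev/sdc /dev/sdd /dev/sde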


Given my limited RAID knowledge I figured I'd ask about my specific situation before trying anything. Losing the data on these 4 drives wouldn't be the end of the world to me, but it'd still be nice to get it back.

I migrated a RAID5 array from an old machine to a new one, without any problems at first. I used it for about 2 days until I noticed that 2 of the drives weren't listed in the BIOS boot screen. Since the array still assembled and worked fine after booting into Linux, I didn't think too much of it.

The next day the array stopped working, so I hooked up a PCI-e SATA card and replaced all my SATA cables. After that all 4 drives showed up in the BIOS boot screen so I'm assuming either my cables or SATA ports were causing the initial problem.

Now I'm left with a broken array, though. mdadm --assemble lists two drives as (possibly out of date), and mdadm --examine shows 22717 events for the out-of-date drives and 23199 for the other two. This wiki entry suggests that an event count difference of <50 can usually be overcome by assembling with --force, but my drives are separated by 482 events.
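For reference, a quick way to compare the per-drive event counts is a one-liner like this; the full --examine output is further below:

mdadm --examine /dev/sd[bcde] | egrep 'Event|/dev/sd'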

Below is all the relevant RAID info. I was aware that all 4 drives had corrupt primary GPT tables before the array broke down, but since everything was working fine at the time, I hadn't gotten around to fixing that yet.

mdadm --assemble --scan --verbose

mdadm: /dev/sde is identified as a member of /dev/md/guyyst-server:0, slot 2.
mdadm: /dev/sdd is identified as a member of /dev/md/guyyst-server:0, slot 3.
mdadm: /dev/sdc is identified as a member of /dev/md/guyyst-server:0, slot 1.
mdadm: /dev/sdb is identified as a member of /dev/md/guyyst-server:0, slot 0.
mdadm: added /dev/sdb to /dev/md/guyyst-server:0 as 0 (possibly out of date)
mdadm: added /dev/sdc to /dev/md/guyyst-server:0 as 1 (possibly out of date)
mdadm: added /dev/sdd to /dev/md/guyyst-server:0 as 3
mdadm: added /dev/sde to /dev/md/guyyst-server:0 as 2
mdadm: /dev/md/guyyst-server:0 assembled from 2 drives - not enough to start the array.

mdadm --examine /dev/sd[bcde]

/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 356cd1df:3a5c992d:c9899cbc:4c01e6d9
           Name : guyyst-server:0
  Creation Time : Wed Mar 27 23:49:58 2019
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813772976 (3725.90 GiB 4000.65 GB)
     Array Size : 11720658432 (11177.69 GiB 12001.95 GB)
  Used Dev Size : 7813772288 (3725.90 GiB 4000.65 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=688 sectors
          State : clean
    Device UUID : 7ea39918:2680d2f3:a6c3b0e6:0e815210

Internal Bitmap : 8 sectors from superblock
    Update Time : Fri May 1 03:53:45 2020
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : 76a81505 - correct
         Events : 22717

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 0
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 356cd1df:3a5c992d:c9899cbc:4c01e6d9
           Name : guyyst-server:0
  Creation Time : Wed Mar 27 23:49:58 2019
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813772976 (3725.90 GiB 4000.65 GB)
     Array Size : 11720658432 (11177.69 GiB 12001.95 GB)
  Used Dev Size : 7813772288 (3725.90 GiB 4000.65 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=688 sectors
          State : clean
    Device UUID : 119ed456:cbb187fa:096d15e1:e544db2c

Internal Bitmap : 8 sectors from superblock
    Update Time : Fri May 1 03:53:45 2020
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : d285ae78 - correct
         Events : 22717

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 1
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 356cd1df:3a5c992d:c9899cbc:4c01e6d9
           Name : guyyst-server:0
  Creation Time : Wed Mar 27 23:49:58 2019
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813772976 (3725.90 GiB 4000.65 GB)
     Array Size : 11720658432 (11177.69 GiB 12001.95 GB)
  Used Dev Size : 7813772288 (3725.90 GiB 4000.65 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=688 sectors
          State : clean
    Device UUID : 2670e048:4ebf581d:bf9ea089:0eae56c3

Internal Bitmap : 8 sectors from superblock
    Update Time : Fri May 1 04:12:18 2020
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : 26662f2e - correct
         Events : 23199

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 3
    Array State : A.AA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sde:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 356cd1df:3a5c992d:c9899cbc:4c01e6d9
           Name : guyyst-server:0
  Creation Time : Wed Mar 27 23:49:58 2019
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813772976 (3725.90 GiB 4000.65 GB)
     Array Size : 11720658432 (11177.69 GiB 12001.95 GB)
  Used Dev Size : 7813772288 (3725.90 GiB 4000.65 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=688 sectors
          State : clean
    Device UUID : 093856ae:bb19e552:102c9f77:86488154

Internal Bitmap : 8 sectors from superblock
    Update Time : Fri May 1 04:12:18 2020
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : 40917946 - correct
         Events : 23199

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 2
    Array State : A.AA ('A' == active, '.' == missing, 'R' == replacing)

mdadm --detail /dev/md0

/dev/md0:
           Version : 1.2
        Raid Level : raid0
     Total Devices : 4
       Persistence : Superblock is persistent

             State : inactive
   Working Devices : 4

              Name : guyyst-server:0
              UUID : 356cd1df:3a5c992d:c9899cbc:4c01e6d9
            Events : 23199

    Number   Major   Minor   RaidDevice

       -       8       64        -        /dev/sde
       -       8       32        -        /dev/sdc
       -       8       48        -        /dev/sdd
       -       8       16        -        /dev/sdb

fdisk -l

The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Disk /dev/sdb: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68N
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 79F4A900-C9B7-03A9-402A-7DDE6D72EA00

Device     Start        End    Sectors  Size Type
/dev/sdb1   2048 7814035455 7814033408  3.7T Microsoft basic data

The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Disk /dev/sdc: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68N
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 43B95B20-C9B1-03A9-C856-EE506C72EA00

Device     Start        End    Sectors  Size Type
/dev/sdc1   2048 7814035455 7814033408  3.7T Microsoft basic data

The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Disk /dev/sdd: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68N
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 1E276A80-99EA-03A7-A0DA-89877AE6E900

The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Disk /dev/sde: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68N
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 11BD8020-C9B5-03A9-0860-6F446D72EA00

Device     Start        End    Sectors  Size Type
/dev/sde1   2048 7814035455 7814033408  3.7T Microsoft basic data
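Side note: once the array situation is resolved, my understanding is that the corrupt primary GPT tables can be rebuilt from the intact backups via gdisk's recovery menu, roughly like this (a hypothetical session I haven't run yet; repeat per disk):

gdisk /dev/sdb
# inside gdisk:
#   r    open the recovery and transformation menu
#   b    rebuild the main GPT header from the backup header
#   c    load the backup partition table over the corrupt main one
#   w    write the repaired tables to disk and exit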

smartctl -a -d ata /dev/sd[bcde]

As pastebin since it exceeded the character limit: https://pastebin.com/vMVCX9EH

  • Sorry I can't help, but are you aware you are running RAID 5 out of spec when using 4 TB drives? Commented May 1, 2020 at 20:35
  • @davidgo I was not, but after quick googling I couldn't find anything regarding drive size limitations in RAID5. Do you have a link to the specifications you're referring to? Commented May 1, 2020 at 21:09
  • It's not a hard-drive limitation; it's that the likelihood of a second drive failure during a rebuild becomes very high. For hard drives, the sensible limit postulated is about 2 TB per drive. Google "RAID 5 Maximum Size" - I found superuser.com/questions/912673/… - Another ZDNet article: zdnet.com/article/why-raid-5-stops-working-in-2009 (because this is when 2 TB drives became available) Commented May 1, 2020 at 23:14
  • Ah, from your first comment it sounded like a specific limitation above X terabytes. I found those exact articles mentioning the issues with bigger disks in a RAID5 array as well. When I initially set this up I admittedly went for the cheapest redundancy option available, without researching the dangers of large drives in RAID5. I was planning on moving this data to RAID6 or RAID1 anyway, but right now I'd just like to get this working. Commented May 1, 2020 at 23:39

1 Answer


Generally speaking, you must expect data loss in this situation. Two of your four disks were ejected from the RAID at roughly the same point in time. When assembled back, you will have a corrupt file system.

If possible, I would only experiment further after dd-ing all disks as a backup, so you can start over if an attempt goes wrong.
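A minimal sketch of such a backup, assuming a target with enough free space for four 4 TB images (GNU ddrescue is the more robust choice if a disk is physically failing):

# image every member before experimenting; repeat for sdc, sdd, sde
dd if=/dev/sdb of=/mnt/backup/sdb.img bs=1M conv=noerror,sync status=progress
# or, more robust on flaky disks:
# ddrescue /dev/sdb /mnt/backup/sdb.img /mnt/backup/sdb.map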

Using all 4 disks will allow you to identify which blocks differ (as the parity will not match there), but it will not help you compute a correct state. You could start checkarray after a forced re-assembly of all 4 and find the number of inconsistent blocks afterwards in /sys/block/mdX/md/mismatch_cnt. This may or may not be interesting for estimating the "degree of brokenness" of the file system.
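Concretely, that check could look roughly like this (checkarray is Debian's wrapper script; writing to sync_action is the generic sysfs route):

mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sd[bcde]
echo check > /sys/block/md0/md/sync_action    # or: /usr/share/mdadm/checkarray /dev/md0
# once the check finishes:
cat /sys/block/md0/md/mismatch_cnt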

Rebuilding the array can only use information from three disks to re-calculate parity. As the ejected disks have the same event count, using either of them should result in the same (partially wrong) parity information being re-computed.

  • The data on the array is not worth the $400+ it would cost to dd everything to new drives, so I went ahead and force assembled all four just now. The array came up fine and the event counts synchronized. mismatch_cnt reports 0 after running checkarray, and fsck didn't show any problems either. I assume this doesn't guarantee I avoided any corruption, but I'm able to copy the most important data to an external drive right now. Thanks! :) Commented May 6, 2020 at 16:24
