- MTBF is remarkably broken when you have a lot of drives with similar origins. In statistics we'd call this a sampling bias: because your samples are so similar (they aren't independent of each other), the averaging that MTBF relies on is much less useful. If there's a fault with the batch, or even with the design itself, and that happens more often than you'd think, then drives from that batch will fail sooner than the MTBF would suggest.
If the drives are spread out across batches, you might get [50%, 90%, 120%, 200%] of MTBF, but if all the drives come from that 50% batch, you've got a mess on your hands.
- RAID array rebuilds kill disks. No, really. If a drive fails and the array rebuilds, it puts extra load on the remaining drives while it reads all the data off them. If one of them is close to failure, the rebuild may well take it out, or it may already have a bad sector you weren't aware of because that section hadn't been read recently.
If you've got a lot of drives from the same batch, the chances of this kind of cascade failure are much higher than if they come from different batches. You can mitigate this with regular patrol reads, scrubs, or whatever the recommended practice is for the type of array you're using, but the downside is that these impact performance and can take hours to complete.
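
To put rough numbers on the batch and rebuild points together, here is a minimal Monte Carlo sketch in Python. It is a toy model built on assumptions of my own, not anything measured: a 4-drive array, Weibull lifetimes with shape 3 (wear-out, so drives from one batch tend to die at around the same age), the [50%, 90%, 120%, 200%] multipliers from above applied to the Weibull scale as a stand-in for MTBF, and a rebuild window deliberately exaggerated to 2% of nominal life so the effect shows up in a quick run. It also ignores the extra load the rebuild itself puts on the survivors, which would push both numbers up. The names `cascade_probability`, `REBUILD_WINDOW`, etc. are hypothetical.

```python
import random

# Batch quality multipliers from the [50%, 90%, 120%, 200%] example in the text.
BATCH_MULTIPLIERS = [0.5, 0.9, 1.2, 2.0]
ARRAY_SIZE = len(BATCH_MULTIPLIERS)  # assumed 4-drive array, one slot per batch

# Assumed toy parameters (not measured data); times are in units of nominal life.
NOMINAL_LIFE = 1.0      # Weibull scale standing in for the quoted MTBF
SHAPE = 3.0             # shape > 1 means wear-out: a batch's drives fail around the same age
REBUILD_WINDOW = 0.02   # rebuild time as a fraction of nominal life, exaggerated on purpose


def cascade_probability(same_batch: bool, trials: int = 100_000, seed: int = 1) -> float:
    """Estimate P(a second drive fails before the rebuild after the first failure finishes)."""
    rng = random.Random(seed)
    cascades = 0
    for _ in range(trials):
        if same_batch:
            # Every drive comes from one batch, picked at random.
            scales = [NOMINAL_LIFE * rng.choice(BATCH_MULTIPLIERS)] * ARRAY_SIZE
        else:
            # Drives spread out: one from each batch.
            scales = [NOMINAL_LIFE * m for m in BATCH_MULTIPLIERS]
        # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
        lifetimes = sorted(rng.weibullvariate(scale, SHAPE) for scale in scales)
        # Cascade: the next failure lands inside the rebuild window.
        if lifetimes[1] - lifetimes[0] < REBUILD_WINDOW:
            cascades += 1
    return cascades / trials


if __name__ == "__main__":
    print("same batch:   ", cascade_probability(same_batch=True))
    print("mixed batches:", cascade_probability(same_batch=False))
```

Running it should show a noticeably higher cascade probability for the same-batch array. In this toy model, setting `SHAPE = 1.0` (exponential, memoryless lifetimes) shrinks the gap considerably, which hints that it's wear-out clustering within a batch, rather than the MTBF figure itself, that makes same-batch drives fail together.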