1

I've been running a ZFS pool on Ubuntu problem-free for years. I'm currently on 20.04.

Since around the beginning of this year I've had to replace 2 out of 4 disks, and even then the brand-new disks started showing errors.

I started scrubbing the pool weekly and things were more or less stable: 20-50 read and/or write errors would appear on some disks, and the scrub would fix them.

A few days ago, however, one disk was faulted for too many errors, then a second one became degraded. Running a scrub made things worse.

I triggered a scrub today, then realized the disks might be too hot. I shut down the PC to adjust the fans, started it again, and zpool status now shows this:

      state: DEGRADED
     status: One or more devices is currently being resilvered.  The pool will
             continue to function, possibly in a degraded state.
     action: Wait for the resilver to complete.
       scan: resilver in progress since Sat Jun 19 18:44:07 2021
             1.51T scanned at 2.74G/s, 1.29T issued at 2.35G/s, 3.04T total
             2.76G resilvered, 42.42% done, 0 days 00:12:44 to go
     config:

             NAME                                           STATE     READ WRITE CKSUM
             ztank                                          DEGRADED     0     0     0
               mirror-0                                     DEGRADED     0     0     0
                 ata-ST2000LM003_HN-M201RAD_S34RJ9AFB25570  DEGRADED     0     0     0  too many errors
                 ata-ST2000LM003_HN-M201RAD_S362J9EGB75740  ONLINE       0     0     0  (resilvering)
               mirror-1                                     ONLINE       0     0     0
                 ata-ST2000DM008-2FR102_ZFL3P2SZ            ONLINE       0     0     0
                 ata-TOSHIBA_HDWL120_807APRBUT              ONLINE       0     0     0  (resilvering)
             logs
               zfs_slog                                     ONLINE       0     0     0
             cache
               zfs_l2arc                                    ONLINE       0     0     0

     errors: No known data errors
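
To track which devices get flagged from one scrub to the next, output like this can be fed to a small script. What follows is only a minimal Python sketch, assuming the usual `zpool status` layout (a `NAME STATE READ WRITE CKSUM` header followed by one row per device); the function name `flagged_devices` is made up for illustration and is not part of any ZFS tooling.

```python
def flagged_devices(status_text):
    """Return rows from the config table that are not ONLINE, have a
    non-zero error counter, or carry a trailing note such as
    "too many errors" or "(resilvering)".

    Each result is a tuple: (name, state, read, write, cksum, note).
    Counters are kept as strings because zpool may abbreviate large
    values (for example "1.2K").
    """
    flagged = []
    in_config = False
    for line in status_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "NAME":        # header row: the device table starts here
            in_config = True
            continue
        if parts[0] == "errors:":     # summary line: the device table is over
            in_config = False
        if not in_config or len(parts) < 5:
            continue                  # skips section labels like "logs" and "cache"
        name, state = parts[0], parts[1]
        read, write, cksum = parts[2:5]
        note = " ".join(parts[5:])
        has_errors = any(c != "0" for c in (read, write, cksum))
        if state != "ONLINE" or has_errors or note:
            flagged.append((name, state, read, write, cksum, note))
    return flagged
```

Calling `flagged_devices()` on the text captured from `zpool status ztank` returns one tuple per suspicious row, which can then be logged or compared against the previous run.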

I'm really shocked. What's going on?

  • Have you been using the same power supply? Commented Jun 19, 2021 at 20:17
  • Yes, I haven't changed it in more than a year. Commented Jun 19, 2021 at 20:44
  • Wait for the resilver to complete. Commented Jun 19, 2021 at 21:18

2 Answers

1

Well, it looks like you answered your own question: the disks were too hot, so they started failing. See if you can recover from that degraded state.

Also, check your RAM: run a full memtest. If the RAM is OK, check the SATA cables too. Check the SMART stats on all the disks and run a long self-test (smartctl -t long) on each of them. And never let your HDDs overheat.
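
The SMART checks could be scripted along these lines. This is a rough Python sketch, not a definitive procedure: the helper names (`smartctl_cmd`, `start_long_tests`, `print_reports`) are made up for illustration, and it assumes the disks are reachable under `/dev/disk/by-id/` using the names shown in the `zpool status` output. Only standard `smartctl` options are used: `-t long` starts the extended self-test and `-a` prints the full SMART report, including the self-test log.

```python
import subprocess

# Assumption: these are the by-id names from the zpool status output in the question.
DISKS = [
    "ata-ST2000LM003_HN-M201RAD_S34RJ9AFB25570",
    "ata-ST2000LM003_HN-M201RAD_S362J9EGB75740",
    "ata-ST2000DM008-2FR102_ZFL3P2SZ",
    "ata-TOSHIBA_HDWL120_807APRBUT",
]


def smartctl_cmd(disk_id, *options):
    """Build a smartctl command line for a disk under /dev/disk/by-id."""
    return ["smartctl", *options, f"/dev/disk/by-id/{disk_id}"]


def start_long_tests(disks, run=subprocess.run):
    """Start the extended self-test on every disk.

    The test runs inside the drive in the background, so this returns
    almost immediately; on 2 TB drives the test itself can take hours.
    """
    for disk in disks:
        # check=False: smartctl uses its exit status as a bitmask, so a
        # non-zero value is not necessarily a failure of the command itself.
        run(smartctl_cmd(disk, "-t", "long"), check=False)


def print_reports(disks, run=subprocess.run):
    """Print the full SMART report (attributes and self-test log) per disk."""
    for disk in disks:
        run(smartctl_cmd(disk, "-a"), check=False)
```

Run `start_long_tests(DISKS)` as root, wait for the tests to finish, then call `print_reports(DISKS)` and look at the self-test log and at attributes such as reallocated and pending sector counts. The `run` parameter exists only so the command execution can be swapped out for a dry run.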

0

It turns out the problem was the way I powered my drives. Without noticing, I had put too many drives on a single power rail. Once I distributed them evenly across the power rails, everything went back to normal.
