2

I have the following zpool configuration:

zpool status
    NAME          STATE     READ WRITE CKSUM
    data          ONLINE       0     0     0
      raidz2-0    ONLINE       0     0     0
        da1       ONLINE       0     0     0
        da0       ONLINE       0     0     0
        da3       ONLINE       0     0     0
        da4       ONLINE       0     0     0
        da2       ONLINE       0     0     0
        da5       ONLINE       0     0     0
        da7       ONLINE       0     0     0
        da6       ONLINE       0     0     0
      raidz2-1    ONLINE       0     0     0
        da21      ONLINE       0     0     1  (repairing)
        da14      ONLINE       0     0     0
        da22      ONLINE       0     0     0
        da23      ONLINE       0     0     0
        da13      ONLINE       0     0     0
        da9       ONLINE       0     0     0
        da12      ONLINE       0     0     0
        da20      ONLINE       0     0     0
      raidz2-2    ONLINE       0     0     0
        da11      ONLINE       0     0     0
        da18      ONLINE       0     0     0
        da8       ONLINE       0     0     0
        da10      ONLINE       0     0     0
        da15      ONLINE       0     0     0
        da16      ONLINE       0     0     0
        da17      ONLINE       0     0     0
        da19      ONLINE       0     0     0

da21 has 1 CKSUM failure every ~2 weeks. Do I need to replace it already, or should I wait until there are more errors? I am rather on the cautious side, but I don't want to replace a perfectly healthy disk either.

To actually do it, are the following steps correct? The official guide (https://docs.oracle.com/cd/E23823_01/html/819-5461/gbbvf.html#gbcet) has some steps in between that depend on "cfgadm", but I don't have that on FreeBSD, so I'd rather make sure I am not doing something stupid before I start getting my hands dirty.

zpool offline da21
<*physically replace device*>
zpool online daXX
zpool replace data da21 daXX

Can I do this during a scrub, or should I wait for the scrub to finish or stop it?

Any help greatly appreciated :)

1
  • For anyone wondering, it worked as above. The command to replace it was "zpool replace data da21 da21". Thanks for the helpful suggestions nonetheless. Commented Oct 22, 2018 at 9:10

2 Answers

1

da21 has 1 CKSUM failure every ~2 weeks. Do I need to replace it already, or should I wait until there are more errors? I am rather on the cautious side, but I don't want to replace a perfectly healthy disk either.

I would first replace the cable and check whether the problem persists in another bay, enclosure, or controller (if your setup allows it). In my experience, most checksum errors come from one of those components. Failing disks usually show themselves with read or write errors.

This would also be essentially free compared to a full disk replacement, which you can still do if the errors persist (especially on Z2 or Z3, where the pressure to act is much lower). Of course, if you cannot take the slightest risk, you should not do that, but in that case you would already be using Z3 or multiple mirrors, wouldn't you? And you still have a current and verified backup ready, so the risk is very small.
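
To see whether the counters keep climbing after a cable or bay swap, it can help to filter the `zpool status` output down to the lines with non-zero counters. The following is only a sketch: `errored_devices` is a hypothetical helper name, and it assumes the usual five-column layout (NAME, STATE, READ, WRITE, CKSUM) shown in the question.

```shell
# Hypothetical helper: reads `zpool status` output on stdin and prints every
# pool, vdev, or disk line whose READ, WRITE, or CKSUM counter is non-zero.
# Assumes the five-column layout NAME STATE READ WRITE CKSUM.
errored_devices() {
    awk '$2 == "ONLINE" || $2 == "DEGRADED" || $2 == "FAULTED" {
        if ($3 + $4 + $5 > 0)
            print $1, "read=" $3, "write=" $4, "cksum=" $5
    }'
}

# Typical use on the live system (pool name "data" taken from the question):
#   zpool status data | errored_devices
```

After swapping the cable, `zpool clear data da21` resets the counters for that disk, so any new error stands out on the next check.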

Can I do this during a scrub, or should I wait for the scrub to finish or stop it?

If there's time, I would always wait out the scrub. You can then be sure that the other disks in the vdev do not also have hidden errors, which could lead to serious damage if they go undetected before you remove the disk (depending on how much redundancy you have left).

If there is no time, simply cancel the scrub with zpool scrub -s <poolname>.
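
As a rough sketch, the check and the cancel can be combined so the scrub is only stopped when one is actually running. `scrub_running` is a hypothetical helper name, and it assumes the `scan:` line of `zpool status` reads "scrub in progress" while a scrub is active.

```shell
# Hypothetical helper: reads `zpool status` output on stdin and succeeds
# (exit status 0) only if the scan line reports a scrub in progress.
scrub_running() {
    grep -q 'scan: scrub in progress'
}

# Typical use on the live system (pool name "data" taken from the question):
#   if zpool status data | scrub_running; then
#       zpool scrub -s data
#   fi
```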

1

I'd replace it now, personally. No point worrying about it failing later, and then having to scramble to get a replacement.

Are they hot swap? I'd just pop out the old one and pop in the new one. If you can avoid doing it while a scrub is running, then do so.
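
For reference, here is a sketch of the replacement sequence as a dry run. `replace_plan` is a hypothetical helper that only prints the commands and runs nothing; the pool name `data` and disk `da21` come from the question. It leaves out the `cfgadm` steps from the Oracle guide, since the question notes they are not available on FreeBSD. When the new disk shows up under the same device name, the old and new names are identical, which matches the command reported in the comment under the question.

```shell
# Hypothetical dry-run helper: prints the commands in order, executes nothing.
# Arguments: pool name, old device, new device (defaults to the old name,
# for the case where the new disk appears under the same device node).
replace_plan() {
    pool=$1
    old=$2
    new=${3:-$2}
    echo "zpool offline $pool $old"
    echo "# physically swap the disk now"
    echo "zpool replace $pool $old $new"
    echo "zpool status $pool"
}

replace_plan data da21
```

Running `zpool status` afterwards shows the resilver progress. The pool stays usable in the meantime, but redundancy in that vdev is reduced until the resilver completes.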
