
I'm using RAID-Z2 with 8x8TB disks. The pool is almost entirely empty. I created a zvol like this:

zfs create -V 1T -s -b 4096 "$ZVOL"
zfs set compression=off "$ZVOL"
zfs set logbias=throughput tank/fragger

Before writing any data to the zvol, when I read it, dd reports only 1.7 GB/s:

dd if=/dev/zvol/"$ZVOL" bs=1M count=1048576 of=/dev/null

I've noticed that many zvol_txq kernel threads are running and that the read appears to be CPU-bound. With ioload I can confirm ~13 Gbit/s at zd0, and that the underlying physical disks aren't really being accessed.

I recognize this is a wildly synthetic benchmark. Is it unreasonable to expect a read speed similar to /dev/zero (43 GB/s) in this case? There seems to be much more going on here. Why is it so slow?

  • 1.7 GB/sec is anything but slow. When you read from /dev/zero you actually read from memory; you cannot compare this speed to the read speed of an actual block device. Commented Aug 17 at 8:20
  • Did you monitor the processor load? Because on every read you calculate checksums. Also, you need to read from all the disks at the same time (checksums), so the speed is pretty good. And BTW, try setting the block size in dd to the one you defined in the zfs command and test again. Commented Aug 17 at 8:38
  • @drookie, it's slow compared to reading /dev/zero. I think that's not entirely irrelevant, as the zvol has no data blocks on the block device. Commented Aug 17 at 19:44
  • In addition to what @RomeoNinov posted, remember that ZFS is not a fast file system (no, it's NOT). XFS can be fast, especially the original non-crippleware on true SGI Irix systems. IBM GPFS is fast. Sun/Oracle had a fast filesystem, QFS, but it was EOL'd about 5 years ago. True high-performance file systems are dying out as hardware becomes faster and faster and "cloud computing" becomes more and more the norm, where actual performance tuning of systems is replaced with approaches like just throwing K8S clusters with parallel containers at problems. Try profiling Filebeat. 🤮 Commented Aug 17 at 20:11
  • @rsaxvc, IMHO you do not read from a filesystem, it is a RAW device, so having data or not is irrelevant. Try checking with vmstat 1 10 during the dd run. Commented Aug 17 at 20:55

2 Answers


You created a 4K ZVOL on a RAIDZ2 array. This will cause a very high overhead both in computation (zvol reading threads, high metadata load) and space (each 4K block will be written 3 times, basically the same as a 3-way mirror).

Please re-create your ZVOL with a more reasonable volblocksize, ranging from 16K (the zvol default) to 128K (the dataset default). For specific workloads an even bigger size (such as 1M) can be considered, but you should avoid small random I/O as much as possible.
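
For example, a minimal sketch of re-creating the volume with a 128K volblocksize, assuming the name tank/fragger from the question and that destroying the (still empty) zvol is acceptable:

# destroy the old zvol and re-create it with a larger volblocksize
zfs destroy tank/fragger
zfs create -V 1T -s -b 128K tank/fragger
zfs set compression=off tank/fragger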

  • No blocks have been written. I'll repeat with a larger block size. Commented Aug 19 at 22:20
  • With volblocksize at 8192, it's 2-3% faster. Will do a few more. Commented Aug 23 at 1:04

If what you wanted was all zeros, or to test memory performance, read from /dev/zero. Its performance does not compare directly to that of file systems on permanent storage.

ZFS is designed for integrity, large volumes, and copy-on-write behavior. /dev/zero is implemented by copying a constant directly into userspace memory; because of that simplicity, and because it never touches block devices, it is fast.

For taking a copy of an entire dataset, there are ZFS snapshots. These are documented in many tutorials, so there is no need to go into detail here.

Spindles or SSDs are in general hundreds to thousands of times slower than DRAM, but this is not the only reason why your empty file system is slower. ZFS is just doing more things.

Whether reading from a snapshot or not, the reads still go through several ZFS subsystems; eventually physical I/O to block devices happens, and the data is checksummed. There are several performance optimizations, including read-ahead and caching. All of this takes significant work to hit gigabyte-per-second speeds, even if the physical reads are mostly metadata for unallocated blocks.
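
One way to see the caching layer at work is to compare ARC counters before and after the dd run. A minimal sketch for ZFS on Linux, assuming the usual kstat path (other platforms expose these counters differently):

# ARC hit/miss counters and current cache size
grep -E '^(hits|misses|size) ' /proc/spl/kstat/zfs/arcstats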

Your microbenchmark could vary a lot depending on whether the zfs dataset was compressed and you then wrote zeros to it. Obviously, 100% zeros compresses perfectly, even with just a zero-length encoding (ZLE) algorithm.

Many filesystems, including ext4, lack data checksums. A fairer comparison would checksum the read, perhaps by piping it through the sha256sum program (or whichever algorithm is in use), although that still isn't quite the same implementation.
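
For instance, a rough sketch of such a comparison, assuming the zvol is tank/fragger as in the question:

# raw read of the zvol
dd if=/dev/zvol/tank/fragger bs=1M count=4096 of=/dev/null
# the same read, checksummed in userspace, for a fairer CPU comparison
dd if=/dev/zvol/tank/fragger bs=1M count=4096 | sha256sum
# baseline: /dev/zero through the same checksum
dd if=/dev/zero bs=1M count=4096 | sha256sum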

/dev/zero only has to copy zeros to the output buffer. No physical disks, no metadata, no checksums, just zeros written to memory.


As always, sampling what exactly is on CPU is relatively simple. On Linux there is perf record, and you can make some nice flamegraph visualizations from its output.
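
A minimal sketch, assuming the zvol name from the question and that the stackcollapse-perf.pl/flamegraph.pl scripts from Brendan Gregg's FlameGraph repository are on hand:

# sample stacks system-wide while the read runs
perf record -F 99 -a -g -- dd if=/dev/zvol/tank/fragger bs=1M count=4096 of=/dev/null
# inspect in a TUI, or fold the samples into a flamegraph
perf report
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > zvol-read.svg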

  • "ZFS is just doing more things." - I'd like to understand more about that. The zvol is unpopulated. No significant physical IO occurs during dd. Is it checksumming the void? Commented Aug 18 at 21:20
  • My answer gave you a tool to find out what is on CPU: perf record. I am not going to test everything ZFS does with unallocated blocks, because an all-zeros read does not accomplish much when /dev/zero (and zpool initialize) exist. Commented Aug 19 at 12:51
  • Most of the time seems to be spent copying things (blocks, probably) in the kernel. Commented Aug 23 at 13:40
