We are using 15 TB disks on our RHEL 8.6 Linux servers. These disks hold the HDFS filesystem. Compared to clusters with smaller 4 TB or 8 TB disks, we are noticing some degradation in the datanode logs. We have checked many things to understand the difference between the Hadoop clusters using 4 TB or 8 TB disks and the newer Hadoop cluster with the 16 TB disks. After searching on Google, we noticed that the filesystem created on the disks is ext4, and I am wondering whether ext4 is appropriate for disks this large. So my question is: does ext4 support very large disks like 15 TB, or is it better to use XFS on 15 TB disks?
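For reference, this is roughly how we checked which filesystem is on the data disks (the device name and mount point below are placeholders for our actual layout):

    # List filesystem type, label and size for every block device
    lsblk -f
    # Or check a single HDFS data directory directly (mount point is an example)
    df -Th /data/1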
1 Answer
An ext4 filesystem can grow to 1 EiB when the 64bit feature is enabled (and to 16 TiB without it), and Red Hat supports ext4 filesystems of up to 50 TB on RHEL 8, so a 15 TB disk is well within the limits. The problem is not there.
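If you want to verify this on an existing disk, you can inspect the ext4 superblock; a quick check, assuming the data disk is /dev/sdb1 (adjust the device name to your layout):

    # The feature list must contain "64bit" for the filesystem to grow past 16 TiB
    tune2fs -l /dev/sdb1 | grep -i 'filesystem features'
    # Superblock summary: block count, inode count, free blocks/inodes
    dumpe2fs -h /dev/sdb1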
If you keep many small files and many directories in HDFS, this can become a challenge for ext4, and XFS handles it better. By "many" I mean more than 10 million files and around 100,000 directories (nested over two or more levels).
XFS is also the better choice if you keep large files (larger than 1 GB) in HDFS.
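If you decide to move to XFS, a minimal sketch of reformatting one data disk (destructive; the device name and mount point are examples, and noatime is a commonly recommended mount option for HDFS data directories):

    # Create an XFS filesystem on the data disk (wipes existing data)
    mkfs.xfs -f /dev/sdb1
    # Mount without access-time updates to reduce metadata writes
    mount -o noatime /dev/sdb1 /data/1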
IMHO the main difference between 4/8 TB and 16 TB disks is that you can create/store many more files on the bigger disks and eventually hit a kind of bottleneck.
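One way to check whether file count (rather than raw capacity) is becoming that bottleneck is to compare inode usage and file counts across the small and large disks; the mount points below are placeholders:

    # Inode usage per filesystem; a much higher IUse% on the 16 TB disks points to a file-count problem
    df -i /data/1 /data/2
    # Count files under one datanode data directory (path is an example; this can take a while)
    find /data/1 -xdev -type f | wc -l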
- Can you explain this: "IMHO the main difference between 4/8 TB and 16 TB disks is that you can store many more files on the bigger disks and eventually hit a kind of bottleneck"? – King David, Apr 22, 2025 at 18:36
- Just to explain what we have: there are 365 worker machines, and each worker machine has 12 disks of 16 TB, so all the disks on all the workers together make up the HDFS storage. – King David, Apr 22, 2025 at 18:38
- @KingDavid, on 16 TB you can store 4 times as much data as on 4 TB. And with ext4, 4 times as many files will make finding a file at least 2 times slower. – Romeo Ninov, Apr 22, 2025 at 19:20