
I have a Ceph cluster running Quincy 17.2.7.

I wonder if Ceph has a tool to quickly get the used space (size in bytes) of a directory in a CephFS file system. I know I can get that with du -hs /fs/dir-A, but that takes a long time if the directory contains a lot of data.

Note that I was initially planning to create filesystem-A and filesystem-B (in poolA and poolB respectively), instead of dir-A and dir-B, which would make retrieving the sizes easy. However, having multiple file systems, even one per pool, is not advised for snapshot purposes: https://docs.ceph.com/en/quincy/dev/cephfs-snapshots/#multi-fs

Any idea how I can quickly get the used space of a directory in a CephFS file system?

Thanks!

  • Keep in mind that Ceph is an object storage system, and directories in Ceph are essentially object prefixes. Each object has its own size, and directories don't have a size on their own. Commented Dec 4, 2023 at 7:47
  • Have you checked ceph fs status or the dashboard information? Maybe that meets your needs. Commented Dec 5, 2023 at 9:00
  • @eblock, ceph fs status throws an error; here is the last part: File "/usr/share/ceph/mgr/status/module.py", line 234, in handle_fs_status assert metadata AssertionError. The dashboard/GUI -> File Systems shows the directory tree, but not the used space per directory. Commented Dec 6, 2023 at 10:08
  • I found in spinics.net/lists/ceph-users/msg75529.html how to fix the error I posted above. However, ceph fs status does not show how much space a directory in the fs uses; instead, the command shows the full space used by the fs. Commented Dec 6, 2023 at 11:06

2 Answers


I had a further look at how to get the size per directory in a CephFS file system, and I don't think there is such a tool (again, using Linux tools such as du is not an option, as it takes a long time to traverse a directory).

It seems to me the best way to go is to create subvolumes, so that each directory would be a subvolume, for example:
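As a rough sketch (assuming the volume is named cephfs and the subvolume dir-A; substitute your own names), the usage of a subvolume can then be read directly from its info output:

    # create one subvolume per top-level directory (names are examples)
    ceph fs subvolume create cephfs dir-A

    # the JSON output includes a bytes_used field with the current usage
    ceph fs subvolume info cephfs dir-A | grep bytes_used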


There isn't a standard *nix command for this, but CephFS supports "recursive statistics" to expose that information. This is harder to find in the documentation than I thought it was, but here's a blog post about viewing them in CephFS' "virtual xattrs": https://blog.widodh.nl/2015/04/playing-with-cephfs-recursive-statistics. The ceph.dir.rbytes attribute is the sum total of all file sizes underneath the directory in the hierarchy; similarly, there are rsubdirs and rfiles (which sum to rentries).
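For example, assuming the file system is mounted at /fs as in the question, the recursive statistics can be read with getfattr:

    # total size in bytes of everything underneath the directory
    getfattr -n ceph.dir.rbytes --only-values /fs/dir-A

    # related counters
    getfattr -n ceph.dir.rfiles --only-values /fs/dir-A     # number of files
    getfattr -n ceph.dir.rsubdirs --only-values /fs/dir-A   # number of subdirectories
    getfattr -n ceph.dir.rentries --only-values /fs/dir-A   # files + subdirectories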

You can also set a mount option (on the userspace and kernel clients) to make the reported directory size the rbytes value (i.e., what you see when you ls it; normally it would report 512 bytes or 4 KB). Doing that causes trouble with some tools, though, as they don't expect directory sizes to change like that, or they may inspect them to try to identify the local block size.
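A minimal sketch for the kernel client (monitor address, user name, and mount point are placeholders for your setup):

    # rbytes makes stat() on a directory report ceph.dir.rbytes as its size
    mount -t ceph mon1:6789:/ /fs -o name=admin,rbytes

    # the size column now shows the recursive byte count of dir-A
    ls -ld /fs/dir-A

For the FUSE client, the equivalent is (if I remember correctly) the client_dirsize_rbytes setting.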

There are a few caveats to using rstats for precise information:

  1. File sizes account only for the specified size of a file, not the amount of space actually allocated. If you write 1 byte in a sparse file at offset 1 GB, it will report 1 GB (see the sketch after this list).
  2. Updating the statistics requires taking locks on inodes and directories, which can be intrusive to client IO in some cases. For this reason, the information is propagated up the tree (from file to directory, to parent directory, and so on) lazily, as those locks are mutated for other reasons. It won't be an hour out of date, but it can definitely be ten seconds old.
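To illustrate the first caveat (a sketch, using the /fs/dir-A path from the question):

    # write 1 byte at a 1 GiB offset, creating a sparse file
    dd if=/dev/zero of=/fs/dir-A/sparse bs=1 count=1 seek=1G

    # rbytes now counts roughly 1 GiB for this file,
    # while du shows only the space actually allocated
    getfattr -n ceph.dir.rbytes --only-values /fs/dir-A
    du -h /fs/dir-A/sparse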
  • Thank you for the explanation and the link, that helps Commented Dec 8, 2023 at 12:11
