9

Just as a test, I created and mounted the same XFS file system on two hosts, backed by a shared device (pmem). Host A created a file in its mounted directory and executed the sync command to ensure that xfs_db can see the newly created inode information. However, this new file is not visible on Host B until Host B unmounts and remounts the filesystem. I would like to know why.
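In commands, the setup looks roughly like this (a minimal sketch of the steps described above; the device name /dev/pmem0 and mount point /mnt/xfs are just examples):

    # both hosts see the same shared pmem block device
    mkfs.xfs /dev/pmem0           # run once, on Host A only
    mount /dev/pmem0 /mnt/xfs     # run on both hosts

    # Host A
    touch /mnt/xfs/newfile
    sync                          # flush metadata so xfs_db can see the new inode

    # Host B
    ls /mnt/xfs                   # newfile does not show up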

I noticed that the ls system call (getdents) eventually calls xfs_readdir(), which uses XFS's on-disk format to get inode information. However, does this process access the disk, or is some of the metadata for xfs_inode cached in memory once the file system is mounted?
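For anyone who wants to confirm that call path, here is a rough sketch (the tracing directory may be /sys/kernel/debug/tracing on older kernels, /mnt/xfs is just an example, and this assumes xfs_readdir has not been inlined away by the compiler):

    strace -e trace=getdents64 ls /mnt/xfs   # shows that ls ends up in getdents64

    # watch whether xfs_readdir is actually entered in the kernel
    echo xfs_readdir > /sys/kernel/tracing/set_ftrace_filter
    echo function > /sys/kernel/tracing/current_tracer
    ls /mnt/xfs
    cat /sys/kernel/tracing/trace
    echo nop > /sys/kernel/tracing/current_tracer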

2 Answers

11

What you're seeing is absolutely normal, because XFS isn't built for multi-host setups; it's not a clustered file system. By design!

In a nutshell:

Host A updates its in-memory metadata and syncs it to disk, but Host B doesn't know about it, since the two hosts don't share cache updates. The caches aren't coherent, and xfs_readdir() mostly pulls from memory unless it has to hit the disk, so Host B's view stays stale.

Unmounting and remounting on Host B forces it to re-read everything from disk, which is why the file shows up.
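Put differently, short of a clustered setup, the only way Host B re-reads what Host A wrote is a full remount, and even that is only sane if Host A has stopped writing (sketch only; /dev/pmem0 and /mnt/xfs are placeholders):

    # Host B
    umount /mnt/xfs
    mount /dev/pmem0 /mnt/xfs
    ls /mnt/xfs        # the file created on Host A is visible now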

Bottom line:

XFS doesn't handle multi-host mounts. If you need that, roll with a clustered file system like GFS2 or OCFS2. You can bring in a distributed lock manager like, say, SANlock. Or you can stick with a 'network redirector' like NFS or SMB3, which is the safest way to go, hands down! There's some good reading on the topic, see:

https://forums.starwindsoftware.com/viewtopic.php?t=1392

Hint:

Just ignore the iSCSI and StarWind context; everything these guys are talking about is completely relevant to any shared block storage vendor.
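If you go the NFS route, the shape of it is roughly this (a sketch only; hostnames, paths, and export options are made up, and only that one server mounts the XFS volume directly):

    # on the single server that owns the storage
    # /etc/exports
    /mnt/xfs   clientA(rw,sync,no_subtree_check) clientB(rw,sync,no_subtree_check)

    exportfs -ra                             # reload the export table

    # on each client
    mount -t nfs server:/mnt/xfs /mnt/xfs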

3
  • 1
    Thanks for the answer. I have a few more questions: xfs_readdir() on Host B mostly pulls from caches in memory even though Host A updates the data on disk. But it didn't work after I dropped caches on Host B (echo 3 > /proc/sys/vm/drop_caches); I still cannot see the new file on Host B. Does that make sense? Commented Jan 17 at 2:46
  • 2
    Yes, it does. You're exercising a code path that wasn't designed to be used the way you're using it. Commented Jan 17 at 5:20
  • 2
    Please enjoy! :) Commented Jan 17 at 6:24
4

You're seeing the perfectly normal consequences of shared storage access without a filesystem designed for that usage (currently on mainline Linux, the options for that are GFS2 and OCFS2, neither of which is great).

To answer your title question, any non-clustered filesystem on Linux caches directory entries and inodes (and, notably, also the filesystem superblock), because the VFS layer itself does that. This really has nothing to do with XFS itself here. XFS may be doing additional caching, but what you demonstrated is behavior you would also see with ext4, BTRFS, F2FS, and essentially any other Linux filesystem except for GFS2 and OCFS2.
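One quick way to see that caching in action (a rough sketch; the device and mount point names are examples, and the exact counters available depend on the device type):

    # on Host B: compare the device's I/O counters before and after a listing
    cat /sys/block/pmem0/stat
    ls /mnt/xfs
    cat /sys/block/pmem0/stat    # a listing served from cache typically leaves the read counters unchanged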

Expanding on this a bit more, essentially any filesystem not designed for shared storage access assumes it has exclusive access to the underlying storage device when mounted. This is really important for performance reasons, because it allows things that are not expected to change to just be cached, which eliminates a lot of unnecessary storage accesses.

This, in turn, leads to coherency issues in shared storage situations like what you are doing. What you saw is actually a best-case result; the worst case is that one or more of the hosts crashes because of a bug in the filesystem driver, triggered by on-disk state it was never written to account for, such as the aftermath of a torn write.

If you need multiple systems to access the same storage, you need to instead use one of:

  • A clustered filesystem like GFS2 or OCFS2.
  • A network filesystem like NFS or SMB3, backed by non-shared storage.
  • A distributed storage system like Ceph.
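For a sense of what the clustered option involves, here is a very rough sketch of creating a GFS2 filesystem. It assumes you already have a corosync/DLM cluster stack running on both hosts, and the cluster name, filesystem name, device, and mount point are placeholders:

    # -p lock_dlm          : coordinate access through the distributed lock manager
    # -t mycluster:shared0 : lock table name; "mycluster" must match the cluster name
    # -j 2                 : one journal per node that will mount the filesystem
    mkfs.gfs2 -p lock_dlm -t mycluster:shared0 -j 2 /dev/pmem0

    # then, on each node with the cluster stack up:
    mount -t gfs2 /dev/pmem0 /mnt/shared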
7
  • 3
    It's pretty much a duplicate of the other answer. Commented Jan 16 at 12:40
  • 1
    If I'm not mistaken, it also means that if the two hosts write to the same filesystem, that could lead to coherence issues, overwrites, and more. In short: do not do it! Commented Jan 16 at 12:43
  • 3
    If you don't have a cluster-aware file system, then... YES! Commented Jan 16 at 12:56
  • 2
    Lots of technical issues. Ceph would still require a clustered file system. Network redirectors can live fine with a shared storage backend. Commented Jan 16 at 19:25
  • 2
    Yeah, and that's a problem: Ceph is just shared block storage, and if you put XFS on top of it you get the same issues. Commented Jan 17 at 5:18
