10

I am dealing with a hundred million files in a filesystem (distributed among a lot of subdirectories), and I need to be able to list them very quickly, particularly in order to rsync them efficiently.

On the other hand, I don't really need the actual content of the files to be kept in cache.

I am constantly adding and removing files, but not that frequently (something like ten times per second).

Is there a way I can tell the OS (2.6.18-194.el5) to use the 24GB of available RAM more for inode caching than for file caching? I already looked at /proc/sys/vm/vfs_cache_pressure but it doesn't seem to be exactly what I am looking for...

3 Answers

7

How about lowering the value for vfs_cache_pressure? According to Documentation for /proc/sys/vm/* this should do what you want:

vfs_cache_pressure

This percentage value controls the tendency of the kernel to reclaim the memory which is used for caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to reclaim dentries and inodes at a "fair" rate with respect to pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will never reclaim dentries and inodes due to memory pressure and this can easily lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100 causes the kernel to prefer to reclaim dentries and inodes.

Increasing vfs_cache_pressure significantly beyond 100 may have negative performance impact. Reclaim code needs to take various locks to find freeable directory and inode objects. With vfs_cache_pressure=1000, it will look for ten times more freeable objects than there are.

3

Complementary to seeker's answer I would like to add/highlight a few things about vfs_cache_pressure.

In short it

influences the tendency the system reclaims memory for caching of VFS caches, versus pagecache and swap.

-> doc of /proc/sys/vm/

Some important values:

  • =0: The kernel will never reclaim dentries and inodes due to memory pressure, which can easily lead to out-of-memory conditions
  • =100: Reclaim at a "fair" rate (= default)

In order to apply the changes temporarily, adapt the value in /proc/sys/vm/vfs_cache_pressure:

$ cat /proc/sys/vm/vfs_cache_pressure
15
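Note that cat only displays the current value. Here is a minimal sketch of reading and changing it at runtime; the function names and the VFS_KNOB variable are made up for illustration, and writing to the real file requires root:

```shell
#!/bin/sh
# Path of the tunable. VFS_KNOB is a hypothetical override so the helpers
# can be tried against a scratch file instead of the live kernel setting.
VFS_KNOB="${VFS_KNOB:-/proc/sys/vm/vfs_cache_pressure}"

# Print the current value.
show_pressure() {
    cat "$VFS_KNOB"
}

# Set a new value; rejects anything that is not a non-negative integer.
set_pressure() {
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: set_pressure <non-negative integer>" >&2
            return 1
            ;;
    esac
    # Needs root when VFS_KNOB points at /proc; the change is lost on reboot.
    echo "$1" > "$VFS_KNOB"
}
```

As root, set_pressure 15 followed by show_pressure should print 15. Running sysctl -w vm.vfs_cache_pressure=15 has the same effect.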

For a permanent change (applied during reboot):

Either add a line to /etc/sysctl.conf or (better) create a new file in /etc/sysctl.d/*.conf. E.g.:

$ cat 10-vfs-cache-pressure.conf
vm.vfs_cache_pressure=10
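A small sketch of creating that file from a script. The function name persist_pressure and its optional directory argument are assumptions for illustration, and whether /etc/sysctl.d is read at boot depends on the distribution release, so check that before relying on it:

```shell
#!/bin/sh
# Write the setting to a sysctl drop-in file and print the file's path.
# $1: value to persist
# $2: target directory (defaults to /etc/sysctl.d, which needs root)
persist_pressure() {
    conf_dir="${2:-/etc/sysctl.d}"
    conf_file="$conf_dir/10-vfs-cache-pressure.conf"
    printf 'vm.vfs_cache_pressure=%s\n' "$1" > "$conf_file" || return 1
    echo "$conf_file"
}
```

To load the file without rebooting, something like sysctl -p /etc/sysctl.d/10-vfs-cache-pressure.conf (as root) should work.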

For me, decreasing it to somewhere between 10 and 15 resulted in good performance; however, this of course depends a lot on your system and on the number of users and services running on it. I think there is no substitute for experimenting and carefully watching the impact.


You might want to have a deeper look into slabtop.

slabtop will help you investigate the consequences of changing this and related kernel parameters (dentry and *inode_cache are most probably what you are looking for -> this answer might also help here).
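For a non-interactive view of the same numbers, here is a rough sketch that pulls the dentry and inode cache lines out of slabinfo-format data. The function name is made up. It assumes the usual column layout (name, active objects, total objects, object size in bytes), and that the dentry slab is called either dentry or dentry_cache depending on the kernel version:

```shell
#!/bin/sh
# Summarise dentry and inode slab caches.
# $1: slabinfo-format file (defaults to /proc/slabinfo, usually root-only)
slab_cache_usage() {
    awk '$1 == "dentry" || $1 == "dentry_cache" || $1 ~ /inode_cache$/ {
        # $3 = total objects, $4 = object size in bytes
        printf "%s %d objects %d KiB\n", $1, $3, ($3 * $4) / 1024
    }' "${1:-/proc/slabinfo}"
}
```

Running slab_cache_usage before and after changing vfs_cache_pressure (and after a full rsync or listing pass) gives a rough idea of how much memory the caches are actually holding.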

2

You can use these two commands to do the same job.

updatedb (to update the database of file and folder locations for the whole drive)

locate / (to list all files in the whole OS, which is lightning fast as it picks them up from the database)

4
  • Yup, I run updatedb from cron to keep the inode cache 'warm.' It works well; take a look at your slabtop to see statistics. Also, running strace -c updatedb can give you some insight as far as how much gets updated. Commented Dec 6, 2011 at 13:00
  • 1
    Thank you, but how does that help me improve rsync performance? Edit: OK, just read Marcin's comment, but regarding the warmness of the cache, how is it different from running a simple find /? Commented Dec 6, 2011 at 13:03
  • In some implementations, the updatedb shell script actually runs find / to get the initial file list. On Mac OS X it still works this way. Commented Dec 6, 2011 at 14:38
  • 5
    The OP scans the filesystem multiple times per second, anyway. I don't see any advantage in scanning it even more often. Commented Oct 6, 2015 at 23:57
