Discussion:
Inner workings of nginx cache manager
Stefan Safar
2018-11-29 14:23:51 UTC
Hi there,

I'd like to know a little bit more about the inner workings of the cache
manager. I looked through the code, and if I understand it correctly, when
it runs out of disk space specified by max_size, it tries to run the cache
manager, which looks at a queue of last-accessed URLs and tries to remove
the least used URL from the queue and from the disk.

My question is whether the queue is somehow persisted on disk, or have I
misunderstood something? What I'm trying to find out is what happens when an
nginx instance runs out of disk space and is restarted: how does nginx
know what it should or shouldn't delete? I don't think I saw any code that
would scan through the disk, and that would be a rather slow way to deal
with this.

Thanks,

Stefan Safar
Maxim Dounin
2018-11-29 14:50:45 UTC
Hello!
The LRU queue is only maintained in memory. When nginx is
restarted, details of which cache items were accessed last are lost.
If nginx then has to remove some items because max_size is reached,
it will remove mostly arbitrary items unless they were accessed
after the restart.
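
To illustrate the effect described above, here is a minimal Python sketch
of an in-memory LRU index. It is a toy model, not nginx's implementation:
the class LruCacheIndex, its methods, and the byte sizes are all made up
for the example. It only shows why, once access history is gone, eviction
follows whatever order the items were re-registered in, except for items
touched after the restart.

```python
from collections import OrderedDict


class LruCacheIndex:
    """Toy in-memory LRU index; names and structure are illustrative only."""

    def __init__(self, max_size):
        self.max_size = max_size      # byte limit, playing the role of max_size
        self.items = OrderedDict()    # key -> size, least recently used first
        self.total = 0

    def load(self, key, size):
        """Register an item found on disk (no access history is available)."""
        self.items[key] = size
        self.total += size

    def access(self, key):
        """A cache hit moves the item to the most-recently-used end."""
        self.items.move_to_end(key)

    def evict(self):
        """Drop items from the least-recently-used end until under the limit."""
        removed = []
        while self.total > self.max_size and self.items:
            key, size = self.items.popitem(last=False)
            self.total -= size
            removed.append(key)
        return removed


# After a "restart" the index is rebuilt in whatever order the disk scan
# returns, so that order, not real usage, decides what is evicted first.
index = LruCacheIndex(max_size=250)
for name in ["a", "b", "c"]:          # arbitrary scan order
    index.load(name, 100)
index.access("a")                      # only "a" is requested after the restart
evicted = index.evict()
```

In this run only "b" is evicted: "a" was protected by the access after the
restart, and "b" simply happened to come before "c" in the scan order.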

Note though that there is code which scans through the disk to
find out which items are in the cache (and how much space they
take). The cache loader process does this, see
http://nginx.org/r/proxy_cache_path for a high-level description
of how it works.
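
As a rough illustration of that kind of scan, the Python sketch below walks
a directory tree and records each file's size from stat() metadata without
opening any file. It is only an approximation of the idea, not the cache
loader's code; the function name scan_cache_dir and the demo directory
layout are assumptions made for the example.

```python
import os
import tempfile


def scan_cache_dir(root):
    """Return {path: size_in_bytes} for every file under root.

    Only directory entries and stat() metadata are consulted;
    file contents are never opened.
    """
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            sizes[path] = os.stat(path).st_size
    return sizes


# Demo on a throwaway directory with a two-level layout.
with tempfile.TemporaryDirectory() as root:
    leaf = os.path.join(root, "c", "29")
    os.makedirs(leaf)
    with open(os.path.join(leaf, "b7f54b2df7773722d382f4809d65029c"), "wb") as f:
        f.write(b"x" * 1024)
    found = scan_cache_dir(root)
    total_bytes = sum(found.values())
```

The cost of such a walk grows with the number of files rather than with
the amount of data stored, since no file body is read.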
--
Maxim Dounin
http://mdounin.ru/
Stefan Safar
2018-11-30 12:26:27 UTC
Hi Maxim,

thanks a lot for the clarification!

So the process/thread that scans through the files on disk needs to read
all the file headers to find the KEY for all the cache files, and keep that
information in memory, before it starts deleting anything. Is that correct?

It would be great if I could specify an option which would tell the cache
manager that a whole drive is being used as cache. That would let the
cache manager cut the time it takes before it can start deleting from
a huge drive from days to seconds. Does that make sense?

Stefan Safar
_______________________________________________
nginx mailing list
http://mailman.nginx.org/mailman/listinfo/nginx
Maxim Dounin
2018-11-30 12:58:13 UTC
Hello!
Post by Stefan Safar
So the process/thread that scans through the files on disk needs to read
all the file headers to find the KEY for all the cache files, and keep that
information in memory, before it starts deleting anything. Is that correct?
No, the cache loader only scans which files are present in the cache
(and their sizes); it doesn't try to read them. Raw keys as
stored in cache file headers are only needed for a safety check to
make sure there are no MD5 collisions between different keys, and
this check only happens when returning an actual response from the
cache.
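
A small Python sketch of the idea, under stated assumptions: the file name
is the hex MD5 of the cache key, and with levels=1:2 the directory path is
built from the trailing characters of that name, as in the example path
given in the proxy_cache_path documentation. The function names and the
way the stored key is passed in are hypothetical; this is not how nginx
reads its cache file headers.

```python
import hashlib


def cache_file_name(key):
    """File name derived from the cache key: hex MD5 of the key."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def cache_file_path(root, key):
    """Path for levels=1:2: the last hex char, then the two before it."""
    name = cache_file_name(key)
    return "/".join([root, name[-1], name[-3:-1], name])


def key_matches(stored_key, requested_key):
    """Safety check at serve time: the raw key kept with the cached
    response must equal the requested key, otherwise it is a collision
    and the entry must not be served for this request."""
    return stored_key == requested_key
```

Because the path can be computed from the key alone, a lookup never needs
a key-to-file table built from file headers; the stored raw key is only
compared once a matching file is about to be served.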
--
Maxim Dounin
http://mdounin.ru/
Stefan Safar
2018-11-30 13:05:22 UTC
Hi!

So the cache loader only does something like stat() during the filesystem
walk, which should be fairly fast unless you have tens or hundreds of
millions of files in the cache.

Thanks again!

Stefan