
We've been running into the OOM killer despite nearly half of our memory being used for the FS cache. We've been logging memory stats once per minute (as reported by top), but there always seems to be plenty of memory available.
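(For reference, a minimal sketch of the kind of once-a-minute logger we use - ours actually scrapes top, whereas this illustrative version reads the standard /proc/meminfo fields directly; the script itself is not the one we run:)

#!/usr/bin/env python
# Illustrative sketch only: log a handful of /proc/meminfo fields once per minute.
import sys
import time

FIELDS = ("MemTotal", "MemFree", "Buffers", "Cached", "Shmem", "SwapTotal", "SwapFree")

def meminfo():
    stats = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            stats[key] = value.strip()
    return stats

while True:
    snap = meminfo()
    fields = " ".join("%s=%s" % (k, snap.get(k, "?")) for k in FIELDS)
    print("%s %s" % (time.strftime("%Y-%m-%d %H:%M:%S"), fields))
    sys.stdout.flush()   # so the log survives a crash when redirected to a file
    time.sleep(60)       # drop to 1 for once-per-second resolution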

...
Mem:  15339640k total, 15268304k used,    71336k free,     3152k buffers
Swap:        0k total,        0k used,        0k free,  6608384k cached

Mem:  15339640k total, 14855280k used,   484360k free,    13748k buffers
Swap:        0k total,        0k used,        0k free,  6481852k cached

[OOM killer: postgres killed]

Mem:  15339640k total,  8212200k used,  7127440k free,    32776k buffers
Swap:        0k total,        0k used,        0k free,  2394444k cached
...

OOM details from syslog:

...
Jun 10 05:45:25 db kernel: [11209156.840462] wal-e invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Jun 10 05:45:25 db kernel: [11209156.840469] wal-e cpuset=/ mems_allowed=0
Jun 10 05:45:25 db kernel: [11209156.840474] Pid: 7963, comm: wal-e Not tainted 3.2.0-43-virtual #68-Ubuntu
Jun 10 05:45:25 db kernel: [11209156.840477] Call Trace:
Jun 10 05:45:25 db kernel: [11209156.840498] [<ffffffff81119711>] dump_header+0x91/0xe0
Jun 10 05:45:25 db kernel: [11209156.840502] [<ffffffff81119a95>] oom_kill_process+0x85/0xb0
Jun 10 05:45:25 db kernel: [11209156.840506] [<ffffffff81119e3a>] out_of_memory+0xfa/0x220
Jun 10 05:45:25 db kernel: [11209156.840511] [<ffffffff8111f823>] __alloc_pages_nodemask+0x8c3/0x8e0
Jun 10 05:45:25 db kernel: [11209156.840520] [<ffffffff81216e00>] ? noalloc_get_block_write+0x30/0x30
Jun 10 05:45:25 db kernel: [11209156.840528] [<ffffffff811566c6>] alloc_pages_current+0xb6/0x120
Jun 10 05:45:25 db kernel: [11209156.840534] [<ffffffff81116637>] __page_cache_alloc+0xb7/0xd0
Jun 10 05:45:25 db kernel: [11209156.840539] [<ffffffff81118602>] filemap_fault+0x212/0x3c0
Jun 10 05:45:25 db kernel: [11209156.840553] [<ffffffff81138c32>] __do_fault+0x72/0x550
Jun 10 05:45:25 db kernel: [11209156.840557] [<ffffffff8113c2ea>] handle_pte_fault+0xfa/0x200
Jun 10 05:45:25 db kernel: [11209156.840562] [<ffffffff8100638e>] ? xen_pmd_val+0xe/0x10
Jun 10 05:45:25 db kernel: [11209156.840567] [<ffffffff81005309>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
Jun 10 05:45:25 db kernel: [11209156.840571] [<ffffffff8113d559>] handle_mm_fault+0x269/0x370
Jun 10 05:45:25 db kernel: [11209156.840576] [<ffffffff8100a56d>] ? xen_force_evtchn_callback+0xd/0x10
Jun 10 05:45:25 db kernel: [11209156.840581] [<ffffffff8100ad42>] ? check_events+0x12/0x20
Jun 10 05:45:25 db kernel: [11209156.840589] [<ffffffff8165b3cb>] do_page_fault+0x14b/0x520
Jun 10 05:45:25 db kernel: [11209156.840594] [<ffffffff81160d64>] ? kmem_cache_free+0x104/0x110
Jun 10 05:45:25 db kernel: [11209156.840600] [<ffffffff811ba2c8>] ? ep_remove+0xa8/0xc0
Jun 10 05:45:25 db kernel: [11209156.840604] [<ffffffff811bb133>] ? sys_epoll_ctl+0xb3/0x3d0
Jun 10 05:45:25 db kernel: [11209156.840614] [<ffffffff81658035>] page_fault+0x25/0x30
Jun 10 05:45:25 db kernel: [11209156.840617] Mem-Info:
Jun 10 05:45:25 db kernel: [11209156.840618] Node 0 DMA per-cpu:
Jun 10 05:45:25 db kernel: [11209156.840622] CPU 0: hi: 0, btch: 1 usd: 0
Jun 10 05:45:25 db kernel: [11209156.840624] CPU 1: hi: 0, btch: 1 usd: 0
Jun 10 05:45:25 db kernel: [11209156.840627] CPU 2: hi: 0, btch: 1 usd: 0
Jun 10 05:45:25 db kernel: [11209156.840629] CPU 3: hi: 0, btch: 1 usd: 0
Jun 10 05:45:25 db kernel: [11209156.840631] Node 0 DMA32 per-cpu:
Jun 10 05:45:25 db kernel: [11209156.840634] CPU 0: hi: 186, btch: 31 usd: 30
Jun 10 05:45:25 db kernel: [11209156.840637] CPU 1: hi: 186, btch: 31 usd: 47
Jun 10 05:45:25 db kernel: [11209156.840639] CPU 2: hi: 186, btch: 31 usd: 15
Jun 10 05:45:25 db kernel: [11209156.840641] CPU 3: hi: 186, btch: 31 usd: 2
Jun 10 05:45:25 db kernel: [11209156.840643] Node 0 Normal per-cpu:
Jun 10 05:45:25 db kernel: [11209156.840646] CPU 0: hi: 186, btch: 31 usd: 0
Jun 10 05:45:25 db kernel: [11209156.840648] CPU 1: hi: 186, btch: 31 usd: 14
Jun 10 05:45:25 db kernel: [11209156.840650] CPU 2: hi: 186, btch: 31 usd: 0
Jun 10 05:45:25 db kernel: [11209156.840653] CPU 3: hi: 186, btch: 31 usd: 1
Jun 10 05:45:25 db kernel: [11209156.840658] active_anon:3616567 inactive_anon:4798 isolated_anon:0
Jun 10 05:45:25 db kernel: [11209156.840660] active_file:98 inactive_file:168 isolated_file:20
Jun 10 05:45:25 db kernel: [11209156.840661] unevictable:1597 dirty:73 writeback:0 unstable:0
Jun 10 05:45:25 db kernel: [11209156.840662] free:16921 slab_reclaimable:17631 slab_unreclaimable:7534
Jun 10 05:45:25 db kernel: [11209156.840663] mapped:1614529 shmem:1613928 pagetables:124012 bounce:0
Jun 10 05:45:25 db kernel: [11209156.840666] Node 0 DMA free:7888kB min:4kB low:4kB high:4kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:7632kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jun 10 05:45:25 db kernel: [11209156.840681] lowmem_reserve[]: 0 4016 15112 15112
Jun 10 05:45:25 db kernel: [11209156.840686] Node 0 DMA32 free:48368kB min:4176kB low:5220kB high:6264kB active_anon:3776804kB inactive_anon:28kB active_file:0kB inactive_file:20kB unevictable:932kB isolated(anon):0kB isolated(file):0kB present:4112640kB mlocked:932kB dirty:0kB writeback:0kB mapped:1458536kB shmem:1458632kB slab_reclaimable:17604kB slab_unreclaimable:8088kB kernel_stack:1872kB pagetables:190616kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:437 all_unreclaimable? yes
Jun 10 05:45:25 db kernel: [11209156.840698] lowmem_reserve[]: 0 0 11095 11095
Jun 10 05:45:25 db kernel: [11209156.840703] Node 0 Normal free:11428kB min:11548kB low:14432kB high:17320kB active_anon:10689464kB inactive_anon:19164kB active_file:528kB inactive_file:652kB unevictable:5456kB isolated(anon):0kB isolated(file):80kB present:11362176kB mlocked:5456kB dirty:292kB writeback:0kB mapped:4999580kB shmem:4997080kB slab_reclaimable:52920kB slab_unreclaimable:22048kB kernel_stack:2584kB pagetables:305432kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1974 all_unreclaimable? yes
Jun 10 05:45:25 db kernel: [11209156.840715] lowmem_reserve[]: 0 0 0 0
Jun 10 05:45:25 db kernel: [11209156.840720] Node 0 DMA: 2*4kB 3*8kB 1*16kB 3*32kB 3*64kB 3*128kB 2*256kB 1*512kB 2*1024kB 2*2048kB 0*4096kB = 7888kB
Jun 10 05:45:25 db kernel: [11209156.840752] Node 0 DMA32: 5813*4kB 2636*8kB 114*16kB 15*32kB 5*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 48372kB
Jun 10 05:45:25 db kernel: [11209156.840776] Node 0 Normal: 1888*4kB 10*8kB 46*16kB 4*32kB 3*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 11760kB
Jun 10 05:45:25 db kernel: [11209156.840788] 1615243 total pagecache pages
Jun 10 05:45:25 db kernel: [11209156.840790] 0 pages in swap cache
Jun 10 05:45:25 db kernel: [11209156.840801] Swap cache stats: add 0, delete 0, find 0/0
Jun 10 05:45:25 db kernel: [11209156.840803] Free swap  = 0kB
Jun 10 05:45:25 db kernel: [11209156.840805] Total swap = 0kB
Jun 10 05:45:25 db kernel: [11209156.909794] 3934192 pages RAM
Jun 10 05:45:25 db kernel: [11209156.909804] 99282 pages reserved
Jun 10 05:45:25 db kernel: [11209156.909809] 18899146 pages shared
Jun 10 05:45:25 db kernel: [11209156.909811] 2198511 pages non-shared
Jun 10 05:45:25 db kernel: [11209156.909817] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Jun 10 05:45:25 db kernel: [11209156.909835] [  332]     0   332     4308      109   1       0             0 upstart-udev-br
Jun 10 05:45:25 db kernel: [11209156.909845] [  346]     0   346     5384      271   2     -17         -1000 udevd
Jun 10 05:45:25 db kernel: [11209156.909851] [  408]     0   408     5364      174   2     -17         -1000 udevd
...
Jun 10 05:45:25 db kernel: [11209156.910703] [ 7963]   111  7963    17456     2966   0       0             0 wal-e
Jun 10 05:45:25 db kernel: [11209156.910707] [ 7968]   111  7968  1639372     2351   3       0             0 postgres
Jun 10 05:45:25 db kernel: [11209156.910711] [ 7969]   111  7969  1639371     1934   2       0             0 postgres
Jun 10 05:45:25 db kernel: [11209156.910716] Out of memory: Kill process 12443 (postgres) score 418 or sacrifice child
Jun 10 05:45:25 db kernel: [11209156.910733] Killed process 12443 (postgres) total-vm:6555152kB, anon-rss:4600kB, file-rss:6396572kB
Jun 10 05:45:30 db kernel: [11209159.293083] postgres invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Jun 10 05:45:31 db kernel: [11209159.293091] postgres cpuset=/ mems_allowed=0
Jun 10 05:45:31 db kernel: [11209159.293095] Pid: 6508, comm: postgres Not tainted 3.2.0-43-virtual #68-Ubuntu
Jun 10 05:45:31 db kernel: [11209159.293098] Call Trace:
Jun 10 05:45:31 db kernel: [11209159.293111] [<ffffffff81119711>] dump_header+0x91/0xe0
Jun 10 05:45:31 db kernel: [11209159.293115] [<ffffffff81119a95>] oom_kill_process+0x85/0xb0
Jun 10 05:45:31 db kernel: [11209159.293119] [<ffffffff81119e3a>] out_of_memory+0xfa/0x220
...

We can try increasing the resolution of these to roughly once per second, but would there be any reason for an OOM here? (We've seen http://bl0rg.krunch.be/oom-frag.html but we're working with much larger absolute amounts of memory, most of which is taken by the kernel's FS cache.)

Also including relevant parts of our postgresql.conf below:

shared_buffers = 6GB
effective_cache_size = 8GB
  • Related chat discussion Commented Jun 12, 2013 at 0:14
  • Um... 3.2.0-43? Update time. The OOM killer has been through many (too many) changes over time. Some versions were really quite brain-dead about accounting for shared memory use... like PostgreSQL 9.2 and older's shared_buffers. Commented Jun 12, 2013 at 4:02
  • @CraigRinger Unfortunately there are other considerations including sticking with the kernel in the Ubuntu 12.04 distro for LTS, compatibility, security updates, etc. We're just interested if someone knows how to explain what's happening here - if it helps, pretend we're not interested in changing the status quo / making the problem go away. BTW shared_buffers is still in PG9.3. Commented Jun 12, 2013 at 9:56
  • Yes shared_buffers is still in 9.3, but it's no longer POSIX shared memory in 9.3. It's been replaced with an anonymous mmap()ed region. That gets around some icky kernel config issues and pinning problems, though I doubt it'll make the OOM killer any less confused. Commented Jun 12, 2013 at 10:39
  • Possibly a duplicate of serverfault.com/questions/288319/…, which has a potential answer. Commented Sep 26, 2016 at 9:06

2 Answers


It appears you (and I in a case with very similar symptoms) have truly run out of memory and have been confused by the cached number.

There apparently are cases where Linux does not free a large disk cache when memory demand goes up.

In particular (I don't really understand why), postgres' shared_buffers may be reported under "Cached" (the page cache). In your case the 6481852k cached in top matches this line in the OOM-killer's log:

Jun 10 05:45:25 db kernel: [11209156.840788] 1615243 total pagecache pages 

(1615243 pages * 4 kB/page = 6460972 kB, which is roughly the 6481852k shown as cached) - meaning the page cache indeed was not dropped before the OOM killer was invoked.

Yet there are few file-backed pages (I assume active_file:98 inactive_file:168 is similar to /proc/meminfo's Active(file)/Inactive(file)), so it's not the discardable pages we know and love.
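If you want to see that split on a live box, here is a rough sketch of mine (not from the original poster's setup) that compares Cached against Shmem from /proc/meminfo; as far as I can tell, on kernels of this era Cached includes the shared-memory/tmpfs pages, so the difference approximates the file cache that can actually be dropped:

#!/usr/bin/env python
# Rough estimate: how much of "Cached" is shared memory vs. droppable file cache.
def meminfo_kb():
    out = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            out[key] = int(rest.strip().split()[0])  # values are reported in kB
    return out

m = meminfo_kb()
cached = m["Cached"]
shmem = m.get("Shmem", 0)   # present on 2.6.32+ kernels
print("Cached:                   %10d kB" % cached)
print("Shmem (shared memory):    %10d kB" % shmem)
print("file-backed cache (approx): %8d kB (Cached - Shmem)" % (cached - shmem))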

The post at https://www.depesz.com/2012/06/09/how-much-ram-is-postgresql-using/ demonstrates an example session where shutting down postgres leads to reduction of "cached" by the size of shared_buffers (scroll to "And most of it came off disk cache – as expected, because it was used for shared_buffers.") - unfortunately it doesn't indicate the version of postgres nor the kernel that was used for the experiment.

I'm using 3.13.0-67 x86_64 with PG 9.3. In 9.3 they switched from Sys V shared memory (shmget) to anonymous mmap(...R+W, MAP_SHARED|MAP_ANONYMOUS|MAP_HASSEMAPHORE...)+fork() (in 9.4 this became configurable via dynamic_shared_memory_type). But I couldn't find any explanation of why these mmap()s are supposed to show up under "cached"; the only thing I found was https://access.redhat.com/solutions/406773, which says "Cached: Memory in the pagecache (Diskcache and Shared Memory)".
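For what it's worth, here is a toy sketch of my own (nothing to do with the actual postgres source) that reproduces the accounting: Python's mmap with fd -1 gives a MAP_SHARED|MAP_ANONYMOUS mapping on Linux, and after touching every page, Shmem (and hence "cached") grows by roughly the mapped size.

#!/usr/bin/env python
# Toy demo (not postgres code): an anonymous *shared* mapping is accounted as Shmem,
# which in turn shows up under "cached", even though no file is involved.
import mmap, time

SIZE = 512 * 1024 * 1024   # 512 MB -- pick something safe for your machine
PAGE = 4096

def shmem_kb():
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("Shmem:"):
                return int(line.split()[1])
    return 0

before = shmem_kb()
buf = mmap.mmap(-1, SIZE)          # fd=-1 => MAP_SHARED | MAP_ANONYMOUS on Linux
zero_page = b"\x00" * PAGE
for off in range(0, SIZE, PAGE):
    buf[off:off + PAGE] = zero_page   # touch every page so it is really allocated
time.sleep(1)
print("Shmem grew by roughly %d MB" % ((shmem_kb() - before) // 1024))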

Given that there are many kinds of shared memory, I'm both enlightened and confused...

  • After several years, this is a much better answer, thank you. There's still the question of why this is accounted as cached, but I am marking this accepted for now. Commented Jun 16, 2017 at 1:31

  1. For the love of everything good in the world, configure swap space on your servers.
    You really need swap space. I'm not the only one who says so, it's pretty much a universal truth around here. (<-- Those are three links )
    You should of course have enough RAM that your database server isn't swapping regularly -- if you don't, the solution is money (which you take to your vendor and use to acquire more RAM).

  2. Since you now have adequate RAM, and swap to use if something goes wrong, you can disable the OOM killer (by disabling memory overcommit), like the Postgres people tell you to (example sysctl settings are sketched just after this list).
    (You can also apply their alternate solution and tell the OOM-Killer to never kill Postgres - but then you're just playing Russian Roulette with the rest of your system's processes...)

  3. (optional) Write an answer on Server Fault detailing why the default behavior in most Linux distributions is Bad, Wrong, and violates the POSIX specification for how malloc() is supposed to behave. Repeat it until everyone is sick of hearing about it.
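For reference, the knobs the Postgres documentation is talking about are the overcommit sysctls. Something along these lines in /etc/sysctl.conf would do it; the ratio value here is only an example -- size it to your own RAM and swap:

vm.overcommit_memory = 2
vm.overcommit_ratio = 80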


Also note that the kernel's cached memory is available to postgres (or any other application) to use - you should factor it as free/available memory in your calculations.

If I had to hazard a guess at what's happening here I'd say you've got a complex query, Postgres is requesting RAM to execute it, and rather than saying "I don't have that RAM" Linux tells Postgres "Sure, you can have it."
Then, when Postgres actually tries to use the RAM it was (allegedly) given, Linux realizes it doesn't HAVE the RAM it promised Postgres (because it's overcommitted); the OOM killer is told to free up RAM, and it dutifully kills the program using the most memory -- your database server.
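To make that promise-then-renege sequence concrete, here is a sketch of my own (nothing Postgres actually does, and emphatically not something to run on a box you care about) that walks through it with plain malloc():

#!/usr/bin/env python
# Deliberately destructive illustration -- do NOT run on a machine you care about.
# It shows the gap between "the allocation succeeded" and "the pages actually exist".
import ctypes

libc = ctypes.CDLL("libc.so.6")
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.memset.restype = ctypes.c_void_p
libc.memset.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]

CHUNK = 1024 * 1024 * 1024   # ask for RAM one 1 GB chunk at a time

held = []
for i in range(64):          # far more than the box has
    p = libc.malloc(CHUNK)
    if not p:
        # Under strict accounting (vm.overcommit_memory=2) you get a clean NULL here,
        # which Postgres would turn into an ordinary "out of memory" error.
        print("malloc refused after %d GB" % i)
        break
    held.append(p)
    # Under the default heuristic overcommit the mallocs keep "succeeding"; real
    # pages only have to be found when the memory is touched, and that is the
    # point where the OOM killer may step in and shoot something.
    libc.memset(p, 1, CHUNK)
    print("touched %d GB" % (i + 1))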

Postgres is a well-designed program. If it's told it can't have the RAM it's requesting it will handle that gracefully (either by making do with less, or aborting with a message to the user).

  • Thanks for the elaboration on swap, but this doesn't answer my question of why this is happening in the first place. Yes, I understand the basic premise that Linux overcommits by default and that OOM is when we're out of RAM - I could have stated this much in my original question. But the question is: why is it kicking in when I still have plenty of RAM (much of it just sitting in the FS cache)? Assume I'm not even interested in changing anything - that the OOM killer is fine, so long as I understand why it's being triggered. Commented Jun 12, 2013 at 2:43
  • After reviewing the links, there are unfortunately a number of assertions without backing evidence or concrete technical explanations. There are certainly many Linux environments where swap is not even an option (example: look no further than a Live CD where there isn't already an existing local swap partition to reuse). We are furthermore not interested in enabling swappiness based on our own experiences and environment - we would rather have the OOM. An answer to the original question would be appreciated. Commented Jun 12, 2013 at 9:52
  • @Yang I assume that if you're building a server for Postgres you would want to follow the recommendations of the Postgres project. My answer is to do what they tell you (shut off the OOM killer). If you would like to wait and see if someone offers you a different answer you're certainly free to do so, but I'm not comfortable offering any other solution - in my opinion the OOM killer is Bad, Wrong, and violates POSIX. It may be acceptable on a desktop/workstation, but disabling it on servers is, IMO, The Right Thing To Do. Commented Jun 12, 2013 at 15:46
  • I've run across this situation on a server which does have swap, and after saturating available memory+swap the OOM killer was used instead of the kernel reclaiming "cached" memory, which was obviously somehow locked. I never resolved the issue, but @Yang's original question is not answered here. Commented Jun 20, 2014 at 5:37
  • Swap is not the answer, it only makes the problem appear later. You need swap when the RAM is full, and you need the OOM killer when RAM+swap gets full. If the amount of swap is zero, you need the OOM killer sooner, but you cannot avoid the OOM killer with swap. Commented Oct 10, 2018 at 11:07
