
Running Ubuntu on a 2.6.31-302 x86-64 kernel. The overall problem is that I have memory in the 'cached' category that keeps on going up and will not be freed or used even when our application needs it.

So here's what I get out of the 'free' command. None of this looks out of the ordinary at first glance.

    # free
                 total       used       free     shared    buffers     cached
    Mem:       7358492    5750320    1608172          0       7848    1443820
    -/+ buffers/cache:    4298652    3059840
    Swap:            0          0          0

The first thing someone's going to say is "Don't worry, linux manages that memory automatically." Yes, I know how the memory manager is supposed to work; the problem is that it's not doing the right thing. The "cached" 1.4 GB here appears to be reserved and unusable.

My knowledge of Linux tells me that roughly 3 GB is 'free', but the behavior of the system says otherwise. When the 1.6 GB of real free memory is used up during peak usage and more memory is demanded (the 'free' in the first column approaches 0), the OOM killer is invoked, processes are killed, and problems start, even though the 'free' in the -/+ buffers/cache row still shows about 1.4 GB available.

I've tuned the oom_adj values on key processes so the OOM killer doesn't bring the system to its knees, but even then important processes get killed, and we never want to reach that point. Especially when, theoretically, 1.4 GB would still be "free" if the kernel would only evict the disk cache.
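
For reference, the knob on this kernel is /proc/<pid>/oom_adj (newer kernels use oom_score_adj instead); a minimal sketch of the sort of adjustment I mean, with a placeholder PID and value:

    # Make a critical process far less attractive to the OOM killer.
    # oom_adj ranges from -16 to +15; -17 disables OOM killing for the process entirely.
    echo -15 > /proc/1234/oom_adj    # 1234 is a placeholder PID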

Does anyone have any idea what's going on here? The internet is flooded with dumb questions about the Linux 'free' command and "why don't I have any free memory," and I can't find anything about this particular issue because of that.

The first thing that pops into my head is that swap is off. We have a sysadmin who is adamant about that; I'm open to explanations if they're backed up. Could running without swap cause problems like this?

Here's free after running echo 3 > /proc/sys/vm/drop_caches:

    # free
                 total       used       free     shared    buffers     cached
    Mem:       7358492    5731688    1626804          0        524    1406000
    -/+ buffers/cache:    4325164    3033328
    Swap:            0          0          0

As you can see, some minuscule amount of cache is actually freed up, but around 1.4 GB appears to be "stuck." The other problem is that this value seems to rise over time. On another server 2.0 GB is stuck.
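
For anyone following along, the value written to drop_caches controls what gets dropped (per the kernel docs), and it only releases clean, reclaimable memory — which is exactly why it can't touch whatever is pinning this 1.4 GB:

    sync                                   # flush dirty pages first so more becomes droppable
    echo 1 > /proc/sys/vm/drop_caches      # page cache only
    echo 2 > /proc/sys/vm/drop_caches      # reclaimable slab (dentries, inodes) only
    echo 3 > /proc/sys/vm/drop_caches      # both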

I'd really like this memory back... any help would be most appreciated.

Here's cat /proc/meminfo if it's worth anything:

    # cat /proc/meminfo
    MemTotal:        7358492 kB
    MemFree:         1472180 kB
    Buffers:            5328 kB
    Cached:          1435456 kB
    SwapCached:            0 kB
    Active:          5524644 kB
    Inactive:          41380 kB
    Active(anon):    5492108 kB
    Inactive(anon):        0 kB
    Active(file):      32536 kB
    Inactive(file):    41380 kB
    Unevictable:           0 kB
    Mlocked:               0 kB
    SwapTotal:             0 kB
    SwapFree:              0 kB
    Dirty:               320 kB
    Writeback:             0 kB
    AnonPages:       4125252 kB
    Mapped:            42536 kB
    Slab:              29432 kB
    SReclaimable:      13872 kB
    SUnreclaim:        15560 kB
    PageTables:            0 kB
    NFS_Unstable:          0 kB
    Bounce:                0 kB
    WritebackTmp:          0 kB
    CommitLimit:     3679244 kB
    Committed_AS:    7223012 kB
    VmallocTotal:   34359738367 kB
    VmallocUsed:        7696 kB
    VmallocChunk:   34359729675 kB
    DirectMap4k:     7340032 kB
    DirectMap2M:           0 kB
  • I don't have any explanation for your cache (although I suspect that mmap'd files probably come into it), but for the good of humanity, take a shovel and some quicklime and get rid of the "you don't need swap if you've got lots of RAM!" booster. They're immune to rational discussion, and they're dangerously wrong. The fact that the OOM killer is stalking you is just one symptom of this. Commented Jul 9, 2011 at 0:22
  • My thoughts exactly. Thanks for the advice. Do you know any other good articles or arguments on why swap is necessary? Commented Jul 11, 2011 at 22:00
  • Because if you don't have swap, things like this happen. But don't bother trying to argue with your swap denier; either break out the quicklime or say "if you don't want swap on here, you fix this mess you've insisted on creating". They'll either eventually change their mind themselves or they'll die trying. Problem solved either way. Commented Jul 11, 2011 at 23:00
  • Excellent, thanks for the tips. You were right about mmap'd files by the way - a quick lsof showed gigs of log files taking up the memory. Clearing them out solved the issue. Commented Jul 12, 2011 at 15:12
  • The problem is that without swap, overcommitting results in the OOM killer running and not overcommitting results in a system that can't launch processes. You need swap to make effective use of RAM. Commented Oct 21, 2014 at 18:07

3 Answers


I have discovered the answer to my own question - thanks to womble's help (submit an answer if you like).

lsof -s shows open file handles and their sizes, and it turns out there were several gigabytes of mmap'd log files taking up the cache.
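
For anyone hitting the same thing, something along these lines will surface the big ones (SIZE/OFF is the 7th column of lsof -s output; the 100 MB threshold is arbitrary):

    # List open files larger than ~100 MB, biggest first
    lsof -s | awk '$7 ~ /^[0-9]+$/ && $7 > 100000000' | sort -rnk 7 | head -20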

Implementing logrotate should resolve the issue completely and allow me to take advantage of that memory.
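
Something along these lines is what I have in mind; the app name, path, and retention below are placeholders for our actual setup:

    # Drop a rotation rule into /etc/logrotate.d/ (hypothetical app name and log path)
    cat > /etc/logrotate.d/myapp <<'EOF'
    /var/log/myapp/*.log {
        daily
        rotate 7
        compress
        missingok
        notifempty
        copytruncate    # truncate in place so the handle the app keeps open doesn't keep growing
    }
    EOF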

I will also re-enable swap so we have no problems with the OOM killer in the future. Thanks.
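
In case it helps anyone in the same boat, a swap file is enough to get swap back without repartitioning; a sketch with example size and path:

    # Create and enable a 4 GB swap file (size and path are examples)
    dd if=/dev/zero of=/swapfile bs=1M count=4096
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile
    # Persist it across reboots
    echo '/swapfile none swap sw 0 0' >> /etc/fstab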

  • mmap'd pages are discardable, so that should not cause the cache to be pinned. Are you using a ramfs? Commented Jul 12, 2011 at 15:12
  • Hi, sorry to dig up an old thread, but I'm facing the same issue currently and lsof -s doesn't show any unusual usage. However, I am using a ramfs like you said [and the 2.6.10 kernel, which doesn't have the drop_caches feature]. What do you think is the likely suspect? Commented Jan 18, 2017 at 22:36
  • Thanks for the tip! I'm adding lsof -s | sort -rnk 7 | less to my toolbox now. A note for other readers: this may list large entries like /proc/net/rpc/nfs4.nametoid/channel, but they didn't turn out to be the culprit in my case. Commented Jun 20, 2017 at 11:16
  • Make sure your large files or programs aren't using mlock; in /proc/meminfo, look at the "Unevictable" pages (a quick check is sketched below). Commented Dec 4, 2017 at 4:07
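
Following up on the mlock suggestion in the last comment, a quick way to check for locked pages system-wide and per process:

    # System-wide counters for pages that can't be evicted
    grep -E 'Unevictable|Mlocked' /proc/meminfo
    # Per-process locked memory (show only non-zero VmLck entries)
    grep VmLck /proc/[0-9]*/status | grep -v ' 0 kB'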

Apparently, postgres' shared_buffers can show up in cached, while not really being easily discardable... See OOM despite available memory (cache)
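
If you suspect that's what's happening, a quick way to see how much shared memory Postgres is configured to take (assuming psql access; the config path below is the Debian/Ubuntu layout and may differ elsewhere):

    # Ask the running server
    psql -U postgres -c 'SHOW shared_buffers;'
    # Or check the config file directly
    grep shared_buffers /etc/postgresql/*/main/postgresql.conf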

  • Obviously the correct answer. See also here. Commented Feb 24, 2018 at 1:02

I encountered a similar issue where the choice of file system was the culprit. Switching from tmpfs to xfs resolved the problem: the tmpfs filesystem was holding everything in RAM as page cache, which eventually got my process terminated by the OOM killer.
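
For context (a sketch, with a placeholder device and mount point): tmpfs data lives in the page cache and, without swap, can't be reclaimed, and a tmpfs mount defaults to half of RAM unless you cap it, whereas a disk-backed filesystem like xfs keeps its cached pages evictable under memory pressure:

    # RAM-backed tmpfs: cap it explicitly, otherwise it defaults to half of RAM
    mount -t tmpfs -o size=2g tmpfs /scratch
    # Disk-backed xfs instead (placeholder device): its page cache can be reclaimed
    mkfs.xfs /dev/sdb1
    mount /dev/sdb1 /scratch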
