
The oom-killer

This post is pure geek. You’ve been warned.


I came across an interesting gotcha in Linux 2.6 kernels that I had never seen before, despite having run 2.6 kernels for a number of years.

It’s called the oom-killer. Bwahahaha…

I ran into it when my virtual machine died while running a build. The /var/log/messages log showed:

    May 30 00:02:33 localhost kernel: oom-killer: gfp_mask=0xd0
    May 30 00:02:36 localhost kernel: cpu 0 hot: low 32, high 96, batch 16
    May 30 00:02:36 localhost kernel: cpu 0 cold: low 0, high 32, batch 16
    May 30 00:02:36 localhost kernel: cpu 1 hot: low 32, high 96, batch 16
    May 30 00:02:36 localhost kernel: cpu 1 cold: low 0, high 32, batch 16
    May 30 00:02:36 localhost kernel:
    May 30 00:02:36 localhost kernel: Free pages:       35932kB (19712kB HighMem)
    May 30 00:02:39 localhost kernel: HighMem free:19712kB min:512kB low:1024kB high:1536kB active:1002148kB inactive:1422792kB present:2488324kB pages_scanned:0 all_unreclaimable? no
    May 30 00:02:39 localhost kernel: protections[]: 0 0 0
    May 30 00:02:40 localhost kernel: DMA: 1*4kB 3*8kB 4*16kB 2*32kB 3*64kB 3*128kB 2*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12508kB
    May 30 00:02:43 localhost kernel: Free swap:       4176688kB
    May 30 00:02:44 localhost kernel: 851457 pages of RAM
    May 30 00:02:44 localhost kernel: 622081 pages of HIGHMEM
    May 30 00:02:44 localhost kernel: 8252 reserved pages
    May 30 00:02:44 localhost kernel: 748042 pages shared
    May 30 00:02:44 localhost kernel: 2173 pages swap cached
    May 30 00:02:44 localhost kernel: Out of Memory: Killed process 4300 (vmware-vmx
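
If you suspect the oom-killer has visited your box, the kill lines can be fished out of the syslog. Here is a minimal sketch, assuming the "Out of Memory: Killed process" wording shown in the log above (other kernel versions word this message differently). The function name `oom_kills` is made up for illustration.

```shell
# oom_kills: read syslog text on stdin and print "PID NAME" for every
# process the oom-killer reports killing. (Hypothetical helper name.)
# Assumes the message wording from the log excerpt in this post.
oom_kills() {
    grep -o 'Out of Memory: Killed process [0-9][0-9]* ([^ )]*' |
        sed 's/^Out of Memory: Killed process \([0-9][0-9]*\) (\(.*\)$/\1 \2/'
}

# Usage (the log path is an assumption; it varies by distribution):
#   oom_kills < /var/log/messages
```

Reading from stdin keeps the helper independent of where your distribution puts its kernel messages.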

The oom-killer is the Out Of Memory killer and is described here:

By default, the Linux kernel is configured to never say no when application processes ask for more memory. The assumption is that the applications will not actually use all the memory they ask for – this is called overcommitting memory. Hotels and airlines do the same thing when accepting bookings: the assumption is that not everybody who makes a booking will actually turn up to take their room/flight.

However, every now and then, the assumption is wrong – you have an overbooking, and somebody has to get bumped. When Linux runs out of memory, it starts killing processes in order to free some up. Of course, the processes are chosen according to a heuristic (which is a technical term meaning “you can’t please everyone”), and so invariably the kernel is going to kill something you consider important, thereby leading to much wailing and gnashing of teeth.
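
The overcommit behaviour described above is exposed on Linux as the `vm.overcommit_memory` sysctl. The quoted text does not mention it, so treat the following as a rough sketch rather than part of the original advice. The function name `describe_overcommit` is hypothetical, and the optional file argument exists only so the function can be tried against a test file.

```shell
# describe_overcommit: print a readable name for the overcommit mode
# stored in the given file (default: the live kernel setting).
# (Hypothetical helper; not from the quoted description.)
describe_overcommit() {
    mode=$(cat "${1:-/proc/sys/vm/overcommit_memory}") || return 1
    case "$mode" in
        0) echo "heuristic overcommit (kernel default)" ;;
        1) echo "always overcommit" ;;
        2) echo "strict accounting, no overcommit" ;;
        *) echo "unknown mode: $mode" ;;
    esac
}

# Usage:
#   describe_overcommit
```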

There are a number of ways of stopping this:

  • Check for low memory exhaustion

    # egrep 'High|Low' /proc/meminfo

    # free -lm

    When low memory is exhausted, it doesn’t matter how much high memory is
    available: the oom-killer will begin whacking processes to keep the
    server alive.

  • Upgrade to 64-bit Linux (not an option for most).
  • If limited to 32-bit Linux, the best solution is to run the hugemem kernel. This kernel splits low/high memory differently, and in most cases should provide enough low memory to map high memory. It is usually an easy fix – simply install the hugemem kernel RPM & reboot.
  • If running the 32-bit hugemem kernel isn’t an option either, you can try setting /proc/sys/vm/lower_zone_protection to a value of 250 or more. This makes the kernel more aggressive about protecting the low zone from allocations that could be satisfied from the high memory zone. As far as I know, this option isn’t available before the 2.6.x kernel. Some experimentation to find the best setting for your environment will probably be necessary. You can check & set this value on the fly via:

        # cat /proc/sys/vm/lower_zone_protection
        # echo "250" > /proc/sys/vm/lower_zone_protection

    To set this option on boot, add the following to /etc/sysctl.conf:

        vm.lower_zone_protection = 250
  • As a last-ditch effort, you can disable the oom-killer. This option can cause the server to hang, so use it with extreme caution (and at your own risk)!
    Check status of oom-killer:

        # cat /proc/sys/vm/oom-kill

    Turn oom-killer off/on:

        # echo "0" > /proc/sys/vm/oom-kill
        # echo "1" > /proc/sys/vm/oom-kill

    To make this change take effect at boot time, add the following to /etc/sysctl.conf:

        vm.oom-kill = 0
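
The low-memory check from the first item above can be turned into a single number. This is a sketch only: it assumes the `LowTotal` and `LowFree` fields that a 32-bit highmem kernel reports in /proc/meminfo (64-bit kernels don't have them), and the function name `low_mem_pct` is made up for illustration.

```shell
# low_mem_pct: print the percentage of low memory still free, based on
# the LowTotal and LowFree fields of a meminfo-style file
# (default: /proc/meminfo). Prints nothing and returns 1 if the
# fields are missing, as on a 64-bit kernel. (Hypothetical helper.)
low_mem_pct() {
    awk '
        $1 == "LowTotal:" { total = $2 }
        $1 == "LowFree:"  { free = $2 }
        END {
            if (total > 0) printf "%d\n", free * 100 / total
            else exit 1
        }
    ' "${1:-/proc/meminfo}"
}

# Usage:
#   low_mem_pct || echo "no LowTotal/LowFree fields on this kernel"
```

What counts as "too low" depends on the workload, so the function only reports the figure and leaves the threshold to you.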


Categories: linux Tags:
  1. Gladys Kravitz
    May 31st, 2008 at 22:16 | #1

    riveting.

  2. Anonymous
    June 1st, 2008 at 21:31 | #2

    huh??
