Linux-MM docs: the OOM killer

$Id: oom-killer.shtml,v 1.2 2001/02/01 23:04:25 riel Exp $

On any system without strict per-user and/or per-group quotas, VM (virtual memory) can get completely exhausted, often leaving the system catatonic. It was therefore clear that Linux needed some emergency code to recover virtual memory when that happens.

When VM is exhausted, the OS can do things like adding swap space, or suspending processes and writing their images to files, and many more things. But even with all these tricks (none of which are currently implemented), there will be a point where the OS just cannot go any further.

In that situation, the only solution is to kill a process, recovering the memory that that process was using. Killing a process is arguably a bad thing to do, but most people seem to agree that it is better than doing nothing and letting the system hang until either a miracle happens or the system administrator walks by to push the reset button.

Killing a process, however, means that the system loses all the work done by that process, and possibly the availability of a system service, which would also leave the system unusable. The OOM (out of memory) killer does its best to minimise the damage by choosing carefully which process to kill, instead of killing something at random.

The goals of the OOM killer are diverse:

Luckily most of these goals are easy to fulfill, as long as we don't try to be perfect but satisfy ourselves with merely "good" behaviour. After all, the OOM killer is mostly about avoiding bad things, so the amount of freedom in choosing a good process to kill is pretty big.

The OOM killer uses the following factors to choose which process to kill in an out-of-memory situation:

The system combines these factors in a scoring system, and the process with the most points is the one killed by the OOM killer. This algorithm takes enough factors into account to choose a good process to kill, yet is simple enough that the results are predictable and the system administrator knows what to expect.
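The "highest score loses" selection described above can be sketched in a few lines of C. This is only an illustration and not the kernel's code: struct candidate, oom_score(), select_victim(), the two factors used (memory in use and a "protected" flag) and the divide-by-four weight are all assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process record; not a kernel structure. */
struct candidate {
    int pid;
    unsigned long mem_pages;  /* assumed factor: memory in use, in pages */
    int protected_task;       /* assumed factor: nonzero for a task we would rather spare */
};

/*
 * Illustrative scoring only: more memory gives more points, and a
 * protected task has its score cut to a quarter. The weight is invented.
 */
unsigned long oom_score(const struct candidate *c)
{
    unsigned long points = c->mem_pages;

    if (c->protected_task)
        points /= 4;
    return points;
}

/*
 * Return the candidate with the most points, or NULL if there are none.
 * On a tie, the earliest candidate in the array wins.
 */
const struct candidate *select_victim(const struct candidate *procs, size_t n)
{
    const struct candidate *victim = NULL;
    unsigned long best = 0;

    assert(procs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        unsigned long points = oom_score(&procs[i]);

        if (victim == NULL || points > best) {
            victim = &procs[i];
            best = points;
        }
    }
    return victim;
}
```

With three candidates using 5000, 12000 (protected) and 4000 pages, the scores are 5000, 3000 and 4000, so the first one is selected even though the protected task uses the most memory. Because the rule is a single pass over a simple score, the outcome is easy to predict by hand.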

Whenever a process gets killed by the OOM killer, the system prints a message, so the system administrator can see in the log files what has happened.

Rik van Riel
riel@conectiva.com.br
01/02/2001

Copyright 2001