Too little, too slow!

Linux 2000 UK Linux Developers' Conference

Linux for the Enterprise

7 - 9 July 2000, Hammersmith (West London)

Memory Management

Introduction to memory management

Rik van Riel

To understand memory management, one needs a good conceptual and quantitative overview of the memory hierarchy: the progressively smaller, faster and more expensive types of memory that populate every computer today. We will therefore start the talk with an overview of the memory hierarchy and some of the concepts involved in memory management.

If you're already knowledgeable about memory management, you may want to skip the rest of this abstract and move on to section 2.

The speed differences between CPU and memory (memory is 25 to 100 times as slow) and between memory and hard disk (disk is more than 100,000 times as slow) are large. Because of this, the memory hierarchy poses some "interesting" performance problems that the operating system has to deal with.

The speed difference between CPU and memory is mainly masked by "cache": very fast memory whose use needs no support from the operating system or the application. However, there are some tricks the OS can perform to make it easier for the cache to do its job well and to raise system performance.

The speed difference between memory and hard disk is truly enormous. Furthermore, data on disk is stored permanently, so we need to store some of it in a way that lets us find it again after the computer is rebooted.

This means that we have to store the data in a "filesystem". I won't talk about how a filesystem works. The important part is that a filesystem works like an index where you have to look up where the data is.

The extra lookup would make the disk even slower: more than a million times as slow as the processor! The only reason the system still runs reasonably fast is a set of memory management and filesystem tricks.
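The ratios quoted above can be turned into a rough back-of-the-envelope calculation. The sketch below is only an illustration: the function name, the hit fractions and the default costs (cache = 1, memory = 100, disk = 10,000,000, all in units of one cache-speed access) are assumptions chosen to match the orders of magnitude in this abstract, not measurements.

```python
def average_access_cost(cache_hit, mem_hit,
                        cache_cost=1.0, mem_cost=100.0, disk_cost=10_000_000.0):
    """Expected cost of one access, in units of one cache-speed access.

    cache_hit: fraction of accesses served by the cache.
    mem_hit:   fraction of cache misses served by main memory;
               the remaining misses go all the way to disk.
    The default costs are illustrative assumptions, not measurements.
    """
    cache_miss = 1.0 - cache_hit
    return (cache_hit * cache_cost
            + cache_miss * mem_hit * mem_cost
            + cache_miss * (1.0 - mem_hit) * disk_cost)


if __name__ == "__main__":
    # Compare a system that never touches the disk with one where
    # one cache miss in ten thousand has to wait for the disk.
    print(average_access_cost(0.99, 1.0))
    print(average_access_cost(0.99, 0.9999))
```

Even when only a tiny fraction of accesses reaches the disk, the disk term dominates the average, which is why keeping the right pages in memory matters so much.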

A look at Linux 2.5 VM

In this part we'll present some new ideas for Linux memory management. While current Linux memory management should cope with most "normal" system loads just fine, it isn't as good as it could be and should be improved to handle extreme situations better. The following ideas will be presented.

2.5 VM

In Linux 2.5 virtual memory management will see some considerable changes. One of the main problems with the current Linux memory management is that sometimes we cannot make a proper distinction between pages which are in use and pages which can be evicted from memory to make room for new data.

In order to improve that situation and make the VM subsystem more resilient against wildly variable VM loads, we will use ideas from various other operating systems to improve Linux memory management. The main page replacement routine will use the active, inactive and scavenge (cache) lists as found in FreeBSD. This mechanism maintains a balance between used and old memory pages so there will always be "proper" pages around to swap. In addition to this there will probably be things like dynamic and administrator-settable RSS limits, anti-hog code to prevent one user or process from hogging the machine and slowing down everything else, and per-user memory accounting.
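To make the three-list idea more concrete, here is a toy model. It is neither FreeBSD's code nor the planned Linux 2.5 code: the class name, the method names, the referenced-bit aging and the inactive_target parameter are all assumptions made for illustration, and real page lists also have to deal with dirty pages, locking and I/O.

```python
from collections import OrderedDict


class PageLists:
    """Toy model of active / inactive / cache page lists (hypothetical)."""

    def __init__(self, inactive_target):
        self.active = OrderedDict()    # page -> referenced bit, oldest first
        self.inactive = OrderedDict()  # eviction candidates, oldest first
        self.cache = OrderedDict()     # clean pages, reusable immediately
        self.inactive_target = inactive_target

    def touch(self, page):
        """Record a reference; a page found on inactive/cache is rescued."""
        self.inactive.pop(page, None)
        self.cache.pop(page, None)
        self.active[page] = True
        self.active.move_to_end(page)

    def scan(self):
        """Age the active list until enough inactive pages are available."""
        for page in list(self.active):
            if len(self.inactive) >= self.inactive_target:
                break
            if self.active[page]:
                self.active[page] = False    # second chance: clear the bit
            else:
                del self.active[page]
                self.inactive[page] = False  # not used recently: deactivate

    def launder(self):
        """Move the oldest inactive page to the cache list (assumed clean)."""
        if not self.inactive:
            return None
        page, _ = self.inactive.popitem(last=False)
        self.cache[page] = False
        return page

    def reclaim(self):
        """Free the oldest cache page and return it, or None if none is ready."""
        if not self.cache:
            return None
        page, _ = self.cache.popitem(last=False)
        return page
```

The point of the sketch is the balance: scan() keeps a supply of old pages on the inactive list, while touch() lets a page that turns out to be in use escape eviction before it is reclaimed.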

Anti hog code

The virtual memory management of most modern operating systems works under the assumption that every page is of equal importance, applying equal memory pressure to each page in the system. This can lead to the situation where one memory hog is running happily and touching all its pages all the time (since it is in memory it is fast) while the rest of the system is thrashing (and will continue to do so, since it is running so slowly that it won't get a chance to use its memory before the next pageout scan).

Since this is a very unfair situation that nobody wants to run into, and one that can also cause very inefficient system use, we should abandon the idea that every page is equally important. There are a number of ideas that can improve this situation considerably. Two of these will be presented in this lecture. One is the simple anti-hog code that was experimented with in the 2.3 kernel and the other is the solution of dynamic RSS limits.
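As an illustration of what "unequal memory pressure" could look like, the sketch below takes pages only from processes that are above their RSS limit, largest excess first. This is a hypothetical policy written for this abstract, not the 2.3 anti-hog code and not the dynamic RSS limit design; the function name and the dictionaries are assumptions.

```python
def reclaim_plan(rss, limit, needed):
    """Decide how many pages to take from each process.

    rss:    dict mapping process name -> resident pages.
    limit:  dict mapping process name -> RSS limit in pages.
    needed: number of pages the system wants to free.
    Returns (plan, shortfall): pages to take per process, and how many
    pages still have to come from ordinary global page replacement.
    """
    plan = {name: 0 for name in rss}
    # Visit processes from the largest excess over their limit downwards.
    by_excess = sorted(rss, key=lambda n: rss[n] - limit[n], reverse=True)
    for name in by_excess:
        excess = rss[name] - limit[name]
        if needed <= 0 or excess <= 0:
            break  # enough freed, or nobody else is over their limit
        take = min(excess, needed)
        plan[name] = take
        needed -= take
    return plan, needed


if __name__ == "__main__":
    rss = {"hog": 900, "shell": 50, "editor": 50}
    limit = {"hog": 400, "shell": 100, "editor": 100}
    print(reclaim_plan(rss, limit, 200))
```

With limits like these the small processes are left alone while the hog is trimmed back towards its limit, which is the opposite of what equal pressure on every page tends to produce.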

Process suspension

When the memory load on the system is just too big (e.g. when the working set of all running processes no longer fits in memory), paging is no longer enough and something else needs to be done. The simplest solution is to suspend a process for a while so that the sum of all working set sizes is small enough to fit in memory.
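The condition described above (the sum of the working sets must fit in memory) can be sketched as follows. The selection rule used here, suspending the largest working set first, is only a placeholder assumption so that the example runs; it makes no attempt at fairness. The function and parameter names are hypothetical as well.

```python
def choose_suspended(working_sets, memory_pages):
    """Return the processes to suspend so the remaining working sets fit.

    working_sets: dict mapping process name -> working set size in pages.
    memory_pages: pages of memory available to processes.
    Placeholder policy (an assumption): suspend the largest first.
    """
    suspended = []
    total = sum(working_sets.values())
    for name in sorted(working_sets, key=working_sets.get, reverse=True):
        if total <= memory_pages:
            break  # everything that is still running now fits
        suspended.append(name)
        total -= working_sets[name]
    return suspended


if __name__ == "__main__":
    print(choose_suspended({"a": 600, "b": 300, "c": 200}, 1000))
```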

The obvious questions arising with this solution are: which process(es) to suspend? For how long should they be suspended? How do we ensure fairness? How do we make sure that every process is able to get some work done? How do we make sure interactive performance isn't impacted too much?

The algorithm presented is a variation on the algorithm used by ITS (the Incompatible Timesharing System), where the system attempts to measure throughput times memory used, averaged over time. Using this per-process number the system can estimate how badly the system is thrashing (do we need to suspend a process?) and make sure all processes receive fair treatment.
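One way to keep such a per-process number is an exponentially decaying average of the product of progress and memory used. The abstract does not give the formula, so everything below is an assumption: the class name, the decay factor, the use of an exponential average, and the idea that "progress" is some count of work done during the sampling interval. How the resulting number feeds into the suspend decision is left out on purpose.

```python
class ProcessStats:
    """Time-averaged (progress x memory) number for one process (hypothetical)."""

    def __init__(self, decay=0.9):
        self.decay = decay  # weight given to the previous average (assumed)
        self.score = 0.0

    def sample(self, progress, pages_resident):
        """Fold one measurement interval into the running average."""
        product = progress * pages_resident
        self.score = self.decay * self.score + (1.0 - self.decay) * product
        return self.score


if __name__ == "__main__":
    stats = ProcessStats(decay=0.5)
    print(stats.sample(progress=10, pages_resident=4))
    print(stats.sample(progress=0, pages_resident=4))
```

A process that holds memory but makes no progress sees its number decay from one interval to the next, which is the kind of signal a thrashing estimate could be built on.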

Biography

Rik van Riel was born in 1978 in the north east of the Netherlands, but didn't buy his first computer until 12 years later. Linux wasn't installed until 4 years after that, when the XT was replaced by a 486. In the meantime, Rik finished grammar school.

University was a different story. Little bits of a lot of different things were learnt and a lot was learnt about Linux. Not surprisingly it took two years for Rik to quit university and get a Linux job. First as Linux consultant for Le Reseau [http://www.reseau.nl/], but...

Currently Rik works as a full-time kernel developer for Conectiva S.A. [http://www.conectiva.com/]. His specialties include memory management and scheduling; he also runs the Linux-MM website and the #kernelnewbies IRC channel on openprojects.net.

At the moment Rik is busy designing new memory management ideas for use in Linux 2.5, with VM maintenance in 2.2 and 2.4 "in the background". High availability, filesystems and various other things also get some attention, but at the moment memory management seems to eat all his time...