Tech Insider					   Technology and Trends


			   USENET Archives


Electronic mail:			      WorldWideWeb:
   tech-insider@outlook.com		         http://tech-insider.org/

Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
newsfeed.direct.ca!look.ca!newsfeed.icl.net!netnews.com!xfer02.netnews.com!
newsfeed1.cidera.com!Cidera!news2.dg.net.ua!bn.utel.com.ua!carrier.kiev.ua!
horse.lucky.net!carrier.kiev.ua!solar.carrier.kiev.ua!not-for-mail
From: Marcelo Tosatti <marc...@conectiva.com.br>
Newsgroups: lucky.linux.kernel
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
Date: Mon, 10 Dec 2001 20:27:43 +0000 (UTC)
Organization: unknown
Lines: 45
Sender: n...@solar.carrier.kiev.ua
Approved: newsmas...@lucky.net
Message-ID: <Pine.LNX.4.21.0112101705281.25362-100000@freak.distro.conectiva>
NNTP-Posting-Host: solar.carrier.kiev.ua
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Trace: solar.carrier.kiev.ua 1008016064 2466 193.193.193.124 (10 Dec 2001 20:27:44 GMT)
X-Complaints-To: usenet@solar.carrier.kiev.ua
NNTP-Posting-Date: Mon, 10 Dec 2001 20:27:44 +0000 (UTC)
X-Mailing-List: 	linux-kernel@vger.kernel.org
X-Comment-To: Andrea Arcangeli


Andrea, 

Could you please start looking at any 2.4 VM issues which show up ?

Just please make sure that when sending a fix for something, send me _one_
problem and a patch which fixes _that_ problem.

I'm tempted to look at VM, but I think I'll spend my limited time in a
better way if I review other people's work instead.

---------- Forwarded message ----------
Date: Mon, 10 Dec 2001 16:46:02 -0200 (BRST)
From: Marcelo Tosatti <marc...@conectiva.com.br>
To: Abraham vd Merwe <abra...@2d3d.co.za>
Cc: Linux Kernel Development <linux-ker...@vger.kernel.org>
Subject: Re: 2.4.16 & OOM killer screw up



On Mon, 10 Dec 2001, Abraham vd Merwe wrote:

> Hi!
> 
> If I leave my machine on for a day or two without doing anything on it (e.g.
> my machine at work over a weekend) and I come back then 1) all my memory is
> used for buffers/caches and when I try running application, the OOM killer
> kicks in, tries to allocate swap space (which I don't have) and kills
> whatever I try start (that's with 300M+ memory in buffers/caches).

Abraham, 

I'll take a look at this issue as soon as pre8 is released. 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!
internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
X-Authentication-Warning: vasquez.zip.com.au: Host r...@zipperii.zip.com.au 
[61.8.0.87] claimed to be zip.com.au
Original-Message-ID: <3C151F7B.44125B1@zip.com.au>
Original-Date: 	Mon, 10 Dec 2001 12:47:55 -0800
From: Andrew Morton <a...@zip.com.au>
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.17-pre5 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Marcelo Tosatti <marc...@conectiva.com.br>
CC: Andrea Arcangeli <and...@suse.de>, lkml <linux-ker...@vger.kernel.org>
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
Original-References: <Pine.LNX.4.21.0112101705281.25362-100...@freak.distro.conectiva>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Mon, 10 Dec 2001 20:49:56 GMT
Message-ID: <fa.b7d58hv.8nqk9d@ifi.uio.no>
References: <fa.n33nt6v.66g9oc@ifi.uio.no>
Lines: 65

Marcelo Tosatti wrote:
> 
> Andrea,
> 
> Could you please start looking at any 2.4 VM issues which show up ?
> 

Just fwiw, I did some testing on this yesterday.

Buffers and cache data are sitting on the active list, and shrink_caches()
is *not* getting them off the active list, and onto the inactive list
where they can be freed.

So we end up with enormous amounts of anon memory on the inactive
list, so this code:

        /* try to keep the active list 2/3 of the size of the cache */
        ratio = (unsigned long) nr_pages * nr_active_pages / ((nr_inactive_pages + 1) * 2);
        refill_inactive(ratio);

just calls refill_inactive(0) all the time.  Nothing gets moved
onto the inactive list - it remains full of unfreeable anon
allocations.  And with no swap, there's nowhere to go.

I think a little fix is to add

        if (ratio < nr_pages)
                ratio = nr_pages;

so we at least move *something* onto the inactive list.
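
The truncation to zero described above can be reproduced with a small
standalone sketch. This is an illustration, not kernel code: the function
names refill_ratio and refill_ratio_clamped are hypothetical, and the page
counts in the note below are made-up sample values.

```c
#include <assert.h>

/*
 * Illustrative sketch only: same integer arithmetic as the ratio
 * computation quoted above, lifted out of the kernel.
 */
unsigned long refill_ratio(unsigned long nr_pages,
                           unsigned long nr_active_pages,
                           unsigned long nr_inactive_pages)
{
        return nr_pages * nr_active_pages / ((nr_inactive_pages + 1) * 2);
}

/* Variant with the proposed clamp: never request fewer than nr_pages. */
unsigned long refill_ratio_clamped(unsigned long nr_pages,
                                   unsigned long nr_active_pages,
                                   unsigned long nr_inactive_pages)
{
        unsigned long ratio = refill_ratio(nr_pages, nr_active_pages,
                                           nr_inactive_pages);

        if (ratio < nr_pages)
                ratio = nr_pages;
        assert(ratio >= nr_pages); /* postcondition of the clamp */
        return ratio;
}
```

Assuming a request of 32 pages with 1000 active and 100000 inactive pages,
refill_ratio() computes 32000 / 200002, which truncates to 0, while the
clamped variant returns 32. The patch posted later in the thread uses the
narrower test `ratio == 0` instead of `ratio < nr_pages`.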

Also refill_inactive needs to be changed so that it counts
the number of pages which it actually moved, rather than
the number of pages which it inspected.
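
The difference between charging the budget per inspected page and per moved
page can be shown with a toy model. This is a sketch under simplifying
assumptions: struct toy_page and all function names are hypothetical, and a
plain array stands in for the kernel's linked active list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page on the active list. */
struct toy_page {
        int referenced; /* recently used: gets a second chance */
        int inactive;   /* set once the page has been moved */
};

/* Returns 1 if the page was moved to the (imaginary) inactive list. */
static int try_deactivate(struct toy_page *page)
{
        if (page->inactive)
                return 0;
        if (page->referenced) {
                page->referenced = 0; /* test-and-clear, page stays active */
                return 0;
        }
        page->inactive = 1;
        return 1;
}

/* The budget is charged for every page inspected. */
size_t refill_count_inspected(struct toy_page *pages, size_t n, size_t budget)
{
        size_t moved = 0;

        assert(pages != NULL || n == 0);
        for (size_t i = 0; i < n && budget > 0; i++) {
                budget--;
                moved += try_deactivate(&pages[i]);
        }
        return moved;
}

/* The budget is charged only for pages actually moved. */
size_t refill_count_moved(struct toy_page *pages, size_t n, size_t budget)
{
        size_t moved = 0;

        assert(pages != NULL || n == 0);
        for (size_t i = 0; i < n && budget > 0; i++) {
                if (try_deactivate(&pages[i])) {
                        moved++;
                        budget--;
                }
        }
        return moved;
}
```

With three referenced pages ahead of three unreferenced ones and a budget of
3, the first variant spends its whole budget on the referenced pages and
moves nothing; the second skips past them and moves three pages.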

In my swapless testing, I burnt HUGE amounts of CPU in flush_tlb_others().
So we're madly trying to swap pages out and finding that there's no swap
space.  I believe that when we find there's no swap left we should move
the page onto the active list so we don't keep rescanning it pointlessly.

A fix may be to just remove the use-once stuff.  It is one of the
sources of this problem, because it's overpopulating the inactive list.

In my testing last night, I tried to allocate 650 megs on a 768 meg
swapless box.  Got oom-killed when there was almost 100 megs of freeable
memory: half buffercache, half filecache.  Presumably, all of it was
stuck on the active list with no way to get off.

We also need to do something about shrink_[di]cache_memory(),
which seem to be called in the wrong place.

There's also the report concerning modify_ldt() failure in a
similar situation.  I'm not sure why this one occurred.  It
vmallocs 64k of memory and that seems to fail.

I did some similar testing a week or so ago, also tested
the -aa patches.  They seemed to maybe help a tiny bit,
but not significantly.


Path: archiver1.google.com!news2.google.com!news1.google.com!sn-xit-02!
supernews.com!news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!
ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: 	Tue, 11 Dec 2001 01:11:58 +0100
From: Andrea Arcangeli <and...@suse.de>
To: Andrew Morton <a...@zip.com.au>
Cc: Marcelo Tosatti <marc...@conectiva.com.br>,
        lkml <linux-ker...@vger.kernel.org>
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
Original-Message-ID: <20011211011158.A4801@athlon.random>
Original-References: 
<Pine.LNX.4.21.0112101705281.25362-100...@freak.distro.conectiva> 
<3C151F7B.4412...@zip.com.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.12i
In-Reply-To: <3C151F7B.44125B1@zip.com.au>; from akpm@zip.com.au on Mon, 
Dec 10, 2001 at 12:47:55PM -0800
X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc
X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Tue, 11 Dec 2001 00:12:53 GMT
Message-ID: <fa.gm1hejv.16679c@ifi.uio.no>
References: <fa.b7d58hv.8nqk9d@ifi.uio.no>
Lines: 84

On Mon, Dec 10, 2001 at 12:47:55PM -0800, Andrew Morton wrote:
> Marcelo Tosatti wrote:
> > 
> > Andrea,
> > 
> > Could you please start looking at any 2.4 VM issues which show up ?
> > 
> 
> Just fwiw, I did some testing on this yesterday.
> 
> Buffers and cache data are sitting on the active list, and shrink_caches()
> is *not* getting them off the active list, and onto the inactive list
> where they can be freed.

please check 2.4.17pre4aa1, see the per-classzone info, they will
prevent all the problems with the refill inactive with highmem.

> 
> So we end up with enormous amounts of anon memory on the inactive
> list, so this code:
> 
>         /* try to keep the active list 2/3 of the size of the cache */
>         ratio = (unsigned long) nr_pages * nr_active_pages / ((nr_inactive_pages + 1) * 2);
>         refill_inactive(ratio);
> 
> just calls refill_inactive(0) all the time.  Nothing gets moved
> onto the inactive list - it remains full of unfreeable anon
> allocations.  And with no swap, there's nowhere to go.
> 
> I think a little fix is to add
> 
>         if (ratio < nr_pages)
>                 ratio = nr_pages;
> 
> so we at least move *something* onto the inactive list.
> 
> Also refill_inactive needs to be changed so that it counts
> the number of pages which it actually moved, rather than
> the number of pages which it inspected.

done ages ago here.

> 
> In my swapless testing, I burnt HUGE amounts of CPU in flush_tlb_others().
> So we're madly trying to swap pages out and finding that there's no swap
> space.  I believe that when we find there's no swap left we should move
> the page onto the active list so we don't keep rescanning it pointlessly.

yes, however I think the swap-flood with no swap isn't a very
interesting case to optimize.

> 
> A fix may be to just remove the use-once stuff.  It is one of the
> sources of this problem, because it's overpopulating the inactive list.
> 
> In my testing last night, I tried to allocate 650 megs on a 768 meg
> swapless box.  Got oom-killed when there was almost 100 megs of freeable
> memory: half buffercache, half filecache.  Presumably, all of it was
> stuck on the active list with no way to get off.
> 
> We also need to do something about shrink_[di]cache_memory(),
> which seem to be called in the wrong place.
> 
> There's also the report concerning modify_ldt() failure in a
> similar situation.  I'm not sure why this one occurred.  It
> vmallocs 64k of memory and that seems to fail.

dunno about this modify_ldt failure.

> 
> I did some similar testing a week or so ago, also tested
> the -aa patches.  They seemed to maybe help a tiny bit,
> but not significantly.

I don't have any pending bug report. AFAIK those bugs are only in
mainline. If you can reproduce with -aa please send me a bug report.
thanks,

Andrea

Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!news.tele.dk!
small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
X-Authentication-Warning: vasquez.zip.com.au: Host r...@zipperii.zip.com.au 
[61.8.0.87] claimed to be zip.com.au
Original-Message-ID: <3C15B0B3.1399043B@zip.com.au>
Original-Date: 	Mon, 10 Dec 2001 23:07:31 -0800
From: Andrew Morton <a...@zip.com.au>
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.17-pre5 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Andrea Arcangeli <and...@suse.de>
CC: Marcelo Tosatti <marc...@conectiva.com.br>,
        lkml <linux-ker...@vger.kernel.org>
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
Original-References: 
<Pine.LNX.4.21.0112101705281.25362-100...@freak.distro.conectiva> 
<3C151F7B.4412...@zip.com.au>,
<3C151F7B.4412...@zip.com.au>; from a...@zip.com.au on Mon, 
Dec 10, 2001 at 12:47:55PM -0800 <20011211011158.A4...@athlon.random>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Tue, 11 Dec 2001 07:09:39 GMT
Message-ID: <fa.dqdtekv.1v7q39d@ifi.uio.no>
References: <fa.gm1hejv.16679c@ifi.uio.no>
Lines: 172

Andrea Arcangeli wrote:
> 
> On Mon, Dec 10, 2001 at 12:47:55PM -0800, Andrew Morton wrote:
> > Marcelo Tosatti wrote:
> > >
> > > Andrea,
> > >
> > > Could you please start looking at any 2.4 VM issues which show up ?
> > >
> >
> > Just fwiw, I did some testing on this yesterday.
> >
> > Buffers and cache data are sitting on the active list, and shrink_caches()
> > is *not* getting them off the active list, and onto the inactive list
> > where they can be freed.
> 
> please check 2.4.17pre4aa1, see the per-classzone info, they will
> prevent all the problems with the refill inactive with highmem.

This is not highmem-related.  But the latest -aa patch does
appear to have fixed this bug.  Stale memory is no longer being
left on the active list, and all buffercache memory is being reclaimed
before the oom-killer kicks in (swapless case).

Also, (and this is in fact the same problem), the patched kernel
has less tendency to push in-use memory out to swap while leaving
tens of megs of old memory on the active list.  This is all good.

Which of your changes has caused this?

Could you please separate this out into one or more specific patches for
the 2.4.17 series?





Why does this code exist at the end of refill_inactive()?

        if (entry != &active_list) {
                list_del(&active_list);
                list_add(&active_list, entry);
        }





This test on a 64 megabyte machine, on ext2:

	time (tar xfz /nfsserver/linux-2.4.16.tar.gz ; sync)

On 2.4.17-pre7 it takes 21 seconds.  On -aa it is much slower: 36 seconds.

This is probably due to the write scheduling changes in fs/buffer.c.
This chunk especially will, under some conditions, cause bdflush
to madly spin in a loop unplugging all the disk queues:

@@ -2787,7 +2795,7 @@
 
                spin_lock(&lru_list_lock);
                if (!write_some_buffers(NODEV) || balance_dirty_state() < 0) {
-                       wait_for_some_buffers(NODEV);
+                       run_task_queue(&tq_disk);
                        interruptible_sleep_on(&bdflush_wait);
                }
        }

Why did you make this change?





Execution time for `make -j12 bzImage' on a 64meg RAM/512 meg swap
dual x86:

-aa:					4 minutes 20 seconds
2.4.17-pre8				4 minutes 8 seconds
2.4.17-pre8 plus the below patch:	3 minutes 55 seconds

Now it could be that this performance regression is due to the
write merging mistake in fs/buffer.c.  But with so much unrelated
material in the same patch it's hard to pinpoint the source.



--- linux-2.4.17-pre8/mm/vmscan.c	Thu Nov 22 23:02:59 2001
+++ linux-akpm/mm/vmscan.c	Mon Dec 10 22:34:18 2001
@@ -537,7 +537,7 @@ static void refill_inactive(int nr_pages
 
 	spin_lock(&pagemap_lru_lock);
 	entry = active_list.prev;
-	while (nr_pages-- && entry != &active_list) {
+	while (nr_pages && entry != &active_list) {
 		struct page * page;
 
 		page = list_entry(entry, struct page, lru);
@@ -551,6 +551,7 @@ static void refill_inactive(int nr_pages
 		del_page_from_active_list(page);
 		add_page_to_inactive_list(page);
 		SetPageReferenced(page);
+		nr_pages--;
 	}
 	spin_unlock(&pagemap_lru_lock);
 }
@@ -561,6 +562,12 @@ static int shrink_caches(zone_t * classz
 	int chunk_size = nr_pages;
 	unsigned long ratio;
 
+	shrink_dcache_memory(priority, gfp_mask);
+	shrink_icache_memory(priority, gfp_mask);
+#ifdef CONFIG_QUOTA
+	shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
+#endif
+
 	nr_pages -= kmem_cache_reap(gfp_mask);
 	if (nr_pages <= 0)
 		return 0;
@@ -568,17 +575,13 @@ static int shrink_caches(zone_t * classz
 	nr_pages = chunk_size;
 	/* try to keep the active list 2/3 of the size of the cache */
 	ratio = (unsigned long) nr_pages * nr_active_pages / ((nr_inactive_pages + 1) * 2);
+	if (ratio == 0)
+		ratio = nr_pages;
 	refill_inactive(ratio);
 
 	nr_pages = shrink_cache(nr_pages, classzone, gfp_mask, priority);
 	if (nr_pages <= 0)
 		return 0;
-
-	shrink_dcache_memory(priority, gfp_mask);
-	shrink_icache_memory(priority, gfp_mask);
-#ifdef CONFIG_QUOTA
-	shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
-#endif
 
 	return nr_pages;
 }

> ...
> 
> >
> > In my swapless testing, I burnt HUGE amounts of CPU in flush_tlb_others().
> > So we're madly trying to swap pages out and finding that there's no swap
> > space.  I believe that when we find there's no swap left we should move
> > the page onto the active list so we don't keep rescanning it pointlessly.
> 
> yes, however I think the swap-flood with no swap isn't a very
> interesting case to optimize.

Running swapless is a valid configuration, and the kernel is doing
great amounts of pointless work.  I would expect a diskless workstation
to suffer from this.  The problem remains in latest -aa.  It would be
useful to find a fix.
 
> 
> I don't have any pending bug report. AFAIK those bugs are only in
> mainline. If you can reproduce with -aa please send me a bug report.
> thanks,

Bugs which are only fixed in -aa aren't much use to anyone.

The VM code lacks comments, and nobody except yourself understands
what it is supposed to be doing.  That's a bug, don't you think?


Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
newsfeed.direct.ca!look.ca!netnews.com!xfer02.netnews.com!newsfeed1.cidera.com!
Cidera!news2.dg.net.ua!bn.utel.com.ua!carrier.kiev.ua!horse.lucky.net!
carrier.kiev.ua!solar.carrier.kiev.ua!not-for-mail
From: Rik van Riel <r...@conectiva.com.br>
Newsgroups: lucky.linux.kernel
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
Date: Tue, 11 Dec 2001 13:36:57 +0000 (UTC)
Organization: unknown
Lines: 38
Sender: n...@solar.carrier.kiev.ua
Approved: newsmas...@lucky.net
Message-ID: <Pine.LNX.4.33L.0112111130110.4079-100000@imladris.surriel.com>
NNTP-Posting-Host: solar.carrier.kiev.ua
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Trace: solar.carrier.kiev.ua 1008077817 8788 193.193.193.124 (11 Dec 2001 13:36:57 GMT)
X-Complaints-To: usenet@solar.carrier.kiev.ua
NNTP-Posting-Date: Tue, 11 Dec 2001 13:36:57 +0000 (UTC)
X-X-Sender:  <r...@imladris.surriel.com>
In-Reply-To: <3C15B0B3.1399043B@zip.com.au>
X-spambait: aardv...@kernelnewbies.org
X-spammeplease: 	aardv...@nl.linux.org
X-Mailing-List: 	linux-kernel@vger.kernel.org
X-Comment-To: Andrew Morton

On Mon, 10 Dec 2001, Andrew Morton wrote:

> This test on a 64 megabyte machine, on ext2:
>
> 	time (tar xfz /nfsserver/linux-2.4.16.tar.gz ; sync)
>
> On 2.4.17-pre7 it takes 21 seconds.  On -aa it is much slower: 36 seconds.

> Execution time for `make -j12 bzImage' on a 64meg RAM/512 meg swap
> dual x86:
>
> -aa:					4 minutes 20 seconds
> 2.4.17-pre8				4 minutes 8 seconds
> 2.4.17-pre8 plus the below patch:	3 minutes 55 seconds


Andrea, it seems -aa is not the holy grail VM-wise. If you want
to merge your good stuff with marcelo, please do it in the
"one patch with explanation per problem" style marcelo asked.

If nothing happens I'll take my chainsaw and remove the whole
use-once stuff just so 2.4 will avoid the worst cases, even if
it happens to remove some of the nice stuff you've been working
on.

regards,

Rik
-- 
Shortwave goes a long way:  irc.starchat.net  #swl

http://www.surriel.com/		http://distro.conectiva.com/


Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!
internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: 	Tue, 11 Dec 2001 14:42:23 +0100
From: Andrea Arcangeli <and...@suse.de>
To: Andrew Morton <a...@zip.com.au>
Cc: Marcelo Tosatti <marc...@conectiva.com.br>,
        lkml <linux-ker...@vger.kernel.org>
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
Original-Message-ID: <20011211144223.E4801@athlon.random>
Original-References: 
<Pine.LNX.4.21.0112101705281.25362-100...@freak.distro.conectiva> 
<3C151F7B.4412...@zip.com.au>, <3C151F7B.4412...@zip.com.au>; 
<20011211011158.A4...@athlon.random> <3C15B0B3.13990...@zip.com.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.12i
In-Reply-To: <3C15B0B3.1399043B@zip.com.au>; from akpm@zip.com.au on Mon, 
Dec 10, 2001 at 11:07:31PM -0800
X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc
X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Tue, 11 Dec 2001 13:46:24 GMT
Message-ID: <fa.gl1dfbv.26q61b@ifi.uio.no>
References: <fa.dqdtekv.1v7q39d@ifi.uio.no>
Lines: 204

On Mon, Dec 10, 2001 at 11:07:31PM -0800, Andrew Morton wrote:
> Why does this code exist at the end of refill_inactive()?
> 
>         if (entry != &active_list) {
>                 list_del(&active_list);
>                 list_add(&active_list, entry);
>         }

so that we restart next time at the point where we stopped browsing the
active list.
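
A minimal standalone sketch of that rotation, assuming a circular doubly
linked list whose list_add() inserts right after the given position, as the
kernel's does. The list primitives below are simplified re-implementations
rather than the kernel headers, and the rotate_head_to name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
        struct list_head *next, *prev;
};

void list_init(struct list_head *head)
{
        head->next = head->prev = head;
}

/* Insert new_node right after pos. */
void list_add(struct list_head *new_node, struct list_head *pos)
{
        new_node->next = pos->next;
        new_node->prev = pos;
        pos->next->prev = new_node;
        pos->next = new_node;
}

/* Unlink node from its ring; node's own pointers are left stale. */
void list_del(struct list_head *node)
{
        node->prev->next = node->next;
        node->next->prev = node->prev;
}

/*
 * Re-insert the list head just after entry, so that head->prev == entry
 * and the next tail-first scan (starting at head->prev) resumes at entry.
 */
void rotate_head_to(struct list_head *head, struct list_head *entry)
{
        assert(head != NULL && entry != NULL);
        if (entry != head) {
                list_del(head);
                list_add(head, entry);
        }
}
```

For a list scanned from the tail in the order a, b, c, stopping with b as
the next unvisited entry and then rotating makes the following scan visit
b, c, a: the page already inspected moves to the far end of the queue.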

> This test on a 64 megabyte machine, on ext2:
> 
> 	time (tar xfz /nfsserver/linux-2.4.16.tar.gz ; sync)
> 
> On 2.4.17-pre7 it takes 21 seconds.  On -aa it is much slower: 36 seconds.
> 
> This is probably due to the write scheduling changes in fs/buffer.c.

yes, I also lowered the percentage of dirty memory in the system by
default, so that a write flood is less likely to stall the system.

Plus I made the elevator more latency oriented, rather than throughput
oriented. Did you also test how responsive the system was during
the test?

Do you remember the thread about a 'tar xzf' hanging the machine? It
doesn't hang with -aa, but of course you'll run slower if it has to do
seeks.

> This chunk especially will, under some conditions, cause bdflush
> to madly spin in a loop unplugging all the disk queues:
> 
> @@ -2787,7 +2795,7 @@
>  
>                 spin_lock(&lru_list_lock);
>                 if (!write_some_buffers(NODEV) || balance_dirty_state() < 0) {
> -                       wait_for_some_buffers(NODEV);
> +                       run_task_queue(&tq_disk);
>                         interruptible_sleep_on(&bdflush_wait);
>                 }
>         }
> 
> Why did you make this change?

to make bdflush spin less badly in a loop unplugging all the disk
queues.

We need to unplug only once, to submit the I/O, but we don't need to
wait on every single buffer that we previously wrote. Note that
run_task_queue() has nothing to do with wait_on_buffer, the above should
be much better in terms of "spinning in a loop unplugging all the disk
queues". At least it will do it only once.

In fact all the wait_for_some_buffers are broken (particularly the one in
balance_dirty()), they're not necessary, they can only slow down the
machine.

The only reason would be to refile the buffers into the clean list, but
nothing else. That's a total waste of I/O pipelining. And yes, that's
something to fix too.

> Execution time for `make -j12 bzImage' on a 64meg RAM/512 meg swap
> dual x86:
> 
> -aa:					4 minutes 20 seconds
> 2.4.17-pre8				4 minutes 8 seconds
> 2.4.17-pre8 plus the below patch:	3 minutes 55 seconds
> 
> Now it could be that this performance regression is due to the
> write merging mistake in fs/buffer.c.  But with so much unrelated
> material in the same patch it's hard to pinpoint the source.
> 
> 
> 
> --- linux-2.4.17-pre8/mm/vmscan.c	Thu Nov 22 23:02:59 2001
> +++ linux-akpm/mm/vmscan.c	Mon Dec 10 22:34:18 2001
> @@ -537,7 +537,7 @@ static void refill_inactive(int nr_pages
>  
>  	spin_lock(&pagemap_lru_lock);
>  	entry = active_list.prev;
> -	while (nr_pages-- && entry != &active_list) {
> +	while (nr_pages && entry != &active_list) {
>  		struct page * page;
>  
>  		page = list_entry(entry, struct page, lru);
> @@ -551,6 +551,7 @@ static void refill_inactive(int nr_pages
>  		del_page_from_active_list(page);
>  		add_page_to_inactive_list(page);
>  		SetPageReferenced(page);
> +		nr_pages--;
>  	}
>  	spin_unlock(&pagemap_lru_lock);
>  }
> @@ -561,6 +562,12 @@ static int shrink_caches(zone_t * classz
>  	int chunk_size = nr_pages;
>  	unsigned long ratio;
>  
> +	shrink_dcache_memory(priority, gfp_mask);
> +	shrink_icache_memory(priority, gfp_mask);
> +#ifdef CONFIG_QUOTA
> +	shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
> +#endif
> +
>  	nr_pages -= kmem_cache_reap(gfp_mask);
>  	if (nr_pages <= 0)
>  		return 0;
> @@ -568,17 +575,13 @@ static int shrink_caches(zone_t * classz
>  	nr_pages = chunk_size;
>  	/* try to keep the active list 2/3 of the size of the cache */
>  	ratio = (unsigned long) nr_pages * nr_active_pages / ((nr_inactive_pages + 1) * 2);
> +	if (ratio == 0)
> +		ratio = nr_pages;
>  	refill_inactive(ratio);
>  
>  	nr_pages = shrink_cache(nr_pages, classzone, gfp_mask, priority);
>  	if (nr_pages <= 0)
>  		return 0;
> -
> -	shrink_dcache_memory(priority, gfp_mask);
> -	shrink_icache_memory(priority, gfp_mask);
> -#ifdef CONFIG_QUOTA
> -	shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
> -#endif
>  
>  	return nr_pages;
>  }

it should be simple, mainline swaps out more, so it's less likely to
throw away some useful cache.

just try -aa after a:

	echo 10 >/proc/sys/vm/vm_mapped_ratio

it should swap out more and better preserve the cache.

> > > In my swapless testing, I burnt HUGE amounts of CPU in flush_tlb_others().
> > > So we're madly trying to swap pages out and finding that there's no swap
> > > space.  I believe that when we find there's no swap left we should move
> > > the page onto the active list so we don't keep rescanning it pointlessly.
> > 
> > yes, however I think the swap-flood with no swap isn't a very
> > interesting case to optimize.
> 
> Running swapless is a valid configuration, and the kernel is doing

I'm not saying it's not valid or non interesting.

It's the mix "I'm running out of memory and I'm swapless" that is the
case not interesting to optimize.

If you're swapless it means you've enough memory and that you're not
running out of swap. Otherwise _you_ (not the kernel) are wrong not
having swap.

> great amounts of pointless work.  I would expect a diskless workstation
> to suffer from this.  The problem remains in latest -aa.  It would be
> useful to find a fix.

It can be optimized by making the other cases slower. I believe if
swap_out is called heavily in a swapless configuration either some
other part of the kernel or the user is wrong, not swap_out. So it's at
least not obvious to me that it would be useful to fix it inside
swap_out.

> > I don't have any pending bug report. AFAIK those bugs are only in
> > mainline. If you can reproduce with -aa please send me a bug report.
> > thanks,
> 
> Bugs which are only fixed in -aa aren't much use to anyone.

Then there are no other bugs, that's fine, this is why I said I'm
finished (except for the minor performance work, like the buffer
flushing in buffer.c that certainly cannot affect stability, or the
swap-triggering etc.. all minor things that don't affect stability and
where there's no perfect solution anyway).

> The VM code lacks comments, and nobody except yourself understands
> what it is supposed to be doing.  That's a bug, don't you think?

Lack of documentation is not a bug, period. Also it's not true that I'm
the only one who understands it. For instance Linus understands it
completely, I am 100% sure.

Anyway, I wrote a dozen slides on the VM with some graphs showing the
design of the VM, in case anybody learns better from a slide than from
the code.

I believe the slides are useful to understand the design, but if you
want to change one line of code, slides or not, you have to read the code.
Everybody is complaining about documentation. This is a red herring.
There's no documentation that allows you to hack the previous VM code.
I'd ask how many of the people happy with the previous documentation
were actually VM developers. Except for some possibly misleading
comments in the current code that we may not have updated yet, I don't
think there's been a regression in documentation.

Andrea

Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
newsfeeds.belnet.be!news.belnet.be!news.tele.dk!small.news.tele.dk!
129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: 	Tue, 11 Dec 2001 15:59:22 +0200
From: Abraham vd Merwe <abra...@2d3d.co.za>
To: Andrea Arcangeli <and...@suse.de>
Cc: Linux Kernel Development <linux-ker...@vger.kernel.org>
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
Original-Message-ID: <20011211155922.B1863@crystal.2d3d.co.za>
Mail-Followup-To: Abraham vd Merwe <abra...@2d3d.co.za>,
	Andrea Arcangeli <and...@suse.de>,
	Linux Kernel Development <linux-ker...@vger.kernel.org>
Original-References: 
<Pine.LNX.4.21.0112101705281.25362-100...@freak.distro.conectiva> 
<3C151F7B.4412...@zip.com.au>, <3C151F7B.4412...@zip.com.au>; 
<20011211011158.A4...@athlon.random> <3C15B0B3.13990...@zip.com.au> 
<20011211144223.E4...@athlon.random>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="gj572EiMnwbLXET9"
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20011211144223.E4801@athlon.random>; 
from andrea@suse.de on Tue, Dec 11, 2001 at 14:42:23 +0100
X-Operating-System: Debian GNU/Linux crystal 2.4.2 i686
X-GPG-Public-Key: http://oasis.blio.net/pgpkeys/keys/2d3d.gpg
X-Uptime: 3:46pm  up 1 day,  6:27,  5 users,  load average: 0.00, 0.00, 0.00
X-Edited-With-Muttmode: 	muttmail.sl - 2001-06-06
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: 2d3D, Inc.
Date: Tue, 11 Dec 2001 13:57:41 GMT
Message-ID: <fa.fclar8v.kkq2q6@ifi.uio.no>
References: <fa.gl1dfbv.26q61b@ifi.uio.no>
Lines: 75


Hi Andrea!

> > > > In my swapless testing, I burnt HUGE amounts of CPU in flush_tlb_others().
> > > > So we're madly trying to swap pages out and finding that there's no swap
> > > > space.  I believe that when we find there's no swap left we should move
> > > > the page onto the active list so we don't keep rescanning it pointlessly.
> > > 
> > > yes, however I think the swap-flood with no swap isn't a very
> > > interesting case to optimize.
> > 
> > Running swapless is a valid configuration, and the kernel is doing
> 
> I'm not saying it's not valid or non interesting.
> 
> It's the mix "I'm running out of memory and I'm swapless" that is the
> case not interesting to optimize.
> 
> If you're swapless it means you've enough memory and that you're not
> running out of swap. Otherwise _you_ (not the kernel) are wrong not
> having swap.

The problem is that your VM is unnecessarily eating up memory and then wants
swap. That is unacceptable. Having 90% of your memory in buffers/cache and
then the OOM killer kicks in because nothing is free is what we're moaning
about.

-- 

Regards
 Abraham

Did you hear about the model who sat on a broken bottle and cut a nice figure?

__________________________________________________________
 Abraham vd Merwe - 2d3D, Inc.

 Device Driver Development, Outsourcing, Embedded Systems

  Cell: +27 82 565 4451         Snailmail:
   Tel: +27 21 761 7549            Block C, Antree Park
   Fax: +27 21 761 7648            Doncaster Road
 Email: abra...@2d3d.co.za         Kenilworth, 7700
  Http: http://www.2d3d.com        South Africa


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!
internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: 	Tue, 11 Dec 2001 11:59:06 -0200 (BRST)
From: Rik van Riel <r...@conectiva.com.br>
X-X-Sender:  <r...@imladris.surriel.com>
To: Andrea Arcangeli <and...@suse.de>
Cc: Andrew Morton <a...@zip.com.au>,
        Marcelo Tosatti <marc...@conectiva.com.br>,
        lkml <linux-ker...@vger.kernel.org>
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
In-Reply-To: <20011211144223.E4801@athlon.random>
Original-Message-ID: <Pine.LNX.4.33L.0112111157410.4079-100000@imladris.surriel.com>
X-spambait: aardv...@kernelnewbies.org
X-spammeplease: 	aardv...@nl.linux.org
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Tue, 11 Dec 2001 14:00:57 GMT
Message-ID: <fa.njn7j0v.7g2i6@ifi.uio.no>
References: <fa.gl1dfbv.26q61b@ifi.uio.no>
Lines: 28

On Tue, 11 Dec 2001, Andrea Arcangeli wrote:

> > The VM code lacks comments, and nobody except yourself understands
> > what it is supposed to be doing.  That's a bug, don't you think?
>
> Lack of documentation is not a bug, period. Also it's not true that
> I'm the only one who understands it.

Without documentation, you can only know what the code
does, never what it is supposed to do or why it does it.

This makes fixing problems a lot harder, especially since
people will never agree on what a piece of code is supposed
to do.

regards,

Rik
-- 
Shortwave goes a long way:  irc.starchat.net  #swl

http://www.surriel.com/		http://distro.conectiva.com/


Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
newsfeed.direct.ca!look.ca!cpk-news-hub1.bbnplanet.com!news.gtei.net!
newsfeed1.cidera.com!Cidera!news2.dg.net.ua!bn.utel.com.ua!carrier.kiev.ua!
horse.lucky.net!carrier.kiev.ua!solar.carrier.kiev.ua!not-for-mail
From: Andrea Arcangeli <and...@suse.de>
Newsgroups: lucky.linux.kernel
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
Date: Tue, 11 Dec 2001 14:04:08 +0000 (UTC)
Organization: unknown
Lines: 40
Sender: n...@solar.carrier.kiev.ua
Approved: newsmas...@lucky.net
Message-ID: <20011211150119.H4801@athlon.random>
References: <Pine.LNX.4.21.0112101705281.25362-100000@freak.distro.conectiva> 
<3C151F7B.44125B1@zip.com.au>, <3C151F7B.44125B1@zip.com.au>; 
<20011211011158.A4801@athlon.random> <3C15B0B3.1399043B@zip.com.au> 
<20011211144223.E4801@athlon.random> <20011211155922.B1863@crystal.2d3d.co.za>
NNTP-Posting-Host: solar.carrier.kiev.ua
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: solar.carrier.kiev.ua 1008079449 8936 193.193.193.124 (11 Dec 2001 14:04:09 GMT)
X-Complaints-To: usenet@solar.carrier.kiev.ua
NNTP-Posting-Date: Tue, 11 Dec 2001 14:04:09 +0000 (UTC)
Content-Disposition: inline
User-Agent: Mutt/1.3.12i
In-Reply-To: <20011211155922.B1863@crystal.2d3d.co.za>; 
from abraham@2d3d.co.za on Tue, Dec 11, 2001 at 03:59:22PM +0200
X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc
X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc
X-Mailing-List: 	linux-kernel@vger.kernel.org
X-Comment-To: Abraham vd Merwe

On Tue, Dec 11, 2001 at 03:59:22PM +0200, Abraham vd Merwe wrote:
> Hi Andrea!
> 
> > > > > In my swapless testing, I burnt HUGE amounts of CPU in flush_tlb_others().
> > > > > So we're madly trying to swap pages out and finding that there's no swap
> > > > > > space.  I believe that when we find there's no swap left we should move
> > > > > the page onto the active list so we don't keep rescanning it pointlessly.
> > > > 
> > > > yes, however I think the swap-flood with no swap isn't a very
> > > > interesting case to optimize.
> > > 
> > > Running swapless is a valid configuration, and the kernel is doing
> > 
> > I'm not saying it's not valid or non interesting.
> > 
> > It's the mix "I'm running out of memory and I'm swapless" that is the
> > case not interesting to optimize.
> > 
> > If you're swapless it means you've enough memory and that you're not
> > running out of swap. Otherwise _you_ (not the kernel) are wrong not
> > having swap.
> 
> The problem is that your VM is unnecessarily eating up memory and then wants
> swap. That is unacceptable. Having 90% of your memory in buffers/cache and
> then the OOM killer kicks in because nothing is free is what we're moaning
> about.

Dear Abraham, please apply this patch:

	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.17pre4aa1.bz2

on top of 2.4.17pre4, then recompile, try again, and send me a
bug report if you can reproduce the problem. Thanks,

Andrea

Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
newsfeeds.belnet.be!news.belnet.be!news.tele.dk!small.news.tele.dk!
129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: 	Tue, 11 Dec 2001 15:23:56 +0100
From: Andrea Arcangeli <and...@suse.de>
To: Rik van Riel <r...@conectiva.com.br>
Cc: Andrew Morton <a...@zip.com.au>,
        Marcelo Tosatti <marc...@conectiva.com.br>,
        lkml <linux-ker...@vger.kernel.org>
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
Original-Message-ID: <20011211152356.I4801@athlon.random>
Original-References: <20011211144223.E4...@athlon.random> 
<Pine.LNX.4.33L.0112111157410.4079-100...@imladris.surriel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.12i
In-Reply-To: <Pine.LNX.4.33L.0112111157410.4079-100000@imladris.surriel.com>; 
from riel@conectiva.com.br on Tue, Dec 11, 2001 at 11:59:06AM -0200
X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc
X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Tue, 11 Dec 2001 14:25:01 GMT
Message-ID: <fa.gmhlerv.1mi7h6@ifi.uio.no>
References: <fa.njn7j0v.7g2i6@ifi.uio.no>
Lines: 39

On Tue, Dec 11, 2001 at 11:59:06AM -0200, Rik van Riel wrote:
> On Tue, 11 Dec 2001, Andrea Arcangeli wrote:
> 
> > > The VM code lacks comments, and nobody except yourself understands
> > > what it is supposed to be doing.  That's a bug, don't you think?
> >
> > Lack of documentation is not a bug, period. Also it's not true that
> > I'm the only one who understands it.
> 
> Without documentation, you can only know what the code
> does, never what it is supposed to do or why it does it.

I only care about "what the code does" and "what are the results and the
bugreports".  Anything else is vaporware and I don't care about that.

As said, I wrote some documentation on the VM for my last speech at
one of the most important Italian Linux events; it explains the basic
design. It should be published on their website as soon as I find the
time to send them the slides. I can post a link once it is online.
It should allow non-VM developers to understand the logic behind the VM
algorithm, but understanding those slides is far from allowing anyone
to hack the VM.

I _totally_ agree with Linus when he said "real world is totally
dominated by the implementation details". I was thinking this way before
reading his recent email to l-k (however I totally disagree about
evolution being random and the other kernel-offtopic part of such thread :).

For developers the real freedom is the code, not the documentation and
the code is there. And I think it's much easier to understand the
current code (ok I'm biased, but still I believe for outsiders it's
simpler).

Andrea

Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
newsfeed.direct.ca!look.ca!newshub2.rdc1.sfba.home.com!news.home.com!
newshub1-work.rdc1.sfba.home.com!gehenna.pell.portland.or.us!
nntp-server.caltech.edu!nntp-server.caltech.edu!mail2news96
Newsgroups: mlist.linux.kernel
X-To: linux-ker...@vger.kernel.org
Orig-Path: 	forge.intermeta.de!not-for-mail
From: "Henning P. Schmiedehausen" <mailg...@hometree.net>
Orig-Newsgroups: hometree.linux.kernel
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
Date: 	Tue, 11 Dec 2001 15:47:33 +0000 (UTC)
Organization: INTERMETA - Gesellschaft fuer Mehrwertdienste mbH
Message-ID: <linux.kernel.9v59ql$pkh$1@forge.intermeta.de>
Reply-To: h...@intermeta.de
Approved: n...@nntp-server.caltech.edu
Lines: 33

Andrea Arcangeli <and...@suse.de> writes:

>Lack of documentation is not a bug, period. Also it's not true that I'm

It scares me sh**less that you, as the one responsible for something
as crucial as MM in the Linux kernel, have such an attitude towards
software development, especially when people like RvR ask for docs.

Sorry, but to me this sounds like something from M$ (MAPI? You don't
need MAPI documentation. We know what we're doing. You don't need to
know how Windows XX works. It's enough that we know).

Actually, you _do_ get documentation from M$. Something one can't say
about the Linux MM-sprinkled-with-holy-penguin-pee subsystem.

I'm not happy about your usage of magic numbers, either. So it is
still running on solid 2.2.19 until further notice (or until Rik loses
his patience. ;-) )

	Regards
		Henning

-- 
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen       -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH     h...@intermeta.de

Am Schwabachgrund 22  Fon.: 09131 / 50654-0   i...@intermeta.de
D-91054 Buckenhof     Fax.: 09131 / 50654-20   

Path: archiver1.google.com!news2.google.com!news1.google.com!sn-xit-02!
supernews.com!newsfeed.direct.ca!look.ca!newshub2.rdc1.sfba.home.com!
news.home.com!newshub1-work.rdc1.sfba.home.com!gehenna.pell.portland.or.us!
nntp-server.caltech.edu!nntp-server.caltech.edu!mail2news96
Newsgroups: mlist.linux.kernel
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
X-To: h...@intermeta.de
Date: 	Tue, 11 Dec 2001 16:01:40 +0000 (GMT)
X-Cc: linux-ker...@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Message-ID: <linux.kernel.E16DpLg-0005f3-00@the-village.bc.nu>
From: Alan Cox <a...@lxorguk.ukuu.org.uk>
Approved: n...@nntp-server.caltech.edu
Lines: 13

> I'm not happy about your usage of magic numbers, either. So it is
> still running on solid 2.2.19 until further notice (or until Rik loses
> his patience. ;-) )

Andrea did the 2.2.19 VM as well, but that one is somewhat better
documented, and doesn't have the use-once funnies.

Alan

Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!
internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: 	Tue, 11 Dec 2001 15:09:01 -0200 (BRST)
From: Rik van Riel <r...@conectiva.com.br>
X-X-Sender:  <r...@duckman.distro.conectiva>
To: <h...@intermeta.de>
Cc: <linux-ker...@vger.kernel.org>
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
In-Reply-To: <9v59ql$pkh$1@forge.intermeta.de>
Original-Message-ID: <Pine.LNX.4.33L.0112111426450.1352-100000@duckman.distro.conectiva>
X-spambait: aardv...@kernelnewbies.org
X-spammeplease: 	aardv...@nl.linux.org
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Tue, 11 Dec 2001 17:11:33 GMT
Message-ID: <fa.nqm0f9v.tgcurq@ifi.uio.no>
References: <fa.gqf9jvv.1j0akhn@ifi.uio.no>
Lines: 22

On Tue, 11 Dec 2001, Henning P. Schmiedehausen wrote:

> I'm not happy about your usage of magic numbers, either. So it is
> still running on solid 2.2.19 until further notice (or until Rik loses
> his patience. ;-) )

I've lost patience and have decided to move development away
from the main tree.  http://linuxvm.bkbits.net/   ;)

cheers,

Rik
-- 
DMCA, SSSCA, W3C?  Who cares?  http://thefreeworld.net/

http://www.surriel.com/		http://distro.conectiva.com/


Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
newsfeeds.belnet.be!news.belnet.be!news.tele.dk!small.news.tele.dk!
129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
To: r...@conectiva.com.br (Rik van Riel)
Original-Date: 	Tue, 11 Dec 2001 17:28:04 +0000 (GMT)
Cc: h...@intermeta.de, linux-ker...@vger.kernel.org
In-Reply-To: <Pine.LNX.4.33L.0112111426450.1352-100000@duckman.distro.conectiva> 
from "Rik van Riel" at Dec 11, 2001 03:09:01 PM
X-Mailer: ELM [version 2.5 PL6]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Original-Message-Id: <E16DqhI-0005vG-00@the-village.bc.nu>
From: Alan Cox <a...@lxorguk.ukuu.org.uk>
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Tue, 11 Dec 2001 17:20:41 GMT
Message-ID: <fa.hfigtfv.17msh1e@ifi.uio.no>
References: <fa.nqm0f9v.tgcurq@ifi.uio.no>
Lines: 17

> > I'm not happy about your usage of magic numbers, either. So it is
> > still running on solid 2.2.19 until further notice (or until Rik loses
> > his patience. ;-) )
> 
> I've lost patience and have decided to move development away
> from the main tree.  http://linuxvm.bkbits.net/   ;)

Are your patches available in a format that is accessible using free
software ?

(Now where did I put the troll sign 8))


Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!
internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: 	Tue, 11 Dec 2001 15:22:17 -0200 (BRST)
From: Rik van Riel <r...@conectiva.com.br>
X-X-Sender:  <r...@duckman.distro.conectiva>
To: Alan Cox <a...@lxorguk.ukuu.org.uk>
Cc: <h...@intermeta.de>, <linux-ker...@vger.kernel.org>
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
In-Reply-To: <E16DqhI-0005vG-00@the-village.bc.nu>
Original-Message-ID: <Pine.LNX.4.33L.0112111520560.1352-100000@duckman.distro.conectiva>
X-spambait: aardv...@kernelnewbies.org
X-spammeplease: 	aardv...@nl.linux.org
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Tue, 11 Dec 2001 17:24:26 GMT
Message-ID: <fa.nnm2fpv.ugeubq@ifi.uio.no>
References: <fa.hfigtfv.17msh1e@ifi.uio.no>
Lines: 33

On Tue, 11 Dec 2001, Alan Cox wrote:

> > > I'm not happy about your usage of magic numbers, either. So it is
> > > still running on solid 2.2.19 until further notice (or until Rik loses
> > > his patience. ;-) )
> >
> > I've lost patience and have decided to move development away
> > from the main tree.  http://linuxvm.bkbits.net/   ;)
>
> Are your patches available in a format that is accessible using free
> software ?

Yes, I'm making patches available on my home page:

	http://surriel.com/patches/

Note that development isn't too fast due to the fact
that I try to clean up all code I touch instead of
just making the changes needed for the functionality.

kind regards,

Rik
-- 
DMCA, SSSCA, W3C?  Who cares?  http://thefreeworld.net/

http://www.surriel.com/		http://distro.conectiva.com/


Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
newsfeed.direct.ca!look.ca!cpk-news-hub1.bbnplanet.com!news.gtei.net!
newsfeed1.cidera.com!Cidera!news2.dg.net.ua!bn.utel.com.ua!carrier.kiev.ua!
horse.lucky.net!news.lucky.net!carrier.kiev.ua!solar.carrier.kiev.ua!not-for-mail
From: Daniel Phillips <phill...@bonn-fries.net>
Newsgroups: lucky.linux.kernel
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
Date: Tue, 11 Dec 2001 15:28:27 +0000 (UTC)
Organization: unknown
Lines: 39
Sender: n...@solar.carrier.kiev.ua
Approved: newsmas...@lucky.net
Message-ID: <E16DooZ-0001J4-00@starship.berlin>
References: <20011211144223.E4801@athlon.random> 
<Pine.LNX.4.33L.0112111157410.4079-100000@imladris.surriel.com> 
<20011211152356.I4801@athlon.random>
NNTP-Posting-Host: solar.carrier.kiev.ua
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7BIT
X-Trace: solar.carrier.kiev.ua 1008084508 9483 193.193.193.124 (11 Dec 2001 15:28:28 GMT)
X-Complaints-To: usenet@solar.carrier.kiev.ua
NNTP-Posting-Date: Tue, 11 Dec 2001 15:28:28 +0000 (UTC)
X-Mailer: KMail [version 1.3.2]
In-Reply-To: <20011211152356.I4801@athlon.random>
X-Mailing-List: 	linux-kernel@vger.kernel.org
X-Comment-To: Andrea Arcangeli

On December 11, 2001 03:23 pm, Andrea Arcangeli wrote:
> As said, I wrote some documentation on the VM for my last speech at
> one of the most important Italian Linux events; it explains the basic
> design. It should be published on their website as soon as I find the
> time to send them the slides. I can post a link once it is online.

Why not also post the whole thing as an email, right here?

> It should allow non-VM developers to understand the logic behind the VM
> algorithm, but understanding those slides is far from allowing anyone
> to hack the VM.

It's a start.

> I _totally_ agree with Linus when he said "real world is totally
> dominated by the implementation details".

Linus didn't say anything about not documenting the implementation details, 
nor did he say anything about not documenting in general.

> For developers the real freedom is the code, not the documentation and
> the code is there. And I think it's much easier to understand the
> current code (ok I'm biased, but still I believe for outsiders it's
> simpler).

Judging by the number of complaints, it's not easy enough.  I know that, 
personally, decoding your vm is something that's always on my 'things I could 
do if I didn't have a lot of other things to do' list.  So far, only Linus, 
Marcelo, Andrew and maybe Rik seem to have made the investment.  You'd have a 
lot more helpers by now if you gave just a little higher priority to 
documentation.

--
Daniel

Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
newsfeed.direct.ca!look.ca!newsfeed1.cidera.com!Cidera!news2.dg.net.ua!
bn.utel.com.ua!carrier.kiev.ua!horse.lucky.net!carrier.kiev.ua!
solar.carrier.kiev.ua!not-for-mail
From: Andrea Arcangeli <and...@suse.de>
Newsgroups: lucky.linux.kernel
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
Date: Wed, 12 Dec 2001 11:18:49 +0000 (UTC)
Organization: unknown
Lines: 37
Sender: n...@solar.carrier.kiev.ua
Approved: newsmas...@lucky.net
Message-ID: <20011212121624.B4801@athlon.random>
References: <20011211144223.E4801@athlon.random> 
<Pine.LNX.4.33L.0112111157410.4079-100000@imladris.surriel.com> 
<20011211152356.I4801@athlon.random> <E16DooZ-0001J4-00@starship.berlin>
NNTP-Posting-Host: solar.carrier.kiev.ua
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: solar.carrier.kiev.ua 1008155930 17796 193.193.193.124 (12 Dec 2001 11:18:50 GMT)
X-Complaints-To: usenet@solar.carrier.kiev.ua
NNTP-Posting-Date: Wed, 12 Dec 2001 11:18:50 +0000 (UTC)
Content-Disposition: inline
User-Agent: Mutt/1.3.12i
In-Reply-To: <E16DooZ-0001J4-00@starship.berlin>; 
from phillips@bonn-fries.net on Tue, Dec 11, 2001 at 04:27:23PM +0100
X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc
X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc
X-Mailing-List: 	linux-kernel@vger.kernel.org
X-Comment-To: Daniel Phillips

On Tue, Dec 11, 2001 at 04:27:23PM +0100, Daniel Phillips wrote:
> On December 11, 2001 03:23 pm, Andrea Arcangeli wrote:
> > As said, I wrote some documentation on the VM for my last speech at
> > one of the most important Italian Linux events; it explains the basic
> > design. It should be published on their website as soon as I find the
> > time to send them the slides. I can post a link once it is online.
> 
> Why not also post the whole thing as an email, right here?

I uploaded it here:

	ftp://ftp.suse.com//pub/people/andrea/talks/english/2001/pluto-dec-pub-0.tar.gz

Hopefully it's understandable standalone.

> > It should allow non-VM developers to understand the logic behind the VM
> > algorithm, but understanding those slides is far from allowing anyone
> > to hack the VM.
> 
> It's a start.
> 
> > I _totally_ agree with Linus when he said "real world is totally
> > dominated by the implementation details".
> 
> Linus didn't say anything about not documenting the implementation details, 
> nor did he say anything about not documenting in general.

yes, my only point was that "documentation" isn't nearly enough, and
that it's not mandatory (given all the changes don't affect any user
API), but I certainly agree documentation helps.

Andrea

Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
newsfeed.direct.ca!look.ca!newsfeed1.cidera.com!Cidera!news2.dg.net.ua!
bn.utel.com.ua!carrier.kiev.ua!horse.lucky.net!carrier.kiev.ua!
solar.carrier.kiev.ua!not-for-mail
From: Daniel Phillips <phill...@bonn-fries.net>
Newsgroups: lucky.linux.kernel
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
Date: Wed, 12 Dec 2001 20:04:42 +0000 (UTC)
Organization: unknown
Lines: 42
Sender: n...@solar.carrier.kiev.ua
Approved: newsmas...@lucky.net
Message-ID: <E16EFb9-0000E4-00@starship.berlin>
References: <20011211144223.E4801@athlon.random> 
<E16DooZ-0001J4-00@starship.berlin> <20011212121624.B4801@athlon.random>
NNTP-Posting-Host: solar.carrier.kiev.ua
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7BIT
X-Trace: solar.carrier.kiev.ua 1008187483 20722 193.193.193.124 (12 Dec 2001 20:04:43 GMT)
X-Complaints-To: usenet@solar.carrier.kiev.ua
NNTP-Posting-Date: Wed, 12 Dec 2001 20:04:43 +0000 (UTC)
X-Mailer: KMail [version 1.3.2]
In-Reply-To: <20011212121624.B4801@athlon.random>
X-Mailing-List: 	linux-kernel@vger.kernel.org
X-Comment-To: Andrea Arcangeli

On December 12, 2001 12:16 pm, Andrea Arcangeli wrote:
> On Tue, Dec 11, 2001 at 04:27:23PM +0100, Daniel Phillips wrote:
> > On December 11, 2001 03:23 pm, Andrea Arcangeli wrote:
> > > As said, I wrote some documentation on the VM for my last speech at
> > > one of the most important Italian Linux events; it explains the basic
> > > design. It should be published on their website as soon as I find the
> > > time to send them the slides. I can post a link once it is online.
> > 
> > Why not also post the whole thing as an email, right here?
> 
> I uploaded it here:

ftp://ftp.suse.com//pub/people/andrea/talks/english/2001/pluto-dec-pub-0.tar.gz

This is really, really useful.

Helpful hint: to run the slideshow, get magicpoint (debian users: apt-get 
install mgp) and do:

   mv pluto.mpg pluto.mgp # ;)
   mgp pluto.mgp -x vflib

Helpful hint #2: Actually, just gv pluto.ps gets you all the content.

Helpful hint #3: Actually, less pluto.mgp will do the trick (after the 
rename) and lets you cut and paste the text, as I'm about to do...

Nit: "vm shrinking is not serialized with any other subsystem, it is also 
                                                              only---^^^^
threaded against itself."

The big thing I see missing from this presentation is a discussion of how 
icache, dcache etc fit into the picture, i.e., shrink_caches.

--
Daniel


Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
newsfeed.direct.ca!look.ca!newshub2.rdc1.sfba.home.com!news.home.com!
newshub1-work.rdc1.sfba.home.com!gehenna.pell.portland.or.us!
nntp-server.caltech.edu!nntp-server.caltech.edu!mail2news96
Newsgroups: mlist.linux.kernel
Date: 	Wed, 12 Dec 2001 22:25:34 +0100
From: Andrea Arcangeli <and...@suse.de>
X-To: Daniel Phillips <phill...@bonn-fries.net>
X-Cc: Rik van Riel <r...@conectiva.com.br>, Andrew Morton <a...@zip.com.au>,
        Marcelo Tosatti <marc...@conectiva.com.br>,
        lkml <linux-ker...@vger.kernel.org>
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
Message-ID: <linux.kernel.20011212222534.P4801@athlon.random>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Approved: n...@nntp-server.caltech.edu
Lines: 59

On Wed, Dec 12, 2001 at 09:03:20PM +0100, Daniel Phillips wrote:
> On December 12, 2001 12:16 pm, Andrea Arcangeli wrote:
> > On Tue, Dec 11, 2001 at 04:27:23PM +0100, Daniel Phillips wrote:
> > > On December 11, 2001 03:23 pm, Andrea Arcangeli wrote:
> > > > As said, I wrote some documentation on the VM for my last speech at
> > > > one of the most important Italian Linux events; it explains the basic
> > > > design. It should be published on their website as soon as I find the
> > > > time to send them the slides. I can post a link once it is online.
> > > 
> > > Why not also post the whole thing as an email, right here?
> > 
> > I uploaded it here:
> 
> ftp://ftp.suse.com//pub/people/andrea/talks/english/2001/pluto-dec-pub-0.tar.gz
> 
> This is really, really useful.
> 
> Helpful hint: to run the slideshow, get magicpoint (debian users: apt-get 
> install mgp) and do:
> 
>    mv pluto.mpg pluto.mgp # ;)

8)

>    mgp pluto.mgp -x vflib
> 
> Helpful hint #2: Actually, just gv pluto.ps gets you all the content.
> 
> Helpful hint #3: Actually, less pluto.mgp will do the trick (after the 
> rename) and lets you cut and paste the text, as I'm about to do...
> 
> Nit: "vm shrinking is not serialized with any other subsystem, it is also 
>                                                               only---^^^^
> threaded against itself."

correct.

> The big thing I see missing from this presentation is a discussion of how 
> icache, dcache etc fit into the picture, i.e., shrink_caches.

Going into the differences between icache/dcache and pagecache would
have been too low level (and I should have spent some time explaining
what icache and dcache are first ;), so as you noticed I intentionally
ignored those highlevel vfs caches in the slides. The concept of "pages
of cache" is usually well known by most people, so I only considered
the pagecache, which incidentally is also the most interesting case for
the VM.  For seasoned kernel developers it would have been interesting
to integrate more stuff, of course, but as you said this is a start at
least :).

About the icache/dcache shrinking, that's probably the most rough thing
we have in the vm at the moment. It just works.

Andrea

Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!news.tele.dk!small.news.tele.dk!
129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
X-Authentication-Warning: vasquez.zip.com.au: 
Host r...@zipperii.zip.com.au [61.8.0.87] claimed to be zip.com.au
Original-Message-ID: <3C1717C3.82CC4A63@zip.com.au>
Original-Date: 	Wed, 12 Dec 2001 00:39:31 -0800
From: Andrew Morton <a...@zip.com.au>
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.17-pre8 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Andrea Arcangeli <and...@suse.de>
CC: Marcelo Tosatti <marc...@conectiva.com.br>,
        lkml <linux-ker...@vger.kernel.org>
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
Original-References: 
<Pine.LNX.4.21.0112101705281.25362-100...@freak.distro.conectiva> 
<3C151F7B.4412...@zip.com.au>, <3C151F7B.4412...@zip.com.au>; 
<20011211011158.A4...@athlon.random> <3C15B0B3.13990...@zip.com.au>,
<3C15B0B3.13990...@zip.com.au>; 
from a...@zip.com.au on Mon, Dec 10, 2001 at 11:07:31PM -0800 
<20011211144223.E4...@athlon.random>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Wed, 12 Dec 2001 08:42:10 GMT
Message-ID: <fa.dff1htv.37eu2g@ifi.uio.no>
References: <fa.gl1dfbv.26q61b@ifi.uio.no>
Lines: 75

Andrea Arcangeli wrote:
> 
> 
> [ big snip.  Addressed in other email ]
> 
> it should be simple, mainline swapouts more, so it's less likely to
> trash away some useful cache.
> 
> just try -aa after a:
> 
>         echo 10 >/proc/sys/vm/vm_mapped_ratio
> 
> it should swapout more and better preserve the cache.

-aa swapout balancing seems very good indeed to me.

> > > > In my swapless testing, I burnt HUGE amounts of CPU in flush_tlb_others().
> > > > So we're madly trying to swap pages out and finding that there's no swap
> > > > space.  I believe that when we find there's no swap left we should move
> > > > the page onto the active list so we don't keep rescanning it pointlessly.
> > >
> > > yes, however I think the swap-flood with no swap isn't a very
> > > interesting case to optimize.
> >
> > Running swapless is a valid configuration, and the kernel is doing
> 
> I'm not saying it's not valid or non interesting.
> 
> It's the mix "I'm running out of memory and I'm swapless" that is the
> case not interesting to optimize.
> 
> If you're swapless it means you've enough memory and that you're not
> running out of swap. Otherwise _you_ (not the kernel) are wrong not
> having swap.

um.  Spose so.
 
> ...
> 
> > The VM code lacks comments, and nobody except yourself understands
> > what it is supposed to be doing.  That's a bug, don't you think?
> 
> Lack of documentation is not a bug, period. Also it's not true that I'm
> the only one who understands it. For instance Linus understands it
> completely, I am 100% sure.
> 
> Anyways I wrote a dozen slides on the VM with some graphs showing the
> design of the VM, if anybody can learn better from a slide than from the
> code.

That's good.  Your elevator design slides were very helpful.  However
offline documentation tends to go stale.   A nice big block comment
maintained by a programmer who cares goes a looong way.

> I believe the slides are useful to understand the design, but if you
> want to change one line of code slides or not you've to read the code.
> Everybody is complaining about documentation. This is a red herring.
> There's no documentation that allows you to hack the previous VM code.
> I'd ask how many of the people happy with the previous documentation
> were effectively VM developers. Except for some possible misleading
> comment in the current code that we may have not updated yet, I don't
> think there's been a regression in documentation.
> 

Sigh.  Just because the current core kernel looks like it was
scrawled in crayon by an infant doesn't mean that everyone has
to eschew literate, mature, competent and maintainable programming
practices.


Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
newsfeed.direct.ca!look.ca!newsfeed1.cidera.com!Cidera!news2.dg.net.ua!
bn.utel.com.ua!carrier.kiev.ua!horse.lucky.net!carrier.kiev.ua!
solar.carrier.kiev.ua!not-for-mail
From: Andrew Morton <a...@zip.com.au>
Newsgroups: lucky.linux.kernel
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
Date: Wed, 12 Dec 2001 08:48:25 +0000 (UTC)
Organization: unknown
Lines: 159
Sender: n...@solar.carrier.kiev.ua
Approved: newsmas...@lucky.net
Message-ID: <3C1718E1.C22141B3@zip.com.au>
References: <3C15B0B3.1399043B@zip.com.au> 
<Pine.LNX.4.33L.0112111130110.4079-100000@imladris.surriel.com>,
NNTP-Posting-Host: solar.carrier.kiev.ua
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Trace: solar.carrier.kiev.ua 1008146905 16932 193.193.193.124 (12 Dec 2001 08:48:25 GMT)
X-Complaints-To: usenet@solar.carrier.kiev.ua
NNTP-Posting-Date: Wed, 12 Dec 2001 08:48:25 +0000 (UTC)
X-Authentication-Warning: vasquez.zip.com.au: Host r...@zipperii.zip.com.au 
[61.8.0.87] claimed to be zip.com.au
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.17-pre8 i686)
X-Accept-Language: en
X-Mailing-List: 	linux-kernel@vger.kernel.org
X-Comment-To: Andrea Arcangeli

Andrea Arcangeli wrote:
> 
> On Tue, Dec 11, 2001 at 11:32:25AM -0200, Rik van Riel wrote:
> > On Mon, 10 Dec 2001, Andrew Morton wrote:
> >
> > > This test on a 64 megabyte machine, on ext2:
> > >
> > >     time (tar xfz /nfsserver/linux-2.4.16.tar.gz ; sync)
> > >
> > > On 2.4.17-pre7 it takes 21 seconds.  On -aa it is much slower: 36 seconds.
> >
> > > Execution time for `make -j12 bzImage' on a 64meg RAM/512 meg swap
> > > dual x86:
> > >
> > > -aa:                                 4 minutes 20 seconds
> > > 2.4.17-pre8:                         4 minutes 8 seconds
> > > 2.4.17-pre8 plus the below patch:    3 minutes 55 seconds
> >
> >
> > Andrea, it seems -aa is not the holy grail VM-wise. If you want
> 
> it may not be a holy grail in swap benchmarks and floods of writes to
> disk, those are minor performance regressions, but I have not one single
> bug report related to "stability".

Your patch increases the time to untar a kernel tree by about seventy
percent (21 seconds to 36).  That's a fairly major minor regression.

> The only thing I got back from Andrew has been "it runs a little slower"
> in those two tests.

The swapstorm I agree is uninteresting.  The slowdown with a heavy write
load impacts a very common usage, and I've told you how to mostly fix
it.  You need to back out the change to bdflush.
 
> and of course he didn't even attempt to benchmark the interactive
> feeling that was the _whole_ point of my buffer.c and elevator changes.

As far as I know, at no point in time have you told anyone that
this was an objective of your latest patch.  So of course I
didn't test for it.

Interactivity is indeed improved.  It has gone from catastrophic to
horrid.

There are four basic tests I use to quantify this, all with 64 megs of
memory:

1: Start a continuous write, and on a different partition, time how
   long it takes to read a 16 megabyte file.

   Here, -aa takes 40 seconds.  Stock 2.4.17-pre8 takes 71 seconds.
   2.4.17-pre8 with the same elevator settings as in -aa takes
   40 seconds.

   Large writes are slowing reads by a factor of 100.

2: Start a continuous write and, from another machine, run

	time ssh -X otherhost xterm -e true

   On -aa this takes 68 seconds.  On 2.4.17-pre8 it takes over
   three minutes.  I got bored and killed it.  The problem can't
   be fixed on 2.4.17-pre8 with tuning - it's probably due to the
   poor page replacement - stuff is getting swapped out.  This is
   a significant problem in 2.4.17-pre and we need a fix for it.

3: Run `cp -a linux/ junk'.  Time how long it takes to read a 16 meg file.

   There's no appreciable difference between any of the kernels here.
   It varies from 2 seconds to 10, and is generally OK.

4: Run `cp -a linux/ junk'.  Time `ssh -X otherhost xterm -e true'.

   Varies between three and five seconds, depending on elvtune settings.
   No noticeable difference between any kernels.
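
The "time how long it takes to read a file" half of tests 1 and 3 can be
sketched as below.  This is only an illustration, not the harness used
for the numbers above: the function name and chunk size are made up, and
it assumes the write load is started separately and that the file being
read is not already in the page cache.

```python
import os
import tempfile
import time


def time_read(path, chunk_size=1 << 20):
    """Read `path` sequentially; return (bytes_read, elapsed_seconds).

    Hypothetical helper: it measures wall-clock time only, the same
    quantity `time cat file > /dev/null` would report as "real".
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start
```

Calling time_read() on a 16 megabyte file while a large write runs on
another partition gives the figure compared in test 1.  A file that was
written a moment ago will be served from cache and tells you nothing.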

It's tests 1 and 2 which are interesting, because we perform so
very badly.  And no amount of fiddling buffer.c or elvtune settings
is going to fix it, because they don't address the core problem.

Which is: when the elevator can't merge a read it sticks it at the
end of the request queue, behind all the writes.

I'll be submitting a little patch for 2.4.18-pre which allows the user
to tunably promote reads ahead of most of the writes.  It improves
tests 1 and 2 by a factor of eight to twelve.
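
A toy model of the queueing behaviour, for illustration only.  It is not
the patch: the names and the max_writes_ahead knob are invented here,
merging is not modelled at all, and the only rule is that a read may pass
queued writes until at most max_writes_ahead writes remain in front of
it, without overtaking an earlier read.

```python
from collections import namedtuple

# kind is "R" (read) or "W" (write); sector is just a label here.
Request = namedtuple("Request", ["kind", "sector"])


def enqueue_at_tail(queue, req):
    """Behaviour described above: an unmergeable request goes last."""
    queue.append(req)


def enqueue_promoting_reads(queue, req, max_writes_ahead):
    """Hypothetical policy: let a read move ahead of queued writes."""
    if req.kind != "R":
        queue.append(req)
        return
    pos = len(queue)
    writes_ahead = sum(1 for r in queue if r.kind == "W")
    # Walk from the tail towards the head, stepping over writes only.
    while (pos > 0 and queue[pos - 1].kind == "W"
           and writes_ahead > max_writes_ahead):
        pos -= 1
        writes_ahead -= 1
    queue.insert(pos, req)


def position_of_read(enqueue, n_writes):
    """Queue n_writes writes, add one read, return the read's index."""
    queue = [Request("W", s) for s in range(n_writes)]
    read = Request("R", 10_000)
    enqueue(queue, read)
    return queue.index(read)
```

With 100 writes queued, position_of_read(enqueue_at_tail, 100) puts the
read at index 100, while the promoting variant with max_writes_ahead=8
puts it at index 8.  A real implementation also has to stop writes from
being starved, which this sketch ignores.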

> So as far as I'm concerned 2.4.15aa1 and 2.4.17pre?aa? are just rock
> solid and usable in production.

I haven't done much stability testing - without a description of what the
changes are trying to do, I can't test them - all I could do is blindly
run stress tests and I'm sure your QA team can do that as well as I,
on bigger boxes.

But I don't doubt that it's stable.   However Red Hat's QA guys are
pretty good at knocking kernels over...

gargh.  Ninety seconds of bash-shared-mapping and I get "end-request:
buffer-list destroyed" against the swap device.  Borked IDE driver.
Seems stable on SCSI.

The -aa VM is still a little prone to tossing out "0-order allocation
failures" when there's tons of swap available and when much memory
is freeable by dropping or writing back to shared mappings.  But
this doesn't seem to cause any problems, as long as there's some
memory available for atomic allocations, and I never saw free
memory go below 800 kbytes...

> We'll keep doing background benchmarking and changes that cannot
> affect stability, but the core design is finished as far as I can tell.

We'll know when it gets wider testing in the runup to 2.4.18.  The
fact that I found a major (although easily fixed) performance problem
in the first ten minutes indicates that caution is needed, yes?

What's the thinking with the changes to dcache/icache flushing?
A single d/icache entry can save three seeks, which is _enormous_ value for
just a few hundred bytes of memory.  You appear to be shrinking the i/dcache
by 12% each time you try to swap out or evict 32 pages.   What this means
is that as soon as we start to get a bit short on memory, the i/dcache vanishes.
And it takes ages to read that stuff back in.  How did you test this?  Without
having done (or even devised) any quantitative testing myself, I have a gut
feel that we need to preserve the i/dcache (versus file data) much more than
this.
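
To put rough numbers on that, here is a back-of-the-envelope sketch.  It
assumes the 12% applies to whatever is left at each pass (so the loss
compounds), that nothing refills the caches in between, and that pages
are 4 KiB; none of that has been checked against the code.

```python
import math

SHRINK_PER_PASS = 0.12   # "12%" per pass, from the text
PAGES_PER_PASS = 32      # one pass per 32 pages swapped out or evicted
PAGE_SIZE = 4096         # assumption: 4 KiB pages


def remaining_fraction(passes, shrink=SHRINK_PER_PASS):
    """Fraction of i/dcache entries left after `passes` shrink passes."""
    return (1.0 - shrink) ** passes


def passes_for_bytes(nbytes):
    """Number of shrink passes triggered by reclaiming `nbytes`."""
    return nbytes // (PAGES_PER_PASS * PAGE_SIZE)


def passes_to_halve(shrink=SHRINK_PER_PASS):
    """Smallest number of passes that leaves at most half the entries."""
    return math.ceil(math.log(0.5) / math.log(1.0 - shrink))
```

Under those assumptions, reclaiming a single megabyte is 8 passes and
leaves roughly 36% of the entries; 6 passes are enough to lose half.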



Oh.  Maybe the core design (whatever it is :)) is not finished,
because it retains the bone-headed, dumb-to-the-point-of-astonishing
misfeature which Linux VM has always had:

If someone is linearly writing (or reading) a gigabyte file on a 64
megabyte box they *don't* want the VM to evict every last little scrap
of cache on behalf of data which they *obviously* do not want
cached.

It's good that -aa VM doesn't summarily dump the i/dcache and plonk
everything you want into swap when this happens.  Progress.
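
The effect of a linear scan on a plain LRU cache is easy to show with a
toy model.  This is a sketch of strict LRU in general, not of the Linux
page cache; the class and function names are made up, and "pages" are
just dictionary keys.

```python
from collections import OrderedDict


class LRUCache:
    """Strict least-recently-used cache of fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def touch(self, page):
        """Access `page`; return True on a hit, False on a miss."""
        if page in self.pages:
            self.pages.move_to_end(page)
            return True
        self.pages[page] = True
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict least recently used
        return False


def surviving_hot_pages(capacity, hot, scan_length):
    """Load the hot set, stream scan_length one-shot pages, count survivors."""
    cache = LRUCache(capacity)
    for p in hot:
        cache.touch(("hot", p))
    for i in range(scan_length):
        cache.touch(("scan", i))
    return sum(1 for p in hot if ("hot", p) in cache.pages)
```

With a capacity of 64 and a hot set of 32 pages, a scan of 16 pages
leaves all 32 hot pages resident, but a scan of 1024 pages leaves none,
even though no scanned page is ever touched twice.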


So. To summarise.

- Your attempt to address read latencies didn't work out, and should
  be dropped (hopefully Marcelo and Jens are OK with an elevator hack :))

- We urgently need a fix for 2.4.17's page replacement problems.  
 
- -aa is good.  Believe it or not, I like it. The mm/* portions fix
  significant performance problems in our current VM.  I guess we
  should bite the bullet and merge it all in 2.4.18-pre.


Notice
******

The materials and information included in this website may only be used
for purposes such as criticism, review, private study, scholarship, or 
research.

