[2.4.17/18pre] VM and swap - it's really unusable

Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!
ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Message-ID: <3C2CD326.100@athlon.maya.org>
Original-Date: 	Fri, 28 Dec 2001 21:16:38 +0100
From: Andreas Hartmann <andihartm...@freenet.de>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7+) Gecko/20011225
X-Accept-Language: en-us
MIME-Version: 1.0
To: Kernel-Mailingliste <linux-ker...@vger.kernel.org>
Subject: [2.4.17/18pre] VM and swap - it's really unusable
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Fri, 28 Dec 2001 20:19:40 GMT
Message-ID: <fa.djbc0rv.1ogs82n@ifi.uio.no>
Lines: 32

Hello all,

Again, I ran an rsync operation as described in
"[2.4.17rc1] Swapping" MID <3C1F4014.2010...@athlon.maya.org>.

This time, the kernel had a swap partition of about 200MB. Once the
swap partition was completely full, the kernel killed all knode processes.
Nearly 50% of RAM was being used for buffers at that moment. Why is so
much memory used for buffers?

I know I'm repeating myself, but please:

	Fix the VM management in kernel 2.4.x. It's unusable. Believe
	me! For comparison: kernel 2.2.19 needed almost no swap for
	the same operation!

Please bear in mind that I'm using 512 MB of RAM. This should, or rather
must, be enough to do the rsync operation with almost no swapping -
kernel 2.2.19 manages it!

The performance of kernel 2.4.18pre1 is very poor, which is no surprise, 
because the machine swaps nearly nonstop.


Regards,
Andreas Hartmann

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!news.tele.dk!
small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!
internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: 	Fri, 28 Dec 2001 18:32:12 -0200 (BRST)
From: Rik van Riel <r...@conectiva.com.br>
X-X-Sender:  <r...@duckman.distro.conectiva>
To: Andreas Hartmann <andihartm...@freenet.de>
Cc: <linux-ker...@vger.kernel.org>
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
In-Reply-To: <3C2CD326.100@athlon.maya.org>
Original-Message-ID: <Pine.LNX.4.33L.0112281827000.12225-100000@duckman.distro.conectiva>
X-spambait: aardv...@kernelnewbies.org
X-spammeplease: 	aardv...@nl.linux.org
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Fri, 28 Dec 2001 20:33:52 GMT
Message-ID: <fa.ookmi1v.1hmtj7@ifi.uio.no>
References: <fa.djbc0rv.1ogs82n@ifi.uio.no>
Lines: 29

On Fri, 28 Dec 2001, Andreas Hartmann wrote:

> 	Fix the VM-management in kernel 2.4.x. It's unusable. Believe
> 	me! As comparison: kernel 2.2.19 didn't need nearly any swap for
> 	the same operation!

If you feel adventurous you can try my rmap-based
VM; the latest version is at:

	http://surriel.com/patches/2.4/2.4.17-rmap-8

This VM should behave a bit better (it does on my machines),
but isn't yet bug-free enough to be used on production machines.
Also, the changes it introduces are, IMHO, too big for a stable
kernel series ;)

regards,

Rik
-- 
DMCA, SSSCA, W3C?  Who cares?  http://thefreeworld.net/

http://www.surriel.com/		http://distro.conectiva.com/


Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!
ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Message-ID: <3C2CE373.3000806@athlon.maya.org>
Original-Date: 	Fri, 28 Dec 2001 22:26:11 +0100
From: Andreas Hartmann <andihartm...@freenet.de>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7+) Gecko/20011225
X-Accept-Language: en-us
MIME-Version: 1.0
To: Andrew Morton <a...@zip.com.au>
CC: Kernel-Mailingliste <linux-ker...@vger.kernel.org>
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
Original-References: <3C2CD326....@athlon.maya.org> <3C2CD9EC.1D6C7...@zip.com.au>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Fri, 28 Dec 2001 21:34:08 GMT
Message-ID: <fa.ebv0e5v.10385b4@ifi.uio.no>
References: <fa.e1utl5v.2n6spf@ifi.uio.no>
Lines: 48

Andrew Morton wrote:

> Andreas Hartmann wrote:
> 
>>Hello all,
>>
>>Again, I did a rsync-operation as described in
>>"[2.4.17rc1] Swapping" MID <3C1F4014.2010...@athlon.maya.org>.
>>
>>This time, the kernel had a swappartition which was about 200MB. As the
>>swap-partition was fully used, the kernel killed all processes of knode.
>>Nearly 50% of RAM had been used for buffers at this moment. Why is there
>>so much memory used for buffers?
>>
> 
> It's very strange.  The large amount of buffercache usage is to
> be expected from statting 20 gigs worth of files, but the kernel
> should (and normally does) free up that memory on demand.
> 
> Which filesystem(s) are you using?
> 
> Are you using NFS/NBD/SMBFS or anything like that?
> 

Basically, I'm using NFS and reiserfs. But I haven't used any files on NFS
since the last reboot, and the NFS shares haven't been mounted.

There are 2 IDE-Harddisks in this machine:
hda: WDC WD205AA, ATA DISK drive (40079088 sectors (20520 MB) w/2048KiB
				  cache, CHS=2494/255/63, UDMA(66))
hdb: WDC WD450AA-00BAA0, ATA DISK drive (87930864 sectors (45021 MB)
					w/2048KiB Cache,
					CHS=5473/255/63, UDMA(66))

On hda, I have 7 partitions (plus one small "boot" partition, which
isn't mounted, and a 200MB swap partition).
On hdb, I have 12 partitions and one more swap partition, which is now 1GB.
All partitions are formatted with reiserfs.

Regards,
Andreas Hartmann


Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!news.tele.dk!
small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!
internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
To: andihartm...@freenet.de (Andreas Hartmann)
Original-Date: 	Sat, 29 Dec 2001 00:30:51 +0000 (GMT)
Cc: linux-ker...@vger.kernel.org (Kernel-Mailingliste)
In-Reply-To: <3C2CD326.100@athlon.maya.org> from "Andreas Hartmann" at Dec 28, 2001 09:16:38 PM
X-Mailer: ELM [version 2.5 PL6]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Original-Message-Id: <E16K7Om-0002QI-00@the-village.bc.nu>
From: Alan Cox <a...@lxorguk.ukuu.org.uk>
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Sat, 29 Dec 2001 00:21:49 GMT
Message-ID: <fa.g0136fv.n4eo16@ifi.uio.no>
References: <fa.djbc0rv.1ogs82n@ifi.uio.no>
Lines: 13

> 	Fix the VM-management in kernel 2.4.x. It's unusable. Believe
> 	me! As comparison: kernel 2.2.19 didn't need nearly any swap for
> 	the same operation!
> The performance of kernel 2.4.18pre1 is very poor, which is no surprise, 
> because the machine swaps nearly nonstop.

Does the 2.4.9 Red Hat kernel (if you are using RH) or 2.4.12-ac8 show the
same problem?

Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!
ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Message-ID: <3C2DC1AA.2070106@athlon.maya.org>
Original-Date: 	Sat, 29 Dec 2001 14:14:18 +0100
From: Andreas Hartmann <andihartm...@freenet.de>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7+) Gecko/20011225
X-Accept-Language: en-us
MIME-Version: 1.0
To: Kernel-Mailingliste <linux-ker...@vger.kernel.org>
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
Original-References: <3C2CD326....@athlon.maya.org>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Sat, 29 Dec 2001 13:18:13 GMT
Message-ID: <fa.eaucgmv.133soom@ifi.uio.no>
References: <fa.djbc0rv.1ogs82n@ifi.uio.no>
Lines: 67

Andreas Hartmann wrote:

> Hello all,
> 
> Again, I did a rsync-operation as described in
> "[2.4.17rc1] Swapping" MID <3C1F4014.2010...@athlon.maya.org>.
> 

Some other examples:
I just did a
cp -Rd linux-2.4.16 linux-2.4.17
(with object files). Before starting, I had about 120 MB of free RAM.
While copying (I did nothing else in the meantime), 2MB of swap was
used and 12 MB of RAM was free. Most of the memory was used for
caching, which is OK.
After copying, only 10 MB of memory was freed again; 490MB of RAM was
now in use (almost all of it for caching).

Starting from this situation, I ran another small cp:
cp -Rd linux-2.4.18pre1 linux-2.4.test
(again including object files).
Result: swap usage stayed nearly constant; nevertheless, there were
6 accesses to swap.

Next, I deleted the linux-2.4.test directory with
rm -R linux-2.4.test
This was very fast (approximately 1s).

Afterwards, a large part of the cache memory was freed (about
100MB), leaving 122MB of RAM free again.

Next example (run right after the previous one):
The SuSE run-crons jobs ran. This means:
-> updatedb
-> sort
-> frcode
-> find
-> mandb

Result: 47MB of swap used, 2/3 of memory used for buffers (don't forget: I've
got 512MB of RAM), and about 30MB of RAM free.

My observation:
Why does the kernel swap in order to free memory for caching / buffering? I
can't see any sense in this. Wouldn't it be better to limit the
caching / buffering RAM to the amount of memory that is actually free?

Swapping should mainly be used when RAM runs out for real memory
(memory that is used by running applications). IMHO, the memory used
by cache and buffers should be reduced first, before starting to
swap.
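
The figures quoted in these examples (free RAM, buffers, cache, swap used) can be read from /proc/meminfo. The following is a minimal sketch, assuming the "Key: value kB" line format; lines in any other layout are simply skipped. The sample numbers are invented for illustration and are not measurements from the machine discussed here, and the function names are hypothetical.

```python
import re

# Hypothetical sample in /proc/meminfo "Key: value kB" format; the numbers
# are made up for illustration, not taken from the machine in this thread.
SAMPLE = """\
MemTotal:       524288 kB
MemFree:         30720 kB
Buffers:        349184 kB
Cached:          61440 kB
SwapTotal:      204800 kB
SwapFree:       156672 kB
"""


def parse_meminfo(text):
    """Return {field: kB} for every 'Key: <n> kB' line; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"^(\w+):\s+(\d+)\s+kB$", line.strip())
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def summarize(fields):
    """Split RAM into free, buffers, cache, and the remainder (roughly
    applications plus kernel), and report how much swap is in use."""
    total = fields["MemTotal"]
    free = fields["MemFree"]
    buffers = fields["Buffers"]
    cached = fields["Cached"]
    return {
        "free_kb": free,
        "buffers_kb": buffers,
        "cached_kb": cached,
        "other_kb": total - free - buffers - cached,
        "swap_used_kb": fields["SwapTotal"] - fields["SwapFree"],
        "buffers_pct": round(100.0 * buffers / total, 1),
    }


if __name__ == "__main__":
    # On a live system: text = open("/proc/meminfo").read()
    print(summarize(parse_meminfo(SAMPLE)))
```

Sampling this before and after each cp or cron run would give the kind of before/after comparison described above.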

Or would it be possible to implement more than one swapping strategy,
selectable during make menuconfig? This would give the
user the chance to find the best swapping strategy for his purpose.

Regards,
Andreas Hartmann


Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!
ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: 	Thu, 3 Jan 2002 14:23:01 -0600
From: Ken Brownfield <brown...@irridia.com>
To: Andreas Hartmann <andihartm...@freenet.de>
Cc: Kernel-Mailingliste <linux-ker...@vger.kernel.org>
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
Original-Message-ID: <20020103142301.C4759@asooo.flowerfire.com>
Original-References: <3C2CD326....@athlon.maya.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
In-Reply-To: <3C2CD326.100@athlon.maya.org>; 
from andihartmann@freenet.de on Fri, Dec 28, 2001 at 09:16:38PM +0100
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Thu, 3 Jan 2002 20:24:43 GMT
Message-ID: <fa.j74mi6v.1e5qhho@ifi.uio.no>
References: <fa.djbc0rv.1ogs82n@ifi.uio.no>
Lines: 78

Unfortunately, I lost the response that basically said "2.4 looks stable
to me", but let me count the ways in which I agree with Andreas'
sentiment:

	A) VM has major issues
		1) about a dozen recent OOPS reports in VM code
		2) VM falls down on large-memory machines with a
		   high inode count (slocate/updatedb, i/dcache)
		3) Memory allocation failures and OOM triggers
		   even though caches remain full.
		4) Other bugs fixed in -aa and others
	B) Live- and dead-locks that I'm seeing on all 2.4 production
	   machines > 2.4.9, possibly related to A.  But how will I
	   ever find out?
	C) IO-APIC code that requires noapic on any and all SMP
	   machines that I've ever run on.

I don't have anything against anyone here -- I think everyone is doing a
fine job.  It's an issue of acceptance of the problem and focus.  These
issues are all showstoppers for me, and while I don't represent the 90%
of the Linux market that is UP desktops, IMHO future work on the kernel
will be degraded by basic functionality that continues to cause
problems.

I think seeing some of Andrea's and Andrew's et al patches actually
*happen* would be a good thing, since 2.4 kernels are decidedly not
ready for production here.  I am forced to apply 26 distinct patch sets
to my kernels, and I am NOT the right person to make these judgements.
Which is why I was interested in an LKML summary source, though I
haven't yet had a chance to catch up on that thread of comment.

Having a glitch in the radeon driver is one thing; having persistent,
fatal, and reproducible failures in universal kernel code is entirely
another.

-- 
Ken.
brown...@irridia.com


Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!news.tele.dk!
small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!
internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: 	Thu, 3 Jan 2002 18:50:10 -0200 (BRST)
From: Rik van Riel <r...@conectiva.com.br>
X-X-Sender:  <r...@imladris.surriel.com>
To: Ken Brownfield <brown...@irridia.com>
Cc: Andreas Hartmann <andihartm...@freenet.de>,
        Kernel-Mailingliste <linux-ker...@vger.kernel.org>
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
In-Reply-To: <20020103142301.C4759@asooo.flowerfire.com>
Original-Message-ID: <Pine.LNX.4.33L.0201031848060.24031-100000@imladris.surriel.com>
X-spambait: aardv...@kernelnewbies.org
X-spammeplease: 	aardv...@nl.linux.org
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Thu, 3 Jan 2002 20:52:01 GMT
Message-ID: <fa.o032nuv.u2o3gi@ifi.uio.no>
References: <fa.j74mi6v.1e5qhho@ifi.uio.no>
Lines: 37

On Thu, 3 Jan 2002, Ken Brownfield wrote:

> 	A) VM has major issues
> 		1) about a dozen recent OOPS reports in VM code
> 		2) VM falls down on large-memory machines with a
> 		   high inode count (slocate/updatedb, i/dcache)
> 		3) Memory allocation failures and OOM triggers
> 		   even though caches remain full.
> 		4) Other bugs fixed in -aa and others
> 	B) Live- and dead-locks that I'm seeing on all 2.4 production
> 	   machines > 2.4.9, possibly related to A.  But how will I
> 	   ever find out?

I've spent ages trying to fix these bugs in the -ac kernel,
but they got all backed out in search of better performance.

Right now I'm developing a VM again, but I have no interest
at all in fixing the livelocks in the main kernel, they'll
just get removed again after a while.

If you want to test my VM stuff, you can get patches from
http://surriel.com/patches/ or direct access at the bitkeeper
tree on http://linuxvm.bkbits.net/

cheers,

Rik
-- 
Shortwave goes a long way:  irc.starchat.net  #swl

http://www.surriel.com/		http://distro.conectiva.com/


Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
newsfeed.direct.ca!look.ca!cpk-news-hub1.bbnplanet.com!news.gtei.net!
newsfeed1.cidera.com!Cidera!news2.dg.net.ua!bn.utel.com.ua!
carrier.kiev.ua!not-for-mail
From: Dieter Nützel <Dieter.Nuet...@hamburg.de>
Newsgroups: lucky.linux.kernel
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
Date: Tue, 8 Jan 2002 03:06:25 +0000 (UTC)
Organization: DN
Lines: 25
Sender: n...@horse.lucky.net
Approved: newsmas...@lucky.net
Message-ID: <20020108030420Z287595-13997+1799@vger.kernel.org>
NNTP-Posting-Host: horse.lucky.net
Mime-Version: 1.0
Content-Type: text/plain;
Content-Transfer-Encoding: 8bit
X-Trace: horse.lucky.net 1010459185 89122 193.193.193.118 (8 Jan 2002 03:06:25 GMT)
X-Complaints-To: usenet@horse.lucky.net
NNTP-Posting-Date: Tue, 8 Jan 2002 03:06:25 +0000 (UTC)
X-Mailer: KMail [version 1.3.2]
X-Mailing-List: 	linux-kernel@vger.kernel.org
X-Comment-To: Marcelo Tosatti

Is it possible to decide now what should go into 2.4.18 (maybe -pre3): -aa or
-rmap?
Andrew Morton's read-latency.patch is a clear winner for me, too.
What about 00_nanosleep-5 and bootmem?
The O(1) scheduler?
Maybe preemption? It can be disabled, so nobody should be harmed, but we get
the chance for wider testing.

Any comments?

Thanks,
	Dieter

-- 
Dieter Nützel
Graduate Student, Computer Science

University of Hamburg
Department of Computer Science
@home: Dieter.Nuet...@hamburg.de

Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
newsfeed.direct.ca!look.ca!hub1.nntpserver.com!news-out.spamkiller.net!
propagator-la!news-in-la.newsfeeds.com!news-in.superfeed.net!
news.exit.com!gehenna.pell.portland.or.us!nntp-server.caltech.edu!
nntp-server.caltech.edu!mail2news96
Newsgroups: mlist.linux.kernel
Date: 	Tue, 8 Jan 2002 11:55:59 +0100 (CET)
From: Luigi Genoni <ker...@Expansa.sns.it>
X-To: Dieter Nützel <Dieter.Nuet...@hamburg.de>
X-cc: Marcelo Tosatti <marc...@conectiva.com.br>,
        Andrea Arcangeli <and...@suse.de>,
        Rik van Riel <r...@conectiva.com.br>,
        Linux Kernel List <linux-ker...@vger.kernel.org>,
        Andrew Morton <a...@zip.com.au>, Robert Love <r...@tech9.net>
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
Message-ID: <linux.kernel.Pine.LNX.4.33.0201081153310.29480-100000@Expansa.sns.it>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=iso-8859-1
Approved: n...@nntp-server.caltech.edu
Lines: 24



On Tue, 8 Jan 2002, Dieter Nützel wrote (passim):

> Is it possible to decide, now what should go into 2.4.18 (maybe -pre3) -aa or
> -rmap?
[...]
> Maybe preemption? It is disengageable so nobody should be harmed but we get
> the chance for wider testing.
>
> Any comments?
preemption?? this is eventually 2.5 stuff, and should not be integrated
into 2.4 stable tree. Of course a backport is possible, when/if it will be
quite well tested and well working on 2.5






Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
newsfeed.direct.ca!look.ca!newshub2.rdc1.sfba.home.com!news.home.com!
newshub1-work.rdc1.sfba.home.com!gehenna.pell.portland.or.us!
nntp-server.caltech.edu!nntp-server.caltech.edu!mail2news96
Newsgroups: mlist.linux.kernel
Date: 	Tue, 8 Jan 2002 14:21:17 +0100
From: Andrea Arcangeli <and...@suse.de>
X-To: Luigi Genoni <ker...@Expansa.sns.it>
X-Cc: Dieter Nützel <Dieter.Nuet...@hamburg.de>,
        Marcelo Tosatti <marc...@conectiva.com.br>,
        Rik van Riel <r...@conectiva.com.br>,
        Linux Kernel List <linux-ker...@vger.kernel.org>,
        Andrew Morton <a...@zip.com.au>, Robert Love <r...@tech9.net>
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
Message-ID: <linux.kernel.20020108142117.F3221@inspiron.school.suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Approved: n...@nntp-server.caltech.edu
Lines: 50

On Tue, Jan 08, 2002 at 11:55:59AM +0100, Luigi Genoni wrote:
> 
> 
> On Tue, 8 Jan 2002, Dieter Nützel wrote (passim):
> 
> > Is it possible to decide, now what should go into 2.4.18 (maybe -pre3) -aa or
> > -rmap?
> [...]
> > Maybe preemption? It is disengageable so nobody should be harmed but we get
> > the chance for wider testing.
> >
> > Any comments?
> preemption?? this is eventually 2.5 stuff, and should not be integrated

indeed ("eventually" in the italian sense btw, obvious to me, but not
for l-k).

I'm not against preemption (I can see the benefits for the mean
latency of real-time DSP), but the claims that preemption makes the
kernel faster don't make sense to me. More frequent scheduling, the
overhead of branches in the locks (you have to conditional_schedule after
the last preemption lock is released, plus the cachelines for the per-cpu
preemption locks) and the other preemption stuff can only make the
kernel slower.  Furthermore, for multimedia playback any sane kernel out
there with lowlatency fixes applied will work as well as a preemptible
kernel that pays for all the preemption overhead.

As for the other claim, that as the kernel becomes more granular
performance will increase with in-kernel preemption, that's obviously
wrong as well; it's clearly the other way around. Maybe what was meant was
"latency will decrease further", which is right, but performance will
also decrease, if anything.

So yes, mean latency will decrease with preemptive kernel, but your CPU
is definitely paying something for it.

> into 2.4 stable tree. Of course a backport is possible, when/if it will be
> quite well tested and well working on 2.5
> 
> 
> 
> 

Andrea

Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
news.tele.dk!small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!
ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: 	Wed, 9 Jan 2002 00:33:35 +1100
From: Anton Blanchard <an...@samba.org>
To: Andrea Arcangeli <and...@suse.de>
Cc: Luigi Genoni <ker...@Expansa.sns.it>,
        Dieter Nützel <Dieter.Nuet...@hamburg.de>,
        Marcelo Tosatti <marc...@conectiva.com.br>,
        Rik van Riel <r...@conectiva.com.br>,
        Linux Kernel List <linux-ker...@vger.kernel.org>,
        Andrew Morton <a...@zip.com.au>, Robert Love <r...@tech9.net>
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
Original-Message-ID: <20020108133335.GB26307@krispykreme>
Original-References: <20020108030420Z287595-13997+1...@vger.kernel.org> 
<Pine.LNX.4.33.0201081153310.29480-100...@Expansa.sns.it> 
<20020108142117.F3...@inspiron.school.suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20020108142117.F3221@inspiron.school.suse.de>
User-Agent: Mutt/1.3.25i
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Tue, 8 Jan 2002 13:39:05 GMT
Message-ID: <fa.gpe55mv.ajebbs@ifi.uio.no>
References: <fa.i5nsc8v.5m6fgr@ifi.uio.no>
Lines: 13

 
> So yes, mean latency will decrease with preemptive kernel, but your CPU
> is definitely paying something for it.

And Andrew Morton's work suggests he can do a much better job of
reducing latency than -preempt.

Anton

Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
newsfeed.direct.ca!look.ca!newsfeed1.cidera.com!Cidera!
news2.dg.net.ua!bn.utel.com.ua!carrier.kiev.ua!not-for-mail
From: Daniel Phillips <phill...@bonn-fries.net>
Newsgroups: lucky.linux.kernel
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
Date: Tue, 8 Jan 2002 14:59:47 +0000 (UTC)
Organization: unknown
Lines: 40
Sender: n...@horse.lucky.net
Approved: newsmas...@lucky.net
Message-ID: <E16Nxjg-00009W-00@starship.berlin>
References: <20020108030420Z287595-13997+1799@vger.kernel.org> 
<20020108142117.F3221@inspiron.school.suse.de> <20020108133335.GB26307@krispykreme>
NNTP-Posting-Host: horse.lucky.net
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7BIT
X-Trace: horse.lucky.net 1010501987 36212 193.193.193.118 (8 Jan 2002 14:59:47 GMT)
X-Complaints-To: usenet@horse.lucky.net
NNTP-Posting-Date: Tue, 8 Jan 2002 14:59:47 +0000 (UTC)
X-Mailer: KMail [version 1.3.2]
In-Reply-To: <20020108133335.GB26307@krispykreme>
X-Mailing-List: 	linux-kernel@vger.kernel.org
X-Comment-To: Anton Blanchard

On January 8, 2002 02:33 pm, Anton Blanchard wrote:
> Andrea Arcangeli [apparently] wrote:
> > So yes, mean latency will decrease with preemptive kernel, but your CPU
> > is definitely paying something for it.
> 
> And Andrew Morton's work suggests he can do a much better job of
> reducing latency than -preempt.

That's not a particularly clueful comment, Anton.  Obviously, any 
latency-busting hacks that Andrew does could also be patched into a
-preempt kernel.

What a preemptible kernel can do that a non-preemptible kernel can't is: 
reschedule exactly as often as necessary, instead of having lots of extra 
schedule points inserted all over the place, firing when *they* think the 
time is right, which may well be earlier than necessary.

The preemptible approach is much less of a maintenance headache, since
people don't have to be constantly doing audits to see if something changed,
and going in to fiddle with scheduling points.

Finally, with preemption, rescheduling can be forced with essentially zero 
latency in response to an arbitrary interrupt such as IO completion, whereas 
the non-preemptive kernel will have to 'coast to a stop'.  In other words, 
the non-preemptive kernel will have little lags between successive IOs, 
whereas the preemptive kernel can submit the next IO immediately.  So there 
are bound to be loads where the preemptive kernel turns in better latency 
*and throughput* than the scheduling point hack.
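
The 'coast to a stop' argument can be illustrated with a toy model. This is only a sketch under simplifying assumptions: time is in arbitrary integer units, the path length and scheduling-point positions are hypothetical, and the model ignores spinlocks and any per-lock preemption overhead, so it says nothing about throughput.

```python
def latency_with_sched_points(wakeup, sched_points, path_end):
    """Delay until the next explicit scheduling point at or after `wakeup`.

    If no scheduling point remains, the running task continues to the end
    of the kernel path before anything else can be scheduled.
    """
    later = [p for p in sched_points if p >= wakeup]
    return (min(later) if later else path_end) - wakeup


def latency_preemptible(wakeup):
    """Idealised preemptible kernel: reschedule as soon as the wakeup arrives."""
    return 0


def worst_case(sched_points, path_end):
    """Worst-case delay over all integer wakeup times in [0, path_end)."""
    return max(
        latency_with_sched_points(t, sched_points, path_end)
        for t in range(path_end)
    )


if __name__ == "__main__":
    PATH_END = 1000            # hypothetical path length
    POINTS = [250, 500, 750]   # hypothetical scheduling points
    print(worst_case(POINTS, PATH_END))   # 250
    print(worst_case([], PATH_END))       # 1000
    print(latency_preemptible(123))       # 0
```

In this model, adding scheduling points only shrinks the gap an interrupt may have to wait; the idealised preemptible case has no gap at all.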

Mind you, I'm not devaluing Andrew's work, it's good and valuable.  However 
it's good to be aware of why that approach can't equal the latency-busting 
performance of the preemptive approach.

--
Daniel

Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!
news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!news.tele.dk!
small.news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!
internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
X-Authentication-Warning: vasquez.zip.com.au: 
Host r...@zipperii.zip.com.au [61.8.0.87] claimed to be zip.com.au
Original-Message-ID: <3C3B4CB7.FEAAF5FC@zip.com.au>
Original-Date: 	Tue, 08 Jan 2002 11:47:03 -0800
From: Andrew Morton <a...@zip.com.au>
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.18pre1 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Daniel Phillips <phill...@bonn-fries.net>
CC: Anton Blanchard <an...@samba.org>, Andrea Arcangeli <and...@suse.de>,
        Luigi Genoni <ker...@Expansa.sns.it>,
        Dieter Nützel <Dieter.Nuet...@hamburg.de>,
        Marcelo Tosatti <marc...@conectiva.com.br>,
        Rik van Riel <r...@conectiva.com.br>,
        Linux Kernel List <linux-ker...@vger.kernel.org>,
        Robert Love <r...@tech9.net>
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
Original-References: <20020108030420Z287595-13997+1...@vger.kernel.org> 
<20020108142117.F3...@inspiron.school.suse.de> <20020108133335.GB26307@krispykreme>,
<20020108133335.GB26307@krispykreme> <E16Nxjg-00009W...@starship.berlin>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Tue, 8 Jan 2002 19:54:56 GMT
Message-ID: <fa.do1pjuv.1oguu2j@ifi.uio.no>
References: <fa.hlj1q9v.1akmgbl@ifi.uio.no>
Lines: 87

Daniel Phillips wrote:
> 
> On January 8, 2002 02:33 pm, Anton Blanchard wrote:
> > Andrea Arcangeli [apparently] wrote:
> > > So yes, mean latency will decrease with preemptive kernel, but your CPU
> > > is definitely paying something for it.
> >
> > And Andrew Morton's work suggests he can do a much better job of
> > reducing latency than -preempt.
> 
> That's not a particularly clueful comment, Anton.  Obviously, any
> latency-busting hacks that Andrew does could also be patched into a
> -preempt kernel.

Yes.  The important part is the implicit dropping of the BKL across
schedule().

> What a preemptible kernel can do that a non-preemptible kernel can't is:
> reschedule exactly as often as necessary, instead of having lots of extra
> schedule points inserted all over the place, firing when *they* think the
> time is right, which may well be earlier than necessary.

Nope.  `if (current->need_resched)' -> the time is right (beyond right,
actually).
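
A minimal userspace sketch of such a scheduling point, for illustration
only.  `struct task`, `fake_schedule()` and `long_loop()` are hypothetical
stand-ins, not kernel code; the only thing taken from the text above is the
`need_resched` test.  It shows that the loop yields only when a reschedule
is already pending, never speculatively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-task state behind current->need_resched. */
struct task {
    bool need_resched;      /* set by a (simulated) interrupt */
    unsigned long yields;   /* how many times the task gave up the CPU */
};

/* Stand-in for schedule(): record the yield and clear the flag. */
static void fake_schedule(struct task *t)
{
    t->yields++;
    t->need_resched = false;
}

/* The scheduling point itself: yield only if a reschedule is pending. */
static void resched_point(struct task *t)
{
    if (t->need_resched)
        fake_schedule(t);
}

/*
 * Long-running loop with one scheduling point per iteration.
 * A simulated timer tick sets need_resched every `tick_every` iterations.
 * Returns the total number of yields recorded on the task.
 */
static unsigned long long_loop(struct task *t, size_t iterations,
                               size_t tick_every)
{
    assert(tick_every > 0);
    for (size_t i = 1; i <= iterations; i++) {
        if (i % tick_every == 0)
            t->need_resched = true;   /* "interrupt" asks for a reschedule */
        resched_point(t);
    }
    return t->yields;
}
```

With a simulated tick every 100 iterations, `long_loop(&t, 1000, 100)`
yields 10 times; if no tick ever fires, it never yields.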

> The preemptible approach is much less of a maintenance headache, since
> people don't have to be constantly doing audits to see if something changed,
> and going in to fiddle with scheduling points.

Except it doesn't work.  The full-on low-latency patch has ~60 rescheduling
points.  Of these, ~40 involve popping spinlocks.  Really, the only significant
latency sources which the preemptible kernel solves are generic_file_read()
and generic_file_write().

So preemptible kernel needs lock-break to be useful.  And then it's basically
the same thing, with the same maintainability problems.  And believe me, these
are considerable.  Mainly because the areas which need busting up exactly
coincide with the areas where there has been most churn in the kernel.
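
The lock-break idea can be sketched in userspace as follows.  This is an
illustration under stated assumptions, not the low-latency or lock-break
patch: `struct fake_lock` and `scan_with_lock_break()` are hypothetical, and
time is counted in loop iterations rather than microseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock that records how long it was held, in iterations. */
struct fake_lock {
    bool held;
    size_t held_for;       /* iterations since the lock was last taken */
    size_t max_held_for;   /* longest uninterrupted hold seen so far */
};

static void fake_lock_acquire(struct fake_lock *l)
{
    assert(!l->held);
    l->held = true;
    l->held_for = 0;
}

static void fake_lock_release(struct fake_lock *l)
{
    assert(l->held);
    if (l->held_for > l->max_held_for)
        l->max_held_for = l->held_for;
    l->held = false;
}

/*
 * Walk `n` items under the lock.  A simulated tick requests a reschedule
 * every `tick_every` items; when one is pending the lock is dropped, the
 * task would schedule, and the lock is retaken.  Real code must also
 * revalidate whatever the lock protects after retaking it.
 * Returns the number of lock breaks.
 */
static size_t scan_with_lock_break(struct fake_lock *l, size_t n,
                                   size_t tick_every)
{
    bool need_resched = false;
    size_t breaks = 0;

    assert(tick_every > 0);
    fake_lock_acquire(l);
    for (size_t i = 1; i <= n; i++) {
        l->held_for++;                /* one unit of work under the lock */
        if (i % tick_every == 0)
            need_resched = true;
        if (need_resched) {
            fake_lock_release(l);     /* lock-break: drop the lock */
            need_resched = false;     /* schedule() would run here */
            breaks++;
            fake_lock_acquire(l);     /* retake it and carry on */
        }
    }
    fake_lock_release(l);
    return breaks;
}
```

The longest hold is bounded by the tick interval rather than by the length
of the list being scanned, which is the whole point of the exercise.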

> Finally, with preemption, rescheduling can be forced with essentially zero
> latency in response to an arbitrary interrupt such as IO completion, whereas
> the non-preemptive kernel will have to 'coast to a stop'.  In other words,
> the non-preemptive kernel will have little lags between successive IOs,
> whereas the preemptive kernel can submit the next IO immediately.  So there
> are bound to be loads where the preemptive kernel turns in better latency
> *and throughput* than the scheduling point hack.

Latency yes.  Throughput no.

I don't think the "preempt slows down the kernel" argument is very valid
really.  Let's invert the argument - Linux is multitasking, and that has a
cost.  There's no reason why certain bits of the kernel need to violate that
just to get a bit more throughput.  If it really worries you, set HZ=10 and
increase all the timeslices, etc.

Now, there *may* be overheads added due to losing the implicit locking which
per-CPU data gives you.

The main cost of preempt IMO is in complexity and stability risks.

(BTW: I took a weird oops testing the preempt patch on an SMP NFS client.
The fault address was 0x0aXXXXXX.  No useful backtrace, unfortunately).

> Mind you, I'm not devaluing Andrew's work, it's good and valuable.  However
> it's good to be aware of why that approach can't equal the latency-busting
> performance of the preemptive approach.

There's no point in just merging the preempt patch and saying "there,
that's done".  It doesn't do anything.

Instead, a decision needs to be made: "Linux will henceforth be a 
low-latency kernel".  Now, IF we can come to this decision, then
internal preemption is the way to do it.  But it affects ALL kernel
developers.  Because we'll need to introduce a new rule: "it is a
bug to spend more than five milliseconds holding any locks".
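
One way such a rule could be checked is sketched below, as an assumption
rather than anything that exists in the kernel.  `struct hold_stats`,
`record_hold()` and `LOCK_HOLD_BUDGET_NS` are hypothetical names; the caller
is assumed to supply timestamps in nanoseconds taken at lock acquire and
release.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical budget: the five-millisecond rule, in nanoseconds. */
#define LOCK_HOLD_BUDGET_NS (5u * 1000u * 1000u)

struct hold_stats {
    uint64_t worst_ns;     /* longest hold time observed */
    unsigned violations;   /* holds that exceeded the budget */
};

/*
 * Record one lock hold.  Returns true if the hold broke the rule,
 * i.e. lasted strictly more than five milliseconds.
 */
static bool record_hold(struct hold_stats *s, uint64_t acquired_ns,
                        uint64_t released_ns)
{
    assert(released_ns >= acquired_ns);
    uint64_t held = released_ns - acquired_ns;

    if (held > s->worst_ns)
        s->worst_ns = held;
    if (held > LOCK_HOLD_BUDGET_NS) {
        s->violations++;
        return true;
    }
    return false;
}
```

A hold of exactly five milliseconds passes; anything longer is counted as a
bug, matching the wording of the rule.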

So.  Do we want a low-latency kernel?  Are we prepared to mandate
the five-millisecond rule?   It can be done, but won't be easy, and
we'll never get complete coverage.  But I don't see the will around
here.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
news.tele.dk!small.news.tele.dk!128.230.129.106!news.maxwell.syr.edu!
netnews.com!xfer02.netnews.com!newsfeed1.cidera.com!Cidera!
news2.dg.net.ua!bn.utel.com.ua!carrier.kiev.ua!not-for-mail
From: Marcelo Tosatti <marc...@conectiva.com.br>
Newsgroups: lucky.linux.kernel
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
Date: Tue, 8 Jan 2002 15:15:42 +0000 (UTC)
Organization: unknown
Lines: 42
Sender: n...@horse.lucky.net
Approved: newsmas...@lucky.net
Message-ID: <Pine.LNX.4.21.0201081153160.19178-100000@freak.distro.conectiva>
NNTP-Posting-Host: horse.lucky.net
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=ISO-8859-1
Content-Transfer-Encoding: 8BIT
X-Trace: horse.lucky.net 1010502942 37547 193.193.193.118 (8 Jan 2002 15:15:42 GMT)
X-Complaints-To: usenet@horse.lucky.net
NNTP-Posting-Date: Tue, 8 Jan 2002 15:15:42 +0000 (UTC)
In-Reply-To: <20020108030431.0099F38C58@perninha.conectiva.com.br>
X-Mailing-List: 	linux-kernel@vger.kernel.org
X-Comment-To: Dieter Nützel

On Tue, 8 Jan 2002, Dieter Nützel wrote:

> Is it possible to decide, now what should go into 2.4.18 (maybe -pre3) -aa or 
> -rmap?

-rmap is 2.5 stuff. 

I would really like to integrate -aa stuff as soon as I can understand
_why_ Andrea is doing those changes.

Note that people will _always_ complain about the VM: it will always be
possible to optimize it for one case and cause harm to other cases.

I'm not saying that the VM is perfect right now: it certainly has problems.

> Andrew Morton's read-latency.patch is a clear winner for me, too.

AFAIK Andrew's code simply adds schedule points around the kernel, right? 

If so, nope, I do not plan to integrate it.

> What about 00_nanosleep-5 and bootmem?

What are 00_nanosleep-5 and bootmem?

> The O(1) scheduler?

2.5 stuff.

> Maybe preemption? It is disengageable so nobody should be harmed but we get 
> the chance for wider testing.

2.5 too.


Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
newsfeed.direct.ca!look.ca!newshub2.rdc1.sfba.home.com!news.home.com!
newshub1-work.rdc1.sfba.home.com!gehenna.pell.portland.or.us!
nntp-server.caltech.edu!nntp-server.caltech.edu!mail2news96
Newsgroups: mlist.linux.kernel
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
X-To: marc...@conectiva.com.br (Marcelo Tosatti)
Date: 	Tue, 8 Jan 2002 15:46:20 +0000 (GMT)
X-Cc: Dieter.Nuet...@hamburg.de (Dieter Nützel),
        and...@suse.de (Andrea Arcangeli),
        r...@conectiva.com.br (Rik van Riel),
        linux-ker...@vger.kernel.org (Linux Kernel List),
        a...@zip.com.au (Andrew Morton), r...@tech9.net (Robert Love)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Message-ID: <linux.kernel.E16NySC-0006pc-00@the-village.bc.nu>
From: Alan Cox <a...@lxorguk.ukuu.org.uk>
Approved: n...@nntp-server.caltech.edu
Lines: 18

> > Andrew Morton's read-latency.patch is a clear winner for me, too.
> 
> AFAIK Andrew's code simply adds schedule points around the kernel, right?
> 
> If so, nope, I do not plan to integrate it.

Yep. It has the most wonderful effect on system latency without actually
breaking any semantics. Pre-empt is a trickier one because it does change
actual behaviour a lot more, although it should be preserving locking
rules.

Alan

Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
newsfeed.direct.ca!look.ca!newsfeed1.cidera.com!Cidera!news2.dg.net.ua!
bn.utel.com.ua!carrier.kiev.ua!not-for-mail
From: Andrea Arcangeli <and...@suse.de>
Newsgroups: lucky.linux.kernel
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
Date: Tue, 8 Jan 2002 15:34:20 +0000 (UTC)
Organization: unknown
Lines: 62
Sender: n...@horse.lucky.net
Approved: newsmas...@lucky.net
Message-ID: <20020108162930.E1894@inspiron.school.suse.de>
References: <20020108030420Z287595-13997+1799@vger.kernel.org> 
<20020108142117.F3221@inspiron.school.suse.de> 
<20020108133335.GB26307@krispykreme> <E16Nxjg-00009W-00@starship.berlin>
NNTP-Posting-Host: horse.lucky.net
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: horse.lucky.net 1010504060 39295 193.193.193.118 (8 Jan 2002 15:34:20 GMT)
X-Complaints-To: usenet@horse.lucky.net
NNTP-Posting-Date: Tue, 8 Jan 2002 15:34:20 +0000 (UTC)
Content-Disposition: inline
User-Agent: Mutt/1.3.12i
In-Reply-To: <E16Nxjg-00009W-00@starship.berlin>; 
from phillips@bonn-fries.net on Tue, Jan 08, 2002 at 04:00:11PM +0100
X-GnuPG-Key-URL: http://e-mind.com/~andrea/aa.gnupg.asc
X-PGP-Key-URL: http://e-mind.com/~andrea/aa.asc
X-Mailing-List: 	linux-kernel@vger.kernel.org
X-Comment-To: Daniel Phillips

On Tue, Jan 08, 2002 at 04:00:11PM +0100, Daniel Phillips wrote:
> On January 8, 2002 02:33 pm, Anton Blanchard wrote:
> > Andrea Arcangeli [apparently] wrote:
> > > So yes, mean latency will decrease with preemptive kernel, but your CPU
> > > is definitely paying something for it.
> > 
> > And Andrew Morton's work suggests he can do a much better job of
> > reducing latency than -preempt.
> 
> That's not a particularly clueful comment, Anton.  Obviously, any 
> latency-busting hacks that Andrew does could also be patched into a
> -preempt kernel.
> 
> What a preemptible kernel can do that a non-preemptible kernel can't is: 
> reschedule exactly as often as necessary, instead of having lots of extra 
> schedule points inserted all over the place, firing when *they* think the 
> time is right, which may well be earlier than necessary.

"extra schedule points all over the place", that's the -preempt kernel
not the lowlatency kernel! (oh yeah, you don't see them in the source
but ask your CPU if it sees them)

> The preemptible approach is much less of a maintenance headache, since 
> people don't have to be constantly doing audits to see if something changed, 
> and going in to fiddle with scheduling points.

This, yes: it requires less maintenance, but you should still keep in
mind the details about the spinlocks; things like the checks the VM does
in shrink_cache are also needed with a preemptive kernel.

> Finally, with preemption, rescheduling can be forced with essentially zero 
> latency in response to an arbitrary interrupt such as IO completion, whereas 
> the non-preemptive kernel will have to 'coast to a stop'.  In other words, 
> the non-preemptive kernel will have little lags between successive IOs, 
> whereas the preemptive kernel can submit the next IO immediately.  So there 
> are bound to be loads where the preemptive kernel turns in better latency 
> *and throughput* than the scheduling point hack.

The I/O pipeline is deep enough that a few msec earlier or later in a
submit_bh shouldn't make a difference.  The batch logic in the
ll_rw_block layer also tries to reduce rescheduling.  Last but not
least, if the task is I/O bound, a preemptive kernel won't make
any difference to the submit_bh latency, because no task is eating CPU
and the latency will be that of a plain schedule call.

> Mind you, I'm not devaluing Andrew's work, it's good and valuable.  However 
> it's good to be aware of why that approach can't equal the latency-busting 
> performance of the preemptive approach.

I also don't want to devalue the preemptive kernel approach (the mean
latency it can reach is lower than that of the lowlat kernel; however,
I personally care only about worst-case latency, and this is why I don't
feel the need for -preempt).  I just wanted to make clear that the
idea floating around that a preemptive kernel is all goodness is
very far from reality: you get very low mean latency, but at a price.

Andrea

Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
newsfeed.direct.ca!look.ca!newshub2.rdc1.sfba.home.com!news.home.com!
newshub1-work.rdc1.sfba.home.com!gehenna.pell.portland.or.us!
nntp-server.caltech.edu!nntp-server.caltech.edu!mail2news96
Newsgroups: mlist.linux.kernel
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
X-To: a...@zip.com.au (Andrew Morton)
Date: 	Tue, 8 Jan 2002 20:13:49 +0000 (GMT)
X-Cc: phill...@bonn-fries.net (Daniel Phillips),
        an...@samba.org (Anton Blanchard), and...@suse.de (Andrea Arcangeli),
        ker...@Expansa.sns.it (Luigi Genoni),
        Dieter.Nuet...@hamburg.de (Dieter Nützel),
        marc...@conectiva.com.br (Marcelo Tosatti),
        r...@conectiva.com.br (Rik van Riel),
        linux-ker...@vger.kernel.org (Linux Kernel List),
        r...@tech9.net (Robert Love)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Message-ID: <linux.kernel.E16O2d3-0007VF-00@the-village.bc.nu>
From: Alan Cox <a...@lxorguk.ukuu.org.uk>
Approved: n...@nntp-server.caltech.edu
Lines: 20

> low-latency kernel".  Now, IF we can come to this decision, then
> internal preemption is the way to do it.  But it affects ALL kernel

The pre-empt patches just make things much much harder to debug. They
remove some of the predictability and the normal call chain following
goes out of the window because you end up seeing crashes in a thread with
no idea what ran the microsecond before.

Some of that happens now but this makes it vastly worse.

The low latency patches don't change the basic predictability and
debuggability but allow you to hit a 1 ms pre-empt target for the general
case.

Alan

Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
newsfeed.direct.ca!look.ca!newsfeed1.cidera.com!Cidera!news2.dg.net.ua!
bn.utel.com.ua!carrier.kiev.ua!not-for-mail
From: Daniel Phillips <phill...@bonn-fries.net>
Newsgroups: lucky.linux.kernel
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
Date: Tue, 8 Jan 2002 15:56:32 +0000 (UTC)
Organization: unknown
Lines: 62
Sender: n...@horse.lucky.net
Approved: newsmas...@lucky.net
Message-ID: <E16Nyaf-0000A5-00@starship.berlin>
References: <20020108030420Z287595-13997+1799@vger.kernel.org> 
<E16Nxjg-00009W-00@starship.berlin> <20020108162930.E1894@inspiron.school.suse.de>
NNTP-Posting-Host: horse.lucky.net
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7BIT
X-Trace: horse.lucky.net 1010505392 42041 193.193.193.118 (8 Jan 2002 15:56:32 GMT)
X-Complaints-To: usenet@horse.lucky.net
NNTP-Posting-Date: Tue, 8 Jan 2002 15:56:32 +0000 (UTC)
X-Mailer: KMail [version 1.3.2]
In-Reply-To: <20020108162930.E1894@inspiron.school.suse.de>
X-Mailing-List: 	linux-kernel@vger.kernel.org
X-Comment-To: Andrea Arcangeli

On January 8, 2002 04:29 pm, Andrea Arcangeli wrote:
> > The preemptible approach is much less of a maintenance headache, since 
> > people don't have to be constantly doing audits to see if something changed, 
> > and going in to fiddle with scheduling points.
> 
> this yes, it requires less maintenance, but still you should keep in
> mind the details about the spinlocks, things like the checks the VM does
> in shrink_cache are needed also with preemptive kernel.

Yes of course, the spinlock regions still have to be analyzed and both
patches have to be maintained for that.  Long duration spinlocks are bad
by any measure, and have to be dealt with anyway.

> > Finally, with preemption, rescheduling can be forced with essentially zero 
> > latency in response to an arbitrary interrupt such as IO completion, whereas 
> > the non-preemptive kernel will have to 'coast to a stop'.  In other words, 
> > the non-preemptive kernel will have little lags between successive IOs, 
> > whereas the preemptive kernel can submit the next IO immediately.  So there 
> > are bound to be loads where the preemptive kernel turns in better latency 
> > *and throughput* than the scheduling point hack.
> 
> The I/O pipeline is big enough that a few msec before or later in a
> submit_bh shouldn't make a difference, the batch logic in the
> ll_rw_block layer also try to reduce the reschedule, and last but not
> the least if the task is I/O bound preemptive kernel or not won't make
> any difference in the submit_bh latency because no task is eating cpu
> and latency will be the one of pure schedule call.

That's not correct.  For one thing, you don't know that no task is eating
CPU, or that nobody is hogging the kernel.  Look at the above, and consider
the part about the little lags between IOs.

> > Mind you, I'm not devaluing Andrew's work, it's good and valuable.  However 
> > it's good to be aware of why that approach can't equal the latency-busting 
> > performance of the preemptive approach.
> 
> I also don't want to devalue the preemptive kernel approach (the mean
> latency it can reach is lower than the one of the lowlat kernel, however
> I personally care only about worst case latency and this is why I don't
> feel the need of -preempt),

This is exactly the case that -preempt handles well.  On the other hand,
trying to show that scheduling hacks satisfy any given latency bound is
equivalent to solving the halting problem.

I thought you had done some real time work?

> but I just wanted to make clear that the
> idea that is floating around that preemptive kernel is all goodness is
> very far from reality, you get very low mean latency but at a price.

A price lots of people are willing to pay.

By the way, have you measured the cost of -preempt in practice?

--
Daniel

Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
newsfeed.direct.ca!look.ca!feed2.news.rcn.net!rcn!dca6-feed2.news.digex.net!
intermedia!newsfeed1.cidera.com!Cidera!news2.dg.net.ua!bn.utel.com.ua!
carrier.kiev.ua!not-for-mail
From: Andrew Morton <a...@zip.com.au>
Newsgroups: lucky.linux.kernel
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
Date: Tue, 8 Jan 2002 20:24:24 +0000 (UTC)
Organization: unknown
Lines: 55
Sender: n...@horse.lucky.net
Approved: newsmas...@lucky.net
Message-ID: <3C3B5305.267EFC14@zip.com.au>
References: <20020108030431.0099F38C58@perninha.conectiva.com.br> 
<Pine.LNX.4.21.0201081153160.19178-100000@freak.distro.conectiva>
NNTP-Posting-Host: horse.lucky.net
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Trace: horse.lucky.net 1010521464 72669 193.193.193.118 (8 Jan 2002 20:24:24 GMT)
X-Complaints-To: usenet@horse.lucky.net
NNTP-Posting-Date: Tue, 8 Jan 2002 20:24:24 +0000 (UTC)
X-Authentication-Warning: vasquez.zip.com.au: 
Host r...@zipperii.zip.com.au [61.8.0.87] claimed to be zip.com.au
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.18pre1 i686)
X-Accept-Language: en
X-Mailing-List: 	linux-kernel@vger.kernel.org
X-Comment-To: Marcelo Tosatti

Marcelo Tosatti wrote:
> 
> > Andrew Morton's read-latency.patch is a clear winner for me, too.
> 
> AFAIK Andrew's code simply adds schedule points around the kernel, right?
> 
> If so, nope, I do not plan to integrate it.

I haven't sent it to you yet :)  It improves the kernel.  That's
good, isn't it?  (There are already forty or fifty open-coded
rescheduling points in the kernel.  That patch just adds the
missing (and most important) ten).  

BTW, with regard to the "preempt and low-lat improve disk throughput"
argument.  I have occasionally seen small throughput improvements,
but I think these may be just request-merging flukes.  Certainly
they were very small.

The one area where it sometimes makes a huuuuuge throughput
improvement is software RAID.

Much of the VM and dirty buffer writeout code assumes that
submit_bh() starts I/O.  Guess what?  RAID's submit_bh()
sometimes *doesn't* start I/O.  Because the IO is started
by a different thread.

With the Riel VM I had a test case in which software RAID
completely and utterly collapsed because of this.  The machine
was spending huge amounts of time spinning in page_launder(), madly
submitting I/O, but never yielding, so the I/O wasn't being started.
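
The starvation described above can be modelled with a small single-threaded
simulation.  It is only a sketch: `struct sim`, `raid_thread_run()` and
`submit_loop()` are hypothetical names, and the "RAID thread" is assumed to
run only when the submitter yields, which is the situation being described.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: requests queued for a helper thread vs. actually started. */
struct sim {
    size_t queued;    /* submitted, waiting for the helper thread */
    size_t started;   /* I/O the helper thread has really started */
};

/* The helper ("RAID") thread: when it gets the CPU it starts all queued I/O. */
static void raid_thread_run(struct sim *s)
{
    s->started += s->queued;
    s->queued = 0;
}

/*
 * Submit `n` requests.  If `yield_every` is non-zero the submitter yields
 * to the helper thread every `yield_every` submissions; if it is zero the
 * submitter never yields.  Returns the largest backlog that built up.
 */
static size_t submit_loop(struct sim *s, size_t n, size_t yield_every)
{
    size_t max_backlog = 0;

    assert(s != NULL);
    for (size_t i = 1; i <= n; i++) {
        s->queued++;                      /* "submit_bh()": queue only */
        if (s->queued > max_backlog)
            max_backlog = s->queued;
        if (yield_every != 0 && i % yield_every == 0)
            raid_thread_run(s);           /* yield: helper starts the I/O */
    }
    return max_backlog;
}
```

Without a yield the submitter queues everything and nothing is started;
with a yield every few submissions the backlog stays small and all of the
I/O gets going.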

-aa VM has an open-coded yield in shrink_cache() which prevents
that particular collapse.  But I had a report yesterday that
the mini-ll patch triples throughput on a complex RAID stack in
2.4.17.  Same reason.

Arguably, this is a RAID problem - raidN_make_request() should
be yielding.  But it's better to do this in one nice, single,
reviewable place - submit_bh().  However that won't prevent
wait_for_buffers() from starving the raid thread.

RAID is not alone.  ksoftirqd, keventd and loop_thread() also
need reasonably good response times.

But given the number of people who have been providing feedback
on this patch, and on the disk-read-latency patch, none of this
is going anywhere, and mine will be the only Linux machines which
don't suck.  (Takes ball, goes home).


Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
news-x2.support.nl!news-x.support.nl!surfnet.nl!newsfeed.media.kyoto-u.ac.jp!
newshub2.rdc1.sfba.home.com!news.home.com!newshub1-work.rdc1.sfba.home.com!
gehenna.pell.portland.or.us!nntp-server.caltech.edu!nntp-server.caltech.edu!
mail2news96
Newsgroups: mlist.linux.kernel
Date: 	Wed, 9 Jan 2002 00:02:48 +0100 (CET)
From: Luigi Genoni <ker...@Expansa.sns.it>
X-To: Daniel Phillips <phill...@bonn-fries.net>
X-cc: Andrea Arcangeli <and...@suse.de>, Anton Blanchard <an...@samba.org>,
        Dieter Nützel <Dieter.Nuet...@hamburg.de>,
        Marcelo Tosatti <marc...@conectiva.com.br>,
        Rik van Riel <r...@conectiva.com.br>,
        Linux Kernel List <linux-ker...@vger.kernel.org>,
        Andrew Morton <a...@zip.com.au>, Robert Love <r...@tech9.net>
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
Message-ID: <linux.kernel.Pine.LNX.4.33.0201082351020.1185-100000@Expansa.sns.it>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Approved: n...@nntp-server.caltech.edu
Lines: 81



On Tue, 8 Jan 2002, Daniel Phillips wrote:

> On January 8, 2002 04:29 pm, Andrea Arcangeli wrote:
> > > The preemptible approach is much less of a maintenance headache, since
> > > people don't have to be constantly doing audits to see if something changed,
> > > and going in to fiddle with scheduling points.
> >
> > this yes, it requires less maintenance, but still you should keep in
> > mind the details about the spinlocks, things like the checks the VM does
> > in shrink_cache are needed also with preemptive kernel.
>
> Yes of course, the spinlock regions still have to be analyzed and both
> patches have to be maintained for that.  Long duration spinlocks are bad
> by any measure, and have to be dealt with anyway.
>
> > > Finally, with preemption, rescheduling can be forced with essentially zero
> > > latency in response to an arbitrary interrupt such as IO completion, whereas
> > > the non-preemptive kernel will have to 'coast to a stop'.  In other words,
> > > the non-preemptive kernel will have little lags between successive IOs,
> > > whereas the preemptive kernel can submit the next IO immediately.  So there
> > > are bound to be loads where the preemptive kernel turns in better latency
> > > *and throughput* than the scheduling point hack.
> >
> > The I/O pipeline is big enough that a few msec before or later in a
> > submit_bh shouldn't make a difference, the batch logic in the
> > ll_rw_block layer also try to reduce the reschedule, and last but not
> > the least if the task is I/O bound preemptive kernel or not won't make
> > any difference in the submit_bh latency because no task is eating cpu
> > and latency will be the one of pure schedule call.
>
> That's not correct.  For one thing, you don't know that no task is eating
> CPU, or that nobody is hogging the kernel.  Look at the above, and consider
> the part about the little lags between IOs.
>
> > > Mind you, I'm not devaluing Andrew's work, it's good and valuable.  However
> > > it's good to be aware of why that approach can't equal the latency-busting
> > > performance of the preemptive approach.
> >
> > I also don't want to devalue the preemptive kernel approach (the mean
> > latency it can reach is lower than the one of the lowlat kernel, however
> > I personally care only about worst case latency and this is why I don't
> > feel the need of -preempt),
>
> This is exactly the case that -preempt handles well.  On the other hand,
> trying to show that scheduling hacks satisfy any given latency bound is
> equivalent to solving the halting problem.
>
> I thought you had done some real time work?
>
> > but I just wanted to make clear that the
> > idea that is floating around that preemptive kernel is all goodness is
> > very far from reality, you get very low mean latency but at a price.
>
> A price lots of people are willing to pay
Sometimes they are probably not making a good deal. In reality,
preempt is good in many scenarios, as I said, and I agree that on
desktops, and on dedicated servers where just one application runs and
the CPU is probably idle most of the time, users do get a feeling of
speed. But please consider heavily loaded servers with 40 or more
users: some are running gcc, others g77, others g++ compilations, someone
runs pine or mutt or kmail, and netscape, and mozilla, and emacs (some
from xterm, KDE or GNOME), and so on. You can also have 4 or 8 CPUs,
but they are not infinite ;) (though I am mainly thinking of dual
Athlon systems). There is a lot of memory and disk I/O.
This is not an unusual scenario on the interactive servers used at SNS.
Here preempt has too high a price.
>
> By the way, have you measured the cost of -preempt in practice?
>
Yes, I did a lot of tests, and with the current preempt patch I was
definitely seeing too big a performance loss.



Path: archiver1.google.com!news1.google.com!sn-xit-02!supernews.com!
news.tele.dk!small.news.tele.dk!newsfeed4.cidera.com!newsfeed1.cidera.com!
Cidera!news2.dg.net.ua!bn.utel.com.ua!carrier.kiev.ua!not-for-mail
From: Dieter Nützel <Dieter.Nuet...@hamburg.de>
Newsgroups: lucky.linux.kernel
Subject: Re: [2.4.17/18pre] VM and swap - it's really unusable
Date: Wed, 9 Jan 2002 00:16:25 +0000 (UTC)
Organization: DN
Lines: 104
Sender: n...@horse.lucky.net
Approved: newsmas...@lucky.net
Message-ID: <20020109001450Z288633-13996+2793@vger.kernel.org>
References: <Pine.LNX.4.33.0201082351020.1185-100000@Expansa.sns.it>
NNTP-Posting-Host: horse.lucky.net
Mime-Version: 1.0
Content-Type: text/plain;
Content-Transfer-Encoding: 8bit
X-Trace: horse.lucky.net 1010535385 94833 193.193.193.118 (9 Jan 2002 00:16:25 GMT)
X-Complaints-To: usenet@horse.lucky.net
NNTP-Posting-Date: Wed, 9 Jan 2002 00:16:25 +0000 (UTC)
X-Mailer: KMail [version 1.3.2]
In-Reply-To: <Pine.LNX.4.33.0201082351020.1185-100000@Expansa.sns.it>
X-Mailing-List: 	linux-kernel@vger.kernel.org
X-Comment-To: Luigi Genoni

On Wednesday, 9. January 2002 00:02, Luigi Genoni wrote:
> On Tue, 8 Jan 2002, Daniel Phillips wrote:
> > On January 8, 2002 04:29 pm, Andrea Arcangeli wrote:
[-]
> > > I also don't want to devalue the preemptive kernel approach (the mean
> > > latency it can reach is lower than the one of the lowlat kernel,
> > > however I personally care only about worst case latency and this is why
> > > I don't feel the need of -preempt),
> >
> > This is exactly the case that -preempt handles well.  On the other hand,
> > trying to show that scheduling hacks satisfy any given latency bound is
> > equivalent to solving the halting problem.
> >
> > I thought you had done some real time work?
> >
> > > but I just wanted to make clear that the
> > > idea that is floating around that preemptive kernel is all goodness is
> > > very far from reality, you get very low mean latency but at a price.
> >
> > A price lots of people are willing to pay
>
> Probably sometimes they are not making a good business. In the reality
> preempt is good in many scenarios, as I said, and I agree that for
> desktops, and dedicated servers where just one application runs, and
> probably the CPU is idle the most of the time,

OK, good. You are very much along the same lines as I am.

Should we start to differentiate not only between UP and SMP systems but 
also between desktops and (big) servers?
I remember someone saying: "Think, this patch is only worth it for ~0.05% of 
the Linux users..." (He meant the users of multi-CPU SMP systems.)

Almost 99.95% of Linux users are running desktops, and I am somewhat tired 
of saying, "sorry, Linux is under development..."
Look at the imprint of the famous German c't magazine (they are not even known 
as Linux bashers... ;-). It shows little penguins falling like dominoes 
(starting with 2.4.17).

Let me rephrase it:
I appreciate all your great work, and I know "only" a little of its 
internals, but we should make some interactivity improvements to the 2.4 
kernel, too.
I know what Andrew's lowlatency patch and Robert's (George Anzinger's) 
preempt patch are worth. In short, the system (a bigger desktop) flies.

The holy grail would be a combination of preempt+lock-break plus lowlatency 
and Ingo's O(1) scheduler.

My main focus is 3D graphics, not the kernel, and I use KDE (yes, a little 
luxury :-) because KDE is C++ and most visualization systems are C and, more 
recently, C++.

Without the above patches even my 1 GHz Athlon II, 640 MB, feels sluggish.
But I don't forget about throughput, which is also useful for 
"heavy" compiler runs...

> indeed users have a speed
> feeling. Please consider that on heavily loaded servers, with 40 and more
> users, some are running gcc, others g77, others g++ compilations, someone
> runs pine or mutt or kmail, and netscape, and mozilla, and emacs (someone
> from xterm kde or gnome), and and
> and... You can have also 4/8 CPU but they are not infinite ;) (but I talk
> mainly thinking of dual Athlon systems).
> there is a lot of memory and disk I/O.
> This is not a strange scenario on the interactive servers used at SNS.
> Here preempt has a too high price

That's why preempt is a compile time option, btw.

> > By the way, have you measured the cost of -preempt in practice?
>
> Yes, I did a lot of tests, and with current preempt patch definitely
> I was seeing a too big performance loss.

Have you tried with stock 2.4.17 or with additional patches?
2.4.17-rc2aa2 (10_vm-21)?

The latter makes a big difference in throughput for me (with and without 
preempt).

I am preparing some numbers.
Does anybody want any particular tests?
dbench (yes, I know...) with and without MP3 during run
latencytest0.42-png
bonnie++
getc_putc

Thank you for all your serious answers. This was definitely not intended to 
start a flamewar.

-Dieter
-- 
Dieter Nützel
Graduate Student, Computer Science

University of Hamburg
Department of Computer Science
@home: Dieter.Nuet...@hamburg.de
