Almost dead machine with heavy swapping

From: st...@iafrica.com (Steve Davies)
Subject: Almost dead machine with heavy swapping
Date: 1995/07/20
Message-ID: <3umadm$g85@grovel.iafrica.com>#1/1
X-Deja-AN: 106594010
organization: Internet Africa
newsgroups: comp.os.linux.development.system

Hi,

Looking for comments and input:  we run a Linux 1.2.10 machine for
quite heavy usage.  The machine has 32megs of real RAM and about the
same of swap, on an IDE drive.

Normally this machine hardly uses the swap space.  2megs or so tops.

But every now and then we have a problem where a process will start
grabbing huge amounts of RAM.  It's often pine when a user tries to
attach or detach a large MIME attachment.  RSS sizes of 20 to 30
megabytes have been seen.

When this happens the machine starts biting heavily into the swap
(surprise surprise).  It becomes very slow indeed - to the extent that
it can take many minutes to get logged in as root and "reboot".  If we
don't catch it in time it can be necessary to "big red switch" the
machine.

What can I do?  First off I don't really understand why the
performance becomes so poor.  Is there something that I can do to
improve performance under memory load?  Seems to me that the scheduler
should seriously drop the priority of processes using much more RAM
than the average!

Secondly: is there some way that I can limit the maximum memory usage
of a single process to avoid the problem?

Thanks,
Steve Davies

From: rnich...@interaccess.com (Robert Nichols)
Subject: Re: Almost dead machine with heavy swapping
Date: 1995/07/21
Message-ID: <DC2KFx.BA.0.omega-3@interaccess.com>#1/1
X-Deja-AN: 106593964
references: <3umadm$g85@grovel.iafrica.com> <3un5bl$39m@news.randomc.com> 
<3unan3$4q1@hubcap.clemson.edu>
organization: InterAccess: Chicagoland's Full Service Internet Provider
newsgroups: comp.os.linux.development.system

In article <3unan3$...@hubcap.clemson.edu>,
Lex Spoon <ssp...@hubcap.clemson.edu> wrote:
:Clever dot Net (r...@Clever.randomc.com) wrote:
:: 1> IDE sucks.
:
:In all seriousness, why is/would swapping to IDE be so much worse
:than swapping to SCSI?  Does this affect even EIDE drives with
:VLB controllers?

In theory:  Conventional IDE is fairly CPU intensive, so swapping
    consumes a lot of CPU time in addition to tying up the I/O system
    and blocking the process being swapped.

In practice:  Swapping is also pretty bad with a SCSI disk and a
    busmastering SCSI adapter.

:: 2> Swapping in linux sucks.
:
:Well, swapping sucks.  Is it worse in linux, though, than in any other OS?

It's hard to say whether it's the fault of the OS or the fairly
poor capabilities of the PC's I/O architecture.

--
Bob Nichols         rnich...@interaccess.com

From: w...@netcom.com (Ben Wing)
Subject: Serious problems with Linux swapping performance
Date: 1995/07/21
Message-ID: <wingDC1Hv4.Gxq@netcom.com>#1/1
X-Deja-AN: 106594038
sender: w...@netcom12.netcom.com
references: <3umadm$g85@grovel.iafrica.com>
organization: NETCOM On-line Communication Services (408 261-4700 guest)
newsgroups: comp.os.linux.development.system

In article <3umadm$...@grovel.iafrica.com>,
Steve Davies <st...@iafrica.com> wrote:
|Hi,
|
|Looking for comments and input:  we run a Linux 1.2.10 machine for
|quite heavy usage.  The machine has 32megs of real RAM and about the
|same of swap, on an IDE drive.
|
|Normally this machine hardly uses the swap space.  2megs or so tops.
|
|But every now and then we have a problem where a process will start
|grabbing huge amounts of RAM.  It's often pine when a user tries to
|attach or detach a large MIME attachment.  RSS sizes of 20 to 30
|megabytes have been seen.
|
|When this happens the machine starts biting heavily into the swap
|(surprise surprise).  It becomes very slow indeed - to the extent that
|it can take many minutes to get logged in as root and "reboot".  If we
|don't catch it in time it can be necessary to "big red switch" the
|machine.
|
|What can I do?  First off I don't really understand why the
|performance becomes so poor.  Is there something that I can do to
|improve performance under memory load?  Seems to me that the scheduler
|should seriously drop the priority of processes using much more RAM
|than the average!

I've seen the same problem.  I was trying to debug a problem where the
app I was developing (XEmacs) would crash if you loaded an excessively
large file into it.  First time I tried, it crashed and hung the system
for about 30 seconds dumping out a 28 megabyte core file. (In
comparison, it only takes a few secs to dump out a 20 megabyte core
file.) Next time I tried, it crashed and hung the system for FIFTEEN
MINUTES dumping out a 30 megabyte core file. (I have 32 megabytes of
RAM.)

My conclusion is that Linux has some **serious** problems with its
virtual memory / swap handling.  I'm surprised that more people
haven't complained about this; I would view this as a very high
priority item to fix, much more so than moving to ELF or adding
loadable modules or other sorts of things that seem to be occupying
the Linux developers' time.

ben
-- 
"... then the day came when the risk to remain tight in a bud was
more painful than the risk it took to blossom." -- Anais Nin

From: r...@Clever.randomc.com (Clever dot Net)
Subject: Re: Almost dead machine with heavy swapping
Date: 1995/07/21
Message-ID: <3un5bl$39m@news.randomc.com>#1/1
X-Deja-AN: 106594052
references: <3umadm$g85@grovel.iafrica.com>
organization: Godcorp
newsgroups: comp.os.linux.development.system

1> IDE sucks.
2> Swapping in linux sucks.

If you can figure out a way to fix #1 (i.e. Adaptec 2940W w/ Barracuda), your
swapping won't be so slow, but if you have an IDE swap drive, it will be
painful, especially if you're silly enough to have the swap at the end
of the drive, on the same drive as your fs.

From: ssp...@hubcap.clemson.edu (Lex Spoon)
Subject: Re: Almost dead machine with heavy swapping
Date: 1995/07/21
Message-ID: <3unan3$4q1@hubcap.clemson.edu>#1/1
X-Deja-AN: 106594051
references: <3umadm$g85@grovel.iafrica.com> <3un5bl$39m@news.randomc.com>
organization: Clemson University
newsgroups: comp.os.linux.development.system

Clever dot Net (r...@Clever.randomc.com) wrote:
: 1> IDE sucks.

In all seriousness, why is/would swapping to IDE be so much worse
than swapping to SCSI?  Does this affect even EIDE drives with
VLB controllers?

: 2> Swapping in linux sucks.

Well, swapping sucks.  Is it worse in linux, though, than in any other OS?

-Lex

From: ec531...@student.uq.edu.au (Robert Brockway)
Subject: Re: Almost dead machine with heavy swapping
Date: 1995/07/22
Message-ID: <3upviv$171@dingo.cc.uq.oz.au>#1/1
X-Deja-AN: 106711488
references: <3umadm$g85@grovel.iafrica.com> <3un5bl$39m@news.randomc.com> 
<3unan3$4q1@hubcap.clemson.edu>
organization: University of Queensland
newsgroups: comp.os.linux.development.system

Lex Spoon (ssp...@hubcap.clemson.edu) wrote:
: Clever dot Net (r...@Clever.randomc.com) wrote:
: : 1> IDE sucks.

: In all seriousness, why is/would swapping to IDE be so much worse
: than swapping to SCSI?  Does this affect even EIDE drives with
: VLB controllers?

: : 2> Swapping in linux sucks.

: Well, swapping sucks.  Is it worse in linux, though, than in any other OS?

No, actually, recent tests carried out on many PC-unices showed Linux to
be better at swapping than all of the others, thanks to the superior
task scheduler that Linux has.
	-Robert

--Robert Brockway, email: ec531...@student.uq.edu.au
                     WWW: http://student.uq.edu.au/~ec531667
Computers: Can't live with them, can't play Doom without them.

From: troin@ensisun ()
Subject: Re: Almost dead machine with heavy swapping
Date: 1995/07/22
Message-ID: <3ur1as$hnp@cicg-communication.grenet.fr>#1/1
X-Deja-AN: 106711509
references: <3umadm$g85@grovel.iafrica.com>
organization: ENSIMAG-INPG, France
reply-to: tr...@ensisun.imag.fr
newsgroups: comp.os.linux.development.system

Steve Davies (st...@iafrica.com) wrote:
: Hi,

: Looking for comments and input:  we run a Linux 1.2.10 machine for
: quite heavy usage.  The machine has 32megs of real RAM and about the
: same of swap, on an IDE drive.

: Normally this machine hardly uses the swap space.  2megs or so tops.

: But every now and then we have a problem where a process will start
: grabbing huge amounts of RAM.  It's often pine when a user tries to
: attach or detach a large MIME attachment.  RSS sizes of 20 to 30
: megabytes have been seen.

: When this happens the machine starts biting heavily into the swap
: (surprise surprise).  It becomes very slow indeed - to the extent that
: it can take many minutes to get logged in as root and "reboot".  If we
: don't catch it in time it can be necessary to "big red switch" the
: machine.

Well, you don't actually need to use the 'big red switch'. My experiments
with Linux show that Linux 1.2.x (and probably late 1.1.x) will recover
from this situation, after a variable amount of time. Linux kills some
processes. Unfortunately, it can kill ANY process, like a daemon if it
has the bad idea of allocating memory when none is available.

I heard that this bad behaviour was fixed in 1.3.x. However, I personally
stick with 1.2.11.

: What can I do?  First off I don't really understand why the
: performance becomes so poor.  Is there something that I can do to
: improve performance under memory load?  Seems to me that the scheduler
: should seriously drop the priority of processes using much more RAM
: than the average!

: Secondly: is there some way that I can limit the maximum memory usage
: of a single process to avoid the problem?

You have two ways:
	use ulimit or limit in the /etc/profile or /etc/csh.profile
	use lshell, which basically does the same. Available on sunsite.
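
As a rough illustration of the ulimit approach on a current Linux system,
the sketch below starts a child process under an address-space cap using
Python's standard `resource` module.  The helper names `clamp_to_hard` and
`run_limited` are made up for this example, and whether a given kernel
actually enforces the limit is something to verify on the machine in
question.

```python
import resource
import subprocess
import sys


def clamp_to_hard(wanted, hard):
    """A soft limit may not exceed the hard limit; clamp if one is set."""
    if hard == resource.RLIM_INFINITY:
        return wanted
    return min(wanted, hard)


def run_limited(cmd, max_bytes):
    """Run cmd with its virtual address space capped at max_bytes (assumed policy)."""
    def apply_limit():
        # Runs in the child between fork() and exec(), so only the child is capped.
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        resource.setrlimit(resource.RLIMIT_AS, (clamp_to_hard(max_bytes, hard), hard))

    return subprocess.run(cmd, preexec_fn=apply_limit,
                          stdout=subprocess.PIPE, check=False)


if __name__ == "__main__":
    # Ask the child to report the soft limit it ended up with.
    probe = [sys.executable, "-c",
             "import resource; print(resource.getrlimit(resource.RLIMIT_AS)[0])"]
    print(run_limited(probe, 2 * 1024**3).stdout.decode().strip())
```

This is roughly what `ulimit -v` in /etc/profile does for every login shell,
except that here the cap applies to a single command.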

					phil.
--
------------------------------------------------------------------------------
Philippe Troin            Etudiant en Architecture des Ordinateurs (ERG/IMAG)
tr...@ensisun.imag.fr     Computer architecture student
ptr...@enserg.fr
------------------------------------------------------------------------------
LEGAL NOTICE: The license to distribute this message through Microsoft Network
	is $500 (FF2500). Posting this message through Microdoft Network 
	constitutes an agreement to these terms. Copyright 1995 Philippe Troin

From: r...@dyson.iquest.net (John S. Dyson)
Subject: Re: Almost dead machine with heavy swapping
Date: 1995/07/22
Message-ID: <3uqv9g$o1l@dyson.iquest.net>#1/1
X-Deja-AN: 106711518
sender: n...@iquest.net (News Admin)
references: <3umadm$g85@grovel.iafrica.com> <3un5bl$39m@news.randomc.com> 
<3unan3$4q1@hubcap.clemson.edu> <3upviv$171@dingo.cc.uq.oz.au>
organization: John S. Dyson's Machine
newsgroups: comp.os.linux.development.system

In article <3upviv$...@dingo.cc.uq.oz.au>,
Robert Brockway <ec531...@student.uq.edu.au> wrote:
>Lex Spoon (ssp...@hubcap.clemson.edu) wrote:
>: Clever dot Net (r...@Clever.randomc.com) wrote:
>: : 1> IDE sucks.
>
>: In all seriousness, why is/would swapping to IDE be so much worse
>: than swapping to SCSI?  Does this affect even EIDE drives with
>: VLB controllers?
>
>
>: : 2> Swapping in linux sucks.
>
>: Well, swapping sucks.  Is it worse in linux, though, than in any other OS?
>
>No, actually, recent tests carried out on many PC-unices showed Linux to
>be better at swapping than all of the others, thanks to the superior
>task scheduler that Linux has.
>	-Robert
>
Actually, I can tell you that many other OSes have seriously broken swapping
and it is not hard to be better than them.  The swapping vs. paging issue has
been investigated seriously on FreeBSD, and there is some algorithmic tuning
that has helped.  Swapping appears to be a b*st*rd step-child of the
memory scheduling issue of various Unix-clones.  Paging is almost as bad.
For example, how many Unix-like OSes actually run statistics on page
usage instead of using the almost useless clock algorithm???  (Clock
actually does work during light memory load conditions -- but usually
falls to pieces during heavy load.)
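
For readers who have not met it, the clock (second-chance) algorithm
mentioned above can be sketched in a few lines.  This is a textbook toy
model, not code from any of the kernels discussed; the class name
`ClockPager` and its methods are invented for illustration.

```python
class ClockPager:
    """Toy second-chance (clock) page replacement over a fixed set of frames."""

    def __init__(self, nframes):
        self.frames = [None] * nframes      # page held by each frame
        self.referenced = [False] * nframes  # one "used recently" bit per frame
        self.hand = 0                        # the clock hand
        self.where = {}                      # page -> frame index

    def access(self, page):
        """Touch a page. Returns True on a hit, False on a page fault."""
        if page in self.where:
            self.referenced[self.where[page]] = True
            return True
        # Fault: sweep the hand, clearing reference bits, until a frame
        # whose bit is already clear turns up. That frame is the victim.
        while self.referenced[self.hand]:
            self.referenced[self.hand] = False
            self.hand = (self.hand + 1) % len(self.frames)
        victim = self.frames[self.hand]
        if victim is not None:
            del self.where[victim]
        self.frames[self.hand] = page
        self.where[page] = self.hand
        self.referenced[self.hand] = True
        self.hand = (self.hand + 1) % len(self.frames)
        return False

    def resident(self):
        """Set of pages currently held in frames."""
        return set(self.where)


if __name__ == "__main__":
    pager = ClockPager(3)
    for p in [1, 2, 3, 1, 4, 2, 5]:
        print(p, "hit" if pager.access(p) else "fault")
```

Note that the only usage information kept per page is a single bit, which is
the limitation the post is complaining about: under heavy load nearly every
bit is set, and the sweep degenerates into round-robin eviction.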

The swapping algorithms used on many commercial Unix OSes actually clash
with the process scheduling!!!  (I know, I used to maintain/debug the
system sources for a major U**X vendor.)  The fix to swapping on the
commercial Unixes is actually very simple...  Don't swap out processes
unless they have been asleep for approx 5 secs or longer.  Additionally,
swap in the UPAGES immediately (or almost immediately) when the process
has been woken up.  Additional policies to this almost always hurt
performance.  (Some early versions of FreeBSD got hurt by just a little too
much "policy" :-)).
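
The rule John describes (leave a process alone unless it has slept for
about five seconds) can be written down as a simple selection function.
This is only a sketch of the policy as stated in the post; the `Proc`
record, its fields, and `swap_out_candidates` are hypothetical names, and a
real kernel would track sleep time in its own scheduler structures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Proc:
    pid: int
    asleep_since: Optional[float]  # time the process went to sleep; None if runnable


def swap_out_candidates(procs: List[Proc], now: float,
                        min_sleep: float = 5.0) -> List[int]:
    """Pids eligible for swap-out: asleep for at least min_sleep seconds.

    Runnable and recently active processes are never returned, so swapping
    does not fight the process scheduler. Longest sleepers come first.
    """
    eligible = [p for p in procs
                if p.asleep_since is not None
                and now - p.asleep_since >= min_sleep]
    eligible.sort(key=lambda p: p.asleep_since)
    return [p.pid for p in eligible]


if __name__ == "__main__":
    table = [Proc(1, None), Proc(2, 98.0), Proc(3, 90.0), Proc(4, 95.0)]
    print(swap_out_candidates(table, now=100.0))
```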

John
dy...@root.com

From: iia...@iifeak.swan.ac.uk (Alan Cox)
Subject: Re: Almost dead machine with heavy swapping
Date: 1995/07/24
Message-ID: <DC8566.BJH@info.swan.ac.uk>#1/1
X-Deja-AN: 106875698
sender: n...@info.swan.ac.uk
x-nntp-posting-host: iifeak.swan.ac.uk
references: <3umadm$g85@grovel.iafrica.com> <3un5bl$39m@news.randomc.com> 
<3unan3$4q1@hubcap.clemson.edu>
organization: Institute For Industrial Information Technology
newsgroups: comp.os.linux.development.system

In article <3unan3$...@hubcap.clemson.edu> ssp...@hubcap.clemson.edu (Lex Spoon) 
writes:
>Clever dot Net (r...@Clever.randomc.com) wrote:
>: 1> IDE sucks.
>In all seriousness, why is/would swapping to IDE be so much worse
>than swapping to SCSI?  Does this affect even EIDE drives with
>VLB controllers?

Yes. Being faster overall, it is less of an issue. When you are doing an IDE
request, that's it. With SCSI you can queue multiple requests and don't have
the CPU tied up if you have a real controller.

This means you can run one program usefully while another is paged.

>: 2> Swapping in linux sucks.
>Well, swapping sucks.  Is it worse in linux, though, than in any other OS?

Yes, Linux is slower than BSD at swapping. Linux + kswap patches
(ftp.presence.co.uk) is about the same as FreeBSD

Alan
-- 
  ..-----------,,----------------------------,,----------------------------,,
 // Alan Cox  //  iia...@www.linux.org.uk   //  GW4PTS@GB7SWN.#45.GBR.EU  //
 ``----------'`----------------------------'`----------------------------''
Redistribution of this message via the Microsoft Network is prohibited

From: st...@iafrica.com (Steve Davies)
Subject: Re: Almost dead machine with heavy swapping
Date: 1995/07/24
Message-ID: <3v11pj$7v2@grovel.iafrica.com>#1/1
X-Deja-AN: 106875708
references: <3umadm$g85@grovel.iafrica.com> 
<3ur1as$hnp@cicg-communication.grenet.fr>
organization: Internet Africa
newsgroups: comp.os.linux.development.system


troin@ensisun () wrote:
> Steve Davies wrote:
>: When this happens the machine starts biting heavily into the swap
>: (surprise surprise).  It becomes very slow indeed - to the extent that
>: it can take many minutes to get logged in as root and "reboot".  If we
>: don't catch it in time it can be necessary to "big red switch" the
>: machine.

>Well, you don't actually need to use the 'big red switch'. My experiments
>with Linux show that Linux 1.2.x (and probably late 1.1.x) will recover
>from this situation, after a variable amount of time. Linux kills some
>processes. Unfortunately, it can kill ANY process, like a daemon if it
>has the bad idea of allocating memory when none is available.

Unfortunately by the time our machine comes back from the dead dozens
of people have phoned us to complain that they can't log in...

>: Secondly: is there some way that I can limit the maximum memory usage
>: of a single process to avoid the problem?

>You have two ways:
>	use ulimit or limit in the /etc/profile or /etc/csh.profile
>	use lshell, which basically does the same. Available on sunsite.

Studying the source for the 1.2.10 kernel indicates that, whilst the
kernel accepts and stores the memory limits set through ulimit,
it does not enforce the set limit.  Experiments in practice agree.
(e.g. You can run emacs with an MSS limited to 1K...)

But I will look for lshell.  Thanks.

From: Mike Jagdis <ja...@purplet.demon.co.uk>
Subject: Re: Almost dead machine with heavy swapping
Date: 1995/07/24
Message-ID: <881.3016BC9B@purplet.demon.co.uk>#1/1
X-Deja-AN: 107041059
x-nntp-posting-host: purplet.demon.co.uk
sender: "newsout1.26" <ufg...@purplet.demon.co.uk>
organization: FidoNet node 2:252/305 - The Purple Tentacle, Reading
newsgroups: comp.os.linux.development.system

* In message <3v11pj$...@grovel.iafrica.com>, Steve Davies said:

SD> Studying the source for the 1.2.10 kernel indicates that, whilst the
SD> kernel accepts and stores the memory limits set through ulimit,
SD> it does not enforce the set limit.  Experiments in practice agree.
SD> (e.g. You can run emacs with an MSS limited to 1K...)

There is certainly a missing limit check in the a.out loader - you might not
be able to brk() 2GB of memory, but if you declare it statically, e.g.
char mem[2UL*1024*1024*1024], the loader will give it to you in the bss :-).

  The patch to allow loading of BSD binaries supplied with the iBCS emulator 
adds this check. Linus didn't want to add this patch before since it would 
break some *very* old Linux binaries which may get misrecognised. Mind you, 
the change to the new development kernel is the traditional time to break 
things :-). Not to mention the move over to ELF anyway...

                                Mike

From: st...@iafrica.com (Steve Davies)
Subject: Re: Almost dead machine with heavy swapping
Date: 1995/07/30
Message-ID: <3vgkil$87v@grovel.iafrica.com>#1/1
X-Deja-AN: 107189591
references: <3umadm$g85@grovel.iafrica.com> <3un5bl$39m@news.randomc.com> 
<3unan3$4q1@hubcap.clemson.edu> <3upviv$171@dingo.cc.uq.oz.au> 
<3uqv9g$o1l@dyson.iquest.net>
organization: Internet Africa
newsgroups: comp.os.linux.development.system

r...@dyson.iquest.net (John S. Dyson) wrote:
>Swapping appears to be a b*st*rd step-child of the
>memory scheduling issue of various Unix-clones.  Paging is almost as bad.
>For example, how many Unix-like OSes actually run statistics on page
>usage instead of using the almost useless clock algorithm???  (Clock
>actually does work during light memory load conditions -- but usually
>falls to pieces during heavy load.)

Thanks for your comments.  I now run a daemon that looks for oversized
processes and kills 'em.  9 times out of 10 it's pine (people have these
huge mailbox files).  This seems to be keeping my machine in hand.
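
A watchdog of this kind can be quite small.  The sketch below is not
Steve's daemon (the thread does not show it); it assumes a modern Linux
/proc where /proc/<pid>/status carries a VmRSS line, and the function names
and the 20 MB threshold are illustrative only.

```python
import os
import signal


def parse_rss_kb(status_text):
    """Resident set size in kB from /proc/<pid>/status text, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line


def oversized(rss_by_pid, limit_kb):
    """Pids whose RSS exceeds limit_kb, largest first."""
    big = [(rss, pid) for pid, rss in rss_by_pid.items() if rss > limit_kb]
    return [pid for _rss, pid in sorted(big, reverse=True)]


def scan(proc_root="/proc"):
    """Map pid -> RSS in kB for every process we are allowed to inspect."""
    rss_by_pid = {}
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "status")) as f:
                rss = parse_rss_kb(f.read())
        except OSError:
            continue  # process exited (or is unreadable) while we were looking
        if rss is not None:
            rss_by_pid[int(name)] = rss
    return rss_by_pid


def kill_oversized(limit_kb=20 * 1024, dry_run=True):
    """Send SIGTERM to oversized processes; with dry_run, only report them."""
    victims = oversized(scan(), limit_kb)
    if not dry_run:
        for pid in victims:
            try:
                os.kill(pid, signal.SIGTERM)
            except OSError:
                pass  # already gone, or not ours to kill
    return victims
```

Run from cron or a sleep loop, `kill_oversized(dry_run=False)` would give
the behaviour described; a real deployment would want a whitelist so that
daemons are never chosen as victims.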

Unfortunately "ulimit"ting RSS isn't implemented in my kernel
(1.2.10).  But someone did suggest limiting data and code space - I may
try that too.

So all in all I have just dodged the page/swapping speed issue.  And I
guess it's easier all round to just chuck in some more RAM!

Thanks,
Steve

From: w...@netcom.com (Ben Wing)
Subject: Serious swapping problems [Don't believe what they tell you]
Date: 1995/07/31
Message-ID: <wingDCK78p.3wz@netcom.com>#1/1
X-Deja-AN: 107189618
sender: w...@netcom19.netcom.com
references: <3umadm$g85@grovel.iafrica.com> <wingDC1Hv4.Gxq@netcom.com> 
<3v2fma$ep4@dingo.cc.uq.oz.au> <3vgsk6$2k0i@majestix.uni-muenster.de>
organization: NETCOM On-line Communication Services (408 261-4700 guest)
newsgroups: comp.os.linux.development.system

OK, I'd like to add some more comments here.  Many people who use
Linux make all sorts of claims about this and that, some of which are
contradicted by my experience and some of which are contradicted
by the NetBSD and FreeBSD people.  So, don't believe everything you
hear ...

So yesterday, on my Linux 1.2.10 machine, I ran a 9-meg shell script that
basically reads

patch <<END_OF_PATCH
[9 megs of stuff here]
END_OF_PATCH

My machine basically locked up for 20-30 minutes -- the "almost dead"
behavior.  At the time I had 32 megs of main memory and 40 megs of swap,
running on a Pentium 90.  The 40 megs of swap were on a SCSI device.

Some days earlier, I ran the same 9-meg shell script, patching the same
set of files, on a Sparc 1+ with 16 megs of main memory and 90 megs of
swap, running Solaris 2.3.  Note that the Sparc 1+ is about three times
or four times as slow as the Pentium 90 (this is verified through the
long hours I've spent waiting for it to finish compiling, the sluggish
behavior I get running XEmacs [not seen on the Pentium 90 even though
the version I run on that machine is slowed down internally by a lot of
internal error-checking], the fact that the processor runs at only 25
MHz, etc.), has slower disk I/O, etc.  However, no such lockup occurred
on the Sparc 1+ -- it was quite usable to edit text, etc., and finished
in maybe 15 minutes.

Therefore, I conclude that something must be abysmally wrong with Linux
swapping.  After this event, I increased my swap space to 160 megs,
but can that really make such a difference?  My Linux machine had
no other load on it at the time.

Are any kernel developers even reading this?

ben
-- 
"... then the day came when the risk to remain tight in a bud was
more painful than the risk it took to blossom." -- Anais Nin

From: torva...@cc.Helsinki.FI (Linus Torvalds)
Subject: Re: Serious swapping problems [Don't believe what they tell you]
Date: 1995/07/31
Message-ID: <3vi5mc$o2f@kruuna.helsinki.fi>#1/1
X-Deja-AN: 107189583
sender: torva...@cc.helsinki.fi
references: <3umadm$g85@grovel.iafrica.com> <3v2fma$ep4@dingo.cc.uq.oz.au> 
<3vgsk6$2k0i@majestix.uni-muenster.de> <wingDCK78p.3wz@netcom.com>
content-type: text/plain; charset=ISO-8859-1
organization: University of Helsinki
mime-version: 1.0
newsgroups: comp.os.linux.development.system

In article <wingDCK78p....@netcom.com>, Ben Wing <w...@netcom.com> wrote:
>OK, I'd like to add some more comments here.  Many people who use
>Linux make all sorts of claims about this and that, some of which are
>contradicted by my experience and some of which are contradicted
>by the NetBSD and FreeBSD people.  So, don't believe everything you
>hear ...
>
>So yesterday, on my Linux 1.2.10 machine, I ran a 9-meg shell script that
>basically reads
>
>patch <<END_OF_PATCH
>[9 megs of stuff here]
>END_OF_PATCH
>
>My machine basically locked up for 20-30 minutes -- the "almost dead"
>behavior.  At the time I had 32 megs of main memory and 40 megs of swap,
>running on a Pentium 90.  The 40 megs of swap were on a SCSI device.
>
>Some days earlier, I ran the same 9-meg shell script, patching the same
>set of files, on a Sparc 1+ with 16 megs of main memory and 90 megs of
>swap, running Solaris 2.3.

This isn't a kernel question, but a shell question.  As far as I know,
bash (the linux /bin/sh) will do here-documents in-memory, which is a
major problem when we're talking about large documents.

Traditional bourne shells will do here-documents in a temporary file. 
So swapping never even enters the picture.  Linux can't use the original
bourne shell, as it's AT&T copyrighted and not freely available. 
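
The difference between the two strategies Linus describes can be
illustrated outside any shell.  The sketch below feeds a large input to a
child process through a temporary file rather than building it up in
memory; it is a toy model of the temp-file approach, not how bash or the
Bourne shell is actually implemented, and `run_with_tempfile_stdin` is an
invented name.

```python
import subprocess
import sys
import tempfile


def run_with_tempfile_stdin(cmd, chunks):
    """Run cmd with stdin taken from a temp file filled chunk by chunk.

    Because `chunks` can be a generator, the parent never holds the whole
    "here-document" in memory; the data lives on disk (and in the page
    cache) instead of in the parent's heap.
    """
    with tempfile.TemporaryFile() as tmp:
        for chunk in chunks:
            tmp.write(chunk)
        tmp.seek(0)  # flushes buffered writes and rewinds for the child
        done = subprocess.run(cmd, stdin=tmp, stdout=subprocess.PIPE, check=True)
    return done.stdout


if __name__ == "__main__":
    # Stand-in for "patch": a child that just counts the bytes it is fed.
    counter = [sys.executable, "-c",
               "import sys; print(len(sys.stdin.buffer.read()))"]
    pieces = (b"x" * 1024 for _ in range(64))
    print(run_with_tempfile_stdin(counter, pieces).decode().strip())
```

In shell terms, the practical workaround for the 9-meg script is the same
idea: keep the patch in its own file and run patch with its input
redirected from that file, instead of embedding it as a here-document.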

		Linus

From: by...@cc.gatech.edu (Byron A Jeff)
Subject: Re: Serious swapping problems [Don't believe what they tell you]
Date: 1995/07/31
Message-ID: <3vjlau$l7k@solaria.cc.gatech.edu>#1/1
X-Deja-AN: 107290935
references: <3umadm$g85@grovel.iafrica.com> <3v2fma$ep4@dingo.cc.uq.oz.au> 
<3vgsk6$2k0i@majestix.uni-muenster.de> <wingDCK78p.3wz@netcom.com>
organization: Georgia Institute of Technology - College of Computing
nntp-posting-user: byron
newsgroups: comp.os.linux.development.system

In article <wingDCK78p....@netcom.com>, Ben Wing <w...@netcom.com> wrote:
> [ Question deleted. Answered elsewhere. ]

>Are any kernel developers even reading this?

Do take note that Linus himself answered your question.
Does that answer your question??

BAJ
-- 
Another random extraction from the mental bit stream of...
Byron A. Jeff - PhD student operating in parallel - And Using Linux!
Georgia Tech, Atlanta GA 30332   Internet: by...@cc.gatech.edu

From: iia...@iifeak.swan.ac.uk (Alan Cox)
Subject: Re: Serious problems with Linux swapping performance
Date: 1995/08/01
Message-ID: <DCMKMw.MH7@info.swan.ac.uk>#1/1
X-Deja-AN: 107290978
sender: n...@info.swan.ac.uk
x-nntp-posting-host: iifeak.swan.ac.uk
references: <3umadm$g85@grovel.iafrica.com> <wingDC1Hv4.Gxq@netcom.com>
organization: Institute For Industrial Information Technology
newsgroups: comp.os.linux.development.system

In article <wingDC1Hv4....@netcom.com> w...@netcom.com (Ben Wing) writes:
>large file into it.  First time I tried, it crashed and hung the system
>for about 30 seconds dumping out a 28 megabyte core file. (In
>comparison, it only takes a few secs to dump out a 20 megabyte core
>file.) Next time I tried, it crashed and hung the system for FIFTEEN
>MINUTES dumping out a 30 megabyte core file. (I have 32 megabytes of
>RAM.)

A core dump hangs only the process writing it (it's a pain and present on
all unixoid systems I've tried). 

>My conclusion is that Linux has some **serious** problems with its
>virtual memory / swap handling.  I'm surprised that more people
>haven't complained about this; I would view this as a very high
>priority item to fix, much more so than moving to ELF or adding
>loadable modules or other sorts of things that seem to be occupying
>the Linux developers' time.

It sounds like you are totally overloading the machine, forcing it to do
a vast amount of work and expecting miracles. Especially if you have IDE
disks you will see a long pause of that process during a core dump and
a lot of activity as each page has to be brought in to write into the core
file.

Hardware limitations are kind of trying to cure 8). Linux already has
everything you need to limit coredump sizes and process sizes. You have
finite resources. For many applications of a system it is not appropriate
to allow all the resources to be taken by one job.
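
One limit of the kind Alan refers to is the core file size limit
(`ulimit -c` in the shell).  A minimal sketch of setting it from inside a
process, using Python's standard `resource` module; `limit_core_size` is a
hypothetical helper name.

```python
import resource


def limit_core_size(max_bytes=0):
    """Cap the size of core files this process (and its children) may write.

    Only the soft limit is lowered, so the hard limit is left untouched.
    Returns the soft limit now in effect.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)  # soft limit may not exceed hard
    resource.setrlimit(resource.RLIMIT_CORE, (max_bytes, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)[0]


if __name__ == "__main__":
    print(limit_core_size(0))  # 0 disables core dumps entirely
```

With the limit at zero, a crashing 30 megabyte process writes no core file
at all, which avoids paging every one of its pages back in just to dump it.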

Alan
-- 
  ..-----------,,----------------------------,,----------------------------,,
 // Alan Cox  //  iia...@www.linux.org.uk   //  GW4PTS@GB7SWN.#45.GBR.EU  //
Redistribution of this message via the Microsoft Network is prohibited
Do you trust your web client. <IMG src="file:/dev/zero"><IMG src="file:/com1:">

From: w...@netcom.com (Ben Wing)
Subject: Re: Serious swapping problems [Don't believe what they tell you]
Date: 1995/08/01
Message-ID: <wingDCnM4p.5J6@netcom.com>#1/1
X-Deja-AN: 107291014
sender: w...@netcom15.netcom.com
references: <3umadm$g85@grovel.iafrica.com> 
<3vgsk6$2k0i@majestix.uni-muenster.de> <wingDCK78p.3wz@netcom.com> 
<3vi5mc$o2f@kruuna.helsinki.fi>
organization: NETCOM On-line Communication Services (408 261-4700 guest)
newsgroups: comp.os.linux.development.system

In article <3vi5mc$...@kruuna.helsinki.fi>,
Linus Torvalds <torva...@cc.Helsinki.FI> wrote:
|In article <wingDCK78p....@netcom.com>, Ben Wing <w...@netcom.com> wrote:
|>OK, I'd like to add some more comments here.  Many people who use
|>Linux make all sorts of claims about this and that, some of which are
|>contradicted by my experience and some of which are contradicted
|>by the NetBSD and FreeBSD people.  So, don't believe everything you
|>hear ...
|>
|>So yesterday, on my Linux 1.2.10 machine, I ran a 9-meg shell script that
|>basically reads
|>
|>patch <<END_OF_PATCH
|>[9 megs of stuff here]
|>END_OF_PATCH
|>
|>My machine basically locked up for 20-30 minutes -- the "almost dead"
|>behavior.  At the time I had 32 megs of main memory and 40 megs of swap,
|>running on a Pentium 90.  The 40 megs of swap were on a SCSI device.
|>
|>Some days earlier, I ran the same 9-meg shell script, patching the same
|>set of files, on a Sparc 1+ with 16 megs of main memory and 90 megs of
|>swap, running Solaris 2.3.
|
|This isn't a kernel question, but a shell question.  As far as I know,
|bash (the linux /bin/sh) will do here-documents in-memory, which is a
|major problem when we're talking about large documents.
|
|Traditional bourne shells will do here-documents in a temporary file. 
|So swapping never even enters the picture.  Linux can't use the original
|bourne shell, as it's AT&T copyrighted and not freely available. 

Thanks for responding.  I was indeed wondering whether the bash/sh
difference had something to do with it.

It's still hard for me to believe, however, that the bash/sh difference
is the only thing going on here.

To wit: swapping should *never* lock up a machine like I've seen.
Granted, not all OS's handle this well, but Linux handles it worse than
anything else I've seen.  In my experience, Linux seems to lock up
whenever you have a single process that's larger than the amount of
RAM you have installed.  Extensive experience with SunOS and Solaris
indicates that this is not the case under those OS's.  I've had plenty
of large processes running on SunOS on 8-meg Sparcs (of the puniest variety
that Sun made -- those dinky 20 MHz machines with the computer inside
of the monitor), and the only time you got catastrophic behavior was
when you had a runaway process whose size started to get up to 80 megabytes
or so (some clueless user logged in remotely and ran, using GNU grep,
'grep -f some_large_file search_string').  Now Linux may not have
the developer resources that Sun has, but I think this should be looked
into fairly soon.

(Not that I'm about to give up Linux -- it works better than Solaris
in many respects, and it doesn't cost hundreds of dollars and runs on
PC's.  Free software is a *good* thing.  I would very much like to see
this fixed, though.)

ben
-- 
"... then the day came when the risk to remain tight in a bud was
more painful than the risk it took to blossom." -- Anais Nin

From: iia...@iifeak.swan.ac.uk (Alan Cox)
Subject: Re: Almost dead machine with heavy swapping
Date: 1995/08/02
Message-ID: <DCoFt0.FA2@info.swan.ac.uk>#1/1
X-Deja-AN: 107290941
sender: n...@info.swan.ac.uk
x-nntp-posting-host: iifeak.swan.ac.uk
references: <3un5bl$39m@news.randomc.com> <3unan3$4q1@hubcap.clemson.edu> 
<DC2KFx.BA.0.omega-3@interaccess.com>
organization: Institute For Industrial Information Technology
newsgroups: comp.os.linux.development.system

In article <DC2KFx.BA.0.omeg...@interaccess.com> rnich...@interaccess.com 
(Robert Nichols) writes:
>In practice:  Swapping is also pretty bad with a SCSI disk and a
>    busmastering SCSI adapter.

Swapping is always bad, no disk is approaching RAM speed. How your machine
degrades then depends on the job mix and I/O subsystem. In the good case
another CPU bound task gets to run in all the gaps while the big job swaps.
In the bad case the swapping and a load of I/O bound processes fight for the 
disk.

>poor capabilities of the PC's I/O architecture.

ISA/EISA maybe, PCI however is a real bus.

Alan

-- 
  ..-----------,,----------------------------,,----------------------------,,
 // Alan Cox  //  iia...@www.linux.org.uk   //  GW4PTS@GB7SWN.#45.GBR.EU  //
Redistribution of this message via the Microsoft Network is prohibited
Do you trust your web client. <IMG src="file:/dev/zero"><IMG src="file:/com1:">

From: simonal...@cix.compulink.co.uk ("Simon P Allen")
Subject: Re: Serious swapping problems [Don't believe what they tell you]
Date: 1995/08/02
Message-ID: <DCp6KD.IAI@cix.compulink.co.uk>#1/1
X-Deja-AN: 107433156
references: <wingDCK78p.3wz@netcom.com>
organization: Linux Developer
x-news-software: Ameol
newsgroups: comp.os.linux.development.system


Of course, you made sure you were running the same shell on both the 
Pentium and the Sparc to make the test perfectly fair?  Of course you 
did...

From: w...@netcom.com (Ben Wing)
Subject: Re: Serious swapping problems [Don't believe what they tell you]
Date: 1995/08/03
Message-ID: <wingDCqJGE.5MA@netcom.com>#1/1
X-Deja-AN: 107433117
sender: w...@netcom11.netcom.com
references: <wingDCK78p.3wz@netcom.com> <DCp6KD.IAI@cix.compulink.co.uk>
organization: NETCOM On-line Communication Services (408 261-4700 guest)
newsgroups: comp.os.linux.development.system

In article <DCp6KD....@cix.compulink.co.uk>,
Simon P Allen <simonal...@cix.compulink.co.uk> wrote:
|
|Of course, you made sure you were running the same shell on both the 
|Pentium and the Sparc to make the test perfectly fair?  Of course you 
|did...

I seem to be getting a zillion responses saying "swapping is fine,
you must be having some other problems".  Yet I've seen this
catastrophic swapping over and over.  Telling me again and again that
I must be hallucinating is not going to change this.

Here's another data point:

Just now I started up the latest XEmacs that I'm developing, and
then attached to it using gdb.  Normally this gives me no problems,
even though the running XEmacs is a fairly large application.
However, silly me, I decided to actually try and set a watchpoint.
I had done the same thing before on Sparcs when debugging XEmacs,
and found that execution was three or four times slower
but otherwise worked fine. (Sparcs don't have hardware watchpoints
like the x86 does, so you have to use page faults.) I figured that
things would be at least as good on the x86, since (I think)
there are true hardware watchpoints. (Although I could be wrong,
maybe there are only hardware breakpoints.)

Anyway, this is not the behavior I observed.  Instead, my machine
started swapping madly (a not uncommon experience whenever I try
to do something out of the ordinary).  I tried to use C-c in the
gdb window to stop the madness, but it didn't do anything.  So
I went to the root window and brought up another xterm. (Note that
all these actions feel like molasses due to the swapping madness.)
At this point, my other swap partition comes into action. (I have
a 40 megabyte swap partition on a SCSI disk and a secondary 128
megabyte swap partition on an IDE disk, just to quell the idea
that I might not have enough swap.) After about 5 minutes, the
xterm finally appears.  So I run ps to find the XEmacs process,
and I'm just about to kill it ...

And then Linux freezes up.  Disk activity shuts down to almost
nothing, and the mouse and keyboard are completely unresponsive.
Ctrl-Alt-Del doesn't work either.  I've seen this behavior once
or twice before when the swapping got catastrophic and didn't
stop of its own accord.  I ended up having to hard reset the
machine.

This is butt ugly.  Maybe, possibly, the freeze is a hardware
problem (although my SCSI controller is quite standard -- Adaptec
1542B -- and I've seen the same behavior with two completely
different motherboards with different chipsets, I/O controllers,
etc.).  You could perhaps try to blame gdb for the behavior
I just saw.  But there's no way that the kernel can escape
some blame.  It's just way too improbable.

ben
-- 
"... then the day came when the risk to remain tight in a bud was
more painful than the risk it took to blossom." -- Anais Nin