From: Peter Waltenberg <pet...@dascom.com>
Subject: SMP Scheduling
Date: 1999/08/08
Message-ID: <fa.mkkckbv.d4k6o4@ifi.uio.no>#1/1
X-Deja-AN: 510397957
Original-Date: Mon, 09 Aug 1999 08:59:30 +1000 (EST)
Sender: owner-linux-ker...@vger.rutgers.edu
Content-Transfer-Encoding: 8bit
Original-Message-ID: <XFMail.990809085930.peterw@surf.dascom.com>
To: andreas.bo...@munich.netsurf.de, linux-ker...@vger.rutgers.edu
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
MIME-Version: 1.0
Reply-To: pet...@dascom.com
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

I also have a dual CPU machine.

Under 2.0 if you ran a CPU hog it'd pretty well stick to one CPU.

I.e. if you had xosview running you'd see one CPU at 100%, the other mostly
idle. If there was a load burst, it might move to the other CPU, but that was
pretty unusual.

Under 2.2 you see that one CPU hog hopping between CPUs, and at regular intervals.
Using xosview to track load, what you see is a picket fence effect.
And there are more than "3 processes" running, more like 80 on my machine,
so running xosview alone shouldn't be enough to force this to happen, and if
it were, the other processes should be introducing enough noise to make the
CPU swapping more erratic.
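
For anyone who would rather count the hops than eyeball xosview, here is a
minimal sketch. It assumes a later Linux with glibc's sched_getcpu(), which
the 2.0 and 2.2 kernels discussed here do not provide, and count_migrations
is a made-up helper name. It busy-polls, so the caller acts as the CPU hog.

```c
#define _GNU_SOURCE
#include <sched.h>   /* sched_getcpu (glibc extension, assumed available) */

/* Poll the current CPU `samples` times and count how often it changes.
   Returns the number of observed migrations, or -1 if sched_getcpu fails. */
long count_migrations(long samples)
{
    int last = sched_getcpu();
    long moves = 0;

    if (last < 0)
        return -1;
    for (long n = 0; n < samples; n++) {
        int cpu = sched_getcpu();
        if (cpu < 0)
            return -1;
        if (cpu != last) {      /* the process was moved since the last poll */
            moves++;
            last = cpu;
        }
    }
    return moves;
}
```

Calling it with a large sample count on an otherwise idle box and printing
the result gives a rough migration count for the run; it says nothing about
why the moves happened.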

This does seem to be "wrong": not so much that the process is changing CPUs,
that's reasonable, but that it's doing it with such regularity now.

I know this has been reported before, and plausible explanations have been
offered. However, plausible isn't the same as "correct", and this does seem to
be a symptom of a real problem, or at least a real change in behaviour.


Peter

----------------------------------
E-Mail: Peter Waltenberg <pet...@surf.dascom.com>
Date: 09-Aug-99
Time: 08:45:38

This message was sent by XFMail
----------------------------------

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

From: Horst von Brand <vonbr...@sleipnir.valparaiso.cl>
Subject: Re: SMP Scheduling 
Date: 1999/08/09
Message-ID: <fa.ipb41gv.g6a58h@ifi.uio.no>#1/1
X-Deja-AN: 510457379
Original-Date: Sun, 08 Aug 1999 21:28:50 -0400
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-Id: <199908090128.VAA04908@sleipnir.valparaiso.cl>
References: <fa.mkkckbv.d4k6o4@ifi.uio.no>
To: pet...@dascom.com
X-Orcpt: rfc822;linux-kernel-outgoing-dig
X-charset: ISO_8859-1
Organization: Internet mailing list
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

Peter Waltenberg <pet...@dascom.com> said:
[...]

> Under 2.0 if you ran a CPU hog it'd pretty well stick to one CPU.

How is that an improvement?

> I.e. if you had xosview running you'd see one CPU at 100%, the other mostly
> idle. If there was a load burst, it might move to the other CPU, but that was
> pretty unusual.

> Under 2.2 you see that one CPU hog hopping CPU's and at regular intervals.
> Using xosview to track load what you see is a picket fence effect.
> And there are more than "3 processes" running, more like 80 on my machine,
> so running xosview alone shouldn't be enough to force this to happen and if
> it were, the other processes should be introducing enough noise to make the
> CPU swapping more erratic.

If you have that many processes running, your hog will have its state at
the CPU flushed anyway, so the CPU selected is irrelevant.

> This does seem to be "wrong", not so much that the process is changing CPU's,
> thats reasonable, but the fact that it's doing it with such regularity
> now.

File it under "random trivia" then ;-)
-- 
Horst von Brand                             vonbr...@sleipnir.valparaiso.cl
Casilla 9G, Viña del Mar, Chile                               +56 32 672616


From: Peter Waltenberg <pet...@dascom.com>
Subject: Re: SMP Scheduling
Date: 1999/08/09
Message-ID: <fa.mi4akrv.bki5o3@ifi.uio.no>#1/1
X-Deja-AN: 510465552
Original-Date: Mon, 09 Aug 1999 13:24:17 +1000 (EST)
Sender: owner-linux-ker...@vger.rutgers.edu
Content-Transfer-Encoding: 8bit
Original-Message-ID: <XFMail.990809132417.peterw@surf.dascom.com>
References: <fa.ipb41gv.g6a58h@ifi.uio.no>
To: Horst von Brand <vonbr...@sleipnir.valparaiso.cl>
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
MIME-Version: 1.0
Reply-To: pet...@dascom.com
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu


On 09-Aug-99 Horst von Brand wrote:
> Peter Waltenberg <pet...@dascom.com> said:
> [...]
> 
>> Under 2.0 if you ran a CPU hog it'd pretty well stick to one CPU.
> 
> How is that an improvement?
> 
>> I.e. if you had xosview running you'd see one CPU at 100%, the other mostly
>> idle. If there was a load burst, it might move to the other CPU, but that
>> was
>> pretty unusual.
> 
>> Under 2.2 you see that one CPU hog hopping CPU's and at regular intervals.
>> Using xosview to track load what you see is a picket fence effect.
>> And there are more than "3 processes" running, more like 80 on my machine,
>> so running xosview alone shouldn't be enough to force this to happen and if
>> it were, the other processes should be introducing enough noise to make the
>> CPU swapping more erratic.
> 
> If you have that many processes running, your hog will have its state at
> the CPU flushed anyway, so the CPU selected is irrelevant.
> 
>> This does seem to be "wrong", not so much that the process is changing
>> CPU's,
>> thats reasonable, but the fact that it's doing it with such regularity
>> now.
> 
> File it under "random trivia" then ;-)

I'd expect the process to be flushed; in that case, though, I'd expect it to be
re-run on some random CPU. That doesn't seem to happen: the process
swaps CPUs at REGULAR intervals. The scheduler is supposedly designed so that a
process will have a tendency to run on the same CPU.
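
To make "a tendency to run on the same CPU" concrete, here is a toy model.
It is not the 2.2 scheduler code: each runnable task gets a score, and a task
that last ran on the choosing CPU gets a bonus. All names and the bonus value
are invented for illustration.

```c
#include <stddef.h>  /* size_t */

/* Hypothetical task record for the toy model. */
struct toy_task {
    int ticks_left;  /* remaining timeslice; more ticks means more deserving */
    int last_cpu;    /* CPU the task last ran on */
};

#define TOY_AFFINITY_BONUS 10  /* assumed value, chosen only for the example */

/* Score a task from the point of view of `this_cpu`. */
int toy_score(const struct toy_task *t, int this_cpu)
{
    int score = t->ticks_left;

    if (t->last_cpu == this_cpu)
        score += TOY_AFFINITY_BONUS;  /* prefer tasks whose cache state is here */
    return score;
}

/* Return the index of the best-scoring task for `this_cpu`, or -1 if n is 0. */
int toy_pick(const struct toy_task *tasks, size_t n, int this_cpu)
{
    int best = -1;
    int best_score = -1;

    for (size_t k = 0; k < n; k++) {
        int s = toy_score(&tasks[k], this_cpu);
        if (s > best_score) {
            best_score = s;
            best = (int)k;
        }
    }
    return best;
}
```

With a bonus of 10, a task with 5 ticks left that last ran here beats a task
with 8 ticks left that last ran elsewhere, but loses to one with 20 ticks
left. So a bias like this gives a tendency, not a guarantee.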

It's not that the process changes CPUs, it's that it's doing it at
regular intervals that I find worrying.
Yes, that could just be coincidence, or it could be a real problem.
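
Whether the hops are "regular" or just coincidence can be put as a number.
A rough sketch, assuming you have already logged the times at which the
process changed CPU (gap_spread and the timestamps in the usage note are
illustrative, not measurements from this thread):

```c
#include <stddef.h>  /* size_t */

/* Relative spread of the gaps between successive migration timestamps:
   (longest gap - shortest gap) / mean gap.
   t must be sorted in increasing order. Returns -1.0 if there are fewer
   than two gaps or the mean gap is not positive. */
double gap_spread(const double *t, size_t n)
{
    if (n < 3)
        return -1.0;

    double min = t[1] - t[0];
    double max = min;
    for (size_t k = 2; k < n; k++) {
        double gap = t[k] - t[k - 1];
        if (gap < min)
            min = gap;
        if (gap > max)
            max = gap;
    }

    double mean = (t[n - 1] - t[0]) / (double)(n - 1);
    if (mean <= 0.0)
        return -1.0;
    return (max - min) / mean;
}
```

Evenly spaced hops at 0, 200, 400 and 600 ms give 0. Hops at 0, 100, 400 and
600 ms give 1, meaning the gaps differ by as much as their average. A picket
fence would sit near 0.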

I would file it under random trivia, but the costs of moving a process from
CPU to CPU are (relatively) quite high compared with the other kernel
overheads and it SEEMS to be happening when it's not necessary. It's not
just me seeing this, I think we have 3 or 4 separate reports now.

Anyone want to produce figures for how large a % of our timeslice a cache
refill takes? I get a very small number at 100 Hz; however, if we increase
the scheduling rate that obviously gets worse fairly quickly.
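
The arithmetic can be sketched as follows. It treats one tick as the
timeslice and assumes the refill is limited purely by memory bandwidth; both
are simplifications, and refill_fraction and its parameters are names made up
for this example.

```c
/* Fraction of one scheduler tick spent refilling the cache after a move.
   cache_bytes:   amount of cached state to reload (assumed)
   bytes_per_sec: sustained memory bandwidth (assumed)
   hz:            scheduler tick rate */
double refill_fraction(double cache_bytes, double bytes_per_sec, double hz)
{
    double refill_seconds = cache_bytes / bytes_per_sec;
    double tick_seconds = 1.0 / hz;

    return refill_seconds / tick_seconds;
}
```

With placeholder inputs of 128 KB to reload and 400 MB/s of bandwidth (not
measurements), this gives about 3% of a tick at 100 Hz and about 33% at
1000 Hz. The cost per refill is fixed, so the fraction grows linearly with
the tick rate.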

And is there anyone out there with a box with more than 2 CPUs? If there's no
scheduler problem, then with one CPU hog running I'd expect it to jump back and
forth between two CPUs at most; if it gets cycled around all 4 CPUs in a
regular pattern, I'd say there's very likely to be a problem.
 

Peter
----------------------------------
E-Mail: Peter Waltenberg <pet...@surf.dascom.com>
Date: 09-Aug-99
Time: 12:49:43

This message was sent by XFMail
----------------------------------


From: Andrea Arcangeli <and...@suse.de>
Subject: Re: SMP Scheduling
Date: 1999/08/09
Message-ID: <fa.j4b0fov.1656n3c@ifi.uio.no>#1/1
X-Deja-AN: 510546052
Original-Date: Mon, 9 Aug 1999 12:06:09 +0200 (CEST)
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <Pine.LNX.4.10.9908091202070.7447-100000@laser.random>
References: <fa.mi4akrv.bki5o3@ifi.uio.no>
To: Peter Waltenberg <pet...@dascom.com>
X-Sender: and...@laser.random
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
X-Public-Key-URL: http://e-mind.com/~andrea/aa.asc
MIME-Version: 1.0
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

On Mon, 9 Aug 1999, Peter Waltenberg wrote:

>It's not that the process changes CPU's, it's that it's doing it at
>regular intervals that I find worrying. 

Please apply this patch against 2.2.10 or 2.3.x and let me know if it
helps:

	ftp://ftp.suse.com/pub/people/andrea/kernel-patches/2.2.10/SMP-scheduler-2.2.10-C
	ftp://master.softaplic.com.br/pub/andrea/kernel-patches/2.2.10/SMP-scheduler-2.2.10-C
	ftp://ftp.linux.it/pub/People/andrea/kernel-patches/2.2.10/SMP-scheduler-2.2.10-C
	ftp://e-mind.com/pub/andrea/kernel-patches/2.2.10/SMP-scheduler-2.2.10-C

Andrea



From: Peter Waltenberg <pet...@dascom.com>
Subject: SMP Scheduling. Followup
Date: 1999/08/16
Message-ID: <fa.mekijjv.e4q5g3@ifi.uio.no>#1/1
X-Deja-AN: 513241626
Original-Date: Mon, 16 Aug 1999 14:30:51 +1000 (EST)
Sender: owner-linux-ker...@vger.rutgers.edu
Content-Transfer-Encoding: 8bit
Original-Message-ID: <XFMail.990816143051.peterw@surf.dascom.com>
To: linux-ker...@vger.rutgers.edu
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
MIME-Version: 1.0
Reply-To: pet...@dascom.com
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

O.K., since I noted that there might be problems with SMP scheduling,
I've gathered quite a collection of replies.
Some fall into the "plausible but not necessarily correct" category:

xosview is causing the problem.
A red herring. Yes, running it can disturb the scheduler. Is that the
cause of the problem? Probably not (see below).

Interrupts.
Yes, they get serviced on both CPUs; however, interrupts don't get
scheduled, they just eat a (hopefully small) hole in cache and CPU time,
then go away again.


And in the best traditions of Linux... here's the code.

Thanks to Andrea Arcangeli for the alternate scheduling policy. 
============================ CUT ================================
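/*
  Program to check for scheduling problems on SMP systems.
*/

#include <stdio.h>      /* printf */
#include <stdlib.h>
#include <string.h>     /* memset */
#include <time.h>

#define NITER 100000
/* change this to match your cache size */
#define BUFLEN (128*1024)       /* 128k (size of Celeron cache) */

int p[BUFLEN/sizeof(int)];      /* Hopefully gcc aligns this for us */

int main(void)
{
        time_t t, t1;
        int i;

        /* Runs until interrupted, printing one timing line per pass. */
        while (1) {
                time(&t);
                /* Rewrite the whole buffer NITER times so the working
                   set stays in cache for as long as we stay on one CPU. */
                for (i = 0; i < NITER; i++)
                        memset(p, i, BUFLEN);
                time(&t1);
                t1 -= t;
                printf("%ld seconds for %d iterations\n", (long)t1, NITER);
        }
}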
============================ CUT ===================================

                          Standard Scheduler      Andrea SMP-C
Console                   17-18 seconds           9-10 seconds
Console + X (xdm login)   24-26 seconds           9-10 seconds


I'll agree it's fairly pathological, but it's also the limiting case of a
well written x86 program: it does most of its work in cache.
Programs that are written with performance in mind will tend to approach this.

Note: This is the hit a single cache heavy process takes with the current
scheduler. It's possibly representative of games and some simulation work;
how well that relates to "real life" is another matter.
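
One way to estimate how much of that hit comes from migration alone is to
pin the test process to a single CPU and time it again. The sketch below
assumes a later Linux with sched_setaffinity() and sched_getcpu(), neither of
which exists in the 2.2 kernels tested here; pin_to_current_cpu is a made-up
helper name.

```c
#define _GNU_SOURCE
#include <sched.h>   /* sched_getcpu, sched_setaffinity, CPU_ZERO, CPU_SET */

/* Restrict the calling process to the CPU it is running on right now.
   Returns that CPU number on success, -1 on failure. */
int pin_to_current_cpu(void)
{
    cpu_set_t mask;
    int cpu = sched_getcpu();

    if (cpu < 0)
        return -1;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    return cpu;
}
```

Calling this once at the top of main() in a cache-heavy test removes
migration from the picture, so any remaining difference between runs has
some other cause.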

Results with multiple "hog" processes running also show Andrea's scheduler
performing better than the standard one. 

If anyone has doubts, there's the code. Adjust the buffer size to match the
cache size in your machine and run it yourself.
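
If you don't know your cache size offhand, a small helper can ask the C
library. This is a sketch that assumes glibc, where _SC_LEVEL2_CACHE_SIZE is
an extension to sysconf(); on other libraries, or where the value is not
reported, it returns the fallback you pass in. l2_cache_bytes is a made-up
name.

```c
#include <unistd.h>  /* sysconf */

/* Return the L2 cache size in bytes as reported by sysconf, or `fallback`
   when the C library does not know it. */
long l2_cache_bytes(long fallback)
{
#ifdef _SC_LEVEL2_CACHE_SIZE
    long size = sysconf(_SC_LEVEL2_CACHE_SIZE);

    if (size > 0)
        return size;
#endif
    return fallback;
}
```

Print l2_cache_bytes(128 * 1024) and use the result as the buffer size when
building the test.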

Andrea's patches are available on:
ftp://ftp.suse.com/pub/people/andrea/kernel-patches/


I'm not saying we should re-write the scheduler on the basis of one
pathological test case, BUT there is now hard evidence to show that
there are cases where the current scheduler is far from optimal, and 
that it can be altered to obtain substantial improvements.



Peter




----------------------------------
E-Mail: Peter Waltenberg <pet...@surf.dascom.com>
Date: 16-Aug-99
Time: 14:19:01

This message was sent by XFMail
----------------------------------
