low-latency scheduling patch for 2.4.0

Path: supernews.google.com!sn-xit-03!supernews.com!
cyclone-sjo1.usenetserver.com!news-out.usenetserver.com!
newsxfer.interpacket.net!news-hog.berkeley.edu!ucberkeley!
hrotti.ifi.uio.no!ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Message-ID: <3A57DA3E.6AB70887@uow.edu.au>
Original-Date: 	Sun, 07 Jan 2001 13:53:50 +1100
From: Andrew Morton <andr...@uow.edu.au>
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.4.0-test8 i586)
X-Accept-Language: en
MIME-Version: 1.0
To: lkml <linux-ker...@vger.kernel.org>,
        lad <linux-audio-...@ginette.musique.umontreal.ca>
Subject: low-latency scheduling patch for 2.4.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Sun, 7 Jan 2001 02:48:30 GMT
Message-ID: <fa.dsf5flv.e0i3p4@ifi.uio.no>
Lines: 49


A patch against kernel 2.4.0 final which provides low-latency
scheduling is at

	http://www.uow.edu.au/~andrewm/linux/schedlat.html#downloads

Some notes:

- Worst-case scheduling latency with *very* intense workloads is now
  0.8 milliseconds on a 500MHz uniprocessor.

  For normal workloads you can expect to achieve better than 0.5
  milliseconds consistently.  For example, worst-case latency between entry
  to an interrupt routine and activation of a usermode process during a
  `make clean && make bzImage' is 0.35 milliseconds.  This is one to
  three orders of magnitude better than BeOS, MacOS and the Windowses.

- Low latency is enabled from the `Processor type and features'
  kernel configuration menu for all architectures.  It would be nice to
  hear from non-x86 users.

- The SMP problem hasn't been addressed.  Enabling low-latency for
  SMP works well under normal workloads but comes unstuck under very
  heavy workloads.  I'll be taking a further look at this.

- The supporting tools `rtc_debug' and `amlat' have been updated. 
  These are quite useful tools for providing accurate measurement of
  latencies.  They may also be used to identify the causes of poor
  latency in the kernel.

- The list of remaining problem areas (the Don't Do That list) is
  pretty small:

  - Scrolling the fb console.
  - Running hdparm.
  - Using LILO.
  - Starting the X server.

- Low latency will probably only be achieved when using the ext2 and
  NFS filesystems.

- If you care about latency, be *very* cautious about upgrading to
  XFree86 4.x.  I'll cover this issue in a separate email, copied
  to the XFree team.
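The worst-case figures above were measured with Andrew's `rtc_debug' and `amlat' tools, whose source is not reproduced here. As a rough illustration of the same kind of measurement, here is a minimal user-space sketch (not either of those tools) that sleeps until an absolute periodic deadline with POSIX clock_nanosleep() and records how late each wakeup is. The 1 ms period and sample count are arbitrary assumptions, and the reported figure also includes timer granularity, so it is an upper bound on scheduling latency rather than a pure measure of it.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical parameters: 1 ms period, 1000 samples. */
#define PERIOD_NS 1000000L
#define SAMPLES   1000

static long long ts_to_ns(const struct timespec *ts)
{
    return (long long)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Sleep on an absolute periodic deadline `samples` times and return the
 * worst observed wakeup lateness in nanoseconds. */
long long measure_worst_latency_ns(int samples, long period_ns)
{
    struct timespec next, now;
    long long worst = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < samples; i++) {
        /* Advance the absolute deadline by one period. */
        next.tv_nsec += period_ns;
        while (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);

        /* How far past the deadline did we actually get to run? */
        long long late = ts_to_ns(&now) - ts_to_ns(&next);
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

Calling measure_worst_latency_ns(SAMPLES, PERIOD_NS) while a heavy workload such as a kernel build runs in the background gives a crude worst-case figure; running the measuring process under a realtime policy such as SCHED_FIFO would normally be needed for the number to be meaningful.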

-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

Path: supernews.google.com!sn-xit-03!supernews.com!newsfeed.ision.net!
ision!newsfeed.wirehub.nl!news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!
hrotti.ifi.uio.no!ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
From: Jay Ts <j...@toltec.metran.cx>
Original-Message-Id: <200101110312.UAA06343@toltec.metran.cx>
Subject: Re: [linux-audio-dev] low-latency scheduling patch for 2.4.0
To: andr...@uow.edu.au (Andrew Morton)
Original-Date: 	Wed, 10 Jan 2001 20:12:18 -0700 (MST)
Cc: linux-ker...@vger.kernel.org (lkml),
        linux-audio-...@ginette.musique.umontreal.ca (lad)
In-Reply-To: <3A57DA3E.6AB70887@uow.edu.au> from "Andrew Morton" at Jan 07, 2001 01:53:50 PM
Reply-To: ja...@bigfoot.com
X-Mailer: ELM [version 2.5 PL1]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Thu, 11 Jan 2001 03:13:33 GMT
Message-ID: <fa.gqp2b7v.16k67qu@ifi.uio.no>
References: <fa.dsf5flv.e0i3p4@ifi.uio.no>
Lines: 39

> A patch against kernel 2.4.0 final which provides low-latency
> scheduling is at
> 
> 	http://www.uow.edu.au/~andrewm/linux/schedlat.html#downloads
> 
> Some notes:
> 
> - Worst-case scheduling latency with *very* intense workloads is now
>   0.8 milliseconds on a 500MHz uniprocessor.

Wow!  That's super.  Now about the only thing left is to get it included
in the standard kernel.  Do you think Linus Torvalds is more likely
to accept these patches than Ingo's?  I sure hope this one works out.

>   This is one to
>   three orders of magnitude better than BeOS, MacOS and the Windowses.

** salivates **

> - Low latency will probably only be achieved when using the ext2 and
>   NFS filesystems.

Well it's extremely nice to see NFS included at least.  I was really
worried about that one.  What about Samba?  (Keeping in mind that
serious "professional" musicians will likely have their Linux systems
networked to a Windows box, at least until they have all the necessary
tools on Linux.)

> - If you care about latency, be *very* cautious about upgrading to
>   XFree86 4.x.  I'll cover this issue in a separate email, copied
>   to the XFree team.

Did that email pass by me unnoticed?  What's the prob with XF86 4.0?

- Jay Ts

Path: supernews.google.com!sn-xit-03!supernews.com!newsfeed.wirehub.nl!
news.maxwell.syr.edu!npeer.kpnqwest.net!EU.net!Norway.EU.net!uninett.no!
uio.no!nntp.uio.no!hrotti.ifi.uio.no!ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: 	Wed, 10 Jan 2001 21:19:41 -0800
Original-Message-Id: <200101110519.VAA02784@pizda.ninka.net>
From: "David S. Miller" <da...@redhat.com>
To: andr...@uow.edu.au
CC: ja...@bigfoot.com, linux-ker...@vger.kernel.org,
        linux-audio-...@ginette.musique.umontreal.ca, xp...@xfree86.org,
        mcric...@mpp.ecs.umass.edu
In-reply-to: <3A5D994A.1568A4D5@uow.edu.au> (message from Andrew Morton on
	Thu, 11 Jan 2001 22:30:18 +1100)
Subject: Re: [linux-audio-dev] low-latency scheduling patch for 2.4.0
Original-References: <3A57DA3E.6AB70...@uow.edu.au> 
from "Andrew Morton" at Jan 07, 2001 01:53:50 PM 
<200101110312.UAA06...@toltec.metran.cx> <3A5D994A.1568A...@uow.edu.au>
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Thu, 11 Jan 2001 13:20:45 GMT
Message-ID: <fa.hcd89gv.15n8608@ifi.uio.no>
References: <fa.dmfdbtv.1m0es2r@ifi.uio.no>
Lines: 45


Just some commentary and a bug report on your patch, Andrew:

Opinion: Personally, I think the approach in Andrew's patch
	 is the way to go.

	 Not because it can give the absolute best results.
	 But rather, it is because it says "here is where a lot
         of time is spent".

	 This has two huge benefits:
	 1) It tells us where possible algorithmic improvements may
	    be possible.  In some cases we may be able to improve the
	    code to the point where the pre-emption points are no
	    longer necessary and can thus be removed.
	 2) It affects only code which can burn a lot of cpu without
	    scheduling.  Compare this to schemes which make the kernel
	    fully pre-emptable, causing _EVERYONE_ to pay the price of
	    low-latency.  If we were to later find algorithmic
	    improvements to the high-latency pieces of code, we
            couldn't then just "undo" support for pre-emption because
	    dependencies will have swept across the whole kernel
	    already.

            Pre-emption, by itself, also doesn't help in situations
	    where lots of time is spent while holding spinlocks.
	    There are several other operating systems which support
	    pre-emption where you will find hard coded calls to the
	    scheduler in time-consuming code.  Heh, it's almost like,
	    "what's the frigging point of pre-emption then if you
	    still have to manually check in some spots?"
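The explicit-rescheduling-point approach described above can be sketched in user space. In this minimal sketch the hypothetical atomic flag resched_requested stands in for the kernel's per-task "reschedule needed" indication, sched_yield() stands in for calling the scheduler, and CHECK_INTERVAL is an arbitrary assumption. Only the long-running loop pays for the check, and the check can be deleted later if the loop itself is made cheaper.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's "reschedule needed" flag. */
static atomic_int resched_requested;

/* Hypothetical helper: yield the CPU only if someone asked for it.
 * Returns 1 if it yielded, 0 otherwise. */
static int maybe_reschedule(void)
{
    if (atomic_exchange(&resched_requested, 0)) {
        sched_yield();
        return 1;
    }
    return 0;
}

/* Check for a pending reschedule every CHECK_INTERVAL items (assumed). */
#define CHECK_INTERVAL 256

/* A long-running loop with an explicit rescheduling point, so no single
 * stretch of work runs unbounded without giving the scheduler a chance.
 * Returns the number of items processed; *yields counts the reschedules. */
size_t process_items(long *items, size_t n, size_t *yields)
{
    size_t done = 0;

    *yields = 0;
    for (size_t i = 0; i < n; i++) {
        items[i] *= 2;          /* stand-in for the real work */
        done++;
        if ((i + 1) % CHECK_INTERVAL == 0)
            *yields += maybe_reschedule();
    }
    return done;
}
```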

Bug:	In the tcp_minisock.c changes, if you bail out of the loop
	early (ie. max_killed=1) you do not decrement tcp_tw_count
	by killed, which corrupts the state of the TIME_WAIT socket
	reaper.  The fix is simple, just duplicate the tcp_tw_count
	decrement into the "if (max_killed)" code block.
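The tcp_minisock.c code itself is not quoted in this thread, so the following is only a user-space sketch with hypothetical names (tw_count, reap_bucket, REAP_BATCH) showing the shape of the fix described above: when the loop stops early and sets max_killed, the global count must still be reduced by killed before returning.

```c
#include <stddef.h>

/* Hypothetical global mirroring tcp_tw_count: number of live entries. */
static int tw_count;

/* Hypothetical per-pass limit on how many entries one pass may kill. */
#define REAP_BATCH 100

/* Reap expired entries (expired[i] != 0) from a bucket of n entries.
 * Returns how many were killed in this pass. */
int reap_bucket(int *expired, size_t n)
{
    int killed = 0;
    int max_killed = 0;     /* set to 1 when we stop early */

    for (size_t i = 0; i < n; i++) {
        if (!expired[i])
            continue;
        expired[i] = 0;     /* "free" the entry */
        if (++killed == REAP_BATCH) {
            max_killed = 1; /* leave the rest for a later pass */
            break;
        }
    }

    if (max_killed) {
        /* The fix: duplicate the decrement here.  Without it the early
         * return leaves tw_count too high and the reaper's state is
         * corrupted. */
        tw_count -= killed;
        return killed;
    }

    tw_count -= killed;
    return killed;
}
```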

Later,
David S. Miller
da...@redhat.com

Path: supernews.google.com!sn-xit-03!supernews.com!
cyclone-sjo1.usenetserver.com!news-out.usenetserver.com!
newsfeed.mesh.ad.jp!uio.no!nntp.uio.no!hrotti.ifi.uio.no!
ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Message-ID: <3A5D994A.1568A4D5@uow.edu.au>
Original-Date: 	Thu, 11 Jan 2001 22:30:18 +1100
From: Andrew Morton <andr...@uow.edu.au>
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.4.0 i586)
X-Accept-Language: en
MIME-Version: 1.0
To: ja...@bigfoot.com
CC: lkml <linux-ker...@vger.kernel.org>,
        lad <linux-audio-...@ginette.musique.umontreal.ca>, xp...@xfree86.org,
        "mcric...@mpp.ecs.umass.edu" <mcric...@mpp.ecs.umass.edu>
Subject: Re: [linux-audio-dev] low-latency scheduling patch for 2.4.0
Original-References: <3A57DA3E.6AB70...@uow.edu.au> 
from "Andrew Morton" at Jan 07, 2001 01:53:50 PM 
<200101110312.UAA06...@toltec.metran.cx>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Thu, 11 Jan 2001 11:24:48 GMT
Message-ID: <fa.dmfdbtv.1m0es2r@ifi.uio.no>
References: <fa.gqp2b7v.16k67qu@ifi.uio.no>
Lines: 115

Jay Ts wrote:
> 
> > A patch against kernel 2.4.0 final which provides low-latency
> > scheduling is at
> >
> >       http://www.uow.edu.au/~andrewm/linux/schedlat.html#downloads
> >
> > Some notes:
> >
> > - Worst-case scheduling latency with *very* intense workloads is now
> >   0.8 milliseconds on a 500MHz uniprocessor.
> 
> Wow!  That's super.  Now about the only thing left is to get it included
> in the standard kernel.  Do you think Linus Torvalds is more likely
> to accept these patches than Ingo's?  I sure hope this one works out.

Neither, I think.

We can't apply some patch and say "there; it's low-latency".

We (or "he") need to decide up-front that Linux is to become
a low latency kernel. Then we need to decide the best way of
doing that.

Making the kernel internally preemptive is probably the best way of
doing this.  But it's a *big* task to which much beard-scratching must
be put.  It goes way beyond the preemptive-kernel patches which have
thus far been proposed.

I could propose a simple patch for 2.4 (say, the ten most-needed
scheduling points).  This would get us down to maybe 5-10 milliseconds
under heavy load (10-20x improvement).

That would probably be a great and sufficient improvement for
the HA heartbeat monitoring apps, the database TP monitors,
the QuakeIII players and, of course, people who are only
interested in audio record and playback - I'd need advice
from the audio experts for that.

I hope that one or more of the desktop-oriented Linux distributors
discover that hosing HTML out of gigE ports is not really the
One True Application of Linux, and that they decide to offer
a low-latency kernel for the other 99.99% of Linux users.

> >   This is one to
> >   three orders of magnitude better than BeOS, MacOS and the Windowses.
> 
> ** salivates **
> 
> > - Low latency will probably only be achieved when using the ext2 and
> >   NFS filesystems.
> 
> Well it's extremely nice to see NFS included at least.  I was really
> worried about that one.  What about Samba?  (Keeping in mind that
> serious "professional" musicians will likely have their Linux systems
> networked to a Windows box, at least until they have all the necessary
> tools on Linux.)

I would expect the smbfs client code to be OK.  Will test - thanks.

> > - If you care about latency, be *very* cautious about upgrading to
> >   XFree86 4.x.  I'll cover this issue in a separate email, copied
> >   to the XFree team.
> 
> Did that email pass by me unnoticed?  What's the prob with XF86 4.0?

I haven't gathered the energy to send it.

The basic problem with many video cards is this:

Video adapters have on-board command FIFOs.  They also
have a "FIFO has spare room" control bit.

If you write to the FIFO when there is no spare room,
the damned thing busies the PCI bus until there *is*
room.  This can be up to twenty *milliseconds*.

This will screw up realtime operating systems,
will cause network receive overruns, will screw
up isochronous protocols such as USB and 1394
and will of course screw up scheduling latency.

In xfree3 it was OK - the drivers polled the "spare room"
bit before writing.  But in xfree4 the drivers are starting
to take advantage of this misfeature.  I am told that
a significant number of people are backing out xfree4
upgrades because of this.  For audio.
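The polling behaviour described above can be modelled in a few lines. This sketch uses a hypothetical fake_adapter structure in place of real memory-mapped registers, with fifo_has_room() playing the part of the "FIFO has spare room" bit. The polling write keeps the waiting on the CPU side, where it can be bounded or interrupted, instead of letting the card hold the PCI bus. The blind-write case cannot be reproduced in user space, so it appears only as a comment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a video adapter's command FIFO.  On real
 * hardware "spare room" is a status register bit; here it is derived
 * from a counter. */
struct fake_adapter {
    size_t capacity;
    size_t used;
};

static bool fifo_has_room(const struct fake_adapter *a)
{
    return a->used < a->capacity;
}

/* Simulates the graphics engine consuming one queued command. */
static void engine_consume_one(struct fake_adapter *a)
{
    if (a->used > 0)
        a->used--;
}

/* xfree3-style write: poll the "spare room" bit before writing, so the
 * CPU rather than the PCI bus does the waiting.  In this simulation each
 * failed poll lets the engine consume one command.  Returns the number
 * of polls needed.
 *
 * A blind write to a full FIFO is what makes real hardware hold the bus
 * with retries until room appears; that stall is not modelled here. */
size_t polled_write(struct fake_adapter *a, unsigned cmd)
{
    size_t polls = 0;

    (void)cmd;                   /* the command value is not modelled */
    while (!fifo_has_room(a)) {
        polls++;
        engine_consume_one(a);   /* time passes while we poll */
    }
    a->used++;                   /* room is guaranteed: no bus stall */
    return polls;
}
```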

The manufacturers got caught out by the trade press
in '98 and '99 and they added registry flags to their
drivers to turn off this obnoxious behaviour.

What needs to happen is for the xfree guys to add a
control flag to XF86Config for this.  I believe they
have - it's called `PCIRetry'.

I believe PCIRetry defaults to `off'.  This is bad.
It should default to `on'.

You can read about this minor scandal at the following
URLs:

        http://www.zefiro.com/vgakills.txt
        http://www.zdnet.com/pcmag/news/trends/t980619a.htm
        http://www.research.microsoft.com/~mbj/papers/tr-98-29.html

So,  we need to talk to the xfree team.

Whoops!  I accidentally Cc'ed them :-)


Path: supernews.google.com!sn-xit-03!supernews.com!
europa.netcrusader.net!193.162.153.122!news.tele.dk!
129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: 	Thu, 11 Jan 2001 12:55:35 -0800 (PST)
From: Nigel Gamble <ni...@nrg.org>
Reply-To: ni...@nrg.org
To: "David S. Miller" <da...@redhat.com>
cc: andr...@uow.edu.au, linux-ker...@vger.kernel.org,
        linux-audio-...@ginette.musique.umontreal.ca
Subject: Re: [linux-audio-dev] low-latency scheduling patch for 2.4.0
In-Reply-To: <200101110519.VAA02784@pizda.ninka.net>
Original-Message-ID: <Pine.LNX.4.05.10101111233241.5936-100000@cosmic.nrg.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Thu, 11 Jan 2001 20:56:47 GMT
Message-ID: <fa.k290p4v.eiurg9@ifi.uio.no>
References: <fa.hcd89gv.15n8608@ifi.uio.no>
Lines: 54

On Wed, 10 Jan 2001, David S. Miller wrote:
> Opinion: Personally, I think the approach in Andrew's patch
> 	 is the way to go.
> 
> 	 Not because it can give the absolute best results.
> 	 But rather, it is because it says "here is where a lot
>          of time is spent".
> 
> 	 This has two huge benefits:
> 	 1) It tells us where possible algorithmic improvements may
> 	    be possible.  In some cases we may be able to improve the
> 	    code to the point where the pre-emption points are no
> 	    longer necessary and can thus be removed.

This is definitely an important goal.  But lock-metering code in a fully
preemptible kernel can also identify spots where algorithmic improvements
are most important.

> 	 2) It affects only code which can burn a lot of cpu without
> 	    scheduling.  Compare this to schemes which make the kernel
> 	    fully pre-emptable, causing _EVERYONE_ to pay the price of
> 	    low-latency.  If we were to later find algorithmic
> 	    improvements to the high-latency pieces of code, we
>             couldn't then just "undo" support for pre-emption because
> 	    dependencies will have swept across the whole kernel
> 	    already.
> 
>             Pre-emption, by itself, also doesn't help in situations
> 	    where lots of time is spent while holding spinlocks.
> 	    There are several other operating systems which support
> 	    pre-emption where you will find hard coded calls to the
> 	    scheduler in time-consuming code.  Heh, it's almost like,
> 	    "what's the frigging point of pre-emption then if you
> 	    still have to manually check in some spots?"

Spinlocks should not be held for lots of time.  This adversely affects
SMP scalability as well as latency.  That's why MontaVista's kernel
preemption patch uses sleeping mutex locks instead of spinlocks for the
long held locks.  In a fully preemptible kernel that is implemented
correctly, you won't find any hard-coded calls to the scheduler in time
consuming code.  The scheduler should only be called in response to an
interrupt (IO or timeout) when we know that a higher priority process
has been made runnable, or when the running process sleeps (voluntarily
or when it has to wait for something) or exits.  This is the case in
both of the fully preemptible kernels which I've worked on (IRIX and
REAL/IX).

Nigel Gamble                                    ni...@nrg.org
Mountain View, CA, USA.                         http://www.nrg.org/


Path: supernews.google.com!sn-xit-03!supernews.com!hermes.visi.com!
news-out.visi.com!skynet.be!193.162.153.122.MISMATCH!news.tele.dk!
129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
From: "David S. Miller" <da...@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Original-Message-ID: <14942.9759.730641.804611@pizda.ninka.net>
Original-Date: 	Thu, 11 Jan 2001 13:31:11 -0800 (PST)
To: ni...@nrg.org
Cc: andr...@uow.edu.au, linux-ker...@vger.kernel.org,
        linux-audio-...@ginette.musique.umontreal.ca
Subject: Re: [linux-audio-dev] low-latency scheduling patch for 2.4.0
In-Reply-To: <Pine.LNX.4.05.10101111233241.5936-100000@cosmic.nrg.org>
Original-References: <200101110519.VAA02...@pizda.ninka.net>
	<Pine.LNX.4.05.10101111233241.5936-100...@cosmic.nrg.org>
X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Thu, 11 Jan 2001 21:32:19 GMT
Message-ID: <fa.fj4mibv.i5amai@ifi.uio.no>
References: <fa.k290p4v.eiurg9@ifi.uio.no>
Lines: 17


Nigel Gamble writes:
 > That's why MontaVista's kernel preemption patch uses sleeping mutex
 > locks instead of spinlocks for the long held locks.

Anyone who uses sleeping mutex locks is asking for trouble.  Priority
inversion is an issue I dearly hope we never have to deal with in the
Linux kernel, and sleeping SMP mutex locks lead to exactly this kind
of problem.
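The priority inversion referred to above can be shown with a small deterministic simulation. This sketch assumes a single CPU, strict priority scheduling, one sleeping lock, and no mitigation such as priority inheritance; the three tasks and their timings are hypothetical. A low-priority task takes the lock, a high-priority task then blocks on it, and an unrelated medium-priority task arriving afterwards delays the high-priority task for its entire run.

```c
#include <stdbool.h>

enum { LOW, MED, HIGH, NTASKS };

struct task {
    int prio;        /* larger = more important */
    int arrival;     /* tick at which it becomes runnable */
    int work;        /* ticks of CPU still needed */
    bool needs_lock; /* whole job runs while holding the shared lock */
    int finished_at; /* tick in which it completed, -1 if not yet */
};

/* Single-CPU, strict-priority, tick-based simulation with one sleeping
 * lock and no priority inheritance.  Returns the tick in which HIGH
 * finished, or -1 if it never did.  The task set is hypothetical. */
int simulate(bool med_present)
{
    struct task t[NTASKS] = {
        [LOW]  = { 1, 0, 3, true,  -1 },
        [MED]  = { 2, 2, med_present ? 5 : 0, false, -1 },
        [HIGH] = { 3, 1, 1, true,  -1 },
    };
    int owner = -1;  /* task holding the lock, or -1 */

    for (int tick = 0; tick < 100; tick++) {
        int run = -1;

        /* Pick the highest-priority task that is able to run. */
        for (int i = 0; i < NTASKS; i++) {
            if (t[i].arrival > tick || t[i].work == 0)
                continue;                 /* not here yet, or done */
            if (t[i].needs_lock && owner != -1 && owner != i)
                continue;                 /* sleeping on the lock */
            if (run == -1 || t[i].prio > t[run].prio)
                run = i;
        }
        if (run == -1)
            continue;                     /* idle tick */

        if (t[run].needs_lock)
            owner = run;                  /* acquire (or keep) the lock */
        if (--t[run].work == 0) {
            t[run].finished_at = tick;
            if (owner == run)
                owner = -1;               /* release on completion */
        }
        if (t[HIGH].finished_at >= 0)
            return t[HIGH].finished_at;
    }
    return -1;
}
```

With this task set, simulate(false) returns 3 and simulate(true) returns 8: the medium-priority task adds five ticks to the high-priority task's completion even though it never touches the lock.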

Later,
David S. Miller
da...@redhat.com

Path: supernews.google.com!sn-xit-03!supernews.com!xfer13.netnews.com!
netnews.com!newsfeeds.belnet.be!news.belnet.be!news.tele.dk!
129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Message-ID: <3A5F04BC.1BD5981C@uow.edu.au>
Original-Date: 	Sat, 13 Jan 2001 00:21:00 +1100
From: Andrew Morton <andr...@uow.edu.au>
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.4.0 i586)
X-Accept-Language: en
MIME-Version: 1.0
To: "David S. Miller" <da...@redhat.com>
CC: ja...@bigfoot.com, linux-ker...@vger.kernel.org,
        linux-audio-...@ginette.musique.umontreal.ca,
        mcric...@mpp.ecs.umass.edu
Subject: Re: [linux-audio-dev] low-latency scheduling patch for 2.4.0
Original-References: <3A5D994A.1568A...@uow.edu.au> (message from Andrew Morton on
		Thu, 11 Jan 2001 22:30:18 +1100),
		<3A57DA3E.6AB70...@uow.edu.au> 
from "Andrew Morton" at Jan 07, 2001 01:53:50 PM 
<200101110312.UAA06...@toltec.metran.cx> <3A5D994A.1568A...@uow.edu.au> 
<200101110519.VAA02...@pizda.ninka.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Fri, 12 Jan 2001 13:15:15 GMT
Message-ID: <fa.dnejjlv.f74vas@ifi.uio.no>
References: <fa.hcd89gv.15n8608@ifi.uio.no>
Lines: 25

"David S. Miller" wrote:
> 
> ...
> Bug:    In the tcp_minisock.c changes, if you bail out of the loop
>         early (ie. max_killed=1) you do not decrement tcp_tw_count
>         by killed, which corrupts the state of the TIME_WAIT socket
>         reaper.  The fix is simple, just duplicate the tcp_tw_count
>         decrement into the "if (max_killed)" code block.

Well that was moderately stupid.  Thanks.  It doesn't seem to cause
problems in practice though.  Maybe in the longer term...

I believe the tcp_minisucks.c code needs redoing irrespective
of latency stuff.  It can spend several hundred milliseconds
in a timer handler, which is rather unsociable.

There are a number of moderately complex ways of smoothing out
its behaviour, but I'm inclined to just punt the whole thing
up to process context via schedule_task().
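Punting work from a timer handler to process context means the handler only queues a request and returns, and a schedulable thread does the expensive part later. In the 2.4 kernel that thread is keventd, fed via schedule_task(); the following user-space sketch mimics the idea with a pthread worker and a small ring buffer. All names here (work_queue, defer_work, worker, stop_worker, count_pass) are hypothetical, and this is not the kernel's implementation.

```c
#include <pthread.h>
#include <stdbool.h>

#define QLEN 16   /* ring buffer size (assumed); holds QLEN - 1 items */

typedef void (*work_fn)(void *);

struct work_queue {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    work_fn fn[QLEN];
    void   *arg[QLEN];
    int head, tail;      /* ring buffer indices */
    bool stop;
};

/* Called from the "handler": constant time, never runs the work itself.
 * Returns false if the queue is full. */
bool defer_work(struct work_queue *q, work_fn fn, void *arg)
{
    bool ok = false;

    pthread_mutex_lock(&q->lock);
    if ((q->tail + 1) % QLEN != q->head) {
        q->fn[q->tail] = fn;
        q->arg[q->tail] = arg;
        q->tail = (q->tail + 1) % QLEN;
        ok = true;
        pthread_cond_signal(&q->cond);
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Worker thread: runs deferred work in its own schedulable context and
 * drains everything queued before honouring a stop request. */
void *worker(void *p)
{
    struct work_queue *q = p;

    pthread_mutex_lock(&q->lock);
    for (;;) {
        while (q->head == q->tail && !q->stop)
            pthread_cond_wait(&q->cond, &q->lock);
        if (q->head == q->tail)
            break;               /* stop requested and queue empty */
        work_fn fn = q->fn[q->head];
        void *arg = q->arg[q->head];
        q->head = (q->head + 1) % QLEN;
        pthread_mutex_unlock(&q->lock);
        fn(arg);                 /* long work runs without the lock */
        pthread_mutex_lock(&q->lock);
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

void stop_worker(struct work_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->stop = true;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
}

/* Example work item: stand-in for an expensive reaping pass. */
void count_pass(void *arg)
{
    (*(int *)arg)++;
}
```

A queue can be set up with struct work_queue q = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }; and a worker started with pthread_create(&th, NULL, worker, &q).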

We'll see...

Path: supernews.google.com!sn-xit-03!supernews.com!xfer13.netnews.com!
netnews.com!europa.netcrusader.net!193.162.153.122!news.tele.dk!
129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Message-ID: <3A5F0706.6A8A8141@uow.edu.au>
Original-Date: 	Sat, 13 Jan 2001 00:30:46 +1100
From: Andrew Morton <andr...@uow.edu.au>
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.4.0 i586)
X-Accept-Language: en
MIME-Version: 1.0
To: ni...@nrg.org
CC: "David S. Miller" <da...@redhat.com>, linux-ker...@vger.kernel.org,
        linux-audio-...@ginette.musique.umontreal.ca
Subject: Re: [linux-audio-dev] low-latency scheduling patch for 2.4.0
Original-References: <200101110519.VAA02...@pizda.ninka.net> 
<Pine.LNX.4.05.10101111233241.5936-100...@cosmic.nrg.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Fri, 12 Jan 2001 13:25:09 GMT
Message-ID: <fa.dkf1d5v.c7e1ag@ifi.uio.no>
References: <fa.k290p4v.eiurg9@ifi.uio.no>
Lines: 53

Nigel Gamble wrote:
> 
> Spinlocks should not be held for lots of time.  This adversely affects
> SMP scalability as well as latency.  That's why MontaVista's kernel
> preemption patch uses sleeping mutex locks instead of spinlocks for the
> long held locks.

Nigel,

what worries me about this is the Apache-flock-serialisation saga.

Back in -test8, kumon@fujitsu demonstrated that changing this:

	lock_kernel()
	down(sem)
	<stuff>
	up(sem)
	unlock_kernel()

into this:

	down(sem)
	<stuff>
	up(sem)

had the effect of *decreasing* Apache's maximum connection rate
on an 8-way from ~5,000 connections/sec to ~2,000 conn/sec.

That's downright scary.

Obviously, <stuff> was very quick, and the CPUs were passing through
this section at a great rate.
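One way to explore this effect in user space is a micro-benchmark that runs a very short critical section under a spinlock and under a sleeping lock and compares the elapsed time. This is only a sketch: it uses POSIX pthread_spin_lock() and pthread_mutex_lock() rather than the kernel's BKL and semaphores, the thread and iteration counts are arbitrary, and modern mutex implementations behave differently from 2.4 semaphores, so it should not be expected to reproduce the Apache figures above.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical benchmark sizes. */
#define NTHREADS 4
#define ITERS    100000

static pthread_spinlock_t spin;
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static long counter;
static int use_spin;

/* Each thread enters a tiny critical section ITERS times. */
static void *hammer(void *unused)
{
    (void)unused;
    for (int i = 0; i < ITERS; i++) {
        if (use_spin) {
            pthread_spin_lock(&spin);
            counter++;                 /* the very quick <stuff> */
            pthread_spin_unlock(&spin);
        } else {
            pthread_mutex_lock(&mtx);
            counter++;
            pthread_mutex_unlock(&mtx);
        }
    }
    return NULL;
}

/* Runs one mode (spin_mode != 0 for the spinlock) and returns elapsed
 * seconds.  The final counter, which must equal NTHREADS * ITERS, is
 * stored in *total. */
double run_bench(int spin_mode, long *total)
{
    pthread_t th[NTHREADS];
    struct timespec a, b;

    use_spin = spin_mode;
    counter = 0;
    pthread_spin_init(&spin, PTHREAD_PROCESS_PRIVATE);
    clock_gettime(CLOCK_MONOTONIC, &a);
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&th[i], NULL, hammer, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(th[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &b);
    pthread_spin_destroy(&spin);
    *total = counter;
    return (double)(b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}
```

Comparing run_bench(1, &total) with run_bench(0, &total) on a multiprocessor gives a crude feel for the cost of a sleeping lock around very short sections; on a uniprocessor the spinlock case can behave badly instead, which is part of why the choice needs care.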

How can we be sure that converting spinlocks to semaphores
won't do the same thing?  Perhaps for workloads which we
aren't testing?

So this needs to be done with caution.

As davem points out, now we know where the problems are
occurring, a good next step is to redesign some of those
parts of the VM and buffercache.  I don't think this will
be too hard, but they have to *want* to change :)

Some of those algorithms are approximately O(N^2), for huge
values of N.


Path: supernews.google.com!sn-xit-03!supernews.com!
cyclone-sjo1.usenetserver.com!news-out.usenetserver.com!news.tele.dk!
129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Date: 	Fri, 12 Jan 2001 14:46:29 -0800 (PST)
From: Nigel Gamble <ni...@nrg.org>
Reply-To: ni...@nrg.org
To: Andrew Morton <andr...@uow.edu.au>
cc: "David S. Miller" <da...@redhat.com>, linux-ker...@vger.kernel.org,
        linux-audio-...@ginette.musique.umontreal.ca
Subject: Re: [linux-audio-dev] low-latency scheduling patch for 2.4.0
In-Reply-To: <3A5F0706.6A8A8141@uow.edu.au>
Original-Message-ID: <Pine.LNX.4.05.10101121432270.8988-100000@cosmic.nrg.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Fri, 12 Jan 2001 22:47:26 GMT
Message-ID: <fa.k396qcv.8i4qo3@ifi.uio.no>
References: <fa.dkf1d5v.c7e1ag@ifi.uio.no>
Lines: 65

On Sat, 13 Jan 2001, Andrew Morton wrote:
> Nigel Gamble wrote:
> > Spinlocks should not be held for lots of time.  This adversely affects
> > SMP scalability as well as latency.  That's why MontaVista's kernel
> > preemption patch uses sleeping mutex locks instead of spinlocks for the
> > long held locks.
> 
> Nigel,
> 
> what worries me about this is the Apache-flock-serialisation saga.
> 
> Back in -test8, kumon@fujitsu demonstrated that changing this:
> 
> 	lock_kernel()
> 	down(sem)
> 	<stuff>
> 	up(sem)
> 	unlock_kernel()
> 
> into this:
> 
> 	down(sem)
> 	<stuff>
> 	up(sem)
> 
> had the effect of *decreasing* Apache's maximum connection rate
> on an 8-way from ~5,000 connections/sec to ~2,000 conn/sec.
> 
> That's downright scary.
> 
> Obviously, <stuff> was very quick, and the CPUs were passing through
> this section at a great rate.

Yes, this demonstrates that spinlocks are preferable to sleep locks for
short sections.  However, it looks to me like the implementation of up()
may be partly to blame.  It looks to me as if it tends to prefer to
context switch to the woken up process, instead of continuing to run the
current process.  Surrounding the semaphore with the BKL has the effect
of enforcing the latter behavior, because the semaphore itself will
never have any waiters.

> How can we be sure that converting spinlocks to semaphores
> won't do the same thing?  Perhaps for workloads which we
> aren't testing?
> 
> So this needs to be done with caution.
> 
> As davem points out, now we know where the problems are
> occurring, a good next step is to redesign some of those
> parts of the VM and buffercache.  I don't think this will
> be too hard, but they have to *want* to change :)

Yes, wherever the code can be redesigned to avoid long held locks, that
would definitely be my preferred solution.  I think everyone would be
happy if we could end up with a maintainable solution using only
spinlocks that are held for no longer than a couple of hundred
microseconds.

Nigel Gamble                                    ni...@nrg.org
Mountain View, CA, USA.                         http://www.nrg.org/


Path: supernews.google.com!sn-xit-02!supernews.com!newsfeed.mesh.ad.jp!
news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Message-ID: <3A5F8E50.6720DF5E@mvista.com>
Original-Date: 	Fri, 12 Jan 2001 15:08:00 -0800
From: george anzinger <geo...@mvista.com>
X-Mailer: Mozilla 4.72 [en] (X11; I; Linux 2.2.12-20b i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Andrew Morton <andr...@uow.edu.au>
CC: ni...@nrg.org, "David S. Miller" <da...@redhat.com>,
        linux-ker...@vger.kernel.org,
        linux-audio-...@ginette.musique.umontreal.ca
Subject: Re: [linux-audio-dev] low-latency scheduling patch for 2.4.0
Original-References: <200101110519.VAA02...@pizda.ninka.net> 
<Pine.LNX.4.05.10101111233241.5936-100...@cosmic.nrg.org> <3A5F0706.6A8A8...@uow.edu.au>
Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Monta Vista Software
Date: Fri, 12 Jan 2001 23:12:35 GMT
Message-ID: <fa.cfo9mcv.186i1ip@ifi.uio.no>
References: <fa.dkf1d5v.c7e1ag@ifi.uio.no>
Lines: 72

Andrew Morton wrote:
> 
> Nigel Gamble wrote:
> >
> > Spinlocks should not be held for lots of time.  This adversely affects
> > SMP scalability as well as latency.  That's why MontaVista's kernel
> > preemption patch uses sleeping mutex locks instead of spinlocks for the
> > long held locks.
> 
> Nigel,
> 
> what worries me about this is the Apache-flock-serialisation saga.
> 
> Back in -test8, kumon@fujitsu demonstrated that changing this:
> 
>         lock_kernel()
>         down(sem)
>         <stuff>
>         up(sem)
>         unlock_kernel()
> 
> into this:
> 
>         down(sem)
>         <stuff>
>         up(sem)
> 
> had the effect of *decreasing* Apache's maximum connection rate
> on an 8-way from ~5,000 connections/sec to ~2,000 conn/sec.
> 
> That's downright scary.
> 
> Obviously, <stuff> was very quick, and the CPUs were passing through
> this section at a great rate.

If <stuff> was that fast, maybe the down/up should have been a spinlock
too.  But what if it is changed to:

      BKL_enter_mutex()
      down(sem)
      <stuff>
      up(sem)
      BKL_exit_mutex()
> 
> How can we be sure that converting spinlocks to semaphores
> won't do the same thing?  Perhaps for workloads which we
> aren't testing?

The key is to keep the fast stuff on the spinlock and the slow stuff on
the mutex.  Otherwise you WILL eat up the cpu with the overhead.
> 
> So this needs to be done with caution.
> 
> As davem points out, now we know where the problems are
> occurring, a good next step is to redesign some of those
> parts of the VM and buffercache.  I don't think this will
> be too hard, but they have to *want* to change :)

They will *want* to change if they pop up due to other work :)
> 
> Some of those algorithms are approximately O(N^2), for huge
> values of N.

Path: supernews.google.com!sn-xit-03!supernews.com!logbridge.uoregon.edu!
newsfeed.cwix.com!news.tele.dk!129.240.148.23!uio.no!nntp.uio.no!
ifi.uio.no!internet-mailinglist
Newsgroups: fa.linux.kernel
Return-Path: <linux-kernel-ow...@vger.kernel.org>
Original-Message-ID: <3A618F17.FD285E2B@uow.edu.au>
Original-Date: 	Sun, 14 Jan 2001 22:35:51 +1100
From: Andrew Morton <andr...@uow.edu.au>
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.4.0 i586)
X-Accept-Language: en
MIME-Version: 1.0
To: lkml <linux-ker...@vger.kernel.org>,
        lad <linux-audio-...@ginette.musique.umontreal.ca>
Subject: Re: low-latency scheduling patch for 2.4.0
Original-References: <3A57DA3E.6AB70...@uow.edu.au>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-ow...@vger.kernel.org
Precedence: bulk
X-Mailing-List: 	linux-kernel@vger.kernel.org
Organization: Internet mailing list
Date: Sun, 14 Jan 2001 11:30:02 GMT
Message-ID: <fa.dsgpjcv.d7mtha@ifi.uio.no>
References: <fa.dsf5flv.e0i3p4@ifi.uio.no>
Lines: 34

Andrew Morton wrote:
> 
> A patch against kernel 2.4.0 final which provides low-latency
> scheduling is at
> 
>         http://www.uow.edu.au/~andrewm/linux/schedlat.html#downloads
> 

This has been updated for 2.4.1-pre3

- Fixed latency problems with some /proc files and forking
  when many files are open.

- Fixed the tcp-minisocks thing.

- The patch now works properly on SMP.

  If a wakeup is directed to a SCHED_FIFO or SCHED_RR
  task then we request a reschedule on *all* non-idle
  CPUs.

  This causes any CPU which is holding a long-lived
  spinlock to bail out, allowing the target CPU to
  acquire the spinlock and then reschedule normally.

  Bit of a hack, but it works very well and there
  is no impact on the system unless there are
  non-SCHED_OTHER tasks running.

  Five lines of code :)
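The wakeup rule described above can be modelled without kernel code. In this sketch sim_cpu, cpus, sim_wake_up() and sim_long_lock_section() are hypothetical names and NR_CPUS is an arbitrary assumption; only the policy constants SCHED_FIFO, SCHED_RR and SCHED_OTHER come from <sched.h>. An ordinary wakeup marks only the target CPU; a wakeup of a SCHED_FIFO or SCHED_RR task also marks every non-idle CPU, and a CPU working through a long locked section checks its flag and bails out.

```c
#include <sched.h>
#include <stdbool.h>

#define NR_CPUS 4   /* assumed CPU count for the simulation */

/* Hypothetical per-CPU state. */
struct sim_cpu {
    bool idle;
    bool need_resched;
};

static struct sim_cpu cpus[NR_CPUS];

/* Wake a task of the given policy on target_cpu.  Returns the number
 * of CPUs asked to reschedule. */
int sim_wake_up(int policy, int target_cpu)
{
    bool rt = (policy == SCHED_FIFO || policy == SCHED_RR);
    int poked = 0;

    for (int i = 0; i < NR_CPUS; i++) {
        /* The target CPU is always told to reschedule; for a realtime
         * wakeup, so is every other CPU that is currently busy. */
        if (i == target_cpu || (rt && !cpus[i].idle)) {
            cpus[i].need_resched = true;
            poked++;
        }
    }
    return poked;
}

/* A CPU holding a long-lived lock processes up to `work` items, but
 * checks its need_resched flag each time round and bails out early if
 * it is set, so the lock can be dropped.  Returns items processed. */
int sim_long_lock_section(int cpu, int work)
{
    int done = 0;

    while (done < work) {
        if (cpus[cpu].need_resched)
            break;          /* bail out: drop the lock, let others in */
        done++;
    }
    return done;
}
```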