Very amusing DNS...

From: p...@3dillusion.com (Paul Miller)
Subject: Re: [OFFTOPIC] Very amusing DNS...
Date: 1998/06/17
Message-ID: <Pine.LNX.3.96.980616235359.6838A-100000@serv1.3dillusion.com>#1/1
X-Deja-AN: 363415092
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96.980616173734.1139A-100000@scitus>
Newsgroups: muc.lists.linux-kernel

A couple of days ago, they had a web page up at
http://linus.microsoft.com.  Guess what it was -- The default page for
Apache on a RedHat installation! 

hmm... I guess microsoft finally decided that windows was too unstable to
run.  Or, maybe they just wanted to steal some of the source code!

-Paul

On Tue, 16 Jun 1998, Spirilis wrote:

> Hmm...
> 
> <root>:/root# nslookup 131.107.74.11 198.6.1.1
> Server:  cache00.ns.uu.net
> Address:  198.6.1.1
> 
> Name:    linus.microsoft.com
> Address:  131.107.74.11
> 
> 
> <root>:/root# nslookup linus.microsoft.com 198.6.1.1
> Server:  cache00.ns.uu.net
> Address:  198.6.1.1
> 
> Non-authoritative answer:
> Name:    linus.microsoft.com
> Address:  131.107.74.11
> 
> I wonder what MS uses that host for? ;-)
> 
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.rutgers.edu
> 
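
The two nslookup runs quoted above are a reverse lookup followed by a forward lookup. A minimal Python sketch of the same check, assuming the system resolver rather than the specific server (198.6.1.1) queried in the post; the helper names are made up for illustration:

```python
import socket


def forward_lookup(name):
    """Return the IPv4 address for name, or None if resolution fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def reverse_lookup(address):
    """Return the primary host name for address, or None if none is found."""
    try:
        host, _aliases, _addresses = socket.gethostbyaddr(address)
        return host
    except OSError:  # socket.herror / socket.gaierror
        return None
```

If both directions agree (name to address and address back to the same name), the forward and reverse records are consistent, which is what the quoted output shows.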

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.rutgers.edu

From: barba...@mail.cis.fordham.edu (Anthony Barbachan)
Subject: Re: [OFFTOPIC] Very amusing DNS...
Date: 1998/06/18
Message-ID: <009b01bd9a8a$e7b67260$04c809c0@Fake.Domain.com>#1/1
X-Deja-AN: 363784533
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
Newsgroups: muc.lists.linux-kernel

-----Original Message-----
From: Paul Miller <p...@3dillusion.com>
To: Spirilis <spiri...@mindmeld.dyn.ml.org>
Cc: linux-ker...@vger.rutgers.edu <linux-ker...@vger.rutgers.edu>
Date: Tuesday, June 16, 1998 11:27 PM
Subject: Re: [OFFTOPIC] Very amusing DNS...

>
>A couple of days ago, they had a web page up at
>http://linus.microsoft.com.  Guess what it was -- The default page for
>Apache on a RedHat installation!
>

This could mean that they have finally started porting IE 4.01 to Linux as
they have done for Solaris and HPUX.  I heard that the IE for UNIX
programmers were all (or at least mostly) Linux guys; they may have
convinced MS to release IE for Linux.  Or they might just have been
compiling Apache 1.3.0 with frontpage extensions (and the other bundled
utilities) for Linux.  If it is IE, the addition of MS as an application
provider for Linux should be beneficial to us.

>hmm... I guess microsoft finally decided that windows was too unstable to
>run.  Or, maybe they just wanted to steal some of the source code!
>
>-Paul
>
>On Tue, 16 Jun 1998, Spirilis wrote:
>
>> Hmm...
>>
>> <root>:/root# nslookup 131.107.74.11 198.6.1.1
>> Server:  cache00.ns.uu.net
>> Address:  198.6.1.1
>>
>> Name:    linus.microsoft.com
>> Address:  131.107.74.11
>>
>>
>> <root>:/root# nslookup linus.microsoft.com 198.6.1.1
>> Server:  cache00.ns.uu.net
>> Address:  198.6.1.1
>>
>> Non-authoritative answer:
>> Name:    linus.microsoft.com
>> Address:  131.107.74.11
>>
>> I wonder what MS uses that host for? ;-)

From: n...@bayside.net
Subject: Re: [OFFTOPIC] Very amusing DNS...
Date: 1998/06/19
Message-ID: <Pine.LNX.3.96.980618213107.2725C-100000@nuklear.steelcity.net>#1/1
X-Deja-AN: 363930472
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <009b01bd9a8a$e7b67260$04c809c0@Fake.Domain.com>
Newsgroups: muc.lists.linux-kernel

> >
> >A couple of days ago, they had a web page up at
> >http://linus.microsoft.com.  Guess what it was -- The default page for
> >Apache on a RedHat installation!
> >
> 
> This could mean that they have finally started porting IE 4.01 to Linux as
> they have done for Solaris and HPUX.  I heard that the IE for UNIX
> programmers were all (or at least mostly) Linux guys, they may have
> convinced MS to release IE for Linux.  Or they might just have been
> compiling Apache 1.3.0 with frontpage extensions (and the other bundled
> utilities) for Linux.  If it is IE, the addition of MS as an application
> provider for Linux should be beneficial to us.
> 
> >hmm... I guess microsoft finally decided that windows was too unstable to
> >run.  Or, maybe they just wanted to steal some of the source code!

oh, you haven't read http://www.microsoft.com/ie/unix/devs.htm yet?

a quick quote from the page:

And the fact is that both Chapman and Dawson [IE4/solaris developers] have
grown quite comfortable shuttling back and forth between the worlds of
Windows and UNIX. "It's amazing to me how far UNIX has to go today to
catch up to NT," says Dawson. "Take, just for one example, threading
support. UNIX still has benefits, but NT is just a lot more
full-featured."

it's good for a laugh, at least :)
 _        _  __     __             _ _                                  _
|        / |/ /_ __/ /_____         |       Nuke Skyjumper               |
|       /    / // /  '_/ -_)        |         "Master of the Farce"      |
|_     /_/|_/\_,_/_/\_\\__/        _|_           n...@bayside.net       _|

From: alex.bu...@tahallah.demon.co.uk (Alex Buell)
Subject: Re: [OFFTOPIC] Very amusing DNS...
Date: 1998/06/18
Message-ID: <35895509.2A79@tahallah.demon.co.uk>#1/1
X-Deja-AN: 363936230
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96.980618213107.2725C-100000@nuklear.steelcity.net>
Organization: Advanced Buell Software Engineering Ltd
Newsgroups: muc.lists.linux-kernel

n...@bayside.net wrote:

> And the fact is that both Chapman and Dawson [IE4/solaris developers] 
> have grown quite comfortable shuttling back and forth between the 
> worlds of Windows and UNIX. "It's amazing to me how far UNIX has to go 
> today to catch up to NT," says Dawson. "Take, just for one example, 
> threading support. UNIX still has benefits, but NT is just a lot more
> full-featured."

OH HAHAHAHA!!! I haven't laughed so much since the time someone fell on
a wall and mangled his private bits. Who are Chapman and Dawson kidding?
HAHAHA!! I can't believe these two are Solaris developers and yet come
out with this tripe?!

-- 
Cheers,
Alex.

Watch out, the NSA are everywhere. Your computer must be watched!

 /\_/\  Legalise cannabis now! 
( o.o ) Smoke some cannabis today! 
 > ^ <  Peace, Love, Unity and Respect to all.

From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet)
Subject: Re: [OFFTOPIC] Very amusing DNS...
Date: 1998/06/18
Message-ID: <Pine.LNX.3.96dg4.980618112252.15896G-100000@twinlark.arctic.org>#1/1
X-Deja-AN: 363941479
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <35895509.2A79@tahallah.demon.co.uk>
Newsgroups: muc.lists.linux-kernel

On Thu, 18 Jun 1998, Alex Buell wrote:

> n...@bayside.net wrote:
> 
> > And the fact is that both Chapman and Dawson [IE4/solaris developers] 
> > have grown quite comfortable shuttling back and forth between the 
> > worlds of Windows and UNIX. "It's amazing to me how far UNIX has to go 
> > today to catch up to NT," says Dawson. "Take, just for one example, 
> > threading support. UNIX still has benefits, but NT is just a lot more
> > full-featured."
> 
> OH HAHAHAHA!!! I haven't laughed so much since the time someone fell on
> a wall and mangled his private bits. Who are Chapman and Dawson kidding?
> HAHAHA!! I can't believe these two are Solaris developers and yet come
> out with this tripe?!

Have you worked with threads under NT and worked with threads under, say,
linux?  Linux is in the dark ages as far as threads go.  There's
linuxthreads, but to debug them you need to patch the kernel.  You don't
get core dumps without another kernel patch.  gdb doesn't support it all
directly, unless you patch it.  None of that has made it into the main
distributions.

Even with the debugging problems solved, linuxthreads are heavier than
solaris pthreads or NT fibers.  Both of those use a multiplexed user-level
and kernel-level threading system which results in fewer kernel context
switches.  In userland a "context switch" is just a function call.  But
we'll see this solved with Netscape's NSPR which was released with mozilla
-- it provides a multiplexed threading model (that particular model isn't
ported to linux yet).  There's a paper from sun regarding solaris
pthreads, see
<http://www.arctic.org/~dgaudet/apache/2.0/impl_threads.ps.gz> for a copy
of it.  You may also want to visit the JAWS papers at
<http://www.cs.wustl.edu/~jxh/research/research.html> for more discussion
on various threading paradigms. 
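
A rough sketch of what "a context switch is just a function call" means on the user-level side, written in Python with generators standing in for user threads. This is only an illustration of the idea, not how Solaris, NT fibers, or NSPR implement it (those also multiplex across kernel threads); `worker` and `run_round_robin` are invented names:

```python
from collections import deque


def worker(name, steps, log):
    """A user-level 'thread': each yield hands control back to the scheduler."""
    for i in range(steps):
        log.append((name, i))
        yield  # cooperative switch point; the kernel is not involved


def run_round_robin(tasks):
    """Run generator-based tasks in round-robin order until all have finished."""
    ready = deque(tasks)
    switches = 0
    while ready:
        task = ready.popleft()
        try:
            next(task)  # the "context switch" is an ordinary function call
        except StopIteration:
            continue  # task finished; drop it from the ready queue
        switches += 1
        ready.append(task)
    return switches
```

For example, `run_round_robin([worker("a", 2, log), worker("b", 2, log)])` interleaves the two workers step by step without a single system call.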

Have you read my posts regarding file descriptors and other unix semantics
that are "unfortunate" when threading?  They're not the end of the world,
but it's really obvious once you start digging into things that much of
unix was designed with a process in mind.  For example, on NT there is
absolutely no problem with opening up 10000 files at the same time and
holding onto the file handles.  This is exactly what's required to build a
top end webserver to get winning Specweb96 numbers on NT using
TransmitFile.  On unix there's no TransmitFile, and instead we end up
using mmap() which has performance problems.  Even if we had TransmitFile,
10k file descriptors isn't there.  "You have to recompile your kernel for
that."  Uh, no thanks, I have a hard enough time getting webserver
reviewers to use the right configuration file, asking them to recompile a
kernel is absolutely out of the question. 

Unix multiplexing facilities -- select and poll -- are wake-all
primitives.  When something happens, everything waiting is awakened and
immediately starts fighting for something to do.  What a waste.  They make
a lot of sense for processes though.  On NT completion ports provide
wake-one semantics... which are perfect for threads.
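
A small Python sketch of the wake-all behaviour, assuming a platform where `select.poll` is available. Several threads poll the same pipe; one byte is written and nobody consumes it, so every waiter is told the descriptor is readable. It shows that poll reports one event to all waiters rather than handing it to one; it does not measure scheduler wake-ups. `count_woken_waiters` is a made-up helper:

```python
import os
import select
import threading


def count_woken_waiters(n_waiters, timeout_ms=2000):
    """Have n_waiters threads poll one pipe; return how many saw it readable."""
    read_fd, write_fd = os.pipe()
    woken = []
    lock = threading.Lock()

    def waiter(idx):
        poller = select.poll()
        poller.register(read_fd, select.POLLIN)
        if poller.poll(timeout_ms):  # blocks until readable or timeout
            with lock:
                woken.append(idx)  # note: the byte is never read

    threads = [threading.Thread(target=waiter, args=(i,)) for i in range(n_waiters)]
    for t in threads:
        t.start()
    os.write(write_fd, b"x")  # one event on the shared descriptor
    for t in threads:
        t.join()
    os.close(read_fd)
    os.close(write_fd)
    return len(woken)  # every waiter observed the same event
```

In a real server each of those waiters would then race to read or accept, and all but one would lose.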

NT may not be stable, but there's a lot of nice ideas in there.  Don't
just shoo it away saying "pah, that's microsoft's piece of crap".  DEC had
their hand in some of the architecture.

Dean

P.S. And now I'll go ask myself why I'm even responding to an advocacy
thread on linux-kernel. 

From: da...@dm.cobaltmicro.com (David S. Miller)
Subject: Thread implementations...
Date: 1998/06/19
Message-ID: <199806190241.TAA03833@dm.cobaltmicro.com>#1/1
X-Deja-AN: 364067658
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96dg4.980618112252.15896G-100000@twinlark.arctic.org>
Newsgroups: muc.lists.linux-kernel

   Date: 	Thu, 18 Jun 1998 11:37:28 -0700 (PDT)
   From: Dean Gaudet <dgaudet-list-linux-ker...@arctic.org>

[ My comment is not directed to Dean or anyone in particular,
  there were just some things I wanted to state in general wrt.
  to the issues raised here. ]

   Even with the debugging problems solved, linuxthreads are heavier
   than solaris pthreads or NT fibers.  Both of those use a
   multiplexed user-level and kernel-level threading system which
   results in fewer kernel context switches.  In userland a "context
   switch" is just a function call.  But we'll see this solved with
   Netscape's NSPR which was released with mozilla -- it provides a
   multiplexed threading model (that particular model isn't ported to
   linux yet).

Making threads under Linux not be multiplexed at the user side was a
conscious design decision.  Doing it half in user half in kernel (and
this is the distinction being mentioned when Solaris nomenclature
speaks of kernel bound and non-kernel bound threads) leads to enormous
levels of complexity for fundamental things such as signal handling.

The folks at Solaris spent a lot of time fixing bugs that were solely
about getting signals right in their threads implementation.  Keeping
track of what the kernel sends to a "kernel bound thread" and making
sure the right "pure user thread" within gets that signal correctly is
tricky business.  It's complex and hell to get right.  (Search the
Solaris patch databases for "threads" and "signals" to see that I'm
for real here about how difficult it is to get right.)

This is why we do it the way we do it.

   For example, on NT there is absolutely no problem with opening up
   10000 files at the same time and holding onto the file handles.
   This is exactly what's required to build a top end webserver to get
   winning Specweb96 numbers on NT using TransmitFile.

Yes, I know this.

   On unix there's no TransmitFile, and instead we end up using mmap()
   which has performance problems.  Even if we had TransmitFile, 10k
   file descriptors isn't there.

One thing to keep in mind when people start howling "xxx OS allows
such and such feature and Linux still does not yet, why is it so
limited etc.???"  Go do a little research, and find out what the cost
of 10k file descriptors capability under NT is for processes which
don't use nearly that many.

I know, without actually being able to look at how NT does it, it's
hard to say for sure.  But I bet low end processes pay a bit of a
price so these high end programs can have the facility.

This is the reason Linux is still upcoming with the feature.  We won't
put it in until we come up with an implementation which costs next to
nothing for "normal" programs.

   "You have to recompile your kernel for that."  Uh, no thanks, I
   have a hard enough time getting webserver reviewers to use the
   right configuration file, asking them to recompile a kernel is
   absolutely out of the question.

I actually don't tell people to do this.  Instead I tell them to find
a solution within the current framework, and that what they are after
is in fact in the works.  If someone can't make it work in the current
framework, Linux is not for them at least for now.  A bigger danger
than losing users or apps for the moment due to missing features is
to mis-design something and end up paying for it forever; this is the
path other Unixes have gone down.

   Unix multiplexing facilities -- select and poll -- are wake-all
   primitives.  When something happens, everything waiting is awakened
   and immediately starts fighting for something to do.  What a waste.
   They make a lot of sense for processes though.  On NT completion
   ports provide wake-one semantics... which are perfect for threads.

Yes, this does in fact suck.  However, the path to go down is not to
expect the way select/poll work to change, rather look at other
existing facilities or invent new ones which solve this problem.
Too much user code exists which depends upon the wake-all semantics,
so the only person to blame is whoever designed the behaviors of these
unix operations to begin with ;-)

Later,
David S. Miller
da...@dm.cobaltmicro.com

From: a...@muc.de (Andi Kleen)
Subject: Re: Thread implementations...
Date: 1998/06/19
Message-ID: <k27m2egm8h.fsf@zero.aec.at>#1/1
X-Deja-AN: 364077104
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96dg4.980618112252.15896G-100000@twinlark.arctic.org> 
Newsgroups: muc.lists.linux-kernel

"David S. Miller" <da...@dm.cobaltmicro.com> writes:

> The folks at Solaris spent a lot of time fixing bugs that were solely
> getting signals right in their threads implementation.  Keeping track
> of what the kernel sends to a "kernel bound thread" and making sure
> the right "pure user thread" within gets that signal correctly is
> tricky business.  It's complex and hell to get right.  (search the
> Solaris patch databases for "threads" and "signals" to see that I'm
> for real here about how difficult it is to get right)

Linux (LinuxThreads) does not really get it right, unfortunately.
There is no way to send a signal to a process consisting of multiple
threads and have it delivered to the first thread that has it unblocked
(as defined in POSIX) - it will always be delivered to the thread with
the pid it was directed to.

To fix it CLONE_PID would need to be made fully working. 

Unfortunately that opens a can of worms - either a new tid is needed (with
new system calls etc. - ugly), or the upper 16 bits of pid space are
reused - but those are already allocated by Beowulf.

-Andi

From: spiri...@mindmeld.dyn.ml.org (Spirilis)
Subject: Re: Thread implementations...
Date: 1998/06/19
Message-ID: <Pine.LNX.3.96.980619001045.17049A-100000@mindmeld.dyn.ml.org>#1/1
X-Deja-AN: 364084144
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806190241.TAA03833@dm.cobaltmicro.com>
Newsgroups: muc.lists.linux-kernel

On Thu, 18 Jun 1998, David S. Miller wrote:

> 
>    For example, on NT there is absolutely no problem with opening up
>    10000 files at the same time and holding onto the file handles.
>    This is exactly what's required to build a top end webserver to get
>    winning Specweb96 numbers on NT using TransmitFile.
> 
> Yes, I know this.

Is it not possible to configure Linux to be able to use 10k or greater file
descriptors (in 2.1.xxx) by tweaking /proc/sys/fs/file-max and inode-max?
(shooting down the earlier comment regarding recompiling the kernel to allow 10k
or greater file descriptors...)
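
Worth keeping apart here: /proc/sys/fs/file-max is a system-wide ceiling, while each process also has its own descriptor limit. A small Python sketch for inspecting both, assuming a system that exposes the /proc file named above (it returns None otherwise); the function names are illustrative only:

```python
import resource


def per_process_fd_limit():
    """Return the (soft, hard) RLIMIT_NOFILE pair for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def system_wide_file_max(path="/proc/sys/fs/file-max"):
    """Return the system-wide open-file ceiling, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None
```

Raising the system-wide number alone does not help a single server process if its own per-process limit is still below 10k.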

From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet)
Subject: Re: Thread implementations...
Date: 1998/06/19
Message-ID: <Pine.LNX.3.96dg4.980618222356.18429D-100000@twinlark.arctic.org>#1/1
X-Deja-AN: 364103122
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806190241.TAA03833@dm.cobaltmicro.com>
Newsgroups: muc.lists.linux-kernel

On Thu, 18 Jun 1998, David S. Miller wrote:

>    Date: 	Thu, 18 Jun 1998 11:37:28 -0700 (PDT)
>    From: Dean Gaudet <dgaudet-list-linux-ker...@arctic.org>
> 
> [ My comment is not directed to Dean or anyone in particular,
>   there were just some things I wanted to state in general wrt.
>   to the issues raised here. ]
> 
>    Even with the debugging problems solved, linuxthreads are heavier
>    than solaris pthreads or NT fibers.  Both of those use a
>    multiplexed user-level and kernel-level threading system which
>    results in fewer kernel context switches.  In userland a "context
>    switch" is just a function call.  But we'll see this solved with
>    Netscape's NSPR which was released with mozilla -- it provides a
>    multiplexed threading model (that particular model isn't ported to
>    linux yet).
> 
> Making threads under Linux not be multiplexed at the user side was a
> conscious design decision.  Doing it half in user half in kernel (and
> this is the distinction being mentioned when Solaris nomenclature
> speaks of kernel bound and non-kernel bound threads) leads to enormous
> levels of complexity for fundamental things such as signal handling.

Sure.  If you need signals that sucks.  This makes pthreads really hard to
split up like this, and I can totally see why linuxthreads is the way it
is.

But something like NSPR which requires folks to write in a dialect that is
portable between unix and NT (and still access performance features on
both) doesn't have signals... because asynchronous signalling leads to far
too many race conditions and other crap, it's not even considered good
programming practice these days.  I don't miss it at all.  NSPR gives me
primitives like PR_Send() which writes data, with a timeout....  which
nails the main thing I would use signals for in posix -- for timeouts.
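
To make the "write with a timeout instead of a signal" idea concrete, here is a sketch using only the Python standard library. It is not NSPR's PR_Send() and makes no claim about how NSPR implements it; `send_with_timeout` is a hypothetical helper that waits for writability with poll and gives up after the timeout:

```python
import select
import socket


def send_with_timeout(sock, data, timeout_s):
    """Send all of data; raise TimeoutError if the socket stays unwritable too long."""
    sock.setblocking(False)
    poller = select.poll()
    poller.register(sock, select.POLLOUT)
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        # Wait until the kernel can take more data, or until the timeout expires.
        if not poller.poll(int(timeout_s * 1000)):
            raise TimeoutError("socket not writable within %.3f s" % timeout_s)
        try:
            sent += sock.send(view[sent:])
        except BlockingIOError:
            continue  # lost a race for buffer space; poll again
    return sent
```

The timeout here applies to each wait for buffer space, not to the whole transfer; no alarm() or signal handler is involved.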

(For reference NSPR on linux defaults to single process, multiplexed via
poll/select.  It can be compiled to use pthreads directly, which also
works on linux.  It has a hybrid mode that hasn't been ported to linux
yet.) 

> One thing to keep in mind when people start howling "xxx OS allows
> such and such feature and Linux still does not yet, why is it so
> limited etc.???"  Go do a little research, and find out what the cost
> of 10k file descriptors capability under NT is for processes which
> don't use nearly that many.
> 
> I know, without actually being able to look at how NT does it, it's
> hard to say for sure.  But I bet low end processes pay a bit of a
> price so these high end programs can have the facility.

I'm not sure.  Did you see my extended file handles proposal?  I carefully
avoided O(n) crap, I think it can be done O(1) for everything but process
destruction (where you have to scan the open descriptors).  And the stuff
I was proposing is close to what NT provides.  But of course it's not
POSIX :)

Briefly, an extended file handle is a global index, all processes get
handles out of this single space.  To implement access rights you place an
extra field in each file structure, call it file_access_right.  Each
process also has a file_access_right, they have to compare equal for the
handle's use to be permitted.  exec() causes a new file_access_right to be
selected.  fork() uses the same file_access_right (to set up exec),
clone() uses the same file_access_right.
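
A toy Python model of the scheme just described, to show why the access check is O(1). It is a sketch of the proposal as worded in this message, not kernel code; the class and method names are invented, and a dict stands in for the global handle table:

```python
import itertools


class ExtendedHandleTable:
    """One global handle space guarded by per-process access rights."""

    def __init__(self):
        self._files = {}  # handle -> (file_access_right, object)
        self._next_handle = itertools.count()
        self._next_right = itertools.count(1)

    def new_right(self):
        """Pick a fresh file_access_right, as exec() would."""
        return next(self._next_right)

    def open(self, proc, obj):
        """Allocate a global handle tagged with the opener's access right."""
        handle = next(self._next_handle)
        self._files[handle] = (proc.file_access_right, obj)
        return handle

    def use(self, proc, handle):
        """One table lookup plus one comparison; refuse if the rights differ."""
        right, obj = self._files[handle]
        if right != proc.file_access_right:
            raise PermissionError("handle belongs to another access right")
        return obj


class Process:
    def __init__(self, table, file_access_right=None):
        self.table = table
        if file_access_right is None:
            file_access_right = table.new_right()
        self.file_access_right = file_access_right

    def fork(self):
        # fork() and clone() keep the parent's file_access_right
        return Process(self.table, self.file_access_right)

    def exec(self):
        # exec() selects a new right, so inherited extended handles stop working
        self.file_access_right = self.table.new_right()
```

A forked child can keep using the parent's handles until it calls `exec()`, at which point every `use()` on an old handle fails the comparison.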

This is essentially what NT provides.  They don't have fork -- when you
create a process you explicitly decide which handles will be passed into
the new process... and they're given new addresses in the new process.  To
do that with my scheme you first need to dup an extended fh into a regular
handle.  NT does that "behind the scenes". 

>    Unix multiplexing facilities -- select and poll -- are wake-all
>    primitives.  When something happens, everything waiting is awakened
>    and immediately starts fighting for something to do.  What a waste.
>    They make a lot of sense for processes though.  On NT completion
>    ports provide wake-one semantics... which are perfect for threads.
> 
> Yes, this does in fact suck.  However, the path to go down is not to
> expect the way select/poll work to change, rather look at other
> existing facilities or invent new ones which solve this problem.
> Too much user code exists which depends upon the wake-all semantics,
> so the only person to blame is whoever designed the behaviors of these
> unix operations to begin with ;-)

Right, I've said before that I don't care what the facility looks like, as
long as it provides wake-one :) 

Dean

From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch)
Subject: Re: Thread implementations...
Date: 1998/06/19
Message-ID: <199806191136.VAA09491@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 364163551
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806190241.TAA03833@dm.cobaltmicro.com>
Newsgroups: muc.lists.linux-kernel

David S. Miller writes:
>    Date: 	Thu, 18 Jun 1998 11:37:28 -0700 (PDT)
>    From: Dean Gaudet <dgaudet-list-linux-ker...@arctic.org>
[...]
>    Unix multiplexing facilities -- select and poll -- are wake-all
>    primitives.  When something happens, everything waiting is awakened
>    and immediately starts fighting for something to do.  What a waste.
>    They make a lot of sense for processes though.  On NT completion
>    ports provide wake-one semantics... which are perfect for threads.
> 
> Yes, this does in fact suck.  However, the path to go down is not to
> expect the way select/poll work to change, rather look at other
> existing facilities or invent new ones which solve this problem.
> Too much user code exists which depends upon the wake-all semantics,
> so the only person to blame is whoever designed the behaviors of these
> unix operations to begin with ;-)

On the other hand you could say that the UNIX semantics are fine and
are quite scalable, provided you use them sensibly. Some of these
"problems" are due to applications not being properly thought out in
the first place. If for example you have N threads each polling a
chunk of FDs, things can run well, provided you don't have *each*
thread polling *all* FDs. Of course, you want to use poll(2) rather
than select(2), but other than that the point stands.
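
A sketch of that arrangement in Python, assuming `select.poll` is available: the descriptors are split into disjoint chunks and each thread polls only its own chunk, so no descriptor is watched by more than one thread. The function names and the round-robin split are illustrative choices, not something prescribed in this thread:

```python
import os
import select
import threading


def partition(fds, n_threads):
    """Split fds into n_threads disjoint chunks (round-robin)."""
    return [fds[i::n_threads] for i in range(n_threads)]


def poll_chunk(chunk, timeout_ms, ready, lock):
    """Poll only this thread's chunk and record which fds were readable."""
    poller = select.poll()
    for fd in chunk:
        poller.register(fd, select.POLLIN)
    for fd, _event in poller.poll(timeout_ms):
        with lock:
            ready.append(fd)


def poll_partitioned(fds, n_threads, timeout_ms=1000):
    """Run one poller thread per non-empty chunk; return the readable fds."""
    ready, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=poll_chunk, args=(chunk, timeout_ms, ready, lock))
        for chunk in partition(fds, n_threads)
        if chunk
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ready
```

Because the chunks do not overlap, an event on one descriptor wakes exactly one thread, which sidesteps the wake-all complaint for everything except a shared listening socket.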

				Regards,

					Richard....

From: a...@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: Thread implementations...
Date: 1998/06/19
Message-ID: <m0yn0VT-000aOnC@the-village.bc.nu>#1/1
X-Deja-AN: 364179101
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96.980619001045.17049A-100000@mindmeld.dyn.ml.org>
Newsgroups: muc.lists.linux-kernel

> >    10000 files at the same time and holding onto the file handles.
> >    This is exactly what's required to build a top end webserver to get
> >    winning Specweb96 numbers on NT using TransmitFile.
> > 
> > Yes, I know this.
> 
> Is it not possible to configure Linux to be able to use 10k or greater file
> descriptors (in 2.1.xxx) by tweaking /proc/sys/fs/file-max and inode-max?
> (shooting down the earlier comment regarding recompiling the kernel to allow 10k
> or greater file descriptors...)

With Bill Hawes' patches for handling file arrays it is. For the generic case
it's not. Note that you can forget using select() with 10K descriptors
if you ever want to get any work done.

Alan

From: abel...@phobos.illtel.denver.co.us (Alex Belits)
Subject: Re: Thread implementations...
Date: 1998/06/19
Message-ID: <Pine.LNX.3.96.980619055839.26631A-100000@phobos.illtel.denver.co.us>#1/1
X-Deja-AN: 364188372
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806191136.VAA09491@vindaloo.atnf.CSIRO.AU>
Newsgroups: muc.lists.linux-kernel

On Fri, 19 Jun 1998, Richard Gooch wrote:

> David S. Miller writes:
> >    Date: 	Thu, 18 Jun 1998 11:37:28 -0700 (PDT)
> >    From: Dean Gaudet <dgaudet-list-linux-ker...@arctic.org>
> [...]
> >    Unix multiplexing facilities -- select and poll -- are wake-all
> >    primitives.  When something happens, everything waiting is awakened
> >    and immediately starts fighting for something to do.  What a waste.
> >    They make a lot of sense for processes though.  On NT completion
> >    ports provide wake-one semantics... which are perfect for threads.
> > 
> > Yes, this does in fact suck.  However, the path to go down is not to
> > expect the way select/poll work to change, rather look at other
> > existing facilities or invent new ones which solve this problem.
> > Too much user code exists which depends upon the wake-all semantics,
> > so the only person to blame is whoever designed the behaviors of these
> > unix operations to begin with ;-)
> 
> On the other hand you could say that the UNIX semantics are fine and
> are quite scalable, provided you use them sensibly. Some of these
> "problems" are due to applications not being properly thought out in
> the first place. 

#ifdef SARCASM

"Thundering Herd Problem II", with all original cast... ;-) This time it's
not accept(), but poll(), and the whole thing is multithreaded...

#endif

> If for example you have N threads each polling a
> chunk of FDs, things can run well, provided you don't have *each*
> thread polling *all* FDs. Of course, you want to use poll(2) rather
> than select(2), but other than that the point stands.

  Can anyone provide a clear explanation of the benefit of doing that in
multiple threads vs. having one thread polling everything, if the
response to an fd status change takes negligible time for the
thread/process that is polling them (other processes complete the
operation while polling continues)? I have a server that uses a separate
process mostly for polling, but I'm not sure what poll()/select()
scalability problems it may encounter with a huge number of fds.

--
Alex

From: da...@dm.cobaltmicro.com (David S. Miller)
Subject: Re: Thread implementations...
Date: 1998/06/19
Message-ID: <199806191311.GAA14665@dm.cobaltmicro.com>#1/1
X-Deja-AN: 364193482
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96.980619055839.26631A-100000@phobos.illtel.denver.co.us>
Newsgroups: muc.lists.linux-kernel

   Date: Fri, 19 Jun 1998 06:11:10 -0700 (PDT)
   From: Alex Belits <abel...@phobos.illtel.denver.co.us>

     Can anyone provide a clear explanation, what is the benefit of
   doing that in multiple threads vs. having one thread polling
   everything, if the response on fd status change takes negligible
   time for the thread/process that is polling them (other processes
   complete the operation while polling continues)? I have a server
   that uses separate process mostly for polling, however I'm not sure
   what poll()/select() scalability problems it may encounter if used
   with huge fd number.

I look at it this way.

If you can divide the total set of fd's logically into separate
groups, each belonging strictly to one thread, do it that way.
Having one thread poll all fd's and pass event notification to
the other threads via some other mechanism means that this one
thread becomes the bottleneck.

The problem, for one, with web etc. servers is the incoming connection
socket.  If you could tell select/poll "hey, when a new conn comes in,
wake up one of us", poof this issue would be solved.  However the
defined semantics for these interfaces says to wake everyone polling
on it up.

Later,
David S. Miller
da...@dm.cobaltmicro.com

From: rjo...@orchestream.com (Richard Jones)
Subject: Re: Thread implementations...
Date: 1998/06/19
Message-ID: <358A7A62.BCE36B77@orchestream.com>#1/1
X-Deja-AN: 364218529
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96.980619055839.26631A-100000@phobos.illtel.denver.co.us> 
Organization: Orchestream Ltd.
Newsgroups: muc.lists.linux-kernel

David S. Miller wrote:
> The problem, for one, with web etc. servers is the incoming connection
> socket.  If you could tell select/poll "hey, when a new conn comes in,
> wake up one of us", poof this issue would be solved.  However the
> defined semantics for these interfaces says to wake everyone polling
> on it up.

Apache handles this very nicely. It runs a group of processes,
and each *blocks* on accept(2). When a new connection comes in,
the kernel wakes up one, which handles that socket alone, using
blocking I/O (it uses alarm(2) to do timeouts).

This way they avoid the poll/select issue entirely.

[This applies to Apache 1.2, not sure about later versions]
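
A minimal sketch of that model in Python. Two substitutions to be clear about: it uses threads instead of Apache's separate processes, and a per-socket timeout instead of alarm(); neither is taken from Apache's source. `start_accept_workers` is a made-up name:

```python
import socket
import threading


def start_accept_workers(n_workers, handled):
    """Start n_workers threads that all block in accept() on one listening socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(16)

    def worker(worker_id):
        while True:
            try:
                conn, _addr = listener.accept()  # blocks; each connection goes to one waiter
            except OSError:
                return  # listening socket no longer usable
            with conn:
                conn.settimeout(5.0)  # stands in for the alarm()-based timeout
                handled.append(worker_id)  # list.append is atomic in CPython
                conn.sendall(b"hello from worker %d\n" % worker_id)

    threads = [
        threading.Thread(target=worker, args=(i,), daemon=True)
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    return listener, threads
```

Each accepted connection is returned to exactly one blocked worker, which then does plain blocking I/O on it; no select or poll call appears anywhere.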

Rich.

-- 
Richard Jones rjo...@orchestream.com Tel: +44 171 598 7557 Fax: 460 4461
Orchestream Ltd.  125 Old Brompton Rd. London SW7 3RP PGP: www.four11.com
"boredom ... one of the most overrated emotions ... the sky is made
of bubbles ..."


From: f...@omnicron.com (Mike Ford Ditto)
Subject: Re: Thread implementations...
Date: 1998/06/19
Message-ID: <358AB36A.30CD@yoda.omnicron.com>#1/1
X-Deja-AN: 364284416
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <358A7A62.BCE36B77@orchestream.com>
Newsgroups: muc.lists.linux-kernel

> > The problem, for one, with web etc. servers is the incoming connection
> > socket.  If you could tell select/poll "hey, when a new conn comes in,
> > wake up one of us", poof this issue would be solved.  However the
> > defined semantics for these interfaces says to wake everyone polling
> > on it up.
>
> Apache handles this very nicely. It runs a group of processes,
> and each *blocks* on accept(2). When a new connection comes in,
> the kernel wakes up one, which handles that socket alone, using
> blocking I/O (it uses alarm(2) to do timeouts).

This demonstrates the point that select and poll are workarounds for
the lack of threading support in Unix.  They aren't needed if you use
a threads facility (or a separate process for each thread you need).

Once you have threads you can stick to the intuitive synchronous model
of system calls, which has always effectively handled waking one of
multiple waiters.

Off topic, I would like to pick a nit:

accept() is a system call.  accept(2) is not a system call, it is a
manual page.  One doesn't block on accept(2), one *reads* accept(2)
to find out how to use accept().

					-=] Ford [=-

"Heaven is exactly like where you	(In Real Life:  Mike Ditto)
are right now, only much, much better."	f...@omnicron.com
 -- Laurie Anderson			http://www.omnicron.com/~ford/ford.html


From: a...@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: Thread implementations...
Date: 1998/06/19
Message-ID: <m0yn6zH-000aOpC@the-village.bc.nu>#1/1
X-Deja-AN: 364289956
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <358AB36A.30CD@yoda.omnicron.com>
Newsgroups: muc.lists.linux-kernel

> > the kernel wakes up one, which handles that socket alone, using
> > blocking I/O (it uses alarm(2) to do timeouts).
> 
> This demonstrates the point that select and poll are workarounds for
> the lack of threading support in Unix.  They aren't needed if you use
> a threads facility (or a separate process for each thread you need).

Actually select and poll are more efficient ways of describing most
multiple source event models without the overhead of threads.

And there are plenty of cases where each one is better. Select is clearly
a better model for inetd for example.

> accept() is a system call.  accept(2) is not a system call, it is a
> manual page.  One doesn't block on accept(2), one *reads* accept(2)
> to find out how to use accept().

Using accept(2) to indicate you are talking about the system call goes
back to at least my student days; read comp.unix.wizards.

Alan


From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet)
Subject: Re: Thread implementations...
Date: 1998/06/20
Message-ID: <Pine.LNX.3.96dg4.980619181258.29884c-100000@twinlark.arctic.org>#1/1
X-Deja-AN: 364363398
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806191136.VAA09491@vindaloo.atnf.CSIRO.AU>
Newsgroups: muc.lists.linux-kernel

On Fri, 19 Jun 1998, Richard Gooch wrote:

> On the other hand you could say that the UNIX semantics are fine and
> are quite scalable, provided you use them sensibly. Some of these
> "problems" are due to applications not being properly thought out in
> the first place. If for example you have N threads each polling a
> chunk of FDs, things can run well, provided you don't have *each*
> thread polling *all* FDs. Of course, you want to use poll(2) rather
> than select(2), but other than that the point stands.

You may not be able to exploit the parallelism available in the hardware
unless you can "load balance" the descriptors well enough...

Dean


From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch)
Subject: Re: Thread implementations...
Date: 1998/06/20
Message-ID: <199806200952.TAA16430@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 364448988
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96dg4.980619181258.29884c-100000@twinlark.arctic.org>
Newsgroups: muc.lists.linux-kernel

Dean Gaudet writes:
> 
> On Fri, 19 Jun 1998, Richard Gooch wrote:
> 
> > On the other hand you could say that the UNIX semantics are fine and
> > are quite scalable, provided you use them sensibly. Some of these
> > "problems" are due to applications not being properly thought out in
> > the first place. If for example you have N threads each polling a
> > chunk of FDs, things can run well, provided you don't have *each*
> > thread polling *all* FDs. Of course, you want to use poll(2) rather
> > than select(2), but other than that the point stands.
> 
> > You may not be able to exploit the parallelism available in the hardware
> unless you can "load balance" the descriptors well enough...

Use 10 threads. Seems to me that would provide reasonable load
balancing. And increasing that to 100 threads would be even better.
The aim is to ensure that, statistically, most threads will remain
sleeping for several clock ticks.
With a bit of extra work you could even slowly migrate consistently
active FDs to one or a few threads.
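A rough sketch of the "N threads, each polling its own chunk" layout, in Python. Assumptions, all mine: pipes stand in for sockets, the chunks are equal-sized (n_fds divisible by n_threads), and each thread does a single non-blocking scan where a real server thread would block in poll(). The name partitioned_poll is invented for the example.

```python
import os
import select
import threading

def partitioned_poll(n_fds=20, n_threads=4, active=(3, 17)):
    """Split n_fds pipes evenly over n_threads; each thread polls only its chunk."""
    pipes = [os.pipe() for _ in range(n_fds)]       # (read_fd, write_fd) pairs
    for i in active:
        os.write(pipes[i][1], b"x")                 # make pipe i readable

    chunk = n_fds // n_threads                      # assumes an even split
    results = [None] * n_threads

    def worker(t):
        mine = pipes[t * chunk:(t + 1) * chunk]
        index_of = {rfd: t * chunk + k for k, (rfd, _w) in enumerate(mine)}
        p = select.poll()
        for rfd in index_of:
            p.register(rfd, select.POLLIN)
        ready = p.poll(0)                           # scans this chunk only
        results[t] = sorted(index_of[fd] for fd, _ev in ready)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()

    for rfd, wfd in pipes:
        os.close(rfd)
        os.close(wfd)
    return results
```

With 20 descriptors over 4 threads and activity on descriptors 3 and 17, only the first and last thread have anything to do; the two in the middle would stay asleep.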

				Regards,

					Richard....


From: l...@bitmover.com (Larry McVoy)
Subject: Re: Thread implementations...
Date: 1998/06/20
Message-ID: <199806201951.MAA30491@bitmover.com>#1/1
X-Deja-AN: 364548603
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
Newsgroups: muc.lists.linux-kernel

:    Even with the debugging problems solved, linuxthreads are heavier
:    than solaris pthreads or NT fibers.  

So how about quantifying that a bit and show us some numbers and how they
affect things in real life?

:    Unix multiplexing facilities -- select and poll -- are wake-all
:    primitives.  When something happens, everything waiting is awakened
:    and immediately starts fighting for something to do.  What a waste.
:    They make a lot of sense for processes though.  On NT completion
:    ports provide wake-one semantics... which are perfect for threads.
: 
: Yes, this does in fact suck.  However, the path to go down is not to
: expect the way select/poll work to change, rather look at other
: existing facilities or invent new ones which solve this problem.
: Too much user code exists which depends upon the wake-all semantics,

Hmm.  SGI changed accept() from wakeup-all to wakeup-one with no problem.

I'd be interested in knowing which programs depend on the race.


From: l...@bitmover.com (Larry McVoy)
Subject: Re: Thread implementations...
Date: 1998/06/21
Message-ID: <199806210128.SAA31866@bitmover.com>#1/1
X-Deja-AN: 364611098
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
Newsgroups: muc.lists.linux-kernel

: This demonstrates the point that select and poll are workarounds for
: the lack of threading support in Unix.  They aren't needed if you use
: a threads facility (or a separate process for each thread you need).
: 
: Once you have threads you can stick to the intuitive synchronous model
: of system calls, which has always effectively handled waking one of
: multiple waiters.

There are a number of people, usually systems / kernel types, who realize
that multiple threads/processes can have a severe negative effect
on performance, especially when you are trying to make things fit in
a small processor cache.  Event driven programming tends to use less
system resources than threaded programming.


From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet)
Subject: Re: Thread implementations...
Date: 1998/06/20
Message-ID: <Pine.LNX.3.96dg4.980620132805.15494K-100000@twinlark.arctic.org>#1/1
X-Deja-AN: 364552022
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806200952.TAA16430@vindaloo.atnf.CSIRO.AU>
Newsgroups: muc.lists.linux-kernel

On Sat, 20 Jun 1998, Richard Gooch wrote:

> Dean Gaudet writes:
> > 
> > On Fri, 19 Jun 1998, Richard Gooch wrote:
> > 
> > > On the other hand you could say that the UNIX semantics are fine and
> > > are quite scalable, provided you use them sensibly. Some of these
> > > "problems" are due to applications not being properly thought out in
> > > the first place. If for example you have N threads each polling a
> > > chunk of FDs, things can run well, provided you don't have *each*
> > > thread polling *all* FDs. Of course, you want to use poll(2) rather
> > > than select(2), but other than that the point stands.
> > 
> > > You may not be able to exploit the parallelism available in the hardware
> > unless you can "load balance" the descriptors well enough...
> 
> Use 10 threads. Seems to me that would provide reasonable load
> balancing. And increasing that to 100 threads would be even better.

No it wouldn't.  100 kernel-level threads is overkill.  Unless your box
can do 100 things at a time there's no benefit from giving the kernel 100
objects to schedule.  10 is a much more reasonable number, and even that
may be too high.  You only need as many kernel threads as there is
parallelism to exploit in the hardware.  Everything else can, and should,
happen in userland where timeslices can be maximized and context switches
minimized. 

> The aim is to ensure that, statistically, most threads will remain
> sleeping for several clock ticks.

What?  If I am wasting system memory for a kernel-level thread I'm not
going to go about ensuring that it remains asleep!  no way.  I'm going to
use each and every time slice to its fullest -- because context switches
have a non-zero cost, it may be small, but it is non-zero.

> With a bit of extra work you could even slowly migrate consistently
> active FDs to one or a few threads.

But migrating them costs you extra CPU time.  That's time which, strictly
speaking, does not need to be spent.  NT doesn't have to spend this
time when using completion ports (I'm sounding like a broken record). 

Look at this another way.  If I'm using poll() to implement something,
then I typically have a structure that describes each FD and the state it
is in.  I'm always interested in whether that FD is ready for read or
write.  When it is ready I'll do some processing, modify the state,
read/write something, and then do nothing with it until it is ready again. 

To do this I list for the kernel all the FDs and call poll().  Then the
kernel goes around and polls everything.  For many descriptors (i.e. slow
long haul internet clients) this is a complete waste.  There are two
approaches I've seen to deal with this:

- don't poll everything as frequently, do complex migration between
different "pools" sorted by how active the FD is.  This reduces the number
of times slow sockets are polled.  This is a win, but I feel it is far too
complex (read: easy to get wrong). 

- let the kernel queue an event when the FD becomes ready.  So rather than
calling poll() with a list of 100s of FDs, we tell the kernel on a per-FD
basis "when this is ready for read/write queue an event on this pipe, and
could you please hand me back this void * with it?  thanks".  In this
model when a write() returns EWOULDBLOCK the kernel implicitly sets that
FD up as "waiting for write", similarly for a read().  This means that no
matter what speed the socket is, it won't be polled and no complex
dividing of the FDs into threads needs to be done. 

The latter model is a lot like completion ports... but probably far easier
to implement.  When the kernel changes an FD in a way that could cause it
to become ready for read or write it checks if it's supposed to queue an
event.  If the event queue becomes full the kernel should queue one event
saying "event queue full, you'll have to recover in whatever way you find
suitable... like use poll()". 
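The overflow rule for such an event queue can be modelled in a few lines of Python. This is a user-space toy of the semantics described above, not a kernel interface: the class ReadyQueue, its methods, and the OVERFLOW marker are all hypothetical names, and the "void *" is modelled as an arbitrary cookie object.

```python
from collections import deque

OVERFLOW = "event queue full"   # hypothetical marker event

class ReadyQueue:
    """Bounded queue of readiness events with a single overflow marker."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.overflowed = False

    def post(self, cookie):
        """What the (imagined) kernel does when an FD becomes ready."""
        if len(self.events) < self.capacity:
            self.events.append(cookie)          # hand back the caller's cookie
        elif not self.overflowed:
            self.events.append(OVERFLOW)        # exactly one marker, then drop
            self.overflowed = True

    def drain(self):
        """What the application does: take everything that is queued."""
        out = list(self.events)
        self.events.clear()
        self.overflowed = False
        return out
```

An application that sees OVERFLOW in the drained list knows it has missed events and must resynchronise, for instance with one full poll() over its descriptors.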

Dean


From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet)
Subject: Re: Thread implementations...
Date: 1998/06/20
Message-ID: <Pine.LNX.3.96dg4.980620142324.15494N-100000@twinlark.arctic.org>#1/1
X-Deja-AN: 364563215
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806201951.MAA30491@bitmover.com>
Newsgroups: muc.lists.linux-kernel

On Sat, 20 Jun 1998, Larry McVoy wrote:

> :    Even with the debugging problems solved, linuxthreads are heavier
> :    than solaris pthreads or NT fibers.  
> 
> So how about quantifying that a bit and show us some numbers and how they
> affect things in real life?

As a matter of fact I can quantify this somewhat. 

NSPR provides two modes of operation on linux -- one uses pthreads, the
other uses a portable userland threads library (the standard
setjmp/longjmp deal although it uses sigsetjmp/siglongjmp, and needs a
little more optimization).  I've ported apache 1.3 to NSPR as an
experiment for future versions of apache.  I built the non-debugging
versions of the NSPR library, linked my apache-nspr code against it, and
set up a rather crude benchmark. 

% dd if=/dev/zero of=htdocs/6k bs=1024 count=6
(the squid folks used to tell me 6k was the average object size on the
net, maybe the number is different these days)

% zb 127.0.0.1 /6k -p 8080 -c 10 -t 10 -k
(this is zeusbench asking for the 6k document, 10 simultaneous clients (it
uses select to multiplex), run for 10 seconds, use keep-alive persistent
http connections)

With pthreads it achieves 811 req/s.
With user threads it achieves 1024.40 req/s.
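For scale, the two figures above can be turned into a ratio and an inverse throughput. This is plain arithmetic on the reported numbers, sketched in Python; the variable names are mine, and "time per request" here means wall-clock time per completed request on the whole box (client included), not a per-request latency.

```python
pthreads_rps = 811.0          # reported: pthreads build
user_rps = 1024.40            # reported: user-threads build

speedup = user_rps / pthreads_rps

# Inverse throughput in microseconds per completed request.
us_per_req_pthreads = 1e6 / pthreads_rps
us_per_req_user = 1e6 / user_rps
saved_us = us_per_req_pthreads - us_per_req_user

print(f"user threads serve {100 * (speedup - 1):.0f}% more requests per second")
print(f"that is about {saved_us:.0f} us less per request")
```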

The machine is a single cpu ppro 200 with 128Mb of RAM running 2.1.104. 

Caveats:  While NSPR has been designed extremely well, and the interfaces
don't show any immediate problems with doing underlying optimizations,
it's certainly not top speed yet.  This applies in both cases however. 
NSPR has a hybrid user/system model that lives on top of pthreads, I
haven't tried it yet (it's not ported to linux according to the docs). 

I can do comparisons with the process-model based apache, and I used to
have a native pthreads port of apache... but the latter is out of date now
because I switched my efforts to NSPR in order to have greater portability
(including win32). 

Larry, does lmbench have a threads component that can benchmark different
threads libraries easily?  I have to admit I'm not terribly familiar with
lmbench... but if you've got some benchmarks you'd like me to run I can
try them.  Or you can try them -- NSPR comes with mozilla, after
downloading the tarball, "cd mozilla/nsprpub", then do "make BUILD_OPT=1" 
to get the user-threads version, and do "make BUILD_OPT=1 USE_PTHREADS=1" 
to get the pthreads version.

Dean


From: da...@dm.cobaltmicro.com (David S. Miller)
Subject: Re: Thread implementations...
Date: 1998/06/21
Message-ID: <199806210213.TAA02349@dm.cobaltmicro.com>#1/1
X-Deja-AN: 364616896
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96dg4.980620142324.15494N-100000@twinlark.arctic.org>
Newsgroups: muc.lists.linux-kernel

   Date: 	Sat, 20 Jun 1998 14:37:36 -0700 (PDT)
   From: Dean Gaudet <dgaudet-list-linux-ker...@arctic.org>

   With pthreads it achieves 811 req/s.
   With user threads it achieves 1024.40 req/s.

   The machine is a single cpu ppro 200 with 128Mb of RAM running 2.1.104. 

If you have the opportunity, perform the same benchmark on an
architecture that implements context pids in the TLB.  The entire TLB
is, for all intents and purposes, flushed of all userland
translations even for thread context switches.

Later,
David S. Miller
da...@dm.cobaltmicro.com


From: mi...@valerie.inf.elte.hu (MOLNAR Ingo)
Subject: Re: Thread implementations...
Date: 1998/06/21
Message-ID: <Pine.GSO.3.96.980621045311.25881B-100000@valerie.inf.elte.hu>#1/1
X-Deja-AN: 364627002
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806210213.TAA02349@dm.cobaltmicro.com>
Reply-To: MOLNAR Ingo <mi...@valerie.inf.elte.hu>
Newsgroups: muc.lists.linux-kernel

On Sat, 20 Jun 1998, David S. Miller wrote:

>    With pthreads it achieves 811 req/s.
>    With user threads it achieves 1024.40 req/s.
> 
>    The machine is a single cpu ppro 200 with 128Mb of RAM running 2.1.104. 
> 
> If you have the opportunity, perform the same benchmark on an
> architecture that implements context pids in the TLB.  The entire TLB
> is for all intents and purposes, flushed entirely of all userland
> translations for even thread context switches.

on x86 it is not flushed across thread-thread switches ... and on a PPro,
parts of the TLB are tagged as 'global' (kernel pages obviously), which
keeps the TLB-lossage even across non-shared-VM threads small. (zb->apache
and apache->zb switches in this case).

one thing i noticed about LinuxThreads: the most 'out of balance' basic
pthreads operation is pthread_create(). Does NSPR create a pre-allocated
pool of threads? (or some kind of adaptive pool?) If it's creating threads
heavily (say per-request), then that's bad, at least with the current
LinuxThreads implementation. We have a 1:5 gap between the latency of
clone() and pthread_create() there...
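A pre-allocated pool, as opposed to one thread per request, might look like the sketch below. It is written in Python and says nothing about how NSPR or LinuxThreads actually behave; serve_with_pool, the sentinel convention, and the upper-casing "work" are assumptions for the example.

```python
import queue
import threading

def serve_with_pool(requests, n_workers=4):
    """Create n_workers threads once, then feed them requests through a queue."""
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = jobs.get()
            if item is None:                    # sentinel: shut this worker down
                break
            req_id, payload = item
            answer = payload.upper()            # stand-in for real request handling
            with lock:
                results[req_id] = answer

    # The thread-creation cost is paid here, once, not per request.
    pool = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in pool:
        t.start()

    for item in enumerate(requests):
        jobs.put(item)
    for _ in pool:
        jobs.put(None)                          # one sentinel per worker
    for t in pool:
        t.join()
    return [results[i] for i in range(len(requests))]
```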

-- mingo


From: da...@dm.cobaltmicro.com (David S. Miller)
Subject: Re: Thread implementations...
Date: 1998/06/21
Message-ID: <199806210312.UAA02799@dm.cobaltmicro.com>#1/1
X-Deja-AN: 364627003
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.GSO.3.96.980621045311.25881B-100000@valerie.inf.elte.hu>
Newsgroups: muc.lists.linux-kernel

   Date: 	Sun, 21 Jun 1998 05:03:29 +0200 (MET DST)
   From: MOLNAR Ingo <mi...@valerie.inf.elte.hu>

   on x86 it is not flushed across thread-thread switches ... and on a
   PPro, parts of the TLB are tagged as 'global' (kernel pages
   obviously), which keeps the TLB-lossage even across non-shared-VM
   threads small. (zb->apache and apache->zb switches in this case).

I assumed that TSS switches were defined to reload cr3, which by
definition flushes the TLB of user entries.

Later,
David S. Miller
da...@dm.cobaltmicro.com


From: mi...@valerie.inf.elte.hu (MOLNAR Ingo)
Subject: Re: Thread implementations...
Date: 1998/06/21
Message-ID: <Pine.GSO.3.96.980621051437.25881D-100000@valerie.inf.elte.hu>#1/1
X-Deja-AN: 364627004
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806210312.UAA02799@dm.cobaltmicro.com>
Newsgroups: muc.lists.linux-kernel

On Sat, 20 Jun 1998, David S. Miller wrote:

> I assumed that TSS switches were defined to reload cr3, which by
> definition flushes the TLB of user entries.

it does have a 'short-cut' in the microcode, it does not flush the TLB if
cr3(A) == cr3(B) ... ugly :(

-- mingo


From: da...@dm.cobaltmicro.com (David S. Miller)
Subject: Re: Thread implementations...
Date: 1998/06/21
Message-ID: <199806210320.UAA02864@dm.cobaltmicro.com>#1/1
X-Deja-AN: 364630047
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806210312.UAA02799@dm.cobaltmicro.com>
Newsgroups: muc.lists.linux-kernel

   Date: 	Sat, 20 Jun 1998 20:12:35 -0700
   From: "David S. Miller" <da...@dm.cobaltmicro.com>

   I assumed that TSS switches were defined to reload cr3, which by
   definition flushes the TLB of user entries.

That's broken, not because it's a silly workaround for the Intel TLB
mis-design, but rather because it changes behavior from what older
CPU's did.  So if someone optimized things to defer TLB flushes for
mapping changes, when they knew they would task switch once before
running the task again, this "microcode optimization" would break the
behavior such a trick would depend upon.

Later,
David S. Miller
da...@dm.cobaltmicro.com


From: mi...@valerie.inf.elte.hu (MOLNAR Ingo)
Subject: Re: Thread implementations...
Date: 1998/06/21
Message-ID: <Pine.GSO.3.96.980621052422.25881E-100000@valerie.inf.elte.hu>#1/1
X-Deja-AN: 364630046
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806210320.UAA02864@dm.cobaltmicro.com>
Newsgroups: muc.lists.linux-kernel

On Sat, 20 Jun 1998, David S. Miller wrote:

>    I assumed that TSS switches were defined to reload cr3, which by
>    definition flushes the TLB of user entries.
> 
> That's broken, not because it's a silly workaround for the Intel TLB
> mis-design, but rather because it changes behavior from what older
> CPU's did.  So if someone optimized things to defer TLB flushes for
> mapping changes, when they knew they would task switch once before
> running the task again, this "microcode optimization" would break the
> behavior such a trick would depend upon.

unless this deferred TLB flush feature gets into 2.1, i plan on making a
new version of the softswitch stuff (that replaces TSS switching) for 2.3,
which should give us more pronounced control over TLB flushes and more ...

-- mingo


From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch)
Subject: Re: Thread implementations...
Date: 1998/06/21
Message-ID: <199806210320.NAA20480@vindaloo.atnf.CSIRO.AU>
X-Deja-AN: 364630052
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96dg4.980620132805.15494K-100000@twinlark.arctic.org>
Newsgroups: muc.lists.linux-kernel

Dean Gaudet writes:
> 
> 
> On Sat, 20 Jun 1998, Richard Gooch wrote:
[...]
> > Use 10 threads. Seems to me that would provide reasonable load
> > balancing. And increasing that to 100 threads would be even better.
> 
> No it wouldn't.  100 kernel-level threads is overkill.  Unless your box
> can do 100 things at a time there's no benefit from giving the kernel 100
> objects to schedule.  10 is a much more reasonable number, and even that
> may be too high.  You only need as many kernel threads as there is
> parallelism to exploit in the hardware.  Everything else can, and should,
> happen in userland where timeslices can be maximized and context switches
> minimized. 
> 
> > The aim is to ensure that, statistically, most threads will remain
> > sleeping for several clock ticks.
> 
> What?  If I am wasting system memory for a kernel-level thread I'm not
> going to go about ensuring that it remains asleep!  no way.  I'm going to
> use each and every time slice to its fullest -- because context switches
> have a non-zero cost, it may be small, but it is non-zero.

The point is that *most* FDs are inactive. If in every timeslice you
have only 5 active FDs (taken from a uniform random distribution),
then with 10 threads only half of those are woken up. Hence only half
the number of FDs have to be scanned when these threads have processed
the activity. For 1000 FDs, this is a saving of 500 FD scans, which is
1.5 ms. So scanning load has gone from 30% to 15% (10 ms timeslice).
Also note that only 5 threads are woken up (scheduled), the other 5
remain asleep.

Now let's look at 100 threads. With 5 active FDs, you still get at most
5 threads woken up. But now FD scanning after processing activity
drops to a total of 50 FDs. So scanning load (per timeslice!) has
dropped to 150 us. So compared with the 10 thread case, we have saved
1.35 ms of FD scanning time. Compared with the 1 thread case, we have
saved 2.85 ms of scanning time (as always, per 10 ms timeslice). In
other words, only 1.5% scanning load. And still we are only
scheduling 5 threads *this timeslice*!
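The arithmetic above can be checked with a few lines of Python. The per-FD cost of 3 us is not stated outright; it is inferred from "500 FD scans, which is 1.5 ms". The function name scan_load and the worst-case assumption (every active FD lands in a different thread) are mine.

```python
PER_FD_US = 3          # inferred: 1.5 ms / 500 FDs
TICK_US = 10_000       # 10 ms timeslice

def scan_load(total_fds, n_threads, active_fds):
    """Return (FDs rescanned, scan time in us, scan load in % of a timeslice)."""
    fds_per_thread = total_fds // n_threads
    woken = min(active_fds, n_threads)      # worst case: one active FD per thread
    scanned = woken * fds_per_thread        # each woken thread rescans its chunk
    scan_us = scanned * PER_FD_US
    return scanned, scan_us, 100 * scan_us / TICK_US

for n in (1, 10, 100):
    scanned, us, pct = scan_load(1000, n, 5)
    print(f"{n:3d} threads: {scanned:4d} FDs rescanned, {us:4d} us, {pct}% of a tick")
```

The 100-thread case rescans 50 FDs in 150 us, which is 1.5% of a 10 ms timeslice.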

I don't know why you care so much about context switches: the time
taken for select(2) or poll(2) for many FDs is dominant!

Just how much time do you think scheduling is taking???

> > With a bit of extra work you could even slowly migrate consistently
> > active FDs to one or a few threads.
> 
> But migrating them costs you extra CPU time.  That's time which, strictly
> speaking, does not need to be spent.  NT doesn't have to spend this
> time when using completion ports (I'm sounding like a broken record). 

Migration is pretty cheap: it's a matter of swapping some entries in a
table. And migration only happens upon FD activity. Adding a few extra
microseconds for migration is peanuts compared with the time taken to
process a datagram.

> Look at this another way.  If I'm using poll() to implement something,
> then I typically have a structure that describes each FD and the state it
> is in.  I'm always interested in whether that FD is ready for read or
> write.  When it is ready I'll do some processing, modify the state,
> read/write something, and then do nothing with it until it is ready again. 

Yep, fine. My conceptual model is that I call a callback for each
active FD. Same thing.
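The callback-per-active-FD model can be sketched with Python's selectors module, which lets you attach arbitrary data to each registered descriptor. This is only an illustration of the idea, not the FD management library mentioned in this thread; dispatch_once, callback_demo, and the use of pipes in place of sockets are assumptions.

```python
import os
import selectors

def dispatch_once(sel, timeout=0):
    """Call the registered callback for every descriptor that is ready."""
    fired = 0
    for key, _mask in sel.select(timeout):
        callback, arg = key.data            # per-FD state travels with the FD
        callback(key.fd, arg)
        fired += 1
    return fired

def callback_demo():
    sel = selectors.DefaultSelector()
    log = []

    def on_readable(fd, name):
        log.append((name, os.read(fd, 64)))

    pipes = {name: os.pipe() for name in ("a", "b", "c")}
    for name, (rfd, _wfd) in pipes.items():
        sel.register(rfd, selectors.EVENT_READ, (on_readable, name))

    os.write(pipes["b"][1], b"hello")       # only "b" becomes active
    fired = dispatch_once(sel)

    sel.close()
    for rfd, wfd in pipes.values():
        os.close(rfd)
        os.close(wfd)
    return fired, log
```

Only the callback for the one active descriptor runs; the idle ones cost nothing beyond the scan itself.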

> To do this I list for the kernel all the FDs and call poll().  Then the
> kernel goes around and polls everything.  For many descriptors (i.e. slow
> long haul internet clients) this is a complete waste.  There are two
> approaches I've seen to deal with this:
> 
> - don't poll everything as frequently, do complex migration between
> different "pools" sorted by how active the FD is.  This reduces the number
> of times slow sockets are polled.  This is a win, but I feel it is far too
> complex (read: easy to get wrong). 

It only needs to be done "right" once. In a library. Heck, I might
even modify my own FD management library code to do this just to prove
the point. Write once, use many!
Note that even the "complex" migration is optional: simply dividing up
FDs equally between N threads is a win.
Having migration between a small number of threads is going to be a
*real* win.

> - let the kernel queue an event when the FD becomes ready.  So rather than
> calling poll() with a list of 100s of FDs, we tell the kernel on a per-FD
> basis "when this is ready for read/write queue an event on this pipe, and
> could you please hand me back this void * with it?  thanks".  In this
> model when a write() returns EWOULDBLOCK the kernel implicitly sets that
> FD up as "waiting for write", similarly for a read().  This means that no
> matter what speed the socket is, it won't be polled and no complex
> dividing of the FDs into threads needs to be done. 

I think this will be more complex to implement than a small userspace
library that uses a handful of threads.

> The latter model is a lot like completion ports... but probably far easier
> to implement.  When the kernel changes an FD in a way that could cause it
> to become ready for read or write it checks if it's supposed to queue an
> event.  If the event queue becomes full the kernel should queue one event
> saying "event queue full, you'll have to recover in whatever way you find
> suitable... like use poll()". 

This involves kernel bloat. It seems to me that there is such a simple
userspace solution, so why bother hacking the kernel?
I'd much rather hack the kernel to speed up select(2) and poll(2) a
few times. This benefits all existing Linux/UNIX programmes.

				Regards,

					Richard....


From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet)
Subject: Re: Thread implementations...
Date: 1998/06/21
Message-ID: <Pine.LNX.3.96dg4.980621134214.28501J-100000@twinlark.arctic.org>#1/1
X-Deja-AN: 364786346
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806210320.NAA20480@vindaloo.atnf.CSIRO.AU>
Newsgroups: muc.lists.linux-kernel

On Sun, 21 Jun 1998, Richard Gooch wrote:

> Just how much time do you think scheduling is taking???

I care more about cache pollution.  That is a side-effect of
context-switching which isn't entirely obvious from the context-switch
cost itself. 

> It only needs to be done "right" once. In a library. Heck, I might
> even modify my own FD management library code to do this just to prove
> the point. Write once, use many!
> Note that even the "complex" migration is optional: simply dividing up
> FDs equally between N threads is a win.
> Having migration between a small number of threads is going to be a
> *real* win.

Right, and if you'll release this in a license other than GPL (i.e. LGPL
or MPL) so that it can be reused in non-GPL code (i.e. NSPR which is NPL),
that would be most excellent.  (acronyms rewl). 

> This involves kernel bloat. It seems to me that there is such a simple
> userspace solution, so why bother hacking the kernel?

I don't think the userspace solution is as fast as the event queue
solution.

Dean


From: ko...@jagunet.com (John Kodis)
Subject: Re: Thread implementations...
Date: 1998/06/21
Message-ID: <19980621103913.09836@jagunet.com>#1/1
X-Deja-AN: 364714846
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806200952.TAA16430@vindaloo.atnf.CSIRO.AU> 
Reply-To: ko...@jagunet.com
Newsgroups: muc.lists.linux-kernel

On Sat, Jun 20, 1998 at 01:49:50PM -0700, Dean Gaudet wrote:

> - let the kernel queue an event when the FD becomes ready.  So rather than
> calling poll() with a list of 100s of FDs, we tell the kernel on a per-FD
> basis "when this is ready for read/write queue an event on this pipe, and
> could you please hand me back this void * with it?  thanks".

Yow!  Shades of VMS!  This sounds very much like the VMS Async System
Trap mechanism that allowed you to perform a queued IO operation using
a call something like:

    status = sys$qio(
        READ_OPCODE, fd, buffer, sizeof(buffer),
        <lots of other parameters that I've long since forgotten>,
        ast_function, ast_parameter, ...);

The read would get posted, and when complete the ast_function would
get called with the ast_parameter in the context of the process that
posted the QIO.  This provided a powerful and easy-to-use method of
dealing with async IO.  It's one of the few VMS features that I wish
Unix supported.

-- John Kodis.


From: groud...@club-internet.fr (Gerard Roudier)
Subject: Re: Thread implementations...
Date: 1998/06/21
Message-ID: <Pine.LNX.3.95.980621171244.475A-100000@localhost>#1/1
X-Deja-AN: 364743618
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <19980621103913.09836@jagunet.com>
Newsgroups: muc.lists.linux-kernel

On Sun, 21 Jun 1998, John Kodis wrote:

> On Sat, Jun 20, 1998 at 01:49:50PM -0700, Dean Gaudet wrote:
> 
> > - let the kernel queue an event when the FD becomes ready.  So rather than
> > calling poll() with a list of 100s of FDs, we tell the kernel on a per-FD
> > basis "when this is ready for read/write queue an event on this pipe, and
> > could you please hand me back this void * with it?  thanks".
> 
> Yow!  Shades of VMS!  This sounds very much like the VMS Async System
> Trap mechanism that allowed you to perform a queued IO operation using
> a call something like:
> 
>     status = sys$qio(
>         READ_OPCODE, fd, buffer, sizeof(buffer),
>         <lots of other parameters that I've long since forgotten>,
>         ast_function, ast_parameter, ...);
> 
> The read would get posted, and when complete the ast_function would
> get called with the ast_parameter in the context of the process that
> posted the QIO.  This provided a powerful and easy-to-use method of
> dealing with async IO.  It's one of the few VMS features that I wish
> Unix supported.

RSX and friends (IAS, ...) already had such a feature.
With such a mechanism, application programs get an I/O completion
(software) interrupt just as the kernel gets a completion interrupt from
the hardware.
DEC O/Ses have had AST mechanisms for years without offering threads.
Speaking of VMS, you can pass data (or events) using interlocked
queues between an AST and the process, and between processes using shared
memory, so you do not need critical sections to synchronize
data or event passing. There is no need for several threads sharing a
process address space to get things right.

Using multi-threading within a single process context is, IMO, just
importing kernel-like problems into user-land, and providing such
a feature significantly complicates the kernel involved.
Multi-threading within processes is not the way to go, IMO, especially
if you want to be portable across platforms.

If one really needs to use threads, then one of the following is true,
in my opinion:
- One likes complexity, since one is as stupid as most programmers.
- One's O/S handles processes as bloated entities.
- One has listened too much to OS/2 lovers.
- One believes that Microsoft BASIC is multi-threaded.

There are probably lots of multi-threaded OS/2 programs that can only
break on SMP, since I have often heard multi-braindead OS/2 programmers
assume that threads inside a process are only preempted when
they call a system service.

I have written and maintained lots of application programs under VMS and
UNIX; some have been ported to a dozen O/Ses, and none of them uses threads.
I do not envision using multiple threads in application software, and I
do not want to have to deal with applications that use them, for the
simple reasons that thread semantics differ too much between operating
systems and that application programs are often large programs that
do not match the high level of quality of O/S software.

Traditional UNIXes used light processes and preferably blocking I/O.
Signals were preferably for error conditions.
The select() semantics have been a hack that has been very useful for
implementing event-driven applications using a small number of fds, such as
the X Server. Trying to use such semantics to deal with thousands of
handles can only lead to performance problems. This is trivial.

Regards,
   Gerard.

From: a...@lxorguk.ukuu.org.uk (Alan Cox)
Subject: Re: Thread implementations...
Date: 1998/06/21
Message-ID: <m0yntEW-000aOnC@the-village.bc.nu>#1/1
X-Deja-AN: 364820020
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96dg4.980621134214.28501J-100000@twinlark.arctic.org>
Newsgroups: muc.lists.linux-kernel

> > This involves kernel bloat. It seems to me that there is such a simple
> > userspace solution, so why bother hacking the kernel?
> 
> I don't think the userspace solution is as fast as the event queue
> solution.

I think that's pretty obvious. Select() is an event queue mechanism which
does a setup for each select(). Asynchronous I/O has some similar
properties (clone, I/O, signal) but is only per handle. A pure event
queue model does one setup per handle, only for the handles that matter,
and no per-event setup. You just get the queue overheads.
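
To make the per-call setup concrete, here is a minimal sketch of a
select()-based wait. The function name wait_readable() is only an
illustration, not anything from the kernel or a library. Every call
rebuilds the descriptor set and then rescans it, which is the cost an
event queue would pay only once per handle.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: block until one of fds[0..nfds-1] is readable.
 * Returns the first readable descriptor, or -1 on error. */
int wait_readable(const int *fds, int nfds)
{
    fd_set rset;
    int maxfd = -1;

    /* Setup is repeated on every call: O(nfds) work before we even sleep. */
    FD_ZERO(&rset);
    for (int i = 0; i < nfds; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &rset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    if (select(maxfd + 1, &rset, NULL, NULL, NULL) < 0)
        return -1;

    /* The result also has to be found by a linear scan. */
    for (int i = 0; i < nfds; i++)
        if (FD_ISSET(fds[i], &rset))
            return fds[i];
    return -1;
}
```

With a few descriptors the two loops are noise; with thousands they run
on every wakeup, whether or not the set changed.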

From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch)
Subject: Re: Thread implementations...
Date: 1998/06/21
Message-ID: <199806212338.JAA26410@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 364833942
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <m0yntEW-000aOnC@the-village.bc.nu>
Newsgroups: muc.lists.linux-kernel

Alan Cox writes:
> > > This involves kernel bloat. It seems to me that there is such a simple
> > > userspace solution, so why bother hacking the kernel?
> > 
> > I don't think the userspace solution is as fast as the event queue
> > solution.
> 
> I think that's pretty obvious. Select() is an event queue mechanism which
> does a setup for each select(). Asynchronous I/O has some similar
> properties (clone, I/O, signal) but is only per handle. A pure event
> queue model does one setup per handle, only for the handles that matter,
> and no per-event setup. You just get the queue overheads.

The point is a good userspace solution should be *fast enough*. I
define "fast enough" to be "such that polling overheads contribute
less than 10% of the application load".
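
As a worked illustration of that threshold, the check can be written
down directly. This is only a sketch: the function names are made up,
and it assumes "load" is measured as CPU seconds, which the message
does not specify.

```c
#include <assert.h>

/* Fraction of total CPU time spent in polling (assumed metric). */
double polling_share(double poll_cpu_secs, double total_cpu_secs)
{
    assert(total_cpu_secs > 0.0 && poll_cpu_secs >= 0.0);
    return poll_cpu_secs / total_cpu_secs;
}

/* "Fast enough" in the sense above: polling is under 10% of the load. */
int fast_enough(double poll_cpu_secs, double total_cpu_secs)
{
    return polling_share(poll_cpu_secs, total_cpu_secs) < 0.10;
}
```

For example, 0.4 s of polling in a 10 s run is 4% and passes, while
1.5 s in the same run is 15% and does not.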

				Regards,

					Richard....

From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch)
Subject: Re: Thread implementations...
Date: 1998/06/22
Message-ID: <199806220715.RAA28264@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 364905841
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96dg4.980621134214.28501J-100000@twinlark.arctic.org>
Newsgroups: muc.lists.linux-kernel

  I've written a document that tries to cover the various issues with
I/O events. Check out:
http://www.atnf.csiro.au/~rgooch/linux/docs/io-events.html

				Regards,

					Richard....

From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet)
Subject: Re: Thread implementations...
Date: 1998/06/22
Message-ID: <Pine.LNX.3.96dg4.980622004318.19675P-100000@twinlark.arctic.org>#1/1
X-Deja-AN: 364909153
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806220715.RAA28264@vindaloo.atnf.CSIRO.AU>
Newsgroups: muc.lists.linux-kernel

Note that the poll_ctrl you introduce in
<ftp://ftp.atnf.csiro.au/pub/people/rgooch/linux/kernel-patches/v2.1/fastpoll-readme>
is almost all the work required for a completion queue.  The additional
code required is to add "void *user_data; int completion_fd;" to the event
structure.  If the low level code is smart enough to fill in your events
structure it's smart enough to plop a word into a pipe when necessary.  So
are you sure it'd be too much bloat to do completion queues?  :)
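
A userspace sketch of the idea, with an ordinary pipe standing in for
the queue. Only user_data and completion_fd come from the text above;
the struct name, the fd field and both function names are hypothetical,
and in the real proposal post_ready() would be done by the kernel's
low-level code rather than by the application.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical per-FD registration record. */
struct ready_event {
    int   fd;             /* descriptor being watched (assumed field) */
    void *user_data;      /* opaque cookie handed back to the app */
    int   completion_fd;  /* write end of the pipe used as the queue */
};

/* What would happen when ev->fd becomes ready: drop one pointer-sized
 * word into the pipe. Returns 0 on success, -1 on failure. */
int post_ready(const struct ready_event *ev)
{
    assert(ev != NULL);
    ssize_t n = write(ev->completion_fd, &ev->user_data,
                      sizeof ev->user_data);
    return n == (ssize_t)sizeof ev->user_data ? 0 : -1;
}

/* Application side: pull the next cookie off the queue, or NULL. */
void *next_ready(int queue_rfd)
{
    void *cookie = NULL;
    ssize_t n = read(queue_rfd, &cookie, sizeof cookie);
    return n == (ssize_t)sizeof cookie ? cookie : NULL;
}
```

A pointer is far smaller than PIPE_BUF, so each write lands in the pipe
atomically and the reader never sees half a cookie.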

Dean

On Mon, 22 Jun 1998, Richard Gooch wrote:

>   I've written a document that tries to cover the various issues with
> I/O events. Check out:
> http://www.atnf.csiro.au/~rgooch/linux/docs/io-events.html
> 
> 				Regards,
> 
> 					Richard....
> 

From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch)
Subject: Re: Thread implementations...
Date: 1998/06/22
Message-ID: <199806220753.RAA28663@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 364911751
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96dg4.980622004318.19675P-100000@twinlark.arctic.org>
Newsgroups: muc.lists.linux-kernel

Dean Gaudet writes:
> Note that the poll_ctrl you introduce in
>
> <ftp://ftp.atnf.csiro.au/pub/people/rgooch/linux/kernel-patches/v2.1/fastpoll-readme>

Hey! Someone's already read it:-)

> is almost all the work required for a completion queue.  The additional
> code required is to add "void *user_data; int completion_fd;" to the event
> structure.  If the low level code is smart enough to fill in your events
> structure it's smart enough to plop a word into a pipe when necessary.  So
> are you sure it'd be too much bloat to do completion queues?  :)
> 
> On Mon, 22 Jun 1998, Richard Gooch wrote:
> 
> >   I've written a document that tries to cover the various issues with
> > I/O events. Check out:
> > http://www.atnf.csiro.au/~rgooch/linux/docs/io-events.html

The new mechanism I introduce optimises an existing POSIX
interface. Also, it is optional: drivers which continue to do things
the old way will still work, they just won't be as fast. With
completion ports all drivers will have to be modified, so it involves
a lot more work.

I do agree that if my fastpoll optimisation is added, then the logical
place to add completion port support is in poll_notify(). I've added a
note in my documentation about that.

BTW: what happens when a FD is closed before the completion event is
read by the application? Protecting against that could be tricky, and
may require more code than simply dropping an int into a pipe.

				Regards,

					Richard....

From: l...@bitmover.com (Larry McVoy)
Subject: Re: Thread implementations...
Date: 1998/06/22
Message-ID: <199806221544.IAA03108@bitmover.com>#1/1
X-Deja-AN: 365026270
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
Newsgroups: muc.lists.linux-kernel

: > If one really needs to use threads, then one of the following is true,
: > in my opinion:
: > - One likes complexity, since one is as stupid as most programmers.
: > - One's O/S handles processes as bloated entities.
: > - One has listened too much to OS/2 lovers.
: > - One believes that Microsoft BASIC is multi-threaded.
: 
: Wow! This is really arrogant!

Maybe, maybe not.  I happen to agree with him, minus the inflammatory stuff.

: > The select() semantics have been a hack that has been very useful for
: > implementing event-driven applications using a small number of fds, such as
: > the X Server. Trying to use such semantics to deal with thousands of
: > handles can only lead to performance problems. This is trivial.
: 
: A lightweight userspace solution that uses a modest number of threads
: is capable of giving us a fast and scalable mechanism for handling
: very large numbers of FDs. And it can do this without changing one
: line of kernel code.

So this is interesting.  Can you either point towards a document or explain
why using threads would make your mechanism faster?

From: groud...@club-internet.fr (Gerard Roudier)
Subject: Re: Thread implementations...
Date: 1998/06/22
Message-ID: <Pine.LNX.3.95.980622200832.371A-100000@localhost>#1/1
X-Deja-AN: 365082668
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806212336.JAA26380@vindaloo.atnf.CSIRO.AU>
Newsgroups: muc.lists.linux-kernel

On Mon, 22 Jun 1998, Richard Gooch wrote:

> Gerard Roudier writes:
> > 
> > Using multi-threading within a single process context is, IMO, just
> > importing kernel-like problems into user-land, and providing such
> > a feature significantly complicates the kernel involved.
> > Multi-threading within processes is not the way to go, IMO, especially
> > if you want to be portable across platforms.
> 
> I'm proposing a userspace abstraction that, on Unix systems, uses
> select(2)/poll(2) and a modest number of threads. It could be ported
> to another OS which has completion ports, if you cared.

If I have completion, then I do not need threads at all.

> > If one really needs to use threads, then one of the following is true,
> > in my opinion:
> > - One likes complexity, since one is as stupid as most programmers.
> > - One's O/S handles processes as bloated entities.
> > - One has listened too much to OS/2 lovers.
> > - One believes that Microsoft BASIC is multi-threaded.
> 
> Wow! This is really arrogant!

Nothing arrogant here, only kindness. The thread-mania started with OS/2,
and I have had to deal for years with people who claim that threads are
fine since you can use one thread to read the keyboard and another thread
to send data to the printer. There is no need for an O/S for such a
construct; doing I/O directly on the hardware is easier for me.
OS/2 had neither completion nor real signals, so you had to use threads
if you wanted to be asynchronous. BTW, no need to kill a dead O/S.
About the thread-oriented Win/NT, which is a hardwired 32-bit O/S that
was invented once 32-bit architectures had become obsolete, I could
offer some other kindnesses, too ... What about the ridiculous 32-bit
port to Alpha? The thread-mania that bloats UNIX systems comes
from these brain-dead things.

Microsoft guys are modern alchemists who are able to make gold from sh*t.
Gold is for them, sh*t is for us. :-)

> > There are probably lots of multi-threaded OS/2 programs that can only
> > break on SMP, since I have often heard multi-braindead OS/2 programmers
> > assume that threads inside a process are only preempted when
> > they call a system service.
> 
> I don't see what this has to do with real threads on a real Unix.

When I see a > 5 MB kernel image, I do not believe it is a real UNIX.
I have the impression that recent UNIXen try to look like some 
proprietary O/Ses and POSIX extensions to UNIX services lead to 
BLOATIX, IMO.

> > I have written and maintained lots of application programs under VMS and
> > UNIX; some have been ported to a dozen O/Ses, and none of them uses threads.
> > I do not envision using multiple threads in application software, and I
> > do not want to have to deal with applications that use them, for the
> > simple reasons that thread semantics differ too much between operating
> > systems and that application programs are often large programs that
> > do not match the high level of quality of O/S software.
> 
> Threads have their uses. Sure, they can be abused. So what?

I agree with you that threads have their uses, but it seems to me that
programmers want to use them even when they are not needed.

> > Traditional UNIXes used light processes and preferably blocking I/O.
> > Signals were preferably for error conditions.
> > The select() semantics have been a hack that has been very useful for
> > implementing event-driven applications using a small number of fds, such as
> > the X Server. Trying to use such semantics to deal with thousands of
> > handles can only lead to performance problems. This is trivial.
> 
> A lightweight userspace solution that uses a modest number of threads
> is capable of giving us a fast and scalable mechanism for handling
> very large numbers of FDs. And it can do this without changing one
> line of kernel code.
> Independently, we can optimise the kernel to speed up select(2) and
> poll(2) so that both this userspace library as well as other Unix
> programmes benefit.

select() and poll() are slow by design, at least in user-land.
Existing programs will benefit, but this is not a long-term
solution. The right solution is an asynchronous completion mechanism,
such as DEC O/Ses have offered for more than 20 years.

Regards,
   Gerard.

From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch)
Subject: Re: Thread implementations...
Date: 1998/06/23
Message-ID: <199806230249.MAA04068@vindaloo.atnf.CSIRO.AU>
X-Deja-AN: 365192280
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.95.980622200832.371A-100000@localhost>
Newsgroups: muc.lists.linux-kernel


Gerard Roudier writes:
> 
> On Mon, 22 Jun 1998, Richard Gooch wrote:
> 
> > Gerard Roudier writes:
> > > 
> > > Using multi-threading within a single process context is, IMO, just
> > > importing kernel-like problems into user-land, and providing such
> > > a feature significantly complicates the kernel involved.
> > > Multi-threading within processes is not the way to go, IMO, especially
> > > if you want to be portable across platforms.
> > 
> > I'm proposing a userspace abstraction that, on Unix systems, uses
> > select(2)/poll(2) and a modest number of threads. It could be ported
> > to another OS which has completion ports, if you cared.
> 
> If I have completion, then I do not need threads at all.

Completion ports are not likely to ever be POSIX. Hence we should
first evaluate a lightweight solution based on threads before
implementing completion ports. If the threads solution is shown to be
unsatisfactory, only then does it make sense to look further.

> > > If one really needs to use threads, then one of the following is true,
> > > in my opinion:
> > > - One likes complexity, since one is as stupid as most programmers.
> > > - One's O/S handles processes as bloated entities.
> > > - One has listened too much to OS/2 lovers.
> > > - One believes that Microsoft BASIC is multi-threaded.
> > 
> > Wow! This is really arrogant!
> 
> Nothing arrogant here, only kindness. The thread-mania started with OS/2,
> and I have had to deal for years with people who claim that threads are
> fine since you can use one thread to read the keyboard and another thread
> to send data to the printer. There is no need for an O/S for such a
> construct; doing I/O directly on the hardware is easier for me.
> OS/2 had neither completion nor real signals, so you had to use threads
> if you wanted to be asynchronous. BTW, no need to kill a dead O/S.
> About the thread-oriented Win/NT, which is a hardwired 32-bit O/S that
> was invented once 32-bit architectures had become obsolete, I could
> offer some other kindnesses, too ... What about the ridiculous 32-bit
> port to Alpha? The thread-mania that bloats UNIX systems comes
> from these brain-dead things.

It's arrogant because when someone proposes a solution based on
threads, you belittle the whole idea (and by implication, the
person). Instead, you should first evaluate the idea on its merits. It
may well be that a clever solution based on threads will work very
well. If my userspace solution doesn't perform well, then I'll
advocate (and maybe even implement) completion ports. But I first want
to see how far we can go without departing from UNIX.

> Microsoft guys are modern alchemists who are able to make gold from sh*t.
> Gold is for them, sh*t is for us. :-)

I have no interest in what M$ does in their OS. They are already a step
behind the UNIX world, two steps behind Linux, and we are widening the
gap.

> > > There are probably lots of multi-threaded OS/2 programs that can only
> > > break on SMP, since I have often heard multi-braindead OS/2 programmers
> > > assume that threads inside a process are only preempted when
> > > they call a system service.
> > 
> > I don't see what this has to do with real threads on a real Unix.
> 
> When I see a > 5 MB kernel image, I do not believe it is a real UNIX.
> I have the impression that recent UNIXen try to look like some 
> proprietary O/Ses and POSIX extensions to UNIX services lead to 
> BLOATIX, IMO.

So Linux is bloatware? After all, it has threads.

And I still don't see the relevance of your kernel-bashing
arguments. I'm proposing a wholly userspace solution. Hence it
requires no extra kernel code. Unlike completion ports, I might add...

> > > I have written and maintained lots of application programs under VMS and
> > > UNIX; some have been ported to a dozen O/Ses, and none of them uses threads.
> > > I do not envision using multiple threads in application software, and I
> > > do not want to have to deal with applications that use them, for the
> > > simple reasons that thread semantics differ too much between operating
> > > systems and that application programs are often large programs that
> > > do not match the high level of quality of O/S software.
> > 
> > Threads have their uses. Sure, they can be abused. So what?
> 
> I agree with you that threads have their uses, but it seems to me that
> programmers want to use them even when they are not needed.

Well, I'm not one to jump to using threads just for the hell of
it. Why not read the proposal carefully before jumping up and saying a
threads-based solution is flawed? See:
http://www.atnf.csiro.au/~rgooch/linux/docs/io-events.html

> > > Traditional UNIXes used light processes and preferably blocking I/O.
> > > Signals were preferably for error conditions.
> > > The select() semantics have been a hack that has been very useful for
> > > implementing event-driven applications using a small number of fds, such as
> > > the X Server. Trying to use such semantics to deal with thousands of
> > > handles can only lead to performance problems. This is trivial.
> > 
> > A lightweight userspace solution that uses a modest number of threads
> > is capable of giving us a fast and scalable mechanism for handling
> > very large numbers of FDs. And it can do this without changing one
> > line of kernel code.
> > Independently, we can optimise the kernel to speed up select(2) and
> > poll(2) so that both this userspace library as well as other Unix
> > programmes benefit.
> 
> select() and poll() are slow by design, at least in user-land.
> Existing programs will benefit, but this is not a long-term
> solution. The right solution is an asynchronous completion mechanism,
> such as DEC O/Ses have offered for more than 20 years.

The right solution is one that works with minimal departure from the
UNIX interface. If completion ports provide no measurable performance
advantage over a userspace solution, there is no point to implementing
completion ports. We want compatibility with the UNIX world, not with
VMS.

				Regards,

					Richard....

From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet)
Subject: Re: Thread implementations...
Date: 1998/06/23
Message-ID: <Pine.LNX.3.96dg4.980622230902.20096T-100000@twinlark.arctic.org>#1/1
X-Deja-AN: 365227081
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806220753.RAA28663@vindaloo.atnf.CSIRO.AU>
Newsgroups: muc.lists.linux-kernel

On Mon, 22 Jun 1998, Richard Gooch wrote:

> The new mechanism I introduce optimises an existing POSIX
> interface. Also, it is optional: drivers which continue to do things
> the old way will still work, they just won't be as fast. With
> completion ports all drivers will have to be modified, so it involves
> a lot more work.

As long as ext2 and sockets support it I'd be happy ;) 

> I do agree that if my fastpoll optimisation is added, then the logical
> place to add completion port support is in poll_notify(). I've added a
> note in my documentation about that.
> 
> BTW: what happens when a FD is closed before the completion event is
> read by the application? Protecting against that could be tricky, and
> may require more code than simply dropping an int into a pipe.

I don't see a problem -- it's the application that interprets the meanings
of the ints coming off the pipe.  If the app closes while it possibly
still has outstanding stuff then that's a bug in the app.  There's no
problem for the kernel -- if the FD doesn't get re-used it'll return EBADF
when the app tries to use it... if it's re-used then the app gets whatever
damage it creates. 

But suppose it was re-used.  The data coming off the completion port means
only "ready for read" or "ready for write".  The app is almost certainly
using non-blocking read/write, and when it attempts it'll get EWOULDBLOCK
if things weren't ready. 

Although I suppose you could queue a special event on close... so that the
app could be sure that all events were flushed. 
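
A minimal sketch of how an application might absorb a stale or spurious
"ready for read" event, assuming the descriptor was put in non-blocking
mode. The enum, on_read_event() and set_nonblocking() are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum ev_result { EV_DATA, EV_EOF, EV_SPURIOUS, EV_STALE_FD, EV_ERROR };

/* Put fd into non-blocking mode; returns 0 on success, -1 on failure. */
int set_nonblocking(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    return fl < 0 ? -1 : fcntl(fd, F_SETFL, fl | O_NONBLOCK);
}

/* React to a "ready for read" event for fd. The event is treated as a
 * hint only: the read itself tells us whether it was still valid. */
enum ev_result on_read_event(int fd, char *buf, size_t len, ssize_t *got)
{
    assert(buf != NULL && got != NULL && len > 0);
    *got = read(fd, buf, len);
    if (*got > 0)
        return EV_DATA;
    if (*got == 0)
        return EV_EOF;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return EV_SPURIOUS;   /* event was stale: nothing to read now */
    if (errno == EBADF)
        return EV_STALE_FD;   /* FD was closed and not re-used */
    return EV_ERROR;
}
```

The re-used FD case looks like EV_SPURIOUS or EV_DATA here, which is
exactly the "whatever damage it creates" the application has to own.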

Dean

From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch)
Subject: Re: Thread implementations...
Date: 1998/06/23
Message-ID: <199806230607.QAA05654@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 365227083
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96dg4.980622230902.20096T-100000@twinlark.arctic.org>
Newsgroups: muc.lists.linux-kernel


Dean Gaudet writes:
> 
> 
> On Mon, 22 Jun 1998, Richard Gooch wrote:
> 
> > The new mechanism I introduce optimises an existing POSIX
> > interface. Also, it is optional: drivers which continue to do things
> > the old way will still work, they just won't be as fast. With
> > completion ports all drivers will have to be modified, so it involves
> > a lot more work.
> 
> As long as ext2 and sockets support it I'd be happy ;) 

ext2? You mean regular files?

> > I do agree that if my fastpoll optimisation is added, then the logical
> > place to add completion port support is in poll_notify(). I've added a
> > note in my documentation about that.
> > 
> > BTW: what happens when a FD is closed before the completion event is
> > read by the application? Protecting against that could be tricky, and
> > may require more code than simply dropping an int into a pipe.
> 
> I don't see a problem -- it's the application that interprets the meanings
> of the ints coming off the pipe.  If the app closes while it possibly
> still has outstanding stuff then that's a bug in the app.  There's no
> problem for the kernel -- if the FD doesn't get re-used it'll return EBADF
> when the app tries to use it... if it's re-used then the app gets whatever
> damage it creates. 
> 
> But suppose it was re-used.  The data coming off the completion port means
> only "ready for read" or "ready for write".  The app is almost certainly
> using non-blocking read/write, and when it attempts it'll get EWOULDBLOCK
> if things weren't ready. 
> 
> Although I suppose you could queue a special event on close... so that the
> app could be sure that all events were flushed. 

What does NT do? If we're considering implementing something similar
to NT, it would be worth knowing what the NT policy is.

I still think that you can get as good performance with a few threads
and simple FD migration, though.

				Regards,

					Richard....

From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet)
Subject: Re: Thread implementations...
Date: 1998/06/23
Message-ID: <Pine.LNX.3.96dg4.980622233140.20096V-100000@twinlark.arctic.org>#1/1
X-Deja-AN: 365227084
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806230607.QAA05654@vindaloo.atnf.CSIRO.AU>
Newsgroups: muc.lists.linux-kernel

On Tue, 23 Jun 1998, Richard Gooch wrote:

> ext2? You mean regular files?

Yeah.

> > > I do agree that if my fastpoll optimisation is added, then the logical
> > > place to add completion port support is in poll_notify(). I've added a
> > > note in my documentation about that.
> > > 
> > > BTW: what happens when a FD is closed before the completion event is
> > > read by the application? Protecting against that could be tricky, and
> > > may require more code than simply dropping an int into a pipe.
> > 
> > I don't see a problem -- it's the application that interprets the meanings
> > of the ints coming off the pipe.  If the app closes while it possibly
> > still has outstanding stuff then that's a bug in the app.  There's no
> > problem for the kernel -- if the FD doesn't get re-used it'll return EBADF
> > when the app tries to use it... if it's re-used then the app gets whatever
> > damage it creates. 
> > 
> > But suppose it was re-used.  The data coming off the completion port means
> > only "ready for read" or "ready for write".  The app is almost certainly
> > using non-blocking read/write, and when it attempts it'll get EWOULDBLOCK
> > if things weren't ready. 
> > 
> > Although I suppose you could queue a special event on close... so that the
> > app could be sure that all events were flushed. 
> 
> What does NT do? If we're considering implementing something similar
> to NT, it would be worth knowing what the NT policy is.

You know, it just occurred to me that some time back I stopped advocating
the NT method -- and by extension the VMS method... but maybe I wasn't too
clear in describing my current method.  NT/VMS actually do completion --
for example, if you do a write() you're told when it completes.  That I
think is way too expensive... I'm totally with you on the bloat
of that.

What I'm advocating now is something akin to select()/poll(), and I've
been wrong to be calling it "completion ports".  It's more like a "ready
queue" -- a queue of FDs which are ready for read or write.  You do a
write(), the kernel says EWOULDBLOCK, so you stop writing and you put that
"thread" to sleep (you note that it's waiting for the FD to become ready
for write).  Sometime later the kernel tells you it's ready for write()
by sending the FD down the ready queue.
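
The write side of that pattern can be sketched as follows. No ready
queue exists to call, so a single-FD poll() stands in for "wait until
the kernel sends this FD down the ready queue"; wait_writable() and
write_fully() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Stand-in for the ready queue: sleep until fd is writable. */
static int wait_writable(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLOUT, .revents = 0 };
    int r;
    do {
        r = poll(&p, 1, -1);
    } while (r < 0 && errno == EINTR);
    return r == 1 ? 0 : -1;
}

/* Write len bytes to fd. On EWOULDBLOCK, stop writing and sleep until
 * the FD is reported ready again. Returns len, or -1 on error. */
ssize_t write_fully(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n > 0) {
            done += (size_t)n;
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            if (wait_writable(fd) < 0)
                return -1;
            continue;          /* FD came back as ready: try again */
        }
        if (n < 0 && errno == EINTR)
            continue;
        return -1;
    }
    return (ssize_t)done;
}
```

In a real ready-queue design the sleep would cover many FDs at once;
only the "write until EWOULDBLOCK, then wait" shape is the point here.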

That seems like a fairly light kernel change (and I think the stuff may
be in POSIX already -- rt signals and aio?  I need to get a copy of the
POSIX docs some day!).  If you need true completion ports you can
implement the rest of them at user-level.

> I still think that you can get as good performance with a few threads
> and simple FD migration, though.

Yeah I need to investigate this anyhow -- because I still need to support
other unixes.  So it's probably the first approach that would be good
once I (or someone else) get into optimizing NSPR.

Dean

From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch)
Subject: Re: Thread implementations...
Date: 1998/06/23
Message-ID: <199806230732.RAA06474@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 365239583
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96dg4.980622233140.20096V-100000@twinlark.arctic.org>
Newsgroups: muc.lists.linux-kernel

Dean Gaudet writes:
> On Tue, 23 Jun 1998, Richard Gooch wrote:
> 
> > ext2? You mean regular files?
> 
> Yeah.

You currently can't poll for when a regular file delivers the block of
data you asked for. I'm not aware of any UNIX that supports this.
This is a whole different can of worms from the implementation of
completion ports/whatever.

> You know, it just occurred to me that some time back I stopped advocating
> the NT method -- and by extension the VMS method... but maybe I wasn't too
> clear in describing my current method.  NT/VMS actually do completion --
> for example, if you do a write() you're told when it completes.  That I
> think is way too expensive... I'm totally with you on the bloat
> of that.

What exactly do you mean "you're told when it completes"?

> What I'm advocating now is something akin to select()/poll(), and I've
> been wrong to be calling it "completion ports".  It's more like a "ready
> queue" -- a queue of FDs which are ready for read or write.  You do a
> write(), the kernel says EWOULDBLOCK, so you stop writing and you put that
> "thread" to sleep (you note that it's waiting for the FD to become ready
> for write).  Sometime later the kernel tells you it's ready for write()
> by sending the FD down the ready queue.

Earlier last year when you described completion ports, you suggested
that the queue for the completion events could just be a simple FD, so
I've assumed that's what you meant.
How is this different from "completion ports" in NT/VMS? It looks to
me these "event queues" are much the same as "completion ports", based
on the (vague) descriptions.

> That seems like a fairly light kernel change (and I think the stuff may
> be in POSIX already -- rt signals and aio?  I need to get a copy of the
> POSIX docs some day!).  If you need true completion ports you can
> implement the rest of them at user-level.

When people have talked about implementing AIO in Linux, they had in
mind a userspace library which used threads to do the work. Each AIO
request is given a thread. I think part of the reason for such an
implementation is that you can't poll a regular file, so you need a
blocking thread. The other reason is why do it in the kernel if we can
develop a good userspace solution?

> > I still think that you can get as good performance with a few threads
> > and simple FD migration, though.
> 
> Yeah I need to investigate this anyhow -- because I still need to support
> other unixes.  So it's probably the first approach that would be good
> once I (or someone else) get into optimizing NSPR.

Or you could use my library...

				Regards,

					Richard....

From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet)
Subject: Re: Thread implementations...
Date: 1998/06/23
Message-ID: <Pine.LNX.3.96dg4.980623014005.20096i-100000@twinlark.arctic.org>#1/1
X-Deja-AN: 365254338
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806230732.RAA06474@vindaloo.atnf.CSIRO.AU>
Newsgroups: muc.lists.linux-kernel

On Tue, 23 Jun 1998, Richard Gooch wrote:

> You currently can't poll for when a regular file delivers the block of
> data you asked for. I'm not aware of any UNIX that supports this.
> This is a whole different can of worms from the implementation of completion
> ports/whatever.

This is just asynch i/o.  I'd be surprised if any of the commercial unixes
lack it. 

> What exactly do you mean "you're told when it completes"?

You write/read a buffer, and control returns immediately.  Some
unspecified time later, when the write/read completes, your program is
informed either via a completion port (NT), or via a function you passed
to the kernel (VMS).

> How is this different from "completion ports" in NT/VMS? It looks to
> me these "event queues" are much the same as "completion ports", based
> on the (vague) descriptions.

Nope, completion ports are far heavier...  they actually imply that some
I/O has completed.  Whereas what I'm advocating only implies that some I/O
wouldn't block if tried. 

> When people have talked about implementing AIO in Linux, they had in
> mind a userspace library which used threads to do the work. Each AIO
> request is given a thread. I think part of the reason for such an
> implementation is that you can't poll a regular file, so you need a
> blocking thread. The other reason is why do it in the kernel if we can
> develop a good userspace solution?

This is going in circles -- this is exactly the point I've been debating
-- whether this is "good" or not.

> Or you could use my library...

I believe you said in a previous post that you don't care about NT.
Unfortunately, I do.  And I can't find anything on your pages about
portability...  I'm assuming you're referring to karma.  NSPR is already
ported to 20 unixes, plus WIN32, and has ports underway for pretty much
everything else of interest.

Dean

From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch)
Subject: Re: Thread implementations...
Date: 1998/06/23
Message-ID: <199806230904.TAA07206@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 365257749
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96dg4.980623014005.20096i-100000@twinlark.arctic.org>
Newsgroups: muc.lists.linux-kernel

Dean Gaudet writes:
> On Tue, 23 Jun 1998, Richard Gooch wrote:
> 
> > You currently can't poll for when a regular file delivers the block of
> > data you asked for. I'm not aware of any UNIX that supports this.
> > This is a whole different can of worms from the implementation of completion
> > ports/whatever.
> 
> This is just asynch i/o.  I'd be surprised if any of the commercial unixes
> lack it. 

Ah, OK, you're referring explicitly to aio_*(), right?

> > What exactly do you mean "you're told when it completes"?
> 
> You write/read a buffer, and control returns immediately.  Some
> unspecified time later, when the write/read completes, your program is
> informed either via a completion port (NT), or via a function you passed
> to the kernel (VMS).

Can these NT completion ports multiple events from multiple FDs?

> > How is this different from "completion ports" in NT/VMS? It looks to
> > me these "event queues" are much the same as "completion ports", based
> > on the (vague) descriptions.
> 
> Nope, completion ports are far heavier...  they actually imply that some
> I/O has completed.  Whereas what I'm advocating only implies that some I/O
> wouldn't block if tried. 

We have that now with non-blocking I/O. I still don't understand the
model you are proposing.

> > When people have talked about implementing AIO in Linux, they had in
> > mind a userspace library which used threads to do the work. Each AIO
> > request is given a thread. I think part of the reason for such an
> > implementation is that you can't poll a regular file, so you need a
> > blocking thread. The other reason is why do it in the kernel if we can
> > develop a good userspace solution?
> 
> This is going in circles -- this is exactly the point I've been debating
> -- whether this is "good" or not.

So you want AIO in the kernel. That is even more bloatware than
"completion ports", "event queues" or whatever you're calling
them. From what I've seen on this list in the past, a kernel-space AIO
implementation is not favoured.

If you think that a userspace implementation is going to be too slow,
you have to show evidence of that.

> > Or you could use my library...
> 
> I believe you said in a previous post that you don't care about NT.

I don't care about it in the context of a solution for a UNIX
system. If there is an NT solution, but it doesn't exist in UNIX, then
it doesn't help me, or others who want to get the best out of their
UNIX systems.

> Unfortunately, I do.  And I can't find anything on your pages about
> portability...  I'm assuming you're referring to karma.  NSPR is already

Karma has been ported to:
VXMVX
alpha_OSF1
c2_ConvexOS
crayPVP_UNICOS
hp9000_HPUX
i386_Linux
i386_Solaris
mips1_IRIX5
mips1_ULTRIX
mips2_IRIX5
mips2_IRIX6
mips4_IRIX6
rs6000_AIX
sparc_Solaris
sparc_SunOS

and code that doesn't care about CPU type (i.e. most of it) will also
compile on a "generic" POSIX machine.

Plus, I'll be distributing a small tarball which contains just the
stuff needed to compile the FD management package, so it can be
included in a separate package/library.

> ported to 20 unixes, plus WIN32, and has ports underway for pretty much
> everything else of interest.

				Regards,

					Richard....

From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch)
Subject: Re: Thread implementations...
Date: 1998/06/23
Message-ID: <199806230909.TAA07258@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 365257750
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806230904.TAA07206@vindaloo.atnf.CSIRO.AU>
Newsgroups: muc.lists.linux-kernel

Richard Gooch writes:
> Dean Gaudet writes:
> > On Tue, 23 Jun 1998, Richard Gooch wrote:
> > 
> > > What exactly do you mean "you're told when it completes"?
> > 
> > You write/read a buffer, and control returns immediately.  Some
> > unspecified time later, when the write/read completes, your program is
> > informed either via a completion port (NT), or via a function you passed
> > to the kernel (VMS).
> 
> Can these NT completion ports multiple events from multiple FDs?

Make that: "Can these NT completion ports multiplex events from
multiple FDs?"

> > > Or you could use my library...
> > 
> > I believe you said in a previous post that you don't care about NT.
> 
> I don't care about it in the context of a solution for a UNIX
> system. If there is an NT solution, but it doesn't exist in UNIX, then
> it doesn't help me, or others who want to get the best out of their
> UNIX systems.

I should also say that I have no problem with making use of some
native NT mechanism where appropriate, *for NT*. My library makes the
best use of whatever the OS supplies.

				Regards,

					Richard....

From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet)
Subject: Re: Thread implementations...
Date: 1998/06/24
Message-ID: <Pine.LNX.3.96dg4.980623164906.29998D-100000@twinlark.arctic.org>#1/1
X-Deja-AN: 366064024
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806230909.TAA07258@vindaloo.atnf.CSIRO.AU>
Reply-To: Dean Gaudet <dgaudet-list-linux-ker...@arctic.org>
Newsgroups: muc.lists.linux-kernel

On Tue, 23 Jun 1998, Richard Gooch wrote:

> Richard Gooch writes:
> > Dean Gaudet writes:
> > > On Tue, 23 Jun 1998, Richard Gooch wrote:
> > > 
> > > > What exactly do you mean "you're told when it completes"?
> > > 
> > > You write/read a buffer, and control returns immediately.  Some
> > > unspecified time later, when the write/read completes, your program is
> > > informed either via a completion port (NT), or via a function you passed
> > > to the kernel (VMS).
> > 
> > Can these NT completion ports multiple events from multiple FDs?
> 
> Make that: "Can these NT completion ports multiplex events from
> multiple FDs?"

Yes. 

A typical method of using them is to maintain a homogeneous pool of worker
threads.  Each worker thread can pick up a completed I/O, do further
processing on the request, and "suspend" the request when it next needs to
do I/O, and loop back to pick up some other completed I/O.  To get an
event on the port you have to start an I/O and the kernel then registers
when the I/O has completed. 

This is different from select/poll event processing.  In this case the
events that the kernel delivers are of the form "if you read/write this FD
right now, it won't block".  To get an event to occur you first try to
read/write and get EWOULDBLOCK and then you ask the kernel to tell you
when it wouldn't block. 
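
A minimal sketch of the write-then-wait sequence described above, in C,
using a pipe as the FD.  The helper names (set_nonblocking,
write_until_blocked, wait_writable, readiness_demo) are made up for
illustration, and poll() merely stands in for the step of asking the kernel
to report when the FD would no longer block.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Write until the kernel refuses more.  Returns the number of bytes
 * accepted, or -1 on a real error.  *would_block is set to 1 when we
 * stopped because of EWOULDBLOCK/EAGAIN.
 */
static ssize_t write_until_blocked(int fd, const char *buf, size_t len,
                                   int *would_block)
{
    size_t done = 0;

    assert(buf != NULL && would_block != NULL);
    *would_block = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n > 0) {
            done += (size_t)n;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            *would_block = 1;   /* time to put this "thread" to sleep */
            break;
        } else if (n < 0 && errno == EINTR) {
            continue;
        } else {
            return -1;
        }
    }
    return (ssize_t)done;
}

/* Ask the kernel about write readiness: 1 ready, 0 not ready, -1 error. */
static int wait_writable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLOUT, .revents = 0 };
    int r = poll(&p, 1, timeout_ms);

    if (r < 0)
        return -1;
    return (r > 0 && (p.revents & POLLOUT)) ? 1 : 0;
}

/* Fill a pipe, see "not ready", drain it, see "ready".  Returns 0 if so. */
static int readiness_demo(void)
{
    static char chunk[1 << 20];   /* larger than a default pipe buffer */
    int fds[2], blocked = 0, ok;

    if (pipe(fds) < 0)
        return -1;
    ok = set_nonblocking(fds[1]) == 0
      && write_until_blocked(fds[1], chunk, sizeof chunk, &blocked) > 0
      && blocked == 1
      && wait_writable(fds[1], 0) == 0          /* full: no event yet */
      && read(fds[0], chunk, sizeof chunk) > 0  /* reader drains pipe */
      && wait_writable(fds[1], 1000) == 1;      /* now it is ready    */
    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : -1;
}
```

Calling readiness_demo() from a small main() exercises the whole sequence;
the event only says that a write would not block, not that any earlier
write has reached its destination.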

Your proposal puts an event structure onto each FD, which the low level
driver updates to indicate read/write readiness.  I'm advocating taking
that one step further and plopping that readiness event onto a readiness
queue.  In this way you can completely avoid the select/poll and all the
associated overhead -- instead you get a stream of "readiness" events from
the kernel.

Note that with sockets/pipes there is a read and write buffer, and it's
obvious how the above works for them (readiness indicates a
non-empty/non-full buffer as appropriate).

It's somewhat less critical for non-sockets, but something similar is
possible.  Readiness for read means that a readahead completed... when the
app finally read()s, the buffer may or may not be present -- if it isn't
present then return EWOULDBLOCK.  For write, "readiness for write" means
that there is buffer space to take at least one page of data.  And if the
app takes too long to issue the write(), return EWOULDBLOCK.  i.e. just
pretend there is a read and write buffer... there is one, it's all the
buffer memory.

Now, completion ports and readiness queues are totally related.  You can
implement a completion port in userland given a readiness queue... and you
can implement a completion port in userland given select/poll.  At issue
is the efficiency of each solution.
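
As a rough illustration of the second case, here is a sketch of a
completion-style interface built in userland on top of poll().  The struct
and function names (completion_port, port_submit, port_wait, port_demo) are
hypothetical and have nothing to do with the NT API; only writes are
handled, and the table is a small fixed array.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_OPS 8

/* One outstanding write.  Every name in this sketch is hypothetical. */
struct pending_write {
    int fd;             /* descriptor, ideally opened with O_NONBLOCK */
    const char *buf;
    size_t len, off;    /* total length, bytes written so far */
    int error;          /* errno of a failed write, else 0 */
    int active;
};

struct completion_port {
    struct pending_write ops[MAX_OPS];
};

/* Start an operation; returns its id, or -1 if the table is full. */
static int port_submit(struct completion_port *cp, int fd,
                       const char *buf, size_t len)
{
    assert(cp != NULL && buf != NULL);
    for (int i = 0; i < MAX_OPS; i++) {
        if (cp->ops[i].active)
            continue;
        cp->ops[i] = (struct pending_write){
            .fd = fd, .buf = buf, .len = len, .active = 1 };
        return i;
    }
    return -1;
}

/*
 * Turn readiness into completion: poll() for writability, push data,
 * and return the id of the first operation that finished (its .error
 * field says how).  Returns -1 on timeout or if nothing is outstanding.
 */
static int port_wait(struct completion_port *cp, int timeout_ms)
{
    for (;;) {
        struct pollfd pfds[MAX_OPS];
        int ids[MAX_OPS], n = 0;

        for (int i = 0; i < MAX_OPS; i++) {
            if (!cp->ops[i].active)
                continue;
            pfds[n] = (struct pollfd){ .fd = cp->ops[i].fd, .events = POLLOUT };
            ids[n++] = i;
        }
        if (n == 0)
            return -1;
        int r = poll(pfds, (nfds_t)n, timeout_ms);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0)
            return -1;
        for (int k = 0; k < n; k++) {
            struct pending_write *op = &cp->ops[ids[k]];
            if (pfds[k].revents == 0)
                continue;
            ssize_t w = write(op->fd, op->buf + op->off, op->len - op->off);
            if (w >= 0)
                op->off += (size_t)w;
            else if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
                op->error = errno;
            if (op->error != 0 || op->off == op->len) {
                op->active = 0;
                return ids[k];
            }
        }
    }
}

/* Submit one small write on a pipe and wait for it; returns 0 if it works. */
static int port_demo(void)
{
    struct completion_port cp;
    char got[8] = { 0 };
    int fds[2], id, ok;

    memset(&cp, 0, sizeof cp);
    if (pipe(fds) < 0)
        return -1;
    id = port_submit(&cp, fds[1], "hello", 5);
    ok = id >= 0 && port_wait(&cp, 1000) == id && cp.ops[id].error == 0
      && read(fds[0], got, sizeof got) == 5 && memcmp(got, "hello", 5) == 0;
    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : -1;
}
```

port_wait() only returns once a whole buffer has been written, so the
caller sees completions, while underneath every step is a readiness test
followed by a write that is expected not to block.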

BTW there's another class of problems with regular files which
applications like Squid run into (and which Apache will possibly run into
as we thread it... although I think I have an architecture to mostly avoid
the problems).  open(), close(), unlink(), rename(), ... all the metadata
operations are synchronous.  For example if I write a lean and mean single
threaded poll() based web server I'm still stuck with synchronous
open()... and to work around that I need to spawn multiple threads which
do the synchronous work.  (This is how Squid works.)  Making all of this
work without extra threads is a lot of trouble... and is probably not
worth it. 

Dean

From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch)
Subject: Re: Thread implementations...
Date: 1998/06/24
Message-ID: <199806240030.KAA06585@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 366064066
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96dg4.980623164906.29998D-100000@twinlark.arctic.org>
Newsgroups: muc.lists.linux-kernel


Dean Gaudet writes:
> 
> 
> On Tue, 23 Jun 1998, Richard Gooch wrote:
> 
> > Richard Gooch writes:
> > > Can these NT completion ports multiple events from multiple FDs?
> > 
> > Make that: "Can these NT completion ports multiplex events from
> > multiple FDs?"
> 
> Yes. 
> 
> A typical method of using them is to maintain a homogeneous pool of worker
> threads.  Each worker thread can pick up a completed I/O, do further
> processing on the request, and "suspend" the request when it next needs to
> do I/O, and loop back to pick up some other completed I/O.  To get an
> event on the port you have to start an I/O and the kernel then registers
> when the I/O has completed. 
> 
> This is different from select/poll event processing.  In this case the
> events that the kernel delivers are of the form "if you read/write this FD
> right now, it won't block".  To get an event to occur you first try to
> read/write and get EWOULDBLOCK and then you ask the kernel to tell you
> when it wouldn't block. 
> 
> Your proposal puts an event structure onto each FD, which the low level
> driver updates to indicate read/write readiness.  I'm advocating taking
> that one step further and plopping that readiness event onto a readiness
> queue.  In this way you can completely avoid the select/poll and all the
> associated overhead -- instead you get a stream of "readiness" events from
> the kernel.

Sorry, I still don't see the difference between your completion ports
and event queues. In both cases, as far as I can tell, when I/O
completes a "message" is sent to some place. The application can then
pick off these events. Part of the message includes the FD which had
the completed I/O.

> Note that with sockets/pipes there is a read and write buffer, and it's
> obvious how the above works for them (readiness indicates a
> non-empty/non-full buffer as appropriate).
> 
> It's somewhat less critical for non-sockets, but something similar is
> possible.  Readiness for read means that a readahead completed... when the
> app finally read()s, the buffer may or may not be present -- if it isn't
> present then return EWOULDBLOCK.  For write, "readiness for write" means
> that there is buffer space to take at least one page of data.  And if the
> app takes too long to issue the write(), return EWOULDBLOCK.  i.e. just
> pretend there is a read and write buffer... there is one, it's all the
> buffer memory.

The last time I tried non-blocking I/O on a regular file, it still
blocked :-( This was with Linux 2.1.x.

				Regards,

					Richard....

From: ch...@cybernet.co.nz (Chris Wedgwood)
Subject: Re: Thread implementations...
Date: 1998/06/24
Message-ID: <19980624125030.A8986@caffeine.ix.net.nz>#1/1
X-Deja-AN: 366064008
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806230909.TAA07258@vindaloo.atnf.CSIRO.AU> 
Newsgroups: muc.lists.linux-kernel

On Wed, Jun 24, 1998 at 10:30:00AM +1000, Richard Gooch wrote:
> 
> The last time I tried non-blocking I/O on a regular file, it still
> blocked :-( This was with Linux 2.1.x.

I just looked at the fs code briefly and don't see anything to handle
O_NONBLOCK for regular files.
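
The effect is easy to observe from user space.  POSIX specifies that
regular files always poll as readable and writable, so poll() on a regular
file returns at once whether or not the data is in memory, and setting
O_NONBLOCK changes nothing.  A small sketch, assuming a writable /tmp; the
function names are invented for the example:

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* 1 if poll() reports fd readable within timeout_ms, 0 if not, -1 on error. */
static int poll_readable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int r;

    assert(fd >= 0);
    r = poll(&p, 1, timeout_ms);
    if (r < 0)
        return -1;
    return (r > 0 && (p.revents & POLLIN)) ? 1 : 0;
}

/* Poll an empty regular file that has O_NONBLOCK set. */
static int regular_file_polls_ready(void)
{
    char path[] = "/tmp/nbfileXXXXXX";   /* hypothetical scratch file */
    int fd = mkstemp(path);
    int flags, ready;

    if (fd < 0)
        return -1;
    unlink(path);
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    ready = poll_readable(fd, 0);        /* "ready" despite having no data */
    close(fd);
    return ready;
}

/* Poll an empty pipe for contrast: nothing to read, so not ready. */
static int empty_pipe_polls_ready(void)
{
    int fds[2], ready;

    if (pipe(fds) < 0)
        return -1;
    ready = poll_readable(fds[0], 0);
    close(fds[0]);
    close(fds[1]);
    return ready;
}
```

The regular file reports ready and the empty pipe does not, which is why
readiness-based multiplexing says nothing useful about disk I/O.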

In fact... I'm not even sure how easy this would be to add to the kernel as
you would really need a kernel thread for each outstanding request (this is
starting to go in circles).

I was looking into this for sendfile(2), which can have similar constraints
and requirements.

-Chris

From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet)
Subject: Re: Thread implementations...
Date: 1998/06/24
Message-ID: <Pine.LNX.3.96dg4.980623181943.29998O-100000@twinlark.arctic.org>#1/1
X-Deja-AN: 366064027
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <19980624125030.A8986@caffeine.ix.net.nz>
Newsgroups: muc.lists.linux-kernel

On Wed, 24 Jun 1998, Chris Wedgwood wrote:

> I was looking into this for sendfile(2), which can have similar constraints
> and requirements.

It occurred to me last night that sendfile() may not be the best thing...
my latest scheme for speeding up apache involves what I'm calling "HTTP
flows", and the short story is that the web server has a front-end and a
back-end.  The front-end is extremely light, dumb, and single threaded; 
the back-end is full featured, and looks almost the same as current
apache.  The front-end handles only well-formed HTTP requests and only
requests that fit patterns that the back-end has fed it.  In its simplest
form it's a mapping from URL to mmap-region/FD (but it can handle far more
than just these static-only servers).  If sendfile() is blocking I can't
use it for this.
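
A toy version of the simplest form of that mapping, as a sketch only.  The
types flow_entry and flow_table and the function front_end_route are
hypothetical names, not taken from the Apache prototype; the table is a
linear array, and the parser accepts nothing but a bare
"GET <path> HTTP/..." request line, returning NULL to mean "hand this one
to the back-end".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry the back-end has fed to the front-end (hypothetical layout). */
struct flow_entry {
    const char *url;    /* exact request path */
    const void *data;   /* pre-built response, e.g. an mmap()ed region */
    size_t len;
};

struct flow_table {
    const struct flow_entry *entries;
    size_t count;
};

/* Return the entry for url, or NULL if the front-end does not know it. */
static const struct flow_entry *flow_lookup(const struct flow_table *t,
                                            const char *url)
{
    assert(t != NULL && url != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].url, url) == 0)
            return &t->entries[i];
    }
    return NULL;
}

/*
 * Handle only the well-formed, expected case.  Anything else (other
 * methods, missing protocol, over-long or unknown paths) yields NULL,
 * i.e. the request is punted to the full-featured back-end.
 */
static const struct flow_entry *front_end_route(const struct flow_table *t,
                                                const char *line)
{
    char path[256];
    const char *start, *end;
    size_t n;

    if (strncmp(line, "GET ", 4) != 0)
        return NULL;
    start = line + 4;
    end = strchr(start, ' ');
    if (end == NULL || strncmp(end, " HTTP/", 6) != 0)
        return NULL;
    n = (size_t)(end - start);
    if (n == 0 || n >= sizeof path)
        return NULL;
    memcpy(path, start, n);
    path[n] = '\0';
    return flow_lookup(t, path);
}

/* Sample table with a single static page. */
static const char index_body[] = "HTTP/1.0 200 OK\r\n\r\nhello\n";
static const struct flow_entry demo_entries[] = {
    { "/index.html", index_body, sizeof index_body - 1 },
};
static const struct flow_table demo_table = { demo_entries, 1 };
```

With demo_table, "GET /index.html HTTP/1.0" resolves to the stored region,
while a POST or an unknown path falls through to the back-end.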

I've got a prototype of this method already, and it outperforms threaded
apache by a factor of 50%.  It all makes sense when you sit back and
realise the cache benefits from a single thread, not to mention the coding
short-cuts I can take because I can punt any request that isn't
well-formed to the slower, fully functional, backend.  The backend is
fully threaded (one thread per request) because it's far easier to let
folks extend the server in a threaded programming model... the backend
wouldn't have any problem with a blocking sendfile().  But the front-end
is where sendfile() would be of the most use... right now it's a typical
poll()/write() implementation.

Food for thought... glad to see someone is thinking about sendfile() :) 

Dean

From: ch...@cybernet.co.nz (Chris Wedgwood)
Subject: Re: Thread implementations...
Date: 1998/06/24
Message-ID: <19980624150059.A9649@caffeine.ix.net.nz>#1/1
X-Deja-AN: 366064034
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <19980624125030.A8986@caffeine.ix.net.nz> 
Newsgroups: muc.lists.linux-kernel

On Tue, Jun 23, 1998 at 06:37:58PM -0700, Dean Gaudet wrote:

> It occurred to me last night that sendfile() may not be the best thing...

It's probably not. I'm not even sure if sendfile belongs in the kernel (well,
not initially, long term it probably does), but it probably does need
implementing at some point as most other OS's have or will have some variant
of it.

> my latest scheme for speeding up apache involves what I'm calling "HTTP
> flows", and the short story is that the web server has a front-end and a
> back-end.  The front-end is extremely light, dumb, and single threaded;
> the back-end is full featured, and looks almost the same as current
> apache.

I've looked at the code and stuff. Looks pretty nice, but my head still
needs twisting before I can get my mind completely around it.

How does this scale for n processors, n frontends?

> The front-end handles only well-formed HTTP requests and only requests
> that fit patterns that the back-end has fed it.  In its simplest form it's
> a mapping from URL to mmap-region/FD (but it can handle far more than just
> these static-only servers).  If sendfile() is blocking I can't use it for
> this.

sendfile needn't be blocking, but the question is, under which conditions
should sendfile block?

For something like (a la HP-UX):

      ssize_t sendfile(int s, int fd, off_t offset, size_t nbytes,
              const struct iovec *hdtrl, int flags);

where s is the NETWORK socket, fd is the FILESYSTEM file descriptor.

Now, if both s and fd are set non-blocking, then logically, sendfile
shouldn't block, if s and fd are set to block, then logically it should
block. 

But, what if s is blocking and fd isn't, or vice versa? I would say here we
are entitled (and perhaps should be required) to block, but it's not terribly
clear what is logical in this instance.

Oh, logically being defined as what I think makes sense. YMMV.
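
For comparison, the sendfile(2) that Linux later shipped takes a different
argument list from the prototype quoted above: ssize_t sendfile(int out_fd,
int in_fd, off_t *offset, size_t count), declared in <sys/sendfile.h>.  The
sketch below assumes that interface and a kernel that accepts a socket as
out_fd; send_whole_file and sendfile_demo are made-up names.  It copes with
a non-blocking socket by waiting in poll() on EAGAIN, while the file side
may still block inside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send the first count bytes of in_fd to out_fd.  Works whether or not
 * out_fd is non-blocking.  Returns 0 on success, -1 on error or if the
 * file is shorter than count.
 */
static int send_whole_file(int out_fd, int in_fd, size_t count)
{
    off_t offset = 0;

    assert(out_fd >= 0 && in_fd >= 0);
    while ((size_t)offset < count) {
        ssize_t n = sendfile(out_fd, in_fd, &offset, count - (size_t)offset);
        if (n > 0)
            continue;                   /* the kernel advanced offset */
        if (n == 0)
            return -1;                  /* file ended before count bytes */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            struct pollfd p = { .fd = out_fd, .events = POLLOUT };
            if (poll(&p, 1, -1) < 0 && errno != EINTR)
                return -1;
        } else if (errno != EINTR) {
            return -1;
        }
    }
    return 0;
}

/* Send a scratch file over a socketpair; returns 0 if it arrives intact. */
static int sendfile_demo(void)
{
    static const char msg[] = "served by sendfile";
    const size_t len = sizeof msg - 1;
    char path[] = "/tmp/sfdemoXXXXXX";   /* hypothetical scratch file */
    char got[sizeof msg] = { 0 };
    int sv[2], fd, rc = -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        close(fd);
        return -1;
    }
    if (write(fd, msg, len) == (ssize_t)len
        && send_whole_file(sv[0], fd, len) == 0
        && recv(sv[1], got, len, MSG_WAITALL) == (ssize_t)len
        && memcmp(got, msg, len) == 0)
        rc = 0;
    close(fd);
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

In this interface only the socket's blocking mode matters to the caller;
there is no separate way to ask for a non-blocking read of the file.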

> The backend is fully threaded (one thread per request) because it's far
> easier to let folks extend the server in a threaded programming model...

One thread/request? 

I assume this means when I send "GET /index.html HTTP/0.9" it wakes up one
thread (from a preallocated pool), does the work, then sleeps (returning the
thread) ?

> the backend wouldn't have any problem with a blocking sendfile().  But the
> front-end is where sendfile() would be of the most use... right now it's a
> typical poll()/write() implementation.
> 
> Food for thought... glad to see someone is thinking about sendfile() :) 

As mentioned above, if async. IO can be done (at least in part) in
userspace, then I think sendfile should probably be implemented at the libc
level to start with. 

Sure, this partially defeats the purpose, but then
again some buffer-cache + vm tweaks may make it quite viable.

-Chris

From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch)
Subject: Re: Thread implementations...
Date: 1998/06/24
Message-ID: <199806240915.TAA09504@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 366064016
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <19980624150059.A9649@caffeine.ix.net.nz>
Newsgroups: muc.lists.linux-kernel


Chris Wedgwood writes:
> On Tue, Jun 23, 1998 at 06:37:58PM -0700, Dean Gaudet wrote:
> 
> > It occurred to me last night that sendfile() may not be the best thing...
> 
> It's probably not. I'm not even sure if sendfile belongs in the kernel (well,
> not initially, long term it probably does), but it probably does need
> implementing at some point as most other OS's have or will have some variant
> of it.
> 
> > my latest scheme for speeding up apache involves what I'm calling "HTTP
> > flows", and the short story is that the web server has a front-end and a
> > back-end.  The front-end is extremely light, dumb, and single threaded;
> > the back-end is full featured, and looks almost the same as current
> > apache.
> 
> I've looked at the code and stuff. Looks pretty nice, but my head still
> needs twisting before I can get my mind completely around it.
> 
> How does this scale for n processors, n frontends?
> 
> > The front-end handles only well-formed HTTP requests and only requests
> > that fit patterns that the back-end has fed it.  In its simplest form it's
> > a mapping from URL to mmap-region/FD (but it can handle far more than just
> > these static-only servers).  If sendfile() is blocking I can't use it for
> > this.
> 
> sendfile needn't be blocking, but the question is, under which conditions
> should sendfile block?
> 
> For something like (a la HP-UX):
> 
>       ssize_t sendfile(int s, int fd, off_t offset, size_t nbytes,
>               const struct iovec *hdtrl, int flags);
> 
> where s is the NETWORK socket, fd is the FILESYSTEM file descriptor.
> 
> Now, if both s and fd are set non-blocking, then logically, sendfile
> shouldn't block, if s and fd are set to block, then logically it should
> block. 
> 
> But, what if s is blocking and fd isn't, or vice versa? I would say here we
> are entitled (and perhaps should be required) to block, but it's not terribly
> clear what is logical in this instance.
> 
> Oh, logically being defined as what I think makes sense. YMMV.
> 
> > The backend is fully threaded (one thread per request) because it's far
> > easier to let folks extend the server in a threaded programming model...
> 
> One thread/request? 
> 
> I assume this means when I send "GET /index.html HTTP/0.9" it wakes up one
> thread (from a preallocated pool), does the work, then sleeps (returning the
> thread) ?
> 
> > the backend wouldn't have any problem with a blocking sendfile().  But the
> > front-end is where sendfile() would be of the most use... right now it's a
> > typical poll()/write() implementation.
> > 
> > Food for thought... glad to see someone is thinking about sendfile() :) 
> 
> As mentioned above, if async. IO can be done (at least in part) in
> userspace, then I think sendfile should probably be implemented at the libc
> level to start with. 

Why bother with sendfile() if you have aio_*() available? sendfile()
is a trivial wrapper to aio_*().

				Regards,

					Richard....

From: dgaudet-list-linux-ker...@arctic.org (Dean Gaudet)
Subject: Re: Thread implementations...
Date: 1998/06/24
Message-ID: <Pine.LNX.3.96dg4.980624025515.26983E-100000@twinlark.arctic.org>#1/1
X-Deja-AN: 366064026
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806240915.TAA09504@vindaloo.atnf.CSIRO.AU>
Newsgroups: muc.lists.linux-kernel

On Wed, 24 Jun 1998, Richard Gooch wrote:

> Why bother with sendfile() if you have aio_*() available? sendfile()
> is a trivial wrapper to aio_*().

aio_* are user space.  So they use either read() or mmap() to get the data
to be sent... which are the methods already available to apps, so there's
no need to use aio. 

read() is painful because it involves an extra copy of the data --
although that could be optimized by putting page flipping into the kernel,
and writing the app to ensure it uses page aligned buffers.  read() cannot
exercise the hardware to its fullest. 
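
The extra copy is visible in the plain read()/write() loop, sketched here
with an assumed 1 KB bounce buffer and blocking descriptors
(copy_read_write and copy_demo are illustrative names): every byte is
copied from the kernel into buf by read() and back into the kernel by
write().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy everything from in_fd to out_fd through a user-space buffer.
 * Assumes both descriptors are blocking.  Returns bytes copied or -1.
 */
static ssize_t copy_read_write(int out_fd, int in_fd)
{
    char buf[1024];     /* the bounce buffer: kernel -> here -> kernel */
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;               /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0 && errno == EINTR)
                continue;
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
}

/* Push a known pattern through two pipes; returns 0 if it arrives intact. */
static int copy_demo(void)
{
    char src[3000], dst[3000];
    int in[2], out[2];
    size_t got = 0;
    ssize_t n;

    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (char)('a' + i % 26);
    if (pipe(in) < 0 || pipe(out) < 0)
        return -1;
    if (write(in[1], src, sizeof src) != (ssize_t)sizeof src)
        return -1;
    close(in[1]);                       /* the copy loop will see EOF */
    n = copy_read_write(out[1], in[0]);
    close(out[1]);
    while (got < sizeof dst) {
        ssize_t r = read(out[0], dst + got, sizeof dst - got);
        if (r <= 0)
            break;
        got += (size_t)r;
    }
    close(in[0]);
    close(out[0]);
    assert(got <= sizeof dst);
    return (n == (ssize_t)sizeof src && got == sizeof src
            && memcmp(src, dst, sizeof src) == 0) ? 0 : -1;
}
```

Page flipping, as suggested above, would aim to remove the copy into buf;
this loop makes no attempt at that.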

mmap() is painful when your working set exceeds the RAM available because
it doesn't readahead more than a page.  read() does 4 page readahead (I
think these are the numbers), and outperforms mmap() in this situation. 
DavidM gave me a patch to improve things... but they're still not quite at
the level that read() is at... and read() isn't at the level the hardware
can handle.

sendfile() could be used to give a huge hint to the kernel about the
nature of the data to be sent... so the kernel can make better judgements
about when to readahead, and what to throw away in low memory situations. 
It isn't terribly necessary if the mmap() readahead problem is solved, but
DavidM made it sound like that was an icky problem to solve.

The main reason you want mmap() (or sendfile()) over read() is to be able
to perform single-copy and zero-copy TCP.  read() with page-flipping is
another way to do it, but I really don't know the difficulty.

Dean

From: Richard.Go...@atnf.CSIRO.AU (Richard Gooch)
Subject: Re: Thread implementations...
Date: 1998/06/24
Message-ID: <199806241213.WAA10661@vindaloo.atnf.CSIRO.AU>#1/1
X-Deja-AN: 366116036
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <Pine.LNX.3.96dg4.980624025515.26983E-100000@twinlark.arctic.org>
Newsgroups: muc.lists.linux-kernel

Dean Gaudet writes:
> 
> 
> On Wed, 24 Jun 1998, Richard Gooch wrote:
> 
> > Why bother with sendfile() if you have aio_*() available? sendfile()
> > is a trivial wrapper to aio_*().
> 
> aio_* are user space.  So they use either read() or mmap() to get the data
> to be sent... which are the methods already available to apps, so there's
> no need to use aio. 

OK, you're looking from the point of view of squeezing out more
performance.
Whether aio_*() is implemented in user-space or kernel-space probably
makes very little difference.

> read() is painful because it involves an extra copy of the data --
> although that could be optimized by putting page flipping into the kernel,
> and writing the app to ensure it uses page aligned buffers.  read() cannot
> exercise the hardware to its fullest. 
> 
> mmap() is painful when your working set exceeds the RAM available because
> it doesn't readahead more than a page.  read() does 4 page readahead (I
> think these are the numbers), and outperforms mmap() in this situation. 
> DavidM gave me a patch to improve things... but they're still not quite at
> the level that read() is at... and read() isn't at the level the hardware
> can handle.

That could be fixed with some decent flags for madvise(2). We could do
with that anyway for other applications.

> sendfile() could be used to give a huge hint to the kernel about the
> nature of the data to be sent... so the kernel can make better judgements
> about when to readahead, and what to throw away in low memory situations. 
> It isn't terribly necessary if the mmap() readahead problem is solved, but
> DavidM made it sound like that was an icky problem to solve.

I think the madvise(2) problem needs to be solved in any case.

> The main reason you want mmap() (or sendfile()) over read() is to be able
> to perform single-copy and zero-copy TCP.  read() with page-flipping is
> another way to do it, but I really don't know the difficulty.

If we get madvise(2) right, we don't need sendfile(2), correct?

				Regards,

					Richard....

From: ch...@cybernet.co.nz (Chris Wedgwood)
Subject: Re: Thread implementations...
Date: 1998/06/25
Message-ID: <19980625161310.B22513@caffeine.ix.net.nz>#1/1
X-Deja-AN: 366135313
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806240915.TAA09504@vindaloo.atnf.CSIRO.AU> 
Newsgroups: muc.lists.linux-kernel

On Wed, Jun 24, 1998 at 10:13:57PM +1000, Richard Gooch wrote:

> If we get madvise(2) right, we don't need sendfile(2), correct?

It would probably suffice. In fact, having a working implementation of
madvise, etc. would make sendfile pretty trivial to do in libc. (Again, I'm
assuming that whether or not we need it, if it can be implemented in
userspace then why not...)

-Chris

From: torva...@transmeta.com (Linus Torvalds)
Subject: Re: Thread implementations...
Date: 1998/06/25
Message-ID: <6msk1d$n4$1@palladium.transmeta.com>#1/1
X-Deja-AN: 366135316
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <199806240915.TAA09504@vindaloo.atnf.CSIRO.AU> 
Organization: Transmeta Corporation, Santa Clara, CA
Newsgroups: muc.lists.linux-kernel

In article <19980625161310.B22...@caffeine.ix.net.nz>,
Chris Wedgwood  <ch...@cybernet.co.nz> wrote:
>On Wed, Jun 24, 1998 at 10:13:57PM +1000, Richard Gooch wrote:
>
>> If we get madvise(2) right, we don't need sendfile(2), correct?
>
>It would probably suffice. In fact, having a working implementation of
>madvise, etc. would make sendfile pretty trivial to do in libc. (Again, I'm
>assuming that whether or not we need it, if it can be implemented in
>userspace then why not...)

However, the thing to notice is that a "sendfile()" system call can
potentially be a lot faster than anything else.  In particular, it can
be as clever as it wants about sending stuff directly from kernel
buffers etc. 

I know there are a lot of people who think zero-copying is cool, and
that tricks with mmap() etc can be used to create zero-copy.  But don't
forget that it's a major mistake to think that performance is about
whether the algorithm is O(1) or O(n) or O(n^2).  People tend to forget
the constant factor, and look blindly at other things. 

In particular, doing a mmap() itself is fairly expensive.  It implies a
lot of bookkeeping, and it also implies a fair amount of mucking around
with CPU VM issues (TLBs, page tables etc).  In short, it can be rather
expensive. 

Due to that expense, things that use mmap() often have a "cache" of
mappings that they have active.  That gets rid of one expense, but then
there is the new expense of maintaining that cache (and it can be a
fairly costly thing to maintain if you want to do a threaded webserver). 

In contrast, a "sendfile()" approach can be extremely light-weight, and
threads much better because it doesn't imply the same kinds of
maintenance. 
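
For comparison, a minimal sketch of the sendfile() style of transfer, using
Python's os.sendfile wrapper on Linux. The helper name send_file is
hypothetical; the point is only that the caller keeps no mapping and no
buffer of its own, so there is no per-thread cache to maintain.

```python
import os
import socket


def send_file(sock: socket.socket, path: str) -> int:
    """Push a file down a connected socket with sendfile(2); returns bytes sent.

    Hypothetical helper, assuming a blocking socket on Linux.
    """
    with open(path, "rb") as f:
        remaining = os.fstat(f.fileno()).st_size
        offset = 0
        while remaining > 0:
            # The kernel moves the data from the file to the socket itself;
            # nothing is read into this process.
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, remaining)
            if sent == 0:
                break  # the file shrank underneath us
            offset += sent
            remaining -= sent
        return offset
```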

Now, I'm no NT person, but I suspect that we actually do want to have a
"sendfile()" kind of thing just because it should be fairly easy to
implement, and would offer some interesting performance advantages for
some cases.  No, it's not truly generic, but it is useful enough in many
circumstances. 

		Linus


From: e...@arbat.com (Erik Corry)
Subject: Re: Thread implementations...
Date: 1998/06/25
Message-ID: <19980625090558.A1141@arbat.com>#1/1
X-Deja-AN: 366135218
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
Newsgroups: muc.lists.linux-kernel

In article <6msk1d$n...@palladium.transmeta.com> you wrote:

> Now, I'm no NT person, but I suspect that we actually do want to have a
> "sendfile()" kind of thing just because it should be fairly easy to
> implement, and would offer some interesting performance advantages for
> some cases.  No, it's not truly generic, but it is useful enough in many
> circumstances. 

I'm a little curious as to which circumstances you are thinking of.
As far as I can see, it's a syscall for a single application (a
web server serving static objects) which is basically little more
than a benchmark. If you really have such a hugely loaded web server
you are likely to be doing lots of database lookups, cookie-controlled
variable content, shtml, other cgi trickery, etc. And if you really
just want to serve static objects as fast as possible, a round-robin
DNS with multiple servers gets you more robustness and a solution that
scales above Ethernet speeds.

Would we just be doing this to look good against NT in webstones?

-- 
Erik Corry


From: torva...@transmeta.com (Linus Torvalds)
Subject: Re: Thread implementations...
Date: 1998/06/25
Message-ID: <Pine.LNX.3.95.980625094857.27350A-100000@penguin.transmeta.com>#1/1
X-Deja-AN: 366135400
Approved: g...@greenie.muc.de
Sender: muc.de!l-linux-kernel-owner
References: <19980625090558.A1141@arbat.com>
Newsgroups: muc.lists.linux-kernel

On Thu, 25 Jun 1998, Erik Corry wrote:
>
> [ "sendfile()" ]
> 
> I'm a little curious as to which circumstances you are thinking of.
> As far as I can see, it's a syscall for a single application (a
> web server serving static objects) which is basically little more
> than a benchmark.

It's actually perfectly usable for other things too, like ftp servers etc. 

The way I would probably implement it, it would actually work for "cp" as
well - you could "sendfile()" to another file, not just to a socket. 
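
A minimal sketch of the "cp" case, assuming a Linux kernel where the output
descriptor of sendfile(2) may be a regular file (2.6.33 or later, per the
man page). The helper name copy_file is hypothetical.

```python
import os


def copy_file(src: str, dst: str) -> int:
    """Copy src to dst with sendfile(2), file to file; returns bytes copied.

    Hypothetical helper; dst is created or truncated.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        remaining = os.fstat(fin.fileno()).st_size
        offset = 0
        while remaining > 0:
            # Same call as for a socket; only the output descriptor differs.
            sent = os.sendfile(fout.fileno(), fin.fileno(), offset, remaining)
            if sent == 0:
                break  # source shrank while we were copying
            offset += sent
            remaining -= sent
    return offset
```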

>		 If you really have such a hugely loaded web server
> you are likely to be doing lots of database lookups, cookie-controlled
> variable content, shtml, other cgi trickery, etc.

My personal observation has been that most webservers do mostly static
stuff, with a small percentage of dynamic behaviour. For example, even if
they have lots of CGI etc, often a big part of the page (bandwidth-wise)
tends to be pictures etc.

>						 And if you really
> just want to serve static objects as fast as possible, a round-robin
> DNS with multiple servers gets you more robustness and a solution that
> scales above Ethernet speeds.

That works if you have a _completely_ static setup. Which is one common
thing to have, but at the same time it is certainly not what most people
want to have.

> Would we just be doing this to look good against NT in webstones?

We want to do that too. I don't think it's only that, though. The Apache
people get some impressive numbers out of Linux, but when I talk to Dean
Gaudet I also very often get the feeling that in order to get better
numbers they have to do really bad things, and those things are going to
slow them down in many circumstances.

One thing is actually the latency of setting up a small transfer. This
sounds unimportant, but it's actually fairly important in order to do well
under load: the lower your latency, the less likely you are to get into
the bad situation where you have lots of outstanding requests and, while
you serve those, new requests arrive at the same rate, so that past a
certain load you never make any progress.
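
A back-of-the-envelope model of the load argument above, as a sketch only:
it assumes a constant arrival rate and a fixed per-request cost, both
simplifications, and the function name and figures are hypothetical. Once
requests arrive faster than 1/latency per worker, the backlog grows without
bound.

```python
def backlog_after(seconds: float, arrivals_per_sec: float,
                  latency_sec: float, workers: int = 1) -> float:
    """Outstanding requests after `seconds` under a constant-rate model."""
    # Each worker completes at most 1/latency requests per second.
    capacity_per_sec = workers / latency_sec
    # Below capacity the queue drains as fast as it fills; above it, the
    # excess piles up linearly with time.
    return max(0.0, (arrivals_per_sec - capacity_per_sec) * seconds)
```

With 100 requests per second, a 5 ms setup cost keeps up (capacity 200/s),
while a 20 ms cost (capacity 50/s) leaves 500 requests outstanding after
ten seconds.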

That's one reason I don't like mmap() - it has horrible latency. mmap()
under Linux is fast, but it's really slow compared to what _could_ be
done. Similarly, "read()+write()" implies using user-space buffers, which
implies a certain amount of memory management and certainly bad
utilization of memory that could be better used for caching something.
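
For reference, a minimal sketch of the "read()+write()" pattern described
above. The helper name and the 64 KiB buffer size are assumptions; the
point is the user-space buffer that every byte passes through, which the
process has to allocate and manage.

```python
import os

BUF_SIZE = 64 * 1024  # assumed size; this memory is held by the process


def copy_via_user_buffer(in_fd: int, out_fd: int) -> int:
    """Copy in_fd to out_fd through a user-space buffer; returns bytes copied."""
    buf = bytearray(BUF_SIZE)
    view = memoryview(buf)
    total = 0
    while True:
        n = os.readv(in_fd, [buf])  # kernel -> user-space copy
        if n == 0:
            break  # end of file
        written = 0
        while written < n:
            # user-space -> kernel copy; write() may be partial
            written += os.write(out_fd, view[written:n])
        total += n
    return total
```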

And web serving is one of the things a lot of people want. And if they
make their judgements by benchmarks, we'd better be good at them. Never
discount benchmark numbers just because you don't like the benchmark: I
much prefer to go by real numbers than by "feeling".

I know some people that every time they see Linux beating somebody at a
benchmark, they claim that "the benchmark is meaningless, under real load
the issues are different". That's a cop-out. If NT is better than Linux at
something, we'd better look out or have a _really_ good explanation. And I
think webstone is "real enough" that we can't really explain it away.

(I'm not saying NT is faster - I don't actually know the numbers. But I
don't want to be in the situation that it could be faster).

		Linus
