Some very thought-provoking ideas about OS architecture

From: "Eric S. Raymond" <e...@snark.thyrsus.com>
Subject: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/20
Message-ID: <fa.gvbbgev.180mo0k@ifi.uio.no>#1/1
X-Deja-AN: 491626142
Original-Date: Sun, 20 Jun 1999 00:57:35 -0400
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-Id: <199906200457.AAA10204@snark.thyrsus.com>
To: linux-ker...@vger.rutgers.edu
X-Authentication-Warning: snark.thyrsus.com:
    esr set sender to e...@snark.thyrsus.com using -f
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

(Please copy any replies to me explicitly, as I'm not presently subscribed
to the linux-kernel list -- it's not practical when I'm spending so
much time on the road.)

Gents and ladies, I believe I may have seen what comes after
Unix. Not a half-step like Plan 9, but an advance in OS architecture
as fundamental as Multics or Unix was in its day.

As an old Unix hand myself, I don't make this claim lightly; I've
been wrestling with it for a couple of weeks now. Nor am I suggesting
we ought to drop what we're doing and hare off in a new direction.
What I am suggesting is that Linus and the other kernel architects
should be taking a hard look at this stuff and thinking about it. It
may take a while for all the implications to sink in. They're huge.

What comes after Unix will, I now believe, probably resemble at least
in concept an experimental operating system called EROS. Full details
are available at <http://www.eros-os.org/>, but for the impatient I'll
review the high points here.

EROS is built around two fundamental and intertwined ideas. One is
that all data and code persistence is handled directly by the OS.
There is no file system. Yes, I said *no file system*. Instead, 
everything is structures built in virtual memory and checkpointed out
to disk every so often (every five minutes in EROS). Want something?
Chase a pointer to it; EROS memory management does the rest.
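
A minimal sketch of the checkpoint-and-resume idea, as an application-level
analogy only. It is not how EROS works: EROS checkpoints memory pages in the
kernel, while this toy leans on an ordinary file and on pickling, the very
things the design is meant to eliminate. The names CheckpointedStore, demo and
snapshot.bin are hypothetical.

```python
import os
import pickle
import tempfile


class CheckpointedStore:
    """Toy single-process stand-in for an OS-level persistent store."""

    def __init__(self, path):
        self.path = path
        # On "reboot", pick up the last checkpointed state if there is one.
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.root = pickle.load(f)
        else:
            self.root = {}

    def checkpoint(self):
        # Write the whole object graph to a temporary file, then rename it
        # over the old snapshot so a crash never leaves a half-written one.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.root, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "snapshot.bin")
        store = CheckpointedStore(path)
        store.root["inbox"] = ["hello"]
        store.checkpoint()
        # This change happens after the checkpoint and is never saved.
        store.root["inbox"].append("written after the checkpoint")
        # Simulate a crash and reboot by reloading from the snapshot.
        return CheckpointedStore(path).root
```

Calling demo() returns the state as of the last checkpoint; the later append
is lost, which is the trade-off a five-minute checkpoint interval implies.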

The second fundamental idea is that of a pure capability architecture
with provably correct security. This is something like ACLs, except
that an OS with ACLs on a file system has a hole in it; programs can
communicate (in ways intended or unintended) through the file system
that everybody shares access to.

Capabilities plus checkpointing is a combination that turns out to
have huge synergies. Obviously programming is a lot simpler -- no
more hours and hours spent writing persistence/pickling/marshalling
code. The OS kernel is a lot simpler too; I can't find the figure to
be sure, but I believe EROS's is supposed to clock in at about 50K of code.

Here's another: All disk I/O is huge sequential BLTs done as part of
checkpoint operations. You can actually use close to 100% of your
controller's bandwidth, as opposed to the 30%-50% typical for
explicit-I/O operating systems that are doing seeks a lot of the time.
This means the maximum I/O throughput the OS can handle effectively
more than doubles. With simpler code. You could even afford the time
to verify each checkpoint write...
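
The arithmetic behind "more than doubles", as a small sketch. It assumes the
same controller in both cases and takes "close to 100%" as exactly 100%; the
function name throughput_gain is hypothetical.

```python
def throughput_gain(old_utilization, new_utilization=1.0):
    """Ratio of usable bandwidth after vs. before, on the same controller."""
    return new_utilization / old_utilization


for old in (0.30, 0.50):
    print(f"{old:.0%} -> 100%: {throughput_gain(old):.2f}x")
# prints:
# 30% -> 100%: 3.33x
# 50% -> 100%: 2.00x
```

So the gain is 2x at the top of the quoted range and about 3.3x at the bottom.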

Here's a third: Had a crash or power-out? On reboot, the system
simply picks up pointers to the last checkpointed state. Your OS, and
all your applications, are back in thirty seconds. No fscks, ever
again!

And I haven't even talked about the advantages of capabilities over
userids yet. I would, but I just realized I'm running out of time --
gotta get ready to fly to Seattle tomorrow to upset some stomachs
at Microsoft.

www.eros-os.org. Eric sez check it out. Mind-blowing stuff once
you've had a few days to digest it.
-- 
<a href="http://www.tuxedo.org/~esr">Eric S. Raymond</a>

The Bible is not my book, and Christianity is not my religion. I could never
give assent to the long, complicated statements of Christian dogma.
-- Abraham Lincoln

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

From: Rik van Riel <r...@nl.linux.org>
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/20
Message-ID: <fa.lhe9p7v.6gqp8q@ifi.uio.no>#1/1
X-Deja-AN: 491695539
Original-Date: Sun, 20 Jun 1999 11:40:51 +0200 (CEST)
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <Pine.LNX.4.03.9906201138390.534-100000@mirkwood.nl.linux.org>
References: <fa.gvbbgev.180mo0k@ifi.uio.no>
To: "Eric S. Raymond" <e...@snark.thyrsus.com>
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Search-Engine-Bait: http://humbolt.nl.linux.org/
X-Orcpt: rfc822;linux-kernel-outgoing-dig
X-My-Own-Server: http://www.nl.linux.org/
Organization: Internet mailing list
MIME-Version: 1.0
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

On Sun, 20 Jun 1999, Eric S. Raymond wrote:

> What comes after Unix will, I now believe, probably resemble at
> least in concept an experimental operating system called EROS. 
> Full details are available at <http://www.eros-os.org/>, but for
> the impatient I'll review the high points here.

Unfortunately, EROS is still based on the PC hardware as
we've got it today and not modeled after a JINI-like
appliances model (the network is the computer).

With the death of the monolithic computer (if it happens)
will come the death of Unix, Windows _and_ EROS.

At the moment I can see only one Open Source system that
could become ready for a world like that. Alliance OS
(http://www.allos.org/).

cheers,

Rik -- Open Source: you deserve to be in control of your data.
+-------------------------------------------------------------------+
| Le Reseau netwerksystemen BV: http://www.reseau.nl/ |
| Linux Memory Management site: http://www.linux.eu.org/Linux-MM/ |
| Nederlandse Linux documentatie: http://www.nl.linux.org/ |
+-------------------------------------------------------------------+

From: torva...@transmeta.com (Linus Torvalds)
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/20
Message-ID: <fa.j0u16ev.cm0d9i@ifi.uio.no>
X-Deja-AN: 491808056
Original-Date: 20 Jun 1999 17:52:33 GMT
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <7kj9p1$fqn$1@palladium.transmeta.com>
References: <fa.lhe9p7v.6gqp8q@ifi.uio.no>
To: linux-ker...@vger.rutgers.edu
Original-References: <199906200457.AAA10...@snark.thyrsus.com>
    <Pine.LNX.4.03.9906201138390.534-100...@mirkwood.nl.linux.org>
X-Authentication-Warning: palladium.transmeta.com:
    bin set sender to n...@transmeta.com using -f
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Transmeta Corporation, Santa Clara, CA
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

In article <Pine.LNX.4.03.9906201138390.534-100...@mirkwood.nl.linux.org>,
Rik van Riel <r...@nl.linux.org> wrote:
>
>Unfortunately, EROS is still based on the PC hardware as
>we've got it today and not modeled after a JINI-like
>appliances model (the network is the computer).
>
>With the death of the monolithic computer (if it happens)
>will come the death of Unix, Windows _and_ EROS.

That's a classic thing said by "OS Research People".

And it's complete crap and idiocy, and I'm finally going to stand up and
ask people to THINK instead of repeating the old and stinking dogma. 

It's _much_ better to have stand-alone appliances that can work well in
a networked environment than to have a networked appliance. 

I don't understand people who think that "distribution" implies
"collective". A distributed system should _not_ be composed of mindless
worker ants that only work together with other mindless worker ants.

A distributed system should be composed of individual stand-alone
systems that can work together. They should be real systems in their
own right, and have the power to make their own decisions. Too many
distributed OS projects are thinking "bees in a hive" - while what you
should aim for is "humans in society". 

I'll take humans over bees any day. Real OS's, with real operating
systems. Monolithic, because they CAN stand alone, and in fact do most
of their stuff without needing hand-holding every single minute. 
General-purpose instead of being able to do just one thing. 

>At the moment I can see only one Open Source system that
>could become ready for a world like that. Alliance OS
>(http://www.allos.org/).

I will tell you anything based on message passing is stupid. It's very
simple:

- if you end up doing remote communication, the largest overhead is in
the communication, not in how you initiate it. This is only going to
be more true with mobile computing, not less. 

Ergo: optimizing for message passing is stupid. You should _always_
optimize for the local case, because it's the only case where the
calling protocol really matters - once you go remote you have time to
massage the arguments any which way you like.

- Most operations are going to be local. Any operating system that
starts out from the notion that most operations are going to be
remote is going to die off as computers get more and more powerful.

Things may start off distributed, but in the end network bandwidth is
always going to be more expensive than CPU power.

- Truly mobile computing implies that a noticeable portion of the time
you do _not_ want to be in contact with any other computers. Your
computer had better be a very capable one even on its own. Anybody
who thinks anything else is just unbelievably misguided.

This implies that your computer had better have a local filesystem,
and had better be designed to work as well without any connectivity
as it does _with_ connectivity. It can't communicate, but that
shouldn't mean that it can't work.

So right now people are pointing at PDA's, and saying that they should
be running a "light" OS, all based on message passing, because obviously
all the real work would be done on a server. It makes sense, no?

NO. It does NOT make sense. People used to say the same thing about
workstations: workstations used to be expensive and not quite powerful
enough, and people wanted to have more than one. Where are those people
today? Face it, the hardware just got so much better that suddenly REAL
operating systems didn't have any of the alleged downsides, and while
you obviously want the ability to communicate, you should not think that
that is what you optimize for. 

The same is going to happen in the PDA space. Right now we have PalmOS. 
It's already doing internet connectivity, how much do you want to bet
that in the not too distant future they'll want to offer more and more?
There is no technical reason why a Palm in a few years won't have a few
hundred megs of RAM and a CPU that is quite equipped to handle a real
OS. (If they had selected the strongarm instead of a cut-down 68k it
would already). 

In short: message passing as the fundamental operation of the OS is just
an exercise in computer science masturbation. It may feel good, but
you don't actually get anything DONE. Nobody has ever shown that it
made sense in the real world. It's basically just much simpler and
saner to have a function call interface, and for operations that are
non-local it gets transparently _promoted_ to a message. There's no
reason why it should be considered to be a message when it starts out. 
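
A minimal sketch of a call that is "promoted" to a message only when the
target is not local. Everything here is hypothetical (Calendar, RemoteStub,
the JSON wire format, the loopback "network"); it only shows that the caller
can be written against a plain function call interface either way.

```python
import json


class Calendar:
    """An ordinary local object with an ordinary method."""

    def __init__(self):
        self.entries = []

    def add(self, day, what):
        self.entries.append((day, what))
        return len(self.entries)


class RemoteStub:
    """Same call interface, but each call is turned into a message."""

    def __init__(self, transport):
        self.transport = transport  # assumption: any callable bytes -> bytes

    def add(self, day, what):
        request = json.dumps({"op": "add", "args": [day, what]}).encode()
        reply = self.transport(request)
        return json.loads(reply.decode())["result"]


def make_loopback_server(target):
    """Stand-in for a network: decodes a message and calls the real object."""
    def handle(request):
        msg = json.loads(request.decode())
        result = getattr(target, msg["op"])(*msg["args"])
        return json.dumps({"result": result}).encode()
    return handle


def schedule(cal):
    # The caller neither knows nor cares whether cal is local or a stub.
    return cal.add("1999-06-21", "meeting")


local = Calendar()
far_away = Calendar()
remote = RemoteStub(make_loopback_server(far_away))
```

schedule(local) is a direct method call with no marshalling at all;
schedule(remote) pays for encoding and decoding only because it has to.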

Linus

From: Rik van Riel <r...@nl.linux.org>
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/20
Message-ID: <fa.lgtnp7v.608p8h@ifi.uio.no>#1/1
X-Deja-AN: 491828237
Original-Date: Sun, 20 Jun 1999 21:06:59 +0200 (CEST)
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <Pine.LNX.4.03.9906202033110.534-100000@mirkwood.nl.linux.org>
References: <fa.j0u16ev.cm0d9i@ifi.uio.no>
To: Linus Torvalds <torva...@transmeta.com>
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Search-Engine-Bait: http://humbolt.nl.linux.org/
X-Orcpt: rfc822;linux-kernel-outgoing-dig
X-My-Own-Server: http://www.nl.linux.org/
Organization: Internet mailing list
MIME-Version: 1.0
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

On 20 Jun 1999, Linus Torvalds wrote:

> >Unfortunately, EROS is still based on the PC hardware as
> >we've got it today and not modeled after a JINI-like
> >appliances model (the network is the computer).
> 
> That's a classic thing said by "OS Research People".

Which I'm not :)

> I don't understand people who think that "distribution" implies
> "collective".
> 
> A distributed system should be composed of individual stand-alone
> systems that can work together.

OK. I can agree with this. Still, Bill Joy can't be altogether
wrong, can he? This is somewhat of a dilemma -- having to choose
between the paradigms of Bill Joy and Linus Torvalds... :)

> >(http://www.allos.org/).
> 
> I will tell you anything based on message passing is stupid. It's
> very simple:
[communications overhead is either in the actual communication
channel or unneeded]

The Alliance OS uses something like a shared library so that
what looks like message passing from higher levels only is
message passing when it needs to be -- otherwise the higher
overhead is avoided.

> - Most operations are going to be local.

Optimizing for the local case doesn't mean that remote operations
can't be made transparent. Because of latency problems, you are
probably right though...

> - Truly mobile computing implies that a noticeable portion of the time
> you do _not_ want to be in contact with any other computers. Your
> computer had better be a very capable one even on its own.

It depends. If a computer is used as a way of getting at information,
then you will want it to be connected. Mobile phones simply aren't
very useful on the north pole, however well they might function on
their own. Computing is more and more about communication and not
about number-crunching or playing games -- which, I agree, can be done
very well without network access.

Even for things like a calendar you will want access to outside
information. If you plan an appointment with someone else, you
need to be able to communicate with each other to agree on a date/time
both of you are able to meet...

> In short: message passing as the fundamental operation of the OS
> is just an exercise in computer science masturbation. It may
> feel good, but you don't actually get anything DONE.

It can help achieve things we can't do with Linux:

Upgrade (parts of) the OS while running.
Since message passing objects are self-contained, you can
replace them more easily than possible with 'classic' OSes.
User process migration and other nice scalability and/or
reliability tricks are also more easily done.

Transparent networking.
While userland can do this in a library, this feature can be
very useful to achieve high availability because it makes
clustering at the filesystem level extremely easy (to name a
thing).

Sandboxing parts of the OS.
Finally it's possible to test new kernel parts without risking
the rest of your system. Debugging a new networking library
(kernel-level, that is) without endangering the rest of the
system to stray pointers. The sandboxing protection can always
be removed later when the new kernel addition has been found
to work properly.

While I agree with your point that computers will be both powerful
enough to do anything and never powerful enough not to need the
maximum level of optimization (this seemed to be what you were
saying and, however paradoxical it may seem, will probably be true
forever), I think you should take a more positive attitude towards
new system concepts.

Even if you don't like them, some very nice and useful spinoffs
could come from the research in those areas. We need diversity
in order to select the best alternative for the job at hand.

I won't try to convince anyone to turn Linux into such a system.
Not only doesn't it make sense, I wouldn't even want Linux if it
became like that -- other systems can take on other roles that
are to be played in the huge playing field that's out there...

Rik -- Open Source: you deserve to be in control of your data.
+-------------------------------------------------------------------+
| Le Reseau netwerksystemen BV: http://www.reseau.nl/ |
| Linux Memory Management site: http://www.linux.eu.org/Linux-MM/ |
| Nederlandse Linux documentatie: http://www.nl.linux.org/ |
+-------------------------------------------------------------------+

From: "Eric S. Raymond" <e...@snark.thyrsus.com>
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/20
Message-ID: <fa.gtrpfuv.1ag8pgq@ifi.uio.no>
X-Deja-AN: 491726329
Original-Date: Sun, 20 Jun 1999 08:39:05 -0400
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-Id: <199906201239.IAA11200@snark.thyrsus.com>
To: a...@lxorguk.ukuu.org.uk
X-Authentication-Warning: snark.thyrsus.com:
    esr set sender to e...@snark.thyrsus.com using -f
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

(Apologies for losing the thread ID. Alan's mail to me bounced.)

Alan Cox writes:
> > EROS is built around two fundamental and intertwined ideas. One is
> > that all data and code persistence is handled directly by the OS.
> > There is no file system. Yes, I said *no file system*. Instead,
> > everything is structures built in virtual memory and checkpointed out
> > to disk every so often (every five minutes in EROS). Want something?
> > Chase a pointer to it; EROS memory management does the rest.
> 
> This is actually an old idea. The problem that has never been solved well is
> recovery from errors. You lose 1% of your object store. How do you tidy up.
> 20% of your object store dies in a disk crash, how do you run an fscobject
> tool. You can do it, but you end up back with file system complexity and
> all the other fs stuff.

Accepting your analysis, it still seems to me there's a difference,
though. In an EROS-like world, you would only pay the complexity cost of doing
fscobject-like things in the postmortem analyzer that's trying to
stitch together the remaining pieces. You wouldn't have to pay that same
cost in the kernel for each and every access to persistent stuff; no
namespace management to worry about.

So, yes. An EROS-like architecture has the same error-recovery
problem that fsck addresses. But it appears to me, at least as far as
I've taken the logic, that that problem would be better contained than
in a Unix-like system.

> Another peril is that external interfaces don't always like replay of events.

A much more serious objection, I agree.

> You still end up with a lot of your objects having checkpoint/restart aware
> methods.

Yes, I grant that's true. (The way I'd put it is that you still need
something like commit/rollback in database-land.) But this is a solvable
problem. Butler Lampson showed years ago how to do provably correct
serialization of access to shared critical regions with timestamps
even in the absence of reliable locks. So as long as your
hypothetical user can't futz with the system clock...

> Moving just some objects between systems is fun too. You then get into
> cluster checkpointing, which is a field that requires you wear a pointy hat,
> have a beard and work for SGI or Digital.

Not something I have opinions about -- or am qualified to. :-)

> Their numbers are for a microkernelish core. They are still very good, but
> that includes basically no drivers, no network stack, no graphics and 
> apparently no real checkpoint/restart in the face of corruption. I may be
> wrong on the last item.

You're probably right; I'm told all EROS actually does at this point
is run its own debugging and benchmarking tools. Still, the fact that
the test kernel can be that small is IMO an argument that the design
is sound.

> That nature of I/O is no different. If you always do large sequential
> block writes tell me how it will outperform a conventional OS if only
> a small number of changes in a small number of objects occur.

No seeks to read inodes, because the map from EROS's virtual-space
blocks to disk blocks is almost trivial (essentially the disks get
treated like a honkin' big series of swap volumes). So the disk
access pattern would be quite different, I think.

> Object stores are great models for some applications, that's why libraries
> for doing persistent object stores in application space exist (e.g. Texas)
> 
> Another way to look at this
> 
>                File System          Object Store
>
> Index          Inode Number         Object ID
> Update         Look in directory    Look in an object
>                Find item            Find item location
>                Write(maybe COW)     Write(maybe COW)
> Page In        Look in directory    Look in an object
>                Find item            Find item location
>                Write(maybe COW)     Write(maybe COW)
> Granularity    User controlled      Enforced by OS
> 
> 
> So if I promise to call my inodes object ids, call the directory structure
> "objects" and I have a checkpointing scheme - what is the great new concept.

That, under most circumstances, you don't have to manage persistence
yourself (or to put it more concretely, no explicit disk I/O in most
applications). That's clearly a huge win, even if you end up having to do 
more conventional-looking things in applications that require
commit/rollback.

And it's not clear to me that you do end up there; with one single
added atomic-flush primitive, I think you could use Lampson's
timestamp trickery to do reliable journalling without having to go all
the way to fs-like namespace management.

> o I don't think the object model is the good stuff

Even if you're right...

> o The security model is very very interesting indeed.

...this is still very true.

> o They are making it hard to help them however.

This is indeed true. However, I may have some leverage on a win-win
solution. But that's a topic for another day.

What I'm thinking is this: remember RT-Linux? Suppose the kernel were
a process running over an EROS-like layer...
--- 
<a href="http://www.tuxedo.org/~esr">Eric S. Raymond</a>

To make inexpensive guns impossible to get is to say that you're
putting a money test on getting a gun. It's racism in its worst form.
-- Roy Innis, president of the Congress of Racial Equality (CORE), 1988

From: torva...@transmeta.com (Linus Torvalds)
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/20
Message-ID: <fa.hue96fv.126odbb@ifi.uio.no>#1/1
X-Deja-AN: 491812574
Original-Date: 20 Jun 1999 18:05:28 GMT
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <7kjah8$g4k$1@palladium.transmeta.com>
References: <fa.gtrpfuv.1ag8pgq@ifi.uio.no>
To: linux-ker...@vger.rutgers.edu
Original-References: <199906201239.IAA11...@snark.thyrsus.com>
X-Authentication-Warning: palladium.transmeta.com:
    bin set sender to n...@transmeta.com using -f
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Transmeta Corporation, Santa Clara, CA
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

In article <199906201239.IAA11...@snark.thyrsus.com>,
Eric S. Raymond <e...@snark.thyrsus.com> wrote:
>
>You're probably right; I'm told all EROS actually does at this point
>is run its own debugging and benchmarking tools. Still, the fact that
>the test kernel can be that small is IMO an argument that the design
>is sound.

Why?

What is the correlation between "small" and "good"?

There's seldom any very strong correlation. Often the correlation is
negative.

Linux started out as 10k lines of code. Was that good? It's now 1.5M
lines of code. Is that bad?

Assuming something does the same thing as another, and is more efficient
at doing it (smaller, faster, whatever), that's good. But microkernels
are based on the notion that small is good even if it is NOT capable of
doing the same things: a fundamentally flawed argument.

So mind explaining why you're using that argument?

Linus

From: "Eric S. Raymond" <e...@thyrsus.com>
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/20
Message-ID: <fa.fsltd0v.o6231o@ifi.uio.no>#1/1
X-Deja-AN: 491745814
Original-Date: Sun, 20 Jun 1999 10:04:54 -0400
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <19990620100454.A11271@thyrsus.com>
To: a...@lxorguk.ukuu.org.uk
Content-Type: text/plain; charset=us-ascii
X-Orcpt: rfc822;linux-kernel-outgoing-dig
X-Eric-Conspiracy: There is no conspiracy
Organization: Eric Conspiracy Secret Labs
Mime-Version: 1.0
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

Alan:
> That depends on if you want to consistency check your object store after a
> crash. Unless you journal the object store - which btw is hard. If you have
> two thousand inter-related objects you need to dump the set of them 
> consistently and in a snapshotted state.

I've been thinking about this since your last post. Seems to me the
primitive one needs is the ability to say "This object and all its
dependents need to be written atomically". Not too hard to imagine
how to do that given you already have enough of a VM system to do
copy-on-write. OK, you end up having to allocate in two different
spaces, one with atomicity constraints and one without. But it's
solvable. (See below on why this doesn't mean you end up journaling
everything).
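
A rough in-memory sketch of that primitive, under loud assumptions: Obj,
closure and Store are hypothetical names, "disk" is just a dict, and the
atomic step is a single reference assignment. A real implementation would
have to order its disk writes to get the same all-or-nothing effect.

```python
import copy


class Obj:
    def __init__(self, name, value, deps=()):
        self.name = name
        self.value = value
        self.deps = list(deps)


def closure(root):
    """Return root plus everything reachable through .deps, once each."""
    seen, order, stack = set(), [], [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue  # already visited; this also makes cycles safe
        seen.add(id(obj))
        order.append(obj)
        stack.extend(obj.deps)
    return order


class Store:
    """Toy 'disk': a dict of committed values, replaced in a single step."""

    def __init__(self):
        self.committed = {}

    def write_atomically(self, root):
        # Build the new state off to the side (the copy-on-write part) ...
        staged = dict(self.committed)
        for obj in closure(root):
            staged[obj.name] = copy.deepcopy(obj.value)
        # ... then publish it with one reference assignment, so a reader
        # sees either all of the group's new values or none of them.
        self.committed = staged
```

Objects outside the closure are simply carried over unchanged, which is the
sense in which only part of the store needs the atomicity constraint.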

> Why is having persistence managed by a library that is playing guessing games
> of your intent a good idea ? It has to know about object relationships, 
> potentially it has to blindly snapshot the entire system. It has to do a lot
> of work to know in detail what has changed.

For the *exact same* reasons that automatic memory management with
garbage collection is preferable to slinging your own buffers. Perl
and Python and Tcl are on the rise because, outside the kernel, accepting
all that complexity and the potential for buffer overruns just doesn't
make any damn sense with clocks and memory as cheap as they are now.

Remember, the name of the game in OS design is really to optimize for
least complexity overhead for the *application programmer* and *user*.
If this means accepting a marginally more complex and less efficient
OS substructure (like the difference between a journaled object store
and a file system with explicit I/O) then that's fine. But in fact I
think Shapiro makes strong arguments that an object store, done
properly, is *more* efficient.

> So all you have to do is export every object that this object refers to. Like
> the windowing environment, whoops oh dear.

Now you know it's not that bad in practice. Not all object references are
pointers. Some are capabilities and cookies that are persistent without
prearrangement. That's especially likely to be true of OS services, and 
especially if you design your API with that in mind.

> Suppose Eros was just a set of persistent object libraries that ran on
> top of numerous other platforms too, could be downloaded off the net and
> pretty well within the limits of the "programmer lazy, do more work than
> worked needed" paradigm.
> 
> ftp://ftp.cs.utexas.edu/pub/garbage/texas/README
> 
> And that is demonstrably the right way up. If you put a "lazy programmer"
> system at the bottom of an environment you prevent the smart programmer doing
> smart things. If your bottom layer is fundamentally ignorant of programmer
> provided clues you cripple the smart.

If that's true, why is Perl a success?

That's not intended to be a snarky question. Your argument here is
essentially the argument for malloc(3) as opposed to unlimited-extent
types and garbage collection. And the answer is the same: there comes
a point where the value of the optimization you can do with hints no
longer pays for the complexity overhead of having to do the storage
management yourself.

The EROS papers implicitly argue that we've reached that point not
just in memory management but with respect to the entire persistence 
problem. I'm inclined to agree with them.

At the very least, it's something that I think we'd all be better off
doing a little forward thinking about. As I said at the beginning of
the thread, I'm not after changing the whole architecture of Linux
right away; that would be silly and futile. But this exchange will
have achieved my purposes if it only plants a few conceptual seeds.
-- 
<a href="http://www.tuxedo.org/~esr">Eric S. Raymond</a>

"The right of self-defense is the first law of nature: in most
governments it has been the study of rulers to confine this right
within the narrowest limits possible. Wherever standing armies
are kept up, and when the right of the people to keep and bear
arms is, under any color or pretext whatsoever, prohibited,
liberty, if not already annihilated, is on the brink of
destruction." 
-- Henry St. George Tucker (in Blackstone's Commentaries)

From: sh...@us.ibm.com
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/21
Message-ID: <fa.iaqmbav.1d222p4@ifi.uio.no>#1/1
X-Deja-AN: 492165399
Original-Date: Mon, 21 Jun 1999 13:04:42 -0400
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <85256797.005DDAA9.00@D51MTA03.pok.ibm.com>
To: Steve Underwood <ste...@netpage.com.hk>
Content-Type: text/plain; charset=us-ascii
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
X-Lotus-FromDomain: IBMUS
Mime-Version: 1.0
Newsgroups: fa.linux.kernel
Content-Disposition: inline
X-Loop: majord...@vger.rutgers.edu

> Linus Torvalds wrote:
> > In short: message passing as the fundamental operation of the OS is just
> > an exercise in computer science masturbation. It may feel good, but
> > you don't actually get anything DONE. Nobody has ever shown that it
> > made sense in the real world. It's basically just much simpler and
> > saner to have a function call interface, and for operations that are
> > non-local it gets transparently _promoted_ to a message. There's no
> > reason why it should be considered to be a message when it starts out.

With due libations to the Gods here, Linus is mistaken on all counts.

Moving a message from hither to yon *does* accomplish something: it moves a unit
of work from one protection/encapsulation domain to another. This may not be
necessary in your application, but it is vitally important in some. The claim
that nobody has ever shown benefit is also inaccurate. A considerable amount of
open literature on fault tolerant software exists to support the value of
message passing in certain applications. Consider in particular all of the
research reports out of Tandem. Also, note that all of the operating systems
whose software MTBF exceeds 1 yr make heavy use of protection domains.

More important, from my perspective, is that the comment about procedure calls
confuses the API for the semantics. Let's do an example. Consider the UNIX
read call read(fd, buf, sz) [I may have gotten the arg order wrong. It
doesn't matter]. Assume for a moment that we are implementing a single machine
system.

From an implementation perspective, there is absolutely NO performance
difference between the implementation of

read(fd,buf,sz)
and
fd->CALL(OP_READ,buf,sz)

The order of demultiplexing changes -- the read() call does the operation first
and the descriptor second, while the CALL does the descriptor type first and the
operation second, but precisely the same information is passed across the
user/supervisor boundary, and several implementations exist to show that they
are equivalently efficient.

Given this, there are compelling arguments for the second API:

1. By changing the order of demultiplexing, it offers the option of remoting at
a later time.
2. It allows objects to implement non-identical system call interfaces. This is
easily abused, but sometimes extremely valuable.
3. It offers the option of depriving the program of the ability to perform I/O
calls by ensuring that it has no objects that support I/O.

So: even if you think that message passing is not the way you wish to implement
things, object based APIs offer greater flexibility of implementation, and this
is generally a good thing.
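
The two demultiplexing orders described above can be sketched in C as follows. This is only an illustration under stated assumptions: the names struct object, op_fn, mem_ops, sys_read and sys_call are hypothetical and belong to no real kernel. Both entry points receive the same information and end up in the same handler; only the order of the lookups differs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical operation codes; the names are illustrative only. */
enum op { OP_READ, OP_WRITE, OP_COUNT };

struct object;
typedef long (*op_fn)(struct object *self, void *buf, size_t sz);

/* Each object carries its own operation table, indexed by enum op. */
struct object {
    const op_fn *ops;
    const char  *data;   /* backing bytes for this toy in-memory object */
    size_t       len;
};

static long mem_read(struct object *self, void *buf, size_t sz)
{
    size_t n = sz < self->len ? sz : self->len;
    memcpy(buf, self->data, n);
    return (long)n;
}

/* The OP_WRITE slot stays NULL: this object simply does not support it. */
static const op_fn mem_ops[OP_COUNT] = { [OP_READ] = mem_read };

/* Operation first: the entry point already means "read", then the
 * descriptor selects the object. */
long sys_read(struct object *table[], size_t ntable, int fd,
              void *buf, size_t sz)
{
    if (fd < 0 || (size_t)fd >= ntable || table[fd] == NULL)
        return -1;
    struct object *o = table[fd];
    if (o->ops[OP_READ] == NULL)
        return -1;
    return o->ops[OP_READ](o, buf, sz);
}

/* Object first: the object is selected, then the operation code is
 * demultiplexed against that object's own table. */
long sys_call(struct object *o, enum op op, void *buf, size_t sz)
{
    if ((unsigned)op >= OP_COUNT || o->ops[op] == NULL)
        return -1;
    return o->ops[op](o, buf, sz);
}
```

In this sketch an object whose table has no OP_WRITE entry cannot be written through, which is a small-scale version of points 2 and 3.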

Jonathan S. Shapiro, Ph. D.
IBM T.J. Watson Research Center
Email: sh...@us.ibm.com
Phone: +1 914 784 7085 (Tieline: 863)
Fax: +1 914 784 7595


From: torva...@transmeta.com (Linus Torvalds)
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/21
Message-ID: <fa.hvdn70v.146acrg@ifi.uio.no>#1/1
X-Deja-AN: 492195717
Original-Date: 21 Jun 1999 18:10:42 GMT
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <7klv72$kgh$1@palladium.transmeta.com>
References: <fa.iaqmbav.1d222p4@ifi.uio.no>
To: linux-ker...@vger.rutgers.edu
Original-References: <85256797.005DDAA9...@D51MTA03.pok.ibm.com>
X-Authentication-Warning: palladium.transmeta.com: 
bin set sender to n...@transmeta.com using -f
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Transmeta Corporation, Santa Clara, CA
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

In article <85256797.005DDAA9...@D51MTA03.pok.ibm.com>,
<sh...@us.ibm.com> wrote:
>> Linus Torvalds wrote:
>> > In short: message passing as the fundamental operation of the OS is just
>> > an exercise in computer science masturbation. It may feel good, but
>> > you don't actually get anything DONE. Nobody has ever shown that it
>> > made sense in the real world. It's basically just much simpler and
>> > saner to have a function call interface, and for operations that are
>> > non-local it gets transparently _promoted_ to a message. There's no
>> > reason why it should be considered to be a message when it starts out.
>
>With due libations to the Gods here, Linus is mistaken on all counts.

It's happened before, it will happen again. However, you had better come
up with a better argument before I believe it happened this time.

>Moving a message from hither to yon *does* accomplish something: it moves a unit
>of work from one protection/encapsulation domain to another.

Ehh.. In real operating systems, we call that event a "system call". 
No message necessary or implied, unless you want to call the notion of
switching privilege domains "messages" (and some people do: they call
them messages just to prove that messages are as fast as system calls. 
In logic, that's equivalent to proving that liver tastes as good as ice
cream by calling ice cream liver, and is in real life called "lying"). 

The system call may be turned into a message later if that turns out to
be a good idea, but it's nothing inherent. AND IT SHOULD NOT BE.

>So: even if you think that message passing is not the way you wish to implement
>things, object based APIs offer greater flexibility of implementation, and this
>is generally a good thing.

Object-based APIs are a completely different issue (I removed your
argument, because I think it is completely irrelevant to "messages"). 

I don't think object-based approaches are bad. A lot of libraries
("stdio" in C) are based on that notion, and it's often the right way to
encapsulate information in user space. 

HOWEVER: that is not an OS boundary, and should not be considered to be
one. The _definition_ of an OS boundary is the boundary of protection
domains: the OS takes over where the library no longer has the
appropriate privileges to access the object any more. Because if the
library could do the operation, it should - instead of bothering the OS
with it. 

So in effect, at the OS boundary the object has to be pretty much
completely opaque, or it shouldn't be considered an OS boundary in the
first place. QED. 

That's why the OS boundary HAS to be equivalent to

read(handle, buffer, size)

and NOT be equivalent to

handle->op(READ, buffer, size);

because by definition, if you can do the "handle->op" lookup, then it's
not an OS boundary any more - or at least it is a very BAD one. See?

Linus


Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/21
Message-ID: <fa.fccckfv.12husbt@ifi.uio.no>#1/1
X-Deja-AN: 492280290
Original-Date: Sun, 20 Jun 1999 22:09:55 +0200
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <19990620220955.B102@elf.ucw.cz>
References: <fa.j0u16ev.cm0d9i@ifi.uio.no>
To: Linus Torvalds <torva...@transmeta.com>, linux-ker...@vger.rutgers.edu
Original-References: <199906200457.AAA10...@snark.thyrsus.com> 
<Pine.LNX.4.03.9906201138390.534-100...@mirkwood.nl.linux.org> 
<7kj9p1$fq...@palladium.transmeta.com>
Content-Type: text/plain; charset=us-ascii
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
Mime-Version: 1.0
X-Warning: Reading this can be dangerous to your mental health.
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

Hi!

> In short: message passing as the fundamental operation of the OS is just
> an exercise in computer science masturbation.  It may feel good, but
> you don't actually get anything DONE.  Nobody has ever shown that it
> made sense in the real world.  It's basically just much simpler and
> saner to have a function call interface, and for operations that are
> non-local it gets transparently _promoted_ to a message.  There's no
> reason why it should be considered to be a message when it starts out. 

Well - there is. Because function calling leads to things like
ioctl(). And ioctl() is _evil_. Yes, linux-kernel interface without
ioctl-like things would be ok with me. Even ioctl() which is _always_
given a structure which begins with its own length would be ok. But
ioctl() as it is today is evil, because you may pass horrible things
like a linked list of things to do. And it is hard to marshal _that_.
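
A minimal sketch of the "structure which begins with its own length" idea, assuming a convention that does not exist in any real kernel: the names ctl_header, set_speed_arg and marshal_ctl are made up for illustration. The point is that a generic forwarder can copy the argument onto the wire knowing only the header, which works for flat structures and fails for anything containing pointers, such as a linked list.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical convention: every control argument starts with its
 * total size in bytes, header included. */
struct ctl_header {
    uint32_t length;
};

/* Example argument for an imaginary "set speed" command. */
struct set_speed_arg {
    struct ctl_header hdr;
    uint32_t baud;
};

/* Generic forwarder: copies the argument into a wire buffer without
 * knowing its type. Returns the number of bytes written, or 0 if the
 * length is implausible or the buffer is too small. Only valid for
 * flat structures; embedded pointers would be copied as raw addresses. */
size_t marshal_ctl(const void *arg, void *wire, size_t wire_cap)
{
    struct ctl_header hdr;

    memcpy(&hdr, arg, sizeof hdr);
    if (hdr.length < sizeof hdr || hdr.length > wire_cap)
        return 0;
    memcpy(wire, arg, hdr.length);
    return hdr.length;
}
```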

Linus, do you plan some kind of clustering support for Linux? If
someone gave you a simple syscall-over-net forwarder for Linux, would
you like it?

								Pavel
PS: Well - there is such a forwarder in development around here. It does
not forward ioctl()s for obvious reasons :-). The major thing for
clustering seems to be 32-bit pids just now.


From: Matthew Wilcox <Matthew.Wil...@genedata.com>
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/22
Message-ID: <fa.lbeetfv.b66p8b@ifi.uio.no>#1/1
X-Deja-AN: 492507626
Original-Date: Tue, 22 Jun 1999 09:29:52 +0200
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <19990622092952.E30370@mencheca.ch.genedata.com>
References: <fa.fccckfv.12husbt@ifi.uio.no>
To: Pavel Machek <pa...@Elf.ucw.cz>
Original-References: <199906200457.AAA10...@snark.thyrsus.com> 
<Pine.LNX.4.03.9906201138390.534-100...@mirkwood.nl.linux.org> 
<7kj9p1$fq...@palladium.transmeta.com> <19990620220955.B...@elf.ucw.cz>
Content-Type: text/plain; charset=us-ascii
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
Mime-Version: 1.0
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

On Sun, Jun 20, 1999 at 10:09:55PM +0200, Pavel Machek wrote:
> Well - there is. Because function calling leads to things like
> ioctl(). And ioctl() is _evil_. Yes, linux-kernel interface without
> ioctl-like things would be ok with me. Even ioctl() which is _always_
> given a structure which begins with its own length would be ok. But
> ioctl() as it is today is evil, because you may pass horrible things
> like a linked list of things to do. And it is hard to marshal _that_.

Surely the sensible way of doing this is to define an ioctl2() system
call which is given a length. I imagine we would then add an ioctl2()
method to struct file_operations, and fall back to ioctl() (trimming
off the length word) for compatibility.

I wonder if we can do this in a clever enough way to renumber all the
old definitions of ioctl numbers.

(from:

#define LOOP_SET_FD 0x4C00

to:

#define VIDIOCGCAP _IOR('v',1,struct video_capability)

)

The alternative would be to drop ioctl altogether and replace it with a
different interface. Plan 9 uses ctl files -- you write strings to them
to perform commands. But I'm not sure people are willing to make that
kind of radical change (certainly not within the 2.3 timeframe).
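
To make the ctl-file idea concrete, here is a rough sketch of a handler that interprets one command string written to a control file. Everything in it is an assumption for illustration: the loop_dev structure, the ctl_write function and the "setfd" and "readonly" commands are invented and are not Plan 9's actual command set.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical device state controlled through a "ctl" file. */
struct loop_dev {
    int fd;
    int read_only;
};

/* Handle one command line written to the ctl file.
 * Returns 0 on success, -1 for an unknown or malformed command. */
int ctl_write(struct loop_dev *dev, const char *line)
{
    int n;

    if (sscanf(line, "setfd %d", &n) == 1) {
        dev->fd = n;
        return 0;
    }
    if (strcmp(line, "readonly") == 0) {
        dev->read_only = 1;
        return 0;
    }
    return -1;
}
```

Because the command is plain text, forwarding it over a network needs no knowledge of structure layouts.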

-- 
Matthew Wilcox <wi...@bofh.ai>
"Windows and MacOS are products, contrived by engineers in the service of
specific companies. Unix, by contrast, is not so much a product as it is a
painstakingly compiled oral history of the hacker subculture." - N Stephenson


From: Linus Torvalds <torva...@transmeta.com>
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/22
Message-ID: <fa.ndjckfv.mmsv2t@ifi.uio.no>#1/1
X-Deja-AN: 492507629
Original-Date: Tue, 22 Jun 1999 00:41:33 -0700 (PDT)
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <Pine.LNX.3.95.990622003437.8298A-100000@palladium.transmeta.com>
References: <fa.lbeetfv.b66p8b@ifi.uio.no>
To: Matthew Wilcox <Matthew.Wil...@genedata.com>
X-Authentication-Warning: palladium.transmeta.com: torvalds owned process doing -bs
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
MIME-Version: 1.0
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

On Tue, 22 Jun 1999, Matthew Wilcox wrote:
> 
> Surely the sensible way of doing this is to define an ioctl2() system
> call which is given a length. I imagine we would then add an ioctl2()
> method to struct file_operations, and fall back to ioctl() (trimming
> off the length word) for compatibility.

Actually, for ioctl, you definitely do want to have both a command and a
reply, so something like this would work:

int control(int fd, unsigned int code, void *in, int in_size, void *out, int out_size)

and yes, I agree that "ioctl()" and "fcntl()" as they currently stand are
just horribly ugly, and they are probably one of the worst features of
UNIX as a design.

There are a few other things that could be handled more cleanly with just a
single "control" interface - things like socket options etc (which as they
stand now are yet another special case).

Something like the above is actually what a lot of UNIX systems try to
encode in the ioctl number - the number often has the size and the
direction encoded in it. Linux tries to do it for some things, but it's
not enforced due to historical baggage.
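
As a rough illustration of the two paragraphs above, here is a toy control() with the signature given earlier, where the code carries a direction and a size that the entry point enforces before dispatching. The CTL_* macros and their bit layout are assumptions made up for this sketch; they are not the real Linux _IOR/_IOW encoding, and the "volume" commands are invented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: bits 30-31 direction, bits 16-29 size, bits 0-15 number. */
#define CTL_IN   1u
#define CTL_OUT  2u
#define CTL_CODE(dir, size, nr) \
    (((unsigned)(dir) << 30) | ((unsigned)(size) << 16) | (unsigned)(nr))
#define CTL_DIR(code)   ((code) >> 30)
#define CTL_SIZE(code)  (((code) >> 16) & 0x3fffu)

/* Two hypothetical commands on an imaginary audio device. */
#define CTL_GET_VOLUME  CTL_CODE(CTL_OUT, sizeof(int32_t), 1)
#define CTL_SET_VOLUME  CTL_CODE(CTL_IN,  sizeof(int32_t), 2)

static int32_t volume = 50;

int control(int fd, unsigned int code, void *in, int in_size,
            void *out, int out_size)
{
    unsigned dir = CTL_DIR(code);
    int size = (int)CTL_SIZE(code);

    (void)fd;  /* a real implementation would look up the object here */

    /* Enforce what the code claims about direction and size. */
    if ((dir & CTL_IN) && (in == NULL || in_size != size))
        return -1;
    if ((dir & CTL_OUT) && (out == NULL || out_size != size))
        return -1;

    switch (code) {
    case CTL_GET_VOLUME:
        memcpy(out, &volume, sizeof volume);
        return 0;
    case CTL_SET_VOLUME:
        memcpy(&volume, in, sizeof volume);
        return 0;
    default:
        return -1;
    }
}
```

Since the sizes are checked at the entry point, a generic layer can copy or forward the buffers without knowing what each command means.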

And notice how it's not getting to be really pretty whatever you do: even
if ioctl() and friends had a nicer interface, they'd still be just an ugly
sideband channel to whatever the fd is connected to.

Linus


From: Rik van Riel <r...@nl.linux.org>
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/22
Message-ID: <fa.nd9rqnv.1nl00bo@ifi.uio.no>#1/1
X-Deja-AN: 492663808
Original-Date: Tue, 22 Jun 1999 22:31:29 +0200 (CEST)
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <Pine.LNX.4.03.9906222222210.4395-100000@mirkwood.nl.linux.org>
References: <fa.j0u16ev.cm0d9i@ifi.uio.no>
To: Linus Torvalds <torva...@transmeta.com>
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Search-Engine-Bait: http://humbolt.nl.linux.org/
X-Orcpt: rfc822;linux-kernel-outgoing-dig
X-My-Own-Server: http://www.nl.linux.org/
Organization: Internet mailing list
MIME-Version: 1.0
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

[I've thought about this long and hard and I've finally come
up with a proper response to Linus' argument]

On 20 Jun 1999, Linus Torvalds wrote:

> In short: message passing as the fundamental operation of the OS
> is just an exercise in computer science masturbation. It may
> feel good, but you don't actually get anything DONE. Nobody has
> ever shown that it made sense in the real world.

It's not about physical message passing in the actual implementation;
what's really happening can be 'hidden' by clever programming by the
people who built the OS.

The real issue here is paradigms. The classical "everything's
a file" broke down with the advent of networking, sockets and
non-blocking reads. At the moment the file paradigm is so much
out of touch with computational reality that web servers need
to fork for each client and people are crying out for asynchronous
sendfile and other weird interfaces.

A new "everything's a message" WILL fit the current use of computers
though. One simple concept that's good enough for all our
computational needs. And because it _is_ one simple concept, it can
be implemented in a simple, clean and fast way -- unlike the myriad
of different kludges Unix has to overcome the file paradigm...

Of course, I'll be using Unix for the foreseeable future -- it does
all that it needs to do and it's got all the luxuries I want :)

regards,

Rik -- Open Source: you deserve to be in control of your data.
+-------------------------------------------------------------------+
| Le Reseau netwerksystemen BV: http://www.reseau.nl/ |
| Linux Memory Management site: http://www.linux.eu.org/Linux-MM/ |
| Nederlandse Linux documentatie: http://www.nl.linux.org/ |
+-------------------------------------------------------------------+


From: Linus Torvalds <torva...@transmeta.com>
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/22
Message-ID: <fa.o99vgfv.ikc6jf@ifi.uio.no>#1/1
X-Deja-AN: 492677660
Original-Date: Tue, 22 Jun 1999 14:04:32 -0700 (PDT)
Sender: owner-linux-ker...@vger.rutgers.edu
Original-Message-ID: <Pine.LNX.4.10.9906221359460.1122-100000@penguin.transmeta.com>
References: <fa.nd9rqnv.1nl00bo@ifi.uio.no>
To: Rik van Riel <r...@nl.linux.org>
X-Authentication-Warning: penguin.transmeta.com: torvalds owned process doing -bs
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Orcpt: rfc822;linux-kernel-outgoing-dig
Organization: Internet mailing list
MIME-Version: 1.0
Newsgroups: fa.linux.kernel
X-Loop: majord...@vger.rutgers.edu

On Tue, 22 Jun 1999, Rik van Riel wrote:
> 
> The real issue here is paradigms. The classical "everything's
> a file" broke down with the advent of networking, sockets and
> non-blocking reads. At the moment the file paradigm is so much
> out of touch with computational reality that web servers need
> to fork for each client and people are crying out for asynchronous
> sendfile and other weird interfaces.

Sure. But I think it's still a valid paradigm to consider "everything is a
stream of bytes". And that's _really_ what the UNIX paradigm has been from
the first: the whole notion of pipes etc is not all that different from
networking.

> A new "everything's a message" WILL fit the current use of computers
> though. One simple concept that's good enough for all our
> computational needs. And because it _is_ one simple concept, it can
> be implemented in a simple, clean and fast way -- unlike the myriad
> of different kludges Unix has to overcome the file paradigm...

I disagree.

The issue is not how you get the data from one place to the other:
"read()" is as good a way as "rcv()". Message passing is not the issue.

The real issue is _naming_, and that's not going away. The name space has
always been the difficult part. And that's where I agree that UNIX could
do better: I think we do want to move into a "web direction" where you can
just do a open("http://ssss.yyyyy.dd/~silly", O_RDONLY) and it does the
right thing.
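
One way to picture the naming step is a resolver that looks at the name before any data moves. The sketch below is only an assumption about how such a dispatch might start: classify_name and the name_kind values are hypothetical, and nothing here opens a network connection.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result of looking at a name passed to open(). */
enum name_kind { NAME_LOCAL_PATH, NAME_HTTP, NAME_UNKNOWN_SCHEME };

/* Decide which name space a string belongs to. A name counts as a URL
 * only if a scheme precedes "://" and no '/' appears before it;
 * anything else is treated as an ordinary path. */
enum name_kind classify_name(const char *name)
{
    const char *sep = strstr(name, "://");
    size_t len;

    if (sep == NULL || sep == name)
        return NAME_LOCAL_PATH;
    len = (size_t)(sep - name);
    if (memchr(name, '/', len) != NULL)
        return NAME_LOCAL_PATH;  /* "://" occurs inside a path */
    if (len == 4 && strncmp(name, "http", 4) == 0)
        return NAME_HTTP;
    return NAME_UNKNOWN_SCHEME;
}
```

Once the name is resolved, the caller still just reads a stream of bytes.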

Linus
