From: "Eric S. Raymond" <e...@snark.thyrsus.com> Subject: Some very thought-provoking ideas about OS architecture. Date: 1999/06/20 Message-ID: <fa.gvbbgev.180mo0k@ifi.uio.no>#1/1 X-Deja-AN: 491626142 Original-Date: Sun, 20 Jun 1999 00:57:35 -0400 Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-Id: <199906200457.AAA10204@snark.thyrsus.com> To: linux-ker...@vger.rutgers.edu X-Authentication-Warning: snark.thyrsus.com: esr set sender to e...@snark.thyrsus.com using -f X-Orcpt: rfc822;linux-kernel-outgoing-dig Organization: Internet mailing list Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu (Please copy any replies to me explicitly, as I'm not presently subscribed to the linux-kernel list -- it's not practical when I'm spending so much time on the road.) Gents and ladies, I believe I may have seen what comes after Unix. Not a half-step like Plan 9, but an advance in OS architecture as fundamental as Multics or Unix was in its day. As an old Unix hand myself, I don't make this claim lightly; I've been wrestling with it for a couple of weeks now. Nor am I suggesting we ought to drop what we're doing and hare off in a new direction. What I am suggesting is that Linus and the other kernel architects should be taking a hard look at this stuff and thinking about it. It may take a while for all the implications to sink in. They're huge. What comes after Unix will, I now believe, probably resemble at least in concept an experimental operating system called EROS. Full details are available at <http://www.eros-os.org/>, but for the impatient I'll review the high points here. EROS is built around two fundamental and intertwined ideas. One is that all data and code persistence is handled directly by the OS. There is no file system. Yes, I said *no file system*. Instead, everything is structures built in virtual memory and checkpointed out to disk every so often (every five minutes in EROS). Want something?
Chase a pointer to it; EROS memory management does the rest. The second fundamental idea is that of a pure capability architecture with provably correct security. This is something like ACLs, except that an OS with ACLs on a file system has a hole in it; programs can communicate (in ways intended or unintended) through the file system that everybody shares access to. Capabilities plus checkpointing is a combination that turns out to have huge synergies. Obviously programming is a lot simpler -- no more hours and hours spent writing persistence/pickling/marshalling code. The OS kernel is a lot simpler too; I can't find the figure to be sure, but I believe EROS's is supposed to clock in at about 50K of code. Here's another: All disk I/O is huge sequential BLTs done as part of checkpoint operations. You can actually use close to 100% of your controller's bandwidth, as opposed to the 30%-50% typical for explicit-I/O operating systems that are doing seeks a lot of the time. This means the maximum I/O throughput the OS can handle effectively more than doubles. With simpler code. You could even afford the time to verify each checkpoint write... Here's a third: Had a crash or power-out? On reboot, the system simply picks up pointers to the last checkpointed state. Your OS, and all your applications, are back in thirty seconds. No fscks, ever again! And I haven't even talked about the advantages of capabilities over userids yet. I would, but I just realized I'm running out of time -- gotta get ready to fly to Seattle tomorrow to upset some stomachs at Microsoft. www.eros-os.org. Eric sez check it out. Mind-blowing stuff once you've had a few days to digest it. -- <a href="http://www.tuxedo.org/~esr">Eric S. Raymond</a> The Bible is not my book, and Christianity is not my religion. I could never give assent to the long, complicated statements of Christian dogma. 
-- Abraham Lincoln - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/
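The persistence model described in the first post (state lives in memory and is periodically written out whole, so a restart just reloads the last image) can be sketched in C. This is a toy under stated assumptions, not how EROS itself works: a single fixed-size arena stands in for the persistent address space, objects refer to each other by arena offset rather than raw pointer, and `arena_alloc()`, `checkpoint()` and `restore()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy single-level store (hypothetical): every persistent object lives in one
   fixed-size arena. Objects refer to each other by offset, not by raw pointer,
   so a reloaded image stays valid wherever it ends up in memory. */
#define ARENA_SIZE 4096

typedef struct {
    size_t used;                    /* bytes handed out so far */
    unsigned char bytes[ARENA_SIZE];
} arena_t;

/* Bump-allocate n bytes and return their offset. Offset 0 is reserved as a
   null reference, so 0 also signals "arena full". */
static size_t arena_alloc(arena_t *a, size_t n) {
    if (a->used == 0)
        a->used = 1;
    if (n > ARENA_SIZE - a->used)
        return 0;
    size_t off = a->used;
    a->used += n;
    return off;
}

static void *arena_ptr(arena_t *a, size_t off) { return a->bytes + off; }

/* Checkpoint: one large sequential write of the whole image into a temporary
   file, then rename() it over the previous checkpoint. On POSIX systems
   rename() replaces the target atomically, so a crash leaves either the old
   image or the new one. A real implementation would also fsync() first. */
static int checkpoint(const arena_t *a, const char *path) {
    char tmp[256];
    snprintf(tmp, sizeof tmp, "%s.tmp", path);
    FILE *f = fopen(tmp, "wb");
    if (!f)
        return -1;
    size_t written = fwrite(a, sizeof *a, 1, f);
    if (fclose(f) != 0 || written != 1)
        return -1;
    return rename(tmp, path) == 0 ? 0 : -1;
}

/* Restart: read back the last complete image; nothing is walked or repaired. */
static int restore(arena_t *a, const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t got = fread(a, sizeof *a, 1, f);
    fclose(f);
    return got == 1 ? 0 : -1;
}
```

For example, allocating an object in one arena, calling `checkpoint()`, and then calling `restore()` into a fresh arena yields the same bytes at the same offsets. The sketch deliberately ignores the problem Alan Cox raises later in the thread: recovering when part of the stored image is damaged.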
From: Rik van Riel <r...@nl.linux.org> Subject: Re: Some very thought-provoking ideas about OS architecture. Date: 1999/06/20 Message-ID: <fa.lhe9p7v.6gqp8q@ifi.uio.no>#1/1 X-Deja-AN: 491695539 Original-Date: Sun, 20 Jun 1999 11:40:51 +0200 (CEST) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.4.03.9906201138390.534-100000@mirkwood.nl.linux.org> References: <fa.gvbbgev.180mo0k@ifi.uio.no> To: "Eric S. Raymond" <e...@snark.thyrsus.com> Content-Type: TEXT/PLAIN; charset=US-ASCII X-Search-Engine-Bait: http://humbolt.nl.linux.org/ X-Orcpt: rfc822;linux-kernel-outgoing-dig X-My-Own-Server: http://www.nl.linux.org/ Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu On Sun, 20 Jun 1999, Eric S. Raymond wrote: > What comes after Unix will, I now believe, probably resemble at > least in concept an experimental operating system called EROS. > Full details are available at <http://www.eros-os.org/>, but for > the impatient I'll review the high points here. Unfortunately, EROS is still based on the PC hardware as we've got it today and not modeled after a JINI-like appliances model (the network is the computer). With the death of the monolithic computer (if it happens) will come the death of Unix, Windows _and_ EROS. At the moment I can see only one Open Source system that could become ready for a world like that. Alliance OS (http://www.allos.org/). cheers, Rik -- Open Source: you deserve to be in control of your data. 
+-------------------------------------------------------------------+ | Le Reseau netwerksystemen BV: http://www.reseau.nl/ | | Linux Memory Management site: http://www.linux.eu.org/Linux-MM/ | | Nederlandse Linux documentatie: http://www.nl.linux.org/ | +-------------------------------------------------------------------+
From: torva...@transmeta.com (Linus Torvalds) Subject: Re: Some very thought-provoking ideas about OS architecture. Date: 1999/06/20 Message-ID: <fa.j0u16ev.cm0d9i@ifi.uio.no> X-Deja-AN: 491808056 Original-Date: 20 Jun 1999 17:52:33 GMT Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <7kj9p1$fqn$1@palladium.transmeta.com> References: <fa.lhe9p7v.6gqp8q@ifi.uio.no> To: linux-ker...@vger.rutgers.edu Original-References: <199906200457.AAA10...@snark.thyrsus.com> <Pine.LNX.4.03.9906201138390.534-100...@mirkwood.nl.linux.org> X-Authentication-Warning: palladium.transmeta.com: bin set sender to n...@transmeta.com using -f X-Orcpt: rfc822;linux-kernel-outgoing-dig Organization: Transmeta Corporation, Santa Clara, CA Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu In article <Pine.LNX.4.03.9906201138390.534-100...@mirkwood.nl.linux.org>, Rik van Riel <r...@nl.linux.org> wrote: > >Unfortunately, EROS is still based on the PC hardware as >we've got it today and not modeled after a JINI-like >appliances model (the network is the computer). > >With the death of the monolithic computer (if it happens) >will come the death of Unix, Windows _and_ EROS. That's a classic thing said by "OS Research People". And it's complete crap and idiocy, and I'm finally going to stand up and ask people to THINK instead of repeating the old and stinking dogma. It's _much_ better to have stand-alone appliances that can work well in a networked environment than to have a networked appliance. I don't understand people who think that "distribution" implies "collective". A distributed system should _not_ be composed of mindless worker ants that only work together with other mindless worker ants. A distributed system should be composed of individual stand-alone systems that can work together. They should be real systems in their own right, and have the power to make their own decisions. 
Too many distributed OS projects are thinking "bees in a hive" - while what you should aim for is "humans in society". I'll take humans over bees any day. Real OS's, with real operating systems. Monolithic, because they CAN stand alone, and in fact do most of their stuff without needing hand-holding every single minute. General-purpose instead of being able to do just one thing. >At the moment I can see only one Open Source system that >could become ready for a world like that. Alliance OS >(http://www.allos.org/). I will tell you anything based on message passing is stupid. It's very simple: - if you end up doing remote communication, the largest overhead is in the communication, not in how you initiate it. This is only going to be more true with mobile computing, not less. Ergo: optimizing for message passing is stupid. You should _always_ optimize for the local case, because it's the only case where the calling protocol really matters - once you go remote you have time to massage the arguments any which way you like. - Most operations are going to be local. Any operating system that starts out from the notion that most operations are going to be remote is going to die off as computers get more and more powerful. Things may start off distributed, but in the end network bandwidth is always going to be more expensive than CPU power. - Truly mobile computing implies that a noticeable portion of the time you do _not_ want to be in contact with any other computers. Your computer had better be a very capable one even on its own. Anybody who thinks anything else is just unbelievably misguided. This implies that your computer had better have a local filesystem, and had better be designed to work as well without any connectivity as it does _with_ connectivity. It can't communicate, but that shouldn't mean that it can't work. 
So right now people are pointing at PDA's, and saying that they should be running a "light" OS, all based on message passing, because obviously all the real work would be done on a server. It makes sense, no? NO. It does NOT make sense. People used to say the same thing about workstations: workstations used to be expensive and not quite powerful enough, and people wanted to have more than one. Where are those people today? Face it, the hardware just got so much better that suddenly REAL operating systems didn't have any of the alleged downsides, and while you obviously want the ability to communicate, you should not think that that is what you optimize for. The same is going to happen in the PDA space. Right now we have PalmOS. It's already doing internet connectivity; how much do you want to bet that in the not too distant future they'll want to offer more and more? There is no technical reason why a Palm in a few years won't have a few hundred megs of RAM and a CPU that is quite equipped to handle a real OS. (If they had selected the StrongARM instead of a cut-down 68k it would already). In short: message passing as the fundamental operation of the OS is just an exercise in computer science masturbation. It may feel good, but you don't actually get anything DONE. Nobody has ever shown that it made sense in the real world. It's basically just much simpler and saner to have a function call interface, and for operations that are non-local it gets transparently _promoted_ to a message. There's no reason why it should be considered to be a message when it starts out. Linus
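Linus's closing suggestion, a plain function call that is only turned into a message when the target turns out to be non-local, can be sketched like this. Everything here is hypothetical and illustrative: `handle_t`, `obj_add()`, the `message_t` wire format and the `transport_send()` stub (which records the message instead of sending it) are invented for the example.

```c
#include <assert.h>

/* Hypothetical handle: either a local object or a stand-in for a remote one. */
typedef struct {
    int is_remote;
    int value;      /* state of a local object */
    int node_id;    /* where a remote object lives */
} handle_t;

/* Wire format, only built once a call is promoted (hypothetical). */
typedef struct {
    int node_id;
    int opcode;
    int arg;
} message_t;

enum { OP_ADD = 1 };

/* Stand-in transport: remembers the last message instead of sending it. */
static message_t last_sent;
static int sent_count;

static int transport_send(const message_t *m) {
    last_sent = *m;
    sent_count++;
    return 0;   /* a real transport would wait for and decode a reply */
}

/* Callers always make an ordinary function call. The local case is a direct
   update with no marshalling; only a non-local handle pays for building a
   message, so the call is "promoted" exactly when it turns out to be remote. */
static int obj_add(handle_t *h, int delta) {
    if (!h->is_remote) {
        h->value += delta;
        return h->value;
    }
    message_t m = { h->node_id, OP_ADD, delta };
    return transport_send(&m);
}
```

The fast path is the local one, and marshalling cost is paid only where, on Linus's argument, communication overhead dominates anyway.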
From: Rik van Riel <r...@nl.linux.org> Subject: Re: Some very thought-provoking ideas about OS architecture. Date: 1999/06/20 Message-ID: <fa.lgtnp7v.608p8h@ifi.uio.no>#1/1 X-Deja-AN: 491828237 Original-Date: Sun, 20 Jun 1999 21:06:59 +0200 (CEST) Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <Pine.LNX.4.03.9906202033110.534-100000@mirkwood.nl.linux.org> References: <fa.j0u16ev.cm0d9i@ifi.uio.no> To: Linus Torvalds <torva...@transmeta.com> Content-Type: TEXT/PLAIN; charset=US-ASCII X-Search-Engine-Bait: http://humbolt.nl.linux.org/ X-Orcpt: rfc822;linux-kernel-outgoing-dig X-My-Own-Server: http://www.nl.linux.org/ Organization: Internet mailing list MIME-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu On 20 Jun 1999, Linus Torvalds wrote: > >Unfortunately, EROS is still based on the PC hardware as > >we've got it today and not modeled after a JINI-like > >appliances model (the network is the computer). > > That's a classic thing said by "OS Research People". Which I'm not :) > I don't understand people who think that "distribution" implies > "collective". > > A distributed system should be composed of individual stand-alone > systems that can work together. OK. I can agree with this. Still, Bill Joy can't be altogether wrong, can he? This is somewhat of a dilemma -- having to choose between the paradigms of Bill Joy and Linus Torvalds... :) > >(http://www.allos.org/). > > I will tell you anything based on message passing is stupid. It's > very simple: [communications overhead is either in the actual communication channel or unneeded] The Alliance OS uses something like a shared library so that what looks like message passing from higher levels is only message passing when it needs to be -- otherwise the higher overhead is avoided. > - Most operations are going to be local. Optimizing for the local case doesn't mean that remote operations can't be made transparent.
Because of latency problems, you are probably right though... > - Truly mobile computing implies that a noticeable portion of the time > you do _not_ want to be in contact with any other computers. Your > computer had better be a very capable one even on its own. It depends. If a computer is used as a way of getting at information, then you will want it to be connected. Mobile phones simply aren't very useful at the North Pole, however well they might function on their own. Computing is more and more about communication and not about number-crunching or playing games -- which, I agree, can be done very well without network access. Even for things like a calendar you will want access to outside information. If you plan an appointment with someone else, the two of you need to be able to communicate with each other to agree on a date/time you are both able to meet... > In short: message passing as the fundamental operation of the OS > is just an exercise in computer science masturbation. It may > feel good, but you don't actually get anything DONE. It can help achieve things we can't do with Linux: Upgrade (parts of) the OS while running. Since message passing objects are self-contained, you can replace them more easily than is possible with 'classic' OSes. User process migration and other nice scalability and/or reliability tricks are also more easily done. Transparent networking. While userland can do this in a library, this feature can be very useful to achieve high availability because it makes clustering at the filesystem level extremely easy (to name a thing). Sandboxing parts of the OS. Finally it's possible to test new kernel parts without risking the rest of your system. Debugging a new networking library (kernel-level, that is) without exposing the rest of the system to stray pointers. The sandboxing protection can always be removed later when the new kernel addition has been found to work properly.
While I agree with your point that computers will be both powerful enough to do anything and never powerful enough not to need the maximum level of optimization (this seemed to be what you were saying and, however paradoxical it may seem, will probably be true forever), I think you should take a more positive attitude towards new system concepts. Even if you don't like them, some very nice and useful spinoffs could come from the research in those areas. We need diversity in order to select the best alternative for the job at hand. I won't try to convince anyone to turn Linux into such a system. Not only doesn't it make sense, I wouldn't even want Linux if it became like that -- other systems can take on other roles that are to be played in the huge playing field that's out there... Rik -- Open Source: you deserve to be in control of your data. +-------------------------------------------------------------------+ | Le Reseau netwerksystemen BV: http://www.reseau.nl/ | | Linux Memory Management site: http://www.linux.eu.org/Linux-MM/ | | Nederlandse Linux documentatie: http://www.nl.linux.org/ | +-------------------------------------------------------------------+
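Rik's first point, replacing a self-contained message-handling component while the system runs, can be illustrated with a minimal sketch. It is a single-process toy, not how any particular OS does it: clients address a service only by number through `send_msg()`, and upgrading means swapping the handler registered under that number with `install()`. All names are hypothetical. A real system would also have to drain in-flight messages and carry the old component's state over.

```c
#include <assert.h>

/* A message is just an operation code and an argument (hypothetical format). */
typedef struct { int op; int arg; } msg_t;
typedef int (*service_fn)(const msg_t *m);

#define MAX_SERVICES 8
static service_fn services[MAX_SERVICES];   /* service number -> current handler */

/* Clients only know a service number, never the code behind it. */
static int send_msg(int service, const msg_t *m) {
    if (service < 0 || service >= MAX_SERVICES || !services[service])
        return -1;   /* no such service */
    return services[service](m);
}

/* Replacing the table entry "upgrades" the service for every later message;
   no client has to be relinked or restarted. */
static void install(int service, service_fn fn) {
    if (service >= 0 && service < MAX_SERVICES)
        services[service] = fn;
}

/* Two versions of the same toy service. */
static int scale_v1(const msg_t *m) { return m->arg; }
static int scale_v2(const msg_t *m) { return m->arg * 2; }
```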
From: "Eric S. Raymond" <e...@snark.thyrsus.com> Subject: Re: Some very thought-provoking ideas about OS architecture. Date: 1999/06/20 Message-ID: <fa.gtrpfuv.1ag8pgq@ifi.uio.no> X-Deja-AN: 491726329 Original-Date: Sun, 20 Jun 1999 08:39:05 -0400 Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-Id: <199906201239.IAA11200@snark.thyrsus.com> To: a...@lxorguk.ukuu.org.uk X-Authentication-Warning: snark.thyrsus.com: esr set sender to e...@snark.thyrsus.com using -f X-Orcpt: rfc822;linux-kernel-outgoing-dig Organization: Internet mailing list Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu (Apologies for losing the thread ID. Alan's mail to me bounced.) Alan Cox writes: > > EROS is built around two fundamental and intertwined ideas. One is > > that all data and code persistence is handled directly by the OS. > > There is no file system. Yes, I said *no file system*. Instead, > > everything is structures built in virtual memory and checkpointed out > > to disk every so often (every five minutes in EROS). Want something? > > Chase a pointer to it; EROS memory management does the rest. > > This is actually an old idea. The problem that has never been solved well is > recovery from errors. You lose 1% of your object store. How do you tidy up. > 20% of your object store dies in a disk crash, how do you run an fscobject > tool. You can do it, but you end up back with file system complexity and > all the other fs stuff. Accepting your analysis, it still seems to me there's a difference, though. In an EROS-like world, you would only pay the complexity cost of doing fscobject-like things in the postmortem analyzer that's trying to stitch together the remaining pieces. You wouldn't have to pay that same cost in the kernel for each and every access to persistent stuff; no namespace management to worry about. So, yes. An EROS-like architecture has the same error-recovery problem that fsck addresses. 
But it appears to me, at least as far as I've taken the logic, that that problem would be better contained than in a Unix-like system. > Another peril is that external interfaces don't always like replay of events. A much more serious objection, I agree. > You still end up with a lot of your objects having checkpoint/restart aware > methods. Yes, I grant that's true. (The way I'd put it is that you still need something like commit/rollback in database-land.) But this is a solvable problem. Butler Lampson showed years ago how to do provably correct serialization of access to shared critical regions with timestamps even in the absence of reliable locks. So as long as your hypothetical user can't futz with the system clock... > Moving just some objects between systems is fun too. You then get into > cluster checkpointing, which is a field that requires you wear a pointy hat, > have a beard and work for SGI or Digital. Not something I have opinions about -- or am qualified to. :-) > Their numbers are for a microkernelish core. They are still very good, but > that includes basically no drivers, no network stack, no graphics and apparently > no real checkpoint/restart in the face of corruption. I may be wrong on the > last item. You're probably right; I'm told all EROS actually does at this point is run its own debugging and benchmarking tools. Still, the fact that the test kernel can be that small is IMO an argument that the design is sound. > That nature of I/O is no different. If you always do large sequential > block writes tell me how it will outperform a conventional OS if only > a small number of changes in a small number of objects occur. No seeks to read inodes, because the map from EROS's virtual-space blocks to disk blocks is almost trivial (essentially the disks get treated like a honkin' big series of swap volumes). So the disk access pattern would be quite different, I think.
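Eric's claim that the map from virtual-space blocks to disk blocks is "almost trivial" can be made concrete. This sketch assumes, purely for illustration, a linear layout in which the persistent address space runs across equally sized volumes end to end. It is not taken from the EROS sources, and `vblock_to_disk()` and `disk_addr_t` are hypothetical. The point is that locating a block becomes arithmetic rather than a chain of directory and inode reads.

```c
#include <assert.h>

/* Hypothetical disk address: which volume, and which block on it. */
typedef struct {
    unsigned volume;
    unsigned long block;
} disk_addr_t;

/* Treat the disks as one long series of equally sized "swap volumes":
   virtual block v lives on volume v / per_volume at offset v % per_volume.
   No directory or inode has to be read to find it. Returns -1 if v lies
   beyond the last volume. */
static int vblock_to_disk(unsigned long vblock, unsigned long per_volume,
                          unsigned nvolumes, disk_addr_t *out) {
    if (per_volume == 0 || vblock / per_volume >= nvolumes)
        return -1;
    out->volume = (unsigned)(vblock / per_volume);
    out->block = vblock % per_volume;
    return 0;
}
```

This only addresses the lookup side of Alan's question. Whether whole-image sequential writes beat a conventional file system when only a few objects change is a separate matter that the sketch does not settle.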
> Object stores are great models for some applications, that's why libraries > for doing persistent object stores in application space exist (eg texas) > > Another way to look at this
>
>                 File System             Object Store
>
> Index           Inode Number            Object ID
> Update          Look in directory       Look in an object
>                 Find item               Find item location
>                 Write(maybe COW)        Write(maybe COW)
> Page In         Look in directory       Look in an object
>                 Find item               Find item location
>                 Write(maybe COW)        Write(maybe COW)
> Granularity     User controlled         Enforced by OS
>
> So if I promise to call my inodes object ids, call the directory structure > "objects" and I have a checkpointing scheme - what is the great new concept. That, under most circumstances, you don't have to manage persistence yourself (or to put it more concretely, no explicit disk I/O in most applications). That's clearly a huge win, even if you end up having to do more conventional-looking things in applications that require commit/rollback. And it's not clear to me that you do end up there; with one single added atomic-flush primitive, I think you could use Lampson's timestamp trickery to do reliable journalling without having to go all the way to fs-like namespace management. > o I don't think the object model is the good stuff Even if you're right... > o The security model is very very interesting indeed. ...this is still very true. > o They are making it hard to help them however. This is indeed true. However, I may have some leverage on a win-win solution. But that's a topic for another day. What I'm thinking is this: remember RT-Linux? Suppose the kernel were a process running over an EROS-like layer... --- <a href="http://www.tuxedo.org/~esr">Eric S. Raymond</a> To make inexpensive guns impossible to get is to say that you're putting a money test on getting a gun. It's racism in its worst form.
-- Roy Innis, president of the Congress of Racial Equality (CORE), 1988
From: torva...@transmeta.com (Linus Torvalds) Subject: Re: Some very thought-provoking ideas about OS architecture. Date: 1999/06/20 Message-ID: <fa.hue96fv.126odbb@ifi.uio.no>#1/1 X-Deja-AN: 491812574 Original-Date: 20 Jun 1999 18:05:28 GMT Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <7kjah8$g4k$1@palladium.transmeta.com> References: <fa.gtrpfuv.1ag8pgq@ifi.uio.no> To: linux-ker...@vger.rutgers.edu Original-References: <199906201239.IAA11...@snark.thyrsus.com> X-Authentication-Warning: palladium.transmeta.com: bin set sender to n...@transmeta.com using -f X-Orcpt: rfc822;linux-kernel-outgoing-dig Organization: Transmeta Corporation, Santa Clara, CA Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu In article <199906201239.IAA11...@snark.thyrsus.com>, Eric S. Raymond <e...@snark.thyrsus.com> wrote: > >You're probably right; I'm told all EROS actually does at this point >is run its own debugging and benchmarking tools. Still, the fact that >the test kernel can be that small is IMO an argument that the design >is sound. Why? What is the correlation between "small" and "good"? There's seldom any very strong correlation. Often the correlation is negative. Linux started out as 10k lines of code. Was that good? It's now 1.5M lines of code. Is that bad? Assuming something does the same thing as another, and is more efficient at doing it (smaller, faster, whatever), that's good. But microkernels are based on the notion that small is good even if it is NOT capable of doing the same things: a fundamentally flawed argument. So mind explaining why you're using that argument? Linus
From: "Eric S. Raymond" <e...@thyrsus.com> Subject: Re: Some very thought-provoking ideas about OS architecture. Date: 1999/06/20 Message-ID: <fa.fsltd0v.o6231o@ifi.uio.no>#1/1 X-Deja-AN: 491745814 Original-Date: Sun, 20 Jun 1999 10:04:54 -0400 Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <19990620100454.A11271@thyrsus.com> To: a...@lxorguk.ukuu.org.uk Content-Type: text/plain; charset=us-ascii X-Orcpt: rfc822;linux-kernel-outgoing-dig X-Eric-Conspiracy: There is no conspiracy Organization: Eric Conspiracy Secret Labs Mime-Version: 1.0 Newsgroups: fa.linux.kernel X-Loop: majord...@vger.rutgers.edu Alan: > That depends on if you want to consistency check your object store after a > crash. Unless you journal the object store - which btw is hard. If you have > two thousand inter-related objects you need to dump the set of them > consistently and in a snapshotted state. I've been thinking about this since your last post. Seems to me the primitive one needs is the ability to say "This object and all its dependents need to be written atomically". Not too hard to imagine how to do that given you already have enough of a VM system to do copy-on-write. OK, you end up having to allocate in two different spaces, one with atomicity constraints and one without. But it's solvable. (See below on why this doesn't mean you end up journaling everything). > Why is having persistence managed by a library that is playing guessing games > of your intent a good idea ? It has to know about object relationships, > potentially it has to blindly snapshot the entire system. It has to do a lot > of work to know in detail what has changed. For the *exact same* reasons that automatic memory management with garbage collection is preferable to slinging your own buffers. 
Perl and Python and Tcl are on the rise because, outside the kernel, accepting all that complexity and the potential for buffer overruns just doesn't make any damn sense with clocks and memory as cheap as they are now. Remember, the name of the game in OS design is really to optimize for least complexity overhead for the *application programmer* and *user*. If this means accepting a marginally more complex and less efficient OS substructure (like the difference between a journaled object store and a file system with explicit I/O) then that's fine. But in fact I think Shapiro makes strong arguments that an object store, done properly, is *more* efficient. > So all you have to do is export every object that this object refers to. Like > the windowing environment, whoops oh dear. Now you know it's not that bad in practice. Not all object references are pointers. Some are capabilities and cookies that are persistent without prearrangement. That's especially likely to be true of OS services, and especially if you design your API with that in mind. > Suppose Eros was just a set of persistent object libraries that ran on > top of numerous other platforms too, could be downloaded off the net and > pretty well within the limits of the "programmer lazy, do more work than > worked needed" paradigm. > > ftp://ftp.cs.utexas.edu/pub/garbage/texas/README > > And that is demonstrably the right way up. If you put a "lazy programmer" > system at the bottom of an environment you prevent the smart programmer doing > smart things. If your bottom layer is fundamentally ignorant of programmer > provided clues you cripple the smart. If that's true, why is Perl a success? That's not intended to be a snarky question. Your argument here is essentially the argument for malloc(3) as opposed to unlimited-extent types and garbage collection. 
And the answer is the same: there comes a point where the value of the optimization you can do with hints no longer pays for the complexity overhead of having to do the storage management yourself. The EROS papers implicitly argue that we've reached that point not just in memory management but with respect to the entire persistence problem. I'm inclined to agree with them. At the very least, it's something that I think we'd all be better off doing a little forward thinking about. As I said at the beginning of the thread, I'm not after changing the whole architecture of Linux right away; that would be silly and futile. But this exchange will have achieved my purposes if it only plants a few conceptual seeds. -- <a href="http://www.tuxedo.org/~esr">Eric S. Raymond</a> The right of self-defense is the first law of nature: in most governments it has been the study of rulers to confine this right within the narrowest limits possible. Wherever standing armies are kept up, and when the right of the people to keep and bear arms is, under any color or pretext whatsoever, prohibited, liberty, if not already annihilated, is on the brink of destruction." -- Henry St. George Tucker (in Blackstone's Commentaries)
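Eric's proposed primitive, "this object and all its dependents need to be written atomically", built on copy-on-write, resembles shadow paging. Here is a minimal in-memory sketch that assumes a single writer and a fixed set of small objects. A group of related updates is staged in an inactive copy of the state and becomes visible through one index flip, so a reader sees either the whole group or none of it. `commit_group()` and the two-image layout are hypothetical. A real system would share unchanged pages copy-on-write instead of copying the whole image, and on disk the flip would be a single write of a root record.

```c
#include <assert.h>

#define NOBJ 4

/* The whole toy object set: one integer field per object. */
typedef struct { int field[NOBJ]; } image_t;

static image_t images[2];   /* committed image and scratch image */
static int current;         /* index of the committed image */

/* Apply n related updates as one unit (ids are assumed to be < NOBJ).
   They are staged in the inactive image and published by a single
   assignment to `current`. */
static void commit_group(const int *ids, const int *vals, int n) {
    int next = 1 - current;
    images[next] = images[current];   /* start from the committed state */
    for (int i = 0; i < n; i++)
        images[next].field[ids[i]] = vals[i];
    current = next;                   /* the single commit point */
}

static int read_obj(int id) { return images[current].field[id]; }
```

The previous committed image survives untouched until the next commit, which is what a crash-recovery path would fall back to.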
From: sh...@us.ibm.com Subject: Re: Some very thought-provoking ideas about OS architecture. Date: 1999/06/21 Message-ID: <fa.iaqmbav.1d222p4@ifi.uio.no>#1/1 X-Deja-AN: 492165399 Original-Date: Mon, 21 Jun 1999 13:04:42 -0400 Sender: owner-linux-ker...@vger.rutgers.edu Original-Message-ID: <85256797.005DDAA9.00@D51MTA03.pok.ibm.com> To: Steve Underwood <ste...@netpage.com.hk> Content-Type: text/plain; charset=us-ascii X-Orcpt: rfc822;linux-kernel-outgoing-dig Organization: Internet mailing list X-Lotus-FromDomain: IBMUS Mime-Version: 1.0 Newsgroups: fa.linux.kernel Content-Disposition: inline X-Loop: majord...@vger.rutgers.edu > Linus Torvalds wrote: > > In short: message passing as the fundamental operation of the OS is just > > an exercise in computer science masturbation. It may feel good, but > > you don't actually get anything DONE. Nobody has ever shown that it > > made sense in the real world. It's basically just much simpler and > > saner to have a function call interface, and for operations that are > > non-local it gets transparently _promoted_ to a message. There's no > > reason why it should be considered to be a message when it starts out. With due libations to the Gods here, Linus is mistaken on all counts. Moving a message from hither to yon *does* accomplish something: it moves a unit of work from one protection/encapsulation domain to another. This may not be necessary in your application, but it is vitally important in some. The claim that nobody has ever shown benefit is also inaccurate. A considerable amount of open literature on fault tolerant software exists to support the value of message passing in certain applications. Consider in particular all of the research reports out of Tandem. Also, note that all of the operating systems whose software MTBF exceeds 1 yr make heavy use of protection domains. More important, from my perspective, is that the comment about procedure calls confuses the API with the semantics. Let's do an example.
Consider the UNIX read call read(fd, buf, sz) [I may have gotten the arg order wrong. It doesn't matter]. Assume for a moment that we are implementing a single machine system. From an implementation perspective, there is absolutely NO performance difference between the implementation of read(fd,buf,sz) and fd->CALL(OP_READ,buf,sz). The order of demultiplexing changes -- the read() call does the operation first and the descriptor second, while the CALL does the descriptor type first and the operation second, but precisely the same information is passed across the user/supervisor boundary, and several implementations exist to show that they are equivalently efficient. Given this, there are compelling arguments for the second API: 1. By changing the order of demultiplexing, it offers the option of remoting at a later time. 2. It allows objects to implement non-identical system call interfaces. This is easily abused, but sometimes extremely valuable. 3. It offers the option of depriving the program of the ability to perform I/O calls by ensuring that it has no objects that support I/O. So: even if you think that message passing is not the way you wish to implement things, object based APIs offer greater flexibility of implementation, and this is generally a good thing. Jonathan S. Shapiro, Ph. D. IBM T.J. Watson Research Center Email: sh...@us.ibm.com Phone: +1 914 784 7085 (Tieline: 863) Fax: +1 914 784 7595
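Shapiro's comparison of read(fd,buf,sz) with fd->CALL(OP_READ,buf,sz) is about the order of demultiplexing, and a small C sketch can show it. The sketch rests on several assumptions: a single object type (an in-memory read-only byte string), a hypothetical descriptor table `fd_table`, and invented entry points `sys_read()` and `obj_call()`. It is not real kernel code. Both paths take the same arguments and end in the same routine, which is the equivalence he describes, while the second lets the object refuse operations it does not implement (his point 3).

```c
#include <assert.h>
#include <string.h>

enum { OP_READ = 0, OP_WRITE = 1 };

/* Toy object: a read-only byte string with a cursor. */
typedef struct {
    const char *data;
    size_t len, pos;
} memfile_t;

static long memfile_read(memfile_t *f, void *buf, size_t sz) {
    size_t n = f->len - f->pos;
    if (n > sz)
        n = sz;
    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return (long)n;
}

/* Hypothetical descriptor table shared by both entry points. */
#define NFD 4
static memfile_t *fd_table[NFD];

static memfile_t *lookup(int fd) {
    return (fd >= 0 && fd < NFD) ? fd_table[fd] : 0;
}

/* read(fd, buf, sz) style: the entry point already fixes the operation;
   the descriptor is resolved second. */
static long sys_read(int fd, void *buf, size_t sz) {
    memfile_t *f = lookup(fd);
    return f ? memfile_read(f, buf, sz) : -1;
}

/* fd->CALL(OP_READ, buf, sz) style: the object is resolved first and then
   decides what each operation means. This object type has no OP_WRITE case,
   so holding it grants no way to write. */
static long obj_call(int fd, int op, void *buf, size_t sz) {
    memfile_t *f = lookup(fd);
    if (!f)
        return -1;
    switch (op) {
    case OP_READ:
        return memfile_read(f, buf, sz);
    default:
        return -1;   /* operation not supported by this object */
    }
}
```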
From: torva...@transmeta.com (Linus Torvalds)
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/21
Message-ID: <fa.hvdn70v.146acrg@ifi.uio.no>#1/1
Original-Date: 21 Jun 1999 18:10:42 GMT
Original-Message-ID: <7klv72$kgh$1@palladium.transmeta.com>
References: <fa.iaqmbav.1d222p4@ifi.uio.no>
Original-References: <85256797.005DDAA9...@D51MTA03.pok.ibm.com>
To: linux-ker...@vger.rutgers.edu
Organization: Transmeta Corporation, Santa Clara, CA
Newsgroups: fa.linux.kernel

In article <85256797.005DDAA9...@D51MTA03.pok.ibm.com>,
 <sh...@us.ibm.com> wrote:
>> Linus Torvalds wrote:
>> > In short: message passing as the fundamental operation of the OS is just
>> > an exercise in computer science masturbation. It may feel good, but
>> > you don't actually get anything DONE. Nobody has ever shown that it
>> > made sense in the real world. It's basically just much simpler and
>> > saner to have a function call interface, and for operations that are
>> > non-local it gets transparently _promoted_ to a message. There's no
>> > reason why it should be considered to be a message when it starts out.
>
>With due libations to the Gods here, Linus is mistaken on all counts.

It's happened before, it will happen again. However, you had better come up with a better argument before I believe it happened this time.

>Moving a message from hither to yon *does* accomplish something: it moves a unit
>of work from one protection/encapsulation domain to another.

Ehh.. In real operating systems, we call that event a "system call". No message is necessary or implied, unless you want to call the notion of switching privilege domains "messages". Some people do: they call them messages just to prove that messages are as fast as system calls. In logic, that's equivalent to proving that liver tastes as good as ice cream by calling ice cream liver, and in real life it is called "lying".

The system call may be turned into a message later if that turns out to be a good idea, but it's nothing inherent. AND IT SHOULD NOT BE.

>So: even if you think that message passing is not the way you wish to implement
>things, object based APIs offer greater flexibility of implementation, and this
>is generally a good thing.

Object-based APIs are a completely different issue. (I removed your argument, because I think it is completely irrelevant to "messages".)

I don't think object-based approaches are bad. A lot of libraries ("stdio" in C) are based on that notion, and it's often the right way to encapsulate information in user space.

HOWEVER: that is not an OS boundary, and should not be considered to be one. The _definition_ of an OS boundary is the boundary of protection domains: the OS takes over where the library no longer has the appropriate privileges to access the object. Because if the library could do the operation, it should, instead of bothering the OS with it.

So in effect, at the OS boundary the object has to be pretty much completely opaque, or it shouldn't be considered an OS boundary in the first place. QED.

That's why the OS boundary HAS to be equivalent to

    read(handle, buffer, size)

and NOT be equivalent to

    handle->op(READ, buffer, size);

because by definition, if you can do the "handle->op" lookup, then it's not an OS boundary any more - or at least it is a very BAD one.

See?

Linus
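Linus's argument rests on the handle being opaque. The unprivileged caller holds only a number, and the lookup that maps it to an object happens on the privileged side. Below is a minimal user-space simulation of that arrangement. The names k_install, k_read and handle_table are hypothetical. A real kernel would also have to copy the result into user memory through a checked path rather than a plain memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HANDLES 8

/* Privileged side only: the caller never sees this structure. */
struct kobject {
    int in_use;
    int readable;                /* permission checked on every call */
    const char *contents;
};

static struct kobject handle_table[MAX_HANDLES];

/* Install an object and return only its index: an opaque integer handle. */
static int k_install(const char *contents, int readable)
{
    for (int h = 0; h < MAX_HANDLES; h++) {
        if (!handle_table[h].in_use) {
            handle_table[h] = (struct kobject){ 1, readable, contents };
            return h;
        }
    }
    return -1;                   /* table full */
}

/* The boundary: read(handle, buffer, size). The caller supplies a number;
 * the lookup and the permission check both happen on the privileged side. */
static long k_read(int handle, char *buf, size_t size)
{
    assert(buf != NULL || size == 0);
    if (handle < 0 || handle >= MAX_HANDLES || !handle_table[handle].in_use)
        return -1;               /* forged or stale handle */
    const struct kobject *obj = &handle_table[handle];
    if (!obj->readable)
        return -1;               /* caller lacks permission */
    size_t n = strlen(obj->contents);
    if (n > size)
        n = size;
    memcpy(buf, obj->contents, n);
    return (long)n;
}
```

The caller only ever passes an integer, so a forged or stale handle simply fails the bounds and in_use checks. There is no handle->op pointer in user space to tamper with.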
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/21
Message-ID: <fa.fccckfv.12husbt@ifi.uio.no>#1/1
Original-Date: Sun, 20 Jun 1999 22:09:55 +0200
Original-Message-ID: <19990620220955.B102@elf.ucw.cz>
References: <fa.j0u16ev.cm0d9i@ifi.uio.no>
To: Linus Torvalds <torva...@transmeta.com>, linux-ker...@vger.rutgers.edu
Original-References: <199906200457.AAA10...@snark.thyrsus.com> <Pine.LNX.4.03.9906201138390.534-100...@mirkwood.nl.linux.org> <7kj9p1$fq...@palladium.transmeta.com>
Newsgroups: fa.linux.kernel

Hi!

> In short: message passing as the fundamental operation of the OS is just
> an exercise in computer science masturbation. It may feel good, but
> you don't actually get anything DONE. Nobody has ever shown that it
> made sense in the real world. It's basically just much simpler and
> saner to have a function call interface, and for operations that are
> non-local it gets transparently _promoted_ to a message. There's no
> reason why it should be considered to be a message when it starts out.

Well - there is. Because function calling leads to things like ioctl(). And ioctl() is _evil_. Yes, linux-kernel interface without ioctl-like things would be ok with me. Even ioctl() which is _always_ given a structure which begins with its own length would be ok. But ioctl() as it is today is evil, because you may pass horrible things like a linked list of things to do. And it is hard to marshal _that_.

Linus, do you plan to add some kind of clustering support to Linux? If someone gave you a simple syscall-over-net forwarder for Linux, would you like it?

Pavel

PS: Well - there is such a forwarder in development around here. It does not forward ioctl()s, for obvious reasons :-). The major thing for clustering seems to be 32-bit pids just now.
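Pavel's condition is that an ioctl argument always begin with its own length. That is what would let a generic forwarder copy the argument without understanding the command. Below is a sketch of that convention, assuming a 32-bit length word that counts the whole block including itself. The names struct arg_header, struct set_speed_arg and marshal_arg are hypothetical and not part of any Linux interface.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical convention: every argument block starts with its own
 * total size in bytes, including this header. */
struct arg_header {
    uint32_t len;
};

/* An example argument; the forwarder never needs to know this layout. */
struct set_speed_arg {
    struct arg_header hdr;
    uint32_t speed;
    uint32_t flags;
};

/* Copy an argument block into a wire buffer without understanding it.
 * Returns the number of bytes copied, or 0 if the block is malformed
 * or does not fit. */
static size_t marshal_arg(const void *arg, unsigned char *wire, size_t wire_size)
{
    struct arg_header hdr;

    assert(arg != NULL && wire != NULL);
    memcpy(&hdr, arg, sizeof hdr);        /* read the length word */
    if (hdr.len < sizeof hdr || hdr.len > wire_size)
        return 0;
    memcpy(wire, arg, hdr.len);           /* opaque byte copy */
    return hdr.len;
}
```

A linked list of requests defeats this scheme. The copied bytes would contain next pointers into the caller's address space, which mean nothing on another machine, so the forwarder would have to know each command's layout. A real implementation would also need to check the claimed length against memory the caller actually owns.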
From: Matthew Wilcox <Matthew.Wil...@genedata.com>
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/22
Message-ID: <fa.lbeetfv.b66p8b@ifi.uio.no>#1/1
Original-Date: Tue, 22 Jun 1999 09:29:52 +0200
Original-Message-ID: <19990622092952.E30370@mencheca.ch.genedata.com>
References: <fa.fccckfv.12husbt@ifi.uio.no>
To: Pavel Machek <pa...@Elf.ucw.cz>
Original-References: <199906200457.AAA10...@snark.thyrsus.com> <Pine.LNX.4.03.9906201138390.534-100...@mirkwood.nl.linux.org> <7kj9p1$fq...@palladium.transmeta.com> <19990620220955.B...@elf.ucw.cz>
Newsgroups: fa.linux.kernel

On Sun, Jun 20, 1999 at 10:09:55PM +0200, Pavel Machek wrote:
> Well - there is. Because function calling leads to things like
> ioctl(). And ioctl() is _evil_. Yes, linux-kernel interface without
> ioctl-like things would be ok with me. Even ioctl() which is _always_
> given a structure which begins with its own length would be ok. But
> ioctl() as it is today is evil, because you may pass horrible things
> like a linked list of things to do. And it is hard to marshal _that_.

Surely the sensible way of doing this is to define an ioctl2() system call which is given a length. I imagine we would then add an ioctl2() method to struct file_operations, and fall back to ioctl() (trimming off the length word) for compatibility.

I wonder if we can do this in a clever enough way to renumber all the old ioctl definitions, moving from bare numbers such as

    #define LOOP_SET_FD 0x4C00

to encoded definitions such as

    #define VIDIOCGCAP _IOR('v',1,struct video_capability)

The alternative would be to drop ioctl altogether and replace it with a different interface. Plan 9 uses ctl files: you write strings to them to perform commands. But I'm not sure people are willing to make that kind of radical change (certainly not within the 2.3 timeframe).

-- 
Matthew Wilcox <wi...@bofh.ai>
"Windows and MacOS are products, contrived by engineers in the service of specific companies. Unix, by contrast, is not so much a product as it is a painstakingly compiled oral history of the hacker subculture." - N Stephenson
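The renumbering Wilcox mentions relies on packing four fields into the ioctl number itself: a direction, a size, a type character and a command number. The sketch below mirrors the field layout of Linux's generic ioctl encoding: 2 direction bits, 14 size bits, 8 type bits and 8 number bits. Some architectures split the bits differently. The DEMO_* macros, the ioc_* decoders and struct demo_caps are stand-ins written for illustration, not the kernel's own _IOR macros or the real struct video_capability.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths and shifts modelled on the generic Linux layout. */
#define DEMO_NRBITS    8
#define DEMO_TYPEBITS  8
#define DEMO_SIZEBITS  14
#define DEMO_DIRBITS   2

#define DEMO_NRSHIFT   0
#define DEMO_TYPESHIFT (DEMO_NRSHIFT + DEMO_NRBITS)      /* 8  */
#define DEMO_SIZESHIFT (DEMO_TYPESHIFT + DEMO_TYPEBITS)  /* 16 */
#define DEMO_DIRSHIFT  (DEMO_SIZESHIFT + DEMO_SIZEBITS)  /* 30 */

#define DEMO_NONE  0u   /* no data transferred */
#define DEMO_WRITE 1u   /* user space writes data to the driver */
#define DEMO_READ  2u   /* user space reads data from the driver */

#define DEMO_IOC(dir, type, nr, size)              \
    (((uint32_t)(dir)  << DEMO_DIRSHIFT)  |        \
     ((uint32_t)(type) << DEMO_TYPESHIFT) |        \
     ((uint32_t)(nr)   << DEMO_NRSHIFT)   |        \
     ((uint32_t)(size) << DEMO_SIZESHIFT))

/* Analogue of _IOR(type, nr, argtype): a read command sized by its argument. */
#define DEMO_IOR(type, nr, argtype) DEMO_IOC(DEMO_READ, (type), (nr), sizeof(argtype))

/* Decoders: pull each field back out of a command number. */
static uint32_t ioc_dir(uint32_t cmd)  { return (cmd >> DEMO_DIRSHIFT)  & ((1u << DEMO_DIRBITS) - 1); }
static uint32_t ioc_type(uint32_t cmd) { return (cmd >> DEMO_TYPESHIFT) & ((1u << DEMO_TYPEBITS) - 1); }
static uint32_t ioc_nr(uint32_t cmd)   { return (cmd >> DEMO_NRSHIFT)   & ((1u << DEMO_NRBITS) - 1); }
static uint32_t ioc_size(uint32_t cmd) { return (cmd >> DEMO_SIZESHIFT) & ((1u << DEMO_SIZEBITS) - 1); }

/* A stand-in argument type for the example. */
struct demo_caps { int32_t width, height, flags; };
```

Decoding the bare LOOP_SET_FD value 0x4C00 this way yields type 'L' and number 0, but no direction and a size of zero. That is why a generic layer cannot tell how many bytes such a command moves.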
From: Linus Torvalds <torva...@transmeta.com>
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/22
Message-ID: <fa.ndjckfv.mmsv2t@ifi.uio.no>#1/1
Original-Date: Tue, 22 Jun 1999 00:41:33 -0700 (PDT)
Original-Message-ID: <Pine.LNX.3.95.990622003437.8298A-100000@palladium.transmeta.com>
References: <fa.lbeetfv.b66p8b@ifi.uio.no>
To: Matthew Wilcox <Matthew.Wil...@genedata.com>
Newsgroups: fa.linux.kernel

On Tue, 22 Jun 1999, Matthew Wilcox wrote:
>
> Surely the sensible way of doing this is to define an ioctl2() system
> call which is given a length. I imagine we would then add an ioctl2()
> method to struct file_operations, and fall back to ioctl() (trimming
> off the length word) for compatibility.

Actually, for ioctl, you definitely do want to have both a command and a reply, so something like this would work:

    int control(int fd, unsigned int code,
                void *in, int in_size,
                void *out, int out_size)

and yes, I agree that "ioctl()" and "fcntl()" as they currently stand are just horribly ugly, and they are probably one of the worst features of UNIX as a design. There are a few other things that could be handled more cleanly with just a single "control" interface - things like socket options etc. (which as they stand now are yet another special case).

Something like the above is actually what a lot of UNIX systems try to encode in the ioctl number - the number often has the size and the direction encoded in it. Linux tries to do it for some things, but it's not enforced due to historical baggage.

And notice how it's not going to be really pretty whatever you do: even if ioctl() and friends had a nicer interface, they'd still be just an ugly sideband channel to whatever the fd is connected to.

Linus
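Linus's control() prototype makes the sizes of both the command data and the reply explicit at the call. Below is a toy user-space implementation for one imaginary device. Several details are assumptions made for the sketch, not part of his proposal: the CTL_* codes, the single-device rule (only fd 0) and the negative-errno return convention. ENOTTY is the error UNIX traditionally returns for an unsupported ioctl.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical command codes for one toy device. */
enum { CTL_GET_SPEED = 1, CTL_SET_SPEED = 2 };

static int device_speed = 9600;      /* state of the only toy device */

/* Every call states how much data goes in and how much room there is
 * for the reply. Returns bytes written to out, or a negative errno. */
int control(int fd, unsigned int code, void *in, int in_size,
            void *out, int out_size)
{
    if (fd != 0)
        return -EBADF;               /* only fd 0 exists in this sketch */
    if (in_size < 0 || out_size < 0)
        return -EINVAL;

    switch (code) {
    case CTL_GET_SPEED:
        if (out_size < (int)sizeof device_speed)
            return -EINVAL;          /* reply buffer too small */
        assert(out != NULL);
        memcpy(out, &device_speed, sizeof device_speed);
        return (int)sizeof device_speed;
    case CTL_SET_SPEED:
        if (in_size != (int)sizeof device_speed)
            return -EINVAL;          /* command data has the wrong size */
        assert(in != NULL);
        memcpy(&device_speed, in, sizeof device_speed);
        return 0;
    default:
        return -ENOTTY;              /* unknown command */
    }
}
```

Because in_size and out_size travel with every call, a layer in between could copy or forward both buffers without knowing what CTL_SET_SPEED means.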
From: Rik van Riel <r...@nl.linux.org>
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/22
Message-ID: <fa.nd9rqnv.1nl00bo@ifi.uio.no>#1/1
Original-Date: Tue, 22 Jun 1999 22:31:29 +0200 (CEST)
Original-Message-ID: <Pine.LNX.4.03.9906222222210.4395-100000@mirkwood.nl.linux.org>
References: <fa.j0u16ev.cm0d9i@ifi.uio.no>
To: Linus Torvalds <torva...@transmeta.com>
Newsgroups: fa.linux.kernel

[I've thought about this long and hard and I've finally come up with a proper response to Linus' argument]

On 20 Jun 1999, Linus Torvalds wrote:

> In short: message passing as the fundamental operation of the OS
> is just an exercise in computer science masturbation. It may
> feel good, but you don't actually get anything DONE. Nobody has
> ever shown that it made sense in the real world.

It's not about physical message passing in the actual implementation; what's really happening can be 'hidden' by clever programming by the people who built the OS.

The real issue here is paradigms. The classical "everything's a file" broke down with the advent of networking, sockets and non-blocking reads. At the moment the file paradigm is so much out of touch with computational reality that web servers need to fork for each client and people are crying out for asynchronous sendfile and other weird interfaces.

A new "everything's a message" WILL fit the current use of computers though. One simple concept that's good enough for all our computational needs. And because it _is_ one simple concept, it can be implemented in a simple, clean and fast way -- unlike the myriad of different kludges Unix has to overcome the file paradigm...

Of course, I'll be using Unix for the foreseeable future -- it does all that it needs to do and it's got all the luxuries I want :)

regards,

Rik
-- 
Open Source: you deserve to be in control of your data.
+-------------------------------------------------------------------+
| Le Reseau netwerksystemen BV: http://www.reseau.nl/ |
| Linux Memory Management site: http://www.linux.eu.org/Linux-MM/ |
| Nederlandse Linux documentatie: http://www.nl.linux.org/ |
+-------------------------------------------------------------------+
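Rik's "everything's a message" model reduces every operation to sending a fixed-format message to a port and receiving one back. Below is a minimal in-process sketch of that primitive, assuming a bounded FIFO per port. The names struct message, struct port, msg_send and msg_receive are hypothetical. Nothing here models how EROS or any particular microkernel implements IPC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MSG_DATA_MAX 32
#define PORT_DEPTH   4

/* One message format for every kind of request. */
struct message {
    int    op;                    /* what the receiver is asked to do */
    size_t len;                   /* bytes used in data[] */
    char   data[MSG_DATA_MAX];
};

/* A port: a bounded FIFO of messages. Files, sockets and devices would
 * all be servers draining ports like this one. */
struct port {
    struct message q[PORT_DEPTH];
    size_t head, count;
};

/* Enqueue a copy of *m; fails when the port is full. */
static bool msg_send(struct port *p, const struct message *m)
{
    assert(m->len <= MSG_DATA_MAX);
    if (p->count == PORT_DEPTH)
        return false;             /* queue full: caller must retry */
    p->q[(p->head + p->count) % PORT_DEPTH] = *m;
    p->count++;
    return true;
}

/* Dequeue the oldest message into *out; fails when the port is empty. */
static bool msg_receive(struct port *p, struct message *out)
{
    if (p->count == 0)
        return false;             /* nothing pending */
    *out = p->q[p->head];
    p->head = (p->head + 1) % PORT_DEPTH;
    p->count--;
    return true;
}
```

In this model, a read would become a request message to a file server's port, followed by a reply carrying the data. The bytes still move exactly as they would with read(); only the interface through which they are requested changes.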
From: Linus Torvalds <torva...@transmeta.com>
Subject: Re: Some very thought-provoking ideas about OS architecture.
Date: 1999/06/22
Message-ID: <fa.o99vgfv.ikc6jf@ifi.uio.no>#1/1
Original-Date: Tue, 22 Jun 1999 14:04:32 -0700 (PDT)
Original-Message-ID: <Pine.LNX.4.10.9906221359460.1122-100000@penguin.transmeta.com>
References: <fa.nd9rqnv.1nl00bo@ifi.uio.no>
To: Rik van Riel <r...@nl.linux.org>
Newsgroups: fa.linux.kernel

On Tue, 22 Jun 1999, Rik van Riel wrote:
>
> The real issue here is paradigms. The classical "everything's
> a file" broke down with the advent of networking, sockets and
> non-blocking reads. At the moment the file paradigm is so much
> out of touch with computational reality that web servers need
> to fork for each client and people are crying out for asynchronous
> sendfile and other weird interfaces.

Sure. But I think it's still a valid paradigm to consider "everything is a stream of bytes". And that's _really_ what the UNIX paradigm has been from the first: the whole notion of pipes etc. is not all that different from networking.

> A new "everything's a message" WILL fit the current use of computers
> though. One simple concept that's good enough for all our
> computational needs. And because it _is_ one simple concept, it can
> be implemented in a simple, clean and fast way -- unlike the myriad
> of different kludges Unix has to overcome the file paradigm...

I disagree. The issue is not how you get the data from one place to the other: "read()" is as good a way as "rcv()". Message passing is not the issue.

The real issue is _naming_, and that's not going away. The name space has always been the difficult part.

And that's where I agree that UNIX could do better: I think we do want to move in a "web direction" where you can just do a

    open("http://ssss.yyyyy.dd/~silly", O_RDONLY)

and it does the right thing.

Linus
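Linus places the hard problem in naming rather than in transport. One way to picture an open("http://...", O_RDONLY) that "does the right thing" is a resolver that picks a back end by scheme prefix before any bytes are read. This is a toy under stated assumptions, not a description of any existing Linux facility:

- resolve, enum backend and the scheme table are hypothetical;
- relative names are deliberately left unhandled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical back ends a unified name space could dispatch to. */
enum backend { BACKEND_NONE, BACKEND_LOCAL_FS, BACKEND_HTTP, BACKEND_FTP };

struct scheme {
    const char  *prefix;          /* e.g. "http://" */
    enum backend backend;
};

static const struct scheme schemes[] = {
    { "http://", BACKEND_HTTP },
    { "ftp://",  BACKEND_FTP  },
};

/* Decide who handles a name and set *rest to the part that back end
 * should interpret. Absolute paths fall through to the local file system. */
static enum backend resolve(const char *name, const char **rest)
{
    assert(name != NULL && rest != NULL);
    for (size_t i = 0; i < sizeof schemes / sizeof schemes[0]; i++) {
        size_t n = strlen(schemes[i].prefix);
        if (strncmp(name, schemes[i].prefix, n) == 0) {
            *rest = name + n;     /* host and path after the scheme */
            return schemes[i].backend;
        }
    }
    if (name[0] == '/') {
        *rest = name;
        return BACKEND_LOCAL_FS;
    }
    *rest = NULL;
    return BACKEND_NONE;          /* relative names: out of scope here */
}
```

resolve("http://ssss.yyyyy.dd/~silly", &rest) returns BACKEND_HTTP, with rest pointing at ssss.yyyyy.dd/~silly. Once a name is resolved, every back end could be read through the same read() call. The sketch ignores the case-insensitivity of URL schemes.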