Path: utzoo!attcan!uunet!husc6!uwvax!oddjob!gargoyle!att!alberta!calgary!dave
From: d...@calgary.UUCP (Dave Mason)
Newsgroups: comp.unix.wizards,comp.unix.questions
Subject: Vax 11/780 performance vs Sun 4/280 performance
Keywords: vax sun
Message-ID: <1631@vaxb.calgary.UUCP>
Date: 25 May 88 22:28:05 GMT
Organization: U. of Calgary, Calgary, Ab.
Lines: 31

We are planning to replace 2 of our Vax 11/780s with 2 Sun 4/280s. Each vax has 6 Mbytes of memory, 2 RA80s and 1 RA81, and 40 terminals. The vaxes are currently running 4.3 BSD + NFS (from Mt Xinu). Each Sun is planned to have 32 Mbytes of memory and 2 of the new NEC disk drives, and will run the same 40 terminals.

The vaxes are being used by undergrads doing Pascal, f77 and C programming (compile and bomb). Most students use umacs (micro-emacs) as their text editor.

What I was wondering is: has anyone done a similar switchover? Is there a horrendous degradation of response when the load average gets sufficiently high, or does it degrade linearly with respect to load average? Is the overall performance of a Sun 4/280 better/worse/the same as a similarly loaded vax 11/780 (as configured above)? Were there any surprises when you did the switchover?

My personal feeling is that we will win big, but the local DEC salesman is making noises about Sun 4/280 performance, especially with > 15 users. I just want to confirm that my opinion of the local DEC sales office is well founded :-).

Please mail your responses. If there is sufficient interest I'll post a summary to the net. Thanks in advance for any comments.

Dave Mason
University of Calgary
{ubc-cs,alberta,utai}!calgary!dave
Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!ames!umd5!brl-adm!adm!weiser...@xerox.com
From: weiser...@xerox.com
Newsgroups: comp.unix.wizards
Subject: Vax 11/780 performance vs Sun 4/280 performance
Message-ID: <14968@brl-adm.ARPA>
Date: 27 May 88 17:08:54 GMT
Sender: n...@brl-adm.ARPA
Lines: 13

What your DEC salesperson may have heard, undoubtedly very indirectly, is that there is a knee in the performance curve of the Sun-4/280 at > 15 processes ready to run. This has nothing to do with > 15 users: it is more like a load average of > 15. Do your vaxes ever run with a load average of > 15? If not, OK. But if they EVER hit 16 or 17, watch out on the Sun-4s: I can trivially get my Sun-4 so completely wedged that I have to reboot with L1-A, just by starting 19 little processes which sleep for 100ms, wake up, and sleep again. This doesn't even raise the load average (but it amounts to a load average of 19 to the context-switching mechanism, although not to the cpu).

And the Sun-3s are no better: the knee there is > 7 processes.

-mark
Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!ames!ncar!noao!arizona!modular!olson
From: ol...@modular.UUCP (Jon Olson)
Newsgroups: comp.unix.wizards
Subject: Re: Vax 11/780 performance vs Sun 4/280 performance
Summary: Re: Vax 11/780 performance vs Sun 3/Sun 4 performance
Message-ID: <601@modular.UUCP>
Date: 29 May 88 00:29:06 GMT
References: <14968@brl-adm.ARPA>
Organization: Modular Mining Systems, Tucson
Lines: 48

> What your DEC salesperson may have heard, undoubtedly very indirectly, is
> that there is a knee in the performance curve of the Sun-4/280 at > 15
> processes ready to run. This has nothing to do with > 15 users: it is more
> like a load average of > 15. Do your vaxes ever run with a load average of
> > 15? If not, OK. But if they EVER hit 16 or 17, watch out on the Sun-4s:
> I can trivially get my Sun-4 so completely wedged that I have to reboot
> with L1-A, just by starting 19 little processes which sleep for 100ms,
> wake up, and sleep again. This doesn't even raise the load average (but it
> amounts to a load average of 19 to the context-switching mechanism,
> although not to the cpu).
>
> And the Sun-3s are no better: the knee there is > 7 processes.
>
> -mark

Nonsense. I just tried forking 32 copies of the following program on my Sun 3/60 workstation. Each one sleeps for 100 milliseconds, wakes up, and sleeps again. With 32 copies of it running, I could notice no difference in response time, and a `ps aux' showed none of them using a significant amount of CPU time. Maybe you are just running out of memory and doing a lot of swapping?

What I have noticed on our Vax 11/780, running VMS, is that it is often equally slow with 1 user or 20 users. Possibly VMS avoids the `knee' by raising the priority of the NULL task when there aren't many people on the machine???

---------------------------------------------------
#include <sys/time.h>

main()
{
	struct timeval tv;

	tv.tv_sec = 0;
	tv.tv_usec = 100000;
	for( ;; )
		select( 0, 0, 0, 0, &tv );
}
---------------------------------------------------
--
Jon Olson, Modular Mining Systems
USENET: {ihnp4,allegra,cmcl2,hao!noao}!arizona!modular!olson
INTERNET: modular!ol...@arizona.edu
Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!ames!ncar!noao!arizona!modular!olson
From: ol...@modular.UUCP (Jon Olson)
Newsgroups: comp.unix.wizards
Subject: Re: Vax 11/780 performance vs Sun 4/280 performance
Summary: More Re: Sun 3/Sun performance
Message-ID: <602@modular.UUCP>
Date: 29 May 88 00:46:18 GMT
References: <14968@brl-adm.ARPA>
Organization: Modular Mining Systems, Tucson
Lines: 9

I also tried forking 32 `for(;;) ;' loops on a 3/60 with 8 MB. Each process got about 3 percent of the CPU and the response was still quite good for interactive work. This stuff about a `knee' at 7 processes just isn't real...
--
Jon Olson, Modular Mining Systems
USENET: {ihnp4,allegra,cmcl2,hao!noao}!arizona!modular!olson
INTERNET: modular!ol...@arizona.edu
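For readers who want to reproduce these experiments, here is a minimal sketch of the kind of load generator Weiser and Olson describe: fork N children that either busy-loop or sleep 100 ms at a time. The child count, the command-line switch, and the structure are illustrative assumptions, not code from the posts (Olson's actual sleeper appears above).

#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/select.h>     /* the 1988 original needed only <sys/time.h> */

#define NPROCS 32           /* assumed child count, matching Olson's test */

int main(int argc, char **argv)
{
    int i;
    int busy = (argc > 1);          /* any argument: CPU-bound children */

    (void)argv;
    for (i = 0; i < NPROCS; i++) {
        if (fork() == 0) {          /* child */
            if (busy) {
                for (;;)            /* Olson's `for(;;) ;' CPU load */
                    ;
            } else {
                for (;;) {          /* sleep 100 ms, wake, repeat */
                    struct timeval tv;
                    tv.tv_sec  = 0;
                    tv.tv_usec = 100000;
                    select(0, 0, 0, 0, &tv);
                }
            }
        }
    }
    pause();                        /* parent idles; kill the job to stop */
    return 0;
}

Run it with no arguments for the sleepy load, or with any argument for the CPU-bound load, then try editing a file; killing the process group ends the test.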
Path: utzoo!attcan!uunet!husc6!mailrus!ames!umd5!brl-adm!adm!weiser...@xerox.com
From: weiser...@xerox.com
Newsgroups: comp.unix.wizards
Subject: Re: Vax 11/780 performance vs Sun 4/280 performance
Message-ID: <15464@brl-adm.ARPA>
Date: 31 May 88 19:36:40 GMT
Sender: n...@brl-adm.ARPA
Lines: 41

--------------------
Nonsense. I just tried forking 32 copies of the following program on my
Sun 3/60 workstation. Each one sleeps for 100 milliseconds, wakes up, and
sleeps again. With 32 copies of it running, I could notice no difference
in response time, and a `ps aux' showed none of them using a significant
amount of CPU time. Maybe you are just running out of memory and doing a
lot of swapping?

What I have noticed on our Vax 11/780, running VMS, is that it is often
equally slow with 1 user or 20 users. Possibly VMS avoids the `knee' by
raising the priority of the NULL task when there aren't many people on
the machine???

#include <sys/time.h>

main()
{
	struct timeval tv;

	tv.tv_sec = 0;
	tv.tv_usec = 100000;
	for( ;; )
		select( 0, 0, 0, 0, &tv );
}
--------------------

No, not nonsense. I changed 100000 to 25000 and ran 18 of these on my Sun-4/260 with 120MB swap and 24MB ram, with very little else going on. Perfmeter shows no disk activity; ps aux shows each of the 18 using almost no cpu. (And each of the 18 has more than a millisecond to get in and out of select, which is certainly enough.) And the system is on its knees! (If it doesn't work for you, try 19 or 20 or 21.) Window refreshes take tens of seconds. If I kill off 3 of these, all is back to normal.

I don't have a 60C to try this on. But try reducing that delay factor and see if you don't also see a knee in the performance curve well before the cpu should be swamped. (And in any case, a swamped cpu doesn't need to imply a knee in the curve...)

-mark
Path: utzoo!attcan!uunet!husc6!bloom-beacon!bu-cs!bzs
From: b...@bu-cs.BU.EDU (Barry Shein)
Newsgroups: comp.unix.wizards
Subject: Re: Vax 11/780 performance vs Sun 4/280 performance
Message-ID: <23027@bu-cs.BU.EDU>
Date: 31 May 88 23:48:10 GMT
References: <14968@brl-adm.ARPA> <264@sdba.UUCP>
Organization: Boston U. Comp. Sci.
Lines: 64
In-reply-to: stan@sdba.UUCP's message of 31 May 88 17:32:33 GMT

Although I don't disagree with the original claim of Suns having knees (related to NeXT being pronounced Knee-zit? never mind), the discussion can lose sight of reality here.

A 780 cost around $400K* and supported around 20 logins; a Sun 4 or even a Sun 3/280 probably comes close to that in support for around 1/5 the price or less, and the CPU is much faster when a job gets it. If your Vax was horribly overloaded and had 32 users, just buy more than one system and split the community. You'll also double the I/O paths that way, and probably have at least one system up almost all the time. (We NFS'd everything between our Suns in Math/Computer Science and Information Technology here so people can log into any of them, although that does mean that if your home dir is on a down system you lose.)

Also, the cost of things like memory is so much lower that you can cheat like hell on getting performance. Who ever had a 32MB 780? That's practically a minimum config for a Sun 4 server.

The best use for a Sun server as a time-sharer is if a) you don't expect rapid growth in the number of logins (eg. doubling in a year) that will outgrow the machine, and b) you expect a lot of the community using the system to migrate from dumb terminals to workstations in the reasonably near future. That way, voila, you have the server, especially since each new workstation means one less time-sharer, and it converges fairly rapidly. It's a nice way to give people time to get their financial act together to buy workstations. For example, for our CS and Math faculty here, having 3 servers worked out very well; many of the users have now grown into workstations and the server facilities were "just there".

Another rationale, of course, is that you're looking for just a little system for perhaps a dozen or so peak-load people. I don't know any system off-hand that can do that as nicely, for the money, as a system like the above.

If your needs are much more in the domain of traditional time-sharing (eg. hordes of students that never cease growing term to term, dumb terminals and staying that way for the next few years [typically, if you ever get them workstations you'll put an appropriate, separate server in *that* budget]), then you probably want to look at something more expandable/upgradeable. I find Encores and (no direct experience, but I hear good things) Sequents pretty close to perfect for that kind of usage. I'm sure there are others that will suffice, but we don't use them so I can't comment (we have 7 Encores and over 100 Suns here.)

Anyhow, seat-of-the-pants systems analysis on the net is probably a precarious thing at best. I hope I've pointed out that the issues are several, and that small differences in two groups' needs can make any recommendation inapplicable. All I can say is we have quite a few Sun 3 servers here doing something resembling traditional time-sharing and everyone seems very happy with it. Given the right conditions it works out well; given the wrong ones, no doubt it would be a nightmare. So what else is new?

-Barry Shein, Boston University

P.S. I have no vested interest in any of the above-mentioned companies, although I am on the Board of Directors of the Sun Users Group; I doubt that would be considered "vested".

* Yes, I realize that it's been almost 10 years since the 780 came out, but that was the original question.
Path: utzoo!dciem!nrcaer!scs!spl1!laidbak!att!osu-cis!tut.cis.ohio-state.edu!mailrus!ames!oliveb!pyramid!voder!lynx!m5
From: m...@lynx.UUCP (Mike McNally)
Newsgroups: comp.unix.wizards
Subject: Re: Vax 11/780 performance vs Sun 4/280 performance
Message-ID: <3859@lynx.UUCP>
Date: 3 Jun 88 23:59:15 GMT
Article-I.D.: lynx.3859
References: <14968@brl-adm.ARPA> <601@modular.UUCP> <7331@swan.ulowell.edu> <2282@rpp386.UUCP>
Reply-To: m...@lynx.UUCP (Mike McNally)
Organization: Lynx Real-Time Systems Inc, Campbell CA
Lines: 29
Summary: My $.02

Re: small processes that sleep-wakeup-sleep-wakeup...

I tried this on my Integrated Solutions 68020 thing and got results similar to those of the Sun; that is, up to about 6 or 7 of them the system works fine, but after that everything gets real slow. (I can't test it too much because everybody gets mad here when the machine freezes up.)

I tried the same thing under LynxOS, our own BSD-compatible real-time OS, and didn't notice very much degradation at all. A major difference between our machine and the Integrated Solutions is the MMU: even though our platform is a 68010, our MMU is 16K of static RAM that holds all the page tables all the time. Context switch time is thus real small. Also, I think it's possible that the mechanism for dealing with the timeout in select() is different internally under LynxOS as opposed to Unix. Of course, under the real-time OS, a high-priority CPU-bound task gets the whole CPU, no questions asked. That's a great way of degrading editor response :-).

As a somewhat related side question, what does the Sun 4/SPARC MMU look like? Are lookaside buffer reloads done in software like on the MIPS R[23]000? (Is that really true about the R[23]000 anyhow?)
--
Mike McNally of Lynx Real-Time Systems
uucp: lynx!m5 (maybe pyramid!voder!lynx!m5 if lynx is unknown)
Path: utzoo!utgpu!water!watmath!clyde!bellcore!rutgers!ucsd!ucbvax!decwrl!pyramid!prls!mips!mash
From: m...@mips.COM (John Mashey)
Newsgroups: comp.unix.wizards
Subject: Re: Vax 11/780 performance vs Sun 4/280 performance
Message-ID: <2298@winchester.mips.COM>
Date: 5 Jun 88 16:41:10 GMT
References: <14968@brl-adm.ARPA> <601@modular.UUCP> <7331@swan.ulowell.edu> <2282@rpp386.UUCP> <3859@lynx.UUCP>
Reply-To: m...@winchester.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 21

In article <3...@lynx.UUCP> m...@lynx.UUCP (Mike McNally) writes:
...
>As a somewhat related side question, what does the Sun 4/SPARC MMU look
>like? Are lookaside buffer reloads done in software like on the MIPS
>R[23]000? (Is that really true about the R[23]000 anyhow?)

The Sun-4 MMU, like earlier Suns, doesn't use a TLB, but has SRAMs for memory maps (16 contexts' worth, compared to 8 in the Sun-3/200, for example).

The R[23]000 do indeed handle TLB-miss refill in software; this is not unusual in RISC machines: HP Precision and the AMD 29K (at least) do this also. The overall cost is typically 1% or less of CPU time, which is fairly competitive with hardware refill, especially since one of the larger costs on faster machines is the accumulated cache-miss penalty for fetching PTEs from memory.
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR m...@mips.com
DDD: 408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
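Mashey's point about software refill can be illustrated with a toy simulation. This is not MIPS's actual handler (that is a short assembly routine in a trap vector); the table sizes, the direct-mapped placement, and the identity page table below are invented. The idea is only that the miss path is ordinary software indexing an in-memory page table:

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define TLB_SIZE   64       /* assumed entry count, for illustration */
#define PT_PAGES   1024

struct tlbe { uint32_t vpn, pfn; int valid; };

static struct tlbe tlb[TLB_SIZE];
static uint32_t page_table[PT_PAGES];   /* the in-memory vpn -> pfn map */
static long refills;

/* On a miss, real hardware would trap; here the "trap handler" is the
 * if-branch, which refills the entry from the page table in software. */
uint32_t translate(uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    struct tlbe *e = &tlb[vpn % TLB_SIZE];  /* direct-mapped, for simplicity */

    if (!e->valid || e->vpn != vpn) {       /* TLB miss: software refill */
        refills++;
        e->vpn   = vpn;
        e->pfn   = page_table[vpn % PT_PAGES];
        e->valid = 1;
    }
    return (e->pfn << PAGE_SHIFT) | (vaddr & ((1u << PAGE_SHIFT) - 1));
}

int main(void)
{
    uint32_t i;
    for (i = 0; i < PT_PAGES; i++)
        page_table[i] = i;              /* a trivial identity mapping */
    for (i = 0; i < (1u << 20); i += 64)
        (void)translate(i);             /* sweep 1 MB in 64-byte steps */
    printf("%ld refills for %u references\n", refills, (1u << 20) / 64);
    return 0;
}

Counting refills against total references gives a feel for why the software path costs so little when misses are rare, which is the substance of Mashey's 1%-or-less figure.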
Path: utzoo!attcan!uunet!seismo!rick
From: r...@seismo.CSS.GOV (Rick Adams)
Newsgroups: comp.unix.wizards
Subject: Re: Vax 11/780 performance vs Sun 4/280 performance
Summary: Sun 3/160 has a real knee at about 7 active processes
Message-ID: <44365@beno.seismo.CSS.GOV>
Date: 6 Jun 88 17:54:20 GMT
References: <15875@brl-adm.ARPA>
Organization: Center for Seismic Studies, Arlington, VA
Lines: 9

Last year when seismo (a Sun 3/160) was still passing mail around, there was a VERY obvious performance degradation when the 8th or 9th sendmail became active. (No, we didn't run out of memory; that happened at about 14 sendmails.)

I have always attributed it to the 7 user contexts.

---rick
Path: utzoo!attcan!uunet!husc6!mailrus!ames!elroy!cit-vax!mangler
From: mang...@cit-vax.Caltech.Edu (Don Speck)
Newsgroups: comp.unix.wizards
Subject: Re: Vax 11/780 performance vs Sun 4/280 performance
Summary: I/O throughput
Message-ID: <6926@cit-vax.Caltech.Edu>
Date: 13 Jun 88 08:58:03 GMT
References: <22957@bu-cs.BU.EDU> <14968@brl-adm.ARPA> <601@modular.UUCP> <7331@swan.ulowell.edu> <2282@rpp386.UUCP>
Organization: California Institute of Technology
Lines: 25

I am reminded of this article from comp.arch:

In article <44...@beno.seismo.CSS.GOV>, r...@seismo.CSS.GOV (Rick Adams) writes:
> Well, to start with I've got a Vax 11/780 with 7 6250 bpi 125 ips
> tape drives on it. It performs adequately when they are all running.
> I STILL haven't found anything to replace it with for a reasonable amount
> of money. Nothing in the Sun price range can handle that I/O volume.

I've seen a PDP-11/70 with eight tape drives, too. And as Barry Shein said, "An IBM mainframe is an awesome thing...". One weekend, noticing the 4341 spinning a pair of GCR drives at over half their rated 275 ips, I was shocked to learn that it was reading the disk file-by-file, not track at a time. BSD filesystems just can't compare to what this 2-MIPS machine could do with apparent ease.

How do they get that kind of throughput? I refuse to believe that it's all hardware. Mainframe disks rotate at 3600 RPM like everybody else's, and their 3 MB/s transfer rate is only slightly higher than a SuperEagle's. A 2-MIPS CPU would be inadequate to run a BSD filesystem at those speeds, so obviously their software overhead is a lot lower, while at the same time wasting no disk time. What is VM doing efficiently that Unix does inefficiently?

Don Speck   sp...@vlsi.caltech.edu   {amdahl,ames!elroy}!cit-vax!speck
Path: utzoo!attcan!uunet!yale!husc6!bu-cs!bzs
From: b...@bu-cs.BU.EDU (Barry Shein)
Newsgroups: comp.unix.wizards
Subject: Re: Vax 11/780 performance vs Sun 4/280 performance
Message-ID: <23288@bu-cs.BU.EDU>
Date: 13 Jun 88 15:56:30 GMT
References: <22957@bu-cs.BU.EDU> <14968@brl-adm.ARPA> <601@modular.UUCP> <7331@swan.ulowell.edu> <2282@rpp386.UUCP> <6926@cit-vax.Caltech.Edu>
Organization: Boston U. Comp. Sci.
Lines: 78
In-reply-to: mangler@cit-vax.Caltech.Edu's message of 13 Jun 88 08:58:03 GMT

>How do they get that kind of throughput? I refuse to believe that it's
>all hardware. Mainframe disks rotate at 3600 RPM like everybody else's
>and their 3 MB/s transfer rate is only slightly higher than a SuperEagle.
>A 2-MIPS CPU would be inadequate to run a BSD filesystem at those speeds,
>so obviously their software overhead is a lot lower, while at the same
>time wasting no disk time. What is VM doing efficiently that Unix does
>inefficiently?
>
>Don Speck sp...@vlsi.caltech.edu {amdahl,ames!elroy}!cit-vax!speck

I think a lot of it *is* hardware. I know the big mainframes better than the small ones.

I/O devices are attached indirectly thru channel controllers. Channels have their own paths to/from memory (that's critical: multiple DMAs simultaneously.) Also, channels are intelligent; I remember people saying the channels for the 370/168 had roughly the same computing power as the 370/158 (ie. one model down, sort of like saying that Sun 3/280s use Sun 3/180s as disk controllers; actually the compute power is very similar in that comparison.)

Channels execute channel commands directly out of memory, sort of linked-list structs in C lingo, with commands, offsets, etc. embedded in them (this has become more common in the mini market also; the UDA is similar, tho I don't know if it's quite as general; see the sketch after this article). Channels can also do things like search disks for particular keys, hi/lo/equal, without involving the central processor. I don't know how much this is used in the various filesystems; obviously it's a general data base thing.

The channels themselves aren't all that fast, around 3MB/sec, but 16 of them pumping simultaneously to/from different blocks of memory can certainly make it feel fast. I heard IBM recently announced a new addition to the 3381 disk series (these are multi-GB disks) with 256MB (1/4 GB) of cache in the disk. Rich or poor, it's better to be rich.

The file systems tend to be much simpler (they avoid indirection at the lower levels), at least in OS, which I'm sure contributes to the performance. I/O is very asynchronous from a software perspective, so starting multiple I/Os and sitting back waiting for completions is a natural way to program. Note that RMS in VMS tries to mimic this kind of architecture, but no one ever accused a Vax of having fast I/O.

A lot of what we would consider application code is in the OS I/O code, known as "access methods", so reading various file formats (zillions, actually: VSAM, ISAM, BDAM, BSAM...) and I/O disciplines (VTAM etc) can be optimized at the "kernel" level (there's also microcode assist on various machines for various operations). It also tends to push applications programmers towards "being kind" to the OS; things like pre-allocation of resources are pretty much enforced, so a lot of dynamic resource management is just not done during execution.
There is little doubt that to get a lot of this speedup on Unix systems you'd have to give up niceties like tree'd directories, extending files whenever you feel like, and dynamic file opening during run-time (OS tends to do deadlock avoidance rather than detection or recovery, so it needs to know what files you plan to use before your job starts; that explains a *lot* of what JCL is all about, pre-allocation of resources), etc. You probably wouldn't like it; it would look just like MVS :-)

You'd also have to give up what we call "terminals" in most cases. IBM terminals (327x's) on big systems are much more like disks: half-duplex, fill in a screen locally and then blast entire screens to/from memory in one block I/O operation, no per-char I/O. Emacs would die. It helps, especially when you have a lot of terminals. I read about an IBM transaction system with 15,000 terminals logged in. I said a lot of terminals.

But don't underestimate raw, frothing, manic hardware. It's a big trade-off: large IBM mainframes are to I/O what Crays are to floating point, but you really have to have the problem to want the cure. For most folks it's unnecessary, MasterCard etc excepted.

-Barry Shein, Boston University
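The "linked-list structs in C lingo" that Shein mentions can be sketched concretely. The layout and names below are invented for illustration (real IBM channel command words are packed differently); the point is only that the channel walks a chain of commands out of memory on its own while the CPU does other work:

#include <stdio.h>
#include <string.h>

enum { OP_READ = 1, OP_WRITE = 2 };
#define F_CHAIN 0x01            /* command chaining: go on to next word */

/* One "channel command word": what to do, where the data goes, how much. */
struct chan_cmd {
    int          op;
    char        *data;          /* memory address for the transfer */
    unsigned int count;         /* bytes to move */
    unsigned int flags;
};

/* A toy channel: executes a chain of commands out of memory, the way a
 * real channel runs a channel program without involving the CPU.  The
 * "device" here is just a byte array read sequentially. */
static void channel_run(struct chan_cmd *ccw, char *device)
{
    unsigned int off = 0;
    for (;;) {
        if (ccw->op == OP_READ)
            memcpy(ccw->data, device + off, ccw->count);
        else if (ccw->op == OP_WRITE)
            memcpy(device + off, ccw->data, ccw->count);
        off += ccw->count;
        if (!(ccw->flags & F_CHAIN))
            break;              /* end of the channel program */
        ccw++;                  /* chain to the next command word */
    }
}

int main(void)
{
    char disk[64] = "one block|two block|";
    char a[11], b[11];
    struct chan_cmd prog[] = {  /* two chained 10-byte reads */
        { OP_READ, a, 10, F_CHAIN },
        { OP_READ, b, 10, 0 },
    };
    channel_run(prog, disk);
    a[10] = b[10] = '\0';
    printf("%s / %s\n", a, b);
    return 0;
}

Command chaining is what lets a single start-I/O cover a whole sequence of transfers, which is much of why the per-request overhead on these systems is so low.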
Path: utzoo!attcan!uunet!husc6!uwvax!rutgers!bellcore!faline!thumper!ulysses!andante!alice!dmr
From: d...@alice.UUCP
Newsgroups: comp.unix.wizards
Subject: Re: Vax 11/780 performance vs Sun 4/280 performance
Message-ID: <7980@alice.UUCP>
Date: 14 Jun 88 04:21:17 GMT
Organization: AT&T Bell Laboratories, Murray Hill NJ
Lines: 35

After describing a lot of the grot you have to go through to get 3MB/s out of an MVS system, Barry Shein wrote,

> But don't underestimate raw, frothing, manic hardware.
> It's a big trade-off: large IBM mainframes are to I/O what Crays are
> to floating point...

Crays are better at I/O, too. For example, I made a 9947252-byte file by catting 4 copies of the dictionary and read it:

3K$ time dd bs=172032 </tmp/big >/dev/null
57+1 blocks in
57+1 blocks out
seconds
 elapsed 1.251356
 user    0.000639
 sys     0.300725

which is a cool 8MB/s read from an ordinary Unix file, in competition with other processes on the machine. (OK, I gave it a big buffer.) The big guys would complain that they didn't get the full 10 or 12 MB/s that the disks give. They would really be annoyed that I could get only 50 MB/s when I read the file from the SSD, which runs at 1000MB/s, but to get it to go at full speed you need to resort to non-standard Unix things.

The disk format on Unicos (Cray's version of SVr2) is an extent-based scheme supporting the full Unix semantics, except that they don't handle files with holes (that is, the holes get filled in). In an early version, a naive allocation algorithm sometimes created files ungrowable past a certain point, but I think they've worked on the problem since then.

Dennis Ritchie
Path: utzoo!attcan!uunet!husc6!bu-cs!bzs
From: b...@bu-cs.BU.EDU (Barry Shein)
Newsgroups: comp.unix.wizards
Subject: Re: Vax 11/780 performance vs Sun 4/280 performance
Message-ID: <23326@bu-cs.BU.EDU>
Date: 14 Jun 88 16:39:38 GMT
References: <7980@alice.UUCP>
Organization: Boston U. Comp. Sci.
Lines: 23
In-reply-to: dmr@alice.UUCP's message of 14 Jun 88 04:21:17 GMT

Dennis Ritchie points out that his Cray observes disk I/O speeds that compare favorably to those claimed for large IBM mainframes; thus, contrary to my claim, Crays may indeed be the "Crays" of I/O.

I think the proper question is sort/merging a disk farm and doing 1000 transactions/sec or more while keeping 8 or 12 tapes turning at or near their rated 200 ips, not pushing bits thru a single channel (if we're talking Crays then we're talking 3090s.) If the Cray can keep pumping the I/O under those conditions (a typical job stream for a JC Penney's or MasterCard), then we'd all better short IBM. Software or price would be no object if the Cray could do it better (and more reliably; I guess that *is* an issue, but let's skip it for now.)

Then again, who knows? Old beliefs die hard, and far be it from me to defend the Itsy Bitsy Machine company. Mayhaps the Amdahl crew can provide some appropriate viciousness at this point :-) Oh, please do!

-Barry Shein, Boston University
Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ucbvax!bloom-beacon!oberon!cit-vax!mangler
From: mang...@cit-vax.Caltech.Edu (Don Speck)
Newsgroups: comp.unix.wizards
Subject: Re: Vax 11/780 performance vs Sun 4/280 performance
Keywords: readahead, striping, file mapping
Message-ID: <6963@cit-vax.Caltech.Edu>
Date: 16 Jun 88 06:32:08 GMT
References: <22957@bu-cs.BU.EDU> <14968@brl-adm.ARPA> <601@modular.UUCP> <23288@bu-cs.BU.EDU> <7980@alice.UUCP> <23326@bu-cs.BU.EDU>
Organization: California Institute of Technology
Lines: 71

In article <23...@bu-cs.BU.EDU>, b...@bu-cs.BU.EDU (Barry Shein) writes:
> I think the proper question is sort/merging a disk farm and doing 1000
> transactions/sec or more while keeping 8 or 12 tapes turning at or
> near their rated 200 ips, not pushing bits thru a single channel

The hard part of this is getting enough disk throughput to feed even one of those 200-ips tape drives. The rest is replication.

Channels sound like essentially moving the disk driver into an I/O processor, with lists of channel control blocks being analogous to lists of struct buf's. This makes it feasible to do more optimizations, even real-time stuff like scatter-gather, chaining, and rotational scheduling.

Barry mentions the UDA-50 as being similar. But its processor is an 8085, and its DMA speed is only 0.8 MB/s, making it much slower than a dumb controller. And the driver ends up spending as much time constructing the channel control blocks as it would spend tending a dumb controller like the Emulex SC7003. The Xylogics 450, Xylogics 472, and DEC TS11 are like this too. I find them all disappointingly slow. I suspect the real reason for channel processors is to reduce interrupts, which are so costly on big CPUs. It makes sense for terminals; people have made I/O processors that talk to Unix in clists (KMC-11s, etc), which cuts the total interrupt rate by a large fraction. But I don't think it's necessary, or necessarily desirable, to inflict this on disks & tapes, and certainly not unless the channel processor can talk in struct buf's.

For all the optimizations that these I/O processors are supposed to do, Unix rarely gives them the chance. Unless there's more than two requests outstanding at once, once they finish one, there's only one request to choose from. Unix has minimal readahead, so that's as many requests as a single process can generate. Raw I/O is even worse. Asynchronous reads would be the obvious way to get enough requests in the queue to optimize, but that seems unlikely to happen. Rather, explicit read commands are giving way to memory-mapped files (in Mach and SunOS 4.0), where readahead becomes synonymous with prepaging. It remains to be seen whether much attention is put into this.

Barry credits the asynchronous nature of I/O on mainframe OS's to the access methods, like RMS on VMS. People avoid those when they want speed (imagine using dbm to do sequential reads). For instance, the VMS "copy" command bypasses RMS when copying disk-to-disk, with the curious result that it's faster to copy to a disk than to the null device, because the null device is record-oriented, requiring RMS.

As DMR demonstrates, parallel-transfer disks are great for big files. They're horrendously expensive though, and it's hard enough to find controllers that keep up with even 3 MB/s, much less 10 MB/s. But they can be simulated with ordinary disks by striping across multiple controllers, *if* the disks rotate as one.
Does anyone know of a cost-effective disk that can phase-lock its spindle motor to that of a second disk, or perhaps to the AC line? With direct-drive, electronically-controlled motors becoming common, this should be possible. The Eagle has such a motor, but no provision for external sync. I recall stories of Crays using phase-locked disks to advantage.

Of course, to get the most from high transfer rates, you need large blocksizes; DMR's example looked like about one revolution. Hence the extent-based file allocation of mainframe OS's, etc. Perhaps it's time to pester Berkeley to double MAXBSIZE to 16384 bytes? It would use 0.3% of memory for additional kernel page tables on a VAX, but proportionately less on machines with larger page sizes. 8192 is practically the *minimum* blocksize on Suns these days.

The one point that nobody mentioned is that you don't want the CPU copying the data around between kernel and user address spaces when there's a lot of it! (Maybe it was just too obvious.)

Don Speck   sp...@vlsi.caltech.edu   {amdahl,ames!elroy}!cit-vax!speck
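Speck's striping idea reduces to a simple address mapping. A minimal sketch follows; the two-way interleave, the image file names, and the block size are assumptions for illustration, not from the post. Logical block i of the striped volume lives at block i/2 of drive i%2, so a sequential scan alternates drives, and with synchronized spindles the two transfers can overlap:

#include <stdio.h>

#define BSIZE  8192        /* one filesystem block */
#define NDRIVE 2           /* two-way stripe, purely illustrative */

/* Map a logical block of the striped volume to (drive, physical block). */
static void stripe_map(long lblk, int *drive, long *pblk)
{
    *drive = (int)(lblk % NDRIVE);  /* alternate drives block by block */
    *pblk  = lblk / NDRIVE;
}

/* Read one logical block from whichever drive the map names. */
static size_t stripe_read(FILE *drives[], long lblk, char *buf)
{
    int  d;
    long p;
    stripe_map(lblk, &d, &p);
    fseek(drives[d], p * (long)BSIZE, SEEK_SET);
    return fread(buf, 1, BSIZE, drives[d]);
}

int main(void)
{
    FILE *drives[NDRIVE];
    char  buf[BSIZE];
    long  i;

    drives[0] = fopen("drive0.img", "rb");   /* hypothetical disk images */
    drives[1] = fopen("drive1.img", "rb");
    if (!drives[0] || !drives[1]) {
        perror("fopen");
        return 1;
    }
    for (i = 0; i < 16; i++)                 /* sequential scan */
        if (stripe_read(drives, i, buf) != BSIZE)
            break;
    printf("read %ld blocks of %d bytes\n", i, BSIZE);
    return 0;
}

A real driver would issue the two controllers' reads concurrently; stdio here only demonstrates the addressing arithmetic.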
Path: utzoo!attcan!uunet!husc6!uwvax!umn-d-ub!umn-cs!bungia!mn-at1!alan
From: a...@mn-at1.k.mn.org (Alan Klietz)
Newsgroups: comp.unix.wizards
Subject: Why UNIX I/O is so slow (was VAX vs SUN 4 performance)
Keywords: readahead, striping, file mapping
Message-ID: <441@mn-at1.k.mn.org>
Date: 17 Jun 88 19:16:32 GMT
References: <22957@bu-cs.BU.EDU> <14968@brl-adm.ARPA> <601@modular.UUCP> <23288@bu-cs.BU.EDU> <7980@alice.UUCP> <23326@bu-cs.BU.EDU> <6963@cit-vax.Caltech.Edu>
Reply-To: a...@mn-at1.UUCP (0000-Alan Klietz)
Organization: Minnesota Supercomputer Center
Lines: 125

In article <6...@cit-vax.Caltech.Edu> mang...@cit-vax.Caltech.Edu (Don Speck) writes:
<In article <23...@bu-cs.BU.EDU>, b...@bu-cs.BU.EDU (Barry Shein) writes:
[why UNIX I/O is so slow compared to big mainframe OS]

A useful model is to partition the time spent by every I/O request into fixed and variable portions. tf is the fixed overhead to reset the interface hardware, queue the I/O request, wait for the data to rotate under the head (for networks, the time to process all of the headers), etc. td is the marginal cost of transferring one unit of data (byte, block, whatever). The total I/O utilization of a channel in this case is characterized by

	         n td
	D = ------------
	     tf + n td

for n units of data, with lim D = 1.0 as n -> inf. td is typically very small (microseconds); tf is typically orders of magnitude higher (milliseconds). The curve usually has a knee; UNIX I/O is often on the left side of the knee, while most mainframe OS's are on the right side.

<For all the optimizations that these I/O processors are supposed to do,
<Unix rarely gives them the chance. Unless there's more than two requests
<outstanding at once, once they finish one, there's only one request to
<choose from. Unix has minimal readahead, so that's as many requests as
<a single process can generate. Raw I/O is even worse.

Yep, Unix needs to do larger I/O transfers. Case in point: the Cray-2 has a 16 Gbyte/sec I/O throughput capability with incredibly expensive 80+ Mbit/s parallel-head disks (often striped). And yet, typing cp bigfile bigfile2 measures a transfer performance of only 18 Mbit/s, because BUFSIZ is 4K.

<Asynchronous reads would be the obvious way to get enough requests in
<the queue to optimize, but that seems unlikely to happen. Rather,
<explicit read commands are giving way to memory-mapped files (in Mach
<and SunOS 4.0) where readahead becomes synonymous with prepaging. It
<remains to be seen whether much attention is put into this.

There have been comments that SunOS 4.0 I/O overhead is 2 or 3 times greater than under 3.0. Demand-paged I/O introduces all of the Turing divination problems of trying to predict which pages (I/O blocks) the program will use next. IMHO, this is a step backward.

<Barry credits the asynchronous nature of I/O on mainframe OS's to the
<access methods, like RMS on VMS. People avoid those when they want
<speed (imagine using dbm to do sequential reads). For instance, the
<VMS "copy" command bypasses RMS when copying disk-to-disk, with the
<curious result that it's faster to copy to a disk than to the null
<device, because the null device is record-oriented, requiring RMS.

RMS systems developed through evolution ("survival of the fastest?") to their current state of being I/O marvels. Hence MVS preallocation requirements, VMS asynch channel I/O, etc.

<As DMR demonstrates, parallel-transfer disks are great for big files.
<They're horrendously expensive though, and it's hard enough to find
<controllers that keep up with even 3 MB/s, much less 10 MB/s.

Disk prices are dropping fast. 8" 1 Gb dual-head disks (6 MB/s) will be common in about a year for $5000-$9000 qty 1. The ANSI X3T9 IPI (Intelligent Peripheral Interface) is now a full standard. It starts at 10 Mb/s and goes up to 25 Mb/s in the current configurations. N.B. the vendors pushing this standard are IBM, CDC, Unisys, Fujitsu, NEC, Hitachi (big mainframe manufacturers). Unix in its current incarnation is unable to take advantage of this new disk technology.

<they can be simulated with ordinary disks by striping across multiple
<controllers, *if* the disks rotate as one. Does anyone know of a cost-
<effective disk that can phase-lock its spindle motor to that of a second
<disk, or perhaps with the AC line? With direct-drive electronically-
<controlled motors becoming common, this should be possible. The Eagle
<has such a motor, but no provision for external sync. I recall stories
<of Cray's using phase-locked disks to advantage.

The thesis of my paper "Turbo NFS" (*) shows how you can get good I/O performance without phase-locked disks by reorganizing the file system contiguously. Cylinders of data are prefetched from selected disks at a rate commensurate with the rate at which the data is consumed by the program. Extents are allocated contiguously by powers of 2. The organization is called a "fractal file system". Phillip Koch did the original work in this area (**).

<Of course, to get the most from high transfer rates, you need large
<blocksizes; DMR's example looked like about one revolution. Hence
<the extent-based file allocation of mainframe OS's, etc. Perhaps
<it's time to pester Berkeley to double MAXBSIZE to 16384 bytes?

Berkeley should start over. The whole business with "cylinder groups" tries to keep sets of blocks relatively near each other. With the new disks today, the average SEEK TIME IS OFTEN FASTER THAN THE ROTATIONAL DELAY. You don't want to keep blocks "near" each other; instead you want to make each extent as large as possible. Sorry, but cylinder groups are archaic.

<The one point that nobody mentioned is that you don't want the CPU
<copying the data around between kernel and user address spaces when
<there's a lot! (Maybe it was just too obvious).

Here is an area where paged I/O has an advantage. The first UNIX vendor to do contiguous file systems + paged I/O + prefetching will win big in the disk I/O race.

<Don Speck sp...@vlsi.caltech.edu {amdahl,ames!elroy}!cit-vax!speck

(*) "Turbo NFS: Fast Shared Access for Cray Disk Storage", A. Klietz (MN Supercomputer Center), Proceedings of the Cray User Group, Spring 1988.
(**) "Disk File Allocation Based on the Buddy System", P. D. L. Koch (Dartmouth), ACM TOCS, Vol 5, No 3, November 1987.
--
Alan Klietz
Minnesota Supercomputer Center (*)
1200 Washington Avenue South
Minneapolis, MN 55415
UUCP: a...@mn-at1.k.mn.org   Ph: +1 612 626 1836
ARPA: a...@uc.msc.umn.edu (was umn-rei-uc.arpa)
(*) An affiliate of the University of Minnesota
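Klietz's utilization model is easy to evaluate numerically. In the sketch below, the tf and td values are invented, chosen only to match the orders of magnitude he cites (milliseconds of fixed overhead, a per-byte cost corresponding to a few MB/s):

#include <stdio.h>

int main(void)
{
    double tf = 20e-3;      /* fixed overhead per request: 20 ms (assumed) */
    double td = 0.33e-6;    /* per-byte transfer time: ~3 MB/s (assumed) */
    long   n;

    /* Utilization D = n*td / (tf + n*td) for increasing transfer sizes. */
    for (n = 4096; n <= 16L * 1024 * 1024; n *= 4) {
        double d = (n * td) / (tf + n * td);
        printf("%9ld bytes: D = %.3f\n", n, d);
    }
    return 0;
}

With these numbers a 4 KB transfer (a typical BUFSIZ-sized Unix request) keeps the channel about 6% utilized, while a multi-megabyte extent approaches 100%: the left and right sides of the knee he describes.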
Path: utzoo!attcan!uunet!husc6!cmcl2!brl-adm!brl-smoke!gwyn
From: g...@brl-smoke.ARPA (Doug Gwyn)
Newsgroups: comp.unix.wizards
Subject: Re: Why UNIX I/O is so slow (was VAX vs SUN 4 performance)
Message-ID: <8124@brl-smoke.ARPA>
Date: 18 Jun 88 02:22:43 GMT
References: <22957@bu-cs.BU.EDU> <14968@brl-adm.ARPA> <601@modular.UUCP> <23288@bu-cs.BU.EDU> <7980@alice.UUCP> <23326@bu-cs.BU.EDU> <6963@cit-vax.Caltech.Edu> <441@mn-at1.k.mn.org>
Reply-To: g...@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>)
Organization: Ballistic Research Lab (BRL), APG, MD.
Lines: 11

In article <4...@mn-at1.k.mn.org> a...@mn-at1.UUCP (0000-Alan Klietz) writes:
-Berkeley should start over. The whole business with "cylinder groups"
-tries to keep sets of blocks relatively near each other. With the new
-disks today, the average SEEK TIME IS OFTEN FASTER THAN THE ROTATIONAL
-DELAY. You don't want to keep blocks "near" each other; instead you want
-to make each extent as large as possible. Sorry, but cylinder groups are
-archaic.

Such considerations should lead to the conclusion that each type of filesystem may need its own access algorithms (perhaps in an I/O processor). This is easy to arrange via the File System Switch.
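Mechanically, a file system switch is a table of per-filesystem operations consulted on each call. The sketch below is a generic function-pointer table with invented names; it is not the actual System V FSS (nor Sun's vnode interface), only the shape the two approaches share:

#include <stdio.h>

/* Per-filesystem operations: each filesystem type supplies its own row. */
struct fs_ops {
    const char *name;
    int (*open)(const char *path);
    int (*read)(int fd, char *buf, int n);
};

/* Invented stand-ins for two filesystem implementations. */
static int ffs_open(const char *p) { printf("ffs: open %s\n", p); return 3; }
static int ffs_read(int fd, char *b, int n) { (void)fd; (void)b; return n; }
static int ext_open(const char *p) { printf("extfs: open %s\n", p); return 4; }
static int ext_read(int fd, char *b, int n) { (void)fd; (void)b; return n; }

/* The "switch" itself: one row per filesystem type. */
static struct fs_ops fstab[] = {
    { "ffs",   ffs_open, ffs_read },
    { "extfs", ext_open, ext_read },
};

int main(void)
{
    char buf[512];
    int  t;

    /* All callers dispatch through the table, so an extent-based
     * filesystem could install read routines tuned for large
     * contiguous transfers without touching any caller. */
    for (t = 0; t < 2; t++) {
        int fd = fstab[t].open("/tmp/x");
        int n  = fstab[t].read(fd, buf, (int)sizeof buf);
        printf("%s: read %d bytes\n", fstab[t].name, n);
    }
    return 0;
}

This indirection is exactly what lets each filesystem type carry its own access algorithms, as Gwyn suggests.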
Path: utzoo!attcan!uunet!lll-winken!lll-tis!ames!ncar!noao!arizona!lm
From: l...@arizona.edu (Larry McVoy)
Newsgroups: comp.unix.wizards
Subject: Re: Why UNIX I/O is so slow (was VAX vs SUN 4 performance)
Keywords: actually FSS vs VNODE
Message-ID: <6032@megaron.arizona.edu>
Date: 29 Jun 88 01:12:28 GMT
References: <22957@bu-cs.BU.EDU> <14968@brl-adm.ARPA> <601@modular.UUCP> <23288@bu-cs.BU.EDU> <7980@alice.UUCP> <23326@bu-cs.BU.EDU> <6963@cit-vax.Caltech.Edu> <441@mn-at1.k.mn.org> <8124@brl-smoke.ARPA>
Reply-To: l...@megaron.arizona.edu (Larry McVoy)
Organization: U of Arizona CS Dept, Tucson
Lines: 9

In article <8...@brl-smoke.ARPA> g...@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>Such considerations should lead to the conclusion that each type of
>filesystem may need its own access algorithms (perhaps in an I/O
>processor). This is easy to arrange via the File System Switch.

Do the wizards have a preference (based on logic, not religion, one presumes) between the file system switch and the vnode method of virtualizing file systems? Anyone looked into both?
--
Larry McVoy   laidbak...@sun.com   1-800-LAI-UNIX x286