Newsgroups: comp.arch.storage
Path: sparky!uunet!zaphod.mps.ohio-state.edu!rpi!batcomputer!cornell!uw-beaver!ubc-cs!newsserver.sfu.ca!sfu.ca!vanepp
From: van...@fraser.sfu.ca (Peter Van Epp)
Subject: As promised, mainframe I/O (or why we have an Auspex NS5000 -:))
Message-ID: <vanepp.704170942@sfu.ca>
Summary: Unix boxes can't keep up with mainframe I/O rates.
Keywords: mainframe, Auspex, NS5000
Sender: ne...@sfu.ca
Organization: Simon Fraser University, Burnaby, B.C., Canada
Date: Sat, 25 Apr 1992 03:02:22 GMT
Lines: 208

When I answered the query about dram and sram, I promised to use that to
discuss mainframe I/O (and, so you see how that applies to this group, why
we bought an Auspex NS5000).

First off, a bit of background: SFU (Simon Fraser University, where I work)
was, up until January of this year, a mainframe shop. We ran MTS (the
Michigan Terminal System) on an IBM 3081GX with 3 channels into 2 control
units running around 15 gigs of 3380 disk (the IBMese will become clear
later!). In February of 91 we were told to be on Unix and off of MTS by
Dec 31 of 1991. This post is mostly about my impressions, as a mainframe
bigot, of talking to the various Unix vendors about I/O.

I accepted long ago that Unix boxes have more CPU power than our poor old
11 MIP 3081. Several years ago we bought some Silicon Graphics machines for
the 6 or 8 users that had totally CPU bound problems (almost no I/O); the
move to the SGIs bought them an 8 times increase in throughput (and I expect
the happiness at this accelerated the rush to Unix -:)). However, as I
noted, they don't do any appreciable I/O. The rest of the users still on the
3081 were as often limited by delay in I/O to disk as by lack of CPU power,
and that is what concerned me about a move to Unix workstations.

As promised, to see why I had concern, you need to know something about the
I/O subsystem on a mainframe (and as a test of whether you read the dram
article, I'll sprinkle some drams and srams into this too -:)). In a 3081
(as a concrete example), there are three logically independent units (I
doubt this is really true under the covers, but bear with me): the
instruction unit (the "CPU" if you like), main memory, and the I/O unit
(called a storage director). Main memory is the meat in an electronic
sandwich; both the CPU and the storage director beat hell out of poor old
main memory, the CPU demanding instructions and data, and the storage
director demanding streams of sequential bytes for I/O to the channels.
The CPU is just like any Unix CPU: it has a huge internal cache to try and
keep main memory accesses down, but when it wants to flush the cache it of
course wants a high bandwidth stream of bytes to avoid stalling the CPU.
That's no different from a RISC box, except that the microcycle (i.e. clock)
is down in the 10 to 15 nsec range, or 60 to 100 MHz, and therefore that
much more hungry for main memory bandwidth.

The interesting part of this is the storage director and the channels that
attach to it. Let's look at a channel and the I/O devices that attach to it
first: a channel is quite a bit like an industrial strength SCSI channel
(I wouldn't be surprised to find out that the channel inspired SCSI),
industrial strength because a channel connection consists of two 2" diameter
cables filled with 10 or 12 coax wires each, with a maximum length of 200
feet for the original version (at a 1 megabyte per second transfer rate) or
400 feet and 3 to 6 megabytes per second for a newer data streaming channel.
When you hear an IBM CE talking about "pulling a channel cable" he or she
means just that: yanking on 50 to a couple of hundred feet of stiff, heavy,
2 inch thick cable. The data path in a channel (in the "Bus" cable) is 8
bits parallel and, as noted, coax. The control signals (also around 8 as I
recall) live in the "Tag" cable.

Logically a channel looks like this: since there are 8 data lines, you can
have up to 256 devices on the channel (where a device can be a control unit,
one spindle of a disk drive, or a tape drive). Any device like a disk or a
tape needs a control unit to attach to (it may be built in on some units,
but it's always there). This control unit is what the channel cables
terminate in and talk to, and it works like this: when an I/O command is
started at the mainframe, you specify an address (1 of the 256 possible in
the 8 bit data path) and send a pulse down the select out tag line. Each
control unit on the channel, when it sees select out, compares its hard
wired address with the data on the bus data lines; if it matches, it eats
the select out signal and decides the storage director is talking to it. If
the address doesn't match, then it sends select out to the next control unit
down the line. The last control unit has a terminator (just like SCSI
again!) that loops select out to select in, which then fires back up the
cable to the storage director. If select in pulses some time after select
out and the address have been propagated (the number of nanoseconds at the
speed of light in the cable at the max 400 foot difference, plus some
slop!), then the addressed control unit isn't on this channel (or is offline
or broken) and an error is signaled. If select in stays quiet, then the
addressed control unit has been found and the storage director can begin
sending data down the channel.

In the older, slower channel (1 megabyte per second) each data byte sent to
the control unit gets acked by the control unit. In the newer data streaming
channels, the speed gets doubled by skipping the ack: the director fires
bytes down the channel at full speed and the control unit will return an
error if something goes wrong (there are parity lines everywhere and
probably more error checking besides!). So far so good, same performance as
a SCSI channel except you can move it further away.

Let's now wander back to the storage director on the other end of these
channel cables. If there was only one channel per machine, no problem, but
the 3081 (3 generations old at this point) has 16 of these channels off of
its storage director (and now the comments about 3 channels to disk through
two control units at the start begin to make sense); current machines can
have 256 to 512 channels. This poor old 3081 had 2 more channels going to a
32 meg solid state drum for paging (only 32 megs of main memory in the
machine), another 2 channels going out to 3420 (round type 1/2" tapes), and
two channels out to 3480 (square, very fast tapes!), as well as the disk
channels. The important point about all of this is that in theory (and at
least a fair way in practice) all of these channels could be transferring
data at the same time! This means that we could be reading data from 3
different disk drives (at close to 3 megabytes/sec), two areas of the solid
state drum also at 3 megs a second, and 2 full speed data streams from 3480
tapes (an unusually high load, but I expect the machine would support it).
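(An aside for those who find a daisy chain easier to follow in code: here is
a toy sketch of the select out / select in game described above. It is my
own illustration, not anything out of an IBM manual; all the names and
numbers in it are invented.)

    /*
     * Toy model of channel device selection: the storage director puts an
     * address on the bus and raises select out; each control unit either
     * "eats" it or passes it along; if the terminator loops it back as
     * select in, nobody claimed the address and an error is signaled.
     */
    #include <stdio.h>

    #define NUM_CUS 4                   /* control units daisy chained here */

    struct control_unit {
        unsigned char hard_wired_addr;  /* set by jumpers on real hardware  */
        int           online;
    };

    /* Returns the index of the control unit that claimed the address, or -1
     * if select in came back from the terminator (nobody claimed it).      */
    static int propagate_select_out(struct control_unit cu[], int n,
                                    unsigned char bus_addr)
    {
        for (int i = 0; i < n; i++) {
            if (cu[i].online && cu[i].hard_wired_addr == bus_addr)
                return i;       /* this CU eats select out                  */
            /* otherwise select out ripples to the next CU down the cable   */
        }
        return -1;              /* terminator loops select out -> select in */
    }

    int main(void)
    {
        struct control_unit chain[NUM_CUS] = {
            { 0x20, 1 }, { 0x30, 1 }, { 0x40, 0 }, { 0x50, 1 },
        };
        unsigned char addr = 0x30;
        int who = propagate_select_out(chain, NUM_CUS, addr);

        if (who < 0)
            printf("addr %02x: select in came back - unit missing, error\n",
                   (unsigned)addr);
        else
            printf("addr %02x: control unit %d selected, start the transfer\n",
                   (unsigned)addr, who);
        return 0;
    }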
All of this interesting activity is bothering the CPU not at all (other than
arguing for main memory access bandwidth!) because the storage director is a
simple minded cpu in its own right. It is controlled by a channel program,
which is a data block in main memory that is essentially a one byte command
(read, write, seek, ioctl etc.) to be sent to the control unit, a couple of
words of status and flags, an address in main memory where the data can be
found, and a count of how much data is in this buffer. In itself, again, not
much, but some of the flags can be used to "chain" channel programs so the
storage director can execute the first one, then without bothering the CPU
branch to the next one and process it as well, and so on until you run out
of memory -:). This whole thing is started by an instruction (typically the
Start I/O or SIO instruction) that takes a control unit address as the
argument and a pointer to the channel program, and boots the storage
director into action. Again, very much like a DMA transfer in a Unix cpu,
except that it can do a lot more than 1 transfer at a time, and can do it at
full speed.

Let's do some simple math ('cause that's the only kind I can do -:)): 16
channels at 3 megabytes/sec gets us a maximum I/O rate of 48 megabytes/sec.
If we consider only the 3 disk channels we are talking an I/O rate of 9
mbytes/sec. Now let's go back and recall the dram specs: while the access
time was right down there at 80 nsec or so, the cycle time (the number of
interest here) was up at 200 nsec or so (rounded for easy calculating -:)).
Figuring on a 32 bit wide word, I get 4 bytes every 200 nsec, or 20
megabytes/sec (and dram speeds weren't this good in the 3081's day; it was
new around 10 years ago and the cycle time was probably closer to 400 nsec,
but we'll ignore that). My disk I/O alone has now eaten almost 50% of the
available main memory bandwidth, and full I/O would eat somewhat more than
200% of the available bandwidth. Remember that there is an instruction
hungry CPU also wanting access to this same main memory bandwidth and you
see we have a problem.

This particular problem has a solution (it just isn't a cheap solution -:)).
The solution is to increase the bandwidth of the channel into main memory,
and this is done in three ways:

1) increase the width of the bus from 32 bits to 128 or 256 bits; I then get
   16 or 32 bytes per 200 nsec dram cycle,

2) change from dram to sram (as we remember, sram is 1/4 the number of bits
   of dram and twice the cost, but fast!), in fact to 50 nsec access (and
   more importantly 50 nsec cycle time too!),

3) run memory interleaving.
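Before going on to interleaving, here is roughly what such a chained channel
program looks like in code form. This is only a sketch of the idea described
above; the field names, flag values and layout are mine, not IBM's real CCW
format.

    /*
     * Sketch of a chained channel program: one byte command, flags, a
     * buffer address and a count, with a "chain" flag so the storage
     * director walks the whole list without bothering the CPU.
     */
    #include <stdio.h>
    #include <stddef.h>

    #define CMD_WRITE  0x01
    #define CMD_READ   0x02
    #define CMD_SEEK   0x07

    #define FLAG_CHAIN 0x40         /* "another channel word follows"       */

    struct channel_word {
        unsigned char  command;     /* read, write, seek, ...               */
        unsigned char  flags;       /* chaining etc.                        */
        void          *buffer;      /* where in main memory the data lives  */
        unsigned short count;       /* how many bytes in this buffer        */
    };

    /* What the storage director conceptually does after a Start I/O: walk
     * the chain, issuing one command per channel word, until a word without
     * the chain flag ends the program.  The CPU is not involved.           */
    static void start_io(unsigned int cu_addr, struct channel_word *program)
    {
        for (struct channel_word *cw = program; ; cw++) {
            printf("CU %02x: command %02x, %u bytes at %p\n",
                   cu_addr, (unsigned)cw->command, (unsigned)cw->count,
                   cw->buffer);
            if (!(cw->flags & FLAG_CHAIN))
                break;              /* end of the channel program           */
        }
    }

    int main(void)
    {
        static char buf1[4096], buf2[4096];
        struct channel_word program[] = {
            { CMD_SEEK, FLAG_CHAIN, NULL, 0           },
            { CMD_READ, FLAG_CHAIN, buf1, sizeof buf1 },
            { CMD_READ, 0,          buf2, sizeof buf2 },
        };
        start_io(0x30, program);    /* the SIO instruction, more or less    */
        return 0;
    }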
Memory interleaving occurs when more than one RAM chip is addressed at a
time; an example is probably easiest. If I have a 1 meg * 8 memory made of
1 meg by 1 bit drams, then I get 1 byte every 200 nsec. If I replace this
with 256K 50 nsec srams (but in a 32k * 8 bit organization instead of 256k
by 1 bit, which is a standard option), then I need 32 sram chips to replace
the 8 dram chips. If I choose to set up the addressing such that the bottom
5 bits of the address select one of the 32 chips (which gives me one byte),
then I haven't changed the logical structure of memory, but something else
magic has happened: in the same 50 nsec that I fetched 1 byte from the first
ram chip, I also fetched 31 other bytes in the other 31 ram chips (in the
same 50 nsec). Since I can enable the outputs in 3 or 4 nsec, I have
effectively multiplied my memory bandwidth by 32. So let's do the math: with
the dram I get 1 byte every 200 nsec, or 5 mbytes/sec; with interleaved sram
I get 32 bytes every 50 nsec, or 640 mbytes/sec (and of course a much, much
larger cost per mbyte of memory!). If we now multiply these by 4 (for the 32
bit word) we get our 20 mbytes/sec number for dram again, but the sram
solution is up around 2.5 gigabytes/sec and my max I/O rate is becoming much
less of an impact on the available main memory bandwidth. When we increase
the bus width to 128 or 256 bits dram looks better, but sram still leads! I
probably don't have to tell you that interleaved sram and wide data busses
are the mainframe solution (and as I recall 8 megs of 3081 memory used to be
$100,000 too -:)).

This rather long winded diatribe brings us back to my position: I have a
disk I/O rate running (worst case) about 9 mbytes/sec, so I need an NFS file
server that can do such a rate. Now the fun begins. I discovered that most
Unix box vendors (or at least those we talked to -:)) sell cpus by the mip
and the specmark; I/O is never a problem (nor mentioned!), but when you
squeeze a little bit, some interesting numbers pop out. VME bus speed is 17
mbytes/sec and all of the disk controllers are fighting over that bandwidth.
The CPU is also talking to the Ethernet cards over that same VME bus, and
hopefully trying to drive the 9 mbytes/sec over that same 17 mbyte/sec VME
bus as the disk controllers, to say nothing of fighting for CPU time to
process disk interrupts, ethernet interrupts, DMA interrupts, and process
NFS requests, all aiming for the same SIMM dram main memory. Interesting
questions like "is main memory interleaved, and if so how many ways?"
typically got a blank look and "interleaved? what's interleaved?"; asking
whether main memory is on a VME board (and therefore fighting with
everything else for bandwidth) got the same blank look. One major vendor
produced performance numbers that suggested that we could replace our 3081,
with 200 users beating it to death, with a single Unix server, and agreed
(when pushed -:)) that indeed this wouldn't work, but couldn't say why it
wouldn't work (nor justify their numbers).
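To put numbers on the memory arithmetic a couple of paragraphs back, here is
the same back of the envelope as a throwaway program. These are the round
figures from the text, not measured specs for a real 3081.

    /*
     * dram vs. interleaved sram bandwidth, and what share of each the
     * channel I/O eats.  Round numbers only.
     */
    #include <stdio.h>

    int main(void)
    {
        double dram_cycle_ns = 200.0;   /* dram cycle time (rounded)        */
        double sram_cycle_ns = 50.0;    /* sram cycle time                  */
        int    bus_bytes     = 4;       /* 32 bit wide memory path          */
        int    interleave    = 32;      /* 32 sram chips read per access    */

        /* bytes per cycle * (1000 / cycle in ns) gives mbytes/sec          */
        double dram_mb_s = bus_bytes * (1000.0 / dram_cycle_ns);
        double sram_mb_s = bus_bytes * interleave * (1000.0 / sram_cycle_ns);

        double disk_io_mb_s = 9.0;      /* 3 disk channels at ~3 mbytes/sec */
        double full_io_mb_s = 48.0;     /* 16 channels at 3 mbytes/sec      */

        printf("dram memory bandwidth:            %6.0f mbytes/sec\n", dram_mb_s);
        printf("interleaved sram bandwidth:       %6.0f mbytes/sec\n", sram_mb_s);
        printf("disk I/O share of dram bandwidth: %6.0f%%\n",
               100.0 * disk_io_mb_s / dram_mb_s);
        printf("full I/O share of dram bandwidth: %6.0f%%\n",
               100.0 * full_io_mb_s / dram_mb_s);
        printf("full I/O share of sram bandwidth: %6.1f%%\n",
               100.0 * full_io_mb_s / sram_mb_s);
        return 0;
    }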
Then we hit Auspex (who are undoubtedly used to shell shocked mainframe
types coming from the other vendors -:)), who promptly admitted that
compared to mainframe I/O rates the NS5000 isn't all that fast, but that
it's faster than the Unix competition for the following reasons (if you get
a chance to get the sales pitch, go, even if you aren't planning on buying
one, just to see what a sales pitch should be -:)). Each of the up to 8
ethernet ports in the machine has its own CPU; it processes the TCP stream
right up to the decoded NFS request, then passes the request to another
dedicated cpu that implements the file system processing. That cpu checks to
see if the block is in the 16 to 64 megabyte cache and, if so, passes the
address of the block back to the Ethernet processor for it to DMA out of
cache and send down the wire. If there is a cache miss, it finds a cache
block, figures out where on disk the block is, and passes the disk address
and the cache address to the correct SCSI controller (1 for every 2 disks,
since that is what maximally loads the SCSI channel without performance
loss) to again do its thing on its own into the cache, to be passed back to
the ethernet controller and sent. The modified VME bus that this all takes
place across does 55 mbytes/sec, not 17 ...

This presentation, and the several reference customers we called being happy
with the device (and to some extent the apparent addicting effect of
NS5000s; single installations were at that time at least uncommon, they
tended to clump up, a company with 4 NS5000s here, another with 6 NS5000s
there ...), convinced us that this was a reasonable alternative. As I told
my boss, it's possible that the brand x server will do the job, but I sure
couldn't prove it by the numbers I saw, and if the numbers that I am
familiar with don't make sense to me, how can I believe what they tell me
about NFS, which I know very little about? I still don't know if brand x
would have done, but I don't care, because I, or we, convinced the bosses to
buy an NS5000. We haven't had any trouble at all, and our one lone Unix
expert says it's a lot less trouble than the multiple cross mounted NFS
servers in the Computing Science department where he used to work.

I expect, as CPU clock speeds keep climbing on the risc chips, that the
manufacturers are going to find that doubling the density of drams without
doing anything to the speed or cycle times of drams isn't going to cut it,
and that the cost of doing proper main memory implementations that can
actually use that cpu speed will drive the cost up towards the price of
those mainframes everyone is trying to throw away -:).

Peter Van Epp / van...@sfu.ca                  #include <std.disclaimer>
Simon Fraser University, Burnaby, B.C. Canada
Path: sparky!uunet!think.com!rpi!usenet.coe.montana.edu!ogicse!sequent!muncher.sequent.com!jjb
From: j...@sequent.com (Jeff Berkowitz)
Newsgroups: comp.arch.storage
Subject: Re: As promised, mainframe I/O (or why we have an Auspex NS5000 -:))
Keywords: mainframe, Auspex, NS5000
Message-ID: <1992Apr26.194811.18177@sequent.com>
Date: 26 Apr 92 19:48:11 GMT
Article-I.D.: sequent.1992Apr26.194811.18177
References: <vanepp.704170942@sfu.ca>
Sender: use...@sequent.com (usenet)
Organization: Sequent Computer Systems Inc.
Lines: 144
Nntp-Posting-Host: eng3.sequent.com

Since the original article was quite long, I've extensively trimmed the
quotes. There might not be enough left to really understand the original
article.

First, the technical issues:

In article <vanepp.7...@sfu.ca>, van...@fraser.sfu.ca (Peter Van Epp) writes:
>I promised to...discuss mainframe I/O [...] As promised, to see
>why I had concern, you need to know something about the I/O subsystem
>on a mainframe...
>In a 3081 (As a concrete example), there are three logically independent
>units....the instruction unit (the "CPU" if you like), main memory, and
>the I/O unit (called a storage director)...The interesting part of this
>is the storage director and the channels that attach to it...
>[discussion about IBM channels elided here - jjb]

As you observe, a SCSI-2 channel is quite similar, "logically", to a
traditional IBM channel. This observation becomes important below.

>same performance as a SCSI channel except you can move it further away.

In fact, SCSI-2 compares favorably to the [outdated] IBM channel technology
you discuss; it's faster, and the high production volumes make it almost
infinitely less expensive (although limited to 25 meters). Current IBM
channel offerings are much more advanced (higher speeds, provision for
optical interconnects carrying channel speeds over kilometer distances,
etc.). Of course, this sophistication comes with a price tag. The ANSI
FibreChannel standard is one effort that may (will) make this type of
technology available in open systems environments.

>... current machines can have 256 to 512 channels.

Here is where it begins to get interesting:

* With current technology, there is no conceivable reason why anybody
  would need this many channels.

* Justifying this statement requires understanding the so-called
  "RMS miss" problem.

In the old days when memory was expensive, disks were unbuffered (contained
no internal memory). You sent the disk a command to seek to a sector; it did
so, and then asserted a signal saying, effectively, "destination sector
about to pass beneath read head! Request use of IO channel!". If the channel
is busy at that moment, the disk can't transfer the sector(s) of data. The
data passes inexorably beneath the head, and the transfer can't take place
until a whole "spin" of the platter - 16+ MILLISECONDS with 3600 RPM disk
technology! - brings the data back under the head.

If you model channels like this, you'll find that the contention behavior of
[even] a small number of disks is truly atrocious. THIS is why they needed
so many channels: to avoid these missed spins ("RMS misses") caused by
inability to get the channel at the critical moment, given an unbuffered
disk device.

TODAY we apply the solution that the technology allows: we put memory in the
disk drive. This means that when the data physically passes beneath the
head, it's sucked into a buffer in the drive.
The drive then requests use of the channel, and when the channel is
available the drive "squirts" the data over to the host. This changes the
contention characteristics of the channel to those of an ordinary CPU/memory
bus, more or less. And there's another advantage: the data "squirt" between
the device and the host can run as fast as the drive and host electronics,
and the standard interface, want them to run; the transfer isn't limited by
the speed at which the bits pass beneath the head. (Spinning the disk
faster, as many drive vendors are doing, is still a good idea for a whole
variety of reasons, however.)

>	This rather long winded diatribe brings us back to my position,
>I have a disk I/O rate running (worst case) about 9 mbytes/sec, I need
>an NFS file server that can do such a rate. Now the fun begins...

[Good explanation, elided, of why some workstation servers that weren't
architected as efficient IO engines fall down on that task - jjb]

>Then we hit Auspex
>[...discussion of Auspex NFS file server architecture elided - jjb]
>...to the correct SCSI controller (1 for every 2 disks since
>that is what maximally loads the SCSI channel without performance loss)

This is fallacious as a generalization. It should be possible to get
significantly more than two - perhaps more than seven, with "WIDE" - disk
devices on a SCSI-2 channel with minimal performance degradation (90% or
more of the sum of the expected performances of the individual disk drives).

"But wait", you say, "each disk can stream about 3M/sec of data; so the
maximum appears to be 3 + a fraction disks on a 10M SCSI-2 channel."

My claim rests on a real world observation: hardly anybody ever streams data
off a disk for long periods of time. That is, the critical metric for real
systems is not bytes/second, but IO operations/second. For example, with
Unix file systems "optimal" performance has been reached if you're getting
one file system block per disk spin; with relational database programs,
similar considerations apply, depending on the IO architecture of the RDBMS.
At 3600 RPM or 60 RPS, with 8K file system blocks, that's less than 600K
bytes/second. Of course, over time higher level software changes (contiguous
allocation file systems, log structured file systems, new RDBMS
architectures, etc.) will invalidate this whole line of reasoning. But
(although your mileage may vary somewhat) it's essentially true, for now.

>[Long discussion of the merits of the Auspex box elided. Note:
>lest I should be misunderstood I'm *not* taking issue with this
>argument, which seems sound, or with Auspex!]

Now, my nontechnical question:

You don't say what your usage model is, but from your mention of MTS I'm
guessing it's academic timesharing. My guess is that the workstation
vendor's response would be something like this: "why do you need to
centralize all the IO in one huge server? Why don't you just buy
inexpensive, individually weak servers until you have enough
computer/IO/whatever power to satisfy your user community?" What do you say
to this?

On the other hand, perhaps your usage model is academic database - keeping
the grades, generating the schedules, etc. In that case, why are you looking
at a workstation network when there are several vendors, including Sequent
(no flames please - I believe it's a reasonable objective comment)
specializing in building larger "Unix boxes" that don't suffer from the
bottlenecks you mention...at least not to the same extent as the IO bound
machine you described.
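For the record, here is the spin arithmetic above spelled out as a throwaway
program - round numbers only, nothing measured, and the 3M/sec streaming
figure is just the one quoted earlier.

    /*
     * At 3600 RPM a missed "squirt" costs a full revolution, and one 8K
     * file system block per revolution caps the per-drive rate well below
     * the streaming numbers.
     */
    #include <stdio.h>

    int main(void)
    {
        double rpm        = 3600.0;
        double rps        = rpm / 60.0;          /* 60 revolutions/sec      */
        double ms_per_rev = 1000.0 / rps;        /* ~16.7 ms per missed spin */

        double block_kb   = 8.0;                 /* 8K file system block    */
        double kb_per_sec = rps * block_kb;      /* one block per spin      */

        printf("revolutions per second:    %.0f\n", rps);
        printf("cost of one missed spin:   %.1f ms\n", ms_per_rev);
        printf("one block per spin:        %.0f Kbytes/sec per drive\n",
               kb_per_sec);
        printf("vs. quoted streaming rate: ~3000 Kbytes/sec per drive\n");
        return 0;
    }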
>Peter Van Epp / van...@sfu.ca                  #include <std.disclaimer>
>Simon Fraser University, Burnaby, B.C. Canada
--
Jeff Berkowitz, Sequent Computer Systems    j...@sequent.com  uunet!sequent!jjb
"Each project starts as an intellectual exercise but ends as an athletic event"
  - paraphrasing international GM Korchnoi, who was talking about chess matches.
Path: sparky!uunet!zephyr.ens.tek.com!uw-beaver!ubc-cs!newsserver.sfu.ca!sfu.ca!vanepp
From: van...@fraser.sfu.ca (Peter Van Epp)
Newsgroups: comp.arch.storage
Subject: Re: As promised, mainframe I/O (or why we have an Auspex NS5000 -:))
Keywords: mainframe, Auspex, NS5000
Message-ID: <vanepp.704329059@sfu.ca>
Date: 26 Apr 92 22:57:39 GMT
References: <vanepp.704170942@sfu.ca> <1992Apr26.194811.18177@sequent.com>
Sender: ne...@sfu.ca
Organization: Simon Fraser University, Burnaby, B.C., Canada
Lines: 214

j...@sequent.com (Jeff Berkowitz) writes:

>Since the original article was quite long, I've extensively trimmed
>the quotes. There might not be enough left to really understand
>the original article.

But seeing the good points raised in this one may provide motivation to go
back and read it -:)

>First, the technical issues:

>In article <vanepp.7...@sfu.ca>,
> van...@fraser.sfu.ca (Peter Van Epp) writes:

>The ANSI FibreChannel standard is one effort that may (will) make
>this type of technology available in open systems environments.

I couldn't agree more, but the I/O problem into conventional dram based
memory is still going to be there.

>>... current machines can have 256 to 512 channels.

>Here is where it begins to get interesting:

>* With current technology, there is no conceivable reason why anybody
>would need this many channels.

To twist out of that a bit, with current mainframe technology I can think of
at least one: I used to be a systems programmer on an IBM TPF1 airline
reservation system which drove some 160 spindles of 3350 disk through 12
channels and 12 control units (although only 80 drives are actually unique;
the other 80 are a mirror, through a different channel and control unit, of
the other ones). I have heard that a large US airline got some of the
initial 3990(?) IBM caching disk controllers and performance went down! The
reason is that that huge array of disks is actually a RAID array; data
(there isn't a file system per se) is spread 1 block on this disk, one on
the next, etc. The caching disk controller caches the whole track, so TPF
would read 1 block, then try and seek to a new track for a new block, and
have to wait for the track read to finish (this is about 10th hand and so
may not be completely accurate). For MVS and VM, the track buffering scheme
of course works just fine!

>TODAY we apply the solution that the technology allows: we put
>memory in the disk drive. This means that when the data physically
>passes beneath the head, it's sucked into a buffer in the drive.
>The drive then requests use of the channel, and when the channel
>is available the drive "squirts" the data over to the host.

And this may be a good reason why brand X would have indeed done the job and
supported the numbers that they were quoting; the problem was that they
(unlike you -:)) didn't seem to know why they could do what they claimed to
be able to do. Part of our problem was that we couldn't seem to find anybody
who had gone from a mainframe into an NFS environment and could say from
experience, yes, the numbers quoted will work (and neither could the
vendors!).

>>...to the correct SCSI controller (1 for every 2 disks since
>>that is what maximally loads the SCSI channel without performance loss)

>This is fallacious as a generalization. It should be possible to get
>significantly more than two - perhaps more than seven, with "WIDE" -
>disk devices on a SCSI-2 channel with minimal performance degradation
>(90% or more of the sum of the expected performances of the individual
>disk drives.)
I believe (although you'd have to ask the Auspex folks) that this came as a
result of testing on their particular box with their particular
architecture, and may well be more of a marketing ploy than anything else
(in this case a good marketing ploy, if it is -:)), and you may well be
correct with different hardware.

>"But wait", you say, "each disk can stream about 3M/sec of data; so
>the maximum appears to be 3 + a fraction disks on a 10M SCSI-2 channel."

>My claim rests on a real world observation: hardly anybody ever streams
>data off a disk for long periods of time. That is, the critical metric

MVS and VM tend to, but I'll agree we are talking Unix here -:).

>for real systems is not bytes/second, but IO operations/second. For
>example, with Unix file systems "optimal" performance has been reached
>if you're getting one file system block per disk spin; with relational
>database programs, similar considerations apply, depending on the IO
>architecture of the RDBMS.
>At 3600 RPM or 60 RPS, with 8K file system blocks, that's less than
>600K bytes/second. Of course, over time higher level software changes
>(contiguous allocation file systems, log structured file systems, new
>RDBMS architectures, etc.) will invalidate this whole line of reasoning.
>But (although your mileage may vary somewhat) it's essentially true,
>for now.

True, but in the Auspex case they can also stripe a logical file system
across multiple disks and in theory have seeks for various users in the same
file system (heading for that big cache) in progress on more than one disk
at the same time. The particular case of an NFS server is, I think, somewhat
different than a standard Unix file system. Hopefully some of the Auspex
folks will jump in and correct any of my errors of statement or
understanding of what their box does!

>Now, my nontechnical question:

>You don't say what your usage model is, but from your mention of MTS
>I'm guessing it's academic timesharing.

Right in one (I thought the first post might get a little long, so I omitted
our environment).

>My guess is that the workstation vendor's response would be something
>like this: "why do you need to centralize all the IO in one huge server?
>Why don't you just buy inexpensive, individually weak servers until you
>have enough computer/IO/whatever power to satisfy your user community?"
>What do you say to this?

The simple answer is politics: "Timeshare is bad, Distributed Computing is
good", but we want all the good features of central operation (backup, the
same environment on any machine, no increase in staff, etc.). Before I get
taken wrong, the computing center had been moving towards Unix and a
distributed environment for 5 or 6 years before this, and in many ways the
changeover was the only practical way to do it. Our campus has IBM type 9
cable (that will support either token ring or Ethernet) pulled to every
telephone jack on campus, running back to some 50 odd wiring closets
interconnected by fibre. Installing this and the network equipment in those
closets (Cabletron mostly) took a lot of time and, more importantly, money
(they don't let me do anything except tell them what I think we should buy,
so I don't actually know how much, but it was large). We have some 3500
Macs, PCs, and Unix workstations (and the mainframe in years past) all
hooked to this network (and somewhere around 1000 serial connections that in
some cases go to the same place as the high speed ones).
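Back on the striping point a few paragraphs up, the general idea looks like
the sketch below. It is a toy of my own (simple block-level round robin);
I am not claiming this is how Auspex actually lays data out, and NDISKS and
the block numbers are invented for illustration.

    /*
     * Why striping lets independent requests proceed on different spindles
     * at once: consecutive logical blocks land on different drives.
     */
    #include <stdio.h>

    #define NDISKS 4

    int main(void)
    {
        /* a handful of block numbers various users happen to want */
        unsigned int wanted[] = { 17, 18, 19, 200, 201, 4096 };
        int n = sizeof wanted / sizeof wanted[0];

        for (int i = 0; i < n; i++) {
            unsigned int disk   = wanted[i] % NDISKS;   /* which spindle    */
            unsigned int offset = wanted[i] / NDISKS;   /* block on that disk */
            printf("logical block %4u -> disk %u, block %u\n",
                   wanted[i], disk, offset);
        }
        return 0;
    }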
We had some 10 to 11 thousand active accounts (that had been used in the
last 12 months) on the mainframe. The mainframe would fall over dead when
around 210 users signed on all at once (generally 3 or 4 cpu bound jobs, 10
or 20 SAS and similar package users, and the rest doing E-mail and
conferencing). In February of 91 we were told that E-mail had to be on Unix
by Aug 31, and everything had to be on Unix by Dec 31 of 1991. I believe the
unstated (because we didn't need to know -:)) reason for this is that the
whole thing was being financed by the money that was supporting the
mainframe (no new money) and that's all the time they could afford.

The interim step (by Dec 31) was decided to be to replace the mainframe with
a Unix system of the same capacity. This ended up being 4 Silicon Graphics
4D320s, 3 of them for the researchers aimed at about 40 users each, and the
last one as a general login server supporting 80 users for E-mail and News,
plus 2 Sun 470s for student instructional needs (since the Computing Science
department has around 100 or so Suns). All of these machines live in our
central machine room (up until a couple of weeks ago in the shadow of our
mainframe; big, big hole there now -:)). It was our decision (well, mostly
the decision of the only one of us who had any Unix experience -:)) that a
single large NFS file server would be better than cross mounting local disks
on all those machines, which set us off on the above stated search for a
Unix vendor that knew about I/O.

Each of the 6 CPUs has a second Ethernet card that connects to the Auspex on
one of 4 ethernet ports (one Auspex port goes to the backbone, and one
supports a CS lab) so that the NFS traffic is spread across the 4 Ethernets.
The last ethernet runs over fibre to a Computing Science undergrad lab of 30
(or more -:)) NeXT stations and some number of Sun servers, allowing the
students to use the same Unix ID on all the machines they have access to and
to see their home directory from any machine (some 25,000 home directories
live on the Auspex, although I suspect that only about 10,000 of them are
active).

I expect that a single large Unix server (assuming it had the I/O bandwidth)
would have worked, but that would have been seen (rightly or wrongly) as one
mainframe replacing the other and therefore not distributed. There are some
advantages to the current setup: the researchers are all fighting each other
for CPU time on the three research machines, but that fighting (unlike on
the MTS mainframe) doesn't impact the people using the general login server
for Mail and News (where this is being written, in fact!). This machine has
been seen to support somewhat more than 110 users at once without any
performance problems, and we don't (yet!) know how many more it would
support (there was a limit of 99 logins in place during some of the peak
times). CPU bound jobs (in fact most of anything other than E-mail, NetNews
and editing) get stopped and told to use one of the research machines. Since
E-mail demand is great here, and used to be performance bound on the
mainframe, I expect a majority of the users are happier (they can also elect
to read their mail on their Mac or PC via POP and never log in to Unix at
all, and many do!).
I mentioned backup up there, but this is getting pretty long again (and
somewhat outside the bounds of the charter, but he asked -:)), so maybe I'll
post it and see if the rest of you want me to shut up yet -:) before going
on to discuss Auspex backup and the terrible performance of Unix tape drives
(to say nothing of a total lack of tape support; make sure you have a VAX
around before converting from a mainframe to Unix -;)).

>On the other hand, perhaps your usage model is academic database -
>keeping the grades, generating the schedules, etc. In that case,
>why are you looking at a workstation network when there are several
>vendors, including Sequent (no flames please - I believe it's a
>reasonable objective comment) specializing in building larger "Unix
>boxes" that don't suffer from the bottlenecks you mention...at least
>not to the same extent as the IO bound machine you described.

The admin stuff runs on a VAX, and has been being converted from running
under OS/MVT (I don't believe IBM has supported MVT for at least 15 years;
it has been going away "in a year" ever since I got here 4 years ago. MTS
came in to replace it, MTS is gone, and MVT is still here ...).

>--
>Jeff Berkowitz, Sequent Computer Systems   j...@sequent.com  uunet!sequent!jjb
>"Each project starts as an intellectual exercise but ends as an athletic event"
>  - paraphrasing international GM Korchnoi, who was talking about chess matches.

Thanks for taking the time to respond to this. I am still trying to learn
how you capacity plan and/or specify things in Unix and NFS (and, when it
comes, AFS, because I think that's the answer for full distribution here),
since I expect our current config is probably only a stepping stone down the
path. There is also a lot of interest at other sites out there in how you
replace a mainframe with Unix (if you aren't MIT -:)); we held a BOF at the
last LISA hoping to find someone with the answers, and instead we found a
lot of people with the same questions who invited us to tell them how to do
it next year -:)

>>Peter Van Epp / van...@sfu.ca
>>Simon Fraser University, Burnaby, B.C. Canada

disclaimer: The above are all personal opinions and may not (probably don't
in some cases -:)) reflect the opinions of the university.