Newsgroups: comp.arch.storage
Path: sparky!uunet!zaphod.mps.ohio-state.edu!rpi!batcomputer!cornell!uw-beaver!ubc-cs!newsserver.sfu.ca!sfu.ca!vanepp
From: van...@fraser.sfu.ca (Peter Van Epp)
Subject: As promised, mainframe I/O (or why we have an Auspex NS5000 -:))
Message-ID: <vanepp.704170942@sfu.ca>
Summary: Unix boxes can't keep up with mainframe I/O rates.
Keywords: mainframe, Auspex, NS5000
Sender: ne...@sfu.ca
Organization: Simon Fraser University, Burnaby, B.C., Canada
Date: Sat, 25 Apr 1992 03:02:22 GMT
Lines: 208

When I answered the query about dram and sram, I promised to use that to
discuss mainframe I/O (and, so you see how that applies to this group, why
we bought an Auspex NS5000).

First off, a bit of background: SFU (Simon Fraser University, where I work)
was, up until January of this year, a mainframe shop. We ran MTS (the
Michigan Terminal System) on an IBM 3081GX with 3 channels into 2 control
units running around 15 gigs of 3380 disk (the IBMese will become clear
later!). In February of 91 we were told to be on Unix and off of MTS by
Dec 31 of 1991. This post is mostly about my impressions, as a mainframe
bigot, of talking to the various Unix vendors about I/O.

I accepted long ago that Unix boxes have more CPU power than our poor old
11 MIP 3081. Several years ago we bought some Silicon Graphics machines for
the 6 or 8 users that had totally CPU bound problems (almost no I/O); the
move to the SGIs bought them an 8 times increase in throughput (and I expect
the happiness at this accelerated the rush to Unix -:)). However, as I
noted, they don't do any appreciable I/O. The rest of the users still on the
3081 were as often limited by delay in I/O to disk as by lack of CPU power,
and that is what concerned me about a move to Unix workstations.

As promised, to see why I had concern, you need to know something about the
I/O subsystem on a mainframe (and as a test of whether you read the dram
article, I'll sprinkle some drams and srams into this too -:)). In a 3081
(as a concrete example), there are three logically independent units (I
doubt this is really true under the covers, but bear with me): the
instruction unit (the "CPU" if you like), main memory, and the I/O unit
(called a storage director). Main memory is the meat in an electronic
sandwich; both the CPU and the storage director beat hell out of poor old
main memory, the CPU demanding instructions and data, and the storage
director demanding streams of sequential bytes for I/O to the channels.
The CPU is just like any Unix CPU: it has a huge internal cache to try and
keep main memory accesses down, but when it wants to flush the cache it of
course wants a high bandwidth stream of bytes to avoid stalling the CPU.
That's no different from a RISC box, except that the microcycle (i.e. clock)
is down in the 10 to 15 nsec range, or 60 to 100 MHz, and therefore that
much more hungry for main memory bandwidth.

The interesting part of this is the storage director and the channels that
attach to it. Let's look at a channel and the I/O devices that attach to it
first: a channel is quite a bit like an industrial strength SCSI channel
(I wouldn't be surprised to find out that the channel inspired SCSI),
industrial strength because a channel connection consists of two 2" diameter
cables filled with 10 or 12 coax wires each, with a maximum length of 200
feet for the original version (at a 1 megabyte per second transfer rate) or
400 feet and 3 to 6 megabytes per second for a newer data streaming channel.
When you hear an IBM CE talking about "pulling a channel cable" he or she
means just that: yanking on 50 to a couple of hundred feet of stiff, heavy,
2 inch thick cable. The data path in a channel (in the "Bus" cable) is 8
bits parallel and, as noted, coax. The control signals (also around 8 as I
recall) live in the "Tag" cable.

Logically a channel looks like this: since there are 8 data lines, you can
have up to 256 devices on the channel (where a device can be a control unit,
one spindle of a disk drive, or a tape drive). Any device like a disk or a
tape needs a control unit to attach to (it may be built in on some units,
but it's always there). This control unit is what the channel cables
terminate in and talk to, and it works like this: when an I/O command is
started at the mainframe, you specify an address (1 of the 256 possible in
the 8 bit data path) and send a pulse down the select out tag line. Each
control unit on the channel, when it sees select out, compares its hard
wired address with the data on the bus data lines; if it matches, it eats
the select out signal and decides the storage director is talking to it. If
the address doesn't match, then it sends select out to the next control unit
down the line. The last control unit has a terminator (just like SCSI
again!) that loops select out to select in, which then fires back up the
cable to the storage director. If select in pulses some time after select
out and the address have been propagated (the number of nanoseconds at the
speed of light in the cable at the max 400 foot difference, plus some
slop!), then the addressed control unit isn't on this channel (or is offline
or broken) and an error is signaled. If select in stays quiet, then the
addressed control unit has been found and the storage director can begin
sending data down the channel.

In the older, slower channel (1 megabyte per second) each data byte sent to
the control unit gets acked by the control unit. In the newer data streaming
channels, the speed gets doubled by skipping the ack: the director fires
bytes down the channel at full speed and the control unit will return an
error if something goes wrong (there are parity lines everywhere and
probably more error checking besides!). So far so good, same performance as
a SCSI channel except you can move it further away.

Let's now wander back to the storage director on the other end of these
channel cables. If there was only one channel per machine, no problem, but
the 3081 (3 generations old at this point) has 16 of these channels off of
its storage director (and now the comments about 3 channels to disk through
two control units at the start begin to make sense); current machines can
have 256 to 512 channels. This poor old 3081 had 2 more channels going to a
32 meg solid state drum for paging (only 32 megs of main memory in the
machine), another 2 channels going out to 3420 (round type 1/2" tapes), and
two channels out to 3480 (square, very fast tapes!), as well as the disk
channels. The important point about all of this is that in theory (and at
least a fair way in practice) all of these channels could be transferring
data at the same time! This means that we could be reading data from 3
different disk drives (at close to 3 megabytes/sec), two areas of the solid
state drum also at 3 megs a second, and 2 full speed data streams from 3480
tapes (an unusually high load, but I expect the machine would support it).
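(An aside for those who find a daisy chain easier to follow in code: here is
a toy sketch of the select out / select in game described above. It is my
own illustration, not anything out of an IBM manual; all the names and
numbers in it are invented.)

    /*
     * Toy model of channel device selection: the storage director puts an
     * address on the bus and raises select out; each control unit either
     * "eats" it or passes it along; if the terminator loops it back as
     * select in, nobody claimed the address and an error is signaled.
     */
    #include <stdio.h>

    #define NUM_CUS 4                   /* control units daisy chained here */

    struct control_unit {
        unsigned char hard_wired_addr;  /* set by jumpers on real hardware  */
        int           online;
    };

    /* Returns the index of the control unit that claimed the address, or -1
     * if select in came back from the terminator (nobody claimed it).      */
    static int propagate_select_out(struct control_unit cu[], int n,
                                    unsigned char bus_addr)
    {
        for (int i = 0; i < n; i++) {
            if (cu[i].online && cu[i].hard_wired_addr == bus_addr)
                return i;       /* this CU eats select out                  */
            /* otherwise select out ripples to the next CU down the cable   */
        }
        return -1;              /* terminator loops select out -> select in */
    }

    int main(void)
    {
        struct control_unit chain[NUM_CUS] = {
            { 0x20, 1 }, { 0x30, 1 }, { 0x40, 0 }, { 0x50, 1 },
        };
        unsigned char addr = 0x30;
        int who = propagate_select_out(chain, NUM_CUS, addr);

        if (who < 0)
            printf("addr %02x: select in came back - unit missing, error\n",
                   (unsigned)addr);
        else
            printf("addr %02x: control unit %d selected, start the transfer\n",
                   (unsigned)addr, who);
        return 0;
    }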
All of this interesting activity is bothering the CPU not at all (other than
arguing for main memory access bandwidth!) because the storage director is a
simple minded cpu in its own right. It is controlled by a channel program,
which is a data block in main memory that is essentially a one byte command
(read, write, seek, ioctl etc.) to be sent to the control unit, a couple of
words of status and flags, an address in main memory where the data can be
found, and a count of how much data is in this buffer. In itself, again, not
much, but some of the flags can be used to "chain" channel programs so the
storage director can execute the first one, then without bothering the CPU
branch to the next one and process it as well, and so on until you run out
of memory -:). This whole thing is started by an instruction (typically the
Start I/O or SIO instruction) that takes a control unit address as the
argument and a pointer to the channel program, and boots the storage
director into action. Again, very much like a DMA transfer in a Unix cpu,
except that it can do a lot more than 1 transfer at a time, and can do it at
full speed.

Let's do some simple math ('cause that's the only kind I can do -:)): 16
channels at 3 megabytes/sec gets us a maximum I/O rate of 48 megabytes/sec.
If we consider only the 3 disk channels we are talking an I/O rate of 9
mbytes/sec. Now let's go back and recall the dram specs: while the access
time was right down there at 80 nsec or so, the cycle time (the number of
interest here) was up at 200 nsec or so (rounded for easy calculating -:)).
Figuring on a 32 bit wide word, I get 4 bytes every 200 nsec, or 20
megabytes/sec (and dram speeds weren't this good in the 3081's day; it was
new around 10 years ago and the cycle time was probably closer to 400 nsec,
but we'll ignore that). My disk I/O alone has now eaten almost 50% of the
available main memory bandwidth, and full I/O would eat somewhat more than
200% of the available bandwidth. Remember that there is an instruction
hungry CPU also wanting access to this same main memory bandwidth and you
see we have a problem.

This particular problem has a solution (it just isn't a cheap solution -:)).
The solution is to increase the bandwidth of the channel into main memory,
and this is done in three ways:

1) increase the width of the bus from 32 bits to 128 or 256 bits; I then get
   16 or 32 bytes per 200 nsec dram cycle,

2) change from dram to sram (as we remember, sram is 1/4 the number of bits
   of dram and twice the cost, but fast!), in fact to 50 nsec access (and
   more importantly 50 nsec cycle time too!),

3) run memory interleaving.
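Before going on to interleaving, here is roughly what such a chained channel
program looks like in code form. This is only a sketch of the idea described
above; the field names, flag values and layout are mine, not IBM's real CCW
format.

    /*
     * Sketch of a chained channel program: one byte command, flags, a
     * buffer address and a count, with a "chain" flag so the storage
     * director walks the whole list without bothering the CPU.
     */
    #include <stdio.h>
    #include <stddef.h>

    #define CMD_WRITE  0x01
    #define CMD_READ   0x02
    #define CMD_SEEK   0x07

    #define FLAG_CHAIN 0x40         /* "another channel word follows"       */

    struct channel_word {
        unsigned char  command;     /* read, write, seek, ...               */
        unsigned char  flags;       /* chaining etc.                        */
        void          *buffer;      /* where in main memory the data lives  */
        unsigned short count;       /* how many bytes in this buffer        */
    };

    /* What the storage director conceptually does after a Start I/O: walk
     * the chain, issuing one command per channel word, until a word without
     * the chain flag ends the program.  The CPU is not involved.           */
    static void start_io(unsigned int cu_addr, struct channel_word *program)
    {
        for (struct channel_word *cw = program; ; cw++) {
            printf("CU %02x: command %02x, %u bytes at %p\n",
                   cu_addr, (unsigned)cw->command, (unsigned)cw->count,
                   cw->buffer);
            if (!(cw->flags & FLAG_CHAIN))
                break;              /* end of the channel program           */
        }
    }

    int main(void)
    {
        static char buf1[4096], buf2[4096];
        struct channel_word program[] = {
            { CMD_SEEK, FLAG_CHAIN, NULL, 0           },
            { CMD_READ, FLAG_CHAIN, buf1, sizeof buf1 },
            { CMD_READ, 0,          buf2, sizeof buf2 },
        };
        start_io(0x30, program);    /* the SIO instruction, more or less    */
        return 0;
    }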
Memory interleaving occurs when more than one RAM chip is addressed at a
time; an example is probably easiest. If I have a 1 meg * 8 memory made of
1 meg by 1 bit drams, then I get 1 byte every 200 nsec. If I replace this
with 256K 50 nsec srams (but in a 32k * 8 bit organization instead of 256k
by 1 bit, which is a standard option), then I need 32 sram chips to replace
the 8 dram chips. If I choose to set up the addressing such that the bottom
5 bits of the address select one of the 32 chips (which gives me one byte),
then I haven't changed the logical structure of memory, but something else
magic has happened: in the same 50 nsec that I fetched 1 byte from the first
ram chip, I also fetched 31 other bytes in the other 31 ram chips (in the
same 50 nsec). Since I can enable the outputs in 3 or 4 nsec, I have
effectively multiplied my memory bandwidth by 32. So let's do the math: with
the dram I get 1 byte every 200 nsec, or 5 mbytes/sec; with interleaved sram
I get 32 bytes every 50 nsec, or 640 mbytes/sec (and of course a much, much
larger cost per mbyte of memory!). If we now multiply these by 4 (for the 32
bit word) we get our 20 mbytes/sec number for dram again, but the sram
solution is up around 2.5 gigabytes/sec and my max I/O rate is becoming much
less of an impact on the available main memory bandwidth. When we increase
the bus width to 128 or 256 bits dram looks better, but sram still leads! I
probably don't have to tell you that interleaved sram and wide data busses
are the mainframe solution (and as I recall 8 megs of 3081 memory used to be
$100,000 too -:)).

This rather long winded diatribe brings us back to my position: I have a
disk I/O rate running (worst case) about 9 mbytes/sec, so I need an NFS file
server that can do such a rate. Now the fun begins. I discovered that most
Unix box vendors (or at least those we talked to -:)) sell cpus by the mip
and the specmark; I/O is never a problem (nor mentioned!), but when you
squeeze a little bit, some interesting numbers pop out. VME bus speed is 17
mbytes/sec and all of the disk controllers are fighting over that bandwidth.
The CPU is also talking to the Ethernet cards over that same VME bus, and
hopefully trying to drive the 9 mbytes/sec over that same 17 mbyte/sec VME
bus as the disk controllers, to say nothing of fighting for CPU time to
process disk interrupts, ethernet interrupts, DMA interrupts, and process
NFS requests, all aiming for the same SIMM dram main memory. Interesting
questions like "is main memory interleaved, and if so how many ways?"
typically got a blank look and "interleaved? what's interleaved?"; asking
whether main memory is on a VME board (and therefore fighting with
everything else for bandwidth) got the same blank look. One major vendor
produced performance numbers that suggested that we could replace our 3081,
with 200 users beating it to death, with a single Unix server, and agreed
(when pushed -:)) that indeed this wouldn't work, but couldn't say why it
wouldn't work (nor justify their numbers).
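To put numbers on the memory arithmetic a couple of paragraphs back, here is
the same back of the envelope as a throwaway program. These are the round
figures from the text, not measured specs for a real 3081.

    /*
     * dram vs. interleaved sram bandwidth, and what share of each the
     * channel I/O eats.  Round numbers only.
     */
    #include <stdio.h>

    int main(void)
    {
        double dram_cycle_ns = 200.0;   /* dram cycle time (rounded)        */
        double sram_cycle_ns = 50.0;    /* sram cycle time                  */
        int    bus_bytes     = 4;       /* 32 bit wide memory path          */
        int    interleave    = 32;      /* 32 sram chips read per access    */

        /* bytes per cycle * (1000 / cycle in ns) gives mbytes/sec          */
        double dram_mb_s = bus_bytes * (1000.0 / dram_cycle_ns);
        double sram_mb_s = bus_bytes * interleave * (1000.0 / sram_cycle_ns);

        double disk_io_mb_s = 9.0;      /* 3 disk channels at ~3 mbytes/sec */
        double full_io_mb_s = 48.0;     /* 16 channels at 3 mbytes/sec      */

        printf("dram memory bandwidth:            %6.0f mbytes/sec\n", dram_mb_s);
        printf("interleaved sram bandwidth:       %6.0f mbytes/sec\n", sram_mb_s);
        printf("disk I/O share of dram bandwidth: %6.0f%%\n",
               100.0 * disk_io_mb_s / dram_mb_s);
        printf("full I/O share of dram bandwidth: %6.0f%%\n",
               100.0 * full_io_mb_s / dram_mb_s);
        printf("full I/O share of sram bandwidth: %6.1f%%\n",
               100.0 * full_io_mb_s / sram_mb_s);
        return 0;
    }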
Then we hit Auspex (who are undoubtedly used to shell shocked mainframe
types coming from the other vendors -:)), who promptly admitted that
compared to mainframe I/O rates the NS5000 isn't all that fast, but that
it's faster than the Unix competition for the following reasons (if you get
a chance to get the sales pitch, go, even if you aren't planning on buying
one, just to see what a sales pitch should be -:)). Each of the up to 8
ethernet ports in the machine has its own CPU; it processes the TCP stream
right up to the decoded NFS request, then passes the request to another
dedicated cpu that implements the file system processing. That cpu checks to
see if the block is in the 16 to 64 megabyte cache and, if so, passes the
address of the block back to the Ethernet processor for it to DMA out of
cache and send down the wire. If there is a cache miss, it finds a cache
block, figures out where on disk the block is, and passes the disk address
and the cache address to the correct SCSI controller (1 for every 2 disks,
since that is what maximally loads the SCSI channel without performance
loss) to again do its thing on its own into the cache, to be passed back to
the ethernet controller and sent. The modified VME bus that this all takes
place across does 55 mbytes/sec, not 17 ...

This presentation, and the several reference customers we called being happy
with the device (and to some extent the apparent addicting effect of
NS5000s; single installations were at that time at least uncommon, they
tended to clump up, a company with 4 NS5000s here, another with 6 NS5000s
there ...), convinced us that this was a reasonable alternative. As I told
my boss, it's possible that the brand x server will do the job, but I sure
couldn't prove it by the numbers I saw, and if the numbers that I am
familiar with don't make sense to me, how can I believe what they tell me
about NFS, which I know very little about? I still don't know if brand x
would have done, but I don't care, because I, or we, convinced the bosses to
buy an NS5000. We haven't had any trouble at all, and our one lone Unix
expert says it's a lot less trouble than the multiple cross mounted NFS
servers in the Computing Science department where he used to work.

I expect, as CPU clock speeds keep climbing on the risc chips, that the
manufacturers are going to find that doubling the density of drams without
doing anything to the speed or cycle times of drams isn't going to cut it,
and that the cost of doing proper main memory implementations that can
actually use that cpu speed will drive the cost up towards the price of
those mainframes everyone is trying to throw away -:).

Peter Van Epp / van...@sfu.ca                  #include <std.disclaimer>
Simon Fraser University, Burnaby, B.C. Canada
Path: sparky!uunet!think.com!rpi!usenet.coe.montana.edu!ogicse!sequent!muncher.sequent.com!jjb
From: j...@sequent.com (Jeff Berkowitz)
Newsgroups: comp.arch.storage
Subject: Re: As promised, mainframe I/O (or why we have an Auspex NS5000 -:))
Keywords: mainframe, Auspex, NS5000
Message-ID: <1992Apr26.194811.18177@sequent.com>
Date: 26 Apr 92 19:48:11 GMT
Article-I.D.: sequent.1992Apr26.194811.18177
References: <vanepp.704170942@sfu.ca>
Sender: use...@sequent.com (usenet)
Organization: Sequent Computer Systems Inc.
Lines: 144
Nntp-Posting-Host: eng3.sequent.com

Since the original article was quite long, I've extensively trimmed the
quotes. There might not be enough left to really understand the original
article.

First, the technical issues:

In article <vanepp.7...@sfu.ca>, van...@fraser.sfu.ca (Peter Van Epp) writes:
>I promised to...discuss mainframe I/O [...] As promised, to see
>why I had concern, you need to know something about the I/O subsystem
>on a mainframe...
>In a 3081 (As a concrete example), there are three logically independent
>units....the instruction unit (the "CPU" if you like), main memory, and
>the I/O unit (called a storage director)...The interesting part of this
>is the storage director and the channels that attach to it...
>[discussion about IBM channels elided here - jjb]

As you observe, a SCSI-2 channel is quite similar, "logically", to a
traditional IBM channel. This observation becomes important below.

>same performance as a SCSI channel except you can move it further away.

In fact, SCSI-2 compares favorably to the [outdated] IBM channel technology
you discuss; it's faster, and the high production volumes make it almost
infinitely less expensive (although limited to 25 meters). Current IBM
channel offerings are much more advanced (higher speeds, provision for
optical interconnects carrying channel speeds over kilometer distances,
etc.). Of course, this sophistication comes with a price tag. The ANSI
FibreChannel standard is one effort that may (will) make this type of
technology available in open systems environments.

>... current machines can have 256 to 512 channels.

Here is where it begins to get interesting:

* With current technology, there is no conceivable reason why anybody
  would need this many channels.

* Justifying this statement requires understanding the so-called
  "RMS miss" problem.

In the old days when memory was expensive, disks were unbuffered (contained
no internal memory). You sent the disk a command to seek to a sector; it did
so, and then asserted a signal saying, effectively, "destination sector
about to pass beneath read head! Request use of IO channel!". If the channel
is busy at that moment, the disk can't transfer the sector(s) of data. The
data passes inexorably beneath the head, and the transfer can't take place
until a whole "spin" of the platter - 16+ MILLISECONDS with 3600 RPM disk
technology! - brings the data back under the head.

If you model channels like this, you'll find that the contention behavior of
[even] a small number of disks is truly atrocious. THIS is why they needed
so many channels: to avoid these missed spins ("RMS misses") caused by
inability to get the channel at the critical moment, given an unbuffered
disk device.

TODAY we apply the solution that the technology allows: we put memory in the
disk drive. This means that when the data physically passes beneath the
head, it's sucked into a buffer in the drive.
The drive then requests use of the channel, and when the channel is
available the drive "squirts" the data over to the host. This changes the
contention characteristics of the channel to those of an ordinary CPU/memory
bus, more or less. And there's another advantage: the data "squirt" between
the device and the host can run as fast as the drive and host electronics,
and the standard interface, want them to run; the transfer isn't limited by
the speed at which the bits pass beneath the head. (Spinning the disk
faster, as many drive vendors are doing, is still a good idea for a whole
variety of reasons, however.)

>	This rather long winded diatribe brings us back to my position,
>I have a disk I/O rate running (worst case) about 9 mbytes/sec, I need
>an NFS file server that can do such a rate. Now the fun begins...

[Good explanation, elided, of why some workstation servers that weren't
architected as efficient IO engines fall down on that task - jjb]

>Then we hit Auspex
>[...discussion of Auspex NFS file server architecture elided - jjb]
>...to the correct SCSI controller (1 for every 2 disks since
>that is what maximally loads the SCSI channel without performance loss)

This is fallacious as a generalization. It should be possible to get
significantly more than two - perhaps more than seven, with "WIDE" - disk
devices on a SCSI-2 channel with minimal performance degradation (90% or
more of the sum of the expected performances of the individual disk drives).

"But wait", you say, "each disk can stream about 3M/sec of data; so the
maximum appears to be 3 + a fraction disks on a 10M SCSI-2 channel."

My claim rests on a real world observation: hardly anybody ever streams data
off a disk for long periods of time. That is, the critical metric for real
systems is not bytes/second, but IO operations/second. For example, with
Unix file systems "optimal" performance has been reached if you're getting
one file system block per disk spin; with relational database programs,
similar considerations apply, depending on the IO architecture of the RDBMS.
At 3600 RPM or 60 RPS, with 8K file system blocks, that's less than 600K
bytes/second. Of course, over time higher level software changes (contiguous
allocation file systems, log structured file systems, new RDBMS
architectures, etc.) will invalidate this whole line of reasoning. But
(although your mileage may vary somewhat) it's essentially true, for now.

>[Long discussion of the merits of the Auspex box elided. Note:
>lest I should be misunderstood I'm *not* taking issue with this
>argument, which seems sound, or with Auspex!]

Now, my nontechnical question:

You don't say what your usage model is, but from your mention of MTS I'm
guessing it's academic timesharing. My guess is that the workstation
vendor's response would be something like this: "why do you need to
centralize all the IO in one huge server? Why don't you just buy
inexpensive, individually weak servers until you have enough
computer/IO/whatever power to satisfy your user community?" What do you say
to this?

On the other hand, perhaps your usage model is academic database - keeping
the grades, generating the schedules, etc. In that case, why are you looking
at a workstation network when there are several vendors, including Sequent
(no flames please - I believe it's a reasonable objective comment)
specializing in building larger "Unix boxes" that don't suffer from the
bottlenecks you mention...at least not to the same extent as the IO bound
machine you described.
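For the record, here is the spin arithmetic above spelled out as a throwaway
program - round numbers only, nothing measured, and the 3M/sec streaming
figure is just the one quoted earlier.

    /*
     * At 3600 RPM a missed "squirt" costs a full revolution, and one 8K
     * file system block per revolution caps the per-drive rate well below
     * the streaming numbers.
     */
    #include <stdio.h>

    int main(void)
    {
        double rpm        = 3600.0;
        double rps        = rpm / 60.0;          /* 60 revolutions/sec      */
        double ms_per_rev = 1000.0 / rps;        /* ~16.7 ms per missed spin */

        double block_kb   = 8.0;                 /* 8K file system block    */
        double kb_per_sec = rps * block_kb;      /* one block per spin      */

        printf("revolutions per second:    %.0f\n", rps);
        printf("cost of one missed spin:   %.1f ms\n", ms_per_rev);
        printf("one block per spin:        %.0f Kbytes/sec per drive\n",
               kb_per_sec);
        printf("vs. quoted streaming rate: ~3000 Kbytes/sec per drive\n");
        return 0;
    }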
>Peter Van Epp / van...@sfu.ca                  #include <std.disclaimer>
>Simon Fraser University, Burnaby, B.C. Canada
--
Jeff Berkowitz, Sequent Computer Systems    j...@sequent.com  uunet!sequent!jjb
"Each project starts as an intellectual exercise but ends as an athletic event"
  - paraphrasing international GM Korchnoi, who was talking about chess matches.
Path: sparky!uunet!zephyr.ens.tek.com!uw-beaver!ubc-cs!newsserver.sfu.ca!sfu.ca!vanepp
From: van...@fraser.sfu.ca (Peter Van Epp)
Newsgroups: comp.arch.storage
Subject: Re: As promised, mainframe I/O (or why we have an Auspex NS5000 -:))
Keywords: mainframe, Auspex, NS5000
Message-ID: <vanepp.704329059@sfu.ca>
Date: 26 Apr 92 22:57:39 GMT
References: <vanepp.704170942@sfu.ca> <1992Apr26.194811.18177@sequent.com>
Sender: ne...@sfu.ca
Organization: Simon Fraser University, Burnaby, B.C., Canada
Lines: 214

j...@sequent.com (Jeff Berkowitz) writes:

>Since the original article was quite long, I've extensively trimmed
>the quotes. There might not be enough left to really understand
>the original article.

But seeing the good points raised in this one may provide motivation to go
back and read it -:)

>First, the technical issues:

>In article <vanepp.7...@sfu.ca>,
> van...@fraser.sfu.ca (Peter Van Epp) writes:

>The ANSI FibreChannel standard is one effort that may (will) make
>this type of technology available in open systems environments.

I couldn't agree more, but the I/O problem into conventional dram based
memory is still going to be there.

>>... current machines can have 256 to 512 channels.

>Here is where it begins to get interesting:

>* With current technology, there is no conceivable reason why anybody
>would need this many channels.

To twist out of that a bit, with current mainframe technology I can think of
at least one: I used to be a systems programmer on an IBM TPF1 airline
reservation system which drove some 160 spindles of 3350 disk through 12
channels and 12 control units (although only 80 drives are actually unique;
the other 80 are a mirror, through a different channel and control unit, of
the other ones). I have heard that a large US airline got some of the
initial 3990(?) IBM caching disk controllers and performance went down! The
reason is that that huge array of disks is actually a RAID array; data
(there isn't a file system per se) is spread 1 block on this disk, one on
the next, etc. The caching disk controller caches the whole track, so TPF
would read 1 block, then try and seek to a new track for a new block, and
have to wait for the track read to finish (this is about 10th hand and so
may not be completely accurate). For MVS and VM, the track buffering scheme
of course works just fine!

>TODAY we apply the solution that the technology allows: we put
>memory in the disk drive. This means that when the data physically
>passes beneath the head, it's sucked into a buffer in the drive.
>The drive then requests use of the channel, and when the channel
>is available the drive "squirts" the data over to the host.

And this may be a good reason why brand X would have indeed done the job and
supported the numbers that they were quoting; the problem was that they
(unlike you -:)) didn't seem to know why they could do what they claimed to
be able to do. Part of our problem was that we couldn't seem to find anybody
who had gone from a mainframe into an NFS environment and could say from
experience, yes, the numbers quoted will work (and neither could the
vendors!).

>>...to the correct SCSI controller (1 for every 2 disks since
>>that is what maximally loads the SCSI channel without performance loss)

>This is fallacious as a generalization. It should be possible to get
>significantly more than two - perhaps more than seven, with "WIDE" -
>disk devices on a SCSI-2 channel with minimal performance degradation
>(90% or more of the sum of the expected performances of the individual
>disk drives.)
I believe (although you'd have to ask the Auspex folks) that this came as a
result of testing on their particular box with their particular
architecture, and may well be more of a marketing ploy than anything else
(in this case a good marketing ploy, if it is -:)), and you may well be
correct with different hardware.

>"But wait", you say, "each disk can stream about 3M/sec of data; so
>the maximum appears to be 3 + a fraction disks on a 10M SCSI-2 channel."

>My claim rests on a real world observation: hardly anybody ever streams
>data off a disk for long periods of time. That is, the critical metric

MVS and VM tend to, but I'll agree we are talking Unix here -:).

>for real systems is not bytes/second, but IO operations/second. For
>example, with Unix file systems "optimal" performance has been reached
>if you're getting one file system block per disk spin; with relational
>database programs, similar considerations apply, depending on the IO
>architecture of the RDBMS.
>At 3600 RPM or 60 RPS, with 8K file system blocks, that's less than
>600K bytes/second. Of course, over time higher level software changes
>(contiguous allocation file systems, log structured file systems, new
>RDBMS architectures, etc.) will invalidate this whole line of reasoning.
>But (although your mileage may vary somewhat) it's essentially true,
>for now.

True, but in the Auspex case they can also stripe a logical file system
across multiple disks and in theory have seeks for various users in the same
file system (heading for that big cache) in progress on more than one disk
at the same time. The particular case of an NFS server is, I think, somewhat
different than a standard Unix file system. Hopefully some of the Auspex
folks will jump in and correct any of my errors of statement or
understanding of what their box does!

>Now, my nontechnical question:

>You don't say what your usage model is, but from your mention of MTS
>I'm guessing it's academic timesharing.

Right in one (I thought the first post might get a little long, so I omitted
our environment).

>My guess is that the workstation vendor's response would be something
>like this: "why do you need to centralize all the IO in one huge server?
>Why don't you just buy inexpensive, individually weak servers until you
>have enough computer/IO/whatever power to satisfy your user community?"
>What do you say to this?

The simple answer is politics: "Timeshare is bad, Distributed Computing is
good", but we want all the good features of central operation (backup, the
same environment on any machine, no increase in staff, etc.). Before I get
taken wrong, the computing center had been moving towards Unix and a
distributed environment for 5 or 6 years before this, and in many ways the
changeover was the only practical way to do it. Our campus has IBM type 9
cable (that will support either token ring or Ethernet) pulled to every
telephone jack on campus, running back to some 50 odd wiring closets
interconnected by fibre. Installing this and the network equipment in those
closets (Cabletron mostly) took a lot of time and, more importantly, money
(they don't let me do anything except tell them what I think we should buy,
so I don't actually know how much, but it was large). We have some 3500
Macs, PCs, and Unix workstations (and the mainframe in years past) all
hooked to this network (and somewhere around 1000 serial connections that in
some cases go to the same place as the high speed ones).
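Back on the striping point a few paragraphs up, the general idea looks like
the sketch below. It is a toy of my own (simple block-level round robin);
I am not claiming this is how Auspex actually lays data out, and NDISKS and
the block numbers are invented for illustration.

    /*
     * Why striping lets independent requests proceed on different spindles
     * at once: consecutive logical blocks land on different drives.
     */
    #include <stdio.h>

    #define NDISKS 4

    int main(void)
    {
        /* a handful of block numbers various users happen to want */
        unsigned int wanted[] = { 17, 18, 19, 200, 201, 4096 };
        int n = sizeof wanted / sizeof wanted[0];

        for (int i = 0; i < n; i++) {
            unsigned int disk   = wanted[i] % NDISKS;   /* which spindle    */
            unsigned int offset = wanted[i] / NDISKS;   /* block on that disk */
            printf("logical block %4u -> disk %u, block %u\n",
                   wanted[i], disk, offset);
        }
        return 0;
    }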
We had some 10 to 11 thousand active accounts (that had been used in the
last 12 months) on the mainframe. The mainframe would fall over dead when
around 210 users signed on all at once (generally 3 or 4 cpu bound jobs, 10
or 20 SAS and similar package users, and the rest doing E-mail and
conferencing). In February of 91 we were told that E-mail had to be on Unix
by Aug 31, and everything had to be on Unix by Dec 31 of 1991. I believe the
unstated (because we didn't need to know -:)) reason for this is that the
whole thing was being financed by the money that was supporting the
mainframe (no new money) and that's all the time they could afford.

The interim step (by Dec 31) was decided to be to replace the mainframe with
a Unix system of the same capacity. This ended up being 4 Silicon Graphics
4D320s, 3 of them for the researchers aimed at about 40 users each, and the
last one as a general login server supporting 80 users for E-mail and News,
plus 2 Sun 470s for student instructional needs (since the Computing Science
department has around 100 or so Suns). All of these machines live in our
central machine room (up until a couple of weeks ago in the shadow of our
mainframe; big, big hole there now -:)). It was our decision (well, mostly
the decision of the only one of us who had any Unix experience -:)) that a
single large NFS file server would be better than cross mounting local disks
on all those machines, which set us off on the above stated search for a
Unix vendor that knew about I/O.

Each of the 6 CPUs has a second Ethernet card that connects to the Auspex on
one of 4 ethernet ports (one Auspex port goes to the backbone, and one
supports a CS lab) so that the NFS traffic is spread across the 4 Ethernets.
The last ethernet runs over fibre to a Computing Science undergrad lab of 30
(or more -:)) NeXT stations and some number of Sun servers, allowing the
students to use the same Unix ID on all the machines they have access to and
to see their home directory from any machine (some 25,000 home directories
live on the Auspex, although I suspect that only about 10,000 of them are
active).

I expect that a single large Unix server (assuming it had the I/O bandwidth)
would have worked, but that would have been seen (rightly or wrongly) as one
mainframe replacing the other and therefore not distributed. There are some
advantages to the current setup: the researchers are all fighting each other
for CPU time on the three research machines, but that fighting (unlike on
the MTS mainframe) doesn't impact the people using the general login server
for Mail and News (where this is being written, in fact!). This machine has
been seen to support somewhat more than 110 users at once without any
performance problems, and we don't (yet!) know how many more it would
support (there was a limit of 99 logins in place during some of the peak
times). CPU bound jobs (in fact most of anything other than E-mail, NetNews
and editing) get stopped and told to use one of the research machines. Since
E-mail demand is great here, and used to be performance bound on the
mainframe, I expect a majority of the users are happier (they can also elect
to read their mail on their Mac or PC via POP and never log in to Unix at
all, and many do!).
I mentioned backup up there, but this is getting pretty long again (and
somewhat outside the bounds of the charter, but he asked -:)), so maybe I'll
post it and see if the rest of you want me to shut up yet -:) before going
on to discuss Auspex backup and the terrible performance of Unix tape drives
(to say nothing of a total lack of tape support; make sure you have a VAX
around before converting from a mainframe to Unix -;)).

>On the other hand, perhaps your usage model is academic database -
>keeping the grades, generating the schedules, etc. In that case,
>why are you looking at a workstation network when there are several
>vendors, including Sequent (no flames please - I believe it's a
>reasonable objective comment) specializing in building larger "Unix
>boxes" that don't suffer from the bottlenecks you mention...at least
>not to the same extent as the IO bound machine you described.

The admin stuff runs on a VAX, and has been being converted from running
under OS/MVT (I don't believe IBM has supported MVT for at least 15 years;
it has been going away "in a year" ever since I got here 4 years ago. MTS
came in to replace it, MTS is gone, and MVT is still here ...).

>--
>Jeff Berkowitz, Sequent Computer Systems   j...@sequent.com  uunet!sequent!jjb
>"Each project starts as an intellectual exercise but ends as an athletic event"
>  - paraphrasing international GM Korchnoi, who was talking about chess matches.

Thanks for taking the time to respond to this. I am still trying to learn
how you capacity plan and/or specify things in Unix and NFS (and, when it
comes, AFS, because I think that's the answer for full distribution here),
since I expect our current config is probably only a stepping stone down the
path. There is also a lot of interest at other sites out there in how you
replace a mainframe with Unix (if you aren't MIT -:)); we held a BOF at the
last LISA hoping to find someone with the answers, and instead we found a
lot of people with the same questions who invited us to tell them how to do
it next year -:)

>>Peter Van Epp / van...@sfu.ca
>>Simon Fraser University, Burnaby, B.C. Canada

disclaimer: The above are all personal opinions and may not (probably don't
in some cases -:)) reflect the opinions of the university.