Path: utzoo!yunexus!geac!syntron!jtsv16!uunet!ncrlnk!ncrcae!hubcap!gatech!
ncar!ames!vsi1!wyse!mips!mash
From: m...@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: MIPS Performance Brief 3.5, October 1988 [more than long]
Keywords: benchmarks
Message-ID: <7000@winchester.mips.COM>
Date: 26 Oct 88 05:08:06 GMT
Article-I.D.: winchest.7000
Lines: 1380

People have been beating me up for this, so here it is.  As usual,
let me know if you have better numbers for things.  We're always trying
to get things up-to-date, but it's a Sisyphean task.
(Note that this is an extract from the printed version, crunched to
reduce the size for the net.)

------------READ NO FURTHER UNLESS YOU'RE A GLUTTON FOR NUMBERS

1.  Introduction

New Features of This Issue
Added to this issue are performance numbers for our new M/2000-8 systems, and
new 1.31 compiler numbers for a few benchmarks.  As a preview, a few numbers
are shown for the next version of the compilers.

Benchmarking - Caveats and Comments
While no one benchmark can fully characterize overall system performance, the
results of a variety of benchmarks can give some insight into expected real
performance.  A more important benchmarking methodology is a side-by-side com-
parison of two systems running the same real application.

We don't believe in characterizing a processor with just a single number, but
we follow (what seems to be) standard industry practice of using a mips-rating
that essentially describes overall integer performance.  Thus, we label a 20-
mips machine to be one that is about 20X (i.e., anywhere from 15X to 25X!) fas-
ter than a VAX 11/780 on integer performance, since this seems to be how most
people intuitively compute mips-ratings.  (We compare against VAX/VMS compilers
when possible, 4.3BSD otherwise.) Even within the same computer family, perfor-
mance ratios between processors vary widely.  For example, [McInnis 87] charac-
terizes a ``6 mips'' VAX 8700 as anywhere from 3X to 7X faster than the 11/780.
Floating point speed often varies more than integer speed, and scales up more
slowly, relative to the 11/780.  In practice, we find that MIPS RISComputer
mips-ratings are grossly similar to DEC's relative performance ratings, i.e., we've
tried to make one "MIPS-mip" about equal to one DEC "VUP" (VAX Unit of Perfor-
mance).  We try to use a benchmark mix, and we compare against DEC's best com-
pilers when at all possible, just as DEC does.  Note that VUPs are not MVUPs
(MicroVAX II Units of Performance): a MicroVAX-II is not as fast as a VAX
11/780 in many cases.

This paper analyzes one aspect of overall computer system performance - user-
level CPU performance.

MIPS Computer Systems does not warrant or represent that the performance data
stated in this document will be achieved by any particular application.  (We
have to say that, sorry.)

2.  Benchmark Summary

2.1.  Choice of Benchmarks

This brief offers both public-domain and MIPS-created benchmarks.  We prefer
public domain ones, but some of the most popular ones are inadequate for accu-
rately characterizing performance.  In this section, we give an overview of the
importance we attach to the various benchmarks, whose results are summarized on
the next page.

Dhrystone [DHRY 1.1] and Stanford [STAN INT] are two popular small integer
benchmarks.  Compared with the fastest VAX 11/780 systems, the M/120-5 is 13-
16X faster than the VAX on these tests, and yet, we rate the M/120-5 as a 12-
vax-mips machine.  In fact, if we chose different VAX software to compare
against, we could call the M/120-5 17-19 mips, right now.  However, our mips-
ratings are derived from the performance of real programs, and we conclude the
artificial benchmarks are not representative.  We observe that many vendors
claim mips-ratings based on the most favorable choice of benchmarks ("Dhrystone
mips", for example) or performance estimates for machines not built.  If you're
comparing an M/120-5 against such claims, it is a 19-mips machine.

While we include Dhrystone and Stanford, we feel that the performance of large
UNIX utilities, such as grep, yacc, diff, and nroff is a better (but not per-
fect!) guide to the performance customers will receive.  These four, which make
up our [MIPS UNIX] benchmark, demonstrate that performance ratios are not sin-
gle numbers, but range here from 10X to 16X faster than the VAX, and 17-24X on
the M/2000.

Even these UNIX utilities tend to overstate performance relative to large
applications, such as CAD applications.  Our own vax-mips ratings are based on
a proprietary set of larger and more stressful real programs, such as our com-
piler, assembler, debugger, and various CAD programs.

For floating point, the public domain benchmarks are much better.  We're still
careful not to use a single benchmark to characterize all floating point appli-
cations.

The Livermore Fortran kernels [LLNL DP] give insight into both vector and non-
vector performance for scientific applications.  Linpack [LNPK DP and LNPK SP]
tests vector performance on a single scientific application, and stresses cache
performance.  Spice [SPCE 2G6] and Doduc [DDUC] test a different part of the
floating point application spectrum.  The codes are large and thus test both
instruction fetch bandwidth and scalar floating point.

2.2.  Benchmark Summary Data

This section summarizes the most important benchmark results described in more
detail throughout this document.  The numbers show performance relative to the
VAX 11/780, i.e., larger numbers are better/faster.

o    A few numbers have been estimated by interpolations from closely-related
     benchmarks and/or closely-related machines.  The methods are given in
     great detail in the individual sections.

o    Several of the columns represent summaries of multiple benchmarks.  For
     example, the MIPS UNIX column represents 4 benchmarks, the SPICE 2G6
     column 3, and LLNL DP represents 24.

o    In the Integer section, MIPS UNIX is the most indicative of real perfor-
     mance.

o    For Floating Point, we especially like LLNL DP (Livermore FORTRAN ker-
     nels), but all of these are useful, non-toy benchmarks.

o    In the following table, "Pub mips" gives the manufacturer-published mips-
     ratings.  As in all tables in this document, the machines are listed in
     increasing order of performance according to the benchmarks, in this case,
     by Integer performance.

o    The summary includes only those machines for which we could get measured
     results on almost all the benchmarks and good estimates on the results for
     the few missing data items.

o    The next few pages contain a summary table and graph.

                         Summary of Benchmark Results
                     (VAX 11/780 = 1.0, Larger is Faster)

   Integer (C)            Floating Point (FORTRAN)
MIPS   DHRY   STAN   LLNL  LNPK    LNPK    SPCE    DDUC    Publ
UNIX   1.1    INT     DP    DP      SP      2G6            mips   System
 1      1      1      1      1       1       1       1       1    VAX 11/780#

 2.1    1.9    3.0    1.9    2.9     2.5     1.6    *1.3     2    Sun-3/160 FPA

*4      4.1    4.4    2.8    3.3     3.4     2.4     1.7     4    Sun-3/260 FPA
 6.4    7.4    6.2    2.5    4.3     3.7     3.4     3.8     5    MIPS M/500

*6      5.9    6.2    5.9    7.1     5.6    *5.3     5.2     6    VAX 8700
 9.9   10.8   10.7    4.5    7.9     6.4     4.1     3.5    10    Sun-4/200

10.9   11.3   10.0    8.1    8.6    11.2     6.6     7.3     8    MIPS M/800
13.1   13.5   12.3   10.8   10.7    14.0     8.0     8.7    10    MIPS M/1000
14.9   15.6   13.3   12.1   15.0    16.0     9.7    11.1    12    MIPS M/120-5
21.9   24.1   21.3   18.2   25.7    23.6    16.0    17.0    20    MIPS M/2000-8

# VAX 11/780 runs 4.3BSD for MIPS UNIX, Ultrix 2.0 (vcc) for Stanford, VAX/VMS
  for all others.  Use of 4.3BSD (no global optimizer) probably inflates the
  MIPS UNIX column by about 10%.

* Although it is nontrivial to gather a full set of numbers, it is important to
  avoid holes in benchmark tables, as it is too easy to be misleading.  Thus,
  we had to make reasoned guesses at these numbers.  The MIPS UNIX values for
  VAX 8700 and Sun-3/260 were taken from the Published mips-ratings, which are
  consistent (+/- 10%) with experience with these machines.  DDUC was guessed
  by noting that most machines do somewhat better on DDUC than on SPCE, and
  that a Sun-3/260 is usually 1.5X faster than a Sun-3/160 on floating-point
  benchmarks.

Benchmark Descriptions:

MIPS UNIX
  MIPS UNIX benchmarks: grep, diff, yacc, nroff, same 4.2BSD C source compiled
  and run on all machines. The summary number is the geometric mean of the 4
  relative performance numbers.

DHRY 1.1
  Dhrystone 1.1, any optimization except inlining.

STAN INT
  Stanford Integer.

LLNL DP
  Lawrence Livermore Fortran Kernels, 64-bit.  The summary number is given as
  the relative performance based on the geometric mean, i.e., the "middle"
  of the 3 means.

LNPK DP
  Linpack Double Precision, FORTRAN.

LNPK SP
  Linpack Single Precision, FORTRAN.

SPCE 2G6
  Spice 2G6, 3 public-domain circuits, for which the geometric mean is shown.

DDUC
  Doduc Monte Carlo benchmark.

3.  Methodology

Tested Configurations

When we report measured results, rather than numbers published elsewhere, the
configurations were as shown below.  These system configurations do not neces-
sarily reflect optimal configurations, but rather the in-house systems to which
we had repeatable access.  When we've had faster results available, we've
quoted them in place of our own system's numbers.

DEC VAX-11/780
Main Memory:        8 Mbytes
Floating Point:     Configured with FPA board.
Operating System:   4.3 BSD UNIX.

MIPS M/800
CPU:                12.5 MHz R2000, in R2600 CPU board, 64K I-cache, 64K D-cache
Floating Point:     R2010 FPA chip (12.5MHz)
Main Memory:        8 Mbytes (2 R2350 memory boards)
Operating System:   UMIPS-BSD 2.1

MIPS M/1000
CPU:                15 MHz R2000, in R2600 CPU board, 64K I-cache, 64K D-cache
Floating Point:     R2010 FPA chip (15 MHz)
Main Memory:        16 Mbytes (4 R2350 memory boards)
Operating System:   UMIPS 3.0

MIPS M/120-5
CPU:                16.7 MHz R2000, 64K I-cache, 64K D-cache
Floating Point:     R2010 FPA chip (16.7 MHz)
Main Memory:        16 Mbytes (2 memory boards)
Operating System:   UMIPS 3.0

MIPS M/2000-8
CPU:                25 MHz R3000, 64K I-cache, 64K D-cache
Floating Point:     R3010 FPA chip (25 MHz)
Main Memory:        32 Mbytes
Operating System:   UMIPS 3.10

Test Conditions

All programs were compiled with -O (optimize), unless otherwise noted.

C is used for all benchmarks except Whetstone, LINPACK, Doduc, Spice 2G6,
Hspice, and the Livermore Fortran Kernels, which use FORTRAN.  When possible,
we've obtained numbers for VAX/VMS, and use them in place of UNIX numbers.  The
MIPS compilers are version 1.21 or 1.31.

User time was measured for all benchmarks using the /bin/time command.

Systems were tested in a normal multi-user development environment, with load
factor <0.2 (as measured by the uptime command).  Note that this occasionally makes
them run longer, due to slight interference from background daemons and clock
handling, even on an otherwise empty system.  Benchmarks were run at least 3
times and averaged.  The intent is to show numbers that can be reproduced on
live systems.

How to Interpret the Numbers

Times (or rates, such as for Dhrystones, Whetstones, and LINPACK KFlops) are
shown for the VAX 11/780.  Other machines' times or rates are shown, and their
relative performance ("Rel." column) normalized to the 11/780 treated as 1.0.
VAX/VMS is used whenever possible as the base.
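
As a small illustration of how a "Rel." figure falls out of the raw numbers,
here is a sketch in Python (our choice for illustration only; the function
names are made up for this example).  Times are divided one way and rates the
other, so that larger always means faster.  The inputs are taken from the
tables later in this brief.

```python
# Sketch: VAX-relative performance for times (smaller is better) and
# rates (larger is better).  Function names are illustrative only.

def rel_from_times(vax_secs, machine_secs):
    """Relative performance when the measurement is a user time in seconds."""
    return vax_secs / machine_secs

def rel_from_rates(vax_rate, machine_rate):
    """Relative performance when the measurement is a rate (e.g. Dhrystones/sec)."""
    return machine_rate / vax_rate

# grep user times from the MIPS UNIX table: 11/780 = 58.5 s, M/2000-8 = 2.5 s
grep_rel = rel_from_times(58.5, 2.5)

# Dhrystone 1.1 rates: 11/780 VAX/VMS = 1,757/s, M/2000 -O3 = 42,300/s
dhry_rel = rel_from_rates(1757, 42300)

print(f"grep: {grep_rel:.1f}  Dhrystone: {dhry_rel:.1f}")
```

Both results agree with the corresponding "Rel." entries in the tables (23.4
and 24.1).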

Compilers and Operating Systems

Unless specified otherwise the M-series benchmark numbers use Release 1.31 of
the MIPS compilers and UMIPS 3.0.  Compiler release 1.31 improved many of the
FORTRAN numbers, but changed integer performance relatively little.

UMIPS 3.0 (RISC/os) is a System V, Release 3.0 port, with TCP/IP, NFS, a Berke-
ley Fast File System, and other Berkeley features.  UMIPS 3.10 added support
for the M/2000, but is otherwise similar, and both use compiler release 1.31.
Most user-level programs run at about the same speed on UMIPS-BSD 2.1 and UMIPS
3.0.

Optimization Levels

Unless otherwise specified, all benchmarks were compiled -O, i.e., with optimi-
zation.  UMIPS compilers call this level -O2, and it includes global intra-
procedural optimization.  In a few cases, we show numbers for -O3 and -O4
optimization levels, which do inter-procedural register allocation and pro-
cedure merging.

Now, let's look at the benchmarks.  Each section title includes the (CODE NAME)
that relates it back to the earlier Summary, if it is included there.

4.  Integer Benchmarks

4.1.  MIPS UNIX Benchmarks (MIPS UNIX)

The MIPS UNIX Benchmarks are fairly typical of nontrivial UNIX commands.  This
benchmark suite provides the opportunity to execute the same code across
several different machines, in contrast to the compilers and linkers for each
machine, which have substantially different code.  These benchmarks contain
UNIX source code, and so are not generally distributable.  User time is shown;
kernel time is typically 10-15% of the user time, so these are good indications
of integer/character compute-intensive programs.  The old grep and nroff bench-
marks ran too quickly on the faster machines to be meaningful; we now use
longer tests.  The old versions are still shown for reference, but no longer
summarized.  Temporary machine unavailability forced us to estimate the speeds
on the Suns for these two tests, shown in italics.

Note: the Geometric Mean of N numbers is the Nth root of the product of those
numbers.  It is necessarily used in place of the arithmetic mean when computing
the mean of performance ratios, or of benchmarks whose runtimes are quite dif-
ferent.  See [Fleming 86] for a detailed discussion.
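
For readers who want to check the "Geom Mean" column themselves, the
definition above can be sketched as follows.  This is an illustration in
Python, not the script used to produce the tables; the helper name is ours,
and the inputs are the M/800 and M/2000-8 "Rel." values from the table below.

```python
import math

def geometric_mean(values):
    """Nth root of the product of N numbers, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Relative performance on grep, diff, yacc, nroff (from the table below)
m800 = [11.5, 11.4, 9.0, 11.8]
m2000 = [23.4, 21.8, 18.7, 24.0]

gm_m800 = geometric_mean(m800)
gm_m2000 = geometric_mean(m2000)

print(f"M/800:    {gm_m800:.1f}")
print(f"M/2000-8: {gm_m2000:.1f}")
```

One reason to prefer it for ratios: the geometric mean of ratios does not
depend on which machine is chosen as the 1.0 baseline, whereas the arithmetic
mean does.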

                         MIPS UNIX Benchmarks Results
                  (4.3BSD VAX 11/780 = 1.0, Larger is Faster)
  grep       diff       yacc      nroff    Geom                 old-grep  old-nroff
Secs Rel.  Secs Rel.  Secs Rel.  Secs Rel. Mean  System         Secs Rel. Secs Rel.

58.5  1.0 246.4  1.0 101.1  1.0 108.1  1.0  1.0  11/780 4.3BSD  11.2  1.0 18.8  1.0
29.2  2.0 105.3  2.3  48.1  2.1  51.5  2.1  2.1  Sun-3/160M      5.6  2.0  9.0  2.1
 7.8  7.5  35.8  6.9  19.5  5.2  17.5  6.2  6.4  MIPS M/500      2.4  4.7  3.3  5.7

 5.1 11.5  25.1  9.7  11.8  8.6  10.6 10.2  9.9  Sun-4/200 -O3   1.6  7.0  2.2  8.6
 5.1 11.5  21.7 11.4  11.2  9.0   9.2 11.8 10.9  MIPS M/800      1.6  7.0  1.9  9.9
 4.2 13.9  18.0 13.7   9.3 10.9   7.6 14.2 13.1  MIPS M/1000     1.3  8.6  1.5 12.5

 3.7 15.8  15.7 15.7   8.1 12.5   6.8 15.9 14.9  MIPS M/120-5    1.1 10.2  1.3 14.5
 2.5 23.4  11.3 21.8   5.4 18.7   4.5 24.0 21.9  MIPS M/2000-8   0.7 16.0  0.9 20.9

Note: in order to assure "apples-to-apples" comparisons, we moved the same
copies of the (4.2BSD) sources for these to the various machines, compiled them
there, and ran them, to avoid surprises from different binary versions of com-
mands resident on these machines.

Note that the granularity here is at the edge of UNIX timing, i.e., tenths of
seconds make differences, especially on the faster machines, although we've
pushed this back by beefing up the grep and nroff benchmarks.  Of course, by
1989, the smaller benchmarks will be in trouble again, but by then, we hope to
replace these with a set of better ones anyway.

Sun estimates: we had to estimate the Sun performance on several benchmarks,
where numbers are shown in italics.  We assumed that the Sun-3/160M performance
ratio would stay about constant, and so kept the same ratios as the old-grep
and old-nroff columns.  For the Sun-4, we computed the grep number by improving
the performance ratio over old-grep by the largest factor of improvement found
among the MIPS machines (the M/800): (11.5/7) * 7 = 11.5.  We did the same for
nroff: (11.8/9.9) * 8.6 = 10.2.  The result seems appropriate: the Sun-4/2xx
has usually shown slightly lower integer performance than an M/800.
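
The estimating procedure just described can be written out as a short sketch.
This is Python for illustration only; the dictionary layout and the function
name are ours, and the numbers are the "Rel." values from the table above.
For each test it finds the MIPS machine whose new/old ratio improved the
most, then applies that factor to the Sun-4's old ratio.

```python
# (new Rel., old Rel.) for grep and nroff on the MIPS machines, from the table
mips = {
    "M/500":    {"grep": (7.5, 4.7),   "nroff": (6.2, 5.7)},
    "M/800":    {"grep": (11.5, 7.0),  "nroff": (11.8, 9.9)},
    "M/1000":   {"grep": (13.9, 8.6),  "nroff": (14.2, 12.5)},
    "M/120-5":  {"grep": (15.8, 10.2), "nroff": (15.9, 14.5)},
    "M/2000-8": {"grep": (23.4, 16.0), "nroff": (24.0, 20.9)},
}

# Sun-4/200 ratios on the old, shorter benchmarks
sun4_old = {"grep": 7.0, "nroff": 8.6}

def estimate(test):
    """Return (machine with largest improvement, estimated Sun-4 new ratio)."""
    machine, factor = max(
        ((name, d[test][0] / d[test][1]) for name, d in mips.items()),
        key=lambda pair: pair[1],
    )
    return machine, factor * sun4_old[test]

for test in ("grep", "nroff"):
    machine, est = estimate(test)
    print(f"{test}: largest improvement on {machine}, Sun-4 estimate {est:.2f}")
```

With these inputs the M/800 shows the largest improvement on both tests.  The
grep estimate comes out at 11.5, and the nroff estimate at about 10.25, which
appears in the table as 10.2.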

The new benchmarks seem better indicators of real performance than the old
ones, which exercised less code.  For example, the new nroff benchmark uses a
macro package, which is a little more realistic.  Still, the performance change
means that one should take care in running randomly-chosen UNIX commands: sim-
ple changes to improve timing have added several apparent mips!  Fortunately,
these are not the actual benchmarks we use in our own relative-performance com-
putations, so we still think the M/120 is a 12-VUPs machine, not 15-VUPs.

Note this benchmark set is run versus 4.3BSD, not versus Ultrix 2.0 with vcc.
From experience, we'd guess that subtracting 10%-15% from most of the computed
mips-ratings would give a good estimate of the Ultrix 2.0 (vcc)-relative mips-
ratings, depending on the machine's performance on more stressful benchmarks.

With this background, it is interesting to analyze [AMD 88], which supplies
diff, grep, and nroff benchmarks that are, of course, not exactly the same
benchmarks as we used.  It computed VAX-11/780 (4.3BSD)-relative performance
ratios, as we have, and included Sun-4 performance ratios from an earlier issue
of this Brief.  As seen already, there can easily be several VAX-mips' differ-
ence in choosing the specific benchmark, so we can't recommend mixing bench-
marks together this way, especially as the AMD cases have run times 15-80X
shorter than the current MIPS cases, and are simulations for the 29000s.  We
don't know whether or not the AMD simulator does cache flushing appropriate to
running something in a UNIX environment, which adds even more difficulty making
the comparison.  But, to try to get even the most tenuous comparison with the
AMD 29000, we'll use the AMD technique, combining data from our last table with
that of AMD's:
                        MIPS UNIX + AMD UNIX Benchmarks
                  (4.3BSD VAX 11/780 = 1.0, Larger is Faster)
        grep             diff             nroff          Geom
oldMIPS  newMIPS AMD   MIPS AMD   oldMIPS  newMIPS AMD   Mean  System
   1        1     1     1    1       1        1     1     1    VAX 11/780#

   -        -     3.0   -    3.6     -        -     3.1   3.2  Sun 3/60

  7.0     11.5    -     9.7  -       8.6    10.2    -     9.3  Sun-4/200
  7.0     11.5    -    11.4  -       9.9    11.8    -    10.1  MIPS M/800
   -        -    10.7   -   13.4     -        -    10.8  11.4  29K VRAM, 25MHz
  8.6     13.9    -    13.7  -      12.5    14.2    -    12.4  MIPS M/1000

 10.2     15.8    -    15.7  -      14.5    15.9    -    14.2  MIPS M/120
   -        -    14.4   -   18.4     -        -    13.6  15.3  29K cache, 25MHz

 16.0     23.4    -    21.8  -      20.5    24.0    -    20.9  MIPS M/2000

This is haphazard data at best, but we think this says that the cached 29000 @
25MHz acts somewhat like a MIPS M/120, or slightly faster.  (On grep, the AMD
sits between the two M/120 numbers, on diff it is faster, and on nroff it is
slower.) We'd call it 13 MIPS-mips, maybe 14 as compilers improve, or caches
are made larger.  None of these benchmarks stress the caches, so it is diffi-
cult to guess what would happen on larger benchmarks.  To be fair, that may be
irrelevant anyway, as the 29000 seems tuned more for controller environments
than large-system environments.

4.2.  Dhrystone (DHRY 1.1)

Dhrystone is a synthetic programming benchmark that measures processor and com-
piler efficiency in executing a ``typical'' benchmark. The Dhrystone results
shown below are measured in Dhrystones / second, using the 1.1 version of the
benchmark. We include Dhrystone because it is popular.  MIPS systems do
extremely well on it.  However, comparisons of systems based on Dhrystone and
especially, only on Dhrystone, are unreliable and should be avoided.  See
details at the end of this section.  Results for a given machine are typically
about 15% less for 1.1 than with 1.0, and another 10% less for 2.x.  We've
found that most unlabeled Dhrystones offered by vendors use 1.1, so we still
summarize that version, but we now include Dhrystone 2.1.

Advice for running Dhrystone has changed over time.  It used to ask people to
turn off anything but peephole optimization, as the benchmark contained a mod-
est amount of "dead" code.  (This is one of the things that Dhrystone 2
attempts to fix.) However, many people actually were submitting optimized
results, often unlabeled, confusing everyone.  Currently, any numbers can be
submitted, as long as they're appropriately labeled and avoid procedure
inlining, which is done by only a few very advanced compilers.

We continue to include a range of numbers to show the difference optimization
technology makes on this particular benchmark, and to provide a range for com-
parison when others' cited Dhrystone figures are not clearly defined by optimi-
zation levels.  For example, -O3 does interprocedural register allocation, and
-O4 does procedure inlining; -O4 is beyond the spirit of the benchmark.  Sun's
-O3 and our -O3 do different things, but neither does inlining, so we cite
those numbers.

Compare the performance of the two Ultrix compilers.  Also, see the MIPS and
Sun-4 numbers for the performance gained by the high-powered optimizers avail-
able on these machines.

      Dhrystone (1.1, some 2.1) Benchmark Results - Optimization Effects

    No Opt             -O          -O3     -O4
NoReg    Regs    NoReg    Regs     Regs    Regs
Dhry's  Dhry's   Dhry's  Dhry's   Dhry's  Dhry's
 /Sec    /Sec     /Sec    /Sec     /Sec    /Sec    System
 1,442   1,474    1,559   1,571                    DEC VAX 11/780, 4.3BSD
 2,800   3,025    3,030   3,325                    Sun-3/160M
 4,896   5,130    5,154   5,235                    DEC VAX 8600, Ultrix 1.2
 8,800  10,200   12,300  12,300   13,000  14,200   MIPS M/500
 8,000   8,000    8,700   8,700                    DEC VAX 8550, Ultrix 2.0 cc
 9,600   9,600    9,600   9,700                    DEC VAX 8550, Ultrix 2.0 vcc
10,550  12,750   17,700  17,700   19,000           Sun-4/200, SunOS 3.2L
12,800  15,300   18,500  18,500   19,800  21,300   MIPS M/800
15,100  18,300   22,000  22,000   23,700  25,000   MIPS M/1000
18,700  21,500   25,800  25,800   27,400  29,200   MIPS M/120-5
30,700  32,400   39,700  39,700   42,300  45,300   MIPS M/2000-8
                                                        DHRYSTONE 2.1
19,000  20,400   23,200  23,200   24,700  27,000   MIPS M/120-5
31,300  33,000   36,700  36,700   38,800  42,800   MIPS M/2000-8

Other published numbers include the following, which are taken from [Richardson
87], unless otherwise noted.  Items marked * are those that we know (or have
good reason to believe) use optimizing compilers.  These are the "register"
versions of the numbers, i.e., the highest ones reported.

                        Dhrystone 1.1 Benchmark Results
Dhry's
 /Sec        Rel.       System

 1,571       0.9        VAX 11/780, 4.3BSD [in-house]
 1,757       1.0        VAX 11/780, VAX/VMS 4.2 [Intergraph 86]*
 3,850       2.2        Sun-3/100 [Muchnick 88]

 6,374       3.6        Sun-3/260, 25MHz 68020, SunOS 3.2
 6,423       3.7        VAX 8600, 4.3BSD
 6,440       3.7        IBM 4381-2, UTS V, cc 1.11
 6,896       3.9        Intergraph InterPro 32C, SYSV R3 3.0.0, Greenhills, -O*
 7,109       4.0        Apollo DN4000 -O
 7,140       4.1        Sun-3/200 [Muchnick 88] *
 7,249       4.2        Convex C-1 XP 6.0, vc 1.1
 7,409       4.2        VAX 8600, VAX/VMS in [Intergraph 86]*
 7,655       4.4        Alliant FX/8 [Multiflow]

 8,300       4.7        DG MV20000-I and MV15000-20 [Stahlman 87]
 8,309       4.7        InterPro-32C,30MHz Clipper,Green Hills[Intergraph 86]*
 9,436       5.4        Convergent Server PC, 20MHz 80386, GreenHills*
 9,920       5.6        HP 9000/840S [HP 87]
10,416       5.9        VAX 8550, VAX/VMS 4.5, cc 2.2*
10,787       6.1        VAX 8650, VAX/VMS, [Intergraph 86]*

11,215       6.4        HP 9000/840, HP-UX, full optimization*
12,639       7.2        HP 9000/825S [HP 87]*
13,000       7.4        MIPS M/500, 8MHz R2000, -O3*
13,157       7.5        HP 825SRX [Sun 87]*

14,109       8.0        Sun-4/110 * [Sun 88]
14,195       8.1        Multiflow Trace 7/200 [Multiflow]
14,820       8.4        CRAY 1S
15,007       8.5        IBM 3081, UTS SVR2.5, cc 1.5
15,576       8.9        HP 9000/850S [HP 87]

18,530       10.5       CRAY X-MP
19,000       10.8       Sun-4/200 [Muchnick 88], -O3*
19,800       11.3       MIPS M/800, 12.5MHz R2000, -O3*

23,430       13.3       HP 835S [RISC Mgmt 88]
23,700       13.5       MIPS M/1000, 15MHz R2000, -O3*
27,400       15.6       MIPS M/120-5, 16.7MHz R2000, -O3*
28,846       16.4       Amdahl 5860, UTS-V, cc1.22
31,250       17.8       IBM 3090/200
34,000       19.4       Motorola 88000, unknown configuration [RISC Mgmt 88]
35,653       20.3       AMD 29000, 25MHz, 2 8K caches (simulation) [AMD 88]

42,300       24.1       MIPS M/2000, -O3*
43,668       24.9       Amdahl 5890/300E, cc -O
53,108       30.2       CCI Power 7/64 (simulation) [Simpson 88]

Unusual Dhrystone Attributes

We've calibrated this benchmark against many more realistic ones, and we
believe that its results must be treated with care, because the detailed pro-
gram statistics are unusual in some ways.  It has an unusually low number of
instructions per function call (35-40 on our machines), where most C programs
fall in the 50-60 range or higher.  Stated another way, Dhrystone does more
function calls than usual, which especially penalizes the DEC VAX, making this
a favored benchmark for inflating one's "VAX-mips" rating.  Any machine with a
lean function call sequence looks a little better on Dhrystone than it does on
others.

The dynamic nesting depth of function calls inside the timed part of Dhrystone
is low (3-4).  This means that most register-window RISC machines would never
even once overflow/underflow their register windows and be required to
save/restore registers.

This is not to say fast function calls or register windows are inherently bad
(they're not!), merely that this benchmark overstates their performance
effects.

Dhrystone can spend 30-40% of the time in the strcpy function, copying atypi-
cally long (30-character) strings, which happen to be alignable on word boun-
daries, unlike more typical uses.  More realistic programs don't spend this
much time in this sort of code, and when they do, they handle shorter
strings: 6 characters would be much more typical.  Even odder, the only serious
use of partial-word operations in Dhrystone is in hand-coded routines, not in
compiler-generated code.

On our machines, Dhrystone uses 0-offset addressing for 50% of memory data
references (dynamic).  Most real programs use 0-offsets 10-15% of the time.
This, and the previous effect, make some machines look better on Dhrystone than
they would on more typical programs.  In particular, this benchmark is very
kind to the AMD 29000, as it exercises none of the architectural areas where we
believe the 29000 would lose performance on more realistic programs:

o supports only 0-offsets [Dhrystone uses 0-offsets heavily]

o expensive partial-word load/stores [Dhrystone doesn't use them]

o supports byte-comparison for trailing-zero [useful], but especially helps
  Dhrystone due to atypical use of strings.

Of course, Dhrystone is a fairly small benchmark, and thus fits into almost any
reasonable instruction cache.

In conclusion, Dhrystone gives some indication of user-level integer perfor-
mance, but is susceptible to surprises when comparing amongst architectures
that differ strongly.  Unfortunately, the industry seems to lack a good set of
widely-available integer benchmarks that are as representative as are some of
the popular floating point ones.

4.3.  Stanford Small Integer Benchmarks (STAN INT)

The Computer Systems Laboratory at Stanford University has collected a set of
programs to compare the performance of various systems.  These benchmarks are
popular in some circles as they are small enough to simulate, and are respon-
sive to compiler optimizations.

It is well known that small benchmarks can be misleading.  In particular, on
the faster machines, the resolution of the shorter benchmarks is really not
very good, i.e., a time shown as ".05" is about 3 clock-ticks.  We definitely
think this benchmark overstates performance.
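
To make the resolution problem concrete, here is a rough sketch in Python.
It assumes a 60 Hz clock tick, which is our inference from ".05 is about 3
clock-ticks" rather than a stated figure, and the helper names are invented
for the example.  It shows how much of a measured time a single tick of
error represents.

```python
TICK_HZ = 60  # assumption: inferred from ".05 s is about 3 clock-ticks"

def ticks(seconds, hz=TICK_HZ):
    """Number of clock ticks covered by a measured time."""
    return seconds * hz

def one_tick_fraction(seconds, hz=TICK_HZ):
    """Fraction of the measured time that a one-tick error represents."""
    return 1.0 / ticks(seconds, hz)

for t in (0.05, 0.5, 5.0):
    print(f"{t:5.2f} s = {ticks(t):6.1f} ticks, "
          f"one tick = {one_tick_fraction(t):.1%} of the measurement")
```

Under that assumption, a one-tick error on a .05-second run is about a third
of the measurement, while on a 5-second run it is well under 1%.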

                   Stanford Small Integer Benchmark Results
Perm Tower Queen Intmm Puzzle Quick Bubble Tree  Geo   Rel.
Secs Secs  Secs  Secs  Secs   Secs  Secs   Secs  Mean  Perf System
2.34 2.30  .94   1.67  11.23  1.12  1.51   2.72  2.14  .7   VAX 11/780 4.3BSD
                                                 1.60   1.0 VAX 11/780@

.63  .63   .27   .73   2.96   .31    .44    .69   .62   2.6 VAX 8600 Ultrix1.2
.75  .95   .30   .40   1.82   .34    .39   1.24   .53   3.0 Sun-3/100
.41  .48   .18   .25   1.09   .20    .23    .70   .36   4.4 Sun-3/200 -O3
.28  .35   .17   .42   2.22   .18    .25    .35   .35   4.6 VAX 8550#

.28  .35   .13   .15    .88   .13    .17    .50   .26   6.2 VAX 8550##
.18  .24   .15   .23   1.15   .17    .19    .34   .26   6.2 MIPS M/500
                                                  .22   7.3 Sun-4/110 [Sun 88]

.12  .16   .11   .13    .61   .10    .12    .22   .16  10.0 MIPS M/800
.11  .17   .09   .15    .55   .10    .12    .20   .15  10.7 Sun-4/200 -O3

.097 .124  .067  .135   .694  .089   .124   .142  .136 11.8 29K+VRAM [AMD 88]
.10  .13   .10   .11    .51   .08    .10    .17   .13  12.3 MIPS M/1000
.096 .118  .077  .089   .458  .072   .092   .164  .118 13.3 MIPS M/120-5
.066 .096  .052  .120   .559  .077   .089   .130  .109 14.7 29K cache [AMD 88]

.065 .078  .045  .059   .303  .048   .060   .108  .075 21.3 M/2000-8

*   Stanford's old Aggregate Weighting has been replaced by the Geometric
    Mean as a more understandable measure.  We thank people at Sun Microsystems
    for promoting this improvement.  Among other things, it brings the numbers
    closer (down) to relative performance numbers observed on more substantial
    benchmarks.

@   Estimated VAX 11/780 Ultrix 2.0 vcc -O time.  We get this in either of two
    ways.  First, by scaling the 11/780 BSD time by the 8550's vcc/cc ratio:
    (11/780 BSD cc) * (VAX 8550 Ultrix vcc) / (VAX 8550 Ultrix cc)
    2.14 * .26 / .35 = 1.59 (use 1.60)
    Second, the 8550 is rated as approximately a 6-VUP machine, and
    6 * .26 = 1.56, so the guess is probably close, likely to be within the
    range 1.50-1.70.

    We estimate this number only because it's been hard for us to get, and we
    don't think this benchmark is crucially important.

#   Ultrix 2.0 cc -O

##  Ultrix 2.0 vcc -O.  The quick and bubble tests actually had errors; how-
    ever, the times were in line with expectations (these two optimize well),
    so we used them.  All 8550 numbers thanks to Greg Pavlov
    (ames!harvard!hscvax!pavlov, of Amherst, NY).
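
The two estimates in the @ note can be reproduced with a few lines of Python
(for illustration only; the variable names are ours).  The first scales the
11/780 BSD time by the ratio of the two compilers on the 8550, which assumes
that ratio carries over to the 11/780.  The second multiplies the 8550 vcc
time by its approximate VUP rating.

```python
bsd_cc_780 = 2.14   # 11/780, 4.3BSD cc: geometric mean, seconds
vcc_8550 = 0.26     # VAX 8550, Ultrix 2.0 vcc -O
cc_8550 = 0.35      # VAX 8550, Ultrix 2.0 cc -O
vups_8550 = 6       # approximate VUP rating of the 8550

# Way 1: apply the 8550's vcc/cc ratio to the 11/780 cc time
by_compiler_ratio = bsd_cc_780 * vcc_8550 / cc_8550

# Way 2: an N-VUP machine should take about 1/N of the 11/780 time
by_vup_rating = vups_8550 * vcc_8550

print(f"compiler ratio: {by_compiler_ratio:.2f} s")
print(f"VUP rating:     {by_vup_rating:.2f} s")
```

Both figures land inside the 1.50-1.70 range quoted in the note, roughly 1.59
and 1.56.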

The Sun numbers are from [Muchnick 88], and reflect the latest (as of this
writing) Sun compiler technology, which has improved these numbers for all Sun
systems over the last year.

5.  Floating Point Benchmarks

5.1.  Livermore Fortran Kernels (LLNL DP)

Lawrence Livermore National Labs' workload is dominated by large scientific
calculations that are largely vectorizable.  The workload is primarily served
by expensive supercomputers.  This benchmark was designed for evaluation of
such machines, although it has been run on a wide variety of hardware,
including workstations and PCs [McMahon 86].

The Livermore Fortran Kernels are 24 pieces of code abstracted from the appli-
cations at Lawrence Livermore Labs.  These kernels are embedded in a large,
carefully engineered benchmark driver.  The driver runs the kernels multiple
times on different data sets, checks for correct results, verifies timing accu-
racy, reports execution rates for all 24 kernels, and summarizes the results
with several statistics.

Unlike many other benchmarks, there is no attempt to distill the benchmark
results down to a single number.  Instead all 24 kernel rates, measured in
mflops (million floating point operations per second) are presented individu-
ally for three different vector lengths (a total of 72 results).  The minimum
and maximum rates define the performance range of the hardware.  Various
statistics of the 24 or 72 rates, such as the harmonic, geometric, and arith-
metic means give insight into general behavior.  Any one of these statistics
might suffice for comparisons of scalar machines, but multiple statistics are
necessary for comparisons involving machines with vector or parallel features.
These machines have unbalanced, bimodal performance, and a single statistic is
insufficient characterization.  McMahon asserts:

    ``When the computer performance range is very large the net Mflops rate
    of many Fortran programs and workloads will be in the sub-range between
    the equi-weighted harmonic and arithmetic means depending on the degree
    of code parallelism and optimization.  More accurate estimates of cpu
    workload rates depend on assigning appropriate weights for each kernel.''

McMahon's analysis goes on to suggest that the harmonic mean corresponds to
approximately 40% vectorization, the geometric mean to approximately 70% vec-
torization, and the arithmetic mean to 90%+ vectorization.  These three statis-
tics can be interpreted as different benchmarks that each characterize certain
applications.  For example, there is fair agreement between the kernels' har-
monic mean and Spice performance.  LINPACK, on the other hand, is better
characterized by the geometric mean.
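
To see why the degree of vectorization moves the net rate between the means,
consider a simplified two-speed machine.  The sketch below is the usual
work-weighted harmonic mean (an Amdahl-style model), not McMahon's own
analysis; the function name and the scalar and vector rates are hypothetical
and are not taken from any table in this brief.

```python
def net_rate(scalar_mflops, vector_mflops, vector_fraction):
    """Net mflops when `vector_fraction` of the operations run at vector speed."""
    # Time per operation is the work-weighted sum of the two per-op times.
    time_per_op = ((1.0 - vector_fraction) / scalar_mflops
                   + vector_fraction / vector_mflops)
    return 1.0 / time_per_op

# Hypothetical machine: 2 mflops scalar, 40 mflops vector.
for fraction in (0.0, 0.4, 0.7, 0.9, 1.0):
    rate = net_rate(2.0, 40.0, fraction)
    print(f"{fraction:4.0%} vectorized: {rate:6.2f} mflops")
```

Even at 90% vectorization the net rate stays far below the vector peak, which
is why a single statistic says little about a bimodal machine.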

The complete M/120-5 data shows that MIPS performance is insensitive to vector
length.  The minimum to maximum variation is also small for this benchmark.
Both characteristics are typical of scalar machines with mature compilers.
Performance of vector and parallel machines, on the other hand, may span two
orders of magnitude on this benchmark, or more, depending on the kernel and the
vector length.

                       64-Bit Livermore FORTRAN Kernels
                 MegaFlops, L = 167, Sorted by Geometric Mean
      Harm.  Geom.   Arith.           Rel.*
Min   Mean   Mean     Mean     Max    Geom.               System
 .05    .12    .12      .13      .24     .7  VAX 780 w/FPA 4.3BSD f77 [ours]
 .06    .16    .17      .18      .28    1.0  VAX 780 w/FPA VMS 4.1
 .11    .30    .33      .37      .87    1.9  SUN 3/160 w/FPA

 .20    .42    .46      .50     1.42    2.5  MIPS M/500, f77 1.21
 .17    .43    .48      .53     1.13    2.8  SUN 3/260 w/FPA [our numbers]
 .29    .58    .64      .70     1.21    3.8  Alliant FX/1 FX 2.0.2 Scalar
 .38    .72    .77      .83     1.57    4.5  SUN 4/200 w/FPA [Hough 87]

 .39    .94   1.00     1.04     1.64    5.9  VAX 8700 w/FPA VMS 4.1
 .10    .76   1.06     1.50     5.23    6.2  Alliant FX/1 FX 2.0.2 Vector
 .33    .92   1.06     1.20     2.88    6.2  Convex C-1 F77 V2.1 Scalar
 .52   1.09   1.19     1.30     2.74    7.0  ELXSI 6420 EMBOS F77 MP=1
 .51   1.26   1.37     1.48     2.70    8.1  MIPS M/800, f77 1.21

 .65   1.63   1.83     2.03     3.50   10.8  MIPS M/1000, f77 1.30
 .11   1.06   1.94     3.33    12.79   11.4  Convex C-1 F77 V2.1 Vector
 .80   1.85   2.06     2.27     3.89   12.1  MIPS M/120-5, f77 1.31
 .28   1.24   2.32     5.11    29.20   13.7  Alliant FX/8 FX 2.0.2 MP=8*Vec

 .95   2.75   3.10     3.42     5.82   18.2  MIPS M/2000, f77 1.31

1.0    3.1    3.6      4.0      6.5    21.2  MIPS M/2000, f77 1.40#

1.51   4.93   5.86     7.00    17.43   34.5  Cray-1S CFT 1.4 scalar
1.23   4.74   6.09     7.67    21.64   35.8  FPS 264 SJE APFTN64
3.43   9.29  10.68    12.15    25.89   62.8  Cray-XMP/1 COS CFT77.12 scalar
0.97   6.47  11.94    22.20    82.05   70.2  Cray-1S CFT 1.4 vector
4.47  11.35  13.08    15.20    45.07   76.9  NEC SX-2 SXOS1.21 F77/SX24 scalar

1.47  12.33  24.84    50.18   188     146    Cray-XMP/1 COS CFT77.12 vector
4.47  19.07  43.94   140     1042     258    NEC SX-2 SXOS1.21 F77/SX24 vector


* Relative Performance, as ratio of the Geometric Mean numbers.  This is a
  simplistic attempt to extract a single figure-of-merit.  We admit this goes
  against the intent of this benchmark suite, and apologize to Mr. McMahon, but
  we ran out of space in our summaries.

# Next version of the compiler system, not yet released to production, and
  hence not carried into summaries.  However, this nicely illustrates a case
  where compiler tuning added a VAX 8600's performance, almost.
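
The "Rel. Geom." column can be reproduced as follows.  This is a minimal
sketch assuming the base is the VAX 780 w/FPA VMS 4.1 row (geometric mean
.17); the helper name is ours.

```python
VAX_780_VMS_GEOM = 0.17  # geometric mean (mflops), VAX 780 w/FPA VMS 4.1 row

def rel_geom(geom_mean_mflops, base=VAX_780_VMS_GEOM):
    """Relative performance as the ratio of geometric means."""
    return geom_mean_mflops / base

# A few rows from the table above: (system, geometric mean in mflops).
rows = [("MIPS M/120-5, f77 1.31", 2.06),
        ("MIPS M/2000, f77 1.31", 3.10),
        ("Cray-1S CFT 1.4 vector", 11.94)]

for system, geom in rows:
    print(f"{rel_geom(geom):6.1f}  {system}")
```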

           Livermore FORTRAN Kernels - Complete MIPS M/120-5 Output
Vendor            MIPS    MIPS    MIPS    MIPS |   MIPS    MIPS    MIPS    MIPS
Model          M/120-5 M/120-5 M/120-5 M/120-5 |M/120-5 M/120-5 M/120-5 M/120-5
OSystem        V.3 3.0 V.3 3.0 V.3 3.0 V.3 3.0 |V.3 3.0 V.3 3.0 V.3 3.0 V.3 3.0
Compiler          1.31    1.31    1.31    1.31 |   1.31    1.31    1.31    1.31
OptLevel            O2      O2      O2      O2 |     O2      O2      O2      O2
Samples             72      24      24      24 |     72      24      24      24
WordSize            64      64      64      64 |     32      32      32      32
DO Span            167      19      90     471 |    167      19      90     471
Year              1988    1988    1988    1988 |   1988    1988    1988    1988
Kernel          ------  ------  ------  ------ | ------  ------  ------  ------
       1        2.8800  2.8800  2.9535  2.9459 | 3.9122  3.9122  3.9142  3.8487
       2        2.2009  2.2009  2.5339  2.5451 | 3.6809  3.6809  3.6518  2.9828
       3        2.8506  2.8506  2.9680  2.9677 | 4.1781  4.1781  4.0510  3.8582
       4        1.8240  1.8240  2.6133  2.9772 | 3.5978  3.5978  3.1571  2.2205
       5        2.0083  2.0083  2.0533  2.0438 | 3.2797  3.2797  3.2766  3.1695
       6        1.3938  1.3938  1.9006  1.9267 | 3.0934  3.0934  2.8727  1.9807
       7        3.8400  3.8400  3.8885  3.8846 | 5.0121  5.0121  4.9773  4.9192
       8        3.5009  3.5009  3.5273  3.5325 | 4.6659  4.6659  4.6961  4.6136
       9        3.5529  3.5529  3.5801  3.5833 | 4.4751  4.4751  4.4476  4.3809
      10        1.4000  1.4000  1.4017  1.4071 | 2.8119  2.8119  2.8116  2.8000
      11        1.4250  1.4250  1.4749  1.4808 | 2.7811  2.7811  2.6947  2.4806
      12        1.4410  1.4410  1.4760  1.5000 | 2.7827  2.7827  2.7007  2.6127
      13        0.8034  0.8034  0.8515  0.8684 | 1.0682  1.0682  1.0604  1.0510
      14        1.3824  1.3824  1.3555  0.9176 | 1.5660  1.5660  1.9144  1.8807
      15        1.0735  1.0735  1.0452  1.0476 | 1.4139  1.4139  1.4085  1.4494
      16        1.5332  1.5332  1.4803  1.5143 | 1.6219  1.6219  1.6097  1.6585
      17        2.6562  2.6562  2.5389  2.5452 | 3.2781  3.2781  3.2841  3.4185
      18        3.2112  3.2112  3.1598  3.1598 | 4.5896  4.5896  4.6200  4.4800
      19        2.8224  2.8224  2.9124  2.8772 | 3.5246  3.5246  3.5479  3.4005
      20        3.3757  3.3757  3.3471  2.6667 | 4.5008  4.5008  4.5147  4.5298
      21        2.0880  2.0880  2.2436  2.2955 | 3.5501  3.5501  3.4517  3.1764
      22        1.5131  1.5131  1.5228  1.5196 | 2.1875  2.1875  2.1782  2.1454
      23        3.1670  3.1670  3.3730  3.3600 | 4.3913  4.3913  4.4064  4.1833
      24        0.9238  0.9238  0.9283  0.9396 | 1.0874  1.0874  1.1057  1.0631
--------------  ------  ------  ------  ------ | ------  ------  ------  ------
Standard  Dev.  0.9189  0.9157  0.9180  0.9208 | 1.1609  1.1514  1.1537  1.1741
Median    Dev.  1.1275  1.1205  1.0009  1.0429 | 1.3376  1.4080  1.3476  1.2294

Maximum   Rate  3.8885* 3.8400  3.8885  3.8846 | 5.0121* 4.9192  4.9773  5.0121
Average   Rate  2.2670* 2.2028  2.2971  2.2711 | 3.1465* 3.0127  3.1814  3.2104
Geometric Mean  2.0628* 2.0030  2.0943  2.0608 | 2.8840* 2.7590  2.9219  2.9371
Median    Rate  2.2222  2.0481  2.3887  2.4203 | 3.2766  3.0762  3.2804  3.4022
Harmonic  Mean  1.8545* 1.8070  1.8856  1.8420 | 2.5773* 2.4784  2.6135  2.6092
Minimum   Rate  0.8034* 0.8034  0.8515  0.8684 | 1.0510* 1.0510  1.0604  1.0682

Maximum   Ratio 1.0000  0.9875  1.0000  0.9989 | 1.0000  0.9814  0.9930  1.0000
Average   Ratio 1.0000  0.9716  1.0132  1.0018 | 1.0000  0.9574  1.0110  1.0203
Geometric Ratio 1.0000  0.9710  1.0152  0.9990 | 1.0000  0.9566  1.0131  1.0184
Harmonic  Mean  1.0000  0.9743  1.0167  0.9932 | 1.0000  0.9616  1.0140  1.0123
Minimum   Rate  1.0000  1.0000  1.0598  1.0809 | 1.0000  1.0000  1.0089  1.0163

* These are the numbers brought forward into the summary section.
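
As an illustration of the summary statistics, the sketch below recomputes the
three means for one column of the table (64-bit, DO Span 19).  It assumes
equal weights for all 24 kernels and uses Python's standard statistics
module; the results can be compared with the Average Rate, Geometric Mean and
Harmonic Mean rows (2.2028, 2.0030 and 1.8070 for this column).

```python
import statistics

# 64-bit rates (mflops) for kernels 1-24, DO Span 19 column of the table above.
rates = [2.8800, 2.2009, 2.8506, 1.8240, 2.0083, 1.3938, 3.8400, 3.5009,
         3.5529, 1.4000, 1.4250, 1.4410, 0.8034, 1.3824, 1.0735, 1.5332,
         2.6562, 3.2112, 2.8224, 3.3757, 2.0880, 1.5131, 3.1670, 0.9238]

arith = statistics.fmean(rates)           # arithmetic mean
geom = statistics.geometric_mean(rates)   # geometric mean
harm = statistics.harmonic_mean(rates)    # harmonic mean

print(f"min {min(rates):.4f}  harm {harm:.4f}  geom {geom:.4f}  "
      f"arith {arith:.4f}  max {max(rates):.4f}")
```

For any set of positive rates the harmonic mean is the smallest of the three
and the arithmetic mean the largest, which matches the ordering of the
columns in the summary table.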

5.2.  LINPACK (LNPK DP and LNPK SP)

The LINPACK benchmark has become one of the most widely used single benchmarks
to predict relative performance in scientific and engineering environments.
The usual LINPACK benchmark measures the time required to solve a 100x100 sys-
tem of linear equations using the LINPACK package.  LINPACK results are meas-
ured in MFlops, millions of floating point operations per second.  All numbers
are from [Dongarra 88], unless otherwise noted.
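
For readers who want to turn a solve time into MFlops, here is a small
sketch.  It assumes the conventional operation count for the benchmark,
2/3 n^3 + 2 n^2, which is background knowledge rather than something stated
in this brief; the timing in the example is hypothetical.

```python
def linpack_flops(n):
    """Nominal operation count for solving an n x n system: 2/3 n^3 + 2 n^2."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2

def linpack_mflops(n, seconds):
    """MFlops for an n x n solve that took `seconds` of CPU time."""
    return linpack_flops(n) / seconds / 1.0e6

hypothetical_seconds = 0.327  # illustrative timing, not a measured value
print(f"{linpack_flops(100):.0f} floating point operations")
print(f"{linpack_mflops(100, hypothetical_seconds):.2f} MFlops")
```

The 100x100 case is under 0.7 million operations, so on fast machines the
measured interval is short and timer resolution matters.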

The LINPACK package calls on a set of general-purpose utility routines called
BLAS -- Basic Linear Algebra Subroutines -- to do most of the actual computa-
tion.  A FORTRAN version of the BLAS is available, and the appropriate routines
are included in the benchmark.  However, vendors are encouraged to provide
hand-coded versions of the BLAS as a library package.  Thus LINPACK results are
usually cited in two forms: FORTRAN BLAS and Coded BLAS.  The FORTRAN BLAS
actually come in two forms as well, depending on whether the loops are 4X
unrolled in the FORTRAN source (the usual) or whether the unrolling is undone
to facilitate recognition of the loop as a vector instruction.  According to
the ground rules of the benchmark, either may be used when citing FORTRAN BLAS
results, although it is typical to note rolled loops with the annotation
``(Rolled BLAS).''

For our own numbers, we've corrected a few to follow Dongarra more closely than
we have in the past.  LINPACK output produces quite a few MFlops numbers, and
we've tended to use the fourth one in each group, which uses more iterations,
and thus is more immune to clock randomness.  Dongarra uses the highest MFlops
number that appears, then rounds to two digits.
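
The difference between the two reporting conventions can be sketched as
below.  The MFlops values are hypothetical stand-ins for one group of
LINPACK output, and the function names are ours.

```python
import math

def round_sig(x, digits=2):
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - exponent)

def dongarra_figure(mflops_values):
    """Highest MFlops value that appears, rounded to two digits."""
    return round_sig(max(mflops_values), 2)

def fourth_in_group(mflops_values):
    """The fourth value in the group, which uses more iterations."""
    return mflops_values[3]

group = [3.52, 3.58, 3.61, 3.55]  # hypothetical output values
print(dongarra_figure(group), fourth_in_group(group))
```

The two conventions usually differ only in the second digit, but that is
enough to reorder closely spaced systems in a sorted table.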

Note that relative ordering even within families is not particularly con-
sistent, illustrating the extreme sensitivity of these benchmarks to memory
system design.

               100x100 LINPACK Results - FORTRAN and Coded BLAS
                  From [Dongarra 88], Unless Noted Otherwise
  DP      DP     SP      SP
Fortran  Coded Fortran  Coded                       System

  .10      .10    .11     .11  Sun-3/160, 16.7MHz (Rolled BLAS)+
  .11      .11    .13     .11  Sun-3/260,25MHz 68020+20MHz 68881 (Rolled BLAS)+
  .13      .16    .17     .22  DEC MicroVAX II, VAX/VMS
  .14      -      -       -    Apollo DN4000, 25MHz (68020 + 68881) [ENEWS 87]
  .14      -      .24     -    VAX 11/780, 4.3BSD, LLL Fortran [ours]
  .14      .17    .25     .34  VAX 11/780, VAX/VMS
  .20      -      .24     -    80386+80387, 20MHz, 64K cache, GreenHills

  .29      .49    .45     .69  Intergraph IP-32C,30MHz Clipper [Intergraph 86]

  .38      -      .67     -    80386+Weitek 1167,20MHz,64K cache, GreenHills
  .41      .41    .62     .62  Sun-3/160, Weitek FPA (Rolled BLAS)+
  .41      .45    .66     .79  DEC MicroVAX 3200/3500/3600, VAX/VMS
  .45      .54    .60     .74  HP9000 Model 840S [HP 87]
  .46      .46    .86     .86  Sun-3/260, Weitek FPA (Rolled BLAS)+
  .49      .66    .84    1.20  VAX 8600, VAX/VMS 4.5
  .49      .54    .62     .68  HP 9000/825S [HP 87]

  .57      .72    .86     .87  HP9000 Model 850S [HP 87]
  .60      .72    .93    1.2   MIPS M/500, f77 1.21
  .65      .76    .80     .96  VAX 8500, VAX/VMS
  .70      .96   1.3     1.9   VAX 8650, VAX/VMS

  .78      -     1.1      -    IBM 9370-90, VS FORT 1.3.0
  .86      -     1.2      -    Sun-4/110 [Sun 88]
  .99     1.2    1.4     1.7   VAX 8550/8700/8800, VAX/VMS
 1.1      1.1    1.6     1.6   SUN 4/200 (Rolled BLAS)+
 1.2      1.3    2.8     3.6   MIPS M/800, f77 1.31

 1.5      1.7    1.8     2.0   ELXSI 6420
 1.5      1.6    3.5     4.3   MIPS M/1000, f77 1.31
 1.6      2.0    1.6     2.0   Alliant FX-1 (1 CE)
 2.1       -     2.4      -    IBM 3081K H enhanced opt=3
 2.1      2.2    4.0     4.8   MIPS M/120-5, f77 1.31
 2.5!     2.5!    -       -    CCI Power 7/64 (simulation) [Simpson 88]
 3.0      3.3    4.3     4.9   CONVEX C-1/XP, Fort 2.0 (Rolled BLAS)
 3.6      3.9    5.9     7.1   MIPS M/2000-8, f77 1.31
 3.8      4.0    6.6     7.1   MIPS M/2000-8, f77 1.40 # (Rolled BLAS)

 6.0       -      -       -    Multiflow Trace 7/200 Fortran 1.4 (Rolled BLAS)
 7.6     11.0    7.6     9.8   Alliant FX-8, 8 CEs, FX Fortran, v2.0.1.9

12       23     n.a.    n.a.   CRAY 1S CFT (Rolled BLAS)
52       61     n.a.    67     ETA10-E (1 proc, 10.5ns)
56       60     n.a.    n.a.   CRAY X-MP/4 CFT (Rolled BLAS)

+ The Sun FORTRAN Rolled BLAS code appears to be optimal, so we used the same
  numbers for Coded BLAS.  The 4X unrolled numbers for Sun-4/200 are .86 (DP)
  and 1.25 (SP) [Hough 87].

! These numbers were given without specifying FORTRAN or Coded.

# Next version of compiler, not yet released.


                100x100 LINPACK Results - FORTRAN and Coded BLAS
       VAX 11/780, VAX/VMS Relative Performance For A Subset of the Systems
   Rel.    Rel.     Rel.    Rel.
    DP      DP       SP      SP
  Fortran  Coded   Fortran  Coded                      System

      .8      .6      .5       .3   Sun-3/260,25MHz 68020+20MHz 68881 (Rolled)
     1.0     1.0     1.0      1.0   VAX 11/780, VAX/VMS
     2.0     2.9     1.8      2.0   Intergraph IP-32C,30MHz Clipper [Intergraph 86]

     2.7     -       2.7      -     80386+Weitek 1167,20MHz,64K cache, GreenHills
     2.9     2.4     2.5      1.8   Sun-3/160, Weitek FPA (Rolled BLAS)
     3.3     2.7     3.4      2.5   Sun-3/260, Weitek FPA (Rolled BLAS)
     3.5     3.9     3.4      3.5   VAX 8600, VAX/VMS 4.5

     4.1     4.2     3.4      2.6   HP9000/850S [HP 87]
     4.3     4.2     3.7      3.5   MIPS M/500, f77 1.21

     6.1     -       4.8      -     Sun-4/110 [Sun 88]
     7.1     7.1     5.6      5.0   VAX 8550/8700/8800, VAX/VMS
     7.9     6.5     6.4      4.7   SUN 4/200 (Rolled BLAS)
     8.6     7.6    11.2     10.6   MIPS M/800, f77 1.31

    10.7     9.0    14.0     12.6   MIPS M/1000, f77 1.31
    11.4    11.8     6.4      5.9   Alliant FX-1 (1 CE)
    15.0    12.9    16.0     14.1   MIPS M/120-5, f77 1.31

    21.4    19.4    17.2     14.4   CONVEX C-1/XP, Fort 2.0 (Rolled BLAS)
    25.7    22.9    23.6     20.9   MIPS M/2000-8, f77 1.31

    54      65      30       28.8   Alliant FX-8, 8 CEs, FX Fortran, v2.0.1.9

   400     353        -       -     CRAY XMP/4

  The following lists various M/2000 MFLOPS numbers.  Note that the numbers
  vary substantially, even on this scalar machine.  Thus, if you're buying
  unlabeled MFLOPS, caveat emptor.

                         Assorted 64-Bit MFLOPS Measures
       Livermore FORTRAN            Gaussian       Matrix
            Kernels               Elimination     Multiply   Peak
      Harm.  Geom. Arith.      Linpk Linpk  1000x  50x50   Multiply
  Min Mean   Mean   Mean  Max  FORT  Coded  1000   Coded     /Add    System
  1.0  3.1    3.6    4.0  6.5   3.8   4.0    7.0    9.1      10.0   MIPS M/2000#

                         Assorted 32-Bit MFLOPS Measures
       Livermore FORTRAN            Gaussian       Matrix
            Kernels               Elimination     Multiply   Peak
      Harm.  Geom. Arith.      Linpk Linpk  1000x  50x50   Multiply
  Min Mean   Mean   Mean  Max  FORT  Coded  1000   Coded     /Add    System
  1.6  4.2    4.8    5.3  8.4   6.6   7.1    9.6    11.8     12.5   MIPS M/2000#

# Next version of compiler (f77 1.40), not yet released.

5.3.  Spice Benchmarks (SPCE 2G6)

Spice [UCB 87] is a general-purpose circuit simulator written at U.C. Berkeley.
Spice and its derivatives are widely used in the semiconductor industry.
It is a valuable benchmark because it shares many characteristics with other
real-world programs that are not represented in popular small benchmarks.  It
uses both integer and floating-point computation heavily.  The floating-point
calculations are not vector oriented, as in LINPACK.  Also, the program itself
is very large and therefore tests both instruction and data cache performance.

We have chosen to benchmark Spice version 2g.6 because of its general availa-
bility.  This is one of the later and more popular Fortran versions of Spice
distributed by Berkeley.  We felt that the circuits distributed with the Berke-
ley distribution for testing and benchmarking were not sufficiently large and
modern to serve as benchmarks.  We gathered and produced appropriate benchmark
circuits that can be distributed; they have since been posted to Usenet as
public domain.  The Spice group at Berkeley found these circuits to be
up-to-date and good candidates for Spice benchmarking.  In the table below,
"Geom Mean" is the geometric mean of the 3 "Rel." columns.

                          Spice2G6 Benchmarks Results
   digsr       bipole    comparator   Geom
  Secs Rel.   Secs Rel.   Secs Rel.   Mean System

1354.0 0.60  439.6 0.68  460.3 0.63    .6  VAX 11/780 4.3BSD, f77 V2.0
 993.5 0.81  394.3 0.76  366.9 0.80    .8  Microvax-II Ultrix 1.1, fortrel
 901.9 0.90  285.1 1.0   328.6 0.89    .9  SUN 3/160 SunOS 3.2 f77 -O -f68881
 848.0 0.95  312.6 0.96  302.9 0.96   1.0  VAX 11/780 4.3BSD, fortrel -opt
 808.1 1.0   299.1 1.0   291.7  1.0   1.0  VAX 11/780 VMS 4.4 /optimize
 744.8 1.1   221.7 1.3   266.0  1.1   1.2  SUN 3/260 SunOS 3.2 f77 -O -f68881
 506.5 1.6   170.0 1.8   189.1  1.5   1.6  SUN 3/160 SunOS 3.2 f77 -O -ffpa

 361.2 2.2   112.0 2.7   129.4  2.3   2.4  SUN 3/260 SunOS 3.2 f77 -O -ffpa
 296.5 2.7    73.4 4.1    83.0  3.5   3.4  MIPS M/500
 225.9 3.6    63.7 4.7    73.4  4.0   4.1  SUN 4/200 f77 -O3 -Qoption as -Ff0+

     -  -        -  -        -   -    5.3  VAX 8700 (estimate)
 136.5 5.9    42.6 7.0    41.4  7.0   6.6  MIPS M/800
 125.5 6.4    39.5 7.6    39.3  7.4   7.1  AMDAHL 470V7 VMSP FORTVS4.1
 114.3 7.1    35.4 8.4    34.5  8.5   8.0  MIPS M/1000
  92.4 8.7    28.5 10.5   29.7  9.8   9.7  MIPS M/120-5
  53.7 15.1   18.5 16.2   17.6 16.6   16.0 MIPS M/2000-8, f77 1.31

  48.0 16.8   12.5 23.9   17.5 16.7   18.9 FPS 20/64 VSPICE (2G6 derivative)

+   Sun numbers are from [Hough 87], who notes that the Sun-4 number used beta
    software, and that a few modules did not optimize.  Thus, these numbers
    should improve.
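
The "Geom Mean" column of the Spice table can be reproduced from the three
"Rel." columns.  The sketch below uses the printed (already rounded) Rel.
values for a few MIPS rows; the helper name is ours.

```python
import math

def geom_mean(values):
    """Geometric mean of positive values, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# (digsr Rel., bipole Rel., comparator Rel.) as printed in the table.
rel_columns = {
    "MIPS M/500":    (2.7, 4.1, 3.5),
    "MIPS M/800":    (5.9, 7.0, 7.0),
    "MIPS M/2000-8": (15.1, 16.2, 16.6),
}

for system, rels in rel_columns.items():
    print(f"{geom_mean(rels):5.1f}  {system}")
```

Using the geometric mean keeps one unusually fast circuit (bipole, for most
systems) from dominating the summary figure.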

Benchmark descriptions:

digsr    CMOS 9 bit Dynamic shift register with parallel load capability, i.e.,
         SISO (Serial Input Serial Output) and PISO (Parallel Input Serial Out-
         put), widely used in microprocessors.  Clock period is 10 ns.  Channel
         length = 2 um, Gate Oxide = 400 Angstrom.  Uses MOS LEVEL=2.

bipole   Schottky TTL edge-triggered register used as a synchronizer.

comparator
         Analog CMOS auto-zeroed comparator, composed of Input, Differential
         Amplifier and Latch.  Input signal is 10 microvolts.  Channel Length =
         3 um, Gate Oxide = 500 Angstrom.  Uses MOS LEVEL=3.  Each part is con-
         nected by capacitive coupling, which is often used for the offset can-
         cellation.  (Sometimes called Toronto, in honor of its source).

Hspice is a commercial version of Spice offered by Meta-Software, which
recently published benchmark results for a variety of machines [Meta-software
87].  (Note that the M/800 number cited there was before the UMIPS-BSD 2.1 and
f77 1.21 releases, and the numbers have improved).  The VAX 8700 Spice number
(5.3X) was estimated by using the Hspice numbers below for 8700 and M/800, and
the M/800 Spice number:
(5.5: 8700 Hspice) / (6.9: M/800 Hspice) X (6.6: M/800 Spice) yields 5.3X.
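
The same estimate, written out as a short sketch (variable names are ours;
the inputs are the relative-performance figures quoted above):

```python
vax_8700_hspice = 5.5   # VAX 8700, Hspice, relative to 11/780 VMS
m800_hspice = 6.9       # MIPS M/800, Hspice, relative to 11/780 VMS
m800_spice = 6.6        # MIPS M/800, Spice 2G6 geometric mean

# Assume the 8700-to-M/800 ratio seen on Hspice also holds for Spice 2G6.
vax_8700_spice_estimate = vax_8700_hspice / m800_hspice * m800_spice

print(f"{vax_8700_spice_estimate:.1f}X")
```

The estimate is only as good as the assumption that the two programs scale
alike on both machines.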

This section indicates that the performance ratios seem to hold for at least
one important commercial version as well.

                           Hspice Benchmarks Results
                                 HSPICE-8601K
              S2T30
 Secs                         Rel.                          System

166.5                           .6                          VAX 11/780, 4.2BSD
92.2                           1.0                          VAX 11/780 VMS
91.5                           1.0                          Microvax-II VMS

29.2                           3.2                          ELXSI 6400
29.1                           3.2                          Alliant FX/1
25.3                           3.6                          HyperSPICE (EDGE)

16.8                           5.5                          VAX 8700 VMS
16.3                           5.7                          IBM 4381-12

13.4                           6.9                          MIPS M/800 [ours]
11.3                           8.2                          MIPS M/1000 [ours]
 8.7                          10.6                          MIPS M/120-5 [ours]

 5.3                          17.4                          MIPS M/2000 [ours]

 3.27                         28.2                          IBM 3090
 2.71                         34.0                          CRAY-1S

Again, as in the less-vectorizable Livermore Kernels, the M/120-5 performs
about 30% as fast as a CRAY-1S, and the M/2000 50%.

Spice and Hspice are examples of large programs where the M/2000 outperforms
the M/120 by more than the clock ratio of 1.5X, illustrating the effects of a
more block-oriented memory system.

5.4.  Digital Review

The Digital Review magazine benchmark [DR 87] is a 3300-line FORTRAN program
that includes 33 separate tests, mostly floating-point, some integer.  The
magazine reports the times for all tests, and summarizes them with the
geometric mean seconds shown below.  Most numbers below are from [DR 87].  Note
that Digital Review gives relative performance using the MicroVax II as a basis
for comparison (MVUPS).  For consistency with the rest of this document, we use
the VAX 11/780, which significantly affects the ratios.

Digital Review has substantially revised their benchmark to fix various odd and
unrepresentative behaviors, such as having many of its tests dominated by the
time to call a large initialization routine.  Many of these conspire to show
lower VUPs ratings than is typical for machines running real programs.  See the
October 10, 1988 Digital Review for detail.  We leave the table below for the
time being, but do not ascribe much weight to it, and will shift to the new,
more realistic one shortly.  We applaud DR for several reasons.  First, they
try to offer some useful benchmarks in place of empty mips-ratings, which is
more than many magazines do.  Second, they are willing to listen to input and
improve the usefulness of their benchmarks.

We believe that recent Sun compiler work has improved the Sun-4s' performance,
but we do not yet have those numbers for sure, although the number shown is
reasonably consistent.

                 Digital Review Benchmarks Results (33 Tests)
 Secs            Rel.            System

9.17             0.7             VAXstation II/GPX, VMS 4.5
6.75             1.0             VAX 11/780, VMS [DEC], 6.80 [ours]
2.90             2.3             VAXstation 3200
2.32             2.9             VAX 8600, VMS 4.5
2.32             2.9             Sun-4/110 [Sun 88]
2.09             3.2             Sun-4/200, SunOS 3.2L [OLD]
1.86             3.6             MIPS M/500, f77 1.21 [ours]
1.72             3.9             Sun-4/200 3.2 Prod (secondhand, not confirmed)

1.584            4.2             VAX 8650
1.480            4.6             Alliant FX/8, 1 CE
1.469            4.6             VAX 8700
1.200            5.6             MIPS M/800, f77 1.21 [ours]
1.193            5.7             ELXSI 6420

 .990            6.8             MIPS M/1000, f77 1.21*
 .940            7.2             MIPS M/1000, f77 1.31 [ours]
 .783            8.6             MIPS M/120-5 [ours]
 .553            12.2            MIPS M/2000 [ours]

 .487            18.8            Convex C-1 XP

* The actual run number was .99, which [DR 87] reported as 1.00.

5.5.  Doduc Benchmark (DDUC)

This benchmark [Doduc 87] is a 5300-line FORTRAN program that simulates aspects
of nuclear reactors.  It has little vectorizable code, and is thought to be
representative of Monte-Carlo simulations.  The program is offered in both sin-
gle and double precision.  The original goal of using this piece of code as a
benchmark was to offer a rapid check on the good behavior of the compiler and
intrinsic functions.  In addition it can be used as a pure CPU benchmark with
an unusually high floating point percentage.  Some caveats are necessary.

This simulation iterates until certain conditions are met.  The number of bits
in the floating point format, the rounding algorithm, and the accuracy of math
libraries on different machines all affect the number of iterations required to
converge.

More ``accurate'' machines seem to require fewer iterations to converge, and
double precision seems to converge faster than single precision, although there
is no rigorous proof for either idea.  As a consequence, one would have to
scale the timing results for a fixed number of iterations to compare the timing
between different machines.  Fortunately the time required for each iteration
is constant during the run and the total number of iterations to convergence
varies very little (about 2 percent as measured on 10 different machines).
Refer to the author for more in-depth discussions.

Observed total number of iterations to converge:

        Single precision:  5881 (Sun_68881) to 6010 (CCI-6/32) (M1000=5906)
        Double precision:  5408 (Edge1)  to 5492 (CCI-6/32) (M1000=5479)

Performance is given as a number R, normalized to 100 (IBM 370/168-3) or 170
(IBM 3033-U):

        [ R = 48671/(Cpu_time_in_seconds) ]

Larger R's are better, and Cpu_time_in_seconds is the 64-bit version.
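
A short sketch of the R factor follows.  The R formula and the 11/780 base
(R = 26) come from this section; the function names are ours, and the
iteration-scaling helper is only an illustration of the normalization
discussed above, not something applied to the table.

```python
def doduc_r(cpu_seconds):
    """R factor for a 64-bit run: R = 48671 / CPU seconds."""
    return 48671.0 / cpu_seconds

def relative_to_780(r, r_780=26.0):
    """Performance relative to the VAX 11/780 VMS entry (R = 26)."""
    return r / r_780

def scaled_seconds(cpu_seconds, observed_iters, reference_iters):
    """Rescale a run to a fixed iteration count (time per iteration is constant)."""
    return cpu_seconds * reference_iters / observed_iters

for system, secs in [("MIPS M/500, -O2", 553), ("MIPS M/800, -O3", 256)]:
    r = doduc_r(secs)
    print(f"{r:5.0f} {relative_to_780(r):5.1f}  {system}")

# Spread of observed iteration counts quoted above.
print(f"SP spread {(6010 - 5881) / 5881:.1%}, DP spread {(5492 - 5408) / 5408:.1%}")
```

For example, rescaling the M/1000's 214-second, 5479-iteration run to the
lowest observed double precision count (5408) would change it by only about
3 seconds.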

                        64-Bit Doduc Benchmark Results
DoDuc R      Relative
Factor        Perf.        System
     17         0.7        Sun-3/110, 16.7MHz
     19         0.7        Intel 80386+80387, 16MHz, iRMX
     22         0.8        Sun-3/260, 25MHz 68020, 20MHz 68881
     26         1.0        VAX 11/780, VMS
     33         1.3        Fairchild Clipper, 30MHz, Green Hills

     43         1.7        Sun-3/260, 25MHz, Weitek FPA
     48         1.8        Celerity C1260
     50         1.9        CCI Power 6/32
     53         2.0        Edge 1
     64         2.5        Harris HCX-7

     85         3.3        Alliant FX/1
     88         3.4        MIPS M/500, f77 1.21 -O2, runs 553 seconds
     90         3.5        IBM 4381-2
     90         3.5        Sun-4/200 [Hough 87], SunOS 3.2L, runs 540 seconds
     91         3.5        DEC VAX 8600, VAX/VMS
     97         3.7        ELXSI 6400
     99         3.8        DG MV/20000
    100         3.8        MIPS M/500, f77 1.21 -O3, runs 488 seconds
    101         3.9        Alliant FX/8

    113         4.3        FPSystems 164
    119         4.6        Gould 32/8750
    129         5.0        DEC VAX 8650
    136         5.2        DEC VAX 8700, VAX/VMS

    150         5.7        Amdahl 470 V8, VM/UTS
    181         7.0        IBM 3081-G, F4H ext, opt=2
    190         7.3        MIPS M/800, f77 1.21 -O3, runs 256 secs
    201         7.7        HP 9000/850 [Nhuan Doduc, e-mail, 10/16/88]

    214         8.2        HP 9000/835 [Nhuan Doduc, e-mail, 10/16/88]
    227         8.7        MIPS M/1000, f77 1.31 -O3, runs 214(178) secs
    236         9.1        IBM 3081-K
    280        10.8        M120-5, f77 1.31 -O2, runs 173(148) secs
    289        11.1        M120-5, f77 1.31 -O3, runs 168(144) secs
    291        11.2        Apollo DN10000, runs 167 seconds

    438        16.8        M2000, f77 1.31 -O2, runs 111(94) secs
    443        17.0        M2000, f77 1.31 -O3, runs 109(93) secs
    475        18.3        Amdahl 5860
    586        22.5        CDC Cyber 990-E, Fortv2/opt=high/vector

    714        27.5        IBM 3090-200, scalar mode
    915        35.2        Fujitsu VP-200

   1080        41.6        Cray X/MP [for perspective: ... long way to go yet!]
                           [Oct 88: well, it's not as long as it was last year]

5.6.  Whetstone

Whetstone is a synthetic mix of floating point and integer arithmetic, function
calls, array indexing, conditional jumps, and transcendental functions [Curnow
76].

Whetstone results are measured in KWips, thousands of Whetstone interpreter
instructions per second.  On machines this fast, relatively few clock ticks are
actually counted, and UNIX timing includes some variance.  We increased the
loop counts from 10 to 1000 to lengthen the total running time and reduce the
variance.  Our experience shows some general uncertainty about the numbers
reported by anybody, as source code versions differ.
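
To show why the longer loop count matters, here is a rough sketch.  It
assumes the usual Whetstone convention that a loop count of 10 corresponds
to one million Whetstone instructions, and a hypothetical 60 Hz clock tick;
neither figure is stated in this brief, and the function names are ours.

```python
def run_seconds(kwips, loops):
    """Approximate run time for a given KWips rate and loop count."""
    # Assumption: 10 loops == 1 million Whetstone instructions.
    instructions = loops * 100_000
    return instructions / (kwips * 1000.0)

def clock_ticks(seconds, hz=60):
    """Number of clock ticks spanned by the run (hz is an assumed tick rate)."""
    return seconds * hz

for loops in (10, 1000):
    secs = run_seconds(13_600, loops)   # DP rate of the MIPS M/2000-8 below
    print(f"loops={loops:5d}: {secs:7.3f} s, about {clock_ticks(secs):.0f} ticks")
```

Under these assumptions the default loop count spans only a handful of
ticks on the fastest machines, while a loop count of 1000 spans several
hundred.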

                          Whetstone Benchmark Results
  DP    DP    SP    SP
KWips  Rel. KWips  Rel. System
   410  0.5    500  0.4 VAX 11/780, 4.3BSD, f77 [ours]
   715  0.9  1,083  0.9 VAX 11/780, LLL compiler [ours]
   830  1.0  1,250  1.0 VAX 11/780 VAX/VMS [Intergraph 86]
   924  1.1  1,039  0.8 Sun-3/160C, 68881 [Wilson 88]
 1,230  1.5  1,250  1.0 Sun-3/260, 25MHz 68020, 20MHz 68881

 1,581  1.9  1,886  1.5 Apollo DN4000, 25MHz 68020, 25MHz 68881 [Wilson 88]
 1,730  2.1  1,860  1.5 Intel 80386+80387, 20MHz, 64K cache, GreenHills
 1,740  2.1  2,980  2.4 Intergraph InterPro-32C, 30MHz Clipper [Intergraph 86]
 1,863  2.2  2,433  1.9 Sun-3/160, FPA [Wilson 88]

 2,092  2.5  3,115  2.5 HP 9000/840S [HP 87]
 2,433  2.9  3,521  2.8 HP 9000/825S [HP 87]
 2,590  3.1  4,170  3.3 Intel 80386+Weitek 1167, 20MHz, Green Hills
 2,673  3.2  3,569  2.9 Sun-3/260, Weitek FPA [Wilson 88]
 2,670  3.2  4,590  3.7 VAX 8600, VAX/VMS [Intergraph 86]
 2,907  3.5  4,202  3.4 HP 9000/850S [HP 87]
 2,940  3.5  4,215  3.4 Sun-4/110 [Sun 88]

 3,885  4.7  5,663  4.5 Sun-4/200 [Wilson 88]
 3,950  4.8  6,670  5.3 VAX 8700, VAX/VMS, Pascal(?) [McInnis 87]
 4,000  4.8  6,900  5.5 VAX 8650, VAX/VMS [Intergraph 86]
 4,120  5.0  4,930  3.9 Alliant FX/8 (1 CE) [Alliant 86]
 4,200  5.1      -  -   Convex C-1 XP [Multiflow]
 4,220  5.1  5,430  4.3 MIPS M/500
 4,400  5.3 15,000 12.0 Motorola 88000 [RISC Mgmt 88, Simpson 88]

 6,600  8.0      -  -   HP 835S [RISC Mgmt 88]
 6,930  8.0  8,570  6.9 MIPS M/800
 7,960  9.6 10,280  8.2 MIPS M/1000
 9,100 11.0 11,400  9.1 MIPS M/120-5
12,605 15.2      -  -   Multiflow Trace 7/200 [Multiflow]
13,600 16.4 17,300 13.8 MIPS M/2000-8
14,069 17.0      -  -   CCI Power 7/64 (simulation) [Simpson 88]
16,300 19.6 20,500 16.4 MIPS M/2000-8, f77 1.31, -O4 (inlining, not quite fair!)

25,000 30        -  -   IBM 3090-200 [Multiflow]
35,000 42        -  -   Cray X-MP/12

6.  Acknowledgements

Some people have noted that they seldom believe the numbers that come from cor-
porations, unless accompanied by names of people who take responsibility for
the numbers.  Many people at MIPS have contributed to this document.  Particu-
lar contributors to this issue include Earl Killian, Mark Johnson, Dr. James
Mannos, and Pat LeFevre.  As usual, the editor, John Mashey, is ultimately
responsible for all of the numbers.

We thank Cliff Purkiser of Intel, who posted the Intel 80386 Whetstone and LIN-
PACK numbers on Usenet.

We also thank Greg Pavlov, who ran hordes of Stanford and Dhrystone benchmarks
for us on a VAX 8550, Ultrix 2.0 system.

7.  References

[Alliant 86]
   Alliant Computer Systems Corp, "FX/Series Product Summary", October 1986.

[AMD 88]
   Advanced Micro Devices, "Am29000 Performance Analysis", May 1988.

[Curnow 76]
   Curnow, H. J., and Wichmann, B. A., ``A Synthetic Benchmark'', The Computer
   Journal, Vol. 19, No. 1, February 1976, pp. 43-49.

[Doduc 87]
   Doduc, N., FORTRAN Central Processor Time Benchmark, Framentec, June 1986,
   Version 13.  Newer numbers were received 03/17/87, and we used them where
   different.
   E-mail: uunet!inria!ftc!ndoduc

[Dongarra 88]
   Dongarra, J., ``Performance of Various Computers Using Standard Linear Equa-
   tions in a Fortran Environment'', Argonne National Laboratory, February 16,
   1988.

[Dongarra 87b]
   Dongarra, J., Martin, J., Worlton, J., "Computer Benchmarking: paths and
   pitfalls", IEEE Spectrum, July 1987, 38-43.

[DR 87]
   "A New Twist: Vectors in Parallel", June 29, 1987, "The M/1000: VAX 8800
   Power for Price of a MicroVAX II", August 24, 1987, and "VAXstation 3200
   Benchmarks: CVAX Eclipses MicroVAX II", September 14, 1987.  Digital Review,
   One Park Ave., NY, NY 10016.

[DR 88]
   "RISC-Based Systems Shatter the 10-MIPS Threshold", and "Widening the Lead",
   Digital Review, May 16, 1988.

[ENEWS 87]
   Electronic News, ``Apollo Cuts Prices on Low-End Stations'', July 6, 1987,
   p. 16.

[Fleming 86]
   Fleming, P.J. and Wallace, J.J., ``How Not to Lie With Statistics: The
   Correct Way to Summarize Benchmark Results'', Communications of the ACM,
   Vol. 29, No. 3, March 1986, 218-221.

[HP 87]
   Hewlett Packard, ``HP 9000 Series 800 Performance Brief'', 5954-9903, 5/87.
   (A comprehensive 40-page characterization of 825S, 840S, 850S).

[Hough 86,1]
   Hough, D., ``Weitek 1164/5 Floating Point Accelerators'', Usenet, January
   1986.

[Hough 86,2]
   Hough, D., ``Benchmarking and the 68020 Cache'', Usenet, January 1986.

[Hough 86,3]
   Hough, D., ``Floating-Point Programmer's Guide for the Sun Workstation'',
   Sun Microsystems, September 1986. [an excellent document, including a good
   set of references on IEEE floating point, especially on micros, and good
   notes on benchmarking hazards].  Sun-3/260 Spice numbers are from later
   mail.

[Hough 87]
   Hough, D., ``Sun-4 Floating-Point Performance'', Usenet, 08/04/87.

[IBM 87]
   IBM, ``IBM RT Personal Computer (RT PC) New Models, Features, and Software
   Overview'', February 17, 1987.

[Intergraph 86]
   Intergraph Corporation, ``Benchmarks for the InterPro 32C'', December 1986.

[Meta-Software 87]
   Meta-Software, ``HSPICE Performance Benchmarks'', June 1987.  50 Curtner
   Avenue, Suite 16, Campbell, CA 95008.

[McInnis 87]
   McInnis, D., Kusik, R., Bhandarkar, D., ``VAX 8800 System Overview'', Proc.
   IEEE COMPCON, March 1987, San Francisco, 316-321.

[McMahon 86]
   McMahon, F. H., ``The Livermore Fortran Kernels: A Computer Test of the
   Numerical Performance Range'', December 1986, Lawrence Livermore National
   Labs.

[MIPS 87]
   MIPS Computer Systems, "A Sun-4 Benchmark Analysis", and "RISC System Bench-
   mark Comparison: Sun-4 vs MIPS", July 23, 1987.

[Muchnick 88]
   Muchnick, S.S., "Optimizing Compilers for SPARC", SunTechnology, Summer
   1988, Sun Microsystems.

[Purkiser 87]
   Purkiser, C., ``Whetstone and LINPACK Numbers'', Usenet, March 1987.

[Richardson 87]
   Richardson, R., ``9/20/87 Dhrystone Benchmark Results'', Usenet, Sept. 1987.
   Rick publishes the source several times a year.  E-mail address:
   ...!seismo!uunet!pcrat!rick

[Serlin 87a]
   Serlin, O., ``MIPS, DHRYSTONES, AND OTHER TALES'', Reprinted with revisions
   from SUPERMICRO Newsletter, April 1986, ITOM International, P.O. Box 1450,
   Los Altos, CA 94023.
   Analyses on the perils of simplistic benchmark measures.

[Serlin 87b]
   Serlin, O., SUPERMICRO #69, July 31, 1987. pp. 1-2.
   Offers good list of attributes customers should demand of vendor benchmark-
   ing.

[Simpson 88]
   Simpson, David, "OEMS Cheer Motorola's 88000", Mini-Micro Systems, August
   1988, 83-91.  (Note that LINPACK numbers were not specified as FORTRAN or
   Coded, and that no configuration information is given; LINPACK numbers can
   be heavily influenced by cache sizes, so the published numbers are difficult
   to calibrate.  Also, no 64-bit Whetstone numbers are provided.)

[Stahlman 87]
   Stahlman, M., "The Myth of Price/performance", Sanford C. Bernstein & Co,
   Inc, NY, NY, March 17, 1987.

[Sun 86]
   SUN Microsystems, ``The SUN-3 Family: A Hardware Overview'', August 1986.

[Sun 87]
   SUN Microsystems, SUN-4 Product Introduction Material, July 7, 1987.

[Sun 88]
   SUN Microsystems, ``Sun-4/110 Preliminary Benchmark Results'', WSD Perfor-
   mance Group, 01/28/88.

[UCB 87]
   U. C. Berkeley, CAD/IC group, ``SPICE2G.6'', March 1987. Contact: Cindy
   Manly, EECS/ERL Industrial Liaison Program, 479 Cory Hall, University of
   California, Berkeley, CA 94720.

[Weicker 84]
   Weicker, R. P., ``Dhrystone: A Synthetic Systems Programming Benchmark'',
   Communications of the ACM, Vol. 27, No. 10, October 1984, pp.  1013-1030.

[Wilson 88]
   Wilson, David, "The Sun 4/260 RISC-Based Technical Workstation", UNIX Review
   6, 7 (July 1988), 91-101.
________
RISComputer is a trademark of MIPS Computer Systems.  UNIX is a Registered
Trademark of AT&T.  DEC, VAX, Ultrix, and VAX/VMS are trademarks of Digital
Equipment Corp.  Sun-3, Sun-4 are trademarks of Sun Microsystems.  Many others
are trademarks of their respective companies.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  m...@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086