Path: sparky!uunet!usc!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!
sdd.hp.com!mips!darwin.sura.net!uvaarpa!cv3.cv.nrao.edu!laphroaig!cflatter
From: cfla...@nrao.edu (Chris Flatters)
Newsgroups: comp.unix.bsd
Subject: Re: Jolitz 386BSD-0.1 -- floating point perform
Message-ID: <1992Jul22.152854.27730@nrao.edu>
Date: 22 Jul 92 15:28:54 GMT
References: <l6qc51INN1gu@neuro.usc.edu>
Sender: ne...@nrao.edu
Reply-To: cfla...@nrao.edu
Organization: NRAO
Lines: 22

In article l6qc51...@neuro.usc.edu, mer...@neuro.usc.edu (merlin) writes:
>I have most of the US Army BRLCAD three dimensional CSG modeling and
>distributed ray tracing system ported to the Jolitz 386BSD-0.1.  But,
>I am getting only about one fifth of the floating point performance
>previously measured using AT&T pcc and GNU gcc 1.4x on ATT UNIX SYSV.
>
>Does the compiler default to '387 emulation?  Is there some flag which
>needs to be set to actually use the coprocessor?  Or are there reasons
>386BSD-0.1 would exhibit relatively poor floating point performance?

The problem is that there is a mismatch between gcc 1.4 and Intel coprocessors.
gcc expects floating-point registers while the 80x87s have a stack.  This
leads to a fairly large performance hit.  gcc 2.x can produce optimised code
for the 80x87.  Is anyone working on porting gcc 2.x to 386BSD? Come to
think of it, is there anything that needs to be done to do this (wouldn't
the BSD/386 configuration work)?
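
To make this concrete (the function below is just an illustration, not
code from BRLCAD), compile something like it with "gcc -S -O" and look
at the assembler output under each compiler:

double dot3(const double *a, const double *b)
{
	/* gcc 2.x can evaluate this with a short fld/fmul/faddp
	 * sequence, keeping the intermediates on the 387 stack;
	 * gcc 1.4x tends to spill each partial result to memory,
	 * which is where much of the performance gap comes from. */
	return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}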

NB: the coprocessor shows up as device npx0 during the bootstrap sequence.  If
it doesn't show up, 386BSD thinks you don't have a coprocessor.

	Chris Flatters
	cfla...@nrao.edu

Path: sparky!uunet!mcsun!news.funet.fi!hydra!klaava!torvalds
From: torv...@klaava.Helsinki.FI (Linus Benedict Torvalds)
Newsgroups: comp.unix.bsd
Subject: Re: Jolitz 386BSD-0.1 -- floating point perform
Message-ID: <1992Jul23.010341.22292@klaava.Helsinki.FI>
Date: 23 Jul 92 01:03:41 GMT
References: <l6qc51INN1gu@neuro.usc.edu> <1992Jul22.152854.27730@nrao.edu>
Organization: University of Helsinki
Lines: 22

In article <1992Jul22.1...@nrao.edu> cfla...@nrao.edu writes:
> [ deleted ]  Is anyone working on porting gcc 2.x to 386BSD? Come to
>think of it, is there anything that needs to be done to do this (wouldn't
>the BSD/386 configuration work)?

One thing that might be a problem with porting gcc-2.2.2 to 386BSD is
that gcc-2.2.2 has added the 'fsqrt' command to the list of floating
point instructions that gcc can create code for - and I don't know if
the 386bsd math emulator emulates that particular command yet.  I've
already written the code (it's in linux), but it wasn't available back
when people ported the emulator to 386bsd. 

If 386bsd 0.1 does indeed use the linux math-emulator (I haven't even
checked: maybe they found something better) it shouldn't be difficult to
add the fsqrt support in there (the linux math-emulator may not be fast
or complete, but it's simple and relatively modular).  The code can be
gotten from any linux site: while the linux source is generally
copylefted, the math-code can be used freely for 386bsd (but /only/ for
386bsd - if you want to use it for something else and cannot accept the
copyleft conditions, contact me). 
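
If somebody wants to check whether their emulator knows about fsqrt, a
quick-and-dirty test is to issue the instruction directly from a user
program on a machine without a coprocessor: if the emulator doesn't
handle it the process should get a signal instead of printing a sane
answer.  Something like this should do (the asm constraints assume gcc;
"t" means the top of the 387 stack):

#include <stdio.h>

int main(void)
{
	double x = 2.0, r;

	/* on a coprocessor-less machine this traps into the kernel
	 * math emulator, which either emulates it or kills us */
	__asm__ ("fsqrt" : "=t" (r) : "0" (x));
	printf("fsqrt(2.0) = %f (expect about 1.414214)\n", r);
	return 0;
}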

		Linus

Newsgroups: comp.unix.bsd
Path: sparky!uunet!cis.ohio-state.edu!zaphod.mps.ohio-state.edu!wupost!
darwin.sura.net!uvaarpa!cv3.cv.nrao.edu!laphroaig!cflatter
From: cfla...@nrao.edu (Chris Flatters)
Subject: Re: Jolitz 386BSD-0.1 -- floating point perform
Message-ID: <1992Jul24.161646.22896@nrao.edu>
Sender: ne...@nrao.edu
Reply-To: cfla...@nrao.edu
Organization: NRAO
References: <l6qc51INN1gu@neuro.usc.edu>
Date: Fri, 24 Jul 1992 16:16:46 GMT
Lines: 50

In article l6qc51...@neuro.usc.edu, mer...@neuro.usc.edu (merlin) writes:
>I have most of the US Army BRLCAD three dimensional CSG modeling and
>distributed ray tracing system ported to the Jolitz 386BSD-0.1.  But,
>I am getting only about one fifth of the floating point performance
>previously measured using AT&T pcc and GNU gcc 1.4x on ATT UNIX SYSV.
>
>Does the compiler default to '387 emulation?  Is there some flag which
>needs to be set to actually use the coprocessor?  Or are there reasons
>386BSD-0.1 would exhibit relatively poor floating point performance?

I ran some checks last night and 386BSD is certainly exploiting the coprocessor.
These are the results from the Plum2 benchmark (see section 8.2 of "C++
Programming Guidelines" by Thomas Plum and Dan Saks).  The results are
the average time for a register int, auto short, auto long and auto double
operation and the average time to call and return from an empty function.
Times are in nominal milliseconds (CLOCKS_PER_SEC was missing from <time.h>,
so I guessed a value of 100 --- I now think it should have been 60).
The tests were performed on a CompuAdd 325s (25MHz 80386SX CPU) with a
Cyrix 83S87 FasMath coprocessor.

                       register      auto      auto  function      auto
                            int     short      long  call+ret    double
            386BSD gcc    0.178     0.448     0.474      1.62      4.94  
         386BSD gcc -O    0.159     0.207     0.159      1.75      3.37  

The ratio of floating-point time to auto long time is 21.2 (with optimization),
which is in the correct ball park for a 386SX/387SX system but a little
on the long side.
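
All the timing loops have the same general shape as the sketch below
(this is a reconstruction of the style, not the Plum2 source itself),
so a wrong guess for CLOCKS_PER_SEC scales every number by the same
factor and leaves the ratios intact:

#include <stdio.h>
#include <time.h>

#ifndef CLOCKS_PER_SEC
#define CLOCKS_PER_SEC 100	/* guessed, as described above */
#endif

#define N 1000000L

int main(void)
{
	clock_t t0, t1;
	long i;
	double d = 1.0;

	t0 = clock();
	for (i = 0; i < N; i++)
		d *= 1.000001;	/* the operation being timed */
	t1 = clock();

	/* average time per operation, in milliseconds if the guess
	 * for CLOCKS_PER_SEC is right */
	printf("%g ms per op (d = %g)\n",
	       (t1 - t0) * 1000.0 / CLOCKS_PER_SEC / N, d);
	return 0;
}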

As a control, I made a copy of the dist.fs disk with a compiled version of
bench2 on it and booted it on my portable: a 16 MHz 80386SX system without
a coprocessor.  The results were

                       register      auto      auto  function      auto
                            int     short      long  call+ret    double
         386BSD gcc -O    0.240     0.317     0.242      2.32       346  

Note that the ratio of f-p time to auto long is now 1429.8 --- in other
words, emulation is more than 60 times slower than the coprocessor.  Unless
BRLCAD uses very little floating-point I believe that the coprocessor is
active on Alexander-James Annala's machine too (If Alexander-James wants to
try these tests I'll send him the source code if he drops me a line).

For a final comparison, I have some old figures from Linux with gcc 2.1.
Using the register int time to place the results on the same scale as
the 25MHz results above, the mean time for an f-p operation was 2.09 usec
without optimization and 0.936 usec at -O1 and above.

	Chris Flatters
	cfla...@nrao.edu



Newsgroups: comp.unix.bsd
Path: sparky!uunet!darwin.sura.net!uvaarpa!cv3.cv.nrao.edu!laphroaig!cflatter
From: cfla...@nrao.edu (Chris Flatters)
Subject: Re: 386BSD-0.1/BRLCAD4.0 benchmark -- poor floa
Message-ID: <1992Jul29.152914.2508@nrao.edu>
Sender: ne...@nrao.edu
Reply-To: cfla...@nrao.edu
Organization: NRAO
References: <l7ctu0INN880@neuro.usc.edu>
Date: Wed, 29 Jul 1992 15:29:14 GMT
Lines: 21

In article l7ctu0...@neuro.usc.edu, mer...@neuro.usc.edu (merlin) writes:
>It looks like the floating point processor is used for simple floating
>point operations (+, -, /, *) but not for higher functions -- these go
>to /usr/src/lib/libm/ieee/support.c where they get emulated (ie _sqrt)
>very slowly.  Perhaps someone has a redistributable i386 library which
>could be plugged in place of the default 4.3BSD mathematics library.
>
>This is likely why BRLCAD 4.0 runs correctly but very slowly at the
>present time.  Any help or suggestions which would fix this problem
>would be very much appreciated.  
>
>Does gcc 2.2 actually use 80387 inline code for transcendental functions?

No. I just looked :-(.  (Unless it is very well hidden).

It could probably be made to, though.  There appears to be some support for
inlining transcendental functions for the Moto 68881.
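
For what it is worth, the sort of drop-in replacement merlin asks about
does not need much code if gcc's inline assembler is acceptable.  A
sketch (error handling, such as setting errno for negative arguments,
is omitted) that routes sqrt() through the coprocessor instead of the
portable C code in support.c:

/* "t" is gcc's constraint for the top of the 387 register stack,
 * so this should compile down to a single fsqrt. */
double sqrt(double x)
{
	double r;

	__asm__ ("fsqrt" : "=t" (r) : "0" (x));
	return r;
}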

	Chris Flatters
	cfla...@nrao.edu

Path: sparky!uunet!mcsun!news.funet.fi!hydra!klaava!torvalds
From: torv...@klaava.Helsinki.FI (Linus Benedict Torvalds)
Newsgroups: comp.unix.bsd
Subject: Re: 386BSD-0.1/BRLCAD4.0 benchmark -- poor floa
Message-ID: <1992Jul29.222708.4315@klaava.Helsinki.FI>
Date: 29 Jul 92 22:27:08 GMT
References: <l7ctu0INN880@neuro.usc.edu> <1992Jul29.152914.2508@nrao.edu>
Organization: University of Helsinki
Lines: 31

In article <1992Jul29....@nrao.edu> cfla...@nrao.edu writes:
>In article l7ctu0...@neuro.usc.edu, mer...@neuro.usc.edu (merlin) writes:
>>
>>Does gcc 2.2 actually use 80387 inline code for transcendental functions?
>
>No. I just looked :-(.  (Unless it is very well hidden).

gcc-2.2.2 only uses the basic math + fsqrt (fsqrt is new to gcc-2.x). 
The basic math is handled pretty well, though: much better than with
earlier versions.  gcc-2 knows about the fpu stack, resulting in fpu
code that looks almost hand-optimized (at least sometimes).  gcc-1.40
isn't very good at handling the 387 stack (it seems to use mostly
ready-made templates). 

>It could probably be made to though.  There appears to be some support for
>inlining transcendental functions for the Moto 68881.

I don't think inlining is as easy on the 387 as it is on a 68881: the 387
transcendental functions have enough special cases (argument limitations
etc) that it's probably easier to do it in a dedicated function.  But
those functions should certainly use the 387 instructions instead of
using series arithmetic or whatever portable C code the current 386bsd
libraries seem to use. 
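
To make the special-case point concrete, a dedicated function of the
kind I mean might look roughly like this (hw_sin is a made-up name and
this is only a sketch): the 387's fsin only gives a valid result for
arguments below 2^63 in magnitude, so you have to check the range
before trusting the instruction.

#include <math.h>

double hw_sin(double x)
{
	double r;

	/* fsin is only defined for |x| < 2^63; outside that range
	 * fall back to whatever the C library provides */
	if (x >= 9.22e18 || x <= -9.22e18)
		return sin(x);

	__asm__ ("fsin" : "=t" (r) : "0" (x));
	return r;
}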

The linux libraries might be useful: linux has two different math
libraries, one for soft-float (ie using only the normal arithmetic
functions that are emulated), and one for hard-float.  With shared
libraries (under linux), the same binary can use either, but under
386bsd you'd have to decide at link-time which to use. 

		Linus