Path: sparky!uunet!usc!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!sdd.hp.com!mips!darwin.sura.net!uvaarpa!cv3.cv.nrao.edu!laphroaig!cflatter
From: cfla...@nrao.edu (Chris Flatters)
Newsgroups: comp.unix.bsd
Subject: Re: Jolitz 386BSD-0.1 -- floating point perform
Message-ID: <1992Jul22.152854.27730@nrao.edu>
Date: 22 Jul 92 15:28:54 GMT
References: <l6qc51INN1gu@neuro.usc.edu>
Sender: ne...@nrao.edu
Reply-To: cfla...@nrao.edu
Organization: NRAO
Lines: 22

In article l6qc51...@neuro.usc.edu, mer...@neuro.usc.edu (merlin) writes:
>I have most of the US Army BRLCAD three dimensional CSG modeling and
>distributed ray tracing system ported to the Jolitz 386BSD-0.1. But,
>I am getting only about one fifth of the floating point performance
>previously measured using AT&T pcc and GNU gcc 1.4x on ATT UNIX SYSV.
>
>Does the compiler default to '387 emulation? Is there some flag which
>needs to be set to actually use the coprocessor? Or are there reasons
>386BSD-0.1 would exhibit relatively poor floating point performance?

The problem is that there is a mismatch between gcc 1.4 and Intel
coprocessors. gcc expects floating-point registers while the 80x87s
have a stack. This leads to a fairly large performance hit.

gcc 2.x can produce optimised code for the 80x87. Is anyone working on
porting gcc 2.x to 386BSD? Come to think of it, is there anything that
needs to be done to do this (wouldn't the BSD/386 configuration work)?

NB: the coprocessor shows up as device psx0 during the bootstrap
sequence. If this doesn't show up 386BSD thinks you don't have a
coprocessor.

Chris Flatters
cfla...@nrao.edu
Path: sparky!uunet!mcsun!news.funet.fi!hydra!klaava!torvalds
From: torv...@klaava.Helsinki.FI (Linus Benedict Torvalds)
Newsgroups: comp.unix.bsd
Subject: Re: Jolitz 386BSD-0.1 -- floating point perform
Message-ID: <1992Jul23.010341.22292@klaava.Helsinki.FI>
Date: 23 Jul 92 01:03:41 GMT
References: <l6qc51INN1gu@neuro.usc.edu> <1992Jul22.152854.27730@nrao.edu>
Organization: University of Helsinki
Lines: 22

In article <1992Jul22.1...@nrao.edu> cfla...@nrao.edu writes:
> [ deleted ] Is anyone working on porting gcc 2.x to 386BSD? Come to
>think of it, is there anything that needs to be done to do this (wouldn't
>the BSD/386 configuration work)?

One thing that might be a problem with porting gcc-2.2.2 to 386BSD is
that gcc-2.2.2 has added the 'fsqrt' command to the list of floating
point instructions that gcc can create code for - and I don't know if
the 386bsd math emulator emulates that particular command yet. I've
already written the code (it's in linux), but it wasn't available back
when people ported the emulator to 386bsd.

If 386bsd 0.1 does indeed use the linux math-emulator (I haven't even
checked: maybe they found something better) it shouldn't be difficult
to add the fsqrt support in there (the linux math-emulator may not be
fast or complete, but it's simple and relatively modular). The code can
be gotten from any linux site: while the linux source is generally
copylefted, the math-code can be used freely for 386bsd (but /only/ for
386bsd - if you want to use it for something else and cannot accept the
copyleft conditions, contact me).

Linus
Newsgroups: comp.unix.bsd
Path: sparky!uunet!cis.ohio-state.edu!zaphod.mps.ohio-state.edu!wupost!darwin.sura.net!uvaarpa!cv3.cv.nrao.edu!laphroaig!cflatter
From: cfla...@nrao.edu (Chris Flatters)
Subject: Re: Jolitz 386BSD-0.1 -- floating point perform
Message-ID: <1992Jul24.161646.22896@nrao.edu>
Sender: ne...@nrao.edu
Reply-To: cfla...@nrao.edu
Organization: NRAO
References: <l6qc51INN1gu@neuro.usc.edu>
Date: Fri, 24 Jul 1992 16:16:46 GMT
Lines: 50

In article l6qc51...@neuro.usc.edu, mer...@neuro.usc.edu (merlin) writes:
>I have most of the US Army BRLCAD three dimensional CSG modeling and
>distributed ray tracing system ported to the Jolitz 386BSD-0.1. But,
>I am getting only about one fifth of the floating point performance
>previously measured using AT&T pcc and GNU gcc 1.4x on ATT UNIX SYSV.
>
>Does the compiler default to '387 emulation? Is there some flag which
>needs to be set to actually use the coprocessor? Or are there reasons
>386BSD-0.1 would exhibit relatively poor floating point performance?

I ran some checks last night and 386BSD is certainly exploiting the
coprocessor. These are the results from the Plum2 benchmark (see
section 8.2 of "C++ Programming Guidelines" by Thomas Plum and Dan
Saks). The results are the average time for a register int, auto short,
auto long and auto double operation and the average time to call and
return from an empty function. Times are in nominal microseconds
(CLOCKS_PER_SEC was missing from <time.h> so I guessed a value of 100
--- I now think that it should have been 60). The tests were performed
on a CompuAdd 325s (25 MHz 80386SX CPU) with a Cyrix 83S87 FasMath
coprocessor.

                 register  auto    auto    function  auto
                 int       short   long    call+ret  double
386BSD gcc       0.178     0.448   0.474   1.62      4.94
386BSD gcc -O    0.159     0.207   0.159   1.75      3.37

The ratio of floating-point time to auto long time is 21.2 (with
optimization), which is in the right ballpark for a 386SX/387SX system
but a little on the long side.

As a control, I made a copy of the dist.fs disk with a compiled version
of bench2 on it and booted it on my portable: a 16 MHz 80386SX system
without a coprocessor. The results were

                 register  auto    auto    function  auto
                 int       short   long    call+ret  double
386BSD gcc -O    0.240     0.317   0.242   2.32      346

Note that the ratio of f-p time to auto long time is now 1429.8 --- in
other words, emulation is more than 60 times slower than the
coprocessor. Unless BRLCAD uses very little floating-point, I believe
that the coprocessor is active on Alexander-James Annala's machine too.
(If Alexander-James wants to try these tests I'll send him the source
code if he drops me a line.)

For a final comparison, I have some old figures from Linux with gcc 2.1.
Using the register int time to place the results on the same scale as
the 25 MHz results above, the mean time for a f-p operation was 2.09 usec
without optimization and 0.936 usec at -O1 and above.

Chris Flatters
cfla...@nrao.edu
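The measurement and the arithmetic in the post above can be sketched in C. This is not the Plum2 source, which is not shown in the thread; the function names (seconds_per_iteration, rescale_time, time_ratio) are hypothetical, and the fallback value of 100 for CLOCKS_PER_SEC simply mirrors the guess described in the post.

```c
#include <time.h>

/* Assumption: mirror the post's workaround when <time.h> lacks the macro. */
#ifndef CLOCKS_PER_SEC
#define CLOCKS_PER_SEC 100
#endif

/* Average seconds per loop iteration; each iteration does one
   multiply and one add on a double. */
double seconds_per_iteration(long iterations)
{
    volatile double x = 1.0;   /* volatile keeps the loop from being removed */
    long i;
    clock_t start, stop;

    start = clock();
    for (i = 0; i < iterations; i++)
        x = x * 0.5 + 1.0;     /* converges towards 2.0, so no overflow */
    stop = clock();

    return (double)(stop - start) / CLOCKS_PER_SEC / (double)iterations;
}

/* Convert a time computed with a guessed tick rate to the time it would
   have been with the real tick rate: same tick count, different divisor. */
double rescale_time(double reported, double assumed_cps, double actual_cps)
{
    return reported * assumed_cps / actual_cps;
}

/* Ratio of two per-operation times, e.g. f-p time over auto long time. */
double time_ratio(double fp_time, double long_time)
{
    return fp_time / long_time;
}
```

With the figures from the tables, time_ratio(3.37, 0.159) comes to about 21.2 and time_ratio(346.0, 0.242) to about 1429.8. A wrong guess for CLOCKS_PER_SEC scales every column by the same factor, so these ratios would not change; only the absolute times would, for example rescale_time(3.37, 100.0, 60.0) is roughly 5.62.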
Newsgroups: comp.unix.bsd
Path: sparky!uunet!darwin.sura.net!uvaarpa!cv3.cv.nrao.edu!laphroaig!cflatter
From: cfla...@nrao.edu (Chris Flatters)
Subject: Re: 386BSD-0.1/BRLCAD4.0 benchmark -- poor floa
Message-ID: <1992Jul29.152914.2508@nrao.edu>
Sender: ne...@nrao.edu
Reply-To: cfla...@nrao.edu
Organization: NRAO
References: <l7ctu0INN880@neuro.usc.edu>
Date: Wed, 29 Jul 1992 15:29:14 GMT
Lines: 21

In article l7ctu0...@neuro.usc.edu, mer...@neuro.usc.edu (merlin) writes:
>It looks like the floating point processor is used for simple floating
>point operations (+, -, /, *) but not for higher functions -- these go
>to /usr/src/lib/libm/ieee/support.c where they get emulated (ie _sqrt)
>very slowly. Perhaps someone has a redistributable i386 library which
>could be plugged in place of the default 4.3BSD mathematics library.
>
>This is likely why BRLCAD 4.0 runs correctly but very slowly at the
>present time. Any help or suggestions which would fix this problem
>would be very much appreciated.
>
>Does gcc 2.2 actually use 80387 inline code for transcendental functions?

No. I just looked :-(. (Unless it is very well hidden).

It could probably be made to though. There appears to be some support
for inlining transcendental functions for the Moto 68881.

Chris Flatters
cfla...@nrao.edu
Path: sparky!uunet!mcsun!news.funet.fi!hydra!klaava!torvalds
From: torv...@klaava.Helsinki.FI (Linus Benedict Torvalds)
Newsgroups: comp.unix.bsd
Subject: Re: 386BSD-0.1/BRLCAD4.0 benchmark -- poor floa
Message-ID: <1992Jul29.222708.4315@klaava.Helsinki.FI>
Date: 29 Jul 92 22:27:08 GMT
References: <l7ctu0INN880@neuro.usc.edu> <1992Jul29.152914.2508@nrao.edu>
Organization: University of Helsinki
Lines: 31

In article <1992Jul29....@nrao.edu> cfla...@nrao.edu writes:
>In article l7ctu0...@neuro.usc.edu, mer...@neuro.usc.edu (merlin) writes:
>>
>>Does gcc 2.2 actually use 80387 inline code for transcendental functions?
>
>No. I just looked :-(. (Unless it is very well hidden).

gcc-2.2.2 only uses the basic math + fsqrt (fsqrt is new to gcc-2.x).
The basic math is handled pretty well, though: much better than with
earlier versions. gcc-2 knows about the fpu stack, resulting in fpu
code that looks almost hand-optimized (at least sometimes). gcc-1.40
isn't very good at handling the 387 stack (it seems to use mostly
ready-made templates).

>It could probably be made to though. There appears to be some support for
>inlining transcendental functions for the Moto 68881.

I don't think inlining is as easy on the 387 as it is on a 68881: the
387 transcendental functions have enough special cases (argument
limitations etc) that it's probably easier to do it in a dedicated
function. But those functions should certainly use the 387
instructions instead of using series arithmetic or whatever portable C
code the current 386bsd libraries seem to use.

The linux libraries might be useful: linux has two different math
libraries, one for soft-float (ie using only the normal arithmetic
functions that are emulated), and one for hard-float. With shared
libraries (under linux), the same binary can use either, but under
386bsd you'd have to decide at link-time which to use.

Linus
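The "dedicated function" idea in the post above might look roughly like the following sketch for the square root case. It is an illustration only, not code from the linux or 386bsd libraries: the name dedicated_sqrt is hypothetical, the hardware path assumes a GCC-compatible compiler on x86 (it uses GCC extended asm with the "t" constraint for the top of the 387 stack), and the fallback is a plain Newton iteration standing in for whatever portable C the library actually used.

```c
#include <errno.h>

/* Square root with the special cases handled in C and the core
   operation done by the 387 'fsqrt' instruction where available. */
double dedicated_sqrt(double x)
{
    if (x != x)                 /* NaN in, NaN out */
        return x;
    if (x < 0.0) {              /* argument limitation: negative input */
        errno = EDOM;
        return (x - x) / (x - x);   /* evaluates to NaN */
    }
    if (x + x == x)             /* zero or +infinity: nothing to compute */
        return x;

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    {
        double r;
        /* "t" is the top of the 387 register stack; "0" reuses it as input. */
        __asm__("fsqrt" : "=t"(r) : "0"(x));
        return r;
    }
#else
    {
        /* Portable fallback: Newton's method starting above the root.
           Many divides per call, which is why it is much slower. */
        double guess = (x > 1.0) ? x : 1.0;
        for (;;) {
            double next = 0.5 * (guess + x / guess);
            if (next >= guess)  /* stopped decreasing: converged */
                break;
            guess = next;
        }
        return guess;
    }
#endif
}
```

Calling dedicated_sqrt(2.0) should give a value whose square is within rounding error of 2.0 on either path; the difference between the two paths is cost, since the fallback needs one divide per iteration and many iterations for large arguments, while the hardware path is a single instruction.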