Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!ncar!ames!pasteur!ucbvax!adt.UUCP!madd
From: m...@adt.UUCP (jim frost)
Newsgroups: comp.sys.sgi
Subject: SGI's interesting idea of a "speedup"
Message-ID: <8812162201.AA24738@adt.uucp>
Date: 16 Dec 88 22:01:31 GMT
Sender: dae...@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 82

Quoted from "Porting Applications to the IRIS-4D Family":

-- begin quote --

5.3 New Drawing Subroutines

Software release 4D1-3.0 introduced several new Graphics Library
subroutines for drawing and pixel access. Silicon Graphics recommends
converting old style routines to the new ones for three reasons:

  * Your code will be more portable.
  * On the GT and future products, the new subroutines will run up to
    10 times faster than their old counterparts.
  * The new subroutines simplify the Graphics Library and allow for
    future expansion.

In most cases, the conversion is simple -- just substitute the new
subroutines for the old ones. Unfortunately, the new subroutines do
not work in display lists, so if your code is based primarily on
display lists, the solution is not so simple.

This table gives a comparison of old and new subroutines.
----------------------------------------------------------------------
Technique             Old Subroutines        New Subroutines
----------------------------------------------------------------------
draw connected        move,draw,draw         bgnline,v3f,v3f,
  line segments                                endline
draw closed           move,draw,draw         bgnclosedline,v3f,v3f,
  hollow polygons       or poly                endclosedline
draw filled           pmv,pdr,pdr,pclos      bgnpolygon,v3f,v3f,
  polygons              polf or splf           endpolygon
draw points           pnt,pnt                bgnpoint,v3f,v3f,
                                               endpoint
read pixels           readpixels,readRGB     rectread,lrectread
write pixels          writepixels,writeRGB   rectwrite,lrectwrite
draw triangular       new                    bgntmesh,v3f,v3f,
  meshes                                       endtmesh
color(vector)         RGBcolor               cpack or c3i
surface normal        normal                 n3f
clear screen,         clear,zclear           czclear
  Z-buffer
create RGB            RGBwritemask           wmpack
  writemask
----------------------------------------------------------------------

-- end quote --

Interestingly, the 10x factor seems to be correct as one of our
customers reported that our product "ran ten times slower" on the GT.

We happily followed the SGI guide to speed them up. At one point we
changed all our readpixel() calls to rectread() calls, a non-trivial
task because they don't have the same arguments at all. To our great
surprise, the following was printed when the new call was made:

    <rectread> is not implemented.

We were impressed at just how fast their new function didn't work, as
I'm sure you can guess.

Curious, we investigated. Making use of "strings", we found that
libgl_s.a contained the string "<%s> is not implemented.". Just how
many functions might call whatever routine has that string is
something that scares me.

Jim Frost
Associative Design Technology
(508) 366-9166
m...@bu-it.bu.edu
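The "just substitute" conversion the quoted manual describes for connected line segments can be sketched in C. This is a minimal illustration under stated assumptions, not SGI code: so that it compiles without IRIS GL, the five GL calls are replaced by hypothetical logging stubs that only print and count their arguments (on an IRIS you would include the real GL header and delete the stubs). The helper names polyline_old and polyline_new are also hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the IRIS GL calls named in the table, so
 * this compiles anywhere. They only log and count; they draw nothing.
 * Coord is assumed to be a float, as in IRIS GL. */
typedef float Coord;
static int gl_calls; /* number of stub calls issued */

static void move(Coord x, Coord y, Coord z) { printf("move %g %g %g\n", x, y, z); gl_calls++; }
static void draw(Coord x, Coord y, Coord z) { printf("draw %g %g %g\n", x, y, z); gl_calls++; }
static void bgnline(void)                   { puts("bgnline"); gl_calls++; }
static void v3f(const float v[3])           { printf("v3f %g %g %g\n", v[0], v[1], v[2]); gl_calls++; }
static void endline(void)                   { puts("endline"); gl_calls++; }

/* Old style: move to the first vertex, then draw to each following
 * one, passing the coordinates as three scalars per call. */
static void polyline_old(float pts[][3], int n)
{
    assert(n >= 2); /* a line needs at least two vertices */
    move(pts[0][0], pts[0][1], pts[0][2]);
    for (int i = 1; i < n; i++)
        draw(pts[i][0], pts[i][1], pts[i][2]);
}

/* New style: bracket the vertices with bgnline/endline and pass each
 * vertex as a pointer to its three floats. */
static void polyline_new(float pts[][3], int n)
{
    assert(n >= 2);
    bgnline();
    for (int i = 0; i < n; i++)
        v3f(pts[i]);
    endline();
}
```

On the same three-vertex array, polyline_old issues three calls and polyline_new issues five (bgnline, three v3f, endline), so any speedup on the GT has to come from how the hardware handles the new calls, not from issuing fewer of them. The manual's caveat still applies: the new calls do not work inside display lists.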
Path: utzoo!attcan!uunet!lll-winken!lll-lcc!ames!pasteur!ucbvax!DDATHD21.BITNET!XBR2D96D
From: XBR2D...@DDATHD21.BITNET (Knobi der Rechnerschrat)
Newsgroups: comp.sys.sgi
Subject: Misc.
Message-ID: <8812200602.aa17057@SMOKE.BRL.MIL>
Date: 19 Dec 88 06:43:53 GMT
Sender: dae...@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 71

Hello Netlanders,

I have a few questions about SGI's new GTX architecture. They are based
on the 3.1 release notes and a document called "IRIS GTX: A Technical
Report, Rev 2":

- Which type of CPU (16 MHz or 25 MHz), and how many of them, do I need
  to get the full graphics speed (100,000 Z-buffered, 4-sided, G-shaded,
  P-lighted, independent polygons)? I ask this question because one of
  SGI's competitors (they have a vector/parallel-oriented workstation
  with up to 4 CPUs, graphics computations done in the CPU) had to admit
  (after applying some Spanish Inquisition tools) that they need 4 CPUs
  to reach their maximum graphics performance, and that there may be
  situations where graphics can consume all resources of the system.

- Chapter "8.2 Graphics Notes" in the 4D-3.1 release notes states that
  some of the graphics routines (c3*, c4*, n3f, v2*, v3*, v4*) should be
  called with quadword-aligned data to get full GTX performance. Does
  this mean all the variables have to be "double" (which I don't
  believe), or that the first byte of a "float x[3]" vector has to start
  on a quadword address? In the latter case I only have to rearrange our
  data structures.

- Does shademodel(FLAT) work again under 3.1?

As a last point, I want to comment on a note by Jim Frost about

> Subject: SGI's interesting idea of a "speedup"
. . . .
>Interestingly, the 10x factor seems to be correct as one of our
>customers reported that our product "ran ten times slower" on the GT.
>
>We happily followed the SGI guide to speed them up. At one point we
>changed all our readpixel() calls to rectread() calls, a non-trivial
>task because they don't have the same arguments at all. To our great
>surprise, the following was printed when the new call was made:
>
>    <rectread> is not implemented.
>
>We were impressed at just how fast their new function didn't work, as
>I'm sure you can guess.
>
>Curious, we investigated. Making use of "strings", we found that
>libgl_s.a contained the string "<%s> is not implemented.". Just how
>many functions might call whatever routine has that string is
>something that scares me.
>
>Jim Frost
>Associative Design Technology
>(508) 366-9166
>m...@bu-it.bu.edu

Did you get your "not implemented" on a G or a GT? If it's on a G (as I
suspect), how can you expect routines that only make sense on the GT
architecture to be implemented (another example is smoothline())? I
think it's a good idea to let you use the calls, but to tell you that
they don't work.

Have a merry Christmas and a happy new year '89

Martin Knoblauch
TH-Darmstadt
Physical Chemistry 1
Petersenstrasse 20
D-6100 Darmstadt
West-Germany

BITNET: <XBR2D96D@DDATHD21>
Path: utzoo!attcan!uunet!lll-winken!lll-lcc!ames!sgi!...@patton.SGI.COM
From: j...@patton.SGI.COM (Jim Barton)
Newsgroups: comp.sys.sgi
Subject: Re: Misc.
Summary: The Answer Man
Message-ID: <23835@sgi.SGI.COM>
Date: 21 Dec 88 17:44:35 GMT
References: <8812200602.aa17057@SMOKE.BRL.MIL>
Sender: dae...@sgi.SGI.COM
Organization: Silicon Graphics, Inc., Mountain View, CA
Lines: 65

In article <8812200602.aa17...@SMOKE.BRL.MIL>, XBR2D...@DDATHD21.BITNET (Knobi der Rechnerschrat) writes:
> Hello Netlanders,
>
> I have a few questions about SGI's new GTX architecture. They are based
> on the 3.1 release notes and a document called "IRIS GTX: A Technical
> Report, Rev 2":
>
> - Which type of CPU (16 MHz or 25 MHz), and how many of them, do I need
>   to get the full graphics speed (100,000 Z-buffered, 4-sided, G-shaded,
>   P-lighted, independent polygons)? I ask this question because one of
>   SGI's competitors (they have a vector/parallel-oriented workstation
>   with up to 4 CPUs, graphics computations done in the CPU) had to admit
>   (after applying some Spanish Inquisition tools) that they need 4 CPUs
>   to reach their maximum graphics performance, and that there may be
>   situations where graphics can consume all resources of the system.

ALL GTX-class machines can reach full graphics performance with a single
CPU driving the graphics. In a 4-popper (a four-processor system), this
means you get more than 3 CPUs of compute performance to use as you
wish. (Unlike the competition, a GTX has 100 MFlops dedicated to
graphics; the CPU performance is yours to use or abuse as you wish.)
Part of this is the result of a custom bus cycle and a small-block DMA
facility, which the processor uses to send geometry to the pipeline. We
call this feature the "3-way-transfer". More below ...

> - Chapter "8.2 Graphics Notes" in the 4D-3.1 release notes states that
>   some of the graphics routines (c3*, c4*, n3f, v2*, v3*, v4*) should be
>   called with quadword-aligned data to get full GTX performance.
>   Does this mean all the variables have to be "double" (which I don't
>   believe), or that the first byte of a "float x[3]" vector has to start
>   on a quadword address? In the latter case I only have to rearrange our
>   data structures.

As you surmised, the quadword alignment applies only to the first byte
of the data structure you are sending. The reason it matters for full
performance is related to the 3-way-transfer and the MP backplane. As
in most multiprocessors, memory data is transferred in large blocks for
efficiency, and then cached at each CPU. The POWERSeries uses a 4-word
(16-byte) cache line, which is also the basic unit of transfer to the
graphics pipeline.

The 3-way-transfer is designed to allow the programmer to lay out his
data in an arbitrary way without alignment restrictions. Because the
layout is arbitrary, a vertex can cross a 4-word boundary, and then two
bus cycles are necessary to send the data (thus the "3-way": the first
part of the data may come from cache or memory, and the second part may
come from some other cache or memory, or the initiating CPU may own none
of the data, in which case other cache(s) or memory will supply the
data). [Sorry if this is confusing; remember that the POWERSeries uses
write-back caching, so the "real" memory image is distributed between
caches and memory.]

Quad-word aligning the vertex ensures that the transfer happens in a
single bus cycle, giving you the best performance (but remember, your
code will still work no matter how the data is aligned).

> - Does shademodel(FLAT) work again under 3.1?

I hope so.

-- 
Jim Barton
Silicon Graphics Computing Systems    "UNIX: Live Free Or Die!"
j...@sgi.sgi.com, sgi!...@decwrl.dec.com, ...{decwrl,sun}!sgi!jmb
"I used to be disgusted, now I'm just amused."
        - Elvis Costello, 'Red Shoes'
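Barton's answer comes down to address arithmetic: with a 16-byte (4-word) transfer unit, a vertex costs one bus cycle if it fits inside one 16-byte line and two if it straddles a line boundary. The C sketch below counts that for two layouts of the "float x[3]" vertices Knoblauch asked about. It is not SGI code; the helper names lines_touched and count_straddles are hypothetical, the 16-byte line size is taken from Barton's post, and the test uses C11 `_Alignas` as a modern stand-in for whatever alignment control the 1988 compilers offered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 16 /* 4-word cache line = graphics transfer unit, per the post */

/* Number of 16-byte lines that an object of `size` bytes at `p` touches:
 * 1 means it fits in one bus cycle, 2 means it straddles a boundary. */
static int lines_touched(const void *p, size_t size)
{
    assert(size > 0);
    uintptr_t first = (uintptr_t)p;
    uintptr_t last = first + size - 1; /* address of the final byte */
    return (int)(last / LINE_BYTES - first / LINE_BYTES) + 1;
}

/* How many of `n` vertices, each `vsize` bytes long and `stride` bytes
 * apart starting at `base`, straddle a 16-byte line boundary. */
static int count_straddles(const void *base, size_t stride, size_t vsize, int n)
{
    const unsigned char *p = base;
    int straddles = 0;
    for (int i = 0; i < n; i++)
        if (lines_touched(p + (size_t)i * stride, vsize) > 1)
            straddles++;
    return straddles;
}
```

With a 16-byte-aligned base, tightly packed float[3] vertices (12-byte stride) start at offsets 0, 12, 24, 36, ..., so two of every four cross a line. Padding each vertex to four floats (16-byte stride) keeps every vertex inside a single line, at the cost of 4 unused bytes per vertex. As Barton says, either layout still works; only the number of bus cycles changes.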