Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/5/84; site osu-eddie.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!ihnp4!cbosgd!apr!osu-eddie!elwell
From: elw...@osu-eddie.UUCP (Clayton M. Elwell)
Newsgroups: net.micro.68k,net.micro.pc
Subject: Intel processors vs. Motorola processors
Message-ID: <344@osu-eddie.UUCP>
Date: Fri, 31-May-85 09:56:44 EDT
Article-I.D.: osu-eddi.344
Posted: Fri May 31 09:56:44 1985
Date-Received: Sat, 1-Jun-85 02:41:11 EDT
Distribution: net
Organization: Ohio State Univ., CIS Dept., Cols, Oh.
Lines: 57

I have done extensive programming in C and assembly language on both
the 8086 family and the 68000 family.  I have found that in all
respects, the 68000 is much easier to use and allows me to write
faster code.  Speed is important to me, because a lot of what I write
is graphics and screen handling.

(Note: I will use '8086' and '68000' in a generic fashion, i.e.
referring to the entire families in question)

o Register complement

  Although occasionally the 68K address/data register distinction can
  be annoying, it's nothing compared to the hassle of the 8086.
  ``General purpose registers?  Why would we want any of those?''
  ``Put a full address into a register?  Nobody ever does that!''
  Argh.  Registers have one big advantage for both assembly language
  and compiled code: SPEED.  It's also annoying to have to
  push/pop/exchange/etc. just to move things around so you can
  execute a SHIFT instruction (to pick an example out of a hat).
  Orthogonality isn't just for the benefit of the hardware
  designers...

o Address space and treatment

  I realize this is a religious issue, but I'll risk it anyway.
  First, real programs DO use more than 64K of code and data.  Some
  even want it all accessible at once.  Take text editors, formatters,
  spreadsheets, graphics systems, compilers [you know, the stuff no
  one ever actually uses :-)].  I'm sorry, segment registers do not
  constitute an 'advanced segmented architecture'.
  If you want position-independent code, supply a PC-relative
  addressing mode.  If you want memory management, use a real MMU.  A
  smart bank switch (oops, I mean segment register) is only useful at
  all when you want to use 8080/Z80 style code at any offset in your
  block of memory.  I give the 8086 one thing: it is far better than a
  Z80, and makes it real easy to port CP/M software without thinking
  too hard.  I don't consider this to be useful anymore.

o Memory speed and resource usage

  This one is a tie as far as I can tell.  Both use a 4-clock memory
  cycle, both have prefetch queues, and the bigger ones (286 & 020)
  both have a cache (although the 68020 has a 3-clock memory cycle as
  I understand it (though I may be wrong), and the 286 has a 4-clock
  cycle (from the 286 hardware reference manual crouching near my
  left elbow)).  From a pure bus speed standpoint, it seems to be a
  tie.

On the whole, I have found it a much less stressful task to write for
the 68000.  When using C, the compiler can generate good enough code
that I don't have to resort to assembly language as often, and when I
do, it's MUCH more straightforward.  Programming should not always be
an adventure in processor peculiarities.

-- 
Clayton Elwell
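[The segment mechanism Clayton is objecting to can be sketched in a few
lines of Python.  This is a modern illustration added for readers
following along, not period code; the function name is ours.  The
address formation itself (segment shifted left 4 bits, plus offset,
wrapping in a 20-bit space) is standard 8086 real-mode behavior.]

```python
def phys_addr(segment: int, offset: int) -> int:
    """8086 real-mode physical address: 16-bit segment register
    shifted left 4 bits, plus a 16-bit offset, in a 20-bit space."""
    return ((segment << 4) + offset) & 0xFFFFF

# Each segment register selects a 64K window on a 16-byte boundary,
# and many segment:offset pairs alias the same physical byte --
# the "smart bank switch" effect:
assert phys_addr(0x1000, 0x0000) == 0x10000
assert phys_addr(0x0FFF, 0x0010) == 0x10000
```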
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/13/84; site intelca.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxt!houxm!ihnp4!pesnta!amd!intelca!kds
From: k...@intelca.UUCP (Ken Shoemaker)
Newsgroups: net.micro.68k,net.micro.pc
Subject: Re: all sorts of things + Intel processors vs. Motorola processors
Message-ID: <600@intelca.UUCP>
Date: Wed, 5-Jun-85 16:45:34 EDT
Article-I.D.: intelca.600
Posted: Wed Jun 5 16:45:34 1985
Date-Received: Thu, 6-Jun-85 06:29:22 EDT
References: <344@osu-eddie.UUCP>
Distribution: net
Organization: Intel, Santa Clara, Ca.
Lines: 66

> o Memory speed and resource usage
>
>   This one is a tie as far as I can tell.  Both use a 4-clock memory
>   cycle, both have prefetch queues, and the bigger ones (286 & 020)
>   both have a cache (although the 68020 has a 3-clock memory cycle
>   as I understand it (though I may be wrong), and the 286 has a
>   4-clock cycle (from the 286 hardware reference manual crouching
>   near my left elbow)).  From a pure bus speed standpoint, it seems
>   to be a tie.

A little confusion, I'm afraid.  The 286 accepts a 2X clock, and the
clock speed from which things are drawn is a divide-by-2 of that
clock.  So, the 286 really has a 2-clock bus cycle.  Of course, the
other option is that what we call a 12MHz 286 is what you would call
a 24MHz 286.  Also, with pipelined address/data, the 286 provides
more generous access times, even if you don't use interleaved
memories, since the address output delays from clock on MOS devices
aren't nearly as fast as those of an F or AS latch.  Also, this means
that you can pre-select memories before a cycle, or whatever.

This kinda falls into the AT vs. Z150 battle, too.  If you believe
that most modern microprocessors are bus limited to some extent, then
performance is closely tied to the bus bandwidth of the processor.
Even with 1 wait state, the 286 has a 3-clock bus as opposed to the
4-clock bus of the 8088.
Thus, at 6MHz, the 286 with 1 wait state has a maximum bus bandwidth
of 4MBytes/second (= 6MHz/3 * 2) while at 8MHz, an 8088 with 0 wait
states has a maximum bus bandwidth of 2MBytes/second (= 8MHz/4).
Even for byte reads, the latency time from the bus for the 286 with
one wait state is the same as the 8088 with no wait states.  Besides
all this, the 286 does execute instructions inside the chip faster
than the 8088, so you are going to have to look a little farther than
just a processor(286)/processor(88) comparison for an explanation of
your results.

The nature of the 286 bus as opposed to the 8088 bus also follows
through to the 286 bus vs. the 68{000,010,020} busses: I still don't
understand why, since Mot gives you separate address and data busses,
they don't use them better, i.e., present early addresses to memory
systems that could use them to an advantage.  This really does allow
faster operation with slower memories at the cost of more pins on
your package.  For what it's worth, it seems to me that Mot is
wasting money providing separate address/data pins with the
utilization that they provide (unless they are not pad limited on
their die, and their yields are such that package costs are
insignificant).  I mean, all those extra drivers do take up die
space, and those extra pads could mean that the chip is not as small
or as cheap as it could be....

Finally, I have another question to pose to the net.  I believe Mot
uses a two-level microcode in the 68k and its follow-ons... (can
someone verify this?)  Does anyone have any idea what this means to
its performance (with respect to jumps and having to fill up the
instruction queue)?  Do they take two clocks to do a complete
microcode lookup (the first to the first level, the second to the
second level)?  RISCs that are out there have NO microcode, and
present this as one reason for their faster performance.  Also, this
was one reason Zilog presented way back when for the performance of
the Z8000 (they said it was good).
If you think about it, it does make sense, since you have to wait at
least 1 clock to go through your microcode lookup.  Any thoughts?
-- 
It looks so easy, but looks sometimes deceive...

Ken Shoemaker, 386 Design Team, Intel, Santa Clara, Ca.
{pur-ee,hplabs,amd,scgvaxd,dual,qantel}!intelca!kds

---the above views are personal.  They may not represent those of Intel.
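[Ken's bandwidth figures check out.  A word-wide bus moves 2 bytes per
bus cycle and a byte-wide bus moves 1, so maximum bandwidth is just
clock rate divided by clocks-per-bus-cycle, times bytes per cycle.  A
quick Python sketch, added here for illustration; the function name is
ours:]

```python
def bandwidth_mb(clock_mhz: float, clocks_per_cycle: int,
                 bytes_per_cycle: int) -> float:
    """Peak bus bandwidth in MBytes/second."""
    return clock_mhz / clocks_per_cycle * bytes_per_cycle

# 6MHz 286, 3-clock bus (1 wait state), 16-bit data path:
assert bandwidth_mb(6, 3, 2) == 4.0   # 4 MBytes/second
# 8MHz 8088, 4-clock bus (0 wait states), 8-bit data path:
assert bandwidth_mb(8, 4, 1) == 2.0   # 2 MBytes/second
```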
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84 SMI; site sun.uucp
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!genrad!decvax!decwrl!sun!gnu
From: g...@sun.uucp (John Gilmore)
Newsgroups: net.micro.68k,net.micro.pc,net.arch
Subject: Re: x86/68x buses ; two-level microcode
Message-ID: <2275@sun.uucp>
Date: Thu, 6-Jun-85 06:46:19 EDT
Article-I.D.: sun.2275
Posted: Thu Jun 6 06:46:19 1985
Date-Received: Sat, 8-Jun-85 02:10:53 EDT
References: <344@osu-eddie.UUCP> <600@intelca.UUCP>
Organization: Sun Microsystems, Inc.
Lines: 72

Ken Shoemaker says:
> I still don't understand why, since Mot gives you separate address
> and data busses, they don't use them better, i.e., present early
> addresses to memory systems that could use them to an advantage.
> This really does allow faster operation with slower memories at the
> cost of more pins on your package.  For what it's worth, it seems to
> me that Mot is wasting money providing separate address/data pins
> with the utilization that they provide (unless they are not pad
> limited on their die, and their yields are such that package costs
> are insignificant).  I mean, all those extra drivers do take up die
> space, and those extra pads could mean that the chip is not as small
> or as cheap as it could be....

Well, I was just reading a trade rag that quoted Intel and AMD as
having REDUCED the price of the 80186 by 50% to $15-20 for 25K.  We
got quotes of about $10 for 10MHz 68000's in quantity last month.
All those pins are really driving up the price...

One advantage of the 680x0 approach is that you don't have to
surround your CPU with glue to latch the addresses.  You can just
wire address pins straight to where they're going and they stay good
for the entire cycle.  I agree that there might be potential for
speed improvement here, so just think -- in a few years when the
68020 seems like a slow machine, they'll have a few more tricks they
can pull.
Here's some detail on memory cycle and address-to-data times for 68Ks:

	Part       ClkCyc  Clk/Mem  MemCyc  Addr->data
	68000L4     250ns     4     1000ns     630ns
	68000L10    100ns     4      400ns     230ns
	68010L10    100ns     4      400ns     235ns
	68000L12     80ns     4      320ns     175ns
	68010L12     80ns     4      320ns     175ns
	68020R12     80ns     3      240ns     150ns
	68020R16     60ns     3      180ns     115ns

Note that the 68000L4 was the first to be announced and the 68020R16
is the last to be announced.  There's a factor of 5 between the two
just in bus cycle times.  [I don't think you can buy 68000L4 anymore;
just about any die that runs at 4MHz also runs at 8 or more...]

> I believe Mot uses a two-level microcode in the 68k and its
> follow-ons... (can someone verify this?)  Does anyone have any idea
> what this means to its performance (with respect to jumps and having
> to fill up the instruction queue)?  Do they take two clocks to do a
> complete microcode lookup (the first to the first level, the second
> to the second level)?

The 68000, 68008, and 68010 have the same two-level ucode.  There is
no jump penalty, though.  Basically they got tricky and noticed that
if they just made the microwords 197 bits wide it would take up a lot
of chip area.  Instead, they figured out which bits really HAD to be
different for each microinstruction, and which bits might occur in
combinations that would occur more than once in the microcode.  It
turned out that they needed 544 different 17-bit microinstructions to
implement the 68000, but by sharing they only needed 336 180-bit
nanoinstructions.

There is no "pointer" from the microcode to the nanocode; they are
both addressed with the same address (the micro-PC).  The trick is
that the nanorom is decoded funny and a single row can respond to
multiple addresses.  These addresses have to be only a few bits
different from each other, so they had to be careful about where each
microinstruction went in the ROMs.  You can read all about it in US
Patent #4,325,121 by Tom Gunter and Harry "Nick" Tredennick.
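[The ROM savings John describes can be checked with back-of-envelope
arithmetic.  The instruction counts and widths are his; the comparison
against a hypothetical flat 197-bit-wide control ROM is his framing,
sketched here in Python for illustration:]

```python
# Flat scheme: one wide 197-bit control word per microinstruction.
flat_bits  = 544 * 197            # 107,168 bits of ROM

# Two-level scheme: narrow microwords plus shared wide nanowords.
micro_bits = 544 * 17             # per-instruction microwords
nano_bits  = 336 * 180            # shared nanowords (rows alias addresses)
split_bits = micro_bits + nano_bits

assert flat_bits == 107168
assert split_bits == 69728        # roughly a 35% reduction in ROM bits
```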
I heard that most of the effort in making the 68010 was a large microcode rewrite; the rest of the chip was reputedly similar to a 68000. The patent should be out by now but I haven't tracked it down. I don't know what the microcode for 68020 looks like. Got any similar tricks up your sleeve for the 386, Ken?
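[As a sanity check, the timing table in John's article above is
internally consistent: MemCyc is always ClkCyc times the number of
clocks per memory cycle.  A small Python sketch, added for
illustration with the table's own numbers:]

```python
# (clk_ns, clks_per_mem_cycle, memcyc_ns) per part, from the table.
parts = {
    "68000L4":  (250, 4, 1000),
    "68000L10": (100, 4, 400),
    "68010L10": (100, 4, 400),
    "68000L12": (80,  4, 320),
    "68010L12": (80,  4, 320),
    "68020R12": (80,  3, 240),
    "68020R16": (60,  3, 180),
}
for name, (clk, n, memcyc) in parts.items():
    assert clk * n == memcyc, name

# The "factor of 5" between first and last parts is really ~5.6:
assert round(1000 / 180, 1) == 5.6
```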
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!henry
From: he...@utzoo.UUCP (Henry Spencer)
Newsgroups: net.micro.68k,net.micro.pc
Subject: Re: all sorts of things + Intel processors vs. Motorola processors
Message-ID: <5674@utzoo.UUCP>
Date: Thu, 6-Jun-85 12:31:34 EDT
Article-I.D.: utzoo.5674
Posted: Thu Jun 6 12:31:34 1985
Date-Received: Thu, 6-Jun-85 12:31:34 EDT
References: <344@osu-eddie.UUCP>, <600@intelca.UUCP>
Organization: U of Toronto Zoology
Lines: 18

> ... I believe Mot uses a two-level microcode in the 68k and its
> follow-ons... (can someone verify this?)

Yup, that's correct.  Don't know how it affects the speed.  Remember
that microcode fetch time normally is pipelined out, since the
execution of microcode is predictable (barring bizarre architectures
with micro-interrupts) and techniques like delayed branches are
routine.

> ...this was one reason Zilog presented way back when for the
> performance of the Z8000 (they said it was good).

It's also one reason why the (hardwired) Z8000 took a lot longer to
debug than the (microcoded) 68000.  I believe Zilog has since
admitted that not using microprogramming was a mistake.
-- 
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/13/84; site intelca.UUCP
Path: utzoo!utcs!lsuc!pesnta!amd!intelca!kds
From: k...@intelca.UUCP (Ken Shoemaker)
Newsgroups: net.micro.68k,net.micro.pc,net.arch
Subject: Re: Re: x86/68x buses ; two-level microcode
Message-ID: <611@intelca.UUCP>
Date: Tue, 11-Jun-85 19:08:02 EDT
Article-I.D.: intelca.611
Posted: Tue Jun 11 19:08:02 1985
Date-Received: Wed, 12-Jun-85 10:26:18 EDT
References: <344@osu-eddie.UUCP> <600@intelca.UUCP> <2275@sun.uucp>
Organization: Intel, Santa Clara, Ca.
Lines: 55

> Well, I was just reading a trade rag that quoted Intel and AMD as
> having REDUCED the price of the 80186 by 50% to $15-20 for 25K.  We
> got quotes of about $10 for 10MHz 68000's in quantity last month.
> All those pins are really driving up the price...

This is comparing apples and oranges, for two reasons: the first is
the date of introduction of the two products, and the second is that
price has little to do with cost.  But think about it: how can a
64-pin package ever be cheaper than a 48-pin package?  It takes more
material, for sure, but in addition to that, it requires more board
space, and a tester for the device would require additional lines for
the extra pins (which usually means a more expensive tester).

> One advantage of the 680x0 approach is that you don't have to
> surround your CPU with glue to latch the addresses.  You can just
> wire address pins straight to where they're going and they stay good
> for the entire cycle.  I agree that there might be potential for
> speed improvement here, so just think -- in a few years when the
> 68020 seems like a slow machine, they'll have a few more tricks they
> can pull.

Sure, you are going to drive 2MBytes of static RAMs (or ROMs?)
directly off the pins of the processor?  Surely you need an address
buffer in there somewhere, or are those not considered "glue"?
> Here's some detail on memory cycle and address-to-data times for 68Ks:
>
>	Part       ClkCyc  Clk/Mem  MemCyc  Addr->data
>	68000L4     250ns     4     1000ns     630ns
>	68000L10    100ns     4      400ns     230ns
>	68010L10    100ns     4      400ns     235ns
>	68000L12     80ns     4      320ns     175ns
>	68010L12     80ns     4      320ns     175ns
>	68020R12     80ns     3      240ns     150ns
>	68020R16     60ns     3      180ns     115ns
>
> Note that the 68000L4 was the first to be announced and the 68020R16
> is the last to be announced.  There's a factor of 5 between the two
> just in bus cycle times.  [I don't think you can buy 68000L4
> anymore; just about any die that runs at 4MHz also runs at 8 or
> more...]

Is a factor of 5 good news?  Just think, your whole memory system has
to be sped up by 5 times!  Memory designers may be good, but they
aren't THAT good!  If Mot had gone with pipelined address/data on the
68020R16, I'd guess that their memory access times (addr->data) would
go from 115ns to 170ns.  However, they may use pipelining internally
to access their cache, so they can never allow this extra margin for
system designers (does anyone know if this is true?).
-- 
It looks so easy, but looks sometimes deceive...

Ken Shoemaker, 386 Design Team, Intel, Santa Clara, Ca.
{pur-ee,hplabs,amd,scgvaxd,dual,qantel}!intelca!kds

---the above views are personal.  They may not represent those of Intel.
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84 SMI; site sun.uucp
Path: utzoo!linus!decvax!decwrl!sun!gnu
From: g...@sun.uucp (John Gilmore)
Newsgroups: net.micro.68k,net.micro.pc,net.arch
Subject: Re: x86/68x buses
Message-ID: <2306@sun.uucp>
Date: Sat, 15-Jun-85 05:13:57 EDT
Article-I.D.: sun.2306
Posted: Sat Jun 15 05:13:57 1985
Date-Received: Tue, 18-Jun-85 03:39:35 EDT
References: <344@osu-eddie.UUCP> <600@intelca.UUCP> <2275@sun.uucp> <611@intelca.UUCP>
Organization: Sun Microsystems, Inc.
Lines: 27

From Ken Shoemaker, 386 Design Team, Intel, Santa Clara, Ca:
> If Mot had gone with pipelined address/data on the 68020R16, I'd
> guess that their memory access times (addr->data) would go from
> 115ns to 170ns.  However, they may use pipelining internally to
> access their cache, so they can never allow this extra margin for
> system designers (does anyone know if this is true?).

I think the 68020 drives the address of prefetches (if there's not
already a cycle on the bus) but will not assert address strobe if it
hits the cache.  AS doesn't come out until the addresses are stable
anyway, so the cache lookup is overlapped with the address driver
propagation delay (and setup time on whoever's receiving the
addresses).  Serious MMUs start to translate the address before AS
anyway, so it actually helps to not have to latch the address, since
as fast as the CPU can drive it, the MMU can start looking it up,
rather than having it sit on the wrong side of a latch until a strobe
comes out.

In a 180ns memory cycle it's VERY hard (both for CPU and for memory
subsystem) to run with Ken's proposed 170ns addr->data times.  It's
clear that the 68020 can access memory faster than dynamic RAM can
respond.  There are plenty of solutions developed for mainframes
(which have had the same problem for a long time); the on-chip
instruction cache is one of them.  Ken's overlapping technique may be
one that the 68020 design precludes.
Got any stats on how many 286 designs use the technique, and how much time is really saved (e.g. is addr->data really the bottleneck)?
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/13/84; site intelca.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxt!houxm!ihnp4!pesnta!amd!intelca!kds
From: k...@intelca.UUCP (Ken Shoemaker)
Newsgroups: net.micro.68k,net.arch
Subject: Re: Re: x86/68x buses
Message-ID: <9@intelca.UUCP>
Date: Mon, 1-Jul-85 14:05:08 EDT
Article-I.D.: intelca.9
Posted: Mon Jul 1 14:05:08 1985
Date-Received: Thu, 4-Jul-85 00:28:16 EDT
References: <344@osu-eddie.UUCP> <600@intelca.UUCP> <2275@sun.uucp> <611@intelca.UUCP> <2306@sun.uucp>
Organization: Intel, Santa Clara, Ca.
Lines: 50

> I think the 68020 drives the address of prefetches (if there's not
> already a cycle on the bus) but will not assert address strobe if it
> hits the cache.  AS doesn't come out until the addresses are stable
> anyway, so the cache lookup is overlapped with the address driver
> propagation delay (and setup time on whoever's receiving the
> addresses).  Serious MMUs start to translate the address before AS
> anyway, so it actually helps to not have to latch the address...

Regardless of the timing of the address strobe, you really can't
start looking up addresses for either your cache or your MMU until
the addresses are guaranteed stable on the address bus, or have I
missed some major advance in non-deterministic logic?

> In a 180ns memory cycle it's VERY hard (both for CPU and for memory
> subsystem) to run with Ken's proposed 170ns addr->data times.  It's
> clear that the 68020 can access memory faster than dynamic RAM can
> respond.  There are plenty of solutions developed for mainframes
> (which have had the same problem for a long time); the on-chip
> instruction cache is one of them.

I agree that it is not the easiest thing in the world to build a
170ns memory system.  I would think that it is obvious that it is
even more difficult to build a system that allows only 115ns to do
the same thing...
> Ken's overlapping technique may be one that the 68020 design
> precludes.  Got any stats on how many 286 designs use the technique,
> and how much time is really saved (e.g. is addr->data really the
> bottleneck)?

I don't know how many 286 designs actually use the technique of
running two memory cycles at the same time (although the 8207 DRAM
controller supports doing this), but my main point was that by
providing addresses earlier in a bus cycle (i.e., before the bus
cycle even begins!) you gain the address bus drive time (from the
CPU) in the address-to-data time, since once in the bus cycle, you
don't have to wait for the CPU to drive the capacitive loads on the
address pins.  Although addr->data may not be the ONLY bottleneck in
a system, it is a very significant one, and by providing more of it
while running the bus at the same speed you can't help but get a
faster system, since the alternative is to add wait states.
-- 
...and I'm sure it wouldn't interest anybody outside of a small
circle of friends...

Ken Shoemaker, Microprocessor Design for a large, Silicon Valley firm
{pur-ee,hplabs,amd,scgvaxd,dual,qantel}!intelca!kds

---the above views are personal.  They may not represent those of the
employer of its submitter.