Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/5/84; site osu-eddie.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!ihnp4!cbosgd!apr!osu-eddie!elwell
From: elw...@osu-eddie.UUCP (Clayton M. Elwell)
Newsgroups: net.micro.68k,net.micro.pc
Subject: Intel processors vs. Motorola processors
Message-ID: <344@osu-eddie.UUCP>
Date: Fri, 31-May-85 09:56:44 EDT
Article-I.D.: osu-eddi.344
Posted: Fri May 31 09:56:44 1985
Date-Received: Sat, 1-Jun-85 02:41:11 EDT
Distribution: net
Organization: Ohio State Univ., CIS Dept., Cols, Oh.
Lines: 57

I have done extensive programming in C and assembly language on both the
8086 family and the 68000 family.  I have found that in all respects, the
68000 is much easier to use and allows me to write faster code.  Speed is
important to me, because a lot of what I write is graphics and screen
handling.

(Note: I will use '8086' and '68000' in a generic fashion, i.e. referring to
 the entire families in question)

    o Register complement

      Although occasionally the 68K address/data register distinction can
      be annoying, it's nothing compared to the hassle of the 8086.
      ``General purpose registers? Why would we want any of those?''
      ``Put a full address into a register? Nobody ever does that!''
      Argh.  Registers have one big advantage for both assembly language
      and compiled code: SPEED.

      It's also annoying to have to push/pop/exchange/etc. just to move
      things around so you can execute a SHIFT instruction (to pick an
      example out of a hat).  Orthogonality isn't just for the benefit
      of the hardware designers...

    o Address space and treatment

      I realize this is a religious issue, but I'll risk it anyway.
      First, real programs DO use more than 64K of code and data.  Some
      even want it all accessible at once.  Take text editors, formatters,
      spreadsheets, graphics systems, compilers [you know, the stuff no
      one ever actually uses :-)].  I'm sorry, segment registers do not
      constitute an 'advanced segmented architecture'.  If you want
      position-independent code, supply a PC-relative addressing mode.
If you want memory management, use a real MMU.  A smart bank switch
(oops, I mean segment register) is only useful when you want
to use 8080/Z80-style code at any offset in your block of memory.

I give the 8086 one thing:  It is far better than a Z80, and makes
it real easy to port CP/M software without thinking too hard.  I
don't consider this to be useful anymore.

    o Memory speed and resource usage

      This one is a tie as far as I can tell.  Both use a 4-clock memory
      cycle, both have prefetch queues, the bigger ones (286 & 020) both
      have a cache (although the 68020 has a 3-clock memory cycle as I
      understand it (though I may be wrong), and the 286 has a 4-clock
      cycle (from the 286 hardware reference manual crouching near my left
      elbow)).  From a pure bus speed standpoint, it seems to be a tie.


On the whole, I have found it a much less stressful task to write for the
68000.  When using C, the compiler can generate good enough code that I
don't have to resort to assembly language as often, and when I do, it's MUCH
more straightforward.  Programming should not always be an adventure in
processor peculiarities.

				-- Clayton Elwell

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/13/84; site intelca.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxt!houxm!ihnp4!pesnta!amd!
intelca!kds
From: k...@intelca.UUCP (Ken Shoemaker)
Newsgroups: net.micro.68k,net.micro.pc
Subject: Re: all sorts of things + Intel processors vs. Motorola processors
Message-ID: <600@intelca.UUCP>
Date: Wed, 5-Jun-85 16:45:34 EDT
Article-I.D.: intelca.600
Posted: Wed Jun  5 16:45:34 1985
Date-Received: Thu, 6-Jun-85 06:29:22 EDT
References: <344@osu-eddie.UUCP>
Distribution: net
Organization: Intel, Santa Clara, Ca.
Lines: 66

>     o Memory speed and resource usage
> 
>       This one is a tie as far as I can tell.  Both use a 4-clock memory
>       cycle, both have prefetch queues, the bigger ones (286 & 020) both
>       have a cache (although the 68020 has a 3-clock memory cycle as I
>       understand it (though I may be wrong), and the 286 has a 4-clock
>       cycle (from the 286 hardware reference manual crouching near my left
>       elbow)).  From a pure bus speed standpoint, it seems to be a tie.
> 
A little confusion, I'm afraid.  The 286 accepts a 2X clock, and the
clock speed from which things are drawn is a divide by 2 of that clock.
So, the 286 really has a 2 clock bus cycle.  Of course, the other
option is that what we call a 12MHz 286 is what you would call a 24MHz
286.  Also, with pipelined address/data, the 286 provides more generous
access times, even if you don't use interleaved memories, since the
address output delays from clock on MOS devices aren't nearly as fast
as those of an F or AS latch.  Also, this means that you can pre-select
memories before a cycle, or whatever.  This kinda falls into the
AT vs Z150 battle, too.  If you believe that most modern microprocessors
are bus limited to some extent, then performance is closely tied
to the bus bandwidth of the processor.  Even with 1 wait state, the
286 has a 3 clock bus as opposed to a 4 clock bus of the 8088.
Thus, at 6MHz, the 286 with 1 wait state has a maximum bus bandwidth
of 4MBytes/second (=6MHz/3 * 2) while at 8MHz, an 8088 with 0 wait
states has a maximum bus bandwidth of 2MBytes/second (=8MHz/4).
Even for byte reads, the latency time from the bus for the 286 with
one wait state is the same as the 8088 with no wait states.  Besides
all this, the 286 does execute instructions inside the chip faster
than the 8088, so you are going to have to look a little farther
than just a processor(286)/processor(88) comparison for an explanation
of your results.

The nature of the 286 bus as opposed to the 8088 bus also follows
through to the 286 bus vs the 68{000,010,020} busses: I still don't
understand why, since Mot gives you separate address and data busses,
they don't use them better, i.e., present early addresses to
memory systems that could use them to an advantage.  This really
does allow faster operation with slower memories at the cost of
more pins on your package.  For what it's worth, it seems to me
that Mot is wasting money providing separate address/data pins
with the utilization that they provide (unless they are not
pad limited on their die, and their yields are such that package
costs are insignificant).  I mean, all those extra drivers do take
up die space, and those extra pads could mean that the chip is not
as small or as cheap as it could be....

Finally, I have another question to pose to the net.  I believe
Mot uses a two level microcode in the 68k and its followons...
(can someone verify this?)  Does anyone have any idea what this
means to its performance (with respect to jumps and having to fill
up the instruction queue)?  Do they take two clocks to do a complete
microcode lookup (the first to the first level, the second to the
second level)?  RISCs that are out there have NO microcode, and
present this as one reason for their faster performance.  Also,
this was one reason Zilog presented way back when for the performance
of the Z8000 (they said it was good).  If you think about it, it
does make sense, since you have to wait at least 1 clock to
go through your microcode lookup.  Any thoughts?

-- 
It looks so easy, but looks sometimes deceive...

Ken Shoemaker, 386 Design Team, Intel, Santa Clara, Ca.
{pur-ee,hplabs,amd,scgvaxd,dual,qantel}!intelca!kds
	
---the above views are personal.  They may not represent those of Intel.

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84 SMI; site sun.uucp
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!genrad!decvax!
decwrl!sun!gnu
From: g...@sun.uucp (John Gilmore)
Newsgroups: net.micro.68k,net.micro.pc,net.arch
Subject: Re: x86/68x buses ; two-level microcode
Message-ID: <2275@sun.uucp>
Date: Thu, 6-Jun-85 06:46:19 EDT
Article-I.D.: sun.2275
Posted: Thu Jun  6 06:46:19 1985
Date-Received: Sat, 8-Jun-85 02:10:53 EDT
References: <344@osu-eddie.UUCP> <600@intelca.UUCP>
Organization: Sun Microsystems, Inc.
Lines: 72

> Ken Shoemaker says:                                   I still don't
> understand why since Mot gives you separate address and data busses
> that they don't use them better, i.e., present early addresses to
> memory systems that could use them to an advantage.  This really
> does allow faster operation with slower memories at the cost of
> more pins on your package.  For what it's worth, it seems to me
> that Mot is wasting money providing separate address/data pins
> with the utilization that they provide (unless they are not 
> pad limited on their die, and their yields are such that package
> costs are insignificant).  I mean, all those extra drivers do take
> up die space, and those extra pads could mean that the chip is not
> as small or as cheap as it could be....

Well, I was just reading a trade rag that quoted Intel and AMD as
having REDUCED the price of the 80186 by 50% to $15-20 for 25K.  We got
quotes of about $10 for 10MHz 68000's in quantity last month.
All those pins are really driving up the price...

One advantage of the 680x0 approach is that you don't have to surround
your CPU with glue to latch the addresses.  You can just wire address
pins straight to where they're going and they stay good for the entire
cycle.  I agree that there might be potential for speed improvement
here, so just think -- in a few years when the 68020 seems like a slow
machine, they'll have a few more tricks they can pull.

Here's some detail on memory cycle and address-to-data times for 68Ks:

	Part		ClkCyc	Clk/Mem	MemCyc	Addr->data
	68000L4	 	250ns	4      1000ns	630ns
	68000L10	100ns	4	400ns	230ns
	68010L10	100ns	4	400ns	235ns
	68000L12	 80ns	4	320ns	175ns
	68010L12	 80ns	4	320ns	175ns
	68020R12	 80ns	3	240ns	150ns
	68020R16	 60ns	3	180ns	115ns

Note that the 68000L4 was the first to be announced and the 68020R16
is the last to be announced.  There's a factor of 5 between the two
just in bus cycle times.  [I don't think you can buy 68000L4 anymore;
just about any die that runs at 4MHz also runs at 8 or more...]

>                                                       I believe
> Mot uses a two level microcode in the 68k and its followons...
> (can someone verify this?)  Does anyone have any idea what this
> means to its performance (with respect to jumps and having to fill
> up the instruction queue).  Do they take two clocks to do a complete
> microcode lookup (the first to the first level, the second to the
> second level)?

The 68000, 68008, and 68010 have the same two-level ucode.  There is no
jump penalty though.  Basically they got tricky and noticed that if
they just made the microwords 197 bits wide, it would take up a lot
of chip area.  Instead, they figured out which bits really HAD to be
different for each microinstruction, and which bits might occur in
combinations that recur more than once in the microcode.  It
turned out that they needed 544 different 17-bit microinstructions to
implement the 68000, but by sharing they only needed 336 180-bit nano
instructions.  There is no "pointer" from the microcode to the
nanocode; they are both addressed with the same address (the
micro-PC).  The trick is that the nanorom is decoded funny and a single
row can respond to multiple addresses.  These addresses have to be only
a few bits different from each other, so they had to be careful about
where each microinstruction went in the ROMs.  You can read all about
it in US Patent #4,325,121 by Tom Gunter and Harry "Nick" Tredennick.

I heard that most of the effort in making the 68010 was a large microcode
rewrite; the rest of the chip was reputedly similar to a 68000.  The
patent should be out by now but I haven't tracked it down.

I don't know what the microcode for 68020 looks like.

Got any similar tricks up your sleeve for the 386, Ken?

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!henry
From: he...@utzoo.UUCP (Henry Spencer)
Newsgroups: net.micro.68k,net.micro.pc
Subject: Re: all sorts of things + Intel processors vs. Motorola processors
Message-ID: <5674@utzoo.UUCP>
Date: Thu, 6-Jun-85 12:31:34 EDT
Article-I.D.: utzoo.5674
Posted: Thu Jun  6 12:31:34 1985
Date-Received: Thu, 6-Jun-85 12:31:34 EDT
References: <344@osu-eddie.UUCP>, <600@intelca.UUCP>
Organization: U of Toronto Zoology
Lines: 18

> ...  I believe
> Mot uses a two level microcode in the 68k and its followons...
> (can someone verify this?)

Yup, that's correct.  Don't know how it affects the speed.  Remember that
microcode fetch time normally is pipelined out, since the execution of
microcode is predictable (barring bizarre architectures with micro-
interrupts) and techniques like delayed branches are routine.

> ...this was one reason Zilog presented way back when for the performance
> of the Z8000 (they said it was good).

It's also one reason why the (hardwired) Z8000 took a lot longer to debug
than the (microcoded) 68000.  I believe Zilog has since admitted that not
using microprogramming was a mistake.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/13/84; site intelca.UUCP
Path: utzoo!utcs!lsuc!pesnta!amd!intelca!kds
From: k...@intelca.UUCP (Ken Shoemaker)
Newsgroups: net.micro.68k,net.micro.pc,net.arch
Subject: Re: Re: x86/68x buses ; two-level microcode
Message-ID: <611@intelca.UUCP>
Date: Tue, 11-Jun-85 19:08:02 EDT
Article-I.D.: intelca.611
Posted: Tue Jun 11 19:08:02 1985
Date-Received: Wed, 12-Jun-85 10:26:18 EDT
References: <344@osu-eddie.UUCP> <600@intelca.UUCP> <2275@sun.uucp>
Organization: Intel, Santa Clara, Ca.
Lines: 55

> Well, I was just reading a trade rag that quoted Intel and AMD as
> having REDUCED the price of the 80186 by 50% to $15-20 for 25K.  We got
> quotes of about $10 for 10MHz 68000's in quantity last month.
> All those pins are really driving up the price...

This is comparing apples and oranges, for two reasons: the first is
date of introduction of the two products, and the second is that price
has little to do with cost.  But think about it, how can a 64 pin
package ever be cheaper than a 48 pin package?  It takes more material,
for sure, but in addition to that, it requires more board space and
a tester for the device would require additional lines for the extra
pins (which usually means a more expensive tester).
> 
> One advantage of the 680x0 approach is that you don't have to surround
> your CPU with glue to latch the addresses.  You can just wire address
> pins straight to where they're going and they stay good for the entire
> cycle.  I agree that there might be potential for speed improvement
> here, so just think -- in a few years when the 68020 seems like a slow
> machine, they'll have a few more tricks they can pull.

Sure, you are going to drive 2Mbytes of static RAMs (or ROMs?) directly
off the pins of the processor?  Surely you need an address buffer in there
somewhere, or are those not considered "glue"?

> 
> Here's some detail on memory cycle and address-to-data times for 68Ks:
> 
> 	Part		ClkCyc	Clk/Mem	MemCyc	Addr->data
> 	68000L4	 	250ns	4      1000ns	630ns
> 	68000L10	100ns	4	400ns	230ns
> 	68010L10	100ns	4	400ns	235ns
> 	68000L12	 80ns	4	320ns	175ns
> 	68010L12	 80ns	4	320ns	175ns
> 	68020R12	 80ns	3	240ns	150ns
> 	68020R16	 60ns	3	180ns	115ns
> 
> Note that the 68000L4 was the first to be announced and the 68020R16
> is the last to be announced.  There's a factor of 5 between the two
> just in bus cycle times.  [I don't think you can buy 68000L4 anymore;
> just about any die that runs at 4MHz also runs at 8 or more...]

Is a factor of 5 good news?  Just think, your whole memory system has to
be sped up by 5 times!  Memory designers may be good, but they aren't THAT
good!  If Mot had gone with pipelined address/data on the 68020R16,
I'd guess that their memory access times (addr->data) would go from
115ns to 170ns.  However, they may use pipelining internally to access
their cache, so they can never allow this extra margin for system 
designers (does anyone know if this is true?).
-- 
It looks so easy, but looks sometimes deceive...

Ken Shoemaker, 386 Design Team, Intel, Santa Clara, Ca.
{pur-ee,hplabs,amd,scgvaxd,dual,qantel}!intelca!kds
	
---the above views are personal.  They may not represent those of Intel.

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84 SMI; site sun.uucp
Path: utzoo!linus!decvax!decwrl!sun!gnu
From: g...@sun.uucp (John Gilmore)
Newsgroups: net.micro.68k,net.micro.pc,net.arch
Subject: Re: x86/68x buses
Message-ID: <2306@sun.uucp>
Date: Sat, 15-Jun-85 05:13:57 EDT
Article-I.D.: sun.2306
Posted: Sat Jun 15 05:13:57 1985
Date-Received: Tue, 18-Jun-85 03:39:35 EDT
References: <344@osu-eddie.UUCP> <600@intelca.UUCP> <2275@sun.uucp> 
<611@intelca.UUCP>
Organization: Sun Microsystems, Inc.
Lines: 27

> From Ken Shoemaker, 386 Design Team, Intel, Santa Clara, Ca:
> If Mot had gone with pipelined address/data on the 68020R16,
> I'd guess that their memory access times (addr->data) would go from
> 115ns to 170ns.  However, they may use pipelining internally to access
> their cache, so they can never allow this extra margin for system 
> designers (does anyone know if this is true?).

I think the 68020 drives the address of prefetches (if there's not
already a cycle on the bus) but will not assert address strobe if it
hits the cache.  AS doesn't come out until the addresses are stable
anyway, so the cache lookup is overlapped with the address driver
propagation delay (and setup time on whoever's receiving the
addresses).  Serious MMUs start to translate the address before AS
anyway, so it actually helps to not have to latch the address, since
as fast as the CPU can drive it, the MMU can start looking it up, rather
than having it sit on the wrong side of a latch until a strobe comes out.

In a 180ns memory cycle it's VERY hard (both for CPU and for memory
subsystem) to run with Ken's proposed 170ns addr->data times.  It's
clear that the 68020 can access memory faster than dynamic ram can
respond.  There are plenty of solutions developed for mainframes (which
have had the same problem for a long time); the on-chip instruction
cache is one of them.  Ken's overlapping technique may be one that the
68020 design precludes.  Got any stats on how many 286 designs use the
technique, and how much time is really saved (e.g. is addr->data really
the bottleneck)?

Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/13/84; site intelca.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxt!houxm!ihnp4!pesnta!
amd!intelca!kds
From: k...@intelca.UUCP (Ken Shoemaker)
Newsgroups: net.micro.68k,net.arch
Subject: Re: Re: x86/68x buses
Message-ID: <9@intelca.UUCP>
Date: Mon, 1-Jul-85 14:05:08 EDT
Article-I.D.: intelca.9
Posted: Mon Jul  1 14:05:08 1985
Date-Received: Thu, 4-Jul-85 00:28:16 EDT
References: <344@osu-eddie.UUCP> <600@intelca.UUCP> <2275@sun.uucp> 
<611@intelca.UUCP> <2306@sun.uucp>
Organization: Intel, Santa Clara, Ca.
Lines: 50

> I think the 68020 drives the address of prefetches (if there's not
> already a cycle on the bus) but will not assert address strobe if it
> hits the cache.  AS doesn't come out until the addresses are stable
> anyway, so the cache lookup is overlapped with the address driver
> propagation delay (and setup time on whoever's receiving the
> addresses).  Serious MMUs start to translate the address before AS
> anyway, so it actually helps to not have to latch the address, since

Regardless of the timing of the address strobe, you really can't start
looking up addresses for either your cache or your MMU until the
addresses are guaranteed stable on the address bus, or have I missed
some major advance in non-deterministic logic?

> In a 180ns memory cycle it's VERY hard (both for CPU and for memory
> subsystem) to run with Ken's proposed 170ns addr->data times.  It's
> clear that the 68020 can access memory faster than dynamic ram can
> respond.  There are plenty of solutions developed for mainframes (which

I agree that it is not the easiest thing in the world to build a
170ns memory system.  I would think that it is obvious that it is even
more difficult to build a system that allowed only 115ns to do the
same thing...

> have had the same problem for a long time); the on-chip instruction
> cache is one of them.  Ken's overlapping technique may be one that the
> 68020 design precludes.  Got any stats on how many 286 designs use the
> technique, and how much time is really saved (e.g. is addr->data really
> the bottleneck)?

I don't know how many 286 designs actually use the technique of running
two memory cycles at the same time (although the 8207 DRAM controller
supports doing this), but my main point was that by providing addresses
earlier in a bus cycle (i.e., before the bus cycle even begins!)
you gain the address bus drive time (from the CPU) in the address
to data time, since once in the bus cycle, you don't have to wait
for the CPU to drive the capacitive loads on the address pins.  Although
addr->data may not be the ONLY bottleneck in a system, it is a very
significant one, and by providing more of it while running the bus at
the same speed you can't help but get a faster system, since the
alternative is to add wait states.
-- 
...and I'm sure it wouldn't interest anybody outside of a small circle
of friends...

Ken Shoemaker, Microprocessor Design for a large, Silicon Valley firm

{pur-ee,hplabs,amd,scgvaxd,dual,qantel}!intelca!kds
	
---the above views are personal.  They may not represent those of the
	employer of its submitter.