Z80 vs. 6502


BillyHW

In the case of the C128, doesn't all I/O go through the 8502 even in Z80 mode, so you'd effectively swap CPUs every time you want something on screen, sound, input or output? I figure such a benchmark would put the Z80 to shame, in particular as the 8502 natively runs at 2 MHz anyway.

... And consider also that Commodore engineers halved the speed of the Z80, which in practice runs at the very slow speed of 2 MHz, because of the sloppy implementation of the bus DMA and the "not so smart" design that hard-divides RAM access time between the CPU's slot and other slots reserved for devices like the VIC-II.

So the 8502 had to give up the ability to use the VIC-II at 2 MHz. Practically, the VIC-II/6502 @ 1 MHz design is so flawed that a higher speed is not possible unless you use faster RAM.

Practically, this is the main limit: the CPU speed is bound to the speed of other devices like the VIC-II. When you need more speed you are forced to maintain this relationship, but obviously a VIC-II is not a CPU where you can choose whatever clock you want, because video timing is standardized.

So the engineers were forced to disable the VIC at full speed and "turn" the architecture into a more common scheme where VRAM is external to the CPU, as in most Z80 / 6502 systems of that era.

To be honest, the Z80 in the C128 could have been interfaced a bit more smartly, reaching an effective speed of 3 MHz, but Commodore engineers only wanted to achieve CP/M compatibility, not performance...


Some of the OS routines are executed by the 8502 in CP/M mode with the Z-80 put to sleep at the time.
More a ROM/development saving measure than anything I would think.

 

What hurts the performance of the Z80 in the 128 most is that it shares the 2 MHz speed of the 8502 - by the mid 80s typical Z80 speeds were 3.58 MHz or better.

A few Z80 machines like the Lobo Max 80 ran at 5MHz. That was introduced in 1982.

I know many CP/M machines ran at 4MHz earlier than that.

All these were more business oriented than budget or game oriented but it's clear 2MHz was really slow by the time the C128 came out.

Budget machines like the Spectrum and VZ/Laser ran in the 3.5 MHz range.

 

*edit*

I'm sure I mentioned it before but the Hitachi HD64180 came out in the mid 80s.

Accelerators based on it were advertised for the TRS-80 and some CP/M CPU boards used the HD64180.

A few MSX machines included them as a 2nd CPU you could switch to for greater speed with compatible apps.

As I've said before, the HD64180 was about 20% faster when in native mode.

It also allowed smaller compiler output due to the extra instructions.

 

If you are going to start looking at machines after 1984, a plain old Z80 isn't the fastest Z80 compatible CPU available.

The later you go, the faster Z80 compatible CPUs become because of the switch from a 4 bit ALU to an 8 bit and then 16 bit.

Several manufacturers also used a pipelined architecture which drastically cut the number of clock cycles required per instruction.

The 6502 couldn't really make such gains in speed due to the way it's designed.

Any comparison of 6502 vs Z80 pretty much should be pre 1985 or it starts to lose meaning.

Edited by JamesD

Fair enough on the comparison - but with 6502 you have the 65C02 which can be faster thanks to the extra instructions. And 65816 which is almost a new architecture, but upgrades for 6502 machines using it are rare, so it's not really fair to include it.

 

My line of thinking is just stock 6502 as per C64/Atari/BBC vs early 80s Z80 as per Spectrum/Amstrad.

I agree, a Z80 clocked @ 2 MHz is uncommon. What most people do not understand about the 6502 and the Z80 is that it is *normal* for a Z80 to be clocked @ 3 MHz, just as it is normal for a 6502 to be clocked @ 1 MHz. The reason is simply an architectural choice made by the engineers; it is no accident that the 6502 has a two-phase clock while the Z80 has only one clock source. To perform all the micro-steps involved in executing a machine instruction you can take two approaches:

- use a faster clock and divide it further ( the Z80 approach )

- use a two-phase clock as it is ( the 6502 approach )

This *HAS* nothing to do with PURE COMPUTATIONAL POWER, it's only a matter of architecture.

As proof of this, consider that a Z80 @ 3 MHz can basically work with the same crappy-old-slow-as-a-dead-snail DRAM chips that can be used with a 6502. So there is no really big difference in terms of memory access speed.
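
A rough way to see the memory-speed point is to convert clock rates into the time one memory access occupies on the bus. The sketch below assumes the usual textbook figures (a Z80 opcode fetch takes 4 clocks, a plain Z80 memory read or write takes 3, and a 6502 makes one access per clock); the function name `access_ns` is made up for illustration, and the real DRAM access window is narrower than the full bus cycle on both chips.

```python
# Rough bus-cycle arithmetic. Clock rates are the ones mentioned in the post;
# clocks-per-access are the usual datasheet figures, not measurements.

def access_ns(clock_mhz: float, clocks_per_access: int) -> float:
    """Nanoseconds that one memory access occupies on the bus."""
    return clocks_per_access * 1000.0 / clock_mhz

z80_fetch_ns = access_ns(3.0, 4)    # Z80 opcode fetch (M1 cycle): 4 clocks
z80_read_ns = access_ns(3.0, 3)     # Z80 ordinary read/write: 3 clocks
m6502_ns = access_ns(1.0, 1)        # 6502: every clock is one access

print(f"Z80  @ 3 MHz opcode fetch : {z80_fetch_ns:.0f} ns")
print(f"Z80  @ 3 MHz read/write   : {z80_read_ns:.0f} ns")
print(f"6502 @ 1 MHz any access   : {m6502_ns:.0f} ns")
```

Under these assumptions a 3 MHz Z80 and a 1 MHz 6502 ask the memory for a byte at broadly the same rate, which is the point being made above.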

The comparison should be done on chips available in that era. There are also Z80 derivatives that are more pipelined or can work at faster speeds. So, like the 65816, there are expanded versions of the original (even on x86, NEC sold a faster version of the original 8086 chip), but at this point the comparison does not make any sense.

Today we can synthesize an 8-bit core in an FPGA able to run at 100 MHz, but who used that in the '80s?

The 65C02 additions are definitely a welcome improvement but I'm not sure they would amount to as significant % speed increase overall as the HD64180.

At the very least, you don't get a speed increase on existing binary code and you have to reassemble your code for the 65C02.

 

The additional direct addressing mode of some instructions will probably offer the biggest speed improvement.

The new stack instructions actually make it more desirable to use the stack and 65C02 code requires fewer tricks.

Overall I think the added modes/instructions are most beneficial in reducing code size.

The code to use a jump table takes less than half as many instructions. You want to speed up a 6502 interpreter, that would be an easy patch.

If the 6502 had supported these additions to begin with, you probably would have seen support for a few more commands in Microsoft BASIC just because there would have been more room in the ROM.

Come to think of it, rebuilding Microsoft BASIC for the 65C02 could be an interesting project.

The Apple II series would be a likely target since most machines have a 65C02 and the language card built in.
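
To put a number on the jump table remark, here is a small tally of the two dispatch sequences. It is only a sketch: the cycle counts are the standard opcode-table values with no page-crossing penalty, the label `table` is a placeholder, and the NMOS sequence is the usual push-address-then-RTS trick (whose table has to hold each routine's address minus 1).

```python
# Dispatch through a jump table, index already in A (token number).
# Cycle counts: standard 6502 / 65C02 opcode tables, no page crossings.

NMOS_DISPATCH = [
    ("ASL A", 2),            # double the token to get a table offset
    ("TAY", 2),
    ("LDA table+1,Y", 4),    # high byte of (address - 1)
    ("PHA", 3),
    ("LDA table,Y", 4),      # low byte of (address - 1)
    ("PHA", 3),
    ("RTS", 6),              # "returns" into the routine
]

CMOS_DISPATCH = [
    ("ASL A", 2),
    ("TAX", 2),
    ("JMP (table,X)", 6),    # 65C02 only
]

def total_cycles(sequence):
    """Sum the cycle counts of a list of (mnemonic, cycles) pairs."""
    return sum(cycles for _, cycles in sequence)

for name, seq in (("6502 ", NMOS_DISPATCH), ("65C02", CMOS_DISPATCH)):
    print(f"{name}: {len(seq)} instructions, {total_cycles(seq)} cycles")
```

By this count the 65C02 version uses 3 instructions instead of 7, which fits the "less than half as many instructions" estimate.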

 

The 65816 isn't really faster than the 65C02 for existing code but 65816 code would certainly be faster.

The 24 bit address bus support is a bit ugly but it works.

I think the 65C02 really should have been more like the 65802.

How would the 65816 compare vs the 64180? It's tough to say. The 16 bit support would make a lot of difference but the mode switching to go between 8 & 16 bit with the accumulator means you will try to stick with one or the other as long as you can to reduce mode switches. This is bound to result in a lot of extra cycles accessing 16 bits.
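
As a rough illustration of the mode-switching cost, the tally below compares a 16-bit add done with 8-bit operations against the same add with a 16-bit accumulator, with and without a REP/SEP round trip. It is a sketch under assumptions: operands live in direct page, the direct page register is page aligned, and the cycle counts are the usual datasheet values.

```python
# Adding two 16-bit values held in direct (zero) page.
# Assumes a page-aligned direct page register; counts are datasheet values.

ADD16_8BIT = [                 # 6502 style, or 65816 with an 8-bit accumulator
    ("CLC", 2),
    ("LDA lo", 3), ("ADC lo", 3), ("STA lo", 3),
    ("LDA hi", 3), ("ADC hi", 3), ("STA hi", 3),
]

ADD16_16BIT = [                # 65816 native mode, accumulator already 16 bits
    ("CLC", 2),
    ("LDA dp", 4), ("ADC dp", 4), ("STA dp", 4),
]

MODE_ROUND_TRIP = [            # widen the accumulator, then narrow it again
    ("REP #$20", 3),
    ("SEP #$20", 3),
]

def cycles(sequence):
    """Sum the cycle counts of a list of (mnemonic, cycles) pairs."""
    return sum(count for _, count in sequence)

print("8-bit sequence        :", cycles(ADD16_8BIT))
print("16-bit, mode in place :", cycles(ADD16_16BIT))
print("16-bit + round trip   :", cycles(ADD16_16BIT) + cycles(MODE_ROUND_TRIP))
```

With these numbers a single 16-bit add only breaks even if it has to pay for the switch in and out, so the gain depends on staying in one mode for a while.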

 

My line of thinking is just stock 6502 as per C64/Atari/BBC vs early 80s Z80 as per Spectrum/Amstrad.

 

 

That would certainly be what most people are interested in.

Edited by JamesD
The JVC MSX2 machine with a 64180 runs at 6MHz and it was definitely introduced in the 80s since the TurboR was introduced in 1990.

I'm guessing around 1986 based on when MSX2+ was introduced.

Link to comment
Share on other sites

I discovered the design for the SB180 single board computer appeared in the 1985 September and October issues of BYTE in a Steve Ciarcia article.
It used an HD64180 clocked at 6MHz.

I found a claim in the SB180 docs that an HD64180 clocked at 6MHz is about twice as fast as a Z80 at 4MHz.
My own estimate was at least 80% faster in native mode using Z80 code, so that's probably pretty close.
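
One way to arrive at a figure like that is to multiply the clock ratio by the per-clock gain. The sketch below assumes the "about 20% faster" per-clock figure mentioned earlier in the thread; `combined_speedup` is a name made up for this example.

```python
# Combine a clock-rate increase with a per-clock efficiency gain.
# The 20% per-clock figure is the estimate quoted earlier in the thread.

def combined_speedup(new_mhz: float, old_mhz: float, per_clock_gain: float) -> float:
    """Overall speed ratio: clock ratio times (1 + per-clock gain)."""
    return (new_mhz / old_mhz) * (1.0 + per_clock_gain)

ratio = combined_speedup(6.0, 4.0, 0.20)
print(f"HD64180 @ 6 MHz vs Z80 @ 4 MHz: about {ratio:.2f}x")
print(f"That is roughly {100.0 * (ratio - 1.0):.0f}% faster")
```

That lands a little under the "about twice as fast" claim in the SB180 docs, which is why the two figures look consistent.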


Sounds fair enough - although I've not programmed Z80 I've seen some of the cycle counts and they do seem inflated in many cases.

6502 is similar for some instructions - the so-called pipelining seems a bit of a crock at times.

With some it's easily explained, e.g. extra cycle for some branches and indexing as the ALU is only 8-bits.

For others, not so. Stuff like JSR/RTS, 6 cycles - for RTS it seems like double what should actually be needed.


I can possibly understand the extra cycle on JSR but the RTS is a bit of a puzzle. I think the RTS is a good 2 cycles longer than I can account for even if I allow for an extra cycle just like JSR.
JSR - 1 cycle to load instruction, 2 cycles to push PC, 2 cycles to load new address to buffer, 1 cycle to transfer buffer to PC?
RTS - 1 cycle to load instruction, 2 cycles to pop return address to buffer, 1 cycle to transfer buffer to PC, 2 cycles to sit there doing I have no idea what.


There's other cases as well, 2 cycles is the minimum - e.g. NOP does nothing and in theory should be 1 cycle where other 1-byte instructions like INX still take 2 cycles but are actually doing something on the 2nd cycle.

 

PHA/PLA/JSR/RTS/RTI - I think part of the confusing delay is that there are cycles where the Stack Pointer is inc/decremented and nothing else done.

So RTS we might have:

1 read instruction

2 pull PC

2 inc S

1 unknown

 

I don't think the PC is ever buffered as such. When it's in a half-baked state, the memory access that the 6502 performs is pulling the other byte off the stack.

 

I'd think visual6502 or 6502.org probably has the answers, the inner workings have been documented pretty well.


I thought about adjusting the stack pointer, that makes sense for RTS but what about the JSR?

 

1 read instruction, 1 store byte on stack, 1 decrement stack pointer, 1 store byte on stack, 1 decrement stack pointer, 1 to load byte of new address and you are out of cycles. The stack operation has to combine store and decrement in one cycle or it doesn't work.


Possibly PC is buffered or has an alternate (taking back my previous statement).

 

Because for JSR, using the PC would give a half-baked value if we're loading the subroutine address from the instruction operand - once the first byte is loaded, the PC is no longer valid.

 

Actually, that might explain RTS - when a JSR is executed, the stack entry is actually <next instruction - 1> so the PC has to be incremented on return.

 

Back to JSR, maybe it's something like:

1 ins fetch

2 push PC to stack

2 fetch subroutine address from operand

1 transfer sub address to PC

 

But that creates a conflict in my RTS description since there's cycles to change SP there but not here.

 

I can vaguely remember reading a document that properly described what happens in these circumstances but can't remember where it was.
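
For reference, the cycle-by-cycle listing usually quoted for the NMOS 6502 (the "64doc" text file is one source) is sketched below as plain data, with a tiny check of the "next instruction - 1" behaviour. It is written from memory of that listing, so treat it as something to verify against visual6502 rather than as gospel; the helper names are made up for this example.

```python
# Per-cycle bus activity as commonly documented for the NMOS 6502.
# Written from memory of the usual listing; verify before relying on it.

JSR_CYCLES = [
    "fetch opcode, increment PC",
    "fetch low byte of target address, increment PC",
    "internal operation (stack address on the bus, nothing stored)",
    "push PCH, decrement S",
    "push PCL, decrement S",
    "fetch high byte of target address, load PC with the target",
]

RTS_CYCLES = [
    "fetch opcode, increment PC",
    "read the next byte and throw it away",
    "increment S",
    "pull PCL, increment S",
    "pull PCH",
    "increment PC",
]

def pushed_return_address(jsr_address: int) -> int:
    """PC value JSR pushes: the address of its own last byte."""
    return (jsr_address + 2) & 0xFFFF

def rts_target(pulled_address: int) -> int:
    """RTS adds one to the pulled address before execution resumes."""
    return (pulled_address + 1) & 0xFFFF

for name, steps in (("JSR", JSR_CYCLES), ("RTS", RTS_CYCLES)):
    print(name)
    for number, step in enumerate(steps, start=1):
        print(f"  {number}: {step}")
```

If the listing is right, the "unknown" cycles are a discarded operand read, a cycle that only moves the stack pointer, and the final increment of PC; JSR pushes PC before it has fetched the high operand byte, which is why the stacked value is one short.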

Edited by Rybags

I take back what I said about it being an easy patch. Microsoft BASIC pushes the jump address onto the stack and then jumps to the CHRGET routine, which loads the next character in the program and then returns, and that return is what ends up performing the jump. This keeps each routine from having to call CHRGET for the next character in the program.

Putting the code inline would be faster as long as the parser dumped whitespace. That last bit is very important, because having to skip spaces would slow the routine to the point where the savings from using 65C02 instructions and eliminating costly JSR and RTS instructions are lost.

 

The code would look something like this:

 

; Microsoft 6502 BASIC patch to use 65C02 JMP (TABLE,X) instruction to execute tokens
; assuming this will be faster.
; This has never been assembled or tested so there may be syntax errors and if X is supposed to hold a value it won't work
;
; This is based on Applesoft II disassembly found here:
; http://www.txbobsc.com/scsc/scdocumentor/

*--------------------------------
*      EXECUTE A STATEMENT
*
*      (A) IS FIRST CHAR OF STATEMENT
*      CARRY IS SET
*--------------------------------
EXECUTE.STATEMENT
	BEQ	RTS.3		;END OF LINE, NULL STATEMENT
EXECUTE.STATEMENT.1
	SBC	#$80		;FIRST CHAR A TOKEN?
	BCC	.1		;NOT TOKEN, MUST BE "LET"
	CMP	#$40		;STATEMENT-TYPE TOKEN?
	BCS	SYNERR.1	;NO, SYNTAX ERROR
	ASL			;DOUBLE TO GET INDEX
;	TAY			;INTO ADDRESS TABLE
;	LDA	TOKEN.ADDRESS.TABLE+1,Y
;	PHA			;PUT ADDRESS ON STACK
;	LDA	TOKEN.ADDRESS.TABLE,Y
;	PHA
;	JMP	CHRGET		;GET NEXT CHR & RTS TO ROUTINE	
	TAX			;INDEX FOR JMP (TABLE,X)
	LDY	#0		;CLEAR Y FOR THE INDIRECT LOAD BELOW
*--------------------------------
*      GENERIC COPY OF CHRGET SUBROUTINE, WHICH
*      IS COPIED INTO $00B1...$00C8 DURING INITIALIZATION
*
*      CORNELIS BONGERS DESCRIBED SEVERAL IMPROVEMENTS 
*      TO CHRGET IN MICRO MAGAZINE OR CALL A.P.P.L.E.
*      (I DON'T REMEMBER WHICH OR EXACTLY WHEN)
*--------------------------------
;GENERIC.CHRGET
	INC	TXTPTR
	BNE	.2
	INC	TXTPTR+1
;.2	LDA	$EA60		;<<< TXTPTR... ACTUAL ADDRESS FILLED IN LATER >>>
.2	LDA	(TXTPTR),Y	;PAGE ZERO TXTPTR FROM CHRGET ROUTINE
	CMP	#':'		;EOS, ALSO TOP OF NUMERIC RANGE
	BCS	.3		;NOT NUMBER, MIGHT BE EOS
;	CMP	#' '		;IGNORE BLANKS
;	BEQ	GENERIC.CHRGET
	SEC			;TEST FOR NUMERIC RANGE IN WAY THAT
	SBC	#'0'		;CLEARS CARRY IF CHAR IS DIGIT
	SEC			;AND LEAVES CHAR IN A-REG
	SBC	#-'0'
.3
;	RTS
	JMP	(TABLE,X)	;65C02 ONLY. TABLE NEEDS REAL ADDRESSES; THE RTS-STYLE TABLE HOLDS ADDRESS-1
.1	JMP	LET		;MUST BE <VAR> = <EXP>


The Apple IIgs runs the 65816 at 2.8 MHz. If you compare that to the 1 MHz of the earlier Apple II, then maybe you could claim a 400% speedup for some tasks.

 

Realistically a 65C02 vs standard 6502 with the OS and Basic interpreter reworked to use the new instructions, you'd get maybe 4% speedup.

 

Where/how anyone could come to an 800% figure, who knows? An original Mac vs old A2 it'd be a realistic claim.


In some Apple promotional material they claim the 65C02 would give an 800% speed-up. I wonder what corner-case they were referring to?

Someone was a real optimist. An 800% speedup is in the territory of fundamental algorithm changes.

 

I think that little piece of code I posted cuts over 20 clock cycles per BASIC token, but I didn't count, and most of that was achieved by putting CHRGET inline.

When you consider how many tokens are executed per second that sounds like it adds up in a hurry, but with as many clock cycles as there are per second, that's nothing.

It's not bad for one change though and there are some instructions that dropped a clock cycle on the 65C02... but then a few gain a clock cycle.

But hey, with enough optimizations like that you might speed up Microsoft BASIC by at least 1%! :P
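
To see why 20 cycles per token is small, compare it with the total work done per token. The average cost per token in the sketch below is a made-up illustrative number, not a measurement of Microsoft BASIC, and `percent_saved` is a hypothetical helper.

```python
# How much does shaving a few cycles off the dispatch buy overall?
# The 2000-cycle average per token is purely hypothetical.

def percent_saved(cycles_saved: float, cycles_per_token: float) -> float:
    """Saving as a percentage of the original per-token cost."""
    return 100.0 * cycles_saved / cycles_per_token

for average_cost in (500, 2000, 5000):   # hypothetical per-token costs
    saving = percent_saved(20, average_cost)
    print(f"{average_cost:5d} cycles per token -> {saving:.1f}% faster")
```

The heavier the statements being interpreted (floating point maths, string handling), the less the dispatch saving shows up.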

Maybe Microsoft BASIC isn't the best example to use since it's so tied to the 6502. It's obvious the 65C02 has some potential.

I'm not sure what % speedup the 65C02 could give you overall but even with code changes, I don't see the 65C02 matching the speedup the 64180 offers over the Z80 on identical code.

The 4MHz 65C02 in the IIc Plus is another matter though.

I really doubt the 400% on the IIgs with the same code but a 4MHz 65C02 would definitely do it.

The speed improvement for Ahl's benchmark on the IIgs and IIc Plus pretty much reflects that the IIgs is slower for 6502 code than the IIc Plus.

The 65816 could do it at a slower speed if you can use 16 bit instructions but it's an even bigger rewrite than for the 65C02.

 

Reworking the OS and interpreter would certainly help but the interpreter has some fundamental design issues that slow it down.

If you can rework the interpreter you will get much more than a 4% speedup even on the 6502.

But then similar improvements could possibly be made to the Z80 version and it really doesn't help with the comparison.

 


I never studied the extra op-codes for the 65C02. Do you think those generally are more useful than the undocumented ones? I know the undocumented ones are a bit of hit and miss, but at least some of them seem reliable enough to be used across several batches of 6502 from various manufacturers. Of course they won't run on a 65C02 which has proper instructions on locations previously used for the undocumented ones.
