Z80 vs. 6502


BillyHW

65C02 supposedly changes the undocumented ones to NOP.

 

Reliability of the undocumented opcodes on the old NMOS 6502 doesn't really seem to be an issue. They're used on the Atari 2600 (6507), mainstream Atari 8-bit (6502B and 6502C "Sally") as well as the C64 (6510). C128 and Plus/4 software also uses them; both of those machines have later 6510 descendants.

 

I'd probably rate the 65C02 additions as more useful than the undocumented ones. Stuff like unconditional branch, store zero, test and set/clear bits, jump indirect with index, push/pull X/Y to stack.

 

Just the push/pull X/Y to stack in a time-critical interrupt scenario should grab back those cycles otherwise lost by not being able to use an undocumented instruction.
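To put a rough number on that, here is a minimal sketch that tallies the save/restore overhead of an interrupt handler preserving A, X and Y. The two instruction sequences are assumed, typical prologue/epilogue code rather than anything taken from a particular program; the cycle counts are the standard documented timings for these one-byte instructions.

```python
# (cycles, bytes) for each instruction, per the standard 6502/65C02 timings.
COST = {
    "PHA": (3, 1), "PLA": (4, 1),
    "PHX": (3, 1), "PLX": (4, 1),   # 65C02 only
    "PHY": (3, 1), "PLY": (4, 1),   # 65C02 only
    "TXA": (2, 1), "TYA": (2, 1), "TAX": (2, 1), "TAY": (2, 1),
}

# Assumed handler prologue + epilogue that preserves A, X and Y.
# On the NMOS 6502, X and Y have to go through the accumulator.
NMOS_6502 = ["PHA", "TXA", "PHA", "TYA", "PHA",
             "PLA", "TAY", "PLA", "TAX", "PLA"]
CMOS_65C02 = ["PHA", "PHX", "PHY",
              "PLY", "PLX", "PLA"]


def total(seq):
    """Return (cycles, bytes) for a list of mnemonics."""
    cycles = sum(COST[op][0] for op in seq)
    size = sum(COST[op][1] for op in seq)
    return cycles, size


nmos = total(NMOS_6502)
cmos = total(CMOS_65C02)
cycles_saved = nmos[0] - cmos[0]
bytes_saved = nmos[1] - cmos[1]

print(f"6502:  {nmos[0]} cycles, {nmos[1]} bytes")
print(f"65C02: {cmos[0]} cycles, {cmos[1]} bytes")
print(f"saved: {cycles_saved} cycles, {bytes_saved} bytes per interrupt")
```

The saving is just the four register transfers that are no longer needed, so a handler that only has to preserve one index register gets half of it.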

Link to comment
Share on other sites

When I assemble my Mockingboard music player for the 65C02, it saves about 3% on code size.
That's just by using STZ, BRA, PHY, PHX, PLX and PLY.
That saves at least 9 clock cycles off the interrupt handler if it has to output data to the sound card during an interrupt.

Not huge results but I'll take it. Undocumented opcodes probably wouldn't save that much.

If I enable the command processing code within the player it would save a little more since it uses a jump table for commands and that piece is several instructions smaller on the 65C02.

I have my doubts about getting to 4% savings even with that and I don't see any other optimizations I can make based on the 65C02.

Not saying there aren't any, just that I can't see any. I found several 6502 optimizations when I looked at the code this morning.

Link to comment
Share on other sites

If you avoid instructions that set the V (overflow) flag, or at least situations that would signal it, you can CLV at the beginning of a routine and use BVC as a makeshift BRA instruction.

 

For comparison I checked my 6502 music player, mainly used on VIC-20, Plus/4 and lately CreatiVision. It doesn't push X or Y once, nor does it store those registers to temporary locations. I'm using the combination LDA #0; STA nnnn,X thrice, but one of these includes a jump to a subroutine that uses the current value of the accumulator (zero or otherwise), so 4 bytes to save. I have one JMP that could be replaced with BRA, so another byte. As the total executable is 514 bytes, that means these mentioned instructions would save me 0.97% if run on a 65C02. Dunno about speed improvement, perhaps a cycle or two. I'm sure the 65C02 has more to offer than that, but these were the ones mentioned here.
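The byte count above can be checked with a few lines of arithmetic. This is only a sketch of the sums in the post; the instruction sizes are the standard encodings, and the variable names are made up for illustration.

```python
# Instruction sizes in bytes (standard 6502/65C02 encodings).
SIZE = {
    "LDA #imm": 2,
    "STA abs,X": 3,
    "STZ abs,X": 3,   # 65C02 only
    "JMP abs": 3,
    "BRA rel": 2,     # 65C02 only
}

# LDA #0 / STA nnnn,X becomes a single STZ nnnn,X.
stz_saving = SIZE["LDA #imm"] + SIZE["STA abs,X"] - SIZE["STZ abs,X"]

# Three occurrences, but one still needs the accumulator value afterwards,
# so only two of them can be replaced.
replaceable_stores = 2

# One JMP can become a BRA.
bra_saving = SIZE["JMP abs"] - SIZE["BRA rel"]

bytes_saved = replaceable_stores * stz_saving + bra_saving
total_size = 514  # size of the executable as stated in the post
percent_saved = 100 * bytes_saved / total_size

print(f"{bytes_saved} bytes of {total_size} = {percent_saved:.2f}%")
```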

Link to comment
Share on other sites

I used BBR7 to save another byte and at least another clock cycle. That code executes every interrupt.

BBR/BBS instructions may come in very handy if (when?) I finish the sound effects support.

The player and demo code which includes several small tunes takes 329 bytes and the IRQ handler takes 113 bytes for one AY chip.
With the code to handle the 2nd AY chip enabled it jumps to 365 total bytes and the IRQ handler is 141 bytes.
I could shrink the code a little but it would run slower and it's pretty small already.
FWIW, this player just dumps values to AY registers and does simple timing. It's great for the background music for Donkey Kong but complex song data would be huge unless I complete some commands for repeats and such. I also need to write a tracker for it. I had to write the Donkey Kong tunes by hand. That was painful.

Link to comment
Share on other sites

Ah, my player is based on blocks of duration-based music, where blocks are combined into tracks (or playlists if you like) for each sound channel. Each block can be repeated or transposed as required. It makes the player a bit bigger, but for longer pieces of music one saves a lot of data space, in particular in songs where certain patterns repeat and/or appear at different transpositions (e.g. a bass line). I've written my music in OctaMED (other trackers would be fine too, as long as they can print text to file) and then use my own tool to split tracks and convert them to duration-based syntax. It takes a bit of hand polishing to get the final result, but most of the data gets there in the first place.

Link to comment
Share on other sites

The latest 65C02 version with support for 2 AY chips is now 5.2% smaller than the 6502 version.
I really didn't think I could shrink the code that much just using the 65C02 CPU.
About half of the savings are in the interrupt handler but I had most of those at least 2 years ago just by putting in the new stack instructions.
A few instructions in the code have one less clock cycle but I'm not going to do cycle counts to see how much time is saved.

Link to comment
Share on other sites

The player started out as a port from a program on the Aquarius. I wanted to compare code size between Z80, 6803, 6809 and 6502.

The code may play the same data as the original Aquarius version but even the Z80 code is very different now.

I ported it to the Oric because it was the only 6502 computer I knew of with a built-in AY chip. It just so happens the Oric interfaces its AY chip the same way as the Mockingboard, so I ported it when someone suggested adding Mockingboard support to Donkey Kong.

 

I could add repeats and transposition through commands and register setting modifiers but it would require some work to implement.

The command extensions are basically pieces of code that get called based on data embedded in the music. I could build an entire interpreter using that if I wished. Counters, jumps, setting signals for the main program, reading signals from the main program... but the tracker ends up being part compiler or I need an external compiler to convert the output from a tracker. Yup, I invented smart music. I also decided nobody would use it so I dropped the idea. But hey, if anyone wants their music to play at random octaves, I know how to do that!

Link to comment
Share on other sites

AppleWin does not support 65C02 instructions that are only on Rockwell and WDC 65C02 chips. I suppose that means the 65C02 that came with the Enhanced //e doesn't either.
The 65802/65816 do support the other instructions so it would work on a IIgs.
It's a small change source-wise, but it cuts the number of optimizations that can be used on a //e.

Link to comment
Share on other sites

The one in my Platinum //e is a 65SC02, which means it doesn't have the Rockwell extensions.

I checked the docs on the Western Design site and they show this:

*Except for the BBRx, BBSx, RMBx, and SMBx bit manipulation instructions which do not exist for the W65C816S

 

That pretty much makes those instructions useless unless you buy a CPU that has them.

If you want to write non-portable code, that's a good way to do it.

 

I can understand wanting to drop them for the 65816, each of those instructions has 8 separate opcodes.

TRB and TSB will replace a few of the changes I made but BBR and BBS were kinda handy.

Link to comment
Share on other sites

  • 4 weeks later...

I may be a bit late to this thread, but one thing that wasn't mentioned is that the Z-80 had built-in support for DRAM refresh, which saved a few chips.

 

Unfortunately, it was only 7-bit refresh, so it became mostly useless when 64K DRAMs came out. At some point in the past couple of years I came across a web page where someone explained the weird counters used in the Z-80. It explained why the R register only counted 7 bits because of how it was connected with the I register. I was really surprised and sad that it was due to such an obscure reason.
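A small sketch of what "7-bit refresh" means in practice. It models only the documented behaviour of the R register (bits 0-6 count up on each opcode fetch, bit 7 is left alone) and assumes, for illustration, a DRAM that expects 256 distinct row addresses; the function names are made up.

```python
def step_r(r):
    """Advance R as one opcode fetch would: bits 0-6 count, bit 7 is kept."""
    return (r & 0x80) | ((r + 1) & 0x7F)


def refresh_rows(start_r, fetches):
    """Collect the low-byte refresh addresses seen over a number of fetches."""
    rows = set()
    r = start_r
    for _ in range(fetches):
        rows.add(r)
        r = step_r(r)
    return rows


rows = refresh_rows(0x00, 1000)
rows_needed = 256  # assumption: a part with an 8-bit row address

print(f"distinct refresh addresses: {len(rows)} of {rows_needed}")
print(f"highest address seen: {max(rows):#04x}")
```

However long it runs, the counter only ever produces 128 different values, so under that assumption half of the rows would never see a refresh cycle unless something outside the CPU supplies the eighth bit.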

Link to comment
Share on other sites


I think that in a lot of cases, the designers of a chip (e.g. CPU, DMA, I/O, PIA/VIA, etc.) did the best they could at foreseeing how their device would be used at the time and in the future. Occasionally there is a limitation, such as space on the silicon die, or costs that would push the device beyond the target price, but from reading the history of most of the designers, they wanted to make the best product available. Sometimes their devices were used in ways they never imagined.

 

Woz knew that all those slots in the Apple ][ would be important in the future, even though they had no purpose when the Apple ][ shipped. Jobs was thinking they needed maybe two of the slots. It was a good thing that Woz got his way, because the Apple ][ was chosen for all kinds of purposes that were never envisioned when it was released. In fact, the original IBM PC came with so many slots because the Apple ][ did.

Link to comment
Share on other sites

I may be a bit late to this thread, but one thing that wasn't mentioned is that the Z-80 had built-in support for DRAM refresh, which saved a few chips.

I'm pretty sure that was mentioned in post #16

 

Unfortunately, it was only 7-bit refresh, so it became mostly useless when 64K DRAMs came out. At some point in the past couple of years I came across a web page where someone explained the weird counters used in the Z-80. It explained why the R register only counted 7 bits because of how it was connected with the I register. I was really surprised and sad that it was due to such an obscure reason.

I think several of the custom chips added the 8th bit for the refresh based on comments I've read but I can't personally verify it.

That would still be cheaper than a full controller.

I know some of the Z80 systems used SRAM.

 

FWIW, the Apple II has a screwy screen memory map so that it can use the screen refresh to refresh the DRAM chips.

DRAM refresh doesn't require a specific refresh cycle, it just needs to be accessed on a regular basis.

Edited by JamesD
Link to comment
Share on other sites

...

Woz knew that all those slots in the Apple ][ would be important in the future, even though they had no purpose when the Apple ][ shipped. Jobs was thinking they needed maybe two of the slots. It was a good thing that Woz got his way, because the Apple ][ was chosen for all kinds of purposes that were never envisioned when it was released. In fact, the original IBM PC came with so many slots because the Apple ][ did.

WOZ was used to looking at hobby computers that were either single board computers with limited expansion or everything was in slots.

DEC PDP and other systems were also built that way since what? The 50s?

 

The Apple I had been more like single board computers. It had an edge connector... but WOZ threw in one slot for a future cassette interface.

I think he realized that would quickly get outgrown when he designed the II. If he had time, the Apple I probably would have had more slots.

Edited by JamesD
Link to comment
Share on other sites

Fairly sure Z80 auto-refresh was mentioned before - I wasn't aware it was only 7 bits worth though.

How did Apple 2 ensure the refresh coverage? Is the RAM arranged such that sequential accesses hit a different row each time or something?

I know the screen interleave is 8:1 and 64:1. That's enough to hit rows or columns of RAM at proper intervals but I don't know how it generates the CAS or RAS signals to the RAM chips off of that.

*edit*

I know those signals are used somehow because the timing on the II/II+ has to be changed from one of those to a different signal for reliable 65C02 use.
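For reference, the 8:1 and 64:1 interleave mentioned above can be written down as the usual hi-res line address calculation. This is just a sketch of the memory layout as seen by software (hi-res page 1 at $2000 is assumed as the default); it says nothing about how the hardware derives RAS or CAS from it, and the function name is made up.

```python
def hires_line_base(y, page_base=0x2000):
    """Start address of hi-res scan line y (0-191) on the Apple II."""
    if not 0 <= y < 192:
        raise ValueError("hi-res line must be in 0..191")
    return (page_base
            + 0x400 * (y % 8)          # consecutive lines are $400 apart
            + 0x80 * ((y // 8) % 8)    # each group of 8 lines steps by $80
            + 0x28 * (y // 64))        # each block of 64 lines steps by $28


for y in (0, 1, 2, 8, 64, 191):
    print(f"line {y:3d} -> ${hires_line_base(y):04X}")
```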

Edited by JamesD
Link to comment
Share on other sites

I thought it'd either be that, or wire the Ram up such that the row/column mixes up a bit.

 

Problem with mixing the Row/Column mapping though is that what works with one Ram config won't necessarily work with another.

 

Fairly sure most/all the earlier DRam types used a Cas-only cycle for refresh (?) - later chips provide a pin dedicated for refresh which gives the option to automate it at the chip level.

 

Practically all the early 6502 systems with DRam used 7400 series logic to implement the translation of address to R/C, later implementations incorporated all the mess into one place, e.g. VIC-2 and Freddie.

Link to comment
Share on other sites

There is definitely a RAS signal in the circuit.

http://www.txbobsc.com/aal/1985/aal8510.html#a2

Link to comment
Share on other sites

The video scanning circuit does the refresh and displays the graphics, and does so on one clock edge while the CPU runs on another, which maximizes the 1 MHz delivered to the CPU.

 

No DMA or interrupts in the default system. The video system just works in parallel with the CPU. Add-on cards can choose to do an interrupt and/or DMA.

Link to comment
Share on other sites

FWIW, the Apple II has a screwy screen memory map so that it can use the screen refresh to refresh the DRAM chips.

DRAM refresh doesn't require a specific refresh cycle, it just needs to be accessed on a regular basis.

 

The screwy memory map for video was simply so Woz could save a chip in the counter chain. DRAM doesn't care what order you refresh it in, just as long as you get all addresses in a page. And the screwiness was only in the high address lines anyhow. Refresh could be as simple as an IRQ routine consisting of 127 NOPs and an RTI.
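A quick sketch of why a run of 128 one-byte instructions would do the job. It assumes, purely for illustration, that the DRAM row is selected by the low seven CPU address bits, and the routine address is hypothetical; real boards multiplex the address lines in their own way.

```python
def rows_touched(start, length, row_bits=7):
    """Row numbers hit by fetching `length` consecutive bytes from `start`."""
    mask = (1 << row_bits) - 1
    return {addr & mask for addr in range(start, start + length)}


ROUTINE_START = 0x0300   # hypothetical location of the IRQ routine
ROUTINE_LEN = 127 + 1    # 127 one-byte NOPs plus a one-byte RTI

rows = rows_touched(ROUTINE_START, ROUTINE_LEN)
all_rows_hit = len(rows) == 128

print(f"rows touched: {len(rows)} of 128, complete: {all_rows_hit}")
```

Because the 128 opcode fetches come from consecutive addresses, every value of the low seven bits turns up exactly once, wherever the routine happens to start.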

Link to comment
Share on other sites

  • 2 months later...
  • 4 weeks later...

The Z80 is by far the better processor. As a matter of fact, there's a whole line of them still in production. They're used in a lot of microcontroller applications (washing machines, etc.), in variants from the original 2.5 MHz all the way up to 50 MHz. It's one of, if not THE best 8-bit processor ever made.

Link to comment
Share on other sites
