Thomas Jentzsch Posted September 18, 2014
A very good read. Thanks for the link.
Rybags Posted September 18, 2014
I gave it a look too... thinking some more, 6809 efficiency is somewhat more than my previous estimate especially if the programmer is aiming for it. Seems it was late to the party though. CPU culture is traditionally hard to break. The most prominent example is obviously the Mac, with 68K, then PPC, then x86/64. But most of the big companies in the old days were set in their ways.
JamesD Posted September 18, 2014
"I gave it a look too... thinking some more, 6809 efficiency is somewhat more than my previous estimate especially if the programmer is aiming for it. ..."
If you are only using the A register and Y as a counter for a simple loop, the 6502 can actually be faster than the 6809 due to the lack of a prefetch on the 6809. As long as you don't have a lot of indexing, can unroll loops, don't use the stack a lot, etc... the 6502 does pretty well. As soon as you need to use much indexed addressing, the 6502 starts to show its weakness, and using 16-bit numbers/pointers pretty much swings things in favor of the 6809. Add the auto increment, auto decrement, 2 accumulators or 16-bit D register, single-instruction stack operations, 2nd stack pointer, multiply, etc... and the time the 6502 saves through prefetch is more than eaten up by the extra instructions it needs.
Just being able to use a stack pointer as a data pointer can make a huge difference in code size. You can load multiple registers and update your data pointer with a single instruction. The combined D register also lets you load two bytes of data with a single instruction.
JamesD Posted September 18, 2014
"If you are only using the A register and Y as a counter for a simple loop, the 6502 can actually be faster than the 6809 due to the lack of a prefetch on the 6809. As long as you don't have a lot of indexing, can unroll loops, don't use the stack a lot, etc... the 6502 does pretty well. ..."
That should probably say "and X or Y as a counter".
flashjazzcat Posted September 19, 2014
Provided complex data structures are arranged optimally for the 8-bit accumulator, the only things I tend to miss on the 6502 (with linked lists and the like) are STX abs,Y and STY abs,X to complement the load instructions. As with PHA and PLA, there's a lot of register copying necessary there which could have been nicely avoided. I've never used (ZP,X) once in twenty years of assembly coding, so I wouldn't miss that at all. Provided the code isn't in ROM, of course, self-modifying code can be a nice way of getting around the lack of LDA/STA (ZP),X if you happen to be using both registers at once to access data in a loop. I do try to avoid it, though, especially since I started to do a lot of cartridge coding.
Edited September 19, 2014 by flashjazzcat
NorthWay Posted September 27, 2014
"But by the time XL came out they were using 150 ns chips, and pretty sure the XE had 90 ns chips."
IIRC the Amiga was designed around 150ns memory. Or was it 120ns? I'm pretty sure I read that the ST "overclocked" its chips compared to spec. I'm guessing 150/90 was the cheapest available on the market at that time.
(And why does someone do something as idiotic as the Electron and split bytes into nibbles?)
(Why #2: why didn't the C= 128's 2MHz mode just sync down to 1MHz when the video was fetching memory?)
Rybags Posted September 27, 2014
The C= design from the start seems kind of flawed - the initial idea was supposedly to run the CPU at full speed and put up with the sporadic DMA like Atari does. Supposedly a little later the decision was to go to the slightly faster system bus speed which gives the compressed horizontal screen you see. And for whatever reason they went with the 50/50 memory access split (of course with the odd extra steal of dedicated CPU cycles). If you look at the allocation of cycles, you can see even with all sprites active there would still be a few spare cycles the CPU otherwise could have had - plus in normal circumstances, VIC doesn't exactly use a lot of its cycles.
The Electron - I think it was simply cost-cutting. At the time they could get the 4-bit DRAMs cheaply, but apparently not quite cheap enough to just have 64K rather than the method they chose to give 32K.
Amiga - well, the system bus on base machines was ~ 7.2 MHz. Assuming access time needs to be about half a cycle duration that makes for 69 ns, about 62.5 ns for the ST at 8 MHz.
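The half-cycle estimate in the post above is plain arithmetic, and it can be checked with a few lines. This is only a minimal sketch: it assumes the rounded clock figures quoted in the post (7.2 MHz and 8 MHz) and the post's rule of thumb that the access time is about half of one clock period. The function name `half_cycle_ns` is made up for illustration.

```python
def half_cycle_ns(clock_mhz):
    """Return half of one clock period, in nanoseconds."""
    period_ns = 1000.0 / clock_mhz   # one full clock period in ns
    return period_ns / 2.0

# Rounded clock figures taken from the post (assumed, not measured).
amiga_ns = half_cycle_ns(7.2)   # a little over 69 ns
st_ns = half_cycle_ns(8.0)      # 62.5 ns

print(round(amiga_ns, 1), st_ns)
```

Whether half a clock period is the right budget for DRAM access is itself an assumption of the post, and later replies in the thread argue for a much looser figure.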
NorthWay Posted September 27, 2014
"I'm pretty sure I read that the ST "overclocked" its chips compared to spec."
Ah, I found out where I had read that: http://www.dadhacker.com/blog/?p=1383
(Bit odd though when reading this a second time - the 68010 handled exceptions properly in 1982, and the 68020 came out in 1984 - too expensive I guess.)
(And if you read http://www.easy68k.com/paulrsm/dg/dg34.htm, its speculation about what was about to happen at Atari after the buyout seems strangely spot on (page 21).)
Chilly Willy Posted September 28, 2014
"Amiga - well, the system bus on base machines was ~ 7.2 MHz. Assuming access time needs to be about half a cycle duration that makes for 69 ns, about 62.5 ns for the ST at 8 MHz."
The 68000 uses 4 clock cycles for a read or write cycle. That WOULD be ~560 ns; however, the 68000 only actually uses the last two cycles of the four to read/write the data. So the Amiga chipset was designed to use the first two cycles, leaving the next two free for the CPU. That theoretically made the CPU run at full speed while the chipset was active. However, not all 68000 instruction timings are a multiple of four cycles, which adds a two-cycle wait to some instructions, and the chipset needs some of the CPU slots to do other things - in particular, more than 4 bitplanes in low-res mode, or more than 2 bitplanes in high-res mode, can steal CPU slots. Anywho, a slot is therefore two cycles, or about 280 ns... VERY cheap memory.
That was also one of the reasons Sega used the 68000 for the Genesis - really cheap slow roms. In fact, the VDP DMA on the Genesis is faster than the 68000 bus cycle, so it fails on old (slow) roms if you try to DMA directly from rom. Sega recommended copying/decompressing from rom to ram using the CPU, then DMAing from ram to vram. Newer carts were made using faster access times, so the game could DMA directly from rom to vram.
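The slot timing described above follows directly from the clock rate. A minimal sketch, assuming the rounded ~7.2 MHz figure used earlier in the thread (so the results come out slightly under the post's ~560 ns and ~280 ns); the helper name `clocks_to_ns` is hypothetical.

```python
def clocks_to_ns(clocks, clock_mhz):
    """Duration of a number of CPU clocks, in nanoseconds."""
    return clocks * 1000.0 / clock_mhz

CLOCK_MHZ = 7.2  # rounded figure from the thread (assumption)

bus_cycle_ns = clocks_to_ns(4, CLOCK_MHZ)  # full 68000 read/write cycle
slot_ns = clocks_to_ns(2, CLOCK_MHZ)       # one two-clock memory slot

print(round(bus_cycle_ns), round(slot_ns))
```

The point of the comparison is that each memory access only has to complete within one two-clock slot, which is a far looser requirement than a half-clock access time.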
Rybags Posted September 28, 2014
Worst-case, Amiga is losing 75% of cycles just for playfield DMA if 6 bitplanes active. So I would think they'd need the faster Ram.
I've got 3 machines here, might have to look inside one. Or just find some pics. Of several "Amiga 500 motherboard" pics I looked at, most are 80ns and one was 60ns Ram. Chip Ram needs to be faster since the chipset can potentially hit it on any cycle. The remaining Ram that the chipset can't access could potentially run slower.
NorthWay Posted September 28, 2014
I found an A1000 picture showing TMS4464-12NL chips. I don't know if I should interpret the datasheet to classify it as 120ns or 230ns (you kinda would like to read or write, not just access it?). Then I found an A500 internal expansion using km41256ap-15, which seems to be 150/260ns.
(And worst case would be 4 bitplanes hires - grabbing all the cycles.)
Edited September 28, 2014 by NorthWay
Chilly Willy Posted September 29, 2014
"Worst-case, Amiga is losing 75% of cycles just for playfield DMA if 6 bitplanes active. So I would think they'd need the faster Ram."
That's 50% of the CPU slots. Remember that the CPU takes four cycles to do a read or write, with the actual data being read or written in the last two cycles. The chipset always takes those first two "free" cycles, so the CPU has 100% of the slots it's CAPABLE of using free as long as the chipset doesn't take any of those slots. Five bitplanes (low-res) takes 25% of those slots, and six takes 50%. High-res bogs down more, as 3 bitplanes takes 50% of those slots and 4 bitplanes takes 100% of those slots. That's why stock Amiga 500s and 2000s are so effing slow in 16 color high-res - you only have free CPU slots in the blank periods when running in chip ram. That's why they recommend getting fast ram for your old Amiga.
"I've got 3 machines here, might have to look inside one. Or just find some pics. Of several "Amiga 500 motherboard" pics I looked at, most are 80ns and one was 60ns Ram. Chip Ram needs to be faster since the chipset can potentially hit it on any cycle. The remaining Ram that the chipset can't access could potentially run slower."
They used whatever memory was cheapest to get in bulk at the time the assembly plant needed ram, but 150ns access time was plenty fast. Remember that the chipset was designed to be a console in 1983. There's no way they would have designed it around 80ns ram. The chipset uses two CPU clocks per cycle for all memory access operations. All COPPER timing is around that frequency. The chipset always has priority over the CPU when it needs more of those two-cycle slots. The BLITTER has a bus hog setting that can lock out the CPU entirely until the blit is done, but most folks didn't do that. All this is in the various hardware books Commodore put out. I've got them all. You can also find these hardware manuals online in various formats.
AGA changed this timing in two ways. First, they added what was called Double-CAS mode - they did two page mode reads in the same amount of time as one normal read, or one read per clock cycle (the 68020 ran at 2X the clock, or 14.4 MHz). They also had Double-Wide mode, which fetched 32 bits instead of 16 bits. You could set either mode independently of the other, and one or the other or both were needed in order to fetch enough data for AGA modes. For example, the sprites normally fetch one word per access slot per bitplane per line. An AGA sprite would run in Double-CAS/Double-Wide mode to fetch two longs per bitplane per line, allowing sprites to be 64 pixels wide instead of 16. Likewise, switching to a faster fetch mode allowed low-res displays to show 8 bitplanes (or high-res to show 4 bitplanes) without using all available memory access slots like under the old chipset. High-res 8 bitplane still took all access slots. The new SuperHiRes mode (1280 wide instead of 640) took all access slots at 4 bitplanes even with the faster fetch mode.
Edited September 29, 2014 by Chilly Willy
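The slot percentages quoted for the original chipset can be reproduced with a very small model. This is a sketch built only from the figures in the post, not from the hardware manuals: it assumes that in each fetch group the chipset gets as many "free" slots as the CPU has usable ones (4 each in low-res, 2 each in high-res), and that every bitplane beyond the free ones takes one CPU slot. The function name `cpu_slots_stolen` is hypothetical, and the model covers only bitplane DMA during the active display.

```python
def cpu_slots_stolen(bitplanes, hires):
    """Fraction of CPU-usable slots taken by bitplane DMA (OCS model)."""
    # Assumption: free chipset slots == CPU-usable slots per fetch group.
    free_slots = 2 if hires else 4
    cpu_slots = free_slots
    max_planes = free_slots + cpu_slots   # 4 in high-res, 8 slots in low-res
    if not 0 <= bitplanes <= max_planes:
        raise ValueError("bitplane count out of range for this model")
    extra = max(0, bitplanes - free_slots)
    return extra / cpu_slots

for planes in (4, 5, 6):
    print("low-res", planes, cpu_slots_stolen(planes, hires=False))
for planes in (2, 3, 4):
    print("high-res", planes, cpu_slots_stolen(planes, hires=True))
```

Under these assumptions the model gives 25% and 50% for 5 and 6 low-res bitplanes, and 50% and 100% for 3 and 4 high-res bitplanes, matching the numbers in the post.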
barrym95838 Posted October 6, 2014
"I'm not sure I've ever used the (ind,X) more than a few times. Or even if the Atari OS even uses an instruction in that mode. If it was replaced with something else, I doubt it'd be missed much. Those other instructions - largely covered by the later 65C02, but of course it's of little use when the established base was already there on the original CPU..."
(zp,X) comes in handy once in a while in things like Forth interpreters, where X is typically used as the parameter stack pointer, and you want to use a stack cell as a pointer without having to copy it to a fixed zp location. I used it in a couple of spots in my translation of the VTL-2 mini-interpreter from the 6800 to the 6502, but only in the degenerate case where X was 0. Here's a snip:

...
;------------------------------------------------------
; Delete/insert program line and restart command prompt
; entry:   Carry must be clear
; uses:    find, start, {@ _ # & * (}, linbuf
;
skp2    tya             ;save linbuf offset pointer
        pha
        jsr find        ;locate first line >= {#}
        bcs insrt
        lda lparen
        cmp pound       ;if line doesn't already exist
        bne insrt       ; then skip deletion process
        lda lparen+1
        eor pound+1
        bne insrt
        tax             ;x = 0
        lda (at),y
        tay             ;y = length of line to delete
        eor #-1
        adc ampr        ;{&} = {&} - y
        sta ampr
        bcs delt
        dec ampr+1
delt    lda at
        sta under       ;{_} = {@}
        lda at+1
        sta under+1
delt2   lda under
        cmp ampr        ;delete the line
        lda under+1
        sbc ampr+1
        bcs insrt
        lda (under),y
        sta (under,x)
        inc under
        bne delt2
        inc under+1
        bcc delt2       ;(always taken)
insrt   pla
        tax             ;x = linbuf offset pointer
        lda pound
        pha             ;push the new line number on
        lda pound+1     ; the system stack
        pha
        ldy #2
cntln   inx
        iny             ;determine new line length in y
        lda linbuf-1,x  ; and push statement string on
        pha             ; the system stack
        bne cntln
        cpy #4          ;if empty line then skip the
        bcc jstart      ; insertion process
        tax             ;x = 0
        tya
        clc
        adc ampr        ;calculate new program end
        sta under       ;{_} = {&} + y
        txa
        adc ampr+1
        sta under+1
        lda under
        cmp star
        lda under+1     ;if {_} >= {*} then the program
        sbc star+1      ; won't fit in available RAM,
        bcs jstart      ; so abort to the "OK" prompt
slide   lda ampr
        bne slide2
        dec ampr+1
slide2  dec ampr
        lda ampr
        cmp at
        lda ampr+1
        sbc at+1
        bcc move        ;slide open a gap inside the
        lda (ampr,x)    ; program just big enough to
        sta (ampr),y    ; hold the new line
        bcs slide       ;(always taken)
move    tya
        tax             ;x = new line length
move2   pla             ;pull the statement string and
        dey             ; the new line number and store
        sta (at),y      ; them in the program gap
        bne move2
        ldy #2
        txa
        sta (at),y      ;store length after line number
        lda under
        sta ampr        ;{&} = {_}
        lda under+1
        sta ampr+1
jstart  jmp start       ;dump stack, restart cmd prompt
...

Using it like that allowed me to open or close gaps <256 bytes wide in the program text without having to use and update two separate pointers for source and destination.
Mike.
Edited October 6, 2014 by barrym95838
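The single-pointer trick in the `slide` loop can be modelled in a few lines: the same pointer is read through `(ampr,x)` with X = 0 and written through `(ampr),y` with Y holding the gap width, so the source and destination are always exactly Y bytes apart. The following is only a rough simulation of that idea, not a translation of the interpreter; the function name `open_gap` and the sample data are made up for illustration.

```python
def open_gap(mem, at, end, gap):
    """Slide mem[at:end] up by `gap` bytes and return the new end address.

    Mirrors the 6502 loop: one pointer walks down from `end` to `at`,
    reading at the pointer and writing `gap` bytes above it.
    """
    if not 0 < gap < 256:
        raise ValueError("gap must fit in the 8-bit Y register")
    p = end
    while p > at:
        p -= 1                  # 16-bit decrement of the single pointer
        mem[p + gap] = mem[p]   # read via (ptr,x) with x = 0, write via (ptr),y
    return end + gap

# Hypothetical program text with room to grow at the end.
mem = bytearray(b"ABCDEF" + bytes(4))
new_end = open_gap(mem, at=2, end=6, gap=3)
print(mem[:new_end])
```

Copying from the top down is what keeps the overlapping move safe; the three bytes at the start of the gap still hold stale data until the new line is stored there.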
barrym95838 Posted October 7, 2014
I forgot about another spot in the same interpreter that uses (zp,X) for non-zero values of X. It's the rough equivalent of Forth's @ word (pronounced fetch):

...
getval3 cmp #'('        ;sub-expression?
        beq eval        ; yes: evaluate it recursively
        jsr convp       ; no: first set var[x] to the
        lda (0,x)       ; named variable's address,
        pha             ; then replace that address
        inc 0,x         ; with the variable's actual
        bne getval4     ; value before returning
        inc 1,x
getval4 lda (0,x)
        sta 1,x
        pla
        sta 0,x
getrts  rts
...

The Z80, 6809, and even the 6800 have a code-size advantage for activities like this, due to their abilities to load and store 16-bit values with a single instruction, but the 6502 has the best average cycle/instruction ratio of the bunch, so the advantage isn't as large as it might seem.
Mike.
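For readers who don't use (zp,X) often, the effect of the fetch routine above can be sketched as a small simulation. It assumes a flat 64K memory image and a little-endian 16-bit cell at zero-page offset X; the function name `fetch_in_place` and the sample addresses are hypothetical, and the sketch ignores the corner case where the cell points at itself.

```python
def fetch_in_place(mem, x):
    """Replace the 16-bit pointer in zero-page cell x with the value it points to."""
    lo_addr = x & 0xFF
    hi_addr = (x + 1) & 0xFF                 # zero-page indexing wraps at 256
    ptr = mem[lo_addr] | (mem[hi_addr] << 8)
    lo = mem[ptr]                            # lda (0,x) : low byte of the value
    hi = mem[(ptr + 1) & 0xFFFF]             # lda (0,x) again after the increment
    mem[lo_addr] = lo                        # sta 0,x
    mem[hi_addr] = hi                        # sta 1,x

mem = bytearray(65536)
mem[0x80], mem[0x81] = 0x34, 0x12            # cell at $80 holds the address $1234
mem[0x1234], mem[0x1235] = 0xCD, 0xAB        # the variable there holds $ABCD
fetch_in_place(mem, 0x80)
value = mem[0x80] | (mem[0x81] << 8)
print(hex(value))
```

On the 6502 this takes the dozen or so instructions shown in the post, while a CPU with 16-bit loads and stores can do the same job in two or three, which is the code-size advantage the post mentions.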
ClausB Posted October 26, 2014
An excellent investigation into the Z80's 4-bit ALU: http://www.righto.com/2013/09/the-z-80-has-4-bit-alu-heres-how-it.html
Polybius Posted September 17, 2016
"So how come the Z80 shines on SMS but on Spectrum it all looks like regurgitated toe nails?"
That has to do with the SMS graphics chips and parser compared to the Spectrum. The SMS had additional chips to help with the load, while on the Sinclair machine the Z80 is responsible for practically everything, from drawing the graphics and tracking movement to parsing text. Plus, sprites on the Spectrum were stored in memory in black and white, and the color was added to the screen as it was drawn.
adamchevy Posted September 17, 2016
I've really enjoyed reading through this thread; it inspires me to continue to study the 6502 processor. You can't find some of this information in books, well, not easily anyway. Cheers!
JamesD Posted September 25, 2016
FWIW, I've done a lot of coding on my 64/80 column graphics text code since this thread was written and there are a few things I discovered.
1. LDIR was not the fastest way to scroll the screen on the Z80. I used an unrolled loop of LDI instructions.
2. The 6803 in the MC-10, which is clocked at .89 MHz, runs the demo faster than the 1.7 MHz 6502 in the Atari. The 8-bit-only registers kill the 6502 on the scroll, and the screen is pretty ugly during a scroll due to the address mode I had to use. I scroll columns rather than rows. Using the A register in 16-bit mode on the 65816 for the scroll made the Atari version the fastest tested so far. The 65816 memory move instructions are slower than an unrolled loop. I should point out that the VZ200 version does have to move more data in the scroll due to the paged screen memory, so it might actually be faster without that.
3. To get the most speed out of the 6502, I had to reorganize the font data to suit the 6502 addressing modes; this made it possible to dump the costly multiply. It also means I can't just load a font anywhere in memory without modifying the character printing routines, since addresses are hard coded. A direct port from another CPU is going to be slow; you have to rebuild things to suit the 6502.
4. The 6803 (Motorola) addressing modes seem to be the most useful (IMHO) and I just used a multiply instruction to calculate the character offset in the font. The LDAA ##,X addressing just works better with data structures than LDA ####,x on the 6502.
5. The Z180 version of the Z80 code has one change (I haven't looked for others). It replaces 8 instructions with the new multiply instruction to calculate the address of the character in the font data. This alone drops a lot of clock cycles.
6. Using the really slow IX, IY addressing mode on the Z80 actually allowed me to make some of the code faster since I didn't have to shuffle values in registers and increment an index register. I used it for printing two characters at a time.
7. Since implementing the Atari 65816 code, I have looked at optimizing the Z80 code further. By positioning tables on 256 byte boundaries, I can increment the least significant bytes of the pointers instead of all 16 bits. It is faster, but I haven't compared it to the 65816 code yet. I don't think it's enough of a speed increase to make it the fastest again.
8. With the Z180, not only can I use a multiply instruction, but I could also use DMA to scroll the screen, which should be much faster than the 65816 code. And that's in addition to the 20% increase in execution speed on the same code. I haven't benchmarked it or looked at the number of memory cycles it takes per byte for the memory move.
9. I haven't written a 6809 version yet, but I wrote part of the character printing routine and scroll code. It will definitely be faster than the 6803.
10. The 6309 memory move instruction should make its scroll one of the fastest at 3 clock cycles per byte. The 65816 memory move instruction takes 7 clock cycles per byte and its unrolled loop is something like 5 clock cycles per byte.
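Item 3 in the list above does not say how the font data was reorganized, so the following is only a guess at the usual 6502 approach, sketched in Python: instead of storing each glyph's rows together (which needs `char * rows` to find a glyph), store one 256-entry table per glyph row so the character code can be used directly as an 8-bit index. All names here (`ROWS`, `to_row_tables`, the sample font) are hypothetical.

```python
ROWS = 8  # assumed glyph height in bytes

def to_row_tables(font, rows=ROWS):
    """Split a glyph-major font into one table per glyph row."""
    count = len(font) // rows
    return [bytes(font[c * rows + r] for c in range(count)) for r in range(rows)]

def row_by_multiply(font, char, row, rows=ROWS):
    """Glyph-major lookup: needs a multiply (or shifts) per character."""
    return font[char * rows + row]

def row_by_table(tables, char, row):
    """Row-table lookup: the character code is the index, like LDA table,X."""
    return tables[row][char]

# Made-up font data: 256 glyphs of 8 bytes each.
font = bytes((c * 7 + r * 13) & 0xFF for c in range(256) for r in range(ROWS))
tables = to_row_tables(font)
same = all(
    row_by_multiply(font, c, r) == row_by_table(tables, c, r)
    for c in range(256)
    for r in range(ROWS)
)
print(same)
```

With fixed table addresses assembled into the print routine, the layout trades relocatability for speed, which fits the post's remark that a font can no longer be loaded just anywhere in memory.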
adamchevy Posted September 25, 2016
What's the best vintage hardware to write/debug 6507 programs for the Atari VCS? And when I say best, I also mean: what's the most readily available hardware today? I've read on the forums that the Apple IIe was used, and those are readily available for a good price. But maybe there's other vintage hardware out there that's better and similarly priced?
mbd30 Posted September 25, 2016
Once again, it comes down to support hardware. The video hardware on Spectrum is primitive in comparison to the SMS.
But the Spectrum's sublime audio quality helps make up for the poor graphics quality.
CatPix Posted September 25, 2016
If you take a Spectrum +3 with the AY chip, then in that respect the Spectrum is better than the SMS.
JamesD Posted September 26, 2016
"But the Spectrum's sublime audio quality helps make up for the poor graphics quality."
I'm really glad I wasn't drinking anything when I read that.
JamesD Posted September 26, 2016
I've been reading through old issues of MICRO looking for info on a couple of topics and I ran across some info related to this thread. First of all... I located the info on the 6509. The 6509's actual name was to be the SY6516 and the manufacturer was Synertek. The first mention of the SY6516 that I ran across was in issue 21 (Feb 1980), page 11. ...
The first article is located in Issue 20 page 36. It is very different from the 65816.
MarkO Posted October 9, 2016
"I'm really glad I wasn't drinking anything when I read that."
Hot Beverages and Carbonated Drinks are not fun coming out of your nose...
MarkO
MarkO Posted October 10, 2016
"The first article is located in Issue 20 page 36. It is very different from the 65816."
I can't locate any info on page 36 about the SY6516. I found a follow-up article here, on PDF pages 69-76: http://archive.6502.org/publications/micro/micro_34_mar_1981.pdf
MarkO