7800/6502 Hardware Questions

jbanes · April 5, 2006

I have a few questions on the 7800's design that I hope someone doesn't mind answering. Sorry if these have been answered before, but I haven't been able to find the necessary documentation. Thanks in advance!

1. Can the processor be halted in mid-instruction? That is, when the Maria asserts the halt line, the processor is supposed to halt in preparation for DMA from the Maria chip. However, most instructions take at least 2 cycles, and some take as many as 7. Would the 6502 stop in mid-instruction, or would the Maria be forced to wait up to 6 cycles? The documentation mentions long TIA and memory controller accesses as possible delays, but it doesn't specify the instructions.
2. There seems to be some confusion on how (indirect),y memory addressing works. To verify, the indirect address is an 8 bit value that points to a 16 bit, little-endian address in the zero page of memory. That 16 bit address is then added to Y to produce the final address. Correct?
3. Many opcodes take an extra cycle if the page boundary is crossed. However, none of the documentation seems to suggest which page we're working in relation to. Absolute addressing is particularly confusing, as it's in relation to nothing. The 6502Sim package appears to test against the zero page (0x0000 - 0x00FF), but I'm not certain that's correct. I would tend to assume that the current page would be defined as whatever page is currently flowing through the memory controller, i.e. I should look to the Instruction Pointer for the current page. Does anyone know which one is correct?
4. PHP and PLP can be used to push and pull a byte representation of the processor's internal flags. Does anyone know which order these are in? I've been assuming the following, but I haven't found any docs that confirm it:
```
7 6 5 4 3 2 1 0
x N V B D I Z C
```
5. The TIA/Maria registers appear to be mapped into the Zero Page of memory, and the stack area (0x0100 - 0x01FF) is interrupted with Shadow Memory (0x0100 - 0x013F). Does the 7800 Sally map the zero page and stack pointer to some other address, or are the Zero Page and stack pages cut down to 192 usable bytes each?

Thanks again for your help!

Nukey Shay · April 5, 2006

Not 7800-specific...since I dunno much about that. But regarding 6502...

2) Indirect-Y uses 4 bytes in total (not including the data that it fetches). One byte each for the opcode/$argument, and two bytes existing at $argument and $argument+1 which specify the 16-bit address. When using, the address affected would be the 16-bit address PLUS whatever Y currently holds.

Assuming that $D0/$D1 holds the address $B080, and Y holds a value of $10...LDA ($D0),Y would fetch the value at $B090. Note that the instruction is only using 2 bytes of program space...and keep in mind that the zero-page argument must hold the target address in the common LSB/MSB order. Argument+1 is assumed by the CPU to follow $argument...so 2 bytes are not needed to specify it. This is NOT true with JMP($argument), however...since the pointer can exist anywhere in memory in that case. With Indirect-Y, it's always a zero-page address.

3) The extra-cycle issue is not present on all instructions (JMP $absolute is not affected, for example...while forward branches that are taken and cross a page always are). Seems to be more prevalent with opcodes that use the status register in conjunction with an address. Online charts of opcodes often include which opcode+addressing mode have boundary issues.

On this page, extra cycles are indicated by + next to the cycle times...

http://www.6502.org/tutorials/6502opcodes.html

4) Status register bits follow this pattern: NV-BDIZC. The dash (bit 5) is unused.

Edited April 5, 2006 by Nukey Shay

vdub_bobby · April 5, 2006

A few answers:

2. There seems to be some confusion on how (indirect),y memory addressing works. To verify, the indirect address is an 8 bit value that points to a 16 bit, little-endian address in the zero page of memory. That 16 bit address is then added to Y to produce the final address. Correct?

(indirect),Y addressing works like this:

indirect is an 8-bit zero-page byte which contains the low byte of a 16-bit address. The high byte is found at indirect+1.

So, if $90 = $00, $91 = $F0, and Y = 7 then

   lda ($90),Y

Will load the accumulator with the value found at $F007 ($F000 + 7).

That make sense?

Also: I don't think any of the indexed zero-page opcodes handle zero-page page-crossing; I think if

$FF = $00, $100 = $F0, $00 = $E0, and Y = 3, then

   lda ($FF),Y

Will load the accumulator with the value found at address $E003, not $F003!
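A minimal Python sketch of the address calculation described above, not a full emulator. The function name `indirect_y_address` and the dict-based memory are assumptions made for illustration; the wrap of the pointer's high byte back to $00 follows the behavior suggested in this post.

```python
def indirect_y_address(mem, zp, y):
    """Effective address for an (indirect),Y access.

    mem is a dict mapping address -> byte (missing entries read as 0),
    zp is the one-byte operand, y is the Y register.
    """
    lo = mem.get(zp & 0xFF, 0)
    # The pointer's high byte comes from (zp + 1) wrapped to 8 bits,
    # so a pointer at $FF takes its high byte from $00, not $100.
    hi = mem.get((zp + 1) & 0xFF, 0)
    return (((hi << 8) | lo) + y) & 0xFFFF


# Values from the post: $90 = $00, $91 = $F0, Y = 7
print(hex(indirect_y_address({0x90: 0x00, 0x91: 0xF0}, 0x90, 7)))  # 0xf007

# Wrap case: $FF = $00, $100 = $F0, $00 = $E0, Y = 3
wrap_mem = {0xFF: 0x00, 0x100: 0xF0, 0x00: 0xE0}
print(hex(indirect_y_address(wrap_mem, 0xFF, 3)))  # 0xe003
```

The value stored at $100 is never read in the second call, which is the point of the wrap example.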

Many opcodes take an extra cycle if the page boundary is crossed. However, none of the documentation seems to suggest which page we're working in relation to. Absolute addressing is particularly confusing, as it's in relation to nothing.

It is always the target (16-bit) address.

Examples ($80=$FF, $81=$F8 ,Y=1, X=1)

   lda $F1FF,Y  ;page crossed is $F100 -> $F200
   lda $FF,X    ;no penalty!  page-crossing not handled in ZP,X opcodes
   lda ($80),Y  ;page crossed is $F800 -> $F900

That make sense?
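The three examples above can be checked with a small Python sketch. The helper names `page_crossed` and `zero_page_indexed` are made up for illustration, and the test is simply "does the high byte of the target address differ from the high byte of the base address".

```python
def page_crossed(base, index):
    """True if base + index lands in a different 256-byte page than base."""
    return ((base + index) & 0xFF00) != (base & 0xFF00)


def zero_page_indexed(zp, index):
    """zp,X or zp,Y: the sum wraps inside page zero, so no page is crossed."""
    return (zp + index) & 0xFF


print(page_crossed(0xF1FF, 1))          # lda $F1FF,Y with Y = 1 -> True
print(page_crossed(0xF8FF, 1))          # lda ($80),Y, pointer $F8FF -> True
print(page_crossed(0xF100, 1))          # same page -> False
print(hex(zero_page_indexed(0xFF, 1)))  # lda $FF,X with X = 1 -> 0x0
```

Note that the comparison is against the base address held in the operand or pointer, not against the program counter or the zero page.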

PHP and PLP can be used to get and retrieve a byte representation of the processor's internal flags. Does anyone know which order these are in? I've been assuming the following, but I haven't found any docs that confirm it:
7 6 5 4 3 2 1 0
x N V B D I Z C 

That is incorrect; the unused bit is bit 5, I think. Look here: http://www.geocities.com/oneelkruns/asm1step.html

       bit ->   7                           0
              +---+---+---+---+---+---+---+---+
              | N | V |   | B | D | I | Z | C |  <-- flag, 0/1 = reset/set
              +---+---+---+---+---+---+---+---+

Incidentally, that document (and this: http://www.6502.org/tutorials/6502opcodes.html) are probably sufficient to answer all your 6502 assembly questions.
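For anyone writing an emulator, the layout above (N V - B D I Z C, bit 7 down to bit 0) can be captured in a few lines of Python. This is only a sketch: the names `FLAG_BITS`, `pack_status` and `unpack_status` are hypothetical, and bit 5 is simply left clear here since it carries no flag.

```python
# Bit position of each flag in the status byte; bit 5 is unused.
FLAG_BITS = {"N": 7, "V": 6, "B": 4, "D": 3, "I": 2, "Z": 1, "C": 0}


def pack_status(flags):
    """Build a status byte from an iterable of flag letters."""
    value = 0
    for name in flags:
        value |= 1 << FLAG_BITS[name]
    return value


def unpack_status(value):
    """Return the set of flag letters that are set in a status byte."""
    return {name for name, bit in FLAG_BITS.items() if value & (1 << bit)}


print(format(pack_status({"N", "C"}), "08b"))  # 10000001
print(sorted(unpack_status(0b11000011)))       # ['C', 'N', 'V', 'Z']
```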

Nukey Shay · April 5, 2006

Also: I don't think any of the indexed zero-page opcodes handle zero-page page-crossing; I think if
$FF = $00, $100 = $F0, $00 = $E0, and Y = 3, then
   lda ($FF),Y
Will load the accumulator with the value found at address $E003, not $F003!

I believe that's correct. The tome I use says it affects JMP($indirect) as well...even though the argument is 16-bits!

Edited April 5, 2006 by Nukey Shay

Rybags · April 5, 2006

1. DMA will halt the processor mid-instruction. Video timing can't wait for anything, which is why most old computers needed means to halt the processor at short notice.

3. abs,X is another example where page-crossing can add the extra cycle. The reason is that the pipeline design actually prefetches data but doesn't always "predict" the address correctly. The extra cycle penalty only applies to instructions which do reads.

Flipper · April 5, 2006

1. DMA will halt the processor mid-instruction. Video timing can't wait for anything, which is why most old computers needed means to halt the processor at short notice.

Are you sure?

The official docs say there is an uncertainty in the amount of time a DMA startup will take. It takes 5-9 video cycles to halt the CPU. That doesn't sound like the CPU will stop in the middle of an instruction....

The line RAM within the Maria is double buffered, and that in addition to the amount of time the video chip holds HALT low for should be plenty to alleviate the need to halt the CPU in the middle of an instruction.

Of course, I've never seen a datasheet for a 6502C, and thus don't have any way to prove what goes on at that level.

At any rate, since the MARIA is read only, it probably doesn't matter.

vdub_bobby · April 5, 2006

The extra cycle penalty only applies to instructions which do reads.

Well, kinda. You always have an extra cycle on write instructions:

   lda Absolute,Y ;4 or 5 cycles
   sta Absolute,Y ;5 cycles always!

You can see why here: http://www.nvg.ntnu.no/bbc/doc/6502.txt
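As a rough Python sketch of the two cases above (the function name `abs_indexed_cycles` is invented here, and only the absolute-indexed LDA/STA timings quoted in this post are modeled):

```python
def abs_indexed_cycles(is_write, base, index):
    """Cycle count for an absolute,X or absolute,Y load or store."""
    crossed = ((base + index) & 0xFF00) != (base & 0xFF00)
    if is_write:
        # Stores always pay the extra cycle, crossed or not.
        return 5
    # Loads pay it only when the index pushes the address into a new page.
    return 5 if crossed else 4


print(abs_indexed_cycles(False, 0xF100, 1))  # lda, same page    -> 4
print(abs_indexed_cycles(False, 0xF1FF, 1))  # lda, page crossed -> 5
print(abs_indexed_cycles(True, 0xF100, 1))   # sta, same page    -> 5
```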

Edited April 5, 2006 by vdub_bobby

Nukey Shay · April 5, 2006

Due to the exceptions...it's better just to have the chart handy if you are uncertain while working on a time-critical portion of code.

jbanes · April 5, 2006

Thanks guys! That answers nearly all my questions.

So to recap:

1. Yes. The 6502 will halt immediately.

2. Correct.

3. The extra cycle stems from opcodes that address in combination with a register. The register may push it over the original page boundary, demanding that the memory controller reset to a new page.

4. Close, but move the blank bit: N V x B D I Z C

5. Still waiting for someone to comment.

Also: I don't think any of the indexed zero-page opcodes handle zero-page page-crossing; I think if
$FF = $00, $100 = $F0, $00 = $E0, and Y = 3, then
   lda ($FF),Y
Will load the accumulator with the value found at address $E003, not $F003!

That's my understanding as well. The 6502's memory wrapping apparently caused a bug in the JMP instruction. From what I understand, if you jumped to the end of a page, the next instruction would read $xxFF and $xx00 rather than $xxFF and $xxFF+1. The issue wasn't fixed until the 65C02.

vdub_bobby · April 5, 2006

That's my understanding as well. The 6502's memory wrapping apparently caused a bug in the JMP instruction. From what I understand, if you jumped to the end of a page, the next instruction would read $xxFF and $xx00 rather than $xxFF and $xxFF+1. The issue wasn't fixed until the 65C02.

This only applies to the absolute indirect addressing mode for the JMP instruction: JMP ($xxxx)

And it doesn't matter where you are jumping to, it matters where you are getting the destination address.

You can jump to any location you wish; you just can't store the destination address in a (2-byte) location that crosses a page boundary.

Basically, any instruction of the form

   jmp ($xxFF)

is going to be trouble, no matter what the intended destination is.

supercat · April 5, 2006

That's my understanding as well. The 6502's memory wrapping apparently caused a bug in the JMP instruction. From what I understand, if you jumped to the end of a page, the next instruction would read $xxFF and $xx00 rather than $xxFF and $xxFF+1. The issue wasn't fixed until the 65C02.

I am curious as to what's so difficult about handling the JMP($xxFF) instruction. I know the main ALU is only eight bits, but the program counter is sixteen. Why not simply treat $6C as a JMP but jam the state machine logic so that it thinks, after fetching the PC MSB, that it has just fetched a $4C opcode (which would cause it to then fetch two bytes that would be placed into the PC)? Even the 65C02 doesn't seem to do this, since JMP ($xxxx) takes an extra cycle.

Nukey Shay · April 5, 2006

For whatever reason, the carry status isn't introduced when the CPU is calculating the vector origin. I dunno the cause, but I suspect it has something to do with how all other indirect addressing is done (those pointers exist in zero page...and it was probably overlooked that the LSB could exist @ $FF). So the ($absolute) addressing mode mimicked the process too closely....rather than updating the MSB, it remains static - causing the glitch. IMO it's not anything really serious tho, because you can VERY easily avoid any $xxFF vector placement to use the instruction. Supposedly, this bug was fixed in the 65C02 (not the 7800's 6502C).

Kind of funny, but the manual I learned with (Machine Language For Beginners) advised to avoid using this instruction at all...as well as BMI/BPL! It takes all of a couple of sentences to explain the bugs...so not very good advice IMO.

supercat · April 5, 2006

For whatever reason, the carry status isn't introduced when the CPU is calculating the vector origin.

The JMP ($xxxx) instruction uses logic similar to other indirect addressing modes, vector fetches, and subroutine returns. Its behavior is a consequence of that. What I fail to understand is why.

To my mind, the most logical way to implement JMP ($xxxx) would be to start out like a normal JMP, but instead of reverting to the opcode-fetch state, go to the JMP operand-fetch state. In other words, if I do a JMP ($12FF), then start by loading $12FF into the PC, and then pretend that I'm sitting with a PC of $12FF having just fetched an opcode $4C (in which case the processor would, with no difficulty whatsoever, fetch the operand bytes from $12FF and $1300). Perhaps the 6502 design evolved in such a way as to preclude that, but it would have seemed a nice approach for the 65C02.

Edited April 5, 2006 by supercat

Nukey Shay · April 6, 2006

The "why" might not be such a mystery. All 16 variations of bit pattern %xxxx0001 (used to signify indirect addressing) were already used, and the standalone opcode JMP($ind) was instructed to behave similarly. IMO since the bug was probably overlooked in the original 16 opcodes, this could have been passed down to the new instruction. The way that it appears, the zero page carryover bug wasn't caught during the design process. Could have been just somebody looking at zero page and thinking that the MSB is always zero (which works out fine...except if you happen to be working with a vector at the last byte)? Or maybe it was discovered during R&D...but the design itself precluded any chance of fixing it?

DanBoris · April 6, 2006

Yes, you do lose the bottom 64 bytes of RAM on both pages due to the registers.

The TIA/Maria registers appear to be mapped into the Zero Page of memory, and the stack area (0x0100 - 0x01FF) is interrupted with Shadow Memory. (0x0100 - 0x013F) Does the 7800 Sally map the zero page and stack pointer to some other address, or are the Zero Page and stack pages cut down to 192 usable bytes each?

Thanks again for your help!

Edited April 6, 2006 by DanBoris

DanBoris · April 6, 2006

The CPU can be stopped mid-instruction, but NOT mid-clock. Since the video clock runs at 7.16 MHz and the CPU at only 1.79 or 1.19 MHz, it could take up to 5 video clock cycles before the CPU completes its current clock cycle.

Dan
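A quick bit of arithmetic on the clock figures quoted above, as a Python sketch. The constant and function names are made up, and the frequencies are the rounded values from the post rather than exact crystal-derived numbers.

```python
VIDEO_MHZ = 7.16  # video clock, as quoted in the post


def video_clocks_per_cpu_cycle(cpu_mhz):
    """How many video clocks fit in one CPU clock cycle (rounded)."""
    return round(VIDEO_MHZ / cpu_mhz)


for cpu_mhz in (1.79, 1.19):
    print(cpu_mhz, "MHz ->", video_clocks_per_cpu_cycle(cpu_mhz), "video clocks")
```

One CPU cycle spans roughly 4 video clocks at the fast rate and roughly 6 at the slow rate, so letting the cycle in progress finish can cost several video clocks even though the instruction itself is interrupted.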

1. DMA will halt the processor mid-instruction. Video timing can't wait for anything, which is why most old computers needed means to halt the processor at short notice.

Are you sure?

The official docs say there is an uncertainty in the amount of time a DMA startup will take. It takes 5-9 video cycles to halt the CPU. That doesn't sound like the CPU will stop in the middle of an instruction....

The line RAM within the Maria is double buffered, and that in addition to the amount of time the video chip holds HALT low for should be plenty to alleviate the need to halt the CPU in the middle of an instruction.

Of course, I've never seen a datasheet for a 6502C, and thus don't have any way to prove what goes on at that level.

At any rate, since the MARIA is read only, it probably doesn't matter.

jbanes · April 6, 2006

I am curious as to what's so difficult about handling the JMP($xxFF) instruction. I know the main ALU is only eight bits, but the program counter is sixteen. Why not simply treat $6C as a JMP but jam the state machine logic so that it thinks, after fetching the PC MSB, that it has just fetched a $4C opcode (which would cause it to then fetch two bytes that would be placed into the PC)? Even the 65C02 doesn't seem to do this, since JMP ($xxxx) takes an extra cycle.

I have a theory on that. (Which is bolstered by vdub's excellent correction to my understanding of the JMP bug.) The 6502 uses 8 bit pages of data, which it addresses directly. When you cross a page boundary, it needs to reconfigure the memory controller to address a new page (i.e. an early form of segmented memory). When the JMP goes to pull the indirect address from memory, it first tells the memory controller which page to address. Then it uses 8 bit addressing to get the two bytes. The first one is returned fine, but the second one causes the 8 bit address register to wrap. Thus, you end up with the wrong value for the second byte.

Yes, you do lose the bottom 64 bytes of RAM on both pages due to the registers.

Excellent! Thanks!

supercat · April 6, 2006

I have a theory on that. (Which is bolstered by vdub's excellent correction to my understanding of the JMP bug.) The 6502 uses 8 bit pages of data, which it addresses directly. When you cross a page boundary, it needs to reconfigure the memory controller to address a new page (i.e. an early form of segmented memory). When the JMP goes to pull the indirect address from memory, it first tells the memory controller which page to address. Then it uses 8 bit addressing to get the two bytes. The first one is returned fine, but the second one causes the 8 bit address register to wrap. Thus, you end up with the wrong value for the second byte.

The 6502 has a single 8-bit ALU. In addition, the program counter includes a counter circuit with a full 16-bit carry chain. With the exception of incrementing the PC, all address computations must be performed in the 8-bit ALU.

When performing the instruction "ADC $1234,X", the following sequence of events takes place:

-1- Perform any pending ALU operation from previous instruction and fetch opcode into instruction register. Increment PC.

-2- Finish any pending ALU operation from previous instruction and fetch operand into operand register (O). Increment PC if appropriate (it is, in this case).

-3- Add X to the operand register while fetching another operand byte into the H register.

-4- Add 1 to the H register while fetching a byte from address H:O. If the add in step 3 didn't produce a carry, go to step 1 of next instruction.

-5- Since the add in step 3 produced a carry, fetch a byte from address H:O again (using the new H).

-1- (next instruction) Perform the add while fetching the next opcode into the instruction register.

-2- (also next instruction)Store the result of the add into the accumulator while fetching the next operand into the operand register.

Note that ALU operations take a cycle to perform; the 6502 takes advantage of little-endian addressing so that the math can be performed on the low byte of an address while the high byte is being fetched. Note also that the 6502 can overlap ALU operations with opcode fetches.

The odd behavior of the JMP ($xxxx) instruction is a result of the instruction's processing using the ALU to handle the increment necessary to fetch the second indirect byte. My question is why even the 65C02 uses the ALU for that rather than the program counter.
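The address side of the sequence above (steps 3 to 5) can be sketched in Python. This is only a model of the description in this post, with an invented function name; it lists the data addresses put on the bus for an absolute,X read, given the operand's low byte, high byte, and X.

```python
def abs_x_read_addresses(lo, hi, x):
    """Data addresses read by an absolute,X load, in bus order."""
    total = lo + x
    low_sum = total & 0xFF   # 8-bit ALU result, ready while hi is fetched
    carry = total > 0xFF
    # First read uses the high byte as fetched, before any correction.
    first = (hi << 8) | low_sum
    if not carry:
        return [first]
    # A carry means the first read hit the wrong page; read again with hi + 1.
    fixed = (((hi + 1) & 0xFF) << 8) | low_sum
    return [first, fixed]


print([hex(a) for a in abs_x_read_addresses(0x34, 0x12, 0x10)])  # ['0x1244']
print([hex(a) for a in abs_x_read_addresses(0xF0, 0x12, 0x20)])  # ['0x1210', '0x1310']
```

The second case shows where the extra cycle comes from: one read at the uncorrected address, then a repeat once the high byte has been incremented.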

Rybags · April 6, 2006

The 6502 base clock is often based on a divisor of the master clock, so it gets the proper phase 0/2 signals.

Systems like the C-64 use it to advantage to interleave most of the video DMA (its RAM runs at 2 MHz). So, it suffers less DMA penalty (but runs a lot slower than the Atari to start with).

Player-missile graphics are proof that DMA does perform cycle-exact halting. ANTIC fetches the data (a dummy read, really), and GTIA just grabs whatever is sitting on the data bus during the relevant cycles.

The garbled vertical lines you often see on an A8 after a crash, or if PMG DMA isn't enabled on Antic isn't just random data - it's actually operands, instructions and memory stores from the 6502.

jbanes · April 6, 2006

The odd behavior of the JMP ($xxxx) instruction is a result of the instruction's processing using the ALU to handle the increment necessary to fetch the second indirect byte. My question is why even the 65C02 uses the ALU for that rather than the program counter.

Actually, your explanation makes perfect sense. Incrementing the program counter would skip the next instruction rather than fetching two bytes of the indirect address. So the ALU is used to increment the 8 bit portion of the address to fetch the second byte.

For example:

Let $4004 = $10

Let $4005 = $0a

1. The processor hits the instruction JMP ($4004)

2. The processor fetches the absolute address $4004

3. $10 is found to be the low byte of the address

4. Add 1 to the low byte of the Absolute address ($04 + $01 = $05 | ( $40 << 8 ) = $4005)

5. The processor fetches the absolute address $4005

6. $0a is found to be the high byte

7. The bytes are combined to form the address $0a10

8. $0a10 is loaded into the Program Counter (aka Instruction Pointer, depending on your terminology)

In the case of this bug, the following happens:

Let $4000 = $09

Let $40FF = $10

Let $4100 = $0a

1. The processor hits the instruction JMP ($40FF)

2. The processor fetches the absolute address $40FF

3. $10 is found to be the low byte of the address

4. Add 1 to the low byte of the Absolute address ($FF + $01 = $00 | ( $40 << 8 ) = $4000)

5. The processor fetches the absolute address $4000

6. $09 is found to be the high byte

7. The bytes are combined to form the address $0910

8. $0910 is loaded into the Program Counter

In order to fix the problem, there obviously needs to be a step 4.5, similar to the step 4 you gave:

4.5 If the carry flag is set, increment the high byte. ($40 + $01 = $41)

Does that make sense?
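Both walk-throughs above, plus the proposed step 4.5, fit in a short Python sketch. The function name and the `carry_fix` switch are hypothetical; the memory contents are the values from the two examples.

```python
def jmp_indirect_target(mem, ptr, carry_fix=False):
    """Target of JMP (ptr). mem is a dict of address -> byte."""
    lo = mem.get(ptr, 0)
    if carry_fix:
        # Step 4.5: let the carry ripple into the high byte of the pointer.
        hi_addr = (ptr + 1) & 0xFFFF
    else:
        # Original behavior: only the low byte of the pointer is incremented.
        hi_addr = (ptr & 0xFF00) | ((ptr + 1) & 0x00FF)
    hi = mem.get(hi_addr, 0)
    return (hi << 8) | lo


mem = {0x4004: 0x10, 0x4005: 0x0A,                 # first example
       0x4000: 0x09, 0x40FF: 0x10, 0x4100: 0x0A}   # second example

print(hex(jmp_indirect_target(mem, 0x4004)))                  # 0xa10
print(hex(jmp_indirect_target(mem, 0x40FF)))                  # 0x910
print(hex(jmp_indirect_target(mem, 0x40FF, carry_fix=True)))  # 0xa10
```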

EricBall · April 16, 2006

5. Yup. Better than the 2600 which has only 128 bytes and shadows the stack and zero page together. Note 0040-00FF is shadowed to 2040-20FF and 0140-01FF is shadowed to 2140-21FF.
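The shadow ranges given above translate to a simple lookup. A Python sketch, assuming only the two ranges stated in this post; the function name `shadow_to_ram` is made up, and addresses in the bottom 64 bytes of each page return None because they are not RAM shadows.

```python
def shadow_to_ram(addr):
    """Map a zero-page or stack-page address to the RAM it shadows."""
    if 0x0040 <= addr <= 0x00FF or 0x0140 <= addr <= 0x01FF:
        return addr + 0x2000
    return None


print(hex(shadow_to_ram(0x0040)))  # 0x2040
print(hex(shadow_to_ram(0x01FF)))  # 0x21ff
print(shadow_to_ram(0x0100))       # None

# Usable bytes in each page
print(sum(shadow_to_ram(a) is not None for a in range(0x0000, 0x0100)))  # 192
print(sum(shadow_to_ram(a) is not None for a in range(0x0100, 0x0200)))  # 192
```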

Handling JMP (xxFF) was probably judged to require too much logic for that single exception which could be reasonably avoided by a competent programmer.

Bryan · May 8, 2006

1. DMA will halt the processor mid-instruction. Video timing can't wait for anything, which is why most old computers needed means to halt the processor at short notice.

Are you sure?

The official docs say there is an uncertainty in the amount of time a DMA startup will take. It takes 5-9 video cycles to halt the CPU. That doesn't sound like the CPU will stop in the middle of an instruction....

Make sure you're reading an Atari-related document when it comes to CPU timing. The Atari does halt the CPU on any cycle where DMA occurs. This was originally accomplished using discrete logic on the 400/800, but the Sally CPU (6502C) moved the function into the chip (via the HALT pin).

A typical 6502 has a READY input that will stop execution on the first read cycle encountered. Other systems use it, which is why the C64 halts the CPU 3 cycles ahead of when it must do character DMA. Atari did not use this feature but rather stopped the clock signal, effectively freezing the CPU until DMA was finished.

-Bry

supercat · May 8, 2006

A typical 6502 has a READY input that will stop execution on the first read cycle encountered. Other systems use it, which is why the C64 halts the CPU 3 cycles ahead of when it must do character DMA. Atari did not use this feature but rather stopped the clock signal, effectively freezing the CPU until DMA was finished.

I wonder why the 2600 didn't do that? One would still probably want a non-pausing clock wire to feed the RIOT, but the number of pins on the TIA would remain the same (output CPU clock and RIOT clock, instead of general clock and READY). Using something else would allow a CPU to be chosen with another possibly-useful pin (IRQ or NMI, tied to the RIOT, would actually have been somewhat handy). Had Atari been willing to use a larger edge connector, A13 would also have been a nice choice.

supercat · May 8, 2006

In order to fix the problem, there obviously needs to be a step 4.5, similar to the step 4 you gave:

I missed your post earlier. My inclination for implementing JMP ($xxxx) would be to do something like:

-1- During first-byte operand fetch, if opcode is 01x01100, go to JMP1 state

-2- During JMP1 state, copy the previously-fetched operand into PCL and the current data bus contents into PCH. If bit 5 of latched opcode was set, clear bit 5 of the latched opcode and go back to state -1-. Otherwise fetch the next opcode.

When the processor does JMP $1234 starting at address $10FE, it has no trouble handling the page crossing while fetching the high byte of the operand. My implementation would be to have the state of the CPU after the third cycle of "JMP ($10FF)" be the same as the state of the CPU after fetching a JMP opcode from $10FE.
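The two-state idea above can be modeled in Python. This sketches the proposal only, not how any real 6502 or 65C02 works; `run_jmp` is an invented name and cycle counts are not modeled.

```python
def run_jmp(mem, pc):
    """Execute one JMP ($4C) or JMP indirect ($6C) at pc; return the new PC."""
    opcode = mem[pc]
    if opcode not in (0x4C, 0x6C):
        raise ValueError("not a JMP opcode")
    pc = (pc + 1) & 0xFFFF
    while True:
        lo = mem.get(pc, 0)
        pc = (pc + 1) & 0xFFFF   # 16-bit PC increment, so page crossing is fine
        hi = mem.get(pc, 0)
        pc = (hi << 8) | lo      # JMP1 state: operand bytes become the new PC
        if opcode & 0x20:        # bit 5 set means this was $6C
            opcode &= ~0x20      # now act like $4C with its operand at the PC
            continue
        return pc


mem = {0x2000: 0x6C, 0x2001: 0xFF, 0x2002: 0x10,   # JMP ($10FF) at $2000
       0x10FE: 0x4C, 0x10FF: 0x34, 0x1100: 0x12}   # JMP $1234 at $10FE

print(hex(run_jmp(mem, 0x2000)))  # 0x1234
print(hex(run_jmp(mem, 0x10FE)))  # 0x1234
```

Both calls read the same two bytes at $10FF and $1100 through the program counter, which is why the pointer straddling a page boundary causes no trouble in this scheme.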

Bryan · May 8, 2006

A typical 6502 has a READY input that will stop execution on the first read cycle encountered. Other systems use it, which is why the C64 halts the CPU 3 cycles ahead of when it must do character DMA. Atari did not use this feature but rather stopped the clock signal, effectively freezing the CPU until DMA was finished.

I wonder why the 2600 didn't do that? One would still probably want a non-pausing clock wire to feed the RIOT, but the number of pins on the TIA would remain the same (output CPU clock and RIOT clock, instead of general clock and READY). Using something else would allow a CPU to be chosen with another possibly-useful pin (IRQ or NMI, tied to the RIOT, would actually have been somewhat handy). Had Atari been willing to use a larger edge connector, A13 would also have been a nice choice.

Well, the 2600 was an exercise in low parts count, so they used the 6507 as-is, rather than adding clock logic. Besides, the net effect is pretty much the same. The opcode fetch cycle after WSYNC is strobed will be extended until READY is lifted.

-Bry
