mos6507's Blog

Thomas Jentzsch · November 13, 2006

I must admit I have no idea about hardware, but after looking at the DPC code in Pitfall II and reading supercat's suggestions, I wonder if something I would describe as "directly linking a queue to a hardware register" is possible.

E.g. instead of...

  lda QUEUE0
  sta GRP0
  lda QUEUE1
  sta GRP1

...you directly link a queue to GRP0 and another one to GRP1, so that any write (or read?) access to those registers triggers the queue, and you can do just this:

  sta GRP0 ; reads from QUEUE0
  sta GRP1 ; reads from QUEUE1

Or maybe the other way around:

  lda QUEUE0 ; writes to GRP0
  lda QUEUE1 ; writes to GRP1

As I said above, I have no idea if it is possible. But if something like this is possible (even when having to use a 5 cycle r-m-w instruction), we would gain a lot of additional CPU time inside the kernel.
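To put rough numbers on the proposal, here is a small Python tally of cycles per register update. The cycle counts are the standard 6502 ones. That the queue is read from an absolute (cartridge-space) address, as with the DPC data fetchers, is an assumption, and the names `CYCLES` and `cost` are purely illustrative.

```python
# Standard 6502 cycle counts for the addressing modes involved.
CYCLES = {
    ("lda", "abs"): 4,  # assumed: queue mapped into cartridge space
    ("lda", "zp"): 3,
    ("sta", "zp"): 3,   # TIA registers live in zero page
    ("inc", "zp"): 5,   # read-modify-write
}


def cost(*instructions):
    """Total cycles for a sequence of (mnemonic, addressing mode) pairs."""
    return sum(CYCLES[i] for i in instructions)


current = cost(("lda", "abs"), ("sta", "zp"))  # lda QUEUE0 / sta GRP0
linked_store = cost(("sta", "zp"))             # hypothetical: sta GRP0 pulls from the queue
linked_rmw = cost(("inc", "zp"))               # hypothetical: a single r-m-w instruction

print(current, linked_store, linked_rmw)  # 7 3 5
```

With only 76 CPU cycles per scanline, saving 2 to 4 cycles on every register update adds up quickly.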

Delicon · November 13, 2006

  sta GRP0 ; reads from QUEUE0
  sta GRP1 ; reads from QUEUE1

Or maybe the other way around:

  lda QUEUE0 ; writes to GRP0
  lda QUEUE1 ; writes to GRP1

I don't think this is possible, but I really don't know anything about VCS assembly. If I am understanding you, you want a move instruction that can have any effective address for the source and the destination. With 6502 mnemonics like LDA, LDX, and LDY, I don't think an instruction exists for that. This is just a guess, I am hardly a 6502 expert.

What Glenn is talking about above is the ability of a VCS program to swap portions of itself in and out at will. Say you only have the 4K of cart space, and remove banking completely from the picture because that just complicates things. The VCS could ask the ARM to load X bytes from position Y in file Z to location A in its 4K code space. The idea is that this is kind of how any operating system works. Code is run from RAM and there is not enough room in RAM to hold all the code. So (in the old days; now operating systems do it for you) you tell the ARM to move chunks of yourself in and out. A simple use of this would be for level loading. After completing a level, a new chunk of level data would replace the old chunk. A much more complex use would swap data and executable code in and out.

There are disadvantages to this, the largest being time. It does take time to access disks. But the VCS doesn't have to stop doing stuff while a load is in progress; its only restriction is that it can't ask the ARM to do any other work before it finishes the current task. We are offsetting the access delay with 64K of code space instead of 4K, separated into two 2K banks (each bank being completely independent and assignable anywhere in the 64K region), so it will be possible to 'prefetch' data and code. This would avoid the VCS ever needing to sit around and wait for disk I/O.

Vern
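As a rough illustration of the two independent 2K banks described above, here is a minimal Python sketch. The thread does not say how Chimera actually exposes the mapping, so the class `BankedCart`, its method names, and the lack of any alignment requirement are all hypothetical.

```python
BANK_SIZE = 0x0800    # one 2K window
SPACE_SIZE = 0x10000  # 64K backing store


class BankedCart:
    """Hypothetical model: two 2K windows into a 64K space."""

    def __init__(self):
        self.memory = bytearray(SPACE_SIZE)
        self.base = [0x0000, 0x0800]  # start of each window in the 64K space

    def map_bank(self, window, base):
        """Point window 0 or 1 at any 2K span of the 64K space."""
        if window not in (0, 1):
            raise ValueError("window must be 0 or 1")
        if not 0 <= base <= SPACE_SIZE - BANK_SIZE:
            raise ValueError("bank must fit inside the 64K space")
        self.base[window] = base

    def read(self, cart_addr):
        """Translate a 4K cartridge address into the 64K space and read it."""
        window, offset = divmod(cart_addr & 0x0FFF, BANK_SIZE)
        return self.memory[self.base[window] + offset]


cart = BankedCart()
cart.memory[0x4000] = 0xAB   # pretend the ARM prefetched data here
cart.map_bank(1, 0x4000)     # switch the upper window to the prefetched chunk
print(hex(cart.read(0x1800)))
```

In this model the VCS could keep running from window 0 while window 1 is pointed at a chunk loaded ahead of time, which is the 'prefetch' idea.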

Thomas Jentzsch · November 13, 2006

I don't think this is possible, but I really don't know anything about VCS assembly. If I am understanding you, you want a move instruction that can have any effective address for the source and the destination. With 6502 mnemonics like LDA, LDX, and LDY, I don't think an instruction exists for that.

Right, such an instruction doesn't exist.

But couldn't additional hardware help out here? E.g. when I write to a TIA register, put the data from a queue on the address bus?

Or, if that's not possible, use the TIA addresses also as queue read registers and then do a read-modify-write instruction (e.g. INC), which reads from the queue, modifies (AFAIK there are no read-write instructions without modify) and then writes the result to the TIA register?

EricBall · November 13, 2006

Good idea, but not really possible. Although you could have a queue drive the data bus when it sees the appropriate TIA address, the 6502 would also be driving the bus. i.e. on STA GRP0, both the 6502 and Chimera would be trying to change the data bus. For LDA GRP0 the 6502 wouldn't drive the bus, but the TIA wouldn't latch the data.

Thomas Jentzsch · November 13, 2006

Although you could have a queue drive the data bus when it sees the appropriate TIA address, the 6502 would also be driving the bus. i.e. on STA GRP0, both the 6502 and Chimera would be trying to change the data bus. For LDA GRP0 the 6502 wouldn't drive the bus, but the TIA wouldn't latch the data.

And if I do both? First I do a load operation (Chimera would drive the bus) and then a store. No?

Or couldn't we use some of the repeated TIA registers instead?

Sorry, if I sound stupid.

EricBall · November 13, 2006

Hmm... so you're suggesting

LDA GRP0 ; read from linked queue

STA GRP0 ; write to TIA

That might work. It depends on whether the TIA drives the bus on the LDA. IIRC the TIA only drives the upper 2 bits even on the defined addresses. So the question becomes what it does for addresses outside the defined read addresses.

Unfortunately, looking at page 2 of the schematics, it appears the read address decoder is only 4 bits wide. So the read registers mirror at all addresses in the TIA address space. $xE and $xF aren't used so you might be able to use a queue to drive PF1/PF2 updates.

The emu guys might be able to confirm or deny this behaviour.

I wouldn't say stupid, just not familiar with the hardware side of things.

Delicon · November 13, 2006

Unfortunately, looking at page 2 of the schematics, it appears the read address decoder is only 4 bits wide. So the read registers mirror at all addresses in the TIA address space. $xE and $xF aren't used so you might be able to use a queue to drive PF1/PF2 updates.

You are correct, the TIA does mirror its address locations. I wish it were different; a few free zero page addresses would be really nice. As it is though, all TIA reads drive bits 6 and 7, so that leaves six other bits that the Chimera could drive. This really limits the usefulness, but not totally. Glenn tells me that sound data only needs 4 bits, so Chimera could shovel sound data to the VCS using TIA reads.

Vern

EricBall · November 13, 2006

Actually, if $xE and $xF are truly unused, there's no reason those addresses couldn't be linked to queues, even on a static basis. That's 16 zero page read-only addresses. Plenty for player, playfield and even audio.

Delicon · November 13, 2006

Actually, if $xE and $xF are truly unused, there's no reason those addresses couldn't be linked to queues, even on a static basis. That's 16 zero page read-only addresses. Plenty for player, playfield and even audio.

I don't think they are free, at least from my interpretation of supercat's post.

http://www.atariage.com/forums/index.php?showtopic=85745

If A12 and A7 are 0, the TIA is active for a read or write. If it's a write, the Chimera can only read the data bus because the VCS will be driving it. If it's a read, only 2 data bits are driven by the TIA and the rest can be manipulated by the Chimera.

Vern
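The decode rule discussed in the last few posts can be sketched in Python as follows. The A12 = 0 / A7 = 0 select condition and the 4-bit read decode are taken from the posts above (the latter is EricBall's reading of the schematics); the function names are made up for illustration.

```python
ADDR_MASK = 0x1FFF  # the 6507 exposes 13 address lines (A0-A12)


def tia_selected(addr):
    """True when A12 = 0 and A7 = 0, the select condition quoted above."""
    return (addr & ADDR_MASK & 0x1080) == 0


def tia_read_register(addr):
    """Reads decode only the low 4 address bits, so read registers mirror."""
    return addr & 0x0F


# Zero page addresses that select the TIA and end in $E or $F.
candidates = [a for a in range(0x100)
              if tia_selected(a) and tia_read_register(a) in (0x0E, 0x0F)]
print(len(candidates), [f"${a:02X}" for a in candidates])
```

The list has 16 entries ($0E, $0F, $1E, $1F, ... $7E, $7F), which is where the figure of 16 zero page addresses comes from. Whether the TIA really leaves those reads alone is the open question.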

Thomas Jentzsch · November 13, 2006

Hmm... so you're suggesting

LDA GRP0 ; read from linked queue

STA GRP0 ; write to TIA

The idea was to do this in one instruction, e.g.

INC GRP0 ; read from linked queue, (modify), write to TIA

which would save one more cycle. But since it doesn't work anyway...

I wouldn't say stupid, just not familiar with the hardware side of things.

Very diplomatic.

Thomas Jentzsch · November 13, 2006

Glenn tells me that sound data only needs 4 bits, so Chimera could shovel sound data to the VCS using TIA reads.

The volume register only needs the 4 LSBs, so digitized music would be possible that way. There are a few other TIA registers which might still be used that way (e.g. enabling missiles and the ball).

Thomas Jentzsch · November 13, 2006

If A12 and A7 are 0, the TIA is active for a read or write.

Any chance to disable the read functionality (e.g. inside the kernel)?

Or another (most likely also impossible) idea: Switch A12 during a RMW-instruction. Then e.g. INC $1000+GRP0 would read from $101b and write to $1b (=GRP0).

BTW: Please tell me when to stop.

mos6507 · November 13, 2006

There are also timing constraints. Delicon barely got SRAM queues to fit under 4 VCS cycles. With fast queues it may or may not be possible to get it under 3 cycles.

If you really want fast code, given enough time in VBLANK, you can self-modify the kernel for load-immediate.
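A minimal Python sketch of the self-modifying idea: the kernel is a run of `LDA #imm` / `STA reg` pairs, and a routine run during VBLANK overwrites the immediate operands. The opcodes ($A9, $85) and the GRP0/GRP1 addresses ($1B, $1C) are the standard ones; the helper names are hypothetical, and this assumes the kernel sits in writable memory.

```python
LDA_IMM = 0xA9            # 6502 opcode: LDA #immediate (2 cycles)
STA_ZP = 0x85             # 6502 opcode: STA zero page (3 cycles)
GRP0, GRP1 = 0x1B, 0x1C   # TIA player graphics registers


def build_kernel(registers):
    """Emit 'LDA #0 / STA reg' per register; return code and operand offsets."""
    code = bytearray()
    operand_offsets = []
    for reg in registers:
        operand_offsets.append(len(code) + 1)  # the byte after the LDA opcode
        code += bytes([LDA_IMM, 0x00, STA_ZP, reg])
    return code, operand_offsets


def patch(code, operand_offsets, values):
    """What the VBLANK routine would do: overwrite each immediate operand."""
    for offset, value in zip(operand_offsets, values):
        code[offset] = value & 0xFF


kernel, offsets = build_kernel([GRP0, GRP1])
patch(kernel, offsets, [0x3C, 0x7E])
print(kernel.hex())  # a93c851ba97e851c
```

Each update then costs 2 + 3 = 5 cycles inside the kernel, with the patching work moved to VBLANK.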

Delicon · November 13, 2006

If A12 and A7 are 0, the TIA is active for a read or write.

Any chance to disable the read functionality (e.g. inside the kernel)?

Unfortunately, the way the chips are physically wired together, this isn't possible. There are two problems. The first is that the TIA looks for addresses with A12=0 and A7=0, then it 'turns on'. It also looks at the R/W line from the 6507. There are only two states for the R/W line: it can be a 1 or a 0, a read or a write. So we are kind of stuck when it comes to that. The second problem is that only the 6507 can control the address bus. There is no safe time for another device to drive its own values. As far as the data bus goes, it's always safe for any device to read the data bus, but non-6507 devices can only write to the bus at special times.

Or another (most likely also impossible) idea: Switch A12 during a RMW-instruction. Then e.g. INC $1000+GRP0 would read from $101b and write to $1b (=GRP0).

BTW: Please tell me when to stop.

There is no shame in asking questions. This is the situation of another device driving the address bus; unfortunately, that can't happen.

Vern

Delicon · November 13, 2006

There are also timing constraints. Delicon barely got SRAM queues to fit under 4 VCS cycles. With fast queues it may or may not be possible to get it under 3 cycles.

For fast queues, we can make three cycles. All you need is one external SRAM write for fast queues, and it takes the ARM 0.5 µs to write to external SRAM. So it should be possible. The question becomes, does that one cycle saved matter that much? I am being serious, I really know nothing of VCS programming.

Vern
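For a sense of scale, the cycle budgets can be converted to microseconds. This sketch assumes an NTSC console, where the 6507 runs at one third of the 3.579545 MHz color clock; the 0.5 µs SRAM write time is the figure quoted above, and the names are illustrative.

```python
NTSC_COLOR_CLOCK_MHZ = 3.579545
CPU_CLOCK_MHZ = NTSC_COLOR_CLOCK_MHZ / 3  # about 1.19 MHz on NTSC

SRAM_WRITE_US = 0.5  # ARM write time to external SRAM, as quoted above


def cycles_to_us(cycles):
    """Duration of a number of 6507 cycles in microseconds (NTSC)."""
    return cycles / CPU_CLOCK_MHZ


for n in (3, 4):
    print(f"{n} cycles = {cycles_to_us(n):.2f} us")
```

Three cycles come to roughly 2.5 µs, so a single 0.5 µs SRAM write fits with room to spare; whatever else the ARM must do per access has to fit in the remainder.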

Thomas Jentzsch · November 13, 2006

Ok, I think I understand now (at least a bit).

And if any of my ideas would have worked, David Crane would most likely have done it before anyway.

Thomas Jentzsch · November 13, 2006

The question becomes, does that one cycle saved matter that much?

Each and every cycle saved inside the kernel matters. That's why David Crane invented the DPC. The more cycles you have, the more advanced graphics you can create. And you can also use them for other stuff which has to be done inside the kernel, e.g. digitized music or paddle controller support.

Delicon · November 13, 2006

And you can also use them for other stuff which has to be done inside the kernel, e.g. digitized music or paddle controller support.

That's funny, because we are trying to add paddle support functions and functions for sound. The paddle support would come from the ARM directly polling the paddles itself, so the VCS would just read a byte as fast as it wants, and a current value will be available. And for the sound, I think I can get the ARM to generate three simultaneous waves of any frequency between 20 Hz and 20 kHz. By sacrificing a couple of queues, you can create custom waveforms, so any shape is possible: square, triangle, sine, anything you want.

Vern

mos6507 · November 13, 2006

There are also timing constraints. Delicon barely got SRAM queues to fit under 4 VCS cycles. With fast queues it may or may not be possible to get it under 3 cycles.

For fast queues, we can make three cycles. All you need is one external SRAM write for fast queues, and it takes the ARM 0.5 µs to write to external SRAM. So it should be possible. The question becomes, does that one cycle saved matter that much? I am being serious, I really know nothing of VCS programming.

Vern

If any queues could be read via a zero-page address as a hotspot, then yes, it would help.

batari · November 13, 2006

I've wondered something about the 6507 data bus. Is it open-collector, or can it be driven low when controlled by the 6507 without much current drain and without damage?

Namely, if the 6507 is driving the bus to $FF during a TIA write, is it possible to change this value on the fly without damage so a different value gets written to the TIA?

If this is possible, one could write a kernel containing lots of TIA writes, then the ARM could change those values to something else. e.g., set LDA #$FF, then a kernel could contain STA GRP0, STA GRP1, STA AUDVx, etc. and write custom values in 3 cycles each.

Thomas Jentzsch · November 14, 2006

If this is possible, one could write a kernel containing lots of TIA writes, then the ARM could change those values to something else. e.g., set LDA #$FF, then a kernel could contain STA GRP0, STA GRP1, STA AUDVx, etc. and write custom values in 3 cycles each.

Sounds like my (impossible) idea above, but with some more technical background.

batari · November 14, 2006

If this is possible, one could write a kernel containing lots of TIA writes, then the ARM could change those values to something else. e.g., set LDA #$FF, then a kernel could contain STA GRP0, STA GRP1, STA AUDVx, etc. and write custom values in 3 cycles each.

Sounds like my (impossible) idea above, but with some more technical background.

It is similar, which is what inspired my twist on it. An open-collector bus can only be driven low; when it's not being driven, it's pulled up by a resistor (sometimes internal.) So $FF on an open-collector bus would effectively release it. This type of bus can be used by multiple devices with them all enabled at the same time.

But even if it's not technically an open collector bus, it might still be possible to drive the bus when you aren't "supposed" to without melting any chips. And if this is the case, the questions are: which device would "win" the bus contention, and how much current would be sunk by the two devices fighting over the bus?
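The open-collector behaviour described above amounts to a wired-AND: every device can only pull bits low, and any bit nobody pulls low reads as 1. A minimal Python model follows; it is an idealized sketch only, it says nothing about whether the 6507 data bus actually behaves this way, and the function name is made up.

```python
from functools import reduce


def open_collector_bus(*drivers):
    """Wired-AND of all driver values; an undriven bus floats to $FF."""
    return reduce(lambda a, b: a & b, drivers, 0xFF)


cpu = 0xFF  # the 6507 "writes" $FF, i.e. pulls nothing low
arm = 0x3C  # hypothetical second device supplies the real value
print(hex(open_collector_bus(cpu, arm)))  # 0x3c
```

On such a bus, a kernel full of STA instructions with A = $FF would let the second device decide what the TIA latches.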

Delicon · November 14, 2006

I've wondered something about the 6507 data bus. Is it open-collector, or can it be driven low when controlled by the 6507 without much current drain and without damage?

Unfortunately this isn't true. A quick look through the schematics and I can't find any pullup resistors on the data lines. These are required for open-drain and open-collector buses, as the pins cannot actually drive a high voltage; they can only sink.

The pullups may be internal; that should be easy to check against the 6502 documentation.

Vern

Delicon · November 14, 2006

But even if it's not technically an open collector bus, it might still be possible to drive the bus when you aren't "supposed" to without melting any chips. And if this is the case, the questions are: which device would "win" the bus contention, and how much current would be sunk by the two devices fighting over the bus?

This could be risky. If damage did not occur immediately, it surely would over time.

I have 'burned' out 6 or 8 CPLDs during Chimera development doing this. The interaction was between the ARM and the CPLD, but the situation would still be the same if the 6507 was one of the devices. I got sloppy with my programming and had the ARM and the CPLD driving the same bus at the same time. If I caught it immediately (within a couple of seconds), there didn't seem to be a problem; any longer and the CPLD would fail.

Vern

Thomas Jentzsch · November 14, 2006

It is similar, which is what inspired my twist on it.

Too bad it doesn't work. 3 cycles/write would be a giant performance leap.
