Atari 2600 "DOS"


mos6507


Another thing Delicon and I were discussing was how to handle the native Chimera file format. We decided that the best approach is for a Chimera game to load a 4K "loader", which in turn instructs the ARM where to get the rest of the data and where to put it. While this adds some overhead to the game, it is the most flexible approach, and closer to how a modular program for an 8-bit home computer would work.

Since the ARM can use the FAT filesystem on the flash, the VCS simply sends a short command to the ARM requesting that a file be opened, that it seek to a position in the file, and that it read out a chunk and put it anywhere in the 128K SRAM space. The ARM does all the hard work. We should be able to carry this over the wire via serial as well, using the control program on the PC. Right now we're thinking the control software will probably be a Java application.

The upside of all this is that we abandon the notion of Supercharger packets and loads and just leave it to the VCS to do general I/O routines. This didn't make sense before we implemented the filesystem, but now we can do it. So the programmer can modularize his game any way he chooses. You aren't limited to 256-byte segments. Load IDs no longer mean anything. It also means that a game doesn't have to jump out into the loader ROM to fetch more data. It may still have to pause, but it could keep the kernel going and display some kind of loading indicator. So I/O will be more like having a hard drive or a CD-ROM for the VCS.
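The request described above (open a file, seek, read a chunk, place it anywhere in the 128K SRAM) could be modeled roughly as follows. This is a minimal sketch, not the actual Chimera protocol: the `LoadRequest` fields, the `handle_load` function, and the Python dict standing in for the FAT filesystem are all assumptions made for illustration.

```python
from dataclasses import dataclass

SRAM_SIZE = 128 * 1024  # 128K SRAM space mentioned in the post


@dataclass
class LoadRequest:
    """Hypothetical command the VCS would send to the ARM."""
    filename: str   # file to open on the flash filesystem
    offset: int     # position to seek to within the file
    length: int     # number of bytes to read
    dest: int       # destination address in SRAM


def handle_load(req: LoadRequest, files: dict, sram: bytearray) -> int:
    """Copy a chunk of a file into SRAM; return the number of bytes copied."""
    if req.dest < 0 or req.dest + req.length > len(sram):
        raise ValueError("destination range falls outside SRAM")
    data = files[req.filename][req.offset:req.offset + req.length]
    sram[req.dest:req.dest + len(data)] = data
    return len(data)


# Stand-in for the FAT filesystem: file name -> contents
files = {"GAME.BIN": bytes(range(256)) * 4}
sram = bytearray(SRAM_SIZE)
copied = handle_load(
    LoadRequest("GAME.BIN", offset=16, length=300, dest=0x2000), files, sram
)
```

Because the request carries its own offset, length and destination, the game decides how to split itself into modules; nothing in the command ties a load to a fixed segment size.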

27 Comments





I must admit I have no idea about hardware, but after looking at the DPC code in Pitfall II and reading supercat's suggestions, I wonder if something I would describe as "directly linking a queue to a hardware register" is possible.

 

E.g. instead of...

  lda QUEUE0
  sta GRP0
  lda QUEUE1
  sta GRP1

...you directly link a queue to GRP0 and another one to GRP1, so that any write (or read?) access to those registers triggers the queue, and you can do just this:

  sta GRP0 ; reads from QUEUE0
  sta GRP1 ; reads from QUEUE1

Or maybe the other way around:

  lda QUEUE0 ; writes to GRP0
  lda QUEUE1 ; writes to GRP1

As I said above, I have no idea if it is possible. But if something like this is possible (even when having to use a 5 cycle r-m-w instruction), we would gain a lot of additional CPU time inside the kernel.
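To put rough numbers on the potential gain: a VCS scanline is 76 CPU cycles, `LDA` absolute takes 4 cycles, `STA` zero page takes 3, and a zero-page read-modify-write such as `INC` takes 5. The sketch below only counts how many back-to-back register updates each variant would fit into one scanline. It assumes the queue is read with absolute addressing (as the DPC registers are) and ignores everything else a kernel has to do.

```python
CYCLES_PER_SCANLINE = 76  # 6507 cycles in one scanline

# Cycle cost of one TIA register update under each scheme
variants = {
    "LDA abs + STA zp": 4 + 3,  # explicit queue read, then store
    "INC zp (r-m-w)": 5,        # hypothetical linked queue via read-modify-write
    "STA zp only": 3,           # hypothetical queue linked directly to the register
}

updates_per_line = {
    name: CYCLES_PER_SCANLINE // cost for name, cost in variants.items()
}
for name, count in updates_per_line.items():
    print(f"{name}: at most {count} updates per scanline")
```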

Link to comment

  sta GRP0 ; reads from QUEUE0
  sta GRP1 ; reads from QUEUE1

Or maybe the other way around:

  lda QUEUE0 ; writes to GRP0
  lda QUEUE1 ; writes to GRP1

I don't think this is possible, but I really don't know anything about VCS assembly. If I am understanding you, you want a move instruction that can have any effective address for the source and the destination. With 6502 mnemonics like LDA, LDX, and LDY, I don't think an instruction exists for that. This is just a guess; I am hardly a 6502 expert.

 

What Glenn is talking about above is the ability of a VCS program to swap portions of itself in and out at will. Say you only have the 4K in cart space; remove banking completely from the picture because that just complicates things. The VCS could ask the ARM to load X bytes from position Y in file Z to location A in my 4K code space. The idea being that this is kinda how any operating system sorta works. Code is run from RAM and there is not enough room in RAM to hold all the code. So (in the old days; now operating systems do it for you) you tell the ARM to move chunks of yourself in and out. A simple use of this would be for level loading. After completing a level, a new chunk of level data would replace the old chunk. A much more complex use would swap data and executable code in and out.

 

There are disadvantages to this, the largest being time. It does take time to access disks. But the VCS doesn't have to stop doing stuff while a load is in progress; its only restriction is that it can't ask the ARM to do any other work before it finishes the current task. We are offsetting the access delay with 64K of code space instead of 4K, separated into two 2K banks (each bank is completely independent and can be assigned anywhere in the 64K region), so it will be possible to 'prefetch' data and code. This would avoid the VCS ever needing to sit around and wait for disk I/O.

 

Vern

Link to comment

I don't think this is possible, but I really don't know anything about VCS assembly. If I am understanding you, you want a move instruction that can have any effective address for the source and the destination. With 6502 mnemonics like LDA, LDX, and LDY, I don't think an instruction exists for that.

Right, such an instruction doesn't exist.

 

But couldn't additional hardware help out here? E.g. when I write to a TIA register, put the data from a queue on the address bus?

 

Or, if that's not possible, use the TIA addresses also as queue read registers and then do a read-modify-write instruction (e.g. INC), which reads from the queue, modifies (AFAIK there are no read-write instructions without modify) and then writes the result to the TIA register?

Link to comment

Good idea, but not really possible. Although you could have a queue drive the data bus when it sees the appropriate TIA address, the 6502 would also be driving the bus. i.e. on STA GRP0, both the 6502 and Chimera would be trying to change the data bus. For LDA GRP0 the 6502 wouldn't drive the bus, but the TIA wouldn't latch the data.

Link to comment

Although you could have a queue drive the data bus when it sees the appropriate TIA address, the 6502 would also be driving the bus. i.e. on STA GRP0, both the 6502 and Chimera would be trying to change the data bus. For LDA GRP0 the 6502 wouldn't drive the bus, but the TIA wouldn't latch the data.

And if I do both? First I do a load operation (Chimera would drive the bus) and then a store. No? :D

 

Or couldn't we use some of the repeated TIA registers instead?

 

Sorry, if I sound stupid. ;)

Link to comment

Hmm... so you're suggesting

LDA GRP0 ; read from linked queue

STA GRP0 ; write to TIA

 

That might work. It depends on whether the TIA drives the bus on the LDA. IIRC the TIA only drives the upper 2 bits even on the defined addresses. So the question becomes what it does for addresses outside the defined read addresses.

 

Unfortunately, looking at page 2 of the schematics, it appears the read address decoder is only 4 bits wide. So the read registers mirror at all addresses in the TIA address space. $xE and $xF aren't used so you might be able to use a queue to drive PF1/PF2 updates.

 

The emu guys might be able to confirm or deny this behaviour.

 

I wouldn't say stupid, just not familiar with the hardware side of things.

Link to comment

Unfortunately, looking at page 2 of the schematics, it appears the read address decoder is only 4 bits wide. So the read registers mirror at all addresses in the TIA address space. $xE and $xF aren't used so you might be able to use a queue to drive PF1/PF2 updates.

You are correct, the TIA does mirror its address locations. I wish it were different; a few free zero page addresses would be real nice. As it is though, all TIA reads drive bits 6 and 7, so that leaves six other bits that the Chimera could drive. This really limits the usefulness, but not totally. Glenn tells me that sound data only needs 4 bits, so Chimera could shovel sound data to the VCS using TIA reads.

 

Vern

Link to comment

Actually, if $xE and $xF are truly unused, there's no reason those addresses couldn't be linked to queues, even on a static basis. That's 16 zero page read-only addresses. Plenty for player, playfield and even audio.
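Counting those addresses is simple arithmetic. The snippet below just enumerates every zero-page address with A7 = 0 whose low nibble is $E or $F. Whether the TIA really leaves them free on reads is the open question here, not something the code settles.

```python
# Zero-page addresses where the TIA is selected (A7 = 0) and the low nibble is $E or $F
candidates = [addr for addr in range(0x80) if (addr & 0x0F) in (0x0E, 0x0F)]
print([f"${addr:02X}" for addr in candidates])
```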

Link to comment

Actually, if $xE and $xF are truly unused, there's no reason those addresses couldn't be linked to queues, even on a static basis. That's 16 zero page read-only addresses. Plenty for player, playfield and even audio.

I don't think they are free, at least from my interpretation of supercat's post.

 

http://www.atariage.com/forums/index.php?showtopic=85745

 

If A12 and A7 are 0, the TIA is active for a read or write. If it's a write, the Chimera can only read the data bus because the VCS will be driving it. If it's a read, only 2 data bits are driven by the TIA and the rest can be manipulated by the Chimera.
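The decode rule and the read-side bus sharing described here can be expressed as a small model. It is a sketch only: `tia_selected` and `bus_value_on_read` are hypothetical helper names, and it assumes, as stated above, that the TIA drives only bits 6 and 7 on a read.

```python
A12 = 1 << 12
A7 = 1 << 7
TIA_READ_MASK = 0xC0  # bits 6 and 7, driven by the TIA on reads


def tia_selected(addr: int) -> bool:
    """The TIA responds whenever A12 and A7 are both 0."""
    return (addr & (A12 | A7)) == 0


def bus_value_on_read(tia_bits: int, chimera_bits: int) -> int:
    """Combine the TIA's two bits with six bits supplied by another device."""
    return (tia_bits & TIA_READ_MASK) | (chimera_bits & ~TIA_READ_MASK & 0xFF)
```

For example, `tia_selected(0x1B)` is true while `tia_selected(0x101B)` is false, and the second device never gets to influence the top two bits of the value the 6507 reads.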

 

Vern

Link to comment

Hmm... so you're suggesting

LDA GRP0 ; read from linked queue

STA GRP0 ; write to TIA

The idea was to do this in one instruction, e.g.

INC GRP0 ; read from linked queue, (modify), write to TIA

which would save one more cycle. But since it doesn't work anyway...

 

I wouldn't say stupid, just not familiar with the hardware side of things.

Very diplomatic. ;)

Link to comment

Glenn tells me that sound data only needs 4 bits, so Chimera could shovel sound data to the VCS using TIA reads.

The volume register only needs the 4 LSB, so digitized music would be possible that way. There are a few other TIA registers which might still be used that way (e.g. enabling missiles and the ball).

Link to comment

If A12 and A7 are 0, the TIA is active for a read or write.

Any chance to disable the read functionality (e.g. inside the kernel)?

 

Or another (most likely also impossible) idea: Switch A12 during a RMW-instruction. Then e.g. INC $1000+GRP0 would read from $101b and write to $1b (=GRP0).

 

BTW: Please tell me when to stop. ;)

Link to comment

There are also timing constraints. Delicon barely got SRAM queues to fit under 4 VCS cycles. With fast queues it may or may not be possible to get it under 3 cycles.

 

If you really want fast code, given enough time in VBLANK, you can self-modify the kernel for load-immediate.
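The load-immediate trick could look roughly like this from the point of view of whoever patches the kernel. A minimal sketch under stated assumptions: the kernel is assumed to live in writable memory, and the `kernel` byte layout and `patch_immediates` helper are made up for illustration. Only the opcode values ($A9 for `LDA #imm`, $85 for `STA zp`) and the register addresses (GRP0 = $1B, GRP1 = $1C) are standard.

```python
LDA_IMM = 0xA9  # LDA #imm, 2 cycles
STA_ZP = 0x85   # STA zp, 3 cycles
GRP0, GRP1 = 0x1B, 0x1C

# Kernel fragment: LDA #$00 / STA GRP0 / LDA #$00 / STA GRP1
kernel = bytearray([LDA_IMM, 0x00, STA_ZP, GRP0,
                    LDA_IMM, 0x00, STA_ZP, GRP1])


def patch_immediates(code: bytearray, values: list) -> None:
    """Overwrite the operand of each LDA #imm in a fixed LDA/STA sequence."""
    for i, value in enumerate(values):
        pos = i * 4  # each LDA #imm / STA zp pair is 4 bytes long
        if code[pos] != LDA_IMM:
            raise ValueError("unexpected opcode at patch position")
        code[pos + 1] = value & 0xFF


# During VBLANK: drop the next frame's graphics bytes into the kernel
patch_immediates(kernel, [0x3C, 0x7E])
```

Each patched pair costs 2 + 3 = 5 cycles in the kernel, at the price of doing the patching work outside the visible frame.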

Link to comment

If A12 and A7 are 0, the TIA is active for a read or write.

Any chance to disable the read functionality (e.g. inside the kernel)?

Unfortunately, the way the chips are physically wired together, this isn't possible. There are two problems. The first is that the TIA looks for addresses with A12=0 and A7=0, then it 'turns on'. It also looks at the R/W line from the 6507. There are only two states for the R/W line: it can be a 1 or a 0, a read or a write. So we are kind of stuck when it comes to that. The second problem is that only the 6507 can control the address bus. There is no safe time for another device to drive its own values. As far as the data bus goes, it's always safe for any device to read the data bus, but non-6507 devices can only write to the bus at special times.

 

Or another (most likely also impossible) idea: Switch A12 during a RMW-instruction. Then e.g. INC $1000+GRP0 would read from $101b and write to $1b (=GRP0).

 

BTW: Please tell me when to stop. ;)

There is no shame in asking questions :D. This is the situation of another device driving the address bus; unfortunately that can't happen.

 

Vern

Link to comment

There are also timing constraints. Delicon barely got SRAM queues to fit under 4 VCS cycles. With fast queues it may or may not be possible to get it under 3 cycles.

For fast queues, we can make three cycles. All you need for fast queues is one external SRAM write, and it takes the ARM 0.5 us to write to external SRAM. So it should be possible. The question becomes, does that one cycle saved matter that much? I am being serious, I really know nothing of VCS programming.
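As a rough check of that timing: the NTSC VCS runs the 6507 at the colour clock divided by three, about 1.19 MHz. The sketch below compares a three-cycle window against the 0.5 us SRAM write quoted above. It ignores any other latency in the ARM or CPLD, which is an assumption.

```python
NTSC_COLOR_CLOCK_HZ = 3_579_545         # TIA colour clock (NTSC)
CPU_CLOCK_HZ = NTSC_COLOR_CLOCK_HZ / 3  # 6507 clock, about 1.19 MHz
SRAM_WRITE_US = 0.5                     # figure quoted in the comment

cycle_us = 1_000_000 / CPU_CLOCK_HZ     # duration of one VCS cycle
window_us = 3 * cycle_us                # three-cycle budget for a fast queue
slack_us = window_us - SRAM_WRITE_US
print(f"cycle = {cycle_us:.3f} us, 3 cycles = {window_us:.3f} us, "
      f"slack = {slack_us:.3f} us")
```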

 

Vern

Link to comment

The question becomes, does that one cycle saved matter that much?

Each and every cycle saved inside the kernel matters. That's why David Crane invented the DPC. The more cycles you have, the more advanced graphics you can create. And you can also use them for other stuff which has to be done inside the kernel, e.g. digitized music or paddle controller support.

Link to comment

And you can also use them for other stuff which has to be done inside the kernel, e.g. digitized music or paddle controller support.

That's funny, because we are trying to add paddle support functions and functions for sound. The paddle support would come from the ARM directly polling the paddles itself, so the VCS would just read a byte as fast as it wants, and a current value will be available. And for the sound, I think I can get the ARM to generate three simultaneous waves of any frequency between 20 Hz and 20 kHz. By sacrificing a couple of queues, you can create custom waveforms, so any shape is possible: square, triangle, sine, anything you want.
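One common way to produce an arbitrary waveform at an arbitrary frequency is a wave table stepped by a phase accumulator. The sketch below shows that general technique, not Chimera's actual implementation: the table size, the `render` helper, and the 15.7 kHz output rate (roughly one sample per NTSC scanline) are assumptions, and samples are scaled to the 4-bit range of the TIA volume register.

```python
import math

TABLE_SIZE = 32

# One period of a sine wave, scaled to the 4-bit volume range (0..15)
sine_table = [round(7.5 + 7.5 * math.sin(2 * math.pi * i / TABLE_SIZE))
              for i in range(TABLE_SIZE)]


def render(table, freq_hz, sample_rate_hz, count):
    """Step through a wave table with a phase accumulator."""
    phase = 0.0
    step = freq_hz * len(table) / sample_rate_hz  # table entries per output sample
    samples = []
    for _ in range(count):
        samples.append(table[int(phase) % len(table)])
        phase += step
    return samples


samples = render(sine_table, freq_hz=440, sample_rate_hz=15_700, count=64)
```

Swapping `sine_table` for a square or triangle table changes the shape without touching the stepping logic, which is the appeal of keeping the waveform in a table.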

 

Vern

Link to comment

There are also timing constraints. Delicon barely got SRAM queues to fit under 4 VCS cycles. With fast queues it may or may not be possible to get it under 3 cycles.

For fast queues, we can make three cycles. All you need for fast queues is one external SRAM write, and it takes the ARM 0.5 us to write to external SRAM. So it should be possible. The question becomes, does that one cycle saved matter that much? I am being serious, I really know nothing of VCS programming.

 

Vern

 

If any queues could be read via a zero-page address as a hotspot, then yes, it would help.

Link to comment

I've wondered something about the 6507 data bus. Is it open-collector, or can it be driven low when controlled by the 6507 without much current drain and without damage?

 

Namely, if the 6507 is driving the bus to $FF during a TIA write, is it possible to change this value on the fly without damage so a different value gets written to the TIA?

 

If this is possible, one could write a kernel containing lots of TIA writes, then the ARM could change those values to something else. e.g., set LDA #$FF, then a kernel could contain STA GRP0, STA GRP1, STA AUDVx, etc. and write custom values in 3 cycles each.

Link to comment

If this is possible, one could write a kernel containing lots of TIA writes, then the ARM could change those values to something else. e.g., set LDA #$FF, then a kernel could contain STA GRP0, STA GRP1, STA AUDVx, etc. and write custom values in 3 cycles each.

Sounds like my (impossible ;)) idea above, but with some more technical background. :D

Link to comment

If this is possible, one could write a kernel containing lots of TIA writes, then the ARM could change those values to something else. e.g., set LDA #$FF, then a kernel could contain STA GRP0, STA GRP1, STA AUDVx, etc. and write custom values in 3 cycles each.

Sounds like my (impossible ;)) idea above, but with some more technical background. :D

It is similar, which is what inspired my twist on it. An open-collector bus can only be driven low; when it's not being driven, it's pulled up by a resistor (sometimes internal.) So $FF on an open-collector bus would effectively release it. This type of bus can be used by multiple devices with them all enabled at the same time.
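The wired-AND behaviour of an open-collector bus can be illustrated with a tiny model. This only shows how such a bus would behave in principle; whether the 6507 data bus works this way is exactly the question being asked, and `open_collector_bus` is a hypothetical helper.

```python
from functools import reduce


def open_collector_bus(drivers):
    """Wired-AND: a bit stays high only if no device pulls it low."""
    return reduce(lambda a, b: a & b, drivers, 0xFF)


# The 6507 "writes" $FF (releasing every line) while a second device pulls bits low
value_seen = open_collector_bus([0xFF, 0x3C])
```

With no drivers at all the bus reads $FF, which is why writing $FF amounts to letting go of it.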

 

But even if it's not technically an open collector bus, it might still be possible to drive the bus when you aren't "supposed" to without melting any chips. And if this is the case, the questions are: which device would "win" the bus contention, and how much current would be sunk by the two devices fighting over the bus?

Link to comment

I've wondered something about the 6507 data bus. Is it open-collector, or can it be driven low when controlled by the 6507 without much current drain and without damage?

Unfortunately, it doesn't look like an open-collector bus. A quick look through the schematics and I can't find any pullup resistors on the data lines. Pullups are required on open-drain and open-collector buses, since the pins cannot actually drive a high voltage; they can only sink.

 

The pullups may be internal; that should be easy to check against the 6502 documentation.

 

Vern

Link to comment

But even if it's not technically an open collector bus, it might still be possible to drive the bus when you aren't "supposed" to without melting any chips. And if this is the case, the questions are: which device would "win" the bus contention, and how much current would be sunk by the two devices fighting over the bus?

This could be risky. If damage did not occur immediately, it surely would over time.

 

I have 'burned' out 6 or 8 CPLDs during Chimera development doing this. The interaction was between the ARM and the CPLD, but the situation would still be the same if the 6507 was one of the devices. I got sloppy with my programming and had the ARM and the CPLD driving the same bus at the same time. If I caught it immediately (within a couple of seconds), there didn't seem to be a problem; any longer and the CPLD would fail.

 

Vern

Link to comment
