
4A50 cart--hardware details


Guest


I haven't gone into much detail elsewhere about the 4A50 cart and some of the techniques it uses, but since people may find it interesting I'll discuss it here.

The heart of the cartridge is a Xilinx CPLD. This device has 36 macrocells connected to 32 I/O pins. While it's a step up from the 22V10 used in Al's bankswitch carts, it's still very cheap as such devices go.

Another key to the cartridge is a 14.31818 MHz oscillator. Although many RAM-plus carts get by with some simple RC circuitry for timing, using a crystal oscillator makes it possible for the CPLD to "know" the cycle phase of the Atari's processor. This is essential for the "magic RAM write trick" discussed below.

Finally, the cartridge contains 64KB of flash (treated as ROM) and 32KB of SRAM. Both are common, ordinary chips.

In addition to supporting 4A50 bankswitching, I wanted the cart to be able to support other schemes by reprogramming the CPLD. The 32 I/O pins thus break down as:

  • AD0-AD6 -- Tied to A0-A6 on the 2600 and A0-A6 of the RAM and ROM
  • AD7-AD10 -- Tied to A7-A10 on the 2600 and, via resistors, to AQ7-AQ10
  • AD11-AD12 -- Tied to A11-A12 on the 2600
  • AQ7-AQ10 -- Tied to A7-A10 on the RAM and ROM and, via the aforementioned resistors, to AD7-AD10
  • AQ11-AQ14 -- Tied to A11-A14 on the RAM and ROM
  • RamRW -- Tied to the R/W pin on the RAM, and A15 on the ROM
  • RAMCS -- Chip-select of the RAM
  • ROMCS -- Chip-select of the ROM (flash)
  • D0-D7 -- Tied to the data bus; used as inputs and for the 'bus-keeper' function
  • XtalIn -- Input from the 14.31818 MHz crystal
  • spares -- Used for debugging; may also be usable for adding an EEPROM, LED, or other feature

The CPLD is thus capable of mapping any address to any 128-byte block of RAM or ROM, but when it's plugged into the 2600 it does not have control of the lower address bits (A0-A6), even though it can see them.

The use of resistors on A7-A10 provides a couple of benefits:

  • It allows the chip to control A7 if needed (as in Superchip games) but does not waste a macrocell if such ability is not needed.
  • In bank-switching schemes, like 4A50, where the output address should either be taken from a particular set of latches or from the input address, the resistors provide an "almost-free" multiplexor. When the AQ outputs are enabled, the address pins will be driven by their corresponding latches; otherwise they'll be driven by the input address.
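To make the address path concrete, here is a minimal Python sketch under simplifying assumptions: a single enable stands in for the four AQ7-AQ10 output enables, and the block number is treated as one integer covering AQ7-AQ14 plus RamRW (acting as ROM A15). The function names are made up for illustration; this is not the CPLD's actual logic.

```python
# Rough model of the 4A50 cart's address path (a sketch, not the CPLD equations).

def resistor_mux(input_a7_a10: int, latch_a7_a10: int, aq_enabled: bool) -> int:
    """Value that appears on the RAM/ROM A7-A10 pins.

    If the CPLD drives AQ7-AQ10, its outputs overpower the 2600's address
    arriving through the series resistors; if AQ is tri-stated, the resistors
    simply pass the 2600's A7-A10 through.
    """
    return (latch_a7_a10 if aq_enabled else input_a7_a10) & 0xF


def memory_address(cart_addr: int, block: int) -> int:
    """Map a cartridge-port address into a 128-byte block of RAM or ROM.

    A0-A6 always come straight from the 2600; the block number supplies A7
    and up (AQ7-AQ14, plus RamRW as A15 on the 64KB flash: up to 512 blocks).
    """
    return ((block & 0x1FF) << 7) | (cart_addr & 0x7F)


# Block $1A3 starts at flash offset $D180; byte $53 within it:
print(hex(memory_address(0x1E53, 0x1A3)))              # 0xd1d3
print(resistor_mux(0b0101, 0b1010, aq_enabled=True))   # 10 (latches win)
print(resistor_mux(0b0101, 0b1010, aq_enabled=False))  # 5 (2600 address passes)
```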

The first prototype only had the A7 resistor on-board (the A8-A10 resistors were soldered on later). The next batch of prototypes will include them all on-board.

Although the 4A50's ability to bank-switch much more memory than earlier designs is simply a consequence of the larger memory chips used, there are a few aspects of its design that are unique. These include "magic RAM writes", "hotspot MSB discrimination", and "memory-mode presets".

Magic RAM writes

To understand magic RAM writes, one must first understand how existing RAM cartridges work. Because the 2600 does not provide a read/write signal out to the cartridge port, cartridges have no inherent way of knowing whether a particular cycle is a read or a write. What most RAM cartridges do is allocate two ranges of addresses for the RAM: one for reads and one for writes. A read access to the read range will read the corresponding memory location; a write access to the write range will write it. Write accesses to the read range will produce bus contention (bad), and read accesses to the write range will cause garbage data to be both read and written (and the data read may not match the data written).

In something like the Superchip with 128 bytes of RAM, doubling the address space from 128 bytes to 256 is no big deal; there are still 3840 bytes of address space left for the cartridge. With larger RAMs, however, things become problematic. In 3E bankswitching, RAM banks are limited to 1K because 1K of RAM uses up 2K of address space.

The magic RAM write trick eliminates the need for a separate RAM write range. It does this by taking advantage of a few observations:

  • Modern RAM chips can be read in less than half a cycle.
  • The Xilinx CPLD features a "bus hold" function that will weakly try to hold the data bus high when it's high and low when it's low.
  • Neither the 6507 nor any of the other chips on the data bus have any trouble overpowering the CPLD's bus-hold circuitry when they "want" to, but the bus-hold circuitry can keep the bus state stable when nothing is deliberately driving it.
  • Writing a RAM address with the data it already held is harmless.
  • The 6507 drives the data bus during phi2 only when it is performing a write cycle.
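
Taken together, these observations let a single RAM cycle serve both reads and writes. The toy Python model below shows only the ordering of that cycle as the following paragraph describes it. The class and method names are hypothetical, the optional `cpu_write` argument stands in for the 6507 driving the bus during phi2, and all of the real timing (which the CPLD derives from the crystal) is left out.

```python
from typing import Optional


class MagicRamCart:
    """Toy model of a 4A50-style 'magic' RAM cycle (hypothetical names).

    Real timing is handled by the CPLD counting oscillator clocks; here each
    phase of the cycle is just a step in cycle().
    """

    def __init__(self, size: int = 32 * 1024) -> None:
        self.ram = bytearray(size)
        self.bus = 0  # level currently held on the data bus by bus-hold

    def cycle(self, addr: int, cpu_write: Optional[int] = None) -> int:
        # phi1: address stable, RAM selected with /WE high, so RAM drives the bus.
        self.bus = self.ram[addr]
        # Shortly before phi2: /WE goes low and the RAM stops driving; bus-hold
        # keeps the byte just read unless the 6507 overpowers it.
        if cpu_write is not None:
            self.bus = cpu_write & 0xFF  # write cycle: the 6507 drives the bus
        seen = self.bus  # what the 6507 reads (or what it just drove)
        # Shortly before phi2 ends: chip-select is released and the RAM stores
        # the bus. On a read cycle that is the byte it already held: harmless.
        self.ram[addr] = self.bus
        return seen


cart = MagicRamCart()
cart.cycle(0x123, cpu_write=0x5A)  # a write through the one RAM window
print(hex(cart.cycle(0x123)))      # 0x5a, read back from the same addresses
```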

Using all of these facts together, the 4A50 cart's RAM cycle performs a read once the address is stable, then, while keeping the chip selected, asserts the /WE line shortly before phi2 begins. The RAM chip-select is then released shortly before phi2 ends. If the 6507 is performing a read cycle, it will read the data that was put on the bus during phi1 and held there by the CPLD; that same data will then get written back to the RAM when the chip-select is released. If the 6507 is performing a write cycle, the data that was read from the RAM will be overwritten by the processor's data, and that will get written into RAM when the chip-select is released. This technique produces nice waveforms on the data bus with the Heavy Sixer and the 2600jr. The 7800 seems to have pullups on the data bus which make things somewhat marginal, but testing instructions with many consecutive bus float states (e.g. "LDA (0,X)") suggests that things should be stable there as well.

Note that the magic RAM write trick requires that the cartridge know when phi2 is going to start and end. This can be inferred by counting 14.31818 MHz clocks following a change of A0. This will work nicely for NTSC machines, but PAL machines will require the use of a different oscillator. Otherwise, code executing in RAM that performs an STA WSYNC that idles the CPU for 75 cycles will likely fail, since A0 doesn't change while the CPU is stalled and the counted phase would drift.

Hotspot discrimination

On most 6502-based systems, reading a random RAM address is generally pretty harmless. On a 2600 running a 4K cart, the only read operation that will have any sort of side-effect is a read of the RIOT timer (which clears the interrupt latch). Unless software happens to use this latch (the interrupts themselves are not used), even that read will be harmless.
Thus, it is safe to do things like use the "BIT abs" opcode to harmlessly skip over a two-byte instruction without worrying about what instruction is being skipped.

On typical bank-switch carts, there are a few more addresses which can cause trouble, but there still aren't a whole lot. Superchip carts add bigger 'problematic' address ranges (reading the "write addresses" of RAM is bad), and the Supercharger has an even bigger one: any accidental access of the form $N0xx (N being odd; xx being anything) can spell disaster. So trying to skip over something like "ORA #$10" will trigger an access to $1009. Oops.

To alleviate this problem somewhat, the 4A50 cartridges will ignore any access to a bank-switch hotspot unless the previous byte fetched was $6X or $7X. Thus, the hotspot at $68A9 will be triggered by "BIT $68A9" but not by "BIT $08A9" or by a BIT instruction that's used to skip over an "LDA #$08" instruction (which would access address $08A9). Although a programmer could still get tripped up by trying to skip over e.g. an "LDA #$68" instruction, operands of the form $6x-$7x seem like they'd be less common than many other values.

Memory Mode Presets

One thing that's often desirable in a bank-switching scheme is to allow the programmer to switch among many banks efficiently. Unfortunately, conventional techniques for allowing this require bank-switching hardware to include registers to hold the different bank selections among which the program will switch. In a part with 36 macrocells, that approach simply isn't going to fly; even in a part with 72 macrocells, it can only go so far.

What the 4A50 bank-switch method does to alleviate this is to use RAM instead. RAM addresses from $E8 to $FF are reserved for magic bank-switch hotspots. Reading or writing one of those will cause the value read or written to be loaded into one of the CPLD's bank-switch registers; consequently, it will "feel" as though the CPLD has twelve more bank-switch registers than it really does.

Note that accesses to $01E8-$01FF will not trigger these hotspots. Any hotspot that is actually used must not overlap the stack, but code which doesn't need all twelve hotspots may place the stack on top of the ones it doesn't use.
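
A small Python sketch of the discrimination rule, assuming the relevant "previous byte" for an absolute-mode instruction is the operand's high byte (fetched on the cycle just before the data access) and that the cartridge port only sees A0-A12. It shows why the hotspot at $68A9 and a BIT used to skip "LDA #$08" present the same port address, $08A9, yet only the former triggers. The function names are hypothetical.

```python
from typing import Tuple


def hotspot_armed(prev_byte: int) -> bool:
    """Rule from the article: a hotspot access counts only if the byte
    fetched just before it was $6x or $7x."""
    return (prev_byte & 0xF0) in (0x60, 0x70)


def bit_abs_access(lo: int, hi: int) -> Tuple[int, int]:
    """For BIT abs ($2C lo hi): the address seen at the cartridge port and the
    byte fetched just before the data access (the operand's high byte)."""
    full = (hi << 8) | lo
    return full & 0x1FFF, hi  # the 6507 brings out only A0-A12


for name, lo, hi in [("BIT $68A9", 0xA9, 0x68),
                     ("BIT skipping LDA #$08", 0xA9, 0x08),
                     ("BIT skipping LDA #$68", 0xA9, 0x68)]:
    port, prev = bit_abs_access(lo, hi)
    print(f"{name}: port ${port:04X}, triggers={hotspot_armed(prev)}")
```

The third case is the remaining gotcha mentioned above: skipping an "LDA #$68" looks exactly like a deliberate "BIT $68A9".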

5 Comments



This sounds really cool, though the $40+ price is a little daunting. :|

 

Also, am I reading that last section correctly? So the standard clean start macro should not be used since it will hit all the bank switch spots?

            MAC CLEAN_START
               sei
               cld
           
               ldx #0
               txa
               tay
.CLEAR_STACK    dex
               txs
               pha
               bne .CLEAR_STACK    ; SP=$FF, X = A = Y = 0

           ENDM

And...24 zeropage bytes that are unusable is a bummer. There isn't a way around this?

Also, am I reading that last section correctly?  So the standard clean start macro should not be used since it will hit all the bank switch spots?

 

Actually, it won't trip any of them because it writes to the RAM using addresses $0180-$01FF rather than $0080-$00FF.
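
To double-check this, the short Python simulation below steps through the CLEAN_START loop and records the addresses its PHA instructions write. It models only the loop, not the hardware, and assumes the $F4-$FF preset range given in the next reply. Every write lands in $0100-$01FF ($0100-$017F mirror the TIA, $0180-$01FF mirror RAM $80-$FF), so no zero-page preset address is ever touched.

```python
from typing import List


def clean_start_writes() -> List[int]:
    """Addresses written by the CLEAN_START loop (dex / txs / pha / bne)."""
    x, writes = 0, []
    while True:
        x = (x - 1) & 0xFF        # dex; Z is set when X reaches 0
        s = x                     # txs
        writes.append(0x100 | s)  # pha stores at $0100+S (S then decrements)
        if x == 0:                # bne falls through
            return writes


def is_preset_hotspot(addr: int) -> bool:
    """Assumed rule: only zero-page $F4-$FF triggers a preset; the same RAM
    bytes reached as $01F4-$01FF do not."""
    return 0xF4 <= addr <= 0xFF


writes = clean_start_writes()
print(hex(writes[0]), hex(writes[-1]), len(writes))  # 0x1ff 0x100 256
print(any(is_preset_hotspot(a) for a in writes))     # False
```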

 

And...24 zeropage bytes that are unusable is a bummer. There isn't a way around this?

 

Mea culpa. It's $F4-$FF, not $E8. So twelve bytes. And unused presets won't really cost memory because you can set the stack pointer just below the lowest used one and use that space for stack. In the event that your stack doesn't use up all your presets (e.g. you only need $FA-$FF for presets and four bytes of stack) you could still address the other locations as $01F4 and $01F5. Not zero-page, but always available nonetheless.
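
The stack arrangement described above can be worked out mechanically. This hypothetical Python helper (not part of any 4A50 tool) takes the lowest preset actually in use and the stack depth, and returns the initial stack pointer plus any leftover preset bytes, expressed as $01xx addresses so that touching them won't trigger a bank switch. It reproduces the $FA-$FF example from this reply.

```python
from typing import List, Tuple


def stack_plan(lowest_used_preset: int, stack_bytes: int) -> Tuple[int, List[int]]:
    """Hypothetical helper: initial S, plus unused preset bytes ($F4 and up)
    that the stack doesn't reach, listed as $01xx so they never trigger."""
    initial_s = lowest_used_preset - 1  # stack starts just below the presets
    lowest_stack_byte = initial_s - stack_bytes + 1
    spare = [a for a in range(0xF4, lowest_used_preset) if a < lowest_stack_byte]
    return initial_s, [0x100 | a for a in spare]


s, spare = stack_plan(0xFA, 4)          # presets $FA-$FF, four bytes of stack
print(hex(s), [hex(a) for a in spare])  # 0xf9 ['0x1f4', '0x1f5']
```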

 

Further, I really don't think zero-page RAM is going to be as scarce and precious a commodity in a 4A50 game as in a normal one (since other RAM will be available). When I did SDI (a 1K minigame for the SuperCharger), I had boatloads of zero-page RAM left. So much that spending 8 bytes of RAM to save two bytes of code was a no-brainer.


12 isn't so bad.

 

But am I reading this right; you can access $F4-$Fx with the stack only?

 

How does that work? Which opcodes can you use? JSR, BRK? What about PHA, PLA? And, just curious, why not LDA/STA/etc.?

12 isn't so bad.

 

But am I reading this right; you can access $F4-$Fx with the stack only?

 

How does that work?  Which opcodes can you use?  JSR, BRK?  What about PHA, PLA?  And, just curious, why not LDA/STA/etc.?

 

On the 2600, address pin 8 is not connected to the RAM or to the TIA ("Stella"), nor is it used in their chip-select logic. Thus, all addresses of the form "xxx0 xx0x 1nnn nnnn" will access location "nnn nnnn" of RAM. So "lda $6985" will behave the same as "lda $0185" or "lda.w $0085", and all three will read the same data as "lda $85" (though the latter will be a cycle shorter).
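
For reference, a simplified Python sketch of the stock 2600 address decoding this relies on: A12 high selects the cartridge, A7 low selects the TIA, A7 high with A9 low selects RAM, and A7 and A9 both high select the RIOT's I/O and timer. It ignores finer RIOT register decoding, and the function name is made up.

```python
def decode_2600(addr: int) -> str:
    """Simplified stock 2600 decode of a 6507 address (A13-A15 don't exist)."""
    a = addr & 0x1FFF
    if a & 0x1000:
        return "cart"
    if not a & 0x0080:
        return "tia"
    if not a & 0x0200:
        return f"ram[{a & 0x7F:#04x}]"  # A8 plays no part here
    return "riot i/o+timer"


for addr in (0x6985, 0x0185, 0x0085, 0x85):
    print(f"${addr:04X} -> {decode_2600(addr)}")  # all four: ram[0x05]
```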

 

Even though the TIA and RIOT can't distinguish among these various addresses, the cartridge can (except for the top 3 bits, and even those it can sometimes infer). To see how this is relevant to hotspots, consider the following example.

 

Assume A contains 5, X contains 7, Y contains 12, and S contains $FF. Note that address $FE is a hotspot for selecting a flash page to appear at $1E00-$1EFF, and $FF is a hotspot for selecting a page of RAM to appear there. The following instructions are run in sequence:

 STA $FE   ; Stores 5 in $FE *and* selects page 5 of flash
 STX $01FE ; Stores a 7 in $FE, but leaves page 5 of flash selected
 STY $FF   ; Stores a 12 in $FF *and* selects page 12 of RAM
 LDA $FE   ; Loads a 7 *and* selects page 7 of flash
 INC $FF   ; Stores a 13 in $FF (was 12) *and* selects page 13 of RAM
 NOP $FE   ; Selects page 7 of flash (no registers or flags affected) 
 LDA #$1A
 PHA       ; Stores a $1A in $FF, but leaves page 7 of flash selected
 NOP $FF   ; Selects page $1A of RAM
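
The sequence above can be replayed with a toy Python model. It assumes only the two presets named in the example ($FE for a flash page, $FF for a RAM page, sharing the $1E00-$1EFF window, with the most recent selection winning), ignores the other presets and all timing, and models INC as the 6502's read-modify-write (read, dummy write of the old value, write of the new value). The class and function names are hypothetical.

```python
class PresetDemo:
    """Toy model of the $FE/$FF presets from the example (assumptions noted).

    Zero-page $FE loads a flash page and $FF a RAM page into the shared
    $1E00-$1EFF window; the same bytes reached as $01FE/$01FF (including via
    the stack) do not trigger. Other presets and real timing are not modeled.
    """

    def __init__(self) -> None:
        self.ram = bytearray(128)  # RAM $80-$FF, mirrored at $0180-$01FF
        self.window = None         # (kind, page) currently at $1E00-$1EFF

    def _hotspot(self, addr: int, value: int) -> None:
        if addr == 0xFE:
            self.window = ("flash", value)
        elif addr == 0xFF:
            self.window = ("ram", value)

    def read(self, addr: int) -> int:
        value = self.ram[addr & 0x7F]
        self._hotspot(addr, value)
        return value

    def write(self, addr: int, value: int) -> None:
        self.ram[addr & 0x7F] = value & 0xFF
        self._hotspot(addr, value & 0xFF)

    def inc(self, addr: int) -> None:
        old = self.read(addr)      # 6502 read-modify-write:
        self.write(addr, old)      # dummy write of the old value,
        self.write(addr, old + 1)  # then the incremented value


def run_example() -> list:
    """Replay the listing; return the window selected after each instruction."""
    m = PresetDemo()
    steps = [
        lambda: m.write(0xFE, 5),      # STA $FE
        lambda: m.write(0x1FE, 7),     # STX $01FE
        lambda: m.write(0xFF, 12),     # STY $FF
        lambda: m.read(0xFE),          # LDA $FE
        lambda: m.inc(0xFF),           # INC $FF
        lambda: m.read(0xFE),          # NOP $FE
        lambda: m.write(0x1FF, 0x1A),  # LDA #$1A / PHA with S=$FF
        lambda: m.read(0xFF),          # NOP $FF
    ]
    log = []
    for step in steps:
        step()
        log.append(m.window)
    return log


for window in run_example():
    print(window)
```

Each printed line is the window selection after one instruction; the last one is RAM page $1A (printed as 26).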

 

If all one wants to do is select a particular known bank of RAM or flash into the $1E00 block and the page number isn't in a register, it may be done in 4 cycles via the $6C00-$6DFF hotspots. "NOP $6C53" will select page $53 of ROM; "NOP $6DEA" will select page $EA of RAM; neither instruction affects flags or registers. But if it's necessary to switch frequently among a few pages, the RAM hotspots allow that to be done in 3 cycles. Further, using instructions like "inc" and "dec" on the RAM hotspots makes indexing highly convenient.
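
A hypothetical Python decode of the $6C00-$6DFF hotspots, assuming the page number is the address's low byte, A8 chooses ROM or RAM (as the two examples above imply), and the only top-bit check is the article's $6x/$7x previous-byte rule; whether the real CPLD checks more than that isn't stated. For NOP abs, the previous byte is the operand high byte, i.e. $6C or $6D.

```python
from typing import Optional, Tuple


def direct_select(addr: int, prev_byte: int) -> Optional[Tuple[str, int]]:
    """Hypothetical decode of the $6C00-$6DFF hotspots.

    The cart sees only A0-A12 ($0C00-$0DFF here) and, per the article's
    discrimination rule, accepts the access only if the previous byte fetched
    was $6x/$7x. A8 picks ROM ($6Cxx) or RAM ($6Dxx); the low byte is the page.
    """
    if (addr & 0x1E00) != 0x0C00:
        return None
    if (prev_byte & 0xF0) not in (0x60, 0x70):
        return None
    return ("ram" if addr & 0x100 else "rom"), addr & 0xFF


print(direct_select(0x6C53, 0x6C))  # ('rom', 83), i.e. page $53 of ROM
print(direct_select(0x6DEA, 0x6D))  # ('ram', 234), i.e. page $EA of RAM
print(direct_select(0x0C53, 0x0C))  # None: operand high byte isn't $6x/$7x
```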
