F18A programming, info, and resources

matthew180 · January 14, 2013

I'm starting this thread as a means to hopefully promote some F18A development, answer specific questions about programming the F18A, and finally as place to look for links to updated documentation and eventually firmware updates.

This first post will always have the latest documents and updates attached, so there is no need to go digging through the thread to find the most recent information. I also hope it will contain questions, answers, and code examples. I would like to keep this thread technical and on-topic, so if you have other general F18A questions or comments, please start a new thread or use the other existing F18A thread.

* Documentation: On-going. This is something I hope to complete, but until then Rasmus has collected many of the F18A programming posts from the forum and created a PDF of them (thank you Rasmus!). See the files attached to this thread, and please ask F18A technical questions in this thread.

The main F18A webpage (http://codehackcreate.com/archives/30) has the main feature list, as well as an initial post on getting started with programming the F18A. As I add documentation, I will post it on the website first, then make an update here to let anyone interested know there is something new.

* Register Use Spreadsheet: Libre Office / Open Office .ods format. This is the primary spreadsheet I used while developing the F18A, and all functionality was documented in the spreadsheet first, then converted into HDL. That means the spreadsheet is always up to date with respect to the F18A's functionality.

While some of the F18A's features require more documentation to use, much of the functionality is very self explanatory and can be used just by looking at the spreadsheet and reading the notes. For example, it does not take much guessing to figure out what the "horizontal scroll register" does.

*************
COMPATIBILITY
*************

Pin-compatible replacement for the TMS9918A, 9928, 9929, and TMS9118 Video Data Processors.

The F18A has been tested in the following systems:

TI-99/4A Home Computer
ColecoVision Game Console*
ColecoVision ADAM Computer#
Toshiba HX-10 MSX1 Computer
Toshiba Pasopia-IQ MSX1 Computer
JVC Victor HC-7 MSX1 Computer
Yamaha CX5M MSX1 Computer@
SpectraVideo 328 Computer*@
Tomy Tutor Computer*@
SEGA SG-1000 Game Console
SEGA SC-1000II (replaced a TMS9118 VDP)
Telegames Personal Arcade
Powertran Cortex Computer

* Note1: These systems are known to have the original VDP soldered directly to the system circuit board and will require desoldering and a socket installed.

# Note2: The ADAM computer requires an "offset board" to keep the F18A inside the main PCB outline. This is an available option when ordering an F18A.

@ Note3: These systems are known to require the USR4 jumper to be removed because the main system uses the CPUCLK output from the VDP as the main system clock.

************************
F18A FIRMWARE Change Log
************************

V1.9 - Dec 31, 2018 (CRC: 147A)

* Prepare for open source release.
* Split up the original "core" to create a top-module for the stand-alone
F18A, and a "main core" that can be used as part of a larger SoC.
* Fixed the VGA horizontal timing error caused by treating the pixel time as
40ns instead of 39.68ns. Because events were being counted in "pixels",
this caused the horizontal sync pulse to be slightly off, and the overall
line time to be 32us instead of 31.746us. This error meant each line was
around 6.4 pixels too long, and pushed the total frame rate to 59.2Hz.
This error was enough to cause games to fail (Pole Position on the 99/4A),
and some monitors to not sync properly when run through video converters.
The timing error also caused many problems for the PAL ColecoVision.
* Removed sprite-linking. This was an unused feature and helped free up
FPGA resources to allow the core to better fit in the Spartan-3E 250K.
* Removed programmable GROMCLK divisor. Unused feature; removing it frees up resources.
* Register mode and cd_i inputs to CPU component.
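As a quick check of the VGA timing numbers above, here is a small Python model (arithmetic only, not F18A code). It assumes 800 pixel clocks per line, the standard 640x480 VGA total, which matches 32us / 40ns = 800.

```python
# Model of the V1.9 VGA line-timing fix, using the numbers from the change log.
# Assumption: 800 pixel clocks per line (standard 640x480 VGA total).
PIXELS_PER_LINE = 800
CORRECT_LINE_US = 31.746
BUGGY_PIXEL_NS = 40.0

correct_pixel_ns = CORRECT_LINE_US * 1000 / PIXELS_PER_LINE   # ~39.68 ns
buggy_line_us = PIXELS_PER_LINE * BUGGY_PIXEL_NS / 1000       # 32.0 us

# How many real pixel times each buggy line overran by.
overrun_pixels = (buggy_line_us - CORRECT_LINE_US) * 1000 / correct_pixel_ns

print(f"pixel time: {correct_pixel_ns:.2f} ns")    # pixel time: 39.68 ns
print(f"buggy line: {buggy_line_us:.3f} us")       # buggy line: 32.000 us
print(f"overrun:    {overrun_pixels:.1f} pixels")  # overrun:    6.4 pixels
```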

V1.8 - Aug 24, 2016 (CRC: F981)

* Fixed sprite collision bug where sprite collisions were being incorrectly detected outside of the active display, after line 191 or 239 depending on the line mode.

* Added hybrid VR write restriction to mask VR writes to three bits when the F18A is locked, like the real 9918A does. However, if mode bit M4 is set (80 columns), writes to VRs over VR7 are *ignored* instead of masked to three bits. This allows various 9938 programs to work (or continue to work), and keeps supporting TurboForth, which writes to VRs 0..15 to set up 80 columns (if straight masking were used, writes to VRs 8..15 would overwrite VRs 0..7).
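The hybrid rule can be summarized as a small decision function. This is a host-side Python sketch of the behaviour as described above, not the actual HDL; the function name is hypothetical, and the unlock sequence itself (two writes of >1C to VR57) is handled separately and not modelled.

```python
from typing import Optional


def effective_register(reg: int, locked: bool, m4_80col: bool) -> Optional[int]:
    """Register a write to VR `reg` actually lands in, or None if it is ignored.

    Unlocked: all 64 registers are addressable.
    Locked:   the register number is masked to three bits like a 9918A,
              except that with M4 (80 columns) set, writes above VR7 are dropped.
    """
    if not 0 <= reg <= 63:
        raise ValueError("F18A register numbers are 6 bits (0..63)")
    if not locked or reg <= 7:
        return reg
    if m4_80col:
        return None          # e.g. 9938-style setup writes to VR8..VR15
    return reg & 0x07        # 9918A behaviour: VR57 -> VR1, VR12 -> VR4


print(effective_register(57, locked=True, m4_80col=False))   # 1
print(effective_register(12, locked=True, m4_80col=True))    # None
```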

V1.7 - Jan 1, 2016 (CRC: A3B5)

* Fixed Bitmap-Layer (BML) display bug
* Fixed GPU's PIX instruction to properly calculate BML addresses
* Added power-on graphic that shows the current firmware version

V1.6 - Apr 26, 2015 (CRC: 40CC)

* Removed fixed tile functionality
* Removed border scroll limit functionality
* Removed banner functionality
* Removed host-side 32-bit counter
* Removed host-side 32-bit RNG
* Removed GPU 32-bit counter
* Removed GPU 32-bit RNG
* Removed the sprite "disable value" (>F8) in the sprite Y-location when ROW30 is enabled.

* Added second tile layer with its own NTBA, h/v page sizes, and h/v scroll regs

* Added ECM2/3 pattern table size selections for tiles and sprites.

* Added host-side segmented counter with 10ns accuracy.

* Added configurable HSYNC and VSYNC GPU triggers.

* Added fat-pixel (2x1) with 16-color support to the bitmap layer (BML).

* Added 1x1 page scroll support for T40 and T80 modes.

* Added option to reset most VDP registers to their power-on values.

* Added option to disable Tile Layer 1, which includes GM1, GM2, MCM, T40, and T80.
Sprites, the BML, and TL2 are still active and can be enabled/disabled independently.

* Added option to allow attribute byte to be fg/bg color select in T40 and T80.

* Added per-position tile attribute support.

* Added DMA capability to the GPU:
8xx0 - MSB src
8xx1 - LSB src
8xx2 - MSB dst
8xx3 - LSB dst
8xx4 - width
8xx5 - height
8xx6 - stride
8xx7 - 0..5 | !INC/DEC | !COPY/FILL
8xx8 - trigger

FILL (active high) will read a single byte at the src address and fill the
destination with that byte.

src, dst, width, height, and stride are copied to dedicated counters when
the DMA is triggered, thus the original values remain unchanged.

* Added USR3 jumper to control GROMCLK/CPUCLK output on pin37 to provide support for 9128/29

* Added USR2 jumper to disable/enable simulated scan lines (every other VGA scan line has its
color reduced by 50%.) Also controllable via a new VDP register bit.

* Added a 5th sprite reporting option instead of reporting the max-sprite, which on the F18A
might be different than the original VDP because all 32 sprites can be on a single scan line.

* Added a new register (VR51) to limit the maximum sprite processed. This has nothing
to do with the number of sprites that can be visible on a scan line, which is controlled
by a separate register (VR30). This register is always active and can be used instead of
the >D0 byte in the sprite Y-location, and is the only way to limit sprite processing early
when ROW30 is enabled.

* Changed the GPU interlock so that polling the VDP status register will not cause the GPU
to pause. This should greatly increase GPU performance during heavy VDP interrupt polling.

* Fixed T80 NTBA two LSbit problem. They are ignored (set to "00") when the F18A
is locked, to provide compatibility with the 9938 and to avoid problems with software
that sets the two LSbits of the NTBA to something other than "11", as the 9938
documentation specifies they should be. This limits the T80 name table to 4K boundaries. When
the F18A is unlocked, all 4-bits of the NTBA are used and the T80 name table can
be located on 1K boundaries.

* Fixed the 5th number update during a scan line. As long as the 5S flag is zero, the 5th
number register follows the sprite scanning sequence. Seems to be a transparent latch that
follows the input (current sprite being scanned) until latched by the 5S flag. If the status
register is being polled and 5S is reset mid frame, then the 5th number begins following the
scanned sprites again. This bug is known to have affected Miner49er on the 99/4A.
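Two of the V1.6 items above lend themselves to a host-side model. The Python sketch below computes the T80 name table base from VR2 as described in the NTBA fix, and walks through a DMA copy and fill. The DMA details are assumptions, not something the change log states: the !INC/DEC and !COPY/FILL bits are taken as values >02 and >01 (TI numbering, bit 0 = MSb), stride is treated as the row-to-row distance for both src and dst, and DEC is assumed to negate both the byte step and the row step.

```python
# Host-side models of two V1.6 items; not F18A firmware.

def t80_name_table_base(vr2: int, locked: bool) -> int:
    """T80 name table base from the 4-bit NTBA field in VR2 (1K units).

    Locked: the two LSbits are treated as "00" (9938-compatible), so the
    table lands on 4K boundaries. Unlocked: all four bits are used (1K).
    """
    ntba = vr2 & 0x0F
    if locked:
        ntba &= 0x0C
    return ntba << 10


def dma(mem: bytearray, src: int, dst: int, width: int, height: int,
        stride: int, ctrl: int) -> None:
    """Sketch of the GPU DMA block operation (registers >8xx0..>8xx8).

    Local variables play the role of the dedicated counters, so the caller's
    src/dst/width/height/stride values are left unchanged. Addresses wrap
    around the size of `mem` (a simplification).
    """
    dec = bool(ctrl & 0x02)              # assumed: bit 6, !INC/DEC
    fill = bool(ctrl & 0x01)             # assumed: bit 7, !COPY/FILL
    step = -1 if dec else 1
    size = len(mem)
    fill_byte = mem[src % size]          # FILL reads a single byte at src
    for row in range(height):
        row_src = src + step * row * stride
        row_dst = dst + step * row * stride
        for col in range(width):
            value = fill_byte if fill else mem[(row_src + step * col) % size]
            mem[(row_dst + step * col) % size] = value


# Copy a 4x2 block whose rows are 8 bytes apart.
vram = bytearray(0x4000)
vram[0:16] = bytes(range(16))
dma(vram, src=0x0000, dst=0x0100, width=4, height=2, stride=8, ctrl=0x00)
print(vram[0x100:0x104].hex(), vram[0x108:0x10C].hex())  # 00010203 08090a0b

# Fill a 3x2 block with the byte found at src.
pattern = bytearray(0x4000)
pattern[0x20] = 0xAA
dma(pattern, src=0x20, dst=0x200, width=3, height=2, stride=16, ctrl=0x01)
```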

V1.5 - July 2013
Not really a *bug* fix since the problem it corrects exists on the real 9918A, and only has to do with sporadic collision bit reporting during heavy polling of the original 9918A VDP status register. This was discovered while Rasmus was writing Titanium. The 9918A was not designed to have its status register polled which is why it provides an interrupt output.

I don't think the original 9918A designers took the hazard into consideration, but I decided to make this correction because it is what the original designers would have done given their preference (and I asked Karl Guttag about it). Thus, the F18A implements what you would consider the "expected behavior", and will work as expected where the original 9918A might not. I did not make this decision lightly.

V1.4 - April 2013
Fixed the sprite collision bug and a GPU bug with the divide circuit. The sprite bug is mostly affected by XB when a program uses CALL COINC(ALL). Most assembly games probably don't rely on the collision bit alone for sprites and perform coordinate testing, which is most likely why the bug slipped through all the testing (and I tested with a *lot* of games on a lot of platforms).

V1.3 - July 2012
Original release firmware.

********
UPDATING
********

The In-System firmware update is available for 99/4A users. I am very thankful to Rasmus and Tursi for their help in making this possible. You can download the F18AUpdate_vXX.zip file below. Detailed instructions are available on my website here: http://codehackcreate.com/archives/418

Alternatively you can update your F18A in any system via a JTAG programming cable. You can purchase a JTAG programming cable for about $59 USD from Digilent:

JTAG HS3 programming cable

This is very inexpensive for a JTAG cable (my Xilinx-brand cable was over $250!), and Digilent makes quality gear.

You also need the Xilinx ISE-Webpack tools:

http://www.xilinx.com/support/download/index.htm

This is a free download from Xilinx, but it is BIG! About 6GB the last time I checked. There is a smaller download that contains just the programming tools called "Lab Tools" and is only about 1GB. I'm still looking for a smaller / simpler solution. You will have to create an account (which is free). The primary program you need is called IMPACT and is used to program the FPGA and SPI-flash.

Once you get the tools installed, download and unzip the f18a_250k_vXX.zip file. In the zip file you will find the MCS file:

f18a_250k_vXX.mcs

The .mcs file is used to update the SPI-flash ROM attached to the FPGA. Here are the quick instructions.

The term "system" means your 99/4A, ColecoVision, MSX, etc., and "PC" means the modern personal computer you are running the Xilinx tools on.

0. Make sure your system is powered OFF to begin
1. Open your system to get physical access to the F18A
2. Plug the JTAG programmer into your PC (via USB) and the F18A (via JTAG)
3. Power ON your system
4. Launch the Xilinx IMPACT tool
5. Double-click on "Boundary Scan", then right-click in the main area and select "initialize chain"
6. The FPGA should be detected and show up in the big area. A window will open with device properties, just click "ok"
7. Above the FPGA icon should be a dotted line with "SPI/BPI ?" in it. Right-click on that box and select "Add SPI/BPI Flash..."
8. Navigate to the f18a_250k_vXX.mcs file you extracted from the .zip file and choose "Open"
9. Select "SPI PROM" and "M25P80" from the two drop-down selections and click "OK"
10. The box above the FPGA should now say "FLASH" in it. Right-click the box and select "Program"

Once the programming is finished, cycle power on your system and make sure it comes up.

********
Examples
********

Included in the zip file is a demos disk that shows many of the enhanced features of the F18A. The source for all the programs is included. I did not write these programs and I am very thankful to Rasmus and Tursi for contributing them.

rasmus_scroll.zip

F18A documentation.pdf

f18a_register_use.zip

F18A_V19.zip

Edited January 2, 2019 by matthew180

+5-11under · January 14, 2013

I'm interested in creating games that could be enhanced by the F18A, but would work normally without the F18A. For instance, I'd like to know how to do the following, if possible:

- Check to see if an F18A is present

- change one or more of the colours

- Use 2 or more colors per sprite

Any help would be appreciated!

Tursi · January 14, 2013

Matthew has a supported register-testing method for detecting the F18A, but I coded a simpler method into my slideshow program, which was simply to upload a tiny GPU program to VDP memory that changed a byte and then stopped, then set the registers to execute the program and check the VDP memory. If the byte changed, there was a GPU present.

Enhancing titles for F18A seems the best way forward!

matthew180 · January 15, 2013

And Tursi didn't share his idea until now. ;-) That is a good idea, using the GPU to set a byte is probably a lot less code than the test I wrote. I'll use Tursi's method below in the example, but before you can detect the F18A you have to unlock it.

The F18A defaults to a "locked" mode of operation to prevent legacy software from accidentally enabling any of the enhanced features. I added the lock because during testing some ColecoVision games were causing strange behavior which I discovered was due to the software writing to VDP registers over register 7. Since the 9918A only has 8 registers (0 to 7), it did not matter, and the higher values were simply masked to a number between 0 and 7.

But, the F18A supports VDP register values from 0 to 63, which is how you take advantage of the new features. This is also how the 9938/58 add additional features, and the 9918A datasheet indicates that registers over 7 are reserved. However, that didn't stop some software from breaking the rules, and on the 9918A/9928/9929 the bad behavior did not have any impact. But, the F18A has to protect itself from that old software, thus it powers up locked.

Since the unlocking sequence has to be performed "in band", i.e. using the standard 9918A registers, I had to come up with a way that would never happen on the real 9918A. VR1 is probably the most critical VDP register since it contains most of the mode bits plus the memory size bit, thus it is VR1 in the form of VR57 (VR57 is the same as VR1 on a non-F18A system) that is used to unlock the F18A.

Unlocking is done by writing >1C to VR57 twice, which on a real 9918A VDP is the same as writing to VR1. The value >1C was chosen because it sets the bits in VR1 to something you would never do on a real 9918A, even accidentally because it makes the VDP almost useless. And to write such a value twice, consecutively, is hopefully beyond all probability of happening accidentally.

Value >1C in VR1 looks like this:

|4/16K|BLANK| IE0 | M1  | M2  |  X  |SIZE | MAG |
|  0  |  0  |  0  |  1  |  1  |  1  |  0  |  0  |

By writing >1C on the real 9918A you are selecting 4K VRAM, blanking the screen, disabling interrupts, setting both M1 and M2 to '1' (an illegal mode), and writing a '1' to the unused bit-5 that the datasheet indicates should always be '0'. This would pretty much make the real 9918A useless, and no working software would ever operate with this combination of bits in VR1.

Writing to VR57 (binary: 111001) is the same as writing to VR1 on the 9918A, which only sees the low 3 bits "001", and must be done twice in a row with no other CPU-to-VDP access. On the F18A you will be writing to VR57, not VR1, and after two consecutive writes the ERM (Enhanced Register Mode) will be unlocked. Any further writes to VR57 after being unlocked will re-lock the F18A.
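To see why the unlock write is harmless to decode on a stock VDP, here is a host-side Python model (not 99/4A code) of the two bytes the VWTR routine below sends for R0 = >391C, plus a decode of the VR1 bits in the table above. The function names are hypothetical.

```python
# Host-side model of the unlock write; not 99/4A code.

def vwtr_bytes(r0: int) -> tuple:
    """Bytes VWTR sends to VDPWA for R0 = >RRVV.

    The value (R0 LSB) goes first, then the register number with the
    two MSbits set to "10" (ORI R0,>8000).
    """
    return r0 & 0xFF, 0x80 | ((r0 >> 8) & 0x3F)


VR1_BITS = ("4/16K", "BLANK", "IE0", "M1", "M2", "X", "SIZE", "MAG")  # bit 0 = MSb


def decode_vr1(value: int) -> dict:
    """Split a VR1 value into the named bits of the table above."""
    return {name: (value >> (7 - i)) & 1 for i, name in enumerate(VR1_BITS)}


value, control = vwtr_bytes(0x391C)
print(hex(value), hex(control))   # 0x1c 0xb9
print(control & 0x07)             # 1: a 9918A decodes only the low 3 bits, so VR57 acts as VR1
print(decode_vr1(0x1C))
```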

Because writing >1C to VR1 on the real 9918A would mess up the video mode and other critical VDP configuration, a write to VR1 should immediately follow the unlock sequence if you care to detect the F18A and write software that works on both the 9918A and F18A.

Thus you would have something like:

VDPERM
      LIMI 0                   * Interrupts must be off
      LI   R0,>391C            * VR1/57, value 00011100
      BL   @VWTR               * Write once
      BL   @VWTR               * Write twice, unlock
      LI   R0,>01E0            * VR1, value 11100000, a sane setting
      BL   @VWTR               * Write

Note that I'm using my version of VWTR here, not the E/A (or XB) versions.

Now you can test for the F18A, which is coming up in my next post.

matthew180 · January 15, 2013

To test for the F18A, I'm going to use Tursi's idea of using the GPU, which should make for a smaller test. Assuming the F18A unlock sequence has been performed, a small GPU program will be loaded to the VRAM and executed that will change 1 byte in VRAM. If the byte changed, the F18A is present, otherwise the system is running a stock VDP.

The GPU is a slightly modified 9900 CPU so you can use any standard 9900 assembler to write code for the F18A's GPU. Since the GPU is inside the VDP it can only access the VRAM, plus an additional 2K of memory above the normal 16K of VRAM. The GPU's memory map looks like this:

VRAM 14-bit, 16K @ >0000 to >3FFF (0011 1111 1111 1111)
GRAM 11-bit, 2K  @ >4000 to >47FF (0100 x111 1111 1111)
PRAM  7-bit, 128 @ >5000 to >5x7F (0101 xxxx x111 1111)
VREG  6-bit, 64  @ >6000 to >6x3F (0110 xxxx xx11 1111)
current scanline @ >7000 to >7xx0 (0111 xxxx xxxx xxx0)
blanking         @ >7001 to >7xx1 (0111 xxxx xxxx xxx1)
32-bit counter   @ >8000 to >8xx6 (1000 xxxx xxxx x110)
32-bit rng       @ >9000 to >9xx6 (1001 xxxx xxxx x110)
F18A version     @ >A000 to >Axxx (1010 xxxx xxxx xxxx)
GPU status data  @ >B000 to >Bxxx (1011 xxxx xxxx xxxx)

"GRAM" means GPU-RAM and has nothing to do with "GROM or GRAM" of the TI console. It is just a coincidence. PRAM is the palette RAM in the F18A, and VREG is the VDP registers to which the GPU has full read/write access.
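The "x" bits in the memory map are don't-care bits, so those regions mirror. A host-side Python sketch of how a GPU address decodes under that map (the function name is hypothetical; the >8000 and >9000 counter/RNG entries are left out because the change log says they were removed in V1.6, and anything not decoded is reported as "other"):

```python
def gpu_region(addr: int) -> tuple:
    """Map a GPU address to (region, offset) following the memory map above."""
    addr &= 0xFFFF
    top = addr >> 12
    if top <= 0x3:
        return "VRAM", addr & 0x3FFF   # 14-bit, 16K
    if top == 0x4:
        return "GRAM", addr & 0x07FF   # 11-bit, 2K; bit 11 ignored
    if top == 0x5:
        return "PRAM", addr & 0x007F   # 7-bit: 64 palette registers x 2 bytes
    if top == 0x6:
        return "VREG", addr & 0x003F   # 6-bit: 64 VDP registers
    if top == 0x7:
        return ("blanking" if addr & 1 else "scanline"), 0
    if top == 0xA:
        return "version", 0
    if top == 0xB:
        return "status", 0
    return "other", addr


print(gpu_region(0x4800))   # ('GRAM', 0): a mirror of >4000
```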

The program will be loaded up high in VRAM. I like >3F00 for no particular reason, other than it is 256 bytes from the top of VRAM and probably unused unless there is disk access going on (which there won't be during the test).

This is the code that will be loaded into VRAM for the GPU to execute:

0000 3F00        DEF  MAIN
                AORG >3F00
         MAIN
3F00 04E0        CLR  @>3F00
3F02 3F00  
3F04 0340        IDLE
3F06 0000        END

That is a total of 6 bytes of assembly, which is pretty small for the test. The GPU will clear the word at >3F00, which in this case is the CLR instruction's opcode itself. You have to love self-modifying code. :-) After the code runs, the value at VRAM >3F00 will be >00 if the F18A is present, otherwise it will be >04 on a stock VDP.
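The three words in the listing can be reproduced from the 9900 Format VI encoding, and the pass/fail outcome modelled on the host. This Python sketch is only a model of the test, not code for the GPU; the helper names are hypothetical.

```python
# Host-side model of the detection program; not code for the GPU itself.

def clr_symbolic(address: int) -> list:
    """Encode CLR @address: Format VI opcode >04C0 | Ts<<4 | S, symbolic = Ts 10, S 0."""
    return [0x04C0 | (0b10 << 4), address & 0xFFFF]


IDLE = 0x0340
program = clr_symbolic(0x3F00) + [IDLE]
image = b"".join(word.to_bytes(2, "big") for word in program)   # 9900 words are big-endian


def byte_at_3f00(has_gpu: bool) -> int:
    """Value read back from >3F00 after loading the program and setting the GPU PC."""
    vram = bytearray(0x4000)
    vram[0x3F00:0x3F00 + len(image)] = image
    if has_gpu:
        vram[0x3F00:0x3F02] = b"\x00\x00"   # the GPU's CLR overwrites its own opcode
    return vram[0x3F00]


print(image.hex())                              # 04e03f000340
print(byte_at_3f00(False), byte_at_3f00(True))  # 4 0
```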

This is the code to load the program to VRAM. I'm including all the support routines here too so it is a complete program:

      DEF MAIN

* VDP Memory Map
*
VDPRD  EQU  >8800             * VDP read data
VDPSTA EQU  >8802             * VDP status
VDPWD  EQU  >8C00             * VDP write data
VDPWA  EQU  >8C02             * VDP set read/write address

* Workspace
*
WRKSP  EQU  >8300             * Workspace
R0LB   EQU  WRKSP+1           * R0 low byte reqd for VDP routines

GPU
      DATA >04E0             * 3F00 04E0        CLR  @>3F00
      DATA >3F00             * 3F02 3F00
      DATA >0340             * 3F04 0340        IDLE
GPUEND

MAIN
      LIMI 0
      LWPI WRKSP

*      F18A Unlock
      LI   R0,>391C          * VR1/57, value 00011100
      BL   @VWTR             * Write once
      BL   @VWTR             * Write twice, unlock
      LI   R0,>01E0          * VR1, value 11100000, a real sane setting
      BL   @VWTR             * Write reg

*      Copy GPU code to VRAM
      LI   R0,>3F00
      LI   R1,GPU
      LI   R2,GPUEND-GPU
      BL   @VMBW

*      Set the GPU PC which also triggers it
      LI   R0,>363F
      BL   @VWTR
      LI   R0,>3700
      BL   @VWTR

*      Compare the result in >3F00
      LI   R0,>3F00
      BL   @VRAD
      MOVB @VDPRD,R0
      JEQ  PASS

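*      Stock VDP (9918A/9928/9929): restore VR6/VR7 and set up for the stock VDP
FAIL
      JMP  FAIL              * Placeholder: replace with your own non-F18A code

*      F18A detected
PASS
      JMP  PASS              * Placeholder: replace with your own F18A code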


*********************************************************************
*
* VDP Set Write Address
*
* R0   Address to set VDP address counter to
*
VWAD   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
      ORI  R0,>4000          * Set the two MSbits to 01 for write
      MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
      ANDI R0,>3FFF          * Restore R0 top two MSbits
      B    *R11
*// VWAD


*********************************************************************
*
* VDP Set Read Address
*
* R0   Address to set VDP address counter to
*
VRAD   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM read address
      ANDI R0,>3FFF          * Make sure the two MSbits are 00 for read
      MOVB R0,@VDPWA         * Send high byte of VDP RAM read address
      B    *R11
*// VRAD


*********************************************************************
*
* VDP Multiple Byte Write
*
* R0   Starting write address in VDP RAM
* R1   Starting read address in CPU RAM
* R2   Number of bytes to send to the VDP RAM
*
* R1 is modified by the value of R2
* R2 is changed to 0
*
VMBW   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
      ORI  R0,>4000          * Set the two MSbits to 01 for write
      MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
VMBWLP MOVB *R1+,@VDPWD       * Write byte to VDP RAM
      DEC  R2                * Byte counter
      JNE  VMBWLP            * Check if done
      ANDI R0,>3FFF          * Restore R0 top two MSbits
      B    *R11
*// VMBW


*********************************************************************
*
* VDP Write To Register
*
* R0 MSB    VDP register to write to
* R0 LSB    Value to write
*
VWTR   MOVB @R0LB,@VDPWA      * Send low byte (value) to write to VDP register
      ORI  R0,>8000          * Set up a VDP register write operation (10)
      MOVB R0,@VDPWA         * Send high byte (address) of VDP register
      ANDI R0,>3FFF          * Restore R0 top two MSbits
      B    *R11
*// VWTR

      END

This code triggers the GPU:

*      Set the GPU PC which also triggers it
      LI   R0,>363F
      BL   @VWTR
      LI   R0,>3700
      BL   @VWTR

The PC (program counter) in the GPU is 16-bit, just like the normal 9900, so it takes two bytes to set up the address. VR54 (>36) is the MSB and VR55 (>37) is the LSB. After writing the LSB to VR55, the GPU automatically triggers and begins execution at the address just set up. In this case it executes the CLR instruction, then goes idle via the IDLE instruction, which is perfectly fine on the GPU (don't use IDLE on the 9900 in your 99/4A though!)
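For a GPU program loaded somewhere other than >3F00, the two R0 values for VWTR follow directly from the split described above. A small host-side Python helper (hypothetical name) that computes them, with the VR55 write last since it is the trigger:

```python
def gpu_start_words(address: int) -> list:
    """R0 values for the two VWTR calls that start the GPU at `address`.

    VR54 (>36) takes the MSB and VR55 (>37) the LSB; the VR55 write
    triggers the GPU, so it must be the second one.
    """
    if address & 1:
        raise ValueError("9900 instructions start on even addresses")
    return [(54 << 8) | ((address >> 8) & 0xFF), (55 << 8) | (address & 0xFF)]


print([hex(w) for w in gpu_start_words(0x3F00)])  # ['0x363f', '0x3700']
print(54 & 7, 55 & 7)   # 6 7: on a stock VDP these writes land in VR6 and VR7
```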

Now the value at >3F00 is tested. The VRAD routine sets up a VDP read address without doing a read.

*      Compare the result in >3F00
      LI   R0,>3F00
      BL   @VRAD
      MOVB @VDPRD,R0
      JEQ  PASS

The MOVB moves the byte at >3F00 in VRAM into the MSB of R0. R0 will now be >0000 if the GPU was present, or >0400 on a stock VDP. MOVB also sets the status flags by comparing the byte it moved to zero, so the JEQ will jump if that byte is >00, i.e. the F18A is present. Or you can use JNE if you need the opposite jump.

Note that writing to VR54 and VR55 is the same as VR6 and VR7 on a stock VDP, so if the test for the F18A fails, you should restore those values to something sensible, or simply set up your VDP accordingly now that you know if you have an F18A or stock VDP (9918A/9928/9929).

Edited January 15, 2013 by matthew180

matthew180 · January 15, 2013

- change one or more of the colours

By "change colour" I'm going to assume that you are talking about changing a palette register.

The F18A has 64 palette registers (PR) that are 12 bits each, which gives it a color palette of 4096 colors. Which PR is used to specify the color of a given pixel depends on a lot of settings. The PRs are grouped into "banks" depending on how many bits are used to resolve a pixel's color. In the 9918A compatible modes (1-bit per pixel (bpp)), there are 4 banks of 16 colors each. VR49 has two bits that control which of the 4 banks will be used for tiles in 1-bpp modes, which means you could set up each of the 4 banks with 16 different colors and change to the new palette with a single register write.

In the Enhanced Color Modes (ECM), there are more bits used to specify a single pixel's color, and thus the number of palette banks grows, but the number of colors in each bank shrinks.

With 2-bpp, there are 16-banks with 4-colors each, and the 2-bits for each pixel select the color from the bank. With 3-bpp, there are 8-banks with 8-colors each.
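The bank arithmetic above can be sketched in Python. This assumes the bank number supplies the high bits of the palette register number and the pixel's color bits the low bits (PR = bank * colors_per_bank + index); that is a natural reading of the text, not a confirmed description of the F18A's internals.

```python
COLORS_PER_BANK = {1: 16, 2: 4, 3: 8}   # 1-bpp: 4 x 16, 2-bpp: 16 x 4, 3-bpp: 8 x 8


def palette_register(bpp: int, bank: int, index: int) -> int:
    """Palette register (0..63) for a pixel, assuming PR = bank * colors_per_bank + index."""
    colors = COLORS_PER_BANK[bpp]
    banks = 64 // colors
    if not (0 <= bank < banks and 0 <= index < colors):
        raise ValueError("bank or index out of range for this mode")
    return bank * colors + index


print(palette_register(1, 2, 4))    # 36: color 4 from the third 1-bpp bank
print(palette_register(2, 15, 3))   # 63: last color of the last 2-bpp bank
```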

There are a lot of options for colors, but in this example I'll stick to just updating the palette registers themselves which allows you to use any of the 4096 colors.

*NOTE* palette changes survive a soft-reset! If you modify the palette and then exit, those changes will remain in effect until the system is power-cycled or hard reset (a cartridge is plugged in, etc.)

Palette registers are numbered 0 to 63 and consist of 12-bits to specify a color in the format:

|  BYTE 1  |  BYTE 2  |
| ----rrrr | ggggbbbb |
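Packing a color into the two bytes is just nibble shifting. A host-side Python sketch (hypothetical helper names) that produces BYTE 1 and BYTE 2 in the order they are written to the data port:

```python
def pack_palette(r: int, g: int, b: int) -> tuple:
    """Return (BYTE 1, BYTE 2) for a 12-bit color in ----rrrr ggggbbbb format."""
    for c in (r, g, b):
        if not 0 <= c <= 0xF:
            raise ValueError("each component is 4 bits (0..15)")
    word = (r << 8) | (g << 4) | b
    return word >> 8, word & 0xFF


def unpack_palette(word: int) -> tuple:
    """Split a 12-bit palette word (e.g. a DATA entry in a palette table) into (r, g, b)."""
    return (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF


print([hex(b) for b in pack_palette(0x5, 0x4, 0xF)])   # ['0x5', '0x4f']: "dark blue" >054F
```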

Because there is only one "data write port" on the 9918A (mapped at address >8C00 on the 99/4A), and consequently on the F18A as well, the F18A has a "Data Port Mode", controlled by a bit in VR47, to select between writing data to VRAM or to the palette registers.

Two byte writes are required to update a single palette register. Palette registers are written to (they cannot be read by the host CPU) by setting the Data Port Mode (DPM) bit in VR47 to 1, then writing to the VDP data port as normal. After the second byte is written, the 12-bit color will be latched into the specified palette register. Side note: a nice advantage of the GPU is that it has full read/write access to the palette registers using normal word instructions like MOV.

If the auto increment bit in VR47 was not set, then the DPM automatically falls back to the default "write to VRAM" mode after the second palette byte has been written.

If a large number of palette registers need to be updated, setting the auto increment flag will keep the DPM in palette mode until VR47 is written again to return to normal VRAM write mode, or the palette address rolls over to 0, which will force the DPM back to VRAM mode. This is a fail-safe to prevent the VDP from inadvertently getting stuck in the write-to-palette DPM.

DPM is also exited any time *any* VDP status register is read, the VDP is externally reset, the palette address rolls over to 0, or the VR47 DPM bit is set to 0.

VR47 controls data-port mode and palette address:

|  0  |    1     | 2 3 4 5 6 7  |
| DPM | AUTO INC | PAL REG ADDR |

DPM = Data Port Mode

0 = VDP data writes go to VRAM as normal

1 = VDP data writes go to the palette

AUTO INC = Auto Increment

0 = Do NOT increment the palette address after a single *palette* write, which consists of *TWO* bytes written to the VDP. After the second byte has been received, the addressed palette register will be updated and the Data Port Mode defaults back to normal VRAM for writes to the VDP. This mode of operation is intended for updating a single palette register at a time.

1 = Increment the palette address every time a palette register is updated, which consists of *TWO* bytes written to the VDP with the DPM bit = 1. This allows multiple palette registers to be updated consecutively and quickly.

PAL REG ADDR = Palette Register Address

This is the address of the single palette register to update, or the first palette register to update when AUTO INC = 1.
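The VR47 rules above amount to a small state machine. This host-side Python sketch models only what is stated here (two data bytes per register, fallback to VRAM after one register when AUTO INC = 0, exit on roll-over or status read); the class and method names are hypothetical, and external reset is not modelled.

```python
class PalettePort:
    """Sketch of the VR47 Data Port Mode behaviour."""

    def __init__(self) -> None:
        self.palette = [0] * 64
        self.dpm = False
        self.auto_inc = False
        self.addr = 0
        self._first = None                  # BYTE 1 of a pending register write

    def write_vr47(self, value: int) -> None:
        self.dpm = bool(value & 0x80)       # bit 0 (MSb): DPM
        self.auto_inc = bool(value & 0x40)  # bit 1: AUTO INC
        self.addr = value & 0x3F            # bits 2-7: PAL REG ADDR
        self._first = None

    def write_data(self, byte: int) -> bool:
        """Return True if the byte went to the palette, False if it went to VRAM."""
        if not self.dpm:
            return False
        if self._first is None:
            self._first = byte & 0x0F       # ----rrrr
            return True
        self.palette[self.addr] = (self._first << 8) | (byte & 0xFF)
        self._first = None
        if not self.auto_inc:
            self.dpm = False                # single update: back to VRAM writes
        else:
            self.addr = (self.addr + 1) & 0x3F
            if self.addr == 0:
                self.dpm = False            # roll-over fail-safe
        return True

    def read_status(self) -> None:
        self.dpm = False                    # any status read exits DPM
        self._first = None


port = PalettePort()
port.write_vr47(0x84)        # R0 = >2F84: DPM = 1, AUTO INC = 0, PR4
port.write_data(0x00)
port.write_data(0x0F)
print(hex(port.palette[4]), port.dpm)   # 0xf False
```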

Here is an example of writing PR4, which is normally "dark blue" (RGB: 54F) to a pure blue (RGB: 00F):

      LI   R0,>2F84          * Reg 47, value: 1000 0100, DPM = 1, AUTO INC = 0, PR4.
      BL   @VWTR
      LI   R1,>000F          * RGB: 00F, pure blue in place of "dark blue"
*      The next two bytes written to the VDP go to PR4
      MOVB R1,@VDPWD         * Byte 1: ----rrrr
      SWPB R1
      MOVB R1,@VDPWD         * Byte 2: ggggbbbb

After the second write (MOVB instruction), the DPM will fall back to normal VRAM mode since the auto-increment bit in VR47 was not set.

If you were going to update a whole set of registers, then you could use the auto-increment feature. If your update does not cause the PR number to roll over to zero, then you must leave DPM mode after updating:

*      Brighten palette entries 1 to 7 in the CPU-side copy (PAL0)
*      Palette 0 is not updated to keep the screen color stable.
      LI   R0,PAL0+2
      LI   R1,>0111          * Add 1 to each R,G,B nibble (a nibble at >F carries over)
      LI   R2,7
INCPAL
      A    R1,*R0+           * Update the 12-bit color
      DEC  R2
      JNE  INCPAL

      LI   R0,>2FC1          * Reg 47, value: 1100 0001, DPM = 1, AUTO INC = 1, start PR1.
      BL   @VWTR
*      Every two bytes written to the VDP now go to the palette registers.
      LI   R0,PAL0+2
      LI   R2,14             * Each 12-bit palette entry requires 2 bytes
UPDPAL
      MOVB *R0+,@VDPWD
      DEC  R2
      JNE  UPDPAL
      LI   R0,>2F00          * Reg 47, value: 0000 0000, exit DPM
      BL   @VWTR


**
* Standard color palette
*
* 12-bit color format: ----rrrrggggbbbb
      EVEN
PAL0
      DATA >0000             * Transparent
      DATA >0000             * Black
      DATA >02C3             * Medium Green
      DATA >05D6             * Light Green
      DATA >054F             * Dark Blue
      DATA >076F             * Light Blue
      DATA >0D54             * Dark Red
      DATA >04EF             * Cyan
      DATA >0F54             * Medium Red
      DATA >0F76             * Light Red
      DATA >0DC3             * Dark Yellow
      DATA >0ED6             * Light Yellow
      DATA >02B2             * Dark Green
      DATA >0C5C             * Magenta
      DATA >0CCC             * Gray
      DATA >0FFF             * White
PAL0E

Edited January 15, 2013 by matthew180

+InsaneMultitasker · January 15, 2013

Controller cards (e.g., TI, CorComp, BwG, CF7+) which use VDP for disk buffers usually store a copy of the last accessed disk's sector 0 from 0x3ef5 to 0x3ff4. I do not recall if this sector is ever flushed to disk or if it is a read-only copy. If the former, blanking out 0x3F00 could result in changing the disk bitmap. It's improbable, but in the worst case the sectors at this bitmap address would be marked as available, creating a future overwrite condition. I recommend you save and restore this byte to be safe.

matthew180 · January 15, 2013

Good to know, and anyone doing F18A detection will have to determine their situation. In these examples I'm assuming the programs are probably games (but maybe not), and that this detection is done at program initialization, i.e. before any disk access or such. Also, in this case all 6 bytes would need to be saved/restored.

However, it is also possible to write the bytes to any VRAM address, i.e. the name table is a perfectly good place too, and probably safer since the name table is usually under user program control, even in environments like XB.

Willsy · January 15, 2013

This is really good information. Thanks. It seems a lot more straightforward to do register/palette writes etc. on the GPU side, rather than the 9900 side. I'll probably explore this method more.

matthew180 · January 15, 2013

- Use 2 or more colors per sprite

Multi-color sprites.

The color enhancements to sprites and tiles, as far as the pattern representation goes, work the same way. As you know, the original VDP could support 2 colors per sprite with one of the colors always being transparent, thus we tend to think of sprites as having one color. The sprite's color could be any of the sixteen original colors, including transparent, which presents some interesting possibilities.

In the original sprite pattern data, a '0' bit specifies a transparent pixel and also does not count in collision detection. However, a '1' bit in the pattern specifies that the sprite has a pixel at that location, regardless of the color. That is an important distinction to keep in mind, because is allows you to have a pixel (1-bit in the pattern) that is transparent, i.e. the sprite's main color is set to 0.

So, the bit patterns in the original sprite (and tile) modes don't actually represent the color, they represent if the sprite has a pixel at that location. The color data is derived from a different location, and in the case of sprites the color comes from the Sprite Attribute Table (SAT) entry for the given sprite.

When moving to the ECMs (enhanced color modes) for sprites and tiles, the pattern data itself *does* actually select a color from a palette. However, there is a distinction between tiles and sprites when it comes to the "zero-index", i.e. color data "0", "00", or "000". For sprites the zero-index is *ALWAYS* transparent, so the number of actual colors you can display is 1, 3, or 7 vs 2, 4, or 8. In the ECMs for tiles there is an attribute byte for each tile "name" (0-255) where you can specify whether the zero-index is transparent or shows the color at that index.

To provide more pixel data, the extra pattern bits need to come from somewhere. To keep some sort of compatibility with existing patterns, I chose to implement the extra bits via "bit planes". This allows you to start off with some existing sprite or tile patterns, and expand them to support more colors at a later time in your development. Also, the 3-bpp mode would not have packed neatly into a single byte had I used a linear bit-packing method.

The down-side to bit-planes is that making patterns is more of a pain. Luckily sometimes99er has implemented some initial support for the multi-color sprites via his sprite editor.

For example, for a 2-bpp pattern you need two bits to specify which of the 4 colors to use for a given pixel. There are four possible values:

bits | color index
00   | 0
01   | 1
10   | 2
11   | 3

This is simply binary representation, and for 3-bpp it becomes "000" to "111". Note that the least significant bit comes from the original pattern table (bit plane), and the 2nd or 3rd bits come from subsequent bit-planes.

For example, here are two pattern bytes that will have four pixels next to each other, each being one of the four possible colors in a given palette (which is specified by a tile's or sprite's attribute byte).

01010000 pattern-plane 0
00110000 pattern-plane 1, 2K (2048) byte-offset from bit-plane 0 (the original pattern table)
--------
01230000 color index values.
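
To double-check the vertical bit-combining rule, here is a small Python model of it. It is not F18A or GPU code, and `decode_row` is a hypothetical helper name; it assumes, as in the original 9918A pattern format, that the leftmost pixel is the most significant bit of each pattern byte, and that plane N supplies bit N of the color index.

```python
def decode_row(planes: list[int]) -> list[int]:
    """Combine one pattern byte from each plane into 8 color indices.

    planes[0] is pattern-plane 0 (the original pattern table) and supplies
    the LSbit of each index; planes[1] supplies the next bit, and so on.
    """
    indices = []
    for pixel in range(8):
        shift = 7 - pixel  # pixel 0 (leftmost) is the MSbit of each byte
        index = 0
        for plane_number, plane_byte in enumerate(planes):
            bit = (plane_byte >> shift) & 1
            index |= bit << plane_number  # plane N gives bit N of the index
        indices.append(index)
    return indices


# The 2-bpp example above: plane 0 = 01010000, plane 1 = 00110000
print(decode_row([0b01010000, 0b00110000]))  # [0, 1, 2, 3, 0, 0, 0, 0]
```

Passing three bytes instead of two covers 3-bpp patterns with no other change.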

For each byte that makes up a sprite's or tile's pattern, there are one or two more bytes in the additional pattern-planes that are used to make the final color index for a given pixel. The second pattern-plane always starts 2K (2048 bytes) after the start of the Sprite Pattern Generator Table (SPGT), and the third pattern-plane (3-bpp only) starts 4K after it. This means that the SPGT occupies the normal 2K in 1-bpp, 4K in 2-bpp, and 6K in 3-bpp.
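
The plane offsets boil down to simple address arithmetic. This Python sketch assumes 8 bytes per pattern number, as in the original SPGT, and uses an SPGT base of >1800 purely for illustration; `plane_byte_address` and `spgt_size` are hypothetical names.

```python
PLANE_STRIDE = 2048  # each additional pattern-plane starts 2K after the previous


def plane_byte_address(spgt_base: int, pattern: int, row: int, plane: int) -> int:
    """VRAM address of one pattern byte.

    spgt_base: start of the SPGT (plane 0, the original table)
    pattern:   pattern number 0-255, 8 bytes per pattern
    row:       byte within the pattern, 0-7
    plane:     0, 1, or 2 (plane 2 exists only in 3-bpp)
    """
    return spgt_base + pattern * 8 + row + plane * PLANE_STRIDE


def spgt_size(bpp: int) -> int:
    """VRAM reserved for the SPGT at 1, 2, or 3 bits per pixel."""
    return bpp * PLANE_STRIDE


# Hypothetical SPGT at >1800: pattern 1, first byte, second plane
print(hex(plane_byte_address(0x1800, pattern=1, row=0, plane=1)))  # 0x2008
```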

To get a pixel's color index you combine the bits *vertically* from each pattern byte in each plane. So, the second pixel is "01", or index 1. The byte from the first pattern-plane represents the LSbit in the final index value. Shown below is the sprite data using 3-bpp to show all eight colors in a single row:

bits | color index
000  | 0 (0 is always transparent for sprites)
001  | 1
010  | 2
011  | 3
100  | 4
101  | 5
110  | 6
111  | 7

the bits are combined vertically
 | | | | | | | |
 V V V V V V V V
|0|1|0|1|0|1|0|1| LSbit pattern-plane 0, 2K total
|0|0|1|1|0|0|1|1|       pattern-plane 1, 4K total, 2048 bytes offset from the SPGT
|0|0|0|0|1|1|1|1| MSbit pattern-plane 2, 6K total, 4096 bytes offset from the SPGT
 | | | | | | | |
 V V V V V V V V
|0|1|2|3|4|5|6|7| color index values.
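
Going the other way, from a row of color indices to the per-plane bytes, is the part that makes bit-plane patterns tedious to draw by hand. A minimal Python sketch of that split follows; the name `encode_row` is hypothetical, and pixel 0 is again assumed to be the MSbit. Fed the 0-7 row from the diagram, it produces the three plane bytes shown there.

```python
def encode_row(indices: list[int], bpp: int) -> list[int]:
    """Split 8 color indices into one pattern byte per plane (plane 0 first)."""
    if len(indices) != 8:
        raise ValueError("a pattern row is exactly 8 pixels")
    planes = [0] * bpp
    for pixel, index in enumerate(indices):
        if not 0 <= index < (1 << bpp):
            raise ValueError(f"index {index} does not fit in {bpp} bpp")
        for plane_number in range(bpp):
            if (index >> plane_number) & 1:      # bit N of the index ...
                planes[plane_number] |= 0x80 >> pixel  # ... goes to plane N
    return planes


planes = encode_row([0, 1, 2, 3, 4, 5, 6, 7], bpp=3)
print([f"{b:08b}" for b in planes])  # ['01010101', '00110011', '00001111']
```

Each plane byte is then written at its own offset (plane 1 at +2048, plane 2 at +4096 from the SPGT).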

The first step to making a multi-color sprite is to make a pattern that has data for all the bit-planes, which depends on the number of colors you want. You set up the sprite tables as you normally would, load the patterns, set up the SAT, and finally enable the ECM for sprites.

VR49 controls the ECM for both tiles and sprites:

|   0    |   1   |   2     3   |    4   |   5  |   6     7   |
FIXED_EN | ROW30 | ECMT0 ECMT1 | Y_REAL | LINK | ECMS0 ECMS1 |

The bit fields for ECM(T)iles and ECM(S)prites are:

00 - 0 - original 9918A mode

01 - 1 - 1-bpp

10 - 2 - 2-bpp

11 - 3 - 3-bpp

Thus, to enable sprites to use 2-bpp just write >02 to VR49. Heh, all that talking just to say that... Really all the detail is in setting up the patterns, which is just additional data written to the VRAM.
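
The VR49 layout can also be expressed as a small Python helper that builds the byte to write. It is only a sketch: the constant and function names are hypothetical, the masks assume TI bit numbering (bit 0 is the MSbit, >80) as used in the table, and it assumes the F18A's enhanced registers have already been unlocked.

```python
# VR49 flag bits in TI numbering: bit 0 is the MSbit (>80)
FIXED_EN = 0x80  # bit 0
ROW30 = 0x40     # bit 1
Y_REAL = 0x08    # bit 4
LINK = 0x04      # bit 5

# ECM field values: 0 = original 9918A, 1/2/3 = 1/2/3 bits per pixel
ECM_9918A, ECM_1BPP, ECM_2BPP, ECM_3BPP = range(4)


def vr49_value(ecm_tiles: int = ECM_9918A, ecm_sprites: int = ECM_9918A,
               flags: int = 0) -> int:
    """Combine the tile and sprite ECM fields with optional flag bits."""
    if not (0 <= ecm_tiles <= 3 and 0 <= ecm_sprites <= 3):
        raise ValueError("ECM fields are 2 bits (0-3)")
    # ECMT occupies bits 2-3 (mask >30), ECMS occupies bits 6-7 (mask >03)
    return flags | (ecm_tiles << 4) | ecm_sprites


print(f">{vr49_value(ecm_sprites=ECM_2BPP):02X}")  # >02
```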

Edited September 23, 2013 by matthew180

matthew180 · January 16, 2013

Plotting Pixels on the Bitmap Layer (BML) and GM2.

The GPU can plot a BML pixel, given an XY location, in a single instruction. It can also read a pixel, conditionally set a pixel based on the current pixel color, read and write a pixel at the same time, just calculate a pixel's VRAM address, or calculate a GM2 pixel's address!

I call the new instruction PIX, and it uses the same opcode as the 9900's XOP instruction, so you can use any 9900 assembler to code the PIX instruction. The F18A GPU does not have a Workspace Pointer (since its registers are hard-wired instead of memory-based), so XOP was not implemented.

The XOP format is multi-addressing for the source, and workspace register for the destination. This makes it very flexible for the PIX instruction. Here are the options you can use with PIX:

Format: MAxxRWCE xxOOxxPP

M - 1 = calculate the effective address for GM2 instead of the new bitmap layer
    0 = use the remainder of the bits for the new bitmap layer pixels
A - 1 = retrieve the pixel's effective address instead of setting a pixel
    0 = read or set a pixel according to the other bits
R - 1 = read current pixel into PP, only after possibly writing PP
    0 = do not read current pixel into PP
W - 1 = do not write PP
    0 = write PP to current pixel
C - 1 = compare OO with PP according to E, and write PP only if true
    0 = always write
E - 1 = only write PP if current pixel is equal to OO
    0 = only write PP if current pixel is not equal to OO
OO - pixel to compare to existing pixel
PP - new pixel to write, and previous pixel when reading

The source value is the XY location as two bytes, the X being the MSB. Since the XOP supports multiple addressing for the source parameter, you can use a register or memory location. XY values are 0 to 255.

The destination parameter is the PIX instruction word as indicated above. If you use the M or A operations (calculate addresses only), the destination register will contain the address after the instruction has executed. If you use the R operation, the read pixel will be in PP (it overwrites the LSbits). You can read and write at the same time, in which case the PP bits are written first and then replaced with the original pixel bits.
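
The destination word format can be captured in a small Python encoder, handy for working out the LI values by hand. It is a sketch: `pix_word` and `pix_xy` are hypothetical names, the bit positions are read straight off the MAxxRWCE xxOOxxPP layout, and the values it produces match the LI constants used in the GPU examples that follow.

```python
def pix_word(pp: int = 0, oo: int = 0, *, gm2_addr: bool = False,
             addr_only: bool = False, read: bool = False,
             no_write: bool = False, compare: bool = False,
             equal: bool = False) -> int:
    """Build the PIX destination word: MAxxRWCE xxOOxxPP."""
    if not (0 <= pp <= 3 and 0 <= oo <= 3):
        raise ValueError("PP and OO are 2-bit pixel values (0-3)")
    word = (oo << 4) | pp      # OO is the >0030 field, PP is the >0003 field
    if gm2_addr:
        word |= 0x8000         # M: calculate a GM2 address
    if addr_only:
        word |= 0x4000         # A: return the pixel's address only
    if read:
        word |= 0x0800         # R: read the current pixel into PP
    if no_write:
        word |= 0x0400         # W: do not write PP
    if compare:
        word |= 0x0200         # C: write only if the comparison is true
    if equal:
        word |= 0x0100         # E: compare for equal (else not-equal)
    return word


def pix_xy(x: int, y: int) -> int:
    """Source operand: X in the MSB, Y in the LSB, each 0-255."""
    return ((x & 0xFF) << 8) | (y & 0xFF)


print(f">{pix_word(3, 1, compare=True):04X}")  # >0213
```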

Example (this is code running on the GPU):

      LI   R0,>2020          * XY = 32,32
      LI   R1,>0001          * Write a pixel of "01"
      XOP  R0,R1             * PIX R0,R1

- or -
      EVEN                   * Make sure XPIX is an even address
XPIX   BYTE 50
YPIX   BYTE 50
.
.
.
      LI   R1,>0801          * Read existing pixel at XPIX,YPIX and write a "01" pixel in its place
      XOP  @XPIX,R1

- or -
      LI   R1,>0302          * ONLY write a 2 ("10") pixel if the current pixel is 0 ("00")
      XOP  @XPIX,R1

- or -
      LI   R1,>0213          * ONLY write a 3 ("11") pixel if the current pixel is NOT 1 ("01")
      XOP  @XPIX,R1

- or -
      LI   R1,>8000          * Get the GM2 effective address of the pixel at the XY location
      XOP  @XPIX,R1          * R1 now contains the VRAM address of the byte containing the pixel.
* Doing (XPIX AND >07) will isolate the pixel's bit within that byte.

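The PIX instruction was really designed to assist with the BML, so using it with GM2 does require a little extra work to update the pixel in the appropriate byte. However, the address of the byte that contains the pixel to be updated is calculated for you, which replaces all this code (from the E/A manual page 336):

      MOV  R1,R4             * R4 = Y (R1 holds the Y coordinate)
      SLA  R4,5              * R4 = Y * 32, so the high byte holds Y / 8
      SOC  R1,R4             * OR in Y again to bring Y MOD 8 into the low bits
      ANDI R4,>FF07          * Keep (Y / 8) * 256 + (Y MOD 8)
      MOV  R0,R5             * R5 = X (R0 holds the X coordinate)
      ANDI R5,7              * R5 = X MOD 8, the pixel's bit within the byte
      A    R0,R4             * Add X ...
      S    R5,R4             * ... minus X MOD 8, i.e. add (X AND >F8)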

This is a very nice routine, and it took me a long time to figure out how it worked. But once I did, I was very impressed with what was going on, and I was also intrigued to know that all this code does is bit-twiddling. Since bit-twiddling is something that takes a lot of work via programming, but something that hardware does naturally, all that code can be replaced with a single bit of hardware (shown here as HDL):

gm2_addr <= "00" & (
   (pgba & src_oper(8 to 12) & "00000" & src_oper(13 to 15)) +   -- y / 8 * 256 + (y % 8)
   ("0000" & src_oper(0 to 4) & "000"));                         -- + (x AND >F8) (mask out the pixel index bits)

Two dashes -- are comments in HDL. So, one adder and some bit twiddling and the address is calculated in 10ns. Not that you need to know that, but I thought it was interesting.
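
Both forms compute the same thing, which is easy to check exhaustively in Python. The sketch below transcribes the E/A sequence instruction by instruction (assuming R0 = X and R1 = Y, as the masking implies) and the one-adder HDL expression, treating `pgba` as a single pattern-table base bit worth >2000 (an assumption based on the bit widths in the HDL). `set_gm2_pixel` is a hypothetical helper showing how the bit number is used: bit 0 is the leftmost pixel, i.e. mask >80.

```python
def gm2_addr_ea(x: int, y: int) -> tuple[int, int]:
    """Transcription of the E/A routine. Returns (R4, R5):
    byte offset in the pattern table, and the pixel's bit within that byte."""
    r4 = (y << 5) & 0xFFFF       # MOV R1,R4 / SLA R4,5
    r4 |= y                      # SOC R1,R4
    r4 &= 0xFF07                 # ANDI R4,>FF07
    r5 = x & 7                   # MOV R0,R5 / ANDI R5,7
    r4 = (r4 + x - r5) & 0xFFFF  # A R0,R4 / S R5,R4
    return r4, r5


def gm2_addr_hw(x: int, y: int, pgba: int = 0) -> int:
    """The HDL form: one add of two bit-twiddled operands."""
    return ((pgba << 13) | ((y >> 3) << 8) | (y & 7)) + (x & 0xF8)


def set_gm2_pixel(pattern_table: bytearray, x: int, y: int) -> None:
    """Set one GM2 pixel in a 6K pattern table image."""
    offset, bit = gm2_addr_ea(x, y)
    pattern_table[offset] |= 0x80 >> bit  # bit 0 = leftmost pixel = MSbit


# The two forms agree for every pixel on the 256x192 screen
assert all(gm2_addr_ea(x, y)[0] == gm2_addr_hw(x, y)
           for x in range(256) for y in range(192))
```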

Edited September 23, 2013 by matthew180

matthew180 · March 20, 2013

See the first post in this thread for the firmware download and instructions for people using a JTAG cable for the update. The in-system update is not ready yet, sorry.

March 20, 2013 firmware update:

* Fixed the sprite collision bug

* Fixed the text1 (40-column mode) bug when the F18A 30-row option is enabled

* Updated default built-in GPU support functions

TheMole · April 4, 2013

Is there a list of software that is out there that is written to support the F18A (even regardless of platform, CV also ok)? I know of Tursi's slideshow program, but apart from that I have no idea.

You asked before what might be missing out there to get some traction on F18A (game) development. I think a set of higher level language constructs might be useful for some of us (e.g. some XB CALL LINK routines or somesuch).

Either way, now that I've received mine, my first project will be porting my Alex Kidd proof-of-concept to the F18A. Hopefully I can turn that into a full side-scrolling platform game (if not a remake of the original). With a bit of luck I can set up the hardware this week-end.

Asmusr · April 4, 2013

Could you post a list of the updated built-in GPU support functions?

matthew180 · April 6, 2013

The pre-loaded routines are probably not as much as you might think. Rather than try to guess what everyone might need, I decided to keep it to a minimum (and not hold up the update any longer) and just provide some sort of software library later.

The initial firmware had two pre-loaded routines, a block copy and font load. In the v1.4 update I just added a minimal number of routines, and made room for the user to add their own routines using the same parameter mechanism and vector table.

BLKCPY * Block Copy

FONTLD * Font Load

GETINF * Get catalog version, free memory, vector tables

GETIDX * Get a catalog index entry

BLOBLD * Load a data blob from the catalog

The firmware has a catalog file that I had hoped to fill with more routines, sound data, patterns, etc. but it never happened (too much work, not enough time). The catalog currently contains the pre-loaded code itself (so the F18A can be software reset if desired), the default palettes, and about 22 character sets (patterns for tiles 0 to 255).

The pre-loaded code is attached to this post.

gpu_preload.zip

Asmusr · April 12, 2013

Perhaps a crazy idea, but would it be possible to run an assembler on the F18A GPU? Being able to assemble fast would reduce the pain of programming on a real TI considerably. But I guess it would be difficult to fit an assembler and the source code into the 16K VDP memory + 2K, or what? This brings me to my actual question: If you want to run some code on the GPU that uses branch instructions and not just jump instructions, how would you produce and 'upload' that code? Thanks.

matthew180 · April 13, 2013

You could run an assembler on the F18A, but you would have to write it from scratch probably. I don't think the E/A could be fixed up to run on the GPU.

As for loading GPU code, you assemble using any 99/4A compatible assembler (E/A, asm994a, etc.), but you have to use AORG with a VRAM value from >0000 to >47FE, just like you do for cartridge development (except the cart uses AORG >6000 and is in host CPU RAM, not VDP VRAM). Once you have the opcodes, you have to include them in an E/A loadable assembly program, or load the data from disk at runtime. It gets a little tedious I know, and eventually I hope to have a few tools to help with development.

Using the branch instructions is just like using any other instruction; you just use them as you would in any assembly program. With AORG, all the references are resolved at assembly time and the code will only run correctly if loaded at the specified address.

I did a lot of this kind of code when testing the F18A. Here is an example of the GPU and host (99/4A) programs I used to validate the GPU's jump instructions.

This is the GPU code, i.e. to be included in the host assembly program and loaded to VRAM for execution by the GPU:

* F18A GPU Test
* Matthew Hagerty
* June 13, 2012
*
* Test jump instructions

      DEF MAIN
      AORG >3F10
MAIN   IDLE
      MOV  @>3F00,@JINST     * Jump opcode to execute at >3F00, >3F01
      CLR  R0
      LI   R14,JINST
      MOV  @>3F02,R15        * Flag value at >3F02, >3F03
      RTWP                   * R14->PC, R15->status flags
JINST  DATA 0                 * Replaced by opcode
      INC  R0                * Only executed if jump falls through
      MOV  R0,@>3F00         * R0 result in >3F00, >3F01
      B    @MAIN
      END
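
The GPU routine above leaves R0 = 0 when the jump is taken and R0 = 1 when it falls through. For reference, here is a Python model of the expected result for every flag combination the host driver tries (R15 stepped by >0400, which walks status bits ST0-ST5 through all 64 values). The conditions are the standard TMS9900 jump conditions; the names are hypothetical and this is a cross-check sketch, not part of the test programs.

```python
# TMS9900 status bits in TI numbering (ST0 is the MSbit)
ST_LGT = 0x8000  # ST0 logical greater than
ST_AGT = 0x4000  # ST1 arithmetic greater than
ST_EQ = 0x2000   # ST2 equal
ST_C = 0x1000    # ST3 carry
ST_OV = 0x0800   # ST4 overflow
ST_OP = 0x0400   # ST5 odd parity

JUMP_TAKEN = {
    "JEQ": lambda st: bool(st & ST_EQ),
    "JGT": lambda st: bool(st & ST_AGT),
    "JH": lambda st: bool(st & ST_LGT) and not (st & ST_EQ),
    "JHE": lambda st: bool(st & ST_LGT) or bool(st & ST_EQ),
    "JL": lambda st: not (st & ST_LGT) and not (st & ST_EQ),
    "JLE": lambda st: not (st & ST_LGT) or bool(st & ST_EQ),
    "JLT": lambda st: not (st & ST_AGT) and not (st & ST_EQ),
    "JMP": lambda st: True,
    "JNC": lambda st: not (st & ST_C),
    "JNE": lambda st: not (st & ST_EQ),
    "JNO": lambda st: not (st & ST_OV),
    "JOC": lambda st: bool(st & ST_C),
    "JOP": lambda st: bool(st & ST_OP),
}


def expected_r0(jump: str) -> list[int]:
    """Expected result for each flag value >0000, >0400, ... >FC00.

    0 = jump taken (INC R0 skipped), 1 = fell through (INC R0 executed).
    """
    taken = JUMP_TAKEN[jump]
    return [0 if taken(st) else 1 for st in range(0, 0x10000, 0x0400)]
```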

I take the listing and convert the opcodes to DATA statements to be included in the host assembly program:

Asm994a TMS99000 Assembler - v3.010

               * Asm994a Generated Register Equates
               *
     0000 0000 R0      EQU     0 
     0000 0001 R1      EQU     1 
     0000 0002 R2      EQU     2 
     0000 0003 R3      EQU     3 
     0000 0004 R4      EQU     4 
     0000 0005 R5      EQU     5 
     0000 0006 R6      EQU     6 
     0000 0007 R7      EQU     7 
     0000 0008 R8      EQU     8 
     0000 0009 R9      EQU     9 
     0000 000A R10     EQU     10
     0000 000B R11     EQU     11
     0000 000C R12     EQU     12
     0000 000D R13     EQU     13
     0000 000E R14     EQU     14
     0000 000F R15     EQU     15
               *
  1            * F18A GPU Test
  2            * Matthew Hagerty
  3            * June 13, 2012
  4            *
  5            * Test jump instructions
  6            
  7  0000 3F10        DEF MAIN
  8                   AORG >3F10
  9  3F10 0340 MAIN   IDLE
 10  3F12 C820        MOV  @>3F00,@JINST     * Jump opcode to execute at >3F00, >3F01
 10  3F14 3F00  
 10  3F16 3F24  
 11  3F18 04C0        CLR  R0
 12  3F1A 020E        LI   R14,JINST
 12  3F1C 3F24  
 13  3F1E C3E0        MOV  @>3F02,R15        * Flag value at >3F02, >3F03
 13  3F20 3F02  
 14  3F22 0380        RTWP                   * R14->PC, R15->status flags
 15  3F24 0000 JINST  DATA 0                 * Replaced by opcode
 16  3F26 0580        INC  R0                * Only executed if jump falls through
 17  3F28 C800        MOV  R0,@>3F00         * R0 result in >3F00, >3F01
 17  3F2A 3F00  
 18  3F2C 0460        B    @MAIN
 18  3F2E 3F10  
 19  3F30 0000        END
 19            


Assembly Complete - Errors: 0,  Warnings: 0


------ Symbol Listing ------

JINST  ABS:3F24 JINST
MAIN   ABS:3F10 MAIN
R0     ABS:0000 R0
R1     ABS:0001 R1
R10    ABS:000A R10
R11    ABS:000B R11
R12    ABS:000C R12
R13    ABS:000D R13
R14    ABS:000E R14
R15    ABS:000F R15
R2     ABS:0002 R2
R3     ABS:0003 R3
R4     ABS:0004 R4
R5     ABS:0005 R5
R6     ABS:0006 R6
R7     ABS:0007 R7
R8     ABS:0008 R8
R9     ABS:0009 R9

Here is the host-side assembly with the GPU program included as data. It will be copied to the VRAM at the location used in the AORG, which is >3F10 in this case. Once in VRAM, the GPU can execute the code.

* F18A CPU to GPU jump instruction test driver
* Matthew Hagerty
* June 4, 2012
*
* 99/4A driver for the GPU jump instruction test.
* Each jump instruction is executed, then the GPU executes
* the same jump instruction and the jump / no-jump result
* is compared.

      DEF MAIN

* VDP Memory Map
*
VDPRD  EQU  >8800             * VDP read data
VDPSTA EQU  >8802             * VDP status
VDPWD  EQU  >8C00             * VDP write data
VDPWA  EQU  >8C02             * VDP set read/write address

* Workspace
*
WRKSP  EQU  >8300             * Workspace
R0LB   EQU  WRKSP+1           * R0 low byte reqd for VDP routines
R1LB   EQU  WRKSP+3           * R1 low byte
R2LB   EQU  WRKSP+5           * R2 low byte
R3LB   EQU  WRKSP+7           * R3 low byte
R4LB   EQU  WRKSP+9           * R4 low byte
R5LB   EQU  WRKSP+11          * R5 low byte
R6LB   EQU  WRKSP+13          * R6 low byte
* R5  R6  R7  R8  R9  R10 R11 R12 R13 R14 R15
* 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31

SAVR0  DATA 0
PASFAL DATA 0                 * Pass/fail flag
TXTPAS TEXT 'pass'
HEX    TEXT '0123456789ABCDEF'
SPACE  BYTE 32                * space character

      EVEN
GPU
      DATA >0340             * 3F10 0340 MAIN   IDLE
      DATA >C820             * 3F12 C820        MOV  @>3F00,@JINST     * Jump opcode to execute at >3F00, >3F01
      DATA >3F00             * 3F14 3F00
      DATA >3F24             * 3F16 3F24
      DATA >04C0             * 3F18 04C0        CLR  R0
      DATA >020E             * 3F1A 020E        LI   R14,JINST
      DATA >3F24             * 3F1C 3F24
      DATA >C3E0             * 3F1E C3E0        MOV  @>3F02,R15        * Flag value at >3F02, >3F03
      DATA >3F02             * 3F20 3F02
      DATA >0380             * 3F22 0380        RTWP                   * R14->PC, R15->status flags
      DATA >0000             * 3F24 0000 JINST  DATA 0                 * Replaced by opcode
      DATA >0580             * 3F26 0580        INC  R0                * Only executed if jump falls through
      DATA >C800             * 3F28 C800        MOV  R0,@>3F00         * R0 result in >3F00, >3F01
      DATA >3F00             * 3F2A 3F00
      DATA >0460             * 3F2C 0460        B    @MAIN
      DATA >3F10             * 3F2E 3F10
GPUEND

MAIN
      LIMI 0
      LWPI WRKSP

*      F18A blind unlock, no testing for success or failure.
*      Perform the Enhance Register Mode (ERM) unlock sequence
*      for the F18A.
      LI   R0,>391C          * VR1/57, value 00011100
      BL   @VWTR             * Write once
      BL   @VWTR             * Write twice, unlock
      LI   R0,>01E0          * VR1, value 11100000, a real sane setting
      BL   @VWTR             * Write reg

*      Copy GPU code to VRAM
      LI   R0,>3F10
      LI   R1,GPU
      LI   R2,GPUEND-GPU
      BL   @VMBW


      LI   R0,33             * 1,1 screen location to start output
      LI   R7,33             * Screen next row
      LI   R3,JLIST
      LI   R13,WRKSP         * Used in the RTWP instruction and will never change
      LI   R14,JINST

*      Instruction loop
ILOOP
      BL   @VWAD
      MOVB *R3+,@VDPWD       * Display the instruction name
      MOVB *R3+,@VDPWD
      MOVB *R3+,@VDPWD
      MOVB *R3+,@VDPWD
      AI   R0,5              * Adjust past the name and add a space

      MOV  *R3,@JINST        * Copy the jump opcode for execution
      CLR  R15               * Reset the count
      CLR  @PASFAL           * Clear the pass/fail flag

CLOOP  CLR  R5
      RTWP                   * R13->WP, R14->PC, R15->status flags
JINST  DATA 0                 * Replaced with jump instruction opcode
      INC  R5

*      Copy to VRAM for GPU
      MOV  R0,@SAVR0         * Temp save R0
      LI   R0,>3F00
      BL   @VWAD

      MOV  *R3,R0
      MOVB R0,@VDPWD         * Jump opcode >3F00, >3F01
      MOVB @R0LB,@VDPWD

      MOV  R15,R0
      MOVB R0,@VDPWD         * Current flag to >3F02, >3F03
      MOVB @R0LB,@VDPWD

*      Set the GPU PC which also triggers it
      LI   R0,>363F
      BL   @VWTR
      LI   R0,>3712
      BL   @VWTR

*      Compare the result in >3F01
      LI   R0,>3F01
      BL   @VRAD
      MOV  @SAVR0,R0         * Restore R0

      CB   @VDPRD,@R5LB
      JEQ  CNEXT

*      Test failed, display the value that failed
      INC  @PASFAL
      BL   @HEXDMP
      JMP  BAIL              * Skip the rest of the flags

CNEXT
      AI   R15,>0400         * Increment the flags
      JNE  CLOOP

BAIL
*      Check if test passed (nothing has been displayed yet)
      MOV  @PASFAL,@PASFAL   * Compare to 0
      JNE  JNEXT             * If there was a failure, do not display 'PASS'

*      If every flag combination passed, display 'pass'
      LI   R1,TXTPAS
      LI   R2,4              * R0 is already set up
      BL   @VMBW

JNEXT
      AI   R7,32             * Next screen row
      MOV  R7,R0

      INCT R3
      MOV  *R3,R4            * Next instruction
      JEQ  DONE
      B    @ILOOP

DONE   JMP  DONE


JLIST  TEXT 'JEQ '
      JEQ  $+4
      TEXT 'JGT '
      JGT  $+4
      TEXT 'JH  '
      JH   $+4
      TEXT 'JHE '
      JHE  $+4
      TEXT 'JL  '
      JL   $+4
      TEXT 'JLE '
      JLE  $+4
      TEXT 'JLT '
      JLT  $+4
      TEXT 'JMP '
      JMP  $+4
      TEXT 'JNC '
      JNC  $+4
      TEXT 'JNE '
      JNE  $+4
      TEXT 'JNO '
      JNO  $+4
      TEXT 'JOC '
      JOC  $+4
      TEXT 'JOP '
      JOP  $+4
      DATA 0


**
* Display R15 as a hex number at screen location in R0
* Uses R5
*
HEXDMP
      MOV  R11,R5            * Save return address
      BL   @VWAD
      MOV  R5,R11            * Restore return address

      MOV  R15,R5
      ANDI R5,>F000          * Isolate the first digit
      SRL  R5,12             * Convert to a number
      MOVB @HEX(R5),@VDPWD   * Convert to ASCII and write to the screen

      MOV  R15,R5
      ANDI R5,>0F00
      SRL  R5,8
      MOVB @HEX(R5),@VDPWD

      MOV  R15,R5
      ANDI R5,>00F0
      SRL  R5,4
      MOVB @HEX(R5),@VDPWD

      MOV  R15,R5
      ANDI R5,>000F
      MOVB @HEX(R5),@VDPWD

      MOVB @SPACE,@VDPWD
      AI   R0,5              * 4 digits plus a space

      B    *R11
*// HEXDMP


*********************************************************************
*
* VDP Set Write Address
*
* R0   Address to set VDP address counter to
*
VWAD   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
      ORI  R0,>4000          * Set the two MSbits to 01 for write
      MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
      ANDI R0,>3FFF          * Restore R0 top two MSbits
      B    *R11
*// VWAD


*********************************************************************
*
* VDP Set Read Address
*
* R0   Address to set VDP address counter to
*
VRAD   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM read address
      ANDI R0,>3FFF          * Make sure the two MSbits are 00 for read
      MOVB R0,@VDPWA         * Send high byte of VDP RAM read address
      B    *R11
*// VRAD


*********************************************************************
*
* VDP Multiple Byte Write
*
* R0   Starting write address in VDP RAM
* R1   Starting read address in CPU RAM
* R2   Number of bytes to send to the VDP RAM
*
* R1 is modified by the value of R2
* R2 is changed to 0
*
VMBW   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
      ORI  R0,>4000          * Set the two MSbits to 01 for write
      MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
VMBWLP MOVB *R1+,@VDPWD       * Write byte to VDP RAM
      DEC  R2                * Byte counter
      JNE  VMBWLP            * Check if done
      ANDI R0,>3FFF          * Restore R0 top two MSbits
      B    *R11
*// VMBW


*********************************************************************
*
* VDP Write To Register
*
* R0 MSB    VDP register to write to
* R0 LSB    Value to write
*
VWTR   MOVB @R0LB,@VDPWA      * Send low byte (value) to write to VDP register
      ORI  R0,>8000          * Set up a VDP register write operation (10)
      MOVB R0,@VDPWA         * Send high byte (address) of VDP register
      ANDI R0,>3FFF          * Restore R0 top two MSbits
      B    *R11
*// VWTR

      END
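
The byte pairs that VWAD, VRAD and VWTR put on VDPWA can be modelled in a few lines of Python, which is handy for checking hand-built values such as the unlock writes and the VR54/VR55 GPU start address (the driver starts the GPU at >3F12, the word after the IDLE at MAIN). The function names are hypothetical and the model only mirrors the routines above. On a stock 9918A the control byte >B9 would presumably be decoded as VR1, which is what the "VR1/57" comment refers to.

```python
def vwad_bytes(addr: int) -> tuple[int, int]:
    """VWAD: address low byte, then high byte ORed with >40 (write)."""
    return addr & 0xFF, ((addr | 0x4000) >> 8) & 0xFF


def vrad_bytes(addr: int) -> tuple[int, int]:
    """VRAD: address low byte, then high byte with the two MSbits cleared (read)."""
    return addr & 0xFF, ((addr & 0x3FFF) >> 8) & 0xFF


def vwtr_bytes(r0: int) -> tuple[int, int]:
    """VWTR: value (R0 LSB), then register number (R0 MSB) ORed with >80."""
    return r0 & 0xFF, ((r0 | 0x8000) >> 8) & 0xFF


def gpu_start_words(pc: int) -> list[int]:
    """R0 values for VWTR that load the GPU PC via VR54 (MSB) and VR55 (LSB)."""
    return [(54 << 8) | (pc >> 8), (55 << 8) | (pc & 0xFF)]


# The unlock sequence from MAIN: VR57 = >1C twice, then VR1 = >E0
UNLOCK = [0x391C, 0x391C, 0x01E0]
print([tuple(f">{b:02X}" for b in vwtr_bytes(w)) for w in UNLOCK])
# [('>1C', '>B9'), ('>1C', '>B9'), ('>E0', '>81')]
print([f">{w:04X}" for w in gpu_start_words(0x3F12)])  # ['>363F', '>3712']
```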

Manic1975 · April 21, 2013

Matthew is there something new about in-system update? It would be nice to update my F18A and play all extended basic games.

matthew180 · April 21, 2013

Sorry, no updates yet. :-( I'm spending as much spare time on it as I can.

Edited April 21, 2013 by matthew180

Manic1975 · April 22, 2013

Is there any difference in updating F18A with JTAG cable? What would I need to update F18A this way?

matthew180 · April 22, 2013

A potential work-around for the XB games might be to limit the number of sprites so the CALL COINC(ALL) only acts on the visible sprites. I think there is a way to do that, but I don't remember.

Updating via JTAG is the same way I initially program the firmware. The advantage of using the JTAG cable is that it is fast (about 20 seconds for the update) and it does not matter if something goes wrong, if you lose power during the update, etc, you can just start over. With the "in place" update, if there is a problem and power is lost, the F18A will have to be updated with a JTAG cable.

Look at the first post in this thread for details on how to update via a JTAG cable. I'm not sure about outside of the U.S., but Digilent sells a JTAG programmer for $50 USD, which is pretty cheap for a JTAG cable. You also need the Xilinx tools (as noted), and I suggest getting just the "Lab Tools" (also noted above) since it is a smaller download.

matthew180 · April 26, 2013

Small update, I attached a new f18a_250k.zip file to the first post with a fix for the GPU DIV instruction. If you have a JTAG programming cable, you can update your F18A firmware.

Manic1975 · June 1, 2013

Hello Matthew,

I bought a JTAG programming cable and have updated my F18A firmware. I followed your instructions and everything works just fine! Thank you for the good instructions and firmware.

Please continue to publish new firmware this way.

Asmusr · September 12, 2013

When moving to the ECM (enhanced color modes) for sprites and tiles, the pattern data itself *does* actually select a color from a palette. This is different than the 1-bpp original mode because, for example, in 2-bpp mode a "00" pixel means palette index 0, which will cause a color to be displayed. Maybe. In ECMs there is a setting to specify if a "00" or "000" pixel value means the pixel is transparent, or if the pixel should use the 0-index in the palette. So, just like the original mode for sprites where you have 2-colors but one is always transparent, you can select if you want sprites to have transparent pixels or index colors. When you select to have transparent colors for "0", "00", or "000" pixels, you sacrifice 1 color.

Hi Matthew, I'm reading this again because I'm thinking about using multi-color sprites for my Scramble game (and fall back to monochrome sprites if an F18A is not detected) but I'm not sure I fully understand how it works. Do all sprites share the same palette of 4 or 8 colors, or can each sprite use a different palette?

To provide more pixel data, the extra pattern bits need to come from somewhere. To keep some sort of compatibility with existing patterns, I chose to implement the extra bits via "bit planes". This allows you to start off with some existing sprite or tile patterns, and expand them to support more colors at a later time in your development. Also, the 3-bpp mode would not have packed neatly into a single byte had I used a linear bit-packing method.

I'm not sure what you mean by "This allows you to start off with some existing sprite or tile patterns, and expand them to support more colors at a later time in your development."? It would be nice if you could keep the monochrome sprite pattern table unmodified and just add more planes when you switch to multi-color sprites, but that's not how it works, right? You will also need to replace the first plane/table.

For each byte that makes up a sprite's pattern, there are one or two more bytes in the additional planes that are used to make the final color index for a given pixel. The additional pattern planes are always 2K (and 4K for 3-bpp) bytes offset from the Sprite Pattern Generator Table (SPGT). This means that in 1-bpp the SPGT is the normal 2K, for 2-bpp the SPGT is 4K, and 3-bpp it is 6K.

Could a future version of the firmware include an option only to have 1K or 512 bytes between the planes? This could save a lot of RAM if you only have a few sprite patterns. Having to reserve 6k for sprite patterns would be problematic in many cases.

Thanks,

Rasmus

sometimes99er · September 13, 2013

Could a future version of the firmware include an option only to have 1K or 512 bytes between the planes? This could save a lot of RAM if you only have a few sprite patterns. Having to reserve 6k for sprite patterns would be problematic in many cases.

Don't forget the power of overlapping areas. In principle, TI Basic and XB have the PDT overlapping the SIT, CT, SAL and SDT.
