F18A programming, info, and resources

Imperious · May 23, 2015

That Bomb Jack demo has arcade-quality graphics on my TI-99/4A!

Considering that, and the fact that you somehow got SID-quality sound out of my TI in the Light Year demo I loaded up recently, the future looks very bright indeed. If this continues, the old TI will be getting a lot more attention from the retro gaming community.

I wonder if ColecoVision and MSX users are doing anything with the F18A?

Edited May 23, 2015 by Imperious

TheMole · May 23, 2015

Very nice!

please ask if you have any questions.

Well, if you insist.

When scrolling horizontally with just one page, does the first/last visible column have to visibly wrap around at the screen edges, or is a scroll page one column wider (and/or one row taller) than what is visible on the screen?

Asmusr · May 23, 2015

It is not possible to make the scroll page just one column wider than the visible region. You can either make it one full screen wider or you can use tile layer 2 to mask the edges of a single scroll page.

TheMole · May 23, 2015

That's a shame; it would make scrolling much cheaper as far as VRAM is concerned. A better workaround might be the bitmap layer, which would only eat up 192 bytes of VRAM (in 1bpp color mode), instead of a second tile layer or a second page (both of which need at least 768 bytes).

*Edit:* I just noticed that the BML doesn't have a 1bpp mode, but at 384 bytes it's still less than a full name table. Sprites would be even better, though: you'd need 48 bytes in the sprite attribute table (12 16x16 sprites) and, in the worst case, 96 bytes (in ECM3) to define the needed 16x16 pattern in the sprite generator table, totalling 144 bytes. Of course, you'd then be left with only 20 sprites, but depending on the game that might not be a problem.

Not that bad really, now that I think of it.
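The byte counts above can be checked with a little arithmetic. This is a sketch under stated assumptions: an 8-pixel-wide mask running the full 192-line screen height, 4 bytes per sprite attribute entry, and 32 bytes per bitplane for a 16x16 sprite pattern (three planes in ECM3). The function names are made up for illustration.

```c
#include <assert.h>

#define SCREEN_LINES 192  /* visible scanlines */
#define MASK_WIDTH_PX 8   /* one tile column */

/* Bitmap-layer mask: MASK_WIDTH_PX x SCREEN_LINES pixels at `bpp` bits each. */
int bitmap_mask_bytes(int bpp)
{
    assert(bpp == 1 || bpp == 2);
    return MASK_WIDTH_PX * SCREEN_LINES * bpp / 8;
}

/* A full 32x24 name table, as needed by a second page or a second tile layer. */
int name_table_bytes(void)
{
    return 32 * 24;
}

/* Sprite mask: a column of 16x16 sprites covering the screen height, all
   sharing one pattern with `planes` bitplanes (3 in ECM3). */
int sprite_mask_bytes(int planes)
{
    assert(planes >= 1 && planes <= 3);
    int sprites = SCREEN_LINES / 16;    /* 12 sprites */
    int attribute_bytes = sprites * 4;  /* 4 bytes per attribute entry */
    int pattern_bytes = 32 * planes;    /* 16x16 pixels = 32 bytes per plane */
    return attribute_bytes + pattern_bytes;
}
```

With these assumptions the numbers come out as 192 bytes (1bpp bitmap), 384 bytes (2bpp bitmap), 768 bytes (name table) and 144 bytes (sprites), leaving 32 - 12 = 20 sprites free.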

Edited May 23, 2015 by TheMole

Asmusr · May 23, 2015

There is a positive effect of having two full scroll pages: You never have to spend time scrolling the name table, you only have to update one column or row every 8 pixels. For horizontal scrolling this means you only have to update 3 bytes + set a register every frame to obtain full screen smooth scrolling!

Another consideration is that you almost always want a non-scrolling region, so you have to spend VDP RAM on tile layer 2 anyway, so the masking comes with no extra cost.

Opry99er · May 23, 2015

Holy freakin crap....

TheMole · May 24, 2015

That's perfectly achievable with one page as well (in fact, this is how the Master System does it). Consider the following pseudocode:

nr_columns = 32;
current_column = 30;

// One pass per frame (wait for the vertical blank before each pass)
for (position = 0; position < map_width; position++)
{
  // After every 8 pixels of scrolling, we need to start updating the next column
  if (!(position % 8))
    current_column = (current_column + 1) % nr_columns; // current_column wraps at 32 columns

  // Refresh 3 of the column's 24 rows per frame: 8 frames x 3 rows = 24 rows
  tiletable[(position % 8) * 3    ][current_column] = new_tile_1;
  tiletable[(position % 8) * 3 + 1][current_column] = new_tile_2;
  tiletable[(position % 8) * 3 + 2][current_column] = new_tile_3;

  update_scroll_register_pixel(position % 8);
  update_scroll_register_char(current_column);
}

Basically, the column that is being updated each frame is the hidden one, and the first column that is rendered on the left-hand side of the screen is the column after the hidden one.
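Whether the column being refilled can be seen depends on the page width and on how much of each screen edge is masked. The hypothetical checker below assumes a 256-pixel-wide screen of 8-pixel tiles and a mask of equal width on both edges (e.g. built from tile layer 2). It tests two cases: with two 32-column pages (64 columns), the column just past the partially visible right-hand column stays hidden for all eight fine-scroll steps; with a single 32-column page, the refilled column only stays hidden if both edges are masked.

```c
#include <assert.h>
#include <stdbool.h>

#define SCREEN_PX 256  /* visible width in pixels */
#define TILE_PX 8

/* True if name-table column `col` has any unmasked pixel on screen when the
   playfield is scrolled `pos` pixels to the right. The page is `page_cols`
   columns wide and wraps; `mask_px` pixels are hidden at each screen edge. */
bool column_visible(int col, int pos, int page_cols, int mask_px)
{
    for (int sx = mask_px; sx < SCREEN_PX - mask_px; sx++) {
        if (((pos + sx) / TILE_PX) % page_cols == col)
            return true;
    }
    return false;
}

/* Column to refill (3 rows per frame over 8 frames) while the window's left
   edge sits at tile `tile` and the fine scroll runs 0..7. */
int refill_column(int tile, int page_cols)
{
    assert(page_cols == 32 || page_cols == 64);
    if (page_cols == 64)
        return (tile + 33) % 64;  /* just past the partially visible right column */
    return tile % 32;             /* one page: wraps onto the masked edges */
}

/* Check that the refill column stays hidden for every scroll position. */
bool refill_always_hidden(int page_cols, int mask_px, int map_tiles)
{
    for (int tile = 0; tile < map_tiles; tile++)
        for (int fine = 0; fine < TILE_PX; fine++)
            if (column_visible(refill_column(tile, page_cols),
                               tile * TILE_PX + fine, page_cols, mask_px))
                return false;
    return true;
}
```

For example, `refill_always_hidden(64, 0, 200)` and `refill_always_hidden(32, 8, 200)` both hold, while an unmasked single page (`refill_always_hidden(32, 0, 200)`) does not.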

With regards to non-scrolling regions, I seem to remember that the F18A had scroll lock registers you could use to keep certain screen areas from scrolling, but I can't find them in the register spreadsheet. I'm probably misremembering, though...

Asmusr · May 24, 2015

Matthew has replaced them by tile layer 2 because they didn't really provide the intended functionality.

matthew180 · May 27, 2015

@Rasmus: Thanks for another awesome demo!

@Mole: I spent a lot of time considering options about where to get incoming scroll data. Using a single 32-byte buffer was considered, but ultimately I borrowed the method used by the NES. I figured if it worked for Nintendo then it would be good for the F18A too. Also, since people know how the NES does scrolling, it would feel familiar to them and maybe help get more people interested in the 99/4A. The incoming data for scrolling has to come from somewhere, and with two pages you can update less frequently. Alternatively, you could set up two pages but only use one column from the second page (although I do realize this is one byte in every 32, which makes using the rest of the page for generic data more difficult). But for vertical scrolling this does work out as a single row of 32 consecutive bytes.

But using two pages lets you scroll left and right more easily, since you effectively have a 64x24 (or 64x30) play field with a 32x24 window that you can move on a pixel basis. When it is time to update the name tables for horizontal tile shifting, you can use the GPU's DMA to get the job done in a few microseconds, or take advantage of the F18A's ability to increment the VRAM address counter by values other than 1, e.g. incrementing by 32 to write vertical bytes as efficiently as horizontal ones.

I only recently learned that the Master System was based on the 9918A, and I might have done things differently had I known.

The scroll lock registers are gone, they proved to be useless in practice and the new TL2 (tile layer 2) gives you way more flexibility.
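Why an increment of 32 helps: in a 32-column name table, the 24 bytes of one column sit exactly 32 bytes apart, so a single address setup can cover a whole column. The sketch below also assumes, hypothetically, that the right-hand page's 768-byte name table directly follows the left-hand one in VRAM; check the register spreadsheet for the actual two-page layout. `tile_addr` and `column_addresses` are illustrative names, not F18A APIs.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_COLS 32
#define PAGE_ROWS 24
#define PAGE_BYTES (PAGE_COLS * PAGE_ROWS) /* 768 */

/* VRAM address of tile (col, row) in a 64x24 play field made of two pages,
   assuming page 1 starts at base + PAGE_BYTES (an assumption). */
uint16_t tile_addr(uint16_t base, int col, int row)
{
    assert(col >= 0 && col < 2 * PAGE_COLS && row >= 0 && row < PAGE_ROWS);
    int page = col / PAGE_COLS;
    return (uint16_t)(base + page * PAGE_BYTES + row * PAGE_COLS + col % PAGE_COLS);
}

/* Addresses the VDP steps through when one column is written after a single
   address setup, with the address counter advancing by `increment` per byte. */
void column_addresses(uint16_t base, int col, int increment, uint16_t out[PAGE_ROWS])
{
    uint16_t addr = tile_addr(base, col, 0);
    for (int row = 0; row < PAGE_ROWS; row++) {
        out[row] = addr;
        addr = (uint16_t)(addr + increment);
    }
}
```

With an increment of 32, every generated address matches `tile_addr` for the same column; with the default increment of 1, only the first one does, which is why a column copy would otherwise need a new address setup per byte.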

TheMole · May 27, 2015

Yeah, the SMS VDP is an elegant extension to the 9918A, but it doesn't have anywhere near the features the F18A has (even ignoring the GPU), and like you said, one-page scrolling can be approximated in a number of ways, so I'm not too worried about that. Losing the scroll lock registers is fine as well, since for most of my scenarios you can just as easily use the scanline interrupt to set the scroll registers after the locked area has been drawn to the screen... (well, if it's a horizontal locked area, at least).
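The scanline-interrupt replacement for a scroll lock can be modelled as a per-line table of horizontal scroll values: lines in a fixed status area at the top keep a scroll of 0, and every line below gets the game's scroll value. This is only a sketch of the values involved; `build_split_table` is a hypothetical name, and the exact line on which the F18A raises the interrupt relative to when a new register value takes effect is not modelled.

```c
#include <assert.h>

#define SCREEN_LINES 192

/* Fill table[line] with the horizontal scroll in effect on each scanline when
   the top `status_lines` lines must stay still. Conceptually, the interrupt
   handler loads scroll_x at line `status_lines`, and 0 is restored before the
   next frame starts drawing. */
void build_split_table(int table[SCREEN_LINES], int status_lines, int scroll_x)
{
    assert(status_lines >= 0 && status_lines <= SCREEN_LINES);
    for (int line = 0; line < SCREEN_LINES; line++)
        table[line] = line < status_lines ? 0 : scroll_x;
}
```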

Asmusr · May 27, 2015

Yep. In fact I'm working on a little demo where I'm scrolling each scanline independently.

TheMole · May 27, 2015

Cool! Sine-wave "water", or parallax scrolling effect?

Willsy · May 27, 2015

I think I'm having a continence problem!

+OLD CS1 · May 27, 2015

The absolute top feature of the 1.6 firmware is the ability to add a second layer of tiles (aka patterns or characters) on top of the normal Graphics I tile layer. The new layer has its own name table but shares its patterns with tile layer 1. Sprites can be placed anywhere below, above or in between the two tile layers. What's more, each tile layer has its own set of pixel-smooth hardware scroll registers. This provides almost unlimited potential for what we could do with a TI-99/4A.

Check out the attached demo (it needs to run on an F18A-equipped console or on JS99er.net). Here I have ripped some graphics from the arcade game Bomb Jack using MAME and converted them to TI format using Magellan.

The background is displayed in tile layer 1 while the foreground with the platforms is tile layer 2. You can change between foreground and background layers using keys 1-9 and turn off the foreground using 0. 1-3 are displayed on the same background and so are 4-6 and 7-9.

If you had to make different patterns for each combination of foreground and background the 256 characters would soon run out, but with two tile layers we have no such problems.
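The pattern-count argument can be made concrete. With a single layer, every distinct (background, foreground) pair that appears on screen needs its own combined pattern. With two layers sharing one pattern table, you only need the distinct tiles used by either layer. The sketch treats colour 0 in tile layer 2 as transparent when compositing; that, the function names, and the tiny example maps are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TILES 256

/* Pixel seen on screen: tile layer 2 drawn over tile layer 1, with colour 0
   treated as transparent (an assumption for this sketch). */
int composite_pixel(int tl1_color, int tl2_color)
{
    return tl2_color != 0 ? tl2_color : tl1_color;
}

/* Patterns needed if both layers were merged into one: one per distinct
   (bg, fg) pair across all cells (simple O(n^2) scan). */
int single_layer_patterns(const unsigned char *bg, const unsigned char *fg, int cells)
{
    assert(cells >= 0);
    int count = 0;
    for (int i = 0; i < cells; i++) {
        bool first = true;
        for (int j = 0; j < i && first; j++)
            if (bg[j] == bg[i] && fg[j] == fg[i])
                first = false;
        if (first)
            count++;
    }
    return count;
}

/* Patterns needed with two layers sharing one pattern table: the distinct
   tile indices used by either layer. */
int two_layer_patterns(const unsigned char *bg, const unsigned char *fg, int cells)
{
    assert(cells >= 0);
    bool used[MAX_TILES] = { false };
    int count = 0;
    for (int i = 0; i < cells; i++) {
        if (!used[bg[i]]) { used[bg[i]] = true; count++; }
        if (!used[fg[i]]) { used[fg[i]] = true; count++; }
    }
    return count;
}
```

For three background tiles overlapped by three foreground tiles in every combination, a single layer needs 9 combined patterns while two layers need only 6, and the gap widens quickly as the variety grows.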

Just to show off the scrolling capabilities, you can also scroll the top layer using the joystick. Imagine what we can do with this: graphical adventures like Leisure Suit Larry, platform games like Mario Bros., etc.

With the F18A it's really easy, all you need to do is to set a few VDP registers. Look in the register spreadsheet for instructions, and please ask if you have any questions.

Okay, so I clicked the "Like" button, but where is the "holy shit" button?

+OLD CS1 · May 27, 2015

.

Asmusr · May 27, 2015

I have tried both but the former is far more interesting. This is really funny: Try to run E/A#3 SLSSIN from the attached disk, then run a game (I can recommend Donkey Kong).

Sorry Mario!

Edit: Also works in js99er.net. Run from Software/Apps/XB 2.7 Suite so you can easily run games after the "effect" is enabled (press Space for games menu in XB27 cart).

SCANLINE.dsk
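A sine-wave "water" effect comes down to giving each scanline its own horizontal scroll value and shifting the wave's phase every frame. The sketch below is not the code on the disk. It uses a hard-coded 16-entry sine table with an amplitude of 4 pixels (values are round(4 * sin(2*pi*k/16))) so no floating point is needed, and `wave_scroll` is a hypothetical name.

```c
#include <assert.h>

#define SCREEN_LINES 192
#define WAVE_STEPS 16

/* round(4 * sin(2*pi*k/16)) for k = 0..15 */
static const int SINE4[WAVE_STEPS] = {
    0, 2, 3, 4, 4, 4, 3, 2, 0, -2, -3, -4, -4, -4, -3, -2
};

/* Horizontal scroll for `line` in `frame`: the base scroll plus a wave that
   advances one table step every `lines_per_step` scanlines and drifts by one
   step per frame. The result wraps into 0..255 (one 256-pixel page). */
int wave_scroll(int line, int frame, int base_scroll, int lines_per_step)
{
    assert(line >= 0 && line < SCREEN_LINES);
    assert(frame >= 0 && lines_per_step > 0);
    int step = (line / lines_per_step + frame) % WAVE_STEPS;
    int x = base_scroll + SINE4[step];
    return ((x % 256) + 256) % 256;
}
```

Each scanline interrupt would then load `wave_scroll(line, frame, ...)` into the horizontal scroll register for the next line.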

matthew180 · May 27, 2015

Haha, you guys are nuts. :-) I think you do this stuff just because you can. :-)

TheMole · June 3, 2015

You could run an assembler on the F18A, but you would have to write it from scratch probably. I don't think the E/A could be fixed up to run on the GPU.

As for loading GPU code, you assemble using any 99/4A compatible assembler (E/A, asm994a, etc.), but you have to use AORG with a VRAM value from >0000 to >47FE, just like you do for cartridge development (except the cart uses AORG >6000 and is in host CPU RAM, not VDP VRAM). Once you have the opcodes, you have to include them in an E/A loadable assembly program, or load the data from disk at runtime. It gets a little tedious I know, and eventually I hope to have a few tools to help with development.
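Getting the code into VRAM from the host uses the ordinary 9918A port protocol, the same one the VWAD, VRAD and VWTR routines in the host program below implement: an address or register write is two bytes sent to the write-address port (>8C02), low byte first, with the top two bits of the second byte selecting the operation. The C sketch below only computes those byte pairs; the function names are hypothetical. The GPU start uses VR54/VR55 (>36/>37) for the high and low bytes of the start address, as in the host program, whose comment says that setting the PC this way also starts the GPU.

```c
#include <assert.h>
#include <stdint.h>

/* The two bytes sent to the write-address port (>8C02), in order. */
typedef struct { uint8_t first, second; } PortPair;

/* VRAM write address: low byte, then high byte with 01 in the top two bits. */
PortPair vram_write_setup(uint16_t addr)
{
    PortPair p = { (uint8_t)(addr & 0xFF), (uint8_t)(((addr >> 8) & 0x3F) | 0x40) };
    return p;
}

/* VRAM read address: the top two bits are 00. */
PortPair vram_read_setup(uint16_t addr)
{
    PortPair p = { (uint8_t)(addr & 0xFF), (uint8_t)((addr >> 8) & 0x3F) };
    return p;
}

/* Register write: value first, then the register number with 10 on top. */
PortPair reg_write(uint8_t reg, uint8_t value)
{
    PortPair p = { value, (uint8_t)(0x80 | (reg & 0x3F)) };
    return p;
}

/* GPU start address: high byte to VR54, then low byte to VR55. */
void gpu_start(uint16_t pc, PortPair out[2])
{
    assert(pc % 2 == 0); /* 9900 instructions are word aligned */
    out[0] = reg_write(54, (uint8_t)(pc >> 8));
    out[1] = reg_write(55, (uint8_t)(pc & 0xFF));
}
```

For example, `vram_write_setup(0x3F10)` gives >10 then >7F, and `gpu_start(0x3F12, ...)` produces the same register writes as `LI R0,>363F` and `LI R0,>3712` followed by VWTR.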

Using the branch instructions is just like using any other instructions; you use them as you would in any assembly program. With AORG, all the references are resolved at assembly time, and the code will only run correctly if loaded at the specified address.

I did a lot of this kind of code when testing the F18A. Here is an example of the GPU and host (99/4A) programs I used to validate the GPU's jump instructions.

This is the GPU code, i.e. to be included in the host assembly program and loaded to VRAM for execution by the GPU:

* F18A GPU Test
* Matthew Hagerty
* June 13, 2012
*
* Test jump instructions

       DEF MAIN
       AORG >3F10
MAIN   IDLE
       MOV  @>3F00,@JINST     * Jump opcode to execute at >3F00, >3F01
       CLR  R0
       LI   R14,JINST
       MOV  @>3F02,R15        * Flag value at >3F02, >3F03
       RTWP                   * R14->PC, R15->status flags
JINST  DATA 0                 * Replaced by opcode
       INC  R0                * Only executed if jump falls through
       MOV  R0,@>3F00         * R0 result in >3F00, >3F01
       B    @MAIN
       END

I take the listing and convert the opcodes to DATA statements to be included in the host assembly program:

Asm994a TMS99000 Assembler - v3.010

                * Asm994a Generated Register Equates
                *
      0000 0000 R0      EQU     0 
      0000 0001 R1      EQU     1 
      0000 0002 R2      EQU     2 
      0000 0003 R3      EQU     3 
      0000 0004 R4      EQU     4 
      0000 0005 R5      EQU     5 
      0000 0006 R6      EQU     6 
      0000 0007 R7      EQU     7 
      0000 0008 R8      EQU     8 
      0000 0009 R9      EQU     9 
      0000 000A R10     EQU     10
      0000 000B R11     EQU     11
      0000 000C R12     EQU     12
      0000 000D R13     EQU     13
      0000 000E R14     EQU     14
      0000 000F R15     EQU     15
                *
   1            * F18A GPU Test
   2            * Matthew Hagerty
   3            * June 13, 2012
   4            *
   5            * Test jump instructions
   6            
   7  0000 3F10        DEF MAIN
   8                   AORG >3F10
   9  3F10 0340 MAIN   IDLE
  10  3F12 C820        MOV  @>3F00,@JINST     * Jump opcode to execute at >3F00, >3F01
  10  3F14 3F00  
  10  3F16 3F24  
  11  3F18 04C0        CLR  R0
  12  3F1A 020E        LI   R14,JINST
  12  3F1C 3F24  
  13  3F1E C3E0        MOV  @>3F02,R15        * Flag value at >3F02, >3F03
  13  3F20 3F02  
  14  3F22 0380        RTWP                   * R14->PC, R15->status flags
  15  3F24 0000 JINST  DATA 0                 * Replaced by opcode
  16  3F26 0580        INC  R0                * Only executed if jump falls through
  17  3F28 C800        MOV  R0,@>3F00         * R0 result in >3F00, >3F01
  17  3F2A 3F00  
  18  3F2C 0460        B    @MAIN
  18  3F2E 3F10  
  19  3F30 0000        END
  19            


 Assembly Complete - Errors: 0,  Warnings: 0


 ------ Symbol Listing ------

 JINST  ABS:3F24 JINST
 MAIN   ABS:3F10 MAIN
 R0     ABS:0000 R0
 R1     ABS:0001 R1
 R10    ABS:000A R10
 R11    ABS:000B R11
 R12    ABS:000C R12
 R13    ABS:000D R13
 R14    ABS:000E R14
 R15    ABS:000F R15
 R2     ABS:0002 R2
 R3     ABS:0003 R3
 R4     ABS:0004 R4
 R5     ABS:0005 R5
 R6     ABS:0006 R6
 R7     ABS:0007 R7
 R8     ABS:0008 R8
 R9     ABS:0009 R9
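The step of turning the listing into DATA statements can be automated. The sketch below is a hypothetical helper, not one of the tools mentioned in the thread. It relies on the Asm994a listing layout shown above (source line number, address, code word, source text) and only emits words whose address falls in a given range, so the DEF and END lines, equates, and the symbol listing are skipped.

```c
#include <assert.h>
#include <stdio.h>

/* Convert one listing line into a DATA statement if it carries a code word
   whose address lies in [start, end). Returns 1 and fills `out` on success,
   0 for comments, equates, symbol listings and out-of-range words. */
int listing_to_data(const char *line, unsigned start, unsigned end,
                    char *out, size_t outlen)
{
    assert(line != NULL && out != NULL && outlen > 0);
    int source_line;
    unsigned addr, word;

    /* Needs all three fields: line number, hex address, hex word. */
    if (sscanf(line, "%d %x %x", &source_line, &addr, &word) != 3)
        return 0;
    if (addr < start || addr >= end)
        return 0;
    snprintf(out, outlen, "       DATA >%04X", word);
    return 1;
}
```

Feeding each listing line through `listing_to_data(line, 0x3F10, 0x3F30, buf, sizeof buf)` reproduces the DATA block between the GPU and GPUEND labels in the host program.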

Here is the host-side assembly with the GPU program included as data. It will be copied to the VRAM at the location used in the AORG, which is >3F10 in this case. Once in VRAM, the GPU can execute the code.

* F18A CPU to GPU jump instruction test driver
* Matthew Hagerty
* June 4, 2012
*
* 99/4A driver for the GPU jump instruction test.
* Each jump instruction is executed, then the GPU executes
* the same jump instruction and the jump / no-jump result
* is compared.

       DEF MAIN

* VDP Memory Map
*
VDPRD  EQU  >8800             * VDP read data
VDPSTA EQU  >8802             * VDP status
VDPWD  EQU  >8C00             * VDP write data
VDPWA  EQU  >8C02             * VDP set read/write address

* Workspace
*
WRKSP  EQU  >8300             * Workspace
R0LB   EQU  WRKSP+1           * R0 low byte reqd for VDP routines
R1LB   EQU  WRKSP+3           * R1 low byte
R2LB   EQU  WRKSP+5           * R2 low byte
R3LB   EQU  WRKSP+7           * R3 low byte
R4LB   EQU  WRKSP+9           * R4 low byte
R5LB   EQU  WRKSP+11          * R5 low byte
R6LB   EQU  WRKSP+13          * R6 low byte
* R5  R6  R7  R8  R9  R10 R11 R12 R13 R14 R15
* 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31

SAVR0  DATA 0
PASFAL DATA 0                 * Pass/fail flag
TXTPAS TEXT 'pass'
HEX    TEXT '0123456789ABCDEF'
SPACE  BYTE 32                * space character

       EVEN
GPU
       DATA >0340             * 3F10 0340 MAIN   IDLE
       DATA >C820             * 3F12 C820        MOV  @>3F00,@JINST     * Jump opcode to execute at >3F00, >3F01
       DATA >3F00             * 3F14 3F00
       DATA >3F24             * 3F16 3F24
       DATA >04C0             * 3F18 04C0        CLR  R0
       DATA >020E             * 3F1A 020E        LI   R14,JINST
       DATA >3F24             * 3F1C 3F24
       DATA >C3E0             * 3F1E C3E0        MOV  @>3F02,R15        * Flag value at >3F02, >3F03
       DATA >3F02             * 3F20 3F02
       DATA >0380             * 3F22 0380        RTWP                   * R14->PC, R15->status flags
       DATA >0000             * 3F24 0000 JINST  DATA 0                 * Replaced by opcode
       DATA >0580             * 3F26 0580        INC  R0                * Only executed if jump falls through
       DATA >C800             * 3F28 C800        MOV  R0,@>3F00         * R0 result in >3F00, >3F01
       DATA >3F00             * 3F2A 3F00
       DATA >0460             * 3F2C 0460        B    @MAIN
       DATA >3F10             * 3F2E 3F10
GPUEND

MAIN
       LIMI 0
       LWPI WRKSP

*      F18A blind unlock, no testing for success or failure.
*      Perform the Enhance Register Mode (ERM) unlock sequence
*      for the F18A.
       LI   R0,>391C          * VR1/57, value 00011100
       BL   @VWTR             * Write once
       BL   @VWTR             * Write twice, unlock
       LI   R0,>01E0          * VR1, value 11100000, a real sane setting
       BL   @VWTR             * Write reg

*      Copy GPU code to VRAM
       LI   R0,>3F10
       LI   R1,GPU
       LI   R2,GPUEND-GPU
       BL   @VMBW


       LI   R0,33             * 1,1 screen location to start output
       LI   R7,33             * Screen next row
       LI   R3,JLIST
       LI   R13,WRKSP         * Used in the RTWP instruction and will never change
       LI   R14,JINST

*      Instruction loop
ILOOP
       BL   @VWAD
       MOVB *R3+,@VDPWD       * Display the instruction name
       MOVB *R3+,@VDPWD
       MOVB *R3+,@VDPWD
       MOVB *R3+,@VDPWD
       AI   R0,5              * Adjust past the name and add a space

       MOV  *R3,@JINST        * Copy the jump opcode for execution
       CLR  R15               * Reset the count
       CLR  @PASFAL           * Clear the pass/fail flag

CLOOP  CLR  R5
       RTWP                   * R13->WP, R14->PC, R15->status flags
JINST  DATA 0                 * Replaced with jump instruction opcode
       INC  R5

*      Copy to VRAM for GPU
       MOV  R0,@SAVR0         * Temp save R0
       LI   R0,>3F00
       BL   @VWAD

       MOV  *R3,R0
       MOVB R0,@VDPWD         * Jump opcode >3F00, >3F01
       MOVB @R0LB,@VDPWD

       MOV  R15,R0
       MOVB R0,@VDPWD         * Current flag to >3F02, >3F03
       MOVB @R0LB,@VDPWD

*      Set the GPU PC which also triggers it
       LI   R0,>363F
       BL   @VWTR
       LI   R0,>3712
       BL   @VWTR

*      Compare the result in >3F01
       LI   R0,>3F01
       BL   @VRAD
       MOV  @SAVR0,R0         * Restore R0

       CB   @VDPRD,@R5LB
       JEQ  CNEXT

*      Test failed, display the value that failed
       INC  @PASFAL
       BL   @HEXDMP
       JMP  BAIL              * Skip the rest of the flags

CNEXT
       AI   R15,>0400         * Increment the flags
       JNE  CLOOP

BAIL
*      Check if test passed (nothing has been displayed yet)
       MOV  @PASFAL,@PASFAL   * Compare to 0
       JNE  JNEXT             * If there was a failure, do not display 'PASS'

*      If the flags failed, display 'flag', otherwise 'pass'
       LI   R1,TXTPAS
       LI   R2,4              * R0 is already set up
       BL   @VMBW

JNEXT
       AI   R7,32             * Next screen row
       MOV  R7,R0

       INCT R3
       MOV  *R3,R4            * Next instruction
       JEQ  DONE
       B    @ILOOP

DONE   JMP  DONE


JLIST  TEXT 'JEQ '
       JEQ  $+4
       TEXT 'JGT '
       JGT  $+4
       TEXT 'JH  '
       JH   $+4
       TEXT 'JHE '
       JHE  $+4
       TEXT 'JL  '
       JL   $+4
       TEXT 'JLE '
       JLE  $+4
       TEXT 'JLT '
       JLT  $+4
       TEXT 'JMP '
       JMP  $+4
       TEXT 'JNC '
       JNC  $+4
       TEXT 'JNE '
       JNE  $+4
       TEXT 'JNO '
       JNO  $+4
       TEXT 'JOC '
       JOC  $+4
       TEXT 'JOP '
       JOP  $+4
       DATA 0


**
* Display R15 as a hex number at screen location in R0
* Uses R5
*
HEXDMP
       MOV  R11,R5            * Save return address
       BL   @VWAD
       MOV  R5,R11            * Restore return address

       MOV  R15,R5
       ANDI R5,>F000          * Isolate the first digit
       SRL  R5,12             * Convert to a number
       MOVB @HEX(R5),@VDPWD   * Convert to ASCII and write to the screen

       MOV  R15,R5
       ANDI R5,>0F00
       SRL  R5,8
       MOVB @HEX(R5),@VDPWD

       MOV  R15,R5
       ANDI R5,>00F0
       SRL  R5,4
       MOVB @HEX(R5),@VDPWD

       MOV  R15,R5
       ANDI R5,>000F
       MOVB @HEX(R5),@VDPWD

       MOVB @SPACE,@VDPWD
       AI   R0,5              * 4 digits plus a space

       B    *R11
*// HEXDMP


*********************************************************************
*
* VDP Set Write Address
*
* R0   Address to set VDP address counter to
*
VWAD   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
       ORI  R0,>4000          * Set the two MSbits to 01 for write
       MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
       ANDI R0,>3FFF          * Restore R0 top two MSbits
       B    *R11
*// VWAD / VRAD


*********************************************************************
*
* VDP Set Read Address
*
* R0   Address to set VDP address counter to
*
VRAD   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM read address
       ANDI R0,>3FFF          * Make sure the two MSbits are 00 for read
       MOVB R0,@VDPWA         * Send high byte of VDP RAM read address
       B    *R11
*// VRAD


*********************************************************************
*
* VDP Multiple Byte Write
*
* R0   Starting write address in VDP RAM
* R1   Starting read address in CPU RAM
* R2   Number of bytes to send to the VDP RAM
*
* R1 is modified by the value of R2
* R2 is changed to 0
*
VMBW   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
       ORI  R0,>4000          * Set the two MSbits to 01 for write
       MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
VMBWLP MOVB *R1+,@VDPWD       * Write byte to VDP RAM
       DEC  R2                * Byte counter
       JNE  VMBWLP            * Check if done
       ANDI R0,>3FFF          * Restore R0 top two MSbits
       B    *R11
*// VMBW


*********************************************************************
*
* VDP Write To Register
*
* R0 MSB    VDP register to write to
* R0 LSB    Value to write
*
VWTR   MOVB @R0LB,@VDPWA      * Send low byte (value) to write to VDP register
       ORI  R0,>8000          * Set up a VDP register write operation (10)
       MOVB R0,@VDPWA         * Send high byte (address) of VDP register
       ANDI R0,>3FFF          * Restore R0 top two MSbits
       B    *R11
*// VWTR

       END
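The test compares the GPU against the host 9900 for every combination of the six status flags the jump instructions look at (stepping R15 by >0400 until it wraps to zero). A reference model of the expected outcomes, following the jump conditions in the TMS9900 data manual, can be sketched in C; the function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* TMS9900 status register bits tested by the jump instructions. */
#define ST_LGT 0x8000 /* logical greater than */
#define ST_AGT 0x4000 /* arithmetic greater than */
#define ST_EQ  0x2000 /* equal */
#define ST_C   0x1000 /* carry */
#define ST_OV  0x0800 /* overflow */
#define ST_OP  0x0400 /* odd parity */

/* 1 if the jump is taken for status `st`, 0 if it falls through,
   -1 for an unknown mnemonic. */
int jump_taken(const char *mnemonic, uint16_t st)
{
    assert(mnemonic != NULL);
    bool lgt = st & ST_LGT, agt = st & ST_AGT, eq = st & ST_EQ;
    bool c = st & ST_C, ov = st & ST_OV, op = st & ST_OP;

    if (!strcmp(mnemonic, "JEQ")) return eq;
    if (!strcmp(mnemonic, "JGT")) return agt;
    if (!strcmp(mnemonic, "JH"))  return lgt && !eq;
    if (!strcmp(mnemonic, "JHE")) return lgt || eq;
    if (!strcmp(mnemonic, "JL"))  return !lgt && !eq;
    if (!strcmp(mnemonic, "JLE")) return !lgt || eq;
    if (!strcmp(mnemonic, "JLT")) return !agt && !eq;
    if (!strcmp(mnemonic, "JMP")) return 1;
    if (!strcmp(mnemonic, "JNC")) return !c;
    if (!strcmp(mnemonic, "JNE")) return !eq;
    if (!strcmp(mnemonic, "JNO")) return !ov;
    if (!strcmp(mnemonic, "JOC")) return c;
    if (!strcmp(mnemonic, "JOP")) return op;
    return -1;
}

/* Count taken jumps over all 64 flag combinations, stepping the status the
   way the host test does (AI R15,>0400 until the value wraps to 0). */
int taken_count(const char *mnemonic)
{
    int taken = 0;
    uint16_t st = 0;
    do {
        taken += jump_taken(mnemonic, st) == 1;
        st = (uint16_t)(st + 0x0400);
    } while (st != 0);
    return taken;
}
```

In the test itself, "taken" shows up as R5 (or the GPU's R0) staying 0, because the INC after the jump is skipped.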

I wonder if we can let gcc output code that can run on the GPU?

matthew180 · June 3, 2015

Sure, I don't see why not. The F18A's 9900-based GPU has a few differences from the real 9900 CPU though, so GCC needs to avoid the unimplemented instructions.

TheMole · June 4, 2015

These are the instructions related to the workspace pointer, XOP and X, right?

matthew180 · June 4, 2015

Copied from the F18A register use spreadsheet:

New or modified instructions for the F18A 9900-based GPU

Inst Opcode Addressing               New name                
CALL >0C80  0000 1100 10Ts SSSS  new CALL     Call subroutine, push return address on stack (stack pointer is R15)
RET  >0C00  0000 1100 0000 0000  new RET      Return from subroutine, pop return address from stack (stack pointer is R15)
PUSH >0D00  0000 1101 00Ts SSSS  new PUSH     Push a 16-bit word onto the stack
POP  >0F00  0000 1111 00Td DDDD  new POP      Pop a 16-bit word off of the stack
SLC  >0E00  0000 1110 00Ts SSSS  new SLC      Shift Left Circular
CKON >03A0                       mod SPI_EN   Sets the chip enable line to the SPI Flash ROM low (enables the ROM)
CKOF >03C0                       mod SPI_DS   Sets the chip enable line to the SPI Flash ROM high (disables the ROM)
IDLE >0340                       mod IDLE     Forces the GPU state machine to the idle state, restart with VR56 trigger
LDCR >3000                       mod SPI_OUT  Writes a byte (always a byte operation) to the SPI Flash ROM
STCR >3400                       mod SPI_IN   Reads a byte (always a byte operation) from the SPI Flash ROM
RTWP >0380                       mod RTWP     Does not use R13, only performs R14->PC, R15->status flags
XOP  >2C00                       mod PIX      New dedicated pixel plotting and addressing instruction

Unimplemented instructions
SBO
SBZ
TB
BLWP
STWP
LWPI
LIMI
RSET
LREX

The GPU does not have a workspace pointer because the registers are real, i.e. built in to the GPU, so instructions like BLWP, STWP, etc. don't apply. There is no CRU either, so the CRU specific instructions are not implemented other than the ones with modified behavior as specified above. The GPU does not have 9900-like vectored interrupts, thus the LIMI instruction is irrelevant.

All the other instructions are implemented and regression tested against the real 9900.
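Before assembling code for the GPU, it can help to scan the source for the unimplemented instructions listed above. The sketch below is a hypothetical checker: it assumes the usual 9900 source layout (labels in column 1, comment lines starting with `*`) and only flags instructions that are missing. Instructions with modified behaviour (RTWP, XOP, LDCR, STCR, etc.) still assemble and are not flagged, so their new semantics have to be checked by hand.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Instructions the F18A GPU does not implement, per the list above. */
static const char *const UNIMPLEMENTED[] = {
    "SBO", "SBZ", "TB", "BLWP", "STWP", "LWPI", "LIMI", "RSET", "LREX"
};

/* Copy the mnemonic of a source line into `out` (up to 7 chars, upper case).
   Returns false for comment lines and lines without a mnemonic. */
static bool mnemonic_of(const char *line, char out[8])
{
    if (line[0] == '*' || line[0] == '\0')
        return false;
    const char *p = line;
    if (!isspace((unsigned char)*p))          /* skip a label in column 1 */
        while (*p && !isspace((unsigned char)*p)) p++;
    while (*p && isspace((unsigned char)*p)) p++;
    int n = 0;
    while (*p && !isspace((unsigned char)*p) && n < 7)
        out[n++] = (char)toupper((unsigned char)*p++);
    out[n] = '\0';
    return n > 0;
}

/* True if the line uses an instruction the GPU cannot execute. */
bool gpu_unsupported(const char *line)
{
    assert(line != NULL);
    char m[8];
    if (!mnemonic_of(line, m))
        return false;
    for (size_t i = 0; i < sizeof UNIMPLEMENTED / sizeof UNIMPLEMENTED[0]; i++)
        if (strcmp(m, UNIMPLEMENTED[i]) == 0)
            return true;
    return false;
}
```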

GCC might actually benefit from a real stack.

TheMole · June 4, 2015

Certainly, but that's way beyond my skillset. I'm trying to coerce Insomnia's port to create binaries that can run on the GPU. So far no luck, but I'm sure it should be possible (gcc will never emit LIMI or CRU instructions itself, and a cursory look at the assembly generated for any of my projects reveals no BLWP, STWP, etc. instructions). I'm currently trying to figure out the right crt0 setup and linking requirements.

TheMole · June 12, 2015

For those that are interested, it is perfectly possible to use gcc to create GPU binaries (at least in the few simple cases that I've tested).

Suppose your GPU code resides in a source file called gpucode.c, you can use the following commands to get the pure binary that you can upload to the GPU:

tms9900-gcc -std=c99 -Werror -Wall -c gpucode.c -Os -s -o gpucode.o
tms9900-ld gpucode.o --section-start .text=0x1900 -o gpucode.elf
tms9900-objcopy -O binary gpucode.elf gpucode.bin

The resulting gpucode.bin file is a raw binary dump that can then be uploaded to VRAM at location >1900 (or whatever you defined with the --section-start parameter).

For more complex stuff (such as when using global variables, or the heap), you need to objcopy some other sections as well, but you can easily work around the need for that in most cases by addressing VRAM directly instead of working with variables.

Asmusr · June 13, 2015

Cool. Perhaps we will see some GPU powered raycasting soon?

TheMole · June 14, 2015

Dunno about soon, but it sure is on my list of things to do
