Imperious, posted May 23, 2015 (edited):
That Bombjack demo has arcade-quality graphics coming from my TI-99/4A! Considering that, and the fact that you somehow got SID-quality sound out of my TI in the Light Year demo I loaded up recently, the future looks very bright indeed. If this continues, the old TI will be getting a lot more attention from the retro gaming community. I wonder if ColecoVision and MSX users are doing anything with the F18A?
Edited May 23, 2015 by Imperious
TheMole, posted May 23, 2015:
Very nice!
Quote (Asmusr): please ask if you have any questions.
Well, if you insist. When scrolling horizontally using just one page, does the first/last visible column have to visibly wrap at the screen edges, or can a scroll page be one column wider (and/or one row higher) than what is visible on the screen?
Asmusr, posted May 23, 2015:
Quote (TheMole): When scrolling horizontally using just one page, does the first/last visible column have to visibly wrap at the screen edges, or ... ?
It is not possible to make the scroll page just one column wider than the visible region. You can either make it one full screen wider, or use tile layer 2 to mask the edges of a single scroll page.
TheMole, posted May 23, 2015 (edited):
Quote (Asmusr): It is not possible to make the scroll page just one column wider than the visible region. You can either make it one full screen wider or you can use tile layer 2 to mask the edges of a single scroll page.
That's a shame; it would make scrolling much cheaper as far as VRAM is concerned. A better workaround might be the bitmap layer, which would only eat up 192 bytes of VRAM (in 1bpp color mode), instead of a second tile layer or a second page (both of which need at least 768 bytes).
*edit* I just noticed that the BML doesn't have a 1bpp mode, but at 384 bytes it's still less than a full name table. Sprites would be even better, though: you'd need 48 bytes in the sprite attribute table (12 16x16 sprites) and, worst case, 96 bytes (in ECM3) to define the needed 16x16 pattern in the sprite generator table, totalling 144 bytes. Of course, you'd be left with only 20 sprites then, but depending on the game that might not be a problem. Not that bad really, now that I think of it.
Edited May 23, 2015 by TheMole
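The byte counts in TheMole's comparison are easy to check with a little arithmetic. The C sketch below is only a calculator for those figures: the helper names are hypothetical, and it assumes an 8-pixel-wide mask strip over 192 scanlines, 4 attribute bytes per sprite, and 32 bytes per 16x16 pattern plane (3 planes in ECM3), as implied by the post.

```c
#include <assert.h>

/* Hypothetical helpers reproducing the VRAM figures from the post. */

/* A full name table: one byte per tile position. */
static unsigned name_table_bytes(unsigned cols, unsigned rows)
{
    return cols * rows;
}

/* An 8-pixel-wide bitmap strip, `lines` tall, at `bpp` bits per pixel. */
static unsigned bitmap_strip_bytes(unsigned lines, unsigned bpp)
{
    return 8u * lines * bpp / 8u;
}

/* A column of 16x16 sprites covering `lines` scanlines, all sharing one
   pattern: 4 attribute bytes per sprite plus 32 bytes per pattern plane. */
static unsigned sprite_column_bytes(unsigned lines, unsigned planes)
{
    assert(planes >= 1u && planes <= 3u);   /* ECM3 uses 3 planes */
    unsigned sprites = (lines + 15u) / 16u; /* 192 lines -> 12 sprites */
    return sprites * 4u + 32u * planes;
}
```

For example, `sprite_column_bytes(192, 3)` gives the 144 bytes quoted, against `name_table_bytes(32, 24)`, which is 768 bytes for a second page.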
Asmusr, posted May 23, 2015:
There is an upside to having two full scroll pages: you never have to spend time scrolling the name table; you only have to update one column or row every 8 pixels. For horizontal scrolling this means you only have to update 3 bytes and set a register every frame to get full-screen smooth scrolling, because the 24 tiles of the incoming column can be spread over the 8 frames it takes to scroll one tile.
Another consideration is that you almost always want a non-scrolling region, so you have to spend VDP RAM on tile layer 2 anyway, and the masking comes at no extra cost.
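Here is one way to picture the bookkeeping Asmusr describes, as a C sketch rather than F18A code. It treats the two horizontal pages as a single 64-column ring, assumes that increasing x reveals new columns on the right, and fills the column that first becomes visible one tile later, 3 rows per frame. The struct and function names are hypothetical, and mapping a ring column to an actual name-table address in either page is left out.

```c
#include <assert.h>

#define MAP_COLS     64u /* two 32-column pages side by side */
#define VISIBLE_COLS 32u
#define ROWS         24u

/* What to do on one frame of scrolling (hypothetical structure). */
struct scroll_step {
    unsigned coarse;     /* leftmost visible column in the 64-column ring */
    unsigned fine;       /* pixel offset within that column, 0..7 */
    unsigned update_col; /* off-screen column receiving new tiles */
    unsigned first_row;  /* first of the 3 rows written this frame */
};

static struct scroll_step plan_frame(unsigned x)
{
    struct scroll_step s;
    x %= MAP_COLS * 8u;                 /* the 512-pixel ring wraps */
    s.coarse = x / 8u;
    s.fine = x % 8u;
    /* Columns coarse..coarse+32 can be (partly) on screen, so fill the one
       after that; it first shows up once coarse has advanced by one. */
    s.update_col = (s.coarse + VISIBLE_COLS + 1u) % MAP_COLS;
    s.first_row = s.fine * (ROWS / 8u); /* 3 rows per frame, 24 per tile */
    return s;
}
```

Each frame, the caller writes rows `first_row` to `first_row + 2` of `update_col` and then sets the horizontal scroll to x: three bytes plus a register update, as described above.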
Opry99er, posted May 23, 2015:
Holy freakin' crap....
TheMole, posted May 24, 2015:
Quote (Asmusr): There is a positive effect of having two full scroll pages: You never have to spend time scrolling the name table, ... so the masking comes with no extra cost.
That's perfectly achievable with one page as well (in fact, this is how the Master System does it). Consider the following pseudocode:

nr_columns = 32;
current_column = 30;
for (position = 0; position < map_width; position++) {
    // After every 8 pixels of scrolling, start updating the next column
    if (!(position % 8))
        current_column = (current_column + 1) % nr_columns; // current_column wraps at 32 columns
    // Spread the 24 rows of the column over 8 frames, 3 tiles per frame
    tiletable[current_column, ((position % 8) * 3)] = new_tile_1;
    tiletable[current_column, ((position % 8) * 3) + 1] = new_tile_2;
    tiletable[current_column, ((position % 8) * 3) + 2] = new_tile_3;
    update_scroll_register_pixel(position % 8);
    update_scroll_register_char(current_column);
}

Basically, the column being updated each frame is the hidden one, and the first column rendered on the left-hand side of the screen is the column after the hidden one.

With regard to non-scrolling regions, I seem to remember the F18A had scroll lock registers that you could use to keep certain screen areas from scrolling, but I can't find them in the sheet. I'm probably misremembering, though...
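To make TheMole's loop concrete, here is a runnable C version of the same idea, with the name table simulated as a plain array. The map source (`map_tile`) and the choice of map column `position / 8 + 31` for the incoming data are assumptions for illustration; the two scroll-register calls from the pseudocode are replaced by a hypothetical struct that just records the values.

```c
#include <assert.h>
#include <stdint.h>

#define NR_COLUMNS 32u
#define NR_ROWS    24u

/* Stand-in for the 32x24 name table in VRAM. */
static uint8_t tiletable[NR_ROWS][NR_COLUMNS];

/* Hypothetical map source: any tile for a given map column and row. */
static uint8_t map_tile(unsigned map_col, unsigned row)
{
    return (uint8_t)(map_col * 7u + row);
}

/* Replaces update_scroll_register_pixel/char from the pseudocode. */
struct scroll_regs {
    unsigned pixel;
    unsigned column;
};

/* One iteration of the loop: advance the hidden column every 8 pixels and
   write 3 of its 24 rows per frame. */
static void scroll_frame(unsigned position, unsigned *current_column,
                         struct scroll_regs *regs)
{
    unsigned fine = position % 8u;
    if (fine == 0u)
        *current_column = (*current_column + 1u) % NR_COLUMNS;
    for (unsigned k = 0; k < 3u; k++) {
        unsigned row = fine * 3u + k;
        /* Assumed: the incoming map column is position / 8 + 31. */
        tiletable[row][*current_column] = map_tile(position / 8u + 31u, row);
    }
    regs->pixel = fine;
    regs->column = *current_column;
}
```

Starting from `current_column = 30` as in the pseudocode, the first 8 calls fill column 31 from top to bottom, and the ninth call moves on to column 0.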
Asmusr, posted May 24, 2015:
Quote (TheMole): With regards to non-scrolling regions, I seem the remember the F18A had scroll lock registers, ... I'm probably misremembering though...
Matthew has replaced them with tile layer 2 because they didn't really provide the intended functionality.
matthew180 (thread author), posted May 27, 2015:
@Rasmus: Thanks for another awesome demo!

@Mole: I spent a lot of time considering where to get incoming scroll data from. Using a single 32-byte buffer was considered, but ultimately I borrowed the method used by the NES. I figured if it worked for Nintendo, it would be good for the F18A too. Also, since people know how the NES does scrolling, it would feel familiar to them and maybe help get more people interested in the 99/4A.

The incoming data for scrolling has to come from somewhere, and with two pages you can update less frequently. Alternatively, you could set up two pages but only use one column from the second page (although I do realize that is one byte every 32 bytes, which makes using the rest of the page for generic data more difficult). For vertical scrolling, though, this does work out as a single row of 32 consecutive bytes. But using two pages lets you scroll left and right more easily, since you effectively have a 64x24 (or 64x30) play field with a 32x24 window that you can move on a pixel basis. When it is time to update the name tables for horizontal tile shifting, you can use the GPU's DMA to get the job done in a few microseconds, or take advantage of the F18A's ability to increment the VRAM address counter by values other than 1, e.g. incrementing by 32 to write vertical runs of bytes as efficiently as horizontal ones.

I only recently learned that the Master System was based on the 9918A, and I might have done things differently had I known. The scroll lock registers are gone; they proved to be useless in practice, and the new TL2 (tile layer 2) gives you way more flexibility.
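The "increment by 32" trick can be modelled in a few lines. The C sketch below simulates the VDP address counter with a configurable step; it is not F18A code, the 16 KB VRAM size is a simplification, and which F18A register selects the increment is deliberately not shown. With it, writing one 24-tile column of a 32-wide name table needs a single address setup instead of 24.

```c
#include <assert.h>
#include <stdint.h>

#define VRAM_SIZE 0x4000u /* 16 KB is enough for this sketch */

/* Toy model of the VDP data port and its auto-incrementing address counter. */
struct vdp {
    uint8_t vram[VRAM_SIZE];
    uint16_t addr;
    uint16_t increment; /* always 1 on a stock 9918A */
};

static void vdp_set_address(struct vdp *v, uint16_t addr)
{
    v->addr = (uint16_t)(addr % VRAM_SIZE);
}

static void vdp_write(struct vdp *v, uint8_t value)
{
    v->vram[v->addr] = value;
    v->addr = (uint16_t)((v->addr + v->increment) % VRAM_SIZE);
}

/* Write one column of a 32-wide name table with a single address setup. */
static void write_column(struct vdp *v, uint16_t name_table, unsigned col,
                         const uint8_t tiles[24])
{
    v->increment = 32; /* step down one row per byte */
    vdp_set_address(v, (uint16_t)(name_table + col));
    for (unsigned row = 0; row < 24u; row++)
        vdp_write(v, tiles[row]);
    v->increment = 1;  /* back to normal sequential writes */
}
```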
TheMole, posted May 27, 2015:
Quote (matthew180): @Mole: I spent a lot of time considering options about where to get incoming scroll data. ... The scroll lock registers are gone, they proved to be useless in practice and the new TL2 (tile layer 2) gives you way more flexibility.
Yeah, the SMS VDP is an elegant extension to the 9918A, but it doesn't have anywhere near the features the F18A has (even ignoring the GPU), and like you said, one-page scrolling can be approximated in a number of ways, so I'm not too worried about that.
Losing the scroll lock registers is fine as well, since for most of my scenarios you can just as easily use the scanline interrupt to set the scroll registers after the locked area has been drawn to the screen (at least if it's a horizontal locked area).
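A scanline interrupt effectively turns the scroll register into a per-band value. The C sketch below computes which scroll value is in effect on a given scanline when a handler reprograms the register at the start of each band. The `band` type is hypothetical, and the sketch ignores exactly when within the line a register write takes effect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A horizontal band starting at `first_line` that uses `scroll_x`. */
struct band {
    unsigned first_line;
    uint8_t scroll_x;
};

/* Scroll value in effect while `line` is drawn, assuming a scanline interrupt
   at each band's first line writes that band's value. Bands must be sorted
   by first_line, and bands[0] should start at line 0. */
static uint8_t scroll_at_line(const struct band *bands, size_t n, unsigned line)
{
    assert(n > 0);
    uint8_t value = bands[0].scroll_x;
    for (size_t i = 1; i < n && bands[i].first_line <= line; i++)
        value = bands[i].scroll_x;
    return value;
}
```

With bands `{ {0, 0}, {16, 40} }`, a 16-line status bar at the top stays put while the playfield below it is scrolled by 40 pixels.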
Asmusr, posted May 27, 2015:
Quote (TheMole): The scroll lock registers are fine as well, since for most of my scenario's you can just as easily use the scanline interrupt ... (well, if it's a horizontal locked area at least)
Yep. In fact, I'm working on a little demo where I'm scrolling each scanline independently.
TheMole, posted May 27, 2015:
Quote (Asmusr): Yep. In fact I'm working on a little demo where I'm scrolling each scanline independently.
Cool! Sine-wave "water", or a parallax scrolling effect?
Willsy, posted May 27, 2015:
I think I'm having a continence problem!
+OLD CS1, posted May 27, 2015:
Quote (Asmusr):
The absolute top feature of the 1.6 firmware is the ability to add a second layer of tiles (aka patterns or characters) on top of the normal Graphics I tile layer. The new layer has its own name table but shares its patterns with tile layer 1. Sprites can be placed anywhere below, above or in between the two tile layers. What's more, each tile layer has its own set of pixel-smooth hardware scroll registers. This provides almost unlimited potential for what we could do with a TI-99/4A.
Check out the attached demo (it needs to run on an F18A console or JS99er.net). Here I have ripped some graphics from the arcade game Bomb Jack using MAME and converted them to TI format using Magellan.
[Screenshots: screenshot.png, screenshot (1).png, screenshot (2).png]
The background is displayed in tile layer 1, while the foreground with the platforms is tile layer 2. You can change between foreground and background layers using keys 1-9, and turn off the foreground using 0. Keys 1-3 are displayed on the same background, and so are 4-6 and 7-9. If you had to make different patterns for each combination of foreground and background, the 256 characters would soon run out, but with two tile layers we have no such problem. Just to show the scrolling capabilities, you can also scroll the top layer using the joystick.
Imagine what we can do with this: graphical adventures like Leisure Suit Larry, platform games like Mario Bros., etc. With the F18A it's really easy; all you need to do is set a few VDP registers. Look in the register spreadsheet for instructions, and please ask if you have any questions.
(End of quote.)
Okay, so I clicked the "Like" button, but where is the "holy shit" button?
Asmusr, posted May 27, 2015:
Quote (TheMole): Cool! Sine-wave "water", or parallax scrolling effect?
I have tried both, but the former is far more interesting. This is really funny: load SLSSIN from the attached disk with E/A option 3 (E/A#3), then run a game (I can recommend Donkey Kong). Sorry, Mario!
Edit: This also works in js99er.net. Run it from Software/Apps/XB 2.7 Suite so you can easily run games after the "effect" is enabled (press Space for the games menu in the XB27 cartridge).
Attachment: SCANLINE.dsk
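Asmusr doesn't say how SLSSIN computes its offsets, so the following is only a guess at the general technique, in C: a small precomputed sine table (amplitude 7 pixels, period 16 lines, values rounded from 7*sin(2*pi*k/16)) added to a base horizontal scroll value per scanline, with the phase advanced each frame. The function names are hypothetical; a scanline interrupt handler would write the precomputed value for each line to the horizontal scroll register.

```c
#include <assert.h>
#include <stdint.h>

/* round(7 * sin(2 * pi * k / 16)) for k = 0..15 */
static const int8_t wobble[16] = {
    0, 3, 5, 6, 7, 6, 5, 3, 0, -3, -5, -6, -7, -6, -5, -3
};

/* Horizontal scroll for one scanline: base scroll plus a sine offset whose
   phase moves by one line per frame. Wraps modulo 256. */
static uint8_t wave_scroll(uint8_t base, unsigned line, unsigned frame)
{
    return (uint8_t)(base + wobble[(line + frame) & 15u]);
}

/* Precompute a whole frame so the interrupt handler only has to look values up. */
static void fill_wave(uint8_t out[192], uint8_t base, unsigned frame)
{
    for (unsigned line = 0; line < 192u; line++)
        out[line] = wave_scroll(base, line, frame);
}
```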
matthew180 (thread author), posted May 27, 2015:
Haha, you guys are nuts. :-) I think you do this stuff just because you can. :-)
TheMole, posted June 3, 2015:
Quote (matthew180):
You could run an assembler on the F18A, but you would probably have to write it from scratch. I don't think the E/A could be fixed up to run on the GPU.

As for loading GPU code, you assemble using any 99/4A-compatible assembler (E/A, asm994a, etc.), but you have to use AORG with a VRAM address from >0000 to >47FE, just like you do for cartridge development (except that a cartridge uses AORG >6000 and lives in the host CPU's address space, not in VDP VRAM). Once you have the opcodes, you have to include them in an E/A-loadable assembly program, or load the data from disk at runtime. It gets a little tedious, I know, and eventually I hope to have a few tools to help with development.

Using the branch instructions is just like using any other instruction; you use them as you would in any assembly program. With AORG, all the references are resolved at assembly time, and the code will only run correctly if loaded at the specified address.

I did a lot of this kind of code when testing the F18A. Here is an example of the GPU and host (99/4A) programs I used to validate the GPU's jump instructions. This is the GPU code, i.e. the part to be included in the host assembly program and loaded into VRAM for execution by the GPU:

* F18A GPU Test
* Matthew Hagerty
* June 13, 2012
*
* Test jump instructions

       DEF  MAIN
       AORG >3F10

MAIN   IDLE
       MOV  @>3F00,@JINST     * Jump opcode to execute at >3F00, >3F01
       CLR  R0
       LI   R14,JINST
       MOV  @>3F02,R15        * Flag value at >3F02, >3F03
       RTWP                   * R14->PC, R15->status flags
JINST  DATA 0                 * Replaced by opcode
       INC  R0                * Only executed if jump falls through
       MOV  R0,@>3F00         * R0 result in >3F00, >3F01
       B    @MAIN
       END

I take the listing and convert the opcodes to DATA statements to be included in the host assembly program:

Asm994a TMS99000 Assembler - v3.010

     * Asm994a Generated Register Equates
     *
     0000 0000  R0   EQU 0
     0000 0001  R1   EQU 1
     0000 0002  R2   EQU 2
     0000 0003  R3   EQU 3
     0000 0004  R4   EQU 4
     0000 0005  R5   EQU 5
     0000 0006  R6   EQU 6
     0000 0007  R7   EQU 7
     0000 0008  R8   EQU 8
     0000 0009  R9   EQU 9
     0000 000A  R10  EQU 10
     0000 000B  R11  EQU 11
     0000 000C  R12  EQU 12
     0000 000D  R13  EQU 13
     0000 000E  R14  EQU 14
     0000 000F  R15  EQU 15
     *
   1                * F18A GPU Test
   2                * Matthew Hagerty
   3                * June 13, 2012
   4                *
   5                * Test jump instructions
   6
   7  0000 3F10            DEF  MAIN
   8                       AORG >3F10
   9  3F10 0340     MAIN   IDLE
  10  3F12 C820            MOV  @>3F00,@JINST  * Jump opcode to execute at >3F00, >3F01
  10  3F14 3F00
  10  3F16 3F24
  11  3F18 04C0            CLR  R0
  12  3F1A 020E            LI   R14,JINST
  12  3F1C 3F24
  13  3F1E C3E0            MOV  @>3F02,R15     * Flag value at >3F02, >3F03
  13  3F20 3F02
  14  3F22 0380            RTWP                * R14->PC, R15->status flags
  15  3F24 0000     JINST  DATA 0              * Replaced by opcode
  16  3F26 0580            INC  R0             * Only executed if jump falls through
  17  3F28 C800            MOV  R0,@>3F00      * R0 result in >3F00, >3F01
  17  3F2A 3F00
  18  3F2C 0460            B    @MAIN
  18  3F2E 3F10
  19  3F30 0000            END

Assembly Complete - Errors: 0, Warnings: 0

------ Symbol Listing ------
JINST    ABS:3F24 JINST
MAIN     ABS:3F10 MAIN
R0       ABS:0000 R0
R1       ABS:0001 R1
R10      ABS:000A R10
R11      ABS:000B R11
R12      ABS:000C R12
R13      ABS:000D R13
R14      ABS:000E R14
R15      ABS:000F R15
R2       ABS:0002 R2
R3       ABS:0003 R3
R4       ABS:0004 R4
R5       ABS:0005 R5
R6       ABS:0006 R6
R7       ABS:0007 R7
R8       ABS:0008 R8
R9       ABS:0009 R9
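Converting the listing to DATA statements by hand is tedious. As a hedged sketch of the kind of helper matthew180 says he hopes to have, the C function below formats already-assembled words as DATA lines with the load address in a comment, following the layout of the GPU block in the host program. Parsing the asm994a listing itself is not shown, and the function name and output layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format assembled words as DATA statements for inclusion in a host program,
   with the load address as a comment. Returns the number of characters
   written, or 0 if `out` is too small. */
static size_t words_to_data(const uint16_t *words, size_t n, uint16_t origin,
                            char *out, size_t out_size)
{
    size_t used = 0;
    if (out_size > 0)
        out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        unsigned addr = (unsigned)(origin + 2u * i) & 0xFFFFu;
        int len = snprintf(out + used, out_size - used,
                           "       DATA >%04X             * %04X\n",
                           (unsigned)words[i], addr);
        if (len < 0 || (size_t)len >= out_size - used)
            return 0; /* buffer too small */
        used += (size_t)len;
    }
    return used;
}
```

Feeding it the 16 words from the listing with `origin = 0x3F10` reproduces the GPU block of the host program, minus the instruction text in the comments.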
Here is the host-side assembly with the GPU program included as data. It will be copied to VRAM at the location used in the AORG, which is >3F10 in this case. Once in VRAM, the GPU can execute the code.

* F18A CPU to GPU jump instruction test driver
* Matthew Hagerty
* June 4, 2012
*
* 99/4A driver for the GPU jump instruction test.
* Each jump instruction is executed, then the GPU executes
* the same jump instruction and the jump / no-jump result
* is compared.

       DEF  MAIN

*
* VDP Memory Map
*
VDPRD  EQU  >8800             * VDP read data
VDPSTA EQU  >8802             * VDP status
VDPWD  EQU  >8C00             * VDP write data
VDPWA  EQU  >8C02             * VDP set read/write address

*
* Workspace
*
WRKSP  EQU  >8300             * Workspace
R0LB   EQU  WRKSP+1           * R0 low byte reqd for VDP routines
R1LB   EQU  WRKSP+3           * R1 low byte
R2LB   EQU  WRKSP+5           * R2 low byte
R3LB   EQU  WRKSP+7           * R3 low byte
R4LB   EQU  WRKSP+9           * R4 low byte
R5LB   EQU  WRKSP+11          * R5 low byte
R6LB   EQU  WRKSP+13          * R6 low byte
* R5  R6  R7  R8  R9  R10 R11 R12 R13 R14 R15
* 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31

SAVR0  DATA 0
PASFAL DATA 0                 * Pass/fail flag
TXTPAS TEXT 'pass'
HEX    TEXT '0123456789ABCDEF'
SPACE  BYTE 32                * space character
       EVEN

GPU    DATA >0340             * 3F10 0340 MAIN   IDLE
       DATA >C820             * 3F12 C820        MOV  @>3F00,@JINST * Jump opcode to execute at >3F00, >3F01
       DATA >3F00             * 3F14 3F00
       DATA >3F24             * 3F16 3F24
       DATA >04C0             * 3F18 04C0        CLR  R0
       DATA >020E             * 3F1A 020E        LI   R14,JINST
       DATA >3F24             * 3F1C 3F24
       DATA >C3E0             * 3F1E C3E0        MOV  @>3F02,R15    * Flag value at >3F02, >3F03
       DATA >3F02             * 3F20 3F02
       DATA >0380             * 3F22 0380        RTWP               * R14->PC, R15->status flags
       DATA >0000             * 3F24 0000 JINST  DATA 0             * Replaced by opcode
       DATA >0580             * 3F26 0580        INC  R0            * Only executed if jump falls through
       DATA >C800             * 3F28 C800        MOV  R0,@>3F00     * R0 result in >3F00, >3F01
       DATA >3F00             * 3F2A 3F00
       DATA >0460             * 3F2C 0460        B    @MAIN
       DATA >3F10             * 3F2E 3F10
GPUEND

MAIN   LIMI 0
       LWPI WRKSP

*
* F18A blind unlock, no testing for success or failure.
* Perform the Enhance Register Mode (ERM) unlock sequence
* for the F18A.
*
       LI   R0,>391C          * VR1/57, value 00011100
       BL   @VWTR             * Write once
       BL   @VWTR             * Write twice, unlock
       LI   R0,>01E0          * VR1, value 11100000, a real sane setting
       BL   @VWTR             * Write reg

*
* Copy GPU code to VRAM
*
       LI   R0,>3F10
       LI   R1,GPU
       LI   R2,GPUEND-GPU
       BL   @VMBW

       LI   R0,33             * 1,1 screen location to start output
       LI   R7,33             * Screen next row
       LI   R3,JLIST
       LI   R13,WRKSP         * Used in the RTWP instruction and will never change
       LI   R14,JINST

*
* Instruction loop
*
ILOOP  BL   @VWAD
       MOVB *R3+,@VDPWD       * Display the instruction name
       MOVB *R3+,@VDPWD
       MOVB *R3+,@VDPWD
       MOVB *R3+,@VDPWD
       AI   R0,5              * Adjust past the name and add a space

       MOV  *R3,@JINST        * Copy the jump opcode for execution
       CLR  R15               * Reset the count
       CLR  @PASFAL           * Clear the pass/fail flag

CLOOP  CLR  R5
       RTWP                   * R13->WP, R14->PC, R15->status flags
JINST  DATA 0                 * Replaced with jump instruction opcode
       INC  R5

*      Copy to VRAM for GPU
       MOV  R0,@SAVR0         * Temp save R0
       LI   R0,>3F00
       BL   @VWAD
       MOV  *R3,R0
       MOVB R0,@VDPWD         * Jump opcode >3F00, >3F01
       MOVB @R0LB,@VDPWD
       MOV  R15,R0
       MOVB R0,@VDPWD         * Current flag to >3F02, >3F03
       MOVB @R0LB,@VDPWD

*      Set the GPU PC which also triggers it
       LI   R0,>363F
       BL   @VWTR
       LI   R0,>3712
       BL   @VWTR

*      Compare the result in >3F01
       LI   R0,>3F01
       BL   @VRAD
       MOV  @SAVR0,R0         * Restore R0
       CB   @VDPRD,@R5LB
       JEQ  CNEXT

*      Test failed, display the value that failed
       INC  @PASFAL
       BL   @HEXDMP
       JMP  BAIL              * Skip the rest of the flags

CNEXT  AI   R15,>0400         * Increment the flags
       JNE  CLOOP

BAIL
*      Check if test passed (nothing has been displayed yet)
       MOV  @PASFAL,@PASFAL   * Compare to 0
       JNE  JNEXT             * If there was a failure, do not display 'pass'

*      No failure (a failure already displayed the failing flags), so display 'pass'
       LI   R1,TXTPAS
       LI   R2,4              * R0 is already set up
       BL   @VMBW

JNEXT  AI   R7,32             * Next screen row
       MOV  R7,R0
       INCT R3
       MOV  *R3,R4            * Next instruction
       JEQ  DONE
       B    @ILOOP

DONE   JMP  DONE

JLIST  TEXT 'JEQ '
       JEQ  $+4
       TEXT 'JGT '
       JGT  $+4
       TEXT 'JH  '
       JH   $+4
       TEXT 'JHE '
       JHE  $+4
       TEXT 'JL  '
       JL   $+4
       TEXT 'JLE '
       JLE  $+4
       TEXT 'JLT '
       JLT  $+4
       TEXT 'JMP '
       JMP  $+4
       TEXT 'JNC '
       JNC  $+4
       TEXT 'JNE '
       JNE  $+4
       TEXT 'JNO '
       JNO  $+4
       TEXT 'JOC '
       JOC  $+4
       TEXT 'JOP '
       JOP  $+4
       DATA 0

**
* Display R15 as a hex number at screen location in R0
* Uses R5
*
HEXDMP MOV  R11,R5            * Save return address
       BL   @VWAD
       MOV  R5,R11            * Restore return address

       MOV  R15,R5
       ANDI R5,>F000          * Isolate the first digit
       SRL  R5,12             * Convert to a number
       MOVB @HEX(R5),@VDPWD   * Convert to ASCII and write to the screen

       MOV  R15,R5
       ANDI R5,>0F00
       SRL  R5,8
       MOVB @HEX(R5),@VDPWD

       MOV  R15,R5
       ANDI R5,>00F0
       SRL  R5,4
       MOVB @HEX(R5),@VDPWD

       MOV  R15,R5
       ANDI R5,>000F
       MOVB @HEX(R5),@VDPWD

       MOVB @SPACE,@VDPWD
       AI   R0,5              * 4 digits plus a space
       B    *R11
*// HEXDMP

*********************************************************************
*
* VDP Set Write Address
*
* R0   Address to set VDP address counter to
*
VWAD   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
       ORI  R0,>4000          * Set the two MSbits to 01 for write
       MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
       ANDI R0,>3FFF          * Restore R0 top two MSbits
       B    *R11
*// VWAD

*********************************************************************
*
* VDP Set Read Address
*
* R0   Address to set VDP address counter to
*
VRAD   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM read address
       ANDI R0,>3FFF          * Make sure the two MSbits are 00 for read
       MOVB R0,@VDPWA         * Send high byte of VDP RAM read address
       B    *R11
*// VRAD

*********************************************************************
*
* VDP Multiple Byte Write
*
* R0   Starting write address in VDP RAM
* R1   Starting read address in CPU RAM
* R2   Number of bytes to send to the VDP RAM
*
* R1 is modified by the value of R2
* R2 is changed to 0
*
VMBW   MOVB @R0LB,@VDPWA      * Send low byte of VDP RAM write address
       ORI  R0,>4000          * Set the two MSbits to 01 for write
       MOVB R0,@VDPWA         * Send high byte of VDP RAM write address
VMBWLP MOVB *R1+,@VDPWD       * Write byte to VDP RAM
       DEC  R2                * Byte counter
       JNE  VMBWLP            * Check if done
       ANDI R0,>3FFF          * Restore R0 top two MSbits
       B    *R11
*// VMBW

*********************************************************************
*
* VDP Write To Register
*
* R0 MSB    VDP register to write to
* R0 LSB    Value to write
*
VWTR   MOVB @R0LB,@VDPWA      * Send low byte (value) to write to VDP register
       ORI  R0,>8000          * Set up a VDP register write operation (10)
       MOVB R0,@VDPWA         * Send high byte (address) of VDP register
       ANDI R0,>3FFF          * Restore R0 top two MSbits
       B    *R11
*// VWTR

       END
(End of quote.)

I wonder if we could get gcc to output code that can run on the GPU?
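The driver needs no table of expected results because it compares the GPU against the real 9900. If you want one anyway, for example to check an emulator, here is a C sketch of the standard TMS9900 jump conditions applied to the 64 status combinations that the CNEXT loop walks through by adding >0400 to R15. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* TMS9900 status register bits, most significant first. */
#define ST_LGT 0x8000u /* logical greater than */
#define ST_AGT 0x4000u /* arithmetic greater than */
#define ST_EQ  0x2000u /* equal */
#define ST_C   0x1000u /* carry */
#define ST_OV  0x0800u /* overflow */
#define ST_OP  0x0400u /* odd parity */

enum jump { JEQ, JGT, JH, JHE, JL, JLE, JLT, JMP, JNC, JNE, JNO, JOC, JOP };

/* Would jump `j` be taken with status word `st`? */
static int jump_taken(enum jump j, uint16_t st)
{
    int lgt = (st & ST_LGT) != 0, agt = (st & ST_AGT) != 0;
    int eq = (st & ST_EQ) != 0, c = (st & ST_C) != 0;
    int ov = (st & ST_OV) != 0, op = (st & ST_OP) != 0;
    switch (j) {
    case JEQ: return eq;
    case JGT: return agt;
    case JH:  return lgt && !eq;
    case JHE: return lgt || eq;
    case JL:  return !lgt && !eq;
    case JLE: return !lgt || eq;
    case JLT: return !agt && !eq;
    case JMP: return 1;
    case JNC: return !c;
    case JNE: return !eq;
    case JNO: return !ov;
    case JOC: return c;
    case JOP: return op;
    }
    return 0;
}

/* Count how many of the 64 flag combinations take the jump, stepping the
   status word by >0400 like the driver's CNEXT loop. */
static unsigned count_taken(enum jump j)
{
    unsigned taken = 0;
    uint16_t st = 0;
    do {
        taken += (unsigned)jump_taken(j, st);
        st = (uint16_t)(st + 0x0400u);
    } while (st != 0);
    return taken;
}
```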
matthew180 (thread author), posted June 3, 2015:
Sure, I don't see why not. The F18A's 9900-based GPU has a few differences from the real 9900 CPU, though, so GCC needs to avoid the unimplemented instructions.
TheMole, posted June 4, 2015:
These are the instructions related to the workspace pointer, XOP and X, right?
matthew180 (thread author), posted June 4, 2015:
Copied from the F18A register use spreadsheet:

New or modified instructions for the F18A 9900-based GPU

Inst  Opcode  Addressing           New/Mod  Name     Description
CALL  >0C80   0000 1100 10Ts SSSS  new      CALL     Call subroutine, push return address on stack (stack pointer is R15)
RET   >0C00   0000 1100 0000 0000  new      RET      Return from subroutine, pop return address from stack (stack pointer is R15)
PUSH  >0D00   0000 1101 00Ts SSSS  new      PUSH     Push a 16-bit word onto the stack
POP   >0F00   0000 1111 00Td DDDD  new      POP      Pop a 16-bit word off of the stack
SLC   >0E00   0000 1110 00Ts SSSS  new      SLC      Shift Left Circular
CKON  >03A0                        mod      SPI_EN   Sets the chip enable line to the SPI Flash ROM low (enables the ROM)
CKOF  >03C0                        mod      SPI_DS   Sets the chip enable line to the SPI Flash ROM high (disables the ROM)
IDLE  >0340                        mod      IDLE     Forces the GPU state machine to the idle state, restart with VR56 trigger
LDCR  >3000                        mod      SPI_OUT  Writes a byte (always a byte operation) to the SPI Flash ROM
STCR  >3400                        mod      SPI_IN   Reads a byte (always a byte operation) from the SPI Flash ROM
RTWP  >0380                        mod      RTWP     Does not use R13, only performs R14->PC, R15->status flags
XOP   >2C00                        mod      PIX      New dedicated pixel plotting and addressing instruction

Unimplemented instructions: SBO, SBZ, TB, BLWP, STWP, LWPI, LIMI, RSET, LREX

The GPU does not have a workspace pointer because the registers are real, i.e. built into the GPU, so instructions like BLWP, STWP, etc. don't apply. There is no CRU either, so the CRU-specific instructions are not implemented, other than the ones with modified behavior as specified above. The GPU does not have 9900-like vectored interrupts, so the LIMI instruction is irrelevant. All the other instructions are implemented and regression-tested against the real 9900.

GCC might actually benefit from a real stack.
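Given the concern about compiler output, a quick sanity check is to scan a binary for the opcodes listed as unimplemented. The C sketch below is deliberately naive: it treats every word as an instruction, so immediate operands, addresses and data can produce false positives, and the mask/value pairs are derived from the standard TMS9900 encodings rather than from the F18A documentation. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct opcode_pattern {
    uint16_t mask;
    uint16_t value;
    const char *name;
};

/* Unimplemented on the GPU according to the table above; encodings are the
   standard TMS9900 ones. */
static const struct opcode_pattern unimplemented[] = {
    { 0xFFC0, 0x0400, "BLWP" },
    { 0xFFF0, 0x02A0, "STWP" },
    { 0xFFE0, 0x02E0, "LWPI" },
    { 0xFFE0, 0x0300, "LIMI" },
    { 0xFFE0, 0x0360, "RSET" },
    { 0xFFE0, 0x03E0, "LREX" },
    { 0xFF00, 0x1D00, "SBO"  },
    { 0xFF00, 0x1E00, "SBZ"  },
    { 0xFF00, 0x1F00, "TB"   },
};

/* Name of the unimplemented instruction `word` would decode as, or NULL. */
static const char *unimplemented_name(uint16_t word)
{
    for (size_t i = 0; i < sizeof unimplemented / sizeof unimplemented[0]; i++)
        if ((word & unimplemented[i].mask) == unimplemented[i].value)
            return unimplemented[i].name;
    return NULL;
}

/* Index of the first suspicious word at or after `start`, or `n` if none. */
static size_t first_suspect(const uint16_t *words, size_t n, size_t start)
{
    for (size_t i = start; i < n; i++)
        if (unimplemented_name(words[i]) != NULL)
            return i;
    return n;
}
```

When reading words from a raw binary, combine each pair of bytes high byte first, since the 9900 is big-endian.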
TheMole, posted June 4, 2015:
Quote (matthew180): GCC might actually benefit from a real stack.
Certainly, but that's way beyond my skill set. I'm trying to coerce Insomnia's port into creating binaries that can run on the GPU. No luck so far, but I'm sure it should be possible (gcc will never emit LIMI or CRU instructions by itself, and a cursory look at the assembly generated for any of my projects reveals no BLWP, STWP, etc. instructions). I'm currently trying to figure out the right crt0 setup and linking requirements.
TheMole, posted June 12, 2015:
For those who are interested, it is perfectly possible to use gcc to create GPU binaries (at least in the few simple cases that I've tested). Suppose your GPU code resides in a source file called gpucode.c; you can use the following commands to get the pure binary that you can upload to the GPU:

tms9900-gcc -std=c99 -Werror -Wall -c gpucode.c -Os -s -o gpucode.o
tms9900-ld gpucode.o --section-start .text=0x1900 -o gpucode.elf
tms9900-objcopy -O binary gpucode.elf gpucode.bin

The resulting gpucode.bin file is a raw binary dump that can then be uploaded to VRAM at location >1900 (or whatever you defined with the --section-start parameter). For more complex stuff (such as when using global variables or the heap), you need to objcopy some other sections as well, but in most cases you can easily work around that by addressing VRAM directly instead of working with variables.
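On the host side, the upload amounts to the same VDP port traffic as VWAD, VMBW and VWTR in matthew180's driver: set the write address, stream the bytes, then write the start address to VR54/VR55, which in that driver also starts the GPU. The C sketch below records that traffic in a log so it can be checked off-target. The type and function names are hypothetical, the log size is arbitrary, the ERM unlock is assumed to have been done already, and only addresses below >4000 are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two VDP ports the host writes: >8C02 (address/register) and >8C00 (data). */
enum vdp_port { VDP_ADDR, VDP_DATA };

struct port_write {
    enum vdp_port port;
    uint8_t value;
};

/* Stand-in for the real ports: records every byte written. */
struct port_log {
    struct port_write w[64];
    size_t n;
};

static void emit(struct port_log *l, enum vdp_port port, uint8_t value)
{
    assert(l->n < sizeof l->w / sizeof l->w[0]);
    l->w[l->n].port = port;
    l->w[l->n].value = value;
    l->n++;
}

/* Same order as VWAD: low byte, then high byte with the top bits set to 01. */
static void set_write_address(struct port_log *l, uint16_t addr)
{
    assert(addr < 0x4000u);
    emit(l, VDP_ADDR, (uint8_t)(addr & 0xFFu));
    emit(l, VDP_ADDR, (uint8_t)((addr >> 8) | 0x40u));
}

/* Same order as VWTR: the value, then 0x80 | register number. */
static void write_register(struct port_log *l, uint8_t reg, uint8_t value)
{
    emit(l, VDP_ADDR, value);
    emit(l, VDP_ADDR, (uint8_t)(0x80u | reg));
}

/* Copy a raw binary (e.g. gpucode.bin) to VRAM and point the GPU at it. */
static void upload_and_start(struct port_log *l, uint16_t addr,
                             const uint8_t *bin, size_t len)
{
    set_write_address(l, addr);
    for (size_t i = 0; i < len; i++)
        emit(l, VDP_DATA, bin[i]);
    write_register(l, 54, (uint8_t)(addr >> 8));    /* VR54: GPU PC high byte */
    write_register(l, 55, (uint8_t)(addr & 0xFFu)); /* VR55: low byte; the driver above writes this last */
}
```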
Asmusr, posted June 13, 2015:
Quote (TheMole): For those that are interested, it is perfectly possible to use gcc to create GPU binaries ... by addressing VRAM directly instead of working with variables.
Cool. Perhaps we will see some GPU-powered raycasting soon?
TheMole, posted June 14, 2015:
Quote (Asmusr): Cool. Perhaps we will see some GPU powered raycasting soon?
Dunno about soon, but it's definitely on my list of things to do.