moulinaie, posted July 30, 2012 (edited)

Hello,

I've modified MLC to compile programs for the GPU found inside the marvellous F18A. Here is my very first test of it.

The GPU fills the screen 26 times, starting with "A" and going on to "Z". Filling the whole screen 26 times writes close to 20,000 bytes (26 * 768 = 19,968). In the video you can see that if I run it once, you only see the last "Z" because it's too fast. If I run it 10,000 times, you can see how much time is needed to write those 200 MB: http://www.youtube.com/watch?v=4E3sccArZRw

Here is the source code for the program:

    $MLC F
    100 10 3000
    300 CALL CLEAR
    310 INPUT "Number of loops:":N
    320 CALL LINK("TEST",N)::END
    ;
    ; here the MLC program that will run the GPU one "n" times
    ;
    $TEST
    GETPARAM 1 N ; number of loops
    GPURUN &h4000 ; first run of GPU routine with address
    DO
    REPEAT ; little loop to wait for the GPU to finish
    GPUSTATE
    UNTIL= ; state will be "=" when GPU has finished
    DEC N ; decrement my counter
    WHILE<>
    GPUWAKE ; if not finished, wake up the GPU for another display !! (SEE NOTE BELOW)
    LOOP
    KEYWAIT 0 ; wait for a key
    $$
    ;
    ; here the GPU program in assembly code
    ; the compiler sends it to VDP location >4000, the new F18A area
    ; this example fills the screen with the capital letters from A to Z
    ;
    $GPU
    li r0,26 ; 26 letters from A to Z
    li r1,>A1A1 ; start with double >A1 = "A" in XB
    ; letter loop
    clr r2 ; screen address
    li r3,384 ; 384 words = 768 bytes = 24*32 positions
    ; fill loop
    mov r1,*r2+ ; write two letters on screen
    dec r3 ; decrement counter
    jne -3 ; if not finished, jump to fill loop
    ai r1,>0101 ; next letter (>A2A2, >A3A3 etc...)
    dec r0 ; decrement counter
    jne -10 ; if not finished, jump to letter loop
    idle ; back to idle state
    $$
    $END

NOTE: Matthew, I have a problem with the wake function. If I use it, the GPU program can only run 4 or 5 times, and then it doesn't work correctly. If I replace GPUWAKE (writing 1 to register 56) with GPURUN &h4000, everything is fine. I may have done something wrong, but the GPU appears to be more stable if the routine address is reset each time.

Guillaume.

Edited July 30, 2012 by moulinaie

Link: https://forums.atariage.com/topic/200882-first-program-run-on-the-gpuf18a-with-mlc/
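As a quick check of the figures above, here is a minimal Python model of what the $GPU routine writes. It assumes that the XB screen image table starts at VRAM >0000, which is what `clr r2` relies on. It also assumes that XB stores a character on screen as its ASCII code plus >60. The function name `run_fill` is made up for illustration.

```python
SCREEN_BYTES = 24 * 32      # 768 screen positions
XB_OFFSET = 0x60            # assumed: XB stores "A" (>41) on screen as >A1

def run_fill():
    """Model one run of the $GPU routine; return (vram, bytes_written)."""
    vram = bytearray(SCREEN_BYTES)      # assumed screen image table at >0000
    written = 0
    word = 0xA1A1                       # li r1,>A1A1: two "A"s
    for _ in range(26):                 # letter loop, li r0,26
        for addr in range(0, SCREEN_BYTES, 2):   # fill loop, 384 words
            vram[addr] = word >> 8      # mov r1,*r2+ stores the MSB at the even address
            vram[addr + 1] = word & 0xFF
            written += 2
        word += 0x0101                  # ai r1,>0101: next letter
    return vram, written

vram, written = run_fill()
print(written)                   # 19968 bytes per GPU run
print(written * 10_000)          # 199680000 bytes for 10,000 runs
print(chr(vram[0] - XB_OFFSET))  # Z, the only letter you get to see
```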
matthew180, posted July 30, 2012

When you write a '1' to VR56 ("trigger only"), the GPU runs from the *current* PC (program counter) location, *not* the location specified in VR54 and VR55. Those are only used to "set and trigger" the GPU. Your routine appears to end with IDLE, which means the PC is pointing to the code in memory immediately following the IDLE instruction. That is where execution will begin if you just write a '1' to VR56.

I suggest you make IDLE the first instruction in your GPU routine and set up the PC with a write to VR54 and VR55. That will trigger the GPU, but the first instruction is IDLE, so it goes to sleep. Then you can trigger your routine with a single write to VR56. Your routine runs, then at the end jumps back to the IDLE at the top of the routine. That way, when you trigger again, the PC is at the correct location, i.e. the top of your routine.

           DEF  MAIN
           AORG >3000
    MAIN   IDLE
    .
    . Do something in the routine when triggered with a write
    . of 1 to VR56.
    .
    * Routine is done, go back to IDLE and wait to be triggered again.
           B    @MAIN

Also, if your GPU routine is going to take a while, an efficient way to wait for it on the host side (i.e. 99/4A assembly) is:

    VDPRD  EQU  >8800          * VDP read data
    VDPSTA EQU  >8802          * VDP status
    VDPWD  EQU  >8C00          * VDP write data
    VDPWA  EQU  >8C02          * VDP set read/write address
    .
    .
    .
           LI   R0,>0F02       * Set the status port to read SR2
           BL   @VWTR
    GWAIT  MOVB @VDPSTA,R1
           JLT  GWAIT          * MSbit is '1' while GPU is running, makes the byte a negative value
           ANDI R1,>FF00       * Mask the GPU's return status data (7-bits in the MSB)
           LI   R0,>0F00       * Set status port to read SR0
           BL   @VWTR
    .
    .
    .

    *********************************************************************
    *
    * VDP Write To Register
    *
    * R0 MSB    VDP register to write to
    * R0 LSB    Value to write
    *
    VWTR   MOVB @R0LB,@VDPWA   * Send low byte (value) to write to VDP register
           ORI  R0,>8000       * Set up a VDP register write operation (10)
           MOVB R0,@VDPWA      * Send high byte (address) of VDP register
           ANDI R0,>3FFF       * Restore R0 top two MSbits
           B    *R11
    *// VWTR

Of course, you must have interrupts disabled in the loop, because you are switching the status port from the default SR0 to SR2.
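The difference between the two triggers can be shown with a toy Python model. It is hypothetical: `ToyGPU` and its three pseudo-instructions are made up for illustration and are not F18A firmware. `set_and_trigger` stands in for writing the address to VR54/VR55, and `trigger_only` stands in for writing 1 to VR56. When IDLE is at the end, a second trigger runs whatever follows the routine. When IDLE comes first and the routine branches back to it, every trigger runs the routine.

```python
class ToyGPU:
    """Toy model: program is a list of (op, arg); list index = address."""

    def __init__(self, program):
        self.program = program
        self.pc = 0
        self.log = []

    def _run(self):
        while True:
            op, arg = self.program[self.pc]
            self.pc += 1
            if op == "IDLE":          # sleep; PC is left just past the IDLE
                return
            if op == "B":             # branch, like B @MAIN
                self.pc = arg
            elif op == "WORK":        # stand-in for the real routine body
                self.log.append(arg)

    def set_and_trigger(self, addr):  # like writing the address to VR54/VR55
        self.pc = addr
        self._run()

    def trigger_only(self):           # like writing 1 to VR56
        self._run()


# IDLE at the end: the second trigger starts *after* the IDLE.
bad = ToyGPU([("WORK", "fill"), ("IDLE", None), ("WORK", "garbage"), ("IDLE", None)])
bad.set_and_trigger(0)
bad.trigger_only()
print(bad.log)   # ['fill', 'garbage']

# IDLE first, branch back at the end: every trigger runs the routine.
good = ToyGPU([("IDLE", None), ("WORK", "fill"), ("B", 0)])
good.set_and_trigger(0)          # lands on IDLE and sleeps
for _ in range(3):
    good.trigger_only()
print(good.log)  # ['fill', 'fill', 'fill']
```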
moulinaie (author), posted July 31, 2012

> I suggest you make IDLE the first instruction in your GPU routine and set up the PC with a write to VR54 and VR55. That will trigger the GPU, but the first instruction is IDLE so it goes to sleep. Then you can trigger your routine with a single write to VR56. Your routine runs, then at the end jumps back to the IDLE at the top of the routine. That way when you trigger again the PC is at the correct location, i.e. the top of your routine.

Okay, that's "elegant"! I like it.

Something else: I'm working on BML support. I'm trying to figure out what happens when the width of the bitmap does not fall on a byte boundary.

I created a 32x32 BML, so 32 pixels * 2 bits = 8 bytes per line.

Then I created a 30x32 BML, so 30 pixels * 2 bits = 7.5 bytes per line... If I'm right, this uses 8 bytes per line, but only 7 bytes are used for display, so I only get a 28x32 pixel BML.

Then, a funny thing: if I create a 28x32 BML, that is 7 bytes per line, and it uses exactly 7 bytes per line. But it looks like only 27 pixels are displayed; the last one is missing on each line...

So here is my question: are there limitations on the width and height of a BML (and also on its X and Y position)?

Guillaume.
matthew180, posted July 31, 2012 (edited)

The boundaries are always on bytes. Here is the formula:

    byte = (y * ((w + 3) / 4)) + (x / 4);
    pix  = x & 0x03;

or, more broken out:

    wmul = (w + 3) >> 2      (the number "3" is because w + 3 == w - 1 + 4)
    byte = (y * wmul) + (x >> 2)
    pixel-pair index in byte = x & 0x03

So, a 10x4 BML will require 3 bytes per line (the last 4 bits in the 3rd byte will not be used):

          0  1  2  3    4  5  6  7    8  9  .  .      x coord
        ------------------------------------------
          01 23 45 67 | 01 23 45 67 | 01 23 45 67     byte bit numbers
        ------------------------------------------
    0 |  P0 P1 P2 P3 | P0 P1 P2 P3 | P0 P1 xx xx |
    1 |  P0 P1 P2 P3 | P0 P1 P2 P3 | P0 P1 xx xx |
    2 |  P0 P1 P2 P3 | P0 P1 P2 P3 | P0 P1 xx xx |
    3 |  P0 P1 P2 P3 | P0 P1 P2 P3 | P0 P1 xx xx |

So, let's try the formula for a pixel at xy=6,2. That pixel is at byte offset 7 from the base (counting bytes from 0), with pixel index 2 (bits 4 and 5) in that byte. The pixel-pair indexes are:

    P0 (bits 0 and 1)
    P1 (bits 2 and 3)
    P2 (bits 4 and 5)
    P3 (bits 6 and 7)

    w = 10
    h = 4  (not used)
    x = 6
    y = 2

    wmul = (w + 3) >> 2
    wmul = (10 + 3) / 4
    wmul = 13 / 4
    wmul = 3  (this is all integer math, so you lose the fractional part)

    byte = (y * wmul) + (x >> 2)
    byte = (2 * 3) + (6 / 4)
    byte = 6 + 1
    byte = 7

    pixel-pair index in byte = x & 0x03
    pixel-pair = 6 AND 3
    pixel-pair = "0110" AND "0011" (binary)
    pixel-pair = "0010" (binary)
    pixel-pair = 2

So, starting from the base address, add 7 to get the pixel's byte, then use pixel-pair 2 (bits 4 and 5). The core execution part of the PIX instruction does all these calculations in a single clock, or 10ns.

To calculate the amount of memory your BML will take, divide your width by 4 and round up. Equivalently, add 3 to your width and then divide by 4, which is what the formula above does. That gives the number of bytes per line; multiply it by the height.
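Here is a Python sketch of the formula above, for anyone porting it to host code. One assumption is inferred from the table: P0 sits in TI bits 0 and 1, and TI numbers bit 0 as the most significant bit. `set_pixel` is therefore a hypothetical helper that puts P0 in the top two bits of the byte. It is not F18A code.

```python
def bml_pixel_address(x, y, w):
    """Return (byte_offset, pixel_pair_index) for pixel (x, y) in a BML of width w."""
    wmul = (w + 3) >> 2            # bytes per line, rounded up
    byte = y * wmul + (x >> 2)
    pix = x & 0x03
    return byte, pix

def set_pixel(bml, x, y, w, color):
    """Write a 2-bit color (0-3) into a bytearray holding the BML."""
    byte, pix = bml_pixel_address(x, y, w)
    shift = 6 - 2 * pix            # P0 -> top two bits (assumed TI bit order)
    cleared = bml[byte] & ~(0x03 << shift) & 0xFF
    bml[byte] = cleared | ((color & 0x03) << shift)

print(bml_pixel_address(6, 2, 10))  # (7, 2), matching the worked example
bml = bytearray(3 * 4)              # 10x4 BML: 3 bytes per line
set_pixel(bml, 6, 2, 10, 3)
print(hex(bml[7]))                  # 0xc, i.e. TI bits 4 and 5 set
```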
There is no limitation on the x,y location of the BML, and there is no limit on the width or height other than 0 to 255, since each register is a byte.

So, a 28x30 BML would require:

    w = 28
    (w + 3) / 4
    31 / 4 = 7 bytes per line (4 pixels per byte * 7 bytes = 28 pixels)

    x  |0 1 2 3|4 5 6 7|8 9 0 1|2 3 4 5|6 7 8 9|0 1 2 3|4 5 6 7|
       |byte 0 |byte 1 |byte 2 |byte 3 |byte 4 |byte 5 |byte 6 |

And to continue past a boundary:

    w = 29
    (w + 3) / 4
    32 / 4 = 8 bytes

    x  |0 1 2 3|4 5 6 7|8 9 0 1|2 3 4 5|6 7 8 9|0 1 2 3|4 5 6 7|8 x x x|
       |byte 0 |byte 1 |byte 2 |byte 3 |byte 4 |byte 5 |byte 6 |byte 7 |

    w = 30
    (w + 3) / 4
    33 / 4 = 8 bytes

    x  |0 1 2 3|4 5 6 7|8 9 0 1|2 3 4 5|6 7 8 9|0 1 2 3|4 5 6 7|8 9 x x|
       |byte 0 |byte 1 |byte 2 |byte 3 |byte 4 |byte 5 |byte 6 |byte 7 |

    w = 31
    (w + 3) / 4
    34 / 4 = 8 bytes

    x  |0 1 2 3|4 5 6 7|8 9 0 1|2 3 4 5|6 7 8 9|0 1 2 3|4 5 6 7|8 9 0 x|
       |byte 0 |byte 1 |byte 2 |byte 3 |byte 4 |byte 5 |byte 6 |byte 7 |

    w = 32
    (w + 3) / 4
    35 / 4 = 8 bytes

    x  |0 1 2 3|4 5 6 7|8 9 0 1|2 3 4 5|6 7 8 9|0 1 2 3|4 5 6 7|8 9 0 1|
       |byte 0 |byte 1 |byte 2 |byte 3 |byte 4 |byte 5 |byte 6 |byte 7 |

Width:

      0 + 3 =   3 / 4 =  0 bytes (width 0 will not display)
      1 + 3 =   4 / 4 =  1 byte
      2 + 3 =   5 / 4 =  1 byte
      3 + 3 =   6 / 4 =  1 byte
      4 + 3 =   7 / 4 =  1 byte
      5 + 3 =   8 / 4 =  2 bytes
      6 + 3 =   9 / 4 =  2 bytes
      7 + 3 =  10 / 4 =  2 bytes
      8 + 3 =  11 / 4 =  2 bytes
      .
      .
      .
    249 + 3 = 252 / 4 = 63 bytes
    250 + 3 = 253 / 4 = 63 bytes
    251 + 3 = 254 / 4 = 63 bytes
    252 + 3 = 255 / 4 = 63 bytes
    253 + 3 = 256 / 4 = 64 bytes
    254 + 3 = 257 / 4 = 64 bytes
    255 + 3 = 258 / 4 = 64 bytes

Remember that the xy coords are 0 to w-1 and 0 to h-1. Width 255 is kind of strange, since you can't set the width to 256. Still, with a width of 255, an x coord of 255 will display the pixel (at least it had better, or else I have a bug).

Edited July 31, 2012 by matthew180
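The width table can be reproduced with two one-line helpers. The names `bml_bytes_per_line` and `bml_size` are hypothetical. The sketch also shows why x = 255 still has storage at width 255: 64 bytes hold 256 pixel slots.

```python
def bml_bytes_per_line(w):
    """Bytes per BML line for width w (0-255): add 3, then integer-divide by 4."""
    return (w + 3) // 4

def bml_size(w, h):
    """Total bytes for a w x h BML at 2 bits per pixel."""
    return bml_bytes_per_line(w) * h

for w in (28, 29, 30, 31, 32):
    print(w, bml_bytes_per_line(w))    # 7, then 8 for 29 to 32
print(bml_size(28, 30))                # 210
print(bml_bytes_per_line(255) * 4)     # 256 pixel slots
```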
moulinaie (author), posted August 1, 2012

Hi Matthew,

I selected a palette other than the first one, reduced the width one pixel at a time, and it works. I think palette 0 confused me because its 4 colors are not very different. With palette 3, everything is clear.

Back to MLC...

Guillaume.
moulinaie (author), posted August 2, 2012 (edited)

Hi again,

I've added the beginnings of BML support to MLC. For now, you can set the BML size, address and flags, and plot a pixel! That's the base. I want to add something like "filled rectangle" and "line".

The little video shows the "plot" function in action. A 128x128 BML is created and "N" pixels are plotted. Then the palette changes, "N" more pixels are plotted, and so on until the user presses a key.

http://www.youtube.com/watch?v=nKwVxZxPR5Q

Here is the MLC source code:

    $MLC F
    100 10 3000
    300 INPUT "Pixels per run : ":N
    310 CALL LINK("TEST",N)
    320 GOTO 300
    $TEST
    GETPARAM 1 N ; how many pixels the user asked for
    XREGENABLE ; enable access to the extended registers
    STARTDATA
    byte &hE0 ; BML definition block, first byte is flags
    byte &h40 ; then address (here 64*64 = 4096)
    bytes 64,32,128,128 ; then x,y,w,h
    ENDDATA E
    LET M 4096 ; BML address
    FOR I 0 4095
    PUTTABLE M(I) 0 ; clear all pixels
    NEXT
    BMLSET 1 E ; and display the BML
    REPEAT
    NDO I N ; N times
    RND ; RND always returns in Z
    AND Z 127 ; mask to get 0-127
    LET X Z ; X = 0-127
    RND
    AND Z 127
    LET Y Z ; same, Y = 0-127
    RND
    AND Z 3 ; Z = 0-3, the color!
    BMLPLOT X Y Z ; plot the pixel
    NLOOP
    RND
    AND Z 15 ; Z = 0-15, a palette number
    ADD Z &hE0 ; + flags
    PUTTABLE E(0) Z ; new flags
    BMLSET 1 E ; set new parameters for BML
    KEY 0 ; a key stroke?
    UNTIL>= ; if >= then no, repeat!
    PUTTABLE E(0) &h60 ; else, change flags to "BML disabled"
    BMLSET 1 E ; and set new parameters
    $$
    $END

I did some timing tests: 10,000 pixels are plotted in 13.8 seconds. That's not bad, considering that RND is called three times per pixel to get x, y and color.

Something important: this plot routine is built into MLC and executed by the TMS9900. It does not use the PIX instruction found in the GPU. Why? Two main reasons:
- not everyone knows enough assembly to use the GPU/PIX
- using such a function (PIX) assumes that the GPU is not otherwise in use... which would limit some programming ideas..!!

So the user has the choice.

Guillaume.

Edited August 2, 2012 by moulinaie
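Here is a small Python sketch of the 6-byte BML definition block used above (flags, address, x, y, w, h), with the same size check as the clear loop. The bit meanings are assumptions inferred only from these examples:
- the address byte is the VRAM address divided by 64 (&h40 gives 4096);
- 0x80 in the flags enables display (&hE0 is shown, &h60 is hidden);
- the low 4 bits select the palette (`ADD Z &hE0` with Z = 0-15).

The remaining 0x60 bits are copied from the example unchanged. `bml_block` is a hypothetical helper name.

```python
BML_ENABLE = 0x80   # assumed: &hE0 displays, &h60 does not

def bml_block(addr, x, y, w, h, palette=0, enabled=True, other_flags=0x60):
    """Build the 6-byte block: flags, address/64, x, y, w, h."""
    if addr % 64 or not 0 <= addr < 0x4000:
        raise ValueError("address must be a multiple of 64 inside the 16K VRAM")
    flags = other_flags | (palette & 0x0F) | (BML_ENABLE if enabled else 0)
    return bytes([flags, addr // 64, x, y, w, h])

def bml_size(w, h):
    """Bytes used by a w x h BML at 2 bits per pixel."""
    return ((w + 3) // 4) * h

blk = bml_block(4096, 64, 32, 128, 128)
print(blk.hex())              # e04040208080
print(bml_size(128, 128))     # 4096, the range cleared by FOR I 0 4095
```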
rocky007, posted August 2, 2012

Really great, Guillaume! Have you planned some move/scroll instructions?
moulinaie (author), posted August 2, 2012

> Really great, Guillaume! Have you planned some move/scroll instructions?

Not yet!!! I'll be happy with a LINE instruction..! But who knows... What about your game?

Guillaume.
rocky007, posted August 2, 2012 (edited)

I worked on it in July; I hope to finish it this month.

Edited August 2, 2012 by rocky007
moulinaie (author), posted August 2, 2012

> I worked on it in July; I hope to finish it this month.

Don't wait..!! As MLC keeps growing, you may be short of memory in a few weeks...

Guillaume.
rocky007, posted August 2, 2012

I'm already too short of memory. I'm very impatient to use the new F18A functions.
matthew180, posted August 2, 2012

> I'll be happy with a LINE instruction..!

I have most of a GPU line function written, based on Michael Abrash's code from "Zen of Graphics Programming". It is a modified Bresenham's algorithm (it plots segments instead of single pixels) with special cases for horizontal and vertical lines. I was hoping to include it in the F18A's firmware, but I didn't have time to get in all the extras I wanted (lines, circles, fills, etc.). No one has asked for any GPU code yet, but if you are interested, I'll finish it up and post it.

Also, remember that the F18A has two 32-bit random number generators. One is dedicated to the host-system interface; the other is private to the GPU. The GPU can also do your range modification, i.e. divide the random number to get the desired range.
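The MLC examples reduce RND with AND masks, while the suggestion above is to divide. Below is a small Python comparison of three ways to map a 32-bit random value into 0..n-1. The function names are hypothetical, and 0xDEADBEEF is just a sample value, not F18A output. When n does not divide 2^32, the last two carry a tiny bias, which does not matter for picking pixel positions.

```python
def range_mask(r, n):
    """Power-of-two n only: keep the low bits, like AND Z 127 in MLC."""
    if n & (n - 1):
        raise ValueError("n must be a power of two")
    return r & (n - 1)

def range_mod(r, n):
    """Any n: the remainder of dividing the random number by n."""
    return r % n

def range_mulhi(r, n):
    """Any n, no divide: scale a 32-bit r by n and keep the high 32 bits."""
    return ((r & 0xFFFFFFFF) * n) >> 32

r = 0xDEADBEEF                # stands in for one 32-bit RNG reading
print(range_mask(r, 128))     # 111
print(range_mod(r, 100))      # 59
print(range_mulhi(r, 100))    # 86
```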
lucien2, posted August 2, 2012

> I'm already too short of memory.

Welcome to the club! (As we say in French)
moulinaie (author), posted August 3, 2012

Hi,

Here is the LINES example. Drawing a line takes two instructions:

    BMLPLOT x y c ; plots the first pixel
    BMLDRAWTO x' y' c ; draws to x' y'

With the source code:

    $MLC F
    100 10 3000
    300 INPUT "How many runs : ":N
    310 CALL LINK("TEST",N)
    320 END
    $TEST
    GETPARAM 1 N ; how many runs the user asked for
    XREGENABLE ; enable access to the extended registers
    STARTDATA
    byte &hE3 ; BML definition block, first byte is flags, with palette 3
    byte &h40 ; then address (here 64*64 = 4096)
    bytes 96,64,64,64 ; then x,y,w,h
    ENDDATA E
    BMLSET 0 E ; set my BML without display
    BMLPLOT 0 0 0 ; upper corner
    BMLFILLRECT 64 64 ; clear all
    BMLSET 1 E ; and display the BML
    LET C 3 ; color
    NDO I N
    LET Z 63
    FOR X 0 63
    BMLPLOT X 0 C
    BMLDRAWTO Z 63 C
    DEC Z
    NEXT
    LET Z 62
    FOR Y 1 62
    BMLPLOT 63 Y C
    BMLDRAWTO 0 Z C
    DEC Z
    NEXT
    DEC C
    AND C 3
    NLOOP
    PUTTABLE E(0) &h63 ; change flags to "BML disabled"
    BMLSET 1 E ; and set new parameters
    $$
    $END

Guillaume.
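BMLDRAWTO's code is not shown in the thread. As a reference point, here is the classic all-octant integer Bresenham line in Python. It is not MLC's implementation and not the run-slice variant from "Zen of Graphics Programming" mentioned above. Each returned point would then be written with a BMLPLOT-style pixel store. `line_points` is a hypothetical name.

```python
def line_points(x0, y0, x1, y1):
    """Classic integer Bresenham: every pixel from (x0, y0) to (x1, y1)."""
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy                 # combined error term for both axes
    points = []
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:              # step in x
            err += dy
            x0 += sx
        if e2 <= dx:              # step in y
            err += dx
            y0 += sy
    return points

# One line of the fan drawn by the LINES demo: from (X, 0) to (63 - X, 63).
print(len(line_points(10, 0, 53, 63)))   # 64 pixels, one per row
```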
moulinaie (author), posted August 3, 2012

Hello,

Here is the FILL RECTANGLE example. Filling a rectangle takes two instructions:

    BMLPLOT x y c ; plots the upper left corner
    BMLFILLRECT w h ; fills the rectangle

And the source code:

    $MLC F
    100 10 3000
    300 INPUT "How many rectangles : ":N
    310 CALL LINK("TEST",N)
    320 END
    $TEST
    GETPARAM 1 N ; how many rectangles the user asked for
    XREGENABLE ; enable access to the extended registers
    STARTDATA
    byte &hE3 ; BML definition block, first byte is flags, with palette 3
    byte &h40 ; then address (here 64*64 = 4096)
    bytes 64,32,128,128 ; then x,y,w,h
    ENDDATA E
    BMLSET 0 E ; set my BML without display
    BMLPLOT 0 0 0
    BMLFILLRECT 128 128 ; clear everything
    BMLSET 1 E ; and display the BML
    CLEAR C
    NDO I N
    RND
    AND Z 63
    LET X Z ; X upper corner from 0 to 63
    RND
    AND Z 63 ; same for Y
    BMLPLOT X Z C ; plot upper corner
    RND
    AND Z 31
    ADD Z 32
    LET X Z ; width = 32 to 63
    RND
    AND Z 31
    ADD Z 32 ; height from 32 to 63
    BMLFILLRECT X Z ; fill W,H from last plot
    INC C ; next color
    AND C 3 ; always from 0 to 3
    NLOOP
    KEYWAIT 0
    PUTTABLE E(0) &h60 ; change flags to "BML disabled"
    BMLSET 1 E ; and set new parameters
    $$
    $END

Guillaume.
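BMLFILLRECT's implementation isn't shown either. Here is a Python sketch of filling a rectangle in a 2-bit-per-pixel BML, using matthew180's addressing formula. It assumes P0 occupies the top two bits of each byte. Partial bytes at the left and right edges are updated pixel by pixel. Bytes fully inside the rectangle are written in one go with the color repeated four times (color * 0x55). `fill_rect` and its parameters are hypothetical.

```python
def fill_rect(bml, w, x, y, rw, rh, color):
    """Fill rw x rh pixels with upper-left corner (x, y) in a BML of width w."""
    wmul = (w + 3) // 4                  # bytes per line
    full = (color & 3) * 0x55            # same 2-bit color in all 4 pixel slots
    for yy in range(y, y + rh):
        row = yy * wmul
        xx, end = x, x + rw
        while xx < end:
            if xx & 3 == 0 and end - xx >= 4:
                bml[row + (xx >> 2)] = full       # whole byte inside the rectangle
                xx += 4
            else:                                 # partial byte at an edge
                i = row + (xx >> 2)
                shift = 6 - 2 * (xx & 3)          # P0 assumed in the top two bits
                bml[i] = (bml[i] & ~(3 << shift) & 0xFF) | ((color & 3) << shift)
                xx += 1

small = bytearray(32 * 2)                # two lines of a 128-wide BML
fill_rect(small, 128, 1, 0, 6, 1, 2)     # pixels 1-6 of line 0, color 2
print(small[:2].hex())                   # 2aa8
```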
moulinaie (author), posted August 3, 2012

> I have most of a GPU line function written based on Michael Abrash's code from "Zen of Graphics Programming". [...] No one has asked for any GPU code yet, but if you are interested then I'll finish it up and post it. [...]

Hello Matthew,

As you can see, I have added the PLOT, LINES and FILL instructions to MLC (not using the GPU). This way, a user can plot without any knowledge of GPU assembly, or plot while keeping the GPU free for their own use.

But of course, it would be great to have the GPU routines as well. They will be much faster! I imagine something like a library installed at >4000 with a function dispatcher, which the user could easily call after filling a block of parameters. That would be easy to add to MLC too.

Guillaume.
Willsy, posted August 3, 2012

This is awesome! So, the line drawing code is running on the 4A's 9900, not the GPU?

Would you mind sharing the assembly code to do the line drawing and rectangles? I'd like to add bitmap support to TurboForth in the future, and this would make my life much easier.
moulinaie (author), posted August 3, 2012

> This is awesome! So, the line drawing code is running on the 4A's 9900, not the GPU?

Exactly!! The TMS9900 is already impressive...! What will it be like with the GPU..!

> Would you mind sharing the assembly code to do the line drawing and rectangles? I'd like to add bitmap support to TurboForth in the future, and this would make my life much easier.

No problem, I'll be happy to share! But is it for working with the BML or with the standard BITMAP mode of the TI? The encoding is really different.

Guillaume.
matthew180, posted August 3, 2012

> I imagine something like a library installed at >4000 with a function dispatcher, which the user could easily call after filling a block of parameters. That would be easy to add to MLC too.

Funny you should say that, because there are actually two functions installed at >4000 by default when the F18A is powered on. I didn't have time to write all the routines I wanted to include (lines, circles, sin, cos, etc.), but I did manage to get two in there:

1. Block copy
2. Load font

I did use a "dispatch" (or vector table) approach, but I forgot to leave room for user-defined functions, so there is only room for two vectors with the built-in code. I'm pretty mad at myself right now for not thinking about it and just leaving room for 16 vectors, or something like that.

This is the call interface code at >4000:

           LI   R15,>47FE      * Set the stack pointer to the bottom of the GRAM
    MAIN   IDLE                * Start out idle since the GPU is triggered at power-on

    * Vector jump table. Reads >3F00 for routine number.
           CLR  R1
           MOVB @>3F00,R1      * Load routine number into the MSB of R1
           SRL  R1,7           * Multiply by 2
           MOV  @VECTOR(R1),R0 * Get the address of the routine
           B    *R0            * Branch to routine

    VECTOR DATA BLKCPY         * Block Copy
           DATA FONTLD         * Font Load

You can see there are only two, and BLKCPY starts right after the vector table. Heh, I suppose block copy could be used to move the code down and expand the vector table, and the two existing entries could be patched. That would be pretty simple. Or I should have put the vector table at the bottom.
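The offset arithmetic in the dispatcher is easy to miss. MOVB leaves routine number n in the MSB of R1, so R1 holds n * 256, and SRL R1,7 turns that into n * 2, the byte offset of entry n in the table of 16-bit DATA words. The Python model below is a sketch of that arithmetic. `vector_offset`, `dispatch` and the lambda stand-ins for BLKCPY/FONTLD are hypothetical. Like the original, it has no range check, so n >= 2 reads past the table; in Python that raises IndexError.

```python
def vector_offset(n):
    """Byte offset of vector entry n, computed the way the >4000 code does it."""
    r1 = (n & 0xFF) << 8          # CLR R1 / MOVB @>3F00,R1: n lands in the MSB
    return r1 >> 7                # SRL R1,7  ->  n * 2

def dispatch(n, vector):
    """vector: list of routines, one 2-byte DATA word per entry."""
    return vector[vector_offset(n) // 2]()    # MOV @VECTOR(R1),R0 / B *R0

# Hypothetical stand-ins for the two built-in routines.
vector = [lambda: "BLKCPY", lambda: "FONTLD"]
print(vector_offset(1))        # 2
print(dispatch(1, vector))     # FONTLD
```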
moulinaie (author), posted August 3, 2012

> Funny you should say that, because there are actually two functions installed at >4000 by default when the F18A is powered on. ...

I think it was right to ship the ordered F18As without waiting; lots of us were waiting for them. Now that we have them, the ideas can come. Someday, an update could add the most interesting/useful ones and provide a solid library usable by MLC in the XB environment, Forth and assembler. That common work would benefit everyone.

Something else... You explained how to manage the 32-bit counter from a GPU program. I'd like to do it from MLC by reading registers 37 to 41. Can you explain how the four bits of the counter control register (VR37) work?

Thanks,
Guillaume.
matthew180, posted August 3, 2012 (edited)

There are two 32-bit counters. One is dedicated to the GPU; the other is accessible by the host system. The GPU can actually access both counters, but the GPU interfaces with each counter differently.

VR37 has 4 bits to control the counter:

    | 0 1 2 3 |   4   |  5   |  6  |  7  |
    | X X X X | RESET | LOAD | RUN | INC |

RESET and LOAD are only effective when you write to the register.

RESET: if this bit is set when you write to VR37, it will reset the counter to 0. This does *not* affect VR38 to VR41.

LOAD: if this bit is set when you write to VR37, the counter will be loaded with the values from VR38 to VR41.

RESET overrides LOAD if both are set to '1' when you write to VR37. Both RESET and LOAD clear themselves after you write to VR37; they are once-per-write indicators.

RUN is a switch that makes the counter free-run and increment every 10ns. You could use this to accurately time events or instruction loops. The counter will run from 0 to its max value in about 42.95 seconds.

INC makes the counter increment by 1. INC, like RUN, is also a "mode" of operation for the counter that comes into play when reading the counter via status registers SR4 to SR7. Even though INC remains a '1' after being written, the counter is only incremented when you write to VR37.

Some useful count values might be:

    count          | elapsed time
    ---------------+---------------------------
    100            | 1 microsecond
    100,000        | 1 millisecond
    100,000,000    | 1 second
    1,666,666      | 16.6 milliseconds (60Hz)
    3,333,300      | 33.3 milliseconds (30Hz)
    255            | 2.55 microseconds
    65,535         | 0.65535 milliseconds
    16,777,216     | 0.16777216 seconds
    4,294,967,296  | 42.94967296 seconds

Reading the counter's four bytes is done via SR4 to SR7, which have a special feature to make getting the data easier. This method also works for the 32-bit RNG, which uses SR8 to SR11.

Set VR15 to one of the counter's four status registers (SR4 to SR7). Reading the VDP status port then automatically pauses the counter if it is running continuously (the RUN bit is set). The value for the byte selected by VR15 is returned, and the status register number in VR15 automatically increments by one. This allows a consecutive read of all four bytes of the counter with only four status register reads, assuming you set VR15 to start with SR4. After you read the LSB of the counter's value (SR7), the counter automatically resumes if the RUN bit is set, or auto-increments if the INC bit is set. VR15 then resets to SR4.

This auto-stop and resume (or increment) feature does not work for the GPU, only for the host interface via the status port. The GPU can read the counter's or RNG's values, but if the counter or RNG is free-running, the GPU has to stop it first to get a single non-changing 32-bit value.

Edited August 3, 2012 by matthew180
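Here is a Python sketch of the host-side arithmetic described above. It covers the VR37 bit values, the SR4 to SR7 bytes combined into one 32-bit count, and counts converted to time. The bit values assume TI numbering, where bit 0 is the MSB and bit 7 is the LSB, so INC is the value 1. The helper names are hypothetical.

```python
# VR37 control bits; TI numbers bit 7 as the LSB, so INC is the value 1.
RESET = 0x08   # bit 4
LOAD = 0x04    # bit 5
RUN = 0x02     # bit 6
INC = 0x01     # bit 7
TICK_SECONDS = 1e-8             # one count every 10 ns while RUN is set

def counter_from_status(sr4, sr5, sr6, sr7):
    """Combine four status-register reads, SR4 (MSB) through SR7 (LSB)."""
    return (sr4 << 24) | (sr5 << 16) | (sr6 << 8) | sr7

def ticks_to_seconds(count):
    return count * TICK_SECONDS

print(format(RESET | RUN, "08b"))      # 00001010
print(ticks_to_seconds(2**32))         # about 42.95 s, the wrap time
```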
moulinaie (author), posted August 3, 2012

> VR37 has 4-bits to control the counter: [...]
> RESET: if this bit is set when you write to VR37, it will reset the counter to 0. [...]
> RUN is a switch that will cause the counter to free-run and increment every 10ns. [...]

Is it possible to write binary 00001010 in a single write to reset and run the counter, i.e. to start it from zero with one write?

Guillaume.
matthew180, posted August 3, 2012

For the GPU, writing to VR38 to VR41 sets the values to be loaded into the counter, just as for the host system. However, when the GPU *reads* VR38 to VR41, it gets the counter's (or RNG's) current value, *not* the values written to VR38 to VR41. The GPU can control the counter by writing to VR37, just like the host system.

For the GPU, it is easier to use its dedicated memory-mapped counter:

    ZERO   BYTE 0
    ONE    BYTE 1
    .
    .
    .
           MOVB @ZERO,@>8004   * Stop the counter
           CLR  @>8000         * Clear MSword (works because of the even memory address)
           CLR  @>8002         * Clear LSword
           MOVB @ONE,@>8004    * Free run

The GPU's counter and RNG work the same way; only the base address is different:

    32-bit counter
      8xx0 - MSB
      8xx1
      8xx2
      8xx3 - LSB
      8xx4 - write >x1 = free run, >x0 = stop
      8xx6 - write >x1 to single step

    32-bit Linear Feedback Shift-Register (LFSR) Random Number Generator (RNG)
      9xx0 - MSB
      9xx1
      9xx2
      9xx3 - LSB
      9xx4 - write >x1 = free run, >x0 = stop
      9xx6 - write >x1 to single step

The GPU's memory map is:

    VRAM              14-bit, 16K @ >0000 to >3FFF  (0011 1111 1111 1111)
    GRAM              11-bit, 2K  @ >4000 to >47FF  (0100 x111 1111 1111)
    PRAM               7-bit, 128 @ >5000 to >5x7F  (0101 xxxx x111 1111)
    VREG               6-bit, 64  @ >6000 to >6x3F  (0110 xxxx xx11 1111)
    current scanline             @ >7000 to >7xx0  (0111 xxxx xxxx xxx0)
    blanking                     @ >7001 to >7xx1  (0111 xxxx xxxx xxx1)
    32-bit counter               @ >8000 to >8xx6  (1000 xxxx xxxx x110)
    32-bit rng                   @ >9000 to >9xx6  (1001 xxxx xxxx x110)
    F18A version                 @ >A000 to >Axxx  (1010 xxxx xxxx xxxx)
    GPU status data              @ >B000 to >Bxxx  (1011 xxxx xxxx xxxx)

    VRAM = VDP RAM
    GRAM = GPU-only RAM
    PRAM = Palette RAM - 16-bit access ONLY. Byte instructions WILL NOT update the palette registers
    VREG = VDP Registers - 1 byte per register; unused registers return a value of 0

GPU Status is write-only and intended for the host system to read via SR2. The current scan line, blanking, and F18A version are read-only.
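As a reading aid for the map above, here is a Python address decoder. The masks follow the bit patterns in the table. The 'x' bits are ignored, so regions mirror; for example, >4800 lands on the same GRAM byte as >4000. The table does not say what >C000 to >FFFF do, so the sketch just returns None there. `decode` and the region labels are hypothetical names.

```python
# (name, mask) per top nibble, taken from the bit patterns in the table.
MAP = {
    0x4: ("GRAM", 0x07FF),
    0x5: ("PRAM", 0x007F),     # 16-bit access only on the real hardware
    0x6: ("VREG", 0x003F),
    0x8: ("COUNTER", 0x0007),
    0x9: ("RNG", 0x0007),
    0xA: ("VERSION", 0x0000),
    0xB: ("GPU STATUS", 0x0000),
}

def decode(addr):
    """Return (region, offset) for a 16-bit GPU address, or None if not listed."""
    addr &= 0xFFFF
    nib = addr >> 12
    if nib <= 0x3:                       # >0000 to >3FFF, 14-bit VRAM
        return ("VRAM", addr & 0x3FFF)
    if nib == 0x7:                       # bit 0 selects scanline or blanking
        return ("BLANKING", 0) if addr & 1 else ("SCANLINE", 0)
    if nib in MAP:
        name, mask = MAP[nib]
        return (name, addr & mask)
    return None

print(decode(0x4800))   # ('GRAM', 0)
print(decode(0x6025))   # ('VREG', 37)
```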
matthew180, posted August 3, 2012

> Is it possible to write at once binary 00001010 to reset and run the counter? i.e. to start it from zero in one write?

Yes.

    * Free run test
           LI   R0,>250A       * VR37, reset and run
           BL   @VWTR
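To see exactly what reaches the VDP for `LI R0,>250A` / `BL @VWTR`, here is a Python model of the two bytes VWTR writes to VDPWA (>8C02). The value byte goes first, then the register number with the top bits set to binary 10 by ORI R0,>8000. `vwtr_bytes` is a hypothetical name.

```python
def vwtr_bytes(r0):
    """Return the two bytes VWTR writes to VDPWA for R0 = (register << 8) | value."""
    r0 &= 0xFFFF
    value = r0 & 0x00FF               # MOVB @R0LB,@VDPWA: the value goes first
    reg_byte = (r0 | 0x8000) >> 8     # ORI R0,>8000 then MOVB R0,@VDPWA
    return [value, reg_byte]

print([hex(b) for b in vwtr_bytes(0x250A)])   # ['0xa', '0xa5']: VR37 <- >0A
print([hex(b) for b in vwtr_bytes(0x0F02)])   # ['0x2', '0x8f']: VR15 <- >02 (read SR2)
```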
moulinaie (author), posted August 5, 2012 (edited)

Hi again,

I added TIMER support to MLC. So now we have a counter with 10ns precision, which is great for optimizing the speed of short pieces of assembly code.

The internal counter of the F18A is limited to 32 bits (as Matthew said, that's about 43 sec). I extended this with a "TIMER SUM" that accumulates the current value of the TIMER into a 48-bit zone, so the limit is now 2^47 * 10ns = 1,407,374 sec = 391 hours. (Don't use the 48th bit, as this would cause a sign problem that I didn't handle.)

So if you use "TIMER SUM RESET RUN" several times in your program, it adds the current value to the sum, then resets the timer and keeps counting. This way you can get around the 43-second limit.

Then this value can easily be converted to a float number and sent back to the calling XB program.

This is a short example that runs the timer and returns the time:

    $MLC F
    100 10 3000
    300 INPUT "Ready : ":K$
    305 CALL LINK("TEST",N)
    310 PRINT N*1E-8;" seconds" ; 1E-8 converts 10 ns ticks into seconds
    320 GOTO 300
    $TEST
    XREGENABLE ; enable use of extra F18A registers
    TIMER RESET RUN ; reset timer and run it
    KEYWAITNEW 0 ; wait for a new key (to avoid detecting the last "ENTER")
    TIMER READ TOFLOAT ; read timer value and turn it into a FLOAT in float register 0
    PUTFLOAT 1(0) ; return the float value to the first argument N
    $$
    $END

I think this will be the final version 1.30 of the Precompiler and MLC. I will update the manual and the ZIP on my page.

Guillaume.

Edited August 5, 2012 by moulinaie
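Here is a hypothetical Python model of the TIMER SUM idea: each 32-bit counter reading is added into a total that must stay below 2^47 ticks. Raising an error at the limit is an assumption of this sketch; MLC's actual behaviour at the limit isn't described. `timer_sum` and `SUM_BITS` are made-up names.

```python
TICK_SECONDS = 1e-8                  # the counter advances every 10 ns
SUM_BITS = 47                        # usable bits of the 48-bit zone (top bit = sign)

def timer_sum(total, counter_value):
    """Add one 32-bit counter reading to the running total (like TIMER SUM)."""
    total += counter_value & 0xFFFFFFFF
    if total >= 1 << SUM_BITS:
        raise OverflowError("47-bit TIMER SUM limit exceeded")
    return total

total = 0
for reading in (4_000_000_000, 4_000_000_000, 4_000_000_000):   # three ~40 s laps
    total = timer_sum(total, reading)
print(total * TICK_SECONDS)                       # about 120 seconds
print((1 << SUM_BITS) * TICK_SECONDS / 3600)      # about 390.9 hours
```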