GT Turbo Posted August 13, 2007

For people who want to get the most out of their Jag, have a look here:
http://www.jagware.org/index.php?showtopic=464
and here:
http://www.jagware.org/index.php?showtopic=465

SCPCD has timed some operations and gives the measurements. These are real timings, and they can help when optimising Jaguar code.

GT Turbo (Jagware)
Gorf Posted August 14, 2007 (edited)

Hey Jagware crew... since I really want to see the difference, and don't have an analyzer, here is a GPU version of the same. Try this out in main: just align it on a page boundary and have the 68k start it. Stop the 68k after the start, though. Let's see the analyzer results.

    BLITBASE    .equr   r14     ; base of blitter registers

    A1FLAGS     EQU     1       ; register index defines
    A1CLIP      EQU     2
    A1PIXEL     EQU     3
    A1STEP      EQU     4
    A1FSTEP     EQU     5
    A1FPIXEL    EQU     6
    A1INC       EQU     7
    A1FINC      EQU     8
    A2BASE      EQU     9
    A2FLAGS     EQU     10
    A2MASK      EQU     11
    A2PIXEL     EQU     12
    A2STEP      EQU     13
    BCMD        EQU     14
    BCOUNT      EQU     15
    BSRCDH      EQU     16
    BSRCDL      EQU     17
    BDSTDH      EQU     18
    BDSTDL      EQU     19
    BDSTZH      EQU     20
    BDSTZL      EQU     21
    BSRCZ1H     EQU     22
    BSRCZ1L     EQU     23
    BSRCZ2H     EQU     24
    BSRCZ2L     EQU     25
    BPATDH      EQU     26
    BPATDL      EQU     27
    BIINC       EQU     28
    BZINC       EQU     29
    BSTOP       EQU     30
    BLITI0      EQU     31
    BLITI1      EQU     32

    BLITBASEHI  .equr   r15     ; picks up where the last index register leaves off...
                                ; not using most of these here, but good to have for
                                ; future endeavors... this is the same location as BLIT_I2
    BLITI3      EQU     1
    BLITZ0      EQU     2
    BLITZ1      EQU     3
    BLITZ2      EQU     4
    BLITZ3      EQU     5

        moveq   #0,r0
        movei   #A1_BASE,BLITBASE
        movei   #PITCH1|PIXEL32|WID128|XADDPHR,r2
        movei   #source,r3
        movei   #destination,r4
        movei   #$00010400,r5
        movei   #SRCEN|LFU_REPLACE,r6
        movei   #G_CTRL,r7

        store   r2,(BLITBASE+A2FLAGS)
        store   r3,(BLITBASE+A2BASE)
        store   r0,(BLITBASE+A2PIXEL)
        store   r0,(BLITBASE+A2STEP)
        store   r2,(BLITBASE+A1FLAGS)
        store   r4,(BLITBASE)           ; A1_BASE is at offset 0 (no A1BASE index is defined above)
        store   r0,(BLITBASE+A1PIXEL)
        store   r0,(BLITBASE+A1FPIXEL)
        store   r0,(BLITBASE+A1STEP)
        store   r0,(BLITBASE+A1FSTEP)
        store   r0,(BLITBASE+A1CLIP)
        store   r0,(BLITBASE+A1INC)
        store   r0,(BLITBASE+A1FINC)
        store   r5,(BLITBASE+BCOUNT)
        store   r6,(BLITBASE+BCMD)      ; start the blit
        store   r0,(r7)                 ; done... stop GPU
        nop
        nop
        nop
        nop

Edited August 14, 2007 by Gorf
SCPCD Posted August 15, 2007 (edited)

This is a first try; I'll run other tests in the future. I have made some changes so that this is easier to see on the LA (logic analyzer).

        move.l  #ints,VBL_VECTOR
        move.w  #%1111100000010,INT1    ; clear all pending ints & enable GPU interrupt
    gorf_test:
        move.l  #gpu_code_start,G_PC
        move.l  #1,G_CTRL
        stop    #$2100                  ; wait for the GPU to stop
        bra     gorf_test

    ints:
        move.w  #%1111100000010,INT1
        move.w  #0,INT2                 ; 68k back to normal level
        rte

        .qphrase
    gpu_code_start:
        .gpu

    BLITBASE    .equr   r14     ; base of blitter registers

    A1FLAGS     EQU     1       ; register index defines
    A1CLIP      EQU     2
    A1PIXEL     EQU     3
    A1STEP      EQU     4
    A1FSTEP     EQU     5
    A1FPIXEL    EQU     6
    A1INC       EQU     7
    A1FINC      EQU     8
    A2BASE      EQU     9
    A2FLAGS     EQU     10
    A2MASK      EQU     11
    A2PIXEL     EQU     12
    A2STEP      EQU     13
    BCMD        EQU     14
    BCOUNT      EQU     15
    BSRCDH      EQU     16
    BSRCDL      EQU     17
    BDSTDH      EQU     18
    BDSTDL      EQU     19
    BDSTZH      EQU     20
    BDSTZL      EQU     21
    BSRCZ1H     EQU     22
    BSRCZ1L     EQU     23
    BSRCZ2H     EQU     24
    BSRCZ2L     EQU     25
    BPATDH      EQU     26
    BPATDL      EQU     27
    BIINC       EQU     28
    BZINC       EQU     29
    BSTOP       EQU     30
    BLITI0      EQU     31
    BLITI1      EQU     32

    BLITBASEHI  .equr   r15     ; picks up where the last index register leaves off...
                                ; not using most of these here, but good to have for
                                ; future endeavors... this is the same location as BLIT_I2
    BLITI3      EQU     1
    BLITZ0      EQU     2
    BLITZ1      EQU     3
    BLITZ2      EQU     4
    BLITZ3      EQU     5

        moveq   #0,r0
        moveq   #2,r1                   ; for the G_CTRL register: interrupt the 68k
        movei   #A1_BASE,BLITBASE
        movei   #PITCH1|PIXEL32|WID128|XADDPHR,r2
        movei   #source,r3              ; = somewhere in DRAM, phrase aligned
        movei   #destination,r4         ; = G_RAM+$8000
        movei   #$00010010,r5
        movei   #SRCEN|LFU_REPLACE,r6
        movei   #G_CTRL,r7

        store   r2,(BLITBASE+A2FLAGS)
        store   r3,(BLITBASE+A2BASE)
        store   r0,(BLITBASE+A2PIXEL)
        store   r0,(BLITBASE+A2STEP)
        store   r2,(BLITBASE+A1FLAGS)
        store   r4,(BLITBASE)
        store   r0,(BLITBASE+A1PIXEL)
        store   r0,(BLITBASE+A1FPIXEL)
        store   r0,(BLITBASE+A1STEP)
        store   r0,(BLITBASE+A1FSTEP)
        store   r0,(BLITBASE+A1CLIP)
        store   r0,(BLITBASE+A1INC)
        store   r0,(BLITBASE+A1FINC)
        store   r5,(BLITBASE+BCOUNT)
        store   r6,(BLITBASE+BCMD)      ; start the blit

        nop                             ; 22 nops: padding so the blit can finish
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        nop

        store   r1,(r7)                 ; done... stop GPU and launch a CPU interrupt
        nop
        nop

        .68000
    .gpu_code_end:
        .dc.l   0

At the moment I don't know an easy way to trigger on your test, so I made it repeat. But since the 68k is stopped, the only way to restart once the blit has finished is to wake the 68k with an interrupt, and there is no blitter interrupt for the 68k. And I cannot launch the CPU interrupt before the blitter is idle, otherwise the 68k takes priority... so, for a first try, I added nops and reduced the length of the blit.

This is the result:

At A: the first 2 instructions of the GPU.
We have 13 cycles for 2 consecutive moveq.
12 cycles for each "32-bit GPU instruction" read, up to the "store" instructions.
Then each pair of "store rn,(rn+x)" takes 14 cycles.

When the blitter starts, we can see GPU instruction fetches interleaved with blitter accesses and, more interestingly, each GPU instruction takes less time to read from DRAM!
The time between two "32-bit GPU instructions" is not constant but seems to be about 8 cycles.
The time between two blitter accesses is 8 cycles.

I'll post updates in the future; right now I have other things to do.

Edited August 15, 2007 by SCPCD
Gorf Posted August 15, 2007

SCPCD wrote:
> At the moment I don't know an easy way to trigger on your test, so I made it repeat. But since the 68k is stopped, the only way to restart once the blit has finished is to wake the 68k with an interrupt, and there is no blitter interrupt for the 68k. And I cannot launch the CPU interrupt before the blitter is idle, otherwise the 68k takes priority...

Oh... yeah... forgot about the blitter running. You can have the GPU wait for the blitter, then have the blitter stop and use the GPU interrupt to wake the 68k.

SCPCD wrote:
> When the blitter starts, we can see GPU instruction fetches interleaved with blitter accesses and, more interestingly, each GPU instruction takes less time to read from DRAM! The time between two "32-bit GPU instructions" is not constant but seems to be about 8 cycles. The time between two blitter accesses is 8 cycles.

I'm not surprised by the faster DRAM instruction reads. I think the pipeline is the issue out in main. It seems to pipeline at 64 bits instead of its internal 32 bits... I'm guessing here, BTW. It makes sense when you consider how main code jumps work.
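
Gorf's suggestion above (let the GPU wait for the blitter, then raise the GPU interrupt to wake the 68k) could look roughly like the sketch below. It is not code from either poster. It assumes that reading B_CMD returns the blitter status with bit 0 set when the blitter is idle, that r8 is free, and that r1 (= 2) and r7 (= G_CTRL) are set up as in SCPCD's version. The label .wait_blit is a made-up name. The loop would take the place of the row of nops after the store to BCMD.

```asm
        store   r6,(BLITBASE+BCMD)      ; start the blit (as in the posts above)

; Assumption: a read of B_CMD gives the blitter status, bit 0 = idle.
; r8 is assumed to be unused by the rest of the test.
.wait_blit:
        load    (BLITBASE+BCMD),r8      ; read blitter status
        btst    #0,r8                   ; Z flag is set while bit 0 (idle) is clear
        jr      EQ,.wait_blit           ; not idle yet: poll again
        nop                             ; jump delay slot

        store   r1,(r7)                 ; r1 = 2: stop the GPU and interrupt the 68k
        nop
        nop
```

Because the test runs from main RAM, the polling loop's own instruction fetches would show up on the analyzer between the blitter accesses, so the trace would not be directly comparable with the nop-padded run.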