Get the hell from your Jag !

GT Turbo · August 13, 2007

For people who want to get the hell out of their Jag, have a look here:

http://www.jagware.org/index.php?showtopic=464

and here:

http://www.jagware.org/index.php?showtopic=465

SCPCD has timed some operations and posted the measurements. These are real timings, so they can help when optimising Jaguar code.

GT Turbo (Jagware)

Gorf · August 14, 2007

Hey Jagware crew.... since I really want to see the difference....and don't have an analyzer...

...A GPU version of the same.....

... try this out in main....just align it on a page boundary

and have the 68k start it....stop the 68k after the start though.

Let's see the analyzer results.

BLITBASE	.equr 		r14; base of blitter registers

A1FLAGS		EQU		1; register index defines
A1CLIP		EQU		2
A1PIXEL		EQU		3
A1STEP		EQU		4	
A1FSTEP		EQU		5
A1FPIXEL		EQU		6
A1INC		EQU		7
A1FINC		EQU		8
A2BASE		EQU		9
A2FLAGS		EQU		10
A2MASK		EQU		11
A2PIXEL		EQU		12
A2STEP		EQU		13
BCMD		EQU		14
BCOUNT		EQU		15
BSRCDH		EQU		16
BSRCDL		EQU		17
BDSTDH		EQU		18
BDSTDL		EQU		19
BDSTZH		EQU		20
BDSTZL		EQU		21
BSRCZ1H		EQU		22
BSRCZ1L		EQU		23
BSRCZ2H		EQU		24
BSRCZ2L		EQU		25
BPATDH		EQU		26
BPATDL		EQU		27
BIINC		EQU		28
BZINC		EQU		29	
BSTOP		EQU		30
BLITI0 		EQU		31
BLITI1		EQU		32	

BLITBASEHI	.equr 		r15; pick up where the last index register leaves off....not using most of these here but
			; good to have for future endeavors....this is the same location as BLIT_I2
BLITI3		EQU		1		
BLITZ0		EQU		2			
BLITZ1		EQU		3	
BLITZ2		EQU		4	
BLITZ3		EQU		5	


moveq 	#0,r0
movei	#A1_BASE,BLITBASE
movei 	#PITCH1|PIXEL32|WID128|XADDPHR,r2
movei	#source,r3
movei	#destination,r4
movei	#$00010400,r5
movei	#SRCEN|LFU_REPLACE,r6
movei	#G_CTRL,r7

store	r2,(BLITBASE+A2FLAGS)
store	r3,(BLITBASE+A2BASE)
store	r0,(BLITBASE+A2PIXEL)
store	r0,(BLITBASE+A2STEP)
store	r2,(BLITBASE+A1FLAGS)
store	r4,(BLITBASE)	; A1_BASE sits at offset 0; no A1BASE index is defined above
store	r0,(BLITBASE+A1PIXEL)
store	r0,(BLITBASE+A1FPIXEL)
store	r0,(BLITBASE+A1STEP)
store	r0,(BLITBASE+A1FSTEP)
store	r0,(BLITBASE+A1CLIP)
store	r0,(BLITBASE+A1INC)
store	r0,(BLITBASE+A1FINC)
store	r5,(BLITBASE+BCOUNT)
store	r6,(BLITBASE+BCMD)
store	r0,(r7)	;done...stop GPU
nop
nop
nop
nop

Edited August 14, 2007 by Gorf

SCPCD · August 15, 2007

It's a first try; I'll run other tests in the future.

I have made some changes to make this easier to see on the logic analyzer (LA).

move.l		#ints,VBL_VECTOR
move.w		#%1111100000010,INT1;clear all pending interrupts and enable the GPU interrupt

gorf_test:
move.l		#gpu_code_start,G_PC
move.l		#1,G_CTRL
stop		#$2100;wait until the GPU stops and interrupts the 68k

bra			gorf_test

ints:
move.w		#%1111100000010,INT1
move.w		#0,INT2;68k to normal level
rte

.qphrase
gpu_code_start:
.gpu

BLITBASE	.equr		 r14; base of blitter registers

A1FLAGS		EQU		1; register index defines
A1CLIP		EQU		2
A1PIXEL		EQU		3
A1STEP		EQU		4	
A1FSTEP		EQU		5
A1FPIXEL		EQU		6
A1INC		EQU		7
A1FINC		EQU		8
A2BASE		EQU		9
A2FLAGS		EQU		10
A2MASK		EQU		11
A2PIXEL		EQU		12
A2STEP		EQU		13
BCMD		EQU		14
BCOUNT		EQU		15
BSRCDH		EQU		16
BSRCDL		EQU		17
BDSTDH		EQU		18
BDSTDL		EQU		19
BDSTZH		EQU		20
BDSTZL		EQU		21
BSRCZ1H		EQU		22
BSRCZ1L		EQU		23
BSRCZ2H		EQU		24
BSRCZ2L		EQU		25
BPATDH		EQU		26
BPATDL		EQU		27
BIINC		EQU		28
BZINC		EQU		29	
BSTOP		EQU		30
BLITI0		 EQU		31
BLITI1		EQU		32	

BLITBASEHI	.equr		 r15; pick up where the last index register leaves off....not using most of these here but
		 ; good to have for future endeavors....this is the same location as BLIT_I2
BLITI3		EQU		1		
BLITZ0		EQU		2			
BLITZ1		EQU		3	
BLITZ2		EQU		4	
BLITZ3		EQU		5	


moveq	#0,r0
moveq	#2,r1;for G_CTRL register : interrupt the 68k
movei	#A1_BASE,BLITBASE
movei	#PITCH1|PIXEL32|WID128|XADDPHR,r2
movei	#source,r3;=somewhere in DRAM phrase aligned
movei	#destination,r4;=G_RAM+$8000
movei	#$00010010,r5
movei	#SRCEN|LFU_REPLACE,r6
movei	#G_CTRL,r7

store	r2,(BLITBASE+A2FLAGS)
store	r3,(BLITBASE+A2BASE)
store	r0,(BLITBASE+A2PIXEL)
store	r0,(BLITBASE+A2STEP)
store	r2,(BLITBASE+A1FLAGS)
store	r4,(BLITBASE)
store	r0,(BLITBASE+A1PIXEL)
store	r0,(BLITBASE+A1FPIXEL)
store	r0,(BLITBASE+A1STEP)
store	r0,(BLITBASE+A1FSTEP)
store	r0,(BLITBASE+A1CLIP)
store	r0,(BLITBASE+A1INC)
store	r0,(BLITBASE+A1FINC)
store	r5,(BLITBASE+BCOUNT)
store	r6,(BLITBASE+BCMD)
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
store	r1,(r7) ;done...stop GPU and launch a CPU interrupt
nop
nop
.68000
.gpu_code_end:
.dc.l	0

At the moment I don't know how to trigger easily on your test, so I added a loop that repeats it.

But since the 68k is stopped, the only way to restart it when the blit is finished is with an interrupt, and there is no blitter interrupt for the 68k.

And I cannot raise the CPU interrupt before the blitter is idle, otherwise the 68k takes priority...

So I added nops and reduced the length of the blit for a first try.

This is the result:

At A: the first 2 GPU instructions.

We have 13 cycles for 2 consecutive moveq.

12 cycles for each "32-bit GPU instruction" read, up to the "store" instructions.

Then each pair of "store rn,(rn+x)" takes 14 cycles.

When the blitter starts, we can see GPU instruction reads interleaved with blitter accesses, and more interesting: each GPU instruction then takes less time to read from DRAM!

The time between 2 "32-bit GPU instruction" reads is not constant but seems to be about 8 cycles.

The time between 2 blits is 8 cycles.

I'll post updates in the future; right now I have other things to do.

Edited August 15, 2007 by SCPCD

Gorf · August 15, 2007

SCPCD wrote:
> At the moment I don't know how to trigger easily on your test, so I added a loop that repeats it.
> But since the 68k is stopped, the only way to restart it when the blit is finished is with an interrupt, and there is no blitter interrupt for the 68k.
>
> And I cannot raise the CPU interrupt before the blitter is idle, otherwise the 68k takes priority...

Oh...yeah....forgot about the blitter running... ..you can have the GPU wait for the Blitter then

have the blitter stop and use the GPU interrupt to wake the 68k.
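A minimal sketch of that idea, which would take the place of the run of nops in SCPCD's listing. It assumes the usual Jaguar register addresses for B_CMD and G_CTRL and the usual meaning of bit 0 of a B_CMD read (set when the blitter is idle); the scratch registers r8 and r9 and the label name are made up for illustration and are not from either listing.

```asm
; Sketch only: poll the blitter until it is idle, then wake the 68k.
; Drop the two EQU lines if your include file already defines these names.
B_CMD		EQU	$F02238		; blitter command (write) / status (read)
G_CTRL		EQU	$F02114		; GPU control register

	movei	#B_CMD,r9		; r9 is a hypothetical scratch register
	movei	#G_CTRL,r7
	moveq	#2,r1			; bit 1 set: interrupt the 68k, bit 0 clear: stop the GPU
.wait_blit:
	load	(r9),r8			; reading B_CMD returns the blitter status
	btst	#0,r8			; bit 0 is set once the blitter is idle
	jr	EQ,.wait_blit		; bit still clear (Z set): busy, poll again
	nop				; executed after the jr on every pass
	store	r1,(r7)			; blitter idle: stop the GPU and interrupt the 68k
	nop
	nop
```

Keep in mind that when this runs from main RAM, the polling loop adds its own instruction fetches and B_CMD reads to the bus, so the trace will not show quite the same thing as the nop version.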

SCPCD wrote:
> When the blitter starts, we can see GPU instruction reads interleaved with blitter accesses, and more interesting: each GPU instruction then takes less time to read from DRAM!
> The time between 2 "32-bit GPU instruction" reads is not constant but seems to be about 8 cycles.
>
> The time between 2 blits is 8 cycles.

I'm not surprised by the faster DRAM instruction reads. I think the pipeline is the issue out in

main. It seems to pipeline at 64 bits instead of its internal 32 bits....I'm guessing this BTW.

It makes sense when you consider how main code jumps work.
