
Get the hell from your Jag !


GT Turbo


For people who want to get the hell out of their Jag, have a look here:

 

http://www.jagware.org/index.php?showtopic=464

 

and here :

 

http://www.jagware.org/index.php?showtopic=465

 

 

SCPCD has timed some operations and posted the measurements. These are real timings, so they can help when optimising Jaguar code.

 

 

 

GT Turbo (Jagware)


Hey Jagware crew.... since I really want to see the difference....and don't have an analyzer...

 

...A GPU version of the same.....

 

... try this out in main....just align it on a page boundary

and have the 68k start it....stop the 68k after the start though.

 

Let's see the analyzer results.

 

 

BLITBASE	.equr 		r14; base of blitter registers

A1FLAGS		EQU		1; register index defines
A1CLIP		EQU		2
A1PIXEL		EQU		3
A1STEP		EQU		4	
A1FSTEP		EQU		5
A1FPIXEL		EQU		6
A1INC		EQU		7
A1FINC		EQU		8
A2BASE		EQU		9
A2FLAGS		EQU		10
A2MASK		EQU		11
A2PIXEL		EQU		12
A2STEP		EQU		13
BCMD		EQU		14
BCOUNT		EQU		15
BSRCDH		EQU		16
BSRCDL		EQU		17
BDSTDH		EQU		18
BDSTDL		EQU		19
BDSTZH		EQU		20
BDSTZL		EQU		21
BSRCZ1H		EQU		22
BSRCZ1L		EQU		23
BSRCZ2H		EQU		24
BSRCZ2L		EQU		25
BPATDH		EQU		26
BPATDL		EQU		27
BIINC		EQU		28
BZINC		EQU		29	
BSTOP		EQU		30
BLITI0 		EQU		31
BLITI1		EQU		32	

BLITBASEHI	.equr 		r15; pick up where the last index register leaves off....not using most of these here but
			; good to have for future endeavors....this is the same location as BLIT_I2
BLITI3		EQU		1		
BLITZ0		EQU		2			
BLITZ1		EQU		3	
BLITZ2		EQU		4	
BLITZ3		EQU		5	


moveq 	#0,r0
movei	#A1_BASE,BLITBASE
movei 	#PITCH1|PIXEL32|WID128|XADDPHR,r2
movei	#source,r3
movei	#destination,r4
movei	#$00010400,r5
movei	#SRCEN|LFU_REPLACE,r6
movei	#G_CTRL,r7

store	r2,(BLITBASE+A2FLAGS)
store	r3,(BLITBASE+A2BASE)
store	r0,(BLITBASE+A2PIXEL)
store	r0,(BLITBASE+A2STEP)
store	r2,(BLITBASE+A1FLAGS)
store	r4,(BLITBASE)	; A1_BASE is at offset 0 from BLITBASE (no A1BASE index is defined)
store	r0,(BLITBASE+A1PIXEL)
store	r0,(BLITBASE+A1FPIXEL)
store	r0,(BLITBASE+A1STEP)
store	r0,(BLITBASE+A1FSTEP)
store	r0,(BLITBASE+A1CLIP)
store	r0,(BLITBASE+A1INC)
store	r0,(BLITBASE+A1FINC)
store	r5,(BLITBASE+BCOUNT)
store	r6,(BLITBASE+BCMD)
store	r0,(r7)	;done...stop GPU
nop
nop
nop
nop
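
The index defines in this listing are long-word offsets from the blitter base held in r14, so `store rn,(BLITBASE+index)` writes to base + 4 × index. As a quick sanity check, here is a minimal Python sketch that maps a few of the indexes back to addresses. It assumes A1_BASE is $F02200, as in the usual Jaguar equates file; that value is not stated in this thread.

```python
# Assumption: A1_BASE is $F02200 (usual Jaguar equates; not given in the thread).
A1_BASE = 0xF02200

# A few of the index defines from the listing above.
INDEX = {"A1FLAGS": 1, "A2BASE": 9, "A2FLAGS": 10, "BCMD": 14, "BCOUNT": 15}

def blit_address(name):
    """Address reached by store rn,(r14+index): base plus 4 bytes per index."""
    return A1_BASE + 4 * INDEX[name]

for name in INDEX:
    print(f"{name:8s} -> ${blit_address(name):06X}")
```

A1_BASE itself sits at offset 0, which is why it is reached with a plain `(BLITBASE)` rather than an index.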

Edited by Gorf

This is a first try; I'll make other tests in the future.

I have made some changes so that this is easier to see on the LA (logic analyzer).

 

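move.l		#ints,VBL_VECTOR	; install the interrupt handler
move.w		#%1111100000010,INT1	; clear all pending ints and enable the GPU interrupt

gorf_test:
move.l		#gpu_code_start,G_PC	; point the GPU at the test code
move.l		#1,G_CTRL		; start the GPU
stop		#$2100			; halt the 68k until the GPU interrupts it

bra			gorf_test		; repeat the test

ints:
move.w		#%1111100000010,INT1	; clear pending ints, keep the GPU interrupt enabled
move.w		#0,INT2			; 68k back to normal level
rte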

.qphrase
gpu_code_start:
.gpu

BLITBASE	.equr		 r14; base of blitter registers

A1FLAGS		EQU		1; register index defines
A1CLIP		EQU		2
A1PIXEL		EQU		3
A1STEP		EQU		4	
A1FSTEP		EQU		5
A1FPIXEL		EQU		6
A1INC		EQU		7
A1FINC		EQU		8
A2BASE		EQU		9
A2FLAGS		EQU		10
A2MASK		EQU		11
A2PIXEL		EQU		12
A2STEP		EQU		13
BCMD		EQU		14
BCOUNT		EQU		15
BSRCDH		EQU		16
BSRCDL		EQU		17
BDSTDH		EQU		18
BDSTDL		EQU		19
BDSTZH		EQU		20
BDSTZL		EQU		21
BSRCZ1H		EQU		22
BSRCZ1L		EQU		23
BSRCZ2H		EQU		24
BSRCZ2L		EQU		25
BPATDH		EQU		26
BPATDL		EQU		27
BIINC		EQU		28
BZINC		EQU		29	
BSTOP		EQU		30
BLITI0		 EQU		31
BLITI1		EQU		32	

BLITBASEHI	.equr		 r15; pick up where the last index register leaves off....not using most of these here but
		 ; good to have for future endeavors....this is the same location as BLIT_I2
BLITI3		EQU		1		
BLITZ0		EQU		2			
BLITZ1		EQU		3	
BLITZ2		EQU		4	
BLITZ3		EQU		5	


moveq	#0,r0
moveq	#2,r1;for G_CTRL register : interrupt the 68k
movei	#A1_BASE,BLITBASE
movei	#PITCH1|PIXEL32|WID128|XADDPHR,r2
movei	#source,r3;=somewhere in DRAM phrase aligned
movei	#destination,r4;=G_RAM+$8000
movei	#$00010010,r5
movei	#SRCEN|LFU_REPLACE,r6
movei	#G_CTRL,r7

store	r2,(BLITBASE+A2FLAGS)
store	r3,(BLITBASE+A2BASE)
store	r0,(BLITBASE+A2PIXEL)
store	r0,(BLITBASE+A2STEP)
store	r2,(BLITBASE+A1FLAGS)
store	r4,(BLITBASE)
store	r0,(BLITBASE+A1PIXEL)
store	r0,(BLITBASE+A1FPIXEL)
store	r0,(BLITBASE+A1STEP)
store	r0,(BLITBASE+A1FSTEP)
store	r0,(BLITBASE+A1CLIP)
store	r0,(BLITBASE+A1INC)
store	r0,(BLITBASE+A1FINC)
store	r5,(BLITBASE+BCOUNT)
store	r6,(BLITBASE+BCMD)
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
store	r1,(r7) ;done...stop GPU and launch a CPU interrupt
nop
nop
.68000
.gpu_code_end:
.dc.l	0
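
The binary constant written to INT1 in the 68k part of this listing is easier to read once split into its enable and clear fields. Here is a small Python sketch of that decoding. It assumes the usual Jaguar layout (bits 0-4 enable an interrupt source, bits 8-12 clear a pending one); the source names are mine, used only as labels.

```python
# Assumed bit order, following the usual Jaguar equates (C_*ENA / C_*CLR).
SOURCES = ["video", "GPU", "object processor", "timer", "Jerry/DSP"]

def decode_int1(value):
    """Return (enabled, cleared) source lists for a value written to INT1."""
    enabled = [s for i, s in enumerate(SOURCES) if value & (1 << i)]
    cleared = [s for i, s in enumerate(SOURCES) if value & (1 << (i + 8))]
    return enabled, cleared

enabled, cleared = decode_int1(0b1111100000010)
print("enabled:", enabled)
print("cleared:", cleared)
```

Under those assumptions the value enables only the GPU interrupt and clears all five pending flags, which matches the comment in the listing.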

 

At the moment I don't know how to trigger easily on your test, so I made it repeat.

But since the 68k is stopped, the only way to restart it when the blit is finished is with an interrupt, and there is no blitter interrupt for the 68k.

And I cannot raise the CPU interrupt before the blitter is idle, otherwise the 68k takes priority...

 

So I added NOPs and reduced the length of the blit for a first try.
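
To put a number on the shortened blit, the two B_COUNT values used in this thread ($00010400 in the first listing, $00010010 in this one) can be decoded as below. This is a rough Python sketch. It assumes the usual B_COUNT layout (outer loop count in the high word, inner loop count in the low word) and 4 bytes per pixel for PIXEL32.

```python
def decode_bcount(value, bytes_per_pixel=4):
    """Split a B_COUNT value into (outer, inner, bytes moved).

    Assumes high word = outer loop count, low word = inner loop count.
    """
    outer = (value >> 16) & 0xFFFF
    inner = value & 0xFFFF
    return outer, inner, outer * inner * bytes_per_pixel

for value in (0x00010400, 0x00010010):
    outer, inner, size = decode_bcount(value)
    print(f"${value:08X}: {outer} x {inner} pixels = {size} bytes")
```

Under those assumptions the original test moves 1024 pixels (4096 bytes) and the shortened one moves 16 pixels (64 bytes).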

 

This is the result:

post-5715-1187195105_thumb.jpg

 

In A: the first 2 instructions of the GPU.

We have 13 cycles for 2 consecutive moveq.

12 cycles for each "32-bit GPU instruction" read, up to the "store" instructions.

Then each pair of "store rn,(rn+x)" takes 14 cycles.

 

When the blitter starts, we can see GPU instruction reads interleaved with blitter accesses and, more interestingly, each GPU instruction takes less time to read from DRAM!

The time between 2 "32-bit GPU instruction" reads is not constant, but seems to be about 8 cycles.

The time between 2 blitter accesses is 8 cycles.
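
For comparison, the measured figures can be reduced to per-instruction costs. This is only a back-of-the-envelope Python sketch. It assumes each 32-bit read carries two 16-bit GPU instructions, which does not hold for movei (three words), and it treats the "about 8 cycles" figure as exact.

```python
# Measured values quoted from the logic analyzer capture above.
FETCH_BLITTER_IDLE = 12   # cycles per 32-bit instruction read, before the blit
FETCH_BLITTER_BUSY = 8    # approximate cycles per 32-bit read during the blit
STORE_PAIR = 14           # cycles for two indexed stores

def cycles_per_instruction(cycles_per_fetch, instructions_per_fetch=2):
    """Average cost of one 16-bit instruction (assumption: 2 per 32-bit read)."""
    return cycles_per_fetch / instructions_per_fetch

idle = cycles_per_instruction(FETCH_BLITTER_IDLE)
busy = cycles_per_instruction(FETCH_BLITTER_BUSY)
speedup = FETCH_BLITTER_IDLE / FETCH_BLITTER_BUSY
per_store = STORE_PAIR / 2

print(f"idle: {idle} cycles/instruction, busy: {busy} cycles/instruction")
print(f"fetch speedup while blitting: {speedup}x, per store: {per_store} cycles")
```

So, taking the numbers at face value, instruction reads are about 1.5 times faster while the blitter is running.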

 

 

I'll post updates in the future; right now I have other things to do ;)

Edited by SCPCD

"At the moment I don't know how to trigger easily on your test, so I made it repeat.

But since the 68k is stopped, the only way to restart it when the blit is finished is with an interrupt, and there is no blitter interrupt for the 68k.

And I cannot raise the CPU interrupt before the blitter is idle, otherwise the 68k takes priority..."

 

Oh...yeah....forgot about the blitter running... :P ..you can have the GPU wait for the blitter to stop, then use the GPU interrupt to wake the 68k.

 

"When the blitter starts, we can see GPU instruction reads interleaved with blitter accesses and, more interestingly, each GPU instruction takes less time to read from DRAM!

The time between 2 "32-bit GPU instruction" reads is not constant, but seems to be about 8 cycles.

The time between 2 blitter accesses is 8 cycles."

 

I'm not surprised by the faster DRAM instruction reads. I think the pipeline is the issue out in main. It seems to pipeline at 64 bits instead of its internal 32 bits....I'm guessing here, BTW.

It makes sense when you consider how main code jumps work.
