New GUI for the Atari 8-bit

popmilo · June 10, 2011

...

As for having the right-clipped part as a separate routine: makes sense, although there's got to be some selection logic somewhere (even if in the VBL) which tells the DLI where to branch. I suppose it would be faster to test for byte 39 and then set the DLI hardware vector to point to the appropriate routine. However, this introduces a few extra cycles into the VBL code, unless we carefully position the DLI code on page boundaries and just flip the MSB instead of setting the flag. All these little aspects have a large cumulative effect.

...

Checking inside the draw loop takes 16x5=80 cycles minimum...

Instead you could check once at the beginning of the DLI and just branch to a separate part of the DLI.

---------------------------------------------------

Wow - didn't think of cartridge capabilities

Make fastest possible fixed shape sprite drawing with something like this:

Use X register for choosing column on a screen.

Use Y register for sprite shift index.

lda screen,x
sta restore_buffer
and Mask_left_part_0,y
ora Shape_left_part_0,y
sta screen,x

lda screen+1,x
sta restore_buffer+1
and Mask_right_part_0,y
ora Shape_right_part_0,y
sta screen+1,x

lda screen+40,x
sta restore_buffer+2
and Mask_left_part_1,y
ora Shape_left_part_1,y
sta screen+40,x

lda screen+41,x
sta restore_buffer+3
and Mask_right_part_1,y
ora Shape_right_part_1,y
sta screen+41,x

...
..
.
rts

481 byte routine for drawing pointer on a specific scanline.

481*200 = 96200 bytes for 200 Y coordinates

Little less than that because last 15 lines have less to draw. And column 39 also half of that...

Could even use immediate mode and 8 times more memory for shapes and masks

Restore is even simpler:

lda restore_buffer
sta screen,x
lda restore_buffer+1
sta screen+1,x
lda restore_buffer+2
sta screen+40,x
lda restore_buffer+3
sta screen+41,x
...
..
.
rts

200*193=38600 bytes.

A little over 128KB for a mouse routine that draws the pointer in around 700 cycles (~44 per scanline)
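
The size and cycle figures above can be cross-checked with a few lines of Python. This is only a back-of-envelope sketch: it assumes a 16-line, 2-byte-wide pointer, uses standard 6502 instruction lengths and base cycle counts, and ignores page-crossing penalties and ANTIC DMA, none of which the post spells out.

```python
# Instruction sizes for the addressing modes used in the unrolled routines.
ABS = 3   # absolute and absolute-indexed instructions are 3 bytes long
RTS = 1   # RTS is a single byte

POINTER_HEIGHT = 16   # scanlines in the pointer (assumption)
BYTES_PER_LINE = 2    # left and right screen byte
SCANLINES = 200       # possible Y coordinates

# Draw: LDA abs,X / STA abs / AND abs,Y / ORA abs,Y / STA abs,X per screen byte.
draw_bytes = POINTER_HEIGHT * BYTES_PER_LINE * 5 * ABS + RTS
# Restore: LDA abs / STA abs,X per screen byte.
restore_bytes = POINTER_HEIGHT * BYTES_PER_LINE * 2 * ABS + RTS

# One draw routine and one restore routine per scanline.
total_bytes = SCANLINES * (draw_bytes + restore_bytes)

# Base cycles: LDA abs,X=4, STA abs=4, AND abs,Y=4, ORA abs,Y=4, STA abs,X=5, RTS=6.
draw_cycles = POINTER_HEIGHT * BYTES_PER_LINE * (4 + 4 + 4 + 4 + 5) + 6

print(draw_bytes, restore_bytes, total_bytes, draw_cycles)
```

The totals agree with the 481-byte and 193-byte routine sizes, and the cycle estimate lands a little under the "around 700" quoted above.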

Too fast - it would start deleting pointer before it gets to be shown

But fun to think about...

Cartridge-based routines are far from being explored to their full potential...

flashjazzcat · June 10, 2011

Here is my take on delete routine (starting from bottom of sprite).

Advantages are:

-not using X register - one less INX = -2 cycles per loop

-Using Y register for loop counter. no need for additional counter = -5 cycles per loop (if you are using Zeropage location for your counter, -6 if you are using absolute).

-Quick jump to end when Y<0 = not wasting time on one more addition.

Disadvantage:

- mscr2 should be set to (mscr + (15*40)-30) - preparing that takes few additions. You could maybe derive it from the last value of mscr in the draw part ?
               sec
               ldy #31          ;I assumed mouse pointer is 16 pixels high ?
loop_delete	
	lda restorebuf,y
	sta (mscr2),y
	dey
	lda restorebuf,y
	sta (mscr2),y
	dey
	bmi end
	lda mscr2
	sbc #38
	sta mscr2
	bcs loop_delete
	dec mscr2+1
	sec
	bcs loop_delete
end		
I don't think it can get better than this unless you go with unrolled code and take a couple of KB for it...
200 x [
	lda restorebuf,y
	sta screen+(scanline*40),x
	iny
	lda restorebuf,y
	sta screen+(scanline*40)+1,x
	iny
]
Put a RTS instead of INY in a proper place and jump at the right line (Unrolled_code+14*Mouse_y) ...

Much, much faster... but 14x200 = 2800 bytes.... That hurts...
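
To see why the looped delete routine wants mscr2 = mscr + (15*40) - 30 and a step of 38, here is a small Python model of its addressing. It is a sketch, not part of the GUI source: the function names are made up for illustration, and it assumes 40 bytes per screen line and a 16-line pointer as in the post.

```python
ROW_BYTES = 40   # bytes per screen line, as in the thread
HEIGHT = 16      # pointer height assumed in the post

def erase_addresses(mscr, height=HEIGHT, row_bytes=ROW_BYTES):
    """Screen addresses written by the looped delete routine, in order."""
    # (15*40)-30 for a 16-line pointer: bottom row, pre-compensated for Y=31.
    mscr2 = mscr + (height - 1) * row_bytes - (2 * height - 2)
    y = 2 * height - 1                 # ldy #31
    written = []
    while True:
        written.append(mscr2 + y)      # sta (mscr2),y  (right byte)
        y -= 1                         # dey
        written.append(mscr2 + y)      # sta (mscr2),y  (left byte)
        y -= 1                         # dey
        if y < 0:                      # bmi end
            break
        mscr2 -= row_bytes - 2         # sbc #38: up one row, Y already dropped by 2
    return written

def expected_addresses(mscr, height=HEIGHT, row_bytes=ROW_BYTES):
    """Right then left byte of every pointer row, bottom row first."""
    return [mscr + row * row_bytes + col
            for row in range(height - 1, -1, -1)
            for col in (1, 0)]

def unrolled_entry(base, mouse_y, bytes_per_line=14):
    """Entry point into the unrolled variant: Unrolled_code + 14*Mouse_y."""
    return base + bytes_per_line * mouse_y

print(erase_addresses(0x4000) == expected_addresses(0x4000))
```

Because Y falls by 2 per row while the row address falls by 40, subtracting 38 from the pointer keeps (mscr2),y on the right bytes.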

Looking more closely at your first code snippet, the snag is that PIXOFF is a variable offset into the restore buffer. So, depending upon the position of the mouse, we can't assume an initial index value of 31. This would require the routine which saves the background to be coded differently, and use yet another index into a fixed-length 32 byte background buffer. As it stands - for the sake of speed - the background is stashed using the same indexes into the pre-shifted mouse pointer bitmaps.

PIXOFF is derived from OLDPIXTAB+(MOUSEX MOD 8):

oldpixtab
.byte 0,32,64,96,128,160,192,224

So the initial index into the background buffer can be anything from 0 to 224 (this is also the index into the pre-shifted pointer bitmaps, each 32 bytes long).

So to use your method, we require - I think - linear addressing of the background buffer, covering 32 bytes.
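
As a quick illustration of the indexing problem, the table lookup can be modelled in Python. The table values are the ones posted above; the helper names are hypothetical and only there for the example.

```python
# Values copied from the oldpixtab listing in the post.
OLDPIXTAB = [0, 32, 64, 96, 128, 160, 192, 224]

def pixoff(mousex):
    """Index of the first byte of the pre-shifted bitmap for this X position."""
    return OLDPIXTAB[mousex % 8]

def shared_buffer_range(mousex):
    """Restore-buffer bytes touched when the buffer shares the bitmap index."""
    start = pixoff(mousex)
    return range(start, start + 32)

for mousex in (0, 1, 7, 8, 13):
    r = shared_buffer_range(mousex)
    print(mousex, r.start, r.stop - 1)
```

With the shared index, the saved background lands in a different 32-byte slice for each shift position, which is why a fixed starting index of 31 cannot be assumed in the erase loop.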

In any case, prior to attempting that, this is what I'm working with:

erase_pointer ; x already points to last byte of restore buffer; y is set to 1
lda #mouseheight
sta mtmp5 ; reset line counter
restore_loop2
lda restorebuf,x
sta (mscr),y
dey
dex
lda restorebuf,x
sta (mscr),y
dec mtmp5
beq erase_done
iny ; bump y back up to 1
dex
lda mscr
sec
sbc #40
sta mscr
bcs restore_loop2
dec mscr+1
bne restore_loop2

This makes use of the exit values of MSCR, X and Y on completion of the render loop. It works well, although it isn't significantly quicker. I've bypassed redundant subtraction at the end (also in the render loop).

If we look at the render loop again:

	ldx pixoff
	lda #mouseheight
sta mtmp5 ; line counter
drawloop
ldy #0
lda (mscr),y
sta restorebuf,x
and masks,x
ora shapes,x
sta (mscr),y
iny
inx
lda (mscr),y
sta restorebuf,x
bit mousexhi
bmi noplotright
and masks,x
ora shapes,x
sta (mscr),y
noplotright
dec mtmp5
beq erase_pointer
inx
lda mscr
clc
adc #40
sta mscr
bcc drawloop
	inc mscr+1
bne drawloop

If we can use Y as an index into the restore buffer (via manipulation of MSCR), your erase routine becomes viable. How about:

 	ldx pixoff
ldy #0
bit mousexhi
bmi drawloop2

drawloop
lda (mscr),y
sta restorebuf,y
and masks,x
ora shapes,x
sta (mscr),y
iny
inx
lda (mscr),y
sta restorebuf,y
and masks,x
ora shapes,x
sta (mscr),y
cpy #31
beq erase_pointer
inx
iny
lda mscr
clc
adc #38
sta mscr
bcc drawloop
	inc mscr+1
bne drawloop

drawloop2 ; come here when only drawing left byte of pointer
lda (mscr),y
sta restorebuf,y
and masks,x
ora shapes,x
sta (mscr),y
iny
inx
lda (mscr),y
sta restorebuf,y
cpy #31
beq erase_pointer
inx
iny
lda mscr
clc
adc #38
sta mscr
bcc drawloop2
	inc mscr+1
bne drawloop2
	
erase_pointer ; y is set to 31

restore_loop2
lda restorebuf,y
sta (mscr),y
dey
lda restorebuf,y
sta (mscr),y
dey
bmi erase_done
lda mscr
sec
sbc #38
sta mscr
bcs restore_loop2
dec mscr+1
bne restore_loop2

erase_done

This works. We could go further and start Y at 31 in the draw loop and use it to count down, using BMI to check for completion. This would mean the restore buffer was addressed in reverse, and would require some alteration of the routine which calculates the initial value of MSCR. Unfortunately the benefit of that would be offset by having to count UP again in the erase loop.
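
For anyone who wants to convince themselves of the stride-38 trick without firing up an emulator, here is a rough Python simulation of the main draw loop and the erase loop above. It is a sketch under assumptions: a flat 40-byte-wide screen, dummy mask and shape data, and only the two-byte path (drawloop2 is left out).

```python
ROW_BYTES = 40
HEIGHT = 16

def draw(screen, mscr, pixoff, masks, shapes, restorebuf):
    """Model of drawloop: Y indexes both the screen and the restore buffer."""
    x, y = pixoff, 0
    while True:
        for half in ("left", "right"):
            restorebuf[y] = screen[mscr + y]                 # lda (mscr),y / sta restorebuf,y
            screen[mscr + y] = (restorebuf[y] & masks[x]) | shapes[x]
            if half == "left":
                x += 1                                       # inx
                y += 1                                       # iny
        if y == 2 * HEIGHT - 1:                              # cpy #31 / beq erase_pointer
            return mscr, y
        x += 1
        y += 1
        mscr += ROW_BYTES - 2                                # adc #38

def erase(screen, mscr, y, restorebuf):
    """Model of restore_loop2, starting from the exit values of draw()."""
    while True:
        screen[mscr + y] = restorebuf[y]
        y -= 1
        screen[mscr + y] = restorebuf[y]
        y -= 1
        if y < 0:                                            # bmi erase_done
            return
        mscr -= ROW_BYTES - 2                                # sbc #38

# Dummy data, purely for the demonstration.
screen = bytearray((i * 37) & 0xFF for i in range(ROW_BYTES * 200))
original = bytes(screen)
masks = [0xF0] * 256
shapes = [0x05] * 256
restorebuf = [0] * 32
start = 50 * ROW_BYTES + 10

end_mscr, end_y = draw(screen, start, 64, masks, shapes, restorebuf)
drawn = bytes(screen)
erase(screen, end_mscr, end_y, restorebuf)
print(drawn != original, bytes(screen) == original)
```

The draw loop leaves MSCR and Y exactly where the erase loop needs them, so the erase side needs no setup of its own.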

Someone want to count the cycles used in this version (I'm not absolutely sure how to use the Altirra profiler for this)?

EDIT: amended to avoid the branch for the right-hand byte.

Edited June 10, 2011 by flashjazzcat

popmilo · June 10, 2011

...

This works. We could go further and start Y at 31 in the draw loop and use it to count down, using BMI to check for completion. This would mean the restore buffer was addressed in reverse, and would require some alteration of the routine which calculates the initial value of MSCR. Unfortunately the benefit of that would be offset by having to count UP again in the erase loop.

Someone want to count the cycles used in this version (I'm not absolutely sure how to use the Altirra profiler for this)?

EDIT: amended to avoid the branch for the right-hand byte.

This looks OK.

To get rid of those INX and CPY in the draw loop, you would need additional calculations in the prepare part...

We should calculate precisely how many cycles each method takes...

Just change COLBK before and after the DLI to estimate how much time the routine takes...

The Altirra profiler looks interesting, but I haven't figured out how to select the part of the code to be analyzed.

There is a mention of label files and how to make them with the compiler...

I'm reading the Altirra help file now - for the first time

flashjazzcat · June 10, 2011

I think I'll follow suit and read the help. I know the address of the DLI entry point, and this shows up in the profiler - but so do all the addresses of the subsequent instructions in the routine. Hard to decipher.

It occurs to me that the routine is so flexible that we could even allow the user to choose whether he wants flicker-free mouse operation, or faster redraws with the mouse pointer disabled: simply clearing the interrupt enable bit does the job.

potatohead · June 10, 2011

It occurs to me that the routine is so flexible that we could even allow the user to choose whether he wants flicker-free mouse operation, or faster redraws with the mouse pointer disabled: simply clearing the interrupt enable bit does the job.

If that doesn't cost much in the overall scheme of things, you have a winner! Could be a given application could offer that option as well.

flashjazzcat · June 10, 2011

Well, I can't actually find any usage notes for the profiler. If I'm using it correctly (function sampling), then we have 3,384 cycles used per redraw, which equates to 169,200 per second on a PAL machine. A little more costly on NTSC, of course. I think it can be brought down to 150,000 without loop unrolling. Given that the "old" system performed some simple comparisons to establish whether the mouse had moved every VBLANK - regardless of whether a redraw was required - this isn't an actual increase of 169,000 cycles. Indeed, the new VBL routine can be optimized, and the Pokey mouse sampling is somewhat more consistent than the old DLI sampling method, so I've been able to knock the frequency down to about 400 Hz while maintaining very smooth and quick pointer movement.
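
To put those figures in context, here is a rough Python calculation of the per-second cost. The frame rates and CPU clock values are nominal figures assumed for the sketch, and the share of CPU time ignores cycles lost to ANTIC DMA, so treat the percentages as ballpark only.

```python
CYCLES_PER_REDRAW = 3384      # figure reported by the profiler in the post

PAL_HZ, NTSC_HZ = 50, 60      # nominal frame rates (assumption)
PAL_CLOCK = 1_773_447         # nominal PAL CPU clock in Hz (assumption)
NTSC_CLOCK = 1_789_773        # nominal NTSC CPU clock in Hz (assumption)

pal_cost = CYCLES_PER_REDRAW * PAL_HZ
ntsc_cost = CYCLES_PER_REDRAW * NTSC_HZ

pal_share = pal_cost / PAL_CLOCK
ntsc_share = ntsc_cost / NTSC_CLOCK

# Per-redraw budget needed to reach the 150,000 cycles/second target on PAL.
target_per_redraw = 150_000 // PAL_HZ

print(pal_cost, ntsc_cost, target_per_redraw)
print(f"{pal_share:.1%} PAL, {ntsc_share:.1%} NTSC")
```

So hitting the 150,000 target on PAL means trimming each redraw from 3,384 to about 3,000 cycles.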

Of course, using a Pokey timer with the old method would have made that more efficient too. However, we're approaching an acceptable compromise in terms of balancing efficiency and aesthetics.

Quite apart from user preference, it seems sensible that an application (such as a word processor) whose full window refresh is time critical should simply turn off the mouse during the redraw. A flag should also be implemented to tell the event handler whether non-client area events requiring redraws should disable the mouse pointer.

Flicker's gone, by the way: I just bumped the DL offset to 1 if Y=0. Not a single STA WSYNC in sight.

I'll release an executable as soon as I can get the back window to come to the front.

Edited June 10, 2011 by flashjazzcat

flashjazzcat · June 10, 2011

Well - here it is - the first demo:

gui_10_06_11.zip

Guidelines and notes:

- Run this with BASIC off and no carts present (SDX users use "X GUI.XEX"). You can boot the file with an XEX loader (for example, in an emulator), since it doesn't yet require DOS.
- You need a mouse on port 2.
- Use File->New to open more windows on the desktop. They're all called "C:>*.*" at the moment.
- Use Tools->Hide Mouse to toggle mouse pointer hiding when doing redraws. When the option is ticked, the mouse pointer will disappear and flicker, etc. Personally I can hardly tell the difference performance-wise.
- Single-click registering is very slow: this is just a tweaking issue I was too tired to rectify this evening. To bring a back window to the front or use the maximize button (which doesn't toggle yet), you'll need to hold the left button down for a good second.
- You shouldn't need to single-click a back window before dragging its scroll handle or size box, etc, but you do at the moment. This is because I haven't finished the buffered event pipe yet.
- You can't close windows down yet.
- The desktop icons aren't "live" yet (i.e. they don't do anything).
- Don't be surprised if you manage to make a window wrap around and create screen garbage. The boundary checks are rough at the moment.
- The border flash is part of some debug code.
- Please no complaints about full desktop redraws. I'm well aware of what needs to be done. There are no back buffers or any "smart" redraws at the moment. That comes later.
- There are several really terrific bugs just waiting to be discovered.

There's a very odd bug which I need to track down which creates all kinds of problems when new code gets inserted or old code removed. I figure there's some rogue write going on in RAM which does or doesn't upset something important depending on the size of the code. Gonna be fun tracking that one down.

I've probably forgotten to mention some other salient points. Hopefully I'll get some feedback over the weekend.

I optimized the VBL and slashed a load of cycles out of it this evening: it now uses the font renderer bit-shift lookup table to divide MOUSE_X by 8. I also created a lookup table for the dynamic DLI to save doing computations on MOUSE_Y in relation to the display list. This has probably saved several thousand cycles per second.

Lots of changes in the pipeline, so hopefully this will be the first of many regular updates.

Heh... I've already found another bug.

Edited June 10, 2011 by flashjazzcat

+Stephen · June 10, 2011

Damn it - of all the times to not know where my ST mouse is. CMI08 adapter - I think that will work :twisted:

hitchcock4 · June 11, 2011

Well - here it is - the first demo:

gui_10_06_11.zip

Can someone post a video too, for those of us temporarily away from the 8-bit?

Thanks, sounds great.

flashjazzcat · June 11, 2011

I'd post one, but I'm on the laptop and recording is jittery. Seriously... check out the last video posted in the thread. It's reasonably representative.

+MrFish · June 11, 2011

Indeed, the new VBL routine can be optimized, and the Pokey mouse sampling is somewhat more consistent than the old DLI sampling method, so I've been able to knock the frequency down to about 400 Hz while maintaining very smooth and quick pointer movement.

The pointer looks smooth. Nice job so far. The only thing I noticed is that with the new sampling rate I can get ahead of the pointer if I move the mouse too quickly... more so than in the previous demo.

fibrewire · June 11, 2011

Looks awesome! I can't wait to stuff this on a... (I have no idea what I intended to say here; that was a pre-coffee moment.)

Would standardizing on one platform (130XE) and minimal feature set (ROM only cart) make the development any easier?

flashjazzcat · June 11, 2011

The pointer looks smooth. Nice job so far. The only thing I noticed is that with the new sampling rate I can get ahead of the pointer if I move the mouse too quickly... more so than in the previous demo.

I guess it can be tweaked some. I thought the sampling was perhaps a bit heavy in the previous version, but like virtually everything it could be made user-tunable if necessary.

Would standardizing on one platform (130XE) and minimal feature set (ROM only cart) make the development any easier?

Yes it would. The primary reason for exploring the RAM-cart was that it provides a way to run "legacy" apps from the desktop without worrying that they'll clatter anything that the GUI has already loaded up into extended memory. The GUI would use the cart RAM exclusively, and wouldn't care whether there was any extended on-board RAM or not. However, this is a much more difficult proposition and will take some time to get right. I think the sensible plan is to target generic flash carts and machines with 128KB+ first. This also provides the widest possible tester-base.

popmilo · June 11, 2011

...

Use Tools->Hide Mouse to toggle mouse pointer hiding when doing redraws. When the option is ticked, the mouse pointer will disappear and flicker, etc. Personally I can hardly tell the difference performance wise.

...

I also can't see difference in speed of window redraw, but the pointer definitely looks better without "hide-mouse".

That drawing of the mouse in Dli while raster beams across that part of screen sure is one of the best routines I have seen this year

Great job!

flashjazzcat · June 11, 2011

I also can't see difference in speed of window redraw, but the pointer definitely looks better without "hide-mouse".

That drawing of the mouse in Dli while raster beams across that part of screen sure is one of the best routines I have seen this year

Great job!

Many thanks indeed!

Thanks also for your help and to those who provided the original ideas and suggestions.

This revised pointer draw is sure contentious if it does prove to be too cycle-hungry, but as you say - I struggle to see the difference. When you select "hide mouse", that DLI isn't firing at all while things redraw.

But the DLI pointer redraw has really spiced up this project for me, and it gave me the incentive to release the demo. That's surely got to be a good thing, since - after all - the whole project is worthless with either mouse redraw method unless real progress is being made.

+MrFish · June 11, 2011

The pointer looks smooth. Nice job so far. The only thing I noticed is that with the new sampling rate I can get ahead of the pointer if I move the mouse too quickly... more so than in the previous demo.

I guess it can be tweaked some. I thought the sampling was perhaps a bit heavy in the previous version, but like virtually everything it could be made user-tunable if necessary.

Even on the previous demo I could easily get ahead of the sampling, although not unusably so. On this version it seems to be crossing the border.

flashjazzcat · June 11, 2011

Even on the previous demo I could easily get ahead of the sampling, although not unusably so. On this version it seems to be crossing the border.

I'll crank up the sampling and see if this buggers up the new pointer rendering.

+Philsan · June 11, 2011

So it's not fake Jon!

Gorgeous on real hardware!

I found a problem with my trackball mouse.

If I don't move the ball slowly the pointer doesn't move.

BTW, I love to use Best Electronics Atari/Amiga track ball.

It distinguishes my Atari from nowadays and 16 bit computers.

flashjazzcat · June 11, 2011

So it's not fake Jon!

Gorgeous on real hardware!

I found a problem with my trackball mouse.

If I don't move the ball slowly the pointer doesn't move.

BTW, I love to use Best Electronics Atari/Amiga track ball.

It distinguishes my Atari from nowadays and 16 bit computers.

Heh... Symbos was (understandably) considered fake for a long time. However, I've used that and it's real too.

I love running the GUI on my XE with SC1435 monitor. The tube on that set is outstanding (with the high gloss finish).

Anyway, mouse sampler has been given "the treatment":

mouse ; pokey interrupt-driven mouse routine
txa
pha
tya
pha
mousea
ldx $d300
lda shift_table+1024,x ; shift right by 4 bits
and #15 ; still needed because shift table is really a table of rotated values
tay ; save value
and #3
ora oldx
tax
lda mousetab3,x
sta oldx
lda mousetab,x
bmi mousy ; action?
bne mouse1
mouse0 ; increment x
lda mousex
cmp mousexmax
lda mousex+1
sbc mousexmax+1
bcs mousy
inc mousex
bne mousy
inc mousex+1
bne mousy
mouse1 ; decrement x
lda mousexmin
cmp mousex
lda mousexmin+1
sbc mousex+1
bcs mousy
lda mousex
bne *+4
dec mousex+1
dec mousex
mousy
lda mousetab2,y
ora oldy
tax
lda mousetab3,x
sta oldy
ldy mousey
lda mousetab,x
bmi mousexit
bne mouse2
cpy mouseymax ; increment y
bcs mousexit
iny
bne mousexit
mouse2 ; decrement y
cpy mouseymin
bcc mousexit
beq mousexit
dey
mousexit
sty mousey
pla
tay
pla
tax
pla
rti
;
; mouse index table

mousetab
.byte 255,1,0,255,0,255,255,1,1,255,255,0,255,0,1,255
;
mousetab2
.byte 0,0,0,0,1,1,1,1
.byte 2,2,2,2,3,3,3,3

mousetab3
.byte 0,4,8,12
.byte 0,4,8,12
.byte 0,4,8,12
.byte 0,4,8,12

I got rid of a bunch of left and right bit shifts using the newly introduced MOUSETAB2 and MOUSETAB3. Don't you just love LUTs? It can probably be attacked further... all that argy-bargy at the top is just to get the upper four bits into a lower four bit range. I suppose it would be quicker to use stick 0...
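
For readers following along, the lookup tables can be regenerated in Python. MOUSETAB2 and MOUSETAB3 come out exactly as listed above. The layout of shift_table is only a guess from the "+1024" offset and the "rotated values" comment (256-byte pages, page n holding each byte rotated right by n bits), so treat that part as an assumption.

```python
def ror8(value, n):
    """Rotate an 8-bit value right by n bits."""
    return ((value >> n) | (value << (8 - n))) & 0xFF

# Assumed layout: page n (offset n*256) holds every byte rotated right n bits.
shift_table = [ror8(v, n) for n in range(8) for v in range(256)]

# MOUSETAB2: nibble -> its upper two bits (a shift right by 2).
mousetab2 = [n >> 2 for n in range(16)]

# MOUSETAB3: (old<<2 | new) -> new<<2, i.e. the current state becomes "old".
mousetab3 = [(n & 3) << 2 for n in range(16)]

porta = 0xB7                                  # example port value
nibble = shift_table[1024 + porta] & 15       # lda shift_table+1024,x / and #15
print(hex(nibble), mousetab2, mousetab3)
```

Because the table holds rotations rather than true shifts, the low bits wrap round into the top of the byte, which is why the AND #15 is still needed after the lookup.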

No idea why the bring-to-front routine was sending a re-draw message to the desktop, other than I had coded it that way! I've put that right now. It looks like this, for those interested:

lda #0	; bring window to the front
jsr object_order ; 0 in Acc means move to last position in linked list
lda object
sta active_window_handle
ldx object+1
stx active_window_handle+1
lda #MESSAGE.DRAW
jsr send_message

Here's the updated demo. Sampling rate is a little higher. Phil - let us know if this helps with the trackball issues.

gui_11_06_11.zip

Video as requested:

http://www.youtube.com/watch?v=aXRBy_dEDB8

Edited June 11, 2011 by flashjazzcat

+Philsan · June 11, 2011

Just tested new version with trackball and Atari ST1 mouse too (both new).

Exactly the same issue (perhaps little differences).

If you move the mouse or the ball quickly, the pointer doesn't move (in fact it moves a bit in all directions).

+JAC! · June 11, 2011

Thanks for releasing your GUI demo version. It's really looking great. Regarding "Hiding the mouse during redraw", I have to say it looks strange and "unstable" then. The mouse pointer should be fixed. Please add a short segment to the start of the file to switch BASIC off. It's always annoying to toggle BASIC manually if the program can do it. Regarding the DLI/VBI routine, I see some potential to save more cycles. The SEI in the VBI is not required (you're in an NMI, so it's set anyway). As for the CLI at the end, I don't know if you really need the IRQ enabled for sampling during the rest of the VBI. Also, double indexed access can be shorter by using "Y,X":

ldy $c7
lda $6b79,y
tay
lda $9eee,y
=>
ldx $c7
ldy $6b79,x
lda $9eee,y

2609    INY                   ; 2cyc ; C8
260A    INX                   ; 2cyc ; E8
...
2618    CPY #$1F              ; 2cyc ; C0 1F
261A    BEQ $2652             ; 2cyc ; F0 36
261C    INX                   ; 2cyc ; E8
261D    INY                   ; 2cyc ; C8
261E    LDA $54 ;ROWCRS       ; 3cyc ; A5 54
2620    CLC                   ; 2cyc ; 18
2621    ADC #$26              ; 2cyc ; 69 26
2627    INC $55 ;COLCRS       ; 5cyc ; E6 55
2629    BNE $25FC             ; 2cyc ; D0 D1


Since Y is counting up from 0, the CPY #$1F leaves carry clear on every pass that stays in the loop, so the CLC can be removed.
Also, the loop crosses a page boundary with the BNE, which will cost 4 cycles instead of 3.
You can also avoid the double INX by arranging the mouse AND/OR masks as

AAAAAAA
BBBBBBB

instead of

ABABABAB
ABABABAB

I'm into VCS coding recently, so I'm sensitive to things like this - but don't let yourself be distracted by my comments. And of course you should leave enough things unoptimized to give the sales people arguments why we should buy the upgrade ;-;
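
Two of the points above are easy to check in Python: the table rearrangement and the redundant CLC. This is just an illustrative sketch with hypothetical names. The carry rule used is the standard 6502 one: CPY sets carry when Y is greater than or equal to the operand.

```python
HEIGHT = 16

def split_tables(interleaved):
    """Split an ABAB... table into an AAAA... table and a BBBB... table."""
    return interleaved[0::2], interleaved[1::2]

def cpy_carry(y, operand):
    """Carry flag after CPY #operand: set when Y >= operand (unsigned)."""
    return y >= operand

# Interleaved layout: left byte (A) and right byte (B) alternate per line.
interleaved = [f"{side}{line}" for line in range(HEIGHT) for side in "AB"]
left, right = split_tables(interleaved)

# With split tables, one index value serves both bytes of a line.
for line in range(3):
    print(left[line], right[line])

# Y values seen by CPY #$1F on passes that stay inside the loop: 1, 3, ..., 29.
carry_inside_loop = [cpy_carry(y, 0x1F) for y in range(1, 31, 2)]
print(any(carry_inside_loop))
```

With the split layout the left and right bytes of a line share one index, so a single INX per line is enough.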

+JAC! · June 11, 2011

And for the mouse routine:

- You do not need to use Y, which saves the time for its push/pop

- CMP A,B/BCC/BEQ => CMP B,A/BCS

- AND #3 not required, the LUT is 16 bytes, right?

org $2000

mouse ; pokey interrupt-driven mouse routine
       txa
       pha
;        tya
;        pha
mousea
       ldx $d300
       lda shift_table+1024,x ; shift right by 4 bits
       and #15 ; still needed because shift table is really a table of rotated values
       pha ; save value
;        and #3 Required?
       ora oldx
       tax
       lda mousetab3,x
       sta oldx

       lda mousetab,x
       bmi mousy ; action?
       bne mouse1
mouse0 ; increment x
       lda mousex
       cmp mousexmax
       lda mousex+1
       sbc mousexmax+1
       bcs mousy
       inc mousex
       bne mousy
       inc mousex+1
       bne mousy
mouse1 ; decrement x
       lda mousexmin
       cmp mousex
       lda mousexmin+1
       sbc mousex+1
       bcs mousy
       lda mousex
       bne *+4
       dec mousex+1
       dec mousex

mousy
       pla
       tax
       lda mousetab2,x
       ora oldy
       tax
       lda mousetab3,x
       sta oldy

       lda mousetab,x
       bmi mousexit
       bne mouse2
       lda mousey
       cmp mouseymax ; increment y
       bcs mousexit
       inc mousey
       bne mousexit
mouse2 ; decrement y
       lda mouseymin
       cmp mousey
       bcs mousexit
       dec mousey
mousexit
;        pla
;        tay
       pla
       tax
       pla
       rti
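
The comparison rewrite in the second bullet (CMP A,B/BCC/BEQ => CMP B,A/BCS) can be checked exhaustively with a small Python model of the flags. This is a sketch for the decrement-Y case only; the function names are invented for the example.

```python
def cmp_flags(register, operand):
    """Carry and zero flags after a 6502 compare (unsigned)."""
    return register >= operand, register == operand

def decrement_original(y, ymin):
    """cpy mouseymin / bcc exit / beq exit / dey"""
    carry, zero = cmp_flags(y, ymin)
    if not carry:        # bcc mousexit  (y < ymin)
        return y
    if zero:             # beq mousexit  (y == ymin)
        return y
    return (y - 1) & 0xFF

def decrement_rewritten(y, ymin):
    """lda mouseymin / cmp mousey / bcs exit / dec mousey"""
    carry, _ = cmp_flags(ymin, y)
    if carry:            # bcs mousexit  (ymin >= y)
        return y
    return (y - 1) & 0xFF

same = all(decrement_original(y, m) == decrement_rewritten(y, m)
           for y in range(256) for m in range(256))
print(same)
```

Swapping the operands turns the two-branch "less than or equal" test into a single BCS.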

flashjazzcat · June 11, 2011

Just tested new version with trackball and Atari ST1 mouse too (both new).

Exactly the same issue (perhaps little differences).

If you move the mouse or the ball quickly, the pointer doesn't move (in fact it moves a bit in all directions).

OK Phil - thanks for the feedback. I guess I sampled at 600Hz in the original version for a reason.

Please add a short segment to the start of file to switch BASIC off. It's always annoying to toggle BASIC manually if the program can do it.

Good idea - will do.

Regarding the DLI/VBI routine, I see some potential to save more cycles. The SEI in the VBI is not required (you're in an NMI, so it's set anyway). As for the CLI at the end, I don't know if you really need the IRQ enabled for sampling during the rest of the VBI. Also, double indexed access can be shorter by using "Y,X"

Thanks - some great ideas there. Clearly many hands make light work. I'd completely forgotten about LDY $nnnn,x - I know the addressing modes are inconsistently applied, but missing that one was my mistake.

And for the mouse routine:

- You do not need to use Y and save the time for push/pop

- CMP A,B/BCC/BEQ => CMP B,A/BCS

- AND #3 not required, the LUT is 16 bytes, right?

Not sure about AND #3. The idea is that the upper two bits are combined with the lower two bits to form an index into the direction table. Without the AND, surely the upper two bits might be set when they're supposed to be clear.
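
A quick Python model of the index calculation shows the effect. The tables are copied from the listing posted earlier; the helper name and the example values are made up for illustration. The low two bits of the nibble are the X quadrature bits and the upper two are the Y bits, as in the routine.

```python
# Direction table from the post: 255 = no action, 0 = increment, 1 = decrement.
MOUSETAB = [255, 1, 0, 255, 0, 255, 255, 1, 1, 255, 255, 0, 255, 0, 1, 255]

def x_index(nibble, oldx, mask_low_bits=True):
    """Index into MOUSETAB for the X axis: (old state << 2) | new state."""
    value = nibble & 3 if mask_low_bits else nibble
    return value | oldx

# Example: Y bits = %10, X bits = %01, previous X state = %00 (oldx = 0).
nibble, oldx = 0b1001, 0
with_and = MOUSETAB[x_index(nibble, oldx)]
without_and = MOUSETAB[x_index(nibble, oldx, mask_low_bits=False)]
print(with_and, without_and)

# Count how many (nibble, oldx) combinations decode differently without the AND.
mismatches = sum(
    MOUSETAB[x_index(n, o)] != MOUSETAB[x_index(n, o, mask_low_bits=False)]
    for n in range(16) for o in (0, 4, 8, 12)
)
print(mismatches)
```

The unmasked index still stays within the 16-byte table, so nothing is read out of bounds, but the Y bits leak into the "old X" field and the wrong entry is fetched.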

Other than that - great suggestions - thanks!

EDIT: This "stuck mouse" issue is a real headache. Throwing a higher sampling rate at it does not help. It's possible to duplicate the problem with the older, DLI-based sampling, but it's much less prevalent. The issue tends not to show up in emulation, but is quite bad on real hardware. Quite why bunching all the sampling up during the screen redraw resulted in smoother motion, I'm not sure. I would have thought Pokey sampling would have been far more evenly spread.
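
One property of the direction table that may be relevant here: any transition in which both quadrature bits change decodes as 255 (no action). The Python sketch below feeds the posted table an idealised, constant-speed signal sampled at different strides. It is a toy model with invented helper names and does not establish the cause of the behaviour seen on real hardware.

```python
# Direction table from the post: 255 = no action, 0 = increment, 1 = decrement.
MOUSETAB = [255, 1, 0, 255, 0, 255, 255, 1, 1, 255, 255, 0, 255, 0, 1, 255]

# State order that the table decodes as steady incrementing: 0 -> 2 -> 3 -> 1 -> 0.
INC_SEQ = [0, 2, 3, 1]

def sample_motion(samples, stride):
    """Quadrature states seen when the signal advances `stride` steps per sample."""
    return [INC_SEQ[(i * stride) % 4] for i in range(samples)]

def decode(states):
    """Accumulate position the way the interrupt routine does for one axis."""
    position, old = 0, states[0]
    for new in states[1:]:
        action = MOUSETAB[(old << 2) | new]
        if action == 0:
            position += 1
        elif action == 1:
            position -= 1
        old = new
    return position

for stride in (1, 2, 3):
    print(stride, decode(sample_motion(9, stride)))
```

In this model, one step per sample tracks correctly, two steps per sample produce no movement at all, and three steps per sample are decoded as movement in the opposite direction.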

Suggestions welcome.

Edited June 11, 2011 by flashjazzcat

flashjazzcat · June 11, 2011

...With regard to disk I/O, I'm fairly certain the best solution will be to invoke the old VBI mouse renderer. This will at least keep the mouse visible, and even mobile as long as the stage 1 VBL and Pokey timers are running. The DLI renderer is far too intensive to leave running during serial I/O, and unless we turn it off, we'll end up with a flicker fest at best. We can simply wrap calls to CIO with switches to and from the old method. Either that, or turn the pointer off altogether during I/O. I'd rather have it visible, though. Even if the VBL or mouse sampling don't fire using the old method, the worst we'll get is limited/stilted movement during I/O. Supporting both methods means a little more code space devoted to interrupt routines, but we get the best of both worlds, and a mouse pointer we can move while apps are loading up.

+David_P · June 11, 2011

... and will there be a handler for a joystick, as well? Some folks don't have ST mouses, but do have a large collection of sticks...
