New GUI for the Atari 8-bit


flashjazzcat

Yeah, we mentioned that OS earlier in that thread.

 

I played with it in an emulator, it's stunning, it really feels like working with Windows 9x ... amazing what such a small computer and assembler language is capable of.

 

At Fujiama 2010, someone brought a real APE with SymbOS running: http://andymanone.dyndns.org/atarixle/event/fuji2010/data/images/img_0834.jpg and http://andymanone.dyndns.org/atarixle/event/fuji2010/data/images/img_0835.jpg ... you know, it's even more amazing seeing this on the real machine :)

Absolutely amazing. I had a play with Symbos on an emulator the other week and I was blown away by it.

 

Finally got the mouse driver working off a Pokey timer. I suffered a hell of a lot of baffling crashes until I discovered the OS pushes the accumulator onto the stack before jumping through the interrupt vector. :ponder:
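A minimal sketch of the kind of set-up described here, assuming POKEY timer 1 and the standard OS vector VTIMR1. The register addresses are the standard Atari ones; `TIMERVAL`, `mouseraw`, the `org` address, and the routine names are hypothetical, and a real driver would decode the mouse signals rather than just store the port. The interesting part is the exit: since the OS has already pushed A before jumping through the vector, the handler ends with PLA before RTI and does no PHA of its own.

```asm
POKMSK   = $10          ; OS shadow of IRQEN
VTIMR1   = $0210        ; OS vector for the POKEY timer 1 IRQ
AUDF1    = $d200
AUDC1    = $d201
AUDCTL   = $d208
STIMER   = $d209
IRQEN    = $d20e
PORTA    = $d300
TIMERVAL = 15           ; hypothetical divider value
mouseraw = $cb          ; hypothetical zero-page byte for the raw sample

        org $2000       ; hypothetical load address
install_timer
        sei
        lda #<sample_mouse
        sta VTIMR1
        lda #>sample_mouse
        sta VTIMR1+1
        lda #0
        sta AUDCTL      ; 64 kHz base clock (assumption; also affects the other channels)
        sta AUDC1       ; volume 0, so the timer is silent
        lda #TIMERVAL
        sta AUDF1
        lda POKMSK
        ora #$01        ; bit 0 enables the timer 1 IRQ
        sta POKMSK
        sta IRQEN
        sta STIMER      ; any write restarts the timers
        cli
        rts

sample_mouse
        lda PORTA       ; A is free to use: the OS saved it already
        sta mouseraw
        pla             ; matches the PHA done by the OS IRQ handler
        rti
```

If the handler needed X or Y it would have to save and restore those itself, since only A is pushed on its behalf.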

 

Anyway, this leaves me free to have another crack at the DLI mouse drawing routine I tried but couldn't get working a while ago (Re: Format War). I was playing with the Amiga over the weekend and I fancy a simulated hardware sprite. If the DLIs don't have to worry about mouse sampling, I can simply calculate the mouse screen address in the VBL, which will also set bit 7 of the DL entry on the appropriate line.

 

Once it's working, I'll be doing a bit of cycle counting to see which method is the most efficient.

Edited by flashjazzcat

It's movie time (again):

 

http://www.youtube.com/watch?v=e0qBWDjprEA

 

This is a bare-bones version of the mouse driver from months back, rewritten to sample the mouse via a Pokey timer, and render the mouse using a DLI. This idea was first suggested by andym00, and at the time I felt it would be more cycle-intensive than the original method (which drew the mouse during the VBI, so that it was effectively "visible" to underlying screen redraw operations). This may still be the case, but I felt the urge to test it properly. The least cycle-hungry method will win.

 

What we're seeing in the video is a region of the screen being inverted using LDA/EOR/STA operations without any regard for the position of the mouse pointer. The application ignores the mouse pointer because it doesn't exist outside of the blocking DLI code. The VBL sets up the screen address pointer for the next mouse redraw, and sets the interrupt bit on the appropriate line in the display list (this part needs refinement to deal with the offsets caused by LMS operations, etc). The DLI then retrieves the mouse shape (pre-rendered in all eight different bit-shifted positions), draws it to the screen, then immediately erases it again once the electron beam has drawn the last line of the pointer. Currently the code just "happens" to be long enough not to require any STA WSYNC waits (to avoid areas of the mouse being erased before they have been seen).
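A rough sketch of the VBL step that moves the interrupt bit, assuming the display list is shorter than 256 bytes so a single index register can reach every entry. `dlist`, `olddli`, `newdli`, and the addresses are hypothetical; converting a pointer Y coordinate into `newdli` is exactly the part that has to account for the extra operand bytes after each LMS instruction, and it is not shown.

```asm
dlist   = $9f00         ; hypothetical display list address
olddli  = $cc           ; hypothetical: index of the entry currently flagged
newdli  = $cd           ; hypothetical: index of the entry to flag next

        org $2100       ; hypothetical load address
move_dli
        ldy olddli
        lda dlist,y
        and #$7f        ; clear bit 7: no DLI on the old entry
        sta dlist,y
        ldy newdli
        lda dlist,y
        ora #$80        ; set bit 7: request a DLI on the new entry
        sta dlist,y
        sty olddli      ; remember it for the next frame
        rts
```

The index must never land on an LMS operand byte, otherwise the OR would corrupt a screen address instead of flagging a mode line.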

 

Doubtless this is a more aesthetically pleasing approach, although I've yet to incorporate it into a test version of the full-fat GUI. There are also potential issues with disk I/O to be considered (if the DLI code doesn't execute, the mouse will vanish, or flicker like hell unless the DLI is explicitly turned off first, which is an argument against using this method, given that I'd wanted the mouse to remain mobile and visible during all but the most time-critical SIO operations).

Hey, that looks really "hardware like". :thumbsup:

Hope it doesn't mess with disk I/O too much...

What I could do with is a means of profiling the cycle consumption, although I guess I could quite easily work it out on paper. The "other" method was extremely light, especially when there was no mouse movement (basically, most of the DLI and VBL code didn't fire at all). The proof will be in the pudding, though, so to speak. It feels really nice on real hardware. The most obvious optimisation will be extreme tightening of the DLI code itself:

 

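draw_mouse              ; DLI mouse renderer
        pha             ; save A, X and Y
        txa
        pha
        tya
        pha
        lda mscr        ; save the screen pointer for the erase pass
        pha
        lda mscr+1
        pha
        ldx pixoff      ; offset into the pre-shifted shape tables
        lda #mouseheight
        sta tmp5        ; line counter
drawloop
        ldy #0
        lda (mscr),y    ; left byte: save background, mask, merge shape
        sta restorebuf,x
        and masks,x
        ora shapes,x
        sta (mscr),y
        iny
        inx
        lda (mscr),y    ; right byte
        sta restorebuf,x
        and masks,x
        ora shapes,x
        sta (mscr),y
        inx
        lda mscr        ; step down one 40-byte screen line
        clc
        adc #40
        sta mscr
        bcc *+4         ; skip the INC unless the low byte wrapped
        inc mscr+1
        dec tmp5
        bne drawloop
        pla             ; restore the saved screen pointer
        sta mscr+1
        pla
        sta mscr
        ldx pixoff
        lda #mouseheight
        sta tmp5
eraseloop
        ldy #0
        lda restorebuf,x ; put the saved background back
        sta (mscr),y
        iny
        inx
        lda restorebuf,x
        sta (mscr),y
        inx
        lda mscr
        clc
        adc #40
        sta mscr
        bcc *+4
        inc mscr+1
        dec tmp5
        bne eraseloop
        pla             ; restore Y, X and A
        tay
        pla
        tax
        pla
        rti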

Edited by flashjazzcat

Hate to say this, due to overall code size, but it would be interesting to see how falling back to the other method during SIO compares to just letting this one take the hit.

 

Looks great BTW. Seems to me, this simulated hardware method is superior to the other one.

How this will behave "unfettered" during I/O is anyone's guess. We have Pokey timer, DLI, VBL - all working together to make that mouse move. This is all well and good in a game environment, but making it work with DOS is another matter. I recall pointing all that out at the time.

 

Using this method implies I get rid of the masked screen draw routine (used by the code which inverts areas of the screen, and called - for example - by the menu handler) and enjoy the speed benefits that brings. Having screen writes which use both methods introduces extra overhead. However, nothing should be redrawing under the mouse while disk I/O is happening, so I suppose the stage 1 VBL could take charge. I suppose that will all come out in the wash.

 

If I'm using Altirra's profiler correctly, the above code equates to around 500 instructions, or roughly 3750 cycles per frame.

Edited by flashjazzcat

Interesting...

 

Well, one downside is the mouse consumes the timer, VBL, etc... Maybe that won't matter for applications, but it could. That and the overall cycle count. Is it lower for the other method?

 

If so, that's probably worth a lot of consideration.

 

IMHO, there are GUI environments that look cool, and there are ones that work, sometimes both happen. Having one that works well, trumps looking cool, IMHO.


Simpler is usually better. This seems kind of complicated in comparison, in terms of interactions with the system. I take it the problem with the other method is you have to deal with the pointer image bits being in the screen during redraws and so forth?

Agreed regarding simplicity. There's no problem as such with the other method; it simply means one has two options when writing to the screen: turn the mouse pointer off, or draw around the mouse. The system is capable of both, although obviously drawing around the mouse (with the masking that involves) is much slower, and is therefore reserved for those occasions where mouse flicker would be intolerably distracting (such as scrolling the mouse over the options of a drop down menu). The ST appears to use a very similar approach, with mouse flicker during some operations, but not during others. So naturally, any operation which currently draws around the mouse would benefit greatly from the "new" method. The downside is a uniform, constant performance hit with regard to interrupts. It's a complex task to balance this interrupt overhead with the work we save by not removing and redrawing the mouse either side of a screen update in mainline code, or drawing around it.

 

Interesting...

 

Well, one downside is the mouse consumes the timer, VBL, etc... Maybe that won't matter for applications, but it could. That and the overall cycle count. Is it lower for the other method?

 

If so, that's probably worth a lot of consideration.

 

IMHO, there are GUI environments that look cool, and there are ones that work, sometimes both happen. Having one that works well, trumps looking cool, IMHO.

With the "old" method, the mouse was sampled in a DLI but rendered in the VBL. With the new method, instead of a dozen or so DLI calls per frame, we have one, although it's longer. The new method has a much shorter VBL, of course, although the benefit of that is minimal. And with the new method, we have taken the work previously done by the DLI (sampling), and delegated it to a Pokey timer interrupt.

 

Looking at it another way, the work previously done by the VBL (erasing the pointer and drawing it in the new position - code which isn't executed at all if the mouse is stationary), is being executed 50-60 times more frequently. More so, if you consider that the redraw occurs once per frame, even if the mouse is stationary.

 

So really, the old method must surely be more cycle efficient. However, I've had to actually action the new method in order to reach these conclusions. It pretty much backs up the original points I made when Andy suggested the DLI renderer. However, it was a valid suggestion, and it's been fun exploring the possibilities.


Yeah, true that. Never hurts to give things a go.

 

I like the solid drawing, but I don't think some mouse artifacts are a big deal overall. Seems to me, the PM would be just cake, but... it's not a pixel perfect alignment, would be coarse for picking and movement horizontally. Too bad, because a nice, red mouse, on that great would rock!

 

(SGI used a red mouse, and I loved it)

 

Anyway, I'm just following with interest, chattering where there is some value. Nice work. I'm really enjoying you sharing the development. Impressive to me, given where it's at, and the limitations you are working under. Rock on!

 

If the code doesn't run with the old method when the mouse isn't moved, it seems to me that something like The Last Word could really use the cycles to keep up with text edits.

 

So then, with the old method, an application could just ask for the mouse to be drawn the moment it has finished doing its thing, right?

Edited by potatohead

Yeah, true that. Never hurts to give things a go.

Looking again at what I wrote above regarding machine cycles, I realize it was complete rubbish. Must have had a brain-fade moment.

 

Let's reiterate: the "old" method does a mouse re-draw once per frame if the mouse is moving (and then the coordinates only change every VBLANK if the mouse is travelling really quickly). This redraw is still performed once per frame, but now via a DLI. So really, the workload there is about the same, if we disregard the redundancy of redrawing a stationary pointer for a moment.

 

The sampling has been delegated to a Pokey timer interrupt. No real change there either (apart from possible SIO side-effects; likewise with the DLI-based rendering). The sampling is always a constant hit.

 

So really, it's not quite as clear-cut as my previous confused conclusion.

 

Anyway, I'm just following with interest, chattering where there is some value. Nice work. I'm really enjoying you sharing the development. Impressive to me, given where it's at, and the limitations you are working under. Rock on!

Thanks. It's alternately great fun and completely daunting. :) Always rewarding, though.

 

If the code doesn't run with the old method when the mouse isn't moved, it seems to me that something like The Last Word could really use the cycles to keep up with text edits.

Well, by my calculations, we're looking (with the "new" method) at a fixed cost of 187,500 cycles per second for PAL and 225,000 cycles per second for NTSC (3,750 cycles per frame at 50 and 60 frames per second respectively). We make some modest savings elsewhere, on balance.

 

So then, with the old method, an application could just ask for the mouse to be drawn the moment it has finished doing its thing, right?

The application doesn't usually have to worry about hiding/revealing the mouse unless it really wants to. Most of the time, you send a redraw message to an object tree, and the mouse is hidden at the top level, and revealed again when the entire sub-tree is redrawn. However, a text editor, which manually redraws its window, would naturally call "mouse_off" prior to drawing the document.

 

Despite prior brain-fade, I can see where this will end up going. However, it's been fun playing with this DLI renderer and trying to get it right. Who knows - it may have applications elsewhere. There are also some interesting, unresolved challenges:

 

  • Triggering the DLI on the correct line in relation to the pointer's Y coordinate. Too late, and the top of the pointer is chopped off. Too early, and the mouse is partially erased before it's been seen. It's difficult to judge which of these two problems is actually happening when testing (both faults look similar).
  • Getting the timing right when the DLI is triggered by an interrupt above the top of the playfield. I think a bunch of single blank lines will be required following the first two blank instructions.

Obviously, we want the absolute minimum number of STA WSYNCs. Here's what the pointer looks like at a particular "bad" spot at the moment:

 

post-21964-0-34805200-1307637093_thumb.jpg

 

There are several such positions on the screen, where the pointer is clipped.

Edited by flashjazzcat

...

Despite prior brain-fade, I can see where this will end up going. However, it's been fun playing with this DLI renderer and trying to get it right. Who knows - it may have applications elsewhere.

 

There are also some interesting, unresolved challenges:

 

Triggering the DLI on the correct line in relation to the pointer's Y coordinate. Too late, and the top of the pointer is chopped off. Too early, and the mouse is partially erased before it's been seen.

...

I started thinking about how we could speed up the DLI routine (starting from the bottom of the pointer and drawing upwards, and using only Y to index both screen and sprite data)...

But then I realized that the raster beam draws the sprite in the opposite direction :)

 

And if the routine is too fast (or triggered too early), it will delete parts of the sprite before the raster beam passes them :)

 

So I guess there's not much use in trying to speed it up more than is really necessary. Quick calc - your main drawing loop is around 77 cycles in most cases, and there are 57 cycles free in a bitmap mode scanline...

Kinda tough to calculate where the DLI should start...
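The 77-cycle figure can be checked by tallying the draw loop posted earlier in the thread with standard 6502 timings. This is only a sketch of that tally: it assumes `mscr` and `tmp5` live in zero page, the three tables are absolute and page-aligned (so indexed reads never cost an extra cycle), and branches do not cross a page. The addresses are hypothetical and chosen only so the fragment assembles.

```asm
mscr       = $cb        ; zero-page screen pointer (hypothetical address)
tmp5       = $cd        ; zero-page line counter (hypothetical address)
restorebuf = $3000      ; page-aligned tables (hypothetical addresses)
masks      = $3100
shapes     = $3200

        org $2000       ; hypothetical load address
drawloop
        ldy #0              ; 2
        lda (mscr),y        ; 5
        sta restorebuf,x    ; 5
        and masks,x         ; 4
        ora shapes,x        ; 4
        sta (mscr),y        ; 6
        iny                 ; 2
        inx                 ; 2   running total 30
        lda (mscr),y        ; 5
        sta restorebuf,x    ; 5
        and masks,x         ; 4
        ora shapes,x        ; 4
        sta (mscr),y        ; 6
        inx                 ; 2   running total 56
        lda mscr            ; 3
        clc                 ; 2
        adc #40             ; 2
        sta mscr            ; 3
        bcc *+4             ; 3 when taken (2 when not)
        inc mscr+1          ; 5, executed only when the low byte wraps
        dec tmp5            ; 5
        bne drawloop        ; 3 when taken: 77 cycles for the usual case
```

On a line where the low byte of `mscr` wraps, the branch falls through and the INC runs, which makes that pass 81 cycles; the final pass is one cycle shorter because the BNE is not taken.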

 

------------------------

The ultimate routine would be 400 blocks of lda-sta-and-ora-sta with X as the X-coordinate offset, and self-modified code that would jump into the appropriate line and jump out when the sprite is done ;)

...
        lda screen,x
        sta (restore_buffer),y
        and (mask),y
        ora (shape),y
        sta screen,x
        iny
        lda screen+1,x
        sta (restore_buffer),y
        and (mask),y
        ora (shape),y
        sta screen+1,x
        iny
        lda screen+40,x
        sta (restore_buffer),y
        and (mask),y
        ora (shape),y
        sta screen+40,x
        iny
        lda screen+41,x
        sta (restore_buffer),y
        and (mask),y
        ora (shape),y
        sta screen+41,x
        iny
...

 

54 cycles per sprite line (<57), and >5Kb long ;)

 

--------------------------------------

Awesome problems to kick around on a nice afternoon - thanks for the challenge, FlashJazzcat ;)


...if the routine is too fast (or triggered too early), it will delete parts of the sprite before the raster beam passes them :)

Precisely so. It's a process of trial and error.

 

So I guess there's not much use in trying to speed it up more than is really necessary. Quick calc - your main drawing loop is around 77 cycles in most cases, and there are 57 cycles free in a bitmap mode scanline...

Kinda tough to calculate where the DLI should start...

Since we only have one DLI, the only timing stipulations are that a) a given line of the mouse pointer must be rendered in the video RAM before it's displayed, and b) the mouse shouldn't be erased before the electron beam has displayed the last line. Yes - it's desirable to have the fastest possible routine, but we must not reach the erase portion too fast. Of course, once we're erasing, we can really go all-out for speed. :)

 

I suppose the quicker we draw a given line of the pointer, the later we can trigger the interrupt. We also have a minor code overhead not seen in the code I posted before: the VBL set-up routine sets a flag if the right-byte of the mouse isn't to be drawn, and the right-byte render loop is skipped if bit 7 is set. As for clipping at the foot of the screen: well, we just rely on ROM being above the end of the screen at the moment. :)

 

In any case, DAMN, it looks nice:

 

http://www.youtube.com/watch?v=wXOW6AuuOZA

 

Current bugs: if mouse crosses into lower portion of screen, LMS gets messed up (easy fix). More troubling is the fact that I get screen garbage if I move the mouse to the top-left corner. Bound to be a logical explanation...

 

Here's the current DLI code:

 

draw_mouse
        pha             ; save A, X and Y
        txa
        pha
        tya
        pha
        sei
        ldx pixoff      ; offset into the pre-shifted shape tables
        lda #mouseheight
        sta mtmp5       ; line counter
drawloop
        ldy #0
        lda (mscr),y    ; left byte: save background, mask, merge shape
        sta restorebuf,x
        and masks,x
        ora shapes,x
        sta (mscr),y
        iny
        inx
        lda (mscr),y    ; right byte: always saved...
        sta restorebuf,x
        bit mousexhi    ; ...but only plotted if bit 7 of the flag is clear
        bmi noplotright
        and masks,x
        ora shapes,x
        sta (mscr),y
noplotright
        inx
        lda mscr        ; step down one 40-byte screen line
        clc
        adc #40
        sta mscr
        bcc *+4
        inc mscr+1
        dec mtmp5
        bne drawloop

        ldx pixoff
        lda #mouseheight
        sta mtmp1       ; line counter
restore_loop2
        ldy #0
        lda restorebuf,x ; put the saved background back
        sta (oldscr),y
        iny
        inx
        lda restorebuf,x
        sta (oldscr),y
        inx
        lda oldscr
        clc
        adc #40
        sta oldscr
        bcc *+4
        inc oldscr+1
        dec mtmp1
        bne restore_loop2

        cli
        pla             ; restore Y, X and A
        tay
        pla
        tax
        pla
        rti

I doubt any other interrupt (Pokey) can interrupt a DLI - if so, I guess the SEI/CLI pair is useless. OLDSCR gets set up with the same pointer as MSCR by the VBL for quickness.
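On the SEI/CLI question: the 6502 sets its interrupt-disable flag itself whenever it takes an interrupt, NMI included, and RTI restores the flags that were pushed on entry. A DLI arrives as an NMI, so POKEY IRQs are already held off for the length of the handler. A stripped-down skeleton under that reasoning, showing only entry and exit (the `org` address is hypothetical and the loops are elided):

```asm
        org $2000       ; hypothetical load address
draw_mouse
        pha             ; the I flag is already set by the NMI sequence
        txa
        pha
        tya
        pha
        ; ... draw loop and erase loop go here, unchanged ...
        pla
        tay
        pla
        tax
        pla
        rti             ; pulls the pre-interrupt flags, I included
```

Dropping the pair saves four cycles per frame, and it also avoids the CLI re-enabling IRQs a few instructions before the handler has actually finished.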

 

Once the timing is fine-tuned, we can go ahead and do a real-world comparison. It seems pretty fast, though. It looks much better, but at what cost....?

Edited by flashjazzcat

Looking really fluid... Hardware-like 100%, - I bet it looks even better on real machine :)

 

Nice touch with that no-need-to-draw-right-part, and ignore-write-over-rom :)

 

What sort of "garbage" do you get in the upper-left corner? (Is it maybe the VBI taking too long and catching up with the DLI?)

Oddly, garbage problem just went away after I got rid of the masked screen region invert routine (just blasts through with direct screen addressing now, so the menus are a little smoother yet).

 

Just some slight flicker when the mouse is at the top of the screen now. I know for a fact it's to do with triggering the interrupt in the set of blank lines above the playfield. There must be a sweet-spot. What's the cycle usage of a single blank line?

 

If we can drop the cycle overhead to the absolute minimum, I'll consider sticking with this. That's the fun bit: optimisation!

 

STA (MSCR),Y takes what is it... six cycles?

Edited by flashjazzcat

...

Just some slight flicker when the mouse is at the top of the screen now. I know for a fact it's to do with triggering the interrupt in the set of blank lines above the playfield. There must be a sweet-spot. What's the cycle usage of a single blank line?

...

Here is some very useful info about it:

Antic timing.txt

 

Regular bitmap line has 57 cycles free.

Empty line without dma should have full 114 cycles free.

 

One more thing (excuse me if you already know this ;) ):

Dli starts on last scanline of mode line.

In bitmap mode it means you can put dli on every scanline.

But if you put into your dlist something like this:

 

DLIST DTA 112,112,112+128

 

Dli will start at the 8th of those 8 empty scanlines, one scanline above first bitmap scanline - which is probably not enough. You need to start Dli earlier, and if using 112 for empty part of screen - that means 8 or 16 scanlines above bitmap.

 

I don't know how you calculate where to start Dli but check it out... timing sure is special-case for that part of screen.

 

You could use 0 (single empty scanline mode):

 

DLIST           DTA 0,0,0,0,0,0,0,0
               DTA 0,0,0,0,0,0,0,0
               DTA 0,0,0,0,0,0,0,0

If you add 128 to any of these you get same precision as with bitmap modes...


...

Just some slight flicker when the mouse is at the top of the screen now. I know for a fact it's to do with triggering the interrupt in the set of blank lines above the playfield. There must be a sweet-spot. What's the cycle usage of a single blank line?

...

Here is a very nice info about it:

Antic timing.txt

Regular bitmap line has 57 cycles free.

Empty line without dma should have full 114 cycles free.

Many thanks - that looks like excellent reading!

 

Yes, I knew about the DLI trigger point, so to speak. The GUI uses a custom display list (200 lines of Antic 15) with 20 blank lines at the top instead of 24 to balance the extra height. I already replaced the third 8-blank lines instruction with four single blank line instructions in order to achieve greater control of where the DLI gets triggered "off screen", as it were. Really, the first three or four lines are now the only place where the pointer flickers.
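For illustration, a display list with that shape might look like the sketch below, written in the same `DTA` style as the examples above. Everything specific is an assumption: the `org` address, the screen addresses, and the placement of the second LMS are hypothetical, chosen only so that the 8000 bytes of screen data do not run across a 4K boundary within one LMS block (ANTIC's memory scan counter cannot cross one). The `:n` repeat prefix is MADS/xasm syntax.

```asm
screen  = $a010         ; hypothetical: 102 lines * 40 bytes ends exactly at $b000
screen2 = $b000         ; hypothetical: remaining 98 lines start on the 4K boundary

        org $9f00       ; hypothetical display list address
dlist   dta $70,$70             ; two 8-blank-line instructions ($70 = 112)
        dta $00,$00,$00,$00     ; four single blank lines: 20 blank lines in all
        dta $4f,a(screen)       ; LMS + Antic 15, first bitmap line
        :101 dta $0f            ; 101 more lines (102 so far)
        dta $4f,a(screen2)      ; second LMS at the 4K boundary
        :97 dta $0f             ; 97 more lines (200 in total)
        dta $41,a(dlist)        ; jump and wait for vertical blank
```

Adding 128 to any one of the four `$00` entries gives a DLI trigger point with single-scanline precision above the playfield, which is the control described here.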

 

I still think, in all honesty, that the cycle cost is a little high, although the system does not "feel" noticeably slower. If I can get each line of the pointer to render in fifty-odd cycles, and come up with some awesomely quick erase routine, we might be getting somewhere.

Edited by flashjazzcat

...

Yes, I knew about the DLI trigger point, so to speak. The GUI uses a custom display list (200 lines of Antic 15) with 20 blank lines at the top instead of 24 to balance the extra height. I already replaced the third 8-blank lines instruction with four single blank line instructions in order to achieve greater control of where the DLI gets triggered "off screen", as it were. Really, the first three or four lines are now the only place where the pointer flickers.

Could you change the background color at the beginning and the end of the DLI so we can see how long it takes?

 

Did you try Altirra's debugger ?

It is an awesome tool - I started pulling my hair out for a few days when I had problems with Nmis and Dlis not wanting to work together...

And then discovered step-by-step debugger of Altirra... :)

Solved a problem in few minutes... (turned out I wrote $40 instead of $04 in one place :) ).

 

I still think, in all honesty, that the cycle cost is a little high, although the system does not "feel" noticeably slower. If I can get each line of the pointer to render in fifty-odd cycles, and come up with some awesomely quick erase routine, we might be getting somewhere.

Here is my take on delete routine (starting from bottom of sprite).

Advantages are:

-not using X register - one less INX = -2 cycles per loop

-Using Y register for loop counter. no need for additional counter = -5 cycles per loop (if you are using Zeropage location for your counter, -6 if you are using absolute).

-Quick jump to end when Y<0 = not wasting time on one more addition.

 

Disadvantage:

- mscr2 should be set to (mscr + (15*40)-30) - preparing that takes a few additions. Could you maybe derive it from the last value of mscr in the draw part?

 

        sec
        ldy #31         ;I assumed mouse pointer is 16 pixels high ?
loop_delete
        lda restorebuf,y
        sta (mscr2),y
        dey
        lda restorebuf,y
        sta (mscr2),y
        dey
        bmi end
        lda mscr2
        sbc #38
        sta mscr2
        bcs loop_delete
        dec mscr2+1
        sec
        bcs loop_delete
end

 

I don't think it can get better than this unless you go with unrolled code and spend a couple of KB on it...
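On the pointer set-up listed under "Disadvantage": if the draw loop has just finished, mscr has already been advanced by 16 lines of 40 bytes, i.e. it holds the top-left address plus 640. The value wanted is the top-left address plus (15*40)-30 = 570, so one 16-bit subtraction of 70 would do. A small sketch, assuming a 16-line pointer and 40-byte lines as above; the zero-page addresses are hypothetical.

```asm
mscr    = $cb           ; hypothetical: draw pointer, now top-left + 640
mscr2   = $ce           ; hypothetical: erase pointer for loop_delete

        org $2200       ; hypothetical load address
prep_erase
        sec
        lda mscr
        sbc #70         ; 640 - 70 = 570 = (15*40) - 30
        sta mscr2
        lda mscr+1
        sbc #0          ; propagate the borrow into the high byte
        sta mscr2+1
        rts
```

This only lines up if the draw loop fills restorebuf from index 0 in left/right order per line, so that the erase loop's Y values of 31 down to 0 pick up the matching bytes.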

 

200 x [
	lda restorebuf,y
	sta screen+(scanline*40),x
	iny
	lda restorebuf,y
	sta screen+(scanline*40)+1,x
	iny
]

 

Put an RTS in place of the INY at the proper place and jump in at the right line (Unrolled_code+14*Mouse_y) ...

 

Much, much faster... but 14x200 = 2800 bytes.... That hurts...

 

ps. One more thing: I think that "dont-draw-right-part" should be a separate routine - it is only needed in 1 of 40 cases, so there's no need to add those 5-6 cycles to every loop of the drawing code.

        bit mousexhi
        bmi noplotright

 

Good night!

Here's another doc on Antic timings: Antic Timings.pdf


Could you change the background color at the beginning and the end of the DLI so we can see how long it takes?

Good idea. This should help with the off-playfield section.
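A minimal sketch of that timing marker, assuming the stock register layout: COLBK is the GTIA background register and COLOR4 is its OS shadow. The marker value and the `org` address are hypothetical. In Antic 15 COLBK only shows in the border, so the band appears at the screen edges; writing to COLPF2 ($D018) instead would show it inside the playfield.

```asm
COLBK   = $d01a         ; GTIA background/border colour register
COLOR4  = $02c8         ; OS shadow of COLBK

        org $2000       ; hypothetical load address
draw_mouse
        pha
        lda #$0e        ; hypothetical marker colour
        sta COLBK       ; the band starts where the DLI starts
        ; ... save X/Y, draw loop, erase loop, restore X/Y ...
        lda COLOR4
        sta COLBK       ; the band ends where the DLI ends
        pla
        rti
```

The height of the band on screen then gives the duration of the handler in scanlines.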

 

Did you try Altirra's debugger ?

It is an awesome tool - I started pulling my hair out for a few days when I had problems with Nmis and Dlis not wanting to work together...

And then discovered step-by-step debugger of Altirra... :)

Solved a problem in few minutes... (turned out I wrote $40 instead of $04 in one place :) ).

Yes - it's an excellent facility. I love the profiler too - it's really easy to get a handle on cycle consumption. At the end of the day, a blank line has roughly twice as many free cycles as a bitmap line. So if we - say - trigger the DLI two lines above the pointer position when it falls in the bitmap area, then when it's triggered in the blank lines it should only need to be one scan line above the pointer position. This is just a for-instance, but the idea is that the blank line offset is half the bitmap offset.

 

Nice... I like the erase from the end idea. This gets rid of the separate screen pointer, so we also avoid having to set that up in the VBL. Unrolled code? It would be nice, and if it was in the banked region of a cart we could go crazy. However, this code needs to be in the unbanked region, and we're going to be VERY short of space for interrupt code anyway, so I think it's no-go. As for having the right-clipped part as a separate routine: makes sense, although there's got to be some selection logic somewhere (even if in the VBL) which tells the DLI where to branch. I suppose it would be faster to test for byte 39 and then set the DLI hardware vector to point to the appropriate routine. However, this introduces a few extra cycles into the VBL code, unless we carefully position the DLI code on page boundaries and just flip the MSB instead of setting the flag. All these little aspects have a large cumulative effect.
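A sketch of the page-flip idea, assuming the two renderers are assembled at the same offset within different pages, so that only the high byte of the DLI vector (VDSLST at $0200) ever changes. `draw_mouse_clip` and all three addresses are hypothetical; `mousexhi` is the flag from the code above, with bit 7 set when the right-hand byte must not be drawn.

```asm
VDSLST          = $0200 ; OS display list interrupt vector
mousexhi        = $ce   ; hypothetical address of the clip flag
draw_mouse      = $2000 ; hypothetical: full renderer
draw_mouse_clip = $2100 ; hypothetical: renderer without the right byte

        org $2300       ; hypothetical load address
select_dli              ; called from the VBL
        lda #>draw_mouse        ; default: full renderer
        bit mousexhi            ; BIT copies bit 7 into N and leaves A alone
        bpl set_dli
        lda #>draw_mouse_clip   ; right byte clipped: use the other page
set_dli sta VDSLST+1            ; low byte of the vector never changes
        rts
```

That keeps the selection out of the per-line loop entirely, at the price of a handful of cycles once per frame in the VBL.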

 

Here's another doc on Antic timings: Antic Timings.pdf

Brilliant - thanks. I'll give you a chance to test this out later; I want to know what you think.

 

Can you easily include lightpen support? It would be really unique for a windowing system to have this option.

I'll look into it - it would certainly be unique. I suppose it makes sense to have a driver system for the mouse, so we can support joysticks, etc.
