
The hunt for slowdown



Hi there,

 

In my latest game, I'm hunting for the cause of a slowdown:

I wrote a vblank handler to count the number of frames my main loop takes.

I found that it takes 2 vblanks most of the time (with a peak of 5, but that occurs at a specific moment where it doesn't hurt the gameplay).
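For anyone wanting to reproduce this kind of measurement, here is a minimal sketch in portable C. The names `vbl_count`, `on_vblank` and `frames_elapsed` are hypothetical, and hooking `on_vblank` to the real vertical blank interrupt is toolchain-specific and deliberately left out.

```c
#include <stdint.h>

/* Incremented once per vertical blank. On real hardware this would be
   called from the vblank interrupt; that hookup is not shown (assumption). */
static volatile uint8_t vbl_count;

void on_vblank(void)
{
    ++vbl_count;
}

/* Number of vblanks since 'start'. The subtraction is done modulo 256,
   so the result stays correct when the 8-bit counter wraps around. */
uint8_t frames_elapsed(uint8_t start)
{
    return (uint8_t)(vbl_count - start);
}
```

The main loop would read `vbl_count` at the top of an iteration and call `frames_elapsed` at the bottom to get the cost of that iteration in vblanks.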

If I narrow it down to the hungriest calls (very basic profiling!), I unfortunately end up on a call I can't optimize => tgi_sprite :(

I have 4 calls to tgi_sprite per loop:

- one to clear the screen by drawing a 1x1 sprite zoomed to full screen

- one for the common game sprites (about 100!)

- one for some specific sprites related to the current game screen

- one for all the text and UI

Apart from the first one, they are all based on chained sprites.

 

I tried removing the first call: one and only one (mega) zoomed sprite.

=> To my surprise, I clearly see a boost: the loop now takes 1 vblank much more often than 2.

 

It seems some sprites are faster to draw than others, and when you draw hundreds of them, the impact can be huge.
So I'd like to know whether any information is available, or whether any measurements have been done, on the impact of zoomed sprites, 1/2/4-bit sprites, sprite types (TYPE_NONCOLL, ...), stretched sprites, ...

If not, how could this be measured? Using the HBL timer? Or, since Suzy is a lot faster, is HBL too slow?

 

 

 

Link to comment
Share on other sites

Posted (edited)

1) You really DON'T need the call to clear the screen, so just remove that first call.

2) Follow Lord BS's advice: one tgi call, not more.

3) I get that you need 100 SCBs for all the bubbles that could potentially be on your screen, BUT most of the time you have way fewer bubbles. So simply set the next pointer of the last visible bubble to null.

Edited by LordKraken
Link to comment
Share on other sites

I agree with you, nop90, but I suspect that @KanedaFr is doing one-pixel movement, so that's 30 pixels/second, and in tate mode that means a couple of seconds for a bubble to cross the screen. And yes, maybe you want to increase the speed of your bubbles in the end.

Link to comment
Share on other sites

1 hour ago, KanedaFr said:

I have 4 calls to tgi_sprite per loop:

- one to clear the screen by drawing a 1x1 sprite zoomed to full screen

Check whether you really need to clear the complete screen. If you only clear parts of it, align them on an even X pixel to avoid an RMW cycle.
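As a rough illustration of the alignment advice: the Lynx frame buffer packs two 4-bit pixels per byte, so a span that starts or ends on an odd X touches half a byte. The sketch below widens a horizontal span to even boundaries. `span_t` and `align_even` are hypothetical names, not cc65 types or functions, and it assumes `x + w` stays within the screen width.

```c
#include <stdint.h>

/* Hypothetical horizontal span for a partial clear (not a cc65 type). */
typedef struct {
    uint8_t x;   /* first pixel column */
    uint8_t w;   /* width in pixels */
} span_t;

/* Widen a span so that it starts on an even X and has an even width.
   Assumes x + w fits in the screen width, so nothing overflows. */
span_t align_even(span_t s)
{
    uint8_t end = (uint8_t)(s.x + s.w);   /* one past the last pixel */
    s.x = (uint8_t)(s.x & ~1u);           /* round the start down to even */
    end = (uint8_t)((end + 1u) & ~1u);    /* round the end up to even */
    s.w = (uint8_t)(end - s.x);
    return s;
}
```

A span of 10 pixels starting at X = 5 becomes 12 pixels starting at X = 4, while a span that is already aligned comes back unchanged.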

Link to comment
Share on other sites

As I said, I removed the first call that clears the screen.

To make a single chained list, I 'only' have to update the next_sprite attribute of the last sprite of each list.

Not sure it will boost anything but I'll give it a try.
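A minimal sketch of joining the separate lists into one chain, so that a single draw call can walk everything. `scb_t` is a simplified stand-in with only a link field and an id; the real cc65 SCB structures carry many more fields, and `chain_tail` / `chain_join` are hypothetical helpers.

```c
#include <stddef.h>

/* Simplified stand-in for a sprite control block: only the link field
   matters here. The real SCB structures have many more fields. */
typedef struct scb {
    struct scb *next;
    int id;
} scb_t;

/* Walk to the last node of a chain (the chain must not be empty). */
scb_t *chain_tail(scb_t *head)
{
    while (head->next != NULL) {
        head = head->next;
    }
    return head;
}

/* Append chain b to chain a and return the head of the merged chain. */
scb_t *chain_join(scb_t *a, scb_t *b)
{
    chain_tail(a)->next = b;
    return a;
}
```

If the tail of each list is already known at build time, the walk in `chain_tail` is unnecessary and the join is a single pointer store.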

 

For the hundred bubbles to draw, I set or clear the SKIP bit.

If I had to reorder the bubbles in the chained list to keep only the needed ones, I'd only be moving the problem: any gain in drawing would be lost in logic/code.

 

It's not one-pixel movement, it's actually 2.

I could easily increase the speed, but it won't solve the problem: you'll notice the slowdown when I drop to 2 vblanks while I'm at 1 vblank 75% of the time.

The quick way to handle it would be to stay at 2 vblanks per loop whatever happens and increase the speed / animation, but I'm not sure I'd be happy with this trick 😊

 

Link to comment
Share on other sites

24 minutes ago, 42bs said:

Check whether you really need to clear the complete screen. If you only clear parts of it, align them on an even X pixel to avoid an RMW cycle.

Oh?! Does that mean sprites at odd X are "slower" to draw than sprites at even X?!

I'm not familiar with "RMW cycle", I'll look for information about it... Is it about ASL, DEC, INC, LSR, ROL, and ROR?

Link to comment
Share on other sites

25 minutes ago, KanedaFr said:

The quick way to handle it would be to stay at 2 vblanks per loop whatever happens and increase the speed / animation, but I'm not sure I'd be happy with this trick 😊

Which is actually a trick most games use. Better to always run at 30 FPS than to have a sudden slowdown.
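A small sketch of that fixed-rate idea, assuming a free-running 8-bit vblank counter like the one KanedaFr already uses for profiling. `VBLANKS_PER_FRAME` and `vblanks_to_wait` are hypothetical names; the actual wait loop and buffer swap are not shown.

```c
#include <stdint.h>

#define VBLANKS_PER_FRAME 2u   /* the 2 vblanks per loop discussed above */

/* Given the vblank counter value when the previous frame was presented
   and the current counter value, return how many more vblanks to wait
   before presenting the next frame. Returns 0 when the loop overran. */
uint8_t vblanks_to_wait(uint8_t last_present, uint8_t now)
{
    uint8_t elapsed = (uint8_t)(now - last_present);   /* wrap-safe */
    if (elapsed >= VBLANKS_PER_FRAME) {
        return 0;
    }
    return (uint8_t)(VBLANKS_PER_FRAME - elapsed);
}
```

A loop that finishes early waits out the remainder, so fast and slow iterations are presented at the same pace and only a real overrun is visible.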

Link to comment
Share on other sites

23 minutes ago, 42bs said:

Which is actually a trick most games use. Better to always run at 30 FPS than to have a sudden slowdown.

What I mean is that using this trick without optimizing first isn't something I'd like.

If I wrote ugly code, I can't just say "ok, let's use the 15 FPS trick to hide my laziness" ;)

Link to comment
Share on other sites

6 minutes ago, KanedaFr said:

I used SCB_REHV_PAL almost everywhere....

I'm currently trying to use SCB_RENONE by default when possible to see whether it improves anything.

You only have to use a full sprite structure the first time you call the Suzy blitter, or when you have to change/reset some of its internal registers.

 

E.g. if you call tgi_clear() at the beginning of a new frame, it changes the vsize and hsize values, so the next sprite to draw has to set the correct values for those registers.

 

And remember that the Suzy math calls can change some of the blitter registers. Suzy's registers are very limited at the low level, and the same registers can be mapped to several memory locations for different uses.

 

Usually my sprite chain starts with a full SCB_REHV_PAL sprite for the background, then all the following sprites that don't change size or palette colors are SCB_RENONE, and at the end I put the right SCB structures for the sprites that have special effects like zooming or blinking colors.
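To make the "registers persist between sprites" behaviour concrete, here is a toy model in portable C. It is not the hardware and not the cc65 SCB layout: `toy_sprite_t`, `reload_size` and `toy_draw_chain` are hypothetical, and the model only tracks the two size values mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy sprite record: 'reload_size' plays the role of choosing a full
   SCB (size fields present) versus a minimal one (size fields absent). */
typedef struct toy_sprite {
    struct toy_sprite *next;
    int reload_size;          /* non-zero: hsize/vsize below are loaded */
    uint16_t hsize, vsize;    /* fixed point, 0x0100 meaning 1:1 here */
    uint16_t drawn_hsize;     /* filled in by the model for inspection */
    uint16_t drawn_vsize;
} toy_sprite_t;

/* Walk the chain the way the post describes: the size values persist
   from one sprite to the next unless a sprite reloads them.
   hsize/vsize are whatever an earlier operation left behind. */
void toy_draw_chain(toy_sprite_t *s, uint16_t hsize, uint16_t vsize)
{
    for (; s != NULL; s = s->next) {
        if (s->reload_size) {
            hsize = s->hsize;
            vsize = s->vsize;
        }
        s->drawn_hsize = hsize;
        s->drawn_vsize = vsize;
    }
}
```

In this model, a sprite that does not reload its size and sits after a zoomed sprite inherits the zoom, which is one reason to keep the special-effect sprites at the end of the chain.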

Link to comment
Share on other sites

Thanks for the details @Nop90!

 

I thought every sprite had its own settings (like on most of the systems I've worked on until now).

I learnt only today, by mistake, that it in fact uses Suzy registers, and that a setting affects all the sprites drawn afterwards unless it is cleared or set again.

 

Interesting... very powerful if used correctly!

And so I'll probably gain some ticks if I don't set the Suzy registers for EACH sprite.

Link to comment
Share on other sites

The skip sprite option is probably not the most efficient approach, since even if you have only 3 bubbles on screen, the Lynx will still parse the whole chained list, with the 97 others set to skip. It is much faster to just set the next field of your last visible SCB to 0.
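A sketch of the difference, counting how many SCBs get walked in each case. `bubble_scb_t`, `visited` and `link_active` are hypothetical, and the sketch assumes the active bubbles are kept packed at the front of the pool, which is exactly the bookkeeping cost raised earlier in the thread.

```c
#include <stddef.h>

/* Cut-down SCB stand-in: a link and a flag standing in for the SKIP bit. */
typedef struct bubble_scb {
    struct bubble_scb *next;
    int skip;
} bubble_scb_t;

/* Count how many SCBs are visited while walking a chain, skipped or not. */
unsigned visited(const bubble_scb_t *s)
{
    unsigned n = 0;
    for (; s != NULL; s = s->next) {
        ++n;
    }
    return n;
}

/* Link the first 'active' entries of pool[] into a chain and cut it there.
   Assumes active bubbles are packed at the front of the pool.
   Returns NULL when nothing is active. */
bubble_scb_t *link_active(bubble_scb_t *pool, unsigned active)
{
    unsigned i;
    if (active == 0) {
        return NULL;
    }
    for (i = 0; i + 1 < active; ++i) {
        pool[i].next = &pool[i + 1];
    }
    pool[active - 1].next = NULL;   /* end of chain: the rest is never walked */
    return pool;
}
```

With 100 chained SCBs and 97 of them flagged as skipped, `visited` still reports 100; after `link_active(pool, 3)` it reports 3.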

Link to comment
Share on other sites

So, 

- I removed the first zoomed full-screen sprite

- I carefully selected the right SCB for each sprite

- I optimized my loops over the 100 sprites by converting a local variable (pushed on the stack) to a global one (direct memory access)

 

The result is awesome!

There is still one slowdown I have to fight (when the bubbles shake), but the improvement is clearly visible.

Thanks everyone!

Link to comment
Share on other sites

3 minutes ago, KanedaFr said:

I optimized my loops over the 100 sprites by converting a local variable (pushed on the stack) to a global one (direct memory access)

Just to be sure: are you using char for the loop variable?

Not sure, but I think cc65 allows setting a zero-page attribute, or moving the variable to ZP.

Link to comment
Share on other sites

1 hour ago, 42bs said:

Just to be sure: are you using char for the loop variable?

Not sure, but I think cc65 allows setting a zero-page attribute, or moving the variable to ZP.

Yes, I use char as much as I can (fortunately I have fewer than 256 sprites ;) )

 

In this case, it was a pointer to an SCB, so I didn't use zero page.

With cc65, it's faster to have

ptr = &scb[i];
ptr->vpos = xx;
ptr->nnn = xx;

when you have multiple attributes to update than

scb[i].vpos = xx;
scb[i].nnn = xx;

 

even more so if ptr is a global var (reused in every function that needs it)
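A compilable version of the two forms, for comparison. `mini_scb_t` and its fields are cut-down placeholders rather than the cc65 SCB layout, and `move_indexed` / `move_cached` are hypothetical names. Both functions do the same thing; the speed difference is the one reported above for cc65, and an optimizing host compiler will likely generate the same code for both.

```c
#include <stdint.h>

/* Hypothetical, cut-down sprite record; field names are placeholders. */
typedef struct {
    int16_t hpos;
    int16_t vpos;
} mini_scb_t;

#define MAX_SPRITES 100
mini_scb_t scb[MAX_SPRITES];

/* Global scratch pointer, reused by every routine that needs one. */
mini_scb_t *ptr;

/* Indexed form: the element address may be recomputed for each field. */
void move_indexed(uint8_t i, int16_t x, int16_t y)
{
    scb[i].hpos = x;
    scb[i].vpos = y;
}

/* Cached form: the element address is computed once, then reused. */
void move_cached(uint8_t i, int16_t x, int16_t y)
{
    ptr = &scb[i];
    ptr->hpos = x;
    ptr->vpos = y;
}
```

The global `ptr` is shared state, so this form is only safe when no interrupt handler or nested call uses it at the same time.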

Edited by KanedaFr
Link to comment
Share on other sites
