KanedaFr (posted May 14):

Hi there,

In my latest game, I'm hunting for the cause of a slowdown. I wrote a vblank handler to count the number of frames my main loop takes, and found that it needs 2 vblanks most of the time (with a peak of 5, but that occurs at a specific moment which doesn't hurt the gameplay).

When I narrow down the most expensive calls (very basic profiling!), I unfortunately end up at a call I can't optimize: tgi_sprite.

I have 4 calls to tgi_sprite per loop:
- one to clear the screen by drawing a 1x1 sprite zoomed to full screen
- one for the common game sprites (about 100!)
- one for some specific sprites related to the current game screen
- one for all the text and UI

Apart from the first one, they are all based on chained sprites.

I tried removing the first call, which draws one and only one (mega) zoomed sprite. To my surprise, I clearly see a boost: the loop now takes 1 vblank far more often than 2.

It seems some sprites are faster to draw than others, and when you draw hundreds of them, the impact can be huge. So I'd like to know whether information is available, or whether measurements have been done, to understand the impact of zoomed sprites, 1/2/4-bit sprites, sprite types (TYPE_NONCOLL, ...), stretched sprites, and so on.

If not, how could this be measured? Using the HBL timer? Or is HBL too slow, since Suzy is a lot faster?
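The frame-counting idea from this post can be sketched in plain C. This is only an illustration under assumptions: every `demo_` name is hypothetical, the handler is an ordinary function here, and hooking it to the real vertical-blank interrupt is hardware-specific and not shown.

```c
#include <stdint.h>

/* Hypothetical names throughout; not part of cc65 or the author's game. */
static volatile uint8_t demo_vbl_ticks;  /* bumped once per vertical blank */
static uint8_t demo_worst_case;          /* highest count seen so far */

/* Stand-in for the body of a vblank interrupt handler. */
void demo_on_vblank(void)
{
    ++demo_vbl_ticks;
}

/* Call once at the end of each main-loop iteration.
   Returns how many vblanks elapsed during that iteration and resets
   the counter. On real hardware the read and the reset are not atomic,
   so a tick arriving between them would be lost. */
uint8_t demo_end_of_loop(void)
{
    uint8_t used = demo_vbl_ticks;
    demo_vbl_ticks = 0;
    if (used > demo_worst_case) {
        demo_worst_case = used;
    }
    return used;
}

/* Peak number of vblanks any single iteration has taken. */
uint8_t demo_worst_case_so_far(void)
{
    return demo_worst_case;
}
```

Logging the returned value per iteration gives the "mostly 2, peak of 5" kind of picture described above; it only resolves whole frames, so it cannot time an individual sprite.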
42bs (posted May 14):

https://github.com/42Bastian/lynx_hacking/tree/master/chained_scbs
KanedaFr (posted May 14):

Thanks @42bs. I saw your chained sprites test (it's why I use chained sprites). I'll try hacking it to test other types of sprites.
LordKraken (posted May 14, edited May 14):

1) You really DON'T need the call to clear the screen, so just remove that first call.
2) Follow lord BS's advice: one tgi call, not more.
3) I get that you need 100 SCBs for all the bubbles that could potentially be on screen, BUT most of the time you have way fewer bubbles. So simply set the next pointer of the last visible bubble to null.
Nop90 (posted May 14):

BTW, if the main loop takes 2 VBLANKs your code runs at 30 FPS, which is a good result with so many chained sprites.
LordKraken (posted May 14):

I agree with you, Nop90, but I suspect that @KanedaFr is doing one-pixel movement, so that's 30 pixels/second, and in tate mode that means a couple of seconds for a bubble to cross the screen. And yes, maybe you'll want to increase the speed of your bubbles in the end.
42bs (posted May 14):

1 hour ago, KanedaFr said:
"I have 4 calls to tgi_sprite per loop: one to clear the screen by drawing a 1x1 sprite zoomed to full screen"

Check whether you really need to clear the complete screen. If you only need to clear parts of it, align them on an even X pixel to avoid an RMW cycle.
KanedaFr (posted May 14):

Like I said, I removed the first call that clears the screen.

To make a single chained list, I 'only' have to update the next_sprite attribute of the last sprite of each list. Not sure it will boost anything, but I'll give it a try.

For the hundred or so bubbles to draw, I set the SKIP bit or not. If I had to reorder the bubbles in the chained list to keep only the needed ones, I'd just be moving the problem: any gain in drawing would be lost in logic/code.

It's not one-pixel movement, it's actually 2. I could easily increase the speed, but that won't solve the problem: you'd still notice the slowdown when I drop to 2 vblanks, since I'm at 1 vblank 75% of the time.

The quick way to handle it would be to stay at 2 vblanks per loop whatever happens and increase the speed/animation, but I'm not sure I'd be happy with this trick 😊
KanedaFr (posted May 14):

24 minutes ago, 42bs said:
"Check whether you really need to clear the complete screen. If you only need to clear parts of it, align them on an even X pixel to avoid an RMW cycle."

Oh?! Does that mean sprites at odd X are "slower" to draw than sprites at even X?!

I'm not familiar with "RMW cycle"; I'll gather information about this... Is it ASL, DEC, INC, LSR, ROL, and ROR?
42bs (posted May 14):

Each byte contains 2 pixels. So if you set only one of them, you first have to read the byte, mask out that pixel, add the new one, and write the byte back.
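The read-mask-write sequence described here can be illustrated in C on a plain byte array. This is a sketch of the idea only, not of what Suzy does internally; the `demo_` names are hypothetical, and it assumes 4 bits per pixel with the even-X pixel in the high nibble.

```c
#include <stdint.h>

/* Set one 4-bit pixel in a packed row (two pixels per byte).
   Assumption: even X is the high nibble, odd X the low nibble.
   The neighbouring pixel must survive, so the byte is read,
   masked, merged and written back: a read-modify-write. */
void demo_put_pixel(uint8_t *row, unsigned x, uint8_t color)
{
    uint8_t old = row[x >> 1];                              /* read   */

    if (x & 1u) {
        old = (uint8_t)((old & 0xF0u) | (color & 0x0Fu));   /* modify */
    } else {
        old = (uint8_t)((old & 0x0Fu) | ((color & 0x0Fu) << 4));
    }
    row[x >> 1] = old;                                      /* write  */
}

/* Set two adjacent pixels starting at an even X.
   Both nibbles are replaced, so no read is needed: write only. */
void demo_put_pixel_pair(uint8_t *row, unsigned x_even,
                         uint8_t left, uint8_t right)
{
    row[x_even >> 1] = (uint8_t)(((left & 0x0Fu) << 4) | (right & 0x0Fu));
}
```

A span that starts and ends on a byte boundary can be written with the second form throughout, which is the point of the even-X advice.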
42bs (posted May 14):

25 minutes ago, KanedaFr said:
"The quick way to handle it would be to stay at 2 vblanks per loop whatever happens and increase the speed/animation, but I'm not sure I'd be happy with this trick 😊"

Which is actually a trick most games use: better to always run at 30 FPS than to have a sudden slowdown.
KanedaFr (posted May 14):

23 minutes ago, 42bs said:
"Which is actually a trick most games use: better to always run at 30 FPS than to have a sudden slowdown."

What I mean is that I wouldn't like to use this trick without optimizing first. If I wrote ugly code, I can't just say "ok, let's use the 15FPS trick to hide my laziness".
KanedaFr (posted May 14):

I used SCB_REHV_PAL almost everywhere... I'm currently trying to use SCB_RENONE by default where possible, to see whether it improves anything.
Nop90 (posted May 14):

6 minutes ago, KanedaFr said:
"I used SCB_REHV_PAL almost everywhere... I'm currently trying to use SCB_RENONE by default where possible, to see whether it improves anything."

You have to use a full sprite structure only the first time you call the Suzy blitter, or when you have to change/reset some of its internal registers.

E.g., if you call tgi_clear() at the beginning of a new frame, it changes the vsize and hsize values, so the next sprite to draw has to set the correct values for those registers.

And remember that the Suzy math calls can change some of the blitter registers. Suzy's registers are very limited at the low level, and the same registers can be mapped to several memory locations for different uses.

Usually my sprite chain starts with a full SCB_REHV_PAL sprite for the background, then all the following sprites that don't change size or palette colors are SCB_RENONE, and at the end I put the right SCB structures for the sprites that have special effects like zooming or blinking colors.
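The "registers persist until a sprite reloads them" behaviour described in this post can be modelled with a toy simulation. Nothing below is cc65 or Suzy code: the `Demo` types and the `reload_size` flag are hypothetical stand-ins for the choice between a full SCB and a reload-nothing SCB, and only the size registers are modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of the shared size registers (8.8 fixed point, 0x0100 = 1x). */
typedef struct {
    uint16_t hsize;
    uint16_t vsize;
} DemoSizeRegs;

/* Toy sprite: reload_size != 0 plays the role of a full SCB,
   reload_size == 0 the role of a reload-nothing SCB. */
typedef struct {
    int      reload_size;
    uint16_t hsize;
    uint16_t vsize;
} DemoSprite;

/* Walk the sprites in order. A sprite that reloads overwrites the
   shared registers; one that does not is drawn with whatever the
   previous sprite (or a clear-screen call) left behind.
   used_hsize[i] records the horizontal size sprite i was drawn with. */
void demo_draw_all(DemoSizeRegs *regs, const DemoSprite *sprites,
                   size_t count, uint16_t *used_hsize)
{
    size_t i;

    for (i = 0; i < count; ++i) {
        if (sprites[i].reload_size) {
            regs->hsize = sprites[i].hsize;
            regs->vsize = sprites[i].vsize;
        }
        used_hsize[i] = regs->hsize;
    }
}
```

In this model, a chain whose first sprite reloads is safe even if the registers were left at a full-screen zoom; a chain that starts with a non-reloading sprite inherits the stale zoom.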
KanedaFr (posted May 14):

Thanks for the details, @Nop90!

I thought every sprite had its own settings (like on most of the systems I've worked on so far). I learnt only today, by mistake, that it in fact uses Suzy registers, and that the values affect all the sprites drawn afterwards unless they are cleared or set again.

Interesting... very powerful if used correctly! And so I'll probably gain some ticks if I don't set the Suzy registers for EACH sprite.
LordKraken (posted May 14):

The skip sprite option is probably not the most optimized thing to do, since even if you have only 3 bubbles on screen, the Lynx will still parse the whole chained list, with the other 97 set to skip. It's much faster to just set the next field of your last visible SCB to 0.
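A minimal sketch of this suggestion, assuming the visible bubbles always occupy the front of the array (which is exactly the reordering cost KanedaFr raised earlier). `DemoScb` is a hypothetical stand-in with only a next pointer, not the real cc65 SCB layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a sprite control block. */
typedef struct DemoScb {
    struct DemoScb *next;
} DemoScb;

/* Link pool[0..count-1] into one chain ending in NULL. */
void demo_link_all(DemoScb *pool, size_t count)
{
    size_t i;

    for (i = 0; i < count; ++i) {
        pool[i].next = (i + 1 < count) ? &pool[i + 1] : NULL;
    }
}

/* Change how many entries at the front of the pool are drawn.
   First repairs the link cut by the previous call, then ends the
   chain after the new last visible entry. */
void demo_set_visible(DemoScb *pool, size_t count,
                      size_t old_visible, size_t new_visible)
{
    assert(old_visible >= 1 && old_visible <= count);
    assert(new_visible >= 1 && new_visible <= count);

    if (old_visible < count) {
        pool[old_visible - 1].next = &pool[old_visible];
    }
    pool[new_visible - 1].next = NULL;
}

/* Number of entries a walk of the chain has to visit. */
size_t demo_entries_walked(const DemoScb *head)
{
    size_t n = 0;

    while (head != NULL) {
        ++n;
        head = head->next;
    }
    return n;
}
```

With 100 entries and 3 visible, a walk visits 3 nodes instead of 100. With skip flags the walk would still visit all 100, which is the cost this post points at.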
KanedaFr (posted May 14):

The idea was to have a constant number of sprites to draw (or not): I was trying to avoid slowdowns when the number of bubbles increases. Unfortunately, it seems to have failed.
LordKraken (posted May 14):

I'm not convinced by the strategy of slowing down the entire game to avoid some potential slowdown when the screen is full.
KanedaFr (posted May 15):

So:
- I removed the first zoomed full-screen sprite
- I carefully selected the right SCB for each sprite
- I optimized my loops over the 100 sprites by converting a local variable (pushed on the stack) into a global one (direct memory access)

The result is awesome! There is still one slowdown I have to fight (when the bubbles shake), but the improvement is clearly visible.

Thanks, everyone!
42bs (posted May 15):

3 minutes ago, KanedaFr said:
"I optimized my loops over the 100 sprites by converting a local variable (pushed on the stack) into a global one (direct memory access)"

Just to be sure: are you using char for the loop variable? Not sure, but I think cc65 allows setting a zeropage attribute, or moving the variable to ZP.
KanedaFr (posted May 15, edited May 15):

1 hour ago, 42bs said:
"Just to be sure: are you using char for the loop variable? Not sure, but I think cc65 allows setting a zeropage attribute, or moving the variable to ZP."

Yes, I use char as much as I can (fortunately I have fewer than 256 sprites). In this case, it was a pointer to an SCB, so I didn't use zero page.

With cc65, when you have multiple attributes to update, it's faster to write

ptr = &scb[i]; ptr->vpos = xx; ptr->nnn = xx;

than

scb[i].vpos = xx; scb[i].nnn = xx;

even more so if ptr is a global variable (reused in every function that needs it).
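The two forms compared in this post, written out as complete functions. This is a sketch with hypothetical `demo_` names and a made-up sprite struct; it shows that both loops produce the same result, while the speed difference under cc65 is KanedaFr's observation and is not measured here.

```c
#include <stdint.h>

#define DEMO_COUNT 100

/* Made-up sprite record; not the real cc65 SCB layout. */
typedef struct {
    int16_t hpos;
    int16_t vpos;
} DemoSprite;

static DemoSprite  demo_scb[DEMO_COUNT];
static DemoSprite *demo_ptr;  /* global pointer, reused by every loop */
static uint8_t     demo_i;    /* global 8-bit loop counter */

/* Indexed form: each field access recomputes &demo_scb[demo_i]. */
void demo_move_indexed(int16_t dx, int16_t dy)
{
    for (demo_i = 0; demo_i < DEMO_COUNT; ++demo_i) {
        demo_scb[demo_i].hpos = (int16_t)(demo_scb[demo_i].hpos + dx);
        demo_scb[demo_i].vpos = (int16_t)(demo_scb[demo_i].vpos + dy);
    }
}

/* Cached form: compute the element address once per iteration,
   then reach every field through the pointer. */
void demo_move_cached(int16_t dx, int16_t dy)
{
    for (demo_i = 0; demo_i < DEMO_COUNT; ++demo_i) {
        demo_ptr = &demo_scb[demo_i];
        demo_ptr->hpos = (int16_t)(demo_ptr->hpos + dx);
        demo_ptr->vpos = (int16_t)(demo_ptr->vpos + dy);
    }
}
```

Sharing one global counter is only safe while loops do not nest; a nested loop needs its own counter.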
42bs (posted May 15):

Did you try making the loop variable (i) global as well?
KanedaFr (posted May 15):

13 minutes ago, 42bs said:
"Did you try making the loop variable (i) global as well?"

I'm currently refactoring the code to use a ZP loop variable, but I must be careful since I have some nested loops (yeah, I know, it's bad... but it's the best way to detect collisions in this case).
42bs (posted May 15):

7 minutes ago, KanedaFr said:
"detect collisions"

What about Suzy's collision buffer? Wouldn't it help?
KanedaFr (posted May 15):

1 minute ago, 42bs said:
"What about Suzy's collision buffer? Wouldn't it help?"

Not in this case: it's not real collision detection, but a way to detect which sprites are nearby. It's a puzzle bubble game (dedicated post in progress), so I need to detect whether several sprites of the same color are next to each other.