VladR (Author), posted July 2, 2019:

Damn, that's really nice and high-level. load/init/play - doesn't get any more high-level than that. I'm sold.

Cyprian, posted July 2, 2019:

1 hour ago, VladR said:
    Disregarding the smelly Jaguar Fumes brought here by The Regime Collaborants, I got some first interesting benchmark data.
    Day 6:
    - wrote both a Suzy-based and a CPU-based scanline drawing routine in assembler. Set the scanline length to 160 pixels.
    - Plugged it into yesterday's timer (64 us, set to repeat 128 times).
    - Somehow, in the same timeframe (128 * 64 us), the CPU managed to draw the 160-pixel scanline 336x while Suzy only 259x, so the CPU is 30% faster.
    So, Suzy is actually slower on my emulator. And it's not a 3-pixel scanline, it's the full 160 pixels! That's the best case Suzy can hope for, though mostly it'll be a much, much shorter scanline.
    Obviously, I don't believe that could possibly be the case with the real HW, but it's pretty funny nonetheless.
    I gotta clean the code up, perhaps provide some text messages too, and make sure I'm using the exact same timer data before I provide a download.

Handy isn't cycle exact yet. If you wish, I can run it on my Lynx 2.

42bs, posted July 2, 2019:

1 hour ago, VladR said:
    - Somehow, in the same timeframe (128 * 64 us), the CPU managed to draw the 160-pixel scanline 336x while Suzy only 259x, so the CPU is 30% faster.
    So, Suzy is actually slower on my emulator. And it's not a 3-pixel scanline, it's the full 160 pixels! That's the best case Suzy can hope for, though mostly it'll be a much, much shorter scanline.

Modified demo0006.asm: On Handy, copying a PIC to the screen takes 50 ms; using Suzy to draw 102 sprites (including the sine wave offset calculation) takes 33 ms. So Suzy is 50% faster.

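For reference, both percentages quoted above follow from the reported counts and timings. A minimal Python sketch of that arithmetic, assuming the numbers are read exactly as posted (the function and variable names are illustrative and do not come from the thread):

```python
# Sanity check of the speed-up figures quoted in the posts above.
# Only the numbers come from the thread; all names are illustrative.

TIMER_TICK_US = 64      # timer period reported by VladR
TIMER_REPEATS = 128     # number of timer periods in the benchmark window


def percent_faster(fast: float, slow: float) -> float:
    """Return how much faster, in percent, the rate `fast` is than `slow`."""
    return (fast / slow - 1.0) * 100.0


# Length of the benchmark window in microseconds.
window_us = TIMER_TICK_US * TIMER_REPEATS

# VladR's emulator run: scanlines drawn inside the same window.
cpu_lines, suzy_lines = 336, 259
cpu_vs_suzy = percent_faster(cpu_lines, suzy_lines)

# 42bs's Handy run: elapsed time per job, so the rate is 1 / time.
cpu_copy_ms, suzy_sprites_ms = 50, 33
suzy_vs_cpu = percent_faster(1 / suzy_sprites_ms, 1 / cpu_copy_ms)

print(f"window: {window_us} us")
print(f"CPU vs Suzy (VladR): {cpu_vs_suzy:.1f}% faster")
print(f"Suzy vs CPU (42bs): {suzy_vs_cpu:.1f}% faster")
```

Note that the two comparisons are not the same kind of measurement: the first counts identical scanlines drawn in a fixed window, while the second times two different jobs (a full picture copy versus 102 sprites), so the percentages are not directly comparable.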
VladR (Author), posted July 2, 2019:

I cleaned it up and am now running both codepaths separately, without recompiling (like before), so you can see both results at the same time.
I attached it to this post. Not sure how it works with the new forum, as it's the first time I attach anything other than a pic. But it does show as attached on my end.

Attachment: lynxproj.lnx

First row: the first number (the number of times the scanline was drawn) is CPU, the second is Suzy.
The next 3 numbers are just the three 8-bit counters, for each of them.
Please run it both on your emulator and on your Lynx. I want to figure out the reason for this anomaly. I suspect it's the palette within the sprite, but that was the first version of the sprite I got working.
Thanks!

VladR (Author), posted July 2, 2019:

54 minutes ago, Cyprian_K said:
    Handy isn't cycle exact yet. If you wish, I can run it on my Lynx 2.

Yeah, I understand that. But it still shouldn't have been slower than the CPU. That just makes no sense to me at this point.
If you can, please download the benchmark and post a screenshot with the numbers from your Lynx.
Thanks!

VladR (Author), posted July 2, 2019:

33 minutes ago, 42bs said:
    Modified demo0006.asm: On Handy, copying a PIC to the screen takes 50 ms; using Suzy to draw 102 sprites (including the sine wave offset calculation) takes 33 ms. So Suzy is 50% faster.

Interesting. So, on your end, the emulator is faster for HW than for SW. I presume the load is identical (i.e. you have 2 codepaths - HW and SW)?

karri, posted July 2, 2019:

On Mednafen I got this:

Now that you mention the penpal in the sprite: I tried to use a sprite without a penpal but never got it working. There could be something wrong in the tgi library.
You might want to rewrite this part of the tgi library. It is fairly simple to grab the lynx-160-102-16.s file from the cc65 sources, change the segment from "JUMPTABLE" to "CODE", add a label _lynx_160_102_16: in front of the jumptable, export that label and add the file to your Makefile. It will then replace the stock driver.
Or you could just call your own asm draw routine instead of tgi_sprite(). As you can see from the code, the CPU polls SPRSYS instead of just moving on to do something useful.

    draw_sprite:
            ; Draw it in render buffer
            sta     SCBNEXTL        ; address of the first sprite control block
            stx     SCBNEXTH
            lda     DRAWPAGEL
            ldx     DRAWPAGEH
            sta     VIDBASL         ; Suzy renders into the draw page
            stx     VIDBASH
            lda     #1
            sta     SPRGO           ; start the sprite engine
            stz     SDONEACK
    @L0:    stz     CPUSLEEP        ; sleep until Suzy is done or an interrupt arrives
            lda     SPRSYS
            lsr
            bcs     @L0             ; bit 0 still set: Suzy is busy, sleep again
            stz     SDONEACK
            lda     #TGI_ERR_OK
            sta     ERROR
            rts

42bs, posted July 2, 2019:

23 minutes ago, VladR said:
    I cleaned it up and am now running both codepaths separately, without recompiling (like before), so you can see both results at the same time.
    I attached it to this post. Not sure how it works with the new forum, as it's the first time I attach anything other than a pic. But it does show as attached on my end.
    Attachment: lynxproj.lnx
    First row: the first number (the number of times the scanline was drawn) is CPU, the second is Suzy.
    The next 3 numbers are just the three 8-bit counters, for each of them.
    Please run it both on your emulator and on your Lynx. I want to figure out the reason for this anomaly. I suspect it's the palette within the sprite, but that was the first version of the sprite I got working.
    Thanks!

Are you drawing the font line by line?! It seems like it.

VladR (Author), posted July 2, 2019:

13 minutes ago, karri said:
    On Mednafen I got this:
    Now that you mention the penpal in the sprite: I tried to use a sprite without a penpal but never got it working. There could be something wrong in the tgi library.

Wow. That's almost exactly the same! How is that possible? You must have the same CPU as I do!
Well, I couldn't possibly claim that I understand all of Suzy's registers. After half an hour I gave up on the sprite without a penpalette and instead got the pen drawing working, so that's what I have now.
But now that I am disassociated from TGI, by running asm code from a separate asm file, I can go and try to get the sprite without a penpalette working again.
I am presuming the HW must waste some bandwidth on reading the palette, which obviously must add up quickly. But it's the first working version, so it's good enough for now...

VladR (Author), posted July 2, 2019:

3 minutes ago, 42bs said:
    Are you drawing the font line by line?! It seems like it.

Yeah, why? It's for debug purposes only, so speed is irrelevant. Besides, it was written in C, in like, 10 minutes...

42bs, posted July 2, 2019:

1 minute ago, VladR said:
    Yeah, why? It's for debug purposes only, so speed is irrelevant. Besides, it was written in C, in like, 10 minutes...

Sure, but even for debugging, why not draw a character as one sprite?

VladR (Author), posted July 2, 2019:

2 minutes ago, 42bs said:
    Sure, but even for debugging, why not draw a character as one sprite?

Oh, I got fuc*ed real nasty by Jaguar on that one. Burnt a loooot of time on it. Non-replicable bugs are the worst. You chase a different lead because the Blitter is a mess, only to find out later that it was the Blitter after all.
Then again, on Jaguar, I run GPU RISC code in parallel with DSP RISC code in parallel with 68000 code in parallel with Blitter drawing, so...
So, to be 100% safe, I only trust the CPU. I prefer my sanity intact.

Cyprian, posted July 2, 2019:

47 minutes ago, VladR said:
    Yeah, I understand that. But it still shouldn't have been slower than the CPU. That just makes no sense to me at this point.
    If you can, please download the benchmark and post a screenshot with the numbers from your Lynx.
    Thanks!

OK, I'll do that this evening.

42bs, posted July 2, 2019 (edited July 2, 2019 by 42bs: fix address):

42 minutes ago, VladR said:
    Wow. That's almost exactly the same! How is that possible? You must have the same CPU as I do!
    Well, I couldn't possibly claim that I understand all of Suzy's registers. After half an hour I gave up on the sprite without a penpalette and instead got the pen drawing working, so that's what I have now.
    But now that I am disassociated from TGI, by running asm code from a separate asm file, I can go and try to get the sprite without a penpalette working again.
    I am presuming the HW must waste some bandwidth on reading the palette, which obviously must add up quickly. But it's the first working version, so it's good enough for now...

Handy should give the same results on any PC. Though it is not 100% cycle accurate, the frame time is.
Anyway, one reason is probably the palette. For a single-colored line, a two-pen palette would be sufficient.
BTW: You send the CPU to sleep twice in the sprite drawing routine. Changing the first "stz $fd91" into NOPs results in 517 lines! Compared to 257!

VladR (Author), posted July 2, 2019:

Wow. That's interesting behavior. Oh, it probably flip-flopped? I am utterly confused about how the flag behaves. It would help if I had real HW...
The first stz is outside of the loop, the second is inside the waiting loop.
So, am I actually drawing double the amount of scanlines on Suzy now?

42bs, posted July 2, 2019:

1 minute ago, VladR said:
    Wow. That's interesting behavior. Oh, it probably flip-flopped? I am utterly confused about how the flag behaves. It would help if I had real HW...
    The first stz is outside of the loop, the second is inside the waiting loop.
    So, am I actually drawing double the amount of scanlines on Suzy now?

Nope. The first "stz $fd91" sends the CPU to sleep. It wakes up on the next interrupt, or when Suzy is done. The second one sends the CPU to sleep again until the next interrupt. Suzy draws only once.
The loop is there because sleep is broken: http://www.monlynx.de/lynx/lynx10.html#_18

VladR (Author), posted July 2, 2019:

Damn, that's probably the most hilarious fuc*-up I ever pulled. It's kinda evil, because it's just drawing the same line over and over, so it's impossible to notice otherwise.
Really, really, thanks. I will test it tomorrow, going to sleep now...

VladR (Author), posted July 2, 2019:

1 hour ago, karri said:
    On Mednafen I got this:
    Now that you mention the penpal in the sprite: I tried to use a sprite without a penpal but never got it working. There could be something wrong in the tgi library.
    You might want to rewrite this part of the tgi library. It is fairly simple to grab the lynx-160-102-16.s file from the cc65 sources, change the segment from "JUMPTABLE" to "CODE", add a label _lynx_160_102_16: in front of the jumptable, export that label and add the file to your Makefile. It will then replace the stock driver.
    Or you could just call your own asm draw routine instead of tgi_sprite(). As you can see from the code, the CPU polls SPRSYS instead of just moving on to do something useful.

        draw_sprite:
                ; Draw it in render buffer
                sta     SCBNEXTL        ; address of the first sprite control block
                stx     SCBNEXTH
                lda     DRAWPAGEL
                ldx     DRAWPAGEH
                sta     VIDBASL         ; Suzy renders into the draw page
                stx     VIDBASH
                lda     #1
                sta     SPRGO           ; start the sprite engine
                stz     SDONEACK
        @L0:    stz     CPUSLEEP        ; sleep until Suzy is done or an interrupt arrives
                lda     SPRSYS
                lsr
                bcs     @L0             ; bit 0 still set: Suzy is busy, sleep again
                stz     SDONEACK
                lda     #TGI_ERR_OK
                sta     ERROR
                rts

There's source for the tgi lib? Damn, that could be gold. All the working code!
That waiting code looks exactly like the one I found while searching the forums for a hint on why my asm sprite code didn't work. I only had the stz, not the full loop. Of course, when I copy-pasted the loop, the original stz stayed, creating a hilarious evil bug.

Fadest, posted July 2, 2019 (edited July 2, 2019 by Fadest):

1 hour ago, VladR said:
    I cleaned it up and am now running both codepaths separately, without recompiling (like before), so you can see both results at the same time.
    I attached it to this post. Not sure how it works with the new forum, as it's the first time I attach anything other than a pic. But it does show as attached on my end.
    Attachment: lynxproj.lnx

For people who would like to try it on a real Lynx: this is not a real ROM but a single executable. You have to create a .lyx or .lnx yourself in order to flash it, or use the .o option on Bernd's Flashcard.
This is what I did, and here is the result on a Lynx 2: 37 228 37 228 0 0 0 0
I'm pretty sure the first number is 37, but because it is written at position 0,0, I cannot be 100% sure.
Oh, and the line is not displayed above the numbers.

Cyprian, posted July 2, 2019 (edited July 2, 2019 by Cyprian_K):

@Fadest thanks for that hint. Fortunately there was no need to modify the file; it works fine with Saint's Lynx SD card.
Interestingly, my figures are a bit different from Fadest's: 229 vs 228.
Also, on my Lynx II there is no white horizontal line.

VladR (Author), posted July 2, 2019:

11 hours ago, 42bs said:
    Handy should give the same results on any PC. Though it is not 100% cycle accurate, the frame time is.
    Anyway, one reason is probably the palette. For a single-colored line, a two-pen palette would be sufficient.
    BTW: You send the CPU to sleep twice in the sprite drawing routine. Changing the first "stz $fd91" into NOPs results in 517 lines! Compared to 257!

Sure, the frame time will be the same, but the number of instructions executed during that 1/60 s by the PC's CPU isn't. But clearly, the Lynx emulator coders are smart people and, unlike on Jaguar, don't just let the emulation run at the full speed of the local CPU, hence the emulator results are actually comparable among different host machines. Which feels incredible, btw.
I can confirm that removing the first stz did, indeed, double the performance (on the emulator) to 517!

10 hours ago, Fadest said:
    For people who would like to try it on a real Lynx: this is not a real ROM but a single executable. You have to create a .lyx or .lnx yourself in order to flash it, or use the .o option on Bernd's Flashcard.
    This is what I did, and here is the result on a Lynx 2: 37 228 37 228 0 0 0 0
    I'm pretty sure the first number is 37, but because it is written at position 0,0, I cannot be 100% sure.
    Oh, and the line is not displayed above the numbers.

OK, two things:
1. I find it hard to believe it would be just 37 (and not 137, or 237, or 337). Suzy is only 16 MHz, not 40 MHz. I did consider (for 10 seconds) the issue of pixel 0,0 being slightly off-screen, but this is my confirmation, so I'll go fix that. On the other hand, it appears that the two rows of 0s are visible in full, so perhaps it is, indeed, just 37... But, because of my hilarious stz bug, the Suzy number should be ~double, i.e. ~458, and that's more than an order of magnitude faster, which really seems unrealistic. So there must be something else going on.
2. What do I do to create the *.lx ? I only have object files in the intermediate directory.

3 hours ago, Cyprian_K said:
    @Fadest thanks for that hint. Fortunately there was no need to modify the file; it works fine with Saint's Lynx SD card.
    Interestingly, my figures are a bit different from Fadest's: 229 vs 228.
    Also, on my Lynx II there is no white horizontal line.

Thanks for the photo!
If you guys can answer my questions before I create another build, it would be great! I'll make sure to display the numbers in the safe region, and this time create a slightly more useful benchmark: drawing a filled quad.

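The "more than an order of magnitude" estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the first number really is 37, and it assumes that removing the extra sleep would double the Suzy count on real hardware the way it did on the emulator, which nobody in the thread has measured yet. All names are illustrative.

```python
# Rough arithmetic on the Lynx II readings; only the numbers come from
# the thread, and the hardware doubling is an untested assumption.

WINDOW_US = 128 * 64        # benchmark window: 128 timer periods of 64 us

cpu_lines_hw = 37           # CPU count read on real hardware (uncertain digit at 0,0)
suzy_lines_hw = 229         # Cyprian's reading; Fadest reported 228

# Emulator result before and after removing the extra CPU sleep.
emu_before, emu_after = 257, 517
sleep_fix_gain = emu_after / emu_before

# Assumption: the same fix roughly doubles the hardware figure as well.
suzy_lines_hw_fixed = suzy_lines_hw * 2
suzy_over_cpu = suzy_lines_hw_fixed / cpu_lines_hw

# Time the CPU path would need per 160-pixel scanline if 37 is correct.
cpu_us_per_line = WINDOW_US / cpu_lines_hw

print(f"sleep fix gain on emulator: {sleep_fix_gain:.2f}x")
print(f"projected Suzy lines on hardware: {suzy_lines_hw_fixed}")
print(f"projected Suzy/CPU ratio: {suzy_over_cpu:.1f}x")
print(f"CPU time per scanline: {cpu_us_per_line:.0f} us")
```

Under those assumptions the projected ratio lands a little above 12x, which is the gap being questioned; a corrected build run on real hardware is still needed to confirm either number.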
Cyprian, posted July 2, 2019:

14 minutes ago, VladR said:
    OK, two things:
    1. I find it hard to believe it would be just 37 (and not 137, or 237, or 337). Suzy is only 16 MHz, not 40 MHz. I did consider (for 10 seconds) the issue of pixel 0,0 being slightly off-screen, but this is my confirmation, so I'll go fix that. On the other hand, it appears that the two rows of 0s are visible in full, so perhaps it is, indeed, just 37... But, because of my hilarious stz bug, the Suzy number should be ~double, i.e. ~458, and that's more than an order of magnitude faster, which really seems unrealistic. So there must be something else going on.
    2. What do I do to create the *.lx ? I only have object files in the intermediate directory.
    Thanks for the photo!

1) I have no idea, I didn't check the CPU copy code. Is it LDA $XX,Y / STA $XX,Y?
2) Your file was fine for me; Saint's Lynx SD card easily accepts that format.
What about the horizontal line? It is visible under Handy, but not on the real hardware.

VladR (Author), posted July 2, 2019:

1 minute ago, Cyprian_K said:
    1) I have no idea, I didn't check the CPU copy code. Is it LDA $XX,Y / STA $XX,Y?
    2) Your file was fine for me; Saint's Lynx SD card easily accepts that format.
    What about the horizontal line? It is visible under Handy, but not on the real hardware.

The CPU drawing code is like this:

    sta (160),Y
    iny
    sta (160),Y
    iny
    sta (160),Y
    iny
    sta (160),Y
    iny
    sta (160),Y
    iny
    sta (160),Y
    iny
    sta (160),Y
    iny
    sta (160),Y
    iny
    sta (160),Y
    iny
    sta (160),Y
    iny
    sta (160),Y
    iny
    sta (160),Y
    iny
    sta (160),Y
    iny
    sta (160),Y
    iny
    sta (160),Y
    iny
    sta (160),Y
    iny

I'm gonna post the new code once it's done.
The missing line is disturbing, but for the new build I will have the top half of the screen dedicated to the CPU drawing and the bottom half to Suzy. The numbers should be somewhere in the middle. That approach should be solid, in theory...

VladR (Author), posted July 2, 2019:

Actually, I shouldn't draw half of the screen on the CPU, because if the value of 37 is indeed correct, then the timer comparison becomes invalid, as the CPU's slice would be longer than Suzy's. The results would then have to be indexed (to be comparable), and there's no need to confuse them further.
So it will be safer if it's just a couple of scanlines, not half of the screen.
Of course, if I had the Lynx, that'd take me about 10 minutes to figure out by deploying two builds...

bhall408, posted July 2, 2019:

13 hours ago, VladR said:
    Please run it both on your emulator and on your Lynx. I want to figure out the reason for this anomaly. I suspect it's the palette within the sprite, but that was the first version of the sprite I got working.
    Thanks!

Not surprisingly, I get the same values as above for Handy on Android. With OpenEmu on Mac (which is Mednafen based), I saw the values others have seen for Mednafen.