
Lynx 3D Experimenting


VladR


1 hour ago, VladR said:

Disregarding the smelly Jaguar Fumes brought here by The Regime Collaborants, I got some first interesting benchmark data.

 

Day 6:

- Wrote both a Suzy-based and a CPU-based scanline drawing routine in assembler. Set the scanline length to 160 pixels.

- Plugged it into yesterday's timer (64 us, set to repeat 128 times).

- Somehow, in the same timeframe (128 * 64 us), the CPU managed to draw the 160-pixel scanline 336x while Suzy only managed 259x, so the CPU is 30% faster :)
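
As a quick sanity check of the figures in this list, the measurement window and the CPU/Suzy ratio can be worked out directly. This is only a sketch: the variable names are made up for illustration, and the numbers are simply the ones quoted in the post.

```python
# Numbers quoted in the post; the names are illustrative only.
TIMER_PERIOD_US = 64   # one timer period, in microseconds
TIMER_REPEATS = 128    # number of periods the benchmark runs for
cpu_lines = 336        # scanlines drawn by the CPU path
suzy_lines = 259       # scanlines drawn by the Suzy path

window_us = TIMER_PERIOD_US * TIMER_REPEATS  # total measurement window
speedup = cpu_lines / suzy_lines             # CPU throughput relative to Suzy

print(f"window: {window_us} us")
print(f"CPU vs Suzy: {speedup:.2f}x ({(speedup - 1) * 100:.0f}% faster)")
```

So the "30% faster" claim matches the counters, over a window of roughly 8 ms.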

 

So, Suzy is actually slower on my emulator :lol: And it's not a 3-pixel scanline, it's full 160 pixels! That's the best case Suzy can hope for, though mostly it'll be much much shorter scanline.

Obviously, I don't believe that could possibly be the case with the real HW, but it's pretty funny nonetheless.

I gotta clean the code up, perhaps provide some text messages too and make sure I'm using exact same timer data before I provide a download.

Handy isn't cycle-exact yet.

If you wish, I can run it on my Lynx 2.

 

 


1 hour ago, VladR said:

- Somehow, in the same timeframe (128 * 64 us), the CPU managed to draw the 160-pixel scanline 336x while Suzy only managed 259x, so the CPU is 30% faster :)

 

So, Suzy is actually slower on my emulator :lol: And it's not a 3-pixel scanline, it's full 160 pixels! That's the best case Suzy can hope for, though mostly it'll be much much shorter scanline.

Modified demo0006.asm: On Handy, copying a PIC to the screen takes 50 ms, while using Suzy to draw 102 sprites (including the sine wave offset calculation) takes 33 ms. So Suzy is 50% faster.
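
For reference, "50% faster" here is a throughput figure. A small sketch with the two timings quoted above (variable names are illustrative) shows both ways of reading it:

```python
# Timings quoted in the post above, in milliseconds.
pic_copy_ms = 50.0   # CPU copying a PIC to the screen
suzy_ms = 33.0       # Suzy drawing 102 sprites

# Throughput gain: how much more work Suzy gets done per unit of time.
throughput_gain = pic_copy_ms / suzy_ms - 1
# Time saved: how much shorter the Suzy run is.
time_saved = 1 - suzy_ms / pic_copy_ms

print(f"throughput gain: {throughput_gain * 100:.0f}%")
print(f"time saved:      {time_saved * 100:.0f}%")
```

In other words, about 50% more throughput corresponds to about a third less time.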


I cleaned it up, and now am running both codepaths, separately without recompiling (like before), so you can see both results at the same time.

 

I attached it to this post. Not sure how it works with the new forum, as it's the first time I've attached anything other than a pic. But it does show as attached on my end.

lynxproj.lnx

 

First Row: First Number (number of times scanline was drawn) is CPU, Second is Suzy.

Next 3 numbers are just the three 8-bit counters, for each of them.

Lynx09_CPUvsSuzy.thumb.GIF.f720e479a258b61a5caa26824098787e.GIF

 

Please run it both on your emulator and Lynx. I want to figure out the reason for this anomaly. I suspect it's the palette within the sprite, but that was the first version of sprite I got working.

 

Thanks!

 


54 minutes ago, Cyprian_K said:

Handy isn't cycle-exact yet.

If you wish, I can run it on my Lynx 2.

Yeah, I understand that. But, it still shouldn't have been slower than CPU. That just makes no sense to me at this point.

 

If you can, please download the benchmark and post the screenshot with your numbers from your Lynx. Thanks!


33 minutes ago, 42bs said:

Modified demo0006.asm: On Handy, copying a PIC to the screen takes 50 ms, while using Suzy to draw 102 sprites (including the sine wave offset calculation) takes 33 ms. So Suzy is 50% faster.

Interesting. So, on your end, the emulator is faster for HW than for SW. I presume the load is identical (e.g. you have 2 codepaths, HW and SW)?


On Mednafen I got this:

2070435149_Screenshotfrom2019-07-0212-53-31.png.45ae5d29c95bfb68b2c26c61fe1bdcab.png

 

Now that you mention the penpal in the sprite: I tried to use a sprite without a penpal but never got it working. There could be something wrong in the tgi library. You might want to rewrite this part of the tgi library. It is fairly simple: grab the lynx-160-102-16.s file from the cc65 sources, change the segment from "JUMPTABLE" to "CODE", add a label _lynx_160_102_16: in front of the jumptable, export that label and add the file to your Makefile. It will then replace the stock driver. Or you could just call your own asm draw routine instead of tgi_sprite(). As you can see from the code, the CPU polls SPRSYS instead of just moving on to do something useful.

 

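draw_sprite:                    ; Draw it in render buffer
        sta     SCBNEXTL        ; A/X = address of the first SCB
        stx     SCBNEXTH
        lda     DRAWPAGEL       ; point Suzy at the current draw page
        ldx     DRAWPAGEH
        sta     VIDBASL
        stx     VIDBASH
        lda     #1
        sta     SPRGO           ; start the sprite engine
        stz     SDONEACK
@L0:    stz     CPUSLEEP        ; put the CPU to sleep while Suzy draws
        lda     SPRSYS
        lsr                     ; bit 0 -> carry (set while Suzy is still busy)
        bcs     @L0             ; woke up too early: sleep again
        stz     SDONEACK
        lda     #TGI_ERR_OK
        sta     ERROR
        rts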


 


23 minutes ago, VladR said:

I cleaned it up, and now am running both codepaths, separately without recompiling (like before), so you can see both results at the same time.

 

I attached it to this post. Not sure how it works with the new forum, as it's the first time I've attached anything other than a pic. But it does show as attached on my end.

lynxproj.lnx 27.98 kB · 1 download

 

First Row: First Number (number of times scanline was drawn) is CPU, Second is Suzy.

Next 3 numbers are just the three 8-bit counters, for each of them.

 

Please run it both on your emulator and Lynx. I want to figure out the reason for this anomaly. I suspect it's the palette within the sprite, but that was the first version of sprite I got working.

 

Thanks!

 

Are you drawing the font line by line?! It seems like it.

 


13 minutes ago, karri said:

On Mednafen I got this:

2070435149_Screenshotfrom2019-07-0212-53-31.png.45ae5d29c95bfb68b2c26c61fe1bdcab.png

 

Now that you mention the penpal in the sprite: I tried to use a sprite without a penpal but never got it working. There could be something wrong in the tgi library.

Wow. That's almost exactly the same! How is that possible? You must have the same CPU as I do!

 

 

Well, I couldn't possibly claim that I understand all Suzy's registers.

 

But after half an hour I gave up and instead got the pen drawing working, so that's what I have now.

 

But now that I am disassociated from TGI, running asm code from a separate asm file, I can go and try to get the sprite without a pen palette working again.

 

I am presuming the HW must waste some bandwidth on reading the palette, which obviously must add up quickly. But it's the first working version, so it's good enough for now...


1 minute ago, VladR said:

Yeah, why? It's for debug purposes only, so speed is irrelevant. Besides it was written in C, in like, 10 minutes...

Sure, but even for debugging, why not draw a character as one sprite?


2 minutes ago, 42bs said:

Sure, but even for debugging, why not draw a character as one sprite?

Oh, I got fuc*ed real nasty by Jaguar on that one :lol:

 

Burnt a loooot of time on that one. Non-replicable bugs are worst. You then chase a different lead because Blitter is a mess. Only to find out later that it was Blitter.

 

Then again, on Jaguar, I run:

GPU Risc code in parallel with

DSP Risc code in parallel with

68000 code in parallel with

Blitter drawing, so...

 

So, to be 100% safe, I only trust CPU. I prefer my sanity intact :)

 


47 minutes ago, VladR said:

Yeah, I understand that. But, it still shouldn't have been slower than CPU. That just makes no sense to me at this point.

 

If you can, please download the benchmark and post the screenshot with your numbers from your Lynx. Thanks!

OK, I'll do that this evening.


42 minutes ago, VladR said:

Wow. That's almost exactly the same! How is that possible? You must have the same CPU as I do!

 

 

Well, I couldn't possibly claim that I understand all Suzy's registers.

 

But after half an hour I gave up and instead got the pen drawing working, so that's what I have now.

 

But now that I am disassociated from TGI, running asm code from a separate asm file, I can go and try to get the sprite without a pen palette working again.

 

I am presuming the HW must waste some bandwidth on reading the palette, which obviously must add up quickly. But it's the first working version, so it's good enough for now...

Handy should give the same results on any PC. Though it is not 100% cycle-accurate, the frame time is.

 

Anyway, the reason is probably the palette. For a single-colored line, a two-pen palette would be sufficient.

BTW: You send the CPU to sleep twice in the sprite drawing routine. Changing the first "stz $fd91" into NOPs results in 517 lines, compared to 257!

 

Edited by 42bs
Fix address

Wow. That's interesting behavior :) Oh, it probably flip flopped? I am utterly confused how the flag behaves. Would help if I had real HW...

 

The first stz is outside of loop, second is inside the waiting loop.

 

So, am I actually drawing double the amount of scanlines now on Suzy?


1 minute ago, VladR said:

Wow. That's interesting behavior :) Oh, it probably flip flopped? I am utterly confused how the flag behaves. Would help if I had real HW...

 

The first stz is outside of loop, second is inside the waiting loop.

 

So, am I actually drawing double the amount of scanlines now on Suzy?

Nope. The first "stz $fd91" sends the CPU to sleep. It wakes up on the next interrupt, or is woken by Suzy when it is done. The second one sends the CPU to sleep again until the next interrupt. Suzy draws only once.
The loop is there because sleep is broken: http://www.monlynx.de/lynx/lynx10.html#_18

 


Damn, that's probably the most hilarious fuc*-up I ever pulled :lol:

 

It's kinda evil, because it's just drawing same line over and over, so impossible to notice otherwise.

 

Really, really thanks. I will test it tomorrow, going to sleep now...


1 hour ago, karri said:

On Mednafen I got this:

2070435149_Screenshotfrom2019-07-0212-53-31.png.45ae5d29c95bfb68b2c26c61fe1bdcab.png

 

Now that you mention the penpal in the sprite: I tried to use a sprite without a penpal but never got it working. There could be something wrong in the tgi library. You might want to rewrite this part of the tgi library. It is fairly simple: grab the lynx-160-102-16.s file from the cc65 sources, change the segment from "JUMPTABLE" to "CODE", add a label _lynx_160_102_16: in front of the jumptable, export that label and add the file to your Makefile. It will then replace the stock driver. Or you could just call your own asm draw routine instead of tgi_sprite(). As you can see from the code, the CPU polls SPRSYS instead of just moving on to do something useful.

 


draw_sprite:                    ; Draw it in render buffer
        sta     SCBNEXTL
        stx     SCBNEXTH
        lda     DRAWPAGEL
        ldx     DRAWPAGEH
        sta     VIDBASL
        stx     VIDBASH
        lda     #1
        sta     SPRGO
        stz     SDONEACK
@L0:    stz     CPUSLEEP
        lda     SPRSYS
        lsr
        bcs     @L0
        stz     SDONEACK
        lda     #TGI_ERR_OK
        sta     ERROR
        rts


 

There's a source for tgi lib? Damn, that could be gold. All the working code!

 

That waiting code looks exactly like the one I found while searching the forums for a hint as to why my asm sprite code doesn't work.

 

 

I only had the stz , not full loop.

 

Of course, when I copy-pasted the loop, the original stz stayed, creating a hilarious evil bug

:lol:

 


1 hour ago, VladR said:

I cleaned it up, and now am running both codepaths, separately without recompiling (like before), so you can see both results at the same time.

 

I attached it to this post. Not sure how it works with the new forum, as it's the first time I've attached anything other than a pic. But it does show as attached on my end.

lynxproj.lnx 27.98 kB · 3 downloads

 

 

For people who would like to try this on a real Lynx: this is not a real ROM but a single executable.

You have to create a .lyx or .lnx yourself in order to flash it, or use the .o option on Bernd's Flashcard.

This is what I did, and here is the result on a Lynx 2:

37		228

37		228
0		0
0		0

I'm pretty sure the first number is 37, but because it is written at position 0,0, I cannot be 100% sure.

 

Oh, and the line is not displayed above the numbers.

Edited by Fadest

 

 

20190702_205614.thumb.jpg.190ebce11ce83351bc1603108d242e93.jpg

 

 

 

@Fadest thx for that hint; fortunately there was no need to modify the file, it works fine with Saint's Lynx SD card.

 

Interestingly, my figures are a bit different from Fadest's: 229 vs 228.

 

Also, on my Lynx II there is no white horizontal line.

Edited by Cyprian_K

11 hours ago, 42bs said:

Handy should give the same results on any PC. Though it is not 100% cycle-accurate, the frame time is.

 

Anyway, the reason is probably the palette. For a single-colored line, a two-pen palette would be sufficient.

BTW: You send the CPU to sleep twice in the sprite drawing routine. Changing the first "stz $fd91" into NOPs results in 517 lines, compared to 257!

 

Sure, the frame time will be the same, but the number of instructions executed during that 1/60 s by the PC's CPU isn't.

 

But, clearly, Lynx's emulator coders are smart people and unlike on Jaguar, don't just let the emulation run at full speed of the local CPU, hence the emulator results are actually comparable among different host machines :)

Which feels incredible, btw.

 

I can confirm that removing the first stz did, indeed, double the performance (on the emulator), to 517!
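
Putting the emulator figures from this thread side by side gives a feel for what the fix changes. This is just arithmetic on the quoted counters (257 before the fix as reported by 42bs, 517 after, 336 for the CPU path); the variable names are illustrative.

```python
# Emulator counters quoted in the thread (scanlines drawn in the same window).
suzy_before_fix = 257   # with the extra CPUSLEEP write, as reported by 42bs
suzy_after_fix = 517    # with the first "stz $fd91" removed
cpu_lines = 336         # CPU drawing path

fix_gain = suzy_after_fix / suzy_before_fix   # effect of removing the extra sleep
suzy_vs_cpu = suzy_after_fix / cpu_lines      # Suzy relative to the CPU after the fix

print(f"fix gain:    {fix_gain:.2f}x")
print(f"Suzy vs CPU: {suzy_vs_cpu:.2f}x")
```

On the emulator, then, the fixed Suzy path comes out roughly one and a half times as fast as the CPU path, rather than slower.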

 

 

10 hours ago, Fadest said:

For people who would like to try this on a real Lynx: this is not a real ROM but a single executable.

You have to create a .lyx or .lnx yourself in order to flash it, or use the .o option on Bernd's Flashcard.

This is what I did, and here is the result on a Lynx 2:


37		228

37		228
0		0
0		0

I'm pretty sure the first number is 37, but because it is written at position 0,0, I cannot be 100% sure.

 

Oh, and the line is not displayed above the numbers.

OK, two things:

1. I find it hard to believe it would be just 37 (and not 137, 237 or 337). Suzy is only 16 MHz, not 40 MHz :) I did consider (for 10 seconds) the issue of pixel 0,0 being slightly off-screen, and this confirms it, so I'll go fix that. On the other hand, it appears that the two rows of 0s are visible in full, so perhaps it is, indeed, just 37... But because of my hilarious stz bug, the Suzy number should be about double, e.g. ~458, and that's more than an order of magnitude faster, which really seems unrealistic. So, there must be something else going on.

2. What do I do to create the *.lx? I only have object files in the intermediate directory.
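
The "order of magnitude" estimate in point 1 can be reproduced from the real-hardware counters posted in this thread. The sketch below takes the CPU count of 37 at face value and applies the same doubling that was observed on the emulator; whether that doubling carries over to real hardware is an assumption, not something measured here.

```python
# Real-hardware counters posted in the thread.
cpu_lines_hw = 37              # CPU path (first digit possibly clipped at 0,0)
suzy_lines_hw = (228, 229)     # Suzy path, as reported by Fadest and Cyprian_K

# Assumption: removing the extra sleep doubles the Suzy count, as on the emulator.
for suzy in suzy_lines_hw:
    doubled = suzy * 2
    ratio = doubled / cpu_lines_hw
    print(f"Suzy {suzy} -> ~{doubled} after the fix, {ratio:.1f}x the CPU count")

results = {s: (s * 2, s * 2 / cpu_lines_hw) for s in suzy_lines_hw}
```

Either way the projected ratio is a little over 12x, which is where the "more than an order of magnitude" comes from.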

3 hours ago, Cyprian_K said:

 

 

20190702_205614.thumb.jpg.190ebce11ce83351bc1603108d242e93.jpg

 

 

 

@Fadest thx for that hint; fortunately there was no need to modify the file, it works fine with Saint's Lynx SD card.

 

Interestingly, my figures are a bit different from Fadest's: 229 vs 228.

 

Also, on my Lynx II there is no white horizontal line.

Thanks for the photo !

 

If you guys can answer my questions before I create another build, that would be great! I'll make sure to display the numbers in the safe region, and this time create a slightly more useful benchmark that draws a filled quad.


14 minutes ago, VladR said:

OK, two things:

1. I find it hard to believe it would be just 37 (and not 137, 237 or 337). Suzy is only 16 MHz, not 40 MHz :) I did consider (for 10 seconds) the issue of pixel 0,0 being slightly off-screen, and this confirms it, so I'll go fix that. On the other hand, it appears that the two rows of 0s are visible in full, so perhaps it is, indeed, just 37... But because of my hilarious stz bug, the Suzy number should be about double, e.g. ~458, and that's more than an order of magnitude faster, which really seems unrealistic. So, there must be something else going on.

2. What do I do to create the *.lx? I only have object files in the intermediate directory.

Thanks for the photo !


1) I have no idea, I didn't check the CPU copy code. Is it LDA $XX,Y / STA $XX,Y?

2) Your file was fine for me; Saint's Lynx SD card easily accepts that format.

 

What about the horizontal line? It is visible under Handy, but not on the real hardware.


1 minute ago, Cyprian_K said:


1) I have no idea, I didn't check the CPU copy code. Is it LDA $XX,Y / STA $XX,Y?

2) Your file was fine for me; Saint's Lynx SD card easily accepts that format.

 

What about the horizontal line? It is visible under Handy, but not on the real hardware.

The CPU drawing code is like this:

			sta (160),Y
			iny
			sta (160),Y
			iny
			sta (160),Y
			iny
			sta (160),Y
			iny
			sta (160),Y
			iny
			sta (160),Y
			iny
			sta (160),Y
			iny
			sta (160),Y
			iny
			sta (160),Y
			iny
			sta (160),Y
			iny
			sta (160),Y
			iny
			sta (160),Y
			iny
			sta (160),Y
			iny
			sta (160),Y
			iny
			sta (160),Y
			iny
			sta (160),Y
			iny
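
A rough upper bound for the unrolled loop above can be estimated from the documented 65C02 instruction timings (6 cycles for STA (zp),Y, 2 cycles for INY). The sketch below assumes a nominal 4 MHz CPU clock, 80 bytes per 160-pixel scanline (4 bits per pixel) and the 128 * 64 us timer window mentioned earlier in the thread. None of these are confirmed by the posted code, and loop overhead, video DMA and the Lynx's real memory timing are ignored, so the true figure will be lower.

```python
# Documented 65C02 cycle counts for the two instructions in the unrolled loop.
STA_IND_Y_CYCLES = 6    # STA (zp),Y
INY_CYCLES = 2          # INY

# Assumptions (not taken from the posted code):
BYTES_PER_LINE = 80         # 160 pixels at 4 bits per pixel
CPU_HZ = 4_000_000          # nominal CPU clock; effective speed is lower
WINDOW_US = 128 * 64        # timer window quoted in the thread

cycles_per_line = BYTES_PER_LINE * (STA_IND_Y_CYCLES + INY_CYCLES)
cycles_in_window = CPU_HZ * WINDOW_US // 1_000_000
max_lines = cycles_in_window // cycles_per_line

print(f"cycles per scanline: {cycles_per_line}")
print(f"cycles in window:    {cycles_in_window}")
print(f"upper bound:         {max_lines} scanlines")
```

Under these assumptions, a two-digit CPU count such as the 37 reported from real hardware is at least in the right range.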

I'm gonna post the new code, once it's done.

 

The missing line is disturbing, but for the new build I will have the top half of the screen dedicated to the CPU drawing and the bottom half to Suzy. The numbers should be somewhere in the middle. That approach should be solid, in theory...


Actually, I shouldn't draw half the screen on the CPU, because if the value of 37 is indeed correct, then the timer comparison will become invalid, as the CPU's slice would be longer than Suzy's. The results would then have to be normalized to be comparable, and there's no need to confuse them further.

 

So it will be safer if it's just a couple of scanlines, not half the screen. Of course, if I had a Lynx, that'd take me about 10 minutes to figure out by deploying two builds...


13 hours ago, VladR said:

 

Lynx09_CPUvsSuzy.thumb.GIF.f720e479a258b61a5caa26824098787e.GIF

 

Please run it both on your emulator and Lynx. I want to figure out the reason for this anomaly. I suspect it's the palette within the sprite, but that was the first version of sprite I got working.

 

Thanks!

 

 

Not surprisingly, I get the same values as above for Handy on Android.

 

I saw the values others have seen for Mednafen using OpenEmu on Mac (which is Mednafen based).

 
