
Lynx 3D Experimenting


VladR

3 hours ago, enthusi said:

Yes, in particular DIVs are much faster with well-established table math. Actually, the log or square tables are faster than the Suzy mul as well, just less parallel.

MUL appears pretty darn fast. Although, we should probably also count the consumed cycles from all those six registers that need to be set up. Seven, if we go for signed math. That's going to add up, for sure. Although, I suspect, one shouldn't have to write out the register that turns on signed math after every operation, right ?

 

I just noticed in docs there's some accumulation that's also possible. Does it mean I can write out to those registers from CPU, add some values to it, and MUL/DIV will just process it ? That would be real handy, if it worked like that. If it's limited to the output of MUL/DIV, then it'd suck, as I'd have to extract the value from there, and process it in RAM, burning yet more cycles...

 

BTW, I'm not complaining, I love that it's far from straightforward to extract full power from this toy :)

 

And let's face it, there are not many technically advanced games since the commercial era for the Lynx.

Yeah, it's not like the Lynx community is flooded with coders.

karri looks like he's pretty close with his space shooter game, though! He clearly put a lot of effort into it so far.

3 hours ago, Cyprian_K said:

yep, but it is for free, you can do other things in parallel

True, that's probably the most appealing feature. While not much can be done during MUL (well, almost nothing in 11c - that's like CLC, LDA, ADC, STA - so about one addition), quite a lot could be done during division.

Unfortunately, during 3d transform stage, where it's most needed, there's not much else to do.

Perhaps if I kept processing 2 vertices in parallel - e.g. write out the first one, while the second one is undergoing MUL/DIV...

I'll keep thinking about it, but must resist playing with it, otherwise I'll burn all the time just on the engine...

 

One other consideration is that, unlike on a regular 64 KB Atari, which consumes just 7.5 KB for 2 framebuffers (160x96), here we lose ~16 KB. So, it's much harder to fit big MUL/DIV tables, unless you cut features elsewhere.
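
To put numbers on that comparison, here is a minimal sketch of the arithmetic. The helper name `framebuffer_bytes` is made up for illustration, and the Atari figure assumes the 160x96 mode runs at 2 bits per pixel (which is what makes two buffers come out at 7.5 KB); the Lynx figure uses its 160x102 screen at 4 bits per pixel.

```c
/* Bytes needed for one framebuffer of the given size and colour depth.
   Hypothetical helper, for illustration only. */
static unsigned long framebuffer_bytes(unsigned width, unsigned height,
                                       unsigned bits_per_pixel)
{
    return (unsigned long)width * height * bits_per_pixel / 8;
}

/* Two buffers each (double buffering):
   Atari 8-bit, 160x96 at 2 bpp (assumed): 2 * 3840 = 7680 bytes  (7.5 KB)
   Lynx,        160x102 at 4 bpp:          2 * 8160 = 16320 bytes (~16 KB) */
```

So double buffering on the Lynx costs roughly 8.5 KB more than on the 8-bit Atari, which is about the size of a decent table set.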

 

Then again, there's games like Star Raiders, which were just 8 KB, IIRC...


6 hours ago, VladR said:

Thanks. I presume you used 64 µs timers for this measurement ? You also used $f192 a few times, which is surely just a typo and you meant $fC92, correct ?

 

Yes, 64µs timer. No, no typo. As I wrote, I first run with dummy addresses so I get the time for the STAs and LDAs.


1 minute ago, enthusi said:

I use a *10 table for example usually. As short as LDA timesten,X.

Not generic tables for all sorts of multiplications. I saw some overuse of Suzy math on such simple constants sometimes.

Yes, perfectly right.
I see the same in my day-to-day business: programmers use "float" where simple fixed-point math would do. There is no feeling for elegant code today.


8 hours ago, 42bs said:

Yes, 64µs timer. No, no typo. As I wrote, I first run with dummy addresses so I get the time for the STAs and LDAs.

Oh, right. Now I see.

7 hours ago, 42bs said:

Replacing MUL by a table lookup does not seem to save cycles, but replacing HW-DIV by inverse table + HW-MUL saves a lot of cycles.

Yeah, not for generic mul. But for a specific constant (e.g. 5x, 13x, etc.) that you need for a certain effect on a splash screen or elsewhere, I often use a [disposable] LDA multable,X, as such tables are super fast to generate anyway and don't have to occupy RAM the whole time.
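
For anyone who hasn't done this before, a minimal C sketch of such a disposable constant-multiply table follows. The names `mul_lo`, `mul_hi`, `build_mul_table` and `mul_const` are made up for illustration; on the 6502 the lookup itself would just be the LDA multable,X mentioned above, with a second table for the high byte if the product needs 16 bits.

```c
#include <stdint.h>

/* Hypothetical names. Split lo/hi tables mirror how a 6502 would index them. */
static uint8_t mul_lo[256];
static uint8_t mul_hi[256];

/* Fill the tables with i * k using repeated addition only,
   which is how a 6502 generator loop would build them. */
static void build_mul_table(uint8_t k)
{
    uint16_t acc = 0;
    unsigned i;
    for (i = 0; i < 256; ++i) {
        mul_lo[i] = (uint8_t)(acc & 0xFF);
        mul_hi[i] = (uint8_t)(acc >> 8);
        acc = (uint16_t)(acc + k);
    }
}

/* Lookup: the equivalent of LDA mul_lo,X / LDA mul_hi,X. */
static uint16_t mul_const(uint8_t x)
{
    return (uint16_t)(mul_lo[x] | (mul_hi[x] << 8));
}
```

Calling `build_mul_table(10)` gives the *10 case; rebuilding with another constant reuses the same 512 bytes, which is the "disposable" part.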

5 hours ago, enthusi said:

I use a *10 table for example usually. As short as LDA timesten,X.

Not generic tables for all sorts of multiplications. I saw some overuse of Suzy math on such simple constants sometimes.

Yeah, surprisingly, that kind of table even resulted in faster execution on Jaguar. Well, as long as the code was on 68000, as GPU can do MUL.
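
As for the inverse table + multiply idea quoted earlier in this post, a rough C sketch of it is below. This is not 42bs's code: the names `recip`, `build_recip_table` and `div_by_table` are invented, the C multiply merely stands in for the hardware multiply, and it is limited to 8-bit operands. The table holds ceil(65536 / z), and the quotient is the top 16 bits of the 8x16-bit product.

```c
#include <stdint.h>

/* Hypothetical names. recip[z] = ceil(65536 / z), valid for z = 2..255. */
static uint16_t recip[256];

static void build_recip_table(void)
{
    unsigned z;
    for (z = 2; z < 256; ++z)
        recip[z] = (uint16_t)((65536UL + z - 1) / z);
}

/* x / z as a multiply plus a shift. The C multiply stands in for the
   hardware multiply; z == 0 is assumed to be rejected by the caller. */
static uint8_t div_by_table(uint8_t x, uint8_t z)
{
    if (z < 2)
        return x;   /* z == 1: its reciprocal (65536) would not fit in 16 bits */
    return (uint8_t)(((uint32_t)x * recip[z]) >> 16);
}
```

With the reciprocal rounded up like this, the result matches truncating integer division for every 8-bit x and z >= 1. The 512-byte table is the price for skipping the slow divide.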


I just spent a day trying to debug why certain vertices get messed up, and I'm hoping I merely messed up the Suzy register set-up. For some reason I can't get the signed division to produce correct results, despite the sign bit being set (and signed multiplies working just fine). Not even in C.

 

I did notice that when the MUL result is in HGFE, for the division I cannot rewrite the H otherwise division doesn't work. Hoping it's some kind of quirk like that (just slightly different).

 

Somebody please confirm to me, that Suzy does signed division, given all the talk about "optional signed math" in docs.


Wow, that just sucks. The only reason I entertained the idea of using Suzy for math was handling signed math (I got fooled by the "sign bit"), but it can actually only handle the part that I don't really need it for (the signed mul).

 

Awesome. I should have stayed with my div tables from 6502. At least that solution was fully debugged and working, I just wanted less steep 3D perspective because I figured it's doing 32-bit div, so I can afford longer view distance without skewing it. I could have that a week ago.

 

Now what ? Introduce yet another slowdown via a per-vertex condition and the resulting conversion ? On top of all the overhead of its registers and wait loop ? I intentionally use a power-of-two FrontPlane, so I can do the mul in just a few 2-cycle shifts. And with the slightly skewed 8-bit clipspace, I do a division via a simple LDA zTable,X. So, this is truly ridiculous, but it's a learning experience with new HW...
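
For reference, the per-vertex condition and conversion mentioned above would look roughly like this in C. It is only a sketch: `udiv32_16` is a made-up stand-in for the unsigned hardware divide (a 32-bit by 16-bit shape is assumed here), and `sdiv32_16` is a hypothetical wrapper that strips the signs, divides unsigned, and puts the sign back.

```c
#include <stdint.h>

/* Stand-in for the unsigned hardware divide (assumed 32-bit / 16-bit). */
static uint32_t udiv32_16(uint32_t n, uint16_t d)
{
    return n / d;
}

/* Signed divide built on the unsigned one. Truncates toward zero.
   Assumes d != 0 and that the quotient fits in int32_t. */
static int32_t sdiv32_16(int32_t n, int16_t d)
{
    int negative = (n < 0) != (d < 0);                 /* result sign   */
    uint32_t un = (n < 0) ? 0u - (uint32_t)n : (uint32_t)n;
    uint16_t ud = (d < 0) ? (uint16_t)(0u - (uint16_t)d) : (uint16_t)d;
    uint32_t q = udiv32_16(un, ud);                    /* unsigned part */
    return negative ? -(int32_t)q : (int32_t)q;
}
```

On the 6502 that is two sign tests plus up to three negations per divide, which is exactly the extra per-vertex cost being complained about.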

 

 

Thanks for the confirmation. Apologies for the frustration. I really need to think about what to do now. Butcher the view distance ? Further slow down the transform stage ? Not sure now...


9 hours ago, karri said:

Now I feel lucky that I did not choose Stardreamer for the competition.

Given the sheer magnitude of Stardreamer (mainly the work still remaining) and the fixed timeframe, I'm sure you very quickly came to the proper conclusion as to exactly how realistic it was :)

Seems like I just learned why my 3D engine sucks.

I'm only complaining vocally, so that the next guy who chooses to do this on Lynx, has only one thread to go through and has all the information upfront thus avoiding the costly missteps.

Did you also get fooled by the "signed math bit" ?

Perhaps we should have a complaint evening over a pint at PRGE.

Portland Late Summer Beer BitchFest ? I'm in :) I simply cannot, in good conscience, pass on an opportunity to have a drink with a Finn. I only have the best memories of my former Finn colleagues :)


16 hours ago, karri said:

Perhaps we should have a complaint evening over a pint at PRGE. Now I feel lucky that I did not choose Stardreamer for the competition.

 

6 hours ago, VladR said:

Portland Late Summer Beer BitchFest ? I'm in :)

 

Me too. I'm up for that!


3 hours ago, bhall408 said:

Me too. I'm up for that!

Awesome! I was just thinking of PM'ing you, but you read this anyway :)

 

I'll defer to the local expertise of JagChris for selection of a proper venue. He might join us too.


I was debugging the issue of random "illegal opcode at PC = $XYZW" messages I was receiving from the emulator. By narrowing down the sections of code, I traced it to drawing the scanline via Suzy (though it is not 100% reproducible).

 

The moment (xpos+xl) > 135, garbage starts appearing across the screen, although the original scanline is also drawn. The garbage leads me to believe that this is the source of the illegal opcode, as eventually the screen becomes overdrawn with garbage (as would other memory, hence illegal opcodes). This could explain why my previous public build didn't show HW-drawn scanlines (only SW-drawn scanlines).

Since 135 < 160, it's not even related to Suzy's clipping capability, but something entirely different.

 

Has anybody experienced the above behavior or has any ideas ? The only suspicion I have right now is that it might be somehow related to the concept of the "world/imaginary" window coordinate system. But, as I haven't changed the default values, that shouldn't be the core issue, unless you are supposed to initialize it. That would make no sense, though, as then it shouldn't even work in the first place (assuming the emulator is not botched in that particular use case).


3 hours ago, Cyprian_K said:

would be possible to isolate somehow a small part of the code with that issue?

Sure, here's my DrawScanline method:

.proc _draw_scanline_asm_suzy
	; dbg values
;	lda #50
;	sta _single_pixel_sprite_asm_ypos
		; scanline length
	lda #63
	sta _single_pixel_sprite_asm_xl+1
	lda #00
	sta _single_pixel_sprite_asm_xl

	FoldStart ; vidPtr : Unless Suzy thrashes these values, this should be set just once per frame
			; VIDBASL/H
		lda _vidIdx
		beq set_two
			set_one:
		lda #24		; low byte  ($18)
		ldx #224	; high byte ($E0): buffer at $E018
		sta $FC08	; VIDBASL,H Base Address of Video Build Buffer
		stx $FC09
		bra set_done
			set_two:
		lda #56		; low byte  ($38)
		ldx #192	; high byte ($C0): buffer at $C038
		sta $FC08	; VIDBASL,H Base Address of Video Build Buffer
		stx $FC09
			set_done:
	FoldEnd

	FoldStart	; SCBNEXTL/H: : Unless Suzy thrashes these values, this should be set just once per frame
		lda #<(single_pixel_sprite_asm)
		ldx #>(single_pixel_sprite_asm)
		sta $FC10	; SCBNEXTL,H Address of Next SCB
		stx $FC11
	FoldEnd

	lda #01
	sta $FC91	; SPRGO
	stz $FD90	; SDONEACK

	loop00:    
		stz $FD91	; CPUSLEEP : Uncommenting this appears to invoke "illegal opcode $4F at PC=$0fC1"
		lda $FC92	; SPRSYS
		lsr
	bcs loop00
	stz $FD90	; SDONEACK
	
	rts
.endproc

2 hours ago, Fadest said:

Which values for xpos and xl ?

And which type (char, unsigned char, int) ?

 

For example: xl=64, xp=72, yp=50. Note that the scanline still appears on screen (with the correct color, position and length), but random garbage starts filling the screen and, shortly afterwards, the illegal opcode warning appears (which I presume is because not only the framebuffer got thrashed, but also regular memory).

 

My transition to ASM rendering code is almost finished so those variables are declared from within ASM :

       .bss 
_vidIdx:					.res 1

_xpStart:					.res 1
_xpLeft:					.res 1
_xpRight:					.res 1
_ypScanline:				.res 1

_colorScanline:				.res 1
_colorScanlineByte:			.res 1
_colorScanlineLeft:			.res 1
_lenScanline:				.res 1

These are the related C accessors (mostly for print-style debugging and transition C->Asm):

extern unsigned char		xpStart, xpLeft, xpRight, ypScanline, colorScanline, colorScanlineLeft, colorScanlineByte, lenScanline;

I should also mention that my clip-space (last pipeline stage where the screen coordinates are still 16-bit) guarantees there are no negative values and I only pass 8-bit values <0,255> into the next pipeline stage (those are the ones that are stored). I triple confirmed the values. I can print all the pairs per each scanline for each quad to make sure there's no guessing.
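
To make that guarantee concrete, here is a minimal sketch of the kind of clip step I mean. It is not the engine's actual code: `clip_span` and `SCREEN_W` are made-up names, and it simply clamps a 16-bit span to the 160-pixel-wide screen before handing 8-bit values on.

```c
#include <stdint.h>

#define SCREEN_W 160   /* Lynx screen width in pixels */

/* Clamp a 16-bit span [x_left, x_right] to the screen.
   Returns 0 if nothing is visible, else 1 with an 8-bit start and length.
   Hypothetical helper, for illustration only. */
static int clip_span(int16_t x_left, int16_t x_right,
                     uint8_t *xp, uint8_t *len)
{
    if (x_right < x_left || x_right < 0 || x_left >= SCREEN_W)
        return 0;
    if (x_left < 0)
        x_left = 0;
    if (x_right >= SCREEN_W)
        x_right = SCREEN_W - 1;
    *xp = (uint8_t)x_left;
    *len = (uint8_t)(x_right - x_left + 1);
    return 1;
}
```

With the failing example (xp=72, xl=64) the span covers pixels 72..135, so it passes through a check like this untouched, which is why I don't think the inputs are the problem.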

 

 

Obviously, if I had the real HW, I would keep on debugging (assuming that would even be happening on HW in the first place). But, I'm not fully convinced this is the best use of the time, so I figured I'd ask first.

 

The concept of "virtual windows" or whatever they are called - does one have to set any values for that ? Yes ? No ? The best guess I can pull out of my ass is No.

 


I just spent an hour writing a scanline rasterizer in C that PEEK/POKEs its way through the scanline, one byte at a time (including handling of the edges). So, it's obvious that the scanline values (xp,yp,len) and all 3D pipeline stages are working correctly:

 

[Screenshot: Lynx12_SW_Scanline.PNG]

 

It's something to do with the Suzy.
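
For the curious, the software fallback boils down to something like the sketch below. It is a simplified illustration rather than the code from the build: it writes into a plain array instead of PEEK/POKEing the real buffer, the names `framebuffer` and `fill_scanline` are made up, and it assumes 4 bits per pixel with the left pixel in the high nibble.

```c
#include <stdint.h>
#include <string.h>

#define SCREEN_W   160
#define SCREEN_H   102
#define LINE_BYTES (SCREEN_W / 2)   /* 4 bpp: two pixels per byte */

/* Plain array standing in for the framebuffer (assumption). */
static uint8_t framebuffer[SCREEN_H * LINE_BYTES];

/* Fill len pixels starting at (xp, yp), handling odd edges per nibble.
   Assumes the left pixel of each byte lives in the high nibble. */
static void fill_scanline(uint8_t xp, uint8_t yp, uint8_t len, uint8_t color)
{
    uint8_t *p;
    uint8_t both;

    if (len == 0 || yp >= SCREEN_H || xp >= SCREEN_W)
        return;
    if ((unsigned)xp + len > SCREEN_W)
        len = (uint8_t)(SCREEN_W - xp);

    color &= 0x0F;
    both = (uint8_t)((color << 4) | color);
    p = &framebuffer[yp * LINE_BYTES + xp / 2];

    if (xp & 1) {                       /* left edge: right pixel only */
        *p = (uint8_t)((*p & 0xF0) | color);
        ++p;
        --len;
    }
    memset(p, both, len / 2);           /* whole bytes in the middle   */
    p += len / 2;
    if (len & 1)                        /* right edge: left pixel only */
        *p = (uint8_t)((*p & 0x0F) | (color << 4));
}
```

The two edge cases are the only place where a read-modify-write is needed; everything in between is a straight byte fill.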


VladR, if you like, I can run your code on real HW if it is an ".o" file, not a complete .lnx file as I cannot yet flash cards.
"real" Suzy can only write to the video buffer and I would wonder if Handy has such a grave bug and nobody detected it so far.


10 minutes ago, 42bs said:

VladR, if you like, I can run your code on real HW if it is an ".o" file, not a complete .lnx file as I cannot yet flash cards.
"real" Suzy can only write to the video buffer and I would wonder if Handy has such a grave bug and nobody detected it so far.

I can also test on real HW (I have a "saint" SD card). Obviously nothing I receive would ever be shared.


I can also test on real hardware. I have multiple Lynxes, and multiple ways to test.

 

Side note: PRGE is looking good so far this year (I hated not being able to go last year). So I look forward to meeting any of you who can come.

