new_bjl update: polygon

42bs · May 25, 2022

18 minutes ago, SCPCD said:

many cycles are lost due to register write back conflicts

I started to heavily reorder to reduce stalls. But there was not much benefit. At least none visible.

I tried to move the "LOAD" of the next X/Y after the divide, but this resulted in glitches. So there seems to be another bug in the GPU.

21 minutes ago, SCPCD said:

Not exactly: HIDATA is crashed after any external LOAD, but is kept for any STORE (internal & external) and internal LOAD.

I have a different experience. But I need to find out. Maybe I did something weird.

23 minutes ago, SCPCD said:

You can check my ST2JAG optimised code

I will!

+selgus · May 25, 2022

What I've done with vector processors in the past (like on the Playstation) is unroll the loop and process however deep your pipeline is, on the different stages of your calculations (in different registers). You can eliminate the stalls, still keep your code readable, and if you have an instruction cache, use that as a constraint of how much to unroll.

Normally the divide pipelines are pretty deep, so you can hide those stalls by not using the results right away, and do a bunch of other logic needed for your loop before finishing your perspective correction.

42bs · May 25, 2022

There can be only one divide at a time, and it takes 16 cycles. So these cycles must be filled somehow before the result is used (and before the next divide may be issued).

Maybe it is possible to do the next rotation/transformation. But I doubt that the resulting code is readable.

+selgus · May 25, 2022

That is what I mean, you can do the beginning stages of the next vertex while waiting for the previous to complete, then grab the result.

You can keep it readable if you disregard some of the advice around not commenting and document what you are doing.

jguff · May 25, 2022

7 hours ago, 42bs said:

I still think the biggest "killer" is the two divides per projected point. In a real 3D game, there are likely more ways to optimize this.

Those two divides seem to be far outweighed by other instructions, unless I'm missing something.

You might be able to define a normal for each poly/face, then for each frame:
1) rotate each poly's normal
2) analyze each poly's normal to determine if the poly will be hidden
3) rotate only vertices that correspond to non-hidden polys
4) project only vertices that correspond to non-hidden polys

Would require some reorganization of data layout probably.

This would probably speed up objects with a high number of polygons (eg: kugel), and probably slow down objects with fewer polygons (eg: cube).
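The four steps above could be sketched roughly like this in Python. It only shows the order of work, not Jaguar code. The names `rotate` and `cull_faces` are made up for illustration, and the visibility test assumes the camera looks along +z and ignores perspective, so a real renderer would need a more careful test.

```python
def rotate(v, m):
    """Rotate a 3-vector by a 3x3 matrix given as three rows (hypothetical helper)."""
    x, y, z = v
    return tuple(r[0] * x + r[1] * y + r[2] * z for r in m)


def cull_faces(faces, normals, m):
    """Steps 1 and 2: rotate each face normal and keep faces turned toward the viewer.

    Returns the indices of the visible faces and the set of vertex
    indices those faces use (the only ones steps 3 and 4 must touch).
    """
    visible = []
    needed = set()
    for i, (face, normal) in enumerate(zip(faces, normals)):
        rotated = rotate(normal, m)
        # Assumption: camera looks along +z, so a normal with negative z
        # points back toward the viewer and the face is not hidden.
        if rotated[2] < 0:
            visible.append(i)
            needed.update(face)
    return visible, needed


def rotate_needed(points, needed, m):
    """Step 3: rotate only the vertices referenced by visible faces."""
    return {i: rotate(points[i], m) for i in sorted(needed)}
```

Whether this wins depends on the ratio of normals to vertices: every face pays for one extra rotation, and only the hidden faces save work.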

42bs · May 25, 2022

18 minutes ago, selgus said:

document what you are doing

Oh, no. I am too old to learn writing comments

+selgus · May 25, 2022

4 minutes ago, 42bs said:

Oh, no. I am too old to learn writing comments

Never too old to learn anything

How I approach these types of functions is to write out one iteration, with NOPs filling in the pipeline before being able to use the results, for each instruction. Then get this working properly. Next I look at the NOPs and see how many times I can unroll the loop, filling in (preferably the same operation) for the next vertex, using different registers. After taking care of some housekeeping in the lead-in and lead-out operations, you can get the optimal use of the processor cycles.

Though I've never done this on the Jaguar's RISC processors, I have on other architectures. I wouldn't discount all the other stalls and focus only on the divides; the others might add up too.
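The interleaving idea (start the divides for one vertex, do the rotation of the next vertex while waiting, then pick up the result) can be shown as a plain Python sketch. Python has no divide latency, so this only illustrates the ordering and the lead-in/lead-out housekeeping. The names `rotate`, `project_naive`, `project_pipelined` and the `dist` parameter are assumptions for the example, not anything from new_bjl.

```python
def rotate(v, m):
    """Rotate a 3-vector by a 3x3 matrix given as three rows (hypothetical helper)."""
    x, y, z = v
    return tuple(r[0] * x + r[1] * y + r[2] * z for r in m)


def project_naive(points, m, dist=256.0):
    """One vertex at a time: rotate, then two divides, then store."""
    out = []
    for p in points:
        x, y, z = rotate(p, m)
        out.append((x * dist / (z + dist), y * dist / (z + dist)))
    return out


def project_pipelined(points, m, dist=256.0):
    """Same result, but the next rotation sits between divide and use."""
    out = []
    if not points:
        return out
    cur = rotate(points[0], m)          # lead-in: first vertex rotated up front
    for nxt in points[1:]:
        x, y, z = cur
        sx = x * dist / (z + dist)      # divides for the current vertex "issued"
        sy = y * dist / (z + dist)
        cur = rotate(nxt, m)            # independent work for the next vertex
        out.append((sx, sy))            # result consumed only after that work
    x, y, z = cur                       # lead-out: last vertex has nothing to overlap with
    out.append((x * dist / (z + dist), y * dist / (z + dist)))
    return out
```

On real hardware the rotation of the next vertex would have to use different registers from the ones holding the pending divide operands, which is where the unrolling depth gets limited.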

42bs · May 25, 2022

2 minutes ago, jguff said:

2) analyze each poly's normal to determine if the poly will be hidden

Good point. I do it after projection. But should rather do it after rotation/translation.

But this means re-ordering the code heavily from point-oriented to face-oriented. It also means caching points which have already been rotated/projected for other faces.
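A face-oriented loop with such a cache might look like the sketch below. It is only an illustration in Python: `PointCache`, its `transform` callback and the `misses` counter are hypothetical names, and the cache would have to be cleared every frame.

```python
class PointCache:
    """Transforms each point at most once per frame, however many faces share it."""

    def __init__(self, points, transform):
        self.points = points
        self.transform = transform  # e.g. rotate + project for one point
        self.cache = {}
        self.misses = 0             # number of real transforms performed

    def get(self, index):
        if index not in self.cache:
            self.cache[index] = self.transform(self.points[index])
            self.misses += 1
        return self.cache[index]


def transform_faces(faces, cache):
    """Walk the faces and fetch their points through the cache."""
    return [tuple(cache.get(i) for i in face) for face in faces]
```

For two triangles sharing an edge, six point lookups end up as only four transforms.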

42bs · May 25, 2022

2 minutes ago, selgus said:

with NOPs filling in the pipeline

Yes, I often did so. And then it hurts, if you still have some NOPs left, and no idea how to replace them.

For an intro (not yet released) I decided to calculate 4 pixels in parallel to keep the pipeline filled as much as possible.

But sometimes the Jaguar bites back and what seems to be a valid interleaving makes things worse.

jguff · May 25, 2022

I'm worst case numberwanging 6000 cycles: 2 divs per point, 150 points, 20 cycles per div.

Is my numberwang wrong?
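Written out as a tiny Python calculation, with the 16-cycle divide mentioned earlier in the thread for comparison. The function name `divide_cycles` is just for this example, and the count assumes no other work overlaps the divides.

```python
def divide_cycles(points, divs_per_point=2, cycles_per_div=20):
    """Total cycles spent in divides if nothing overlaps them."""
    return points * divs_per_point * cycles_per_div


print(divide_cycles(150))                      # 6000
print(divide_cycles(150, cycles_per_div=16))   # 4800
```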

42bs · May 27, 2022

No, I think it is correct. But in a large scene one likely has more than just 150 points.

Positron5 · May 27, 2022

As an experiment I tried to compile poly_mmu using the makefile and lyxass2022 (also rmac) and I got a load of errors.

One such warning was that the poly_mmu .equ file is missing.

42bs · May 28, 2022

Just tried the current version on github together with the latest lyxass, rmac and rln: No problem.

.equ is generated when assembling the .js file.

Positron5 · May 28, 2022

35 minutes ago, 42bs said:

Just tried the current version on github together with the latest lyxass, rmac and rln: No problem.

.equ is generated when assembling the .js file.

That's good, it is probably a problem with my tool chain.

42bs · May 29, 2022

If you cannot get it to compile, feel free to pm the output.

Positron5 · May 29, 2022

pm sent

Positron5 · June 5, 2022

I would like to report on my attempts to compile @42BS's poly_mmu_68k source from:

https://github.com/42Bastian/new_bjl/tree/main/exp/poly_mmu

With further advice from Bastian: his code needs his modified rmac (that provides option -4) and rln, and his latest lyxass, which can be obtained from his github repository.

The variable BJL_ROOT needs to be declared (see export command).

I have to say that I'm a user of cygwin64 under Windows 8.1, and this causes a "can't find file or directory" error that breaks the build.

If anyone wants to try this I would be interested to hear your results.
