
new_bjl update: polygon



18 minutes ago, SCPCD said:

many cycles are lost due to register write back conflicts

I started reordering heavily to reduce stalls, but there was not much benefit, at least none visible.


I tried to move the "LOAD" of the next X/Y after the divide, but this resulted in glitches. So there seems to be another bug in the GPU.

21 minutes ago, SCPCD said:

Not exactly: HIDATA is trashed after any external LOAD, but is kept for any STORE (internal & external) and internal LOAD.

I have had a different experience, but I need to investigate. Maybe I did something weird.


23 minutes ago, SCPCD said:

You can check my ST2JAG optimised code

I will!


What I've done with vector processors in the past (like on the Playstation) is unroll the loop and process however deep your pipeline is, on the different stages of your calculations (in different registers). You can eliminate the stalls, still keep your code readable, and if you have an instruction cache, use that as a constraint of how much to unroll.


Normally the divide pipelines are pretty deep, so you can hide those stalls by not using the results right away and doing a bunch of other logic needed for your loop before finishing your perspective correction.


Only one divide can be in flight at a time, and it takes 16 cycles. So these cycles must be filled somehow before the result is used (and before the next divide can be issued).

Maybe it is possible to do the next rotation/transformation in the meantime. But I doubt that the resulting code would be readable.
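
To put rough numbers on this, here is a toy cycle model. It is only a sketch under loud assumptions: every non-divide instruction costs one cycle, issuing a divide costs one cycle, and the result is available 16 cycles after issue. `other_work` and `filler_per_divide` are made-up parameters for illustration, not measurements from the real code.

```python
DIV_LATENCY = 16  # cycles per divide, as stated in the thread


def cycles_per_point(other_work, filler_per_divide, divides=2, latency=DIV_LATENCY):
    """Estimate cycles for one projected point in a toy model.

    other_work        -- non-divide instructions per point (assumed 1 cycle each)
    filler_per_divide -- how many of those instructions are scheduled between
                         issuing a divide and first using its result
    """
    if divides * filler_per_divide > other_work:
        raise ValueError("cannot use more filler than there is other work")
    # Cycles still spent waiting once the filler has run out.
    stall = max(0, latency - filler_per_divide)
    # The filler is part of other_work, so it is only counted once.
    return other_work + divides * (1 + stall)


if __name__ == "__main__":
    for filler in (0, 8, 16):
        print(filler, cycles_per_point(40, filler))
```

In this model the two divides cost 34 of 74 cycles when nothing is interleaved, and only 2 of 42 once each divide is fully covered, so whether they are the biggest "killer" depends on how much independent work can be moved behind them.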


That is what I mean, you can do the beginning stages of the next vertex while waiting for the previous to complete, then grab the result.


You can keep it readable if you disregard some of the advice against commenting and document what you are doing. :)
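
As a structural sketch of the interleaving described above (not Jaguar code, and not anyone's actual routine): the loop below starts the divides for the current vertex, rotates the next vertex while they would be in flight, and only then uses the quotients. Python has no divide latency, so this only shows the ordering with its lead-in and lead-out. `rotate_z`, `project_naive`, `project_pipelined` and `dist` are names invented for the example, and a single Z rotation stands in for the full transform.

```python
def rotate_z(v, c, s):
    """Rotate (x, y, z) about the Z axis; c and s are cos/sin of the angle."""
    x, y, z = v
    return (x * c - y * s, x * s + y * c, z)


def project_naive(verts, c, s, dist):
    """Rotate, then divide and use the quotients immediately (stalls on hardware)."""
    out = []
    for v in verts:
        x, y, z = rotate_z(v, c, s)
        out.append((x * dist / z, y * dist / z))  # two divides per point
    return out


def project_pipelined(verts, c, s, dist):
    """Same results, but the next rotation sits between divide issue and use."""
    out = []
    if not verts:
        return out
    cur = rotate_z(verts[0], c, s)        # lead-in: first vertex is rotated up front
    for nxt_v in verts[1:]:
        qx = cur[0] * dist / cur[2]       # "issue" the divides for the current vertex
        qy = cur[1] * dist / cur[2]
        nxt = rotate_z(nxt_v, c, s)       # filler: work on the next vertex meanwhile
        out.append((qx, qy))              # only now consume the quotients
        cur = nxt
    # lead-out: the last vertex has nothing left to overlap with
    out.append((cur[0] * dist / cur[2], cur[1] * dist / cur[2]))
    return out
```

On real hardware the two versions would need different registers for `cur` and `nxt`, which is where the readability cost comes from.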


7 hours ago, 42bs said:

I still think, the biggest "killer" are the two divides per projected point. In a real 3D game, there are likely more ways to optimize this.

Those two divides seem to be far outweighed by other instructions, unless I'm missing something.


You might be able to define a normal for each poly/face, then for each frame:
1) rotate each poly's normal
2) analyze each poly's normal to determine if the poly will be hidden
3) rotate only vertices that correspond to non-hidden polys
4) project only vertices that correspond to non-hidden polys

Would require some reorganization of data layout probably.

This would probably speed up objects with a high number of polygons (e.g. kugel) and probably slow down objects with fewer polygons (e.g. cube).
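
Steps 1 to 3 above could look roughly like the sketch below. The assumptions are mine: the camera looks along +Z, so a face counts as hidden when the Z component of its rotated normal is not negative. That sign test is only exact for a parallel projection; with perspective it is an approximation near the silhouette. `rotate_y` and `cull_and_rotate` are hypothetical names, and a single Y rotation stands in for the full transform.

```python
def rotate_y(v, c, s):
    """Rotate (x, y, z) about the Y axis; c and s are cos/sin of the angle."""
    x, y, z = v
    return (x * c + z * s, y, -x * s + z * c)


def cull_and_rotate(faces, normals, verts, c, s):
    """Return (indices of visible faces, {vertex index: rotated vertex}).

    faces   -- list of tuples of vertex indices
    normals -- one object-space normal per face
    """
    visible = []
    needed = set()
    for fi, face in enumerate(faces):
        nz = rotate_y(normals[fi], c, s)[2]  # 1) rotate the face normal
        if nz >= 0:                          # 2) pointing away from the camera: hidden
            continue
        visible.append(fi)
        needed.update(face)                  # remember which vertices are still needed
    # 3) rotate only the vertices used by visible faces, each exactly once
    rotated = {vi: rotate_y(verts[vi], c, s) for vi in sorted(needed)}
    return visible, rotated
```

Step 4 would then project only the entries of `rotated`. The extra work per face is one normal rotation, which is why this should pay off for objects with many faces and may not for a cube.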


4 minutes ago, 42bs said:

Oh, no. I am too old to learn writing comments

Never too old to learn anything :)


How I approach these types of functions is to write out one iteration, with NOPs filling in the pipeline after each instruction until its result can be used. Then I get this working properly. Next I look at the NOPs and see how many times I can unroll the loop, filling them in (preferably with the same operation) for the next vertex, using different registers. After taking care of some housekeeping in the lead-in and lead-out operations, you can get the optimal use of the processor cycles.


Though I've never done this on the Jaguar's RISC processors, I have on other architectures. I wouldn't discount all the other stalls and focus only on the divides; the others might add up too.


2 minutes ago, jguff said:

2) analyze each polys normal to determine if poly will be hidden

Good point. I do it after projection, but I should rather do it after rotation/translation.


But this means heavily re-ordering the code from point-oriented to face-oriented. I would also have to cache points that have already been rotated/projected for other faces.
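
One possible shape for such a cache, as a minimal sketch and not the actual implementation: each vertex gets a slot plus a frame stamp, so a face-oriented loop transforms a point the first time a face asks for it and reuses the slot for every later face. Bumping the frame counter invalidates everything without clearing memory. `PointCache`, `transform` and `misses` are hypothetical names for illustration.

```python
class PointCache:
    """Lazy per-frame cache of transformed points, indexed by vertex number."""

    def __init__(self, verts, transform):
        self.verts = verts
        self.transform = transform        # assumed per-point rotate/project function
        self.slot = [None] * len(verts)   # cached results
        self.stamp = [-1] * len(verts)    # frame in which each slot was last filled
        self.frame = 0
        self.misses = 0                   # number of real transforms performed

    def next_frame(self):
        """Invalidate all slots by moving on to a new frame number."""
        self.frame += 1

    def get(self, vi):
        """Return the transformed vertex vi, computing it at most once per frame."""
        if self.stamp[vi] != self.frame:
            self.slot[vi] = self.transform(self.verts[vi])
            self.stamp[vi] = self.frame
            self.misses += 1
        return self.slot[vi]


if __name__ == "__main__":
    quad = [(0, 0), (1, 0), (1, 1), (0, 1)]
    cache = PointCache(quad, lambda v: (v[0] * 2, v[1] * 2))
    for face in [(0, 1, 2), (0, 2, 3)]:   # two triangles sharing an edge
        pts = [cache.get(vi) for vi in face]
    print(cache.misses)                    # shared vertices are transformed once
```

The stamp comparison costs a load and a compare per lookup, so whether this beats simply transforming all points up front depends on how many faces get culled.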


2 minutes ago, selgus said:

with NOPs filling in the pipeline

Yes, I have often done so. And then it hurts if you still have some NOPs left and no idea how to replace them.

For an intro (not yet released) I decided to calculate 4 pixels in parallel to keep the pipeline filled as much as possible.

But sometimes the Jaguar bites back, and what seems to be a valid interleaving makes things worse.




I would like to report on my attempts to compile @42BS's poly_mmu_68k source from:


With further advice from Bastian: his code needs his modified rmac (which provides option -4) and rln, and his latest lyxass, which can be obtained from his GitHub repository.

The variable BJL_ROOT needs to be declared (see the export command).

I have to say that I'm a user of cygwin64 under Windows 8.1, and this causes a "can't find file or directory" error that breaks the build.

If anyone wants to try this I would be interested to hear your results.



