42bs, posted July 11, 2022:
I have an algo which I prototyped in Processing and ported to the GPU. The three pictures below all come from the same algo. Weird: VJ looks more or less identical to the Processing output, but running on real HW looks completely different?
Processing: [image]
VirtualJaguar: [image]
The real thing (TM): [image]
jguff, posted July 12, 2022:
8 hours ago, 42bs said: "I have an algo which I prototyped in Processing and ported to GPU. Below three pictures are the same algo. Weird: VJ looks more or less identical, but running on real HW looks completely different?"
Q1: What does the input to the algorithm look like?
Q2: What is the algorithm supposed to do?
Q3: The slanted horizontal(ish) line (below the numbers) in The Real Thing is just an optical illusion created by snapping a photo of the TV/monitor at an angle, correct?
42bs, posted July 12, 2022:
It is going to be a new demo for Sillyventure, so I will not disclose much yet. This much: the algo creates this image using only simple adds, subs and shifts. Regarding Q3, it is not an optical illusion; the picture generated on the Jaguar really looks completely different and has this horizontal line.
Zerosquare, posted July 12, 2022:
Have you double-checked that all memory is initialized?
42bs, posted July 12, 2022:
25 minutes ago, Zerosquare said: "Have you double-checked that all memory is initialized?"
Yes. I even changed it from zeroing to some other value. Normally I use the blitter to clear memory, so I switched to a 68k loop. The last thing I have not checked yet is removing the IRQ code from the GPU and moving the OBL update back to the 68k.
Zerosquare, posted July 12, 2022:
If you can't do single-stepping, what I'd do is write a bit of code that computes a checksum of your memory buffer and displays it. Simplify or remove parts of the code until you get the same results on VJ and real hardware.
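The checksum idea can be sketched as follows. The thread shows no implementation, so the function name `buffer_checksum` and the rotate-and-add scheme are assumptions for illustration; the scheme was picked only because it needs nothing beyond shifts and adds and is sensitive to byte order, which should make it easy to port to a 68k or GPU loop.

```python
def buffer_checksum(buf: bytes) -> int:
    """Return a 32-bit checksum: rotate the accumulator left by 1, then add each byte.

    Hypothetical helper, not code from the thread.
    """
    acc = 0
    for b in buf:
        # Rotate left by one bit within 32 bits so byte order affects the result.
        acc = ((acc << 1) | (acc >> 31)) & 0xFFFFFFFF
        # Add the next byte, wrapping at 32 bits like a hardware register would.
        acc = (acc + b) & 0xFFFFFFFF
    return acc


if __name__ == "__main__":
    # Compare the value printed here with the one shown by the other platform.
    frame = bytes(range(256)) * 4
    print(hex(buffer_checksum(frame)))
```

Running the same routine on both targets and displaying the value gives a single number to compare; checksumming smaller regions of the buffer would then narrow down where the outputs first diverge.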
42bs, posted July 12, 2022:
4 minutes ago, Zerosquare said: "If you can't do single-stepping, what I'd do is write a bit of code that computes a checksum of your memory buffer"
The idea with the checksum is neat!
42bs, posted July 12, 2022:
This code behaves differently on VJ than on HW:

        add     x,tmp0          ; tmp0 += x
        add     y,tmp1          ; tmp1 += y
        and     mask,tmp0       ; wrap both coordinates to 0..1023
        and     mask,tmp1
        add     map_base,tmp0   ; tmp0 = map_base + column
        shlq    #10,tmp1        ; tmp1 = row * 1024
        add     tmp0,tmp1       ; tmp1 = address of the map byte
        jump    (LR)            ; return
        loadb   (tmp1),r0       ; executed after the jump: r0 = map byte

This subroutine is called about 200k times, but only one call causes problems: the one where tmp0 and tmp1 are both zero?! mask is 1023; x/y range from 0 to 1023+delta.
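For reference, the address arithmetic of the subroutine above can be modelled in a few lines of Python. This is only a sketch of the adds, ANDs and shift as listed in the post; the function name `get_h_address` and the `map_base` value used in the example are made up, and the model says nothing about timing.

```python
MASK = 1023          # value of `mask` given in the post
WORD = 0xFFFFFFFF    # registers are 32 bits wide


def get_h_address(x, y, tmp0, tmp1, map_base):
    """Mirror the listed instructions and return the address passed to loadb."""
    tmp0 = (tmp0 + x) & WORD          # add  x,tmp0
    tmp1 = (tmp1 + y) & WORD          # add  y,tmp1
    tmp0 &= MASK                      # and  mask,tmp0
    tmp1 &= MASK                      # and  mask,tmp1
    tmp0 = (tmp0 + map_base) & WORD   # add  map_base,tmp0
    tmp1 = (tmp1 << 10) & WORD        # shlq #10,tmp1
    tmp1 = (tmp0 + tmp1) & WORD       # add  tmp0,tmp1
    return tmp1


if __name__ == "__main__":
    base = 0x100000  # hypothetical map address, not taken from the thread
    print(hex(get_h_address(x=1028, y=2, tmp0=0, tmp1=0, map_base=base)))
```

The map is therefore addressed as a 1024x1024 byte array, with both coordinates wrapping around at 1024.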
42bs, posted July 12, 2022 (edited):
Ok, again a pipeline/scoreboard issue.
Original code:

        moveq   #0,tmp0
        moveq   #0,tmp1
        BL      (GET_H)
        move    r1,ptr0
        move    step,r0

GET_H is the subroutine above, which loads into r0. At this one place in the loop, however, r0 is not used; GET_H is called only to get the pointer in tmp1/r1. The following "move step,r0" completes _before_ the "loadb" does, and therefore r0 gets overwritten by the memory contents (which is 0).
I really should make macros for loads that just write the loaded value to some dummy register.
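The ordering problem described in this post can be illustrated with a toy model in Python. It is not a cycle-accurate model of the Jaguar GPU: the function `run`, the tuple format of the instructions and the `load_latency` of 3 cycles are all assumptions chosen for illustration. The only behaviour taken from the post is that a load may write its destination register after a later move to the same register has already completed.

```python
def run(program, memory, load_latency=3):
    """Execute one (op, ...) tuple per cycle; loads write back `load_latency` cycles later."""
    regs = {}
    pending = []  # (write_back_cycle, dest_reg, value) for loads still in flight
    cycle = 0
    for op, *args in program:
        # Retire loads whose data has arrived by this cycle.
        for item in [p for p in pending if p[0] <= cycle]:
            regs[item[1]] = item[2]
            pending.remove(item)
        if op == "move":       # ("move", value, dest): completes immediately
            value, dest = args
            regs[dest] = value
        elif op == "loadb":    # ("loadb", address, dest): result arrives later
            address, dest = args
            pending.append((cycle + load_latency, dest, memory[address]))
        cycle += 1
    # Drain loads that are still in flight after the last instruction.
    for _, dest, value in sorted(pending):
        regs[dest] = value
    return regs


memory = {0: 0}  # the loaded byte is 0, as in the post
step = 7         # hypothetical value of `step`

# The load and the following move target the same register: the late load wins.
clobbered = run([("loadb", 0, "r0"), ("move", step, "r0")], memory)

# Loading into a register nobody else uses leaves r0 intact.
safe = run([("loadb", 0, "dummy"), ("move", step, "r0")], memory)

print(clobbered["r0"], safe["r0"])
```

In this model the first sequence ends with r0 holding the loaded 0 instead of `step`, while the second keeps `step` in r0. Directing the unused load at a dummy register is one possible reading of the macro idea in the post; the thread does not show the actual macro.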
ArneCRosenfeldt, posted July 31, 2022:
Maybe also change the coding style? Cyclomatic complexity is evil, even though branches seem to be cheap on JRISC. Immutable variables are best. We have 64 registers on the Jag, so it should be possible not to reuse them too often. Anyway, now I have learned: SRAM seems to be fast, but it is not guaranteed to be. So it is better to try to cram everything into registers and, sadly, reuse them or even pack data.