kool kitty89 Posted December 24, 2009

So my guess is that the 40MHz rumor started from some 3rd party developer who misinterpreted the manual. Has anyone seen any authoritative quotes on the subject? - KS

Thank you. (I thought it was Gorf who mentioned it much earlier in the thread, but I'm not sure now.)
JagChris Posted December 24, 2009

Yes, a C compiler could generate good GPU/DSP code, but still paging in/out of local memory was needed. (I think Doom's renderer was compiled as well.) Fixing the main memory bug would allow any C code to run on the GPU/DSP - maybe it wouldn't be wonderfully optimised assembly, but it would still be faster than 68k asm (or even 68k C), and Atari would have a single toolchain to optimise (or 3rd parties could provide better compilers).

The main memory bug and running C on the RISCs have nothing to do with each other. Running the GPU out of main is not always faster than the 68k. It would have to be "wonderfully optimized assembly" in order to be useful in main.

They are related if you intend to write an entire game in C - not just a group of small routines. It's interesting that you find cases where the 68k is faster than the GPU running from main - what are they?

Good question. That would be lower than 20% of capacity according to AO's website. Let's look at it from a maths point of view... I believe the speed of the Atari ST's 68k at 8MHz was reported as being very roughly about 1 MIPS, so at 13.3MHz this should be about 1.65 MIPS. In theory the GPU can reach 26.6 MIPS; in practice this tends to be more like 17 MIPS - in other words, 10x the speed of the 68k. Even if we run at 20% of capacity it's still twice as fast as the 68k, and that's not even taking into account the effects on the bus.
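A back-of-the-envelope check of the figures above - the MIPS numbers are the ones quoted in the post, not measurements, and the little C program below is just that arithmetic spelled out:

#include <stdio.h>

int main(void)
{
    /* Figures quoted in the post above - rough estimates, not measured data. */
    double m68k_8mhz   = 1.0;                    /* ~1 MIPS for the 8MHz ST 68k              */
    double m68k_13mhz  = m68k_8mhz * 13.3 / 8.0; /* scaled to 13.3MHz: ~1.66 (post: ~1.65)   */
    double gpu_peak    = 26.6;                   /* theoretical GPU peak                     */
    double gpu_typical = 17.0;                   /* more realistic sustained figure          */

    printf("68k @ 13.3MHz : %.2f MIPS\n", m68k_13mhz);
    printf("GPU peak      : %.1f MIPS\n", gpu_peak);
    printf("GPU typical   : %.1f MIPS (about %.0fx the 68k)\n",
           gpu_typical, gpu_typical / m68k_13mhz);
    printf("GPU at 20%%    : %.1f MIPS (about %.1fx the 68k)\n",
           gpu_typical * 0.2, gpu_typical * 0.2 / m68k_13mhz);
    return 0;
}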
Gorf Posted December 24, 2009

They are related if you intend to write an entire game in C - not just a group of small routines. It's interesting that you find cases where the 68k is faster than the GPU running from main - what are they? Are they situations which would run faster if the memory bugs had been fixed?

Mainly and usually in small tight loops. Otherwise an unrolled loop runs as fast. So careful coding out in main is still always faster than the 68k, by plenty.
Gorf Posted December 24, 2009

So my guess is that the 40MHz rumor started from some 3rd party developer who misinterpreted the manual. Has anyone seen any authoritative quotes on the subject? - KS

I may have mentioned another guy who actually reclocked a Jaguar at 40MHz. He said he put in a Hover Strike cart that ran at a ridiculous 60 FPS for about a minute, and then the unit blew up.
Crazyace Posted December 24, 2009

Thanks for the info.
Gorf Posted December 24, 2009

The main memory bug and running C on the RISCs have nothing to do with each other. Running the GPU out of main is not always faster than the 68k. It would have to be "wonderfully optimized assembly" in order to be useful in main.

Holy shit! You ARE still alive!
Atari_Owl Posted December 29, 2009

I've yet to see a case where the 68k was quicker than the GPU running from main.
Gorf Posted December 29, 2009

I've yet to see a case where the 68k was quicker than the GPU running from main.

There is one case where the tight-loop scenario indeed chokes the GPU. But we are talking very few instructions in between the loop points. Once unrolled, it actually ran as fast as it did in local.
JagChris Posted December 30, 2009

I've yet to see a case where the 68k was quicker than the GPU running from main.

There is one case where the tight-loop scenario indeed chokes the GPU. But we are talking very few instructions in between the loop points. Once unrolled, it actually ran as fast as it did in local.

You're saying in that situation the 68k could have done it faster?
Crazyace Posted December 30, 2009

I've yet to see a case where the 68k was quicker than the GPU running from main.

There is one case where the tight-loop scenario indeed chokes the GPU. But we are talking very few instructions in between the loop points. Once unrolled, it actually ran as fast as it did in local.

You're saying in that situation the 68k could have done it faster?

It sounds like a very extreme example though - the kind of thing that could be optimised even in C code for the GPU (i.e. if there's a tight loop, unroll it a few times in the C code).
Gorf Posted December 30, 2009

It's the only example, and I am pretty sure it was done using the SMAC assembler and not by hand. My guess is it has something to do with SMAC not handling something properly - possibly the broken JR instruction handling.
JagMod Posted December 30, 2009

I've yet to see a case where the 68k was quicker than the GPU running from main.

There is one case where the tight-loop scenario indeed chokes the GPU. But we are talking very few instructions in between the loop points. Once unrolled, it actually ran as fast as it did in local.

You're saying in that situation the 68k could have done it faster?

It sounds like a very extreme example though - the kind of thing that could be optimised even in C code for the GPU (i.e. if there's a tight loop, unroll it a few times in the C code).

Not extreme at all - in fact, a C compiler would just make this worse, as it would usually try to minimize instructions and create exactly this case.
JagMod Posted December 30, 2009

It's the only example, and I am pretty sure it was done using the SMAC assembler and not by hand. My guess is it has something to do with SMAC not handling something properly - possibly the broken JR instruction handling.

The problem was with smac and mac. The point is, there is no magical solution to fixing and optimizing all the quirks of the Jaguar system. smac helps a lot by allowing code to be executed from main, but it doesn't keep the programmer from writing bad code.
Crazyace Posted December 30, 2009

Interesting. What code snippet was it? ('Unroll loops' in gcc generally increases the number of instructions within a loop.)
Gorf Posted December 30, 2009

It's the only example, and I am pretty sure it was done using the SMAC assembler and not by hand. My guess is it has something to do with SMAC not handling something properly - possibly the broken JR instruction handling.

The problem was with smac and mac. The point is, there is no magical solution to fixing and optimizing all the quirks of the Jaguar system. smac helps a lot by allowing code to be executed from main, but it doesn't keep the programmer from writing bad code.

And this is exactly why I've said time and time again: write code for the J-RISCs out in main in hand assembly, as anything else is bound to cause you headaches.
Gorf Posted December 30, 2009

Interesting. What code snippet was it? ('Unroll loops' in gcc generally increases the number of instructions within a loop.)

Um... it was assembler using SMAC; it has nothing to do with gcc. GCC would not help at all, nor would any C compiler. There are going to be several places where no C compiler will avoid such a tight loop.
JagMod Posted December 30, 2009

Interesting. What code snippet was it? ('Unroll loops' in gcc generally increases the number of instructions within a loop.)

Um... it was assembler using SMAC; it has nothing to do with gcc. GCC would not help at all, nor would any C compiler. There are going to be several places where no C compiler will avoid such a tight loop.

You can tell the C compiler to unroll loops, but then, as Owl has suggested many times, everything is a trade-off. If you always unroll loops, your code grows significantly.
Crazyace Posted December 30, 2009

Just curious as to what it is, as you said a C compiler would only make it worse?
Crazyace Posted December 30, 2009

Also, is it slower because of the main-code workaround, or would it still be slower if the GPU didn't have the main code bug?
Gorf Posted December 30, 2009 (edited)

Also, is it slower because of the main-code workaround, or would it still be slower if the GPU didn't have the main code bug?

No, it is slower due to bad coding techniques. If you were able to reasonably switch the DRAM to SRAM, there would be no slowdown at all, as long as you wrote the code properly. The only slowdown would be from the other parts of the system using the main bus/RAM.

Edited December 30, 2009 by Gorf
Crazyace Posted December 30, 2009

Switching DRAM to SRAM would be extremely unlikely, though. I was just curious as to what the routine would be (all I could think of that might be slower than the 68k in main RAM would be some kind of polling loop).
Gorf Posted December 30, 2009

Switching DRAM to SRAM would be extremely unlikely, though. I was just curious as to what the routine would be (all I could think of that might be slower than the 68k in main RAM would be some kind of polling loop).

Like I said, if you could do it reasonably. But yeah, it's like a polling-type loop, which is not something I'd bother doing on the GPU. Use the interrupts if you are waiting on the blitter or the OPL with the J-RISCs. In fact, I'm willing to bet that the problem JagMod was dealing with was waiting on either of those two in a polling fashion. Not something I recommend, and also not something SMAC is good at dealing with. As excellent an app as SMAC is, it still has a few issues that need attention before I'd use it for my code.
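For readers following along, a minimal C sketch of the kind of polling wait being discussed here. The register address and bit below are placeholders for illustration only, not verified Jaguar hardware definitions - check the hardware reference before basing anything on them:

#include <stdint.h>

/* PLACEHOLDER values - purely illustrative, not taken from the
 * Jaguar hardware reference. */
#define STATUS_REG_ADDR  0xF0DEAD00u
#define IDLE_BIT         0x01u

/* Busy-wait until the hypothetical status register reports idle.
 * Every pass is another read over the main bus with almost no work
 * between the loop points - the pattern being argued against above.
 * An interrupt-driven wait avoids this bus traffic entirely. */
void wait_by_polling(void)
{
    volatile uint32_t *status = (volatile uint32_t *)STATUS_REG_ADDR;

    while ((*status & IDLE_BIT) == 0)
        ;  /* spin: one load and one branch per iteration */
}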
JagMod Posted December 31, 2009

No, I was just copying memory in a tight loop - one load, one store - and the GPU was slower running from main than the 68k. As soon as I put multiple load/stores into the loop, it got faster. But if you are writing something like a strcpy in C for the GPU, that's the kind of assembly the compiler will generate. I'm still a proponent of GPU in main RAM, but you have to be careful about what you are doing; not everything running on the GPU is faster.
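A C-level sketch of what JagMod is describing (illustrative only - the real test was GPU assembly, and the function names here are made up). The naive version is one load, one store and a branch per word, exactly the shape a compiler emits for a strcpy-style routine; the second keeps several load/store pairs between the loop points, at the cost of a larger loop body:

#include <stddef.h>
#include <stdint.h>

/* One load, one store, one branch per word - the tight loop that
 * ran slower from main RAM than the 68k. */
void copy_tight(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    while (nwords--)
        *dst++ = *src++;
}

/* Four load/store pairs between the loop points - the shape that
 * made the GPU-from-main version faster again. */
void copy_unrolled(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    while (nwords >= 4) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst += 4;
        src += 4;
        nwords -= 4;
    }
    while (nwords--)      /* remainder */
        *dst++ = *src++;
}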
Atari_Owl Posted December 31, 2009

No, I was just copying memory in a tight loop - one load, one store - and the GPU was slower running from main than the 68k. As soon as I put multiple load/stores into the loop, it got faster. But if you are writing something like a strcpy in C for the GPU, that's the kind of assembly the compiler will generate. I'm still a proponent of GPU in main RAM, but you have to be careful about what you are doing; not everything running on the GPU is faster.

Ahh - I'd never tried such tight loops, which would explain why I hadn't seen that effect.
Gorf Posted January 1, 2010

No, I was just copying memory in a tight loop - one load, one store - and the GPU was slower running from main than the 68k. As soon as I put multiple load/stores into the loop, it got faster. But if you are writing something like a strcpy in C for the GPU, that's the kind of assembly the compiler will generate. I'm still a proponent of GPU in main RAM, but you have to be careful about what you are doing; not everything running on the GPU is faster.

Ahh - I'd never tried such tight loops, which would explain why I hadn't seen that effect.

I've never had any such bus-hammering experiences in main... then again, I don't really run many (or any) tight loops out there. I have at least a dozen or more instructions in between the loop points.