
GPU in main, SCIENCE!


LinkoVitch


Just gotta toss my own two cents in...

 

First, I've seen a couple people that just constantly scream "BCD, BCD, BCD!" Yes, the 68000 is much faster at BCD than the GPU would be in main... hell, I'd bet the 68000 is faster at BCD than the GPU in LOCAL! The answer is DON'T DO BCD ON THE GPU!! There are nearly always ways to get rid of BCD operations, and when there isn't? I'd make the 68000 do the BCD operation. It's just that simple. If the 68000 can do the job faster, use it. If it can't, don't!
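
To make that concrete, here's the usual way to code around it - a rough C sketch with made-up names, nothing from any actual Jaguar source: keep the score in plain binary everywhere, and only split it into decimal digits at the moment you draw it, so no BCD arithmetic ever runs on the GPU at all.

/* Hypothetical helper: binary score in, decimal digits out, once per draw.
   Assumes the score fits in 8 digits. */
void score_to_digits(unsigned long score, unsigned char digits[8])
{
    int i;
    for (i = 7; i >= 0; i--) {
        digits[i] = (unsigned char)(score % 10);  /* lowest digit goes last */
        score /= 10;                              /* drop that digit        */
    }
}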

 

Second, a couple people seem to think it's simplicity itself to divide the code into 4K or even 2K chunks. Would that it were so... I like using Doom as an example as I'm very familiar with Doom in general (having worked on ports to several different platforms) and Doom on the Jaguar in particular (since it forms the base code for the 32X version as well, and I'm a big 32X coder). Doom on the Jaguar divides the task of running the game between the 68000 and the GPU, with the two running roughly in parallel. The 68000 runs nearly all the game logic, while the GPU runs nearly all the rendering code. Remember that Carmack himself worked on this code. The rendering code had to be broken into NINE SEGMENTS of 4K each. The 68000 runs all the game logic, starts the first GPU segment, then waits until the third segment is done. By that time, the rendering has finished using the game variables that will be changed by the game logic (player position, etc). The 68000 then starts on the next tick's worth of game logic. Each GPU segment interrupts the 68000 to blit the next segment into local ram, with the last segment triggering the 68000 to build the next OP list. The GPU uses the blitter as it goes where it can for drawing. If the 68000 finishes the game logic before the rendering is done, it must wait on the rendering. So the 68000 has one spot where it always waits, and another spot where it MIGHT wait. It's possible the GPU might finish the rendering before the 68000 finishes the game logic (for example, your nose right up against a plain wall), and so it has one place it MIGHT wait as well.
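
If it helps, here's the shape of that handshake in rough C pseudocode - every name here is made up, and the real code is assembly driven by interrupts, so treat this purely as an illustration of where the wait points fall:

extern void gpu_start_segment(int seg);
extern int  segment_done(int seg);
extern void run_game_logic(void);
extern void build_op_list(void);
extern void blit_segment_to_local(int seg);

/* 68000 side of one frame: one guaranteed wait, one possible wait. */
void frame_68k(void)
{
    gpu_start_segment(0);        /* kick off rendering segment 1 of 9      */
    while (!segment_done(2))     /* ALWAYS waits here: segment 3 is the    */
        ;                        /* last one reading last tick's variables */
    run_game_logic();            /* now safe to move the player, etc.      */
    while (!segment_done(8))     /* MIGHT wait here if rendering is slower */
        ;
    build_op_list();             /* last segment done -> next OP list      */
}

/* Per-segment interrupt on the 68000: each finished GPU segment asks
   for the next 4K chunk of rendering code to be blitted into local ram. */
void gpu_segment_irq(int seg)
{
    if (seg < 8)
        blit_segment_to_local(seg + 1);
}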

 

So we see a few things here for a decently complex game: First, dividing the tasks between the 68000 and GPU is not trivial, and you can wind up with either or both busy looping, waiting on the other. Second, dividing the code the GPU needs to run into 4K chunks isn't trivial - forget about the 2K nonsense... unless the code IS trivial. You'll know (as the programmer) whether or not the GPU code is trivial to divide up. Third, certain tasks are easier/faster on one processor than the other, local ram or main be damned. Use the proper processor for the tasks! If you have BCD you can't code around, use the 68000!

 

The benchmark (and I use the term loosely) in the opening post is rather misleading, especially for the big conclusion drawn from it. It does a trivial task in local ram, then the same trivial task in main ram. Naturally, the local ram will blow the doors off the main ram, so it wins handily. From this one trivial result, it's then assumed that no possible task could ever be so much better in main ram as to overcome the speed of the same task in local ram... which starts by assuming the task can even FIT into the local ram. It also assumes that the task in local ram will never change, and that any speed lost in loading the local ram is trivial compared to the task. That may be true... or it may be false. It's simply ignored. It also ignores the fact that for many operations (not BCD, clearly), the GPU in main ram will be faster than the 68000. You don't need MIPS to figure that out, just look at the bandwidth - the 68000 never fetches more than a single word at a time, and the bus master doesn't cache any data for the 68000. The GPU fetches longs at a time, and the bus master is designed to optimize two long fetches at a time where possible. Unless your code deals exclusively with bytes or words, the GPU WILL be faster simply because it better utilizes the bandwidth of the main ram. This is another example of matching the task to the proper processor. If your code is fairly easy to run in parallel and/or uses lots of bytes or words, you probably want to use the 68000 instead of the GPU for that code. If the GPU needs to wait on the 68000, and the code the 68000 is running is working on mostly words and longs, it may be better to run the code on the GPU in main ram. If it fits in local memory, it's probably better to make it into another local ram segment and upload it. If it DOESN'T fit, what are you going to do? Wait on the 68000, or use the GPU in main ram?
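
To put the bandwidth point in code form, here's a toy C illustration - nothing Jaguar-specific, just the shape of the argument: the same copy expressed as word moves (all the 68000's 16-bit bus can issue per access) versus long moves like the GPU issues. Same work, half the bus transactions.

/* One 16-bit bus transaction per element - the 68000's best case. */
void copy_words(unsigned short *dst, const unsigned short *src, int n)
{
    while (n--) *dst++ = *src++;
}

/* One 32-bit transaction per element - twice the data per access. */
void copy_longs(unsigned long *dst, const unsigned long *src, int n)
{
    while (n--) *dst++ = *src++;
}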

 

My suggestion for benchmarks is this: make a trivial task that is easy to expand into different sizes. Run the task on the 68000, run the task on the GPU in main, and then upload and run the task (in N segments) on the GPU in local ram, and compare the times. Do 4K (so one segment on the GPU in local), then increasing sizes... 8K, 12K, 16K, etc, and see if there is a point at which the extra overhead on the local ram transfer makes running in main better. Maybe change the trivial task as well - make a trivial byte operation, a trivial word operation, a trivial long operation, and a trivial operation that combines all three. Those results would tell us a LOT more than the one present in the opening post.
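
Something like this, say - a rough C sketch of that matrix, with every timing function a hypothetical stand-in for whatever the real harness would do:

enum op_kind { OP_BYTE, OP_WORD, OP_LONG, OP_MIXED };

extern unsigned long time_68k(enum op_kind k, int kbytes);
extern unsigned long time_gpu_main(enum op_kind k, int kbytes);
extern unsigned long time_gpu_local(enum op_kind k, int kbytes);
extern void report(enum op_kind k, int kbytes,
                   unsigned long t_68k, unsigned long t_main, unsigned long t_local);

/* For each operand size and each code size, time all three placements. */
void bench_all(void)
{
    enum op_kind kinds[4] = { OP_BYTE, OP_WORD, OP_LONG, OP_MIXED };
    int k, kbytes;
    for (k = 0; k < 4; k++)
        for (kbytes = 4; kbytes <= 32; kbytes += 4)    /* 4K, 8K, ... 32K  */
            report(kinds[k], kbytes,
                   time_68k(kinds[k], kbytes),
                   time_gpu_main(kinds[k], kbytes),
                   time_gpu_local(kinds[k], kbytes));  /* kbytes/4 segments */
}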

  • Like 3
  • Thanks 1

And your example code + testbench + results to prove your sugar-coated theory is.... where?

 


  • Like 1

Hi, interesting stuff CW. I always assumed the Doom port would lean heavily on C code and therefore suffer greatly from the overheads of compilation. I knew Carmack rewrote chunks of it to improve speed etc, but not the full extent. Are the GPU sections actually written in assembly or compiled from C?

 

The initial test isn't meant to be a 100% conclusive piece; there are many ways to use the Jag hardware, and buried in all the randomness I have stated a few more tests I plan to run. I purposely avoided any calls to main memory so as not to further impact the performance of the GPU, wanting merely to observe the speed difference of the GPU running code - so, the most basic of operations.

 

I am interested to know what you would suggest as a trivial task, since you state that the trivial example used is of no real use initially (I hope you noticed it's not just 3 instructions being run, but in fact 1200 instructions - the most you could squeeze into GPU RAM being 2048 instructions, the majority of them 16-bit). Which also makes me wonder whether the claim of fetching 2 longs is accurate - where did you find this information? I know the GPU does some rudimentary prefetch, but as the instructions (with a few exceptions) are 16-bit, I am not sure it would fetch 2 longs. It certainly prefetches the next instruction most of the time, but that is just 16 bits, so I am not sure whether it would in fact fetch 32 bits per access rather than 16.

 

Not sure I agree that it is only trivial amounts of code that can be fitted into 4K either; you can get a substantial amount of functional, valid code into that space. It's quite surprising just how much when you sit down and write it.

 

I'd also disagree that code cannot (given the foresight) be broken up fairly easily, unless you write all your code as a single huge monolithic chunk with no procedures or logical functional breaks. My approach would be to break these functions up into discrete code units which can be loaded as required. It's annoying that there is no automated way to perform relocation of code, but with careful planning before writing the code this could be circumvented. The biggest problem as I see it would be the main loop/routine, as that tends to be the part calling all the subroutines, so it may not easily fit in amongst them; given the low computational overhead it carries (mostly just orchestrating the whole thing), it would be a viable candidate for GPU in main, as I believe others have done instead of using the slower 68K to perform these managerial tasks.
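
Something like this is what I have in mind - a bare-bones C sketch with made-up names, where each routine is assembled to the same local RAM origin, kept in main RAM, and blitted in on demand:

#define GPU_LOCAL_BASE 0x00F03000UL   /* GPU local RAM base on the Jaguar */

struct gpu_unit {
    const void  *image;   /* routine's image in main RAM / ROM */
    unsigned int size;    /* must fit local RAM: 4K or less    */
};

extern void blit_to_local(unsigned long dst, const void *src, unsigned int size);
extern void gpu_run(unsigned long entry);

/* Stage a unit and run it; every unit is linked to the same origin. */
void run_unit(const struct gpu_unit *u)
{
    blit_to_local(GPU_LOCAL_BASE, u->image, u->size);
    gpu_run(GPU_LOCAL_BASE);
}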

 

The current benchmark provided can easily be scaled to any size (that will fit). I do plan to time the impact of paging code into local RAM, but unless that takes 8-9x the time of running it in local, I do not see there ever being a point where copying code into local RAM would be slower than running it from main RAM.

 

Main RAM execution is 10x slower than local, as I stated earlier. If copying a whole 4K of code into the GPU took approximately half that time (given the blitter running 32-bit wide copies, and I am sure there would be paging benefits from sequential reads from main), the execution time including the copy would still be faster than running that same code in main RAM. Multiplying those figures out 2x, 4x, 8x... the faster option isn't suddenly going to become slower than the slower option. If the copy-and-run were slower than, or about the same speed as, running in main RAM, then yes, this might be viable.
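
Spelled out as arithmetic (using the ~10x figure from my tests, not a new measurement): paging into local only loses once the copy costs more than nine times the local execution time.

/* t_local = time to run the code from local RAM
   t_copy  = time to blit it in first
   main RAM execution ~= 10 * t_local (the ratio measured above)           */
int copy_and_run_wins(unsigned long t_local, unsigned long t_copy)
{
    return (t_copy + t_local) < (10 * t_local); /* i.e. t_copy < 9 * t_local */
}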

 

I do plan to benchmark the time taken to copy and run code, but this won't be for a few days yet as I am away from my dev setup at the moment :(

 

I'll be attempting blit-in-and-run timing, as well as measuring the impact of the blitter accessing local RAM at the same time as the GPU.

  • Like 2

First, I've seen a couple people that just constantly scream "BCD, BCD, BCD!" Yes, the 68000 is much faster at BCD than the GPU would be in main... hell, I'd bet the 68000 is faster at BCD than the GPU in LOCAL! The answer is DON'T DO BCD ON THE GPU!! There are nearly always ways to get rid of BCD operations, and when there isn't? I'd make the 68000 do the BCD operation. It's just that simple. If the 68000 can do the job faster, use it. If it can't, don't!

 

You haven't seen anything of the sort, and your answer arrives in response to no question posed. What you actually saw were people using that to highlight pretty much the same point you rather achingly laboured over yourself, i.e. that to readily state one processor is always best, that one method of getting from A to B is always faster, to be so arrogant as to assume that all programmers are created equal - this, all this and more, is utterly and profoundly arrogant, and it'd be fair to say that anyone who offered such commandments as The Book Of The Word Of The Law is no doubt more than just a wee bit self-important and short-sighted to boot.

 

Good programmers will do whatever they need to do to get from idea to release. Anyone who obsesses over the minutiae of their journey and retreads the same little paths over and over is only increasing the odds of never arriving at their destination... they might have a lot of fun, they might discover some amazing techniques and methods, but that's not going to get them over the line any time soon... and they can sit and theorise 'til they're old and grey, but if that ends up being the sum total of their efforts, what's, to be blunt, the fecking point?

 

 

 

 

Now, for a more flavourful if mincey analysis, I'm going to have to hand over to Mr. Robinson.

 

Just gotta toss my own two cents in...

 

Though tish ah, nay, hush and fourpence, Mr Willy.

 

First, I've seen a couple people that just constantly scream "BCD, BCD, BCD!"

 

Ah, hish and tusk dear boy, hyperbolism, oh how it adds such gravitas to the intercourse. Bravo, dear boy. Braaaa. Vo. An additional point for such adept, nay adroit, amplification. *ping*

 

Second, a couple people seem to think it's simplicity itself to divide the code into 4K or even 2K chunks. Would that it were so...

Ah, would that it were, dear boy. Would. That. It. Were.

 

 

 

 

  • Like 1

maybe i missed something, but why this ugly flaming all the time?

let him tell his thoughts and everyone can think about it by himself

 

i think he got his experience with consoles, coding, doom for example.. and in the context of the jaguar doom source his explanations sound informative

never forget doom is one of the best (or the best) game(s) on the jaguar

so what's wrong with his thoughts?

 

still on the search for arrogance in his post.. nope.. nothing

  • Like 2


 

You're going to have to quote comments directly, because what you wrote suggests either that you misunderstood much of what was said or that you are suffering a language-barrier issue.

 

Nobody claimed Willy was arrogant; the arrogance is on the part of anyone who issues edicts of Thou Shalt or Thou Shalt Not and claims that "there's only one way, my way, do it like me or do it wrong".

 

Everyone is entitled to their thoughts, but if they base them on unsound or unfair assumptions then they can expect this to be highlighted. That's part and parcel of discourse. If you see that as ugly flaming, that's your issue, not that of anyone else.

 

If in fact you are referring to JagChris and his input to this thread, then you can rest assured that he barely understands a morsel of what he utters, and whatever it is that compels him to insist on repeatedly regurgitating the indoctrination ingrained in every fibre of his being would take expert cross-examination to extract and quantify. Such offerings themselves are more akin to "flaming", as they amount to little more than irritating noise at best and shameless trolling at worst.


yes whatever, but it would be nice to calm down and come back to the topic

 

you know people click this forum entry because it's "GPU in Main, Science!"

sadly half of all the posts are personal flaming and propaganda about what should be done or not.

don't get me wrong, i totally agree with you that every coder has got to make his own decisions when creating his solutions.

but it's just nice to see such thoughts, benches or tests. even if they make no sense, they are maybe good to know.

i really like chilly's and linko's posts cause they do, or do try to, go into technical details.

 

:thumbsup: :thumbsup: :thumbsup:

 

greetings and peace to you all

  • Like 1


 

Well exactly, I agree 100%.

 

I requested all the crap in this thread that isn't on topic be thrown by the wayside because otherwise nobody is going to want to sift through the rest to find one or two lines of information and worthwhile opinion.

 

But apparently that's like asking a moderator to indulge in self-harm. Which is a shame, because programming threads are very different to the regular chat threads and should be handled with more care and attention than the usual waving of sticks and hammers.


If in fact you are referring to JagChris and his input to this thread, then you can rest assured that he barely understands a morsel of what he utters, and whatever it is that compels him to insist on repeatedly regurgitating the indoctrination ingrained in every fibre of his being would take expert cross-examination to extract and quantify. Such offerings themselves are more akin to "flaming", as they amount to little more than irritating noise at best and shameless trolling at worst.

 

Everyone can rest assured that you are full of crap, sh3. And I do believe your hugely disrespectful response to Chilly Willy constituted trolling far more than anything I put on here.


Oh, I'm sorry. Let me clear that up. 'Everyone' referring to everyone outside the Reboot clique who can plainly see that sh3 just insulted a programmer with more skill than the entire Reboot crew put together.

 

'Everyone' - seems to me you're the only one rambling in this thread, with everyone else telling you to stfu.

 

We're back to that comprehension thing again ;)


Sniff.. Nope, doesn't smell like JS2 in here...

 

If I thought you'd understood anything at all in this thread I'd be offended :)

 

Oh, I'm sorry. Let me clear that up. 'Everyone' referring to everyone outside the Reboot clique who can plainly see that sh3 just insulted a programmer with more skill than the entire Reboot crew put together.

 

  • Like 1

Everyone can rest assured that you are full of crap, sh3. And I do believe your hugely disrespectful response to Chilly Willy constituted trolling far more than anything I put on here.

 

oxoxo (hugs and kisses Christopher)

 

Oh, I'm sorry. Let me clear that up. 'Everyone' referring to everyone outside the Reboot clique who can plainly see that sh3 just insulted a programmer with more skill than the entire Reboot crew put together.

 

Your opinion, Christopher, it amounts to nothing. One day you will come to understand this and maybe you'll go play some games or something, occupy yourself with matters that concern you and are within the bounds of your ability and understanding. Nobody is interested in your league table of homebrewer ability... bonkers.

 

If Chilly Willy is offended by me pointing out that he made an incorrect assumption when he made those "BCD! BCD! BCD! BCD!" comments, that's his call to make; I'm fairly certain he doesn't need the likes of yourself attempting tit-for-tat name-calling at any rate. If you think the stuff in the spoiler tag was offensive, you need to go read up on Call My Bluff, Stephen Fry, Hugh Laurie, the Robert Robinsons and British humour in general. Then climb down from that high horse; Dobbin is getting tired of you mindlessly rocking back and forth all day...

 

 

jumpinghorse.gif

 

  • Like 3

Oh, I'm sorry. Let me clear that up. 'Everyone' referring to everyone outside the Reboot clique who can plainly see that sh3 just insulted a programmer with more skill than the entire Reboot crew put together.

 

 

JC, please can you stop with this petty arguing and name-calling! Your 'contributions' to this thread have been pretty much nothing more than attempts to start name-calling and flamewars! YOU are the one doing the name-calling! I did not start this thread with the intention of trolling or flaming; you seem to have come to it with nothing BUT that on your agenda! If you have nothing to add to this discussion other than this, can I please ask you not to post it here; it's doing nothing constructive. I only hope those individuals who have an interest in this topic have not been driven away by your antics.

 

Just think: if we had technical discussions that didn't involve random trolling, perhaps there would be more active devs, and people who are struggling along on their own would not be worried about asking a question for fear of the Coder Police chastising them for daring to think different. Leading to more releases, more fun, and more happy times!

 

Now, drop it!

  • Like 6

Hi, interesting stuff CW. I always assumed the Doom port would lean heavily on C code and therefore suffer greatly from the overheads of compilation. I knew Carmack rewrote chunks of it to improve speed etc, but not the full extent. Are the GPU sections actually written in assembly or compiled from C?

 

It's compiled from C using the crappy gcc JRISC port, then fixed and optimized by hand... except for a few routines that were easy enough to do by hand. So that does factor a bit into the trouble dividing the code into 4K blocks... we all know that C isn't terribly good compared to hand-written assembly except for trivial cases.

 

 

 

The initial test isn't meant to be a 100% conclusive piece; there are many ways to use the Jag hardware, and buried in all the randomness I have stated a few more tests I plan to run. I purposely avoided any calls to main memory so as not to further impact the performance of the GPU, wanting merely to observe the speed difference of the GPU running code - so, the most basic of operations.

 

I am interested to know what you would suggest as a trivial task, since you state that the trivial example used is of no real use initially (I hope you noticed it's not just 3 instructions being run, but in fact 1200 instructions - the most you could squeeze into GPU RAM being 2048 instructions, the majority of them 16-bit). Which also makes me wonder whether the claim of fetching 2 longs is accurate - where did you find this information? I know the GPU does some rudimentary prefetch, but as the instructions (with a few exceptions) are 16-bit, I am not sure it would fetch 2 longs. It certainly prefetches the next instruction most of the time, but that is just 16 bits, so I am not sure whether it would in fact fetch 32 bits per access rather than 16.

 

The hardware manual goes into detail on the bus operation for each of the units. You need that info to avoid wasting bus bandwidth. And I think I goofed on that slightly - it's 2 QUADS, not longs, as it's based on the basic DRAM cycle of one 64-bit access followed by a 64-bit fast page mode access. The bus controller tries to maximize the usage of that bandwidth for the BLITTER, OP, and GPU since those will be the bottlenecks on anything substantial. You'll have the GPU crunching the numbers, the BLITTER drawing the data, and the OP drawing the screen, all running at the same time from the DRAM. The unified memory architecture is the main limitation of the Jaguar (my opinion). The 32X has one minor advantage in that the 68000 can run from the MegaDrive work ram independent of the SH2s. You want to run the 68000 in that work ram to keep it off the bus. Any time the 68000 accesses the rom bus, it stops any access to the rom by either SH2 for the entire 68000 bus cycle (up to 15 SH2 cycles). You want the SH2s to cache the rom accesses since cache is much faster than rom, and try not to flood the cache (make large data arrays non-cacheable).
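
For the curious, the "make large data arrays non-cacheable" bit looks roughly like this in C on the 32X - the SH2 mirrors the address space as a cache-through region at bit 29, so reading a big one-pass array through that mirror leaves the cache free for hot code and data. The macro name is mine, not from any SDK:

/* SH2 cache-through mirror: same memory, bypassing the cache. */
#define CACHE_THROUGH(p) ((void *)((unsigned long)(p) | 0x20000000UL))

unsigned long sum_table_once(const unsigned long *table, int n)
{
    const unsigned long *p = (const unsigned long *)CACHE_THROUGH(table);
    unsigned long sum = 0;
    while (n--)
        sum += *p++;    /* streams the data without polluting the cache */
    return sum;
}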

 

Back to the Jaguar - your test is a decent size to test one bank of code staying resident in local ram. I'd like to see something more like Doom - several banks of code loaded and run one after the other. I'd also like to see a test of code run on the 68000 compared to the same code (allowing for differences in the processors) running on the GPU in main. The idea for improving performance by running the GPU in main is that some things are run on the 68000 during a period when the GPU is just waiting on the 68000. If the same code could be run on the GPU in main (leaving the local ram with the tight loop code that consumes the most time), you could eliminate some of the time the GPU spends busy looping, and spend less time overall, given that the GPU in main SHOULD be faster than the 68000 on the same code. I hope that makes sense.

 

 

 

Not sure I agree that it is only trivial amounts of code that can be fitted into 4K either; you can get a substantial amount of functional, valid code into that space. It's quite surprising just how much when you sit down and write it.

 

Well, yes, it's a lot - but for code hand-written in assembly. And that's why most Jaguar games I've seen the code to are almost all 68000 for the main logic (either assembly or compiled from C), and hand-written assembly for the JRISC. It requires a lot of effort by the programmer to fit the game into that format, which is why ports tended to be pretty scarce on the Jaguar.

 

 

 

I'd also disagree that code cannot (given the foresight) be broken up fairly easily, unless you write all your code as a single huge monolithic chunk with no procedures or logical functional breaks. My approach would be to break these functions up into discrete code units which can be loaded as required. It's annoying that there is no automated way to perform relocation of code, but with careful planning before writing the code this could be circumvented. The biggest problem as I see it would be the main loop/routine, as that tends to be the part calling all the subroutines, so it may not easily fit in amongst them; given the low computational overhead it carries (mostly just orchestrating the whole thing), it would be a viable candidate for GPU in main, as I believe others have done instead of using the slower 68K to perform these managerial tasks.

 

I look at most of these from the point of view of ports since that's most of what I work on. A port of an existing game (particularly one that's all C) will be rather hard to break up into pieces. Doom is a pretty good example of the work it would take to do it in a way as to utilize all parts fairly evenly. If you write the game from scratch with the Jaguar limits in mind, it's not as big a deal since you would be thinking about these limits while writing the code.

 

 

I do plan to benchmark the time taken to copy and run code, but this won't be for a few days yet as I am away from my dev setup at the moment :(

 

I'll be attempting blit-in-and-run timing, as well as measuring the impact of the blitter accessing local RAM at the same time as the GPU.

I put my name down for one of the next skunkboard runs, so hopefully that occurs soon. I'd love to work on some Jaguar homebrew. :D

  • Like 1
