
Could the Jag Do Decent Ports of NeoGeo and CPS-2


christo930


DML is not seeing the Falcon's DSP as a RISC. He is having trouble with his quadratic texturing on it.

 

For this [Quake engine], the Falcon's blitter is a paperweight, yes. It is far less sophisticated than the Jaguar blitter - it is a two-source, integer-addressing block transfer device with some logic operations and bitplane scrolling support. Aside from a few interesting quirks and tricks which have been found to work with it, the blitter is really not capable of 3D work, except where it is needed to fill flat colour or copy contiguous data in rows or rectangles.

 

The Jaguar blitter is a lot faster, has a wider bus, has fixed-point addressing IIRC, and I'm sure I had it affine-texturing cubes back in the day as one of my early tests with the devkit.

 

The Jag is definitely much better set up for data bandwidth and pixel pushing.

The 68000 is slower than the Falcon's 68030, but unlike the Falcon nearly everything could be split between DSP and GPU, with enough effort.

The Jag bus is much wider and faster than the Falcon's 16bit bus. The Falcon's direct DSP/CPU exchanges must bottleneck through a narrow 8-bit wide port - the DSP has no direct access to system RAM.

The Jag DSP has some issues accessing system RAM as well, but it is still much faster than this even with the bugs.

Main problems with the Jag are the effort involved in offloading the bulk engine code from the 68k (managing overlays on the GPU), and general lack of system RAM. -DML

 

http://www.atari-forum.com/viewtopic.php?f=68&t=26775&p=261628&hilit=Risc#p26162

 

For this to be efficient, you really need a RISC device with a multiply-accumulate and fast shifting capability. Or at the very least, a very fast multiplier and careful coding. Unfortunately the Falcon's DSP is terrible at shifting and does present some problems of its own here, getting it to work fast. Left as an exercise for the reader ;)

Edited by JagChris

He's confirming my Blitter point - for flatshading purposes, all we need is for Blitter to fill flat color. Nothing else.

 

What I don't know, however, is if Falcon's Blitter is fully parallel, or if main execution on 68030 halts, while Blitter is accessing RAM (it might be limited in number of data ports, just like jag is).

 

Because it's OK, if Blitter is slower and has only 16-bit access. As long as it is working in parallel, engine can keep working on other stuff, while Blitter is filling scanlines.

 

 

The Falcon's direct DSP/CPU exchanges must bottleneck through a narrow 8-bit wide port - the DSP has no direct access to system RAM.

We'd have to see some schematics. Is Falcon really crippled like that ? Down to 8-bit transfer ?


To end the speculations:

 

Data can be transferred to and from the Falcon DSP using either the sound interface (which limits you to the amount used by ~50kHz 16bit stereo samples) or the host port.

Host port is indeed 8bit internally but accessed by writing 24bit words. It's been a while, but I seem to remember the speed being close to 50% of that of the ST-Ram, so ~2-3MB/s. The host port doesn't allow for DMA, so both the DSP and the host are 100% busy during a transfer.

 

The FPU doesn't really run in parallel with anything on the 030 so more or less useless for speed, only good for accuracy.

 

I believe the blitter can run in parallel with the host to some extent assuming the code and data used by the host is cached.

 

If the host is an 040 or 060 then everything changes and the DSP is more or less useless for anything but audio even though it runs in parallel because of the time it takes to transfer data across the host port.


Thanks!

 

Looks like we've actually been pretty lucky on a jag to get full 16-bit bus on DSP (with 32-bit access auto split) then !

 

So, the FPU then needs babysitting via host ? Meaning, value by value ? That would indeed make it totally useless for performance....

 

Oh, well...


Will do.

In the meantime there is no evidence out there that lb for lb the Jag won't smoke the Falcon.

While everyone is waiting we can look at the available evidence.

Towers II, letterboxed on the Falcon compared to full screen on Jag.

Doom, quick and dirty port on the Jag, 15fps fullscreen.

After years of work on Falcon, Doom is 9-13fps, letterboxed.

DML gave his view on the 2 versions:

 

'Yeah it's hard to compare Doom on the two systems because they aren't really 1:1. Jag game code was rewritten by ID to make it run fast on a console - the Falcon one uses ported PC code with F030 bandages applied and tickrate throttled. The maps are also quite different, and vary dramatically in expense. The jag ones were selected down to work well on it.

 

 

For the record - this is the Falcon version running the full game code, but with the AI 'paused' to kill some of the expense.

 

https://www.youtube.com/watch?v=zkW_W3u3Q-s

 

The window is not full size, but it is using full resolution pixels (not chunky columns, like the Jaguar) so the rendering width is a good bit larger than the Jag, while height is smaller. I don't have a video of it using chunky columns at full size but the speed is similar. The raw viewer runs faster still, since it's doing nothing but drawing and isn't capped at 12Hz.

 

 

The Jaguar really has the edge since it can hardware-blit textured spans and columns in a single shot (or GPU them direct to the framebuffer). The Falcon can't do either of those things. Still they aren't *so* far apart in performance despite this.'

 

 

Very unbiased look and he details the differences in a lot more depth than just frame rates.


Ok. Are you implying his previous comments are biased?

 

Yes, it was modified to run on the Jag. A console. But it wasn't rewritten for the Jag. It's still the PC engine at its core.

 

If they were rewritten for their targets you'd get a big boost in performance from both.

 

No, they aren't that far apart in performance: 13 frames per second versus 15. And the Falcon port has a lot of nice things, transparent water etc.

Edited by JagChris

a. Jagchris, DML is saying he would need a shifter on the DSP that the falcon's DSP doesn't have and not that the DSP isn't RISC (Reduced Instruction Set btw, since obviously you don't know what it means).

b. Vladr, IIRC just like the STE, the Falcon's blitter has two modes: Hog and Multiplexed (I think that's what it's called). In the first, the blitter takes over the bus until you return it to the CPU, while the CPU can work with its caches. In the second mode, bus time is switched between the blitter and the CPU every 64 cycles. The Falcon's blitter was mostly put there for two reasons: compatibility with the STE and to speed up graphical operations on the desktop. The consensus in the early days was that it was useless for games and demos apart from some edge cases.

c. You are all playing ST games and the falcon does that better than the jaguar.


@JagChris:

 

I'm saying your posts are certainly very 'curious' in terms of how you're representing things.

 

You compare Doom on Falcon and Jaguar in the most basic manner, making no attempt to explain differences beyond the frame rate of the 2...

 

dml however does...

 

 

In my earlier post about Jaguar Robinson's Requiem.. i made clear the Jaguar version is done in polygons because the hardware isn't suited to the PC Voxel Engine...

 

This was simply going off comments made during the French interview with the coder..which as i again made clear was the result of a 30 second Google search.

 

You had talked about developers making claims, but at that point hadn't put up any developer quotes,even though you had interviewed the coder yourself..

 

 

 

You start talking about Fallen Angel as being great example of a Voxel Engine on the Jaguar when you know damn well it's not using a true voxel engine..

 

I have seen claims Phase Zero isn't using a true voxel engine either, but i leave that to the tech heads to confirm or disprove.

 

This is from a post you yourself put up:

 

'2. Can you tell us about this 'voxel-ish' engine. My understanding is there are two or so different types of pixel height mapping engines that aren't quite 'voxel'. Can you elaborate on what this is? Is it true voxel? Is this your first engine of this type? How well does the Jag seem to lend itself to this type of engine?

 

Quote

It isn't a true voxel engine. It is more a kind of height-mapping engine. I don't know much about true voxel engines but the few I've seen required modern computers (>1GHz processors, 100s of MB of RAM). Early 90's machines would do very blocky stuff.

For the kind of engine I implemented, I guess the Jaguar compares well to its contemporaries (486 DX PC, 3DO...).'

 

 

 

http://atariage.com/forums/topic/207020-dr-typo-interview-fallen-angels/

 

 

So i am very curious as to why you're making such claims, when it's clear you had the information to hand..

 

So, are you talking about PC Voxel engines, Jaguar Height Mapping engines, or what here when trying to compare the same games running on Jaguar and Falcon?

 

You've gone from talking about Jaguar smoking the performance of Falcon versions of games to now admitting the frame rate difference in Doom is minimal and Falcon version has some nice effects..

 

Plus Jaguar Robinson's Requiem has compromises made to 2D elements like the trees compared to the Falcon version.

 

And you're straying into the 'if they were rewritten...' realm.

 

Fact is Carmack didn't rewrite Doom from the ground up for Jaguar hardware, so we have never seen just what could have been done.

 

I've yet to see anyone do any Quake engine type tests on Jaguar, despite some B.S. nudge-nudge claims posted alongside Ultimate Future Faces' B.S. claim that Quake on Jaguar was 30% complete, posted up in lost games threads etc. online over the years.

 

So at this point it's yet more speculation..and Apple and Orange type comparisons.

Edited by Lost Dragon

Yeah LD i interviewed him too. They changed it because they didn't like the look of the voxels.

http://www.3do.cdinteractive.co.uk/viewtopic.php?f=14&t=3375&hilit=Cogordan

It's a good thing you've found out the system wasn't suited for voxels. Don't tell those who enjoy Fallen Angels or Phase Zero.

You guys ignore evidence right in front of everyone's faces for years and then call those who don't imbeciles.

Not gonna dig up the numerous statements Carmack has made that it is the PC DOOM engine bolted into the Jaguar.

Phase Zero developers talking:

 

 

'Jeremy: The graphics are based on our nifty spin on height fields...

R.I.P.: Height Fields?

Jeremy: Height fields are basically like large maps,

where every square on the map has its own height... It's very similar to the effect used in games like CyberRace or Comanche on the PC.

Paul: Every pixel on the map has a color and a height, and they are independent.

Jeremy: The effect is a very highly detailed landscape that has lots of little cragglies.

The neat thing about this particular implementation (on the Jaguar) is that we are using lots of colors, and the maps are large.'

 

https://www.unseen64.net/2010/06/05/phase-zero-jaguar-cancelled/
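
For readers unfamiliar with the term, here is a minimal sketch of the general height-field idea described in that interview: every map cell has an independent color and height, and each screen column is drawn front to back. This is not the Phase Zero or Fallen Angels code. The projection, the map wrap-around and every name and parameter below are assumptions made for illustration.

```python
def render_column(heightmap, colormap, cam_x, cam_y, cam_height,
                  dir_x, dir_y, screen_height, horizon, scale, max_distance):
    """Return the colors (top row first) of one screen column.

    heightmap and colormap are square 2D lists of the same size.
    Rows that stay None were never covered by terrain (sky).
    """
    column = [None] * screen_height
    lowest_drawn = screen_height          # rows below this are already filled
    size = len(heightmap)
    for distance in range(1, max_distance):
        # Step along the view ray; the map wraps around at its edges.
        map_x = int(cam_x + dir_x * distance) % size
        map_y = int(cam_y + dir_y * distance) % size
        terrain = heightmap[map_y][map_x]
        # Simple perspective: farther terrain projects closer to the horizon.
        row = int((cam_height - terrain) / distance * scale + horizon)
        row = max(row, 0)
        if row < lowest_drawn:
            # Only the part rising above what is already drawn is visible.
            for y in range(row, lowest_drawn):
                column[y] = colormap[map_y][map_x]
            lowest_drawn = row
        if lowest_drawn == 0:
            break                          # column is full
    return column


# Flat 8x8 map, height 0 everywhere, color index 7.
flat_heights = [[0] * 8 for _ in range(8)]
flat_colors = [[7] * 8 for _ in range(8)]
col = render_column(flat_heights, flat_colors, 0, 0, 10, 1.0, 0.0,
                    screen_height=20, horizon=5, scale=10, max_distance=50)
```

The inner loop is nothing more than a per-column vertical fill, which is why this style of terrain was within reach of early 90s hardware.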

 

Again to my layman's eyes they are talking about coding an engine on the Jaguar which creates an effect similar to the effect of a Voxel Engine used on the PC,just as Fallen Angels creates a similar effect.

 

These aren't the same type of voxel engines which are used in the PC games you're referencing.

 

But engines written to make use of Jaguar hardware...

 

It's like me saying Sega Saturn wasn't suited to the Quake Engine on PC and you saying... try telling that to Lobotomy who brought Quake out on it.

 

Coders are creating engines to replicate what was done on PC on console hardware.

 

Going off what i've read..coder of Jaguar Robinson's Requiem wasn't happy with trying to get the PC Voxel Engine running well on Jaguar hardware, it looked visually messy..so ditched it, used a polygon 3D engine instead, which offered cleaner visuals,but at expense of draw distance and simplified 2D objects..

Edited by Lost Dragon

Silmarils' Louis Marie talking about Robinson's Requiem:

 

'Scenery is displayed in a technique similar to voxel technology..

 

Here, the images are created within the computer's chip memory in real time, as opposed to being preprogrammed graphic images. Backgrounds are generated depending entirely on the player's point of view'.

 

So,yet again the words SIMILAR TO VOXEL TECHNOLOGY...are used.

 

And the Jaguar version was not the only version to use a polygon engine..was it?

 

The Amiga version used far simpler polygons for the terrain, rather than the texture mapped landscaping seen in PC version..

 

It worked well enough walking on a flat plain but that was about it.

 

But,it was simply yet again a case of finding a solution on different hardware to replicate look of PC version.


b. Vladr, IIRC just like the STE, the Falcon's blitter has two modes: Hog and Multiplexed (I think that's what it's called). In the first, the blitter takes over the bus until you return it to the CPU, while the CPU can work with its caches. In the second mode, bus time is switched between the blitter and the CPU every 64 cycles. The Falcon's blitter was mostly put there for two reasons: compatibility with the STE and to speed up graphical operations on the desktop. The consensus in the early days was that it was useless for games and demos apart from some edge cases.

 

OK, so it is then possible to do the same thing I'm doing on jag with flatshading - initiate the scanline blit and instead of waiting for blitter to finish, rather continue computing endpoints of next scanline in parallel. Depending on length of scanline, blitter might or might not be finished when I have new endpoints computed on 68030.
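
The gain from overlapping can be estimated with a rough two-stage pipeline model. The cycle counts below are invented, the function names are hypothetical, and the model assumes the CPU loses nothing to bus contention while the blitter runs (i.e. the traversal code runs from cache), which is the optimistic case.

```python
def serial_cycles(compute, blit):
    """CPU waits for every blit to finish before computing the next scanline."""
    return sum(compute) + sum(blit)


def overlapped_cycles(compute, blit):
    """CPU starts blit i, then computes the endpoints of scanline i+1.

    The next blit cannot start until both the previous blit and the
    endpoint computation are done, so each step costs the larger of the two.
    """
    if not compute:
        return 0
    total = compute[0]                    # endpoints of the first scanline
    for i in range(len(blit)):
        next_compute = compute[i + 1] if i + 1 < len(compute) else 0
        total += max(blit[i], next_compute)
    return total


# Four scanlines: fixed endpoint cost, blit cost grows with span length.
compute = [40, 40, 40, 40]
blit = [10, 60, 30, 80]
print(serial_cycles(compute, blit))       # 340
print(overlapped_cycles(compute, blit))   # 260
```

Short spans hide entirely behind the endpoint computation; long spans leave the CPU waiting, which matches the "might or might not be finished" remark above.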

 

The only problem I see is 256 Bytes of cache. My scanline traversal code is much bigger than that. Then again, I have all 4 combinations there unrolled for max speed.

 

I suppose, prior to doing the blit, I could just fill the cache with the unrolled combination specific to the current triangle. That code should fit within 256 Bytes, as it's just 25% of the original.

 

Also, we're talking 68030 here, not RISC. I've seen so many cases recently, where a page of code on RISC is, like, 3-4 lines on 68000 :)


Thanks!

 

Looks like we've actually been pretty lucky on a jag to get full 16-bit bus on DSP (with 32-bit access auto split) then !

 

So, the FPU then needs babysitting via host ? Meaning, value by value ? That would indeed make it totally useless for performance....

 

Oh, well...

 

 

Falcon FPU (68882) is really no different from FPUs in other computers. You can't upload code to it, instructions are part of the regular instruction stream for the CPU.


 

 

To get the best 3D flat shaded performance out of a Falcon you'll probably want to have the DSP rasterise triangles to an internal frame buffer and then copy that over the host port to main memory.

That way you do all the processing on the DSP and pay a fixed (but large) cost every frame for transferring data across the host port. If you're targeting 320x200 in 4 bitplanes you're looking at moving 32kB of data from DSP RAM (out of its 96kB, or 32k words where each DSP word is 24 bits) across the host port each frame. At 2.5MB/s you'll probably achieve ~40kB per 60Hz frame, so it'll take almost a frame to get the data into main RAM, which the Videl can display.

 

If you're happy with 30fps you have one 60Hz frame on both host CPU and DSP running in parallel.
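
The arithmetic in this post can be checked with a few lines. The 2.5MB/s host-port rate is the figure recalled in the thread, not a measurement, and the function names are made up for illustration.

```python
def framebuffer_bytes(width, height, bitplanes):
    """Size of a planar frame buffer: one bit per pixel per plane."""
    return width * height * bitplanes // 8


def bytes_per_refresh(rate_bytes_per_s, refresh_hz):
    """How much data the host port can move during one display refresh."""
    return rate_bytes_per_s / refresh_hz


frame = framebuffer_bytes(320, 200, 4)       # 32000 bytes
budget = bytes_per_refresh(2_500_000, 60)    # roughly 41.7 kB per 60Hz refresh
refreshes_needed = frame / budget            # roughly 0.77 of one refresh

print(frame, round(budget), round(refreshes_needed, 2))   # 32000 41667 0.77
```

So the copy eats about three quarters of one 60Hz refresh, leaving the second refresh of a 30fps frame for actual work.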


So, DSP on Falcon has fast access to 96 KB ? I thought internal program RAM was just 1 KB (512 words). From top of my memory, there's P,X,Y section. Some of that is mirrored and gives unique storage (good for circular buffers).

I suppose the 512K is for P, and there should be 128-256 available for both X,Y, right ?

The external data bus is supposed to be zero waitstate, correct ?

 

Have you tried the slower SSI transfer ? I think I read you could program the Blitter as a DMA device which should in theory (normally Falcon Blitter cannot access this, correct?) enable transfer from those 96 KB to main RAM in parallel while DSP is running.

 

I would gladly take the slower transfer which runs in parallel and enables sort-of multithreaded design. Within 96 KB, one could easily allocate portion for old frame (that would be blitted in parallel) and new frame, that would be processed by DSP code.
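
Whether two frames fit depends on how the frame data is packed into 24-bit DSP words. A quick check, assuming the 320x200 4-bitplane frame and the 32k-word (96 KB) figure quoted earlier in the thread, and two hypothetical packings (three bytes per word, or one 16-bit value per word):

```python
DSP_WORDS = 32 * 1024                 # 24-bit words, i.e. 96 KB, per the thread
FRAME_BYTES = 320 * 200 * 4 // 8      # 32000 bytes for 4 bitplanes


def words_for_frame(frame_bytes, bytes_per_word):
    """DSP words needed when each word carries bytes_per_word bytes of pixels."""
    return -(-frame_bytes // bytes_per_word)   # ceiling division


packed = words_for_frame(FRAME_BYTES, 3)   # 10667 words, all 24 bits used
loose = words_for_frame(FRAME_BYTES, 2)    # 16000 words, 16 bits used per word

print(2 * packed, 2 * loose, DSP_WORDS)    # 21334 32000 32768
```

Two frames fit either way, but with 16 bits per word only 768 words remain for everything else, so the packing choice matters.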

 

If you're happy with 30fps you have one 60Hz frame on both host CPU and DSP running in parallel.

30 fps would be fine, really. But doesn't look easily accessible without some serious yoda yoga...

 

Correct me if I'm wrong, but the easiest way to get a parallel data transfer on Falcon is via Blitter driven from 68030, no ?

 

So, this looks like one needs to create a hybrid 3D engine, where the final composite would be:

1. Portion of 3D scene done on DSP and transferred either via SSI or host interface

2. Portion of 3D scene done on 68030, but using Blitter

 

This would require careful benchmarking and being able to split the load dynamically between the two, on a frame-by-frame basis, without introducing too much of waiting on either side.


Yes, we're not questioning the 2D powah of jag. We're trying to assess how close in 3D engine Falcon is.

 

 

Well, we're not going to attempt 3D rasterizing in highres <768x200 - 1536x200> on Falcon, that's for sure.

 

But, even 16-bit bus is usable in something like 256x200 - 320x200. Hell, DML is using 256x128 framebuffer. 16-bit is adequate there.

 

Let's not forget that I have (via compile-time switch) options to switch between 8/16/32-bit blitting (on the SW rasterizer codepath), so I know very well the impact of just 16-bit writes.

 

And it's not as drastic as people make it out to be.

 

 

Not necessarily :)

While at this time I don't want another distraction from jag coding, I will unbox my Falcon next year and port my 3D engine there.

 

When I last checked the DSP 56000, the instruction set is (obviously) very similar (it's a RISC after all). My 68000 code should run straight without any modifications on 68030.

 

So, the port should be very straightforward. 2-3 weeks at most, I suspect, after I get the build environment up&running.

 

 

Then we'll have exact, fully comparable, benchmarks on differences between the two platforms in 3D arena :)

 

You know what the Motorola 56K DSP really needs for Falcon030 fans? A DSP-whisperer that can coax it into playing back audio made for other audio chips like the Atari TIA, Atari POKEY, MOS SID, the YM2149, the YM2151, the Amiga PAULA, and the DAC Williams used in a lot of their arcade games. Like what Cyrano Jones has been able to do with the Jag's less powerful proprietary DSP. And then the Falcon030 could have its pick of the various audio options in ST Games as Cyrano's conversions to the Jag feature. Hell, throw in the YM chip that the Sega Genesis/MegaDrive uses and you've got almost all your bases covered. Okay, forgot about the Ensoniq chip the Apple IIgs and the SNES used [not to mention the Atari Panther would've used it too]...


Have you seen his latest raymarching demos ?

 

I would argue that he pushes the Falcon HW pretty darn well :)

 

But you are correct - the more people are active, the more are HW limits pushed.

 

Patience, my friend :)

 

Any particular links? I saw one of the Sharp X68000 shooters ported to the Falcon030 with some of the sound retained and it looked spectacular.

Edited by Lynxpro

a. Jagchris, DML is saying he would need a shifter on the DSP that the falcon's DSP doesn't have and not that the DSP isn't RISC (Reduced Instruction Set btw, since obviously you don't know what it means)

Yes, I know what it means, but when I looked it up it's not described as a RISC on Wikipedia. So, honest mistake.


 

You know what the Motorola 56K DSP really needs for Falcon030 fans? A DSP-whisperer that can coax it into playing back audio made for other audio chips like the Atari TIA, Atari POKEY, MOS SID, the YM2149, the YM2151, the Amiga PAULA, and the DAC Williams used in a lot of their arcade games. Like what Cyrano Jones has been able to do with the Jag's less powerful proprietary DSP. And then the Falcon030 could have its pick of the various audio options in ST Games as Cyrano's conversions to the Jag feature. Hell, throw in the YM chip that the Sega Genesis/MegaDrive uses and you've got almost all your bases covered. Okay, forgot about the Ensoniq chip the Apple IIgs and the SNES used [not to mention the Atari Panther would've used it too]...

Well there is such a thing. There is jam player that plays many of these formats.

 

@jagchris.. you might actually be right. The 56k seems to have quite a rich instruction set, but it also has some RISC features. However, i looked at the manual and there is no mention of RISC.


So i am very curious as to why your making such claims, when it's clear you had the information to hand..


 

Height maps are the baby brother of voxels! And Voxel is quicker to spell. But below a certain power range I assumed that everyone assumed the former rather than the latter. Not even Cyril was put off by the word use.


@Jagchris:Sorry but whilst Voxel might be quicker to spell..that's a bit of a cop out excuse.

 

You came steaming in talking about the Voxel landscape used by PC Robinson's Requiem. ..it doesn't use one...just something similar...If the quote given by developer is correct..

 

Then wax lyrical about Jaguar Hardware being well suited to Voxel engines and cite Phase Zero and Fallen Angels as key examples..again neither use Voxel engines..

 

 

Then say likes of CJ and myself ignore the evidence in front of everyone's faces...

 

And now say you simply assumed everyone knew you were talking about height maps..not voxel engines..

 

Really?...

 

It's fine to admit you were mistaken and you got confused over type of engines used and why...

Edited by Lost Dragon
