
What could have saved the Jag?


Tommywilley84


8 hours ago, laymanpigeon said:

Since Carmack complained about Tom's inability to cache textures.

I am not sure about id Software's intentions back then, but I strongly suspect quick money. Or maybe Atari pushed them, hoping for a "killer" game for the Jaguar.

This seems to me the only reason they reused their C code instead of writing a proper assembly version.

I often hear coders (or rather software developers, as they are called today) complain about bad hardware, slow hardware, hardware missing certain features. But I tell them: this is your hardware, you cannot change it, so change the software. If the CPU cannot handle floats in hardware, do not use them. If the hardware can only run at full speed from a 4 KB memory, do not use textures. If one of the processors is hogging the bus, stop it.

 


25 minutes ago, 42bs said:

JagMod wrote that a loop like this

int i;
for (i = 0; i < 10;++i)

would never exit.

Huh. I tried this:

 

void test(int val)
{
    val++;
}

void func(void) {
    int i;
    for (i = 0; i < 10;++i) test(i);
}

 

And got this when compiling with -mGPU -O2 -fomit-frame-pointer:

 

;GCC for Atari Jaguar GPU/DSP (Jun 12 1995) (C)1994-95 Brainstorm
        MACRO   _RTS
        load    (ST),TMP
        jump    T,(TMP)
        addqt   #4,ST   ;rts
        ENDM
_forloop_start::
        .GPU
        .ORG    $F03000
ST      .REGEQU r18
TMP     .REGEQU r16
GT      .CCDEF  $15
gcc2_compiled_for_madmac:
        ;(.TEXT)
        .EVEN
_test::
        _RTS
        .EVEN
_func::
        subqt   #12,ST
        move    ST,r14
        store   r19,(ST)
        store   r20,(r14+1)
        store   r21,(r14+2)
        moveq   #0,r19  ;movsi  #0->r19
        movei   #_test,r21      ;movsi  #_test->r21
        moveq   #9,r20  ;movsi  #9->r20
        move    r19,r0  ;movsi  r19->r0
L8:
        addqt   #1,r19  ;iaddqtsi3      #1+r19->r19
        move    PC,TMP
        subqt   #4,ST
        addqt   #10,TMP
        jump    T,(r21)
        store   TMP,(ST)        ;call   r21
        cmp     r19,r20 ;rcmpsi r19,r20
        jr      GT,L8   ;jgt    L8
        move    r19,r0  ;movsi  r19->r0
        move    ST,r14
        load    (ST),r19
        load    (r14+1),r20
        load    (r14+2),r21
        addqt   #12,ST
        _RTS
        .LONG
        .68000
_forloop_end::
_forloop_size   .EQU    *-_forloop_start
        .GLOBL  _forloop_size
        .IF     _forloop_size>$1000
        .PRINT  "Code size (",/l/x _forloop_size,") is over $1000"
        .FAIL
        .ENDIF

 

I didn't run it, but I *think* that works.


57 minutes ago, 42bs said:

I am not sure about id Software's intentions back then, but I strongly suspect quick money. Or maybe Atari pushed them, hoping for a "killer" game for the Jaguar.


Right after Doom ran on the PC, id ported it to the Jag. Look: a western developer with a great game and a western console! Thanks to all the C code in Doom, this was a fast process (time to market), before the PlayStation was sold in the west. John Carmack hand-optimized all the assembly output from the compiler; just look at the files. Almost every second line has a comment about performance (tricks). JC specifically complains that the Jaguar can do simple shading fast in phrase mode, but falls back to pixel mode if you want to do anything interesting. There should not have been modes. Modes are already seen as a bad thing in Vim. The C64 is great because it only has 4 video modes, or rather feature flags you are free to set directly without calling a BIOS: hires vs. multicolor, text vs. graphics. And you have independent flags for sprites. No global mode.

Even so, Carmack said that Doom ran fast if you just looked at a wall. But by then all of the Jag was already in use: the bus blocked the 68k, Jerry did the transformations, and Tom fed the blitter.


9 hours ago, laymanpigeon said:

Will do.

Lower-latency, higher-bandwidth RAM would have improved performance.

Texture misses cause stalls; lower latency could alleviate that.

Since Carmack complained about Tom's inability to cache textures.

Actually, Carmack suggested a tiny on-chip cache for the blitter to improve texture performance.

 

https://alexbeyman.medium.com/john-carmacks-thoughts-on-the-atari-jaguar-2c2cc0daa16a


15 hours ago, laymanpigeon said:

http://personal.ee.surrey.ac.uk/Personal/R.Webb/l3a15/extras/sdramart.html

 

https://www.teldat.com/blog/dynamic-random-access-memory-dram-fast-page-mode-fpm/

 

If Atari could have survived delaying the Jaguar by a year, SDRAM would have been viable.

It would effectively have provided performance equivalent to 128-bit FPM DRAM.

As Carmack stated, another 64 bits would have tripled texturing performance.

Doom could then have run at a solid 320x240 resolution with a better framerate.

Skyhammer would not have gone below 15 FPS even in the worst case.

Fight For Life could have had fully textured characters at a solid 30 FPS.

Achieving that might have required duplicating texture data in SDRAM.

Performance would be even greater if the SDRAM matched the RISC processors clock for clock.

Even more so if there were another 64 bits of SDRAM, so that each could have its own.

Anyway, this is for JagChris... Downix is alive and well. lmao

I chat in DMs with him on Twitter.

Replacing the FPM DRAM memory controller with an SDRAM memory controller would not change the world on the Jaguar, simply because:

- the 68k is much slower than the FPM DRAM and only does random read/write accesses => no performance increase here
- the DSP/GPU don't do FPM (page-mode) DRAM accesses => no performance increase here
- the blitter only does FPM DRAM accesses when configured in write-only mode; all other modes use random read/write accesses => no performance increase in textured mode, only in flat/Gouraud
- the OP (Object Processor) does FPM DRAM accesses for bitmaps => performance would increase significantly for wide bitmaps

Random read/write on SDRAM is not free, and SDRAM also needs a higher refresh rate.

As it is, Doom, Skyhammer, etc. would not get the boost you think they would, because the bottleneck is not memory bandwidth but the lack of a cache for each CPU.

 

 

 

 

 


3 hours ago, cubanismo said:

Huh. I tried this:

 

[C test case and GCC output snipped; identical to the post above]

 

I didn't run it, but I *think* that works.

I think the compiler bug is that there is no real way to do a signed GT on the DSP/GPU:

https://www.mirari.fr/9d0k

The NNNZ test can give the wrong result for some signed values, because the DSP/GPU have no overflow flag.

 

 

Other thing:

How does it compile with "for (i = 0; i < 2; ++i)"? Does it load 2-1 into r20?

Maybe I'm not reading the code and counting properly, but I feel like it runs one iteration fewer than it should?

 

Edited by SCPCD
Share on other sites

Correct me if I am wrong, but before the Sega Dreamcast there was no console hardware with SDRAM, for obvious reasons, mainly cost.

PlayStation: 2 MB EDO DRAM, 1 MB VRAM; 2 KB texture cache on chip.

N64: 4.5 MB Rambus DRAM

Probably someone is confusing SDRAM with on-chip SRAM?


On 9/2/2022 at 7:32 AM, 42bs said:

I don't know your mileage, but I have been writing software for nearly 40 years now, and doing it for a living for over 20. And I can assure you, a buggy compiler (meaning one that generates buggy code) is the worst thing ever. No one wants this. A buggy assembler is easy to handle, as it maps 1:1 from mnemonic to machine code.

And regarding developers sharing info: a big laugh at this. Even today, demo/intro coders mostly hide their tricks. And in the '80s and '90s they never shared them.

Just check on Demozoo or Pouet how many demos come with source code.

Just a quote from JagChris some years ago (he claims it is Scato(logical?))

Quote

Our polygon engine uses the blitter in some strange ways that make it about the fastest rendering engine anyone ever wrote.

Has anyone seen this magic blitter polygon routine? Or could it be that some SW guys did not want to share their tricks?


2 hours ago, agradeneu said:

Correct me if I am wrong, but before the Sega Dreamcast there was no console hardware with SDRAM, for obvious reasons, mainly cost.

The Sega Saturn has 1 MB of SDRAM as part of its work RAM, though that was a last-minute addition to the Saturn, made to be competitive against the PlayStation.

 

SDRAM is not truly dual-ported like VRAM, but it has multiple internal banks that can be accessed in an interleaved fashion.

It takes 8 ns for SDRAM to perform 4 memory reads vs. 14 for FPM DRAM.

Interleaving across banks further widens the performance gap with FPM DRAM.

Two banks can be accessed with the second access starting a few ns after the first one.

 

{A~B}
[5][¤]
[4][¤]
[3][¤]
[2][¤]
[1][5] (memory read from bank A)
[1][4]
[1][3]
[1][2]
[5][1] (memory read from bank B after bank A)
[4][1]
[3][1]
[2][1]
[1][5] (memory read from bank A after bank B)

 

SRAM will always have an advantage over SDRAM due to its constant 1 ns timing.

With proper timing, SDRAM can come close to SRAM latency, unlike FPM DRAM.

For a GPU role, SDRAM is a substitute for VRAM, while FPM DRAM is inadequate.

Edited by laymanpigeon
clarification

3 hours ago, 42bs said:

Just a quote from JagChris some years ago (he claims it is Scato(logical?))

Anyone saw this magic blitter poly routine? Or could it be that some SW guys did not want to share their tricks?

Of course some people are very proprietary. Like the Pouet group. This small community is in many ways more proprietary and hoardy than most.

 

But that's a far cry from none or that people never share.

 

We're lucky enough that they have shared where it matters. None of those people's withheld contributions would have won them the label of the Godfather of the homebrew scene or are in any danger of significantly reshaping the homebrew scene.


6 hours ago, SCPCD said:

I think that the compiler error is that there is no real way to do GT in signed for the DSP/GPU :

https://www.mirari.fr/9d0k

Got it. I'll run some experiments with signed values next time I have a minute.

 

6 hours ago, SCPCD said:

Over thing:

how does it compile with "for (i = 0; i<2; ++i)" (does it push 2-1 in r20) ?

Maybe i don't read the code and count properly but I feel like there is 1 loop less than it should do ?

Yeah, I think you're right. The increment would need to happen after the compare for this logic to be correct, as you'd expect for a "for" loop, but it happens up front. I'll try to compare to the unoptimized version and with a non-const loop limit later as well to see if there's anything deterministic enough to fix it with trivial post-processing.

 

One thing I did notice is that as expected, the compiler will inject stabs debugging symbols in the assembly if you pass it -g. Even though the compiler is demonstrably broken, I can't help but continue wondering how I could get that to translate into source-level debugging of C code running on the GPU.


6 hours ago, SCPCD said:

Maybe i don't read the code and count properly but I feel like there is 1 loop less than it should do ?

I think the loop is correct. Last compare is 9 with 9 which is not GT, so it jumps one last time. The parameter r0 is loaded once before entering the loop and then in the delay slot.

 

Anyway, even if this example is correct, one piece of incorrectly generated code and you cannot trust the compiler anymore. I had this happen even with commercial compilers. Sometimes just switching from post-increment to pre-increment made the code correct.

Edited by 42bs

2 minutes ago, 42bs said:

I think the loop is correct. Last compare is 9 with 9 which is not GT, so it jumps one last time. The parameter r0 is loaded once before entering the loop and then in the delay slot.

Isn't this the list of comparisons, though:

(loop iter 1) 9 > 1 == true, branch
(loop iter 2) 9 > 2 == true, branch
(loop iter 3) 9 > 3 == true, branch
(loop iter 4) 9 > 4 == true, branch
(loop iter 5) 9 > 5 == true, branch
(loop iter 6) 9 > 6 == true, branch
(loop iter 7) 9 > 7 == true, branch
(loop iter 8) 9 > 8 == true, branch
(loop iter 9) 9 > 9 == false, fall through

 

So there are 9 iterations, but we should have run 10 (0 - 9) given this C code:

 

for (i = 0; i < 10; ++i)...

 

Further, when you exit the loop, r0 and r19, which appear to be holding the persistent value of 'i', should be 10, but they will both be 9. Not an issue because there is no further read of them, but I also wonder if adding a subsequent use of 'i' in the C code changes things at all. Interestingly, the code does manage to pass the right parameter value through the call to test() despite the internal counter being off by one. If the comparison had instead been this:

 

cmp     r0,r20 ;rcmpsi r0,r20, not rcmpsi r19,r20 as compiler generated

 

The logic would have been correct using the delay slot move. This is why I wonder if the optimizer defeated the intended logic somehow due to some lack of hazard handling or something. Have to do a few more experiments.


10 hours ago, agradeneu said:

Correct me if I am wrong, but before the Sega Dreamcast there was no console hardware with SDRAM, for obvious reasons, mainly cost.

The 7800 had SDRAM, but it was only 4kB.


Both the Saturn and the 32X used SDRAM. On the Saturn, HIMEM was 1MB of 32-bit wide SDRAM (two 256Kx16 chips); this was where the main game code and data was to reside. LOMEM was 1MB of 16-bit wide DRAM, and was for extra data storage. While you could put code in it, you wouldn't want to for speed reasons.

 

The 32X used 256KB of 16-bit wide SDRAM (one 128Kx16 chip). You want most of your code in ROM, but critical code (like interrupt handlers) should be in SDRAM. The cache in the SH2 keeps the ROM code from being too slow, so try not to trash the cache contents too much. Flooding the cache will force it to reload the code from ROM, which is slow.


  • 1 year later...

What could've saved the Jaguar:

1. Release the Jaguar Duo, because it looked much, much better than the toilet-looking Jaguar with the CD add-on on top of it.

2. Atari should've tried to come up with a 3D version of Scrapyard Dog to compete against Sega's Sonic and Nintendo's Mario.

3. They should've come out with more smooth-running 3D textured games for the Jaguar CD, not just cartridges, to convince people that the Jaguar could do more than 16-bit graphics.

4. If Atari had had a huge sack of money, they could've paid Nintendo and Sega to port Super Mario 64 and Sonic X-treme to the Atari Jaguar (remember those 8-bit Atari days). I'm sure that would've boosted Jaguar sales, because Super Mario 64 was all the rage and everyone wanted to jump to 3D, and Sonic X-treme would've helped sell more Jaguars as well (not sure if Atari would've gotten back the cash it would have spent on Sega and Nintendo in that case). BUT it could've been even more disastrous for Atari if those ports had flopped on the Jaguar as well, mmm.

5. It needed ports of PS1 and N64 games, whether or not the system was truly capable of running them; they should've taken that risk.

6. Atari should've come out with Atari classics for the Jaguar, with emulated 2600, 5200 and 7800 games, including ones from Sega and Nintendo, to also target retro Atari fans. Even if it would've required licenses to put those Nintendo and Sega games on it, it might've been worth it.

etc.

BUT since none of that happened, let's see if a homebrewer could port Super Mario 64 and the incomplete Saturn version of Sonic X-treme to the Jaguar, to see how capable the Jaguar really is.

 

