Jump to content
IGNORED

GCC for the TI


insomnia

Recommended Posts

Here's a clever one...

 

hchar(11, 0, 32, 32);

 

hchar() is just a macro that wraps vdpmemset...

#define hchar(r, c, ch, cnt) vdpmemset(gIMAGE+((r)<<5)+(c), ch, cnt)

 

... which has this prototype:

void vdpmemset(int pAddr, unsigned char ch, int cnt);

 

The code becomes:

li r3,>0020     r3 (integer) gets a count of 32
movb r3,r2      the count is meant to be copied to r2, but r2 is a byte <--- BUG
li r1,>0160     the address is calculated correctly and set
bl @vdpmemset   function called

 

r2 is getting the wrong byte of r3

 

I was able to confirm this is running through movqi by changing what it emits, but I don't know exactly why, since R3 should have been a HI, and it's constrained to QI. It's a pretty small thing and didn't break this code, but it's kind of insidious. ;)

 

The good news is a good 90% of Super Space Acer is running fine with these changes... and it's a pretty big codebase built on a completely foreign compiler. ;) But will need help on this one too. 

Link to comment
Share on other sites

1 hour ago, Tursi said:

r2 is getting the wrong byte of r3

 

I was able to confirm this is running through movqi by changing what it emits, but I don't know exactly why

;;-------------------------------------------------------------------
;; Move byte value
(define_insn "movqi"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=rR>,rR>,Q,  Q,r, r")
        (match_operand:QI 1 "general_operand"      "rR>, Q,  rR>,Q,OM,i"))]
  ""
  {
    tms9900_debug_operands ("movqi", operands, 2);
    tms9900_inline_debug ("; movqi alt=%d\n", which_alternative);
    int val;
    if (which_alternative >= 4)
    {
      val = INTVAL(operands[1]) << 8;
      tms9900_inline_debug ("; movqi imm MUL const %d * 256\n", val);

      if (val == 0)
        return("clr  %0");
      else if ((val & 0xff00) == 0xff00)
        return("seto %0");
      else
      {
        rtx args[2];
        args[0] = operands[0];
        args[1] = GEN_INT(val);
        output_asm_insn ("li   %0, %1", args);
        return "";
      }
    }
    else
    {
      if (REG_OFFSET (operands[1]) == 1) {
        output_asm_insn ("mov  %1, %0", operands);
        output_asm_insn ("swpb %0", operands);
        return "";
      } else
        return ("movb %1, %0");
    }
  }

 

Maybe because the movb at the bottom of the code-insn didn't check for the source operand offset?

 

I've added a check here for a 1 byte offset, and that might work.  I'll leave it for smarter people than me to mess with the length attribute (which I didn't modify or include here).

  • Like 1
Link to comment
Share on other sites

10 hours ago, Tursi said:

 

Unfortunately, this string is in bank-switched ROM, so the write causes a bank switch to occur and the game crashes. The code is certainly valid!

 

Is there any way that I can tell from the C or command line side GCC that it's not allowed to write to ROM, not even for an equality test? My suspicion is no, that this instruction sequence isn't telling GCC that it includes a write since it was never assumed there would be side effects?

 

Ah, interesting side effect!  I never thought of that.  This is being emitting by the insn "tstqi" which the GCC emits when it wants to test if a value is zero.  I could constrain it so this test can only be performed on registers,  or maybe do movb to a scratch,

I can't think of any way to tell gcc about ROM.  These steps are done way before any actual addresses are assigned for instructions.  We could include a compiler switch to select different opcode for ROM or RAM ... but maybe easier to find a non destructive test that works for either.

Link to comment
Share on other sites

5 hours ago, JasonACT said:

Isn't R0 always free for something like this?

Great minds think alike!  I am already doing the same elsewhere in the code.  In the gcc13 port, the compiler asserts on a shift because it doesn't know how to move between reg classes of general and shiftcount.  So I just stopped telling it R0 exists at all and use it as a permanent private scratch register.

  • Like 2
  • Haha 1
Link to comment
Share on other sites

4 hours ago, Tursi said:

r2 is getting the wrong byte of r3

 

I was able to confirm this is running through movqi by changing what it emits, but I don't know exactly why, since R3 should have been a HI, and it's constrained to QI. It's a pretty small thing and didn't break this code, but it's kind of insidious. ;)

 

The good news is a good 90% of Super Space Acer is running fine with these changes... and it's a pretty big codebase built on a completely foreign compiler. ;) But will need help on this one too. 

Sounds like probably another case of gcc using a subreg instead of calling extend, like we saw with "andhi3".  If you turn on inline insn dump (#define TMS9900_INLINE_DEBUG 1) in tms9900.c you'll see a print of the operands and which insn is emitting the code.

I may just have to add checks of REG_OFFSET anywhere we consume a QI

Link to comment
Share on other sites

7 hours ago, JasonACT said:

Maybe because the movb at the bottom of the code-insn didn't check for the source operand offset?

 

I've added a check here for a 1 byte offset, and that might work.  I'll leave it for smarter people than me to mess with the length attribute (which I didn't modify or include here).

Looks good to me.  I'm not sure how valuable the length attribute is.  I think it is used to calculate costs when optimising for size, in which case getting it slightly wrong shouldn't break anything, and also to calculate jump offsets.  We can't jump more than +254 or -256 so we emit a branch if further away than that.  If lengths are conversative conservative (always worst case) then I think the jump offsets should be fine too.

  • Like 4
Link to comment
Share on other sites

14 hours ago, khanivore said:

If lengths are conversative conservative (always worst case) then I think the jump offsets should be fine too.

Yep, I agree, so in that case the 2 should be made a 4 and things should continue to work well.

 

I think I also understand the 1.19 srl/swpb optimisation now too, which is actually in a different area to what I first pointed out, and I'm thinking - expanding one of your new peepholes may be a better way to get it working as it was.. now that the code-base is stable.  I'm not sure how far along you are though, and I'm back at work tomorrow, so there's less hobby time for me now, sadly.

  • Like 1
Link to comment
Share on other sites

19 hours ago, khanivore said:

Ah, interesting side effect!  I never thought of that.  This is being emitting by the insn "tstqi" which the GCC emits when it wants to test if a value is zero.  I could constrain it so this test can only be performed on registers,  or maybe do movb to a scratch,

I can't think of any way to tell gcc about ROM.  These steps are done way before any actual addresses are assigned for instructions.  We could include a compiler switch to select different opcode for ROM or RAM ... but maybe easier to find a non destructive test that works for either.

Yeah, the code generated by just constraining it to registers looks pretty good. ;)

 

I tested JasonACT's suggested fix for movqi as well, and it does look like it fixes my other case.

 

((edit: removed diff))

 

Gonna go a quick rebuild of everything and see how far that gets me ;)

 

Edited by Tursi
Link to comment
Share on other sites

Unfortunately, JasonACT's patch made things much worse, because it didn't work only on registers, but also on memory locations. So although I think it's on the right path, that offset is clearly not used the same way for memory addresses (or maybe it doesn't even get set).

 

I traced down one example, and this is the code in SSA that handles moving right.

 

if (KSCAN_JOYX == JOY_RIGHT) {
		if (playership != SHIP_SELENA) {
			wrapplayerright();
		}
		if (SHIP_C < 224-playerXspeed) SHIP_C+=playerXspeed;
} else if....

 

Compiles down to this with the patch:

 

	movb @SpriteTab+1, r1       get player x as a BYTE
	mov  @playerXspeed, r2      get player speed as a UINT
	movb r1, r4                 copy x
	srl  r4, 8                  make int
	mov  r2, r3                 copy speed
	ai   r3, >FF20              make 224-speed
	neg  r3                     fix up
	c    r4, r3                 compare
	jhe  L105                   skip out if not enough room
	swpb  r2                    make speed a char (sort of)
	ab   r1, r2                 add x
	mov  r2, @SpriteTab+1       write it back <<<--- bug, it's a byte, not a word
	swpb @SpriteTab+1           but then swap it <<<--- bug - don't want this at all
	jmp  L105                   done

 

You can see the new code was inserted there when it was time to write the value back to memory. While we can treat a byte as a 16-bit value on the registers (probably), we absolutely can not when it's a memory address. And given the code was correct before the patch, it seems likely that the new sequence is needed ONLY for register to register moves? 

 

Oddly, move left wasn't affected and still looked fine:

 

	movb @SpriteTab+1, r1       get player x as a BYTE
	mov  @playerXspeed, r2      get player speed as a UINT
	movb r1, r3                 copy X
	srl  r3, 8                  make int
	c    r3, r2                 make sure enough room
	jle  L105                   skip if not
	swpb  r2                    make speed a char (sort of)
	sb   r2, r1                 do the subtract
	movb r1, @SpriteTab+1       write it back
	jmp  L105                   done

 

Link to comment
Share on other sites

So based on that, I did a test of adding a new constraint to movqi just so I could tell if it was a register to register case. This seems to work in all cases in Super Space Acer, at least... and both the cases above resolve. So running with that, here's a diff of the changes I'm running with, all in tms9900.md

 

--- tms9900.md.old      2023-12-30 21:37:29.809899700 -0700
+++ tms9900.md  2024-01-01 03:21:58.648751700 -0700
@@ -109,7 +109,7 @@
 ;; Jump to a subroutine which returns a value
 (define_insn "call"
   [(call (match_operand:HI 0 "general_operand" "rR,Q")
-         (match_operand:HI 1 "general_operand"  "g,g"))
+         (match_operand:HI 1 "general_operand" "g,g"))
   ]
   ""
   {
@@ -194,25 +194,25 @@
 ;;-------------------------------------
 (define_insn "tsthi"
   [(set (cc0)
-       (match_operand:HI 0 "nonimmediate_operand" "rR,Q"))]
+       (match_operand:HI 0 "register_operand" "r"))]
   ""
   {
     tms9900_debug_operands ("tsthi", operands, 1);
     return("mov  %0, %0");
   }
-  [(set_attr "length" "2,6")])
+  [(set_attr "length" "2")])


 ;;-------------------------------------
 (define_insn "tstqi"
   [(set (cc0)
-       (match_operand:QI 0 "nonimmediate_operand" "rR,Q"))]
+       (match_operand:QI 0 "register_operand" "r"))]
   ""
   {
     tms9900_debug_operands ("tstqi", operands, 1);
     return("movb %0, %0");
   }
-  [(set_attr "length" "2,6")])
+  [(set_attr "length" "2")])

 ;;-------------------------------------
 (define_insn "cmphi"
@@ -264,14 +264,14 @@
 ;;-------------------------------------------------------------------
 ;; Move byte value
 (define_insn "movqi"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=rR>,rR>,Q,  Q,r, r")
-        (match_operand:QI 1 "general_operand"      "rR>, Q,  rR>,Q,OM,i"))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,=rR>,rR>,Q,  Q,r, r")
+        (match_operand:QI 1 "general_operand"      " r, rR>, Q, rR>,Q,OM,i"))]
   ""
   {
     tms9900_debug_operands ("movqi", operands, 2);
     tms9900_inline_debug ("; movqi alt=%d\n", which_alternative);
     int val;
-    if (which_alternative >= 4)
+    if (which_alternative >= 5)
     {
       val = INTVAL(operands[1]) << 8;
       tms9900_inline_debug ("; movqi imm MUL const %d * 256\n", val);
@@ -291,10 +291,17 @@
     }
     else
     {
-      return ("movb %1, %0");
+      if ((which_alternative == 0) && (REG_OFFSET (operands[1]) == 1)) {
+        // register to register, word to byte
+        output_asm_insn ("mov  %1, %0", operands);
+        output_asm_insn ("swpb %0", operands);
+        return "";
+      } else {
+        return ("movb %1, %0");
+      }
     }
   }
-  [(set_attr "length" "2,4,4,6,2,4")])
+  [(set_attr "length" "4,2,4,4,6,2,4")])

 ;;-------------------------------------------------------------------
 ;; Move two-byte value

 

(edit: removed spoiler tag cause I think this was missed)

Edited by Tursi
Link to comment
Share on other sites

1 hour ago, Tursi said:
   [(call (match_operand:HI 0 "general_operand" "rR,Q")
-         (match_operand:HI 1 "general_operand"  "g,g"))
+         (match_operand:HI 1 "general_operand" "g,g"))

Constructive criticism...  It was already right before the edit.

 

1 hour ago, Tursi said:
+        (match_operand:QI 1 "general_operand"      " r, rR>, Q, rR>,Q,OM,i"))]

https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gccint/Simple-Constraints.html#Simple-Constraints

Quote
whitespace
Whitespace characters are ignored and can be inserted at any position except the first. This enables each alternative for different operands to be visually aligned in the machine description even if they have different number of constraints and modifiers.

 

Link to comment
Share on other sites

8 hours ago, JasonACT said:

Constructive criticism...  It was already right before the edit.

 

https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gccint/Simple-Constraints.html#Simple-Constraints

 

That's unnecessary, please don't do that.

 

I already fixed it after I posted. Can we stay focused on the actual problems here and not waste bandwidth on nitpicks?

 

Edited by Tursi
Link to comment
Share on other sites

Let me clarify a bit, since that feel like it comes off a bit harsh even after I tried to tone it down. ;)

 

I'm still figuring this out, and I am indeed learning from the things you post. However, I shouldn't have to worry every time I reach for 'submit' that JasonACT is going to nitpick the irrelevant parts of my post. I'm only communicating concepts here, not final code. I trust khanivore to read the diff, extract what he feels is useful, and discard the rest, including the incorrect whitespace.

 

Since I'm testing every step, it's pretty late by the time I am posting conclusions, and I am far more interested in getting to bed than fixing whitespace.

 

If I wanted to submit a final patch, I'd be doing that over at Github, where it's not only much easier for khanivore, but hidden from extra eyes. ;)

 

In short, I'm asking for you not to be searching for the easy parts to pick off, but only the relevant parts. Pick away at those! ;)

 

  • Like 1
Link to comment
Share on other sites

On 1/1/2024 at 10:10 AM, Tursi said:

Unfortunately, JasonACT's patch made things much worse, because it didn't work only on registers, but also on memory locations. So although I think it's on the right path, that offset is clearly not used the same way for memory addresses (or maybe it doesn't even get set).

Yeah that's a gotcha I've hit a few times.  Many opcodes are 16-bit only and we have to be careful to ensure we only ever use them on regs.  I'm starting to use R0 as a permanent scratch reg for some cases like this.  The reloads are very messy in gcc13 since most instructions in tms9900 affect the status reg so isolating moves from compares is a nightmare otherwise.

  • Like 1
Link to comment
Share on other sites

On 1/1/2024 at 5:47 AM, JasonACT said:

I'm not sure how far along you are though

I'm deep down a rabbit hole with gcc13.  The refactor was bigger than I thought.  It is nearly working now and an initial comparison of the output of gcc13 and gcc4 is that the assembly code is about 5% smaller with gcc13.  Which isn't that much, but gcc13 does allow more flexibility to define implicit compares and a few other bits so might be able to improve that further.  I did also find and fix a few issues in the md file along the way which I can apply to the gcc4 branch as well.  So I might maintain both in the short term.
 

On 1/1/2024 at 5:47 AM, JasonACT said:

I'm back at work tomorrow, so there's less hobby time for me now, sadly

At least that's one problem I don't have any more 🙂

Link to comment
Share on other sites

24 minutes ago, khanivore said:

I'm deep down a rabbit hole with gcc13.  The refactor was bigger than I thought.  It is nearly working now and an initial comparison of the output of gcc13 and gcc4 is that the assembly code is about 5% smaller with gcc13.  Which isn't that much, but gcc13 does allow more flexibility to define implicit compares and a few other bits so might be able to improve that further.

Feel free to let me know when I can test compiling gcc13 on my mac, still having no luck with gcc4 on modern macos's here...

Link to comment
Share on other sites

4 hours ago, TheMole said:

Feel free to let me know when I can test compiling gcc13 on my mac, still having no luck with gcc4 on modern macos's here...

Thanks.  I've a bit of a way to go yet.  There is a dev branch up in the repo but there are no scripts to create patches or anything like that yet.  No libgcc either and limited testing.

Link to comment
Share on other sites

20 hours ago, khanivore said:

Yeah that's a gotcha I've hit a few times.  Many opcodes are 16-bit only and we have to be careful to ensure we only ever use them on regs.  I'm starting to use R0 as a permanent scratch reg for some cases like this.  The reloads are very messy in gcc13 since most instructions in tms9900 affect the status reg so isolating moves from compares is a nightmare otherwise.

I've decided to go with this for movqi, since I think it covers some extra scenarios:

 

    else
    {
      if ((GET_CODE(operands[0]) == REG) && (GET_CODE(operands[1]) == REG)) {
        output_asm_insn ("mov  %1, %0", operands);
        if (REG_OFFSET (operands[0]) != REG_OFFSET (operands[1]))
            output_asm_insn ("swpb %0", operands);
        return "";
      } else
        return ("movb %1, %0");
    }
  }
  [(set_attr "length" "4,4,4,6,2,4")])

 

18 hours ago, khanivore said:

I'm deep down a rabbit hole with gcc13.  The refactor was bigger than I thought.  It is nearly working now and an initial comparison of the output of gcc13 and gcc4 is that the assembly code is about 5% smaller with gcc13.  Which isn't that much, but gcc13 does allow more flexibility to define implicit compares and a few other bits so might be able to improve that further.  I did also find and fix a few issues in the md file along the way which I can apply to the gcc4 branch as well.  So I might maintain both in the short term.

13 hours ago, khanivore said:

There is a dev branch up in the repo

I see it, wow, what a difference - welcome to Wonderland.  I strangely find myself in Oz (meh, story of my life, Aussie Aussie Aussie, Oi Oi Oi) but being this heartless character:

 

tinman.gif.96fddab5d22368f465ddb63d33b21cab.gif

 

I think he's likeable enough though.

  • Like 2
  • Haha 2
Link to comment
Share on other sites

On 1/1/2024 at 10:10 AM, Tursi said:

You can see the new code was inserted there when it was time to write the value back to memory. While we can treat a byte as a 16-bit value on the registers (probably), we absolutely can not when it's a memory address. And given the code was correct before the patch, it seems likely that the new sequence is needed ONLY for register to register moves? 

Yes, offsets of a byte within a subreg should only apply to registers.  If you could turn on the inline rtl dumps we could learn where the opcodes come from.  e.g. :

; zero_extendqihi2-5
; OP0 : (reg:HI 2 r2 [27])code=[reg:HI]
; OP1 : (reg:QI 2 r2 [orig:22 color.0 ] [22])code=[reg:QI]

        srl  r2, 8


And it will tell you if subregs are being used and what their offset is so we could track own missing extends.  It would be great if we could come up with a generic predicate pattern that captures and expands usage of subregs with offsets.  Insomnia had inserted an extra pass in the compiler to try and do this but I have disabled it as I don't think its the right way to go.

  • Like 1
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...