Jump to content
IGNORED

GCC for the TI


insomnia

Recommended Posts

13 hours ago, khanivore said:

(I'm also not sure why it feels it has to upgrade 8-bit values to 16-bit values when doing binary AND.  It doesn't do that for OR ops)

Well, it has to if it's ANDing with an INT, which was my case ;)

 

Will give this a shot tonight :)

Link to comment
Share on other sites

Finished my testing, I have nothing new to report. Everything seemed to work and I've pushed updates to all my repos (nothing to worry about unless you pulled down the versions over the last couple days with R15 as the stack pointer).

 

My new libti99all still has trouble in 64 column mode, but I'm pretty confused by that since it's the same printf code running on every mode. But, it only happens on the new compiler, on the new libti99, and only in 64 column mode. I'll eventually figure it out but with nothing to give you right now let's assume it's my bug - SDCC also chokes on it for some different reason.

 

I can't test Super Space Acer as the code currently doesn't link (for a known fault that's all mine), but I'll be working on that over the next little bit.

 

So for now, good work and thank you! 

 

  • Like 4
Link to comment
Share on other sites

On 12/19/2023 at 8:42 PM, Tursi said:

My new libti99all still has trouble in 64 column mode, but I'm pretty confused by that since it's the same printf code running on every mode. But, it only happens on the new compiler, on the new libti99, and only in 64 column mode. I'll eventually figure it out but with nothing to give you right now let's assume it's my bug

vdpchar64() has cached last_offset in a local static, but you've not cached buf[4] in a local static.

  • Like 2
Link to comment
Share on other sites

17 hours ago, PeteE said:

Regarding the size of the generated code, here are some instruction comparisons I made from my C program compiled with GCC patch 1.19 (on the left) versus 1.27 (on the right.)

 

image.png.2885c566aadf3cf6f9d0d23fef5aaa13.png

image.png.be3269f6a252941b19ebc19f5aa62a14.png

image.png.3fe2601b6f06bc28d5c3f91a853d5c33.png

image.png.4c457875314af119970ee21339f02f41.png

image.png.ec374f7e0cf89bac0882bd447569faca.png

 

Thanks @PeteE, I am looking into those.  The first two are easy fixes.   ANDI+CI is definitely less efficient than LI/SETO+CB in compqi.  

I'm not sure where the last two are from but looks like a bit of paranoia around byte ordering that could be improved.  Will look into it as well.

If you have C fragments that generate these sequences that would be very helpful.

  • Like 1
Link to comment
Share on other sites

13 hours ago, JasonACT said:

vdpchar64() has cached last_offset in a local static, but you've not cached buf[4] in a local static.

hahah! Brilliant! That was it. ;)

 

I didn't write the 64 column code, and I was trying to touch some of it up when I ported it, and that was where I broke it. Thanks! That makes my day ;)

 

(I didn't know anyone had even found the new repo yet... ;) )

(Doesn't fix the Coleco build, but, there's something deeper going wrong there. I haven't spent any time looking at it yet cause I want to rewrite testlib anyway.)

 

Edited by Tursi
  • Like 5
Link to comment
Share on other sites

On 12/21/2023 at 10:10 PM, khanivore said:

If you have C fragments that generate these sequences that would be very helpful.

volatile unsigned char b;
volatile unsigned int i;

// Compile with: -Os

void test () {
    if (b == 0xFF) { // andi + ci
        b = 1;
    } else {
        b = (i >> 8); // srl + swpb + mov
        if (b == 8) // andi + ci
            b = 1;
    }
    b &= 0x07; // li + szcb
}

 

-Os output:

Spoiler

    pseg
    even

    def    test
test
    movb @b, r1
    andi r1, >ff00
    ci   r1, >FF00
    jne  L2
    li   r1, >100
    movb r1, @b
    jmp  L3
L2
    mov  @i, r1
    srl  r1, >8
    swpb  r1
    movb r1, @b
    movb @b, r1
    andi r1, >ff00
    ci   r1, >800
    jne  L3
    li   r2, >100
    movb r2, @b
L3
    movb @b, r1
    li   r2, >F8FF
    szcb r2, r1
    movb r1, @b
    b    *r11
    .size    test, .-test
    cseg

    even
    def b
b
    bss 1

    even
    def i
i
    bss 2
 

-O2 output: (horrible, in so many ways)

Spoiler

    pseg
    even

    def    test
test
    movb @b, r1
    andi r1, >ff00
    ci   r1, >FF00
    jeq  L6
    mov  @i, r1
    srl  r1, >8
    swpb  r1
    movb r1, @b
    movb @b, r1
    andi r1, >ff00
    ci   r1, >800
    jeq  L7
    movb @b, r1
    li   r2, >F8FF
    szcb r2, r1
    movb r1, @b
    b    *r11
    jmp  L8
L7
    li   r2, >100
    movb r2, @b
    movb @b, r1
    li   r2, >F8FF
    szcb r2, r1
    movb r1, @b
    b    *r11
    jmp  L8
L6
    li   r1, >100
    movb r1, @b
    movb @b, r1
    li   r2, >F8FF
    szcb r2, r1
    movb r1, @b
    b    *r11
L8
    .size    test, .-test
    cseg

    even
    def b
b
    bss 1

    even
    def i
i
    bss 2
 

 

Link to comment
Share on other sites

I found this bug a few days ago, but it took me until now to fully understand it and be sure it's a compiler bug. I am pretty sure this is an old one.

 

I have a simple difference function that looks like this:

 

// return the distance between a and b
unsigned char distance(unsigned char a, unsigned char b) {
	if (a>b) return a-b; else return b-a;
}

 

This compiles to this assembly code:

  cb r1,r2  compare a to b
  jle aed4  jump if less than or equal

  sb r2,r1  was greater than, so subtract r2 from r1, result in r1
  jmp aed8  go to return (r1 is return value)

aed4:
  sb r2,r1  subtract r2 from r1 again
  neg r1    but negate (not invert!) the result so the sign is right <--- BUG because the LSB is undefined
  
aed8:
  ret       back to caller

 

The problem is the NEG in the second half there. Because the least significant byte is not necessarily initialized, the effect on the most significant byte is going to be just an INV most of the time. I noticed it when my values were equal, but it seems it would be off-by-one in half the cases.

NEG is described as getting the two's complement by inverting, and then adding one. So if we walk through my particular case, imagine R1 and R2 have previous garbage 'XXXX'. a and b are loaded, so we have "aaXX" and "bbXX". Assuming a and b are equal, we jump into the second case there. After the subtract, we have "00XX".

 

Now the NEG runs to fix the sign, NEG of >0000 is zero (inverts to >FFFF, add one back to >0000), but because the least significant byte is unknown, that only works if we entered with it cleared. In this case "00XX" becomes "FFYY", add one and it's "FFYZ". The MSB has not been changed at all, so we return >FF instead of >00.

I guess the simple fix is getting it to clear the least significant byte before the NEG, but why doesn't it just "sb r1,r2" instead of the two-step dance?

Okay, while implementing my own workaround it became clear. The compiler only expects r1 to change, not r2. I just added an andi r1,>ff00 to my version.

Edited by Tursi
Link to comment
Share on other sites

Second question -- is there a way to specify a byte register as input to inline assembly? I even dared to peek in the compiler TMS9900 code but it looks like the answer is no.

 

The reason is I attempted to rewrite my code there in assembly, but passing the value as "r"(a) made the compiler copy it to a new register and shift it to make an int of it, which is unecessary. (I can work around this, but it struck me as interesting there wasn't a register class for it? (Is that even the right term? ;) ))

 

Link to comment
Share on other sites

On 12/19/2023 at 5:00 AM, Tursi said:

Well, it has to if it's ANDing with an INT, which was my case ;)

 

Will give this a shot tonight :)

Forgot to mention that I had tried changing the param to a char but it generated the same code.  Weirdly gcc-13 does generate proper byte opcodes if all params are bytes.

Link to comment
Share on other sites

On 12/24/2023 at 12:56 AM, JasonACT said:
b = (i >> 8); // srl + swpb + mov

Ok, but in this case it is doing exactly what you asked.  shift an int, truncate it to a byte and store the result.  I had thought the srl 8 was coming from an extend of char to int and then swpb was coming from trunc which would be redundant.

Link to comment
Share on other sites

On 12/24/2023 at 9:42 PM, Tursi said:

The problem is the NEG in the second half there. Because the least significant byte is not necessarily initialized, the effect on the most significant byte is going to be just an INV most of the time. I noticed it when my values were equal, but it seems it would be off-by-one in half the cases.

NEG is described as getting the two's complement by inverting, and then adding one. So if we walk through my particular case, imagine R1 and R2 have previous garbage 'XXXX'. a and b are loaded, so we have "aaXX" and "bbXX". Assuming a and b are equal, we jump into the second case there. After the subtract, we have "00XX".

 

Now the NEG runs to fix the sign, NEG of >0000 is zero (inverts to >FFFF, add one back to >0000), but because the least significant byte is unknown, that only works if we entered with it cleared. In this case "00XX" becomes "FFYY", add one and it's "FFYZ". The MSB has not been changed at all, so we return >FF instead of >00.

I guess the simple fix is getting it to clear the least significant byte before the NEG, but why doesn't it just "sb r1,r2" instead of the two-step dance?

Okay, while implementing my own workaround it became clear. The compiler only expects r1 to change, not r2. I just added an andi r1,>ff00 to my version.

I've had a TODO on that piece of code for ages to understand why we emit a NEG.  It seems we are telling the compiler we will accept operands to a subtract in either order and we will negate the result if you give them to us in the wrong order .  Which seems wrong to me.  I'd prefer to tell gcc not to do that.  It may save GCC from emitting a MOV to another reg but adding in the NEG surely obviates that.  And of course, neg is 16-bit so will mess up byte subtracts as you said.

(define_insn "subqi3"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=rR>,Q,rR>,Q,rR>,Q,rR>,Q")
        (minus:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,0,0,rR>,rR>,Q,Q")
                  (match_operand:QI 2 "nonimmediate_operand" "rR>,rR>,Q,Q,0,0,0,0")))
   (clobber (reg:CC CC_REGNUM))]
  ""
  {
    tms9900_debug_operands ("subqi3", operands, 3);
    if(which_alternative < 4)
    {
      output_asm_insn("sb   %2, %0",operands);
    }
    else
    {
      output_asm_insn("sb   %1, %0",operands);
      output_asm_insn("neg  %0",operands);
    }
    return("");
  }
  [(set_attr "length" "2,4,4,6, 4,6,8,10")])


 

Link to comment
Share on other sites

On 12/24/2023 at 10:10 PM, Tursi said:

Second question -- is there a way to specify a byte register as input to inline assembly? I even dared to peek in the compiler TMS9900 code but it looks like the answer is no.

 

The reason is I attempted to rewrite my code there in assembly, but passing the value as "r"(a) made the compiler copy it to a new register and shift it to make an int of it, which is unecessary. (I can work around this, but it struck me as interesting there wasn't a register class for it? (Is that even the right term? ;) ))

 

Hmm.  Not that I know.  The predicates in the md file have types QI, HI, SI, DI for 8,16,32,64.  But the constraints just have "r" for reg.  There are register classes, but they are for register usage like "general", "shift operand", "CRU", etc.

Link to comment
Share on other sites

4 hours ago, khanivore said:

I've had a TODO on that piece of code for ages to understand why we emit a NEG.  It seems we are telling the compiler we will accept operands to a subtract in either order and we will negate the result if you give them to us in the wrong order .  Which seems wrong to me.  I'd prefer to tell gcc not to do that.  It may save GCC from emitting a MOV to another reg but adding in the NEG surely obviates that.  And of course, neg is 16-bit so will mess up byte subtracts as you said.

I dunno about that. After trying to do it myself a few ways, I started to appreciate this solution as elegant. MOV to a new register may only be one instruction, but you can't predict how many instructions were required to make that register available or whether having it unchanged would improve other code. Starting the function with ANDI R1,>FF00 solves the issue and doesn't cost much. That said, I guess r2 is passed in there anyway, having it destroyed probably won't hurt most of the time, but you'll need to move back to R1 anyway, so that path is still two instructions long.

Link to comment
Share on other sites

8 hours ago, khanivore said:

Ok, but in this case it is doing exactly what you asked.  shift an int, truncate it to a byte and store the result.  I had thought the srl 8 was coming from an extend of char to int and then swpb was coming from trunc which would be redundant.

Ok, but I tried it in 1.19 and it knew to optimise it.

Link to comment
Share on other sites

8 hours ago, JasonACT said:

Ok, but I tried it in 1.19 and it knew to optimise it.

Well, 1.19 was missing the extends from char to int so a lot of times it gave wrong results for char operations. I haven't built 1.19 to check but it may be a case of it just happened to work.  I don't think it optimised it.

Link to comment
Share on other sites

1 hour ago, khanivore said:

Well, 1.19 was missing the extends from char to int so a lot of times it gave wrong results for char operations. I haven't built 1.19 to check but it may be a case of it just happened to work.  I don't think it optimised it.

This is going from an int to a char though, I suspect (as I've never really looked at a GCC .md file before, so I have very few clues here) that it has something to do with the optimisation near the bottom of the .md file, where you've added a comment "TODO will this help the case where char c = (int)x >> 8; ??" - my suspicion being it was optimising before in 1.19 but now isn't for some reason.

 

But I really don't know.

Link to comment
Share on other sites

1 hour ago, JasonACT said:

This is going from an int to a char though, I suspect (as I've never really looked at a GCC .md file before, so I have very few clues here) that it has something to do with the optimisation near the bottom of the .md file, where you've added a comment "TODO will this help the case where char c = (int)x >> 8; ??" - my suspicion being it was optimising before in 1.19 but now isn't for some reason.

 

But I really don't know.

Yeah, could be.  TBH I haven't look in-depth at the peepholes yet.  I wanted to make sure we have functioning code first and then fast/small code second.  But we're getting to the point now that the code works but can be improved so this is the right time.

  • Like 4
Link to comment
Share on other sites

I'm gonna have to ask for help on this one! :)

 

Troubleshooting a crash in Super Space Acer, I have a case of this code:

 

if (sSkillText[txtIdx][0] == '\0') {

 

Where sSkillText is an array of pointers to const char. The array is in ROM and the strings themselves are also in ROM.

 

char * const sSkillText[] = {

 

The code is assuming the pointer indexed is always valid and checking if it's empty string. GCC wants to test if the first byte is NUL, so it gets the pointer, and moves the byte to itself:

 

mov *r3,r2    get the string pointer from the array
movb *r2,*r2  test if the first byte is zero  <--- BUG, but not really!

 

Unfortunately, this string is in bank-switched ROM, so the write causes a bank switch to occur and the game crashes. The code is certainly valid!

 

Is there any way that I can tell from the C or command line side GCC that it's not allowed to write to ROM, not even for an equality test? My suspicion is no, that this instruction sequence isn't telling GCC that it includes a write since it was never assumed there would be side effects?

 

Link to comment
Share on other sites

38 minutes ago, Tursi said:

Is there any way that I can tell from the C or command line side GCC that it's not allowed to write to ROM, not even for an equality test? My suspicion is no, that this instruction sequence isn't telling GCC that it includes a write since it was never assumed there would be side effects?

How about:

 

void test () {

    volatile char c;

    c = sSkillTest[txtIdx][0];

    if (!c) ...

}

 

40 minutes ago, Tursi said:
char * const sSkillText[] = {

BTW: This is saying your array of pointers is constant, but the strings being pointed to are not.  That would require "const char *" but doesn't help here (though it probably should).

  • Like 1
Link to comment
Share on other sites

Yeah, I knew someone would call that const bit out, but I was trying to stay focused on the actual problem. ;) 

 

Your suggested code might work, since it moves the test to RAM, but I'm not okay with hunting through all my code to change it to move comparisons to temporary variables. Too large a task with too much chance of missing something.

 

Instead I attacked the compiler. 

 

;;-------------------------------------
(define_insn "tsthi"
  [(set (cc0)
        (match_operand:HI 0 "nonimmediate_operand" "rR,Q"))]
  ""
  {
    tms9900_debug_operands ("tsthi", operands, 1);
    return("mov  %0, %0");
  }
  [(set_attr "length" "2,6")])


;;-------------------------------------
(define_insn "tstqi"
  [(set (cc0)
        (match_operand:QI 0 "nonimmediate_operand" "rR,Q"))]
  ""
  {
    tms9900_debug_operands ("tstqi", operands, 1);
    return("movb %0, %0");
  }
  [(set_attr "length" "2,6")])
  
;;-------------------------------------

 

Changed these two to "register_operand" "r" (and length "2"), and the resulting code became:

 

    mov  *r3, r2
    movb *r2, r1
    jne  L51

 

Which is about as clean as I could ask for, except for needing a temporary register!

 

I tried to write a peephole optimizer for code like mov *r1,r2 / mov r2,r2, but I didn't see any generated in any of my code, so I left it out. I think the condition code set for MOV works correctly so it knows it's not needed. It does not seem to work correctly for ANDI, though I couldn't see why not as it seems to be included in the same place.

 

In analyzing the code, I don't think the mechanism of LI R3,<address> / BL *R3 pays off very often. I see it used a lot for a single instance, and even repeatedly in the same function. It costs two more bytes and 16 more cycles than just BL @<address>. I tried changing the call functions to only accept an address, but that failed to build... I think because the compiler was trying to fit a register call. I couldn't see how to have it not move the function address into a register in the first place... one step at a time, I guess.  (Edit: the part that particularly annoyed me makes sense now, so I'm going back to being unsure whether the function call mechanism is good or not... ;) )

 

Still testing, but I don't think it breaks anything else, so I'll propose that as a change. As nice as it is to be able to test an address for 0 without a register, there's too much chance of side effects when working on modern cartridges. ;)
 

Edited by Tursi
  • Like 1
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...