Jump to content
IGNORED

GCC for the TI


insomnia

Recommended Posts

44 minutes ago, TheBF said:

Outsider curiosity question. 

 

Do you need any special incantations to declare a leaf sub-routine? 

Perhaps the compiler simply looks for any calls within the sub-routine and modifies the entry and exit code?

Yep, that's exactly what it does.  No special incantations needed.  Just simply dictated whether you call any other functions or not.

  • Like 1
  • Thanks 1
Link to comment
Share on other sites

Looks like I'm having trouble with packed structs. 

 

struct __attribute__((__packed__)) DeviceRomHeader {
  unsigned char flag;
  unsigned char version;
  unsigned int reserved1;
  struct EntryLink* powerlnk;
  struct NameLink* userlnk;
  struct NameLink* dsrlnk;
  struct NameLink* basiclnk;
  struct EntryLink* interruptlnk;
  unsigned int* gromsomething;
};

 

When I declare the ROM in an expansion card as that struct, and then read the value of the dsrlnk field, In classic99 ( it has it's own ROM for FIAD access, so I'm using that as reference ), then it reads the wrong value... with lots of byte manipulation instructions... 

 

    struct DeviceRomHeader* dsrrom = (struct DeviceRomHeader*) 0x4000;

 

The value of dsrrom->dsrlnk should be 0x4010, but I get 0x7010 in r1 before my function call...

 

	movb @>4008, r1      ; appears to copy the high byte of dsrrom->dsrlnk to high byte of r1
	sla  r1, >8          ; slide that over to the high byte? but isn't it already there? so, move garbage from low byte of r1 to high byte from an earlier function call return value.
	movb @>4009, r14     ; copy the low byte of dsrrom->dsrlnk to r14 high byte.
	srl  r14, 8          ; slide the high byte into the low byte
	soc  r1, r14         ; merge the high byte of r1 and the low byte of r14 to get the value in r14, except, r1 was corrupted already by that second instruction in this snippet.
	li   r2, ab_uint2hex.1166
	mov  r2, @trampdata  ; stash the call metadata ( banks and target address ) 
	mov  r14, r1         ; move r14 into argument 1 of the function call. r14 was wrong, so r1 will be wrong too.
	bl   *r9             ; call the function that ( r9 will have the address of my trampoline in it, but that isn't relevant. ) 

 

I've been compiling with -O2

 

I believe the second instruction isn't accounting for the prior state of r1.

 

 

Link to comment
Share on other sites

9 hours ago, jedimatt42 said:

When I declare the ROM in an expansion card as that struct, and then read the value of the dsrlnk field, In classic99 ( it has it's own ROM for FIAD access, so I'm using that as reference ), then it reads the wrong value... with lots of byte manipulation instructions... 

Thanks - good catch.  It looks like for some strange reason gcc thinks it must access elements in a packed struct byte-by-byte, even though they are 16-bit values and are already aligned.  I'll have to look into why it does that.

 

The erroneous SLA is coming from GCC itself.  Because it thinks any byte value can be treated as a word value, it thinks it needs to move the LSB of r1 to the MSB before SOC, but of course in TMS9900 it already is in the MSB.  I can see an offset of -1 in operand[1] to the "ashlhi3" (arith left shift) insn so I'll add a call to my offset checking function there and see if I can eliminate that shift.


(edit: just ran a quick test without __packed__ and it generated "mov  @>4008, @x" so a short-term work-around may be just drop the attribute)
 

Edited by khanivore
  • Like 3
  • Thanks 1
Link to comment
Share on other sites

Thanks, @khanivore

 

Looking at all my use of __packed__ it seems I used it all over the place without really considering the need. Things seem to be working well with all the __packed__ removed. I don't think any of my structs actually needed it. My environment variable system works, my cached DSR info works, which meant traversing all the ROMs works... actual file IO works.  A ton more to test, but that's a good start

 

Code should be smaller and faster now too. There was a ton of that accessing ints as a pair of bytes stuff going on before... 

 

  • Like 3
Link to comment
Share on other sites

1 hour ago, jedimatt42 said:

Looking at all my use of __packed__ it seems I used it all over the place without really considering the need. Things seem to be working well with all the __packed__ removed. I don't think any of my structs actually needed it. My environment variable system works, my cached DSR info works, which meant traversing all the ROMs works... actual file IO works.  A ton more to test, but that's a good start

 

Code should be smaller and faster now too. There was a ton of that accessing ints as a pair of bytes stuff going on before... 

Yeah I've looked through the gcc code but I'm not seeing any obvious way to change this behaviour.  It seems that if you declare a struct as __packed__ then gcc says ok fine I will make no assumptions whatsoever about alignment and fetch everything one byte at time.

I think it's highly unlikely that any structs used by the console or others will contain unaligned 16-bit ints so it should be safe to omit the __packed__ attribute.
 

  • Like 2
Link to comment
Share on other sites

Also, just to add that while the SLA is not correct, it isn't as simple as just removing it 'cos then the LSB of r1 will contain junk which will propagate through to the SOC and the address will still be wrong.  So I'd need to emit ANDI r1,0xFF or something which will bloat the code even more.  Maybe best just to make this one an errata.  I had a try at creating a peephole but not easy to match up all the QI and HI operands.

  • Like 3
Link to comment
Share on other sites

-Os works for me too. I recall when I started with gcc for the TI, that it didn't. For ForceCommand this shaved 10k of instructions out. It was around 80k. 

 

I tried the no-function-cse option, and in my case since most of my function calls are to my trampoline, it actually grew in size a little tiny bit (less than 256 bytes). I was shocked how small the difference was. So I guess I save a word per call by having the function address in a register, but then I have more register gymnastics to make up for that.  

 

 

  • Like 1
Link to comment
Share on other sites

 

This is NOT the bug I'm chasing, it turns out... but a clever compiler doing the right thing.  It didn't match the patterns I was expecting, but in trying to explain it, I realize this is just smart assembly that it generated. I'm posting this anyway cause maybe it'll help someone else when reading their .s files... The following is written before I realized...

---------------------------

 

Ok, everything was pretty happy, but I'm chasing a bug I believe the optimizer (-Os)  is introducing into VIRGIL99 ( Gemini browser for ForceCommand ) 

 

So, I have this sys call macro for defining API calls from external executables into the ForceCommand API

FROM: https://github.com/jedimatt42/fcmd/blob/e3c3f68cff5d66b1317333f6f79f681f4162695b/fc_api_template#L14

 

#define FC_SYS *(int *)0x2000

#define DECL_FC_API_CALL(index, func, return_type, arg_sig, args)     \
    static inline return_type func arg_sig                            \
    {                                                                 \
        __asm__("li r0,%0 ; " #func                                   \
                :                                                     \
                : "i"(index));                                        \
        return_type(*tramp) arg_sig = (return_type(*) arg_sig)FC_SYS; \
        return tramp args;                                            \
    }

 

My fcsdk code generator produces fc_api.h with declarations using this macro :

 

#define FC_STRCPY 0x2e
...
// function: char * fc_strcpy(char *d, const char *s)
DECL_FC_API_CALL(FC_STRCPY, fc_strcpy, char *, (char *d, const char *s), (d, s))

 

I call a lot of fc_strcpy in virgil99's b0_keyboard.c

 

So it factored the code generated by my macro, into a function it calls fc_strcpy, instead of in-lining it everywhere it is used. 

 

However, the function generated uses a 'b' instead of 'bl' to call my trampoline (  the result of the line in the macro "return tramp args;"   ( looks funny but the args parameter to the macro has the parenthesis that make it a function call. )  'tramp' is a function pointer to my cartridge entry point for api calls... the address of which is stored at 0x2000 

 

So, most of the time, the function is in-lined, and the code generated looks something like:

 

* Begin inline assembler code
* 576 "/home/matthew/dev/gitlab/jedimatt42/fcmd/example/gcc/fcsdk/fc_api.h" 1
        li r0,>3B ; fc_strset
* 0 "" 2
* End of inline assembler code
        mov  r10, r14
        ai   r14, >A
        li   r3, >100
        clr  r2
        mov  r14, r1
        mov  @>2000, r4
        bl   *r4

 

this is good... this one works... The API ID is loaded into r0, then arguments for the real function call are setup, and then my trampoline is called by the 'bl *r4'  - this trampoline consumes r0 before any C usages of r0 occur... 

 

But that 'extracted' fc_strcpy doesn't generate correctly: 

 

fc_strcpy
* Begin inline assembler code
* 537 "/home/matthew/dev/gitlab/jedimatt42/fcmd/example/gcc/fcsdk/fc_api.h" 1
        li r0,>2E ; fc_strcpy
* 0 "" 2
* End of inline assembler code
        mov  @>2000, r3
        b    *r3

 

The 'b' instead of 'bl' at the end, and then there is no return either... The assembly calling this looks ok:

 

        mov  r14, r2
        li   r1, state+286
        bl   @fc_strcpy

 

Oh, no, I'm totally wrong... by using a trivial 'b', r11 from the bl @fc_strcpy is still set to return to that caller, and all of this should work... 

  • Like 3
Link to comment
Share on other sites

I found my bug, and it was mine alone... 

 

The compiler will issue calls to memcpy when performing things like literal string assignments on the stack... so I tuck the memcpy routine into the pre-amble of all my cartridge banks. But for my external executables, I did something dumb, and had the memcpy symbol literally defined in the linkfile. Now that the compiler has changed, that address was invalid. Things work once it is set to the correct address. 

 

I need to find a way to define that dynamically... Each executable I've written for ForceCommand has a clone of this bad linkfile. 

 

Edit: Looks like linker scripts can use INCLUDE, so my sdk should probably just include a generated linkerscript to include...  

  • Like 4
Link to comment
Share on other sites

12 hours ago, jedimatt42 said:

by using a trivial 'b', r11 from the bl @fc_strcpy is still set to return to that caller, and all of this should work... 

Yeah that's a tailcall optimisation.  If gcc sees that you are calling a function with a similar return to the current function, it does a B to the function instead of a BL to reduce the code size.

  • Like 1
Link to comment
Share on other sites

1 hour ago, khanivore said:

Yeah that's a tailcall optimisation.  If gcc sees that you are calling a function with a similar return to the current function, it does a B to the function instead of a BL to reduce the code size.

Very cool. 😎

Link to comment
Share on other sites

  • 2 weeks later...

I know we've kind of settled down here, but I've got a new one that's a bit beyond me. It appears to be losing a variable, and this code definitely worked on the older versions, it's a recent bug. I didn't notice it right away cause the effect on the game is really subtle.

 

This is a function that watches for the player's shield counter to change, and if it grows from zero, it calls a function to change the graphics to show the shield, and when it hits zero it calls a different function to revert the graphics. The function to show the shield is now never being called (I didn't notice because I mostly test on the F18A side, and because the colors still change I didn't pay much attention.)

 

It seems like the compiler believes that the value in OLDSHIELD was loaded into R1, but it's never loaded anywhere. Instead the C instruction does a direct comparison memory-to-memory. Could this be an incorrect constraint or something like that on compare?

 

playmvbug.txt

Edited by Tursi
Link to comment
Share on other sites

It reproduces in this much simpler snippet:

 

extern int shield,oldshield;
extern void (*shieldsOn)();

void playmv() {
  if (shield != oldshield) {
    if (oldshield == 0) {
      shieldsOn();
    }
  }
}

 

        pseg
        even
        def     playmv
playmv
        c    @shield, @oldshield  test if shield changed
        jeq  L3                   skip ahead if not
        mov  r1, r0               supposed to be testing oldshield against 0, but it thinks R1 <<<<< BUG
        jne  L3                   skip ahead if not 0
        mov  @shieldsOn, r1       else call shields on
        b    *r1
L3
        b    *r11
        .size   playmv, .-playmv
        ref     shieldsOn
        ref     oldshield
        ref     shield
/home/tursilion/newtms9900-gcc/newgcc9900/bin/tms9900-gcc -Os -std=c99 -c -s --save-temp -fno-builtin -fno-function-cse -o playmv.o playmv.c

 

I haven't checked if there's a newer commit than the last released one... am I behind?

 

Edited by Tursi
remove unnused f18a, fix comment
Link to comment
Share on other sites

I'm still learning to make sense of the RTL, but it looks like the issue might be coming from peep-movhi-cmphi, which replaces a mov @oldshield,r1 / c @shield,r1 with c @shield,@oldshield. But the RTL still thinks that oldshield was moved into R1 and tries to use it later:

 

; tsthi-10 : (insn 10 9 11 playmv.c:7 (set (cc0)
;         (reg:HI 1 r1 [orig:21 oldshield.1 ] [21])) 3 {tsthi} (expr_list:REG_DEAD (reg:HI 1 r1 [orig:21 oldshield.1 ] [21])
;         (nil)))

    mov  r1, r0

 

Disabling peepholes does indeed generate correct code:

 

playmv
	mov  @oldshield, r1
	c    @shield, r1
	jeq  L3
	mov  r1, r0
	jne  L3
	mov  @shieldsOn, r1
	b    *r1
L3
	b    *r11

 

It appears the qi version of the peephole might suffer the same problem. Is there a way to tell the compiler hey, we can only use this if we don't need the side effect?

 

Edited by Tursi
Link to comment
Share on other sites

10 hours ago, Tursi said:

I haven't checked if there's a newer commit than the last released one... am I behind?

There is a 1.31 branch with some unreleased fixes but nothing significant.  I need to merge and release those at some point.

 

1 hour ago, Tursi said:

I'm still learning to make sense of the RTL, but it looks like the issue might be coming from peep-movhi-cmphi, which replaces a mov @oldshield,r1 / c @shield,r1 with c @shield,@oldshield. But the RTL still thinks that oldshield was moved into R1 and tries to use it later:

Cool, good analysis, thanks.  I'd say I'm missing a check to see if R1 (operand[0]) is dead.  I just ran a quick check with this patch and it seems to work better:

 

iff --git a/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.md b/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.md
index 03273a0..71aeebe 100644
--- a/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.md
+++ b/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.md
@@ -288,7 +288,7 @@
    (set (cc0)
         (compare (match_operand:HI 2 "memory_operand" "")
                  (match_dup 0)))]
-  ""
+  "peep2_reg_dead_p(3, operands[0])"
   [(set (cc0) 
         (compare (match_dup 2)
                  (match_dup 1)))]

 

I don't know what the first parameter (3) means though which concerns me slightly.  I'll try to figure that out and test a fix.

2 hours ago, Tursi said:

Is there a way to tell the compiler hey, we can only use this if we don't need the side effect?

Doing the reg dead check should ensure we only remove registers that are not needed later on, so I'll double check to make sure any dropped regs are checked in the peepholes.

  • Like 3
Link to comment
Share on other sites

12 hours ago, jedimatt42 said:

are int arrays always word aligned? what else is always word aligned? 

Yes I believe so.  Any 16-bit value will be 16-bit aligned.  Also pointers, struct starts and ends, function code, function parameters, etc.  There are a bunch of defines in tms9900.h, around line 160, that control these alignments.  The only alignment defined is 16-bit.  I don't think it matters if 32-bit values or floats are not 32-bit aligned though they should still be 16-bit aligned (well maybe no need for floats actually).

  • Like 1
  • Thanks 1
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...