+khanivore Posted March 5 Share Posted March 5 44 minutes ago, TheBF said: Outsider curiosity question. Do you need any special incantations to declare a leaf sub-routine? Perhaps the compiler simply looks for any calls within the sub-routine and modifies the entry and exit code? Yep, that's exactly what it does. No special incantations needed. Just simply dictated whether you call any other functions or not. 1 1 Quote Link to comment Share on other sites More sharing options...
+jedimatt42 Posted March 10 Share Posted March 10 Looks like I'm having trouble with packed structs. struct __attribute__((__packed__)) DeviceRomHeader { unsigned char flag; unsigned char version; unsigned int reserved1; struct EntryLink* powerlnk; struct NameLink* userlnk; struct NameLink* dsrlnk; struct NameLink* basiclnk; struct EntryLink* interruptlnk; unsigned int* gromsomething; }; When I declare the ROM in an expansion card as that struct, and then read the value of the dsrlnk field, In classic99 ( it has it's own ROM for FIAD access, so I'm using that as reference ), then it reads the wrong value... with lots of byte manipulation instructions... struct DeviceRomHeader* dsrrom = (struct DeviceRomHeader*) 0x4000; The value of dsrrom->dsrlnk should be 0x4010, but I get 0x7010 in r1 before my function call... movb @>4008, r1 ; appears to copy the high byte of dsrrom->dsrlnk to high byte of r1 sla r1, >8 ; slide that over to the high byte? but isn't it already there? so, move garbage from low byte of r1 to high byte from an earlier function call return value. movb @>4009, r14 ; copy the low byte of dsrrom->dsrlnk to r14 high byte. srl r14, 8 ; slide the high byte into the low byte soc r1, r14 ; merge the high byte of r1 and the low byte of r14 to get the value in r14, except, r1 was corrupted already by that second instruction in this snippet. li r2, ab_uint2hex.1166 mov r2, @trampdata ; stash the call metadata ( banks and target address ) mov r14, r1 ; move r14 into argument 1 of the function call. r14 was wrong, so r1 will be wrong too. bl *r9 ; call the function that ( r9 will have the address of my trampoline in it, but that isn't relevant. ) I've been compiling with -O2 I believe the second instruction isn't accounting for the prior state of r1. Quote Link to comment Share on other sites More sharing options...
+khanivore Posted March 10 Share Posted March 10 (edited) 9 hours ago, jedimatt42 said: When I declare the ROM in an expansion card as that struct, and then read the value of the dsrlnk field, In classic99 ( it has it's own ROM for FIAD access, so I'm using that as reference ), then it reads the wrong value... with lots of byte manipulation instructions... Thanks - good catch. It looks like for some strange reason gcc thinks it must access elements in a packed struct byte-by-byte, even though they are 16-bit values and are already aligned. I'll have to look into why it does that. The erroneous SLA is coming from GCC itself. Because it thinks any byte value can be treated as a word value, it thinks it needs to move the LSB of r1 to the MSB before SOC, but of course in TMS9900 it already is in the MSB. I can see an offset of -1 in operand[1] to the "ashlhi3" (arith left shift) insn so I'll add a call to my offset checking function there and see if I can eliminate that shift. (edit: just ran a quick test without __packed__ and it generated "mov @>4008, @x" so a short-term work-around may be just drop the attribute) Edited March 10 by khanivore 3 1 Quote Link to comment Share on other sites More sharing options...
+jedimatt42 Posted March 10 Share Posted March 10 Thanks, @khanivore Looking at all my use of __packed__ it seems I used it all over the place without really considering the need. Things seem to be working well with all the __packed__ removed. I don't think any of my structs actually needed it. My environment variable system works, my cached DSR info works, which meant traversing all the ROMs works... actual file IO works. A ton more to test, but that's a good start. Code should be smaller and faster now too. There was a ton of that accessing ints as a pair of bytes stuff going on before... 3 Quote Link to comment Share on other sites More sharing options...
+khanivore Posted March 10 Share Posted March 10 1 hour ago, jedimatt42 said: Looking at all my use of __packed__ it seems I used it all over the place without really considering the need. Things seem to be working well with all the __packed__ removed. I don't think any of my structs actually needed it. My environment variable system works, my cached DSR info works, which meant traversing all the ROMs works... actual file IO works. A ton more to test, but that's a good start. Code should be smaller and faster now too. There was a ton of that accessing ints as a pair of bytes stuff going on before... Yeah I've looked through the gcc code but I'm not seeing any obvious way to change this behaviour. It seems that if you declare a struct as __packed__ then gcc says ok fine I will make no assumptions whatsoever about alignment and fetch everything one byte at time. I think it's highly unlikely that any structs used by the console or others will contain unaligned 16-bit ints so it should be safe to omit the __packed__ attribute. 2 Quote Link to comment Share on other sites More sharing options...
+khanivore Posted March 10 Share Posted March 10 Also, just to add that while the SLA is not correct, it isn't as simple as just removing it 'cos then the LSB of r1 will contain junk which will propagate through to the SOC and the address will still be wrong. So I'd need to emit ANDI r1,0xFF or something which will bloat the code even more. Maybe best just to make this one an errata. I had a try at creating a peephole but not easy to match up all the QI and HI operands. 3 Quote Link to comment Share on other sites More sharing options...
+jedimatt42 Posted March 10 Share Posted March 10 -Os works for me too. I recall when I started with gcc for the TI, that it didn't. For ForceCommand this shaved 10k of instructions out. It was around 80k. I tried the no-function-cse option, and in my case since most of my function calls are to my trampoline, it actually grew in size a little tiny bit (less than 256 bytes). I was shocked how small the difference was. So I guess I save a word per call by having the function address in a register, but then I have more register gymnastics to make up for that. 1 Quote Link to comment Share on other sites More sharing options...
+jedimatt42 Posted March 18 Share Posted March 18 This is NOT the bug I'm chasing, it turns out... but a clever compiler doing the right thing. It didn't match the patterns I was expecting, but in trying to explain it, I realize this is just smart assembly that it generated. I'm posting this anyway cause maybe it'll help someone else when reading their .s files... The following is written before I realized... --------------------------- Ok, everything was pretty happy, but I'm chasing a bug I believe the optimizer (-Os) is introducing into VIRGIL99 ( Gemini browser for ForceCommand ) So, I have this sys call macro for defining API calls from external executables into the ForceCommand API FROM: https://github.com/jedimatt42/fcmd/blob/e3c3f68cff5d66b1317333f6f79f681f4162695b/fc_api_template#L14 #define FC_SYS *(int *)0x2000 #define DECL_FC_API_CALL(index, func, return_type, arg_sig, args) \ static inline return_type func arg_sig \ { \ __asm__("li r0,%0 ; " #func \ : \ : "i"(index)); \ return_type(*tramp) arg_sig = (return_type(*) arg_sig)FC_SYS; \ return tramp args; \ } My fcsdk code generator produces fc_api.h with declarations using this macro : #define FC_STRCPY 0x2e ... // function: char * fc_strcpy(char *d, const char *s) DECL_FC_API_CALL(FC_STRCPY, fc_strcpy, char *, (char *d, const char *s), (d, s)) I call a lot of fc_strcpy in virgil99's b0_keyboard.c So it factored the code generated by my macro, into a function it calls fc_strcpy, instead of in-lining it everywhere it is used. However, the function generated uses a 'b' instead of 'bl' to call my trampoline ( the result of the line in the macro "return tramp args;" ( looks funny but the args parameter to the macro has the parenthesis that make it a function call. ) 'tramp' is a function pointer to my cartridge entry point for api calls... the address of which is stored at 0x2000 So, most of the time, the function is in-lined, and the code generated looks something like: * Begin inline assembler code * 576 "/home/matthew/dev/gitlab/jedimatt42/fcmd/example/gcc/fcsdk/fc_api.h" 1 li r0,>3B ; fc_strset * 0 "" 2 * End of inline assembler code mov r10, r14 ai r14, >A li r3, >100 clr r2 mov r14, r1 mov @>2000, r4 bl *r4 this is good... this one works... The API ID is loaded into r0, then arguments for the real function call are setup, and then my trampoline is called by the 'bl *r4' - this trampoline consumes r0 before any C usages of r0 occur... But that 'extracted' fc_strcpy doesn't generate correctly: fc_strcpy * Begin inline assembler code * 537 "/home/matthew/dev/gitlab/jedimatt42/fcmd/example/gcc/fcsdk/fc_api.h" 1 li r0,>2E ; fc_strcpy * 0 "" 2 * End of inline assembler code mov @>2000, r3 b *r3 The 'b' instead of 'bl' at the end, and then there is no return either... The assembly calling this looks ok: mov r14, r2 li r1, state+286 bl @fc_strcpy Oh, no, I'm totally wrong... by using a trivial 'b', r11 from the bl @fc_strcpy is still set to return to that caller, and all of this should work... 3 Quote Link to comment Share on other sites More sharing options...
+jedimatt42 Posted March 18 Share Posted March 18 I found my bug, and it was mine alone... The compiler will issue calls to memcpy when performing things like literal string assignments on the stack... so I tuck the memcpy routine into the pre-amble of all my cartridge banks. But for my external executables, I did something dumb, and had the memcpy symbol literally defined in the linkfile. Now that the compiler has changed, that address was invalid. Things work once it is set to the correct address. I need to find a way to define that dynamically... Each executable I've written for ForceCommand has a clone of this bad linkfile. Edit: Looks like linker scripts can use INCLUDE, so my sdk should probably just include a generated linkerscript to include... 4 Quote Link to comment Share on other sites More sharing options...
+khanivore Posted March 18 Share Posted March 18 12 hours ago, jedimatt42 said: by using a trivial 'b', r11 from the bl @fc_strcpy is still set to return to that caller, and all of this should work... Yeah that's a tailcall optimisation. If gcc sees that you are calling a function with a similar return to the current function, it does a B to the function instead of a BL to reduce the code size. 1 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted March 18 Share Posted March 18 1 hour ago, khanivore said: Yeah that's a tailcall optimisation. If gcc sees that you are calling a function with a similar return to the current function, it does a B to the function instead of a BL to reduce the code size. Very cool. 😎 Quote Link to comment Share on other sites More sharing options...
Tursi Posted March 31 Share Posted March 31 (edited) I know we've kind of settled down here, but I've got a new one that's a bit beyond me. It appears to be losing a variable, and this code definitely worked on the older versions, it's a recent bug. I didn't notice it right away cause the effect on the game is really subtle. This is a function that watches for the player's shield counter to change, and if it grows from zero, it calls a function to change the graphics to show the shield, and when it hits zero it calls a different function to revert the graphics. The function to show the shield is now never being called (I didn't notice because I mostly test on the F18A side, and because the colors still change I didn't pay much attention.) It seems like the compiler believes that the value in OLDSHIELD was loaded into R1, but it's never loaded anywhere. Instead the C instruction does a direct comparison memory-to-memory. Could this be an incorrect constraint or something like that on compare? playmvbug.txt Edited March 31 by Tursi Quote Link to comment Share on other sites More sharing options...
Tursi Posted March 31 Share Posted March 31 (edited) It reproduces in this much simpler snippet: extern int shield,oldshield; extern void (*shieldsOn)(); void playmv() { if (shield != oldshield) { if (oldshield == 0) { shieldsOn(); } } } pseg even def playmv playmv c @shield, @oldshield test if shield changed jeq L3 skip ahead if not mov r1, r0 supposed to be testing oldshield against 0, but it thinks R1 <<<<< BUG jne L3 skip ahead if not 0 mov @shieldsOn, r1 else call shields on b *r1 L3 b *r11 .size playmv, .-playmv ref shieldsOn ref oldshield ref shield /home/tursilion/newtms9900-gcc/newgcc9900/bin/tms9900-gcc -Os -std=c99 -c -s --save-temp -fno-builtin -fno-function-cse -o playmv.o playmv.c I haven't checked if there's a newer commit than the last released one... am I behind? Edited March 31 by Tursi remove unnused f18a, fix comment Quote Link to comment Share on other sites More sharing options...
Tursi Posted March 31 Share Posted March 31 (edited) I'm still learning to make sense of the RTL, but it looks like the issue might be coming from peep-movhi-cmphi, which replaces a mov @oldshield,r1 / c @shield,r1 with c @shield,@oldshield. But the RTL still thinks that oldshield was moved into R1 and tries to use it later: ; tsthi-10 : (insn 10 9 11 playmv.c:7 (set (cc0) ; (reg:HI 1 r1 [orig:21 oldshield.1 ] [21])) 3 {tsthi} (expr_list:REG_DEAD (reg:HI 1 r1 [orig:21 oldshield.1 ] [21]) ; (nil))) mov r1, r0 Disabling peepholes does indeed generate correct code: playmv mov @oldshield, r1 c @shield, r1 jeq L3 mov r1, r0 jne L3 mov @shieldsOn, r1 b *r1 L3 b *r11 It appears the qi version of the peephole might suffer the same problem. Is there a way to tell the compiler hey, we can only use this if we don't need the side effect? Edited March 31 by Tursi Quote Link to comment Share on other sites More sharing options...
+khanivore Posted March 31 Share Posted March 31 10 hours ago, Tursi said: I haven't checked if there's a newer commit than the last released one... am I behind? There is a 1.31 branch with some unreleased fixes but nothing significant. I need to merge and release those at some point. 1 hour ago, Tursi said: I'm still learning to make sense of the RTL, but it looks like the issue might be coming from peep-movhi-cmphi, which replaces a mov @oldshield,r1 / c @shield,r1 with c @shield,@oldshield. But the RTL still thinks that oldshield was moved into R1 and tries to use it later: Cool, good analysis, thanks. I'd say I'm missing a check to see if R1 (operand[0]) is dead. I just ran a quick check with this patch and it seems to work better: iff --git a/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.md b/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.md index 03273a0..71aeebe 100644 --- a/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.md +++ b/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.md @@ -288,7 +288,7 @@ (set (cc0) (compare (match_operand:HI 2 "memory_operand" "") (match_dup 0)))] - "" + "peep2_reg_dead_p(3, operands[0])" [(set (cc0) (compare (match_dup 2) (match_dup 1)))] I don't know what the first parameter (3) means though which concerns me slightly. I'll try to figure that out and test a fix. 2 hours ago, Tursi said: Is there a way to tell the compiler hey, we can only use this if we don't need the side effect? Doing the reg dead check should ensure we only remove registers that are not needed later on, so I'll double check to make sure any dropped regs are checked in the peepholes. 3 Quote Link to comment Share on other sites More sharing options...
+jedimatt42 Posted March 31 Share Posted March 31 word alignment question... char arrays on the stack do not appear to be word aligned... makes sense. are int arrays always word aligned? what else is always word aligned? Quote Link to comment Share on other sites More sharing options...
+khanivore Posted April 1 Share Posted April 1 12 hours ago, jedimatt42 said: are int arrays always word aligned? what else is always word aligned? Yes I believe so. Any 16-bit value will be 16-bit aligned. Also pointers, struct starts and ends, function code, function parameters, etc. There are a bunch of defines in tms9900.h, around line 160, that control these alignments. The only alignment defined is 16-bit. I don't think it matters if 32-bit values or floats are not 32-bit aligned though they should still be 16-bit aligned (well maybe no need for floats actually). 1 1 Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.