+khanivore Posted February 25 Share Posted February 25 I have merged the 1.30 branch to main. I found some issues in the lib1funcs implementations such as not saving some scratch regs. I'm still seeing some failures in corner cases, such as unoptimised arithmetic shift of a 32-bit value by a count of 16 or more, but I'm releasing anyway as these could take some time to work through. Release notes for 1.30 are: gcc patch 1.30 * Pass constants as wides to force_const_mem to avoid assert in combine.c:do_SUBST * Added calls to correct byte order on all byte and word arith and move * Changed inline debug to dump entire insn not just operands * Removed wrongly associative constraints on subtract * Added separate reg constraints to addhi3, andhi3, subhi3 to allow longer lengths for subreg offset fixes * Added more unit tests * Removed constraints in movqi - causes assert in reload * Added 32 bit shift operations * Marked R0 as fixed so allocator won't use it * Added reg saves in lib1funcs as some regs were being trampled 6 2 Quote Link to comment Share on other sites More sharing options...
Tursi Posted February 25 Share Posted February 25 All my code appears to work! 5 1 Quote Link to comment Share on other sites More sharing options...
JasonACT Posted February 26 Share Posted February 26 Ghostbusters is working too, based on playing all the parts that were failing at one time or another (I'm actually not very good at playing this one). Attached is a Windows build of GCC with final patch 1.30: install.zip 5 Quote Link to comment Share on other sites More sharing options...
Tursi Posted February 26 Share Posted February 26 So with that, to answer the earlier question: yes, libti99 is certified on the 1.30 patches. I've updated my readme files to note that. Hopefully have libti99all out to replace it in another month or two... depends on work. 3 Quote Link to comment Share on other sites More sharing options...
+jedimatt42 Posted February 27 Share Posted February 27 (edited) I'm not having as much luck with 1.30, compiling my stuff. I have a lot of literal C strings that are fairly long in my ForceCommand help. For example, the literal string here passed to 'wraptext' function: wraptext("==Run XB program==\n\n" "xb <program-path>\n\n" "Switch a FinalGROM99 to an Extended BASIC cartridge, configured to RUN the specified program.\n" "Default cartridge name is TIXB_G with start address 25474.\n" "Use variable XBMOD to override cartridge name, and XBADDR to override start address.\n"); Notice below in the '.s' file, I get a rogue quote character on a line all by its lonesome (about half way down). LC3 text '==Run XB program==' byte 10 byte 10 text 'xb <program-path>' byte 10 byte 10 text 'Switch a FinalGROM99 to an Ex' text 'tended BASIC cartridge, configured to RUN the specified program.' byte 10 ' text 'Default cartridge name is TIXB_G with start address 25474.' byte 10 text 'Use va' text 'riable XBMOD to override cartridge name, and XBADDR to override ' text 'start address.' byte 10 byte 0 I have a lot of these sort of strings in https://github.com/jedimatt42/fcmd/blob/main/b14_moreHelp.c This is the only occurrence I have seen in the .s, and it is the first of the conjoined string literals. It does not occur in the subsequent literals from the same source file. And this is not the largest of those literal strings either. After commenting it (bulk of b14_moreHelp.c) out, it happens in one other case, my next file full of builtin help text.. But, not the first one (in that file)... in this case the 10th... I'll have to eat some dinner and then see if I can spot what the 2 failing occurrences have in common. Edited February 27 by jedimatt42 clarifications Quote Link to comment Share on other sites More sharing options...
+khanivore Posted February 27 Share Posted February 27 3 hours ago, jedimatt42 said: Notice below in the '.s' file, I get a rogue quote character on a line all by its lonesome (about half way down). I see it. It looks like the fix I made to limit strings to 64 bytes has a bugged corner case. If the 63rd byte is non printable (\n in this case) then it closes the already-closed string and resets the count resulting in an extra closing single quote. It should check if in_text before closing the string when the length is reached. diff --git a/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.c b/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.c index d7b9fb5..42a13e0 100644 --- a/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.c +++ b/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.c @@ -720,7 +720,7 @@ void tms9900_output_ascii(FILE* stream, const char* ptr, int len) if (ISPRINT(c)) { /* End TEXT statement */ - if (count==64) + if (in_text && count==64) { fprintf (stream, "'\n"); in_text = 0; 5 2 Quote Link to comment Share on other sites More sharing options...
+jedimatt42 Posted February 27 Share Posted February 27 Damn, everything compiled... now I have to test it all. Thanks @khanivore 3 2 Quote Link to comment Share on other sites More sharing options...
+khanivore Posted February 27 Share Posted February 27 3 hours ago, jedimatt42 said: Damn, everything compiled... now I have to test it all. Thanks @khanivore Built fine for me too ... once I had paths set up to xdt99 and so on. But it doesn't get far, unfortunately. It bombed almost immediately with a bad opcode after trying to execute BL *R12 when R12 contained >6000. I'm thinking the compiler had a function address stored in R12 but it got trampled by your trampoline or other inline assembly? I built with -fno-function-cse and it got further and printed some junk on the screen, but that was about it. Quote Link to comment Share on other sites More sharing options...
+jedimatt42 Posted February 27 Share Posted February 27 I have some inline assembly that needs to declare the clobbers properly. I plan to compare the .s files from old compiler, and the new. I do a lot of weird things, like type casting my trampoline to the function I want to call so the compiler sets up all the arguments while I then go to a different function, screw with the stack, and then go to the target bank and function. I observed with -Os vs -O2 that the small version of some parts were significantly bigger than the O2 version. This is probably bugs like missing volatiles on my end. Or it could be benign.. we'll see. But at least none of my nearly full banks overflowed. 1 Quote Link to comment Share on other sites More sharing options...
JasonACT Posted February 29 Share Posted February 29 On 2/26/2024 at 4:34 PM, JasonACT said: Attached is a Windows build of GCC with final patch 1.30: Plus the last patch: install.zip 2 Quote Link to comment Share on other sites More sharing options...
Tursi Posted February 29 Share Posted February 29 On a new program I ran into a weird one... I'm getting: /mnt/d/work/libti99ALL/vdp.h:98: error: ‘asm’ operand requires impossible reload make: *** [Makefile:69: tarot.o] Error 1 This is on VDP_SET_ADDRESS_WRITE, which is: inline void VDP_SET_ADDRESS_WRITE(unsigned int x) { __asm__ volatile ( "swpb %0\n\tmovb %0,@>8c02\n\tswpb %0\n\tori %0,>4000\n\tmovb %0,@>8c02\n\tandi %0,>3fff" : : "r"(x) : "cc"); } Sorry for the single line formatting. Now the function it's failing in calls this four times, but only the third instance causes the error: Spoiler I don't see anything weird in the assembly output - it seems to have built it. The third instance is the only one that uses R1, but why would that matter? Spoiler It's getting late so no minimal code at the moment... what does that error actually mean? Quote Link to comment Share on other sites More sharing options...
+khanivore Posted February 29 Share Posted February 29 1 hour ago, Tursi said: It's getting late so no minimal code at the moment... what does that error actually mean? I don't know tbh. The comment in gcc/reload1.c just says: /* If this was an ASM, make sure that all the reload insns we have generated are valid. If not, give an error and delete them. */ but without reproducing it, I can't see what "reload insns" it means. Do you need to declare the condition code clobber? I think that would be implied. Not sure if related though. Quote Link to comment Share on other sites More sharing options...
+khanivore Posted February 29 Share Posted February 29 I have been taking a lot of notes the past few months on what I've learned on the GCC backend. I figured now is as good time as any to collate these into some kind of archive so I created a blog here https://tms9900-gcc.blogspot.com/ if anyone is interested. Comments and corrections welcome. Thanks! 8 1 Quote Link to comment Share on other sites More sharing options...
Tursi Posted March 1 Share Posted March 1 13 hours ago, khanivore said: I don't know tbh. The comment in gcc/reload1.c just says: /* If this was an ASM, make sure that all the reload insns we have generated are valid. If not, give an error and delete them. */ but without reproducing it, I can't see what "reload insns" it means. Do you need to declare the condition code clobber? I think that would be implied. Not sure if related though. I never saw any difference whether I did or didn't, but as I thought I was learning more I was trying to be more diligent. Removing the "cc" doesn't help the error. It's strange that of three instances in the same function, that's the only one, and it's even in the middle - not first or last. Hell, it's even a copy of the first block. Lovely to see even the gcc wiki don't really seem to know... https://gcc.gnu.org/wiki/reload Some forum posts suggest this code was unchanged even quite a while before this version, I saw the same block in a snippet from 3.3, supposedly. So lots of people discussing it, some even with insight. Most of the examples are very complicated, though... I don't see much that leads to resolution. It's odd that it's never happened before and the macro, to me, seems about as simple as you can get. I tried a number of tricks.. it had the same fault with O2, with a return variable, as a #define (well, I was desperate). It's not out of registers, that's all I could determine. When I replaced it with C code, it happily found two registers to use for the operation: VDPWA=vdpdst&0xff; VDPWA=(vdpdst>>8)|0x40; mov r9, r3 swpb r3 movb r3, @>8C02 mov r9, r2 sra r2, >8 ori r2, >40 swpb r2 movb r2, @>8C02 So not ideal, but it works. I also did a read through of the patch changes, but everything made sense enough to me... what little sense I have My code runs, for now, I don't know there's much else here but to use the workaround. 2 Quote Link to comment Share on other sites More sharing options...
+khanivore Posted March 1 Share Posted March 1 6 hours ago, Tursi said: My code runs, for now, I don't know there's much else here but to use the workaround. Yeah, 'fraid so. I'm glad you did find a workaround so probably best to just wait and see does a pattern emerge in time to help us reproduce it. 2 Quote Link to comment Share on other sites More sharing options...
Tursi Posted March 1 Share Posted March 1 42 minutes ago, khanivore said: Yeah, 'fraid so. I'm glad you did find a workaround so probably best to just wait and see does a pattern emerge in time to help us reproduce it. Workarounds are not new here... all my GCC code has at least one. Just glad you're been willing to invest so many hours to get things more stable. I'll take a compiler 'crash' over incorrect code anyday - far faster to debug. 5 3 Quote Link to comment Share on other sites More sharing options...
JasonACT Posted March 2 Share Posted March 2 22 hours ago, khanivore said: wait and see does a pattern emerge in time to help us reproduce it. // tms9900-gcc -S -Os -std=c99 test.c int gColor; inline void VDP_SET_ADDRESS_WRITE(unsigned int x) { __asm__ volatile ("swpb %0\n\tmovb %0,@>8c02\n\tswpb %0\n\tori %0,>4000\n\tmovb %0,@>8c02\n\tandi %0,>3fff" : : "r"(x) : "cc"); } void vdpmemread (int, unsigned char *, int); unsigned char test () { for (int row=0; row<2; ++row) { for (int col=4; col<6; ++col) { unsigned char buf[1]; unsigned char buf2[1]; int vdpsrc=0; int vdpdst=0; vdpmemread(vdpdst, buf2, 1); vdpdst=row*2 + col*2 + gColor; vdpmemread(vdpsrc, buf, 1); vdpmemread(vdpdst, buf2, 1); VDP_SET_ADDRESS_WRITE(vdpdst); } } } Your blog is surprisingly complete. 5 Quote Link to comment Share on other sites More sharing options...
JasonACT Posted March 2 Share Posted March 2 @khanivore This isn't anything you've done, it also happens on patch 1.12A (the special build we made for Ghostbusters from many years ago). It also doesn't happen on GCC 6.3.0 for Windows/x86 which is the native GCC I'm using to build the much older 4.4.0 TI GCC. 4 Quote Link to comment Share on other sites More sharing options...
+khanivore Posted March 2 Share Posted March 2 3 hours ago, JasonACT said: // tms9900-gcc -S -Os -std=c99 test.c Very good. That does reproduce it. I googled and other is some chatter about this error in the GCC mailing lists so it could be a compiler bug? I built using my last gcc13 build and it doesn't appear, so it should go away once we upgrade. I seems to be to do with the "r" constraint and for some reason reload doesn't feel able to alloc a new reg here. If I change it to general "g" it does compile but the output is useless: swpb @>2(r10) movb @>2(r10),@>8c02 swpb @>2(r10) ori @>2(r10),>4000 movb @>2(r10),@>8c02 andi @>2(r10),>3fff Another simple workaround would be to use r0 as a scratch which would also obviate the final andi. 4 hours ago, JasonACT said: Your blog is surprisingly complete. Cheers mate! I'll try to keep it up to date as we find out new stuff. 3 Quote Link to comment Share on other sites More sharing options...
Tursi Posted March 2 Share Posted March 2 34 minutes ago, khanivore said: Another simple workaround would be to use r0 as a scratch which would also obviate the final andi. Ah, that's a pretty good idea. I could just do that instead. 3 Quote Link to comment Share on other sites More sharing options...
+jedimatt42 Posted March 3 Share Posted March 3 Just to make sure I understand the rules: r0, r10, and r11 are the only 'reserved' registers. Consequently, declaring them in clobbers is pointless. No harm, no help. Does r0's usage ever leak ? Such that I need to preserve it in my __asm__ blocks? I see it is often used as an index into the stack without being preserved at the beginning of a function... The only other usages I observe are replacements of ci register, 0 with mov register, r0 It looks like 'no', r0 is never read from, except during the sequence of stashing things onto the stack. Can this assumption be confirmed? --- Aside, preserving r12 in ForceCommand's trampoline for cartridge banking calls fixed fundamentals in my programming model, still very broken, but it gets the banner and chime out before crashing in the user prompt loop Quote Link to comment Share on other sites More sharing options...
JasonACT Posted March 4 Share Posted March 4 5 hours ago, jedimatt42 said: Does r0's usage ever leak ? Such that I need to preserve it in my __asm__ blocks? I see it is often used as an index into the stack without being preserved at the beginning of a function... The only other usages I observe are replacements of ci register, 0 with mov register, r0 It looks like 'no', r0 is never read from, except during the sequence of stashing things onto the stack. Can this assumption be confirmed? I can only talk about the R0 question, as I don't know the answers to the others, but yes R0 has been taken out of the register pool and may be used as a scratch register without preserving its value. It's actually used for variable shifts too, not just the stack, but again its value is never needed once the shift is completed. 2 1 Quote Link to comment Share on other sites More sharing options...
+khanivore Posted March 4 Share Posted March 4 (edited) 11 hours ago, jedimatt42 said: Just to make sure I understand the rules: r0, r10, and r11 are the only 'reserved' registers. Consequently, declaring them in clobbers is pointless. No harm, no help. As @JasonACT said, R0 is assumed to be a clobber, no need to declare it. No insns expect R0 to have a persistent value. R10 is Stack pointer, so must be preserved (easiest option) or declared clobber (though this might confuse gcc because it wouldn't be able to save it if it has no free regs). R11 is already clobbered on entry to a function so no need to preserve it. Unless you are an inline, in which case, yes, do declare it as clobbered or save it. R9 is the frame pointer. It should also be preserved or declared clobbered. It isn't always used, especially if build is optimised, and never if you use -fomit-frame-pointer, but better to preserve or delcare clobbered. You can also clobber as many regs as you have params (up to 8). So if your function has 5 params, these are in R1-R5 and can be clobbered since they are passed by value (edit: I can think you safely use any reg R1-R8 without preserving). Any other regs should be preserved or clobbered in case they are in use by the calling function. Edited March 4 by khanivore 3 1 Quote Link to comment Share on other sites More sharing options...
+khanivore Posted March 5 Share Posted March 5 A follow up on register allocation ... tl;dr "preserving regs creates smaller object code". Using R12-R15 as a test case, if we mark a reg as 0 in the CALL_USED_REGISTERS this means that reg must be "preserved across function call boundaries". Then the function prologue generates code like this: ai r10, >FFF4 mov r10, r0 mov r11, *r0+ mov r9, *r0+ mov r12, *r0+ mov r13, *r0+ mov r14, *r0+ mov r15, *r0 and a function can safely call any other functions it wants without saving any regs, since it knows the called function will save them. OTOH if we don't mark them as preserved then the prologue is much shorter, and instead declares a stack frame .... ai r10, >FFF0 mov r9, @>E(r10) mov r11, @>C(r10) ... because it must save everything it is using into the frame before calling another function like this: mov r3, @>2(r10) mov r4, @>4(r10) mov r5, @>6(r10) mov r6, @>8(r10) mov r13, @>A(r10) bl @f2 I see two issues with this: 1) the code to save the regs on the stack frame is not as efficient since it using labels instead of register indirect; 2) the caller doesn't know what the callee may use so must be conservative and save everything. So it seems to me we would be better marking more regs as preserved and the onus is on each function to save the regs they use. This shouldn't affect inline assembly that declares clobbers. Also I think 8 regs for function params is excessive. If we reduce to 4 and mark R5-R8 as general available, but to be preserved, it seems we can further reduce output code size. A counter argument is that leaf functions have to save what they use if regs are preserved but don't need to save anything if not. So the code for leaf functions does get bigger. But I built libti99 with and without preserved regs though and overall the code is smaller with preserved regs. 2 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted March 5 Share Posted March 5 Outsider curiosity question. Do you need any special incantations to declare a leaf sub-routine? Perhaps the compiler simply looks for any calls within the sub-routine and modifies the entry and exit code? Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.