Tursi Posted December 19, 2023 Share Posted December 19, 2023 13 hours ago, khanivore said: (I'm also not sure why it feels it has to upgrade 8-bit values to 16-bit values when doing binary AND. It doesn't do that for OR ops) Well, it has to if it's ANDing with an INT, which was my case Will give this a shot tonight Quote Link to comment Share on other sites More sharing options...
Tursi Posted December 19, 2023 Share Posted December 19, 2023 Finished my testing, I have nothing new to report. Everything seemed to work and I've pushed updates to all my repos (nothing to worry about unless you pulled down the versions over the last couple days with R15 as the stack pointer). My new libti99all still has trouble in 64 column mode, but I'm pretty confused by that since it's the same printf code running on every mode. But, it only happens on the new compiler, on the new libti99, and only in 64 column mode. I'll eventually figure it out but with nothing to give you right now let's assume it's my bug - SDCC also chokes on it for some different reason. I can't test Super Space Acer as the code currently doesn't link (for a known fault that's all mine), but I'll be working on that over the next little bit. So for now, good work and thank you! 4 Quote Link to comment Share on other sites More sharing options...
PeteE Posted December 20, 2023 Share Posted December 20, 2023 Regarding the size of the generated code, here are some instruction comparisons I made from my C program compiled with GCC patch 1.19 (on the left) versus 1.27 (on the right.) Quote Link to comment Share on other sites More sharing options...
JasonACT Posted December 21, 2023 Share Posted December 21, 2023 On 12/19/2023 at 8:42 PM, Tursi said: My new libti99all still has trouble in 64 column mode, but I'm pretty confused by that since it's the same printf code running on every mode. But, it only happens on the new compiler, on the new libti99, and only in 64 column mode. I'll eventually figure it out but with nothing to give you right now let's assume it's my bug vdpchar64() has cached last_offset in a local static, but you've not cached buf[4] in a local static. 2 Quote Link to comment Share on other sites More sharing options...
+khanivore Posted December 21, 2023 Share Posted December 21, 2023 17 hours ago, PeteE said: Regarding the size of the generated code, here are some instruction comparisons I made from my C program compiled with GCC patch 1.19 (on the left) versus 1.27 (on the right.) Thanks @PeteE, I am looking into those. The first two are easy fixes. ANDI+CI is definitely less efficient than LI/SETO+CB in compqi. I'm not sure where the last two are from but looks like a bit of paranoia around byte ordering that could be improved. Will look into it as well. If you have C fragments that generate these sequences that would be very helpful. 1 Quote Link to comment Share on other sites More sharing options...
Appeelicious Posted December 21, 2023 Share Posted December 21, 2023 I want to read more about what GCC is, but first: what does it stand for? Quote Link to comment Share on other sites More sharing options...
TheMole Posted December 21, 2023 Share Posted December 21, 2023 GNU Compiler Collection. And GNU stands for GNU is Not Unix... 1 Quote Link to comment Share on other sites More sharing options...
Tursi Posted December 21, 2023 Share Posted December 21, 2023 (edited) 13 hours ago, JasonACT said: vdpchar64() has cached last_offset in a local static, but you've not cached buf[4] in a local static. hahah! Brilliant! That was it. I didn't write the 64 column code, and I was trying to touch some of it up when I ported it, and that was where I broke it. Thanks! That makes my day (I didn't know anyone had even found the new repo yet... ) (Doesn't fix the Coleco build, but, there's something deeper going wrong there. I haven't spent any time looking at it yet cause I want to rewrite testlib anyway.) Edited December 21, 2023 by Tursi 5 Quote Link to comment Share on other sites More sharing options...
JasonACT Posted December 24, 2023 Share Posted December 24, 2023 On 12/21/2023 at 10:10 PM, khanivore said: If you have C fragments that generate these sequences that would be very helpful. volatile unsigned char b; volatile unsigned int i; // Compile with: -Os void test () { if (b == 0xFF) { // andi + ci b = 1; } else { b = (i >> 8); // srl + swpb + mov if (b == 8) // andi + ci b = 1; } b &= 0x07; // li + szcb } -Os output: Spoiler pseg even def test test movb @b, r1 andi r1, >ff00 ci r1, >FF00 jne L2 li r1, >100 movb r1, @b jmp L3 L2 mov @i, r1 srl r1, >8 swpb r1 movb r1, @b movb @b, r1 andi r1, >ff00 ci r1, >800 jne L3 li r2, >100 movb r2, @b L3 movb @b, r1 li r2, >F8FF szcb r2, r1 movb r1, @b b *r11 .size test, .-test cseg even def b b bss 1 even def i i bss 2 -O2 output: (horrible, in so many ways) Spoiler pseg even def test test movb @b, r1 andi r1, >ff00 ci r1, >FF00 jeq L6 mov @i, r1 srl r1, >8 swpb r1 movb r1, @b movb @b, r1 andi r1, >ff00 ci r1, >800 jeq L7 movb @b, r1 li r2, >F8FF szcb r2, r1 movb r1, @b b *r11 jmp L8 L7 li r2, >100 movb r2, @b movb @b, r1 li r2, >F8FF szcb r2, r1 movb r1, @b b *r11 jmp L8 L6 li r1, >100 movb r1, @b movb @b, r1 li r2, >F8FF szcb r2, r1 movb r1, @b b *r11 L8 .size test, .-test cseg even def b b bss 1 even def i i bss 2 Quote Link to comment Share on other sites More sharing options...
JasonACT Posted December 24, 2023 Share Posted December 24, 2023 Where shifts use R0, and it uses ANDI R0,>F first, I don't think there's a need to MOV R0,R0 (for the test for zero) because ANDI has already done that test. Quote Link to comment Share on other sites More sharing options...
Tursi Posted December 24, 2023 Share Posted December 24, 2023 (edited) I found this bug a few days ago, but it took me until now to fully understand it and be sure it's a compiler bug. I am pretty sure this is an old one. I have a simple difference function that looks like this: // return the distance between a and b unsigned char distance(unsigned char a, unsigned char b) { if (a>b) return a-b; else return b-a; } This compiles to this assembly code: cb r1,r2 compare a to b jle aed4 jump if less than or equal sb r2,r1 was greater than, so subtract r2 from r1, result in r1 jmp aed8 go to return (r1 is return value) aed4: sb r2,r1 subtract r2 from r1 again neg r1 but negate (not invert!) the result so the sign is right <--- BUG because the LSB is undefined aed8: ret back to caller The problem is the NEG in the second half there. Because the least significant byte is not necessarily initialized, the effect on the most significant byte is going to be just an INV most of the time. I noticed it when my values were equal, but it seems it would be off-by-one in half the cases. NEG is described as getting the two's complement by inverting, and then adding one. So if we walk through my particular case, imagine R1 and R2 have previous garbage 'XXXX'. a and b are loaded, so we have "aaXX" and "bbXX". Assuming a and b are equal, we jump into the second case there. After the subtract, we have "00XX". Now the NEG runs to fix the sign, NEG of >0000 is zero (inverts to >FFFF, add one back to >0000), but because the least significant byte is unknown, that only works if we entered with it cleared. In this case "00XX" becomes "FFYY", add one and it's "FFYZ". The MSB has not been changed at all, so we return >FF instead of >00. I guess the simple fix is getting it to clear the least significant byte before the NEG, but why doesn't it just "sb r1,r2" instead of the two-step dance? Okay, while implementing my own workaround it became clear. The compiler only expects r1 to change, not r2. I just added an andi r1,>ff00 to my version. Edited December 24, 2023 by Tursi Quote Link to comment Share on other sites More sharing options...
Tursi Posted December 24, 2023 Share Posted December 24, 2023 Second question -- is there a way to specify a byte register as input to inline assembly? I even dared to peek in the compiler TMS9900 code but it looks like the answer is no. The reason is I attempted to rewrite my code there in assembly, but passing the value as "r"(a) made the compiler copy it to a new register and shift it to make an int of it, which is unecessary. (I can work around this, but it struck me as interesting there wasn't a register class for it? (Is that even the right term? )) Quote Link to comment Share on other sites More sharing options...
+khanivore Posted December 26, 2023 Share Posted December 26, 2023 On 12/19/2023 at 5:00 AM, Tursi said: Well, it has to if it's ANDing with an INT, which was my case Will give this a shot tonight Forgot to mention that I had tried changing the param to a char but it generated the same code. Weirdly gcc-13 does generate proper byte opcodes if all params are bytes. Quote Link to comment Share on other sites More sharing options...
+khanivore Posted December 26, 2023 Share Posted December 26, 2023 On 12/24/2023 at 5:03 AM, JasonACT said: Where shifts use R0, and it uses ANDI R0,>F first, I don't think there's a need to MOV R0,R0 (for the test for zero) because ANDI has already done that test. Yep, thanks, had already spotted that, mov will be removed Quote Link to comment Share on other sites More sharing options...
+khanivore Posted December 26, 2023 Share Posted December 26, 2023 On 12/24/2023 at 12:56 AM, JasonACT said: b = (i >> 8); // srl + swpb + mov Ok, but in this case it is doing exactly what you asked. shift an int, truncate it to a byte and store the result. I had thought the srl 8 was coming from an extend of char to int and then swpb was coming from trunc which would be redundant. Quote Link to comment Share on other sites More sharing options...
+khanivore Posted December 26, 2023 Share Posted December 26, 2023 On 12/24/2023 at 9:42 PM, Tursi said: The problem is the NEG in the second half there. Because the least significant byte is not necessarily initialized, the effect on the most significant byte is going to be just an INV most of the time. I noticed it when my values were equal, but it seems it would be off-by-one in half the cases. NEG is described as getting the two's complement by inverting, and then adding one. So if we walk through my particular case, imagine R1 and R2 have previous garbage 'XXXX'. a and b are loaded, so we have "aaXX" and "bbXX". Assuming a and b are equal, we jump into the second case there. After the subtract, we have "00XX". Now the NEG runs to fix the sign, NEG of >0000 is zero (inverts to >FFFF, add one back to >0000), but because the least significant byte is unknown, that only works if we entered with it cleared. In this case "00XX" becomes "FFYY", add one and it's "FFYZ". The MSB has not been changed at all, so we return >FF instead of >00. I guess the simple fix is getting it to clear the least significant byte before the NEG, but why doesn't it just "sb r1,r2" instead of the two-step dance? Okay, while implementing my own workaround it became clear. The compiler only expects r1 to change, not r2. I just added an andi r1,>ff00 to my version. I've had a TODO on that piece of code for ages to understand why we emit a NEG. It seems we are telling the compiler we will accept operands to a subtract in either order and we will negate the result if you give them to us in the wrong order . Which seems wrong to me. I'd prefer to tell gcc not to do that. It may save GCC from emitting a MOV to another reg but adding in the NEG surely obviates that. And of course, neg is 16-bit so will mess up byte subtracts as you said. (define_insn "subqi3" [(set (match_operand:QI 0 "nonimmediate_operand" "=rR>,Q,rR>,Q,rR>,Q,rR>,Q") (minus:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,0,0,rR>,rR>,Q,Q") (match_operand:QI 2 "nonimmediate_operand" "rR>,rR>,Q,Q,0,0,0,0"))) (clobber (reg:CC CC_REGNUM))] "" { tms9900_debug_operands ("subqi3", operands, 3); if(which_alternative < 4) { output_asm_insn("sb %2, %0",operands); } else { output_asm_insn("sb %1, %0",operands); output_asm_insn("neg %0",operands); } return(""); } [(set_attr "length" "2,4,4,6, 4,6,8,10")]) Quote Link to comment Share on other sites More sharing options...
+khanivore Posted December 26, 2023 Share Posted December 26, 2023 On 12/24/2023 at 10:10 PM, Tursi said: Second question -- is there a way to specify a byte register as input to inline assembly? I even dared to peek in the compiler TMS9900 code but it looks like the answer is no. The reason is I attempted to rewrite my code there in assembly, but passing the value as "r"(a) made the compiler copy it to a new register and shift it to make an int of it, which is unecessary. (I can work around this, but it struck me as interesting there wasn't a register class for it? (Is that even the right term? )) Hmm. Not that I know. The predicates in the md file have types QI, HI, SI, DI for 8,16,32,64. But the constraints just have "r" for reg. There are register classes, but they are for register usage like "general", "shift operand", "CRU", etc. Quote Link to comment Share on other sites More sharing options...
Tursi Posted December 26, 2023 Share Posted December 26, 2023 4 hours ago, khanivore said: I've had a TODO on that piece of code for ages to understand why we emit a NEG. It seems we are telling the compiler we will accept operands to a subtract in either order and we will negate the result if you give them to us in the wrong order . Which seems wrong to me. I'd prefer to tell gcc not to do that. It may save GCC from emitting a MOV to another reg but adding in the NEG surely obviates that. And of course, neg is 16-bit so will mess up byte subtracts as you said. I dunno about that. After trying to do it myself a few ways, I started to appreciate this solution as elegant. MOV to a new register may only be one instruction, but you can't predict how many instructions were required to make that register available or whether having it unchanged would improve other code. Starting the function with ANDI R1,>FF00 solves the issue and doesn't cost much. That said, I guess r2 is passed in there anyway, having it destroyed probably won't hurt most of the time, but you'll need to move back to R1 anyway, so that path is still two instructions long. Quote Link to comment Share on other sites More sharing options...
JasonACT Posted December 27, 2023 Share Posted December 27, 2023 8 hours ago, khanivore said: Ok, but in this case it is doing exactly what you asked. shift an int, truncate it to a byte and store the result. I had thought the srl 8 was coming from an extend of char to int and then swpb was coming from trunc which would be redundant. Ok, but I tried it in 1.19 and it knew to optimise it. Quote Link to comment Share on other sites More sharing options...
+khanivore Posted December 27, 2023 Share Posted December 27, 2023 8 hours ago, JasonACT said: Ok, but I tried it in 1.19 and it knew to optimise it. Well, 1.19 was missing the extends from char to int so a lot of times it gave wrong results for char operations. I haven't built 1.19 to check but it may be a case of it just happened to work. I don't think it optimised it. Quote Link to comment Share on other sites More sharing options...
JasonACT Posted December 27, 2023 Share Posted December 27, 2023 1 hour ago, khanivore said: Well, 1.19 was missing the extends from char to int so a lot of times it gave wrong results for char operations. I haven't built 1.19 to check but it may be a case of it just happened to work. I don't think it optimised it. This is going from an int to a char though, I suspect (as I've never really looked at a GCC .md file before, so I have very few clues here) that it has something to do with the optimisation near the bottom of the .md file, where you've added a comment "TODO will this help the case where char c = (int)x >> 8; ??" - my suspicion being it was optimising before in 1.19 but now isn't for some reason. But I really don't know. Quote Link to comment Share on other sites More sharing options...
+khanivore Posted December 27, 2023 Share Posted December 27, 2023 1 hour ago, JasonACT said: This is going from an int to a char though, I suspect (as I've never really looked at a GCC .md file before, so I have very few clues here) that it has something to do with the optimisation near the bottom of the .md file, where you've added a comment "TODO will this help the case where char c = (int)x >> 8; ??" - my suspicion being it was optimising before in 1.19 but now isn't for some reason. But I really don't know. Yeah, could be. TBH I haven't look in-depth at the peepholes yet. I wanted to make sure we have functioning code first and then fast/small code second. But we're getting to the point now that the code works but can be improved so this is the right time. 4 Quote Link to comment Share on other sites More sharing options...
Tursi Posted December 31, 2023 Share Posted December 31, 2023 I'm gonna have to ask for help on this one! Troubleshooting a crash in Super Space Acer, I have a case of this code: if (sSkillText[txtIdx][0] == '\0') { Where sSkillText is an array of pointers to const char. The array is in ROM and the strings themselves are also in ROM. char * const sSkillText[] = { The code is assuming the pointer indexed is always valid and checking if it's empty string. GCC wants to test if the first byte is NUL, so it gets the pointer, and moves the byte to itself: mov *r3,r2 get the string pointer from the array movb *r2,*r2 test if the first byte is zero <--- BUG, but not really! Unfortunately, this string is in bank-switched ROM, so the write causes a bank switch to occur and the game crashes. The code is certainly valid! Is there any way that I can tell from the C or command line side GCC that it's not allowed to write to ROM, not even for an equality test? My suspicion is no, that this instruction sequence isn't telling GCC that it includes a write since it was never assumed there would be side effects? Quote Link to comment Share on other sites More sharing options...
JasonACT Posted December 31, 2023 Share Posted December 31, 2023 38 minutes ago, Tursi said: Is there any way that I can tell from the C or command line side GCC that it's not allowed to write to ROM, not even for an equality test? My suspicion is no, that this instruction sequence isn't telling GCC that it includes a write since it was never assumed there would be side effects? How about: void test () { volatile char c; c = sSkillTest[txtIdx][0]; if (!c) ... } 40 minutes ago, Tursi said: char * const sSkillText[] = { BTW: This is saying your array of pointers is constant, but the strings being pointed to are not. That would require "const char *" but doesn't help here (though it probably should). 1 Quote Link to comment Share on other sites More sharing options...
Tursi Posted December 31, 2023 Share Posted December 31, 2023 (edited) Yeah, I knew someone would call that const bit out, but I was trying to stay focused on the actual problem. Your suggested code might work, since it moves the test to RAM, but I'm not okay with hunting through all my code to change it to move comparisons to temporary variables. Too large a task with too much chance of missing something. Instead I attacked the compiler. ;;------------------------------------- (define_insn "tsthi" [(set (cc0) (match_operand:HI 0 "nonimmediate_operand" "rR,Q"))] "" { tms9900_debug_operands ("tsthi", operands, 1); return("mov %0, %0"); } [(set_attr "length" "2,6")]) ;;------------------------------------- (define_insn "tstqi" [(set (cc0) (match_operand:QI 0 "nonimmediate_operand" "rR,Q"))] "" { tms9900_debug_operands ("tstqi", operands, 1); return("movb %0, %0"); } [(set_attr "length" "2,6")]) ;;------------------------------------- Changed these two to "register_operand" "r" (and length "2"), and the resulting code became: mov *r3, r2 movb *r2, r1 jne L51 Which is about as clean as I could ask for, except for needing a temporary register! I tried to write a peephole optimizer for code like mov *r1,r2 / mov r2,r2, but I didn't see any generated in any of my code, so I left it out. I think the condition code set for MOV works correctly so it knows it's not needed. It does not seem to work correctly for ANDI, though I couldn't see why not as it seems to be included in the same place. In analyzing the code, I don't think the mechanism of LI R3,<address> / BL *R3 pays off very often. I see it used a lot for a single instance, and even repeatedly in the same function. It costs two more bytes and 16 more cycles than just BL @<address>. I tried changing the call functions to only accept an address, but that failed to build... I think because the compiler was trying to fit a register call. I couldn't see how to have it not move the function address into a register in the first place... one step at a time, I guess. (Edit: the part that particularly annoyed me makes sense now, so I'm going back to being unsure whether the function call mechanism is good or not... ) Still testing, but I don't think it breaks anything else, so I'll propose that as a change. As nice as it is to be able to test an address for 0 without a register, there's too much chance of side effects when working on modern cartridges. Edited December 31, 2023 by Tursi 1 Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.