JasonACT Posted December 31, 2023 Share Posted December 31, 2023 2 hours ago, Tursi said: Which is about as clean as I could ask for, except for needing a temporary register! Isn't R0 always free for something like this? Quote Link to comment Share on other sites More sharing options...
Tursi Posted December 31, 2023 Share Posted December 31, 2023 Here's a clever one... hchar(11, 0, 32, 32); hchar() is just a macro that wraps vdpmemset... #define hchar(r, c, ch, cnt) vdpmemset(gIMAGE+((r)<<5)+(c), ch, cnt) ... which has this prototype: void vdpmemset(int pAddr, unsigned char ch, int cnt); The code becomes: li r3,>0020 r3 (integer) gets a count of 32 movb r3,r2 the count is meant to be copied to r2, but r2 is a byte <--- BUG li r1,>0160 the address is calculated correctly and set bl @vdpmemset function called r2 is getting the wrong byte of r3 I was able to confirm this is running through movqi by changing what it emits, but I don't know exactly why, since R3 should have been a HI, and it's constrained to QI. It's a pretty small thing and didn't break this code, but it's kind of insidious. The good news is a good 90% of Super Space Acer is running fine with these changes... and it's a pretty big codebase built on a completely foreign compiler. But will need help on this one too. Quote Link to comment Share on other sites More sharing options...
JasonACT Posted December 31, 2023 Share Posted December 31, 2023 1 hour ago, Tursi said: r2 is getting the wrong byte of r3 I was able to confirm this is running through movqi by changing what it emits, but I don't know exactly why ;;------------------------------------------------------------------- ;; Move byte value (define_insn "movqi" [(set (match_operand:QI 0 "nonimmediate_operand" "=rR>,rR>,Q, Q,r, r") (match_operand:QI 1 "general_operand" "rR>, Q, rR>,Q,OM,i"))] "" { tms9900_debug_operands ("movqi", operands, 2); tms9900_inline_debug ("; movqi alt=%d\n", which_alternative); int val; if (which_alternative >= 4) { val = INTVAL(operands[1]) << 8; tms9900_inline_debug ("; movqi imm MUL const %d * 256\n", val); if (val == 0) return("clr %0"); else if ((val & 0xff00) == 0xff00) return("seto %0"); else { rtx args[2]; args[0] = operands[0]; args[1] = GEN_INT(val); output_asm_insn ("li %0, %1", args); return ""; } } else { if (REG_OFFSET (operands[1]) == 1) { output_asm_insn ("mov %1, %0", operands); output_asm_insn ("swpb %0", operands); return ""; } else return ("movb %1, %0"); } } Maybe because the movb at the bottom of the code-insn didn't check for the source operand offset? I've added a check here for a 1 byte offset, and that might work. I'll leave it for smarter people than me to mess with the length attribute (which I didn't modify or include here). 1 Quote Link to comment Share on other sites More sharing options...
+khanivore Posted December 31, 2023 Share Posted December 31, 2023 10 hours ago, Tursi said: Unfortunately, this string is in bank-switched ROM, so the write causes a bank switch to occur and the game crashes. The code is certainly valid! Is there any way that I can tell from the C or command line side GCC that it's not allowed to write to ROM, not even for an equality test? My suspicion is no, that this instruction sequence isn't telling GCC that it includes a write since it was never assumed there would be side effects? Ah, interesting side effect! I never thought of that. This is being emitting by the insn "tstqi" which the GCC emits when it wants to test if a value is zero. I could constrain it so this test can only be performed on registers, or maybe do movb to a scratch, I can't think of any way to tell gcc about ROM. These steps are done way before any actual addresses are assigned for instructions. We could include a compiler switch to select different opcode for ROM or RAM ... but maybe easier to find a non destructive test that works for either. Quote Link to comment Share on other sites More sharing options...
+khanivore Posted December 31, 2023 Share Posted December 31, 2023 5 hours ago, JasonACT said: Isn't R0 always free for something like this? Great minds think alike! I am already doing the same elsewhere in the code. In the gcc13 port, the compiler asserts on a shift because it doesn't know how to move between reg classes of general and shiftcount. So I just stopped telling it R0 exists at all and use it as a permanent private scratch register. 2 1 Quote Link to comment Share on other sites More sharing options...
+khanivore Posted December 31, 2023 Share Posted December 31, 2023 4 hours ago, Tursi said: r2 is getting the wrong byte of r3 I was able to confirm this is running through movqi by changing what it emits, but I don't know exactly why, since R3 should have been a HI, and it's constrained to QI. It's a pretty small thing and didn't break this code, but it's kind of insidious. The good news is a good 90% of Super Space Acer is running fine with these changes... and it's a pretty big codebase built on a completely foreign compiler. But will need help on this one too. Sounds like probably another case of gcc using a subreg instead of calling extend, like we saw with "andhi3". If you turn on inline insn dump (#define TMS9900_INLINE_DEBUG 1) in tms9900.c you'll see a print of the operands and which insn is emitting the code. I may just have to add checks of REG_OFFSET anywhere we consume a QI Quote Link to comment Share on other sites More sharing options...
+khanivore Posted December 31, 2023 Share Posted December 31, 2023 7 hours ago, JasonACT said: Maybe because the movb at the bottom of the code-insn didn't check for the source operand offset? I've added a check here for a 1 byte offset, and that might work. I'll leave it for smarter people than me to mess with the length attribute (which I didn't modify or include here). Looks good to me. I'm not sure how valuable the length attribute is. I think it is used to calculate costs when optimising for size, in which case getting it slightly wrong shouldn't break anything, and also to calculate jump offsets. We can't jump more than +254 or -256 so we emit a branch if further away than that. If lengths are conversative conservative (always worst case) then I think the jump offsets should be fine too. 4 Quote Link to comment Share on other sites More sharing options...
+TheBF Posted December 31, 2023 Share Posted December 31, 2023 You might have a typo there (+254 -256) 9900 native jump instructions are limited to so +127 .. -128. Or is there even more magic inside GCC? Quote Link to comment Share on other sites More sharing options...
PeteE Posted December 31, 2023 Share Posted December 31, 2023 1 hour ago, TheBF said: You might have a typo there (+254 -256) 9900 native jump instructions are limited to so +127 .. -128. Or is there even more magic inside GCC? +254 -256 bytes is +127 -128 words 2 Quote Link to comment Share on other sites More sharing options...
retrodroid Posted December 31, 2023 Share Posted December 31, 2023 6 hours ago, PeteE said: +254 -256 bytes is +127 -128 words ...and therein lies the very essence of low-level programming on the 4A! lol. 1 1 Quote Link to comment Share on other sites More sharing options...
JasonACT Posted January 1 Share Posted January 1 14 hours ago, khanivore said: If lengths are conversative conservative (always worst case) then I think the jump offsets should be fine too. Yep, I agree, so in that case the 2 should be made a 4 and things should continue to work well. I think I also understand the 1.19 srl/swpb optimisation now too, which is actually in a different area to what I first pointed out, and I'm thinking - expanding one of your new peepholes may be a better way to get it working as it was.. now that the code-base is stable. I'm not sure how far along you are though, and I'm back at work tomorrow, so there's less hobby time for me now, sadly. 1 Quote Link to comment Share on other sites More sharing options...
Tursi Posted January 1 Share Posted January 1 (edited) 19 hours ago, khanivore said: Ah, interesting side effect! I never thought of that. This is being emitting by the insn "tstqi" which the GCC emits when it wants to test if a value is zero. I could constrain it so this test can only be performed on registers, or maybe do movb to a scratch, I can't think of any way to tell gcc about ROM. These steps are done way before any actual addresses are assigned for instructions. We could include a compiler switch to select different opcode for ROM or RAM ... but maybe easier to find a non destructive test that works for either. Yeah, the code generated by just constraining it to registers looks pretty good. I tested JasonACT's suggested fix for movqi as well, and it does look like it fixes my other case. ((edit: removed diff)) Gonna go a quick rebuild of everything and see how far that gets me Edited January 1 by Tursi Quote Link to comment Share on other sites More sharing options...
Tursi Posted January 1 Share Posted January 1 Unfortunately, JasonACT's patch made things much worse, because it didn't work only on registers, but also on memory locations. So although I think it's on the right path, that offset is clearly not used the same way for memory addresses (or maybe it doesn't even get set). I traced down one example, and this is the code in SSA that handles moving right. if (KSCAN_JOYX == JOY_RIGHT) { if (playership != SHIP_SELENA) { wrapplayerright(); } if (SHIP_C < 224-playerXspeed) SHIP_C+=playerXspeed; } else if.... Compiles down to this with the patch: movb @SpriteTab+1, r1 get player x as a BYTE mov @playerXspeed, r2 get player speed as a UINT movb r1, r4 copy x srl r4, 8 make int mov r2, r3 copy speed ai r3, >FF20 make 224-speed neg r3 fix up c r4, r3 compare jhe L105 skip out if not enough room swpb r2 make speed a char (sort of) ab r1, r2 add x mov r2, @SpriteTab+1 write it back <<<--- bug, it's a byte, not a word swpb @SpriteTab+1 but then swap it <<<--- bug - don't want this at all jmp L105 done You can see the new code was inserted there when it was time to write the value back to memory. While we can treat a byte as a 16-bit value on the registers (probably), we absolutely can not when it's a memory address. And given the code was correct before the patch, it seems likely that the new sequence is needed ONLY for register to register moves? Oddly, move left wasn't affected and still looked fine: movb @SpriteTab+1, r1 get player x as a BYTE mov @playerXspeed, r2 get player speed as a UINT movb r1, r3 copy X srl r3, 8 make int c r3, r2 make sure enough room jle L105 skip if not swpb r2 make speed a char (sort of) sb r2, r1 do the subtract movb r1, @SpriteTab+1 write it back jmp L105 done Quote Link to comment Share on other sites More sharing options...
Tursi Posted January 1 Share Posted January 1 (edited) So based on that, I did a test of adding a new constraint to movqi just so I could tell if it was a register to register case. This seems to work in all cases in Super Space Acer, at least... and both the cases above resolve. So running with that, here's a diff of the changes I'm running with, all in tms9900.md --- tms9900.md.old 2023-12-30 21:37:29.809899700 -0700 +++ tms9900.md 2024-01-01 03:21:58.648751700 -0700 @@ -109,7 +109,7 @@ ;; Jump to a subroutine which returns a value (define_insn "call" [(call (match_operand:HI 0 "general_operand" "rR,Q") - (match_operand:HI 1 "general_operand" "g,g")) + (match_operand:HI 1 "general_operand" "g,g")) ] "" { @@ -194,25 +194,25 @@ ;;------------------------------------- (define_insn "tsthi" [(set (cc0) - (match_operand:HI 0 "nonimmediate_operand" "rR,Q"))] + (match_operand:HI 0 "register_operand" "r"))] "" { tms9900_debug_operands ("tsthi", operands, 1); return("mov %0, %0"); } - [(set_attr "length" "2,6")]) + [(set_attr "length" "2")]) ;;------------------------------------- (define_insn "tstqi" [(set (cc0) - (match_operand:QI 0 "nonimmediate_operand" "rR,Q"))] + (match_operand:QI 0 "register_operand" "r"))] "" { tms9900_debug_operands ("tstqi", operands, 1); return("movb %0, %0"); } - [(set_attr "length" "2,6")]) + [(set_attr "length" "2")]) ;;------------------------------------- (define_insn "cmphi" @@ -264,14 +264,14 @@ ;;------------------------------------------------------------------- ;; Move byte value (define_insn "movqi" - [(set (match_operand:QI 0 "nonimmediate_operand" "=rR>,rR>,Q, Q,r, r") - (match_operand:QI 1 "general_operand" "rR>, Q, rR>,Q,OM,i"))] + [(set (match_operand:QI 0 "nonimmediate_operand" "=r,=rR>,rR>,Q, Q,r, r") + (match_operand:QI 1 "general_operand" " r, rR>, Q, rR>,Q,OM,i"))] "" { tms9900_debug_operands ("movqi", operands, 2); tms9900_inline_debug ("; movqi alt=%d\n", which_alternative); int val; - if (which_alternative >= 4) + if (which_alternative >= 5) { val = INTVAL(operands[1]) << 8; tms9900_inline_debug ("; movqi imm MUL const %d * 256\n", val); @@ -291,10 +291,17 @@ } else { - return ("movb %1, %0"); + if ((which_alternative == 0) && (REG_OFFSET (operands[1]) == 1)) { + // register to register, word to byte + output_asm_insn ("mov %1, %0", operands); + output_asm_insn ("swpb %0", operands); + return ""; + } else { + return ("movb %1, %0"); + } } } - [(set_attr "length" "2,4,4,6,2,4")]) + [(set_attr "length" "4,2,4,4,6,2,4")]) ;;------------------------------------------------------------------- ;; Move two-byte value (edit: removed spoiler tag cause I think this was missed) Edited January 8 by Tursi Quote Link to comment Share on other sites More sharing options...
JasonACT Posted January 1 Share Posted January 1 1 hour ago, Tursi said: [(call (match_operand:HI 0 "general_operand" "rR,Q") - (match_operand:HI 1 "general_operand" "g,g")) + (match_operand:HI 1 "general_operand" "g,g")) Constructive criticism... It was already right before the edit. 1 hour ago, Tursi said: + (match_operand:QI 1 "general_operand" " r, rR>, Q, rR>,Q,OM,i"))] https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gccint/Simple-Constraints.html#Simple-Constraints Quote whitespace Whitespace characters are ignored and can be inserted at any position except the first. This enables each alternative for different operands to be visually aligned in the machine description even if they have different number of constraints and modifiers. Quote Link to comment Share on other sites More sharing options...
+TheBF Posted January 1 Share Posted January 1 On 12/31/2023 at 11:35 AM, PeteE said: +254 -256 bytes is +127 -128 words LOL. And so it is. Typing with my head turned off. 1 Quote Link to comment Share on other sites More sharing options...
Tursi Posted January 1 Share Posted January 1 (edited) 8 hours ago, JasonACT said: Constructive criticism... It was already right before the edit. https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gccint/Simple-Constraints.html#Simple-Constraints That's unnecessary, please don't do that. I already fixed it after I posted. Can we stay focused on the actual problems here and not waste bandwidth on nitpicks? Edited January 1 by Tursi Quote Link to comment Share on other sites More sharing options...
Tursi Posted January 1 Share Posted January 1 Let me clarify a bit, since that feel like it comes off a bit harsh even after I tried to tone it down. I'm still figuring this out, and I am indeed learning from the things you post. However, I shouldn't have to worry every time I reach for 'submit' that JasonACT is going to nitpick the irrelevant parts of my post. I'm only communicating concepts here, not final code. I trust khanivore to read the diff, extract what he feels is useful, and discard the rest, including the incorrect whitespace. Since I'm testing every step, it's pretty late by the time I am posting conclusions, and I am far more interested in getting to bed than fixing whitespace. If I wanted to submit a final patch, I'd be doing that over at Github, where it's not only much easier for khanivore, but hidden from extra eyes. In short, I'm asking for you not to be searching for the easy parts to pick off, but only the relevant parts. Pick away at those! 1 Quote Link to comment Share on other sites More sharing options...
Tursi Posted January 2 Share Posted January 2 (edited) New bug, all my fault. The instruction dump let me see the warning I was missing Edited January 2 by Tursi Quote Link to comment Share on other sites More sharing options...
+khanivore Posted January 2 Share Posted January 2 On 1/1/2024 at 10:10 AM, Tursi said: Unfortunately, JasonACT's patch made things much worse, because it didn't work only on registers, but also on memory locations. So although I think it's on the right path, that offset is clearly not used the same way for memory addresses (or maybe it doesn't even get set). Yeah that's a gotcha I've hit a few times. Many opcodes are 16-bit only and we have to be careful to ensure we only ever use them on regs. I'm starting to use R0 as a permanent scratch reg for some cases like this. The reloads are very messy in gcc13 since most instructions in tms9900 affect the status reg so isolating moves from compares is a nightmare otherwise. 1 Quote Link to comment Share on other sites More sharing options...
+khanivore Posted January 2 Share Posted January 2 On 1/1/2024 at 5:47 AM, JasonACT said: I'm not sure how far along you are though I'm deep down a rabbit hole with gcc13. The refactor was bigger than I thought. It is nearly working now and an initial comparison of the output of gcc13 and gcc4 is that the assembly code is about 5% smaller with gcc13. Which isn't that much, but gcc13 does allow more flexibility to define implicit compares and a few other bits so might be able to improve that further. I did also find and fix a few issues in the md file along the way which I can apply to the gcc4 branch as well. So I might maintain both in the short term. On 1/1/2024 at 5:47 AM, JasonACT said: I'm back at work tomorrow, so there's less hobby time for me now, sadly At least that's one problem I don't have any more 🙂 Quote Link to comment Share on other sites More sharing options...
TheMole Posted January 2 Share Posted January 2 24 minutes ago, khanivore said: I'm deep down a rabbit hole with gcc13. The refactor was bigger than I thought. It is nearly working now and an initial comparison of the output of gcc13 and gcc4 is that the assembly code is about 5% smaller with gcc13. Which isn't that much, but gcc13 does allow more flexibility to define implicit compares and a few other bits so might be able to improve that further. Feel free to let me know when I can test compiling gcc13 on my mac, still having no luck with gcc4 on modern macos's here... Quote Link to comment Share on other sites More sharing options...
+khanivore Posted January 2 Share Posted January 2 4 hours ago, TheMole said: Feel free to let me know when I can test compiling gcc13 on my mac, still having no luck with gcc4 on modern macos's here... Thanks. I've a bit of a way to go yet. There is a dev branch up in the repo but there are no scripts to create patches or anything like that yet. No libgcc either and limited testing. Quote Link to comment Share on other sites More sharing options...
JasonACT Posted January 3 Share Posted January 3 20 hours ago, khanivore said: Yeah that's a gotcha I've hit a few times. Many opcodes are 16-bit only and we have to be careful to ensure we only ever use them on regs. I'm starting to use R0 as a permanent scratch reg for some cases like this. The reloads are very messy in gcc13 since most instructions in tms9900 affect the status reg so isolating moves from compares is a nightmare otherwise. I've decided to go with this for movqi, since I think it covers some extra scenarios: else { if ((GET_CODE(operands[0]) == REG) && (GET_CODE(operands[1]) == REG)) { output_asm_insn ("mov %1, %0", operands); if (REG_OFFSET (operands[0]) != REG_OFFSET (operands[1])) output_asm_insn ("swpb %0", operands); return ""; } else return ("movb %1, %0"); } } [(set_attr "length" "4,4,4,6,2,4")]) 18 hours ago, khanivore said: I'm deep down a rabbit hole with gcc13. The refactor was bigger than I thought. It is nearly working now and an initial comparison of the output of gcc13 and gcc4 is that the assembly code is about 5% smaller with gcc13. Which isn't that much, but gcc13 does allow more flexibility to define implicit compares and a few other bits so might be able to improve that further. I did also find and fix a few issues in the md file along the way which I can apply to the gcc4 branch as well. So I might maintain both in the short term. 13 hours ago, khanivore said: There is a dev branch up in the repo I see it, wow, what a difference - welcome to Wonderland. I strangely find myself in Oz (meh, story of my life, Aussie Aussie Aussie, Oi Oi Oi) but being this heartless character: I think he's likeable enough though. 2 2 Quote Link to comment Share on other sites More sharing options...
+khanivore Posted January 3 Share Posted January 3 On 1/1/2024 at 10:10 AM, Tursi said: You can see the new code was inserted there when it was time to write the value back to memory. While we can treat a byte as a 16-bit value on the registers (probably), we absolutely can not when it's a memory address. And given the code was correct before the patch, it seems likely that the new sequence is needed ONLY for register to register moves? Yes, offsets of a byte within a subreg should only apply to registers. If you could turn on the inline rtl dumps we could learn where the opcodes come from. e.g. : ; zero_extendqihi2-5 ; OP0 : (reg:HI 2 r2 [27])code=[reg:HI] ; OP1 : (reg:QI 2 r2 [orig:22 color.0 ] [22])code=[reg:QI] srl r2, 8 And it will tell you if subregs are being used and what their offset is so we could track own missing extends. It would be great if we could come up with a generic predicate pattern that captures and expands usage of subregs with offsets. Insomnia had inserted an extra pass in the compiler to try and do this but I have disabled it as I don't think its the right way to go. 1 Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.