+khanivore Posted January 3 Share Posted January 3 6 hours ago, JasonACT said: I've decided to go with this for movqi, since I think it covers some extra scenarios: Looks reasonable to me, covers either src or dst being a subreg and ensures registers only Quote Link to comment Share on other sites More sharing options...
Tursi Posted January 3 Share Posted January 3 (edited) 8 hours ago, khanivore said: Yes, offsets of a byte within a subreg should only apply to registers. If you could turn on the inline rtl dumps we could learn where the opcodes come from. e.g. : Yeah, I turned that on a couple times but I was never sure which output to look at. I'm going to need to understand what the heck a subreg is though. For now the diff I posted above is working for all of my code, but I'll adapt JasonACT's latest tweak to movqi and see if it helps (edit: all good so far! But I've not yet seen the inverse pattern emitted, so not too surprised.) Edited January 3 by Tursi Quote Link to comment Share on other sites More sharing options...
+khanivore Posted January 4 Share Posted January 4 12 hours ago, Tursi said: Yeah, I turned that on a couple times but I was never sure which output to look at. I'm going to need to understand what the heck a subreg is though. For now the diff I posted above is working for all of my code, but I'll adapt JasonACT's latest tweak to movqi and see if it helps (edit: all good so far! But I've not yet seen the inverse pattern emitted, so not too surprised.) They are hard to get your head around and the documentation is not that helpful IMO: https://gcc.gnu.org/onlinedocs/gccint/Regs-and-Memory.html My understanding is that initially the compiler pretends it has an unlimited supply of pseudo-registers. In our case, the first pseudo reg is r16. After several passes it does a "reload" pass where it reduces all pseudo regs to physical regs. After reload we cannot allocate any more regs, so some insns need a condition of "reload_completed", especially in gcc13 for compares and if_then_else. Anyway in gcc4.4 at least, it seems sometimes during reload that gcc sometimes inserts a subreg expression to access part of a reg without inserting a trunc or expand insn. I'm finding it hard to create a case where this happens but I know it has happened in "andhi3" and more recently in "movqi" ? In this codelet: static volatile unsigned char color; /* Replicating a bug in vdp_bmcolor where a swpb was missed in andhi3 */ static void __attribute__ ((noinline)) setbackground(int cisint) { color = (color & 0xF0) | (cisint & 0x0F); } void t_bitwise_replace() { color = 0xaa; setbackground(0x55); } The inline rtl dumps show the operands like this. Physical reg r2 was allocated to replace pseudo-reg 27: ; andhi3-6 ; OP0 : (reg:HI 2 r2 [27])code=[reg:HI] ; OP1 : (reg:HI 2 r2 [27])code=[reg:HI] ; OP2 : (const_int 240 [0xf0])code=[const_int:VOID] andi r2, >F0 If you run "gcc -da" you get a dump of each compiler pass. I can see earlier on in the combiner phase (159) there is a subreg expression here: (set (reg:HI 27) (and:HI (subreg:HI (reg:QI 22 [ color.0 ]) 0) (const_int 240 [0xf0]))) But in this case, the subreg ref is gone by the final pass, so what I'm looking to see is whether we have any "(subreg:HI (reg:QI ..." or vice versa references in the final output itself and is there any pattern to when they appear. 2 Quote Link to comment Share on other sites More sharing options...
matthew180 Posted January 5 Share Posted January 5 22 hours ago, Tursi said: I'm going to need to understand what the heck a subreg is though. From the link khanivore provided, it looks to me that a `subreg` is like the low and high (AH and AL) parts of a larger register (like AX on the x86 CPUs). On x86, AL and AH are the high and low 8-bits of the AX 16-bit reg. And now, with 32-bit and 64-bit registers, AX is the low 16-bit of the 32-bit EAX register. And EAX is the low 32-bits of the 64-bit RAX register. The x86 has instruction encoding to use those "sub register" parts. The 9900 does not have sub-register access directly like the x86, but since the registers are memory-based, you can use offsets from the workspace pointer address to get access to the bytes (usually low-byte) of the registers for use with byte instructions. In the sequence above that results in `andi r2, >F0`, it seems it would have been better if the result was in the MSB of the register, since byte operations against registers on the 9900 manipulate the MSB, and you don't need to fuss about with SWPB. But it also depends on what comes next in the code. I'm probably not saying anything that is not already known by everyone here. Quote Link to comment Share on other sites More sharing options...
+dhe Posted January 5 Share Posted January 5 I had an idea for a trivial game. How to implement one part made me think for a minute, because unlike looking for a keystroke, or waiting for input, I wanted to spin off the user to a task for a specific time period and then bring them back in to the main routine. I was able to knock together a crude proto-type in gcc (linux). #include <stdio.h> #include <time.h> int time_up; // I'm a global variable /* function declaration */ void usr_func(); int main() { time_t T = time(NULL); struct tm tm = *localtime(&T); time_up = tm.tm_min+2; printf("System Date is: %02d/%02d/%04d\n", tm.tm_mday, tm.tm_mon + 1, tm.tm_year + 1900); printf("System Time is: %02d:%02d:%02d\n", tm.tm_hour, tm.tm_min, tm.tm_sec); printf("In Five Minutes it will be %02d:%02d:%02d\n", tm.tm_hour, tm.tm_min+5, tm.tm_sec); usr_func(); printf("User Function is done! \n"); return 0; } // end main void usr_func() { int character; time_t T = time(NULL); struct tm tm = *localtime(&T); printf("Give me some input.\n"); for(;;) { time_t T = time(NULL); struct tm tm = *localtime(&T); character = getchar(); if (tm.tm_min == time_up) return; } // end of for loop } // end usr_func I've looked at the libraries provided by @jedimatt42 and @Tursi and didn't see a clock or timer function. Does anyone have a library something like time.h for the TI-99/4A? Quote Link to comment Share on other sites More sharing options...
Tursi Posted January 6 Share Posted January 6 2 hours ago, dhe said: I've looked at the libraries provided by @jedimatt42 and @Tursi and didn't see a clock or timer function. Does anyone have a library something like time.h for the TI-99/4A? Not I. The usual clock interface on the TI is a DSR access, although if you know the clock you have installed you can use hardware. But the CLOCK device would be used with OPEN/READ/CLOSE through DSRLNK. Nothing stops that from being wrapped with time.h functions, I just don't have a clock. 1 Quote Link to comment Share on other sites More sharing options...
+jedimatt42 Posted January 6 Share Posted January 6 ForceCommand has code for reading many 4A clocks abstracted into a programmer accessible datetime function. https://github.com/jedimatt42/fcmd/wiki/API#fc_datetime-- Which is implemented here: https://github.com/jedimatt42/fcmd/blob/e3c3f68cff5d66b1317333f6f79f681f4162695b/b5_clock.c#L50 1 Quote Link to comment Share on other sites More sharing options...
JasonACT Posted January 7 Share Posted January 7 On 12/4/2023 at 10:33 PM, khanivore said: This is still on gcc-4.4.0. The port to gcc-13.2 was successful, but the output was exactly the same, and the build time jumped to several hours, so for expediency, I'm sticking with gcc-4 for now. Took me a while to find this post, it was a while ago now, but I've just managed to build 13.2 with the current GitHub dev files, on my Windows machine (had to upgrade a few libs, hack some of my system header files and edit GCC's pretty-print.cc for hard-coded values for COMMON_LVB_REVERSE_VIDEO & COMMON_LVB_UNDERSCORE - not entirely sure why those 2 defines don't make it through configure - possibly a GCC bug). Total build time is 11 minutes and 20 seconds, on my 13 year old Windows PC I'm keen for a new update, when you get some free time, the latest snapshot doesn't include all the current fixes. 3 Quote Link to comment Share on other sites More sharing options...
+khanivore Posted January 7 Share Posted January 7 4 hours ago, JasonACT said: Total build time is 11 minutes and 20 seconds, on my 13 year old Windows PC Not bad! I was building over nfs and have moved since to local SSD so must time it again. 4 hours ago, JasonACT said: I'm keen for a new update, when you get some free time, the latest snapshot doesn't include all the current fixes. Sure thing, let me spin one later today. I haven't bottomed out yet on a standard way to detect subreg use but I can at least accumulate known fixes to date. 3 Quote Link to comment Share on other sites More sharing options...
+khanivore Posted January 7 Share Posted January 7 On 1/4/2024 at 6:59 PM, khanivore said: But in this case, the subreg ref is gone by the final pass, so what I'm looking to see is whether we have any "(subreg:HI (reg:QI ..." or vice versa references in the final output itself and is there any pattern to when they appear. I just checked and I'm afraid I misled you guys here. The subreg doesn't appear in the inline debugs either. After the combine phase, the only way to detect an omitted extend or trunc is to check for a non-zero offset. Quote Link to comment Share on other sites More sharing options...
+khanivore Posted January 7 Share Posted January 7 @Tursi @JasonACT and others could I ask you to check something? I've noticed if I define LOAD_EXTEND_OP that gcc doesn't drop the zero_extend before and : --- a/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.h +++ b/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.h @@ -577,6 +577,9 @@ enum reg_class an identifier node for the subroutine name. */ #define RETURN_POPS_ARGS(FUNDECL,FUNTYPE,SIZE) 0 +/* MGB to load a byte into a word we need to zero extend it */ +#define LOAD_EXTEND_OP(MODE) ZERO_EXTEND + /* Passing Arguments in Registers. */ /* The number of argument registers we can use (R1..R8) */ Which means the offset check in andhi3 needs to be removed. Does this work also for the movqi case? Quote Link to comment Share on other sites More sharing options...
JasonACT Posted January 8 Share Posted January 8 17 hours ago, khanivore said: Does this work also for the movqi case? I'll start with a disclaimer! My build environment is somewhat different after the upgrades I needed for GCC 13.2 - but I added the #define above and it made no difference to the code generated or debug output. FYI - my test case for movqi is this: int test2 (unsigned char c, unsigned int len); unsigned char test () { return test2 (32, 32); } 1 Quote Link to comment Share on other sites More sharing options...
Tursi Posted January 8 Share Posted January 8 I was not sure if I was supposed to roll back previous changes before testing this patch, so I did that, and Super Space Acer dies even before the title page. Example.c fails as well. Took me a bit to see why since the failure was complete, but the first hit seems to be in vdpmemcpy on a write to VDP: // Write Data #define VDPWD *((volatile unsigned char*)0x8C00) void vdpmemcpy(int pAddr, const unsigned char *pSrc, int cnt) { VDP_SET_ADDRESS_WRITE(pAddr); while (cnt--) { VDPWD=*(pSrc++); } } pseg even def vdpmemcpy vdpmemcpy * Begin inline assembler code * 89 "../vdp.h" 1 swpb r1 get lower byte of VDP address movb r1,@>8c02 write it swpb r1 get high byte ori r1,>4000 add write flag movb r1,@>8c02 write it to VDP andi r1,>3fff mask back to original (probably) * 0 "" 2 * End of inline assembler code mov r3, r3 check count for zero jeq L4 return if so clr r1 zero index L3 mov r2, r4 copy CPU address to R4 a r1, r4 add index movb *r4, r4 get byte srl r4, 8 sign extend byte to int <--- bug or movb r4, @>8C00 copy byte to VDP <--- bug. Data is no longer in the MSB inc r1 increment index c r1, r3 check for end jne L3 loop if not L4 b *r11 return .size vdpmemcpy, .-vdpmemcpy Quote Link to comment Share on other sites More sharing options...
+khanivore Posted January 8 Share Posted January 8 57 minutes ago, Tursi said: I was not sure if I was supposed to roll back previous changes before testing this patch, so I did that, and Super Space Acer dies even before the title page. Example.c fails as well. Took me a bit to see why since the failure was complete, but the first hit seems to be in vdpmemcpy on a write to VDP: Thanks, yeah, I'm seeing the same behaviour. It inserts an extend (srl 8 ) but then continues to treat the reg as a byte. Ok, I'll have to abandon that one. Quote Link to comment Share on other sites More sharing options...
+khanivore Posted January 8 Share Posted January 8 I just merged 1.28 to main. I included a slightly reduced version of the fix to movqi and some other updates: Version 1.28 * Improve shift operations by moving shift count to R0 * Use R0 as a clobber in compare and other instructions * Fix issue in mov QI where truncate from HI was missed * Fix compare to zero by mov'ing to R0 instead of dest * Added improved 32-bit divide and multiply libgcc functions from mrvan 3 Quote Link to comment Share on other sites More sharing options...
Tursi Posted January 8 Share Posted January 8 (edited) Thanks for the patch. However this misses the bug I posted on Jan 1 in movqi where memory is being treated as a register. To repeat, this code: if (playership != SHIP_SELENA) { wrapplayerright(); } if (SHIP_C < 224-playerXspeed) SHIP_C+=playerXspeed; Compiles to this asm, which corrupts memory: L107 movb @SpriteTab+1, r1 get sprite X as BYTE (this is SHIP_C) mov @playerXspeed, r2 get speed as INT movb r1, r4 save X srl r4, 8 make int mov r2, r3 make copy of x speed ai r3, >FF20 substract 224 neg r3 fix sign (want 224-x) c r4, r3 test jhe L105 skip over if already at right edge swpb r2 get LSB of speed ab r1, r2 add sprite X mov r2, @SpriteTab+1 write it back <<<--- bug - it's a byte, not a word swpb @SpriteTab+1 swap the memory <<<--- bug - we don't want this at all jmp L105 I had proposed a patch that added a constraint to movqi specifically for register to register, and only used the new semantics in that case. I guess it got lost in the whitespace debate Edited January 9 by Tursi Quote Link to comment Share on other sites More sharing options...
Tursi Posted January 9 Share Posted January 9 (edited) This is the former patch adapted to the new code. It just adds the new constraint and only does the new stuff if it's register to register. This works on my code. --- /home/tursilion/newtms9900-gcc/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.md 2024-01-08 16:06:49.787280900 -0700 +++ ./tms9900.md 2024-01-08 16:59:33.131332700 -0700 @@ -266,14 +266,14 @@ ;;------------------------------------------------------------------- ;; Move byte value (define_insn "movqi" - [(set (match_operand:QI 0 "nonimmediate_operand" "=rR>,rR>,Q, Q,r, r") - (match_operand:QI 1 "general_operand" "rR>, Q, rR>,Q,OM,i"))] + [(set (match_operand:QI 0 "nonimmediate_operand" "=r,=rR>,rR>,Q, Q,r, r") + (match_operand:QI 1 "general_operand" "r,rR>, Q, rR>,Q,OM,i"))] "" { tms9900_debug_operands ("movqi", operands, 2); tms9900_inline_debug ("; movqi alt=%d\n", which_alternative); int val; - if (which_alternative >= 4) + if (which_alternative >= 5) { val = INTVAL(operands[1]) & 0xff; tms9900_inline_debug ("; movqi imm MUL const %d * 256\n", val); @@ -285,15 +285,15 @@ else { operands[1] = GEN_INT(val<<8); - return ("li %0, %1"); + return ("li %0, %1"); } } else { - /* If src is a reg and has an offset, then it has been downgraded from a + /* Register to Register - If src has an offset, then it has been downgraded from a * HI, so move the entire 16-bit word and do a byte swap */ - if (REG_P (operands[1]) && REG_OFFSET (operands[1]) == 1) + if (which_alternative == 0 && REG_P (operands[1]) && REG_OFFSET (operands[1]) == 1) { output_asm_insn ("mov %1, %0", operands); output_asm_insn ("swpb %0", operands); @@ -304,7 +304,7 @@ return ""; } } - [(set_attr "length" "2,4,4,6,2,4")]) + [(set_attr "length" "4,2,4,4,6,2,4")]) ;;------------------------------------------------------------------- ;; Move two-byte value It does not cover the case of a register having the byte in the wrong place, but the destination being a memory address. But I have 15,000 lines of test code here and don't appear to hit that (I have not debugged the entire game yet, of course ). I suppose the only real fix for that would be a swapb, movb, swapb on the source register (or assuming the register base and using a movb @>83xx,@target, which would be only a few cycles faster and probably not worth it.) All my other previous cases are resolved with the new build. (Not sure what I hit in the li line, it's not intentionally changed ) Edit: I wonder now if it wouldn't always be more correct to swpb, movb, swpb. That way you're never overwriting data you aren't supposed to, even if we're pretty sure it's safe. But it does feel wasteful in the reg->reg case if we know we never need that other byte. Edited January 9 by Tursi Quote Link to comment Share on other sites More sharing options...
+khanivore Posted January 9 Share Posted January 9 10 hours ago, Tursi said: I had proposed a patch that added a constraint to movqi specifically for register to register, and only used the new semantics in that case. I guess it got lost in the whitespace debate Ah, I misunderstood, I thought that was an issue with the patch proposed by Jason, not the main code itself. Ok, will fix that. (edit: which it was, never mind, I have it reproduced now) Quote Link to comment Share on other sites More sharing options...
+khanivore Posted January 9 Share Posted January 9 12 hours ago, Tursi said: It does not cover the case of a register having the byte in the wrong place, but the destination being a memory address. But I have 15,000 lines of test code here and don't appear to hit that (I have not debugged the entire game yet, of course ). I suppose the only real fix for that would be a swapb, movb, swapb on the source register (or assuming the register base and using a movb @>83xx,@target, which would be only a few cycles faster and probably not worth it.) The bit that's confusing me now though is that R2 already was a QI and already was in the right place so the correct opcode to emit should have been "MOVB r2, @SpriteTab+1" which is what it was without the patch. So are we misinterpreting the offset on the src reg? I need to run some more tests .... 12 hours ago, Tursi said: Edit: I wonder now if it wouldn't always be more correct to swpb, movb, swpb. That way you're never overwriting data you aren't supposed to, even if we're pretty sure it's safe. But it does feel wasteful in the reg->reg case if we know we never need that other byte. Sounds a little wasteful alright, though should only be in exceptional cases. We could always do MOV %1, R0 and SWPB on R0. Still 3 instructions though. Or we could also define the WP location as you suggested and do MOVB Rs,@[WP+Rd*2+1] which I've seen in ROM code but looks hacky and I don't like fixing a dependency on WP location into the compiler. (BTW I also didn't address the byte subtract issue yet. Added an issue to the repo to track this one too) Quote Link to comment Share on other sites More sharing options...
Tursi Posted January 9 Share Posted January 9 Not sure. I can only tell you it didn't work until I merged both JasonACT's fix and the register constraint. I do have a new one, though it's a bit baffling to me. For once it's not byte based - but rather loop optimization. This is a bit long, but I'll cut the unneeded bits. It doesn't trigger unless I leave most of the code in place. void whoded() { unsigned char rd,cd; unsigned char r,c; /*check boss specific collisions*/ for (uint8 a=0; a<NUM_SHOTS; a++) { // check for valid shot if (!shr[a]) continue; // note: in the test case, ALL VALUES are 0 and continue always executes. // check if hit a piece of boss if (checkdamage(shr[a], shc[a], pwrlvl&0x0f)) { playsfx_hitboss(); spdel(a+PLAYER_SHOT); shr[a]=0; continue; } // check if hit an engine for (uint8 b=0; b<3; ++b) { if (ent[b] != ENEMY_ENGINE) continue; spposn(b+ENEMY_SPRITE, r, c); rd = abs(r-shr[a]); if (rd <= 20) { cd = abs(c-shc[a]); if (cd <= 15) { if (f18a) { loadpal_f18a(f18WhitePalette, PAL_BOSS*4, 16); // set boss palette to white } else { VDP_SET_REGISTER(VDP_REG_CT, 0x0f); // CT at >03C0 (makes boss flash white, everything else already is) } bosscnt=3; // how many cycles to stay white (should be 3 frames per cycle) ep[b]-=damage[pwrlvl&0x07]; if (ep[b]<=0) { playsfx_explosion(); addscore(5); ep[b]=0; enr[b]=192; ent[b]=ENEMY_NONE; spdel(b+ENEMY_SPRITE); } else { playsfx_armor(); } spdel(a+PLAYER_SHOT); shr[a]=0; break; } } } } } This will be a bit harder to simulate, I guess, but you can create stubs for the functions. When checking for the end of the a loop, it comes up with a weird number that causes it to loop for thousands of cycles, instead of NUM_SHOTS which is 9. It's using the address in the shr array, but adding a hard coded value. Since this is pre-link, it shouldn't know the address to know a value to add, should it? Either way, the resulting value is way off. I've skipped the un-executed code here: def whoded whoded ai r10, >FFF2 fix stack mov r9, @>C(r10) save mov r11, @>A(r10) save li r1, shr prepare shr (shot row) array (int) mov r1, *r10 save to stack li r2, SpriteTab+88 base of engine sprites, I think? mov r2, @>4(r10) save to stack li r3, shc shc array (int) mov r3, @>2(r10) save to stack L49 mov *r10, r4 shr address->r4 mov *r4, r1 get value of shr[a] jne JMP_8 process if not zero b @L41 but in the crash case, it should be, so skip ahead JMP_8 mov @pwrlvl, r3 (...skipping...) andi r3, >F (...) ci r9, >3 jeq JMP_10 b @L48 JMP_10 L41 mov *r10, r4 get shr array address inct r4 next one mov r4, *r10 save it back mov @>4(r10), r1 engine? sprite base ai r1, >4 next sprite mov r1, @>4(r10) save it back mov @>2(r10), r2 get shc address inct r2 next one mov r2, @>2(r10) write it back ai r4, >9120 check if shr is finished counting? <<-- bug? Where the heck did this value come from? Way too small... shr is 10 bytes at >2284 jeq JMP_11 leap out if done b @L49 else loop around and go again JMP_11 ai r10, >A mov *r10+, r11 mov *r10+, r9 b *r11 .size whoded, .-whoded even I went to turn on debug in tms9900.c, but it's already there. I don't see any debug in the assembly files, though, did the #define change? Quote Link to comment Share on other sites More sharing options...
Tursi Posted January 9 Share Posted January 9 (edited) Ah, I just had to read the comments. Unfortunately it doesn't seem to have emitted anything for the add? ; movhi-451 ; OP0 : (mem/c:HI (plus:HI (reg/f:HI 10 r10) ; (const_int 2 [0x2])) [4 %sfp+2 S2 A16])code=[mem:HI] ; OP1 : (reg:HI 2 r2)code=[reg:HI] ; movhi alt=2 mov r2, @>2(r10) ai r4, >9120 jeq JMP_11 b @L49 JMP_11 I should add this is not likely a new bug, it's just the first time I've gotten far enough to drill into this one. I've confirmed most of the rest of the boss code and that's the last part of the game to prove! Edited January 9 by Tursi 1 Quote Link to comment Share on other sites More sharing options...
+khanivore Posted January 9 Share Posted January 9 I can see a couple of places where AI gets emitted but no debug is printed. "*sub_const_hi" is adding a negative number and will use AI if it can't use DEC, DECT, INC or INCT, as will *rsubhi" which looks like it is the operand swapped version of the same function. I'm none the wiser where >9120 came from though I can't see what shr was defined as. Maybe something went wrong with the negation math. Quote Link to comment Share on other sites More sharing options...
Tursi Posted January 9 Share Posted January 9 6 minutes ago, khanivore said: I can see a couple of places where AI gets emitted but no debug is printed. "*sub_const_hi" is adding a negative number and will use AI if it can't use DEC, DECT, INC or INCT, as will *rsubhi" which looks like it is the operand swapped version of the same function. I'm none the wiser where >9120 came from though I can't see what shr was defined as. Maybe something went wrong with the negation math. shr is an array of ints. int shr[NUM_SHOTS+1] - where NUM_SHOTS is 9. Quote Link to comment Share on other sites More sharing options...
+TheBF Posted January 9 Share Posted January 9 ; movhi-451 ; OP0 : (mem/c:HI (plus:HI (reg/f:HI 10 r10) ; (const_int 2 [0x2])) [4 %sfp+2 S2 A16])code=[mem:HI] ; OP1 : (reg:HI 2 r2)code=[reg:HI] As an outside observer, lay person, I am guessing that this is the language used to program what code GCC emits. ? (I am enjoying watching the show. Thanks) (PS don't tell me Forth is cryptic anymore) 1 Quote Link to comment Share on other sites More sharing options...
Tursi Posted January 9 Share Posted January 9 Interestingly.. the constant changed when I changed all the local variables to ints (which I wanted to do just for cleaner code, but of course it doesn't affect the outer loop with this optimization.) Instead of >9120, it added >90E0, exactly 64 less. I don't know if that's a red herring or meaningful though Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.