Jump to content
IGNORED

GCC for the TI


insomnia

Recommended Posts

Nice work on patch 1.29.  Binary size is much improved over previous patches!

 

Patch 1.18: 8048 bytes

Patch 1.29: 7860 bytes (188 less)

 

I'm using CFLAGS+= -O1 -fno-function-cse

When I tried -O2, I get the same ICE that Tursi reported.

  • Like 5
Link to comment
Share on other sites

@khanivore I tried to compile Ghostbusters, given it requires the compiler to produce smaller code than we were getting previously...

 

This is the smallest example I can make:

#define CAR_ID_BEETLE    0

#define CALL_PLAYER_SFX \
    __asm__(                                                        \
        "bl @SfxLoop"                                               \
        : /* no outputs */                                          \
        : /* no arguments */                                        \
        : "r0","r1","r2","r3","r4","r5","r6","r7","r8","r9","r11","r12","r13","r14","r15","cc"   \
        )

#define CALL_PLAYER_SN \
    __asm__(                                                        \
        "bl @SongLoop"                                              \
        : /* no outputs */                                          \
        : /* no arguments */                                        \
        : "r0","r1","r2","r3","r4","r5","r6","r7","r8","r9","r11","r12","r13","r14","r15","cc"   \
        )

#define VSYNC_PLAY { vdpwaitvint(); { CALL_PLAYER_SFX; CALL_PLAYER_SN;} }

unsigned char read_keyboard(); // int read_keyboard() actually compiles OK
void vdpwaitvint();
void SongLoop();
void SfxLoop();

typedef struct
{
	unsigned int    account;
} game_state;

typedef struct
{
	int speed;			// Works in reverse, higher numbers are slower (more time acrued on the map)
	int capacity;		// Not used in game, only in the shop
	int price;			// Not used in game, only in the shop. Add two imaginary zeroes to this number for the display price
} car_stats;

extern game_state			game;

const car_stats cars[4] =
{
	{ 50,  5,  20 },	// Compact/Beetle
	{ 37,  9,  48 },	// Hearse
	{ 30, 11,  60 },	// Wagon
	{ 25,  7, 150 }		// High-Performance
};

void choose_car()
{
	int done = 0;
	int selected_car = CAR_ID_BEETLE;
	while (!done)
	{
		int key = 0;
		while (key != 13)
		{
			key = read_keyboard();
			VSYNC_PLAY;
		}
		if (cars[selected_car].price < game.account)
		{
			done = 1;
		}
	}
}

 

and I get this:

 

$ tms9900-gcc -Os -S ./zgb.c
./zgb.c: In function 'choose_car':
./zgb.c:65: error: unable to generate reloads for:
(insn 40 15 18 3 ./zgb.c:55 (set (cc0)
        (compare (reg:QI 21 [ D.1213 ])
            (mem/u/c/i:QI (symbol_ref/u:HI ("*LC1") [flags 0x2]) [0 S1 A8]))) 6 {cmpqi} (expr_list:REG_DEAD (reg:QI 21 [ D.1213 ])
        (nil)))
./zgb.c:65: internal compiler error: in find_reloads, at reload.c:3781
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

 

Seems to have something to do with R9 and R13 (compile without -Os and it tells you it doesn't like them being used in "asm here").

  • Like 1
  • Thanks 1
Link to comment
Share on other sites

I have been able to build Ghostbusters using a bit of a work around:

 

int __attribute__ ((noinline)) rk4 () {
    unsigned char c = read_keyboard ();
    return c;
}

 

This was for bank4, I needed the same in bank5...  Also changing "unsigned char key" in get_string() to "int key" to match the replacement call, along with do_account_screen() also using the new rk5() which already had "int key".  This is with -fno-function-cse to reduce the size - and it all fits!  Easily!

 

The game sadly doesn't play normally, it might be due to edits I had done before (though I don't think I did much more than trap the ghost under the height of reach of the beams)...  The buildings never seem to get haunted...  I also always have $0 showing, even though I could buy all the stuff I normally would.

 

I was left driving around just vacuuming up ghosts.  I initially didn't notice there was a problem because the first building I drove to had a ghost (even though it wasn't flashing) and I caught it, but after that I never found another one haunted.

Edited by JasonACT
I'll try rebuilding everything (including the libti stuff) from scratch tomorrow.
  • Like 1
  • Thanks 1
Link to comment
Share on other sites

Thanks for trying this stuff out, super useful for me since I haven't been able to build a working version of the new gcc patches on my mac yet.

Just wondering, would changing the read_keyboard function to return an int also work as a work-around? I instinctively tend to define things that essentially represent ASCII values as unsigned chars, but I think using chars is slower on the 9900, so it might make sense to just use ints.

 

Regarding the buildings not getting haunted, it might be related to libti99's gSaveIntCnt not being read correctly/not working correctly... Do the roamers flicker (alternate between their white outline green innards) on the map screen?

Edited by TheMole
  • Like 2
Link to comment
Share on other sites

6 hours ago, TheMole said:

Thanks for trying this stuff out, super useful for me since I haven't been able to build a working version of the new gcc patches on my mac yet.

That's quite alright.

6 hours ago, TheMole said:

Just wondering, would changing the read_keyboard function to return an int also work as a work-around? I instinctively tend to define things that essentially represent ASCII values as unsigned chars, but I think using chars is slower on the 9900, so it might make sense to just use ints.

Yes, that would be a work-around, I had "guessed" incorrectly it was returning an int when constructing the small test case and it didn't bomb - took me a while to find out why and make it unsigned char.

6 hours ago, TheMole said:

Regarding the buildings not getting haunted, it might be related to libti99's gSaveIntCnt not being read correctly/not working correctly... Do the roamers flicker (alternate between their white outline green innards) on the map screen?

I'll take a look today sometime.  I don't get the flicker on the map screen in any version though, my USB composite input device on my PC seems to be only recording at 30Hz.  Sometimes it switches to the other set of scanlines, so they go from outline to green or vice versa, but it's rare.

 

I re-built the whole set of files this morning, took a little bit of work using WinDiff to see what changes I made here and there for my build environment.  No change though.  Still has all the problems I described, including the "$000" money showing throughout the game.  The PK counter is also stuck at 0 (I had done that in a previous test, but this is with a totally fresh copy of the source code files).

Edited by JasonACT
Link to comment
Share on other sites

25 minutes ago, JasonACT said:

I don't get the flicker on the map screen in any version though, my USB composite input device on my PC seems to be only recording at 30Hz.  Sometimes it switches to the other set of scanlines, so they go from outline to green or vice versa, but it's rare.

I just remembered, I do have emulators installed :)  Yes, the ghosts on the map screen flicker, all the other issues persist.

Link to comment
Share on other sites

This doesn't look right @khanivore :

 

unsigned char gSaveIntCnt;
int haunted;

void test () {
    int toggleval = (haunted + gSaveIntCnt) & 0x001f;
    if (toggleval == 0x10)
    {
        haunted = 0;
    }
    else if (toggleval == 0)
    {
        haunted = 1;
    }
}

/* Compile with -Os

	pseg
	even
	def	test
test
	movb @gSaveIntCnt, r1
	a    @haunted, r1
	andi r1, >1F
	ci   r1, >10
	jne  L2
	clr  @haunted
	jmp  L4
L2
	mov  r1, r0
	jne  L4
	li   r1, >1
	mov  r1, @haunted
L4
	b    *r11
	.size	test, .-test
	cseg

	even
	def gSaveIntCnt
gSaveIntCnt
	bss 1

	even
	def haunted
haunted
	bss 2
*/

 

Link to comment
Share on other sites

@JasonACT it looks like addhi3 is behaving exactly as andhi3 did.  I think I'm just going to have to insert a call to check and correct the byte order in every insn that accepts a hi operand.

(also for extending before AND or ADD I think we need to use SRA instead of SWPB otherwise we could end up with random data in the MSB)

Edited by khanivore
Link to comment
Share on other sites

On 2/9/2024 at 8:14 AM, JasonACT said:

Seems to have something to do with R9 and R13 (compile without -Os and it tells you it doesn't like them being used in "asm here").

R9 and R13 are the frame pointer and arg pointer respectively.  You can remove the error in unoptimised builds by passing the flag -fomit-frame-pointer.


For optimised builds, reload seems to be saying it can't find a register to allocate for an insn.  My best guess here would that while we would expect gcc to simply save everything to the stack, during the optimising passes it has rearranged some instructions or whatever and needs one or more registers to allocate, but can't because the asm is declaring that it uses all of them (except r10).  This is unlikely to be fixable I'm afraid.

  • Like 1
Link to comment
Share on other sites

16 hours ago, khanivore said:

For optimised builds, reload seems to be saying it can't find a register to allocate for an insn.  My best guess here would that while we would expect gcc to simply save everything to the stack, during the optimising passes it has rearranged some instructions or whatever and needs one or more registers to allocate, but can't because the asm is declaring that it uses all of them (except r10).  This is unlikely to be fixable I'm afraid.

I wonder why changing the function return type from unsigned char read_keyboard() to int makes a difference?

Link to comment
Share on other sites

12 hours ago, JasonACT said:

I wonder why changing the function return type from unsigned char read_keyboard() to int makes a difference?

Just another guess, but maybe it wants to allocate a scratch reg to extend a byte but not if the return is in the native size?  Though extends are usually done in place, so maybe not.  Dunno tbh.

 

 

On 2/9/2024 at 4:36 PM, TheMole said:

I think using chars is slower on the 9900, so it might make sense to just use ints.

They're the same if just moving data, but when it comes to math or bitwise operations the compiler often has to generate extra code for bytes, so using ints will generally make for smaller and faster code.  We've also seen (but don't know why) that gcc upgrades some byte values to int unnecessarily in some cases.  When doing binary AND for example.

Link to comment
Share on other sites

So, are we stuck then? I'm not trying to be pushy, but I updated my main folder since all previous updates have been trivial to revert. If we're stuck I'm going to need to figure out how to get back to a working state, cause I've kind of sat on my project too long already. ;)

 

  • Like 2
Link to comment
Share on other sites

6 hours ago, Tursi said:

So, are we stuck then? I'm not trying to be pushy, but I updated my main folder since all previous updates have been trivial to revert. If we're stuck I'm going to need to figure out how to get back to a working state, cause I've kind of sat on my project too long already. ;)

 

Okay, that was easy enough. Just had to roll back one commit and reapply the previous fixes for movqi and addqi, and everything appears to build and run again.

 

When the next patch comes out, I'll deploy to a new folder again so I don't break my main workflow. :)

 

  • Like 4
Link to comment
Share on other sites

10 hours ago, Tursi said:

So, are we stuck then? I'm not trying to be pushy, but I updated my main folder since all previous updates have been trivial to revert. If we're stuck I'm going to need to figure out how to get back to a working state, cause I've kind of sat on my project too long already. ;)

 

Sorry didn't realise you were stuck.  Is it just the assert in do_subst() that is blocking you?  I posted a patch on feb 8th to allow constants to be full width which I had presumed would unblock that one.

I've fixed the addhi3 issue on my own branch - it's just a matter of calling the byte correct function there as well. I've also added calls to all the other byte and word move and math insns.  And asserts where conditions arise that we don't expect and don't handle.

The asm with optimised build and all registers reserved is not fixable AFAICT.  Best to just try and reduce registers reserved or build that one file unoptimised.

I'll run as many tests as I can and merge in 1.30 later today with the changes so far for folks to try out.  The only tiny remaining concern I have is when we have to upgrade a byte to a word (e.g. when adding an int to a char) there is no way to know at that point if the char is signed or unsigned so I don't know whether to emit SRL or SRA.  I'm sticking with SRA for now but that might give occasional unexpected results for unsigned chars.

ISTR I had a concern with your method of using constraints only but tbh I can't remember what that was now.  I think I just had a gut feeling it would only create more problems further down the road.

@JasonACT and @TheMole unfortunately I haven't been able to build ghostbusters to try it out.  I'm missing "TISNPlay.h" and other files.

(edit : 1.29 also includes literal constants so AI Rx,>FF00 becomes AB Rx,@CONST which fixes another reported issue)

Link to comment
Share on other sites

Incidentally, I think I've figured out why gcc drops the extend only in some cases.  These two funcs:
 

char c;
int i;
int z;

void f1()
{
  z = (c + i) & 0x1f;
}

void f2()
{
  z = (c + i) & 0x11f;
}


produce different output.  The second correct emits extendqihi2() but the first doesn't.  It looks like the optimiser knows 0x1f is LSB only so it thinks there is no need to convert the char.  But of course, in tms9900 there always is.  And arithmetic is always done on ints as found by @FarmerPotato.  At least now we know why it happens.  This also means the signed v unsigned SRA v SLA concern may be moot.

  • Like 2
Link to comment
Share on other sites

6 hours ago, khanivore said:

The asm with optimised build and all registers reserved is not fixable AFAICT.  Best to just try and reduce registers reserved or build that one file unoptimised.

Why is that? Only the compiler knows what registers it needs, so I can't save all the registers on my side. (Or I could switch workspaces, but one of the goals of the new player was to avoid unnecessarily wasted memory, and it's been working for this long.) It does need to touch all registers. Is there a different trashed list I can get away with, and just save a couple of registers on my side instead of having to assume I must save all of them? It seems like just turning off all optimizations can't be the best answer.

 

7 hours ago, khanivore said:

ISTR I had a concern with your method of using constraints only but tbh I can't remember what that was now.  I think I just had a gut feeling it would only create more problems further down the road.

It seems like I've had more issues than most so I am pretty confident in it. ;) All I did was add a constraint for register->register and handle that separately, since the original version handled register and memory as if they were interchangable, which is what caused the issue.

 

Link to comment
Share on other sites

1 hour ago, Tursi said:

Why is that? Only the compiler knows what registers it needs, so I can't save all the registers on my side. (Or I could switch workspaces, but one of the goals of the new player was to avoid unnecessarily wasted memory, and it's been working for this long.) It does need to touch all registers. Is there a different trashed list I can get away with, and just save a couple of registers on my side instead of having to assume I must save all of them? It seems like just turning off all optimizations can't be the best answer.

Oh, this is a regression?  I missed that.  Ok prolly related to my trying to improve register allocations with these patches.  I can revert them:

 

 /* Non-volatile registers to be saved across function calls */
-#define MAX_SAVED_REGS 2
+#define MAX_SAVED_REGS 6
 
 static int nvolregs[]={
    HARD_LR_REGNUM,
-   HARD_BP_REGNUM};
+   HARD_BP_REGNUM,
+   HARD_R12_REGNUM,
+   HARD_R13_REGNUM,
+   HARD_R14_REGNUM,
+   HARD_R15_REGNUM};

and 

 /* 0 for registers which must be preserved across function call boundaries */
 #define CALL_USED_REGISTERS \
-  {1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1}
+  {1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0}
 /* SC 1  2  3  4  5  6  7  8  BP SP LR CB AP 14 15*/

as to why, I haven't deep dived into the many layers of the many passes of the optimiser so all I can suggest is revert the above and suffer the extra spills to stack.

 

 

1 hour ago, Tursi said:

It seems like I've had more issues than most so I am pretty confident in it. ;) All I did was add a constraint for register->register and handle that separately, since the original version handled register and memory as if they were interchangable, which is what caused the issue.

Ah yes, it was swapping bytes in a mem location.  That's been handled now by a pre and post swap in movqi where MOVB is used for post swaps and MOV for pre swaps so the swap is only ever done in a reg.  I think your constraint worked by ensuring the SWPB was only ever done on a reg but since it now understands the need to only do SWPB after a movqi from a memory location it shouldn't be necessary and should remove the need to load into an interim register which the constraint may cause.

 

Link to comment
Share on other sites

8 hours ago, khanivore said:


@JasonACT and @TheMole unfortunately I haven't been able to build ghostbusters to try it out.  I'm missing "TISNPlay.h" and other files.

Two of the missing files (TISNPlay.h and TISfxPlay.h) are from Tursi's libtivgm2 package: https://github.com/tursilion/vgmcomp2/tree/main/Players/libtivgm2

 

The rest are from libti99: https://github.com/tursilion/libti99

Although I think you only need sound.h from there.

 

libtivgm2.a is actually included in the Ghostbusters repo (by accident, need to fix that later, didn't mean to distribute someone else's code). Make sure you edit the include path in the Makefile to point to the directory where you've downloaded the (three?) files in question.

 

Then, issuing 'make' should be all that's needed to build the thing on POSIX-y OS's (@JasonACT had to jump through some hoops to get it to build on Windows though, IIRC).

 

  • Like 2
Link to comment
Share on other sites

45 minutes ago, TheMole said:

Two of the missing files (TISNPlay.h and TISfxPlay.h) are from Tursi's libtivgm2 package: https://github.com/tursilion/vgmcomp2/tree/main/Players/libtivgm2

Thanks. I was missing that library.  Building now.  But has hit one of the assertions I didn't think could be hit, so need to do a little bit more work...

(edit: basically operands[2] to addhi3 has a missed extend from byte to int in _do_map_screen.  I had mistakenly assumed that could only happen to operands[1])

  • Like 1
Link to comment
Share on other sites

1 hour ago, khanivore said:

Oh, this is a regression?  I missed that.  Ok prolly related to my trying to improve register allocations with these patches.  I can revert them:

Yeah, that was from my music player. It's been working for years. ;)

 

Though it hasn't bitten my own code yet.

 

1 hour ago, khanivore said:

I think your constraint worked by ensuring the SWPB was only ever done on a reg but since it now understands the need to only do SWPB after a movqi from a memory location it shouldn't be necessary and should remove the need to load into an interim register which the constraint may cause.

 

I have my doubts on the movqi one since we've done this dance a few times now... we'll try another round. But I only added a completely new constraint - since all the existing ones are still there I don't see why a new temporary register would be allocated for it. It's still perfectly able to generate memory accesses and does so frequently.

 

I guess the good news is I've debugged the same hits enough times now that I know exactly where in the output code to look for them. ;)

 

Link to comment
Share on other sites

1 hour ago, TheMole said:

libtivgm2.a is actually included in the Ghostbusters repo (by accident, need to fix that later, didn't mean to distribute someone else's code). Make sure you edit the include path in the Makefile to point to the directory where you've downloaded the (three?) files in question.

It's public domain. Distribute at will. ;)

 

  • Like 5
Link to comment
Share on other sites

1 hour ago, khanivore said:

Oh, this is a regression?  I missed that.  Ok prolly related to my trying to improve register allocations with these patches.  I can revert them:

I want to add that I don't actually know what the /problem/ is, since it has not come up in any of my own tests - which includes the four sample apps in libvgmcomp2 and Super Space Acer. So my question was a legitimate question - how do I prep the compiler to call assembly that might change all the registers on it? If it's as simple as a different list and save a couple of critical registers myself that's easy to adapt to.

I'd like this to all get to a point where we can declare it stable enough to move forward and I'll happily tag my libraries as "use THIS version of the compiler". ;)

 

  • Like 1
Link to comment
Share on other sites

1 hour ago, Tursi said:

I want to add that I don't actually know what the /problem/ is, since it has not come up in any of my own tests

I’d love to be able to able to give you a 100% definitive root cause analysis but I fear wading through a million lines of code to find it could take a lifetime.  All I can offer is to say if we had a known working config let’s just go back to it.

1 hour ago, Tursi said:

I'd like this to all get to a point where we can declare it stable enough to move forward and I'll happily tag my libraries as "use THIS version of the compiler". ;)

You and me both 🙂

Link to comment
Share on other sites

1 hour ago, khanivore said:

I’d love to be able to able to give you a 100% definitive root cause analysis but I fear wading through a million lines of code to find it could take a lifetime.  All I can offer is to say if we had a known working config let’s just go back to it.

No, it's the /symptoms/ I don't know. Can't even muse on the cause. But I figured you'd know which registers it was upset about.

 

If not, yeah, no worries. I know you're working on GCC13 so I think maybe the best bet is to just get this version generating correct code, then you can focus on improving the code quality in the newer one instead of going back and forth?

 

  • Like 3
Link to comment
Share on other sites

19 hours ago, Tursi said:

I have my doubts on the movqi one since we've done this dance a few times now... we'll try another round. But I only added a completely new constraint - since all the existing ones are still there I don't see why a new temporary register would be allocated for it. It's still perfectly able to generate memory accesses and does so frequently.

I reviewed the constraint patch and you're right, it shouldn't add a temp reg.  It works by simplifying the checks to alternate == 0 to correctly do mov + swpb instead of movb.  By constraining alt 0 to only regs it simplifies the checks as there is no need to check REG_P etc.  The effect should be exactly the same but there is more checking to do in code if there isn't a separate r,r constraint.

 

I was thinking it's not always going to be that simple.  For example, if we check if op1 is dead then we can do swpb+movb instead.  But thinking about it now, I can't see any advantage to that.  The destination is always going to be clobbered so why not always do the swap in the dest.  Unless it's not a reg, but the constraint ensures that it is, so no issue.

 

Then I was thinking what if the src is a subreg and the dest is a memory location.  The constraint won't allow that.  Well we haven't seen any case so far where a movqi tries to move a byte from a HI reg to a QI mem location and it doesn't seem likely we will so I think we can ignore that case.

 

An advantage of the constraint method is that it can have a worst case length for alt 0 so other alternatives are not affected.

 

A disadvantage is code duplication.  I was hoping to find an elegant solution that could be encapsulated in a function that would avoid lots of if/then/else clauses in each insn.  But it seems every case is different so this is unavoidable anyway.

 

We discovered andhi3 sometimes makes the same assumption, and now addhi3 as well.  I've since tested and found subhi3 will do it too (and I have to assume mul/div/mod will too).  And I've found the arithmetic insns can have missed extends in either operands[1] or operands[2].  But not both (hopefully!).

 

So I do concede a separate constraint for r,r is worthwhile and does simplify things and will add it in.  But I still have some work to do cover all the corner cases in the math insns where either operand may have the wrong byte order.  For example if addhi3 operands[2] is a subreg and it is not dead, then I think we have no choice but to move to an intermediate (r0), swap there, and then add.

 

 

  • Like 1
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...