
GCC for the TI


insomnia


Just to be sure, I tried the following:

  • Running the compiler without input -> works fine, complains about no input (obviously)
  • Trying to compile a non-existent file -> works fine, complains that the file couldn't be opened
  • Trying to compile an empty file -> segfaults

I tried all of these without flags and with -O2, on the off chance that a specific code path was broken, but the results were exactly the same.

 

So, I don't think it makes much sense for me to PM you a file :)

However, on the off chance that it is helpful anyway, this is the file I tried compiling on my first attempt: https://raw.githubusercontent.com/themole-ti/ghostbusters/main/bank0/crt0.c

 


3 hours ago, TheMole said:

Just to be sure, I tried the following:

  • Running the compiler without input -> works fine, complains about no input (obviously)
  • Trying to compile a non-existent file -> works fine, complains that the file couldn't be opened
  • Trying to compile an empty file -> segfaults

I tried all of these without flags and with -O2, on the off chance that a specific code path was broken, but the results were exactly the same.

 

So, I don't think it makes much sense for me to PM you a file :)

However, on the off chance that it is helpful anyway, this is the file I tried compiling on my first attempt: https://raw.githubusercontent.com/themole-ti/ghostbusters/main/bank0/crt0.c

 

Compiles fine for me.  Though I had to comment out include tramplines.h

Is there any way to efficiently distribute binaries?  deb, VM image, docker, tarball?  But I guess if your CPU is arm then my x86-64 bins are no good anyway?  (I'm not even going to think about how you would build a tms9900 target compiler for arm host on an x86 build host 🙂)


(I'm not even going to think about how you would build a tms9900 target compiler for arm host on an x86 build host 🙂)

Canadian Cross!! (No, I haven't done one for over 15 years. Anything I once knew is obsolete.)

Linux exes are a pain in the ass. They hard-code dependencies in the executable, so there's no guarantee they'll run on another system even if it has the same CPU. That said, a lot of people use Docker because it's easier to ship an entire environment than to fix the problem that needed it in the first place. It's not the worst thing I've ever used. That would still be Lotus Notes. ;) Docker does cause some conflicts on Windows with WSL2 that made me roll back to WSL1 on my old PC; I don't know if they have fixed those.

 

I might be able to help a little with the segfault... MAYBE. I isolated two segfaults in my Super Space Acer code that I worked around with the old compiler.

 

One was a very long string constant: anything longer than 1k would fail to compile (e.g. const char x[] = "hello world for 1024+1 bytes...."; ). It'd be nice if this limit were raised. I'm not sure where it lives, but IMO there's no harm in filling a whole bank with a single string. (That'd be 8k, but my string is a bit under 2k ;) ). Of course, I worked around it by splitting the string, though that does insert an unwanted NUL in the middle.
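As a rough illustration of one way to split without the interior NUL (not the actual Super Space Acer code): in C, though not C++, a char array sized to exactly the literal's length is stored without the terminating NUL, so pieces placed back to back in a struct can be read as one string. The struct, the names and the text below are made up. It assumes the compiler puts no padding between char arrays, and whether it sidesteps the crash on this compiler is untested.

```c
#include <string.h>

/* Two pieces in one struct. part1 is sized to hold exactly the six
   characters of "hello ", so no terminator is stored for it. part2
   has room for "world" plus its NUL. Assumes no padding between the
   two char arrays. */
static const struct {
    char part1[6];
    char part2[6];
} split_msg = { "hello ", "world" };

/* View the two pieces as a single NUL-terminated string. */
static const char *split_msg_text(void) {
    return (const char *)&split_msg;
}
```

Newer host compilers may warn about the unterminated initializer for part1; that is expected here.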

 

I thought that the other was related to dividing by a signed char, but stepping back through my git history, I can't reproduce.

 

Running the install.sh under Ubuntu WSL, it seemed to finish binutils fine (which, frankly, I don't think I managed before). GCC warned me about needing GMP 4.1+ and MPFR 2.3.2+, and I told the machine to MPFR itself, but that didn't work. It's very possible I didn't build on this installation before. I installed those libs and ran again.

 

It looked like binutils and gcc itself built okay, but I had some issues with libiberty and libgcc:

 

Quote

make[3]: Entering directory '/home/tursilion/newtms9900-gcc/build/gcc-4.4.0/build/libiberty/testsuite'
make[3]: Nothing to be done for 'install'.
make[3]: Leaving directory '/home/tursilion/newtms9900-gcc/build/gcc-4.4.0/build/libiberty/testsuite'
make[2]: Leaving directory '/home/tursilion/newtms9900-gcc/build/gcc-4.4.0/build/libiberty'
/bin/bash: line 3: cd: tms9900/libstdc++-v3: No such file or directory
make[1]: *** [Makefile:10624: install-target-libstdc++-v3] Error 1
make[1]: Leaving directory '/home/tursilion/newtms9900-gcc/build/gcc-4.4.0/build'
make: *** [Makefile:2476: install] Error 2

...  
 

/home/tursilion/newtms9900-gcc/build/gcc-4.4.0/build/./gcc/xgcc -B/home/tursilion/newtms9900-gcc/build/gcc-4.4.0/build/./gcc/ -B/home/tursilion/newtms9900-gcc/newgcc9900/tms9900/bin/ -B/home/tursilion/newtms9900-gcc/newgcc9900/tms9900/lib/ -isystem /home/tursilion/newtms9900-gcc/newgcc9900/tms9900/include -isystem /home/tursilion/newtms9900-gcc/newgcc9900/tms9900/sys-include -g -O2 -O2  -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE  -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wcast-qual -Wold-style-definition  -isystem ./include   -g  -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -Dinhibit_libc  -I. -I. -I../.././gcc -I../../../libgcc -I../../../libgcc/. -I../../../libgcc/../gcc -I../../../libgcc/../include  -DHAVE_CC_TLS -o _ffsdi2.o -MT _ffsdi2.o -MD -MP -MF _ffsdi2.dep -DL_ffsdi2 -c ../../../libgcc/../gcc/libgcc2.c \

../../../libgcc/../gcc/libgcc2.c: In function ‘__ffsdi2’:
../../../libgcc/../gcc/libgcc2.c:547: error: unrecognizable insn:
(insn 101 100 102 22 ../../../libgcc/../gcc/libgcc2.c:545 (set (subreg:HI (reg:SI 21 [ prephitmp.34 ]) 2)
        (const_int 65535 [0xffff])) -1 (nil))
../../../libgcc/../gcc/libgcc2.c:547: internal compiler error: in extract_insn, at recog.c:2048
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
make[1]: *** [Makefile:359: _ffsdi2.o] Error 1
make[1]: Leaving directory '/home/tursilion/newtms9900-gcc/build/gcc-4.4.0/build/tms9900/libgcc'
make: *** [Makefile:11552: all-target-libgcc] Error 2
=== Failed to build libgcc.a ===

I seem to remember seeing those before, so probably not new. I never cared in the past, and sure enough the executables I care about are installed where I need them.

It's late and I shouldn't be up, but I did a quick test on libti99 (which is actually a new version that merges the Coleco and TI versions - I'll release it properly soon) and Super Space Acer (which builds and runs on the old compiler, too, so no bug fix checking).

 

With this build TESTLIB fails right after asking whether to enable F18A tests. It ends up setting an illegal graphics mode. The code is clearly running, I can watch the heat map and see it scanning the keyboard and running the graphics tests. EXAMPLE fails as well, similar symptoms.

 

Everything's right except VDP register 1, which is returned from SET_BITMAP_RAW as an unsigned char (it was an INT in the other libti99, for exactly this reason, but it was working in the old compiler - an assertion I will double check before I close this email ;) )

 

It was probably enough to just look at vdp_setgraphics.c - it shows the bug.

 

The call looks like this:

 

void set_graphics(unsigned char sprite_mode) {
    unsigned char x = set_graphics_raw(sprite_mode);
    VDP_SET_REGISTER(VDP_REG_MODE1, x);
    VDP_REG1_KSCAN_MIRROR = x;
}

unsigned char set_graphics_raw(unsigned char sprite_mode) {
    vdpchar = vdpchar_default;
    scrn_scroll = scrn_scroll_default;

    unsigned char unblank = VDP_MODE1_16K | VDP_MODE1_UNBLANK | VDP_MODE1_INT | sprite_mode;

    /* ... a bunch of register setup, which is all correct ... */

    /* We lose it here. Instead of returning the value calculated above
       (E0 + sprite_mode, which is 0 in this case), it does a SETO R1. */
    return unblank;
}

 

I'll attach the two files for your review in case it helps. I'll also attach the assembly for the old version of the compiler, which seems to work.

 

vdp_setgraphics_bug.zip



 


1 hour ago, Tursi said:

../../../libgcc/../gcc/libgcc2.c:547: error: unrecognizable insn:
(insn 101 100 102 22 ../../../libgcc/../gcc/libgcc2.c:545 (set (subreg:HI (reg:SI 21 [ prephitmp.34 ]) 2)
        (const_int 65535 [0xffff])) -1 (nil))
../../../libgcc/../gcc/libgcc2.c:547: internal compiler error: in extract_insn, at recog.c:2048

This looks like the same issue I saw when initialising a long.  It says it can't find an insn to set a HI (16-bit) from a const in the expansion of an SI (32-bit) init, even though "movhi" is right there.  I'll keep looking into this one.

(edit: just noticed, this is building libgcc, which is broken anyway for now, but not used unless using floats)

 

1 hour ago, Tursi said:

I'll attach the two files for your review in case it helps. I'll also attach the assembly for the old version of the compiler, which seems to work.

Very good, thanks, I'll look into this as well


Update: I fetched the latest from the main branch, and while it still errors out, it doesn't segfault anymore... maybe the error message itself is helpful:

	[CC] bank0/crt0.c...
bank0/crt0.c:1: internal compiler error: in subreg_highpart_offset, at emit-rtl.c:1304
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
make: *** [bank0/crt0.o] Error 1

 

 

3 hours ago, khanivore said:

Compiles fine for me.  Though I had to comment out include tramplines.h

Is there any way to efficiently distribute binaries?  deb, VM image, docker, tarball?  But I guess if your CPU is arm then my x86-64 bins are no good anyway?  (I'm not even going to think about how you would build a tms9900 target compiler for arm host on an x86 build host 🙂)

I think it's something macos (and perhaps arm) specific, so don't worry too much about it. I don't want this to get in the way of the progress you're making (have I said thank you for picking up where Insomnia left off already?)! Maybe we can get back to it once you're comfortable with the stability on Linux...


Oops, I missed a byte shift: the value should be in the MSB before the &.  Defining DEBUG in tms9900.c dumps the insn expansions into the .s file.  In tms9900.md:1404, this:
 

      val = INTVAL(operands[2]) & 0xFF00;


was causing :

 

; iorqi3-28
; OP0 : (reg:QI 1 r1)code=[reg:QI]
; OP1 : (reg:QI 1 r1)code=[reg:QI]
; OP2 : (const_int -32 [0xffffffffffffffe0])code=[const_int:VOID]

; iorqi3 intval=FFFFFFE0 val=FF00
        seto r1


but should be:

 

; iorqi3-28
; OP0 : (reg:QI 1 r1)code=[reg:QI]
; OP1 : (reg:QI 1 r1)code=[reg:QI]
; OP2 : (const_int -32 [0xffffffffffffffe0])code=[const_int:VOID]

; iorqi3 intval=FFFFFFE0 val=E000
        ori  r1, >E000

 

Unit test added
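
A host-side sketch of the arithmetic, assuming only what the debug dump shows: the constant arrives sign-extended (-32 for 0xE0), so masking with 0xFF00 before shifting picks up the sign bits rather than the byte. The function names are invented for illustration and are not taken from tms9900.md.

```c
/* Mask first: keeps bits 8..15 of the sign-extended constant. */
static unsigned imm_mask_only(long v) {
    return (unsigned)(v & 0xFF00);
}

/* Shift the byte into the MSB, then mask. The cast avoids
   left-shifting a negative value, which is undefined in C. */
static unsigned imm_shift_then_mask(long v) {
    return (unsigned)(((unsigned long)v << 8) & 0xFF00);
}
```

For -32 the first form gives >FF00 (hence the SETO) and the second gives >E000, matching the two dumps above.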


4 hours ago, Tursi said:

One was a very long string constant: anything longer than 1k would fail to compile (e.g. const char x[] = "hello world for 1024+1 bytes...."; ). It'd be nice if this limit were raised. I'm not sure where it lives, but IMO there's no harm in filling a whole bank with a single string. (That'd be 8k, but my string is a bit under 2k ;) ). Of course, I worked around it by splitting the string, though that does insert an unwanted NUL in the middle.

 

I'm not seeing this one, init of a string of 1029 chars works fine for me.  Maybe related to bss init in crt0 or other?

 

(correction, while it compiles ok, it crashes when run in the emulator, looks like maybe an assembler limitation, most likely binutils-2.19.1/gas/config/tc-tms9900.c:566)


1 hour ago, TheMole said:

Update: I fetched the latest from the main branch, and while it still errors out, it doesn't segfault anymore... maybe the error message itself is helpful:

Unfortunately, I still can't reproduce that one.  I even tried a clean checkout in case something on my branch fixed it, but I get no errors at all here (aside from the known ones in libgcc)

1 hour ago, TheMole said:

I think it's something macos (and perhaps arm) specific, so don't worry too much about it. I don't want this to get in the way of the progress you're making (have I said thank you for picking up where Insomnia left off already?)! Maybe we can get back to it once you're comfortable with the stability on Linux...

Thanks! No problem I'm happy to help and learn something new.


5 hours ago, khanivore said:

I'm not seeing this one, init of a string of 1029 chars works fine for me.  Maybe related to bss init in crt0 or other?

 

(correction, while it compiles ok, it crashes when run in the emulator, looks like maybe an assembler limitation, most likely binutils-2.19.1/gas/config/tc-tms9900.c:566)

Nope, it was definitely the compiler that crashed. I wasn't anywhere near running code yet by that point. Try 2k... it might not have been exactly 1k.

 

I worked around it, but I was surprised by it.

 

Also not terribly important. There are other ways to put large amounts of data in there. ;)

 


6 hours ago, khanivore said:

Oops, I missed a byte shift: the value should be in the MSB before the &.  Defining DEBUG in tms9900.c dumps the insn expansions into the .s file.  In tms9900.md:1404, this:

Ah, that makes sense. I was trying to figure out why the compiler came up with >FFFF for that sequence - sign extension happened. ;) Good catch!

 


I will just put out another data point here.  I have about 40 unit tests that I've written in the past that test various things: my own TI code, @Tursi's libTi99, as well as gcc tms9900 compiler output. 

 

Almost all of my unit tests pass with yesterday's release (patch gcc-4.4.0-tms9900-1.23.patch).  I saw a couple of issues during testing: the first being background/foreground colors not being set as expected, and the second being unexpected output on one of my unit tests.  I still need to investigate, but thought I'd post what I've found so far.

 

The GCC version I built with is 13.2.1, and I am running Fedora Linux 39 x64.

 

Great work so far with the GCC updates, @khanivore!


38 minutes ago, Tursi said:

Nope, it was definitely the compiler that crashed. I wasn't anywhere near running code yet by that point. Try 2k... it might not have been exactly 1k.

 

I worked around it, but I was surprised by it.

 

Also not terribly important. There are other ways to put large amounts of data in there. ;)

 

Ok, have it now.  I was using cc1 for my tests but if I use gcc -c I see the error.  Looks like the tms9900.c file just tries to put all the text in one block where other backends split it across multiple lines.  Should be an easy fix.
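
For illustration only, a sketch of the general idea of splitting long data across several short assembler lines instead of one huge one. This is not the code in tms9900.c; the function name, the chunk size, and the exact directive spelling ("byte" with >XX hex) are assumptions made for the example.

```c
#include <stdio.h>
#include <stddef.h>

#define BYTES_PER_LINE 16  /* hypothetical per-line limit */

/* Write buf as a series of short byte directives, at most
   BYTES_PER_LINE values each. Returns the number of lines written. */
static size_t emit_bytes_chunked(FILE *out, const unsigned char *buf, size_t len) {
    size_t lines = 0;
    for (size_t i = 0; i < len; i += BYTES_PER_LINE) {
        size_t end = (len - i > BYTES_PER_LINE) ? i + BYTES_PER_LINE : len;
        fputs("\tbyte ", out);
        for (size_t j = i; j < end; j++)
            fprintf(out, "%s>%02X", (j > i) ? ", " : "", (unsigned)buf[j]);
        fputc('\n', out);
        lines++;
    }
    return lines;
}
```

With this shape no single line grows with the length of the string, whatever limit the assembler or the output buffer happens to have.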


Built and tested again. Libti99ALL works now (there are a couple of bugs but I don't know if they are my side or gcc - but it's the same as the old compiler.) However, Super Space Acer fails - coming up corrupted and then crashing. It will take me longer to dig into it to see where it's failing as it's much, much more complex, but as far as the title page /runs/, the title page is just corrupted. So I have a good idea where to debug, early in the startup.


In this case, the bug was in my RLE unpack function. At a point where it's supposed to mask off a bit in a byte value, it instead zeros it. The rest of the function looks correct.

 

void RLEUnpack(unsigned int p, const unsigned char *buf, unsigned int nMax) {
	unsigned char z;
	int cnt;	// looks like the boss pack code has some bugs and packs too many bytes, we need this

	cnt = nMax;
	VDP_SET_ADDRESS_WRITE(p);
	while (cnt > 0) {
		z=*buf;
		if (z&0x80) {
			// run of byte
			buf++;
			z&=0x7f;
			raw_vdpmemset(*buf, z);
			buf++;
		} else {
			// sequence of data
			buf++;
			raw_vdpmemcpy(buf, z);
			buf+=z;
		}
		cnt-=z;
	}
}


This generates this asm. Our inputs are R1=>0000, R2=>6D72, R3=>1800
The first bytes at >6D72 are 8B 00 01 04 B7 00 01 10

 

	def	RLEUnpack
RLEUnpack
	ai   r10, >FFF6             Stack setup
	mov  r10, r0
	mov  r11, *r0+
	mov  r9, *r0+
	mov  r13, *r0+
	mov  r14, *r0+
	mov  r15, *r0
	mov  r2, r14                save CPU address to R14
	mov  r3, r15                save byte count to R15
	mov  r1, r2                 move VDP address to R2
	sla  r2, 8                  get LSB of address
	movb r2, @>8C02             write to VDP address
	ori  r1, >4000              merge command bits to VDP address for write
	srl  r1, >8                 unnecessary (in this case) clear of LSB
	sla  r1, 8
	movb r1, @>8C02             write command and high byte to VDP address
	jmp  L163                   jump into loop terminator, it'll come back to L166
L166
	movb *r14+, r13             get first byte into R13 (8B) and increment (optimization!)
	jgt  L164                   jump if it's positive
	jeq  L164                   or zero
	
* handler for run of byte (>80 bit set)	
	clr r13                     ** Bug? We are supposed to do z&=0x7f to remove the >80 bit <<-----
	movb r13, r2                copy the result into r2 for the call (count)
	srl  r2, 8                  make it a byte
	movb *r14+, r1              get the data byte we need into R1 (and increment)
	li   r3, raw_vdpmemset      address of the function to call
	bl   *r3                    call it (why not use immediate?)
	jmp  L165                   jump down to wrap up this token
	
* handler for sequence of data (>80 bit clear)	
L164
	movb r13, r9                copy the (now a count) into R9 (R9 temp is unneeded)
	srl  r9, 8                  make byte count into a word
	mov  r9, r2                 copy the word into R2 for the function call 
	mov  r14, r1                copy the data address into R1 for the function call
	li   r3, raw_vdpmemcpy      address of the function to call
	bl   *r3                    call it (again, immediate?)
	a    r9, r14                add the count to the source address
L165
	srl  r13, 8                 make byte count into a word
	s    r13, r15               subtract it from the total count
L163
	mov  r15, r15               check remaining count
	jgt  L166                   if still positive, loop around

	mov  *r10+, r11             else, restore stack and return
	mov  *r10+, r9
	mov  *r10+, r13
	mov  *r10+, r14
	mov  *r10+, r15
	b    *r11
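
As a cross-check on what the function should do with the input bytes quoted above, here is a host-side sketch that follows the same control flow but writes into a plain buffer instead of VDP RAM. The name rle_unpack_host and the dst parameter are stand-ins for this example; raw_vdpmemset and raw_vdpmemcpy are replaced by memset and memcpy. Like the original, it can write past nMax if the last token overshoots, so dst needs some slack.

```c
#include <string.h>

/* Unpack RLE data from buf into dst until at least nMax bytes have
   been produced. Returns the number of bytes actually written. */
static unsigned rle_unpack_host(unsigned char *dst, const unsigned char *buf,
                                unsigned nMax) {
    int cnt = (int)nMax;
    unsigned written = 0;

    while (cnt > 0) {
        unsigned char z = *buf++;
        if (z & 0x80) {
            /* run of one byte: strip the flag bit to get the count */
            z &= 0x7f;
            memset(dst + written, *buf++, z);
        } else {
            /* literal sequence of z bytes */
            memcpy(dst + written, buf, z);
            buf += z;
        }
        written += z;
        cnt -= z;
    }
    return written;
}
```

With the bytes 8B 00 01 04 B7 00 01 10, the first token should expand to eleven zero bytes (>8B with the flag stripped is >0B), then one literal >04, then fifty-five zeros, then one literal >10. A count of zero in place of that >0B is what the CLR in the listing produces.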

 


4 hours ago, TheBF said:

It's really cool to see the compiler output. 

How hard will it be to convince the compiler to use: 

 

BL @raw_vdpmemcpy

In cases where the call is made more than once, loading it in a register is good for performance, but I don't know if there's enough information to decide which way is quicker. There are a few other places where it could be more optimal - like loading the address to VDP would be quicker to just OR the >4000 into the original value and use SWPB. But, the compiler is also trying to account for every possible case, and sometimes that leads to slightly less optimal code. Overall I'm usually pretty impressed by the GCC output. ;)

 


7 hours ago, Tursi said:

In this case, the bug was in my RLE unpack function. At a point where it's supposed to mask off a bit in a byte value, it instead zeros it. The rest of the function looks correct.

Oops again, same bug in AND as was in OR.  Missing left shift in tms9900.md:1326.  Should be:

val = (INTVAL(operands[2]) << 8) & 0xFF00;     

My test missed it because this code path is only executed for immediate.  I'll add another test.
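
To see why the AND came out as a CLR, here is the same arithmetic as a host-side sketch (the function names are hypothetical, not from the .md file). With the shift missing, the mask for z &= 0x7f works out to zero, so the byte is cleared outright.

```c
/* Without the shift, a small positive constant has nothing in bits 8..15. */
static unsigned and_mask_unshifted(long v) {
    return (unsigned)(v & 0xFF00);
}

/* With the shift, the constant lands in the MSB where byte values live. */
static unsigned and_mask_shifted(long v) {
    return (unsigned)(((unsigned long)v << 8) & 0xFF00);
}

/* Apply a mask to a 16-bit register image holding a byte in its MSB. */
static unsigned and_byte_in_msb(unsigned reg, unsigned mask) {
    return reg & mask & 0xFF00u;
}
```

Taking R13 = >8B00 from the RLE example, the unshifted mask leaves >0000 and the shifted mask leaves >0B00, which is the count the function expects.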

Regarding all the bit shifts, one thing I think I can do is define "strict" and "nonstrict" versions of byte extensions.  Nonstrict means we don't care about the low byte in a reg if it is only ever used for byte ops, so we could use SWPB, which should be faster than SRL.

Though in the unnecessary case above, the shift actually comes from the VDP_SET_ADDRESS_WRITE macro

VDPWA=(((x|0x4000)>>8));

The compiler doesn't know why you are shifting right, but it knows it needs to shift it left to do a MOVB.
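
A small sketch of what the macro asks for versus what a MOVB needs, using 16-bit values modelled on the host. The helper names are made up for the example. The point is only that the right shift from the C source followed by the left shift for MOVB ends up equivalent to masking off the low byte.

```c
#include <stdint.h>

/* What the C source computes: the high byte of (x | 0x4000). */
static uint8_t vdp_cmd_byte(uint16_t x) {
    return (uint8_t)((x | 0x4000u) >> 8);
}

/* What MOVB needs: that byte sitting in the MSB of a 16-bit register. */
static uint16_t byte_to_msb(uint8_t b) {
    return (uint16_t)((uint16_t)b << 8);
}

/* The shortcut: the same register contents without the shift pair. */
static uint16_t cmd_in_msb_direct(uint16_t x) {
    return (uint16_t)((x | 0x4000u) & 0xFF00u);
}
```

For any 16-bit x, byte_to_msb(vdp_cmd_byte(x)) and cmd_in_msb_direct(x) give the same value, which is why the SRL/SLA pair in the listing is redundant in this case.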

Another possible improvement, I'm thinking that saving R13,R14,R15 on every function call is excessive as we never emit BLWP.  I could make R15 the SP and R14 the BP to make R1-R10 general regs and reduce stack/mem use.


7 hours ago, TheBF said:

It's really cool to see the compiler output. 

How hard will it be to convince the compiler to use: 

 

BL @raw_vdpmemcpy

 

 

In fact it does, but not when using the optimiser flags.  It thinks it is faster not to.  I'll have to look into why it thinks that.

 


10 hours ago, khanivore said:

Though in the unnecessary case above, the shift actually comes from the VDP_SET_ADDRESS_WRITE macro

VDPWA=(((x|0x4000)>>8));

The compiler doesn't know why you are shifting right, but it knows it needs to shift it left to do a MOVB.

Another possible improvement, I'm thinking that saving R13,R14,R15 on every function call is excessive as we never emit BLWP.  I could make R15 the SP and R14 the BP to make R1-R10 general regs and reduce stack/mem use.

Ahh, interesting. Some 8-bit compilers, like SDCC for the Z80, recognize that sequence as accessing a single byte of a temporary and directly reach for it instead of shifting. I have no idea how they detect that though. ;)

 

I'd say you're right on R13-R15. If we save that on every function call, removing that could be a big win.

 


On 12/6/2023 at 2:42 PM, chue said:

I saw a couple of issues during testing: the first being background/foreground colors not being set as expected, and the second being unexpected output on one of my unit tests.  I still need to investigate

Just to close on the above, these are code issues on my end and not issues in the compiler.
