
GCC for the TI


insomnia


17 hours ago, Tursi said:

I suppose I could optimize my macro using pointers to save those shifts. That wouldn't hurt my feelings too much and would be worth the cycles. ;)

 

"premature optimisation is the root of all evil" 🙂

I'm going to focus on functionality first and then performance later.  There are lots of peephole optimisations that I haven't touched yet.  TBH, I'm not convinced they are needed.  e.g. doing a once-off setup of a VDP reg might not be worth optimising.  And in cases that are, it will probably be safer to hand-code some inline assembly than wrestling with GCC predicates to try and do it and risk breaking something else in the process.


Thanks to @mrvan I have floating point support now in libgcc.  The methods make calls to the console ROM to do ADD, SUB, MUL and DIV.  This is on branch "libgcc" for now.  I'll merge once I've done some more testing.

Some caveats/notes:

floats and doubles are the same.  Both are 64-bit values encoded in TI real format.

I found a bug in byte compares.  The params are swapped.  Will fix this on next release.

I told gcc we are using decimal floats (base 10).  Most other floats use base 2, but I didn't want 1+1=1.99999 etc., so I think it's better to tell it to use base 10 internally (Decimal128), as that will map better to TI floats without rounding errors.  Unfortunately, though, this means a small patch to the compiler, which I wanted to avoid, but I'll keep it small.
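A quick host-side illustration of the base-2 rounding problem, assuming the host `double` is IEEE 754 binary64 (true on typical x86-64 and ARM machines). `sum_error` and `print_sum` are hypothetical names for this sketch:

```c
#include <stdio.h>

/* Assumes an IEEE 754 binary64 'double' (radix 2). 0.1, 0.2 and 0.3
   cannot be stored exactly in binary, so each literal is rounded to the
   nearest representable value and the errors do not cancel. */
double sum_error(void)
{
    double a = 0.1, b = 0.2, c = 0.3;
    return (a + b) - c;    /* about 5.55e-17 (2^-54) rather than 0 */
}

void print_sum(void)
{
    /* %.17g prints enough digits to expose the rounding */
    printf("0.1 + 0.2 = %.17g\n", 0.1 + 0.2);   /* 0.30000000000000004 */
}
```

A radix-10 (or TI radix-100) format stores 0.1, 0.2 and 0.3 exactly, so the same sum comes out as exactly 0.3.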

The console ROM clobbers some values in the address range >83C0 to >83E0, so I've moved WP to >83A0.  I think this is safe, as I believe it is only used by TI BASIC.

I can't get the compiler to find libgcc.a automatically and have to add it to the cmd line as a library.  Not sure why it doesn't find it relative to the path of gcc.

But apart from that it seems to be working.

[Screenshot: floating point test output]


18 hours ago, Tursi said:

Ahh, interesting. Some 8-bit compilers, like SDCC for the Z80, recognize that sequence as accessing a single byte of a temporary and directly reach for it instead of shifting. I have no idea how they detect that though. ;)

 

I'd say you're right on R13-R15. If we save that on every function call, removing that could be a big win.

 

The issue with the TMS9900 is that MOVB accesses the MSB of a register, whereas on any other CPU I think a byte move would access the LSB.  GCC does know when you have an 8-bit byte and want a 16-bit short, and emits "extendqihi", which does the shift.  I guess we could emit movb Rsrc,@ws+Rdst*2+1 to store byte values... but it may be safer for now to use SWPB or SRL.
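To make the MSB/LSB point concrete, here is a toy C model (hypothetical, not emulator code) of a TMS9900 workspace held big-endian in RAM. It shows MOVB touching only the MSB, a sign-extending SRA Rd,8 as one plausible way to implement the extension for a signed char, and the `@ws+Rdst*2+1` trick writing straight into the LSB:

```c
#include <stdint.h>

/* Toy model: 16 registers stored big-endian in RAM, so register n's MSB
   sits at ws+2n and its LSB at ws+2n+1. Byte instructions use the MSB. */
typedef struct { uint8_t ram[32]; } workspace;

uint16_t reg_get(const workspace *w, int n)
{
    return (uint16_t)((w->ram[2 * n] << 8) | w->ram[2 * n + 1]);
}

void reg_set(workspace *w, int n, uint16_t v)
{
    w->ram[2 * n]     = (uint8_t)(v >> 8);
    w->ram[2 * n + 1] = (uint8_t)(v & 0xFF);
}

/* MOVB Rs,Rd: copies Rs's MSB into Rd's MSB; Rd's LSB is untouched. */
void movb_reg(workspace *w, int rs, int rd)
{
    w->ram[2 * rd] = w->ram[2 * rs];
}

/* SRA Rd,8: moves the byte from MSB to LSB and copies the sign bit
   into the upper byte (sign extension of a signed char). */
void sra8(workspace *w, int rd)
{
    uint16_t v = (uint16_t)(reg_get(w, rd) >> 8);
    if (v & 0x80)
        v |= 0xFF00;
    reg_set(w, rd, v);
}

/* MOVB Rs,@ws+Rd*2+1: stores the byte straight into Rd's LSB.
   The result is a clean 16-bit value only if Rd's MSB was already 0. */
void movb_to_lsb(workspace *w, int rs, int rd)
{
    w->ram[2 * rd + 1] = w->ram[2 * rs];
}
```

The last function shows why the trick is fiddly: it avoids the shift, but something still has to guarantee the upper byte is cleared (or sign-filled).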

I did try moving SP to R15 and it works well: gcc uses R1-R10 as general regs, resulting in fewer memory accesses.  But it would mean changing and rebuilding lots of things, including libti99, so I've just archived it as a patch in the repo for now.


For the floating point routines, I think it would be safe to use >83E0 as the workspace (the existing GPLWS). Most C applications are only using the GPLWS temporarily, so hopefully none of them are expecting it to be persistent. That would avoid mandating more reserved scratchpad usage.


7 hours ago, khanivore said:

working.

[Screenshot: floating point test output]

What converts the float to string? Is it a printf format or other?

 
I've been studying floating point precision and conversion to decimal string.
 

One thing I proved about base2 floating point: you can be standard-C compliant and still round to avoid 5.999... The C definition is: sufficient precision to re-construct the original bits.  I find that printf is naive and does not round when the standard permits it.
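A brute-force sketch of that rule for IEEE binary64 on a host: print with increasing precision and stop at the first string that parses back to the identical double. `shortest_roundtrip` is a hypothetical helper; production code would use a dedicated shortest-digits algorithm rather than this trial loop:

```c
#include <stdio.h>
#include <stdlib.h>

/* "Sufficient precision to reconstruct the original bits": format with
   %.1g, %.2g, ... and stop at the first string that strtod() reads back
   as exactly the same double. For IEEE binary64, 17 significant digits
   always suffice. Returns the precision used; buf receives the text. */
int shortest_roundtrip(double x, char *buf, size_t len)
{
    int prec;
    for (prec = 1; prec < 17; prec++) {
        snprintf(buf, len, "%.*g", prec, x);
        if (strtod(buf, NULL) == x)
            return prec;
    }
    snprintf(buf, len, "%.17g", x);
    return 17;
}
```

With this, 0.1 prints as "0.1" while the result of 0.1 + 0.2 still needs all 17 digits, because it really is a different double from 0.3.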
 

 
Anyhow I like TI Radix-100, it adds character. 


 


1 hour ago, FarmerPotato said:

What converts the float to string? Is it a printf format or other?

 
I've been studying floating point precision and conversion to decimal string.
 

One thing I proved about base2 floating point: you can be standard-C compliant and still round the last place to avoid 9999.  The definition is: sufficient precision to re-construct the original bits.  Tragically, I find that printf is naive and does not choose to round when the standard permits it.
 

 
 


 

I wrote a quick-and-dirty ftoa() function.  It just dumps all the digits followed by the exponent, which is why there are all the trailing zeros.  Since the digits are already base 10, there is no need for any rounding.
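A sketch of a "dump all the digits, then the exponent" ftoa for TI floats. This is not the author's function. `ti_ftoa` is a hypothetical name, and it assumes the layout usually documented for the console's radix-100 format: byte 0 is an excess-64 power of 100, bytes 1 to 7 are radix-100 digits (0 to 99), and negative values have their first 16-bit word negated.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Dumps every radix-100 digit, then the exponent. buf needs at least
   24 bytes. No rounding and no trimming of trailing zeros. */
void ti_ftoa(const uint8_t in[8], char *buf)
{
    uint8_t b[8];
    int neg = 0, n;

    memcpy(b, in, sizeof b);
    if (b[0] == 0 && b[1] == 0) {           /* first word 0 means zero */
        strcpy(buf, "0");
        return;
    }
    if (b[0] & 0x80) {                      /* undo the word negation */
        uint16_t w = (uint16_t)((b[0] << 8) | b[1]);
        w = (uint16_t)(0u - w);
        b[0] = (uint8_t)(w >> 8);
        b[1] = (uint8_t)(w & 0xFF);
        neg = 1;
    }
    /* value = (b1 + b2/100 + ... + b7/100^6) x 100^(b0-64),
       and 100^e = 10^(2e), so print b1, a point, then two decimal
       digits per remaining byte, then the power of ten. */
    n = sprintf(buf, "%s%u.", neg ? "-" : "", (unsigned)b[1]);
    for (int i = 2; i < 8; i++)
        n += sprintf(buf + n, "%02u", (unsigned)b[i]);
    sprintf(buf + n, "E%+03d", 2 * ((int)b[0] - 64));
}
```

For example, the bytes 41 01 17 2D 00 00 00 00 come out as "1.234500000000E+02". Every digit is exact, so the only "formatting" left is trimming zeros.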

(edit: internally, I've told the compiler to parse floating-point values in the source into decimal128 format, which is radix-10 IEEE 754 with densely packed decimal (10 bits per 3 digits), so converting radix 10 to radix 100 should be lossless.  I convert decimal128 to a string and then the string to a TI float, which seems inefficient, but it is how the compiler does it for other formats anyway, and it is much easier than trying to parse packed decimal.  Also, we're compiling on a fast host, so we don't mind too much about the efficiency of the compiler itself.)
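The "string to TI float" step could look roughly like this. It is a hypothetical helper, not the code in the GCC port, and it uses the same assumed radix-100 layout (excess-64 power-of-100 exponent byte, seven radix-100 digit bytes, first word negated for negative values). Because decimal digits pair up directly into radix-100 digits, no arithmetic rounding is involved; this sketch simply truncates past the seventh radix-100 digit.

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Accepts [-]digits[.digits]. Returns 0 on success, -1 on bad input or
   exponent overflow. Extra digits are truncated, not rounded. */
int ti_float_from_string(const char *s, uint8_t out[8])
{
    char digits[132];
    size_t nd = 0, ni = 0, nf = 0, ipairs, npairs, first = 0;
    int neg = 0, dots = 0;
    const char *p;
    long exp100;

    memset(out, 0, 8);
    if (*s == '-') { neg = 1; s++; }
    for (p = s; *p; p++) {                     /* validate and count */
        if (*p == '.') { if (dots++) return -1; continue; }
        if (!isdigit((unsigned char)*p)) return -1;
        if (dots) nf++; else ni++;
    }
    if (ni + nf == 0 || ni > 64 || nf > 64) return -1;

    /* Pad so the decimal point falls between two radix-100 digits. */
    if (ni % 2) digits[nd++] = '0';
    for (p = s; *p; p++)
        if (*p != '.') digits[nd++] = *p;
    if (nf % 2) digits[nd++] = '0';

    ipairs = (ni + 1) / 2;                     /* pairs before the point */
    npairs = nd / 2;
    while (first < npairs && digits[2 * first] == '0' &&
           digits[2 * first + 1] == '0')
        first++;                               /* skip leading 00 pairs */
    if (first == npairs) return 0;             /* zero: all bytes 0 */

    exp100 = (long)ipairs - 1 - (long)first;
    if (exp100 < -64 || exp100 > 63) return -1;
    out[0] = (uint8_t)(64 + exp100);
    for (size_t k = 0; k < 7 && first + k < npairs; k++) {
        size_t i = 2 * (first + k);
        out[1 + k] = (uint8_t)((digits[i] - '0') * 10 + (digits[i + 1] - '0'));
    }
    if (neg) {                                 /* negate the first word */
        uint16_t w = (uint16_t)((out[0] << 8) | out[1]);
        w = (uint16_t)(0u - w);
        out[0] = (uint8_t)(w >> 8);
        out[1] = (uint8_t)(w & 0xFF);
    }
    return 0;
}
```

For example, "123.45" is split as 01|23|45 and encodes as 41 01 17 2D 00 00 00 00, and "-1" encodes as BF FF 00 ...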


28 minutes ago, FarmerPotato said:

That is the interrupt workspace. Console ROM uses it for VDP R#1, pointer to sound list, others. 

I thought that was >83C0?  Ok I’m probably getting away with it since ints are disabled.  Not sure what other area we can safely use if we want to use console ROM routines.


2 hours ago, jedimatt42 said:

For the floating point routines, I think it would be safe to use >83E0 as the workspace (the existing GPLWS). Most C applications are only using the GPLWS temporarily, so hopefully none of them are expecting it to be persistent. That would avoid mandating more reserved scratchpad usage.

We are currently using >83E0 though.  I noticed that our base pointer (r9) had been zeroed.  I could save r9,r10,r11 before calling the console I guess.


I'd better start again ... I'm conflating some issues and confusing myself .. and probably everyone else.

The crt0.asm in libti99 uses WP at >8300.  But when we call ROM routines for floating point, the console routines use some of this area as a scratchpad.  So I tried using >83C0 on the basis that we have ints disabled but as @FarmerPotato rightly pointed out the console stores some values in there too and there were conflicts as well.  I'm using >83A0 for now as I thought it would be safe to use based on https://www.unige.ch/medecine/nouspikel/ti99/padram.htm

@jedimatt42,  @mrvan's floating point wrapper functions do switch to GPLWS before calling the console and switch back again afterwards. I don't think we can use >83E0 for all C code as I suspect there would be conflicts there too.  e.g. FLAGS at >83FD.
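The trial-and-error above amounts to checking whether a 32-byte workspace overlaps anything the console uses. Here is a hypothetical C helper for that check. Its table holds only the ranges named in this thread, and the console FP routines also touch other parts of the >8300 scratchpad, so it is deliberately incomplete:

```c
#include <stddef.h>
#include <stdint.h>

/* A TMS9900 workspace is 16 registers x 2 bytes = 32 bytes at WP.
   End addresses are inclusive. Table is incomplete (see above). */
typedef struct { uint16_t start, end; const char *what; } region;

const region reserved[] = {
    { 0x83C0, 0x83DF, "interrupt workspace / console ROM data" },
    { 0x83E0, 0x83FF, "GPLWS, includes FLAGS at >83FD" },
};

/* Returns the first listed region a workspace at wp overlaps, or NULL. */
const region *ws_conflict(uint16_t wp)
{
    uint32_t last = (uint32_t)wp + 31;          /* last byte of R15 */
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (wp <= reserved[i].end && last >= reserved[i].start)
            return &reserved[i];
    return NULL;
}
```

With this table, >83A0 (>83A0 to >83BF) comes back clear, while >83C0, >83E0 and anything straddling >83C0 do not.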

 

 


4 minutes ago, khanivore said:

I'm using >83A0 for now as I thought it would be safe to use based on https://www.unige.ch/medecine/nouspikel/ti99/padram.htm

 

You should not have a problem using >83A0. In my floating point library (FPL) in fbForth 2+, I use both >8380 (32-byte GPL subroutine stack) and >83A0 (32-byte GPL data stack) because I am not using any GPL functions that might use them. My FPL also makes use of the console ROM FP functions. I use >83A0 as the FPL workspace and >8380 as a stack for up to 4 FP numbers. I also use one or more GPL workspace locations, when they are not otherwise in use, but I am careful to restore >83FA – >83FF when done. As an example, I use >83FE as a stack pointer for my FPL transcendental functions, restoring it to contain >8C02 (VDP RAM Write Address) when done.

 

...lee
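A 32-byte block holds exactly 32 / 8 = 4 TI floats, which is where the "up to 4 FP numbers" limit comes from. Below is a hypothetical host-side model of such a stack with simulated RAM. It grows upward, and the direction used in fbForth's FPL is not stated in the post.

```c
#include <stdint.h>
#include <string.h>

/* 32-byte scratchpad block (such as >8380 to >839F) used as a stack
   of 8-byte TI floats. The RAM is simulated. */
#define FP_BYTES   8
#define FP_REGION 32
#define FP_SLOTS  (FP_REGION / FP_BYTES)

typedef struct {
    uint8_t mem[FP_REGION];   /* stands in for the scratchpad block */
    int depth;                /* floats currently on the stack */
} fp_stack;

int fp_push(fp_stack *s, const uint8_t val[FP_BYTES])
{
    if (s->depth == FP_SLOTS) return -1;             /* full */
    memcpy(&s->mem[s->depth * FP_BYTES], val, FP_BYTES);
    s->depth++;
    return 0;
}

int fp_pop(fp_stack *s, uint8_t val[FP_BYTES])
{
    if (s->depth == 0) return -1;                    /* empty */
    s->depth--;
    memcpy(val, &s->mem[s->depth * FP_BYTES], FP_BYTES);
    return 0;
}
```

A fifth push is refused rather than spilling into the >83A0 block that follows, which in Lee's arrangement is the FPL workspace.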


1 hour ago, khanivore said:

I wrote a quick-and-dirty ftoa() function.  It just dumps all the digits followed by the exponent, which is why there are all the trailing zeros.  Since the digits are already base 10, there is no need for any rounding.

(edit: internally, I've told the compiler to parse floating-point values in the source into decimal128 format, which is radix-10 IEEE 754 with densely packed decimal (10 bits per 3 digits), so converting radix 10 to radix 100 should be lossless.  I convert decimal128 to a string and then the string to a TI float, which seems inefficient, but it is how the compiler does it for other formats anyway, and it is much easier than trying to parse packed decimal.  Also, we're compiling on a fast host, so we don't mind too much about the efficiency of the compiler itself.)

Wow, I did not know IEEE-754 had other representations than Base2. Packing 3 digits into 10 bits is RADIX 1000!  That's a lot of bit field shifting to get it out. 
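A quick check on how tightly each scheme packs decimal digits, measured as bits of information per bit of storage:

$$
2^{10} = 1024 \ge 10^3, \qquad \frac{\log_2 10^3}{10} \approx \frac{9.966}{10} \approx 99.7\%
$$

$$
\text{BCD: } \frac{\log_2 10}{4} \approx \frac{3.322}{4} \approx 83.0\%, \qquad \text{radix-100 byte: } \frac{\log_2 100}{8} \approx \frac{6.644}{8} \approx 83.0\%
$$

So a DPD declet is effectively one radix-1000 digit, while a TI radix-100 byte is exactly as dense as BCD (both spend 4 bits per decimal digit).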
 

 

 


 

 


On 12/8/2023 at 6:39 AM, khanivore said:

"premature optimisation is the root of all evil" 🙂

I'm going to focus on functionality first and then performance later.  There are lots of peephole optimisations that I haven't touched yet.  TBH, I'm not convinced they are needed.  e.g. doing a once-off setup of a VDP reg might not be worth optimising.  And in cases that are, it will probably be safer to hand-code some inline assembly than wrestling with GCC predicates to try and do it and risk breaking something else in the process.

I'm in production, it's well beyond premature ;) I don't tend to wait for things!

 


4 hours ago, Tursi said:

Could we maybe avoid calling ROM routines, or restrict such calls to an optional library? One of the things I strive for is ROM independence - if you build it into the compiler that will be a major pain for me as I can't work around that.

 

 

 

Ok I'll see if I can make them optional.  ROM routines are only called for float operations.  In a way they kind of already are, since they are in libgcc, which isn't being linked in by default.  All the compiler does is emit "BL __addf3" etc., so it's up to whatever you link in to satisfy the link.  We could always add the floating point emulation library, but it will bloat the binaries a bit.


@Tursi I agree with you that the compiler itself should not depend on any ROM, as it should be possible to use it to compile for other targets.
On the other hand it would be very helpful if the compiler were distributed with a very thin (optional) library and by library I mean anything. It could just be a bunch of macros and not necessarily a fully fledged pre-compiled library to link.
This is what is done on nearly all other 8-bit compilers (CC65 for the 6502, Z88DK for the I8080 and Z80, CMOC for the 6809, LCC1802 for the 1802, CC6303 for the 6800/3, etc.).
You get a compiler together with a little (or big in some cases) dev-kit that allows you to handle some input/output on the target machine.

That way your project depends on only one devkit, which you can install in a single step.

Ideally your library or a sub-set of it could be optionally distributed with the compiler as long as you are not strongly against it.


10 hours ago, khanivore said:

Ok I'll see if I can make them optional.  ROM routines are only called for float operations.  In a way they kind of already are, since they are in libgcc, which isn't being linked in by default.  All the compiler does is emit "BL __addf3" etc., so it's up to whatever you link in to satisfy the link.  We could always add the floating point emulation library, but it will bloat the binaries a bit.

Yeah, I think this should be okay, maybe with both options available depending on the target you're compiling for. But it's definitely worth keeping in mind that some people would end up using this on a platform without the TI ROMs. I myself have been mulling a personal project for a while now that would require changing the ROMs, à la JiffyDOS on the C64...

 


On 12/8/2023 at 7:05 AM, khanivore said:

I did try moving SP to R15 and it works well: gcc uses R1-R10 as general regs, resulting in fewer memory accesses.  But it would mean changing and rebuilding lots of things, including libti99, so I've just archived it as a patch in the repo for now.

I just noticed this... don't be afraid of needing libti99 to be rebuilt. I ship the binary but I really do expect everyone to build it for their own compiler (and hell, I need to rebuild the Coleco version every time SDCC updates ;) ).

 

I didn't see any new patches, so I applied the one fix you mentioned for my RLE bug manually and rebuilt. Just for kicks I also disabled the stack save of R13-R15. With those fixes the test apps mostly work (again, I haven't debugged the few weirdnesses yet, they may be my fault), and Super Space Acer runs the title page and attract screen correctly. More than that I haven't debugged yet either. So as far as I'm concerned it's at least on par with the previous version with that fix. ;)

 


8 hours ago, Tursi said:

I just noticed this... don't be afraid of needing libti99 to be rebuilt. I ship the binary but I really do expect everyone to build it for their own compiler (and hell, I need to rebuild the Coleco version every time SDCC updates ;) ).

 

I didn't see any new patches, so I applied the one fix you mentioned for my RLE bug manually and rebuilt. Just for kicks I also disabled the stack save of R13-R15. With those fixes the test apps mostly work (again, I haven't debugged the few weirdnesses yet, they may be my fault), and Super Space Acer runs the title page and attract screen correctly. More than that I haven't debugged yet either. So as far as I'm concerned it's at least on par with the previous version with that fix. ;)

 

Sounds good, thanks.  I'm just running a new patch through some final tests now and hope to release soon.  I've made the changes to R13-R15 and added a switch to output TI floats or IEEE floats.


I've merged my latest changes to main and bumped the patch version number to 1.25.

Here is a .deb as well if any Linux users want to try it out instead of building from src.


This version includes several changes:
 

  • Uses TI float format internally for floats and doubles (can be disabled with -mno-ti99-floats)
  • libgcc now includes floating point operations to call console ROM routines
    • NOTE Do not include libgcc if you don't want any dependency on console ROM
  • Moves stack pointer from R10 to R15 and base pointer from R9 to R14
    • This change allows R1-R10 to be used as general regs which improves performance
    • NOTE Check your crt0.asm file to ensure it is setting up R15 as the stack pointer and not R10, which was used previously
  • Fixes byte compares
  • Fixes immediate AND and OR operations on byte values
  • Fixes shift of long by a variable
  • Removes insns relating to 32-bit arithmetic and shift
    • The compiler emits 16-bit instructions to perform these
  • Includes more unit tests for bytes, shorts, longs and floats
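Without 32-bit insns, long arithmetic has to be built from 16-bit pieces. The C sketch below illustrates the idea and is not GCC's generated code. `long16`, `add32` and `shl32` are hypothetical names, and a long is held as two 16-bit words, high word first, as in a TMS9900 register pair:

```c
#include <stdint.h>

typedef struct { uint16_t hi, lo; } long16;

long16 add32(long16 a, long16 b)
{
    long16 r;
    r.lo = (uint16_t)(a.lo + b.lo);
    /* the low word carried iff the wrapped result is below an operand */
    r.hi = (uint16_t)(a.hi + b.hi + (r.lo < a.lo));
    return r;
}

/* Logical left shift by a run-time count (the "shift of long by a
   variable" case). Counts of 32 or more give 0 in this sketch. */
long16 shl32(long16 a, unsigned n)
{
    long16 r;
    if (n == 0)
        return a;
    if (n >= 32) {
        r.hi = r.lo = 0;
    } else if (n >= 16) {           /* low word moves into the high word */
        r.hi = (uint16_t)(a.lo << (n - 16));
        r.lo = 0;
    } else {                        /* bits cross from low to high word */
        r.hi = (uint16_t)((a.hi << n) | (a.lo >> (16 - n)));
        r.lo = (uint16_t)(a.lo << n);
    }
    return r;
}
```

The three-way split on the count (0, below 16, 16 and up) is the part that is easy to get wrong when the count is only known at run time.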

     

 

(Edited by khanivore: removed deb pkg)

Thank you @khanivore for your work on TMS9900 GCC!

 

I had downloaded the 1.25 patches from the dev branch yesterday; today I pulled down the latest 1.25 patches from main. 

 

In comparing the two I see that in one of my projects the compiled output went from 6528 bytes (yesterday's 1.25 patch) to 7552 bytes (today's 1.25).  

In another of my projects targeting the 8K cartridge ROM space, the output was smaller than 8K yesterday but today is larger.

 

I still need to investigate where the size difference comes from, but just wanted to post this feedback.


I've got some too... something has broken r11 tracking. The drawlinefast method in libti99 calls out to an assembly function using inline assembly and specifying that r11 is munged.

 

	__asm__(
	"	mov %0,@>8324		; y2 -> r2\n"
	"	mov %1,@>8326		; x2 -> r3\n"
	"	mov %2,@>8328		; y1 -> r4\n"
	"	mov %3,@>832a		; x1 -> r5\n"
	"	mov %4,@>833a		; mode -> r13\n"
	"	movb @gBitmapColor,@>833c	; color -> r14 MSB\n"
	"	bl @bm_asmdrawlinefast\n"
	:
	: "r" (y2), "r" (x2), "r" (y1), "r" (x1), "r" (mode)
	: "r11"
	);

 

This block no longer saves off r11 before the inline assembly or restores it afterwards. Here's the source and the two asm files, though they won't tell you much more. 

inlineasm.zip


I was curious about the size difference, so I checked my example.c and it grew a bit too. It's a very simple app, so I thought analyzing it might be helpful. example.zip

 

What I'm seeing is a strange paranoia around the stack pointer - it is saving and restoring itself to the stack. Which, if it was actually necessary, wouldn't work anyway. That's probably most of the increased size.

 

The other is the handling of function calls. Previously R9 was used for function calls and was guaranteed to be preserved. The new code chose R4 in this case and because it wanted to use vdpmemset more than once, it pre-emptively saved it to the stack, instead of letting the called function decide if that was necessary. So the stack save/restore added a few instructions as well.

 

In tracing this I also noticed that the string pointer to a function call was saved to the stack instead of stored in a register like other values on function calls. Opportunity for improvement there, eventually. ;)

 

Zip contains C source, and commented assembly for old and new compiler output.

 
