-
Posts
190 -
Everything posted by khanivore
-
Self Learning Kicad as I design a FDC/IO card
khanivore replied to RickyDean's topic in TI-99/4A Development
Just on the traces, vertical on one side and horizontal on the other is generally a good idea but not every trace needs to follow that. Some traces can just route around others and avoid needing any via. I see some traces have vias in the middle with no connection to the other side - and some with no connection at all - you can delete these. If you set a snap to a fraction of the pin pitch you could get the tracks straighter and more parallel. Some of the traces look a little tightly spaced. I'd also make all traces either 0, 45 or 90 degrees for aesthetics. Finally, consider filling unused areas with copper: one side ground and the other vcc. Make the whole board a decoupling cap and do a bit to help the environment by etching less copper and having fewer drills 🙂
-
Yeah that's a tail call optimisation. If gcc sees that the last thing a function does is call another function whose return value it passes straight back, it does a B to that function instead of a BL followed by a return, which reduces the code size.
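A minimal sketch of the kind of C that triggers this, assuming GCC or Clang. The function names are made up for illustration, and the noinline attribute is only there so the call is not inlined away:

```c
/* Hypothetical helper; noinline keeps it as a real call so the
   tail call in scaled_plus_one() stays visible in the generated asm. */
__attribute__((noinline)) int plus_one(int x)
{
    return x + 1;
}

/* The call to plus_one() is the last thing this function does and its
   result is returned unchanged, so the compiler may jump to plus_one()
   (B on TMS9900) instead of calling it (BL) and then returning. */
int scaled_plus_one(int x)
{
    return plus_one(x * 2);
}
```

Building with -O2 -S and looking at scaled_plus_one should show the branch; adding -fno-optimize-sibling-calls should bring the ordinary call back for comparison.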
-
I don't know why anyone would need that much bandwidth. I had a 16Mb 4G cellular router for years and it met my needs nearly all of the time (I'm out in the sticks, ADSL was only 2Mb, so I ditched my landline ages ago). I finally got 1Gb fibre last year but it didn't make that much difference. The main improvement is if the family is all on Netflix and I'm on a zoom/teams call it's perfectly stable now, which is nice, but I can't think of any reason to get more than 1Gb.
-
FWIW I've been hosting a few sites on DigitalOcean droplets for several years and don't remember any outages. Uptime of 462 days on one of my VMs. (edit: of course I've no idea what s/w you are running, I just use bare metal VMs, so this may be irrelevant)
-
Also, just to add that while the SLA is not correct, it isn't as simple as just removing it 'cos then the LSB of r1 will contain junk which will propagate through to the SOC and the address will still be wrong. So I'd need to emit ANDI r1,0xFF or something which will bloat the code even more. Maybe best just to make this one an erratum. I had a try at creating a peephole but it's not easy to match up all the QI and HI operands.
-
Yeah I've looked through the gcc code but I'm not seeing any obvious way to change this behaviour. It seems that if you declare a struct as __packed__ then gcc says ok fine, I will make no assumptions whatsoever about alignment and fetch everything one byte at a time. I think it's highly unlikely that any structs used by the console or others will contain unaligned 16-bit ints so it should be safe to omit the __packed__ attribute.
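To illustrate, here is a small sketch that can be built on a typical host with GCC or Clang. The struct and function names are hypothetical, not taken from any console header. When every field is already naturally aligned, the attribute changes neither the size nor the offsets; it only drops the alignment the compiler is allowed to assume to 1, which is what pushes it towards byte-by-byte access:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record with two 16-bit fields, for illustration only. */
struct rec_packed {
    uint16_t flags;
    uint16_t addr;
} __attribute__((__packed__));

struct rec_plain {
    uint16_t flags;
    uint16_t addr;
};

/* Same size and same field offsets with or without the attribute. */
_Static_assert(sizeof(struct rec_packed) == sizeof(struct rec_plain),
               "packing does not change the size here");
_Static_assert(offsetof(struct rec_packed, addr) ==
               offsetof(struct rec_plain, addr),
               "packing does not change the offsets here");

/* What does change is the alignment the compiler may assume. */
size_t packed_alignment(void) { return _Alignof(struct rec_packed); }
size_t plain_alignment(void)  { return _Alignof(struct rec_plain); }
```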
-
Thanks - good catch. It looks like for some strange reason gcc thinks it must access elements in a packed struct byte-by-byte, even though they are 16-bit values and are already aligned. I'll have to look into why it does that. The erroneous SLA is coming from GCC itself. Because it thinks any byte value can be treated as a word value, it thinks it needs to move the LSB of r1 to the MSB before SOC, but of course in TMS9900 it already is in the MSB. I can see an offset of -1 in operand[1] to the "ashlhi3" (arith left shift) insn so I'll add a call to my offset checking function there and see if I can eliminate that shift. (edit: just ran a quick test without __packed__ and it generated "mov @>4008, @x" so a short-term workaround may be to just drop the attribute)
-
I deliberately avoid mentioning radians 🙂 but that's cool
-
Infinite. Take the earth. Each line of longitude is a full circle (a great circle) with 360 degrees. Well, the volume of a sphere is actually 4/3 pi r^3. Continuing with the earth as an example (not perfectly spherical but close enough for this), in navigation, we define a nautical mile as one minute of one degree of a line of longitude so the circumference of the earth is 360 x 60 or 21,600 nautical miles (there are 1852m in a nautical mile and 1609m in a statute mile, so that's about 25,000 miles or 40,000 km). The radius is then circumference / (2 x pi) = 3,438 nautical miles (/60 = 57.29 degrees as you said) and the volume then is 170 billion cubic nautical miles (/60^3 = 787,900 cubic "degrees") or about 1 x 10^21 cubic metres.
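For anyone who wants to check the arithmetic, here is a rough sketch in C. The function names are made up, and it assumes a perfect sphere and the 1852 m nautical mile as above:

```c
#define PI 3.14159265358979323846

/* One nautical mile per minute of arc: 360 degrees x 60 minutes. */
double earth_circumference_nm(void)
{
    return 360.0 * 60.0;
}

/* r = C / (2 * pi) */
double earth_radius_nm(void)
{
    return earth_circumference_nm() / (2.0 * PI);
}

/* V = 4/3 * pi * r^3, in cubic nautical miles. */
double earth_volume_nm3(void)
{
    double r = earth_radius_nm();
    return 4.0 / 3.0 * PI * r * r * r;
}

/* Convert cubic nautical miles to cubic metres (1 nm = 1852 m). */
double nm3_to_m3(double v)
{
    return v * 1852.0 * 1852.0 * 1852.0;
}
```

Dividing earth_volume_nm3() by 60^3 gives the "cubic degrees" figure.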
-
Yep, that's exactly what it does. No special incantations needed. It's simply dictated by whether you call any other functions or not.
-
A follow up on register allocation ... tl;dr "preserving regs creates smaller object code".

Using R12-R15 as a test case, if we mark a reg as 0 in CALL_USED_REGISTERS this means that reg must be "preserved across function call boundaries". Then the function prologue generates code like this:

    ai r10, >FFF4
    mov r10, r0
    mov r11, *r0+
    mov r9, *r0+
    mov r12, *r0+
    mov r13, *r0+
    mov r14, *r0+
    mov r15, *r0

and a function can safely call any other functions it wants without saving any regs, since it knows the called function will save them.

OTOH if we don't mark them as preserved then the prologue is much shorter, and instead declares a stack frame ....

    ai r10, >FFF0
    mov r9, @>E(r10)
    mov r11, @>C(r10)

... because it must save everything it is using into the frame before calling another function like this:

    mov r3, @>2(r10)
    mov r4, @>4(r10)
    mov r5, @>6(r10)
    mov r6, @>8(r10)
    mov r13, @>A(r10)
    bl @f2

I see two issues with this:
1) the code to save the regs on the stack frame is not as efficient since it is using labels instead of register indirect;
2) the caller doesn't know what the callee may use so must be conservative and save everything.

So it seems to me we would be better marking more regs as preserved, putting the onus on each function to save the regs it uses. This shouldn't affect inline assembly that declares clobbers.

Also I think 8 regs for function params is excessive. If we reduce to 4 and mark R5-R8 as generally available, but to be preserved, it seems we can further reduce output code size.

A counter argument is that leaf functions have to save what they use if regs are preserved but don't need to save anything if not. So the code for leaf functions does get bigger. But I built libti99 with and without preserved regs and overall the code is smaller with preserved regs.
-
As @JasonACT said, R0 is assumed to be a clobber, no need to declare it. No insns expect R0 to have a persistent value. R10 is the stack pointer, so must be preserved (easiest option) or declared clobbered (though this might confuse gcc because it wouldn't be able to save it if it has no free regs). R11 is already clobbered on entry to a function so no need to preserve it. Unless you are an inline, in which case, yes, do declare it as clobbered or save it. R9 is the frame pointer. It should also be preserved or declared clobbered. It isn't always used, especially if the build is optimised, and never if you use -fomit-frame-pointer, but better to preserve it or declare it clobbered. You can also clobber as many regs as you have params (up to 8). So if your function has 5 params, these are in R1-R5 and can be clobbered since they are passed by value (edit: I think you can safely use any reg R1-R8 without preserving). Any other regs should be preserved or clobbered in case they are in use by the calling function.
-
Very good. That does reproduce it. I googled and there is some chatter about this error in the GCC mailing lists so it could be a compiler bug? I built using my last gcc13 build and it doesn't appear, so it should go away once we upgrade. It seems to be to do with the "r" constraint and for some reason reload doesn't feel able to alloc a new reg here. If I change it to general "g" it does compile but the output is useless:

    swpb @>2(r10)
    movb @>2(r10),@>8c02
    swpb @>2(r10)
    ori @>2(r10),>4000
    movb @>2(r10),@>8c02
    andi @>2(r10),>3fff

Another simple workaround would be to use r0 as a scratch which would also obviate the final andi.

Cheers mate! I'll try to keep it up to date as we find out new stuff.
-
Yeah, 'fraid so. I'm glad you did find a workaround so probably best to just wait and see whether a pattern emerges in time to help us reproduce it.
-
I have been taking a lot of notes over the past few months on what I've learned about the GCC backend. I figured now is as good a time as any to collate these into some kind of archive so I created a blog here https://tms9900-gcc.blogspot.com/ if anyone is interested. Comments and corrections welcome. Thanks!
-
I don't know tbh. The comment in gcc/reload1.c just says: /* If this was an ASM, make sure that all the reload insns we have generated are valid. If not, give an error and delete them. */ but without reproducing it, I can't see what "reload insns" it means. Do you need to declare the condition code clobber? I think that would be implied. Not sure if related though.
-
Built fine for me too ... once I had paths set up to xdt99 and so on. But it doesn't get far, unfortunately. It bombed almost immediately with a bad opcode after trying to execute BL *R12 when R12 contained >6000. I'm thinking the compiler had a function address stored in R12 but it got trampled by your trampoline or other inline assembly? I built with -fno-function-cse and it got further and printed some junk on the screen, but that was about it.
-
I see it. It looks like the fix I made to limit strings to 64 bytes has a buggy corner case. If the 63rd byte is non-printable (\n in this case) then it closes the already-closed string and resets the count, resulting in an extra closing single quote. It should check in_text before closing the string when the length is reached.

diff --git a/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.c b/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.c
index d7b9fb5..42a13e0 100644
--- a/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.c
+++ b/dev/gcc-4.4.0/gcc/config/tms9900/tms9900.c
@@ -720,7 +720,7 @@ void tms9900_output_ascii(FILE* stream, const char* ptr, int len)
       if (ISPRINT(c))
         {
           /* End TEXT statement */
-          if (count==64)
+          if (in_text && count==64)
             {
               fprintf (stream, "'\n");
               in_text = 0;
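Here is a stripped-down sketch of the same logic for anyone who wants to play with the corner case on a host machine. It is a hypothetical stand-in, not the real tms9900_output_ascii: the directive spelling, the quote counter and the helper names are my own assumptions. It shows why the in_text guard matters when a string hits the limit right before a non-printable byte:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_TEXT 64

/* Simplified, hypothetical emitter (not the real tms9900_output_ascii).
   Returns the number of single quotes written so balance can be checked. */
int emit_ascii(FILE *stream, const char *ptr, int len)
{
    int in_text = 0;   /* inside an open 'quoted' string? */
    int count = 0;     /* chars in the current string */
    int quotes = 0;

    for (int i = 0; i < len; i++) {
        unsigned char c = (unsigned char)ptr[i];
        if (isprint(c) && c != '\'') {
            /* Close a full string, but only if one is actually open. */
            if (in_text && count == MAX_TEXT) {
                fputs("'\n", stream);
                quotes++;
                in_text = 0;
            }
            if (!in_text) {
                fputs("\ttext '", stream);
                quotes++;
                in_text = 1;
                count = 0;
            }
            fputc(c, stream);
            count++;
        } else {
            /* Non-printable: close any open string, emit a raw byte. */
            if (in_text) {
                fputs("'\n", stream);
                quotes++;
                in_text = 0;
            }
            fprintf(stream, "\tbyte %d\n", c);
        }
    }
    if (in_text) {
        fputs("'\n", stream);
        quotes++;
    }
    return quotes;
}

/* Corner case: a full 64-char string, then '\n', then more text. */
int corner_case_quotes(void)
{
    char buf[MAX_TEXT + 2];
    memset(buf, 'A', MAX_TEXT);
    buf[MAX_TEXT] = '\n';
    buf[MAX_TEXT + 1] = 'B';
    return emit_ascii(stdout, buf, (int)sizeof buf);
}
```

With the guard in place the quotes come out balanced. Dropping the in_text test from the first condition makes this sketch emit one stray closing quote before the 'B' string, which is the symptom described above.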
-
Hi @TheMole, I've built ghostbusters and am trying to run it on my own emulator, because, you know, I just like to be awkward. It's not running yet (I'm still trying to figure out the bank switching), but I've noticed at startup it does this:

    601C:0209 LI 9,>000A
    6020:04D9 CLR *9
    6022:0460 B @>7854
    7854:064A DECT 10
    7856:C68B MOV 11,*10
    7858:0300 LIMI >0000
    785C:02E0 LWPI >8300
    7860:06A0 BL @>7758
    7758:0201 LI 1,>BABE

Which doesn't seem to have any adverse effects, but I'm not sure why it writes to >000A? I think the assembler is not doing what you intend with "LI r9, >ASM_ADDRESS "

Also, it pushes R11 onto the stack at >7854 and calls >7758 (which I guess is detect_32k) but both of these happen BEFORE the stack is initialised, so are actually trampling on random memory locations (R10 = >07FE at this point in my environment, which is harmless for now). It would be safer to init SP in cart_header.asm before calling _start. Like I said, it's not having any adverse effects for now, but could in future.
-
Any wake up on a multi-user OS is not going to happen at a precise time, unfortunately, unless you are running on an isolated CPU core. What I did was create a recurring timerfd and do a blocking read on that descriptor to synchronise. At least that averages out at the right rate. I didn't find any VDP timing issues but when it came to cassette load and save I had to jump through a few hoops.
-
I have merged the 1.30 branch to main. I found some issues in the lib1funcs implementations such as not saving some scratch regs. I'm still seeing some failures in corner cases, such as unoptimised arithmetic shift of a 32-bit value by a count of 16 or more, but I'm releasing anyway as these could take some time to work through. Release notes for 1.30 are:

gcc patch 1.30
* Pass constants as wides to force_const_mem to avoid assert in combine.c:do_SUBST
* Added calls to correct byte order on all byte and word arith and move
* Changed inline debug to dump entire insn not just operands
* Removed wrongly associative constraints on subtract
* Added separate reg constraints to addhi3, andhi3, subhi3 to allow longer lengths for subreg offset fixes
* Added more unit tests
* Removed constraints in movqi - causes assert in reload
* Added 32 bit shift operations
* Marked R0 as fixed so allocator won't use it
* Added reg saves in lib1funcs as some regs were being trampled
-
Yep, the VAX approach makes a lot more sense. It would have been nice if the TMS9900 designers had built a level of indirection into branch, as "B Rx" is pointless. It would have messed up "B @LABEL" though, so it's probably more consistent to have it as it is.
-
Ah, good point, yes I had forgotten it still needs a double indirection.
-
Yeah, I’ll merge that tomorrow. I’ll include the 32-bit shift lib funcs too.
-
Are you building with -O2 or -Os ? I think I might have changed that in the libti99 Makefile. I definitely saw calls to __ashlsi3 and friends in my unit tests too.
