Spaced Cowboy Posted November 16, 2022

So with the board off to the PCB fabricators, and the desire to be able to use it when it arrives back, I started thinking about how I was going to program it. There's a whole bunch of stuff at GitHub, although it's all pretty preliminary. I'm just starting the thread here to keep track of where I am on the project, and because it's just finally started to actually produce something.

'xtal' (pronounced "crystal") has 3 parts: the driver code, which is the usual thing to interact with, the compiler (xtal-c) and the assembler (xtal-a). The reasons for the new language are in the README.md at GitHub, so I won't go over them here.

As it stands, the "language" is just an expression parser, and it can't even do that properly yet, but I've been laying quite a bit of foundational work, and there is starting to be some light at the end of the tunnel. Right now, if I pass in the test code:

Input expression:

    2 + 3 * 5 - 8 / 3

the compiler will spit out:

    ; Assembly code produced at 22:11:54 on Nov 15 2022
    ;
    ;
    ; Assembly begins
    ; ----
    .include lib/printReg.s

    move.4 #$2 r1
    move.4 #$3 r2
    move.4 #$5 r3
    muls.4 r2 r3 r4
    adds.4 r1 r4 r5
    move.4 #$8 r6
    move.4 #$3 r7
    divs.4 r6 r7 r8
    subs.4 r5 r8 r9
    call printReg r9
    ; ----
    ; Assembly complete

I haven't introduced types yet (there will be 's8', 'u8', 's16', 'u16', 's32', 'u32' and 'float' as well as compound variants for structures/classes) so everything is taking on the type 'signed 32-bit integer' or 's32' for short. That means we're calling muls.4 rather than mulu.4. There's no constant-elimination yet either, which is why it's not just a single 'move.4 #xxx r1'. That will then be fed into the 'xtal-a' assembler by 'xtal', and produce a binary, ready-to-run executable.
The assembler doesn't know all of the mnemonics above yet (it currently only understands move and mulx) but if we limit the assembly to .org $600 .include stdmacros.s move.4 #$1122 r1 move.4 #$2233 r2 muls.4 r1 r2 r3 Then the assembler will produce: .org $600 lda #$0 sta $c6 sta $c7 lda #$11 sta $c5 lda #$22 sta $c4 lda #$0 sta $ca sta $cb lda #$22 sta $c9 lda #$33 sta $c8 lda $c4 eor $c8 php lda $c4 bpl check2_1 ; {25} sec lda #$0 sbc $c4 sta $c4 lda #$0 sbc $c5 sta $c5 lda #$0 sbc $c6 sta $c6 lda #$0 sbc $c7 sta $c7 check2_1: lda $c8 bpl doMul32_1 ; {25} sec lda #$0 sbc $c8 sta $c8 lda #$0 sbc $c9 sta $c9 lda #$0 sbc $ca sta $ca lda #$0 sbc $cb sta $cb doMul32_1: lda #$0 sta $cc sta $cd sta $ce sta $cf ldx #$20 loop_1: lsr $c7 ror $c6 ror $c5 ror $c4 bcc next_1 ; {25} clc lda $c8 adc $cc sta $cc lda $c9 adc $cd sta $cd lda $ca adc $ce sta $ce lda $cb adc $cf sta $cf next_1: asl $c8 rol $c9 rol $ca rol $cb dex bpl loop_1 ; {-46} plp bpl done_1 ; {25} sec lda #$0 sbc $cc sta $cc lda #$0 sbc $cd sta $cd lda #$0 sbc $ce sta $ce lda #$0 sbc $cf sta $cf done_1: , which, if I paste into the online assembler at masswerk, and then transfer to the online emulator, and run it, it will run to completion and end up with the memory at $CC..$CF (r3) showing as: ... and $1122 x $2233 is in fact $249EDC6 Next steps are to implement add, div, sub, and then finally printReg (as a first analogue to printf()) and the compiler will be able to compile arbitrary arithmetic into executable code. That's when the real fun begins There's a long way before this is going to be a useful language, but it just felt like a real milestone to see the macro-assembler actually doing the things that the compiler output will need. Ultimately the FPGA will be implementing the 32-bit ALU, so instead of that long stream of 6502 instructions, it'll be a series of LDA/STA to any registers that need updating, and a few LDA/STA to the command-fifo. 
However, I think it's a good idea for the compiler to work with a stock XE/XL as well, if for no other reason than I want to compare the results. It also lets me work on it while waiting for the board to arrive back ...
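For readers who don't want to trace the 6502 listing, here is a minimal Python sketch of the same general technique: strip the signs, multiply by shift-and-add over 32 bits, then restore the sign. It is not a transcription of the assembler's output; the function names are made up for illustration, and the sign is taken from the top bit of each 32-bit operand.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & SIGN32 else value


def muls32(a, b):
    """Signed 32-bit multiply by shift-and-add, truncated to 32 bits."""
    a &= MASK32
    b &= MASK32
    # Result is negative when exactly one operand is negative.
    negative = bool((a ^ b) & SIGN32)
    # Work on magnitudes, like the 'sec / sbc' negation blocks in the listing.
    if a & SIGN32:
        a = (-a) & MASK32
    if b & SIGN32:
        b = (-b) & MASK32
    product = 0
    for _ in range(32):
        if a & 1:                      # low bit set: add the shifted multiplicand
            product = (product + b) & MASK32
        a >>= 1                        # next multiplier bit
        b = (b << 1) & MASK32          # multiplicand doubles each step
    if negative:
        product = (-product) & MASK32
    return to_signed32(product)


result = muls32(0x1122, 0x2233)
print(hex(result))
```

With the operands from the post, this agrees with the $249EDC6 that ended up in r3.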
TGB1718 Posted November 16, 2022

Apart from changing the label names for MAC/65, it assembles and runs with the same result. Looks like a nice project, I look forward to seeing more functionality.
Spaced Cowboy Posted November 19, 2022

So a little progress. The test code is still the '2 + 3*5 - 8/3' and the compiler is still producing the same intermediate-level assembly code with the move.4, adds.4 etc. "opcodes", but now I've implemented the 32-bit signed add/subtract/divide functions, so the assembler can then produce the 6502 equivalent. Those 10 pseudo-opcodes turn into 289 assembly instructions for the 6502, and when assembled and executed, I see (in memory):

    00C0: 00 00 00 00 02 00 00 00 ;········
    00C8: 00 00 00 00 00 00 00 00 ;········
    00D0: 0F 00 00 00 11 00 00 00 ;········
    00D8: 02 00 00 00 03 00 00 00 ;········
    00E0: 02 00 00 00 0F 00 00 00 ;········
    00E8: 00 00 00 00 00 00 00 00 ;········
    00F0: 00 00 00 00 00 00 00 00 ;········
    00F8: 00 00 00 00 00 00 00 00 ;········

Given that there are 16 32-bit "registers" in zero-page, starting with r0 (at $C0..$C3) through to r15 ($FC..$FF), I'm hoping to see 15 (2+3*5-8/3 as an integer) in r9, or the memory at $E4..$E7 (with $E4 being the low byte as is standard on the 6502), and ... there it is.

I tracked down the last bug in the 32-bit signed divide this evening, after adding much better (hierarchical) error handling, so now I can see stuff like:

    [Scanner.cc scan: 208] Unknown assembly token 'bogus' in macro _add32 at line 5
    .. referenced from macro _mul32u at line 9
    .. referenced from macro _mul32 at line 8
    .. referenced from file Tests/test03 at line 4

... even though all the macros and input files are recursively resolved and catenated together at the start.

Next up is printreg (basically binary->ascii followed by a CIO call) and then I can go back to adding features to the high-level front-end rather than the lower-level assembler.
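The register layout described above (r0 at $C0, four bytes per register, low byte first) can be checked against the memory dump with a short Python sketch. The helper names are hypothetical; the bytes are copied from the dump in the post.

```python
REG_BASE = 0xC0   # r0 starts at $C0, per the post
REG_SIZE = 4      # every "register" is 32 bits wide


def reg_address(n):
    """Zero-page address of the low byte of register rN."""
    if not 0 <= n <= 15:
        raise ValueError("only r0..r15 exist")
    return REG_BASE + n * REG_SIZE


def read_reg(memory, n):
    """Read rN as a signed 32-bit little-endian value."""
    addr = reg_address(n)
    return int.from_bytes(bytes(memory[addr:addr + REG_SIZE]), "little", signed=True)


# Zero page $C0..$FF, copied from the dump above.
dump = bytes.fromhex(
    "00 00 00 00 02 00 00 00"
    "00 00 00 00 00 00 00 00"
    "0F 00 00 00 11 00 00 00"
    "02 00 00 00 03 00 00 00"
    "02 00 00 00 0F 00 00 00"
    "00 00 00 00 00 00 00 00"
    "00 00 00 00 00 00 00 00"
    "00 00 00 00 00 00 00 00"
)
memory = bytearray(256)
memory[0xC0:0x100] = dump

print(hex(reg_address(9)), read_reg(memory, 9))
```

Reading the other registers the same way shows the intermediate values too: r4 holds 3*5, r5 holds 2+15, and r8 holds 8/3 truncated to 2.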
I don't think my assembly implementation is going to win any prizes here - that expression takes ~6000 clock-cycles to execute (over 2/3 of that is the 32-bit divide, though) - but that's not really the goal anyway. The goal is to have something to compare against when the FPGA does it, and to make sure that condition-flags and so on are preserved correctly, just as if the 6502 was doing things on its own, just a lot faster.

I think there's a possibility for the 6502 to give the 68K a run for its money here. Assuming an 8 MHz 68K, with division taking about 140 clocks, that's about 17 usecs. The 6502 will do it in 15 clocks (three LDA immediate instructions, each followed by an STA to zero-page) at 1.79MHz, or about 8 usecs. Certainly not every operation will be a 32-bit divide, but muls can take 70 clocks on the 68K as well (so roughly at parity with the enhanced 6502), though the 68K will handily beat the 6502 when it comes to add/sub. That is, until the 6502 gets implemented @70MHz on the FPGA, anyway.
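The timing comparison is simple arithmetic, sketched here in Python. The clock counts are the ones quoted in the post; the split of 15 clocks into three LDA immediate (2 cycles each) plus three STA zero-page (3 cycles each) uses the standard 6502 cycle counts.

```python
def microseconds(clocks, mhz):
    """Elapsed time in microseconds for a number of clocks at a clock rate in MHz."""
    return clocks / mhz


# 68K divide: about 140 clocks at 8 MHz (figures from the post).
m68k_div_us = microseconds(140, 8.0)

# FPGA-assisted 6502: three LDA #imm (2 cycles) + STA zp (3 cycles) pairs.
fpga_6502_clocks = 3 * (2 + 3)
fpga_6502_div_us = microseconds(fpga_6502_clocks, 1.79)

# Stock 6502 running the whole test expression in software: ~6000 clocks.
stock_6502_expr_us = microseconds(6000, 1.79)

print(f"68K divide:          {m68k_div_us:.1f} us")
print(f"6502 + FPGA divide:  {fpga_6502_div_us:.1f} us")
print(f"stock 6502, full expression: {stock_6502_expr_us:.0f} us")
```

So the 17 and 8 usec figures are roundings of 17.5 and roughly 8.4, while the pure-software version of the whole expression is in the region of 3.4 milliseconds.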
+mytek Posted November 19, 2022

I can't even pretend to understand what you are creating, but I can say you are quite the genius.
Spaced Cowboy Posted November 19, 2022

[grin] I've been called a lot of things during the many times I have travelled around this star, but "genius" is a new one. Laying some new flooring in the shed today, so nothing expected to change until that's in.
Alfred Posted November 19, 2022

14 hours ago, Spaced Cowboy said:

    I think there's a possibility for the 6502 to give the 68K a run for its money here. Assuming an 8 MHz 68k, division taking about 140 clocks, that's about 17 usecs. The 6502 will do it in 15 clocks (three LDA immediate, followed by STA to zero-page instructions) at 1.79MHz, or 8 usecs. Certainly not every operation will be 32-bit divide but muls can take 70 clocks on the 68K as well (so roughly at parity with the enhanced 6502) though the 68K will handily beat the 6502 when it comes to add/sub. That is, until the 6502 gets implemented @70MHz on the FPGA, anyway

I don't understand your math here. A 6502 will not do 32-bit division or multiplication in 15 clocks; we're talking hundreds even with the use of zero page. If you're saying the 6502 only has to prep the FPGA with those few instructions, well then that's not really a 6502 vs a 68000, that's custom FPGA vs generic 68000. I do like the idea of a lower level language that has features to make assembler coding easier.
Spaced Cowboy Posted November 20, 2022

I think you do understand, you just disagree. Let me re-phrase the above: "I think there's a possibility that the enhanced-with-an-FPGA 6502 can give the 68K a run for its money here". The context for this entire project is as support for the plug-in board (that's now shipped!) so I thought that was a given, and the explanation of where those 15 clocks came from sort of gives it away as well IMHO.

It's certainly not a slam-dunk for the 6502 - it could be, if I let the FPGA take over, but that's not the goal. I want to keep the "spirit" of the XL/XE and that means the 6502 is running the show, and the clock speed is a big hurdle. Atari themselves, back in the day, used an external ALU to help out the 6502 in some arcade games, so I think I'm on reasonably solid ground doing the same for the 8-bits. And I do want it to work (even if not to the same performance-level) on stock systems.

To your last point, I plan to introduce an "asm" instruction at the high level which supports both standard 6502 mnemonics and these pseudo-opcodes as well. Sometimes the high-level language just doesn't cut it (like maybe in a DLI where every clock counts). Adding a simple way for variables to manifest as registers will keep the macro-assembler an active part of the high-level language.

One other point is that all the assembly code is read in as source at compile-time. If there's a better way to do any of the code that's there, you can just replace part of the existing library and you have an improved assembler (and therefore an improved language too).

[Still laying flooring, so not expecting to do anything on this tomorrow either]
Alfred Posted November 20, 2022

So what does the 6502 do while the FPGA is doing its thing, just wait or loop on some status register?
Spaced Cowboy Posted November 20, 2022

By the time the 6502 has fetched the next instruction (presumably LDx), the result is there waiting. 1.79MHz is glacial to an FPGA. Recall that the memory being used to store the data here is on-chip on the FPGA; it's just being mapped into the 6502 address space.
Spaced Cowboy Posted November 26, 2022

Another minor milestone hit. The assembler now understands the 'call' statement and will:

- push arguments into the "function" registers when it sees 'call'
- handle any clobbered registers, saving them on entry and restoring them on return. These steps are elided if the command is 'exec' rather than 'call', which allows for a bit more flexibility if you know you don't need the save/restore.
- jsr to the requested routine

So if the input file is still (here the contents of the file 'tests/01/test.01'):

    2 + 3 * 5 - 8 / 3

And if the user issues the terminal command:

    prompt% xtal tests/01/test.01 -D org=16384

'xtal' (the driver code) will first run the compiler which will create:

    ; Assembly code produced at 18:14:34 on Nov 25 2022
    ;
    ;
    ; Assembly begins
    ; ----
    .include printReg.s
    .include stdmacros.s

    move.4 #$2 r1
    move.4 #$3 r2
    move.4 #$5 r3
    muls.4 r2 r3 r4
    adds.4 r1 r4 r5
    move.4 #$8 r6
    move.4 #$3 r7
    divs.4 r6 r7 r8
    subs.4 r5 r8 r9
    call printReg r9
    rts
    ; ----
    ; Assembly complete

Then it runs the assembler with the -D org=16384 to produce the default binary (since no output name was specified). Currently that is '/tmp/out.com', where /tmp is mapped to the PCL1: device set up by Atari800MacX. If I then go to the emulator window and run 'out.com' (no need to type the .com, of course), I see the below:

And 2 + 3*5 - 8/3 => 2 + 15 - 2 in integer arithmetic => 15, which is what is printed.

In this case, there's a call to "printReg", which internally has an 'exec' to "printLine" as the last statement, which is basically a JSR to CIO. Since it's the last statement, there's no point in saving/restoring the registers used by printLine (since they're a subset of those used by printReg) and we're about to restore the ones used by printReg anyway.
Spaced Cowboy Posted November 27, 2022

One more update before things start to quieten down on the xtal front (I have something of a death-march ahead of me in the lead up to Xmas, being loaned out to another team to help them meet their deadlines) ...

We now have statements (well, statement, anyway: "print" is what you get) and variables with a defined type (s32, signed 32-bit ints). The test code now looks like:

    s32 hotel;
    hotel= 9;
    s32 peacocks;
    peacocks= 12;
    print peacocks + hotel;

which, when run through the compiler (the "-d" flag bumps the debugging level up by one, so you get to see the subcommands that the driver program will execute)...

    prompt% /opt/xtal/bin/xtal -d Tests/test.04
    /opt/xtal/bin/xtal-c -d -o /var/folders/88/dgfn4_5s7976s72nnfx1y12w0000gn/T/5PN2qmWq.asm 'Tests/test.04'
    /opt/xtal/bin/xtal-a -D 'org=0x4000' -d -o '/tmp/out.com' /var/folders/88/dgfn4_5s7976s72nnfx1y12w0000gn/T/5PN2qmWq.asm

(The assembler chooses a nice safe spot in memory to put the code unless you direct it otherwise. I don't want to recall just how long it took me to realise I was overwriting stuff I really oughtn't be, because my default code-start was originally $600, and the code was longer than 1 page... [sigh])

Anyway, the contents of the temporary assembly file produced by the compiler are:

    prompt% cat /var/folders/88/dgfn4_5s7976s72nnfx1y12w0000gn/T/5PN2qmWq.asm
    ; Assembly code produced at 21:54:59 on Nov 26 2022
    ;
    ;
    ; Assembly begins
    ; ---------------
    .include printReg.s
    .include stdmacros.s

    move.4 #$9 r1
    move.4 r1 S_hotel
    move.4 #$c r1
    move.4 r1 S_peacocks
    move.4 S_peacocks r1
    move.4 S_hotel r2
    adds.4 r1 r2 r3
    call printReg r3
    rts

    @S_hotel:
    .word 0,0
    @S_peacocks:
    .word 0,0
    ; ----
    ; Assembly ends

... with the @ before the label for 'hotel' and 'peacocks' indicating a global label. All variables are currently global...
Register handling is a little primitive right now, and there are no peephole optimisations to remove redundant load/stores to the same register/location, but fixing that is just a matter of implementing a filter-operation on the Abstract Syntax Tree that represents the program in symbolic form in the compiler.

Using the PCLink driver on the emulator, mapping /tmp to PCL1 as before, I can then run the 'OUT.COM' binary file produced and I see:

... and 9+12 does in fact equal 21.

As well as introducing variables, and hence the symbol table, and statement recognition, the assembler now also has a base Emitter class which currently has a single subclass which handles a stock Atari 8-bit. When the FPGA comes online, I will only have to add another subclass of Emitter to define how the code requirements get resolved to take advantage of the new hardware.

Anyway, things will slow for a bit, but there is the vague outline of something resembling code there now.
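To make the "redundant load/store" point concrete, here is a minimal Python sketch of one peephole rule. It works on the emitted text rather than on the AST (which is where the post says the real fix will live), so treat it as an illustration of the idea only; the function name is made up. The rule: a 'move X Y' immediately followed by 'move Y X' of the same width makes the second move redundant, as long as no label sits between them.

```python
def remove_redundant_moves(instructions):
    """Drop 'move.N Y X' when the previously kept line was 'move.N X Y'."""
    result = []
    for line in instructions:
        parts = line.split()
        prev = result[-1].split() if result else []
        is_reverse_of_prev = (
            len(parts) == 3
            and len(prev) == 3
            and parts[0].startswith("move.")
            and parts[0] == prev[0]       # same opcode and width
            and parts[1] == prev[2]       # source is the previous destination
            and parts[2] == prev[1]       # destination is the previous source
        )
        if is_reverse_of_prev:
            continue                      # value is already where it needs to be
        result.append(line)
    return result


# Body of the compiler output from the post above.
program = [
    "move.4 #$9 r1",
    "move.4 r1 S_hotel",
    "move.4 #$c r1",
    "move.4 r1 S_peacocks",
    "move.4 S_peacocks r1",
    "move.4 S_hotel r2",
    "adds.4 r1 r2 r3",
    "call printReg r3",
    "rts",
]
optimised = remove_redundant_moves(program)
print("\n".join(optimised))
```

On this input only the reload of S_peacocks into r1 disappears; the reload of S_hotel into r2 stays, because r1 was overwritten in between.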
Spaced Cowboy Posted November 28, 2022

Actually another update squeezed in before I get swamped. Now we have conditions in the high-level code, and in the same way as 'C' or 'C++', the result of a comparison is stored within a register, so it can be used as a test (next up is "if/then/else" ...)

The simple test code for this is:

    s32 x;
    x= 1 < 2; print x;
    x= 1 <= 2; print x;
    x= 1 != 2; print x;
    x= 1 == 1; print x;
    x= 1 >= 1; print x;
    x= 1 <= 1; print x;
    x= 2 > 1; print x;
    x= 2 >= 1; print x;
    x= 2 != 1; print x;

so all of these ought to evaluate to 'true', and x ought to be 1, which is a printable value. There's no need for brackets because the precedence rules mean that comparisons have a higher precedence than assignment, which is just as well because I haven't got my scanner parsing brackets yet.

Compiling just the first one ("x = 1 < 2") produces assembly code that looks pretty awful in terms of register use [see below], but I'm expecting that to be cleaned up during optimisation at the end, when I'll run a register-colouring algorithm over the registers and optimise their allocation...

    move.4 #$1 r1
    move.4 #$2 r2
    .push context block cmp_5PN2qmWq 1
    _cmp32 r1,r2
    bmi there
    here:
    move.4 #0 r2
    bpl done
    there:
    move.4 #1 r2
    done:
    .pop context
    move.4 r2 S_x
    move.4 S_x r1
    call printReg r1

The .push context / .pop context allow for labels to be associated with a given "context" in the assembler, so there's no danger of multiple instances of the assembly construct getting confused over which 'here', 'there' or 'done' label to jump to. The assembly that the compiler produces can make use of any of the macros (like _cmp32) in stdmacros.s - I didn't include the header/footer boiler-plate this time...

Running the compiler-driver over the test file at the top of this post gives me an executable 'out.com', and running that in the emulator gives me: ...
as expected.

The next chunk (if/then/else) will be a bit more major surgery, because I'm going to need to modify the structure of the nodes in my Abstract Syntax Tree (AST). Currently each node can have only 2 child nodes, which makes it a binary tree, and fairly simple to construct and traverse, but an IF statement is effectively 3 paths of code:

- The expressions making up the criteria for the IF itself
- The statements to perform if the IF criteria evaluate true
- The statements to perform if the IF criteria evaluate false

When we encounter an IF, we need to evaluate the first child, then selectively choose which of the other two children to jump to and start executing. What this means is that the logic for how the AST nodes are chained together will need to be changed, which is fairly fundamental. The upside is that once it's done, things like WHILE and FOR loops ought to be a lot easier.
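A three-child node of the kind described here might look like the following Python sketch. The field names (left, mid, right) and the tiny set of operations are assumptions made for illustration, and the walker interprets the tree directly instead of emitting jumps and labels the way a compiler back-end would.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    op: str
    left: Optional["Node"] = None    # for IF: the condition
    mid: Optional["Node"] = None     # for IF: the 'true' branch
    right: Optional["Node"] = None   # for IF: the 'false' branch (may be absent)
    value: int = 0


def evaluate(node, output):
    """Walk the tree; PRINT appends to output, comparisons yield 0 or 1."""
    if node.op == "INTLIT":
        return node.value
    if node.op == "LT":
        lhs = evaluate(node.left, output)
        rhs = evaluate(node.right, output)
        return 1 if lhs < rhs else 0
    if node.op == "PRINT":
        output.append(evaluate(node.left, output))
        return 0
    if node.op == "IF":
        # Evaluate the first child, then pick exactly one of the other two.
        if evaluate(node.left, output):
            return evaluate(node.mid, output)
        if node.right is not None:
            return evaluate(node.right, output)
        return 0
    raise ValueError(f"unknown op {node.op}")


def lit(n):
    return Node("INTLIT", value=n)


# if (1 < 2) print 1; else print 0;
tree = Node(
    "IF",
    left=Node("LT", left=lit(1), right=lit(2)),
    mid=Node("PRINT", left=lit(1)),
    right=Node("PRINT", left=lit(0)),
)
printed = []
evaluate(tree, printed)
print(printed)
```

An IF without an ELSE simply leaves the third child empty, which is also the shape a WHILE can reuse.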
Spaced Cowboy Posted November 30, 2022

Ok, a little further down the path. Moving to 3-child AST nodes wasn't quite as difficult as I thought it was going to be, so we now have if/else. There are currently still a few restrictions on the parsing side, so you have to create a block-structure for each clause - you can't do:

    if (x > 4)
        print x;

... it has to be:

    if (x > 4)
        [
        print x
        ]

And I'm using [..] rather than {..} as block demarcation - we don't have '{' or '}' in ATASCII - and it seems easier to disambiguate array access x[..] than general brackets (..), especially since we're using general brackets for the condition part. At some point I'll look into removing the need for blocks when it's a simple single statement (you'll always need them for compound statements in a block), but that day is not today.

As I mentioned last time, WHILE is actually pretty similar to an IF without an ELSE, except there's a jump back to the top of the if-block at the end. So we now also have while, along the lines of the below. Similar block-restrictions apply to WHILE as they do to IF. You have to use the block construct even if it's only one statement.

    s32 i;
    i=1;
    while (i <= 32)
        [
        print i;
        i= i + 1;
        ]

and since a FOR loop can be expressed as a WHILE with an initialiser block, we also have FOR loops, along the lines of the below, and again with the block construct restriction...

    s32 i;
    s32 j;
    j = 4;
    for (i=1; i<=10; i=i+1)
        [
        if (i < j)
            [
            print j;
            ]
        else
            [
            print i;
            ]
        ]

Wrapping it all up, we also have very primitive function support - the compiler will understand a functional form, and expects to find a function called main(). Currently functions can't take any arguments (still got to figure that out) or return any value - they're really just organisational structure right now.
Still, it means a program now looks something like:

    void main()
        [
        s32 i;
        for (i=1; i<=10; i=i+1)
            [
            print i;
            ]
        ]

That program, when compiled, produces (yes, it is compulsory to have a blue-text screen in every post!)

Next up is a real big one that I've been putting off - I want to start getting types sorted out. It would be cool to be able to specify s8 (or u8) and have the correct maths operations called for each type working with each other type. There are a lot of combinations there, so even once I have the parsing working, the back-end code for the assembler is going to take a while. I'll also have to figure out when it's acceptable to narrow or widen an arithmetic operation, and whether trying to store a 32-bit value into an 8-bit location ought to be an error or a warning. I'm leaning towards warning (which can be circumvented using a cast, later) but we'll see.
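The "FOR is a WHILE with an initialiser" rewrite from this post can be sketched in Python with tuples standing in for AST nodes. The node names (SEQ, WHILE, ASSIGN and so on) are invented for this example, and the small interpreter exists only to show that the rewritten tree behaves like the loop in main() above.

```python
def desugar_for(init, cond, step, body):
    """for (init; cond; step) body  ->  init; while (cond) [ body; step ]"""
    return ("SEQ", init, ("WHILE", cond, ("SEQ", body, step)))


def eval_expr(node, env):
    kind = node[0]
    if kind == "NUM":
        return node[1]
    if kind == "VAR":
        return env[node[1]]
    if kind == "ADD":
        return eval_expr(node[1], env) + eval_expr(node[2], env)
    if kind == "LE":
        return 1 if eval_expr(node[1], env) <= eval_expr(node[2], env) else 0
    raise ValueError(f"unknown expression {kind}")


def run(node, env, out):
    kind = node[0]
    if kind == "SEQ":
        for child in node[1:]:
            run(child, env, out)
    elif kind == "ASSIGN":
        env[node[1]] = eval_expr(node[2], env)
    elif kind == "PRINT":
        out.append(eval_expr(node[1], env))
    elif kind == "WHILE":
        # Same shape as an IF without an ELSE, plus the jump back to the test.
        while eval_expr(node[1], env):
            run(node[2], env, out)
    else:
        raise ValueError(f"unknown statement {kind}")


# for (i=1; i<=10; i=i+1) [ print i; ]
loop = desugar_for(
    init=("ASSIGN", "i", ("NUM", 1)),
    cond=("LE", ("VAR", "i"), ("NUM", 10)),
    step=("ASSIGN", "i", ("ADD", ("VAR", "i"), ("NUM", 1))),
    body=("PRINT", ("VAR", "i")),
)
env, out = {}, []
run(loop, env, out)
print(out)
```

The step expression is appended after the body inside the WHILE, so the back-end never needs to know that a FOR existed.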
Spaced Cowboy Posted December 4, 2022

A little bit more progress. I ran into a little bit of trouble with the testing for the new types - I don't have unary minus in my scanner yet, which makes testing an assign statement with a negative number ... problematic 🤣 so I implemented unsigned 8-bit vars (unsigned char in 'C' parlance) and I'll wait until I update the scanner code (I have some other stuff I want to do first) before returning to complete the type-set.

We do understand more than one type now, though, and that surfaced some issues with making sure that when you assign a 1-byte variable to a 4-byte "register", then later copy the 4-byte register to permanent variable storage, the correct value is copied. I'm actually over-clearing at the moment, but I think it'll be easier to get a final correct version of the language then run through it with the peephole optimization stage / register colouring than try to fix it as I go, and in the words of Knuth "premature optimization is the root of all evil", so though I'm not wonderfully happy with the quality of the assembly it's producing, I have a plan for becoming happy.

That said, we can now do maths on both types, so something like:

    void main()
        [
        s32 a;
        u8 b;
        s32 c;

        a = 1000;
        b = 5;
        c = a + b;
        if (c != 1005)
            [
            print 0;
            ]
        else
            [
            print 1;
            ]

        c = b + a;
        if (c != 1005)
            [
            print 0;
            ]
        else
            [
            print 1;
            ]
        ]

... will print '1' twice. I have a *lot* of tests like this. The ones that test negative numbers use huge numbers to set the correct bit-patterns in the "registers" to be treated as 32-bit signed negative numbers, so they look a little odd. I'll probably re-write them when I get the unary minus sorted out.

The other thing I've done is implement the first pass at functions.
At the moment, we can pass a value to the function in the code, but it won't be picked up by the assembly. However, you can return a value correctly, and function-call registers will be used in both directions, so you can write something like:

    u8 dummy()
        [
        return (20);
        ]

    void main()
        [
        s32 result;
        print 10;
        result = dummy(10);
        print result;
        print dummy(30)+10;
        ]

... and expect to see (here's the blue-screen!) ...

The '10' is a simple print statement, the '20' is the result of calling into the dummy() function - and it doesn't matter what argument you pass in to it - and the '30' is the result of calling dummy() and adding 10.

Quite a bit to go on the functional side yet, and quite a bit of clean-up to do around a few things, but I think the next task is pointers. I'm heading towards the ubiquitous "hello world!" application as a milestone for the language, and I'm going to need pointers and strings (char * pointers) to do that...
Spaced Cowboy Posted December 7, 2022

Aaand now we have pointers. I didn't want to waste 4 bytes (s32) on a pointer-type, so I also implemented the u16 type, and a pointer is an internally-marked u16 datatype, which is kind of nice since it means all the standard arithmetic ops ought to work on it. You can now write something like:

    s32 main()
        [
        u8 a; u8 *b; u8 c;
        s32 d; s32 *e; s32 f;
        u16 h; u16 *i; u16 j;

        a = 42; print a;
        b = &a;
        c = *b; print c;

        d = 420000; print d;
        e = &d;
        f = *e; print f;

        h = 50000; print h;
        i = &h;
        j = *i; print j;

        return (0);
        ]

and expect to see the following:

Onwards and upwards. I'm getting tired of typing out a type for every variable, so I might try the comma operator next...
Spaced Cowboy Posted December 8, 2022

No blue-screen for this one, it's just a minor update to the above. Pointer arithmetic isn't supposed to be the same as integer arithmetic, so if you have something like:

    s32 main()
        [
        s32 a,b;
        s32 *p;

        a = 42;
        b = 84;
        p = &a + 1;
        print *p;
        ]

... and if b is laid out in memory immediately after a (spoiler: it is), then you expect '84' to be printed out, not some number corresponding to 4 bytes starting at the address of {a + 1 byte}. Since a and b are 32 bits long, p ought to advance by 4 bytes, not 1, when it's incremented. This will become critical later on when we start to get arrays.

I also disallowed anything other than add/subtract on pointer types - dividing or multiplying a pointer could possibly be useful in some really weird situation, but it would complicate things and it's not worth it, IMHO.

I also took the opportunity to generalise the automatic widening of the register-types (1->2 or 4, 2->4) as an operation in the abstract syntax tree (AST), so arithmetic can be performed more easily with mixed types, and I used the same code to do the 'scaling' of the offsets (with another AST operation).

[aside: we also got the comma operator, as you can see from the above declaration of a and b. There's a bug with type management that's preventing me from declaring two pointers on the same line separated by commas, but it's on the list...]
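The scaling rule is easy to show in isolation. This Python sketch assumes a size table for the three types implemented so far and uses made-up addresses ($4000 and $4004) for a and b; it is not the compiler's AST operation, just the arithmetic that operation has to perform.

```python
# Sizes in bytes of the types implemented so far in the thread.
TYPE_SIZES = {"u8": 1, "u16": 2, "s32": 4}


def pointer_add(address, count, pointee_type):
    """Advance a 16-bit pointer by 'count' elements, not 'count' bytes."""
    return (address + count * TYPE_SIZES[pointee_type]) & 0xFFFF


def store_s32(memory, address, value):
    memory[address:address + 4] = value.to_bytes(4, "little", signed=True)


def load_s32(memory, address):
    return int.from_bytes(bytes(memory[address:address + 4]), "little", signed=True)


memory = bytearray(0x10000)
ADDR_A = 0x4000          # hypothetical address of 'a'
ADDR_B = ADDR_A + 4      # 'b' assumed to sit directly after 'a'

store_s32(memory, ADDR_A, 42)
store_s32(memory, ADDR_B, 84)

p = pointer_add(ADDR_A, 1, "s32")   # p = &a + 1
print(hex(p), load_s32(memory, p))

# Without scaling, the pointer lands one byte into 'a' and reads garbage.
unscaled = load_s32(memory, ADDR_A + 1)
print(unscaled)
```

The unscaled read straddles both variables, which is exactly the "some number corresponding to 4 bytes starting at a + 1 byte" the post warns about.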
TGB1718 Posted December 8, 2022

I watch this thread with interest, nice to see the new functionality.
Spaced Cowboy Posted December 9, 2022

This next bit turned out to be quite a chunk of work. All the existing pointer examples have been using R-values, not L-values. The terms L-value and R-value come from the fact that an L-value is usually on the left side of an assignment, and an R-value is on the right, but the real meaning of that is more subtle...

- An L-value is persistent in memory, so it can be referred to in future operations.
- A value used in an R-value calculation *might* be persistent, or it might be ephemeral. In any event, it is unchanged by the current statement, and any references used during calculation can be discarded after the statement has been executed.

We've been using pointers as R-values, as in:

    u8 x,z;
    u8 *y;
    x= 12;
    y= &x;
    z= *y;

Notice that the pointer is on the right of any statement (declarations don't matter here). That meant a whole bunch of things to do with assignment and creation could be ignored. Well, back in the dim and distant past of 2 posts ago, anyway. Now to make pointers truly useful we want to do something like:

    s32 c,d;
    s32 *x;
    s32 *y;

    s32 main()
        [
        c = 18;
        d = 32;
        x = &c;
        y = &d;
        *x = *y;
        print c;
        return (0);
        ]

... where we're using pointers as both L-values and R-values. This meant modifying the AST node so that it knew which type of value it was at any given node position, and so it could react accordingly. In the above, of course, we expect the value that x points to (c) to be set to the value that y points to (d), and then we print out c, so we expect to see '32' printed out... (I thought this one deserved a blue-screen)

What is kind of worth mentioning is that in general, the R-values can be in registers whereas L-values can't. However, in *this* language, a "register" is in fact just a known-address chunk of memory because A, X and Y is pretty limiting for a high-level language.
Traditionally you can't have L-values implemented as registers because they don't have the permanence of being persisted in memory... For the standard XL/XE, I think that still rings true - there's only going to be 16 32-bit "registers" available, which won't be sufficient for all but the smallest programs. For the expanded FPGA-aware XL/XE, that isn't really the case, and it might be entirely possible to do away with L-value/R-value semantics. Something to ponder when I get around to the non-stock version of the 'Emitter' class.

These semantics can be kind of useful though. Using L-value and R-value annotations in the AST, it's now possible to parse:

    s32 a,b,c;

    s32 main()
        [
        a = b = c = 4;
        print a;
        return (0);
        ]

... because the AST can recognize (and if necessary self-reorganize to handle) the different types of semantic, so that the entire expression can be parsed. I won't show another blue-screen, but '4' is indeed printed in the above.
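One way to picture the L-value/R-value split is that evaluating an expression as an L-value yields an address, while evaluating it as an R-value yields a value. The Python sketch below follows that convention for the '*x = *y' example; the addresses are invented, variable sizes are ignored, and the tuple node names are assumptions, so it shows the idea and not the compiler's actual AST handling.

```python
# Hypothetical symbol table: variable name -> address (sizes ignored here).
symbols = {"c": 0x4000, "d": 0x4004, "x": 0x4008, "y": 0x400A}
memory = {address: 0 for address in symbols.values()}


def rvalue(expr):
    """Evaluate an expression for its value."""
    kind = expr[0]
    if kind == "NUM":
        return expr[1]
    if kind == "VAR":
        return memory[symbols[expr[1]]]
    if kind == "ADDR":                 # &name
        return symbols[expr[1]]
    if kind == "DEREF":                # *expr, read through the pointer
        return memory[rvalue(expr[1])]
    raise ValueError(f"unknown expression {kind}")


def lvalue(expr):
    """Evaluate an expression for the address it designates."""
    kind = expr[0]
    if kind == "VAR":
        return symbols[expr[1]]
    if kind == "DEREF":                # *expr, the pointer's value IS the address
        return rvalue(expr[1])
    raise ValueError(f"{kind} is not assignable")


def assign(target, source):
    value = rvalue(source)
    memory[lvalue(target)] = value
    return value                       # returning the value permits chaining


assign(("VAR", "c"), ("NUM", 18))
assign(("VAR", "d"), ("NUM", 32))
assign(("VAR", "x"), ("ADDR", "c"))
assign(("VAR", "y"), ("ADDR", "d"))
assign(("DEREF", ("VAR", "x")), ("DEREF", ("VAR", "y")))   # *x = *y

c_value = rvalue(("VAR", "c"))
print(c_value)
```

Note that a literal such as ("NUM", 4) has no lvalue() case, which is the sketch's version of "you cannot assign to a constant".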
Spaced Cowboy Posted December 10, 2022

Hip Hip Arrays! There's a couple of things that have been updated now - first, and least important, is that we have parentheses in the scanner logic, so something like:

    a= 5;
    b= 10;
    c= 15;
    d= 20;
    e= (a+b) * (c+d);
    print e;

... will work and print out 525. The more important part of this update is that we can now define arrays - only fixed length, and only one dimensional as yet, but given the program:

    s32 a;
    u8 b[25];

    s32 main()
        [
        u8 i;
        for (i=0; i<25; i=i+1)
            [
            b[i] = i;
            ]

        a = 0;
        for (i=0; i<25; i=i+1)
            [
            a = a + b[i];
            ]
        print a;

        for (i=0; i<10; i=i+1)
            [
            print b[i*2];
            ]
        return(0);
        ]

... and see the result below ...

I confess to being slightly nervous about parsing the [] in different circumstances, with different meanings at different times - but it turned out to be no problem in the end, and we can now parse the syntax, reserve the correct storage, and use arrays in both L-values and R-values. Using the AST node approach, we can ascribe an expression as the index to the array, so things like b[i*2] are all perfectly valid.

There is also now a 'dump' available on the AST, because things are getting too complicated to keep the entire structure in my head. So for example the second loop above, where 'a' is being set to zero and added to, looks like the below in the abstract syntax tree:

    A_INTLIT 0
    A_WIDEN
    A_IDENT a
    A_ASSIGN
    A_INTLIT 0
    A_IDENT i
    A_ASSIGN
    A_WHILE, start L3
    A_IDENT rval i
    A_INTLIT 25
    A_LE
    A_IDENT rval a
    A_ADDR b
    A_IDENT rval i
    A_ADD
    A_DEREF rval
    A_WIDEN
    A_ADD
    A_IDENT a
    A_ASSIGN
    A_IDENT rval i
    A_INTLIT 1
    A_ADD
    A_IDENT i
    A_ASSIGN

where indentation indicates hierarchy. You can see the 'for' loop being decomposed into an 'initial condition + while' construct, and you can see the A_WIDEN node being inserted to convert the U8 type into an S32 before the addition occurs.
You can also see the WIDEN happening in the 'a=0' statement (which isn't actually necessary, since xtal initialises all globals to 0). Because '0' fits into the U8 range, the 'integer literal' node assigns a U8 type to represent the '0', but the very next thing that happens is that we assign that literal to an S32, so we need to widen the 1-byte value to a 4-byte value.
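For an unsigned source type, widening amounts to padding the little-endian value with zero bytes. The Python sketch below shows only that case (U8 to S32); the function name is made up, and a signed source type such as the planned S8 would need sign extension instead, which is not shown.

```python
def widen_unsigned(raw, new_size):
    """Zero-extend a little-endian unsigned value to new_size bytes."""
    if new_size < len(raw):
        raise ValueError("widening cannot shrink a value")
    return raw + bytes(new_size - len(raw))


# The literal 0 is typed U8 (1 byte) and must become a 4-byte S32.
zero_u8 = (0).to_bytes(1, "little")
zero_s32 = widen_unsigned(zero_u8, 4)

# Mixed-type addition: s32 a = 1000; u8 b = 5; c = a + b
b_u8 = (5).to_bytes(1, "little")
b_s32 = widen_unsigned(b_u8, 4)
c = 1000 + int.from_bytes(b_s32, "little", signed=True)

print(zero_s32.hex(), b_s32.hex(), c)
```

Zero-fill is safe here because every U8 value (0..255) is also a non-negative S32, so the top byte's sign bit is never set by the padding.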
Spaced Cowboy Posted December 10, 2022

Hello world!

So this feels like a major milestone, even though there's still quite a way to go. Now that we have pointers and arrays, we can start to think about strings. As far as xtal is concerned, a string is (similar to C) a series of bytes, terminated by a byte with value 0. Specifically, the variable referencing the data in memory is given a U8_PTR type, so it needs 2 bytes to store the reference.

I've put in some code to interpret a few escape codes, but they don't actually do anything at the moment - right now I'm just wrapping a call to CIOV with setting up the device-0 CIO control block to point towards the string, and fill out its length etc. When we finally get to the point where printf() is realisable (which *cough* needs me to get arguments-to-functions working, as well as variadic functions) then I'll worry about that sort of thing.

What it means, though, is that you can enter a program like:

    s32 main()
    [
        u8 *s;

        s = "Hello ";
        print s;
        s = "world!";
        print s;
        return (0);
    ]

(with two strings separated so I could make sure we weren't getting inadvertent newlines or any problems like that) and see something like:

Next up I think I'm going to round out some of the missing operators (I still don't have unary minus!) and then go over the types, filling out the remaining ones (S8, S16, U32) and then I think function parameters are on the horizon...
Spaced Cowboy Posted December 12, 2022

No new code demonstration this time, more of a "thoughts on how this is going to work" post.

The big next step is function arguments, and since I want to make the language capable of recursion, on a stock XL/XE it'll have to use a stack, and spill registers and function parameters / local variables to the stack. The stack on the basic 6502 is pretty measly for a C program; 256 bytes with a hard stop might just about handle the Towers Of Hanoi, but it wouldn't handle anything much more complex. On the enhanced machine, I think I can probably take over page-1 as well as page-0, and keep the stack internally on the FPGA so that you can PHA as much as you like, and no matter what the 6502's SP, you'll get back "the right thing" when you PLA in turn, even if it wrapped around in the middle...

So,... a stack. In high memory because there's no reason not to do that once we're outside of page-0, which leads to a consideration of efficiency. At the moment, the compiler just assumes all registers are 32-bits in size, so if you store an 8-bit value, it will just use byte-1 of a 4-byte array, and if you assign a U8 to an S32, it will call an AST WIDEN operation, which essentially back-fills the other bytes with zero. That's all well and good, and generally works fine (if being wasteful of precious page-zero RAM), but it means there's an implicit assumption that the alignment of a register is every 4 bytes. That means that copying an unsigned 8-bit integer value to the stack would be wasteful in terms of time (operations) as well as space.

So, I've tagged where I am in the repository, and I've started looking at changing the register representation. We used to have:

     c0 c1 c2 c3    c4 c5 c6 c7    c8 c9 ca cb    cc cd ce cf
    [-----r0-----] [-----r1-----] [-----r2-----] [-----r3-----]

     d0 d1 d2 d3    d4 d5 d6 d7    d8 d9 da db    dc dd de df
    [-----r4-----] [-----r5-----] [-----r6-----] [-----r7-----]

...
etc (on to r15, which occupies $fc..$ff), with the first byte of each 4 in the "register" being the low byte, as is the 6502 style. That thinking permeated through the code, so there were a few assumptions written in with that in mind.

However, if I want this stack to be more efficient (and I do, since it's in high-memory), I want not to be shackled to this idea of a 4-byte register. The 6502 is unique in the CPUs I use these days in that it doesn't have a memory quantum that must be observed for alignment purposes. Everything is just a byte, or a collection of bytes... So with that in mind, I thought I'd change my register-allocation code...

Now instead of 16 4-byte registers, which we use parts of, I have 64 1-byte registers, and when we want a 16-bit value, we allocate 2 contiguous registers. Similarly for 4-byte values, we allocate 4 contiguous 1-byte registers. The code was always written as {the address of byte-0 of this register + a 0->3 byte offset} when doing multi-byte operations, so the code for working with multiple-byte values would follow a paradigm like:

    ;/*************************************************************************\
    ;|* Type: Arithmetic operation
    ;|*
    ;|* add the 16-bit value at location %1 to the 16-bit value at %2, storing
    ;|* the result in %3
    ;|*
    ;|* Clobbers: A
    ;|* Arguments:
    ;|*    %1 : address of source operand #1
    ;|*    %2 : address of source operand #2
    ;|*    %3 : address of destination operand
    ;\*************************************************************************/
    .macro _add16u
        .if %1 != %2
            clc
            lda %1
            adc %2
            sta %3
            lda %1+1
            adc %2+1
            sta %3+1
        .else
            _asl16 %1, %3
        .endif
    .endmacro

.. so only the value of the memory location pointing to the low-byte of a multi-byte value was going to be important - everything else is a relative offset from that.
So assuming I still define all the register locations to the assembler (spoiler: I do) as memory locations in zero-page, then we now have a register layout that looks like:

    c0  c1  c2  c3  c4  c5  c6  c7  c8  c9  ca  cb  cc  cd  ce  cf
    r0  r1  r2  r3  r4  r5  r6  r7  r8  r9  r10 r11 r12 r13 r14 r15

    d0  d1  d2  d3  d4  d5  d6  d7  d8  d9  da  db  dc  dd  de  df
    r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30 r31

(again, and on to r63 at $ff).... so when I pass in that first byte's memory location, it still all works, and I can have a 1-byte register at $c0 followed by a 4-byte register at $c1..$c4, followed by a 2-byte register at $c5..$c6 etc. The relatively large space, the small number of variations (1, 2, 4 bytes), the fact that the sizes are all powers of two, and that I always start a search for space at r0 means we oughtn't suffer too much from memory fragmentation for registers.

What this did mean was that I was compelled to provide a bit more information from the compiler to the assembler in terms of when registers were assigned and released, as well as tightening up a few of those 4-byte assumptions in the compiler. I'm at the point where programs that used to work are now working again, but I haven't done any exhaustive tests, so this is still on a branch, not in the mainline code yet.

That work in tightening things up is what will play into making the stack more efficient. We genuinely now have registers that are (for example) only 2 bytes wide. We're still lacking anything like a function signature (something else to add to the list), but once we have that it ought to be genuinely possible to push only the bytes that are necessary onto this high-memory stack.

In terms of what I'm trying to accomplish for stack usage, the requirements are:

- Be able to push arguments to the stack, when there are more than 16 bytes-worth of arguments to a function.
  We have enough space in zero-page for 16 bytes (and this was originally just 4 args, but now could be a lot more, given the new register layout)
- Be able to store local variables for a function, and easily recover the space later
- Be able to spill registers to the stack if we ever run out. You'd have to write a pretty large function to use up 64 bytes of register space, but I've seen some large functions...
- Be able to call functions recursively, storing current state on the stack so it can be easily unwound later. This is similar to the local-variables thing above.

To this end, I think I'm going to need both a stack pointer (2 bytes in ZP) and a frame pointer (2 more bytes in ZP). The stack pointer is for when you are pushing things onto the stack, and calling functions; the frame pointer is how you quickly unwind the stack when you leave a function and get back to the previous state.

So, the TODO list looks like:

- Verify the new register allocation works with all the test code to date [ - see below]
- Circle back around and define all the missing types and their interaction with each other [ - see below]
- Figure out my function-calling ABI
- Detect the function-signature and enforce it on function-call
- Design the stack and implement it [ - see below]
- Figure out local variables, and how to represent them in the compiler (currently leaning towards a hierarchy of symbol tables, but we'll see) [ - see below]
- Get all this stuff working

The next target, after hitting "Hello, world!", is to get a printf() like function up and running - variadic functions need functions, which need all the above, so it's a distant target, but once I have that I can start to think about user-defined types/classes.

Huh, thought this would be a short post (no code, after all) but it turned out to be one of the longer ones..
Spaced Cowboy Posted December 14, 2022

Many thanks to @dmsc for pointing out his text-mode simulator - still debugging register allocation, but now it's as easy as:

    @elysium xtal % xtal xtal-c/Tests/19-arrays/array.xt
    @elysium xtal % atarisim /tmp/out.com
    748169516 0 2 4 6 8 10 12 14 16 18

(The '748169516' is what is currently going wrong)

And in fact, an hour or so after posting the above, using the simulator for a far faster turnaround, I figured out that the problem was in the A_WIDEN node - where a smaller-width register is 'upscaled' to a larger-width register so a math operation is easier. In this case, because I'd now made all the registers abut each other in memory, and they didn't have 4 bytes each, I ended up with a newly allocated register overlapping a previously-widened register because I hadn't updated the memory-use map when widening.
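For anyone following along, the allocation scheme and the widening bug described in this thread can be sketched roughly as below. This is a Python model, not the xtal code: the class `RegisterFile` and its method names are hypothetical, and widening in place when the neighbouring bytes happen to be free is my assumption about one reasonable policy. What it does take from the posts is the 64 one-byte registers at $C0..$FF, the 1/2/4-byte sizes, and the first-fit search that always starts at r0.

```python
ZP_BASE = 0xC0    # address of r0, per the register layout in the thread
NUM_BYTES = 64    # r0..r63 occupy $C0..$FF


class RegisterFile:
    """First-fit allocator over 64 one-byte registers (hypothetical sketch)."""

    def __init__(self):
        self.used = [False] * NUM_BYTES   # the memory-use map

    def alloc(self, size):
        """Reserve `size` contiguous bytes, searching upwards from r0."""
        assert size in (1, 2, 4)
        for start in range(NUM_BYTES - size + 1):
            if not any(self.used[start:start + size]):
                for i in range(start, start + size):
                    self.used[i] = True
                return start
        raise MemoryError("out of registers; a real compiler would spill")

    def free(self, start, size):
        for i in range(start, start + size):
            self.used[i] = False

    def widen(self, start, old_size, new_size):
        """Return the start of a new_size-byte register for the old value."""
        extra = range(start + old_size, start + new_size)
        fits = start + new_size <= NUM_BYTES
        if fits and not any(self.used[i] for i in extra):
            for i in extra:
                self.used[i] = True   # forgetting this reproduces the overlap bug
            return start
        # Neighbouring bytes are taken: claim a fresh run, then release the old one.
        new_start = self.alloc(new_size)
        self.free(start, old_size)
        return new_start


regs = RegisterFile()
a = regs.alloc(1)            # one-byte register at r0
a = regs.widen(a, 1, 4)      # grows in place to cover r0..r3
b = regs.alloc(2)            # must land after the widened register
print(hex(ZP_BASE + a), hex(ZP_BASE + b))
```

If the marked line in `widen` is removed, the following `alloc(2)` hands back r1, which sits inside the widened register; that's the kind of overlap that would produce a garbage sum.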
Spaced Cowboy Posted December 14, 2022

So, I've moved over the code from the fixed-size registers to the sized-as-large-as-they-need-to-be registers, and created a test-suite based on the simulator from @dmsc. The branch is now merged into main, so it's the POR from now on.

I actually found a couple of small bugs that had crept in while upgrades had been made subsequently to the initial testing of something - so the whole exercise was very useful in more than one way. Right now, there's a 'make test' in the 'tests' folder which produces output (as I type) that looks like:

So hopefully it will catch errors as they occur now, even if it's in code that was written ages ago.
Spaced Cowboy Posted December 17, 2022

So it's been a couple of days ... a combination of me being incredibly busy at work, and the types being a little more difficult to get working than I'd expected. There was some fairly obscure stuff working just because the only signed type was 'signed 32-bit int', and when I started inserting more signed types (S8, S16), a few bugs crawled out of the woodwork.

Anyway, we now have signed and unsigned variants of each of 8, 16, and 32-bit variables. The XL/XE only have 40 chars by default, so rather than "unsigned long x" in the program text, it's just "u32 x" and similarly for u16, u8 and the signed versions s8, s16, s32.

The test suite has expanded quite a bit, and we now have tests for all the fundamental maths ops (add, sub, div, mul) within each type, and also mixed operations amongst the types, testing out how types are promoted.

One gotcha that caught me out is that I am trying to be efficient when I come across an integer literal in the source code, so the compiler looks at the value and decides what storage class (8-, 16-, 32-bits) it ought to use for that literal. So if you have a series of small integer multiplications, you'll probably end up with S8 or U8 types holding the values, and then they get multiplied together, and then of course the total exceeds the storage capacity.

I'm going to implement constant-folding at a later point in the compiler, so this won't be an issue: that sequence of (for example) "32 x 64" will be replaced by a single 2048, and all the items around will be variables (which can be promoted as necessary; it's only sequences of integer literals that have this problem). At some point I'm also going to implement casting, so if there was some weird reason, you could always write "(u16)32 x (u16)64".
Neither of these is actually implemented yet though, so the workaround at the moment is to replace at least one of the offending constants with a variable, which works for now.

Anyway, working through the list above, progress is being made.
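The literal-sizing gotcha is easy to reproduce in a few lines of Python. This is only a model under stated assumptions, not the compiler's logic: the names `literal_type`, `wrap` and `multiply` are hypothetical, trying unsigned before signed at each width is a guess, and "the result takes the wider of the two operand types" is a simplified stand-in for whatever promotion rules xtal really uses.

```python
# (name, minimum, maximum) for each integer type, narrowest first.
# Trying unsigned before signed at each width is an assumption.
INT_TYPES = [
    ("u8", 0, 0xFF),
    ("s8", -0x80, 0x7F),
    ("u16", 0, 0xFFFF),
    ("s16", -0x8000, 0x7FFF),
    ("u32", 0, 0xFFFFFFFF),
    ("s32", -0x80000000, 0x7FFFFFFF),
]
RANGES = {name: (lo, hi) for name, lo, hi in INT_TYPES}
ORDER = [name for name, _, _ in INT_TYPES]


def literal_type(value):
    """Pick the narrowest type whose range holds the literal."""
    for name, lo, hi in INT_TYPES:
        if lo <= value <= hi:
            return name
    raise OverflowError("literal does not fit in 32 bits")


def wrap(value, type_name):
    """Truncate a result to the given type, as fixed-width storage would."""
    lo, hi = RANGES[type_name]
    span = hi - lo + 1
    return (value - lo) % span + lo


def multiply(a, a_type, b, b_type):
    """Multiply in the wider of the two operand types (simplified rule)."""
    result_type = max(a_type, b_type, key=ORDER.index)
    return wrap(a * b, result_type), result_type


# Both literals fit in a byte, so the product is computed in a byte and wraps.
print(multiply(32, literal_type(32), 64, literal_type(64)))

# Forcing one operand to 16 bits (what a cast or a variable would do) fixes it.
print(multiply(32, "u16", 64, literal_type(64)))
```

Constant folding sidesteps the problem because the compiler would size the single folded literal 2048 directly, and that lands in a 16-bit type.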
Spaced Cowboy Posted December 20, 2022

So the first step has been taken. I thought the "allocation of variables on a high memory stack" might be an easy (hah!) introduction to the whole function-parameters thing, since that's what they're destined to become. So now you can create a program like:

    u32 a;

    s32 dummy()
    [
        u32 a;

        a = 40;
        print a;
        return (0);
    ]

    s32 main()
    [
        a = 1000;
        print a;
        dummy(a);
        return (0);
    ]

... and expect to see the output:

    elysium% atarisim /tmp/out.com
    1000 40

The first thing that happens in 'dummy' is that the stack pointer is adjusted for the size of any local variables within that frame:

    ; [@dummy:]
    ; [_sub16i $4,$8D]
    ; [.push context macro '_sub16i' 1]
    60d5  sec
    60d6  lda $8d
    60d8  sbc #$4
    60da  sta $8d
    60dc  lda $8e
    60de  sbc #$0
    60e0  sta $8e
    ; [.pop context]

... where $8D,$8E is the 16-bit stack pointer (it's currently initialized to somewhere relatively safe in himem at $9FFF, but it'll be configurable eventually). The stack grows downwards, and code will grow upwards.

[Hmm. Mental note: we won't be incrementing/decrementing $8e by anything other than 1, so maybe a "bcc/bcs +2" followed by "inc/dec $8e" might be slightly more efficient for adjusting the second byte if necessary]

Since we have a u32 local variable, we need 4 bytes of stack space - and this is adjusted back once the function ends (by any means, since all function exits end up making a jump to the function-local label 'endFunc:')...
    ; [endFunc:]
    ; [_add16i $4,$8D]
    ; [.push context macro '_add16i' 1]
    616c  clc
    616d  lda $8d
    616f  adc #$4
    6171  sta $8d
    6173  lda $8e
    6175  adc #$0
    6177  sta $8e
    ; [.pop context]
    6179  rts

I'm pretty sure there are corner-cases that my tests don't cover just yet with all this - accessing memory to store data in is a fairly fundamental operation, and using a handmade stack-pointer (rather than having one built into the CPU) makes it a little trickier, especially with the pre/post inc/dec operations thrown into the mix as well as 3 different sizes of pointer.

So I think I'll take a bit of time to write some more tests and flush out some bugs before going onto function parameters. I put the above program in as a test for now, which it passes.
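As a sanity check on the prologue/epilogue pair, here's a small Python model of what the expanded `_sub16i` and `_add16i` listings do to the two stack-pointer bytes. It's a sketch only: the helper names `sub16i`, `add16i` and `read16` are hypothetical, the bytearray stands in for zero page, and the borrow/carry handling is modelled arithmetically rather than by emulating the 6502 flags.

```python
SP = 0x8D   # zero-page address of the 16-bit stack pointer, per the listing


def sub16i(mem, addr, imm):
    """Model of: sec / lda addr / sbc #lo / sta addr / lda addr+1 / sbc #hi / sta addr+1."""
    lo = mem[addr] - (imm & 0xFF)
    borrow = 1 if lo < 0 else 0           # low byte underflowed
    mem[addr] = lo & 0xFF
    mem[addr + 1] = (mem[addr + 1] - (imm >> 8) - borrow) & 0xFF


def add16i(mem, addr, imm):
    """Model of: clc / lda addr / adc #lo / sta addr / lda addr+1 / adc #hi / sta addr+1."""
    lo = mem[addr] + (imm & 0xFF)
    carry = lo >> 8                       # low byte overflowed
    mem[addr] = lo & 0xFF
    mem[addr + 1] = (mem[addr + 1] + (imm >> 8) + carry) & 0xFF


def read16(mem, addr):
    return mem[addr] | (mem[addr + 1] << 8)


zero_page = bytearray(256)
zero_page[SP], zero_page[SP + 1] = 0xFF, 0x9F    # stack pointer starts at $9FFF

sub16i(zero_page, SP, 4)                 # entering dummy(): reserve the u32 local
print(hex(read16(zero_page, SP)))

add16i(zero_page, SP, 4)                 # endFunc: release it again
print(hex(read16(zero_page, SP)))
```

The high byte only ever changes by the borrow or carry when the immediate is below 256, which is the observation behind the mental note about using a branch plus inc/dec instead of the second sbc/adc.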