LLVM-MOS: Simple roguelikes

damosan · May 7, 2022

I've been playing with LLVM-MOS for a bit now and have created a collection of "almost done" roguelikes using the toolchain.

What's included: basic CIO code to read/write from disk, an assembly autorun.sys that copies the OS to RAM and then loads a binary and executes it, the start of a VBI based sound library, a small printf()/sprintf(), etc.

Personally I'd classify the whole thing as a hack job but there are bits that you may find useful for other things. Take the code...make it better...do what you want.

Impressions of LLVM-MOS: I like it. It's a modern compiler that generates some tight code. You still pay the penalty for using structures so global arrays are the way to go. You can use pragmas to align variables (such as lookup tables aligned to pages, fonts, P/M graphics) and while the compiler does an OK job of filling the gaps care should be taken. The C++ compiler has some issues so stick with C.

damosan314/roguelike: A collection of nearly done simple roguelikes for the Atari 8bit using the MOS-LLVM C compiler (github.com)

Philsan · May 8, 2022

On the three ATRs I created there are only DAT files plus DOS.SYS.

ilmenit · May 9, 2022

On 5/7/2022 at 2:31 PM, damosan said:

I've been playing with LLVM-MOS for a bit now and have created a collection of "almost done" roguelikes using the toolchain.

Can you add compiled outputs?

damosan · May 9, 2022

The batch file is a hack ... so you will need to edit the file to match your setup. But here are the three ATRs.

Once you're in the dungeon press '?' to get help. At some Yes/No prompts you can press 'C' to cheat.

RL2.ATR RL3.ATR RL4.ATR

ilmenit · May 12, 2022

How mature was the compiler for this project? Did you encounter any issues that required e.g. rewriting some portions of C code to have it compiled or moving to Asm?

Did you check quality of the generated code?

damosan · May 12, 2022

1 hour ago, ilmenit said:

How mature was the compiler for this project? Did you encounter any issues that required e.g. rewriting some portions of C code to have it compiled or moving to Asm?

Did you check quality of the generated code?

At first I tried to do this with the C++ compiler and it had issues. So I switched to C.

The C compiler is pretty darn good. I started by writing standard C code for everything without applying any of the standard CC65 code optimizations. The code ran pretty quick. Small test apps compile down very small. But once you start writing naïve C apps the binaries get large. "naïve" in this context is standard C e.g. passing structs, using nested structs, structs in general, generic linked lists, etc.

As RL2 started to actually take shape the binary was pushing right up against 0xbc00 which is a problem because the static locals area went beyond 0xbc00. So then I started rewriting parts of the game using CC65 style "easy" optimizations such as reducing the number of structure references across the board, aligning small lookups to page boundaries, etc. In general the standard CC65 optimizations result in smaller code.
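
To illustrate the "fewer structure references" point, here is a sketch of the same data laid out as an array of structs and as parallel global arrays. The names (monster_t, mon_hp, MAX_MONSTERS, hit_monster) are made up for this example and are not taken from the roguelike source.

```c
#include <stdint.h>

#define MAX_MONSTERS 16

/* "Naive" layout: one struct per monster. Every field access needs
   base + index * sizeof(struct) + offset, which is costly on a 6502. */
typedef struct {
    uint8_t x, y;
    uint8_t hp;
    uint8_t kind;
} monster_t;

monster_t monsters[MAX_MONSTERS];

/* Parallel global arrays: each field is its own byte array, so an
   access is a plain indexed lookup with a byte-sized index. */
uint8_t mon_x[MAX_MONSTERS];
uint8_t mon_y[MAX_MONSTERS];
uint8_t mon_hp[MAX_MONSTERS];

/* Apply damage and report whether the monster is dead (1) or not (0). */
uint8_t hit_monster(uint8_t i, uint8_t damage) {
    if (mon_hp[i] <= damage) {
        mon_hp[i] = 0;
        return 1;
    }
    mon_hp[i] -= damage;
    return 0;
}
```

The struct version is kept only for contrast; the function uses the parallel arrays.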

I compiled the following code snippet in a GR8 test app, with optimization flags set:

void plot_savmsc( word x, word y ) {
    word address = ( y * 40 ) + ( x / 8 );
    byte mask    = 0x80;
    byte current_byte;

    current_byte = PEEK( SAVMSC + address );
    mask = 0x80 / ( x % 8 );  /* x % 8 is 0 for every 8th column; 0x80 >> ( x % 8 ) was probably intended */
    POKE( SAVMSC + address, current_byte | mask );
}

word plot_savmsc_test( void ) {
    word x, y;

    lbzero( (byte *)SAVMSC, 8192 );
    print_at( 0, 23, "Naive plot" );
    clear_clock();

    for( y = 0; y < 192; y++ )
        for( x = 0; x < 320; x++ )
            plot_savmsc( x, y );

    return getJiffies();
}

The above code fills a graphics 8 screen. CC65 ran in 4587 jiffies while MOS did so in 2631. Another test leveraging lookups saw CC65 running in 2663 jiffies while MOS ran in 396 jiffies.
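
The lookup-based test is not shown in the post, so the following is only a guess at its general shape: the multiply by 40 and the per-pixel mask are replaced by two small tables. The array `screen` stands in for the memory SAVMSC points to, and `row_offset`, `pixel_mask`, `plot_init` and `plot_lookup` are hypothetical names.

```c
#include <stdint.h>

#define SCREEN_W_BYTES 40
#define SCREEN_ROWS    192

/* Stand-in for screen memory; on the Atari this would be the
   address held in SAVMSC. */
uint8_t screen[SCREEN_W_BYTES * SCREEN_ROWS];

/* Offset of the first byte of each row, filled in once. */
uint16_t row_offset[SCREEN_ROWS];

/* Bit mask for each of the 8 pixels in a byte, leftmost first. */
const uint8_t pixel_mask[8] = {
    0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01
};

/* Build the row table so plotting never has to multiply. */
void plot_init(void) {
    uint16_t y;
    for (y = 0; y < SCREEN_ROWS; y++)
        row_offset[y] = (uint16_t)(y * SCREEN_W_BYTES);
}

/* Set one pixel: table lookup for the row, shift for the column byte,
   table lookup for the bit. */
void plot_lookup(uint16_t x, uint8_t y) {
    screen[row_offset[y] + (x >> 3)] |= pixel_mask[x & 7];
}
```

Call plot_init() once before the timing loop, then plot_lookup( x, y ) in place of plot_savmsc( x, y ).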

I have C code that leverages page zero that brings CC65 down to 664 jiffies while MOS runs in 358.

The compiler is pretty good.

You don't have complete control over the binary like you do with CC65. At first the game font was aligned to a 1k boundary via a clang attribute. This allows you to create font or player missile regions easily enough but it tends to create gaps in the binary. As you write additional code these gaps get filled. I decided I needed a custom loader for the game so I didn't have to rely on the compiler aligning large memory objects. So I took the ramrom.asm and modified it to act as a font / game loader.
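
For reference, a minimal sketch of the attribute-based alignment described above. The post does not show the actual declaration, so the names `game_font`, `row_lo` and `font_page` are assumptions, as is the choice of a second, page-aligned table.

```c
#include <stdint.h>

/* A 1 KB character set placed on a 1 KB boundary, as described for
   the game font. */
__attribute__((aligned(1024))) uint8_t game_font[1024];

/* A small lookup table that starts on a 256-byte page boundary. */
__attribute__((aligned(256))) uint8_t row_lo[192];

/* High byte (page number) of the font address, which is the kind of
   value a character-base register expects. */
uint8_t font_page(void) {
    return (uint8_t)((uintptr_t)game_font >> 8);
}
```

Each aligned object can leave padding in front of it, which is where the gaps in the binary come from.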

I will say the size optimization is pretty good - single use functions are basically inlined to save subroutine pre/post code. I normally compiled with -Oz; when I compiled without this flag the raw binary was very large (over 50k). When I started generating map files to see what the compiler was doing I was shocked to find a three line routine took ~3k of ram...but the compiler was doing the right thing. Dead code removal is 100%.

The memory allocator is pretty good. It seems that each malloc() takes 2 bytes to keep track of things. I could be wrong here.

Clang has a rather...extensive...syntax on inline assembly. It's *far* easier to write assembly functions and just link them in.

Declaring variables VOLATILE is important.
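
A small sketch of why this matters, assuming a counter that is bumped from a vertical blank handler. The names `frame_ticks`, `on_vblank`, `frames_since` and `HW_REG` are hypothetical; on the host the "interrupt" is just an ordinary function call.

```c
#include <stdint.h>

/* Counter updated from interrupt code. Without volatile the compiler
   may read it once and keep the value in a register, so a wait loop
   in the main program could spin forever. */
volatile uint8_t frame_ticks;

/* Would be called from the VBI; here it is an ordinary function. */
void on_vblank(void) {
    frame_ticks++;
}

/* Frames elapsed since `start`, with 8-bit wraparound. */
uint8_t frames_since(uint8_t start) {
    return (uint8_t)(frame_ticks - start);
}

/* Memory-mapped registers need the same treatment. This macro is a
   hypothetical helper; pass it the register's address. */
#define HW_REG(addr) (*(volatile uint8_t *)(addr))
```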

My only real complaints are as follows:

1) How the compiler handles locals in a function. It basically creates a memory area named <function_name>_stk (or something like that) which wasted about 450 bytes. I know software stacks are slow...but I'd like the option to have one. The heap starts at the end of the *_stk definitions.

2) The supplied printf() is a full implementation, so it eats a ton of memory (like 4k). If your code references LONGs you will bring in the math libraries for 32+ bit integers. That's a ton of code. Sticking to bytes / words keeps things small. The a_printf() function in the game supports strings, bytes, words, and characters.

3) (Not really a complaint) But the standard library is very small. I had to create fopen()/fclose()/fread()/fwrite() using CIO calls. Not a big deal if you have some C and Atari experience. Atari programmers new to C may wish to stick with CC65 in the beginning.

4) Using extended memory will require some hacking compared to CC65 (once you figure out the linker file format CC65 makes this almost easy).
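
On point 2, this is roughly what a cut-down formatter can look like. It is not the game's a_printf(); `mini_sprintf` and `put_u16` are made-up names, and the set of conversions (%s, %c, %u, %%) is an assumption based on the description above.

```c
#include <stdarg.h>
#include <stdint.h>

/* Write v in decimal at p; returns the position after the last digit. */
static char *put_u16(char *p, uint16_t v) {
    char tmp[5];                 /* 65535 has five digits */
    uint8_t n = 0;
    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    while (n > 0)
        *p++ = tmp[--n];
    return p;
}

/* Supports %s, %c, %u (byte or word) and %%. Unknown conversions are
   copied through as-is; a lone '%' at the end is dropped. The caller
   must supply a large enough buffer. */
void mini_sprintf(char *out, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    while (*fmt) {
        if (*fmt != '%') { *out++ = *fmt++; continue; }
        fmt++;
        switch (*fmt) {
        case 's': { const char *s = va_arg(ap, const char *);
                    while (*s) *out++ = *s++;
                    break; }
        case 'c': *out++ = (char)va_arg(ap, int); break;
        case 'u': out = put_u16(out, (uint16_t)va_arg(ap, unsigned int)); break;
        case '%': *out++ = '%'; break;
        case '\0': goto done;
        default:  *out++ = '%'; *out++ = *fmt; break;
        }
        fmt++;
    }
done:
    *out = '\0';
    va_end(ap);
}
```

No long arithmetic is involved, so nothing here pulls in 32-bit math helpers.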

Edited May 12, 2022 by damosan

ilmenit · May 13, 2022

Oh, wow, thank you for that extensive answer! Definitely for my next Atari project in C I'm going to try it instead of CC65. Seems to be a much better alternative to KickC which has reported big issues with compilation time due to optimizations (even minutes for making a single build).

damosan · May 13, 2022

7 minutes ago, ilmenit said:

Oh, wow, thank you for that extensive answer! Definitely for my next Atari project in C I'm going to try it instead of CC65. Seems to be a much better alternative to KickC which has reported big issues with compilation time due to optimizations (even minutes for making a single build).

It's possible to write code that compiles with both compilers depending on what CC65 features you leverage (CC65's atari.h for example). For a time my source code would compile with either CC65 or MOS but I eventually settled on MOS.
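
One way to keep a source file building under both compilers is a small compatibility header. This is only a sketch: cc65 defines `__CC65__` and ships PEEK/POKE in peekpoke.h, while the fallback macros, the `byte`/`word` typedefs and `set_bits` are assumptions for illustration.

```c
#ifdef __CC65__
#include <peekpoke.h>            /* cc65 provides PEEK and POKE */
#else
/* Fallback for other compilers: plain volatile byte access. */
#define PEEK(addr)      (*(volatile unsigned char *)(addr))
#define POKE(addr, val) (*(volatile unsigned char *)(addr) = (val))
#endif

typedef unsigned char byte;
typedef unsigned int  word;      /* 16 bits on the 6502 compilers */

/* Set the given bits at addr, leaving the others alone. */
void set_bits(byte *addr, byte bits) {
    POKE(addr, PEEK(addr) | bits);
}
```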

dmsc · May 14, 2022

Hi!

On 5/13/2022 at 7:29 AM, ilmenit said:

Oh, wow, thank you for that extensive answer! Definitely for my next Atari project in C I'm going to try it instead of CC65. Seems to be a much better alternative to KickC which has reported big issues with compilation time due to optimizations (even minutes for making a single build).

I recommend you also try VBCC ( http://www.compilers.de/vbcc.html ); in my experiments it produces faster code and it is very stable.

Have Fun!

mysterymath · May 16, 2022

Hey, thanks for taking LLVM-MOS along on such a thorough jog! Your report has been excellent feedback; I've got a much better idea of things to work on next.

On 5/12/2022 at 5:14 PM, damosan said:

1) How the compiler handles locals in a function. It basically creates a memory area named <function_name>_stk (or something like that) which wasted about 450 bytes. I know software stacks are slow...but I'd like the option to have one. The heap starts at the end of the *_stk definitions.

Regarding this one, the current approach of keeping all the _stk regions separate was more due to laziness/expediency on my part. I've had in mind an approach to allow these regions to overlap like dynamic stack frames would; once implemented, this should take exactly the same amount of space that the worst-case dynamic stack would. For example, trivially, all leaf functions (ignoring interrupts) could share the same static stack frame, since none could be active simultaneously. There are many more such relationships that could be gleaned by a detailed examination of the program's call graph. See https://github.com/llvm-mos/llvm-mos/issues/183.
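
To put numbers on the idea, here is a toy calculation over a made-up, recursion-free call graph. It is not how LLVM-MOS implements anything; the functions, frame sizes and helper names are all hypothetical. It compares one private region per function against the deepest chain of frames that can be live at once.

```c
#include <stdint.h>

#define NFUNC 5

/* Hypothetical program: main calls draw and update; draw calls plot;
   update calls plot and rnd. plot and rnd are leaf functions. */
enum { F_MAIN, F_DRAW, F_UPDATE, F_PLOT, F_RND };

const uint16_t frame_size[NFUNC] = { 8, 12, 20, 6, 4 };

/* calls[a][b] != 0 means function a may call function b. */
const uint8_t calls[NFUNC][NFUNC] = {
    /* main   */ { 0, 1, 1, 0, 0 },
    /* draw   */ { 0, 0, 0, 1, 0 },
    /* update */ { 0, 0, 0, 1, 1 },
    /* plot   */ { 0, 0, 0, 0, 0 },
    /* rnd    */ { 0, 0, 0, 0, 0 },
};

/* Space used when every function gets its own private region. */
uint16_t separate_total(void) {
    uint16_t sum = 0;
    uint8_t f;
    for (f = 0; f < NFUNC; f++)
        sum += frame_size[f];
    return sum;
}

/* Deepest chain of frames reachable from f. Only valid for a call
   graph without cycles (no recursion). */
uint16_t worst_case(uint8_t f) {
    uint16_t deepest = 0;
    uint8_t g;
    for (g = 0; g < NFUNC; g++) {
        if (calls[f][g]) {
            uint16_t d = worst_case(g);
            if (d > deepest)
                deepest = d;
        }
    }
    return (uint16_t)(frame_size[f] + deepest);
}
```

With these figures the separate regions add up to 50 bytes, while the deepest path (main, update, plot) needs 34; the two leaves never overlap in time, so they can share space.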

On 5/12/2022 at 5:14 PM, damosan said:

At first I tried to do this with the C++ compiler and it had issues.

If you get the chance, would you mind filing a bug report against us with whatever you found? To the best of our knowledge, freestanding C++ should broadly work, sans exception handling. This may point to a gap in our automated testing, SDK, or docs, so I'd greatly appreciate any examples you could provide.

Edited May 16, 2022 by mysterymath
Add comment about C++ issues.

ilmenit · May 16, 2022

7 hours ago, mysterymath said:

Hey, thanks for taking LLVM-MOS along on such a thorough jog!

Do you plan to support usage of extended (or cartridge) memory through banking?

danwinslow · May 16, 2022

9 hours ago, mysterymath said:

once implemented, this should take exactly the same amount of space that the worst-case dynamic stack would. For example, trivially, all leaf functions (ignoring interrupts) could share the same static stack frame, since none could be active simultaneously.

This might be an oversimplification, or (somewhat more likely) I don't understand what you mean, but...recursion and also threading come to mind. Recursion you could reasonably expect to see. Threading sounds unlikely but, for instance, I have written a pre-emptively threaded program on the Atari with 4 'threads' running off of an interrupt. In both cases, having exactly the same local stack area might be problematic.

Edited May 16, 2022 by danwinslow

mysterymath · May 17, 2022

17 hours ago, ilmenit said:

Do you plan to support usage of extended (or cartridge) memory through banking?

It depends on what you mean by support. It should be possible to make use of banks now via custom linker scripts. Bank switching isn't automatic, and you'd have to manually assign variables and functions to bank sections via __attribute__((section)).
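
As a rough illustration of the manual assignment, the section attribute looks like this in source. The section names ".bank1_data" and ".bank1_code" are invented for the example; they only land in a bank if a custom linker script places them there, and the caller still has to switch the bank in before touching either symbol.

```c
/* ".bank1_data" and ".bank1_code" are made-up section names; they only
   end up in a bank if the linker script maps them there. */
__attribute__((section(".bank1_data")))
unsigned char level_map[4] = { 1, 2, 3, 4 };

/* A function assigned to the code section of the same (hypothetical)
   bank. It adds up the bytes of level_map, modulo 256. */
__attribute__((section(".bank1_code")))
unsigned char map_checksum(void) {
    unsigned char sum = 0;
    unsigned char i;
    for (i = 0; i < sizeof level_map; i++)
        sum += level_map[i];
    return sum;
}
```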

15 hours ago, danwinslow said:

This might be an oversimplification, or (somewhat more likely) I don't understand what you mean, but...recursion and also threading come to mind. Recursion you could reasonably expect to see. Threading sounds unlikely but, for instance, I have written a pre-emptively threaded program on the Atari with 4 'threads' running off of an interrupt. In both cases, having exactly the same local stack area might be problematic.

We analyze the whole program's graph of what functions can call what functions, and then generally operate in a "conservative" fashion. Anything that the compiler can't prove is safe it doesn't do. So, if a region of the program might possibly be recursive, it'll use dynamic stacks.

For interrupt handling, the analysis does require that asynchronous entries that could overlap with main and/or themselves be annotated with an __attribute__. This generally isn't too bad on the 6502, since this is also how you get the interrupt calling convention. But for something like using a threading library, it might be too onerous. We do need to add a compiler flag to disable the whole thing; probably -fno-static-stacks. See https://github.com/llvm-mos/llvm-mos/issues/185 to track this.

ilmenit · May 17, 2022

7 minutes ago, mysterymath said:

It depends on what you mean by support. It should be possible to make use of banks now via custom linker scripts. Bank switching isn't automatic, and you'd have to manually assign variables and functions to bank sections via __attribute__((section)).

Perfect, I went through the https://llvm-mos.org/wiki/Linker_Script and it should do the work.

damosan · May 19, 2022

On 5/17/2022 at 4:18 AM, ilmenit said:

Perfect, I went through the https://llvm-mos.org/wiki/Linker_Script and it should do the work.

I would love to see an example of this in action i.e. main program + 1-4 banks. Ideally any data defined within that bank stays there vs. being lumped together with other vars. The bank switcher code would have to reside in main memory outside the window of course and require a stack to remember which bank it came from when making the switch to a new bank.
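
A sketch of the switcher described here, with a small stack that remembers the bank to return to. Everything in it is hypothetical: `bank_select` is a plain variable standing in for the hardware banking register, and `call_banked`, `inner` and `outer` are illustration names.

```c
#include <stdint.h>

#define BANK_STACK_DEPTH 8

/* Stand-in for the hardware bank-select register. On real hardware
   this would be a volatile access to the machine's banking port. */
volatile uint8_t bank_select;

uint8_t bank_stack[BANK_STACK_DEPTH];
uint8_t bank_sp;

/* Switch to `bank`, run fn, then return to the bank we came from.
   This routine must live in main memory, outside the bank window.
   Returns 0 if the bank stack is full and fn was not called. */
uint8_t call_banked(uint8_t bank, void (*fn)(void)) {
    if (bank_sp >= BANK_STACK_DEPTH)
        return 0;
    bank_stack[bank_sp++] = bank_select;   /* remember the caller's bank */
    bank_select = bank;
    fn();
    bank_select = bank_stack[--bank_sp];   /* restore it */
    return 1;
}

/* Example: a routine meant for bank 2 that calls one meant for bank 3. */
uint8_t seen_inner, seen_outer;

void inner(void) { seen_inner = bank_select; }

void outer(void) {
    call_banked(3, inner);
    seen_outer = bank_select;   /* back in bank 2 after the nested call */
}
```

Calling call_banked( 2, outer ) from main memory walks through bank 2, then bank 3, and unwinds back to whatever bank was selected before.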

Wrathchild · May 19, 2022

5 minutes ago, damosan said:

and require a stack to remember which bank it came from

I did something like that here in CC65, getting the compiler to build tables for you would be quite an undertaking/achievement.

damosan · May 20, 2022

13 hours ago, Wrathchild said:

I did something like that here in CC65, getting the compiler to build tables for you would be quite an undertaking/achievement.

Yeah I've done it with CC65 - once you understand the linker config everything just falls into place.
