
KickC Benchmark Tests (cf Mad Pascal) ... WIP



6 minutes ago, zbyti said:

In Mad Pascal, 2D arrays rely (under the hood) on multiplication. It's only for clarification ;)

I assumed as much :) 

When I first wrote the QR benchmark, I copied the 2D version (thinking it meant 2D in the x/y plane), then realised it meant a 2D matrix. I'd written it as a 1D array with a multiply, then saw there was a 1D version :D
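
To make the "1D with multiply" point concrete, here is a minimal C sketch of how a 2D access reduces to a multiply and an add over a flat array. It assumes a row-major layout; the dimensions and the names flat_index and check_same_layout are made up for the illustration, and this is not Mad Pascal's actual code generation.

```c
#include <assert.h>
#include <stdint.h>

#define ROWS 4   /* hypothetical dimensions, chosen only for this sketch */
#define COLS 8

/* Offset of element (row, col) in a flat, row-major array: one multiply
   and one add, which is the work hidden behind grid[row][col]. */
static uint16_t flat_index(uint8_t row, uint8_t col) {
    return (uint16_t)(row * COLS + col);
}

/* Fill a 2D array and a flat 1D array with the same values, then check
   that both views address the same elements. */
static void check_same_layout(void) {
    uint8_t grid[ROWS][COLS];
    uint8_t flat[ROWS * COLS];
    for (uint8_t r = 0; r < ROWS; r++) {
        for (uint8_t c = 0; c < COLS; c++) {
            uint8_t v = (uint8_t)(r * 10 + c);
            grid[r][c] = v;
            flat[flat_index(r, c)] = v;
        }
    }
    for (uint8_t r = 0; r < ROWS; r++) {
        for (uint8_t c = 0; c < COLS; c++) {
            assert(grid[r][c] == flat[flat_index(r, c)]);
        }
    }
}
```

On a 6502 the multiply is the expensive part, which is why a benchmark that indexes a 2D array can score quite differently from its 1D counterpart.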

 

I'm just adding a floating-point library so that I can write the benchmarks that use Single in Pascal.

 


Last push from me tonight.

 

I've changed "while(){}" to "do{} while()", which has improved performance to the same level as "for{}", but it may be considered cheating.
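
One thing worth keeping in mind about that swap: the two loop forms are only interchangeable when the body is known to run at least once. The sketch below is plain C with made-up function names, and says nothing about what KickC actually emits for either form; it only counts how many times each body runs.

```c
#include <assert.h>
#include <stdint.h>

/* Top-tested loop: the condition is checked before every pass,
   including the first one. */
static uint8_t passes_while(uint8_t n) {
    uint8_t passes = 0;
    uint8_t i = 0;
    while (i < n) {
        passes++;
        i++;
    }
    return passes;
}

/* Bottom-tested loop: the body runs once before the first check. */
static uint8_t passes_do_while(uint8_t n) {
    uint8_t passes = 0;
    uint8_t i = 0;
    do {
        passes++;
        i++;
    } while (i < n);
    return passes;
}

/* Same pass count for n >= 1, different for n == 0. */
static void check_loop_forms(void) {
    assert(passes_while(10) == passes_do_while(10));
    assert(passes_while(0) == 0);
    assert(passes_do_while(0) == 1);
}
```

For a benchmark with a fixed, non-zero iteration count the results are identical, so the question is only whether the rewritten loop still counts as the same test.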

It's still much slower than the MP FOR/WHILE benchmarks. I'm looking forward to seeing what can be improved further there.

 

I've also changed the suite to use common setup, so the code size is down to 6.7k.

I'm going to investigate the xex format at some point and use custom sections, as there's still room to get that down by 300 bytes of zeroes.

 


I've added the flames benchmark to the KickC side.

Thanks @zbyti for the help earlier, and for publishing your own version, which helped me track down the screen bugs I was getting (DMACTL!)

 

Here are the results. Just look at the flames score :D 

 

image.thumb.png.0bf77d62e6b957955b912253b9819c74.png

 

The difference in my version is I'm not moving the pointer:

 

for (uint8_t i: 0..255) {
	*(p0 + i) = (*(p0+30 + i) + *(p0+31 + i) + *(p0+32 + i) + *(p0+63 + i)) >> 2;
	*(p1 + i) = (*(p1+30 + i) + *(p1+31 + i) + *(p1+32 + i) + *(p1+63 + i)) >> 2;
	*(p2 + i) = (*(p2+30 + i) + *(p2+31 + i) + *(p2+32 + i) + *(p2+63 + i)) >> 2;
}

The first version I wrote moved each of the pointers, but that involved adding to a ZP value, and the score was 148, so only slightly faster than the MP version.

 

With this version, it just increments a register and uses indexed addressing, which greatly boosts the speed:

    ldx #0
  __b7:
    // /home/markf/dev/personal/atari/projects/benchmarks/./src/fire.c:48
    lda fireScreen-$1f+$1e,x
    clc
    adc fireScreen-$1f+$1f,x
    clc
    adc fireScreen-$1f+$20,x
    clc
    adc fireScreen-$1f+$3f,x
    lsr
    lsr
    sta fireScreen-$1f,x
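
To show that the two approaches compute the same thing, here is a portable C sketch of one flame pass written both ways. The function and macro names are invented for the example, and the sum is deliberately wrapped to 8 bits before the shift to mimic the adc chain in the listing above (standard C would otherwise promote to int). It is an illustration of the idea, not the benchmark source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ROW_LEN 256             /* one pass covers indices 0..255 */
#define BUF_LEN (ROW_LEN + 64)  /* room for the +63 look-ahead */

/* Sum of four neighbours, wrapped to 8 bits, then divided by 4. */
static uint8_t blend(const uint8_t *p) {
    return (uint8_t)((uint8_t)(p[30] + p[31] + p[32] + p[63]) >> 2);
}

/* Variant 1: advance the pointer itself on every step. On a 6502 this
   means 16-bit arithmetic on a zero-page pointer. */
static void pass_moving_pointer(uint8_t *base) {
    uint8_t *p = base;
    for (uint16_t n = 0; n < ROW_LEN; n++) {
        *p = blend(p);
        p++;
    }
}

/* Variant 2: keep the base fixed and vary only an 8-bit index, the
   shape that maps onto absolute,x addressing. */
static void pass_fixed_base(uint8_t *base) {
    uint8_t i = 0;
    do {
        base[i] = blend(base + i);
        i++;
    } while (i != 0);  /* i wraps to 0 after 255: 256 iterations */
}

/* Run both variants on identical data and compare the buffers. */
static void check_passes_agree(void) {
    uint8_t a[BUF_LEN], b[BUF_LEN];
    for (uint16_t n = 0; n < BUF_LEN; n++) {
        a[n] = (uint8_t)(n * 7 + 3);
    }
    memcpy(b, a, sizeof a);
    pass_moving_pointer(a);
    pass_fixed_base(b);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

The results match; only the cost of stepping through memory differs, which is where the score gap comes from.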

XEX file is 7867 bytes.

suite.xex

Edited by fenrock
add xex

Final one for tonight.

 

Made some improvements on 1899 sieve. Thought I'd made 1028 better, but my memory is failing me.

 

image.png.574525682fe728aa21b7e2756c6f7f18.png

 

I did notice that in the sieve tests, blanking out the initial array (with memcpy) takes 76 frames of the score, which is quite a large proportion of the total run time.

 

suite.xex

Edited by fenrock

Hello tebe, do you also play with KickC?

I have Millfork and KickC to play with, and somehow I tend towards KickC.

It would be nice if there were more KickC demos for the Atari to learn from.

I don't understand how the include of atari-gtia.h works with this command.
In atari-gtia.h there is "char COLPF0;".
In my program I can then write "GTIA->COLPF0 = 0x28;".
How do these two fit together?

Greetings


8 hours ago, tebe said:

How can you select the addressing mode in KickC?

If you mean from the example code I pasted, I don't. The compiler seeks the best 6502 code fragment to fulfil the given piece of code, and it recognised a way of fitting those statements into lda V,x; adc W, x, etc.

@JesperGravgaard can perhaps explain it better (the compiler's optimizations are explained here)


1 hour ago, funkheld said:

I don't understand how the include of atari-gtia.h works with this command.
In atari-gtia.h there is "char COLPF0;".
In my program I can then write "GTIA->COLPF0 = 0x28;".
How do these two fit together?

 

This is all to do with C struct pointers.

 

atari-xl.h defines two values:

// Atari GTIA write registers
struct ATARI_GTIA_WRITE * const GTIA = 0xd000;

// Atari GTIA read registers
struct ATARI_GTIA_READ * const GTIA_READ = 0xd000;

These define struct pointers.

 

The first ("GTIA") is a pointer to the struct type "ATARI_GTIA_WRITE", fixed at location 0xd000 in memory.

 

Thus to access any of the struct's fields, you use standard pointer struct access.

GTIA->COLPF0 = 0x28;

KickC recognises that this is a member access on a struct located at 0xd000, so it adds the offset of COLPF0 (0x16) to 0xd000 to get 0xd016, and stores to that address, producing something similar to:

 

lda #$28
sta $d016

but if you inspect the asm generated, you'll see it's using labels and constants to calculate the location.
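
If you want to check the offset arithmetic on a PC, the sketch below does it in standard C. The struct here is a simplified stand-in I wrote for the example (field order follows the GTIA write-register map, but the real KickC header may name or group the fields differently), and gtia_write_sketch, GTIA_BASE and the helper functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the GTIA write-register struct; not the
   actual definition from the KickC headers. */
struct gtia_write_sketch {
    uint8_t HPOSP[4];  /* 0x00-0x03 */
    uint8_t HPOSM[4];  /* 0x04-0x07 */
    uint8_t SIZEP[4];  /* 0x08-0x0b */
    uint8_t SIZEM;     /* 0x0c */
    uint8_t GRAFP[4];  /* 0x0d-0x10 */
    uint8_t GRAFM;     /* 0x11 */
    uint8_t COLPM[4];  /* 0x12-0x15 */
    uint8_t COLPF0;    /* 0x16 */
    uint8_t COLPF1;    /* 0x17 */
    uint8_t COLPF2;    /* 0x18 */
    uint8_t COLPF3;    /* 0x19 */
    uint8_t COLBK;     /* 0x1a */
};

#define GTIA_BASE 0xd000u  /* base address taken from the post above */

/* Address that a store to GTIA->COLPF0 resolves to: base + offset. */
static uint16_t colpf0_address(void) {
    return (uint16_t)(GTIA_BASE + offsetof(struct gtia_write_sketch, COLPF0));
}

/* Host-side demo: use an ordinary struct variable instead of 0xd000,
   do the same member store, and look at the raw bytes. */
static void check_member_store(void) {
    struct gtia_write_sketch chip = {0};
    struct gtia_write_sketch *const gtia = &chip;
    const uint8_t *bytes = (const uint8_t *)&chip;

    gtia->COLPF0 = 0x28;
    assert(bytes[0x16] == 0x28);
    assert(colpf0_address() == 0xd016);
}
```

Because the pointer is a compile-time constant, the whole base-plus-offset sum can be folded at compile time, which is why the store ends up as a single absolute sta.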

 


13 minutes ago, funkheld said:

Hi, good afternoon.


I found an atari-io.c together with atari-iocb.h and atari-mem.h.

atari-io.c does not work.
Does something have to be added to atari-xl.h?

 

Thank you.
Greetings

src-atari-io.zip 9.09 kB · 1 download

This was the original code I used before I submitted the atari-conio.h work to KickC!

 

The atari-iocb.h and atari-mem.h files are ones you can use locally in your development, as I haven't introduced those to kickc yet.

However, you may get some conflicts from atari-mem.h, as I think I copied the definitions I needed into KickC. If so, just remove the duplicates from atari-mem.h.

 

Over the next couple of weeks I'm going to write a bunch of IO libraries for the Atari for KickC, and you will be able to load them from there instead.

 


On the benchmarks themselves, I've pushed a new version that undoes the loop optimisation in sieve 1899 to make it more canonical.

I've replaced the memset with manual clearing of the table, which has increased performance, showing that a very large proportion of the time was spent not doing the sieve, but just setting up the table.
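
For anyone following along, here is a rough C sketch of the shape being described: a clear step written as a plain loop, kept separate from the sieve pass so the two costs can be looked at independently. It assumes the classic BYTE-style sieve over 8190 flags (which finds 1899 primes), and the names SIEVE_SIZE, clear_table and sieve_once are made up for the example; it is not the code from the benchmark suite.

```c
#include <assert.h>
#include <stdint.h>

#define SIEVE_SIZE 8190  /* assumed: classic BYTE-style sieve size */

static uint8_t crossed_out[SIEVE_SIZE + 1];

/* Manual clear: a plain loop instead of a memset() call. */
static void clear_table(void) {
    for (uint16_t i = 0; i <= SIEVE_SIZE; i++) {
        crossed_out[i] = 0;
    }
}

/* One sieve pass over the odd numbers 3, 5, 7, ...; slot i stands for
   the number 2*i + 3. Returns how many primes were found. */
static uint16_t sieve_once(void) {
    uint16_t count = 0;
    for (uint16_t i = 0; i <= SIEVE_SIZE; i++) {
        if (!crossed_out[i]) {
            uint16_t prime = (uint16_t)(i + i + 3);
            /* Stepping the slot by prime skips the even multiples. */
            for (uint32_t k = (uint32_t)i + prime; k <= SIEVE_SIZE; k += prime) {
                crossed_out[k] = 1;
            }
            count++;
        }
    }
    return count;
}

/* Ten runs, each needing a fresh table, as in the benchmark loop. */
static void check_ten_runs(void) {
    for (uint8_t run = 0; run < 10; run++) {
        clear_table();
        assert(sieve_once() == 1899);
    }
}
```

With the clear in its own function, it is easy to time it separately or leave it out of the measured section.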

 

suite.thumb.png.691603d5517a540ae3a1f2c3536d0c00.png

 

I did write an asm-only memory clear to further reduce the time spent clearing data rather than sieving, which knocked another 5 frames off the score, but I didn't check that in.

@zbyti It might be better to make the test sieve a larger range once rather than doing the same one 10 times, or to turn off the counter while the memory is being reset (easily done by setting the global on/off value).

 


14 minutes ago, fenrock said:

 

@zbyti It might be better to make the test sieve a larger range once rather than doing the same one 10 times, or to turn off the counter while the memory is being reset (easily done by setting the global on/off value).

Yes, but from now on everything is up to you. From this moment I'm focusing only on my 8-bit chess journey. Of course I'll try to follow you and @tebe on GitHub.

 

About a sieve bigger than life ;) look at the attachment.

bit_sieve.atl bit_sieve.xex

Edited by zbyti
atalan bit sieve

6 minutes ago, zbyti said:

About a sieve bigger than life ;) look at the attachment.

bit_sieve.atl 907 B · 1 download   bit_sieve.xex 671 B · 0 downloads

I've seen those algorithms before, but I didn't use them, to keep the test as canonical as possible.

e.g. https://en.wikipedia.org/wiki/Sieve_of_Atkin has plenty of ways of making this faster, but that wasn't the point of the test :)

 
