

Posts posted by rensoup

  1.  

    I've just got this idea: what about using Huffman coding where the alphabet is built from 72-bit letters (9 bytes)? If the entropy is low (and it is in SAP Type-R), this would yield well over 9 times compression.

    Just looked at the real data: an 89 KiB SAP contains 790 distinct symbols with these frequencies:

    1,8,8,8,1,6,6,8,1,8,8,6,8,1,2,142,69,59,142,47,3,24,20,22,7,2,3,25,12,12,12,60,5,28,28,29,3,10,22,14,13,18,2,2,2,1,1,1,1,1,5,14,32,11,17,5,48,8,22,16,20,1,44,5,22,17,17,3,27,8,12,9,7,2,1,1,2,8,8,14,5,4,3,20,8,7,9,1,6,6,8,6,6,6,3,6,6,3,3,7,1,9,6,7,6,3,3,6,1,11,4,2,6,10,5,6,5,4,2,2,1,1,4,5,2,1,1,6,19,15,18,14,11,14,3,15,18,14,7,3,14,47,64,38,138,142,17,88,72,56,4,18,296,47,29,22,16,68,48,38,23,4,12,16,12,8,4,6,4,4,2,5,2,3,4,2,20,40,20,16,10,16,11,9,52,3,13,31,31,24,12,2,64,28,23,24,14,10,14,10,5,2,2,4,20,13,11,123,19,12,16,12,12,12,18,12,12,12,12,34,12,12,6,13,12,6,12,12,12,12,16,14,11,11,6,12,6,3,39,9,6,9,6,6,6,9,6,6,3,9,12,85,35,12,12,6,18,12,17,12,12,12,18,12,12,12,2,6,9,7,6,61,1,8,6,9,6,6,6,9,6,6,6,6,16,6,6,3,2,6,6,3,6,6,6,6,14,10,1,5,1,6,9,6,62,7,6,7,6,6,6,9,6,6,6,6,18,6,6,3,2,6,6,6,6,6,6,9,6,6,6,1,8,2,3,6,2,6,11,5,8,6,9,6,6,6,9,6,6,1,1,1,1,2,7,4,1,1,2,1,1,1,1,1,1,1,8,6,4,2,1,1,1,2,2,2,2,1,1,1,1,4,32,84,4,4,51,8,32,83,5,2,5,57,6,16,22,2,4,1,2,59,1,2,8,6,4,4,8,6,8,8,4,6,4,4,8,4,10,9,4,3,2,2,4,3,4,4,2,3,2,2,4,2,5,4,2,3,2,4,3,4,4,10,1,11,11,11,1,5,108,66,52,48,90,97,92,78,71,92,90,103,15,74,66,7,26,70,101,54,50,20,2,4,10,14,8,8,6,4,6,4,2,2,4,6,4,2,4,2,2,4,5,5,4,2,17,37,21,16,6,4,1,5,4,4,4,6,4,2,4,3,1,4,2,10,14,8,8,14,12,17,12,9,8,12,8,1,7,8,1,5,10,14,8,8,4,3,4,15,10,8,8,1,4,6,4,4,4,4,8,4,4,2,5,13,17,13,12,8,22,6,24,60,88,47,43,29,29,31,24,3,10,13,12,10,6,9,22,14,10,3,15,13,15,12,6,5,3,1,14,17,19,13,9,1,9,5,3,4,1,5,4,4,4,4,1,4,4,5,4,4,2,15,35,52,28,24,42,28,3,39,28,18,6,22,25,26,16,23,3,16,13,17,23,18,5,14,11,7,16,6,4,6,4,2,4,4,4,2,2,4,2,2,6,3,6,6,6,10,1,3,7,13,10,3,12,10,5,3,15,12,18,12,6,6,12,15,10,3,12,10,5,3,1,1,1,1,1,1,1,1,1,4,9,15,7,7,1,7,6,1,7,6,1,1,10,15,10,10,6,4,1,1,1,2,2,1,1,1,1,1,2,30,5,3,7,5,71,6,6,7,7,1,8,1,9,6,6,11,2,5,2,6,1,1,2,1,3,3,1,10,10,1,1,2,1,2,4,3,2,2,2,2,1,2,3,2,2,2,1,2,2,3,2,1,1,1,1,2,2,1,2,2,1,1,1,1,1,1

    A lot of singletons, unfortunately, but it still looks promising.

    The only issue will be the large symbol table (I don't remember the correct name).
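
    A quick sanity check on the idea: the Shannon entropy of those counts is a lower bound on the output size of any Huffman code built over the 9-byte symbols (Huffman comes within one bit per symbol of it). A rough C sketch, where freq[] stands for the 790 counts above and the cost of the table itself is ignored:

    #include <math.h>

    /* Lower bound (in bits) for Huffman-coding symbols with these counts. */
    double entropy_bits(const int *freq, int nsyms)
    {
        long total = 0;
        for (int i = 0; i < nsyms; i++) total += freq[i];

        double bits = 0.0;
        for (int i = 0; i < nsyms; i++) {
            if (freq[i] == 0) continue;
            double p = (double)freq[i] / (double)total;
            bits -= (double)freq[i] * log2(p);  /* this symbol's share */
        }
        return bits;   /* divide by 8 for bytes */
    }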

     

    Another idea: split the SAP-R into 9 streams, delta-encode them (especially good for AUDCn), and Huffman-code each one, reversing the transform when playing. This will result in small symbol tables and quite a large decompression routine :]
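
    The encoder side of that transform could look like this (a rough sketch, assuming SAP-R stores 9 POKEY register bytes per frame; buffer names are illustrative). The player reverses it with one running add per stream:

    #define REGS 9   /* POKEY register bytes per SAP-R frame */

    /* sapr holds frames*REGS interleaved bytes; streams[r] must hold
       `frames` bytes. Registers that don't change become runs of zeros
       after the delta, which Huffman then codes very cheaply. */
    void split_and_delta(const unsigned char *sapr, int frames,
                         unsigned char *streams[REGS])
    {
        for (int r = 0; r < REGS; r++) {
            unsigned char prev = 0;
            for (int f = 0; f < frames; f++) {
                unsigned char v = sapr[f * REGS + r];      /* de-interleave */
                streams[r][f] = (unsigned char)(v - prev); /* delta */
                prev = v;
            }
        }
    }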

     

    Damn, and here I was hoping for an existing solution :twisted:

  2.  

    In my code above I made an optimization: I replace the "copy distance" field with a "copy address" field (= position - distance), so the copy can be done directly. This also allows (by modifying the compressor) using *any* place in the target memory as a source, so you could stream frames using the read buffer and the target screen and send only the changed bytes.

     

    Ok, that was confusing... you're saying that instead of using the regular LZ4 encoder, I could use the LZ4 API to make the LZ4 dictionary come from the previously compressed source data instead of the decompressed destination?

     

    something like:

     

    For every new chunk of 9 bytes to compress:

     

    -Set the dictionary to the already-compressed source data, so the dictionary grows a bit more with every 9-byte chunk, which in theory improves compression every time

     

    -Compress the 9 bytes using the new dictionary.

     

    ?

     

     

    LZ4 does exactly that: it stores offsets to old data instead of repeating it.

     

    Yes, that bit I know, but those are offsets into the decompressed data. From what you're saying above, I could change that to use offsets into the compressed source?
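
    For reference, this is roughly what the stock liblz4 streaming API (LZ4_createStream / LZ4_compress_fast_continue) gives you: each call adds the chunk's *uncompressed* bytes to the dictionary, so later chunks can reference all earlier source data. A sketch, not tested against SAP-R (and note that with 9-byte blocks the LZ4 format's end-of-block rules leave no room for matches, so larger chunks would be needed in practice):

    #include <stdio.h>
    #include "lz4.h"

    #define CHUNK 9   /* one SAP-R frame; too small in practice, see above */

    int compress_stream(const char *src, int total, FILE *out)
    {
        LZ4_stream_t *strm = LZ4_createStream();
        char dst[LZ4_COMPRESSBOUND(CHUNK)];

        for (int pos = 0; pos + CHUNK <= total; pos += CHUNK) {
            /* The stream remembers earlier input as the dictionary. */
            int n = LZ4_compress_fast_continue(strm, src + pos, dst,
                                               CHUNK, sizeof(dst), 1);
            if (n <= 0) { LZ4_freeStream(strm); return -1; }
            fwrite(dst, 1, (size_t)n, out);
        }
        LZ4_freeStream(strm);
        return 0;
    }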

  3. Did you benchmark LZO? It's supposed to be designed around decompression speed.

     

    Also, if the idea is just to blast this to the screen as you decompress, you could even add a special flag to the decompressor which says "Move to this address before continuing to decompress". Then you can just let the decompression loose on an entire run of data to do a frame update without having to write "same" data to parts of the screen.
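
    In the decoder's outer loop that could look something like this (a sketch with a made-up escape opcode; the real literal/match handling would go where the plain byte copy is):

    /* Hypothetical: 0xFF = "seek": the next two bytes are the new output
       address (lo, hi), so unchanged screen regions are skipped entirely. */
    void decode_frame(const unsigned char *in, const unsigned char *end,
                      unsigned char *ram)
    {
        unsigned char *out = ram;
        while (in < end) {
            unsigned char op = *in++;
            if (op == 0xFF) {
                unsigned addr = in[0] | (in[1] << 8);
                in += 2;
                out = ram + addr;     /* continue decompressing there */
            } else {
                *out++ = op;          /* stand-in for literal/match handling */
            }
        }
    }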

     

    found this post: https://forums.nesdev.com/viewtopic.php?f=2&t=13671

     

     

    I started with the assumption LZO would be the best of the no-RAM algos, and indeed it was. Its speed was a surprise, however: on desktops it is extremely fast; here it's over twice as slow as zlib. Being GPL it is not usable for many closed projects, though commercial licenses are available.

     

     

    ouch...

  4. Pure LZSS might be what you're looking for. The compression ratio is worse than LZ4, but the coding for the decompressor is absolutely trivial.

     

    It is what a lot of games in the late 1980s and early 1990s used before programmers tweaked the details of the algorithm to get even better compression, but at some cost in speed.
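
    For the record, the whole decoder fits in a few lines of C (a sketch of a generic LZSS variant, not any particular shipping format; the 12-bit offset / 4-bit length split is just one common choice):

    #include <stddef.h>
    #include <stdint.h>

    /* Flag bit 1 = literal byte, 0 = (offset, length) copy from data
       already written out -- which is why the decompressed history must
       stay addressable. */
    size_t lzss_decode(const uint8_t *in, size_t in_len, uint8_t *out)
    {
        size_t ip = 0, op = 0;
        while (ip < in_len) {
            uint8_t flags = in[ip++];
            for (int bit = 0; bit < 8 && ip < in_len; bit++, flags >>= 1) {
                if (flags & 1) {
                    out[op++] = in[ip++];                     /* literal */
                } else {
                    uint16_t pair = (uint16_t)(in[ip] | (in[ip + 1] << 8));
                    ip += 2;
                    size_t off = (size_t)(pair & 0x0FFF) + 1; /* 12-bit */
                    size_t len = (size_t)(pair >> 12) + 3;    /* 4-bit + 3 */
                    while (len--) {              /* overlapping copy is OK */
                        out[op] = out[op - off];
                        op++;
                    }
                }
            }
        }
        return op;
    }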

     

     

     

    Thanks, I found a 65c816 decoder for it (https://forums.nesdev.com/viewtopic.php?f=12&t=5771) but its author says it requires 4KB of memory... that unfortunately seems like a no-go for me!

     

     

     

    It's a really simple concept but, as you've found, you need to keep the decompressed data around so that it can be referenced.

     

    This usually isn't a problem, but you do need to plan for it.

     

    Well, that's why I was hoping for a modified RLE: the non-repeated bytes are already stored decompressed in the source, so if some sequences are repeated you could just store an offset and length.

    I'm just surprised that nobody has done that.

     

    I actually gave it a shot yesterday... it was a lot more annoying than I thought, but it seems to work. The gains vary from nothing to about 25% over regular RLE, and I get the feeling there's more to be gained.
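
    The decoder for such a format could look roughly like this (a sketch; the opcodes and field widths are made up): literals and runs work like ordinary RLE, and a third opcode copies from an absolute position in the *compressed* stream, where the literals already sit uncompressed, so no decompressed history is needed:

    #include <stddef.h>
    #include <stdint.h>

    /* Control byte: top two bits select the opcode, low six bits give a
       count of 1..64. The 0xC0 pattern is left reserved. */
    enum { OP_LIT = 0x00, OP_RUN = 0x40, OP_REF = 0x80 };

    size_t rle_ref_decode(const uint8_t *in, size_t in_len, uint8_t *out)
    {
        size_t ip = 0, op = 0;
        while (ip < in_len) {
            uint8_t c = in[ip++];
            size_t n = (size_t)(c & 0x3F) + 1;
            switch (c & 0xC0) {
            case OP_LIT:                 /* n literal bytes follow inline */
                while (n--) out[op++] = in[ip++];
                break;
            case OP_RUN: {               /* plain RLE: repeat one byte */
                uint8_t v = in[ip++];
                while (n--) out[op++] = v;
                break;
            }
            case OP_REF: {               /* copy n bytes from an absolute
                                            offset in the compressed stream
                                            (earlier literals) */
                size_t src = (size_t)(in[ip] | (in[ip + 1] << 8));
                ip += 2;
                while (n--) out[op++] = in[src++];
                break;
            }
            default:                     /* 0xC0 reserved */
                return op;
            }
        }
        return op;
    }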

     

     

     

    LZ4 was optimized for speed of compression/decompression for use in realtime on 32-bit CPUs ... I've yet to find a case where it actually makes sense to use it on 8-bit or 16-bit computers/consoles.

     

    Something like Jorgen Ibsen's aPlib offers better compression, with only a little extra cost in speed ... and if you really need the speed, then pure LZSS is hard to beat.

     

    At 300-600 bytes/frame I would agree, but dmsc's LZ4 decoder benchmark seems promising.

  5.  

    In my code above I made an optimization: I replace the "copy distance" field with a "copy address" field (= position - distance), so the copy can be done directly. This also allows (by modifying the compressor) using *any* place in the target memory as a source, so you could stream frames using the read buffer and the target screen and send only the changed bytes.

     

     

    Indeed, not what I want to do at all though... sorry for the confusion :arrow:

     

     

     

     

    Well, the LZ4 format codes two types of runs: literal runs and copy-over runs, so you can simply limit the window size to allow for streaming with any buffer size. Of course, the compression will be a lot worse with smaller windows.

     

     

    Yes, unfortunately for SAP-R the window has to be 128-256 bytes for any decent compression. 128 * 9 * 2 ≈ 2.3 KB just for buffers, so you're starting to lose the benefit of the compression. I get the feeling the code would be clunky too. A cheaper compressor might do just as well.

  6. Are we talking about packing 32x4 bytes = 128 total? Is there any chance to win here, since sprite data can be very pixelized? I'm not sure compression helps a lot here. How many of those frames do you want to pack as a stream? What is the total size? That's something we must know to find the "best" packer for your use case.

     

    Ok, so I guess that was a little confusing. I mixed 2 use cases into one question.

     

    My first use case was simple:

     

    Like pirx said, a decent streaming compression scheme would be perfect for SAP-R, which is exactly what I was trying to do!

     

    LZ4 on de-interleaved POKEY data compresses pretty well... so I modified XXL's LZ4 decompression code to output only 9 bytes per frame, only to realize that it didn't work because it was trying to copy data that I'd already discarded (yeah, I know very little about compression :) )

     

    I also realized that it was taking about 10 scanlines to output 9 bytes!

     

    I had another use case in mind (compressing sprites) but at 300 bytes/frame, it just wouldn't cut it either. With the code posted before, I have hope for this one.

  7.  

    LZ4 is very fast

     

    gpl3.txt - 35147 bytes

    exomizer - 12382 bytes + depacker 1 page =~ 12.3 KB, decompress 128 frames (2.6 sec)

    deflate - 11559 bytes + depacker 2 pages =~ 11.8 KB, decompress 179 frames (3.6 sec)

    LZ4 - 15622 bytes + depacker <150 bytes =~ 15.3 KB, decompress 55 frames (1.1 sec)

    there is also a bootloader with lz4 streaming decompression

     

    well, 15.3 KB / 55 ≈ 280 bytes per frame, which is what I was seeing as well...

     

    Sure, it's faster than exomizer/deflate, but they're at the far end of the spectrum when it comes to speed.

     

     

    I was looking for something between LZ4 & RLE in terms of speed and compression ratio. Looks like I've got 2 possibilities with dmsc & irgendwer, at least where self-referencing is OK.

     

     

    btw I tried compressing data with your RLE implementation that I found in MADS; it seems to perform a little worse than this one sometimes (files can be 1-10% bigger):

     

    https://csdb.dk/release/?id=34685

  8. Hi!

     

     

    I don't understand this. LZ4 can be decompressed "almost" in place (with about 8 bytes of gap between compressed/decompressed data).

     

     

    With a "small" (124-byte) implementation I get a little less than 49 cycles per byte; that's 500 bytes/PAL frame in GR.0 (the slowest possible), about 650 bytes/PAL frame without DMA. This is my main "copy" loop, in zero page:

     

            org     $80
    docopy
            jsr     getsrc          ; fetch the next packed byte into A
    dst = * + 1
            sta     output_addr     ; self-modified destination pointer
            inc     dst             ; advance destination low byte...
            bne     skip4
            inc     dst+1           ; ...with carry into the high byte
    skip4:
            dex                     ; X/Y hold the remaining count (lo/hi)
            bne     docopy
            dey
            bne     docopy
            rts

    getsrc
    src = * + 1
            lda     packed_data     ; self-modified source pointer
            inc     src             ; advance source low byte...
            beq     skip5
            rts
    skip5:  inc     src+1           ; ...with carry into the high byte
            rts
    
    Note that the "docopy" loop uses JSR to call the read routine, and the pointers are incremented on each byte read/written; this is to minimize size.

     

    You can implement it faster with indexed addressing; the main copy loop would look like:

     

    docopy
            lda     source_addr, X  ; indexed read and write,
            sta     output_addr, X  ; no JSR per byte
            inx
            bne     docopy          ; loop within a 256-byte page
            dey                     ; Y counts pages (advancing the base
            bne     docopy          ; addresses per page is omitted here)
            rts
    
    This works out to 14 cycles/byte in the best case, so a maximum of about 1700 bytes/PAL frame. Certainly three times faster than my original code should be possible.

     

    Attached is a sample source code, and a program that compresses a XEX file using LZ4 (Windows/Linux versions).

     

     

    Sounds interesting as well; 1700 bytes best case is a lot better for LZ4. :thumbsup:

  9.  

    Three years ago I created "autogamy" and used it to compress the level data for "dye". (Also used in "E-Type".)

     

    Features:

    * fast and compact depacker (using self modifying code)

    * nearly as good as LZ4 - for small files often even better

    * in-place decompression if the source data ends at least 4 bytes behind the end of the target buffer

    * compression also uses self-referencing - so no streaming (windowing is currently unsupported)

     

    The archive contains the packer for Windows (exe) and the source code for the depacker (cc65 header / ca65 source).

     

    You may like to give it a try.

     

    Thanks! :thumbsup:

     

    I will give it a try for sure. Sounds like it would be a good fit for one of my use cases.

     

    I've got another use case which requires -no- self-referencing, so I will look into what's in MADS and possibly try my own RLE variant (not even sure it will compress much better than plain RLE though)

  10. Download the MADS assembler and have a look at the various crunchers.

     

    I used exomizer, deflate and LZ4 in the past, but have now settled on the C64 depacker called "B2" & pucrunch.

     

    I've started using MADS and indeed there are plenty of crunchers. I will take a closer look, but perhaps some folks here have experience with them and can tell me if one of them fits the requirements?

     

    Can't find any reference to "B2", but pucrunch/exomizer aren't quite what I'm after... way too slow to decrunch.

  11. Hi!

     

     

    LZ4 should be as fast as (or even faster than) RLE. Are you sure you are using an optimized implementation?

     

     

    Really? Well, I tried Fox and XXL's version; it's optimized for size, so probably not for speed. It looks like it can decrunch about 200-300 bytes max per frame at 50 fps.

     

    The other problem with LZ4, for what I have in mind, is that it can't decrunch in chunks, as it uses previously decrunched data (while RLE can)

     

    I would still be interested to get a faster version if you have one.

     

    Thanks!

  12.  

    Nope, Altirra generally uses the system theme color settings including the window background color. Blame Microsoft for removing the UI in Windows 10 to configure theme colors and not updating the theme colors when dark mode is enabled in the system. I've thought about overriding this but it's a mess with dark mode not being properly exposed to Win32 and some system controls still using unconfigurable colors and theme images.

     

    Well, I'm still stuck on W8 :-o I was asking for a dark background colour in the debug windows, not necessarily something that strictly follows Windows conventions.

    I thought that'd be an easy change, but I'm not a Windows expert. I've only ever used WinForms, and it's easy with those.

     

     

    No, Altirra only parses the address/opcode bytes and line numbers from the listing, and uses the source field only to look for special directives. It does not try to parse the assembly lines, which is not trivial (consider that semicolons can be embedded in string literals).

     

    :? Maybe I phrased that wrong again. I'm already using source debugging with Alt-Shift-O and it works great. It just doesn't seem to default to using it, and I have to press Alt-Shift-O and then point to the masm file to activate it.

    I was hoping it would do this automatically, but that's not a big deal if it doesn't.

     

     

  13.  

    Debug > Options > Change Font.

     

    Damn I somehow missed that... Much better now :thumbsup:

     

    Might there be a color setting that I missed too? (The white background is a little harsh.)

     

     

    The emulator will load .lab and .lst files with the same name by default, but this won't work if you are loading the executable in code instead of letting the emulator do it. You can use a breakpoint to trigger the .loadsym command, or use .reload to reload the symbol file that has already been loaded.

     

    I'm loading from the command line and I wasn't paying attention again, because it does indeed load the symbols... but it doesn't have all the comments that I wrote in the source (I still have to use Shift-Alt-O for that)... comments should be loaded from the .lst file, right?

     

     

     

    By the way, the current version of xBIOS (4.3) still does not implement burst I/O and copies a byte at a time from the I/O buffer to the load address instead of loading sectors directly. On an actual 1050 floppy disk drive, this causes it to miss sectors and load at two-thirds the speed of DOS 2.0S.

     

    That sounds like a dealbreaker...

  14. I've recently started a bit of dev on the A8, and I'm using Altirra for debugging. While it's working pretty well overall, there are a few things that could help:

     

    Could I make the debug fonts (much) bigger? I tried the /portable switch, which created an .ini file that has font settings, but they don't seem to do anything?

     

    Is there a command-line switch to automatically load the .lst file? Because right now I'm using Alt-Shift-O every time.

     

    Although I'm using an obj file right now, I'm planning on using xBIOS to be able to use the whole memory space as well as to load extra data. So that means an ATR file.

    Any way to make the loading time instant? Not like warp speed, where everything gets sped up (although I guess I could use that if I could bind it to a key).

    Or perhaps there's a way to mount a directory as if it were an ATR?

     

    Thanks

  15. Hi Snicklin,

     

     

    1) I can't see why you couldn't use the buffer between loads for other purposes. If you're not using it, then what is the problem?

     

    Well the code could store some internal state in that buffer, maybe even just one byte. Although that's unlikely, I'd rather be absolutely sure.

     

     

    2) Similarly to (1), if you're not using xBIOS, you could compress it when not in use, do something else with the area, then inflate it back to its original location before it's used again.

     

    Same as 1), but that's a lot more likely. Let's say I create a disk with xBIOS, run it, then break into the debugger to dump the $0800-$0BFF area. Perhaps that area contains the current directory structure as well as internal state?

     

    I then add more files or update the files on disk, changing their file sizes. The initial load from the boot sector would be up to date, but after decrunching the old xBIOS I'd end up with stale data and almost certainly a crash.

     

    Just speculating :)

  16. I'm planning to use xBIOS (seems like a great tool, btw) and I'm wondering if I could get even more RAM back.

     

    the wiki says:

    $0700-$07FF    xBIOS I/O buffer
    $0800-$0BFF    xBIOS

    1) I'm guessing I can use the I/O buffer safely between loads?

     

    2) Assuming I've got a decompression routine already in memory that I use for other purposes, could I keep a compressed version of the $800-$C00 area?

    I assume there is some data in there that depends on the disk layout, and perhaps temporary variables, so I can't just dump the memory and compress it?

     

    Thanks.

  17. Hi,

     

    Just wanted to say this is the most impressive 2600 game I've seen so far.

     

    It does have gameplay limitations, but damn, it doesn't look like a 2600 game. Heck, it looks better than the Atari 8-bit conversion, which was great (and the first game I ever got).

     

    Bravo :)
