Jump to content
IGNORED

Any compressor between rle and lz4 ?


Recommended Posts

Hi!

 

Well I got curious, so I took my 9 deinterleaved raw SAPR channels and compressed them with LZSS v0.01. and the total was about 4KB vs 2KB for LZ4 best setting.

Doh! ... I've only-just realized that rensoup is actually talking about *music* compression, silly me!

Well, I tested the LZSS compressor with *very* small windows (8 bits for offset+length), for my test sample the best combination was 5 bytes for the match offset and 3 bytes for the length, but this makes the buffer too big (9*32 = 288, > 256).

 

So, I settled for match offsets and lengths of 4 bits.

 

Attached is the code and two samples: one song extracted from the RMT distribution: "4Tk35 by Caruso" (compressed to 30% of original), and the Archon loading screen (compressed to 12% of original).

 

The attached "lzss.c" reads a TYPE-R SAP file on standard input and writes a compressed binary to standard output, the ASM file includes this to assemble a working player.

 

Overall, the compression is just not great, as the small window size can't capture the repetitions through the song, and there is much waste - for example, channel 9 is POKEY register AUDCTL, this does not varies in the songs I tried, and channels 1,3,5 and 7 are AUDC*, the values hold there are limited.

playlzs.asm

sap-lzss.zip

playlzs1.xex

playlzs2.xex

  • Like 2
Link to comment
Share on other sites

Hi!

 

Ok. Noob question.

 

For the 65816 SNES rom depack what would you recommend?

I think that depends on the data size, available RAM and speed needed. But, just try LZ4 and see if your data is already compressible enough to make it worth.

Link to comment
Share on other sites

Hi!

 

 

I think that depends on the data size, available RAM and speed needed. But, just try LZ4 and see if your data is already compressible enough to make it worth.

Yeah lz4 is my standard depacker for all my demos but I thought there is already an 65816 version. Ok. Can run it in emu mode (SEP #$30 ;))

Link to comment
Share on other sites

 

Having said which ... if you only need to decompress 9-bytes of data per frame to feed the POKEY registers, then decompression speed really isn't your problem IMHO. You're going to be more concerned with managing ring-buffers.

 

Yes, I wasn't very clear on that, speed isn't essential in this case (it is for decrunching sprites). I managed to get my RLE stuff working and the code is horrible... way too much setup code. dmsc' solution is much cleaner.

Link to comment
Share on other sites

 

Did you implement optimal parsing? (or at least, lazy parsing with at least checking 4 bytes in advance)

 

Most papers say that you can get a lot of extra compression ration with that (and it is what LZ4HC compressor does).

Nope, not in my SWD encoding.

 

I used the official LZ4 compressor for testing ... I have no idea if that has lazy encoding.

 

Lazy encoding does sound like a good idea for *any* of the LZ-style compressors, and I've been meaning to try it out.

 

All right ... now I've got to "thank" you for destroying my productivity in the last few days and sending me back down the rabbit-hole of compression strategies! :mad: ;)

 

When it comes to the fascinating subject of the bragging-rights of the best optimal coding, it seems from articles like this ...

 

http://cbloomrants.blogspot.com/2012/09/09-04-12-lz4-optimal-parse.html

 

... that we're only talking about a less than a 0.5% improvment in the compression compared to the much simpler lazy encoding's 1.0% advantage over greedy encoding.

 

 

Anyway, I guess that I'll be going back and adding at-least lazy encoding to my compression testbed, just so that I can have another go at trying to beat aPlib.

  • Like 1
Link to comment
Share on other sites

 

So, this is equivalent to the data:

XX YY ZZ MM NN QQ RR SS TT XX YY ZZ MM NN QQ RR SS TT XX YY ZZ MM NN QQ RR SS TT XX YY ZZ MM UU QQ RR SS TT XX ....Hope above example makes it clearer.

 

Yes, very clear now, I thought you meant standard RLE.

 

 

 

Well, I tested the LZSS compressor with *very* small windows (8 bits for offset+length), for my test sample the best combination was 5 bytes for the match offset and 3 bytes for the length, but this makes the buffer too big (9*32 = 288, > 256).

 

GREAT! :thumbsup: :thumbsup: :thumbsup:

 

Well... almost.

 

I've tried it on a bunch of SAPs and they all fail to decrunch/play at some point... not sure if I've done anything wrong ? I've attached an example that failed. (I did "lzss <shadows.sap >test.lzs")

 

 

So I got my modified RLE to work and for a 35KB raw, RLE would get it down to 12.5KB, my modified RLE did 9KB, your LZS did 8.5KB. But your LZS did better on other files.

 

The decode time on the 6502 was similar, about 10 scanlines.

 

There are others things that I could do that might improve the compression quite a bit but I'm ok with what I got now. And if your LZS works I'll probably use that instead!

Your method is much cleaner, my setup code is a lot more messy and I have to expand the files to make them multiples of 9 when packing.

 

Sure a better compression ratio would be great but as it is, it makes it possible using some of those costly RMTs into actual productions.

 

 

PS:People have suggested delta encoding, I gave it a quick shot but ended up with results slightly worse.

shadows.zip

Edited by rensoup
  • Like 1
Link to comment
Share on other sites

I made one more small mod to my LZSS code earlier, to make the nibble-unpacking faster by 14-cycles-per-byte, mostly just so that it's a bit of practical LZSS code that people can look at, rather than it being something that'll help you in your (already solved) problem.

Sure a better compression ratio would be great but as it is, it makes it possible using some of those costly RMTs into actual productions.


Having something that works is always the most-import goal ... congratulations!!! :-D

  • Like 2
Link to comment
Share on other sites

Hi!

 

GREAT! :thumbsup: :thumbsup: :thumbsup:

 

Well... almost.

 

I've tried it on a bunch of SAPs and they all fail to decrunch/play at some point... not sure if I've done anything wrong ? I've attached an example that failed. (I did "lzss <shadows.sap >test.lzs")

Well, the posted code does not detect the end of the compressed stream, so it keeps playing random notes after the end of the song.

 

Also, if you are using Windows, I discovered that the C program does not work, because the redirections are in "text mode", as opposed to "binary mode". I only tried on windows, this is why I did not see the problem,

 

Attached is a new program with the changes:

- On Windows, the lzss.c sets binary mode on files.

- The playlzs.asm detects the end of the stream and properly terminates.

- Added a lot more statistics to the lzss program, now shows empty and fixed channels that should be left out of the player.

 

PS:People have suggested delta encoding, I gave it a quick shot but ended up with results slightly worse.

Delta encoding is good when you do Huffman or other type of entropy coding, as it reduces the set of values, but will not give you too much with the kind of data in SAP.

sap-lzss.v2.zip

playlzs.asm

  • Like 4
Link to comment
Share on other sites

Hi!

 

All right ... now I've got to "thank" you for destroying my productivity in the last few days and sending me back down the rabbit-hole of compression strategies! :mad: ;)

 

When it comes to the fascinating subject of the bragging-rights of the best optimal coding, it seems from articles like this ...

 

http://cbloomrants.blogspot.com/2012/09/09-04-12-lz4-optimal-parse.html

 

... that we're only talking about a less than a 0.5% improvment in the compression compared to the much simpler lazy encoding's 1.0% advantage over greedy encoding.

 

 

Anyway, I guess that I'll be going back and adding at-least lazy encoding to my compression testbed, just so that I can have another go at trying to beat aPlib.

Great articles, after seeing this http://cbloomrants.blogspot.com/2008/10/10-10-08-7_10.htmlI think will also implement optimal parsing to the LZSS coder ;)

  • Like 5
Link to comment
Share on other sites

Also, if you are using Windows, I discovered that the C program does not work, because the redirections are in "text mode", as opposed to "binary mode". I only tried on windows, this is why I did not see the problem,

 

Attached is a new program with the changes:

- On Windows, the lzss.c sets binary mode on files.

- The playlzs.asm detects the end of the stream and properly terminates.

- Added a lot more statistics to the lzss program, now shows empty and fixed channels that should be left out of the player.

 

 

Yes I'm under Windows, so your fix worked perfectly. The files are a bit bigger now but still slightly smaller than my RLE.

 

Thanks again! I'm still baffled that nobody has done this before :roll:

 

 

Guess I won't use my RLE :-D. Your code is much smaller, speed is the same, files can be a little smaller or quite a bit smaller, and there aren't any silly restrictions (I have to ORG the compressed data, and have to set manually the song duration in the code for looping. )

 

I'm just gonna go ahead and post it anyway so you can have a laugh :twisted:

 

If you look at some of the compressed channels files, the compression is outrageously bad. I could probably implement RLE on duplicate sequences and save a good amount of space but compared to your solution, it wouldn't be very elegant code.

sapRLE.zip

Link to comment
Share on other sites

Also, if you are using Windows, I discovered that the C program does not work, because the redirections are in "text mode", as opposed to "binary mode". I only tried on windows, this is why I did not see the problem,

 

 

 

actually, I'm still having an issue further in the song, it's still before the loop point I believe. Maybe another redirection issue ?

 

I'm attaching the obj and the converted file

lzsfail.zip

Link to comment
Share on other sites

Hi!

 

An idea: would it help to deinterleave the sap-r data and compress them indiviually with lzss? Your window will reach 9 times further back in the song and might capture more repetitions.

This is exactly what the code does, in fact the compressor shows the compressed size for each of the 9 streams:

 

 ./lzss < test1.sap > test1.lzs
WARNING: stream #8 is empty, should not be included in output!
Ratio: 37619 / 126792 = 29.67%
 Stream #0: 24939 bits,	22.13%,	 8.29% of output
 Stream #1: 29502 bits,	26.18%,	 9.80% of output
 Stream #2: 46260 bits,	41.05%,	15.37% of output
 Stream #3: 53586 bits,	47.55%,	17.81% of output
 Stream #4: 13140 bits,	11.66%,	 4.37% of output
 Stream #5: 68274 bits,	60.58%,	22.69% of output
 Stream #6: 36621 bits,	32.49%,	12.17% of output
 Stream #7: 21159 bits,	18.77%,	 7.03% of output
 Stream #8: 7470 bits,	 6.63%,	 2.48% of output
As you see, pokey register 8 (AUDCTL) does not change, so it compresses the most, followed by register 4 (AUDF3), probably does not changes a lot.

 

The small window size (16 bytes for each stream) is to limit the buffer needed at decompression time, as you don't want to keep the samples in memory.

  • Like 1
Link to comment
Share on other sites

Hi!

 

actually, I'm still having an issue further in the song, it's still before the loop point I believe. Maybe another redirection issue ?

 

I'm attaching the obj and the converted file

See:

$ ataricom music.obx 
ataricom 0.30-150320
(c) 2008-2014 Matthias Reichl <hias@horus.com>
block    1: 0582-0d4e (bytes:  1997, offset:      6)
block    2: 3ffa-4f61 (bytes:  3944, offset:   2007)
block    3: 1000-1052 (bytes:    83, offset:   5955)
block    4: 02e0-02e1 (bytes:     2, offset:   6042)
       RUN: 1000
block    5: 1053-2a4b (bytes:  6649, offset:   6048)
block    6: 0094-009e (bytes:    11, offset:  12701)
block    7: 2100-2160 (bytes:    97, offset:  12716)
block    8: 02e0-02e1 (bytes:     2, offset:  12817)
       RUN: 1000
Your XEX has many segments, and segment 7 ($2100 to $2160) is overwriting segment 5 (the compressed musing, at $1053 to $2A4B).
  • Like 1
Link to comment
Share on other sites

This is exactly what the code does, in fact the compressor shows the compressed size for each of the 9 streams:

Sorry, I wasn't aware of that.

 

 

 ./lzss < test1.sap > test1.lzs
.....

 

Could you post this test1.sap? I would like to have a look at it and perhaps have a go at it myself.

 

Regards,

Ivo

Link to comment
Share on other sites

Your XEX has many segments, and segment 7 ($2100 to $2160) is overwriting segment 5 (the compressed musing, at $1053 to $2A4B).

 

 

OOoops Sorry, I'd ORGd the tune at $1000 and forgot that your player is ORGd at $2000, so the player ended up playing itself!

 

Verified to work again, added looping, looping perfectly!

Link to comment
Share on other sites

Sorry, I wasn't aware of that.

 

 

Could you post this test1.sap? I would like to have a look at it and perhaps have a go at it myself.

 

Regards,

Ivo

 

For ./lzss < test1.sap > test1.lzs, it's shadows.zip which I posted above.

 

my sapRLE: +-8KB

dmsc: 6.6KB

 

Are we going to end up with 3 solutions :) ?

Link to comment
Share on other sites

For ./lzss < test1.sap > test1.lzs, it's shadows.zip which I posted above.

 

my sapRLE: +-8KB

dmsc: 6.6KB

 

Are we going to end up with 3 solutions :) ?

Now let shadows.zip be the one file I didn't download :)

I have called this day "a day" already, but I'll have a look at it tomorrow.

 

Edit: ok, couldn't resist. Looks like shadows.sap is not test1.sap

$ src2/lzss < shadows.sap > shadows.lzs
WARNING: stream #6 is empty, should not be included in output!
WARNING: stream #7 is empty, should not be included in output!
WARNING: stream #8 contains only $40, should not be included in output!
Ratio: 6623 / 42759 = 15.49%
 Stream #0: 8208 bits,	21.60%,	15.49% of output
 Stream #1: 13005 bits,	34.22%,	24.55% of output
 Stream #2: 4734 bits,	12.46%,	 8.93% of output
 Stream #3: 8388 bits,	22.07%,	15.83% of output
 Stream #4: 5481 bits,	14.42%,	10.34% of output
 Stream #5: 5580 bits,	14.68%,	10.53% of output
 Stream #6: 2529 bits,	 6.65%,	 4.77% of output
 Stream #7: 2529 bits,	 6.65%,	 4.77% of output
 Stream #8: 2529 bits,	 6.65%,	 4.77% of output

value	  POS	  LEN
 0	    0	 2197
 1	 2572	    0
 2	    0	  778
 3	    8	   62
 4	    6	  348
 5	    0	   61
 6	  374	  106
 7	    0	   78
 8	    2	   23
 9	   18	   13
10	    3	  224
11	    1	   39
12	  115	   37
13	    0	    9
14	   17	   19
15	   17	   18
16	  557	    1
17	    0	 1874
Edited by ivop
Link to comment
Share on other sites

Here's coding the delta's before lzss, but only on the distortion+volume streams:

./main < ../shadows.sap > foo
WARNING: stream #6 is empty, should not be included in output!
WARNING: stream #7 is empty, should not be included in output!
WARNING: stream #8 contains only $40, should not be included in output!
Ratio: 6538 / 42759 = 15.29%
 Stream #0: 8208 bits,	21.60%,	15.69% of output
 Stream #1: 11502 bits,	30.26%,	21.99% of output
 Stream #2: 4734 bits,	12.46%,	 9.05% of output
 Stream #3: 9792 bits,	25.76%,	18.72% of output
 Stream #4: 5481 bits,	14.42%,	10.48% of output
 Stream #5: 4995 bits,	13.14%,	 9.55% of output
 Stream #6: 2529 bits,	 6.65%,	 4.84% of output
 Stream #7: 2529 bits,	 6.65%,	 4.84% of output
 Stream #8: 2529 bits,	 6.65%,	 4.84% of output

Interesting that stream #3 actually increases in size (probably the drum channel), but stream #1 and #5 decrease in size, with an overall saving of an extra 85 bytes. But it would need an extra operation during decompression.

Edited by ivop
  • Like 1
Link to comment
Share on other sites

Hi!

 

Here's coding the delta's before lzss, but only on the distortion+volume streams:

./main < ../shadows.sap > foo
WARNING: stream #6 is empty, should not be included in output!
WARNING: stream #7 is empty, should not be included in output!
WARNING: stream #8 contains only $40, should not be included in output!
Ratio: 6538 / 42759 = 15.29%
 Stream #0: 8208 bits,	21.60%,	15.69% of output
 Stream #1: 11502 bits,	30.26%,	21.99% of output
 Stream #2: 4734 bits,	12.46%,	 9.05% of output
 Stream #3: 9792 bits,	25.76%,	18.72% of output
 Stream #4: 5481 bits,	14.42%,	10.48% of output
 Stream #5: 4995 bits,	13.14%,	 9.55% of output
 Stream #6: 2529 bits,	 6.65%,	 4.84% of output
 Stream #7: 2529 bits,	 6.65%,	 4.84% of output
 Stream #8: 2529 bits,	 6.65%,	 4.84% of output
Interesting that stream #3 actually increases in size (probably the drum channel), but stream #1 and #5 decrease in size, with an overall saving of an extra 85 bytes. But it would need an extra operation during decompression.

 

Perhaps you could simply flag which streams are delta-encoded, with one byte you could cover the eight main streams.

 

Meanwhile, I did a few modifications to the code:

 

- Implemented optimal parsing. This does not help with compression as the LZSS with only one byte is so simple that greedy parsing gives the same compression.

- Change the coding of matches to make decoding smaller (and faster).

- Adds a flag to skip streams that don't vary - only for streams 1 to 8, stream 0 is always coded. If no stream is skipped, the compressed file is one byte longer, but the ASM player is 13 bytes bigger.

 

Attached are the sources, player and Linux + Windows binaries:

 

Skipping channel #8, set with $40.
Skipping channel #7, set with $00.
Skipping channel #6, set with $00.
Ratio: 5675 / 42759 = 13.27%
 Stream #0: 8208 bits,	21.60%,	18.08% of output
 Stream #1: 13005 bits,	34.22%,	28.65% of output
 Stream #2: 4734 bits,	12.46%,	10.43% of output
 Stream #3: 8388 bits,	22.07%,	18.48% of output
 Stream #4: 5481 bits,	14.42%,	12.07% of output
 Stream #5: 5580 bits,	14.68%,	12.29% of output
Have Fun!

lzss-sap-20190524.zip

  • Like 4
Link to comment
Share on other sites

- Change the coding of matches to make decoding smaller (and faster).

- Adds a flag to skip streams that don't vary - only for streams 1 to 8, stream 0 is always coded. If no stream is skipped, the compressed file is one byte

 

Gave it a shot... the player is quite a bit faster indeed. Litterally 4-5 times faster than the original RMT sometimes!

 

Only issue is that I can't seem to get it to loop 100% right with the new version... not excluding that I've done something wrong of course...

 

zip contains just obxs for all versions for a quick perf compare.

shadowsobx.zip

  • Like 1
Link to comment
Share on other sites

Hi!

 

Gave it a shot... the player is quite a bit faster indeed. Litterally 4-5 times faster than the original RMT sometimes!

 

Only issue is that I can't seem to get it to loop 100% right with the new version... not excluding that I've done something wrong of course...

 

zip contains just obxs for all versions for a quick perf compare.

The problem is that now the payer has an initialization phase that reads from the song bytes, to select which streams are available and initialize the POKEY register with proper values.

 

So, on the second run, don't re-init the streams, as this info does not change during the song, but set the song_ptr to the position after the initialization bytes. So, one posible code is:

  jsr initPlayer
  sty saved_y

start_song:
  lda get_byte+1
  pha
  lda get_byte+2
  pha

play_song:
  ldy saved_y
  jsr advancePlayer
  sty saved_y

  ; ... do other things here ...

  ; Wait for new frame
  lda 20
delay
  cmp 20
  beq delay
  ; Compare with end and loop
  lda get_byte + 2
  cmp #>song_end
  bne play_song
  lda get_byte + 1
  cmp #<song_end
  bne play_song

  ; Now, reset the song pointer to loop the song
  pla
  sta get_byte+2
  pla
  sta get_byte+1
  jmp start_song
Now, the code for "initPlayer" is from current "start" in the ASM up to just before "sap_loop", and "advancePlayer" is from "sap_loop" up to just before "delay".

 

I have a question: is there a program to convert a standard SAP to a type-R SAP?

 

Have Fun!

  • Like 1
Link to comment
Share on other sites

Now, the code for "initPlayer" is from current "start" in the ASM up to just before "sap_loop", and "advancePlayer" is from "sap_loop" up to just before "delay".

 

I have a question: is there a program to convert a standard SAP to a type-R SAP?

 

About looping, I did -some- of those things you mentioned, I'll give it another try. Thanks!

 

 

I've been wondering about SAP->SAPR myself... Only found ASAP so far:

 

http://asap.sourceforge.net/

 

There's a command line tool included which I haven't tried yet. On Win, there's a player called WASAP included in the package. If you load a song and go to info, you can save a XEX and SAPR it from Altirra.

 

Best I found was to get an RMT and assemble it, then SAPR it in Altirra and KIL on song loop to get a perfect loop :roll:

Edited by rensoup
Link to comment
Share on other sites

 

Interesting that stream #3 actually increases in size (probably the drum channel), but stream #1 and #5 decrease in size, with an overall saving of an extra 85 bytes. But it would need an extra operation during decompression.

 

I tried on a 8KB RLE compressed song and all channels but one got bigger. The one that got smaller shrunk by 500 bytes! But I couldn't be bothered.

 

Again I'm thinking I'd get better results by modifying the RLE format but I'd be trying to catch up with dmsc so not much point.

 

Perhaps larger buffer sizes in dmsc's lzss would yield good gains too? It's pretty usable for any tune under 2 minutes right now.

 

Just curious about your Sid player, are you feeding pokey at a higher frequency than 50hz ? (I'm guessing the sprites shapes in your player reflect that)

Edited by rensoup
Link to comment
Share on other sites

Just curious about your Sid player, are you feeding pokey at a higher frequency than 50hz ? (I'm guessing the sprites shapes in your player reflect that)

 

Seeing you mention sprite shapes, I guess you mean AtariSid. That one bangs three Pokey registers at 15kHz :)

 

Sid2gumby on the other hand updates Pokey at 50Hz, but uses large tables to convert from sid frequencies to pokey values. Those could be replaced with some math routines, but that would be slower than a table lookup.

Edited by ivop
  • Like 1
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...