2600 Compression redux


batari


This is actually quite different from the first topic on 2600 compression.

 

Let's suppose we have some RAM available and want to crunch a 4k bank of arbitrary 2600 code/data as much as feasible, assuming the decompressor will live in some of the space we have freed. Therefore the decompressor should be simple, but compression can be as complicated as a modern computer can handle.

 

I have analyzed various 2600 binaries, and the five most common bytes are typically A9, A5, 85, 02, and F0 (which should come as no surprise). 00, FF, and 01 also show up a lot (more so in bankswitched games). In any case, the eight most common bytes typically comprise around 25% of the total size, sometimes up to 50%.
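
For anyone who wants to reproduce this kind of byte-frequency count, a minimal Python sketch might look like the following (not the actual analysis tool; "game.bin" is a placeholder filename):

from collections import Counter

# Count byte frequencies in a 2600 image and report the most common values.
with open("game.bin", "rb") as f:
    data = f.read()

counts = Counter(data)
top8 = counts.most_common(8)
covered = sum(n for _, n in top8)

for value, n in top8:
    print(f"${value:02X}: {n} occurrences")
print(f"top 8 bytes cover {covered} of {len(data)} bytes "
      f"({100.0 * covered / len(data):.1f}%)")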

 

I have come up with a scheme based on Huffman coding. The program analyzes the binary, finds the most common 8 bytes, and sorts them by frequency. These 8 bytes form the first 8 bytes of the image as a lookup table, and then the binary is encoded. Each of the 8 bytes, when it appears, is encoded in a bitstream as 10, 110, 1110, 11110, 111110, 1111110, 11111110, or 11111111, respectively. The other 248 byte values are encoded as they appear, but with a leading zero bit to distinguish them from the shorter codes. The decompression routine should be simple and short; this one certainly would be both, albeit a little slow computationally, but that's probably fine.
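
To make the bit layout concrete, here is a rough model of the encoder side in Python - a sketch only, not the actual program, and padding of the final partial byte with zero bits is just one arbitrary choice:

from collections import Counter

def encode(data: bytes) -> bytes:
    """Sketch of the scheme above: 8-byte lookup table, then a bitstream."""
    table = [b for b, _ in Counter(data).most_common(8)]
    # Prefix codes for the table entries: 10, 110, ..., 11111110, 11111111.
    codes = {}
    for i, b in enumerate(table):
        codes[b] = [1] * (i + 1 if i < 7 else 8) + ([0] if i < 7 else [])
    bits = []
    for b in data:
        if b in codes:
            bits += codes[b]
        else:
            # Any other byte: a 0 flag bit followed by the raw 8 bits, MSB first.
            bits += [0] + [(b >> (7 - j)) & 1 for j in range(8)]
    # Pack the bitstream MSB first; the last byte is padded with zero bits.
    out = bytearray(table)
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        chunk += [0] * (8 - len(chunk))
        out.append(int("".join(map(str, chunk)), 2))
    return bytes(out)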

 

I tried it with bankswitched games first, and these are the results (compressed size in bytes for each 4K bank):

 

Asteroids: 2862, 4048

Gingerbread Man: 3810, 3817

Thrust: 3575, 3886, 3899, 4125(!)

Minigame Multicart: 2633, 2808, 3150, 3780, 2546, 3245, 3117, 3213

Medieval Mayhem: 4023, 3171, 2974, 3927, 3682, 3736, 3831, 3847

Juno First (beta 10): 3860, 3298, 3300, 3595, 3373, 3834, 2591, 2643

Strat-O-gems: 3528, 3421, 3972, 3970, 2963, 2953, 3939, 2713

(Some of you may know why I'm pleased with the 32k results, but I won't get into that now.)

 

4k games were less impressive:

Yars' Revenge: 3945

Space Invaders: 3964

Pitfall: 3896

Adventure: 3874

Gunfight: 3680

Swoops!: 4161(!!)

 

So the reason I've posted this is to open up a discussion on how compression of 2600 binaries might be improved in terms of compressed size, while balancing that against the size of the decompressor, which must be integrated into the compressed image.

18 Comments


Recommended Comments

Would it be possible to support the DEFLATE algorithm used to compress files in a ZIP archive? The algorithm is basically string matching (LZ77) combined with Huffman coding, so you are halfway there already.

 

Chris

Link to comment

Looks like some of my code is really nicely optimized (try Jammed too). :)

 

For optimal Huffman encoding, you should make the compression more flexible, e.g. count all bytes and adjust the compression accordingly. Decompression of Huffman data is really simple; I have done that in Jammed. The central loop looks like this:

.loopValue:
    asl     shift       ; 5    shift the next bit of the buffered byte into carry
    beq     .loadValue  ; 2/3  buffer exhausted -> reload the next byte of the stream
.contValue:
    rol                 ; 2    append the bit (carry) to the code accumulating in A
    inx                 ; 2    X tracks the current code length / table index
    sec                 ; 2
    sbc     SubTab,x    ; 4    subtract the table entry for this code length
    bcs     .loopValue  ; 2/3  no underflow yet -> the code needs more bits

 

The biggest problem is the table with the ordered bytes. It takes 256 bytes, which is quite a lot compared to the possible compression ratio. Though maybe the table could be optimized somehow too.

Link to comment
Would it be possible to support the DEFLATE algorithm used to compress files in a ZIP archive? The algorithm is basically string matching (LZ77) combined with Huffman coding, so you are halfway there already.

I suppose that would be overkill for most files.

 

But maybe have a look at pucrunch, which is an extremely compact (~260 bytes) implementation of what you are looking for.

 

Also have a look at xip, which is simpler and even smaller.

Link to comment
Would it be possible to support the DEFLATE algorithm used to compress files in a ZIP archive? The algorithm is basically string matching (LZ77) combined with Huffman coding, so you are halfway there already.

 

Chris

Thanks - I read that and got some ideas. LZ77 got me thinking that certain byte pairs might be common, e.g. 85 02. I'm going to analyze some binaries and see how common they really are. Maybe searching for unused byte values and using those as codes for common byte pairs would work, as every binary seems to have a few.
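
A quick way to check both things at once - how common adjacent byte pairs really are, and which byte values never occur - might be the following Python sketch ("game.bin" is a placeholder filename):

from collections import Counter

with open("game.bin", "rb") as f:
    data = f.read()

# Frequency of adjacent byte pairs, plus byte values that never appear
# (candidates for escape codes standing in for common pairs).
pairs = Counter(zip(data, data[1:]))
present = set(data)
unused = [b for b in range(256) if b not in present]

for (a, b), n in pairs.most_common(10):
    print(f"${a:02X} ${b:02X}: {n} times")
print("unused byte values:", ", ".join(f"${b:02X}" for b in unused))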

 

I also read up more on Huffman coding, and found my codes are suboptimal. I tried various types and found that I can get Swoops! down to 3649 bytes using just Huffman codes (not bad considering zipping it results in a 3625-byte file, though some of that is due to zip's headers and the like).
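
For comparison, the payload size of a truly optimal Huffman code over all byte values can be estimated with a few lines of Python. This is a sketch only; it ignores whatever table or tree overhead the real format would need:

import heapq
from collections import Counter

def huffman_payload_bits(data: bytes) -> int:
    """Total coded size in bits for an optimal Huffman code (tables excluded)."""
    counts = Counter(data)
    # Heap entries: (weight, tiebreaker, {symbol: code length so far}).
    heap = [(n, i, {sym: 0}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    lengths = heap[0][2]
    return sum(counts[s] * depth for s, depth in lengths.items())

with open("game.bin", "rb") as f:
    rom = f.read()
print((huffman_payload_bits(rom) + 7) // 8, "bytes of coded data, before tables")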

Link to comment
Looks like some of my code is really nicely optimized (try Jammed too). :)

 

For optimal Huffman encoding, you should make the compression more flexible, e.g. count all bytes and adjust the compression accordingly. Decompression of Huffman data is really simple; I have done that in Jammed. The central loop looks like this:

.loopValue:
    asl     shift       ; 5
    beq     .loadValue  ; 2/3
.contValue:
    rol                 ; 2
    inx                 ; 2
    sec                 ; 2
    sbc     SubTab,x    ; 4
    bcs     .loopValue  ; 2/3

 

The biggest problem is the table with the ordered bytes. It takes 256 bytes, which is quite a lot compared to the possible compression ratio. Though maybe the table could be optimized somehow too.

Just looked at xip; interestingly, it's doing exactly what I tried earlier today (fixed-length Huffman-like coding). I wonder if I can beat it by looking for byte pairs?

Link to comment
Looks like some of my code is really nicely optimized (try Jammed too). :)

My first attempt increases the size to 4354 bytes... Jammed indeed :D

 

However, using a xip-like method as I did above, I was able to beat .zip with this one: 3899 bytes vs. 4071 for .zip.

Link to comment
However, using a xip-like method as I did above, I was able to beat .zip with this one: 3899 bytes vs. 4071 for .zip.

Including decompression code size?

Link to comment
However, using a xip-like method as I did above, I was able to beat .zip with this one: 3899 bytes vs. 4071 for .zip.

Including decompression code size?

No, that hasn't been written yet :)

 

But presumably, a zip decompression routine would be much longer than xip. I haven't looked into the length of the xip decompressor, but I don't think it's very large.

Link to comment
But presumably, a zip decompression routine would be much longer than xip. I haven't looked into the length of the xip decompressor, but I don't think it's very large.

~60 bytes, IIRC.

Link to comment

Many 2600 games contain a fair amount of redundant data. In some cases, significant quantities of code or data are duplicated in multiple banks; in other cases, data may be duplicated to allow for sufficiently rapid access. I would think something like the following approach would be pretty good: Divide the last 4K bank (the only one to be compressed) into records, each of which starts with a type/length header and the specified follow-on data:

 

0nnnnnnn -- 1 to 128 bytes of literal data (most common record type)

 

10nnnnnn -- 2 to 65 copies of a particular data value (first byte following the record header)

 

11nnnnnn -- 2 to 65 bytes copied from the address specified in the next two bytes. Note that the specified address could be in any bank.

 

The worst-case behavior of this scheme would be to expand the code by one byte per 128 (one header byte for each full 128-byte literal record), or 32 bytes total for a 4K bank. A run of three or more identical bytes would cost 2-3 bytes. Cloning a sequence of up to 65 bytes would take 3-4 bytes.

 

Lempel-Ziv coding may also work well, but might take more work to decompress. Something like the modified run-length encoding described above should be helpful in many situations without being overly complex.
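
For what it's worth, a Python model of this record format takes only a few lines and is handy for testing a compressor against it before the 6502 decompressor exists. This is a sketch under one simplifying assumption: the two-byte source address is treated as an offset into data that is already available uncompressed, rather than being mapped through a banking scheme.

def decode_records(stream: bytes, rom: bytes) -> bytearray:
    """Model decoder for the 0nnnnnnn / 10nnnnnn / 11nnnnnn record format."""
    out = bytearray()
    i = 0
    while i < len(stream):
        header = stream[i]
        i += 1
        if header < 0x80:                   # 0nnnnnnn: 1-128 literal bytes follow
            n = (header & 0x7F) + 1
            out += stream[i:i + n]
            i += n
        elif header < 0xC0:                 # 10nnnnnn: 2-65 copies of the next byte
            n = (header & 0x3F) + 2
            out += bytes([stream[i]]) * n
            i += 1
        else:                               # 11nnnnnn: 2-65 bytes from a 2-byte address
            n = (header & 0x3F) + 2
            addr = stream[i] | (stream[i + 1] << 8)
            i += 2
            out += rom[addr:addr + n]       # simplification: flat address space
    return out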

Link to comment
Lempel-Ziv coding may also work well, but might take more work to decompress. Something like the modified run-length encoding described above should be helpful in many situations without being overly complex.

Doing it all together, we would have something like LZW here, right?

 

I did an experiment yesterday with my games folder and compressed each 4K file with ZIP into its own archive. On average(!) the IMO better games take more space, while prototypes, WIP and bad games take less. Sizes were between 1.8k and 4.2k.

Link to comment
Many 2600 games contain a fair amount of redundant data. In some cases, significant quantities of code or data are duplicated in multiple banks; in other cases, data may be duplicated to allow for sufficiently rapid access. I would think something like the following approach would be pretty good: Divide the last 4K bank (the only one to be compressed) into records, each of which starts with a type/length header and the specified follow-on data:

 

0nnnnnnn -- 1 to 128 bytes of literal data (most common record type)

 

10nnnnnn -- 2 to 65 copies of a particular data value (first byte following the record header)

 

11nnnnnn -- 2 to 65 bytes copied from the address specified in the next two bytes. Note that the specified address could be in any bank.

 

The worst-case behavior of this scheme would be to expand the code by one byte per 128, or 32 bytes total. A run of three or more identical bytes would cost 2-3 bytes. Cloning a sequence of up to 65 bytes would take 3-4 bytes.

Then I think it would make more sense to do 4-67 for 10nnnnnn, and 5-68 for 11nnnnnn, as there would be no point in changing modes unless there was actual gain.

 

I like this idea so far, and unless my initial tests are incorrect, it's the most efficient of the bunch. I'll have to write an actual compressor and decompressor now to see if my compression test results were indeed accurate.

Link to comment
Then I think it would make more sense to do 4-67 for 10nnnnnn, and 5-68 for 11nnnnnn, as there would be no point in changing modes unless there was actual gain.

But you don't know what is coming next. If there is a mode change anyway, you'd lose one byte in compression. 3-66 and 4-67 seem best.

Link to comment
Then I think it would make more sense to do 4-67 for 10nnnnnn, and 5-68 for 11nnnnnn, as there would be no point in changing modes unless there was actual gain.

But you don't know what is coming next. If there is a mode change anyway, you'd lose one byte in compression. 3-66 and 4-67 seem best.

I think that changing modes will result in a very small gain in exchange for slower decompression. I'd think that 99% of the time, changing modes would simply break even space-wise.

 

I successfully implemented this and it already beats all the other schemes, and I didn't even implement the repeating-byte record - I found that 0 for literal and 1 for copy was all I needed.

 

EDIT: Tried 5-68 and 4-67, and 4-67 resulted in a savings of around 1% as I predicted, but I counted the number of mode switches and that number is unchanged on the 4 binaries I tried, which kind of surprised me. So 4-67 it is (actually, I'm using 4-131, as there are just two modes).
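
As a model of this two-mode variant, the following sketch makes an assumption about the exact layout: a one-byte header whose top bit selects literal vs. copy and whose low seven bits hold the length minus 4 (giving the 4-131 range), with copy records followed by a two-byte source address as in the original proposal.

def decode_two_mode(stream: bytes, rom: bytes) -> bytearray:
    """Model decoder for the literal/copy-only variant with 4-131 lengths."""
    out = bytearray()
    i = 0
    while i < len(stream):
        header = stream[i]
        i += 1
        n = (header & 0x7F) + 4             # length field: 4-131
        if header & 0x80:                   # copy n bytes from a 2-byte address
            addr = stream[i] | (stream[i + 1] << 8)
            i += 2
            out += rom[addr:addr + n]
        else:                               # n literal bytes follow
            out += stream[i:i + n]
            i += n
    return out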

Link to comment
I successfully implemented this and it already beats all the other schemes, and I didn't even implement the repeating-byte record - I found that 0 for literal and 1 for copy was all I needed.

 

Depending upon the exact implementation of the compressor/decompressor, you could represent a repeating byte sequence using four bytes: the byte value, then a 'copy' code for the number of repetitions minus one, then the address of the first byte.
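
That trick works provided the decompressor copies byte by byte from data it has already written out, so an overlapping source simply replicates the byte - e.g. in a Python model:

# Overlapping copy: the source overlaps the output being produced, so copying
# one byte at a time repeats the value.  (A slice-based copy would not do this.)
out = bytearray(b"\x00")          # the byte value, emitted as a literal
src, n = 0, 7                     # "copy" record: 7 repetitions, source = that byte
for k in range(n):
    out.append(out[src + k])      # each read sees a byte this loop just wrote
print(out.hex())                  # 0000000000000000  (eight zero bytes)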

Link to comment
Depending upon the exact implementation of the compressor/decompressor, you could represent a repeating byte sequence using four bytes: the byte value, then a 'copy' code for the number of repetitions minus one, then the address of the first byte.

Or you could define a relative window for the copy source (e.g. up to 4k back). Then you would only need e.g. 12 bits for the address.

Link to comment
Depending upon the exact implementation of the compressor/decompressor, you could represent a repeating byte sequence using four bytes: the byte value, then a 'copy' code for the number of repetitions minus one, then the address of the first byte.

Or you could define a relative window for the copy source (e.g. up to 4k back). Then you would only need e.g. 12 bits for the address.

 

Data may be repeated in any bank. Often at the same address within a bank but not always. A "short repeat" code may be helpful, but there are always tradeoffs.

Link to comment
Data may be repeated in any bank. Often at the same address within a bank but not always. A "short repeat" code may be helpful, but there are always tradeoffs.

Sure. But why reserve 16 bits for a 13-bit ROM? :)

Link to comment