newcoleco Posted May 29, 2016 (edited)

After 2 years of studying data compression and testing various ways to achieve it, I've made my own new format named "DAN1". This new format needs no extra memory and has reasonable decompression speed on 8-bit systems. The following table compares data sizes after various compressions of a 12K sample Graphic Mode II screen.

Exomizer v2 : 3337
DAN2 -m 11 : 3416
DAN1 : 3419
MegaLZ : 3524
aPLib : 3543
ZX7 : 3551
Pletter : 3557
BitBuster : 3595
PkK's RLE+Huffman : 3867
PuCrunch : 4000
DAN0 : 4143
Marcel DeKogel's RLE : 4528

At its core, DAN1 is an LZSS variant using 4 different fixed sizes for the offset values in < length, offset > pairs; in comparison, ZX7 uses only 2 fixed sizes, and Exomizer uses variable sizes for the offset, depending on its rarity more than its value.

Edited January 6, 2017 by newcoleco
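DAN1 and ZX7 both belong to the LZSS family that encodes match lengths with Elias gamma codes, so short matches cost only a few bits. As a rough illustration of the coding itself (not DAN1's exact bitstream, which interleaves these bits with literal bytes and whose details the author has not yet published), here is classic Elias gamma in Python:

```python
def elias_gamma_bits(n: int) -> str:
    # Classic Elias gamma code for n >= 1:
    # k zeros followed by the (k+1)-bit binary form of n,
    # where k = floor(log2(n)).
    if n < 1:
        raise ValueError("Elias gamma encodes integers >= 1")
    binary = bin(n)[2:]                      # e.g. 5 -> '101'
    return "0" * (len(binary) - 1) + binary

# Small values get very short codes:
#   1 -> '1' (1 bit), 2 -> '010' (3 bits), 5 -> '00101' (5 bits)
```

Because the code is prefix-free, a decoder can count leading zero bits to learn the value's width before reading it, which is what ZX7's length-decoding loop does bit by bit.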
nitrofurano Posted May 30, 2016

After a few more hours of net surfing, I found this gem named ZX7 from the ZX Spectrum scene. The decompression routine is only 69 bytes long, but it decompresses into RAM, not VRAM. That aside, I think it's an incredible achievement. With the test case "ColecoVision Strip Poker title screen", I got these results:

BitBuster : 3595
Pletter : 3557
ZX7 : 3551

; -----------------------------------------------------------------------------
; ZX7 decoder by Einar Saukas, Antonio Villena & Metalbrain
; "Standard" version (69 bytes only)
; -----------------------------------------------------------------------------
; Parameters:
;   HL: source address (compressed data)
;   DE: destination address (decompressing)
; -----------------------------------------------------------------------------
dzx7_standard:
        ld      a, $80
dzx7s_copy_byte_loop:
        ldi                              ; copy literal byte
dzx7s_main_loop:
        call    dzx7s_next_bit
        jr      nc, dzx7s_copy_byte_loop ; next bit indicates either literal or sequence
; determine number of bits used for length (Elias gamma coding)
        push    de
        ld      bc, 0
        ld      d, b
dzx7s_len_size_loop:
        inc     d
        call    dzx7s_next_bit
        jr      nc, dzx7s_len_size_loop
; determine length
dzx7s_len_value_loop:
        call    nc, dzx7s_next_bit
        rl      c
        rl      b
        jr      c, dzx7s_exit            ; check end marker
        dec     d
        jr      nz, dzx7s_len_value_loop
        inc     bc                       ; adjust length
; determine offset
        ld      e, (hl)                  ; load offset flag (1 bit) + offset value (7 bits)
        inc     hl
        defb    $cb, $33                 ; opcode for undocumented instruction "SLL E" aka "SLS E"
        jr      nc, dzx7s_offset_end     ; if offset flag is set, load 4 extra bits
        ld      d, $10                   ; bit marker to load 4 bits
dzx7s_rld_next_bit:
        call    dzx7s_next_bit
        rl      d                        ; insert next bit into D
        jr      nc, dzx7s_rld_next_bit   ; repeat 4 times, until bit marker is out
        inc     d                        ; add 128 to DE
        srl     d                        ; retrieve fourth bit from D
dzx7s_offset_end:
        rr      e                        ; insert fourth bit into E
; copy previous sequence
        ex      (sp), hl                 ; store source, restore destination
        push    hl                       ; store destination
        sbc     hl, de                   ; HL = destination - offset - 1
        pop     de                       ; DE = destination
        ldir
dzx7s_exit:
        pop     hl                       ; restore source address (compressed data)
        jr      nc, dzx7s_main_loop
dzx7s_next_bit:
        add     a, a                     ; check next bit
        ret     nz                       ; no more bits left?
        ld      a, (hl)                  ; load another group of 8 bits
        inc     hl
        rla
        ret

I guess that replacing the LDIR with a kind of "ldirvm" routine, as in the MSX BIOS (it is actually a simple routine; you can find it in my ColecoVision/SG-1000 sources at Boriel's ZX-Basic Compiler wiki, under "released programs"), would do the task of decompressing to VRAM instead of RAM.
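One detail worth keeping in mind before swapping LDIR for an OUT-based VRAM loop: LDIR copies one byte at a time, which is what lets a match whose offset is smaller than its length replicate bytes that were just written. Any replacement copy loop must preserve that behavior. Here is a minimal Python model of the copy (illustrative only, not part of ZX7):

```python
def lzss_copy(out: bytearray, offset: int, length: int) -> None:
    # Copy `length` bytes starting `offset` bytes back from the end
    # of `out`, one byte at a time, so overlapping runs
    # (offset < length) replicate correctly -- the same behavior
    # as the Z80 LDIR used by the decoder above.
    src = len(out) - offset
    for i in range(length):
        out.append(out[src + i])

buf = bytearray(b"ab")
lzss_copy(buf, 2, 6)   # offset 2, length 6
# buf is now b"abababab": the two seed bytes repeated
```

A block copy that read the whole source region before writing would get this case wrong, which is why a byte-by-byte loop is the safe way to stream the match into VRAM as well.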
newcoleco Posted May 30, 2016 (edited)

> i guess that replacing the ldir with a kind of "ldirvm" routine, as from msx bios (it is actually a simple routine, you can find on my colecovision/sg1000 sources at Boriel's ZX-Basic Compiler wiki, in the "released programs"), would do the task of decompressing to vram instead of ram

I don't know much about the MSX BIOS routines, and the ZX7-to-RAM routine uses pretty much all register pairs, the accumulator, and the flags for various tasks; that's why I can only say "maybe". I've made my own version of the ZX7 decompression routine that writes directly into VRAM (see an earlier post for its source code).

Edited May 30, 2016 by newcoleco
nitrofurano Posted June 1, 2016

The routine sg1000ldirvm that I use on ColecoVision is this one (the IX register usage comes from Boriel's ZX-Basic Compiler; it works exactly like LDIR):

sg1000ldirvm.bas:

sub sg1000ldirvm(tlen as uinteger, tvram as uinteger, tram as uinteger):
asm
        ld b,(ix+5)
        ld c,(ix+4)
        ld d,(ix+7)
        ld e,(ix+6)
        ld h,(ix+9)
        ld l,(ix+8)
ldirmv:
        di
        ld a,e
        out ($bf),a
        ld a,d
        or 040h          ; bit 6 set = VRAM write address
        out ($bf),a
ldirmv_loop:
        ld a,(hl)
        out ($be),a
        inc hl
        dec bc
        ld a,b
        or c
        jr nz,ldirmv_loop
        ei
end asm
end sub

'-------------------------------------------------------------------------------
'- LDIRVM (msx)
'- Address  : #005C
'- Function : Block transfer to VRAM from memory
'- Input    : BC - block length
'-            DE - Start address of VRAM
'-            HL - Start address of memory
'- Registers: All
nitrofurano Posted June 1, 2016

This one? http://atariage.com/forums/topic/215500-data-compression-test-on-simple-bitmap-256x192/?p=3507010 - thanks!
einar Posted June 1, 2016 (edited)

Hi,

My name is Einar Saukas, I'm the original author of ZX7. I'm glad to know people have been using it here, thanks for the interest! However, I'm afraid there's some misunderstanding I would like to correct:

> Side note: The ZX7 compression tool written by the original author does a quick pass and compresses data based on the ZX7 specifications, which I can explain if someone asks for it. However, this tool was written as a proof of concept but NOT as the best compression tool possible for this format, to achieve even better compression results than what people are getting now. This situation is easy to understand if we compare with modern ZIP tools, which are not optimized for the best compression possible at the cost of taking hours, but are good enough for a nice user experience achievable in seconds or very few minutes.

This information is wrong. The ZX7 compression tool is "optimal", which means it produces the best compression results possible for this format. The documentation file explicitly says it. If you read anywhere a comment that suggested otherwise, please let me know so I can post a clarification there too. The compression tool runs very fast because it implements a faster algorithm I developed myself, but it doesn't sacrifice compression ratio to achieve this performance.

> In fact, after studying a lot about compression during my quest of making my own new compression format, reading various scientific papers on the subject and also watching the amusing video series by Google titled Compressor Head, I'm happy to say that after all the headaches, I was able to make my own compression tool based on ZX7 specifications and get the best results out of it.

I'm fairly confident my ZX7 compressor is optimal, thus it should never be possible to produce an even smaller result for any file using the same format. However, I recognize I'm not infallible; there's always a chance I made a mistake somewhere. Could you please provide one example where you believe you got a better result? If I really made a mistake, I would very much appreciate the chance to learn about it!

> Starting from there, I was able to develop my own format and compare fairly with the ZX7 format; the results show better compression overall at the cost of a slower decompression runtime, but still with no need for extra RAM space like Exomizer.

Sounds interesting! I'm curious about your format too. Where can I find it?

When I designed ZX7, I was aiming for the best tradeoff between compression ratio on most typical cases, decompression speed and decompressor size, while restricting myself to "pure" LZ77/LZSS. I'm aware that different file formats (and consequently different algorithm strategies) could potentially provide better results in one of these characteristics by sacrificing others, and it's always interesting to see different algorithms and the choices they made.

Edited June 1, 2016 by einar
newcoleco Posted June 1, 2016 (edited)

Indeed, ZX7 is a tradeoff between compression ratio and decompression speed and size, because it reads bytes most of the time instead of individual bits.

It looks like there was a misunderstanding on my part regarding your proof of concept not providing optimal results. When I tried to repeat the experiment by rewriting in Java the ZX7 data compression written in C, my results were different from those of the optimal algorithm. Clearly, I made a mistake.
One paper compared Elias gamma with Elias gamma on delta and so on, and its conclusion is that Elias gamma with lazy LZSS gives the best results; this paper confused me at first about the "lazy" LZSS, but I decided not to investigate further... maybe I should.

Journey to the DAN1 format

My goal is to make graphics fit in my ColecoVision projects. The biggest graphics are bitmap screens made of two 6K tables, for patterns and colors, for a total of 12K; charsets are also quite big and can certainly benefit from good compression.

In LZSS compression, offset values are usually the biggest part of the data, and just by changing to more suitable sizes for offsets you get better compression at the cost of speed. For example, all the Graphic Mode II bitmap screens I've tested, from a simple logo to pixel art (ZX7, MSX1) to heavily dithered pictures, fall into two categories, simple and elaborate: 4-bit and 9-bit offsets are best for simple bitmaps (a simple logo), and 4-bit and 11- or 12-bit offsets for elaborate bitmaps (the vast majority of ColecoVision bitmap title screens). Since the biggest challenge for my CV projects is to make the graphics fit, I had fun adapting ZX7 to use these 4/9 and 4/11 offsets to get better compression, but still couldn't get close to Exomizer results. I also started to write a paper in PDF about these results, and even wrote a presentation last year for the ADAMCon, but I decided to keep going with my R&D before freezing my new format.

This year, I've discovered that 4 fixed sizes for offset values are the best fit overall (for graphics and text compression), providing compression results closer to Exomizer without the need for extra memory. And since the waiting time before seeing a cool title screen doesn't really matter, I decided to adopt 4 sizes to encode the offsets.

Table 1 - Size of compressed data on a 12K bitmap sample obtained by using one, two, three, four, five or six sizes to encode the offset values.
ONE : 11 bits = 2961
ZX7 : 7 and 11 bits = 2901
TWO : 8 and 11 bits = 2841
THREE : 4, 8 and 11 bits = 2814
FOUR : 1, 4, 8 and 11 bits = 2798
FIVE : 1, 3, 5, 8 and 11 bits = 2804
SIX : 1, 2, 3, 5, 8 and 11 bits = 2823

Based on these results, no more than 4 different sizes for offset values is best. According to the paper about variants of gamma encoding in LZSS optimization mentioned earlier, plain Elias gamma with lazy LZSS is best; but since I don't fully understand what that paper means by "lazy", I just go with what I know. So I started to encode my new format DAN1 as an LZSS variant with Elias gamma and 4 different sizes for offset values. Because I knew some people would want to use this format on more than just simple bitmap title screens, DAN1 also avoids constantly adding a flag bit for each literal byte in a long sequence of literals (compare RLE compression).

Table 2 - Size of different routines used in various ColecoVision projects to decompress to VRAM:

RLE = 43
DAN0 = 97 (RLE with literals encoded into a fixed Huffman)
ZX7 = 131 (LZSS with 7- and 11-bit offsets)
DAN1 = 220* (LZSS with 1-, 4-, 8- and 12-bit offsets and RLE for uncompressed literals)
PLETTER = 222

Table 3 - Size of compressed data of a 12K bitmap sample:

RLE = 4528
DAN0 = 4143
ZX7 = 3551
DAN1 = 3419*
PLETTER = 3557

Table 4 - Size of decompression routine plus compressed data:

RLE = 4571
DAN0 = 4240
ZX7 = 3682
DAN1 = 3639*
PLETTER = 3779

DAN1 results are marked with * because the format is still in development, not yet used in a ColecoVision project. I'll finish my tests and then present my results during ADAMCon 2016, July 14-18.

Edited June 1, 2016 by newcoleco
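The search for the best set of offset widths in Table 1 can be brute-forced. The sketch below is a hypothetical reconstruction of that kind of experiment, not DAN1's actual tooling: it assumes each offset is prefixed with a unary class selector (index + 1 bits), which may well differ from the real format's header coding.

```python
from itertools import combinations

def offset_cost(offsets, sizes):
    # Total bits to encode each offset with the smallest size class
    # that fits it, plus a unary class selector.  Assumed cost model,
    # not the published DAN1 bitstream.
    total = 0
    for off in offsets:
        for idx, bits in enumerate(sorted(sizes)):
            if off < (1 << bits):
                total += (idx + 1) + bits
                break
        else:
            raise ValueError(f"offset {off} too large for sizes {sizes}")
    return total

def best_split(offsets, n_classes, max_bits=12):
    # Exhaustively try every combination of n_classes distinct widths
    # and keep the cheapest one for this offset distribution.
    return min(combinations(range(1, max_bits + 1), n_classes),
               key=lambda sizes: offset_cost(offsets, sizes))
```

Run over the offsets produced by an actual LZSS parse of a sample file, a search like this would reproduce the "4 classes is the sweet spot" observation for that file's offset distribution.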
nitrofurano Posted June 1, 2016 (edited)

My first attempt, not working... :S - https://www.dropbox.com/s/khpkdp5g03jddg9/colecovision_dzx7_borielzxbasiccompiler_notworking_20160601.zip

I really have no idea what is happening, or what is wrong and where. I tried both the version from @newcoleco (which I tried to convert into a Boriel's ZX-Basic Compiler library routine), and replacing the LDIR in @einar's routine with a ColecoVision/SG-1000 version of "call ldirvm" - neither works as I expected. I used this picture for the test:

Edited June 1, 2016 by nitrofurano
newcoleco Posted June 2, 2016 (edited)

> my first attempt, not working... :S

I've looked at your ROM file very quickly and here is what I saw inside dzx7.rom:

960A: CD 57 96   CALL Something
    : .. .. ..   <- Missing LD HL, SOURCE ADDR IN ROM
    : ..         <- Missing PUSH HL
960D: 21 00 00   LD HL, $0000 = DESTINATION IN VRAM
9610: E5         PUSH HL
9611: 21 2C 80   LD HL, $802C = ? is this the data source?
9614: CD 53 97   CALL DZX7

DZX7: C1         POP BC
    : D1         POP DE  <- Destination addr. $0000 = GOOD
    : E1         POP HL  <- Source addr. $0000 = BAD, invalid since the source address was never pushed
    : E5         PUSH HL
    : D5         PUSH DE
    : C5         PUSH BC

If you want to keep the way you call the dzx7 function, not pushing the source address but instead having it already in the HL register pair, remove the unnecessary POP HL and PUSH HL opcodes from the DZX7 routine and it should work. Give it a try and have fun!

Edited June 2, 2016 by newcoleco
nitrofurano Posted June 2, 2016 (edited)

@newcoleco thanks! It's working now! https://www.dropbox.com/s/8juj36b271725j6/colecovision_dzx7_borielzxbasiccompiler_working_20160602.zip (now it's a slideshow with 5 pictures for testing)

I just needed to replace

pop bc
pop de
pop hl
push hl
push de
push bc

with

pop bc  ;- RET address
pop de  ;- DE=dst
push bc ;- restore RET address

It's really amazing! Thanks again (and thanks @einar as well, of course! It's a really great compression method: 60kb of pictures into less than 29kb)

Edited June 2, 2016 by nitrofurano
einar Posted June 2, 2016

> It looks like there was a misunderstanding on my part regarding your proof of concept not providing optimal results. When I tried to repeat the experiment by rewriting in Java the ZX7 data compression written in C, my results were different from those of the optimal algorithm. Clearly, I made a mistake.

No problem! Thanks for the confirmation, I'm glad my ZX7 compressor is not flawed.

> One paper compared Elias gamma with Elias gamma on delta and so on, and its conclusion is that Elias gamma with lazy LZSS gives the best results; this paper confused me at first about the "lazy" LZSS, but I decided not to investigate further... maybe I should.

Elias Gamma and Elias Delta are very similar. The difference is, Elias Gamma provides slightly shorter encodings for small numbers, and Elias Delta provides slightly shorter encodings for large numbers. Therefore LZSS compression works better with Elias Delta when expecting a higher percentage of very long matches (such as large files with lots of repetitions using a large sliding compression window), and Elias Gamma works better otherwise. There's a good image in Wikipedia to illustrate this idea: https://commons.wikimedia.org/wiki/File:Fibonacci,_Elias_Gamma,_and_Elias_Delta_encoding_schemes.GIF

I'm not sure the paper conclusion you mentioned makes much sense, since this is not supposed to be much related to the matching algorithm used. It depends almost exclusively on the characteristics of the data to be compressed. For 8-bit machines, which typically handle small data blocks (only a few Kb or less), Elias Gamma is certainly the best choice.

> This year, I've discovered that 4 fixed sizes for offset values are the best fit overall (for graphics and text compression), providing compression results closer to Exomizer without the need for extra memory.
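The gamma/delta crossover described above is easy to check by comparing code lengths. A small illustrative sketch using the standard length formulas (nothing ZX7- or DAN1-specific):

```python
def gamma_len(n: int) -> int:
    # Elias gamma code length: 2*floor(log2 n) + 1 bits
    return 2 * (n.bit_length() - 1) + 1

def delta_len(n: int) -> int:
    # Elias delta: gamma-code the bit-length of n, then append
    # n's remaining (bit_length - 1) bits
    k = n.bit_length()
    return gamma_len(k) + (k - 1)

# Gamma wins for small n, delta wins for large n:
#   n=2:    gamma 3 bits,  delta 4 bits
#   n=1000: gamma 19 bits, delta 16 bits
```

For the short match lengths typical of the small data blocks on 8-bit machines, nearly all encoded values fall on the "gamma wins" side of the crossover.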
> And since the waiting time before seeing a cool title screen doesn't really matter, I decided to adopt 4 sizes to encode the offsets.

It seems you are focused on compressing text and full-screen graphics only. There's obviously nothing wrong with producing a compressor specialized in certain data types; a specialized compressor would be very useful too. However, if you intend to produce a generic compressor, you had better make sure your samples include other kinds of data too. Otherwise you will probably make choices that benefit compression of certain data types only, but sacrifice others.

> According to a paper about variants of gamma encoding in LZSS optimization, as mentioned earlier, normal Elias gamma with lazy LZSS is best. But since I don't fully understand what this paper means by "lazy", I just go with what I know.

In match-based compression like LZSS, "greedy matching" (also known as "naive matching") means: at each step, just look for the next longest match and take it. In comparison, "lazy matching" means: at each step, look at the next pair of matches and take the next match from the longest pair. A more detailed explanation is provided here: http://fastcompression.blogspot.com/2010/12/parsing-level-1-lazy-matching.html

Everybody assumes it would be too "time consuming" to analyze all possible combinations of matches in order to achieve the best possible compression. For this reason, existing algorithms only look 1 (greedy) or 2 (lazy) matches ahead. But they are wrong. Just look at ZX7...

> I started to encode my new format DAN1 as an LZSS variant with Elias gamma and 4 different sizes for offset values. Because I knew some people would want to use this format on more than just simple bitmap title screens, DAN1 also avoids constantly adding a flag bit for each literal byte in a long sequence of literals (compare RLE compression).

Combining LZ and RLE is a good idea. PuCrunch is also based on this idea. Anyway, your results are very promising.
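The greedy-versus-lazy distinction can be made concrete with a toy parser. This sketch is illustrative only (a brute-force search, nothing like ZX7's optimal parse): it emits a literal whenever the match starting one byte later would be strictly longer.

```python
def longest_match(data, pos, window=2048, min_len=2):
    # Longest back-reference at `pos`; returns (length, offset) or None.
    best = None
    for cand in range(max(0, pos - window), pos):
        length = 0
        while (pos + length < len(data)
               and data[cand + length] == data[pos + length]):
            length += 1
        if length >= min_len and (best is None or length > best[0]):
            best = (length, pos - cand)
    return best

def lazy_parse(data):
    # Greedy would take every match it finds; lazy emits a literal
    # instead whenever the match one byte later is strictly longer.
    out, pos = [], 0
    while pos < len(data):
        match = longest_match(data, pos)
        nxt = longest_match(data, pos + 1) if match else None
        if match and not (nxt and nxt[0] > match[0]):
            out.append(("match", match[0], match[1]))
            pos += match[0]
        else:
            out.append(("lit", data[pos]))
            pos += 1
    return out
```

On `b"abcabcabc"` this yields three literals followed by a single overlapping match of length 6 at offset 3, the same shape of parse the Z80 decoders in this thread reproduce with LDIR.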
I will be looking forward to your final version!

I think the most useful advice I can give you is: if you concentrate on optimizing your algorithm against a few sample files, you will obtain the best possible algorithm specialized in compressing those sample files only, instead of the best possible algorithm for most typical files. The only way to avoid this "trap" is, from time to time, to add more files to your test sample in order to validate your conclusions.
einar Posted June 2, 2016 (edited)

> (and thanks @einar as well, of course! it's a really great compression method! 60kb of pictures into less than 29kb)

You are welcome!!!

Edited June 2, 2016 by einar
newcoleco Posted June 2, 2016 (edited)

> It seems you are focused on compressing text and full-screen graphics only. There's obviously nothing wrong with producing a compressor specialized in certain data types; a specialized compressor would be very useful too. However, if you intend to produce a generic compressor, you had better make sure your samples include other kinds of data too. Otherwise you will probably make choices that benefit compression of certain data types only, but sacrifice others.

True. My goal is to reach great compression results for the data used in ColecoVision projects, and most of that is whatever goes to VRAM. However, DAN1 is all-purpose compression if you know what you are doing with it.

My favorite 8-bit console, the ColecoVision, has 1K of RAM and 16K of VRAM, and my objective is to push the limits of what I can fit in the 32K ROM cartridge, to exploit this beautiful vintage game system without needing RAM space for decompression. This is the challenge I've accepted in making ColecoVision homebrew games: to do great things within the capacities of the console. Data compression has been part of my Coleco tools since the beginning in 1999; it matters a great deal for my goal, and I'm happy to sacrifice some years of my life and some CPU time to achieve it.

Like in ZX7 and Exomizer, all literals in DAN1 are stored raw to speed things up, but Huffman encoding can be applied in some circumstances (see DEFLATE). I plan to use my fixed Huffman encoding from DAN0 to exploit this potential for extra byte savings in the future, again without using extra RAM space.

As to being all-purpose or not, it is true that my main focus is what I need in VRAM for my projects, and I based my decisions on that fact because it's important to me. However, the format itself is not specific to one kind of data, since it's a variant of LZSS.

Table 5 - Applying DAN1 to a small 24K ZIP file
Original ZIP file made of 12K bitmap samples: 24583 bytes
DAN1 compression of the ZIP file: 24543* bytes
ZX7 compression of the ZIP file: 26815 bytes
ZIP file of the ZIP file: 24527 bytes
Packed data only from the ZIP of the ZIP file: 24288 bytes

DAN1 isn't as strong as DEFLATE (probably due to the lack of Huffman encoding), but it can be used for general purposes if you know what you are doing. The difference between the DAN1 and ZX7 results here is partly due to the implementation of RLE sequences, which takes about 10 lines of assembly in the decompression routine.

I've coded a simple CLI (command-line) tool to compress into DAN1 format. It handles files up to 64K; I use it for data up to 16K since my focus is what goes into VRAM. I'm sure a clever programmer could make a tool supporting an input stream without a size limit.

Again, I'm taking this project seriously. DAN1 is a work-in-progress format and I'm not ready to release it yet.

Edited June 2, 2016 by newcoleco
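The RLE-for-literals gain mentioned above can be estimated with a quick cost model. This is a hypothetical back-of-the-envelope sketch, since the exact DAN1 bitstream is unpublished: it compares one flag bit per literal (ZX7-style) against one flag plus an Elias-gamma run count for a whole literal run.

```python
def per_literal_bits(run: int) -> int:
    # ZX7-style: one flag bit before every literal byte
    return run * (1 + 8)

def run_coded_bits(run: int) -> int:
    # Run-coded idea described in the post: one flag plus an
    # Elias-gamma run count, then the raw bytes.  Assumed cost
    # model; the real DAN1 coding may differ.
    gamma = 2 * (run.bit_length() - 1) + 1
    return 1 + gamma + run * 8

# A 100-byte literal run: 900 bits flag-per-literal versus
# 814 bits with a run count, i.e. 86 bits (~11 bytes) saved.
```

Note that a lone literal costs one extra bit under run coding (10 bits instead of 9), which is exactly the kind of tradeoff a format mixing LZ and RLE has to balance.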
newcoleco Posted June 4, 2016 (edited)

Worked on the DAN1 compression tool again today [stop] Patched a memory leak found after testing DAN1 with the test1.bin, test2.bin and test3.bin files from the Exomizer archive [stop] Improved compression speed for the special case TEST2.BIN [stop] Quite pleased with the results

Table 6 - TEST1.BIN (sound sample, .WAV)

Source : 202762 bytes
Exomizer : 202601 bytes
DAN1 : 204280* bytes
ZX7 : 225280 bytes

Table 7 - TEST2.BIN (string made of the letter 'q')

Source : 196608 bytes
Exomizer : 39 bytes
DAN1 : 18* bytes
ZX7 : 19 bytes

Table 8 - TEST3.BIN (documentation text file)

Source : 111261 bytes
Exomizer : 34219 bytes
DAN1 : 48103* bytes
ZX7 : 52035 bytes

* Still work-in-progress, may change.

Manually checked the TEST2.BIN result to be correct according to the DAN1 format [stop] Need to work on the decompression tool.

Edited June 4, 2016 by newcoleco
newcoleco Posted June 8, 2016

> @newcoleco thanks! it's working now!

I have checked your listing again and I noticed that you forgot to disable NMI during decompression. This causes the glitches and artefacts in the picture, and answers your post on another forum about the glitches you were seeing.
newcoleco Posted June 9, 2016 (edited)

Here's my version of the slideshow, in ZX7, DAN1 and a variant DAN1C. An extra bitmap picture can certainly be added to the project within the 32K limit.

Download the complete folder including compiled binaries and source code for all these slideshows: SLIDESHOW.zip

Edited June 19, 2016 by newcoleco
nitrofurano Posted July 27, 2016

Excellent, @newcoleco! Thanks!!!! (and sorry again for the late reply!)