newcoleco Posted May 29, 2016 (edited)

After 2 years of studying data compression and testing various ways to achieve it, I've made my own new format named "DAN1". This new format needs no extra memory and has reasonable decompression speed on 8-bit systems. The following table compares data sizes after various compressions of a 12K sample Graphic Mode II screen.

Exomizer v2 : 3337
DAN2 -m 11 : 3416
DAN1 : 3419
MegaLZ : 3524
aPLib : 3543
ZX7 : 3551
Pletter : 3557
BitBuster : 3595
PkK's RLE+Huffman : 3867
PuCrunch : 4000
DAN0 : 4143
Marcel DeKogel's RLE : 4528

At its core, DAN1 is an LZSS variant using 4 different fixed sizes for the offset values in < length, offset > pairs; in comparison, ZX7 uses only 2 fixed sizes, and Exomizer uses variable sizes for the offset, depending on its rarity more than its value.

Edited January 6, 2017 by newcoleco
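DAN1 and ZX7 both belong to the LZSS family that encodes match lengths with Elias gamma codes, so short matches cost only a few bits. As a rough illustration of the coding itself (not DAN1's exact bitstream, which interleaves these bits with literal bytes and whose details the author has not yet published), here is classic Elias gamma in Python:

```python
def elias_gamma_bits(n: int) -> str:
    # Classic Elias gamma code for n >= 1:
    # k zeros followed by the (k+1)-bit binary form of n,
    # where k = floor(log2(n)).
    if n < 1:
        raise ValueError("Elias gamma encodes integers >= 1")
    binary = bin(n)[2:]                      # e.g. 5 -> '101'
    return "0" * (len(binary) - 1) + binary

# Small values get very short codes:
#   1 -> '1' (1 bit), 2 -> '010' (3 bits), 5 -> '00101' (5 bits)
```

Because the code is prefix-free, a decoder can count leading zero bits to learn the value's width before reading it, which is what ZX7's length-decoding loop does bit by bit.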
nitrofurano Posted May 30, 2016

After a few more hours of net surfing, I found this gem named ZX7 from the ZX Spectrum scene. The decompression routine is only 69 bytes long, but it decompresses into RAM, not VRAM. That aside, I think it's an incredible achievement. With the test case "ColecoVision Strip Poker title screen", I got these results:

BitBuster : 3595
Pletter : 3557
ZX7 : 3551

; -----------------------------------------------------------------------------
; ZX7 decoder by Einar Saukas, Antonio Villena & Metalbrain
; "Standard" version (69 bytes only)
; -----------------------------------------------------------------------------
; Parameters:
;   HL: source address (compressed data)
;   DE: destination address (decompressing)
; -----------------------------------------------------------------------------
dzx7_standard:
        ld      a, $80
dzx7s_copy_byte_loop:
        ldi                              ; copy literal byte
dzx7s_main_loop:
        call    dzx7s_next_bit
        jr      nc, dzx7s_copy_byte_loop ; next bit indicates either literal or sequence
; determine number of bits used for length (Elias gamma coding)
        push    de
        ld      bc, 0
        ld      d, b
dzx7s_len_size_loop:
        inc     d
        call    dzx7s_next_bit
        jr      nc, dzx7s_len_size_loop
; determine length
dzx7s_len_value_loop:
        call    nc, dzx7s_next_bit
        rl      c
        rl      b
        jr      c, dzx7s_exit            ; check end marker
        dec     d
        jr      nz, dzx7s_len_value_loop
        inc     bc                       ; adjust length
; determine offset
        ld      e, (hl)                  ; load offset flag (1 bit) + offset value (7 bits)
        inc     hl
        defb    $cb, $33                 ; opcode for undocumented instruction "SLL E" aka "SLS E"
        jr      nc, dzx7s_offset_end     ; if offset flag is set, load 4 extra bits
        ld      d, $10                   ; bit marker to load 4 bits
dzx7s_rld_next_bit:
        call    dzx7s_next_bit
        rl      d                        ; insert next bit into D
        jr      nc, dzx7s_rld_next_bit   ; repeat 4 times, until bit marker is out
        inc     d                        ; add 128 to DE
        srl     d                        ; retrieve fourth bit from D
dzx7s_offset_end:
        rr      e                        ; insert fourth bit into E
; copy previous sequence
        ex      (sp), hl                 ; store source, restore destination
        push    hl                       ; store destination
        sbc     hl, de                   ; HL = destination - offset - 1
        pop     de                       ; DE = destination
        ldir
dzx7s_exit:
        pop     hl                       ; restore source address (compressed data)
        jr      nc, dzx7s_main_loop
dzx7s_next_bit:
        add     a, a                     ; check next bit
        ret     nz                       ; no more bits left?
        ld      a, (hl)                  ; load another group of 8 bits
        inc     hl
        rla
        ret

I guess that replacing the LDIR with a kind of "ldirvm" routine, as in the MSX BIOS (it is actually a simple routine; you can find it in my ColecoVision/SG-1000 sources at Boriel's ZX-Basic Compiler wiki, under "released programs"), would do the task of decompressing to VRAM instead of RAM.
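One detail worth keeping in mind before swapping LDIR for an OUT-based VRAM loop: LDIR copies one byte at a time, which is what lets a match whose offset is smaller than its length replicate bytes that were just written. Any replacement copy loop must preserve that behavior. Here is a minimal Python model of the copy (illustrative only, not part of ZX7):

```python
def lzss_copy(out: bytearray, offset: int, length: int) -> None:
    # Copy `length` bytes starting `offset` bytes back from the end
    # of `out`, one byte at a time, so overlapping runs
    # (offset < length) replicate correctly -- the same behavior
    # as the Z80 LDIR used by the decoder above.
    src = len(out) - offset
    for i in range(length):
        out.append(out[src + i])

buf = bytearray(b"ab")
lzss_copy(buf, 2, 6)   # offset 2, length 6
# buf is now b"abababab": the two seed bytes repeated
```

A block copy that read the whole source region before writing would get this case wrong, which is why a byte-by-byte loop is the safe way to stream the match into VRAM as well.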
newcoleco Posted May 30, 2016 (edited)

> i guess that replacing the ldir with a kind of "ldirvm" routine, as from msx bios (it is actually a simple routine, you can find on my colecovision/sg1000 sources at Boriel's ZX-Basic Compiler wiki, in the "released programs"), would do the task of decompressing to vram instead of ram

I don't know much about the MSX BIOS routines, and the ZX7-to-RAM routine uses pretty much all register pairs, the accumulator, and the flags for various tasks; that's why I can only say "maybe". I've made my own version of the ZX7 decompression routine that writes directly into VRAM (see an earlier post for its source code).

Edited May 30, 2016 by newcoleco
nitrofurano Posted June 1, 2016

The routine sg1000ldirvm that I use on ColecoVision is this one (the IX register usage comes from Boriel's ZX-Basic Compiler; it works exactly like LDIR):

sg1000ldirvm.bas:

sub sg1000ldirvm(tlen as uinteger, tvram as uinteger, tram as uinteger):
asm
        ld b,(ix+5)
        ld c,(ix+4)
        ld d,(ix+7)
        ld e,(ix+6)
        ld h,(ix+9)
        ld l,(ix+8)
ldirmv:
        di
        ld a,e
        out ($bf),a
        ld a,d
        or 040h          ; bit 6 set = VRAM write address
        out ($bf),a
ldirmv_loop:
        ld a,(hl)
        out ($be),a
        inc hl
        dec bc
        ld a,b
        or c
        jr nz,ldirmv_loop
        ei
end asm
end sub

'-------------------------------------------------------------------------------
'- LDIRVM (msx)
'- Address  : #005C
'- Function : Block transfer to VRAM from memory
'- Input    : BC - block length
'-            DE - Start address of VRAM
'-            HL - Start address of memory
'- Registers: All
nitrofurano Posted June 1, 2016

This one? http://atariage.com/forums/topic/215500-data-compression-test-on-simple-bitmap-256x192/?p=3507010 - thanks!
einar Posted June 1, 2016 (edited)

Hi,

My name is Einar Saukas, I'm the original author of ZX7. I'm glad to know people have been using it here, thanks for the interest! However, I'm afraid there's some misunderstanding I would like to correct:

> Side note: The ZX7 compression tool written by the original author does a quick pass and compresses data based on the ZX7 specifications, which I can explain if someone asks for it. However, this tool was written as a proof of concept but NOT as the best compression tool possible for this format, to achieve even better compression results than what people are getting now. This situation is easy to understand if we compare with modern ZIP tools, which are not optimized for the best compression possible at the cost of taking hours, but are good enough for a nice user experience achievable in seconds or very few minutes.

This information is wrong. The ZX7 compression tool is "optimal", which means it produces the best compression results possible for this format. The documentation file explicitly says it. If you read anywhere a comment that suggested otherwise, please let me know so I can post a clarification there too. The compression tool runs very fast because it implements a faster algorithm I developed myself, but it doesn't sacrifice compression ratio to achieve this performance.

> In fact, after studying a lot about compression during my quest of making my own new compression format, reading various scientific papers on the subject and also watching the amusing video series by Google titled Compressor Head, I'm happy to say that after all the headaches, I was able to make my own compression tool based on ZX7 specifications and get the best results out of it.

I'm fairly confident my ZX7 compressor is optimal, thus it should never be possible to produce an even smaller result for any file using the same format. However, I recognize I'm not infallible; there's always a chance I made a mistake somewhere. Could you please provide one example where you believe you got a better result? If I really made a mistake, I would very much appreciate the chance to learn about it!

> Starting from there, I was able to develop my own format and compare fairly with the ZX7 format; the results show better compression overall at the cost of a slower decompression runtime, but still with no need for extra RAM space like Exomizer.

Sounds interesting! I'm curious about your format too. Where can I find it?

When I designed ZX7, I was aiming for the best tradeoff between compression ratio on most typical cases, decompression speed and decompressor size, while restricting myself to "pure" LZ77/LZSS. I'm aware that different file formats (and consequently different algorithm strategies) could potentially provide better results in one of these characteristics by sacrificing others, and it's always interesting to see different algorithms and the choices they made.

Edited June 1, 2016 by einar
newcoleco Posted June 1, 2016 (edited)

Indeed, ZX7 is a tradeoff between compression ratio and decompression speed and size, because it reads bytes most of the time instead of individual bits.

It looks like there was a misunderstanding on my part regarding your proof of concept not providing optimal results. When I tried to repeat the experiment by rewriting in Java the ZX7 data compression written in C, my results were different from those of the optimal algorithm. Clearly, I made a mistake.
One paper compared Elias gamma with Elias gamma on delta and so on, and its conclusion is that Elias gamma with lazy LZSS gives the best results; this paper confused me at first about the "lazy" LZSS, but I decided not to investigate further... maybe I should.

Journey to the DAN1 format

My goal is to make graphics fit in my ColecoVision projects. The biggest graphics are bitmap screens made of two 6K tables, for patterns and colors, for a total of 12K; charsets are also quite big and can certainly benefit from good compression.

In LZSS compression, offset values are usually the biggest part of the data, and just by changing to more suitable sizes for offsets you get better compression at the cost of speed. For example, all the Graphic Mode II bitmap screens I've tested, from a simple logo to pixel art (ZX7, MSX1) to heavily dithered pictures, fall into two categories, simple and elaborate: 4-bit and 9-bit offsets are best for simple bitmaps (a simple logo), and 4-bit and 11- or 12-bit offsets for elaborate bitmaps (the vast majority of ColecoVision bitmap title screens). Since the biggest challenge for my CV projects is to make the graphics fit, I had fun adapting ZX7 to use these 4/9 and 4/11 offsets to get better compression, but still couldn't get close to Exomizer results. I also started to write a paper in PDF about these results, and even wrote a presentation last year for the ADAMCon, but I decided to keep going with my R&D before freezing my new format.

This year, I've discovered that 4 fixed sizes for offset values are the best fit overall (for graphics and text compression), providing compression results closer to Exomizer without the need for extra memory. And since the waiting time before seeing a cool title screen doesn't really matter, I decided to adopt 4 sizes to encode the offsets.

Table 1 - Size of compressed data on a 12K bitmap sample obtained by using one, two, three, four, five or six sizes to encode the offset values.
ONE : 11 bits = 2961
ZX7 : 7 and 11 bits = 2901
TWO : 8 and 11 bits = 2841
THREE : 4, 8 and 11 bits = 2814
FOUR : 1, 4, 8 and 11 bits = 2798
FIVE : 1, 3, 5, 8 and 11 bits = 2804
SIX : 1, 2, 3, 5, 8 and 11 bits = 2823

Based on these results, no more than 4 different sizes for offset values is best. According to the paper about variants of gamma encoding in LZSS optimization mentioned earlier, plain Elias gamma with lazy LZSS is best; but since I don't fully understand what that paper means by "lazy", I just go with what I know. So I started to encode my new format DAN1 as an LZSS variant with Elias gamma and 4 different sizes for offset values. Because I knew some people would want to use this format on more than just simple bitmap title screens, DAN1 also avoids constantly adding a flag bit for each literal byte in a long sequence of literals (compare RLE compression).

Table 2 - Size of different routines used in various ColecoVision projects to decompress to VRAM:

RLE = 43
DAN0 = 97 (RLE with literals encoded into a fixed Huffman)
ZX7 = 131 (LZSS with 7- and 11-bit offsets)
DAN1 = 220* (LZSS with 1-, 4-, 8- and 12-bit offsets and RLE for uncompressed literals)
PLETTER = 222

Table 3 - Size of compressed data of a 12K bitmap sample:

RLE = 4528
DAN0 = 4143
ZX7 = 3551
DAN1 = 3419*
PLETTER = 3557

Table 4 - Size of decompression routine plus compressed data:

RLE = 4571
DAN0 = 4240
ZX7 = 3682
DAN1 = 3639*
PLETTER = 3779

DAN1 results are marked with * because the format is still in development, not yet used in a ColecoVision project. I'll finish my tests and then present my results during ADAMCon 2016, July 14-18.

Edited June 1, 2016 by newcoleco
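The search for the best set of offset widths in Table 1 can be brute-forced. The sketch below is a hypothetical reconstruction of that kind of experiment, not DAN1's actual tooling: it assumes each offset is prefixed with a unary class selector (index + 1 bits), which may well differ from the real format's header coding.

```python
from itertools import combinations

def offset_cost(offsets, sizes):
    # Total bits to encode each offset with the smallest size class
    # that fits it, plus a unary class selector.  Assumed cost model,
    # not the published DAN1 bitstream.
    total = 0
    for off in offsets:
        for idx, bits in enumerate(sorted(sizes)):
            if off < (1 << bits):
                total += (idx + 1) + bits
                break
        else:
            raise ValueError(f"offset {off} too large for sizes {sizes}")
    return total

def best_split(offsets, n_classes, max_bits=12):
    # Exhaustively try every combination of n_classes distinct widths
    # and keep the cheapest one for this offset distribution.
    return min(combinations(range(1, max_bits + 1), n_classes),
               key=lambda sizes: offset_cost(offsets, sizes))
```

Run over the offsets produced by an actual LZSS parse of a sample file, a search like this would reproduce the "4 classes is the sweet spot" observation for that file's offset distribution.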
nitrofurano Posted June 1, 2016 (edited)

My first attempt, not working... :S - https://www.dropbox.com/s/khpkdp5g03jddg9/colecovision_dzx7_borielzxbasiccompiler_notworking_20160601.zip

I really have no idea what is happening, or what is wrong and where. I tried both the version from @newcoleco (which I tried to convert into a Boriel's ZX-Basic Compiler library routine), and replacing the LDIR in @einar's routine with a ColecoVision/SG-1000 version of "call ldirvm" - neither works as I expected. I used this picture for the test:

Edited June 1, 2016 by nitrofurano
newcoleco Posted June 2, 2016 (edited)

> my first attempt, not working... :S

I've looked at your ROM file very quickly and here is what I saw inside dzx7.rom:

960A: CD 57 96   CALL Something
    : .. .. ..   <- Missing LD HL, SOURCE ADDR IN ROM
    : ..         <- Missing PUSH HL
960D: 21 00 00   LD HL, $0000 = DESTINATION IN VRAM
9610: E5         PUSH HL
9611: 21 2C 80   LD HL, $802C = ? is this the data source?
9614: CD 53 97   CALL DZX7

DZX7: C1         POP BC
    : D1         POP DE  <- Destination addr. $0000 = GOOD
    : E1         POP HL  <- Source addr. $0000 = BAD, invalid since the source address was never pushed
    : E5         PUSH HL
    : D5         PUSH DE
    : C5         PUSH BC

If you want to keep the way you call the dzx7 function, not pushing the source address but instead having it already in the HL register pair, remove the unnecessary POP HL and PUSH HL opcodes from the DZX7 routine and it should work. Give it a try and have fun!

Edited June 2, 2016 by newcoleco
nitrofurano Posted June 2, 2016 (edited)

@newcoleco thanks! It's working now! https://www.dropbox.com/s/8juj36b271725j6/colecovision_dzx7_borielzxbasiccompiler_working_20160602.zip (now it's a slideshow with 5 pictures for testing)

I just needed to replace

pop bc
pop de
pop hl
push hl
push de
push bc

with

pop bc  ;- RET address
pop de  ;- DE=dst
push bc ;- restore RET address

It's really amazing! Thanks again (and thanks @einar as well, of course! It's a really great compression method: 60kb of pictures into less than 29kb)

Edited June 2, 2016 by nitrofurano
einar Posted June 2, 2016

> It looks like there was a misunderstanding on my part regarding your proof of concept not providing optimal results. When I tried to repeat the experiment by rewriting in Java the ZX7 data compression written in C, my results were different from those of the optimal algorithm. Clearly, I made a mistake.

No problem! Thanks for the confirmation, I'm glad my ZX7 compressor is not flawed.

> One paper compared Elias gamma with Elias gamma on delta and so on, and its conclusion is that Elias gamma with lazy LZSS gives the best results; this paper confused me at first about the "lazy" LZSS, but I decided not to investigate further... maybe I should.

Elias Gamma and Elias Delta are very similar. The difference is, Elias Gamma provides slightly shorter encodings for small numbers, and Elias Delta provides slightly shorter encodings for large numbers. Therefore LZSS compression works better with Elias Delta when expecting a higher percentage of very long matches (such as large files with lots of repetitions using a large sliding compression window), and Elias Gamma works better otherwise. There's a good image in Wikipedia to illustrate this idea: https://commons.wikimedia.org/wiki/File:Fibonacci,_Elias_Gamma,_and_Elias_Delta_encoding_schemes.GIF

I'm not sure the paper conclusion you mentioned makes much sense, since this is not supposed to be much related to the matching algorithm used. It depends almost exclusively on the characteristics of the data to be compressed. For 8-bit machines, which typically handle small data blocks (only a few Kb or less), Elias Gamma is certainly the best choice.

> This year, I've discovered that 4 fixed sizes for offset values are the best fit overall (for graphics and text compression), providing compression results closer to Exomizer without the need for extra memory.
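The gamma/delta crossover described above is easy to check by comparing code lengths. A small illustrative sketch using the standard length formulas (nothing ZX7- or DAN1-specific):

```python
def gamma_len(n: int) -> int:
    # Elias gamma code length: 2*floor(log2 n) + 1 bits
    return 2 * (n.bit_length() - 1) + 1

def delta_len(n: int) -> int:
    # Elias delta: gamma-code the bit-length of n, then append
    # n's remaining (bit_length - 1) bits
    k = n.bit_length()
    return gamma_len(k) + (k - 1)

# Gamma wins for small n, delta wins for large n:
#   n=2:    gamma 3 bits,  delta 4 bits
#   n=1000: gamma 19 bits, delta 16 bits
```

For the short match lengths typical of the small data blocks on 8-bit machines, nearly all encoded values fall on the "gamma wins" side of the crossover.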
> And since the waiting time before seeing a cool title screen doesn't really matter, I decided to adopt 4 sizes to encode the offsets.

It seems you are focused on compressing text and full-screen graphics only. There's obviously nothing wrong with producing a compressor specialized in certain data types; a specialized compressor would be very useful too. However, if you intend to produce a generic compressor, you had better make sure your samples include other kinds of data too. Otherwise you will probably make choices that benefit compression of certain data types only, but sacrifice others.

> According to a paper about variants of gamma encoding in LZSS optimization, as mentioned earlier, normal Elias gamma with lazy LZSS is best. But since I don't fully understand what this paper means by "lazy", I just go with what I know.

In match-based compression like LZSS, "greedy matching" (also known as "naive matching") means: at each step, just look for the next longest match and take it. In comparison, "lazy matching" means: at each step, look at the next pair of matches and take the next match from the longest pair. A more detailed explanation is provided here: http://fastcompression.blogspot.com/2010/12/parsing-level-1-lazy-matching.html

Everybody assumes it would be too "time consuming" to analyze all possible combinations of matches in order to achieve the best possible compression. For this reason, existing algorithms only look 1 (greedy) or 2 (lazy) matches ahead. But they are wrong. Just look at ZX7...

> I started to encode my new format DAN1 as an LZSS variant with Elias gamma and 4 different sizes for offset values. Because I knew some people would want to use this format on more than just simple bitmap title screens, DAN1 also avoids constantly adding a flag bit for each literal byte in a long sequence of literals (compare RLE compression).

Combining LZ and RLE is a good idea. PuCrunch is also based on this idea. Anyway, your results are very promising.
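The greedy-versus-lazy distinction can be made concrete with a toy parser. This sketch is illustrative only (a brute-force search, nothing like ZX7's optimal parse): it emits a literal whenever the match starting one byte later would be strictly longer.

```python
def longest_match(data, pos, window=2048, min_len=2):
    # Longest back-reference at `pos`; returns (length, offset) or None.
    best = None
    for cand in range(max(0, pos - window), pos):
        length = 0
        while (pos + length < len(data)
               and data[cand + length] == data[pos + length]):
            length += 1
        if length >= min_len and (best is None or length > best[0]):
            best = (length, pos - cand)
    return best

def lazy_parse(data):
    # Greedy would take every match it finds; lazy emits a literal
    # instead whenever the match one byte later is strictly longer.
    out, pos = [], 0
    while pos < len(data):
        match = longest_match(data, pos)
        nxt = longest_match(data, pos + 1) if match else None
        if match and not (nxt and nxt[0] > match[0]):
            out.append(("match", match[0], match[1]))
            pos += match[0]
        else:
            out.append(("lit", data[pos]))
            pos += 1
    return out
```

On `b"abcabcabc"` this yields three literals followed by a single overlapping match of length 6 at offset 3, the same shape of parse the Z80 decoders in this thread reproduce with LDIR.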
I will be looking forward to your final version!

I think the most useful advice I can give you is: if you concentrate on optimizing your algorithm against a few sample files, you will obtain the best possible algorithm specialized in compressing those sample files only, instead of the best possible algorithm for most typical files. The only way to avoid this "trap" is, from time to time, to add more files to your test sample in order to validate your conclusions.
einar Posted June 2, 2016 (edited)

> (and thanks @einar as well, of course! it's a really great compression method! 60kb of pictures into less than 29kb)

You are welcome!!!

Edited June 2, 2016 by einar
newcoleco Posted June 2, 2016 (edited)

> It seems you are focused on compressing text and full-screen graphics only. There's obviously nothing wrong with producing a compressor specialized in certain data types; a specialized compressor would be very useful too. However, if you intend to produce a generic compressor, you had better make sure your samples include other kinds of data too. Otherwise you will probably make choices that benefit compression of certain data types only, but sacrifice others.

True. My goal is to reach great compression results for the data used in ColecoVision projects, and most of that is whatever goes to VRAM. However, DAN1 is all-purpose compression if you know what you are doing with it.

My favorite 8-bit console, the ColecoVision, has 1K of RAM and 16K of VRAM, and my objective is to push the limits of what I can fit in the 32K ROM cartridge, to exploit this beautiful vintage game system without needing RAM space for decompression. This is the challenge I've accepted in making ColecoVision homebrew games: to do great things within the capacities of the console. Data compression has been part of my Coleco tools since the beginning in 1999; it matters a great deal for my goal, and I'm happy to sacrifice some years of my life and some CPU time to achieve it.

Like in ZX7 and Exomizer, all literals in DAN1 are stored raw to speed things up, but Huffman encoding can be applied in some circumstances (see DEFLATE). I plan to use my fixed Huffman encoding from DAN0 to exploit this potential for extra byte savings in the future, again without using extra RAM space.

As to being all-purpose or not, it is true that my main focus is what I need in VRAM for my projects, and I based my decisions on that fact because it's important to me. However, the format itself is not specific to one kind of data, since it's a variant of LZSS.

Table 5 - Applying DAN1 to a small 24K ZIP file
Original ZIP file made of 12K bitmap samples: 24583 bytes
DAN1 compression of the ZIP file: 24543* bytes
ZX7 compression of the ZIP file: 26815 bytes
ZIP file of the ZIP file: 24527 bytes
Packed data only from the ZIP of the ZIP file: 24288 bytes

DAN1 isn't as strong as DEFLATE (probably due to the lack of Huffman encoding), but it can be used for general purposes if you know what you are doing. The difference between the DAN1 and ZX7 results here is partly due to the implementation of RLE sequences, which takes about 10 lines of assembly in the decompression routine.

I've coded a simple CLI (command-line) tool to compress into DAN1 format. It handles files up to 64K; I use it for data up to 16K since my focus is what goes into VRAM. I'm sure a clever programmer could make a tool supporting an input stream without a size limit.

Again, I'm taking this project seriously. DAN1 is a work-in-progress format and I'm not ready to release it yet.

Edited June 2, 2016 by newcoleco
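The RLE-for-literals gain mentioned above can be estimated with a quick cost model. This is a hypothetical back-of-the-envelope sketch, since the exact DAN1 bitstream is unpublished: it compares one flag bit per literal (ZX7-style) against one flag plus an Elias-gamma run count for a whole literal run.

```python
def per_literal_bits(run: int) -> int:
    # ZX7-style: one flag bit before every literal byte
    return run * (1 + 8)

def run_coded_bits(run: int) -> int:
    # Run-coded idea described in the post: one flag plus an
    # Elias-gamma run count, then the raw bytes.  Assumed cost
    # model; the real DAN1 coding may differ.
    gamma = 2 * (run.bit_length() - 1) + 1
    return 1 + gamma + run * 8

# A 100-byte literal run: 900 bits flag-per-literal versus
# 814 bits with a run count, i.e. 86 bits (~11 bytes) saved.
```

Note that a lone literal costs one extra bit under run coding (10 bits instead of 9), which is exactly the kind of tradeoff a format mixing LZ and RLE has to balance.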
newcoleco Posted June 4, 2016 (edited)

Worked on the DAN1 compression tool again today [stop] Patched a memory leak found after testing DAN1 with the test1.bin, test2.bin and test3.bin files from the Exomizer archive [stop] Improved compression speed for the special case TEST2.BIN [stop] Quite pleased with the results

Table 6 - TEST1.BIN (sound sample, .WAV)

Source : 202762 bytes
Exomizer : 202601 bytes
DAN1 : 204280* bytes
ZX7 : 225280 bytes

Table 7 - TEST2.BIN (string made of the letter 'q')

Source : 196608 bytes
Exomizer : 39 bytes
DAN1 : 18* bytes
ZX7 : 19 bytes

Table 8 - TEST3.BIN (documentation text file)

Source : 111261 bytes
Exomizer : 34219 bytes
DAN1 : 48103* bytes
ZX7 : 52035 bytes

* Still work-in-progress, may change.

Manually checked the TEST2.BIN result to be correct according to the DAN1 format [stop] Need to work on the decompression tool.

Edited June 4, 2016 by newcoleco
newcoleco Posted June 8, 2016

> @newcoleco thanks! it's working now!

I have checked your listing again and I noticed that you forgot to disable NMI during decompression. This causes the glitches and artefacts in the picture, and answers your post on another forum about the glitches you were seeing.
newcoleco Posted June 9, 2016 (edited)

Here's my version of the slideshow, in ZX7, DAN1 and a variant DAN1C. An extra bitmap picture can certainly be added to the project within the 32K limit.

Download the complete folder including compiled binaries and source code for all these slideshows: SLIDESHOW.zip

Edited June 19, 2016 by newcoleco
nitrofurano Posted July 27, 2016

Excellent, @newcoleco! Thanks!!!! (and sorry again for the late reply!)