Jump to content
IGNORED

DAN1 Compression Format


Recommended Posts

This thread is about DAN1 data compression.

I would love to get feedback before and after the final release.

 

Name : DAN1

 

Author : Daniel Bienvenu aka NewColeco

 

Start Year : 2014

 

End Year : 2016

 

Description : LZSS variant data compression format using Elias gamma length values, four fixed offset sizes, and raw parts (literal as is, see RLE).

 

Objective : Great compression ratio no extra memory needed during decompression. Can be used on vintage 8-bit systems.

 

 

Details :

 

DAN1 is a format based on LZSS (a variant of LZ77) which attempts compression by finding matches from already seen data. These matches are encoded as a pair of values : size and relative offset. Size is encoded by using Elias gamma, Offset is encoded in 1,4,8 or 12 bits depending on its value. When no match is found for some particular data, it is stored raw, literals. Bits are used as markers in DAN1 format to identify between literals and matches, and between the different sizes of offsets. The first byte is literal by default since no already seen data exists, no need for a marker. An end marker is used to tell the decompressor to stop regardless if there are even more data in the stream or in memory. A special marker is used to tell the decompressor to copy as-is a sequence of bytes all at once. Since the offset sizes are hard coded and no dynamic tables such as adaptive Huffman are used in this format, no extra memory is needed to decompress the data.

 

 

Data format (short text version) :

  • The first byte is literal
  • The following bytes are bits indicating what is next, reading from the highest bit first.
  • If it's a bit set to 1, the next byte is literal
  • If it's a bit reset to 0, the next bits (and byte) are values for size and relative offset.
  • The size is encoded using Elias gamma format : 1 = 1, 010 = 2, 011 = 3, 00100 = 4, 00101 = 5, 00110 = 6, 00111 = 7, etc.
  • The offset is encoded first with bits marker telling which format size (1, 4, 8 or 12 bits) is the encoded offset then the value itself.
  • RLE marker is a sequence of 17 bits reset to 0 followed by 1 bit set to 1, the number of literals to copy is 27 more than the value of the following byte.
  • END marker is a sequence of 18 bits reset to 0.

 

Offset encoding details :

 

The offset is relative to where to write upcoming data; if the offset value is zero then what we have to write next is what we just wrote as the last byte. Because the size value can be one, meaning a match of a single byte found near, the encoded offset should not be encoded too big in order to save bits, denying relative offset value to be in 8 or 12 bits. For a match of only 2 bytes, a relative offset value in 12 bits doesn't provide any real saving. So here's a table explaining the way offsets are written in DAN1.

  • If the size value is 1, marker bit 0 means the offset is encoded in one bit, marker bit 1 means the offset value (minus 2) is encoded in 4 bits. 0 0 = offset of zero (last byte we just wrote), 0 1 (before the last byte we just wrote), 1 0000 ("before before" the last byte we just wrote), 1 0001, ..., 1 1111.
  • If the size value is 2, marker bit 1 means the offset value (minus 18) is encoded in a byte (8 bits), marker bits 01 means the offset value (minus 2) is encoded in 4 bits, and marker bits 00 means the offset value is encoded in 1 bit.
  • If the size value is greater than 2, marker bit 1 means the offset value (minus 274) is encoded in 4 bits and a byte (12 bits), marker bits 01 means the offset value (minus 18) is encoded in a byte (8 bits), marker bits 001 means the offset value (minus 2) is encoded in 4 bits, marker 000 means the offset value is encoded in 1 bit.

 

Download :

 

DAN1 (EXE, SRC, ASM) Beta version 20160715 : dan1-beta-20160715.zip

 

Older versions :

DAN1 (EXE file) Beta version 20160712 : dan1-beta-20160712.zip

DAN1 (EXE file) Beta version 20160710 (all-in-one tool) : dan1-beta-20160710.zip

DAN1 (EXE file) Beta version 20160704 (compression tool only) : dan1-beta-20160704.zip

DAN1 (EXE files) Alpha version 20160618 (limited to file size up to 512K) : dan1-alpha-20160618.zip

DAN1 (EXE files) Alpha version 20160604 (limited to file size up to 256K) : dan1.zip

 

Usage :

 

Compression (beta version)

command-line :> dan1.exe [options] filename

options include : -r to activate rle encoding, -v to verbose, -y to automaticaly say yes to overwrite, -o to specify an output filename

Compression (alpha version)

command-line :> dan1.exe filename

 

Decompression

command-line :> ddan1.exe filename

 

Z80 Assembly Code to decompress in VRAM (ColecoVision version) * 20160715 - fixed RLE bug and optimized for size *

; DAN1 + RLE to VRAM
; 2016 Daniel Bienvenu, Canada

; HL = Source
; DE = Destination (in VRAM)

dan1rvram:
	; Set Write in VRAM at DE
	ld		c, $BF
	out		(c), e
	set		6, d
	out		(c), d
	res		6, d
	; Init. Read bits
	ld		a, $80

; Copy literal byte
dan1r_copy_byte:
	ld		b, $01
        jr		dan1r_literals2main

dan1r_special:
	pop		de
	call		getbit
	ret		nc        ; EXIT
	ld		b, (hl) ; Load counter value as a byte
	inc		hl
	inc		b
	call		dan1r_literals
	ld		b, $1A
dan1r_literals2main:
	call		dan1r_literals

; Main loop
dan1r_main_loop:
	call		getbit                    ; Check next bit
	jr		c,dan1r_copy_byte
	
; Elias gamma decoding + Special marker
        push	de
        ld		de, $0001
        ld		b,d	
dan1r_eliasgamma_0:
        inc		b
	call		getbit			; Check next bit
	jr		c, dan1r_eliasgamma_value
	bit		4,b
        jr		nz, dan1r_special	; Special marker "0000000000000000"
	jr		dan1r_eliasgamma_0
dan1r_eliasgamma_value_loop:
        call		getbite			; Load next bit into DE
        rl		d
dan1r_eliasgamma_value:
	djnz		dan1r_eliasgamma_value_loop
	push	de
	pop		bc			; BC = LENGTH
	
; Get Offset value
	ld		d, $00		; Reset Offset to $0000
	
; ON LEN GOTO TWO_OFFSETS, THREE_OFFSETS
; GOTO FOUR_OFFSETS

	ex		af,af'
	ld		a,b
	or		a
	jr		z, dan1r_bzero
	ld		a, $03
dan1r_bzero:
	or		c
	ld		e, a
	ex		af, af'
	dec		e
	jr		z, dan1r_offset2
	dec		e
	jr		z, dan1r_offset3
	ld		e, d

dan1r_offset4:
	call		getbit			; Check next bit
	jr		nc, dan1r_offset3
        call		getnibblee			; Get next nibble -> E
	inc		e
	ld		d,e				; D = E + 1
	jr		dan1r_offset3a
dan1r_offset3:
	call		getbit			; Check next bit
	jr		nc, dan1r_offset2
dan1r_offset3a:
        ld     	e, (hl)			; Load offset as a byte (8 bits) -> E
        inc   	hl
	ex		af, af'
	ld		a,e
	add		a, $12
	ld		e,a
	jr		nc, dan1r_offset3b
	inc		d
dan1r_offset3b:
	ex		af,af'
	jr		dan1r_copy_from_offset
dan1r_offset2:
	call		getbit			; Check next bit
	jr      	nc, dan1r_offset1
        call		getnibblee			; Get next nibble -> E
	inc		e
	inc		e
	jr		dan1r_copy_from_offset
dan1r_offset1:
	call		getbit			; Load next bit -> E
	rl		e
	
; Copy previously seen bytes
dan1r_copy_from_offset:
        ex		(sp), hl                ; Store source, Restore destination
        push	hl                              ; Store destination
	scf
        sbc		hl, de                  ; HL = source = destination - offset - 1
        pop		de                      ; DE = destination
	                                        ; BC = count
	; COPY BYTES
	ex		af,af'
	set		6,d
dan1r_copybytes_loop:
	push	bc
	ld		c, $BF
	out		(c), l
	nop
	out		(c), h
	inc		hl
	nop
	nop
	in		a, ($BE)
	nop
	nop
	nop
	out		(c), e
	nop
	out		(c), d
	inc		de
	nop
	nop
	out		($BE), a
	pop		bc
	dec		bc
	ld		a,b
	or		c
	jr		nz, dan1r_copybytes_loop
	res		6,d
	ex		af,af'
        pop		hl		; Restore source address
        jp		dan1r_main_loop
	
dan1r_literals:
	ld		c, $BE
dan1r_literals_loop:
	outi
	inc		de
	jr		nz, dan1r_literals_loop
	ret

getnibblee:
        call	getbite	; Load next bit -> E
        call	getbite	; Load next bit -> E
        call	getbite	; Load next bit -> E
getbite:
        call	getbit	; Load next bit -> E
        rl      e	
	ret

getbit:
	add		a, a
  	ret		nz
	ld		a, (hl)
	inc		hl
	rla
	ret
 

Example :

 

Here is a file encoded in DAN1 format ( in hex values ) : 71 00 00 FF FF 00 00 0F FF F0 00 00 FF FF 02 00 00 00

Input : 71 00 00 FF FF 00 00 0F FF F0 00 00 FF FF 02 00 00 00
Input size : 18 bytes
Decode : 71 < size = 65535 , offset in a bit = 0 > < size = 65535 , offset in a bit = 0 > < size = 65535 , offset in a bit = 0 > < size = 2 , offset in a bit = 0> < END marker >
Output : 71 71 71 71 71 71 ... 71
Output size : 196608 bytes
Details : 196608 times the letter 'q' (see test2.bin file in Exomizer ZIP archive)
Edited by newcoleco
  • Like 4
Link to comment
Share on other sites

  • 1 month later...

DAN1 Compression tool version Beta 20160710 seems stable and perform at a good speed.

Decompression routine is a bit slow in its small version and that's expected ( size vs speed dilema ).

Overall results surpass or equal the majority of available data compression for 8bit systems.

Mission Accomplished.

 

I do a speech during ADAMCon July 15-17, 2016, Guelph, Ontario, Canada.

Any questions? remark? comment?

  • Like 4
Link to comment
Share on other sites

What's your license terms? I could include it in my libti99coleco but my library is PD.

You probably know I like sharing my tools and games for decades, making them available freely, so what's the good licence terms for what I do believe is right?

 

Is freeware good? or perhaps gpl? or is lgpl applicable? i don't know. But what I do know is I make it available and hope it helps retro gaming enthusiasts like you and me.

Edited by newcoleco
Link to comment
Share on other sites

Over my head but still cool!

 

And that's part of my challenge right now as I write my speech for this week-end to make this new information somewhat clear, less abstract.

 

Please ask your questions if any.

Edited by newcoleco
Link to comment
Share on other sites

You probably know I like sharing my tools and games for decades, making them available freely, so what's the good licence terms for what I do believe is right?

 

Is freeware good? or perhaps gpl? or is lgpl applicable? i don't know. But what I do know is I make it available and hope it helps retro gaming enthusiasts like you and me.

 

I don't believe with most of the various drives to formalize free software (which I why my library went straight to public domain - I don't owe anything on it and I don't control its usage at all ;) ). But that's why I need to ask before I can include your code, if I put it into my library then the same thing happens to that code (not to your ownership of the protocol or any future versions, of course ;) ).

 

So entirely up to you! But I'd like to include the decoder and also port it across to the TI-99 for that version of my lib. :)

  • Like 1
Link to comment
Share on other sites

Oh, but I would add that GPL and LGPL should be applied with care. GPL is an "infectious" license which places demands on anyone using your code -- it would force anything using Dan1 to ALSO be GPL, and not everyone wants to release their source. LGPL is okay only if your code is distributed as a library. In source form as you list above, it would be the same as GPL. Licenses like "BSD" are a little more free in terms of freedom, IMO. :) Personally I don't use any of them except in cases where GPL has forced me to use it. :)

  • Like 1
Link to comment
Share on other sites

I tried to use it in the upcomming CollectorVision: Star Command game, I used it to compress 16kb vram down to around 1600 bytes. I took very long time to compress, Using an I5 CPU. And the CPU did not use more then 33% all the time.

 

I dont know if the decompressor is quicker then the one I usely use, but it works good and most importend no use of RAM.

 

So I think from now one I will use the DAN1 if its allowed for Commercial use :)

 

By the way the compressor said its PP2PC and a tool to extract bitmapped data :)

 

Dont know if its the newest version it said beta-20160710

Link to comment
Share on other sites

I tried to use it in the upcomming CollectorVision: Star Command game, I used it to compress 16kb vram down to around 1600 bytes. I took very long time to compress, Using an I5 CPU. And the CPU did not use more then 33% all the time.

 

I dont know if the decompressor is quicker then the one I usely use, but it works good and most importend no use of RAM.

 

So I think from now one I will use the DAN1 if its allowed for Commercial use icon_smile.gif

 

By the way the compressor said its PP2PC and a tool to extract bitmapped data icon_smile.gif

 

Dont know if its the newest version it said beta-20160710

 

 

Thanks for reminding me to clean up my code for the final release.

PP2PC is another tool I made to convert PowerPaint pictures into Dale Wick's ".pc" files.

I simply copy-pasted part of its code to implement options (or flags) such as "-v" to verbose all steps, "-h" to show help and "-r" to allow RLE encoding as "there is a string of uncompressed data here better not to try to compress it wastes memory space". I will explain all about DAN1 options (flags) in my speech.

 

The compression tool tries to explore all possibilities and pick up the best solution for this format and that's why it's slow. I guess a quick no so optimized result for development purpose should be an option for this tool, giving more an idea of the space needed and decoding speed while programming and testing projects.

 

If the actual dan1 decompression routine is a bit slow for your taste, I'm sure someone will come up with a faster version the more this tool used.

Side note : Try "dan1 -r filename" to allow RLE markers if you consider your data to be difficult to compress such as highly dithered pictures.

Link to comment
Share on other sites

Hey Coleco programmers... (from A to Z)

 

I've followed this thread for several years and I'm just a Coleco fan with ZERO knowledge of programming on the system (years ago I wanted to learn but those days are behind me...).

 

 

 

but to all of you (for what it's worth)... THANK YOU for sharing ideas etc here so that you can continue producing games for the Coleco

 

 

Continue please...

Jeff31

Link to comment
Share on other sites

I dont really mind the speed just wondering why it dont use all cpu speed available :)

The most importend is at the z80 side :)

 

Thanks for a great tool :)

 

 

 

Thanks for reminding me to clean up my code for the final release.

PP2PC is another tool I made to convert PowerPaint pictures into Dale Wick's ".pc" files.

I simply copy-pasted part of its code to implement options (or flags) such as "-v" to verbose all steps, "-h" to show help and "-r" to allow RLE encoding as "there is a string of uncompressed data here better not to try to compress it wastes memory space". I will explain all about DAN1 options (flags) in my speech.

 

The compression tool tries to explore all possibilities and pick up the best solution for this format and that's why it's slow. I guess a quick no so optimized result for development purpose should be an option for this tool, giving more an idea of the space needed and decoding speed while programming and testing projects.

 

If the actual dan1 decompression routine is a bit slow for your taste, I'm sure someone will come up with a faster version the more this tool used.

Side note : Try "dan1 -r filename" to allow RLE markers if you consider your data to be difficult to compress such as highly dithered pictures.

Link to comment
Share on other sites

Sorry Daniel, I don't understand one thing. How do we use it with bmp file for example ?

Is it currently only for raw files ?

 

Short answer : It takes raw files. Not limited to pictures. This is not a BMP to CV converter.

 

Long answer :

No, it's not an image converter, It's a tool to compress raw data into this format I named DAN1. Data in VRAM can be many things such as fullscreen pictures, charsets, game levels, etc. For BMP files, try an image converter such as like BMP2PP or DITHER to transform BMP into a picture file. Extract the PATTERN and COLOR data tables into binary files and apply DAN1 compression. Add the compressed data into your project and use the appropriate code to setup the video chip right and see the result on screen.

Link to comment
Share on other sites

Speech about DAN1 data compression was done at the ADAMCon 28, July 16, 2PM in Guelph, Ontario, Canada.

 

I tried as much as possible to be clear without going into details.

I've explained why I consider this format to be great for ColecoVision projects.

 

Link to a copy of the presentation and related files soon.

Edited by newcoleco
Link to comment
Share on other sites

Just a few words before receiving tons of messages about how to use this tool and what's the difference between dan1vram and dan1rvram routines included in the zip archive file.

 

DAN1 has two "flavors" : a normal one and a special with RLE coding for long strings of uncompressible data.

 

You won't need the extra RLE coding most of the time because LZSS works just fine by itself.

 

In the case you want to use the special RLE coding, use the flag ' -r ' to allow its usage during compression, which doesn't mean it will be necessary and used for your data.

 

DAN1VRAM is for normally compressed DAN1 data.

DAN1RVRAM* is for DAN1 data using special RLE coding.

 

* : You can use DAN1RVRAM routine for both DAN1 normal data and DAN1 with RLE data.

 

I hope this is clear.

 

Please share and comment.

I'm already getting suggestions about tools using DAN1 in various ways.

Link to comment
Share on other sites

  • 3 months later...

Hi,

I have a strange behavior with your compressor Daniel.
i tried to integrate it in my tool (thanks for sharing source code), it works fine, but compression ratio is lower than a pletter compressor :o !

Dan1 :
const byte TILtitle3DAN1[] = { // tile size dan1 compressed 965 bytes (52%)
const byte COLtitle3DAN1[] = { // color size dan1 compressed 793 bytes (60%)
const byte MAPtitle3DAN1[] = { // map size dan1 compressed 385 bytes (49%)

Pletter :
const byte TILtitle3PLE[] = { // tile size pletter compressed 957 bytes (52%)
const byte COLtitle3PLE[] = { // color size pletter compressed 787 bytes (61%)
const byte MAPtitle3PLE[] = { // map size pletter compressed 385 bytes (49%)

 

I tried (only the TIL part on my image) with your exe compressor, and it gave same size, 965 bytes.

 

Do you think it is a normal behavior for Dan1 compression ?

 

I added the c source file of the pletter compressor i used (entry point is int pletterCompress(byte *src,byte *dst,int n)

 

Of course, for some pictures, your compressor is sometime better, but not all the time. It is just to inform you and to share information about compression :)

 

i used this image for test

post-16672-0-37626200-1477991734_thumb.png

plet.txt

Edited by alekmaul
Link to comment
Share on other sites

  • 4 weeks later...

Dear alekmaul,

 

The behavior you are experimenting is normal for 3 reasons :

 

1. The fixed set may be overkill and waste bits in some cases, but my tests tend to demonstrate that a set of 4 sizes is optimal (5 being too much). The static set of 1,4,8,12 bits size to encode the offset values is optimal for "large" ColecoVision graphics in average according to my sample set. An offset using 12 bits (value going up to 2^12) goes beyond the limit of a charset (size 2^11), making the highest bit totally useless and so increase artificially the size of the compressed data. Your picture has lots of repeated sequences very close to each other, within a charset distance maximum.

 

2. Pletter has multiple options to encode offset values and uses the best fit one according to the data. DAN1 encodes offset values into the four fixed sizes 1,4,8 and 12 bits depending on their values. This set is in average ideal, but not adapted to the data. A workaround is to modify DAN1 to use a different set more adapted to the data used in your project, which is what I did in the slideshow demo in another thread and named it DAN1C (C simply means it was the variation C I tried that day and worked nicely).

 

3. DAN1 is not a miracle solution, but is surely a great one.

 

My sample set to develop DAN1 was a bunch of bitmap screens from ColecoVision projects, pixelated photos and converted ZX Spectrum pixel arts.

I would love to see someone make an intensive test with hundred of data files related to 8bit homebrew projects. In fact, this would be very helpful to do the next generation of data compression tool.

Extra note : I've used with success a variation named DAN1C (C variation) that uses 1,3,8 and 11 bits to encode offset values. This variation is briefly mentioned in another thread as an alternative that works better for some graphics like the slideshow demo. Compared to the original set of 1,4,8 and 12 bits, the variation tend to reduce even more simple data such as the color table and so give better results.

 

 

Hi,

I have a strange behavior with your compressor Daniel.
i tried to integrate it in my tool (thanks for sharing source code), it works fine, but compression ratio is lower than a pletter compressor icon_eek.gif !

Dan1 :
const byte TILtitle3DAN1[] = { // tile size dan1 compressed 965 bytes (52%)
const byte COLtitle3DAN1[] = { // color size dan1 compressed 793 bytes (60%)
const byte MAPtitle3DAN1[] = { // map size dan1 compressed 385 bytes (49%)

Pletter :
const byte TILtitle3PLE[] = { // tile size pletter compressed 957 bytes (52%)
const byte COLtitle3PLE[] = { // color size pletter compressed 787 bytes (61%)
const byte MAPtitle3PLE[] = { // map size pletter compressed 385 bytes (49%)

 

I tried (only the TIL part on my image) with your exe compressor, and it gave same size, 965 bytes.

 

Do you think it is a normal behavior for Dan1 compression ?

 

I added the c source file of the pletter compressor i used (entry point is int pletterCompress(byte *src,byte *dst,int n)

 

Of course, for some pictures, your compressor is sometime better, but not all the time. It is just to inform you and to share information about compression icon_smile.gif

 

i used this image for test

Edited by newcoleco
Link to comment
Share on other sites

  • 2 weeks later...

Solved.

 

Somehow, it is space consuming to split into 3 tables (PATTERN, COLOR, NAME) instead of just 2 (PATTERN, COLOR) assuming we talk about a bitmap screen. When using only 2 tables, DAN1 performs as expected, getting better results than Pletter. When using 3 tables under 2048 bytes each, DAN1 is using more bits than needed to store offset values. A solution is to adapt and there are some ways to make it.

 

For this problem, I considered a simple variation of DAN1 using a reasonable number of bits to store offset values adapted to the data, which in this case a maximum of 10 bits will do the trick. The results are :

 

With 3 tables

Pletter :
  1. const byte TILtitle3PLE[] = { // tile size pletter compressed 957 bytes (52%)
  2. const byte COLtitle3PLE[] = { // color size pletter compressed 787 bytes (61%)
  3. const byte MAPtitle3PLE[] = { // map size pletter compressed 385 bytes (49%)

Total = 2129 bytes

 

Dan1_10 (adapted) :
  1. const byte TILtitle3DAN1_10[] = { // tile size dan1 compressed 947 bytes (53%)
  2. const byte COLtitle3DAN1_10[] = { // color size dan1 compressed 784 bytes (61%)
  3. const byte MAPtitle3DAN1_10[] = { // map size dan1 compressed 385 bytes (49%)
Total = 2116 bytes
As for if the picture was kept as a bitmap into 2 tables, DAN1 performs nicely, and sometimes only 11 bits max perform better. The results are :
With 2 tables (assuming screen filled with all the possible characters to show a bitmap screen)
Pletter :
  1. const byte TILtitle3Plet[] = { // tile size pletter compressed 1259 bytes (79%)
  2. const byte COLtitle3Plet[] = { // color size pletter compressed 794 bytes (87%)
Total = 2053 bytes
Dan1_12 (original) :
  1. const byte TILtitle3DAN1_12[] = { // tile size dan1 compressed 1248 bytes (79%)
  2. const byte COLtitle3DAN1_12[] = { // color size dan1 compressed 799 bytes (87%)
Total = 2047 bytes
Dan1_11 (adapted) :
  1. const byte TILtitle3DAN1_11[] = { // tile size dan1 compressed 1233 bytes (79%)
  2. const byte COLtitle3DAN1_11[] = { // color size dan1 compressed 791 bytes (87%)
Total = 2024 bytes

 

Since DAN1 source code is given, it is possible to change it by yourself and adapt the solution to your needs like I've done here to present these results. With more data and time, I may create a new software solution, perhaps as a new format and keep this one as is; it's at least something to think about.

Edited by newcoleco
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...