glurk Posted September 17, 2021 Share Posted September 17, 2021 (edited) So I had a few K of data, a font and some G2F screen data, and noticed it had lots of repeated runs interspersed with sequential runs of bytes. So I made this RLE decoder for it, and saved about 32% (on this particular data). Maybe someone will find it useful. It doesn't handle runs of non-repeated, non-sequential data, as I didn't have many. So it was sort of purpose-built. Uses 5 zero page locations. Quote RLEDECOMP SUBROUTINE ldy #1 lda (src_addr),Y ; load the data byte first pha ; and push it dey sty temp ; disable the incrementer lda (src_addr),Y ; get the command bne .ahead3 pla ; pop 'A' and then, rts ; exit when CMD is $00 .ahead3 asl bcc .ahead2 ; if high bit set, inc temp ; enable the incrementer .ahead2 lsr tax pla ; pop the data byte .rle1 sta (dst_addr),Y ;clc ; unneeded I guess?? adc temp ; add the increment inc dst_addr bne .ahead1 inc dst_addr+1 .ahead1 dex bne .rle1 ;clc ; apparently unneeded also? lda #2 adc src_addr sta src_addr txa ; faster than LDA #0 adc src_addr+1 sta src_addr+1 bne RLEDECOMP ; unconditional Sorry about the formatting, I couldn't figure out how to do that on this board. I think the two CLC instructions are unneeded, it works for me without them. I also don't have an encoder, because I did it by hand, hahaha... Just posting it in case someone finds it useful. EDIT FOR EXAMPLE: .BYTE 3,$01,128+5,$04 decodes to: 01 01 01 04 05 06 07 08 Edited September 17, 2021 by glurk 1 1 Quote Link to comment Share on other sites More sharing options...
TGB1718 Posted September 17, 2021 Share Posted September 17, 2021 May be just lucky, but both asl and lsr can set the carry bit Quote Link to comment Share on other sites More sharing options...
glurk Posted September 17, 2021 Author Share Posted September 17, 2021 (edited) But it's an ASL soon followed by an LSR. So I guess the same zero that went in comes back out ! Hahaha. It's the second CLC that I wonder about. I left them in there just in case so I can put them back in. EDIT: Looking at it again, the carry should still be clear at that point (I think) Edited September 17, 2021 by glurk 1 Quote Link to comment Share on other sites More sharing options...
Harry Potter Posted September 17, 2021 Share Posted September 17, 2021 If you have cc65, I have a version of the Stac technique online, but it's for CBM machines and will need to be converted for the Atari. I can give you a link. Are you interested? Quote Link to comment Share on other sites More sharing options...
glurk Posted September 17, 2021 Author Share Posted September 17, 2021 Don't know much about 'Stac' but example code is always good... This one really came about by accident, and it worked really well for me, so I cleaned up the code and posted. I tend to like using code I write myself, so I can (usually) understand what it's doing :) 1 Quote Link to comment Share on other sites More sharing options...
Harry Potter Posted September 17, 2021 Share Posted September 17, 2021 Try looking for "Stac compression" on wikipedia.com. My version is at c65 additions - Manage /compress at SourceForge.net. Quote Link to comment Share on other sites More sharing options...
Harry Potter Posted September 17, 2021 Share Posted September 17, 2021 I had an update to make Stac slightly better, but the copy broke. I need to redo it. Any recommendations to make it better? Quote Link to comment Share on other sites More sharing options...
glurk Posted September 17, 2021 Author Share Posted September 17, 2021 If you're asking me in particular, I'm no expert on data compression at all. I wrote something similar to the above on an Apple ][ years ago, but this was something I just hacked together... I'm sure there are others here that know much more about this kinda stuff. 1 Quote Link to comment Share on other sites More sharing options...
Harry Potter Posted September 17, 2021 Share Posted September 17, 2021 glurk: I thank you for your response. I was not asking you in particular. I was asking the group. I'm asking because I want to improve it. I tried MTF and some of my own ideas. Only one of them worked. It was simply an optimization for performance that performs extra checks to make sure LZ77 blocks were efficient. I need to redo it. Quote Link to comment Share on other sites More sharing options...
tebe Posted September 17, 2021 Share Posted September 17, 2021 RLE Encoder 1.2 http://madteam.atari8.info/index.php?prod=uzytki the fastest and shortest rle_depacker sta inputPointer+1 loop jsr getByte beq stop lsr @ tay lp0 jsr getByte lp1 sta $ffff outputPointer equ *-2 inw outputPointer dey _bpl bmi loop bcs lp0 bcc lp1 getByte inx sne inc inputPointer+1 lda $ff00,x ; lo(inputPointer) = 0 !!! inputPointer equ *-2 stop rts Quote Link to comment Share on other sites More sharing options...
glurk Posted September 17, 2021 Author Share Posted September 17, 2021 tebe- I wrote this one to run from ROM (cartridge-based), so no self-modifying code, plus it decodes runs like $00,$00,$32,$33,$34,$35 which I didn't see any other schemes doing, really. So hopefully I'm not re-inventing the wheel. I even looked at NES stuff, which uses RLE a lot, but that stuff is too different architecture. 1 Quote Link to comment Share on other sites More sharing options...
glurk Posted September 18, 2021 Author Share Posted September 18, 2021 Replying to myself, I've worked on this some more and got it to also handle runs of arbitrary data. Here's a before/after sample of g2f data: (128 BYTES) 00 00 00 00 00 01 02 00 00 00 3B 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 07 08 09 0A 3C 3D 3E 3F 40 0F 41 11 3F 42 0B 02 15 43 44 45 0F 41 11 00 00 00 00 00 00 00 00 00 17 18 19 1A 46 47 48 49 4A 4B 4C 4D 46 4E 4F 26 23 50 51 52 53 4C 4D 00 00 00 00 00 00 00 00 00 29 2A 2B 2C 54 55 56 35 57 58 59 5A 5B 5C 34 38 5D 5E 5F 2F 00 59 5A 00 00 00 00 00 (72 BYTES) 05 00 82 01 03 00 01 3B 08 00 01 04 0F 00 84 07 85 3C 48 0F 41 11 3F 42 0B 02 15 83 43 43 0F 41 11 09 00 84 17 88 46 01 46 82 4E 42 26 23 84 50 82 4C 09 00 84 29 83 54 01 35 86 57 42 34 38 83 5D 42 2F 00 82 59 05 00 I think 128 bytes down to 72 is 44% compression. That's pretty good, IMO. It's really intended for g2f data, which generates a lot of runs of consecutive bytes. I haven't cleaned up the code yet, but it works. But no-one seems interested in it anyway, haha.... ? 1 2 Quote Link to comment Share on other sites More sharing options...
tebe Posted September 18, 2021 Share Posted September 18, 2021 3 hours ago, glurk said: But no-one seems interested in it anyway, haha.... ? it's not true Quote Link to comment Share on other sites More sharing options...
glurk Posted September 18, 2021 Author Share Posted September 18, 2021 (edited) Ok, here's the new version then. It even uses an (optional) undocumented opcode, which saves 1 byte and 2 cycles. I guess I need to write an encoder to go with it, but not sure how to go about that... As a Windows program, or in C, or something else? Anyway, this code is as tight and fast as I can make it. Someone more clever than myself might be able to optimize further: RLEDECOMP SUBROUTINE ldy #1 lda (TEMP3),Y ; load the data byte first pha ; and push it dey sty TEMP5 ; disable the incrementer lda (TEMP3),Y ; get the command bne .ahead3 pla ; pop 'A' and then, rts ; exit when CMD is $00 .ahead3 asl bmi .dorun bcc .ahead2 ; if high bit set, inc TEMP5 ; enable the incrementer .ahead2 lsr tax pla ; pop the data byte .rle1 sta (TEMP1),Y adc TEMP5 ; add the increment inc TEMP1 bne .ahead1 inc TEMP2 .ahead1 dex bne .rle1 lda #2 adc TEMP3 sta TEMP3 bcc RLEDECOMP inc TEMP4 bne RLEDECOMP ; unconditional .dorun .byte $4B,$7F ; undocumented ASR #$7F, same as AND #$7F, LSR tax pla inc TEMP3 bne .drloop inc TEMP4 .drloop inc TEMP3 bne .ahead4 inc TEMP4 .ahead4 sta (TEMP1),Y lda (TEMP3),Y inc TEMP1 bne .ahead6 inc TEMP2 .ahead6 dex bne .drloop beq RLEDECOMP ; unconditional TEMP1 and 2 are the destination, TEMP3 and 4 are the source, TEMP5 is the "increment-er" - all zero-page locations. $00 = END of stream $01-$3F = Repeat the following byte that number of times $41-$7F = Dump the following n-$40 bytes directly $81-$FF = Repeat the following byte n-$80 times, incrementing by 1 each byte I (think) that's an accurate description. In any case, I'm using it and it works... Edited September 18, 2021 by glurk Quote Link to comment Share on other sites More sharing options...
Harry Potter Posted September 18, 2021 Share Posted September 18, 2021 I'm jealous of you. I ask you to port it to cc65 C and let me use it for Template Creator CBM which is to be ported to the Atari 8-bit series. Quote Link to comment Share on other sites More sharing options...
glurk Posted September 18, 2021 Author Share Posted September 18, 2021 There was a slight bug in the above code, the relevant part should be this: .ahead3 asl bcc .ahead2 ; if high bit set, inc TEMP5 ; enable the incrementer .ahead2 bmi .dorun lsr tax Harry Potter- This is just a de-compressor for a certain type of data, with specific characteristics. Graph2Font generated data primarily. It wouldn't be great at general purpose stuff. It's "kind of" based on RLEINC for the NES, but they decompress into a 'port' (a single address) on their PPU, and this one works with src/dest addresses. I probably need to test it more to make sure there are no other bugs like the one I caught above. It's really easy to make an "off-by-one" error that messes up the whole thing. I haven't tested every possible case, but I think it's solid now(?) I mean, I'm using it in a program now, and it's working with the data there... Quote Link to comment Share on other sites More sharing options...
Harry Potter Posted September 18, 2021 Share Posted September 18, 2021 Okay. How does it fare against the Stac technique? Quote Link to comment Share on other sites More sharing options...
Harry Potter Posted September 18, 2021 Share Posted September 18, 2021 BTW, I'm working on my own compression technique. It's very slow, but, right now, I'm concentrating on getting better numbers. I have to optimize later. Quote Link to comment Share on other sites More sharing options...
glurk Posted September 18, 2021 Author Share Posted September 18, 2021 The way I look at it - at least for "game stuff", which this is - is that it's OK for the COMPRESSOR to be slow, since more time can/should allow it to compress better, and you only do that part once anyway. But you want the in-game DECOMPRESSOR to be really fast. That said, I "invented" this scheme in my head, and then wrote a decoder for it. Writing the code to actually do the compression itself is a whole other thing. Recursion, ouch, makes my head hurt. Plus, I guess it should be a Windows-based compressor, and Windows programming is not my thing at all. I actually encoded/compressed my own data by hand, which was not that big a deal, really, mostly involved removing bytes from a text file of data and putting in the counts etc... For this to really be useful for others, a compressor will need to be written, but I'm not sure how to do that part of it, hahaha. Never written one before, I just came up with the (simple, really) RLE-variant compression scheme. Quote Link to comment Share on other sites More sharing options...
glurk Posted September 19, 2021 Author Share Posted September 19, 2021 tebe- I decided to test this on a real g2f image, and I used your "Gollum" image. The file Gollum_C64_Rensoup_Tebe.scr which is 1200 bytes, compresses down to 253 bytes using this scheme. I actually encoded it by hand (lots of work) because I don't yet have an encoder, and I'm not sure at all that I really have the skills/know how to write one. It would have to be a Windows program, or at least commandline C. But, 1-(253/1200) is 0.789166666 which is 79% compression!! I'd say that's pretty good. I could REALLY use help writing an encoder for this, from anyone who wants to volunteer. I don't think I can do it, I'm mostly a 6502 guy, and can't do much with Windows-based programming. Quote Link to comment Share on other sites More sharing options...
tebe Posted September 19, 2021 Share Posted September 19, 2021 I don't know how you got that result. I compressed this file with RLE and got a result of 994 bytes gollum.7z Quote Link to comment Share on other sites More sharing options...
glurk Posted September 19, 2021 Author Share Posted September 19, 2021 (edited) It's not just simple RLE, it encodes sequences like 06,07,08,09,0A,0B,0C,0D etc. That sequence would be encoded like: .byte $88,$06 And those n+1 increasing sequences are so common in the .SCR files, that's why I wrote this thing. Another example: .byte 06,FF,86,DD,42,BA,AE,02,FF decodes to: FF,FF,FF,FF,FF,FF,DD,DE,DF,E0,E1,E2,,BA,AE,FF,FF It's really efficient at n+1 increasing sequences. Edited September 19, 2021 by glurk fixed bad mistake! 1 Quote Link to comment Share on other sites More sharing options...
glurk Posted September 19, 2021 Author Share Posted September 19, 2021 I'm working on writing an encoder for this, it's going very slowly. Here is the current output when run on the "Gollum_C64_Rensoup_Tebe.scr" file: File is 1200 bytes long. Repeated run of 81 bytes of value: 0 Increasing run of 31 bytes starting with: 1 Increasing run of 3 bytes starting with: 160 Increasing run of 2 bytes starting with: 35 Repeated run of 3 bytes of value: 0 Increasing run of 4 bytes starting with: 37 Those values are all decimal - and correct. It then gets to the $0D byte, which I haven't handled yet. I'm trying to write this encoder for Windows, and the language I'm using is QB64 (BASIC) and I've never really used it before. Quote Link to comment Share on other sites More sharing options...
tebe Posted September 19, 2021 Share Posted September 19, 2021 (edited) 1 hour ago, glurk said: It's not just simple RLE, it encodes sequences like 06,07,08,09,0A,0B,0C,0D etc. That sequence would be encoded like: .byte $88,$06 ahhh, now I understand your idea, clever Edited September 19, 2021 by tebe Quote Link to comment Share on other sites More sharing options...
ivop Posted September 19, 2021 Share Posted September 19, 2021 (edited) Perhaps an easier way to implement this is to see it as a form of delta compression. First create the deltas, then use plain RLE on the deltas. Increasing runs result in runs of 1. Decreasing runs result in runs of $ff. Normal runs turn into delta, 0, 0, 0 etc... Edited September 19, 2021 by ivop 1 Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.