Harry Potter Posted June 21, 2022 Share Posted June 21, 2022 Hi! This morning, I was perusing some notes referred here and finally realized what was meant by "hash tables." I implemented similar in my own way: I have an 8k block of bits which specify whether a particular word is already used. It had the effect of greatly increasing the compression ratio, but speed is not increased significantly. Following is the code to get and set a certain bit in the table: unsigned char LZGetUsedBit (unsigned i) { return (LZUsedBits[i>>3]>>(i&7))&1; } unsigned char LZSetUsedBit (unsigned i, unsigned char val) { LZUsedBits[i>>3]=(LZUsedBits[i>>3])&~(1<<(i&7))|(val<<(i&7)); }and the code to skip testing if unused: if ((backcol=LZGetUsedBit(*(unsigned*)&lzdict[lzoffs]))==0) { //++backcol; LZSetUsedBit (*(unsigned*)&lzdict[lzoffs], 1); goto e; } BTW, backcol is a variable that points to the register in the C64 memory map that holds the background color. BTW, this code is written in cc65 C. Quote Link to comment Share on other sites More sharing options...
phaeron Posted June 22, 2022 Share Posted June 22, 2022 The structure you're using is more commonly known as a bit vector or bit set. It's a useful technique but not the same as a hash table. A hash table works by first converting lookup key to a hash value to select a bucket, then searching/adding/removing from the list of elements in the bucket. The simplest hash function is simply masking (v & N) or computing a modulus (v % N) by the bucket count. The idea is that the hashing roughly distributes keys among the buckets so you're searching a fraction of the elements instead of the whole set. Many implementations also grow the number of buckets as the element count grows. An important difference from the bit set is that the hash table is sparse -- its memory usage depends on how many buckets and elements it has, not the total number of possible keys (8K in your case). But in the case where most of the keys are present, the bitset is more compact, so as usual it's a tradeoff. I'm not experienced with CC65, but I threw the above functions in with all optimization options I could find and the result was pretty poor, it isn't up to the interprocedural optimizations required to make this fast. In asm I would use something like this: lda i+1 lsr lsr lsr ora #>LZUsedBits sta ptr+1 lda i+1 and #7 tax ldy i lda (ptr),y and bittab,x beq skip eor (ptr),y sta (ptr),y skip: But of course, this also requires changes that a C compiler is not allowed to apply, such as inverting the bitfield and reordering the index bits. Back to the original topic of LZ sliding window compression, a bitset probably isn't what you want here as it records if something has been seen, not where. You want the latter for fast compression matching. To do that, you need to actually store the position where a word value was found as well. From that, you can determine if it's an accurate match and still within the reference window. The simplest structure is just to hash the word down to a smaller number of buckets and then have a array of buckets updated on each new word, e.g. unsigned int LZLastPosition[256]. But that only stores the latest position per bucket, so you'll get a lot of collisions and thus lots of lost match candidates. That's where the traditional hash table strategies for collision resolution / chaining come in to play, to store more than one element per hash bucket. To give you an idea, this is the hash insertion algorithm I use as part of an LZ firmware compressor. It's not meant to run on a 6502, but it should give you an idea. It's used with a 16K reference window and a 256 bucket hash table, hashing 3 byte clusters. There are better hashing strategies but this just needs to reduce the complexity to lower than brute force. do { hashnext[pos & 0x3fff] = hashhead[hc]; hashhead[hc] = pos; hc -= mem[pos]; hc += mem[pos + 3]; ++pos; } while(--bestlen); Quote Link to comment Share on other sites More sharing options...
Harry Potter Posted June 22, 2022 Author Share Posted June 22, 2022 I had to turn Warp mode off in the emulator, but it seems to work fine. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.