
cc65 users: printtokatari.c



Hi!  I'm sorry for SPAMming, but I really want to share one of my files.  I have a program for cc65 that prints strings with embedded tokens and expands the tokens to full strings.  The program is very small, so the tokenization is likely to help.  It decreased the size of one of my text adventures' string data by about 13%.  It also uses my AtaSimpleIO library to minimize overhead.  It is located at c65 additions - Manage /ui at SourceForge.net and called printtok_atari.c.  Try it out!
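For readers who have not opened the file, the idea can be sketched roughly like this. It is not the code from printtok_atari.c: the token strings, the names expand_tokens, TOKEN_BASE and tokens, and writing into a buffer instead of printing through AtaSimpleIO are all assumptions made for illustration. The 0xA0 base is the range discussed further down in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical token table; NOT the tokens shipped in printtok_atari.c. */
static const char *const tokens[] = { "the ", "you ", "room" };

#define TOKEN_BASE  0xA0u
#define TOKEN_COUNT (sizeof tokens / sizeof tokens[0])

/* Expand the NUL-terminated, tokenized string src into dst.
 * Writes at most dstlen-1 characters plus a terminating NUL and
 * returns the number of characters written (without the NUL). */
size_t expand_tokens(const uint8_t *src, char *dst, size_t dstlen)
{
    size_t n = 0;

    if (dstlen == 0) return 0;
    while (*src != 0 && n + 1 < dstlen) {
        uint8_t c = *src++;
        if (c >= TOKEN_BASE && c < TOKEN_BASE + TOKEN_COUNT) {
            /* Token byte: copy its expansion. */
            const char *t = tokens[c - TOKEN_BASE];
            while (*t != '\0' && n + 1 < dstlen) dst[n++] = *t++;
        } else {
            /* Ordinary character: copy as-is. */
            dst[n++] = (char)c;
        }
    }
    dst[n] = '\0';
    return n;
}
```

With these made-up tokens, the 11-byte sequence 0xA1 "are in " 0xA0 0xA2 "." expands to the 20-character sentence "you are in the room.".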


I just looked at your post, and it seems that my code is more efficient than his but doesn't include BPE (byte-pair encoding).  Also, I only have room for 32 tokens.  On an Atari, I could use 0xA0 to 0xFF for the tokens, but I'm also coding for the CBM computers, on which cc65 seems to convert literals from ASCII instead of leaving them in PETSCII.  If I don't need upper-case, I can use A-Z as tokens or BPE codes, but a text adventure I'm coding uses upper-case.  On an Atari, I can add BPE and extend the codes as described above.  On a CBM, I can add 26 tokens to the mix if upper-case is not used.  Someone on the Denial Vic20 forum gave me the idea to use static Huffman codes to shorten literals.  What else can I do to improve printtok()?


Your approach seems to imply that you intend to hand-code the strings, which seems overcomplicated.

You should really have a separate application that reads a file of text, e.g. one string per line, and encodes the strings for you.

To speed up decoding, that tool can prepend an offset and/or length table so that any string can be located quickly.
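As a rough sketch of the offset-table idea: the encoder tool would emit one packed blob plus a table of 16-bit offsets, and the program indexes the table instead of scanning for terminators. The names string_blob, string_offset and get_string, and the sample strings, are made up for this example.

```c
#include <stdint.h>

/* Hypothetical output of the encoder tool: all strings packed
 * back to back, each terminated by a NUL byte. */
static const uint8_t string_blob[] = "North\0South\0You see a lamp.";

/* Start offset of each string inside string_blob. */
static const uint16_t string_offset[] = { 0, 6, 12 };

#define STRING_COUNT (sizeof string_offset / sizeof string_offset[0])

/* Return a pointer to string number index, or a null pointer
 * if the index is out of range. */
const uint8_t *get_string(uint8_t index)
{
    if (index >= STRING_COUNT) return 0;
    return &string_blob[string_offset[index]];
}
```

The table costs two bytes per string, so it pays off mainly when strings are looked up by number often; a length table would work the same way but needs a running sum to find the start.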

 

If you think back to other compression techniques, you can move away from single tokens and use byte pairs, i.e. a token and an argument.

This way, the encoder can search the existing string table for already-encoded text and emit a token that takes the string number as its argument.
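A minimal sketch of such a token-plus-argument scheme, under assumptions of my own: REF (0x1F) is a hypothetical escape byte, the argument byte is the string number plus one (so it can never collide with the NUL terminator), and the table contents are invented.

```c
#include <stddef.h>
#include <stdint.h>

#define REF       0x1Fu  /* hypothetical escape: next byte = string number + 1 */
#define MAX_DEPTH 4      /* guards against strings that reference themselves  */

static const uint8_t s0[] = "a brass lamp";
static const uint8_t s1[] = { 'Y','o','u',' ','s','e','e',' ', REF, 1, '.', 0 };
static const uint8_t *const strings[] = { s0, s1 };

#define NSTRINGS (sizeof strings / sizeof strings[0])

/* Append the expansion of src to dst starting at position n.
 * Never writes past cap-1; returns the new length. */
static size_t expand_ref(const uint8_t *src, char *dst, size_t n,
                         size_t cap, int depth)
{
    while (*src != 0 && n + 1 < cap) {
        uint8_t c = *src++;
        if (c == REF) {
            uint8_t idx;
            if (*src == 0) break;            /* dangling escape: stop */
            idx = (uint8_t)(*src++ - 1);     /* undo the +1 bias      */
            if (idx < NSTRINGS && depth < MAX_DEPTH)
                n = expand_ref(strings[idx], dst, n, cap, depth + 1);
        } else {
            dst[n++] = (char)c;
        }
    }
    return n;
}

/* Expand string number index into dst (cap bytes, NUL-terminated). */
size_t expand_string(uint8_t index, char *dst, size_t cap)
{
    size_t n = 0;

    if (cap == 0) return 0;
    if (index < NSTRINGS) n = expand_ref(strings[index], dst, 0, cap, 0);
    dst[n] = '\0';
    return n;
}
```

Each reference costs two bytes, so it only saves space when the referenced text is three bytes or longer. Recursion is not cheap under cc65, so a real version might allow only one level of reference.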

 


I thought about a byte pair to indicate a token, but I think my method is more efficient.  I was thinking about creating code to compress the text automatically.  If I can do that, I can add Huffman codes and Placement Offset to the technique.  I'm not prepared to do this just yet.  Is there anything else I can do to improve this function?

 

BTW, if you want to know what Placement Offset is, just ask.


I don't see it as really practical as it stands. Approaches like the mentioned Elite one, or even the one employed by the Infocom adventures, are better choices IMO. Even using any existing packer tool with a common key would suffice.

In terms of the file itself, the user would have to edit the 'tokens' array for their own purpose. In a 'module', the user would be better off passing their own array during initialisation, or it could instead be declared as extern.
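Passing the table in at initialisation could look something like the following. The function names printtok_init and printtok_lookup are hypothetical and do not exist in the posted file; the 0xA0 base is taken from the code as described in this thread.

```c
#include <stdint.h>

#define TOKEN_BASE 0xA0u

/* Module state: set once by the caller, never edited in this file. */
static const char *const *tok_table = 0;
static uint8_t tok_count = 0;

/* Register the caller's own token table (hypothetical API). */
void printtok_init(const char *const *table, uint8_t count)
{
    tok_table = table;
    tok_count = count;
}

/* Return the expansion for a token byte, or a null pointer if the
 * byte is not a token or no table has been registered yet. */
const char *printtok_lookup(uint8_t code)
{
    if (tok_table == 0 || code < TOKEN_BASE) return 0;
    code = (uint8_t)(code - TOKEN_BASE);
    return (code < tok_count) ? tok_table[code] : 0;
}
```

The extern alternative is simpler still: the module would declare `extern const char *const tokens[];` and each program would define its own array, at the cost of fixing the name at link time.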

The comments reference the range 0x80 to 0x9F, whereas the code uses 0xA0 to 0xBF, so that needs to be corrected.

You are exploiting the fact that cc65 makes char unsigned by default (this can be reversed via a command-line option). With a signed char, 'char i' can never satisfy 'i>=0xA0', so the test is non-portable. Consider adopting the stdint.h types in your work, e.g. uint8_t.
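A small sketch of the portable form of that test, using the 0xA0 to 0xBF range from the code; the function name is_token and the macro names are invented for the example.

```c
#include <stdint.h>

#define TOKEN_FIRST 0xA0u
#define TOKEN_LAST  0xBFu

/* Works whether plain char is signed or unsigned: the value is
 * converted to uint8_t before it is compared with 0xA0..0xBF. */
int is_token(char ch)
{
    uint8_t c = (uint8_t)ch;
    return c >= TOKEN_FIRST && c <= TOKEN_LAST;
}
```

On a compiler where char is signed, a byte such as 0xA0 is stored as a negative value, which is why the original comparison would silently stop matching.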

Remove redundant code.


  • 2 weeks later...

I am currently working on a better version of the function.  Right now, I have the lits compressor and the output writer started.  I plan to use a modification of Infocom's technique combined with tokenization.  There's no room for BPE.  LZ77 can't be used, as it would need to reference previous strings, and they are compressed.  My Placement Offset Basic technique would cost more than it would save, and Huffman would require extra complexity.  Does anybody here have other ideas on how I can compress text?
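For anyone unfamiliar with the Infocom technique: its text format packs three 5-bit codes into each 16-bit word and sets the top bit on the last word of a string. The sketch below shows only that packing step. The alphabet here (0 = space, 1 to 26 = a to z, 31 = padding) is a simplified assumption, not Infocom's real alphabet tables, and the names pack5 and unpack5 are made up; it is not the modification planned above.

```c
#include <stddef.h>
#include <stdint.h>

#define PAD 31u  /* hypothetical padding code, skipped when decoding */

/* Simplified alphabet (assumption): 0 = space, 1..26 = 'a'..'z'. */
static uint8_t to_code(char ch)
{
    return (ch >= 'a' && ch <= 'z') ? (uint8_t)(ch - 'a' + 1) : 0;
}

static char from_code(uint8_t code)
{
    return (code >= 1 && code <= 26) ? (char)('a' + code - 1) : ' ';
}

/* Pack text into 16-bit words, three codes per word. The last word
 * has bit 15 set. Text that does not fit in max_words is truncated.
 * Returns the number of words written. */
size_t pack5(const char *text, uint16_t *out, size_t max_words)
{
    size_t w = 0;

    if (max_words == 0) return 0;
    do {
        uint16_t word = 0;
        int i;
        for (i = 0; i < 3; ++i) {
            uint8_t code = PAD;
            if (*text != '\0') code = to_code(*text++);
            word = (uint16_t)((word << 5) | code);
        }
        if (*text == '\0' || w + 1 == max_words) word |= 0x8000u;
        out[w++] = word;
    } while (!(out[w - 1] & 0x8000u));
    return w;
}

/* Decode words until the one with bit 15 set; returns the length. */
size_t unpack5(const uint16_t *in, char *dst, size_t cap)
{
    size_t n = 0;
    uint16_t word;

    if (cap == 0) return 0;
    do {
        int shift;
        word = *in++;
        for (shift = 10; shift >= 0; shift -= 5) {
            uint8_t code = (uint8_t)((word >> shift) & 0x1Fu);
            if (code != PAD && n + 1 < cap) dst[n++] = from_code(code);
        }
    } while (!(word & 0x8000u));
    dst[n] = '\0';
    return n;
}
```

Three characters fit in two bytes, so lower-case text shrinks by about a third before any tokenization; upper-case and punctuation need shift or escape codes, which eat into that saving.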
