
Game Text Encoding Problem


DanBoris


I need some help figuring out something, and thought some of the people here might spot whatever I am missing...

 

I'm playing around with decoding the format of the MS-DOS Adventure Construction Set game files. Some of it has been pretty easy to figure out, but when I came to the object names I found that the text is compressed, and I can't quite figure out the logic. Here is what I know:

 

- Each character can be one of 40 characters (26 letters, 10 digits, and 4 symbols)

- Every three characters are encoded into 2 bytes

- Here are some of the encodings I have seen, showing the character, hex values and binary values:

 

1   = C0 A8    11000000 10101000
A   = 40 06    01000000 00000110
B   = 80 0C    10000000 00001100
C   = C0 12    11000000 00010010
D   = 00 19    00000000 00011001
E   = 40 1F    01000000 00011111
F   = 80 25    10000000 00100101

A   = 40 06    01000000 00000110
AA  = 38 13    00111000 00010011
AAA = 69 06    01101001 00000110

ABC = 93 06    10010011 00000110

 

Anyone have any ideas on this?


It's some form of radix 40.

 

e.g.

 

(X-'A')*1

+ (Y-'A')*40

+ (Z-'A')*1600

 

That'll fit into an unsigned 16 bit word.
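
To make the "fits in 16 bits" point concrete, here is a minimal Python sketch. The function name `pack_radix40` is made up for illustration, and the three digit values are passed in directly because the character-to-value mapping is not settled at this point in the thread.

```python
def pack_radix40(ones, forties, sixteen_hundreds):
    """Combine three base-40 digits (each 0..39) into one integer."""
    for digit in (ones, forties, sixteen_hundreds):
        if not 0 <= digit < 40:
            raise ValueError("each digit must be in the range 0..39")
    return ones * 1 + forties * 40 + sixteen_hundreds * 1600


# The largest possible value uses digit 39 in all three positions.
largest = pack_radix40(39, 39, 39)
print(largest)            # 63999
print(largest <= 0xFFFF)  # True: it fits in an unsigned 16-bit word
```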

Yes, I said the same thing, but I lost my internet connection while I was typing my reply, so it got lost when I tried to post it. :(

 

Notice the pattern with some of the values shown:

 

b = 00 00 (I'm guessing that 00 00 is a space) ?

A = 40 06 (add 40 06)

B = 80 0C (add 40 06)

C = C0 12 (add 40 06)

D = 00 19 (add 40 06, then add the carry flag)

E = 40 1F (add 40 06)

F = 80 25 (add 40 06)

G = C0 2B (add 40 06) ?

H = 00 32 (add 40 06, then add the carry flag) ?

etc.
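
The pattern can be reproduced with a few lines of Python. This is only a sketch: it assumes the carry out of the first byte is added into the second byte, and `add_bytes` is a made-up helper name.

```python
def add_bytes(a, b):
    """Add two 2-byte values given as (first, second) byte pairs.

    The carry out of the first byte is added into the second byte,
    which is what the listing above appears to show.
    """
    first = a[0] + b[0]
    carry = first >> 8
    second = (a[1] + b[1] + carry) & 0xFF
    return (first & 0xFF, second)


step = (0x40, 0x06)
value = (0x00, 0x00)
rows = []
for letter in "ABCDEFGH":
    value = add_bytes(value, step)
    rows.append(f"{letter} = {value[0]:02X} {value[1]:02X}")
print("\n".join(rows))
```

Under that assumption the loop produces the same byte pairs as the list, including the guessed values for G and H.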

 

Michael

 

Edit: Also, notice that it takes the same number of bytes to code 1, 2, or 3 characters, further pointing to a base-40 system.

 

The only other way I know to code 3 characters in 2 bytes is to split the bits, 5 bits per character, with 1 bit left over, but that gives only 32 characters.
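
The arithmetic behind that comparison, as a small Python sketch (all variable names are just for illustration):

```python
BITS_PER_WORD = 16

# Fixed-width packing: three characters share 16 bits.
bits_per_char = BITS_PER_WORD // 3               # 5 bits each
spare_bits = BITS_PER_WORD - 3 * bits_per_char   # 1 bit left over
alphabet_fixed = 2 ** bits_per_char              # 32 distinct characters

# Base-40 packing: three digits need 40**3 distinct values.
combos_base40 = 40 ** 3                          # 64000
word_values = 2 ** BITS_PER_WORD                 # 65536

print(bits_per_char, spare_bits, alphabet_fixed)  # 5 1 32
print(combos_base40 <= word_values)               # True
print(41 ** 3 <= word_values)                     # False
```

So 40 is the largest alphabet size for which three characters still fit in one 16-bit word.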

Edited by SeaGtGruff

Anyone have any ideas on this?

Adding to my previous comments, I suggest looking at the encoding systematically (b = SPACE):

 

bbb = ?? ??

bbA = ?? ??

bbB = ?? ??

bbC = ?? ??

etc.

 

That should give you the values for the 1s place (0 to 39).

 

The rest should be a matter of just multiplying by decimal 40 for the 10s place, or by decimal 1600 for the 100s place, but you could verify that systematically:

 

bAb = ?? ?? (should be the same as bbA times decimal 40)

bBb = ?? ?? (should be the same as bbB times decimal 40)

bCb = ?? ?? (should be the same as bbC times decimal 40)

etc.

 

Abb = 40 06 (should be the same as bAb times decimal 40, or bbA times decimal 1600)

Bbb = 80 0C

etc.
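
Once the three probe values for a character are known, the check could be automated along these lines. This is a sketch only: `probe_is_consistent` is a hypothetical helper, it takes the values as plain 16-bit integers, and it assumes the leftmost character carries the weight 1600.

```python
def probe_is_consistent(ones_word, tens_word, hundreds_word):
    """Check three probe results for one character against base 40.

    ones_word     -- 16-bit value observed for "bbX"
    tens_word     -- 16-bit value observed for "bXb"
    hundreds_word -- 16-bit value observed for "Xbb"
    """
    return tens_word == ones_word * 40 and hundreds_word == ones_word * 1600


# Hypothetical probe results for a character whose digit value is 1.
print(probe_is_consistent(1, 40, 1600))  # True
```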

 

But you'd have to take the carry into consideration, since it appears that the carry might be getting added back to the lo byte?

 

Michael


Another thought:

 

I think the values shown are lo byte first:

 

Abb = hex 40 06 = $0640 = decimal 1600 = 1*1600

Bbb = hex 80 0C = $0C80 = decimal 3200 = 2*1600

Cbb = hex C0 12 = $12C0 = decimal 4800 = 3*1600

Dbb = hex 00 19 = $1900 = decimal 6400 = 4*1600

etc.
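
The same conversion in Python, as a quick sketch (`word_from_hex` is a made-up helper; the byte pairs are the ones from the first post):

```python
def word_from_hex(hex_bytes):
    """Read a two-byte hex string such as "40 06" as a lo-byte-first word."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")


samples = {
    "Abb": "40 06",
    "Bbb": "80 0C",
    "Cbb": "C0 12",
    "Dbb": "00 19",
}

for text, hex_bytes in samples.items():
    word = word_from_hex(hex_bytes)
    multiple, remainder = divmod(word, 1600)
    print(f"{text}: ${word:04X} = {word} = {multiple}*1600 (remainder {remainder})")
```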

 

Michael


This seems to work for some, but not all, of the examples you posted:

 

bbb = $0000 = 0

bbA = $0001 = 1

bbB = $0002 = 2

bbC = $0003 = 3

 

bAb = $0028 = 1*40

bBb = $0050 = 2*40

bCb = $0078 = 3*40

 

Abb = $0640 = 1*1600

Bbb = $0C80 = 2*1600

Cbb = $12C0 = 3*1600

 

AAb = $0640+$0028=$0668 -- you gave $1338, or 38 13

AAA = $0640+$0028+$0001=$0669

ABC = $0640+$0050+$0003=$0693

 

By my figuring, $1338 should be CCb, not AAb.
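
Here is a small Python sketch of that comparison. It assumes b=0, A=1, B=2, C=3 and lo-byte-first storage; `expected_word` is a made-up helper, and the observed bytes are copied from the first post.

```python
def expected_word(text):
    """Value predicted by the base-40 guess for a 3-character string."""
    values = {"b": 0, "A": 1, "B": 2, "C": 3}
    hundreds, tens, ones = (values[ch] for ch in text)
    return hundreds * 1600 + tens * 40 + ones


# Byte pairs as posted, lo byte first ("b" stands for a space).
observed = {"Abb": "40 06", "AAb": "38 13", "AAA": "69 06", "ABC": "93 06"}

mismatches = []
for text, hex_bytes in observed.items():
    word = int.from_bytes(bytes.fromhex(hex_bytes), "little")
    if word != expected_word(text):
        mismatches.append(text)
    print(f"{text}: observed ${word:04X}, expected ${expected_word(text):04X}")

print(mismatches)  # ['AAb']
```

Only the AA sample disagrees with the guess; the other three match.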

 

Michael

Edited by SeaGtGruff

Since 1 (presumably 1bb) is encoded as C0 A8, or $A8C0, which is decimal 43200, which is 27*1600, I'm guessing the characters have the following values:

 

b = 0 (space)

A = 1

B = 2

C = 3

D = 4

E = 5

F = 6

G = 7

H = 8

I = 9

J = 10

K = 11

L = 12

M = 13

N = 14

O = 15

P = 16

Q = 17

R = 18

S = 19

T = 20

U = 21

V = 22

W = 23

X = 24

Y = 25

Z = 26

1 = 27

2 = 28

3 = 29

4 = 30

5 = 31

6 = 32

7 = 33

8 = 34

9 = 35

0 = 36

? = 37 (unknown symbol)

? = 38 (unknown symbol)

? = 39 (unknown symbol)

 

These are multiplied by 40^0=1, 40^1=40, or 40^2=1600, depending on their position. In example ABC, A is in the 100s place, B is in the 10s place, and C is in the 1s place.
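
Putting the guessed table and the place values together, a decoder and encoder might look like the Python sketch below. Everything here is an assumption based on the guesses in this thread, not a confirmed description of the game's format: the alphabet order, the lo-byte-first storage, and the names `ALPHABET`, `decode_word`, and `encode_text` are all made up for illustration. The three unknown symbols are shown as "?" placeholders.

```python
# Guessed alphabet: the index in this string is the digit value.
# Positions 37..39 are unknown symbols, shown as "?" placeholders.
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890???"


def decode_word(two_bytes):
    """Decode two bytes (lo byte first) into three characters."""
    word = int.from_bytes(two_bytes, "little")
    if word >= 40 ** 3:
        raise ValueError("value is outside the base-40 range")
    hundreds, rest = divmod(word, 1600)
    tens, ones = divmod(rest, 40)
    return ALPHABET[hundreds] + ALPHABET[tens] + ALPHABET[ones]


def encode_text(text):
    """Encode up to three known characters into two bytes (lo byte first)."""
    if len(text) > 3:
        raise ValueError("at most three characters fit in one word")
    word = 0
    for ch in text.ljust(3):  # pad on the right with spaces
        if ch == "?" or ch not in ALPHABET:
            raise ValueError(f"no known digit value for {ch!r}")
        word = word * 40 + ALPHABET.index(ch)
    return word.to_bytes(2, "little")


print(encode_text("A").hex(" "))             # 40 06
print(encode_text("1").hex(" "))             # c0 a8
print(decode_word(bytes.fromhex("93 06")))   # ABC
```

With this table the posted samples for 1, A through F, AAA, and ABC all round-trip; the AA sample (38 13) decodes as "CC" followed by a space.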

 

Michael

