Game Text Encoding Problem

DanBoris · July 17, 2010

I need some help figuring out something, and thought some of the people here might spot whatever I am missing...

I'm playing around with decoding the format of the MS-DOS Adventure Construction Set game files. Some of it has been pretty easy to figure out, but when I got to the object names I found that the text is compressed, and I can't quite figure out the logic. Here is what I know:

- Each character can be one of 40 characters (26 letters, 10 digits, and 4 symbols)

- Every three characters are encoded into 2 bytes

- Here are some of the encodings I have seen, showing the character, hex values and binary values:

1   = C0 A8    11000000 10101000
A   = 40 06    01000000 00000110
B   = 80 0C    10000000 00001100
C   = C0 12    11000000 00010010
D   = 00 19    00000000 00011001
E   = 40 1F    01000000 00011111
F   = 80 25    10000000 00100101

A   = 40 06    01000000 00000110
AA  = 38 13    00111000 00010011
AAA = 69 06    01101001 00000110

ABC = 93 06    10010011 00000110

Anyone have any ideas on this?

GroovyBee · July 17, 2010

It's some form of radix 40, e.g.

(X-'A')*1 + (Y-'A')*40 + (Z-'A')*1600

That'll fit into an unsigned 16 bit word.
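A quick way to check the "fits in 16 bits" claim is to compute the largest value three base-40 digits can produce. This is only a sketch: the name `pack3` is made up for illustration, and the digit order simply follows the formula as posted (first weight 1, last weight 1600), which is a first guess rather than the confirmed layout.

```python
def pack3(d0, d1, d2):
    """Combine three base-40 digits (each 0..39) into one integer."""
    for d in (d0, d1, d2):
        if not 0 <= d <= 39:
            raise ValueError("digit out of range for radix 40")
    return d0 * 1 + d1 * 40 + d2 * 1600

largest = pack3(39, 39, 39)
print(largest)            # 63999
print(largest < 2 ** 16)  # True: 40**3 = 64000 values fit below 65536
```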

SeaGtGruff · July 17, 2010

GroovyBee wrote:

"It's some form of radix 40, e.g. (X-'A')*1 + (Y-'A')*40 + (Z-'A')*1600. That'll fit into an unsigned 16 bit word."

Yes, I said the same thing, but I lost my internet connection while I was typing my reply, so it got lost when I tried to post it.

Notice the pattern with some of the values shown:

b = 00 00 (I'm guessing that 00 00 is a space) ?

A = 40 06 (add 40 06)

B = 80 0C (add 40 06)

C = C0 12 (add 40 06)

D = 00 19 (add 40 06, then add the carry flag)

E = 40 1F (add 40 06)

F = 80 25 (add 40 06)

G = C0 2B (add 40 06) ?

H = 00 32 (add 40 06, then add the carry flag) ?

etc.

Michael

Edit: Also, notice that it takes the same number of bytes to code 1, 2, or 3 characters, further pointing to a base-40 system.

The only other way I know to code 3 characters in 2 bytes is to split the bits, 5 bits per character, with 1 bit left over, but that gives only 32 characters.
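For comparison, here is a rough sketch of the 5-bits-per-character alternative. The function name `pack_5bit` and the bit order are assumptions made up for illustration, not something seen in the game files.

```python
def pack_5bit(a, b, c):
    """Pack three 5-bit codes (0..31) into the low 15 bits of a 16-bit word."""
    for v in (a, b, c):
        if not 0 <= v <= 31:
            raise ValueError("code does not fit in 5 bits")
    return (a << 10) | (b << 5) | c

print(hex(pack_5bit(31, 31, 31)))  # 0x7fff, the top bit is never used
print(32 ** 3)                     # 32768 three-character combinations
print(40 ** 3)                     # 64000 combinations with base 40
```

Bit splitting leaves one bit unused and only covers 32 symbols, while base 40 uses almost the whole 16-bit range, which is why 40 distinct characters point away from the 5-bit scheme.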

Edited July 17, 2010 by SeaGtGruff

SeaGtGruff · July 17, 2010

DanBoris wrote: "Anyone have any ideas on this?"

Adding to my previous comments, I suggest looking at the encoding systematically (b = SPACE):

bbb = ?? ??

bbA = ?? ??

bbB = ?? ??

bbC = ?? ??

etc.

That should give you the values for the 1s place (0 to 39).

The rest should be a matter of just multiplying by decimal 40 for the 10s place, or by decimal 1600 for the 100s place, but you could verify that systematically:

bAb = ?? ?? (should be the same as bbA times decimal 40)

bBb = ?? ?? (should be the same as bbB times decimal 40)

bCb = ?? ?? (should be the same as bbC times decimal 40)

etc.

Abb = 40 06 (should be the same as bAb times decimal 40, or bbA times decimal 1600)

Bbb = 80 0C

etc.

But you'd have to take the carry into consideration, since it appears that the carry might be getting added back to the lo byte?

Michael

SeaGtGruff · July 17, 2010

Another thought:

I think the values shown are lo byte first:

Abb = hex 40 06 = $0640 = decimal 1600 = 1*1600

Bbb = hex 80 0C = $0C80 = decimal 3200 = 2*1600

Cbb = hex C0 12 = $12C0 = decimal 4800 = 3*1600

Dbb = hex 00 19 = $1900 = decimal 6400 = 4*1600

etc.
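The lo-byte-first reading can be checked against the single-letter samples from the first post. A minimal sketch, where the helper name `word_le` is made up for illustration:

```python
import struct

# Byte pairs exactly as posted for the single letters A to F.
samples = {
    "A": bytes([0x40, 0x06]),
    "B": bytes([0x80, 0x0C]),
    "C": bytes([0xC0, 0x12]),
    "D": bytes([0x00, 0x19]),
    "E": bytes([0x40, 0x1F]),
    "F": bytes([0x80, 0x25]),
}

def word_le(pair):
    """Read two bytes as an unsigned 16-bit little-endian word."""
    return struct.unpack("<H", pair)[0]

for letter, pair in samples.items():
    value = word_le(pair)
    multiple, remainder = divmod(value, 1600)
    print(letter, hex(value), value, multiple, remainder)
```

Each sample comes out as an exact multiple of 1600 (1 for A up to 6 for F) with no remainder, so no carry handling is needed once the bytes are swapped.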

Michael

GroovyBee · July 17, 2010

SeaGtGruff wrote: "I think the values shown are lo byte first:"

Makes sense, since the files come from an x86-based machine, which is little-endian.

SeaGtGruff · July 17, 2010

This seems to work for some, but not all, of the examples you posted:

bbb = $0000 = 0

bbA = $0001 = 1

bbB = $0002 = 2

bbC = $0003 = 3

bAb = $0028 = 1*40

bBb = $0050 = 2*40

bCb = $0078 = 3*40

Abb = $0640 = 1*1600

Bbb = $0C80 = 2*1600

Cbb = $12C0 = 3*1600

AAb = $0640+$0028=$0668 -- you gave $1338, or 38 13

AAA = $0640+$0028+$0001=$0669

ABC = $0640+$0050+$0003=$0693

By my figuring, $1338 should be CCb, not AAb.
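The sums above can be reproduced with a few lines of code. This is a sketch that assumes space = 0, A = 1, B = 2 and so on, with the first character in the 1600s place; `encode3` and `decode3` are made-up names, and only space and letters are handled here.

```python
LETTERS = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # index = assumed value (space = 0)

def encode3(text):
    """Turn exactly three characters into a base-40 value."""
    first, second, third = (LETTERS.index(c) for c in text)
    return first * 1600 + second * 40 + third

def decode3(value):
    """Split a value back into three characters (letters and space only)."""
    first, rest = divmod(value, 1600)
    second, third = divmod(rest, 40)
    return LETTERS[first] + LETTERS[second] + LETTERS[third]

for text in ("AA ", "AAA", "ABC"):
    print(repr(text), hex(encode3(text)))  # 0x668, 0x669, 0x693

print(repr(decode3(0x1338)))  # 'CC '
```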

Michael

Edited July 17, 2010 by SeaGtGruff

SeaGtGruff · July 17, 2010

Since 1 (presumably 1bb) is encoded as C0 A8, or $A8C0, which is decimal 43200, which is 27*1600, I'm guessing the characters have the following values:

b = 0 (space)

A = 1

B = 2

C = 3

D = 4

E = 5

F = 6

G = 7

H = 8

I = 9

J = 10

K = 11

L = 12

M = 13

N = 14

O = 15

P = 16

Q = 17

R = 18

S = 19

T = 20

U = 21

V = 22

W = 23

X = 24

Y = 25

Z = 26

1 = 27

2 = 28

3 = 29

4 = 30

5 = 31

6 = 32

7 = 33

8 = 34

9 = 35

0 = 36

? = 37 (unknown symbol)

? = 38 (unknown symbol)

? = 39 (unknown symbol)

These are multiplied by 40^0=1, 40^1=40, or 40^2=1600, depending on their position. In example ABC, A is in the 100s place, B is in the 10s place, and C is in the 1s place.
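Putting the table and the byte order together gives a small encoder and decoder. Treat this as a sketch under stated assumptions: `encode_word` and `decode_word` are made-up names, the three unknown symbols (values 37 to 39) are shown as a `?` placeholder when decoding, and padding short names with spaces is inferred from single letters such as A encoding as 40 06.

```python
# Values 0..36 from the table above; 37..39 are unknown symbols.
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"

def decode_word(lo, hi):
    """Decode one 2-byte word (lo byte first) into three characters."""
    value = lo | (hi << 8)
    if value >= 40 ** 3:
        raise ValueError("not a valid base-40 word")
    first, rest = divmod(value, 1600)
    second, third = divmod(rest, 40)
    # '?' is only a placeholder for the unidentified symbols 37..39.
    return "".join(ALPHABET[d] if d < len(ALPHABET) else "?"
                   for d in (first, second, third))

def encode_word(text):
    """Encode up to three known characters into 2 bytes, lo byte first."""
    padded = text.ljust(3)  # assumption: short names are space padded
    if len(padded) != 3:
        raise ValueError("expected at most three characters")
    value = 0
    for ch in padded:
        value = value * 40 + ALPHABET.index(ch)  # ValueError if unknown
    return bytes([value & 0xFF, value >> 8])

print(encode_word("ABC").hex(" "))   # 93 06
print(encode_word("1").hex(" "))     # c0 a8
print(repr(decode_word(0x69, 0x06))) # 'AAA'
```

A longer name would presumably be a run of such 2-byte words, three characters each, but that is a guess until more of the file is decoded.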

Michael

DanBoris · July 17, 2010

You guys rock! Thanks!

Yes, "AAb = $1338" was a mistake, $0668 is the correct value.

Dan

Edited July 17, 2010 by DanBoris
