DanBoris Posted July 17, 2010

I need some help figuring something out, and thought some of the people here might spot whatever I am missing...

I'm playing around with decoding the format of the MS-DOS Adventure Construction Set game files. Some of it has been pretty easy to figure out, but when I came to the object names I found that the text is compressed, and I can't quite figure out the logic. Here is what I know:

- Each character can be one of 40 characters (26 letters, 10 digits, and 4 symbols)
- Every three characters are encoded into 2 bytes
- Here are some of the encodings I have seen, showing the character, hex values and binary values:

1   = C0 A8   11000000 10101000
A   = 40 06   01000000 00000110
B   = 80 0C   10000000 00001100
C   = C0 12   11000000 00010010
D   = 00 19   00000000 00011001
E   = 40 1F   01000000 00011111
F   = 80 25   10000000 00100101
A   = 40 06   01000000 00000110
AA  = 38 13   00111000 00010011
AAA = 69 06   01101001 00000110
ABC = 93 06   10010011 00000110

Anyone have any ideas on this?
GroovyBee Posted July 17, 2010

It's some form of radix 40, e.g.

(X-'A')*1 + (Y-'A')*40 + (Z-'A')*1600

That'll fit into an unsigned 16-bit word.
SeaGtGruff Posted July 17, 2010

> It's some form of radix 40, e.g. (X-'A')*1 + (Y-'A')*40 + (Z-'A')*1600
> That'll fit into an unsigned 16-bit word.

Yes, I said the same thing, but I lost my internet connection while I was typing my reply, so it got lost when I tried to post it. Notice the pattern with some of the values shown (entries marked ? are my guesses, not values you posted):

b = 00 00 (I'm guessing that 00 00 is a space) ?
A = 40 06 (add 40 06)
B = 80 0C (add 40 06)
C = C0 12 (add 40 06)
D = 00 19 (add 40 06, then add the carry flag)
E = 40 1F (add 40 06)
F = 80 25 (add 40 06)
G = C0 2B (add 40 06) ?
H = 00 32 (add 40 06, then add the carry flag) ?
etc.

Michael

Edit: Also, notice that it takes the same number of bytes to code 1, 2, or 3 characters, further pointing to a base-40 system. The only other way I know to code 3 characters in 2 bytes is to split the bits, 5 bits per character, with 1 bit left over, but that gives only 32 characters.

Edited July 17, 2010 by SeaGtGruff
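The capacity argument in the edit above can be checked with a few lines of arithmetic. This is only an illustrative sketch; the variable names are made up for the example and nothing here comes from the game files.

```python
# Compare the two ways of fitting three characters into one 16-bit word.
WORD_STATES = 2 ** 16            # 65536 distinct values in an unsigned 16-bit word

base40_strings = 40 ** 3         # number of three-character strings in base 40
five_bit_alphabet = 2 ** 5       # alphabet size when each character gets 5 bits
five_bit_total = 3 * 5           # bits used by three 5-bit characters

print("base 40:", base40_strings, "strings, fits:", base40_strings <= WORD_STATES)
print("5-bit fields:", five_bit_alphabet, "characters,", 16 - five_bit_total, "bit left over")
```

So three base-40 digits need 64000 values, which fits under 65536, while fixed 5-bit fields cap the alphabet at 32 characters.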
SeaGtGruff Posted July 17, 2010

> Anyone have any ideas on this?

Adding to my previous comments, I suggest looking at the encoding systematically (b = SPACE):

bbb = ?? ??
bbA = ?? ??
bbB = ?? ??
bbC = ?? ??
etc.

That should give you the values for the 1s place (0 to 39). The rest should be a matter of just multiplying by decimal 40 for the 10s place, or by decimal 1600 for the 100s place, but you could verify that systematically:

bAb = ?? ?? (should be the same as bbA times decimal 40)
bBb = ?? ?? (should be the same as bbB times decimal 40)
bCb = ?? ?? (should be the same as bbC times decimal 40)
etc.

Abb = 40 06 (should be the same as bAb times decimal 40, or bbA times decimal 1600)
Bbb = 80 0C
etc.

But you'd have to take the carry into consideration, since it appears that the carry might be getting added back to the lo byte?

Michael
SeaGtGruff Posted July 17, 2010

Another thought: I think the values shown are lo byte first:

Abb = hex 40 06 = $0640 = decimal 1600 = 1*1600
Bbb = hex 80 0C = $0C80 = decimal 3200 = 2*1600
Cbb = hex C0 12 = $12C0 = decimal 4800 = 3*1600
Dbb = hex 00 19 = $1900 = decimal 6400 = 4*1600
etc.

Michael
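The lo-byte-first reading above is easy to reproduce. A minimal sketch, assuming the byte pairs are typed in as hex strings; `word_from_le` is a hypothetical helper name, not anything from the game or its tools.

```python
def word_from_le(hex_pair: str) -> int:
    """Read two hex bytes such as '40 06' as a little-endian 16-bit value."""
    return int.from_bytes(bytes.fromhex(hex_pair), "little")

# Byte pairs posted earlier in the thread for single letters.
samples = [("Abb", "40 06"), ("Bbb", "80 0C"), ("Cbb", "C0 12"), ("Dbb", "00 19")]

for label, hex_pair in samples:
    value = word_from_le(hex_pair)
    quotient, remainder = divmod(value, 1600)
    print(f"{label}: ${value:04X} = {value} = {quotient}*1600 + {remainder}")
```

Each sample divides by 1600 with no remainder, which is what you would expect if the first character sits in the 1600s place and the other two positions hold spaces.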
GroovyBee Posted July 17, 2010

> I think the values shown are lo byte first:

Makes sense, since the files come from an x86-based machine, which is little-endian.
SeaGtGruff Posted July 17, 2010

This seems to work for some, but not all, of the examples you posted:

bbb = $0000 = 0
bbA = $0001 = 1
bbB = $0002 = 2
bbC = $0003 = 3

bAb = $0028 = 1*40
bBb = $0050 = 2*40
bCb = $0078 = 3*40

Abb = $0640 = 1*1600
Bbb = $0C80 = 2*1600
Cbb = $12C0 = 3*1600

AAb = $0640+$0028=$0668 -- you gave $1338, or 38 13
AAA = $0640+$0028+$0001=$0669
ABC = $0640+$0050+$0003=$0693

By my figuring, $1338 should be CCb, not AAb.

Michael

Edited July 17, 2010 by SeaGtGruff
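The sums above can be turned into a small encoder to cross-check the posted byte pairs. This is a sketch under stated assumptions: space = 0 and A to Z = 1 to 26 (only A to F are confirmed by the data so far), short names padded with spaces, and `encode3` is a hypothetical name.

```python
# Assumption: space = 0, A = 1, ..., Z = 26. Digits and symbols are left out here.
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode3(text: str) -> bytes:
    """Pack up to three characters into two bytes, low byte first."""
    padded = text.upper().ljust(3)            # pad short names with spaces
    if len(padded) != 3:
        raise ValueError("expected at most three characters")
    value = 0
    for ch in padded:
        # Horner's rule: the first character ends up multiplied by 1600,
        # the second by 40, the third by 1.
        value = value * 40 + ALPHABET.index(ch)
    return value.to_bytes(2, "little")

for name in ("A", "AA", "AAA", "ABC"):
    print(name.ljust(3), "=", encode3(name).hex(" ").upper())
```

Under these assumptions A, AAA and ABC come out as 40 06, 69 06 and 93 06, matching the original list, while AA comes out as 68 06 rather than the 38 13 that was posted.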
SeaGtGruff Posted July 17, 2010

Since 1 (presumably 1bb) is encoded as C0 A8, or $A8C0, which is decimal 43200, which is 27*1600, I'm guessing the characters have the following values:

b = 0 (space)
A = 1    B = 2    C = 3    D = 4    E = 5    F = 6    G = 7    H = 8    I = 9
J = 10   K = 11   L = 12   M = 13   N = 14   O = 15   P = 16   Q = 17   R = 18
S = 19   T = 20   U = 21   V = 22   W = 23   X = 24   Y = 25   Z = 26
1 = 27   2 = 28   3 = 29   4 = 30   5 = 31   6 = 32   7 = 33   8 = 34   9 = 35
0 = 36
? = 37 (unknown symbol)
? = 38 (unknown symbol)
? = 39 (unknown symbol)

These are multiplied by 40^0=1, 40^1=40, or 40^2=1600, depending on their position. In example ABC, A is in the 100s place, B is in the 10s place, and C is in the 1s place.

Michael
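Going the other way, the guessed table above is enough for a rough decoder. A minimal sketch, not a confirmed description of the file format: the three unidentified symbols are printed as '?' placeholders, the names `decode_word` and `decode_name` are hypothetical, and it assumes a name is simply a run of 2-byte words with trailing spaces as padding.

```python
# Values 0-36 follow the guessed table; 37-39 are the unidentified symbols,
# rendered as '?' placeholders (an assumption, not recovered from the game).
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890???"

def decode_word(pair: bytes) -> str:
    """Turn one little-endian 16-bit word into three characters."""
    value = int.from_bytes(pair, "little")
    if value >= 40 ** 3:
        raise ValueError(f"${value:04X} is outside the base-40 range")
    first, rest = divmod(value, 1600)     # 100s place (40^2)
    second, third = divmod(rest, 40)      # 10s place (40^1) and 1s place (40^0)
    return ALPHABET[first] + ALPHABET[second] + ALPHABET[third]

def decode_name(data: bytes) -> str:
    """Decode consecutive 2-byte words; a trailing odd byte is ignored."""
    words = [decode_word(data[i:i + 2]) for i in range(0, len(data) - 1, 2)]
    return "".join(words).rstrip()        # drop the space padding at the end

print(repr(decode_word(bytes.fromhex("C0 A8"))))
print(repr(decode_word(bytes.fromhex("93 06"))))
```

With the thread's examples, C0 A8 decodes to "1" followed by two spaces and 93 06 decodes to "ABC". Once the three symbols are identified, only the last three characters of the alphabet string need to change.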
DanBoris Posted July 17, 2010

You guys rock! Thanks! Yes, "AAb = $1338" was a mistake; $0668 is the correct value.

Dan

Edited July 17, 2010 by DanBoris