Dealer Demo part 2, Let's make a disassembler!
So to decompile the Dealer Demo, we need to start by peeking at the boot sectors to see how it starts.
000010: SECTOR: 001: FILE:
 0 ff 01 80 04 c0 e4 a9 f4-d0 06 00 0d 01 00 80 00 ................
10 85 f0 a9 52 8d 02 03 ad-8c 04 8d 0a 03 ad 8d 04 ...R............
The first six bytes of the boot sector describe the boot sequence. The first byte is a flag byte (0xff) and the second byte is the number of sectors to load (DBSECT), in this case one. The third and fourth bytes are the address to start copying the sectors to on boot (BOOTAD), in this case $0480. The fifth and sixth bytes are the initialization address (DOSINI), in this case $e4c0, a pointer to an RTS instruction in the 800 A/B O.S. Execution starts at the seventh byte, immediately after this header.
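To make the layout concrete, here is a minimal sketch that unpacks the six header bytes from the dump above. The variable names ($copy_to, $vector, $entry) are mine, not anything the OS defines, and the entry point is simply the load address plus the six header bytes.

```perl
use strict;
use warnings;

# First six bytes of sector 1, copied from the hex dump above.
my $header = pack "C*", 0xff, 0x01, 0x80, 0x04, 0xc0, 0xe4;

# C = unsigned byte, v = 16-bit little-endian word.
my ($flags, $nsectors, $copy_to, $vector) = unpack "C C v v", $header;

# Code begins at the seventh byte, i.e. six bytes past the load address.
my $entry = $copy_to + 6;

printf "flags=\$%02X sectors=%d load=\$%04X vector=\$%04X entry=\$%04X\n",
    $flags, $nsectors, $copy_to, $vector, $entry;
```

The "v" template handles the byte swap, which is why 80 04 in the dump comes out as $0480.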
So the Dealer Demo boot is pretty spartan. It loads a single sector and then that sector has to load the rest of the program. So we need to disassemble the rest of sector 1.
When I first disassembled this, I used Altirra's debugger (after booting the disk, break into the debugger and type "u 480"), but that had a few drawbacks. By default, Altirra will disassemble both valid and "invalid" opcodes, so when the disassembly runs through a mixture of data and code, it generates a lot of weird disassembly that is really just data. Since the invalid opcodes are rarely used, I'd rather it just emit .BYTE statements when it hits them. Another drawback is that it disassembles what's in memory, and some of that memory has changed by the time we get around to disassembling it. So I decided it was time to move away from Altirra and write my own disassembler.
Writing a disassembler is pretty easy. You need an array containing a definition of each of the 256 possible opcodes. The undefined ones will just map to .BYTE <value>, but the valid ones will have a name and a mode. The mode determines how much additional data to read (0, 1 or 2 bytes) and how to format the line. Move the pointer past the opcode and its operand bytes and repeat the process. In other words, something like so:
sub disasm_buf {
    my ($buff, $addr, $size) = @_;
    my $opcodes = opcodes();
    # Walk the buffer one instruction at a time; print_op reports
    # how many bytes the instruction occupied.
    for (my $i = 0; $i + 1 < $size;) {
        my $code = unpack "C", substr($buff, $i, 1);
        my $def = $opcodes->[$code];
        my $count = print_op($buff, $addr, $i, $def);
        $i += $count;
    }
}
Here all the heavy lifting will be done by print_op, which relies on the definitions from the opcodes() function to know what to print.
What does opcodes look like?
sub opcodes {
    my $opcodes = [];
    # Start with every opcode marked illegal; valid ones are filled in below.
    $opcodes->[$_] = { name => (sprintf ".BYTE \$%02X", $_), mode => 'ILL' } for (0..255);
    # Each entry is two hex digits of opcode followed by the mnemonic.
    my $modes = {
        'IMP' => [ '00BRK', '40RTI', '60RTS', '08PHP', '18CLC', '28PLP', '38SEC',
                   '48PHA', '58CLI', '68PLA', '78SEI', '88DEY', '98TYA', 'a8TAY',
                   'b8CLV', 'c8INY', 'd8CLD', 'e8INX', 'f8SED', '8aTXA', '9aTXS',
                   'aaTAX', 'baTSX', 'caDEX', 'eaNOP' ],
        'ZP' => [ '24BIT', '84STY', 'a4LDY', 'c4CPY', 'e4CPX', '05ORA', '25AND',
                  '45EOR', '65ADC', '85STA', 'a5LDA', 'c5CMP', 'e5SBC', '06ASL',
                  '26ROL', '46LSR', '66ROR', '86STX', 'a6LDX', 'c6DEC', 'e6INC' ],
        'ZP,X' => [ '94STY', 'b4LDY', '15ORA', '35AND', '55EOR', '75ADC', '95STA',
                    'b5LDA', 'd5CMP', 'f5SBC', '16ASL', '36ROL', '56LSR', '76ROR',
                    'd6DEC', 'f6INC' ],
        'ZP,Y' => [ '96STX', 'b6LDX' ],
        'IMM' => [ 'a0LDY', 'c0CPY', 'e0CPX', 'a2LDX', '09ORA', '29AND', '49EOR',
                   '69ADC', 'a9LDA', 'c9CMP', 'e9SBC' ],
        'REL' => [ '10BPL', '30BMI', '50BVC', '70BVS', '90BCC', 'b0BCS', 'd0BNE',
                   'f0BEQ' ],
        'ABS' => [ '20JSR', '2cBIT', '4cJMP', '8cSTY', 'acLDY', 'ccCPY', 'ecCPX',
                   '0dORA', '2dAND', '4dEOR', '6dADC', '8dSTA', 'adLDA', 'cdCMP',
                   'edSBC', '0eASL', '2eROL', '4eLSR', '6eROR', '8eSTX', 'aeLDX',
                   'ceDEC', 'eeINC' ],
        'ABS,X' => [ 'bcLDY', '1dORA', '3dAND', '5dEOR', '7dADC', '9dSTA', 'bdLDA',
                     'ddCMP', 'fdSBC', '1eASL', '3eROL', '5eLSR', '7eROR', 'deDEC',
                     'feINC' ],
        'ABS,Y' => [ '19ORA', '39AND', '59EOR', '79ADC', '99STA', 'b9LDA', 'd9CMP',
                     'f9SBC', 'beLDX' ],
        ')Y' => [ '11ORA', '31AND', '51EOR', '71ADC', '91STA', 'b1LDA', 'd1CMP',
                  'f1SBC' ],
        'X)' => [ '01ORA', '21AND', '41EOR', '61ADC', '81STA', 'a1LDA', 'c1CMP',
                  'e1SBC' ],
        'ACC' => [ '0aASL', '2aROL', '4aLSR', '6aROR' ],
        'IND' => [ '6cJMP' ],
    };
    # Expand the compact strings into { name, mode } records.
    foreach my $mode (keys %$modes) {
        $opcodes->[hex substr($_, 0, 2)] = { name => substr($_, 2), mode => $mode }
            for (@{$modes->{$mode}});
    }
    $opcodes;
}
Although it looks formidable, it really isn't. The function creates an array of 256 definitions, with a name of ".BYTE $xx" and a mode of 'ILL', for illegal. It then takes data from the $modes hash table to fill in the valid opcodes. For example in the key 'ACC' there is a value '0aASL'. So that expands to $opcodes->[0x0a] = { name => 'ASL', mode => 'ACC' }, or opcode 0x0a is "ASL" in accumulator mode (i.e. ASL A). In other words, I've represented the valid opcode information in a compact form, and used a little code to expand it into a more useful representation.
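As a small illustration of that expansion step, the sketch below takes the single entry '0aASL' from the 'ACC' list and splits it the same way the loop in opcodes() does. It is a standalone fragment for demonstration, not part of the disassembler itself.

```perl
use strict;
use warnings;

# One compact entry: two hex digits of opcode followed by the mnemonic.
my ($mode, $entry) = ('ACC', '0aASL');

my $code = hex substr($entry, 0, 2);                      # first two characters -> 0x0a
my $def  = { name => substr($entry, 2), mode => $mode };  # the rest -> 'ASL'

printf "opcode \$%02X => %s (%s)\n", $code, $def->{name}, $def->{mode};
```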
When I first did this, of course, I missed a few opcodes and reversed SEI and SED (which went undetected for a long time because they are rare opcodes), but I've used this routine for several months and am fairly certain it's accurate now.
print_op is a bit more complicated, but not too horrible. To implement it, let's first write a helper function.
sub sb {
    # Each format is padded with spaces to exactly 16 characters.
    if (scalar(@_) == 2) {
        sprintf "%04X: %02X        ", @_;
    } elsif (scalar(@_) == 3) {
        sprintf "%04X: %02X %02X     ", @_;
    } elsif (scalar(@_) == 4) {
        sprintf "%04X: %02X %02X %02X  ", @_;
    }
}
This routine is designed to show bytes (thus the name sb) in the first 16 characters of a line: sb(0x480, 0xff, 0x01, 0x80) for instance would show:
0480: FF 01 80
OK, we're now ready for print_op.
our $names = {};

sub print_op {
    my ($buff, $addr, $i, $def) = @_;
    my $mode = $def->{mode};
    my $sval = '';
    my $count = 1;
    if ($mode eq 'IMM') {
        # One-byte immediate operand; small values print as plain decimal.
        my $val = unpack "C", substr($buff, $i + 1, 1);
        $sval = sprintf "\$%02X", $val;
        $sval = $val if $val < 10;
        $sval = " #$sval";
        $count = 2;
    } elsif ($mode eq 'ZP' || $mode eq 'ZP,X' || $mode eq 'ZP,Y' || $mode eq ')Y' || $mode eq 'X)') {
        # One-byte zero page address, optionally indexed or indirect.
        my $val = unpack "C", substr($buff, $i + 1, 1);
        $sval = $names->{$val} || sprintf "\$%02X", $val;
        $sval = $val if $val < 10;
        $sval = " $sval" if $mode eq 'ZP';
        $sval = " $sval,X" if $mode eq 'ZP,X';
        $sval = " $sval,Y" if $mode eq 'ZP,Y';
        $sval = " ($sval,X)" if $mode eq 'X)';
        $sval = " ($sval),Y" if $mode eq ')Y';
        $count = 2;
    } elsif ($mode eq 'REL') {
        # Signed one-byte offset, relative to the address of the next instruction.
        my ($b1) = unpack "c", substr($buff, $i + 1, 1);
        my $val = $addr + $i + 2 + $b1;
        $sval = $names->{$val} || sprintf "\$%04X", $val;
        $sval = " $sval";
        $count = 2;
    } elsif ($mode eq 'ABS' || $mode eq 'ABS,Y' || $mode eq 'ABS,X' || $mode eq 'IND') {
        # Two-byte little-endian address.
        my $val = unpack "v", substr($buff, $i + 1, 2);
        $sval = $names->{$val} || sprintf "\$%04X", $val;
        $sval = " $sval" if $mode eq 'ABS';
        $sval = " $sval,X" if $mode eq 'ABS,X';
        $sval = " $sval,Y" if $mode eq 'ABS,Y';
        $sval = " ($sval)" if $mode eq 'IND';
        $count = 3;
    } elsif ($mode eq 'ACC') {
        $sval = ' A';
    } elsif ($mode eq 'IMP' || $mode eq 'ILL') {
        # No operand.
    } else {
        die "$mode";
    }
    my $string = sprintf "%s%s%s\n", ' ', $def->{name}, $sval;
    print sb($addr + $i, unpack "C*", substr($buff, $i, $count)), $string;
    # Tell the caller how many bytes this instruction used.
    return $count;
}
So for each mode, we compute the value to append after the definition name. For instance, for 'ACC' we set $sval = ' A', and for 'REL' (a relative branch) we compute the destination and use that for $sval.
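The relative branch arithmetic is the least obvious case, so here it is in isolation. The helper name branch_target is hypothetical, and the 16-bit mask is my addition (print_op does not wrap the result); the example values are the BNE at $0488 from the boot sector.

```perl
use strict;
use warnings;

# Hypothetical helper: target of a relative branch whose opcode sits at $pc
# and whose operand byte is $operand (0..255).
sub branch_target {
    my ($pc, $operand) = @_;
    # Reinterpret the unsigned byte as a signed value in -128..127.
    my $disp = unpack("c", pack("C", $operand));
    # The offset counts from the instruction after the two-byte branch.
    return ($pc + 2 + $disp) & 0xFFFF;
}

printf "\$%04X\n", branch_target(0x0488, 0x06);   # forward branch: D0 06
printf "\$%04X\n", branch_target(0x0488, 0xFA);   # backward branch, for comparison
```

An operand of $06 lands at $0490, while $FA is -6 and lands at $0484.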
In a few spots there are references to the $names hash. It is empty for now, but eventually we will put well-known symbols into it so that we don't always output raw numbers.
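As a rough idea of what a populated hash might look like, the sketch below maps a handful of addresses to their usual Atari OS device control block labels and applies the same lookup-with-fallback that print_op uses for absolute operands. The choice of entries and the helper name operand_label are my assumptions for illustration, not the table this series ends up with.

```perl
use strict;
use warnings;

# Assumed entries: a few Atari OS device control block labels.
my $names = {
    0x0301 => 'DUNIT',
    0x0302 => 'DCOMND',
    0x0304 => 'DBUFLO',
    0x030A => 'DAUX1',
    0x030B => 'DAUX2',
};

# Hypothetical helper: use the symbol if one is known, else a hex address.
sub operand_label {
    my ($val) = @_;
    return $names->{$val} || sprintf "\$%04X", $val;
}

print "STA ", operand_label(0x0302), "\n";   # known address
print "LDA ", operand_label(0x048C), "\n";   # unknown address falls back to hex
```

Keying the hash by number keeps the lookup independent of how the address happens to be formatted.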
OK, we are almost there. We need a read_file routine of course. We also need a read_img routine that will help us translate addresses to offsets into the file. For now, we'll just hard code that the first sector is 0x480 to 0x4ff and change this in the future.
sub read_file {
    my ($file, $size, $pos) = @_;
    # Default to the whole file when no size is given.
    $size = -s $file if !defined $size || $size == 0;
    open my $fh, '<', $file or die "open: $file\n";
    binmode $fh;
    seek $fh, $pos, 0 if defined $pos;
    my $buff;
    read $fh, $buff, $size;
    close $fh;
    return $buff;
}

sub read_img {
    # Skip the 16-byte ATR header (0x10) and read one 128-byte sector (0x80).
    my $buff = read_file('../../atr/dealerdemo.atr', 0x80, 0x10);
    ($buff, 0x480, length($buff));
}
The path in read_img is for my particular setup; you may need to change it. OK, now let's hook it all up.
sub disasm {
    my ($buff, $addr, $size) = read_img(@_);
    disasm_buf($buff, $addr, $size);
}

disasm(@ARGV);
And run it to get a listing:
0480: FF         .BYTE $FF
0481: 01 80      ORA ($80,X)
0483: 04         .BYTE $04
0484: C0 E4      CPY #$E4
0486: A9 F4      LDA #$F4
0488: D0 06      BNE $0490
048A: 00         BRK
048B: 0D 01 00   ORA $0001
048E: 80         .BYTE $80
048F: 00         BRK
0490: 85 F0      STA $F0
0492: A9 52      LDA #$52
0494: 8D 02 03   STA $0302
0497: AD 8C 04   LDA $048C
049A: 8D 0A 03   STA $030A
049D: AD 8D 04   LDA $048D
04A0: 8D 0B 03   STA $030B
04A3: A9 01      LDA #1
04A5: 8D 01 03   STA $0301
04A8: AD 8A 04   LDA $048A
04AB: 8D 04 03   STA $0304
Not perfect, but a great start for only ~160 lines of code. We need to start populating the $names hash to make this a bit more readable, and improve read_img so we can target where the disassembly takes place, but we'll talk about how we're going to do that next time.