Dealer Demo, part 7: Forward is the new Back
Thus far in disassembling the Dealer Demo, we've run across primitive definitions that have a PFA of .WORD *+2 and then some 6502 assembly code. There were two exceptions I didn't draw attention to at the time, the word 'I' and the word 'DROP'. DROP's PFA was simply ".WORD POP". In other words, it took advantage of the fact that POP falls into NEXT, so the semantics of DROP could use that routine directly. I's PFA was similarly ".WORD R+2". In other words, it had the same implementation as the word R, since the current loop counter is at the top of the return stack.
What's more interesting about the word I is that its implementation comes later in the listing. The Forth compiler generally can't deal with forward references: words are built out of existing words, not words that have yet to be defined. An assembler doesn't have that limitation, so a Forth kernel that is bootstrapped with an assembler can put the kernel words in almost any order. Our decompiler, however, does have that limitation. We can't emit a name we don't yet know, so generally our tool will just have to emit a hex address instead, and we'll have to remember to go back and fix it later when we discover the label for that address. In the case of 'I', I filled it in without comment at the time, but that was actually when I should have thought about how to address the problem.
So let's revert that bit of future knowledge and write some code that tries to detect when that happens:
sub fwdrefs {
    my $fh = open_lst();
    my $data = {};
    while (<$fh>) {
        # A listing line whose operand is still a raw $XXXX address
        if (/^([0-9A-F]{4}): .*(?:\.WORD|JMP|JSR|BNE|BEQ|BC[CS]|BMI|BPL|BV[CS]) \$([0-9A-F]{4})$/) {
            my ($addr, $val) = (hex $1, hex $2);
            push @{$data->{$val}}, $addr if exists $names->{$val};
        }
    }
    # Report in order of target address
    foreach my $key (sort { $a <=> $b } keys %$data) {
        my $addrs = join ',', map { sprintf "%04x", $_ } @{$data->{$key}};
        printf "\$%04x: %s: %s\n", $key, $names->{$key}, $addrs;
    }
}
We also need to add a little stub in main to invoke this with '-refs'.
elsif ($opt =~ /^\-refs/) {
    fwdrefs(@_);
}
What this code does is scan the listing for every .WORD, JMP, JSR, or branch whose operand is still a raw hex address, and record it if that address is also in our name table. We then output each such address, the name we should have used, and the locations that refer to it.
If we run this now, we get:
$0490: XBOOT: 0488
$1098: R+2: 0e56
Sure enough it detected the forward reference we reverted, and another one in the bootloader where I added a label for a branch but then forgot to use the label (oops). So let's fix those now.
We can periodically run this after disassembling to detect when we have put labels in that cover previous forward references.
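To see which lines the pattern in fwdrefs picks up, here is a minimal standalone sketch that applies the same regular expression to a few listing lines. The sample lines, the column spacing, the BNE mnemonic, and the %names table are made up for illustration; the real tool reads its lines through open_lst() and uses the shared name table.

```perl
use strict;
use warnings;

# Hypothetical name table: address => label (the real tool builds $names elsewhere).
my %names = (0x1098 => 'R+2', 0x0490 => 'XBOOT');

# Sample listing lines in the "ADDR: bytes  mnemonic operand" layout (illustrative only).
my @lines = (
    '0E56: 98 10     .WORD $1098',   # raw hex operand that now has a name
    '0488: D0 06     BNE $0490',     # branch to an address that now has a name
    '1240: EF 11     .WORD AT',      # already uses a label, so no match
    '123A: 04 16     .WORD $1604',   # raw hex operand, but no name known yet
);

# Same pattern as fwdrefs: listing address, mnemonic, then a raw $XXXX operand.
my $re = qr/^([0-9A-F]{4}): .*(?:\.WORD|JMP|JSR|BNE|BEQ|BC[CS]|BMI|BPL|BV[CS]) \$([0-9A-F]{4})$/;

# Returns a hash ref: target address => list of addresses that refer to it.
sub find_refs {
    my %refs;
    for my $line (@_) {
        next unless $line =~ $re;
        my ($addr, $val) = (hex $1, hex $2);
        push @{ $refs{$val} }, $addr if exists $names{$val};
    }
    return \%refs;
}

my $refs = find_refs(@lines);
for my $target (sort { $a <=> $b } keys %$refs) {
    printf "\$%04x: %s: %s\n", $target, $names{$target},
        join ',', map { sprintf '%04x', $_ } @{ $refs->{$target} };
}
```

Only the first two sample lines are reported: the third already uses a label, and the fourth points at an address that has no name yet.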
The reason we're writing this reference code now is we're about to hit a cluster of forward references in the next word to decompile, the implementation of ':'. Running our tool with -def 1234 yields:
1234: C1 BA     .BYTE $C1,$BA
1236: 26 12     .WORD L813
1238: 4C 12     .WORD $124C
123A: 04 16     .WORD $1604
123C: C0 15     .WORD $15C0
123E: 31 14     .WORD $1431
1240: EF 11     .WORD AT
1242: 24 14     .WORD $1424
1244: 13 12     .WORD STORE
1246: 75 1A     .WORD $1A75
1248: 85 16     .WORD $1685
124A: D4 16     .WORD $16D4
124C: A5 F9     .WORD $F9A5
COLON is the first word we've run into that isn't a primitive. It is a sequence of other words. We recognize two of them in the listing as primitives we've already seen, AT and STORE, but most of the other addresses are yet to come. The PFA here starts with .WORD $124C, which is just at the end of the above block. Even without looking at the fig-Forth listing, you can guess it is 6502 code, so let's disassemble that and give it the name DOCOL that we find for it in the fig-Forth listing.
124C: A5 F9     DOCOL   LDA IP+1
124E: 48                PHA
124F: A5 F8             LDA IP
1251: 48                PHA
1252: 18                CLC
1253: A5 FB             LDA W
1255: 69 02             ADC #2
1257: 85 F8             STA IP
1259: 98                TYA
125A: 65 FC             ADC W+1
125C: 85 F9             STA IP+1
125E: 4C 42 0D          JMP NEXT
A pointer to this routine is found at the start of every Forth colon definition (thus the name DOCOL). It takes the current interpreter pointer and pushes it on the return stack. It then takes the code field pointer W (which is the location of this DOCOL word, since we just came from the NEXT routine), adds two and makes that the interpreter pointer. In other words, when you enter a colon definition, you record where to return to and then start interpreting the word after DOCOL. This is very similar to how JSR works, except that the callee pushes the IP, not the caller.
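The effect of DOCOL can be modeled in a few lines of Perl. This is only a sketch under simplifying assumptions: IP and W are treated as plain 16-bit numbers rather than zero-page byte pairs, the return stack is a Perl array, and the caller address $2002 is invented for the example. The semis helper is a stand-in for the matching exit performed by ;S.

```perl
use strict;
use warnings;

# Toy model of the inner interpreter state; field names mirror the listing (IP, W).
my %vm = (ip => 0, w => 0, rstack => []);

# DOCOL: save the caller's IP on the return stack, then start
# interpreting at the cell just past the code field that W points to.
sub docol {
    my ($vm) = @_;
    push @{ $vm->{rstack} }, $vm->{ip};
    $vm->{ip} = ($vm->{w} + 2) & 0xFFFF;
    return;
}

# Stand-in for ;S: restore the IP that DOCOL saved.
sub semis {
    my ($vm) = @_;
    $vm->{ip} = pop @{ $vm->{rstack} };
    return;
}

# Example: the caller's next word is at $2002 (hypothetical) when NEXT
# dispatches to COLON, whose ".WORD $124C" cell sits at $1238.
@vm{qw(ip w)} = (0x2002, 0x1238);
docol(\%vm);
printf "IP=\$%04X saved=\$%04X\n", $vm{ip}, $vm{rstack}[-1];
semis(\%vm);
printf "IP=\$%04X\n", $vm{ip};
```

After docol the interpreter pointer is $123A, the first word of COLON's body, and the saved value is the caller's $2002, which semis puts back.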
COLON is also the first word we've hit where the length byte has bit 6 set as well as bit 7. This bit is called the precedence bit, and is used during compilation. We won't likely discuss how compilation works in Forth for some time, so for now we can simply note that there's a bit in the definition that marks words that need to be handled differently.
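As a worked example of reading that length byte, the sketch below decodes a name field into its name and precedence flag. It assumes the usual fig-Forth layout (bit 7 always set on the length byte, bit 6 for precedence, the low five bits for the length, and bit 7 set again on the last character of the name); decode_nfa is a name made up for this illustration and is not part of our tool.

```perl
use strict;
use warnings;

# Decode a name field given as a list of byte values:
# a length byte followed by the name characters.
sub decode_nfa {
    my ($len_byte, @chars) = @_;
    my $len  = $len_byte & 0x1F;    # low five bits: name length
    # Strip bit 7, which marks the last character of the name.
    my $name = join '', map { chr($_ & 0x7F) } @chars[0 .. $len - 1];
    return {
        name       => $name,
        precedence => ($len_byte & 0x40) ? 1 : 0,    # bit 6
    };
}

my $colon = decode_nfa(0xC1, 0xBA);    # the two bytes at $1234
my $zero  = decode_nfa(0x81, 0xB0);    # the two bytes at $12CC
printf "%s precedence=%d\n", $_->{name}, $_->{precedence} for $colon, $zero;
```

The bytes $C1,$BA decode to the one-character name ':' with the precedence bit set, while $81,$B0 decode to '0' with it clear.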
If we put DOCOL into our listing and rerun -refs, it picks up that COLON starts with DOCOL (as we noted before), but the rest of the implementation is unknown. Most colon definitions end with ".WORD SEMIS", which we noted before restores IP from the return stack and jumps to NEXT. But a handful of words like COLON handle it internally; we'll have to postpone that discussion for COLON till we get to those words, specifically (;CODE).
The next word at $1261 is the word ';' (distinct from ';S'), followed by CONSTANT at $1273. Like COLON, CONSTANT's sequence of Forth words is followed by some 6502 code, here at $1288, so we need to manually disassemble that and add a DOCON label to it.
1288: A0 02     DOCON   LDY #2
128A: B1 FB             LDA (W),Y
128C: 48                PHA
128D: C8                INY
128E: B1 FB             LDA (W),Y
1290: 4C 3B 0D          JMP PUSH
DOCON takes the value stored after the current ".WORD DOCON" and pushes it onto the stack.
Next comes VARIABLE at $1293. It also is followed by a short 6502 routine called DOVAR at $12A4.
12A4: 18        DOVAR   CLC
12A5: A5 FB             LDA W
12A7: 69 02             ADC #2
12A9: 48                PHA
12AA: 98                TYA
12AB: 65 FC             ADC W+1
12AD: 4C 3B 0D          JMP PUSH
It is similar to DOCON, except it pushes the address after the DOVAR word, not its value. So a CONSTANT contains an invariant word, a VARIABLE contains a variant block of data.
The next word is USER, at $12B0, with a block of code called DOUSE at $12BD.
12BD: A0 02     DOUSE   LDY #2
12BF: 18                CLC
12C0: B1 FB             LDA (W),Y
12C2: 65 FD             ADC UP
12C4: 48                PHA
12C5: A9 00             LDA #0
12C7: 65 FE             ADC UP+1
12C9: 4C 3B 0D          JMP PUSH
The routine takes the byte after the ".WORD DOUSE" and adds it to UP, pushing the result onto the stack. In other words, it computes the address of some data in the user block.
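The three runtime routines DOCON, DOVAR, and DOUSE differ only in what they compute from W. The sketch below models each as a Perl function over a toy memory hash. It is an illustration, not part of the decompiler: the helper names, the code field addresses $3000 and $3010, the user-area base $0700, and the offset $26 are all assumptions chosen for the example. Only the bytes of the constant at $12E8 come from the listing.

```perl
use strict;
use warnings;

# Toy memory image: address => byte. Only the bytes we need are filled in.
my %mem;

sub poke {
    my ($addr, @bytes) = @_;
    $mem{$addr++} = $_ for @bytes;
    return;
}

# Read a little-endian 16-bit word.
sub peek16 {
    my ($addr) = @_;
    return ($mem{$addr} // 0) | (($mem{$addr + 1} // 0) << 8);
}

# Each routine receives W (the address of the word's code field) and returns
# the value the 6502 code would push on the data stack.
sub docon { my ($w) = @_;      return peek16($w + 2); }                        # the stored value
sub dovar { my ($w) = @_;      return ($w + 2) & 0xFFFF; }                     # the address of the data
sub douse { my ($w, $up) = @_; return ($up + ($mem{$w + 2} // 0)) & 0xFFFF; }  # UP + one-byte offset

# A constant holding 3: ".WORD DOCON" at $12E8, value at $12EA.
poke(0x12E8, 0x88, 0x12, 0x03, 0x00);
printf "DOCON -> %d\n", docon(0x12E8);

# A hypothetical variable whose code field sits at $3000.
printf "DOVAR -> \$%04X\n", dovar(0x3000);

# A hypothetical user variable with offset $26, user area assumed at $0700.
poke(0x3010, 0xBD, 0x12, 0x26);
printf "DOUSE -> \$%04X\n", douse(0x3010, 0x0700);
```

With these inputs the three calls yield 3, $3002, and $0726: a value, an address inside the definition, and an address inside the user block.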
We now come to a whole bunch of CONSTANT and USER definitions. Since those have a fixed size, and are then followed by the next definition, we should modify our decompiler to handle that more gracefully for us.
We start by adding some helpers before forth_buf.
sub print_byte {
    my ($buff, $addr, $i) = @_;
    my $b1 = unpack "C", substr($buff, $i, 1);
    my $sval = sprintf "\$%02X", $b1;
    $sval = sprintf "\$%01X", $b1 if $b1 < 16;
    $sval = $b1 if $b1 < 10;
    my $string = sprintf "%s.BYTE %s\n", get_label($addr + $i), $sval;
    print sb($addr + $i, $b1), $string;
}

sub get_byteq {
    return $_[0] == 0x0d5b || $_[0] == 0x12bd;    # CLIT, DOUSE
}

sub get_defq {
    my ($val, $val1) = @_;
    return $val == 0x1045 || $val1 == 0x1288 ||   # SEMIS, DOCON
           $val == 0x12bd || $val1 == 0x12a4;     # DOUSE, DOVAR
}
And then add the following into the inner loop of forth_buf.
if (get_byteq($val)) {
    print_byte($buff, $addr, $i);
    $i += 1;
}
if (get_defq($val, $val1)) {
    print "\n";
    def_buf(substr($buff, $i), $addr + $i, $size - $i);
    last;
}
This code deserves some explanation. Up to now, forth_buf has just dumped words of data to add to the listing, switching to disassembling code when the word happened to be *+2. Now it's going to look for a couple of other conditions. If the current word is CLIT or DOUSE, the next value is a BYTE, not a WORD, so emit that next using print_byte. If the current word is SEMIS or DOUSE, or if the last word was DOCON or DOVAR, let's switch to decompiling a definition by calling def_buf. Occasionally this is incorrect for DOVAR, as variables can be more than one word, but most of the time this heuristic works well.
In other words, we've changed forth_buf to determine when to start decompiling a definition automatically in a lot of cases.
Running this new code at $12CC leads to a much nicer output, the definitions for 0, 1, 2, and 3.
12CC: 81 B0     .BYTE $81,$B0
12CE: B0 12     .WORD L902
12D0: 88 12     .WORD DOCON
12D2: 00 00     .WORD 0

12D4: 81 B1     .BYTE $81,$B1
12D6: CC 12     .WORD $12CC
12D8: 88 12     .WORD DOCON
12DA: 01 00     .WORD 1

12DC: 81 B2     .BYTE $81,$B2
12DE: D4 12     .WORD $12D4
12E0: 88 12     .WORD DOCON
12E2: 02 00     .WORD 2

12E4: 81 B3     .BYTE $81,$B3
12E6: DC 12     .WORD $12DC
12E8: 88 12     .WORD DOCON
12EA: 03 00     .WORD 3
We still need to go back and fill in the labels, but we don't have to restart the decompilation for every word as we had to do for the primitives.
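The reason the heuristic is reliable for constants is that each one has the same shape: name field, link word, ".WORD DOCON", and a single value word, after which the next header begins. The standalone sketch below walks the 32 bytes from $12CC to $12EB, copied from the listing, using only that assumption. It is not how forth_buf and def_buf are written; walk_constants and word_at are hypothetical helpers for this illustration.

```perl
use strict;
use warnings;

my $DOCON = 0x1288;    # address of DOCON in the listing

# Raw bytes from $12CC to $12EB: four one-character constants.
my $base  = 0x12CC;
my @image = map { hex($_) } qw(
    81 B0 B0 12 88 12 00 00
    81 B1 CC 12 88 12 01 00
    81 B2 D4 12 88 12 02 00
    81 B3 DC 12 88 12 03 00
);

# Read a little-endian word at index $i of the image.
sub word_at {
    my ($i) = @_;
    return $image[$i] | ($image[$i + 1] << 8);
}

# Walk the image one definition at a time: name field, link, code field, value.
sub walk_constants {
    my @defs;
    my $i = 0;
    while ($i < @image) {
        my $len  = $image[$i] & 0x1F;
        my $name = join '', map { chr($_ & 0x7F) } @image[$i + 1 .. $i + $len];
        my $cfa  = $i + 1 + $len + 2;           # skip the name bytes and the link word
        last unless word_at($cfa) == $DOCON;    # only constants have this fixed shape
        push @defs, { nfa => $base + $i, name => $name, value => word_at($cfa + 2) };
        $i = $cfa + 4;                          # code field + value, then the next header
    }
    return @defs;
}

printf "\$%04X: %s = %d\n", @{$_}{qw(nfa name value)} for walk_constants();
```

The walk finds the four definitions at $12CC, $12D4, $12DC, and $12E4 without being told where any of them start, which is the property the decompiler change relies on.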
Note that we could have done something similar in the primitives, assuming that when we see a JMP NEXT or JMP PUT we're at the end of a definition, but that heuristic isn't as reliable as the one we've just added for non-primitive definitions.
So let's crank through definitions using our newly augmented decompiler. After 0, 1, 2, and 3 we find BL, C/L, FIRST, LIMIT, B/BUF, B/SCR and then HIMEM. HIMEM isn't in the fig-Forth listing. It's the first such word we've seen, so we'll need some conventions for labeling its NFA and PFA. All the NFAs so far have been labeled with L and the line number in the original listing, which, while not very descriptive, was adopted to follow the original source code as closely as possible. For new words, we are instead going to prefix them with NF, so HIMEM's NFA is simply NFHIMEM, and its PFA is simply HIMEM.
After HIMEM is +ORIGIN, which is the first traditional colon definition in the listing. By traditional, I mean it starts with DOCOL and ends with SEMIS.
After that come some more words original to the Dealer Demo, ICCM, ICCL, and ICCAL, which are CIO constants. Then come the fig-Forth words EMIT, CR, ?TERMINAL, and KEY.
We then hit the definitions of all the fig-Forth USER variables: TIB, WIDTH, WARNING, FENCE, DP, VOC-LINK, BLK, IN, OUT, SCR, OFFSET, CONTEXT, CURRENT, STATE, BASE, DPL, FLD, CSP, R#, HLD. Each reserves 2 bytes in the USER block, starting at $0A and ending at $31. Dealer Demo then adds two of its own USER variables, INPT and PHYSOFF.
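The offsets are easy to tabulate. The sketch below assumes the variables are allocated sequentially, two bytes each, in the order listed above, starting at $0A; user_offsets is a hypothetical helper written for this illustration, not part of our tool.

```perl
use strict;
use warnings;

# USER variables in the order the post lists them.
my @user_vars = (
    qw(TIB WIDTH WARNING FENCE DP VOC-LINK BLK IN OUT SCR),
    qw(OFFSET CONTEXT CURRENT STATE BASE DPL FLD CSP),
    'R#', 'HLD',    # quoted separately: a '#' inside qw() triggers a warning
);

# Assumption: offsets are handed out sequentially, two bytes each.
sub user_offsets {
    my ($first, @names) = @_;
    my %offset;
    my $next = $first;
    for my $name (@names) {
        $offset{$name} = $next;
        $next += 2;
    }
    return \%offset;
}

my $off = user_offsets(0x0A, @user_vars);
printf "%-8s \$%02X\n", $_, $off->{$_} for @user_vars;
printf "last byte used: \$%02X\n", $off->{HLD} + 1;
```

Twenty variables at two bytes each occupy 40 bytes, so the last one, HLD, lands at $30 and its second byte at $31, which agrees with the range given above.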
If we now run -refs, we find two forward references that we can now resolve:
$1431: CURR: 123e
$1424: CON: 1242
So going back and fixing those, we now have a fairly good listing up to address $148B. The next post will pick up from there; I think we've covered enough territory for today.
dd7.zip