Atari_Ace's Blog - Dealer Demo, part 7: Forward is the new Back

RSS Bot · September 4, 2018

Thus far in disassembling the Dealer Demo, we've run across primitive definitions that have a PFA of .WORD *+2 followed by some 6502 assembly code. There were two exceptions I didn't draw attention to at the time: the word 'I' and the word 'DROP'. DROP's PFA was simply ".WORD POP". In other words, it took advantage of the fact that POP falls into NEXT, so DROP could use that routine directly. I's PFA was similarly ".WORD R+2". In other words, it had the same implementation as the word R, since the current loop counter is at the top of the return stack.

What's more interesting about the word I is that its implementation appears later in the listing. The Forth compiler can't generally deal with forward references: words are built out of existing words, not words that have yet to be defined. An assembler doesn't have that limitation, so the Forth kernel, which is bootstrapped with an assembler, can put the kernel words in almost any order. Our decompiler, however, does have that limitation. We can't emit a name we don't yet know, so generally our tool will just have to emit a hex address instead, and we'll have to remember to go back and fix it later when we discover the label for that address. In the case of 'I', I filled it in without comment at the time, but that was actually when I should have thought about how to address that problem.

So let's revert that bit of future knowledge and write some code that tries to detect when that happens:

We also need to add a little stub in main to invoke this with '-refs'.

    $0490: XBOOT: 0488
    $1098: R+2: 0e56

Sure enough it detected the forward reference we reverted, and another one in the bootloader where I added a label for a branch but then forgot to use the label (oops). So let's fix those now.

We can periodically run this after disassembling to detect when we have put labels in that cover previous forward references.
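The check described above can be sketched in a few lines. This is a Python illustration, not the Perl tool from this series: the `find_stale_refs` name, the `(location, operand)` listing shape, and the sample data are all assumptions made for the example. It flags any operand that is still a raw address but now matches a known label, or a label plus two, and prints it in the same shape as the output shown earlier.

```python
def find_stale_refs(labels, listing):
    """Report raw-address operands that a known label now covers.

    labels  -- dict mapping address -> label name (hypothetical structure)
    listing -- iterable of (location, operand); operand is a label string
               if already symbolic, or an int if still a raw address
    """
    hits = []
    for location, operand in listing:
        if not isinstance(operand, int):
            continue  # already symbolic, nothing to fix
        if operand in labels:
            name = labels[operand]
        elif operand - 2 in labels:
            # Matches the word after a label, e.g. the code after a CFA.
            name = labels[operand - 2] + "+2"
        else:
            continue  # still unknown; leave the hex address alone
        hits.append(f"${location:04X}: {name}: {operand:04x}")
    return hits


# Sample data: the two addresses come from the output above, the rest is made up.
labels = {0x0488: "XBOOT", 0x0E54: "R"}
listing = [
    (0x0490, 0x0488),   # raw address that now has a label
    (0x1098, 0x0E56),   # raw address two bytes past a label
    (0x125E, "NEXT"),   # already symbolic
    (0x2000, 0x3000),   # raw address with no label yet
]
stale = find_stale_refs(labels, listing)
for line in stale:
    print(line)
```

Running a pass like this after each batch of new labels keeps stale hex addresses from piling up in the listing.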

The reason we're writing this reference code now is we're about to hit a cluster of forward references in the next word to decompile, the implementation of ':'. Running our tool with -def 1234 yields:

    124C: A5 F9     DOCOL   LDA IP+1
    124E: 48                PHA
    124F: A5 F8             LDA IP
    1251: 48                PHA
    1252: 18                CLC
    1253: A5 FB             LDA W
    1255: 69 02             ADC #2
    1257: 85 F8             STA IP
    1259: 98                TYA
    125A: 65 FC             ADC W+1
    125C: 85 F9             STA IP+1
    125E: 4C 42 0D          JMP NEXT

This routine is found at the start of every Forth colon definition (thus the name DOCOL). It takes the current interpreter pointer and pushes it on the return stack. It then takes the code field pointer W (which is the location of this DOCOL word, since we just came from the NEXT routine), adds two, and makes that the interpreter pointer. In other words, when you enter a colon definition, you record where to return to and then start interpreting the word after DOCOL. This is very similar to how JSR works, except that the callee pushes the IP, not the caller.
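To make the pointer shuffling concrete, here is a small Python model of the behavior just described. It is only a sketch: the `InnerInterpreter` class, the dictionary used as memory, and the addresses $2000 and $3000 are invented for illustration, and the byte-by-byte 16-bit arithmetic of the 6502 routine is collapsed into plain integer addition.

```python
CELL = 2  # size of one Forth cell in bytes


class InnerInterpreter:
    """Toy model of IP, W and the return stack (names are illustrative)."""

    def __init__(self, memory, ip):
        self.memory = memory        # address -> 16-bit cell
        self.ip = ip                # interpreter pointer
        self.w = None               # code field pointer
        self.return_stack = []

    def next(self):
        # NEXT: fetch the cell at IP into W, then advance IP past it.
        self.w = self.memory[self.ip]
        self.ip += CELL

    def docol(self):
        # DOCOL: save the caller's IP, then interpret the cell after DOCOL.
        self.return_stack.append(self.ip)
        self.ip = self.w + CELL

    def semis(self):
        # SEMIS: restore the caller's IP from the return stack.
        self.ip = self.return_stack.pop()


# A calling thread at $2000 that references a colon definition at $3000.
memory = {0x2000: 0x3000}
vm = InnerInterpreter(memory, ip=0x2000)
vm.next()                       # W = $3000, IP = $2002
vm.docol()                      # push $2002, IP = $3002
entered_at = vm.ip
saved = list(vm.return_stack)
vm.semis()                      # IP back to $2002
resumed_at = vm.ip
```

The callee-saves shape is visible here: `next` knows nothing about returning, and it is `docol` that records where the caller should resume.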

COLON is also the first word we've hit where the length byte has bit 6 set as well as bit 7. This bit is called the precedence bit, and is used during compilation. We won't likely discuss how compilation works in Forth for some time, so for now we can simply note that there's a bit in the definition that marks words that need to be handled differently.
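A name field's length byte can be picked apart with a couple of masks. The sketch below is in Python with a hypothetical `decode_length_byte` helper. Bits 7 and 6 are as described above. The 5-bit length mask and the bit 5 "smudge" flag are assumptions taken from the usual fig-Forth convention, not something covered in this post.

```python
def decode_length_byte(value):
    """Split a fig-Forth style name-field length byte into its parts."""
    return {
        "marker": bool(value & 0x80),      # bit 7: set on every length byte
        "precedence": bool(value & 0x40),  # bit 6: precedence bit
        "smudge": bool(value & 0x20),      # bit 5: assumed fig-Forth smudge bit
        "length": value & 0x1F,            # low 5 bits: assumed name length
    }


# A one-character name with bits 7 and 6 set (as for ':', assuming bit 5 is clear)
colon_like = decode_length_byte(0xC1)
# A four-character name with only bit 7 set
plain_word = decode_length_byte(0x84)
```

A decompiler can use the precedence flag to annotate such words in the listing even before compilation is discussed.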

If we put this into our listing and rerun -refs it picks up that COLON starts with DOCOL (as we noted before), but the rest of the implementation is unknown. Most colon definitions end with ".WORD SEMIS", which we noted before restores IP from the stack and calls NEXT. But a handful of words like COLON handle it internally; we'll have to postpone that discussion for COLON till we get to those words, specifically (;CODE).

The next word at $1261 is the word ';' (distinct from ';S'), followed by CONSTANT at $1273. Like COLON, CONSTANT contains some code after a few Forth words, at $1288, so we need to manually disassemble that and add a DOCON label to it.

 
Next comes VARIABLE at $1293.  It also is followed by a short 6502 routine called DOVAR at $12A4.

 
The next word is USER, at $12B0, with a block of code called DOUSE at $12BD.

 
We now come to a whole bunch of CONSTANT and USER definitions.  Since those have a fixed size, and are then followed by the next definition, we should modify our decompiler to handle that more gracefully for us.
 
We start by adding some helpers before forth_buf.

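    # CLIT and DOUSE are followed by a byte operand, not a word.
    if (get_byteq($val)) {
      print_byte($buff, $addr, $i);
      $i += 1;
    }
    # A definition ends here, so hand the rest of the buffer to def_buf.
    if (get_defq($val, $val1)) {
      print "\n";
      def_buf(substr($buff, $i), $addr + $i, $size - $i);
      last;
    }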

This code deserves some explanation. Up to now, forth_buf has just dumped words of data to add to the listing, switching to disassembling code when the word happened to be *+2. Now it's going to look for a couple of other conditions. If the current word is CLIT or DOUSE, the next value is a BYTE, not a WORD, so we emit it using print_byte. If the current word is SEMIS or DOUSE, or if the previous word was DOCON or DOVAR, we switch to decompiling a definition by calling def_buf. Occasionally this is incorrect for DOVAR, as variables can be more than one word, but most of the time this heuristic works well.

In other words, we've changed forth_buf to determine when to start decompiling a definition automatically in a lot of cases.
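The two conditions can be written as small predicates. This Python sketch works on label names rather than the numeric values the real helpers presumably receive, and the function names `byte_follows` and `definition_follows` are made up for the example, so treat it as a restatement of the heuristic rather than the tool's actual code.

```python
# Words whose operand is a single byte rather than a 16-bit word.
BYTE_OPERAND_WORDS = {"CLIT", "DOUSE"}
# Words that are the last cell of a definition.
LAST_CELL_WORDS = {"SEMIS", "DOUSE"}
# Words whose following cell is (usually) the last one in a definition.
ONE_CELL_BODY_WORDS = {"DOCON", "DOVAR"}


def byte_follows(current):
    """True if the value after `current` should be printed as a byte."""
    return current in BYTE_OPERAND_WORDS


def definition_follows(current, previous):
    """True if a new definition is expected right after `current`.

    For DOUSE the caller is expected to consume the byte operand first.
    """
    return current in LAST_CELL_WORDS or previous in ONE_CELL_BODY_WORDS


# A constant: DOCON, then its value, then the next definition.
constant_cells = ["DOCON", 0x0000]
boundaries = []
previous = None
for current in constant_cells:
    boundaries.append(definition_follows(current, previous))
    previous = current
```

For the constant above, the boundary is reported only after the value cell. A multi-cell variable would trip the same rule one cell too early, which is the DOVAR caveat mentioned earlier.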

Running this new code at $12CC leads to a much nicer output, the definitions for 0, 1, 2, and 3.

Note that we could have done something similar in the primitives, assuming that a JMP NEXT or JMP PUT marks the end of a definition, but that heuristic isn't as reliable as the one we've just added for non-primitive definitions.

So let's crank through definitions using our newly augmented decompiler. After 0, 1, 2, and 3 we find BL, C/L, FIRST, LIMIT, B/BUF, B/SCR, and then HIMEM. HIMEM isn't in the fig-Forth listing; it's the first such word we've seen, so we'll need some conventions for labeling its NFA and PFA. All the NFAs so far have been labeled with L and the line number in the original listing, which, while not very descriptive, was adopted to follow the original source code as closely as possible. For new words, we are instead going to prefix them with NF, so HIMEM's NFA is simply NFHIMEM, and its PFA is simply HIMEM.

After HIMEM is +ORIGIN, which is the first traditional colon definition in the listing. By traditional, I mean it starts with DOCOL and ends with SEMIS.

After that come some more words original to the Dealer Demo: ICCM, ICCL, and ICCAL, which are CIO constants. Then come the fig-Forth words EMIT, CR, ?TERMINAL, and KEY.

We then hit the definitions of all the fig-Forth USER variables: TIB, WIDTH, WARNING, FENCE, DP, VOC-LINK, BLK, IN, OUT, SCR, OFFSET, CONTEXT, CURRENT, STATE, BASE, DPL, FLD, CSP, R#, HLD. Each reserves 2 bytes in the USER block, starting at $0A and ending at $31. Dealer Demo then adds two of its own USER variables, INPT and PHYSOFF.
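The offset arithmetic is easy to check mechanically. The Python sketch below assumes the variables are laid out in the order listed above, two bytes each, beginning at $0A. The `user_offsets` helper is hypothetical, and the computed table should be verified against the byte operands in the actual listing.

```python
FIG_USER_VARS = [
    "TIB", "WIDTH", "WARNING", "FENCE", "DP", "VOC-LINK", "BLK", "IN",
    "OUT", "SCR", "OFFSET", "CONTEXT", "CURRENT", "STATE", "BASE", "DPL",
    "FLD", "CSP", "R#", "HLD",
]


def user_offsets(names, start=0x0A, size=2):
    """Assign consecutive USER-area offsets, `size` bytes per variable."""
    return {name: start + index * size for index, name in enumerate(names)}


offsets = user_offsets(FIG_USER_VARS)
last_byte = offsets[FIG_USER_VARS[-1]] + 2 - 1  # final byte used by HLD

for name, offset in offsets.items():
    print(f"{name:<9}${offset:02X}")
```

Twenty variables at two bytes each occupy 40 ($28) bytes, so the block runs from $0A through $31, which agrees with the range given above.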

If we now run -refs, we find two missing forward references:

 dd7.zip (11.43KB)

http://atariage.com/forums/blog/734/entry-15016-dealer-demo-part-7-forward-is-the-new-back/
