Atari_Ace's Blog - Dealer Demo, part 4: Some Forth at last

RSS Bot · September 1, 2018

So we spent the last two posts working on tooling to reverse engineer the Dealer Demo, and got the bootloader disassembled for study. Now we can start disassembling the Forth kernel. Let's start with "-dis d00 8":

0D08: 00 00             .WORD 0
0D0A: 01 00             .WORD 1
0D0C: 00 00             .WORD 0
0D0E: 7F 00             .WORD $7F
0D10: 80 04             .WORD $0480
0D12: EC 00             .WORD $EC
0D14: FF 01             .WORD $01FF
0D16: 00 01             .WORD $0100
0D18: 1F 00             .WORD $1F
0D1A: 00 00             .WORD 0
0D1C: FB 86             .WORD $86FB
0D1E: FB 86             .WORD $86FB
0D20: 01 1C             .WORD $1C01

Of these, we can identify $EC as TOS (the top of stack) and $86FB as the pointer to the top of memory (TOP), while $1C01 should be the initial vocabulary pointer (VL0).

The next bit of data should be the first definition, the LIT word. In fig-Forth, every definition starts with a byte holding the length of the name in its bottom 5 bits, followed by the name itself with the high bit set on the last letter. In rare cases there is an extra byte, to keep a later field from landing on a $..FF boundary (why will become clear later). This is called the name field, and its address is referred to as the NFA. Two words follow. The first is the link field (at the LFA), which points to the previous definition. The second points to the code to run for this word; this is the value that must not sit at $..FF. Standard fig-Forth terminology calls it the code field (CFA) and reserves "parameter field" for the data after it, but these posts refer to it as the PFA. For primitive words it is usually just .WORD *+2, meaning that the code immediately follows the definition.

Knowing this, we could slowly disassemble the code definition by definition by hand, but we want our tools to do as much of this as possible for us, so let's implement a definition decompiler.

The first thing we need is a get_string method that works on a buffer. It should output something that can be appended to a .BYTE directive. In most cases, surrounding the values with single quotes should be adequate, but recall that the last byte in a definition string will have its high bit set and thus be unprintable, so let's handle that case in the most general fashion possible so we can use the code for other purposes later:
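A minimal sketch of such a routine, not the actual get_string from the attached source, might look like the following. It assumes that runs of printable ASCII are quoted, that every other byte is written as $XX hex, and that the single quote itself is also written as hex so it cannot end a string early.

```perl
use strict;
use warnings;

# Hypothetical sketch: turn raw bytes into operands for a .BYTE directive.
# Runs of printable ASCII become quoted strings; every other byte
# (including the high-bit-set last letter of a name) becomes $XX hex.
sub get_string {
    my ($buff) = @_;
    my @parts;
    my $run = '';
    for my $byte (unpack 'C*', $buff) {
        # Assumption: the quote character (0x27) is emitted as hex as well.
        if ($byte >= 0x20 && $byte < 0x7F && $byte != 0x27) {
            $run .= chr $byte;
            next;
        }
        push @parts, "'$run'" if length $run;   # flush the pending printable run
        $run = '';
        push @parts, sprintf('$%02X', $byte);
    }
    push @parts, "'$run'" if length $run;
    return join ',', @parts;
}

# "LIT" with the high bit set on the final letter.
print get_string(pack 'C*', ord('L'), ord('I'), ord('T') | 0x80), "\n";
```

Because nothing in the routine is specific to name fields, the same call works on any run of bytes that mixes text and binary data.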

OK, now let's define def_buf, the routine that will parse a definition:

This looks for the length byte first. If the value found doesn't look like a length byte, we go to our general parser, forth_buf (which mostly just dumps .WORDs of data; more on that shortly). Otherwise it adjusts the size if we're at the $..FF boundary, gets the string, passes it to multi_buf to output, and then resumes in the general forth_buf parser.
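The header check can be sketched roughly as follows. This is an illustration, not the def_buf from the attached source: the name parse_header, the returned hash, the "high bit set, nonzero length" test and the exact padding rule are all assumptions, as is the example address.

```perl
use strict;
use warnings;

# Hypothetical sketch of the header check described above.
# Returns undef when the first byte does not look like a length byte.
sub parse_header {
    my ($buff, $addr) = @_;
    return undef if length($buff) < 1;
    my $first = ord substr($buff, 0, 1);
    my $len   = $first & 0x1F;                 # name length lives in the low 5 bits
    # Assumed test: a length byte has its high bit set and a nonzero length.
    return undef unless ($first & 0x80) && $len > 0;
    my $size = 1 + $len;                       # length byte plus the name itself
    # Assumed padding rule: if the link field would start at $..FD, the word
    # after it would sit at $..FF, so one extra byte is expected in the name field.
    $size++ if (($addr + $size) & 0xFF) == 0xFD;
    return undef if length($buff) < $size + 4;
    my ($link, $code) = unpack 'vv', substr($buff, $size, 4);   # two little-endian words
    return {
        name_bytes => substr($buff, 0, $size),
        lfa        => $addr + $size,
        link       => $link,
        pfa        => $addr + $size + 2,       # "PFA" in this post's terminology
        code       => $code,
    };
}

# A LIT-style header placed at a made-up address.
my $lit = pack 'C*', 0x83, ord('L'), ord('I'), 0xD4, 0x00, 0x00, 0x2A, 0x0D;
my $hdr = parse_header($lit, 0x0D22);
printf "LFA=\$%04X PFA=\$%04X code=\$%04X\n", $hdr->{lfa}, $hdr->{pfa}, $hdr->{code};
```

Returning undef for anything that fails the length-byte test lets a caller fall back to plain .WORD dumping.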
 
So what does forth_buf look like:

sub print_word {
  # The opening lines of this sub were cut off in the listing; the next three
  # lines are reconstructed from how the sub is called and what it uses.
  my ($buff, $addr, $i) = @_;
  my ($b1, $b2) = unpack "CC", substr($buff, $i, 2);
  my $val = $b1 + 256 * $b2;
  my $sval = sprintf "\$%04X", $val;
  $sval = sprintf "\$%02X", $val if $val < 256;
  $sval = $names->{$val} if exists $names->{$val};
  $sval = sprintf "\$%01X", $val if $val < 16;
  $sval = $val if $val < 10;
  $sval = '*+2' if $addr + $i + 2 == $val;
  my $string = sprintf "%s.WORD %s\n", get_label($addr + $i), $sval;
  print sb($addr + $i, $b1, $b2), $string;
  $val;
}

sub forth_buf {
  my ($buff, $addr, $size) = @_;
  my ($val, $val1) = (-1, -1);
  for (my $i = 0; $i + 2 < $size; ) {
    $val1 = $val;
    $val = print_word($buff, $addr, $i);
    $i += 2;
    # A word equal to the address right after it is *+2: code follows.
    if ($addr + $i == $val) {
      disasm_buf(substr($buff, $i), $addr + $i, $size - $i);
      last;
    }
  }
}

sub forth {
  my ($buff, $addr, $size) = read_img(@_);
  forth_buf($buff, $addr, $size);
}

It calls print_word to print the word sensibly. When the word is *+2, we switch to disasm_buf, otherwise we call print_word again on the next word.

Now to hook it up into main:

     =00EC      TOS = $EC
     =00F0      N = $F0
     =00F8      IP = N+8
     =00FB      W = IP+3
     =00FD      UP = W+2
     =00FF      XSAVE = UP+2

We can now start disassembling, inserting labels from the fig-Forth listing as we go to make the output more readable.

Continuing the disassembly, we see the PUSH, PUT, NEXT, CLIT and SETUP routines. The first difference between Dealer Demo and the published fig-Forth can now be seen: the NEXT routine doesn't have a JSR TRACE call, and all the associated tracing code has been omitted. This is to be expected, since that code is there for the initial debugging of the kernel during bootstrapping, and there's no need to keep it once the Forth is up and running.

There is another small difference: the single-byte literal command CLIT is implemented, but its name and link fields have been dropped, so you can't use it by name. Nonetheless we'll see it used in the code later, so why the header was removed is a bit of a mystery.

NEXT is the most important routine in all of Forth, so it's helpful to study it for a moment.
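As a rough illustration of what NEXT does in an indirect-threaded Forth (fetch the cell IP points at into W, advance IP by two, then jump indirectly through W), here is a toy model in Perl. It is not a transcription of the Dealer Demo code; every address, the %mem and %code tables, and the two primitives are invented for the example.

```perl
use strict;
use warnings;

# Toy model of an indirect-threaded inner interpreter. All addresses and
# primitives below are made up for illustration.
my %mem;                                   # 16-bit cells keyed by address
my %code;                                  # machine-code address => Perl sub
my ($ip, $w, $running) = (0, 0, 1);
my @stack;

sub next_step {
    $w = $mem{$ip};                        # W = the cell that IP points at
    $ip += 2;                              # advance IP past that cell
    $code{ $mem{$w} }->();                 # jump indirectly through W
}

# Two primitives, each laid out as a pointer cell followed by its code (*+2).
$mem{0x1000} = 0x1002;                     # LIT-like word: push the next cell
$code{0x1002} = sub { push @stack, $mem{$ip}; $ip += 2 };
$mem{0x1010} = 0x1012;                     # word that stops the model
$code{0x1012} = sub { $running = 0 };

# A thread: push 42, then stop.
@mem{0x2000, 0x2002, 0x2004} = (0x1000, 42, 0x1010);
$ip = 0x2000;
next_step() while $running;
print "stack: @stack\n";
```

Every primitive ends by returning to the loop, which plays the role of the JMP NEXT that closes each machine-code word.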

-def  WORD     NFA   PFA    same as fig-Forth?
d74   EXECUTE  L75   EXEC   Yes
d8d   BRANCH   L89   BRAN   Yes
dab   0BRANCH  L107  ZBRAN  Yes
dcd   (LOOP)   L127  PLOOP  Yes
dfc   (+LOOP)  L154  PPLOO  Yes
e36   (DO)     L185  PDO    Yes

In each case we have to add at least a couple of labels to the list (one at the NFA, and then one two lines later at the PFA), more if there are branches in the code. In a few cases, we need to revert a symbol inserted to better match the fig-Forth listing, but largely the tool we have is doing much of the work once we recognize the start of a definition.

In all of these cases, the listing matches the implementation in the original fig-Forth listing.

As we've been disassembling, we have also been modifying set_name to cover more cases. In particular W-1, N-1, N+1, N+2, ..., N+7, NEXT+2, and BRAN+2 should be recognized, so we've added the following lines to set_name:
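One way to generate such derived names is a small offset table, sketched below. This is an assumption about the approach, not the actual set_name code: derived_names, %base and %offsets are hypothetical, and only the zero-page equates listed earlier in this post are used. NEXT+2 and BRAN+2 could be added the same way once the addresses of NEXT and BRAN are in the base table.

```perl
use strict;
use warnings;

# Hypothetical sketch: derive "LABEL+n" / "LABEL-n" names from base labels.
my %base    = (N => 0xF0, IP => 0xF8, W => 0xFB, UP => 0xFD, XSAVE => 0xFF);
my %offsets = (W => [-1], N => [-1, 1 .. 7]);

sub derived_names {
    my ($base, $offsets) = @_;
    my %names = reverse %$base;            # address => label for the bases themselves
    for my $label (sort keys %$offsets) {
        for my $off (@{ $offsets->{$label} }) {
            my $addr = $base->{$label} + $off;
            next if exists $names{$addr};  # never shadow a real label
            $names{$addr} = sprintf '%s%+d', $label, $off;   # e.g. "W-1", "N+3"
        }
    }
    return \%names;
}

my $names = derived_names(\%base, \%offsets);
printf "\$%02X => %s\n", $_, $names->{$_} for sort { $a <=> $b } keys %$names;
```

The result maps addresses to names, the same shape as the $names lookup that print_word consults.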

That's enough for now. I'm attaching our progress, and we'll pick it up again next time.

Attached File(s)

dd4.zip (5.1KB)

http://atariage.com/forums/blog/734/entry-15007-dealer-demo-part-4-some-forth-at-last/
