Cracking the Disks, Atari ST version


Atari_Ace


In past blog entries I went through the steps of developing an 8-bit ATR disk parser so that we could examine and repair damaged disks.  Today, I will walk through the creation of a new disk parser, this time for Atari ST disks.  The motivation in this case requires some explanation:

 

My first computer was an Atari 800, and when I went to university I brought an Atari 130XE and Indus GT drive with me.  The school had more sophisticated machines (PCs, Macs, various Unix workstations), so before long the Atari was largely relegated to entertainment.  When I graduated in 1990, my parents offered to buy me a new computer as a gift.  I chose an Apple Mac SE/30.  I don't think I considered an Atari ST for even a moment; the Mac was a reasonably priced 32-bit computer (for a student, yay Apple educational discounts), and was clearly better supported than the Atari.

 

For the next eight years or so that Mac SE/30 and later a PowerMac 6100/60 were my main home computers.  I used the Macs largely for word processing and games, and recall fondly the amazing sound generation capabilities provided through MOD files and other downloadable sound formats.  But curiously, in my archives of files from that period, I didn't find any MOD files at all.  Disk storage was expensive in the early 1990s, and I was a poor graduate student, so discarding files was common.  Still, I usually kept a few examples, but for MOD files, I couldn't find anything.

 

Searching around the web today you can find MOD file archives, but I really wanted to track down files clearly from that late-80s/early-90s period.  That's when it occurred to me that I had a great source in the public domain Atari ST disks.  The page6.org site has an extensive collection of Atari ST disks (more than a thousand), and a few of them appear to be MOD file collections.

 

OK, enough backstory.  We have some old disk images and we want to get the files off them.  I'm sure there are tools already written for this purpose, but I like to develop my own tools when possible.  As is my preference, I'm going to do it in Perl, since it is well suited to working with binary files.

 

ST disks are essentially the same as PC DOS-formatted disks.  A disk starts with a reserved section (containing the disk metadata), followed by one or more FATs (File Allocation Tables).  After the FATs comes the root directory, followed by the actual data on the disk.  So we need some code to locate all of these sections.

 

I'll start by borrowing the code from past blog posts to read the file into a buffer, and pull values from the first sector that describe the disk.

  my $rbuff = read_file($file) or return;

  my ($bra, $oem0, $oem1, $oem2, $ser0, $ser1, $bps, $spc, $ressec) 
    = unpack "vvvvvCvCv", substr($$rbuff, 0x00, 0x10);
  my ($nfats, $ndirs, $nsects, $media, $spf) = unpack "CvvCv", substr($$rbuff, 0x10, 0x08);
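For reference, here is a minimal sketch of what those borrowed read_file and write_file helpers look like; the originals from the earlier posts may differ in details, but something along these lines will do:

```perl
# Minimal sketches of the read_file/write_file helpers borrowed from
# earlier posts.  These are illustrative assumptions, not the originals.
sub read_file {
  my ($file) = @_;
  open my $fh, '<:raw', $file or return;
  local $/;                  # slurp mode: read the whole file at once
  my $data = <$fh>;
  close $fh;
  \$data;                    # return a reference to the buffer
}

sub write_file {
  my ($file, $rdata) = @_;
  open my $fh, '>:raw', $file or return;
  print $fh $$rdata;
  close $fh;
  1;
}
```

Returning a reference from read_file is why the later code dereferences with $$rbuff.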

Although I've named all the values in the first 0x18 bytes, the ones we need to parse the disk are just $bps (bytes per sector), $spc (sectors per cluster), $ressec (reserved sectors), $nfats (number of FATs), $ndirs (number of directories), and $spf (sectors per FAT).  Using these we can compute the location of the FATs ($fatOffset), the location of the root directory ($rootOffset) and the location of the disk data ($clusterOffset).

  my $fatSize = $spf * $bps;
  my $fatOffset = $ressec * $bps;
  my $rootOffset = $fatOffset + ($nfats * $fatSize);
  my $clusterOffset = $rootOffset + $ndirs * 0x20;
  my $clusterSize = $spc * $bps;

All data on FAT disks is organized in clusters, which are sequential runs of sectors.  Sector sizes were generally fixed at 512 bytes, so by varying the number of sectors per cluster you could support larger disks while keeping the cluster numbers small.  FAT12 disks use 12-bit cluster numbers (0 to 4095), so with a cluster size of one sector you could support up to about 2MB (some of the high cluster numbers have special meanings, so not every number is valid).  Nevertheless, most disks I've examined use a cluster size of two sectors for some reason.
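That 2MB figure can be double-checked with a quick calculation; cluster numbers 0 and 1 are reserved and values 0xff0 and above are special, which is where the usable-cluster count below comes from:

```perl
# Rough FAT12 capacity check, assuming 512-byte sectors and one
# sector per cluster.  Entries 0 and 1 are reserved, and 0xff0
# and above are special, leaving 0xff0 - 2 = 4078 usable clusters.
my $bytes_per_sector    = 512;
my $sectors_per_cluster = 1;
my $usable_clusters     = 0xff0 - 2;   # 4078
my $capacity = $usable_clusters * $sectors_per_cluster * $bytes_per_sector;
printf "max capacity: %d bytes (~%.1f MB)\n", $capacity, $capacity / (1024 * 1024);
# prints: max capacity: 2087936 bytes (~2.0 MB)
```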

 

Let's now unpack the FAT.  The FAT contains a map that tells you which cluster follows a given cluster; on an Atari DOS 2 disk that information would live in the last bytes of each sector itself.  Since files are typically contiguous runs of clusters, that information usually occupies just one or two sectors of the FAT, so finding the location of a particular byte in a file doesn't require loading all the preceding sectors, just one or two sectors of the FAT.  This design is much more sensible than what is done in Atari DOS.

 

Anyhow, the only complication for FAT is that the entries are 12 bits wide, so every three bytes contain two entries, and unpacking them requires a little bit-shifting.  FAT uses little-endian ordering for data, so the first byte contains the lower bits of the first entry and the last byte contains the upper bits of the second entry.

sub unpack_fat12 {
  my ($fat) = @_;

  my $map = [];
  for (my $i = 0; $i + 2 < length($fat); $i += 3) {
    my ($a, $b, $c) = unpack "CCC", substr($fat, $i, 3);
    my $val0 = $a | (($b & 0x0f) << 8);
    my $val1 = (($b >> 4) & 0x0f) | ($c << 4);
    push @$map, $val0, $val1;
  }

  $map;
}
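As a quick sanity check of the bit-twiddling, the three bytes 0x23 0x61 0x45 should decode to the entry pair 0x123 and 0x456 (unpack_fat12 is repeated here so the snippet runs standalone):

```perl
# Repeating unpack_fat12 from above so this check runs standalone.
sub unpack_fat12 {
  my ($fat) = @_;
  my $map = [];
  for (my $i = 0; $i + 2 < length($fat); $i += 3) {
    my ($a, $b, $c) = unpack "CCC", substr($fat, $i, 3);
    push @$map, $a | (($b & 0x0f) << 8), (($b >> 4) & 0x0f) | ($c << 4);
  }
  $map;
}

my $map = unpack_fat12("\x23\x61\x45");
printf "entry 0: %03x, entry 1: %03x\n", $map->[0], $map->[1];
# prints: entry 0: 123, entry 1: 456
```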

  my $fat;
  for (my $i = 0; $i < $nfats; $i++) {
    my $fatBuff = substr($$rbuff, $fatOffset + $i * $fatSize, $fatSize);
    $fat = unpack_fat12($fatBuff);
  }

There are potentially multiple FATs, which all should be duplicates of each other.  The code I've written unpacks all of them but simply uses the last one.
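If you wanted to be more defensive, you could confirm the FAT copies actually agree before trusting one of them.  A sketch along those lines (check_fats is a name I've made up; it is not part of the tool above, and the parameters mirror the variables computed earlier):

```perl
# Sketch: verify that all FAT copies on the disk are byte-identical.
# $rbuff is the buffer reference; $fatOffset, $fatSize, and $nfats
# are assumed to be the values computed from the boot sector.
sub check_fats {
  my ($rbuff, $fatOffset, $fatSize, $nfats) = @_;
  my $first = substr($$rbuff, $fatOffset, $fatSize);
  for (my $i = 1; $i < $nfats; $i++) {
    my $copy = substr($$rbuff, $fatOffset + $i * $fatSize, $fatSize);
    return 0 if $copy ne $first;   # mismatch: the disk may be damaged
  }
  1;
}
```

A mismatch between FAT copies would be a good hint that a disk needs the kind of repair work discussed in the earlier ATR entries.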

 

Now that we have the FAT map, we create a $disk object to hold it and the other parameters of the disk (fat, clusterSize, clusterOffset, rbuff), and we pass that around.

 

We will soon need a utility to get a file or sub-directory from the disk, so let's write that.  As in Atari DOS, directory entries include only the starting cluster, so we need code to gather all the clusters of a file.  We simply chain from one cluster to the next until we reach an end-of-chain marker (a very large cluster number).

sub get_clusters {
  my ($disk, $scluster) = @_;

  return '' if $scluster == 0;  # no data allocated (e.g. an empty file)
  my $data = '';
  my $rbuff = $disk->{rbuff};
  my $clusterSize = $disk->{clusterSize};
  while ($scluster < 0xff0) {   # 0xff0 and up are reserved/end-of-chain markers
    # The first data cluster is numbered 2, hence the adjustment.
    my $offset = $disk->{clusterOffset} + ($scluster - 2) * $clusterSize;
    $data .= substr($$rbuff, $offset, $clusterSize);
    $scluster = $disk->{fat}->[$scluster];
  }

  $data;
}

OK, the home stretch now.  We need code to handle directories and files.  A directory is just a contiguous sequence of entries, each of which is 0x20 bytes in size.  Those entries are generally either files or sub-directories.  Whether it's a file or a directory, it contains the starting cluster of the data, so we can use get_clusters to get the data.  So we end up with the following code for reading directories and entries.

sub read_entry {
  my ($disk, $entry, $path) = @_;

  my $first = unpack "C", substr($entry, 0, 1);
  return if $first == 0;        # 0x00 marks the end of the directory
  return 1 if $first == 0xe5;   # deleted entry, skip it
  my $name = get_name($entry, 0);
  my ($attrib) = unpack "C", substr($entry, 0x0b, 1);
  # Time/date words at 0x16, starting cluster at 0x1a, file size at 0x1c.
  my ($dostime, $scluster, $fsize) = unpack "VvV", substr($entry, 0x16, 10);
  my $root = get_clusters($disk, $scluster);
  if ($fsize > 0) {
     my $new = substr($root, 0, $fsize);
     write_file($name, \$new);
  }

  my $fullName = $path eq '' ? $name : "$path\\$name";
  my $directory = $attrib & 0x10;
  print "$fullName:$attrib:$scluster:$fsize\n" if !$directory;
  if ($directory && $name ne '.' && $name ne '..') {
    read_directory($disk, $root, $fullName);
  }

  return 1;
}

sub read_directory {
  my ($disk, $root, $path) = @_;

  my $offset = 0;
  while (1) {
    my $entry = substr($root, $offset, 0x20);
    last if !read_entry($disk, $entry, $path);
    $offset += 0x20;
  }
}
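One thing the tool unpacks but never uses is $dostime, which holds the standard packed DOS time and date words.  If you wanted human-readable timestamps, a decoding sketch would look like this (decode_dostime is a hypothetical helper, not part of the tool):

```perl
# Decode a packed DOS time/date pair as unpacked with "V" above:
# the low 16 bits are the time word, the high 16 bits the date word.
sub decode_dostime {
  my ($dostime) = @_;
  my $time = $dostime & 0xffff;
  my $date = ($dostime >> 16) & 0xffff;
  my $sec  = ($time & 0x1f) * 2;          # stored at 2-second resolution
  my $min  = ($time >> 5) & 0x3f;
  my $hour = ($time >> 11) & 0x1f;
  my $day  = $date & 0x1f;
  my $mon  = ($date >> 5) & 0x0f;
  my $year = (($date >> 9) & 0x7f) + 1980;  # years are relative to 1980
  sprintf "%04d-%02d-%02d %02d:%02d:%02d", $year, $mon, $day, $hour, $min, $sec;
}
```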

get_name is a routine to get the file name from the 11-byte name field (8 name characters plus a 3-character extension, each padded with spaces), and write_file writes a data file.  Our main routine now just needs to invoke read_directory on the root directory to extract all the files.  Notice that when read_entry finds a sub-directory, it calls read_directory, recursively traversing the directory hierarchy.
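get_name itself isn't shown in this post; a plausible sketch of it, assuming the standard space-padded 8.3 layout (the actual routine may differ), is:

```perl
# Sketch of get_name: assemble "NAME.EXT" from the 11-byte,
# space-padded 8.3 name field at the given offset in a directory
# entry.  This is an assumed implementation, not the author's.
sub get_name {
  my ($entry, $offset) = @_;
  my $base = substr($entry, $offset, 8);      # 8-character base name
  my $ext  = substr($entry, $offset + 8, 3);  # 3-character extension
  $base =~ s/\s+$//;                          # strip the space padding
  $ext  =~ s/\s+$//;
  $ext eq '' ? $base : "$base.$ext";
}
```

For example, the on-disk bytes "READ_ME PDQ" come back as "READ_ME.PDQ".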

 

So that's it, a ~150 line utility to extract the contents of an ST disk.


Let's apply it now.  At http://www.page6.org/st_lib/standard/st0627.php is "The Module Collection 1", which contains a number of MOD files.  Running the tool extracts them, and the contents of the READ_ME.PDQ file look good, so I'm confident the tool is working.  That said, xmplay (which should support old MOD files) won't play the files, so I have more research ahead of me to figure out how these files differ from other MOD files.

 

Although I moved away from the Atari and onto Apple computers back then, I have since developed an interest in the Atari ST (largely from reading old user group newsletters) and hope that this tool will help me extract old text files, images and sound files to get a taste of the Atari ST scene of the time.  Even if I don't do much more with ST images, it was fun writing this little tool.

 

stdisk1.zip
