The U.K. Atari Computer Owners Club Newsletter, or an issue with issuu
Recently I've been reading the Tyne & Wear Atari User's Group (TWAUG) newsletters, which I mostly got from atarimania.com. That set is missing a couple of issues, so I was relieved to find another set of scans at https://www.strotmann.de/~cas/Infothek/ to fill in the gaps. Yay!
Anyhow, that newsletter ran a long series by Keith Mayhew called "Cracking the Code", which teaches assembly language. I'm always an advocate for more assembly tutorials, and this one is apparently a reprint from another newsletter I'd not heard of before: The U.K. Atari Computer Owners Club Newsletter, or Monitor. So I went looking for that newsletter and found http://www.page6.org/ukacoc/archive/archive.htm, a complete archive, and the newsletter appears to be professionally published (minus the first few issues). Yes!!
I followed the link, landed on issuu.com, where the content is hosted... and started to weep. Issuu.com provides the newsletters, but only through a web viewer that won't export the data no matter how nicely I ask (it wants to keep me on the site to show me endless ads, of course). I was hoping to use these scans to patch up some of the listings that are difficult to read in the TWAUG scans, but if I can't get them out of the browser, they're useless to me.
Necessity being the mother of invention, I rolled my sleeves up and tried to figure out how issuu.com delivered the content.
First, let's open a newsletter in a browser that can monitor network traffic (I used Microsoft Edge) and look at the requests made while paging through the newsletter. This showed me that the pages of the newsletter arrive in files called page_1.bin, page_2.bin, and so on, all fetched from layers.isu.pub with a similar, complex URL for a given newsletter.
Now, let's grab one of these binary files with wget and analyze it. The files start like so:
page_01.bin
 0  1f 8b 08 00 00 00 00 00-00 ff 9c bb 7b 38 93 7f   ............{8..
10  fc 3f 7e af c9 88 48 a9-c8 98 b2 52 92 1c 3a 6c   .?~...H....R..:l
If you search for "1f 8b 08 …" you'll get hits indicating this is a gzip file. OK, let's gunzip and dump it again:
page_01.xxx
 0  08 01 10 02 1a 03 31 38-30 20 a5 08 28 dc 0b 30   ......180 ..(..0
10  01 4a 08 0a 06 10 a5 08-18 dc 0b 5a 90 90 1a 0a   .J.........Z....
20  86 90 1a ff d8 ff e0 00-10 4a 46 49 46 00 01 01   .........JFIF...
30  00 00 01 00 01 00 00 ff-db 00 43 00 08 06 06 07   ..........C.....
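(Incidentally, if you'd rather do the decompression from a script than from a standalone gunzip, a rough Perl sketch like the one below should do the job: it checks for the 1f 8b 08 magic bytes and then uses the IO::Uncompress::Gunzip module that ships with Perl. The page_01.bin filename and the .raw output name are just examples.)

use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $in = shift // 'page_01.bin';

# Sanity check: a gzip stream starts with the bytes 1f 8b 08.
open my $fh, '<:raw', $in or die "open $in: $!";
read $fh, my $magic, 3;
close $fh;
die "$in does not start with 1f 8b 08 -- not gzip?\n"
    unless $magic eq "\x1f\x8b\x08";

# Decompress page_NN.bin to page_NN.raw for inspection.
(my $out = $in) =~ s/\.bin$/.raw/;
gunzip $in => $out or die "gunzip failed: $GunzipError\n";
print "wrote $out\n";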
OK, that JFIF signature is familiar: it means this is a JPEG file. The first 0x23 bytes don't look right to me, though, so I need to rip those off. Fortunately, I wrote a script for that ages ago:
#!/usr/bin/perl
# cut.pl: remove $size bytes starting at $offset from $file, in place.
# Offsets and sizes may be given in hex (0x...); a negative offset
# counts back from the end of the file.
use strict;
use warnings;

sub cut_file {
    my ($file, $offset, $size) = @_;
    $offset = hex $offset if $offset =~ /^0x/;
    $offset = -hex substr($offset, 1) if $offset =~ /^\-0x/;
    $size = hex $size if $size =~ /^0x/;
    my $buff;
    my $fileSize = -s $file;
    $offset = $fileSize + $offset if $offset < 0;
    open my $fh, '+<', $file or die "open $file: $!";
    binmode $fh;
    # Read everything after the region being cut...
    seek $fh, $offset + $size, 0;
    read $fh, $buff, $fileSize - $offset - $size;
    # ...write it back over the cut region, then trim the tail.
    seek $fh, $offset, 0;
    print $fh $buff;
    truncate $fh, $fileSize - $size;
    close $fh;
}

# Invoke with the command-line arguments, e.g. cut.pl page_1.jpg 0 0x23
cut_file(@ARGV);
If we put that in a script called cut.pl, we can fix one of these files like so:

cut.pl page_1.jpg 0 0x23
And sure enough, once I remove the first 0x23 bytes, the file parses and displays as an ordinary JPEG file.
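In case the prefix isn't always exactly 0x23 bytes from page to page or issue to issue (I haven't checked), a slightly more defensive approach is to scan the decompressed file for the JPEG start-of-image marker (ff d8 ff) and strip everything before it. Here's a rough Perl sketch of that idea; it assumes the junk prefix never itself contains that byte sequence, and it rewrites the file in place:

use strict;
use warnings;

my $file = shift or die "usage: $0 file.jpg\n";
open my $fh, '<:raw', $file or die "open $file: $!";
my $data = do { local $/; <$fh> };    # slurp the whole file
close $fh;

# Find the JPEG start-of-image marker and drop everything before it.
my $soi = index $data, "\xff\xd8\xff";
die "no JPEG marker found in $file\n" if $soi < 0;
printf "stripping 0x%x prefix bytes from %s\n", $soi, $file;

open my $out, '>:raw', $file or die "rewrite $file: $!";
print $out substr($data, $soi);
close $out;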
So for a given issue, you can generate a .cbz (a comic book archive, which is just a zip of images) by doing something like this:
for /l %i in (1,1,60) do wget --no-check-certificate https://layers.isu.pub/<...>/page_%i.bin
for /l %i in (1,1,9) do ren page_%i.bin page_0%i.bin
ren *.bin *.jpg.gz
gunzip *.gz
for %i in (*.jpg) do cut.pl "%i" 0 0x23
7z -tzip a ukacoc.cbz *.jpg
where <…> is the URL path to the content, and 60 is just a number large enough to cover all the pages (early issues had around 30 pages; later ones run to the mid-50s).
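For anyone not on Windows, the same pipeline can be scripted as well; here's a rough Perl sketch that assumes wget, gunzip and zip are on the PATH and uses the cut.pl script from above. The URL prefix and page count are placeholders you'd fill in from your own browser's network log:

use strict;
use warnings;

# Placeholders -- fill these in from your own network log.
my $base  = 'https://layers.isu.pub/<...>';
my $pages = 60;    # anything comfortably above the real page count

for my $i (1 .. $pages) {
    my $name = sprintf 'page_%02d', $i;
    # Stop at the first page that fails to download.
    system("wget --no-check-certificate -q -O $name.jpg.gz $base/page_$i.bin") == 0
        or last;
    system("gunzip -f $name.jpg.gz") == 0       or die "gunzip failed on $name\n";
    system("perl cut.pl $name.jpg 0 0x23") == 0 or die "cut.pl failed on $name\n";
}

# Bundle the cleaned-up pages into a comic book archive.
system('zip ukacoc.cbz page_*.jpg');

Since the loop just stops at the first page that fails to download, overestimating the page count is harmless.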
Now to get to work transcribing some of these...