Geneve OS development discussion

+Ksarul · March 4

3 hours ago, 9640News said:

All Jerry did was have a disk with a modified skew and an interlace of one where system/SYS was written. I wrote about that in 9640 News at the time. You could create your own disk with HyperCopy.

Won't Jump Boot do this as well? I'm pretty sure that one was based on his code. . .

+9640News · March 4

1 hour ago, Ksarul said:

Won't Jump Boot do this as well? I'm pretty sure that one was based on his code. . .

I thought that was what Dan was referencing as far as the "disk" that was sold. JumpBoot was just a specially formatted disk with interlace/skew that could be created with HyperCopy.

+InsaneMultitasker · March 4

I recall an article that I believe was written by Jerry Coffey, that described how to make the jump boot disk. This was not a MP article but some other format, either in a newsletter or a DV80 article. Anyway, it involved a few different steps to ensure the correct, fastest format. I doubt that I still have the printed article in my archives but I will check anyway.

+dhe · March 4

Mike Maksimik did a lot of testing on different drives and published a chart, I think in the Chicago Times, of different head step times vs skew and interlace to get the fastest boot. I asked @InsaneMultitasker because he borrowed my serialized JumpBoot disk, and I thought while poking at other floppies, that might have made it to the look at list.

+InsaneMultitasker · March 4

2 hours ago, dhe said:

Mike Maksimik did a lot of testing on different drives and published a chart, I think in the Chicago Times, of different head step times vs skew and interlace to get the fastest boot. I asked @InsaneMultitasker because he borrowed my serialized JumpBoot disk, and I thought while poking at other floppies, that might have made it to the look at list.

Yep, I have the disk - it is in the last case that I have been working through. I will be down to 10 or fewer disks soon. To be honest, I am always amazed at the disks’ longevity; some of them are squeaking as they turn but overall, even the disks from the late 80s are reading just fine. And for some files I've tested, like Infocom and MDOS, there has been no bit rot. I've steered clear of disk drive head cleaners these past 30+ years, which makes me wonder if those were all a ruse to sell to the masses or if I just got lucky with the various drives. But I digress.

+InsaneMultitasker · March 9

On 3/3/2024 at 10:54 PM, InsaneMultitasker said:

Noting here for future reference:

While going through my disks, I came across a few that had bad SYSTEM/SYS files. I had written notes on the sleeve and/or label stating that the file was bad, yet MDOS and directory managers reported the file copied without an error. This is a nasty bug insofar as updating the same file succeeds without detecting the disk error. One label suggests the issue only occurs with single density but I have a double density disk with what seems to be the same problem.

I grabbed the single density disk and tested as follows:

1. Copy SYSTEM/SYS from floppy to another device. FAILS. Bad sector encountered.

2. Load Birdwell's Disk Utilities (/4a mode)

- copy fails

- sector read of all 9 affected sectors fails

3. Load SECTOR ONE (mdos sector editor)

- sector read of all 9 affected sectors fails

4. Attempt to write a known good sector to one of the bad sectors in /4a and Geneve modes

- fail to write

- does not 'fix' the sector; still unreadable

The sector write and binary write routines both use the same core sector IO routines. I surmise the problem is in the binary write routine and/or the error trap return to the OS, but nothing stands out to me. I identified a few routines where I can add debug output to better understand what is happening when I get some free time.

I tracked down the problem. This bug is very old and very naughty.

During the "save program" and "binary write" operations, the DSR is told to write a certain number of sectors. The DSR writes sectors until the buffer is exhausted or until an error is encountered. When the DSR encounters a write error due to a bad sector, open drive door, etc., the write loop exits with an error code. The OS then tries to update the FCB (FDR) for the file. Unfortunately, the OS does not save the previous error condition. When the FCB is written, the previous error byte is overwritten. To the caller, the write operation is typically reported as successful. This can happen multiple times per file.

Where is the bug?

The low-level floppy controller routine sets the error condition. If the FCB/FDR write is successful, the error is cleared; if it fails, an error is reported but it might not match the earlier error. PABERR is the universal reporting byte for the DSR. R9 is cleared and in most cases, the write succeeds, so the earlier error vanishes.

(L8.SECT3 IO and L8.FORMAT return)

GODRT

BADRT  BL   @RESPA1          RESTORE PAD
       CLR  R9               ZERO DSR LINK ERROR CODE
       MOVB R10,@MYPAD+>50   STORE ERROR CODE IF ANY
       JEQ  NE               BRANCH IF NO ERROR
       LI   R9,>C000         BAD OP 6
       CB   @HB34,R10        WRITE PROTECT?
       JNE  NE
       LI   R9,>2000
NE     MOVB R9,@PABERR       for 9640 mode format error code

The "simple solution" is to give the first error priority over the FCB write condition. I also need to check the record write routines for a similar problem. We now have an explanation for some (many?) of the incomplete files over the years. File managers, terminal emulators, Archiver, BASIC/XB, COPY command, and programs that use SAVE opcode 0x06 or direct output/bwrite opcodes are susceptible to this bug.
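
To make the failure mode concrete, here is a minimal Python model of the behaviour described above. It is only a sketch, not the MDOS code: the function names, the status constants, and the idea of passing per-sector results as a list are all hypothetical, chosen just to show why the later FCB write hides the earlier error and what "first error wins" changes.

```python
NO_ERROR = 0x00
DEVICE_ERROR = 0xC0  # illustrative stand-in for the >C000 code in the listing


def first_failure(sector_results):
    """Return the first non-zero status; the write loop stops there."""
    for status in sector_results:
        if status != NO_ERROR:
            return status
    return NO_ERROR


def write_file_buggy(sector_results, fcb_result):
    """Model of the bug: the FCB update replaces the earlier status."""
    error = first_failure(sector_results)
    error = fcb_result  # earlier error byte is overwritten here
    return error


def write_file_fixed(sector_results, fcb_result):
    """Model of the fix: the first error takes priority over the FCB result."""
    error = first_failure(sector_results)
    if error != NO_ERROR:
        return error
    return fcb_result


# A bad sector followed by a successful FCB write:
buggy = write_file_buggy([NO_ERROR, DEVICE_ERROR], NO_ERROR)
fixed = write_file_fixed([NO_ERROR, DEVICE_ERROR], NO_ERROR)
```

In this model `buggy` comes back as "no error" while `fixed` still carries the device error, which is the difference the caller would see.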

+InsaneMultitasker · March 9

Initial tests show that trapping the first error code works; MDOS now reports an error when the bad sector is encountered.

I loaded DSKU and tried to copy to the same file. DSKU halted with an error, suggesting the /4a-mode DSR routine is immune to the bug. I suppose this makes sense since the above routine is only modifying the MDOS error code. But it also means I must walk through the MDOS level 3 record IO routine as it could incorporate the same bug. To test, I'll create a fake file and FDR, with clusters that overlap the SYSTEM/SYS bad sectors. I can then open the file as relative/fixed and see what happens during a write operation. (I suppose the same test is required for SAVE opcode >06).

At first glance, I don't see a similar bug in the corresponding hard drive BWRITE routine but I have no (easy) way to test it without a known, bad file on hard drive media.

+Ksarul · March 9

Excellent sleuthing, @InsaneMultitasker!

+9640News · March 9

@InsaneMultitasker

Let me know with file updates when you are confident you have everything worked out.

I wonder: if you format a disk as DS/SD, then format it SS/DD, then update sector 0 so the disk thinks it is DS/DD, would that be sufficient to create a "bad" disk when it tries to write a file beyond sector 720?

Beery

+InsaneMultitasker · March 9

2 minutes ago, 9640News said:

Let me know with file updates when you are confident you have everything worked out.

I'm not sure about "confident". SAVE jumps into code for close/restore, and I see similar routines used to update the FCB. However, it looks as if the original error from the trap is preserved unless there is a media error, in which case the latter is reported. This leads me to wonder why the MDOS bwrite exits the DSR via a different path and if that is indeed the problem for the >0B opcode. I've almost run out of time for today.

+InsaneMultitasker · March 9

@9640News if you have a moment, please inspect L8.SYSRTN and the SYSRTN routine at the start of the file.

CLOSEF, when set, is intended to flush the VIB. However, the later VIBCHG flag test can override CLOSEF. It seems to me that if CLOSEF is set, then the VIBCHG test should be skipped so the flush occurs. Another potential problem is that when CLOSEF is not set and there is an open file, the routine jumps to SYSRT1 and bypasses the VIBCHG test. The two flags are used in different routines and for different purposes. Do you recall the reason for implementing VIBCHG?

+9640News · March 9

@InsaneMultitasker

Hmm. I am wondering if it may have something to do with having two or more files open simultaneously and either getting an error, out of diskspace, or closing one file but not the other????

+InsaneMultitasker · March 9

1 hour ago, 9640News said:

@InsaneMultitasker

Hmm. I am wondering if it may have something to do with having two or more files open simultaneously and either getting an error, out of diskspace, or closing one file but not the other????

I think some of that comes into play. The flags are set in different ways, leading to confusion. And VIBCHG is a byte value, so sometimes VIBID comes along for the ride. To put it another way, there are actually three flags at play: CLOSEF, VIBCHG, and VIBID. From my tests in GPL and MDOS and ABASIC, I think the BWRITE issue is resolved. I confirmed some of the operations via debug output. I'm mostly satisfied with the fix and the code, though I still want to try to test the SAVE and record IO with the bad sectors.

I included some of my very raw notes at the end.

1. CLOSEF = >FFFF, VIBCHG = >02, VIBID = >01 (disk number). This is the VIB flush that occurs when a FCB is written via BWRITE. This is good.

2. This is the end of the failed copy operation. TRAP4 shows error code 0xC000. CLOSEF and VIBCHG are cleared, VIBID is 01. MDOS reports the error.

3. DF80 fixed record IO. The log shows that we opened the file (opcode >00), write 5 records (opcode >03) and closed (opcode >01).

The VIB is flushed upon open (to preserve the file attributes and update bitmap) and upon closure. VIBCHG looks random but its value depends on the called routines.

4. SAVE operation under ABASIC

CLOSEF and VIBCHG are set, so a flush occurs to DSK9. The opcode is >06.

5. /4a mode GPL routine >15, direct output. The >FF in the PAB indicates VDP memory, as expected.

CLOSEF is >FFFF and VIBCHG is >02. This flushes the VIB during creation of the file on DSK1.

Notes related to the flags, where they are set/cleared, and general observations:

CLOSEF - when set, flushes VIB sector 0. Set for delete, close, and a few other opcodes (CLOSE, DELETE, SAVFCB for >15, OPENUP).
VIBCHG - used to also flag VIB alter. This may incorrectly override the forced update via CLOSEF. See OPEN&LS2: COPVIB . Rename. GETVIB. GETSEC. I think that the SYSRTN should be skipping the VIB test if the flag is set.

VIBCHG/VIBID:
GETAUS: calls getvib then getsec. This is where VIB flush is set, in spite of the CLOSEF flag. So in theory, all direct output will flush VIB?
GETSEC: changes VIBCHG MSByte to the bitmap mask via GS$J2
GETVIB: changes VIBCHG to the VIBID (drive?) 16-bit value; clears if error
DELETE-P: changes VIBCHG MSByte to >01xx
OPEN&LS2: RENDR9 to indicate change;
BSECW1 binary secwrite, sets to H0001. (save)
COPVIB, sets to HFF
SYSRTN: BUDOP4 clears both vib and close flag
SYSR11, clears both
CLOSFL, tests the VIBCHG MSByte only! So sometimes, MSByte is >00xx

CLOSEF is set at
CLORESSTS:CLOSE (for closure),
DELETE:DENTRY (forced update) + vibchg,
FILEOP!:SAVFCB forced update for GPL>15 and BWRITE creation
FILEOP!:OPENUP forced update for Open,save, maybe others.
HDR2-P2: OPOK, clears flag before entry into the device table call
SYSRTN: SYSRTN, tests flag, then looks at ID/FDR/open file/etc.
if set, skip test for open file
check VIBCHG byte, if 0, skip flush; if <>0, flush.
SYSRTN: BUDOFF, clear flag

During bwrite, CLOSEF is >FFFF and VIBCHG might come from GETSEC, but save/write set VIBCHG to >0001, so there is no flush. FILEOP: OPENFI, creates fcb

Note: this is a byte value. See l8.layout
VIBCHG BYTE 0     FLAG 0=VIB NOT ALTERED, 0<>VIB ALTERED
VIBID  BYTE 0     DRIVE NUMBER OF VIB IN VIB BUFFER
FDRVID BYTE 0     drive for the file, tested in SYSRTN
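
The "VIBID comes along for the ride" observation can be illustrated with a small Python model of a big-endian word store. This is only a sketch under assumptions: it assumes VIBCHG sits at an even address with VIBID in the very next byte (which is how the l8.layout excerpt reads, but the actual addresses are not shown), and the helper names are hypothetical.

```python
VIBCHG = 0  # hypothetical offset of the flag byte (assumed even address)
VIBID = 1   # hypothetical offset of the drive-number byte right after it


def store_word(memory, addr, value):
    """Big-endian 16-bit store: MSByte lands at addr, LSByte at addr + 1."""
    memory[addr] = (value >> 8) & 0xFF
    memory[addr + 1] = value & 0xFF


def flush_needed(memory):
    """Byte test on VIBCHG only, like the MSByte-only test noted for CLOSFL."""
    return memory[VIBCHG] != 0


mem = bytearray(2)

# A word store of >0001 (the save/secwrite case in the notes):
store_word(mem, VIBCHG, 0x0001)
save_case = flush_needed(mem)      # MSByte is >00, so no flush is requested
save_vibid = mem[VIBID]            # the low byte ends up in VIBID

# A word store whose MSByte is non-zero, e.g. >0201:
store_word(mem, VIBCHG, 0x0201)
bwrite_case = flush_needed(mem)    # MSByte is >02, so a flush is requested
```

Under those assumptions, a 16-bit store of >0001 leaves the flag byte at zero even though the word as a whole is non-zero, which would match the "so there is no flush" note for save/write.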

+InsaneMultitasker · March 9

3 hours ago, 9640News said:

I wonder: if you format a disk as DS/SD, then format it SS/DD, then update sector 0 so the disk thinks it is DS/DD, would that be sufficient to create a "bad" disk when it tries to write a file beyond sector 720?

I changed the FDR (file descriptor record) of SYSTEM/SYS from a program image file to a DF255 file. I then wrote a small ABASIC program to read the last 30 records of the file. When the program tried to read the first bad record (bad sector), the DSR reported the error as >C0 and returned to ABASIC, which reported an error 26. I then attempted to write to the file; the first bad record generated an error >C0 and ABASIC error 36. So I'm pretty confident record IO is ok. Later today, I will manipulate a small program file's cluster nybbles to point to the bad sectors. With luck, the save opcode will generate an error but I won't bet on it.

+InsaneMultitasker · March 10

SAVE doesn't report an error. There is yet another place where the error byte is cleared improperly. MDOS now captures the original error condition and I confirmed the trap works for both SAVE (>06) and BWRITE (>0B). Both Advanced BASIC (MDOS mode) and Extended BASIC (/4a mode) report IO Error 66.

@9640News I had trouble with ABASIC crashing the system until I removed some of the debug statements. I am fairly certain at least one DSR section is near the 8k page boundary or close to stomping on an AORG'd/DORG'd area.

+9640News · March 10

Yeah, I think a couple of those pages are within a handful of bytes of the boundary. Do you have the LINK script with the extra EVAL statements at the end that show where each page ends? You can use this link to get the updated LINKer command file at ABasic/!ABLINK.txt at main · BeeryMiller/ABasic (github.com) or the info you need is shown below.

EVAL P7ENDB From file 166\CALL2-SRC
EVAL P7ENDA From file 167\SYMBOL-SRC
EVAL P6END From file 165\CALL1-SRC
EVAL P8END From file 168\NUD2-SRC
EVAL P2END From file 162\NUD1-SRC
EVAL P4END From file 164\CMD-SRC

+InsaneMultitasker · March 11

All done! I added conditional assembly directives to where I incorporated the debug output. The code all assembles and links, and I have confirmed that the error trap is working as expected. I'll get the files to you in the near future. Last but not least, here is an image of the disk I saved hoping that one day the root cause would be found.

+InsaneMultitasker · March 17

Today I separated the table-driven CRC routine from the CRCOS program so that I could use it within other programs, including the PFM and CYA utilities.

In the process of testing the programs, I observed two unexpected results:

1. Protecting a file does not inhibit BWRITE (direct file write) from writing to the blocks within the file. I was surprised when instead of an error, my program successfully wrote new CRC values to a protected OS file! I haven't confirmed whether the file can be overwritten though I suspect it is possible. I need to test the same operation on a floppy. Does anyone know how the /4a DSRs behave?

2. Writing to a file with BWRITE does not update the file's last modified date. I have this vague memory that long, long ago the timestamp update was disabled in the interest of speed. I'm not sure of the best approach nor the right approach. I suppose this explains why sector-editing a file doesn't change the timestamp, though I never had cause to notice, until now.

+InsaneMultitasker · March 17

PFMSYS and CYA have been updated with the faster CRC routine. The table-driven CRC finishes in just under 3s for the current OS; the old byte-wise routine finishes in about 10s.
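
For readers unfamiliar with the technique, here is a minimal Python sketch comparing a CRC computed by shifting every bit with a table-driven CRC that does one lookup per byte. It is not the CRCOS code: the polynomial (>1021, CRC-16/CCITT) and the >FFFF starting value are assumptions, since the thread does not state which CRC the utilities use, and the function names are hypothetical.

```python
POLY = 0x1021  # assumed polynomial; the one used by CRCOS is not stated


def crc16_bit_by_bit(data, crc=0xFFFF):
    """Shift each input byte through the register one bit at a time."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def build_table():
    """Precompute the effect of the 8 shifts for every possible top byte."""
    table = []
    for n in range(256):
        crc = n << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
        table.append(crc)
    return table


TABLE = build_table()


def crc16_table(data, crc=0xFFFF):
    """One table lookup per byte replaces the inner 8-step loop."""
    for byte in data:
        crc = ((crc << 8) & 0xFFFF) ^ TABLE[(crc >> 8) ^ byte]
    return crc


sample = bytes(range(256)) * 4
same_result = crc16_bit_by_bit(sample) == crc16_table(sample)
```

Both functions produce the same value; the table version trades 512 bytes of table space for removing the inner loop, which is the usual source of the kind of speedup reported above.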

I also took the opportunity to re-enable the PFM device options, so that CYA can now load and save directly from/to the PFM+ and PFM512. The load routine works properly but I think I'll double-check the chip programming loop later today - before I enable the save routine, lest I make a mistake and "brick" my system.

+InsaneMultitasker · March 18

Ok, the CYA utility successfully updated my PFM+ chips. Whew. @9640News I sent you CYA 7.50 for review. I want to finish source file cleanup for CRCOS, PFM, and CYA. I'll send you a PM with some more information tomorrow, including test results and any observations related to the MDOS 7.44 release packet you sent me.

+InsaneMultitasker · March 25

I completed my validation and build updates for CYA, CRC, GenCFG, TSTAT, the PFM utilities, and GenWIPE. Each program had its own version of CRC and PFM routines; I have cleaned this up so that there is only ONE version and the dependencies are defined in the respective "Makefiles". So in theory, if we need to fix something in the CRC routines, the changes will be represented in all of the programs. @9640News the packet is on its way to you for review. I'll continue to build off the same structure with more utilities as time permits, with the intended eventual posting to Github.

I had just enough time to work through four C programs: DM (Directory Manager), MYS, POWER, and PRINTME. There are two versions of the TIC compiler - 1.63 and 1.67. Directory Manager will only compile under 1.63, so I created a separate container for the program and its resources.

DM v2.66 removes the old, FIXED record warning that does nothing but confuse people when it appears. Since no other program accounts for that very old MDOS bug, I felt it safe to remove the warning. If you want to include this in the MDOS 7.44 release, I'll get the files to you on Monday.

Edited March 25 by InsaneMultitasker
Added comments regarding the four C utilities

+InsaneMultitasker · April 13

While -trying- to wrap up my work on the Infocom Interpreter this afternoon, I ran into an issue that consumed my evening: MDOS would not delete files from my ramdisk's 'hard drive' partition. I first thought I came across a MAME issue because surely, this issue did not occur on real hardware. I refreshed MAME to no avail. I then formatted the ramdisk, to no avail. I tried other versions of MDOS, no luck. I tried the delete operation on my real hardware and it was successful, so for a while I pondered why only MAME failed to delete a file. It made no sense. I puttered around for an hour and then it occurred to me that my real PEB contains a SCSI card whereas MAME uses an emulated HFDC card. Sure enough, when I removed the SCSI card from the PEB, I could no longer delete a file from the ramdisk. Again, this made no sense! I could read and write sectors, delete directories, create new files, and more. Only 'delete' seemed to be affected. I enabled the RS232 debug output and started comparing operations with and without the SCSI card. It was then that I noticed all of the bitmap and FCB update operations were never executed.

It turns out the bug is based on an assumption and a "trick". The below routine assumes that R12 contains a non-zero value. While this is all well and good for the HFDC and SCSI, R12 is not used (here) for the IDE and Ramdisk devices, as their CRU is isolated to their respective device driver. The comment after JEQ TOPFC4 states 'fix this to be NotEqual'. R12 is MOV'd to itself, which resets the EQ bit. Alas, in my case R12 is zero, so EQ is set, and the calling routine treats EQ as an error condition. Worse, the error condition is ignored or incorrectly returned, so the DSR reports a successful deletion and happily leaves the file as-is.

I did not find any more MOV R12,R12 statements in the DSR. The failure was resolved by MOV'ing a known non-zero-content register to itself. I considered fixing the logic but the routine is called elsewhere. I am a bit surprised that no one has encountered this problem with IDE or Ramdisk files in the past 2+ years. The error reporting is a bit more complicated and may be similar to what I found in the floppy write routines. That's for another day.

File: SCS2.MDOS.HD.VDPFCB

TOPFC4 MOV  R12,R12          THIS IS THE CRU BASE, SO EQ IS RESET
TOPFC5 RT
*
TOPFCB MOV  R9,@RAMBUF
TOPFC2 MOV  @PTRPRE(R9),R0
       JEQ  TOPFC4           FIX THIS TO BE <>
       MOV  R0,@AUNUM
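
The trick and its failure can be modelled in a few lines of Python. This is a sketch, not the DSR: it relies only on the fact that MOV on the TMS9900 family compares the moved word against zero and sets EQ when it is zero. The function names and the sample register values are hypothetical.

```python
def eq_after_mov(value):
    """MOV sets the EQ status bit when the 16-bit word moved is zero."""
    return (value & 0xFFFF) == 0


def topfc4_original(r12):
    """MOV R12,R12: EQ is reset only if R12 happens to be non-zero."""
    return eq_after_mov(r12)


def topfc4_patched(known_nonzero=0x0001):
    """The workaround described above: MOV a register known to be non-zero."""
    return eq_after_mov(known_nonzero)


def caller_sees_error(eq):
    """Per the post, the calling routine treats EQ as an error condition."""
    return eq


with_cru_base = caller_sees_error(topfc4_original(0x1100))  # example CRU base
without_cru = caller_sees_error(topfc4_original(0x0000))    # IDE/ramdisk case
after_patch = caller_sees_error(topfc4_patched())
```

With a non-zero CRU base in R12 the caller sees no error; with R12 at zero it does, and the patched version no longer depends on what R12 contains.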

+OLD CS1 · April 13

9 minutes ago, InsaneMultitasker said:

It turns out the bug is based on an assumption and a "trick".

Wow. Talk about esoteric. Nice catch.

+mizapf · April 13

6 hours ago, InsaneMultitasker said:

I puttered around for an hour and then it occurred to me that my real PEB contains a SCSI card whereas MAME uses an emulated HFDC card.

Just to remind (anyone), you can safely use the SCSI emulation in MAME *and* use TIMT3 to format the SCSI drive and to copy stuff on it. Also, the new boot ROMs are available so you can boot from SCSI.

Of course, you would probably not have found that issue if you had done that.

+InsaneMultitasker · April 13

3 hours ago, mizapf said:

Of course, you would probably not have found that issue if you had done that.

Quite true. Not that I was using the HFDC in this fashion to find bugs, mind you. I did wonder if last night was the evening before a TI Faire, given the then-unexplained problems!

Upon further inspection of the TOPFCB routine, I find that it is used during the CLOSE operation and the SAVE operation. This use almost certainly explains the handful of corrupted DF128 files I encountered during testing last weekend, which I thought were being caused by my interpreter tests. Hah.

Oh, that reminds me. I encountered - during soft and hard resets - times when the TIPI web socket was not yet 'connected' before I accessed the TIPI, which in turn caused the OS to hang indefinitely. I am not yet sure if this is related to a change in MDOS and/or something that needs to be reviewed in MAME's emulation, so I will do some more testing and review. I confirmed that on the real hardware the OS will "hang" until the TIPI is ready and when ready, the operation will complete; the OS never recovers in MAME. I -think- there is a difference between MDOS 7.40 and the current development version. And I need to check my MAME version after all my experimentation last night. More to come...
