Author Topic: Amiga history: Why was "Disk doctor" so spectacularly bad at its job? Here is why...  (Read 11998 times)

olsen (Topic starter)

If you have been around long enough, you may remember the "standard" disk repair utility which shipped with Kickstart/Workbench versions 1.x and 2.0, and which was dropped when Workbench 2.1 was introduced.

Whenever a floppy disk error came up, as they did in those early days, you would see an AmigaDOS requester window urging you to repair your floppy disk with the "Disk doctor" command. Many of us tried, because there was practically no alternative, and very few succeeded. Because it had such a poor track record, it became a running joke. You had to be really desperate to use "Disk doctor", because like it or not, "Disk doctor" mostly left the disk in a poorer state than it was before, but sometimes it succeeded in "rescuing" data.

So, why exactly did it work so poorly? I did some research this week, because I was curious.

The purpose of "Disk doctor" was to return a damaged volume to an operational state after the disk validator or the Amiga ROM file system had failed to achieve this. Specifically, "Disk doctor" was supposed to make the disk validator work again, by repairing the on-disk data structures, even reconstructing the first block and the root directory from scratch if necessary.

To this end, it scanned the entire disk, figuring out which disk block contents were still sound, and whether there were any read errors. This was the first step at which things could go very wrong. If "Disk doctor" found that the first block of the disk, or the root block, was unreadable, it would first attempt to rewrite its contents and then, as a last resort, reformat the block (this could correct "hard errors"). However, it would not format just that single block; it would format the entire track containing it. So you lost not just the contents of one block, you lost all 11 blocks of that track. Because "Disk doctor" overwrote those blocks with whatever was last in its scan buffer, random data was repeated all over the 10 other blocks which were trashed along with it.
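To get a feel for the collateral damage, here is a minimal Python sketch of which blocks share a track with a given block. It assumes the usual double-density floppy geometry (1760 blocks, 11 blocks per track) and a root block at block 880; the function names are made up for illustration and have nothing to do with the actual "Disk doctor" code.

```python
BLOCKS_PER_TRACK = 11            # blocks lost per reformatted track, as described above
TOTAL_BLOCKS = 1760              # blocks on a double-density floppy disk
ROOT_BLOCK = TOTAL_BLOCKS // 2   # 880; assumed usual position of the root block


def blocks_on_same_track(block):
    """Return all block numbers on the track which holds `block`."""
    if not 0 <= block < TOTAL_BLOCKS:
        raise ValueError("block number out of range")
    first = (block // BLOCKS_PER_TRACK) * BLOCKS_PER_TRACK
    return list(range(first, first + BLOCKS_PER_TRACK))


def collateral_damage(block):
    """Return the blocks which are wiped in addition to `block` itself."""
    return [b for b in blocks_on_same_track(block) if b != block]


if __name__ == "__main__":
    print("reformatting block 0 also wipes:", collateral_damage(0))
    print("reformatting the root block also wipes:", collateral_damage(ROOT_BLOCK))
```

Under these assumptions, reformatting the first block takes blocks 1 through 10 with it, and reformatting the root block takes blocks 881 through 890 with it.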

And it only got worse from there. By knocking out those blocks, "Disk doctor" could damage file contents, would later detect that corruption and subsequently delete those files. Deletion was incomplete in that the data structures which allow for directory scanning would be damaged, and "Disk doctor" was unaware of the need to repair them. This in turn rendered the disk validator inoperable.

Mind you, pre-existing damage to file contents had the same knock-on effect: "Disk doctor" detected the file damage, but did not know that it had to repair the directory data structures. Once a directory was corrupted, "Disk doctor" would never repair it.

"Disk doctor" version 1.3.3 (and presumably older versions, too) suffered from a bug in the initial scanning process. It failed to scan the entire portion of the disk which was used by the file system, omitting the final two blocks. If there was a file or directory header stored there, or a file data block, then "Disk doctor" would assume that these directory entries did not exist, or that the files associated with the "missing" data blocks were corrupt.

As part of its recovery operation "Disk doctor" identified files and directories whose parent directories no longer had valid parent directories themselves. These were considered "orphaned" directory entries, because they were no longer reachable through the existing directory structures. "Disk doctor" would add these orphans to the root directory, making them accessible again. This may sound reasonable, but there was a twist: if several orphaned files shared the same name, they would all show up in the root directory. If you tried, as the "Disk doctor" documentation recommended, to copy the contents of the "corrected" disk to a separate disk, only one of those orphans which shared the same name would be copied. Trying to delete those files with the same name on the corrected disk would most likely lead to all of them getting deleted.

That was, of course, not the end of it. "Disk doctor" had trouble recovering files which were longer than 72 blocks of data (on a floppy disk, that would be 36865 bytes or larger). Due to how the Amiga ROM file system works, a file covering more than 72 blocks of data (assuming a block size of 512 bytes) needs to record where the other blocks are stored in a separate data structure called the "extension block list".
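The arithmetic behind that threshold can be sketched in a few lines of Python. This takes the figures above at face value (72 block pointers in the file header, 512 bytes of file data per block) and additionally assumes that each extension block also holds 72 pointers; the function names are hypothetical.

```python
POINTERS_PER_BLOCK = 72   # data block pointers in a file header (from the text above)
BYTES_PER_BLOCK = 512     # simplifying assumption used in the text above


def data_blocks_needed(file_size):
    """Number of data blocks needed to store `file_size` bytes."""
    return -(-file_size // BYTES_PER_BLOCK)  # integer ceiling division


def extension_blocks_needed(file_size):
    """Number of extension blocks needed, assuming 72 pointers in each one."""
    extra = data_blocks_needed(file_size) - POINTERS_PER_BLOCK
    if extra <= 0:
        return 0
    return -(-extra // POINTERS_PER_BLOCK)


if __name__ == "__main__":
    for size in (36864, 36865, 100000):
        print(size, "bytes:", data_blocks_needed(size), "data blocks,",
              extension_blocks_needed(size), "extension blocks")
```

A file of 36864 bytes still fits into the 72 blocks referenced by the file header, while one more byte needs a 73rd data block and with it the first extension block.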

"Disk doctor" ran into trouble with the extension blocks because of a mix-up in the disk read routines. One version read blocks, verified that their contents were sound and that the block was of a certain type. The other version did not insist on checking all of that, and it did not have to. The problem was that "Disk doctor" used the version which insisted that the block had to be of a certain type, which happened not to match the extension block type. Hence, "Disk doctor" considered all files larger than 72 blocks to be corrupt and would schedule them for deletion. Which had further knock-on effects, corrupting the directory structures.
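The mix-up can be illustrated with a small Python sketch. It is not the original code: the two function names are made up, the block type values (2 for header blocks, 16 for extension blocks) and the checksum rule (all longwords of a block add up to zero) are the commonly documented ones, and which type the strict routine actually demanded is a guess made purely for illustration.

```python
import struct

T_SHORT = 2    # commonly documented type of header blocks
T_LIST = 16    # commonly documented type of extension blocks
BLOCK_SIZE = 512


def make_block(block_type):
    """Build an otherwise empty block with a valid checksum (test helper)."""
    block = bytearray(BLOCK_SIZE)
    struct.pack_into(">I", block, 0, block_type)
    total = sum(struct.unpack(">128I", block)) & 0xFFFFFFFF
    struct.pack_into(">I", block, 20, (-total) & 0xFFFFFFFF)  # checksum longword
    return bytes(block)


def checksum_ok(block):
    """A block is sound if all of its longwords add up to zero."""
    words = struct.unpack(">%dI" % (len(block) // 4), block)
    return sum(words) & 0xFFFFFFFF == 0


def read_block_strict(block, expected_type):
    """Hypothetical strict variant: checks the checksum AND the block type."""
    if not checksum_ok(block):
        return None
    if struct.unpack_from(">I", block, 0)[0] != expected_type:
        return None
    return block


def read_block_lenient(block):
    """Hypothetical lenient variant: checks the checksum only."""
    return block if checksum_ok(block) else None


if __name__ == "__main__":
    extension = make_block(T_LIST)
    # The strict variant rejects a perfectly sound extension block ...
    print("strict: ", read_block_strict(extension, T_SHORT) is not None)
    # ... while the lenient variant would have accepted it.
    print("lenient:", read_block_lenient(extension) is not None)
```

A sound extension block passes the lenient check but fails the strict one, which is all it takes for every file with an extension block to be flagged as corrupt.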

This still is not the end of the tale. "Disk doctor" used an incredible amount of memory for its internal bookkeeping. You might have been able to "correct" a floppy disk with its 1760 blocks on your 512 KByte Amiga (which would require 140 KBytes of RAM), but with a 20 MByte hard disk partition you would need at least 1 MByte of RAM. In the early days, that was a lot to ask for.

"Disk doctor" reached the end of its usefulness (for some values of "useful") when it turned out to be too hard to adapt it for use with the Fast Filesystem (FFS) in 1988. Commodore struggled to find a proper fix for its limitations, but the basic problem was that so much of "Disk doctor"'s code was restricted to the original Amiga ROM file system layout and operations.

What directories and data blocks look like is notably different for the FFS, but confusion prevailed. The documentation for the last "Disk doctor" version 1.3.5 manages to both discourage and encourage you at the same time to use it with FFS volumes. As it turned out, if you were unlucky enough to try, "Disk doctor" could render perfectly workable FFS format directory structures unsuitable for use with the FFS. If you were spectacularly unlucky (the first disk block was unreadable), "Disk doctor" would simply assume that your volume was actually in standard Amiga ROM file system layout and then make it so: when rewriting the first disk block, it always assumed that the disk used the original Amiga ROM file system layout.
« Last Edit: September 09, 2017, 08:42:53 AM by olsen »
 

olsen (Topic starter)

Quote from: BozzerBigD;830475
It was before my time but DiskSalv does perform admirably for FFS partitions and was programmed by 'hardware' Amiga guru Dave Haynie!!!


DiskSalv 1.0 was released in May 1986, but "Disk doctor" certainly is older ;)

No change history is recorded for "Disk doctor", so my best guess is that it may have been written or adapted from earlier code by Dr. Tim King in 1984/1985.
 

olsen (Topic starter)

Quote from: Matt_H;830476
Dave Haynie's notes about Disk Doctor (from the DiskSalv documentation) also merit repeating here. The story he recounts is that the developers were unsure of whether to keep Disk Doctor, so they put its source code on a disk and ran Disk Doctor on it. Needless to say, it did not survive. :)

We may never know if that's 100% true, but it's a great story.


This story was one of the reasons why I looked into how "Disk doctor" worked. Dave Haynie acknowledges that the story came from a secondary source ("The story I was told goes something like this").

Can "Disk doctor" corrupt an otherwise perfectly sound disk, rendering files stored on it unreadable? From what I now know about how "Disk doctor" works, there are only two cases in which this might happen. Both cases assume that the disk is physically readable and writable, and the on-disk data structures are all consistent.

Case #1: "Disk doctor" would find orphaned files and directories on the disk, representing data which has been deleted. These files and directories would be added to the root directory. If any of these newly added directory entries shared a name with one of the "Disk doctor" source code files, this could render the source code inaccessible.

Case #2: This assumes that the floppy disk uses the FFS format rather than the original Amiga file system format. "Disk doctor" always resets the root directory and subsequently adds those files back to it (this is actually a side-effect and arguably unnecessary to begin with). Because the FFS format requires that the directory entries are added so that they are sorted by ascending block number, and "Disk doctor" fails to do that, those files might not show up when the directory contents are listed. This could happen to the "Disk doctor" source code files, which would appear to be gone, although the individual files could still have been opened (using the "Type" or "Copy" commands).

If the assumption that the disk was sound to begin with does not hold, trouble would be expected, and "Disk doctor" would likely leave the volume in a corrupted state.
 

olsen (Topic starter)

Quote from: kolla;830500
Speaking of file systems...
* when booting, if no RTC is available, the OS will set system time to the creation time stamp of the filesystem from which it boots
Actually it's even weirder - the default ROM file system reads the volume 'last altered date', and if the system time happens to be unset, will change it to that date.

Quote
 - very clever, if only there was a tool to adjust and update this time stamp :)
The way the default file system works, it should be sufficient to change a file on the volume. That change will bubble up to the root directory, and it should replace the 'last altered date' unless that change came with a time stamp which preceded what's already recorded in the root directory.

Quote
* OS4.1 comes with a newer version of FFS (FFS2?)
Well... it's a reimplementation in 'C' and not so much a version of the code which existed before it. It's like the original FFS, which was reimplemented in assembly language by Steve Beats, with the precursor written in a completely different language (BCPL).

Quote
... with long filenames (0x444F5307) - is there a filesystem handler for OS3.x that supports this?
Of course there is :)  I wrote it, starting in 2001, for use on AmigaOS 2.x/3.x. This is still the most complex and challenging software I ever wrote. The AmigaOS 4 and MorphOS versions are ports of the original 68000 implementation. I still use the 68000 version to this day on my A3000UX development machine at home.
 

olsen (Topic starter)

Quote from: Thomas Richter;830499
If I recall correctly, it was the OFS in kickstart 1.2 which presented an error message such as


if the file system or restart segment (aka "disk validator") found a problem it could not resolve. Telling.


That might have come from the SetCPU command, which "pranked" the original requester text through a patch. In the original text it says "corrected", the same word which "Disk doctor" uses when it prompts you to "Insert disk to be corrected and press RETURN". At least it's consistent ;)
 

olsen (Topic starter)

Quote from: kolla;830561
Are you really sure about this?

I ask because that is not at all how I experience it, for me it looks like the time of formatting, and not 'last altered', is used. I typically keep a loopy script update a .timestamp file regularly, and doing that alone does not work.
Not as sure as I should have been. I just looked over the original Amiga file system code and found the reason why a change to the file system contents need not affect the "last altered date" of the root directory.

This has something to do with how the file system keeps track of which blocks are still available for use, and which have already been allocated. That information is stored in what's called the "bitmap", in which a series of blocks account for all the allocated and unused blocks.

Because the bitmap may not be current, the validation process may have to be summoned to rebuild it. The trigger is a flag stored in the root directory block, which indicates whether or not the bitmap is valid. That flag is cleared when new blocks are allocated and the bitmap has not been written back to disk yet, and it is set again when the whole operation which eventually required the bitmap update has concluded.

This is where the "last altered date" field comes in: it is updated only if the bitmap contents change. This happens when the validator finishes its job, if you add a new directory entry, add data to a file or delete a directory entry. So, in a nutshell, changing the date stamp of a file is insufficient for moving the "last altered date" record ahead.
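As a rough illustration, the behaviour described above can be modelled in Python like this. It is a toy model and not the file system code: the class and method names are invented, and time stamps are plain numbers.

```python
class RootBlockModel:
    """Toy model of the two root block fields discussed above."""

    def __init__(self, now):
        self.bitmap_valid = True
        self.last_altered = now

    def _change_bitmap(self, now):
        # The flag is cleared while the bitmap on disk is out of date ...
        self.bitmap_valid = False
        # ... blocks would be allocated or released here ...
        # ... and it is set again once the operation has concluded. The root
        # block is rewritten at that point, which updates the date.
        self.bitmap_valid = True
        self.last_altered = now

    # Operations which allocate or release blocks change the bitmap.
    def create_entry(self, now):
        self._change_bitmap(now)

    def append_data(self, now):
        self._change_bitmap(now)

    def delete_entry(self, now):
        self._change_bitmap(now)

    def set_file_date(self, now):
        # Only the file header changes: no bitmap update, so the root
        # block and its "last altered date" are left alone.
        pass


if __name__ == "__main__":
    root = RootBlockModel(now=100)
    root.set_file_date(now=200)
    print(root.last_altered)   # still the old date
    root.create_entry(now=300)
    print(root.last_altered)   # moved ahead
```

In this model only the operations which touch the bitmap move the date ahead, which matches the observation that updating a file's time stamp alone is not enough.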

I have not looked at the V34-V40 FFS implementation (funny that I can make sense of a pageful of BCPL code more quickly than a dozen pages of 68k assembly), though. My best guess is that it reproduces the behaviour of the original Amiga file system.

Quote

Right, and of course.

Turned out my problem was not FFS at all, it was really a partition with SFS/02, that didn't show up in OS3.9 - not FFS with long filenames. I had forgotten to add SFS/02 to RDB. This is my A3000 that multiboots between OS3.9, OS4.1 and Morphos 1.4. I am pondering on moving back to FFS as I keep getting random errors from SFS about not being able to read blocks (750GB ATA drive with ACard IDE-UW SCSI). It's sadly not so easy to find a filesystem that one can rely on for all the different "flavours". Also, I hope Amiga filesystems will improve so that they transparently can deal with bad-blocks, bit-flips etc, as all such errors become statistically impossible to avoid on large drives.


You would not want to go back to the all-checksum OFS blocks, I suppose ;) This is a file system for desktop computer systems after all, which have little CPU time and memory to burn. The "boring" solution to the problem of data corruption is still to keep multiple sets of backups :(

As for the storage devices involved, they ought to report decoding/consistency errors as read errors, as floppy disks used to do. Not sure if this is still an option today, though, when everybody can just throw CPU power and memory at the problem on the file system level.
 

olsen (Topic starter)

Quote from: kolla;830600
:)

Code:
setdate dh0:
It should just work ;)

It should, but it probably won't :(

SetDate, Protect and Filenote are the three commands which are not supported for the root directory (which does not have file protection bits, nor a comment field).

All three operations run through the same common function. In the original Amiga default file system, as well as the FFS versions 34 through 45, all three are rejected with the error "object is the wrong type".

Apparently, the only way to advance the "last altered date" field of the root directory is to make a change to the file system which has the effect of allocating/releasing storage space. This should do it:

Code:
Echo > dh0:change_date
Delete dh0:change_date
 

olsen (Topic starter)

Quote from: mark_k;830682
ACTION_SERIALIZE_DISK sets both creation and last-modified dates in the root block. It's possible that due to it doing TR_GETSYSTIME once for creation and once for last-modified fields, that the last-modified time is one or more ticks later than creation.

ACTION_RENAME_DISK changes the volume last-modified date (at "size-10") only.

What changes the root datestamp at "size-23"?


Any change in the bitmap will do that. The file system rewrites the root directory as a by-product of marking the bitmap as "valid" again after a write access (added data to a file, truncated a file, created a directory entry, deleted a directory entry) has made changes to it. Before the root directory is written back to disk, the "last altered date" field (at offset size-23) is updated, too.

So the root directory actually contains three distinct date stamps: 1) size-7 is the date and time of the disk initialization ("serialization"), 2) size-10 is the date and time of the last volume update (which only advances in time and never goes back) and 3) size-23 is the time and date of the last change made to the root directory block.
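For anyone who wants to look at these three fields in a disk image, here is a minimal Python sketch. It assumes that "size" is the block size counted in 32 bit longwords (128 for a 512 byte block), that each date stamp consists of three big-endian longwords (days since January 1st 1978, minutes since midnight, ticks of 1/50 second), and that you have already read the root block into memory; the function names and dictionary keys are made up.

```python
import struct
from datetime import datetime, timedelta

AMIGA_EPOCH = datetime(1978, 1, 1)
TICKS_PER_SECOND = 50


def read_datestamp(block, longword_index):
    """Decode the date stamp which starts at the given longword of the block."""
    days, minutes, ticks = struct.unpack_from(">3i", block, longword_index * 4)
    return AMIGA_EPOCH + timedelta(days=days, minutes=minutes,
                                   seconds=ticks / TICKS_PER_SECOND)


def root_block_dates(block):
    """Return the three date stamps stored in a root directory block."""
    size = len(block) // 4  # block size in longwords (assumption, see above)
    return {
        "initialized": read_datestamp(block, size - 7),
        "volume_last_updated": read_datestamp(block, size - 10),
        "root_last_changed": read_datestamp(block, size - 23),
    }


if __name__ == "__main__":
    # Fabricated example block, not taken from a real disk.
    block = bytearray(512)
    struct.pack_into(">3i", block, (128 - 7) * 4, 1, 2, 150)
    for name, stamp in root_block_dates(bytes(block)).items():
        print(name, stamp)
```

For a 512 byte block this puts the three date stamps at byte offsets 484, 472 and 420, respectively.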