I thought you guys might be interested to know about some research I've been doing lately.
As some of you may know, I've built an external USB-attached Amiga floppy drive controller that can create .ADF files from Amiga floppies using regular PC floppy drives.
There are some detailed links, etc., in this forum thread:
Amiga Floppy Project Status thread

There are a bunch of problems with reading 20-year-old floppies, and they generally fall under the heading of "bitrot." Amiga disks can go bad in a bunch of different ways, which is also something I'm looking into.
In any event, I have created a method of error correction that doesn't just DETECT where errors occur, it actually fixes them. When an Amiga writes data to a disk, it encodes that data using MFM. MFM has a very particular structure: the embedded clock bits are computed from the data bits next to them. If you are missing a bit here or there, you can actually reconstruct that bit as long as the surrounding bits are still intact.
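To make the clock-bit idea concrete, here is a minimal sketch in Python. It is not my actual client code. It assumes the standard MFM rule (a clock bit is 1 only when the data bits on both sides of it are 0), and the function names `mfm_encode` and `repair_clock_bits` are made up for illustration.

```python
def mfm_encode(data_bits, prev_bit=0):
    """Interleave clock and data bits; a clock bit is 1 only between two 0 data bits."""
    raw = []
    for bit in data_bits:
        clock = 1 if (prev_bit == 0 and bit == 0) else 0
        raw.extend([clock, bit])
        prev_bit = bit
    return raw


def repair_clock_bits(raw, prev_bit=0):
    """Recompute every clock bit from its neighbouring data bits (assumed intact)."""
    fixed = list(raw)
    for i in range(0, len(raw), 2):
        data = raw[i + 1]
        fixed[i] = 1 if (prev_bit == 0 and data == 0) else 0
        prev_bit = data
    return fixed


data = [1, 0, 0, 1, 0, 0, 0, 1]
raw = mfm_encode(data)

damaged = list(raw)
damaged[4] ^= 1  # corrupt the clock bit sitting between data bits 1 and 2

print(repair_clock_bits(damaged) == raw)  # the damaged clock bit is restored
```

The same relationship works in the other direction for a damaged data bit: if either clock bit next to it is 1, the data bit has to be 0.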
Another interesting thing about Amiga MFM is that, because of the structure, only certain raw MFM byte values are possible. There are actually only 32 values (out of the possible 256) that are legal raw MFM bytes. As a result, I can actually tell WHICH bytes in a particular sector are bad. Also, just like the English language, where not every two-letter combination is valid (do you ever see ZX together, or QM together?), MFM has legal and illegal digraphs: not every pair of legal bytes can appear side by side. A full 1/3 of the pairs are illegal. I can also use this to determine exactly where an error is occurring.
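If you want to check those counts yourself, here is a rough sketch. It assumes that "legal" means the usual MFM run-length rule: never two 1 bits in a row, and never more than three 0 bits in a row. The helper name `is_legal` is just for illustration.

```python
def is_legal(value, width):
    """True if the bit pattern has no adjacent 1s and no run of four or more 0s."""
    bits = format(value, f"0{width}b")
    return "11" not in bits and "0000" not in bits


# Single raw bytes that satisfy the rule.
legal_bytes = [b for b in range(256) if is_legal(b, 8)]

# Digraphs: pairs of legal bytes that are still legal across the byte boundary.
legal_pairs = [
    (a, b)
    for a in legal_bytes
    for b in legal_bytes
    if is_legal((a << 8) | b, 16)
]

total_pairs = len(legal_bytes) ** 2
print(len(legal_bytes))                # 32
print(len(legal_pairs), total_pairs)   # 683 1024
```

Under that assumption, 341 of the 1024 possible pairs of legal bytes are illegal, which is where the "1/3" figure comes from.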
Last, but not least, I use a best-effort guessing scheme rather than just brute-forcing an answer to correct each byte. I start off assuming there are 1-bit errors, then move to 2-bit errors, and then to 3-bit errors. This is per byte. And remember, the search space is much smaller than you think, because only legal MFM bytes need to be considered!
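Here is a rough sketch of that search order, not my actual code. It assumes the same legality rule as above (no adjacent 1s, no run of four 0s), and the names `is_legal_byte` and `candidates` are hypothetical. A real tool would also need some way to confirm a guess, for example by re-checking the sector checksum, which is not shown here.

```python
from itertools import combinations


def is_legal_byte(value):
    """Assumed rule: no adjacent 1s and no run of four or more 0s."""
    bits = format(value, "08b")
    return "11" not in bits and "0000" not in bits


def candidates(bad_byte, max_flips=3):
    """Yield (flips, byte) for legal bytes, trying 1-bit fixes first, then 2, then 3."""
    for flips in range(1, max_flips + 1):
        for positions in combinations(range(8), flips):
            mask = sum(1 << p for p in positions)
            guess = bad_byte ^ mask
            if is_legal_byte(guess):
                yield flips, guess


bad = 0x4C  # 01001100 has two adjacent 1s, so it cannot be a raw MFM byte
one_bit_fixes = [guess for flips, guess in candidates(bad) if flips == 1]
print([hex(g) for g in one_bit_fixes])  # ['0x48', '0x44']
```

Of the eight possible single-bit flips of that byte, only two give a legal byte, so most guesses are thrown out before any expensive checking has to happen.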
The net result, if you don't care about all the technical details, is that THEORETICALLY I can recover something on the order of tens of bytes per sector, with each of those bytes having up to 3 bit errors. I haven't done the math, but the good news is that if you throw more CPU power at the problem, you can recover more data. There's a point of no return, though, and I haven't done enough research to know how valuable this is in practice.
Anyways, I thought this was a neat idea, and I've never seen these concepts presented before.
I have very rough initial code done, and while it works at a high level, I've still got to integrate the error correction into the existing client software that I wrote to read the disks.
Thanks for reading.
Keith