Hi Thomas and Olsen!
Thank you for your answers, very interesting!
...
I don't know of a good framework to test C code either, even less so for an OS, OS ROM and libraries... Could you explain in more detail what you mean by "It is not designed to be testable"? Naively, I would have thought that, being a set of libraries (in ROM or in software), there could be tests for each and every data structure and function provided by these libraries, but I'm certainly missing something here... in particular when it comes to dynamic stuff, like tasks and processes.
What is missing in the operating system in general are even the most humble unit tests upon which more complex tests could build.
From how I understand the rationale for implementing tests (as described in great detail in Michael Feathers' book "Working Effectively with Legacy Code"), the tests are there to make code changes more robust, if not to allow them to be made in the first place. Code which is more easily changed is also more likely to be changed for the better: reduced complexity, fewer side-effects, a better understanding of its design and of its relationships within the context it is a part of.
There is nothing there which one could build upon. You would have to start from scratch and construct the unit tests as well as build test harnesses. The Amiga operating system is comparatively small but it is by no means simple. Adding the scaffolding for building a testable operating system out of what we have is a major challenge.
The "modern" testing approach which grew out of the use of object oriented implementation languages was not available to the designers of the Amiga operating system. This is why testing began from the outside in, by testing how software of particular interest/import interacted with the operating system, and by analyzing the behaviour triggered by the changes which were introduced. This may well have been the industry standard in the 1980s and 1990s.
What methods I applied under such circumstances (only 'C' and assembly language used as implementation languages) came from the books I read on this matter: "Writing Solid Code" (Steve Maguire), "Code Complete" (Steve McConnell) and "Clean Code" (Bob Martin). Unit testing is part of this toolbox, but it's much easier to write new code which follows the principles described in the books than to make legacy code conform to them after the fact. The new Disk Doctor was written from the ground up, and this is why I could design it to be testable and apply the best practices I learned from these three books (as well as I was able).
Could you tell us more about the features that you exercised? Do you have a set of programs that you know are "tough" on the OS and "play" with them to test the OS?
As far as I know, "empirical testing" was used to as large a degree as possible. You install the new software, then try to use the programs you are familiar with. If necessary, unwelcome changes in behaviour are documented and go into the bug tracker. Then it's the old reproduce, analyze, fix, retest loop. There was no shortage of known operating system bugs which went through the same loop without having to start out as a bug tracker ticket. In short, the whole process used the methods available at the time when the Amiga operating system was designed and still being maintained, in the years 1985-1994.
With the new Disk Doctor things were a bit different, and only because I had the luxury of starting from scratch. The central data structure which the new Disk Doctor uses to keep track of what type of information it finds in a block on the medium had to be both very memory-efficient and still fast enough to work on a plain 68000 machine. To this end I designed and implemented a sparse bit array, with unit tests and a test harness. This was tested separately before I integrated it into the Disk Doctor code. Testing the new Disk Doctor involved both old school "empirical testing" and building a test set of some 450 ADF images which were then fed into the new Disk Doctor through a script file. This test exercised the entire program, including its in-memory database, the memory management, and the diagnostic as well as the data recovery functions.
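For readers curious what a "sparse bit array" might look like, here is one common way to build such a structure in plain 'C'. This is purely an illustrative sketch, not the actual Disk Doctor implementation: bits are grouped into fixed-size chunks which are only allocated when a bit inside them is first set, so a mostly-empty array over a large medium costs little more than a table of NULL pointers.

```c
#include <stdlib.h>

/* Toy sparse bit array: the bit space is divided into chunks which
 * are allocated lazily. Bits in a never-touched chunk read as 0
 * without that chunk ever consuming memory. */

#define CHUNK_BITS 1024
#define CHUNK_BYTES (CHUNK_BITS / 8)

struct sparse_bits {
    size_t num_chunks;       /* capacity, in chunks */
    unsigned char **chunks;  /* each entry NULL until first use */
};

static struct sparse_bits *sparse_bits_create(size_t num_bits)
{
    struct sparse_bits *sb = calloc(1, sizeof(*sb));

    if (sb == NULL)
        return NULL;

    sb->num_chunks = (num_bits + CHUNK_BITS - 1) / CHUNK_BITS;
    sb->chunks = calloc(sb->num_chunks, sizeof(*sb->chunks));

    if (sb->chunks == NULL) {
        free(sb);
        return NULL;
    }

    return sb;
}

/* Returns 0 on success, -1 if a chunk allocation failed. */
static int sparse_bits_set(struct sparse_bits *sb, size_t bit)
{
    size_t chunk = bit / CHUNK_BITS, offset = bit % CHUNK_BITS;

    if (sb->chunks[chunk] == NULL) {
        sb->chunks[chunk] = calloc(1, CHUNK_BYTES);

        if (sb->chunks[chunk] == NULL)
            return -1;
    }

    sb->chunks[chunk][offset / 8] |= (unsigned char)(1u << (offset % 8));

    return 0;
}

static int sparse_bits_test(const struct sparse_bits *sb, size_t bit)
{
    size_t chunk = bit / CHUNK_BITS, offset = bit % CHUNK_BITS;

    if (sb->chunks[chunk] == NULL)
        return 0;  /* chunk never allocated: every bit in it is 0 */

    return (sb->chunks[chunk][offset / 8] >> (offset % 8)) & 1;
}

static void sparse_bits_delete(struct sparse_bits *sb)
{
    size_t i;

    for (i = 0; i < sb->num_chunks; i++)
        free(sb->chunks[i]);

    free(sb->chunks);
    free(sb);
}
```

A structure like this can be unit-tested entirely on its own, exactly because it does not depend on the rest of the program: create an array, set and test a handful of bits near chunk boundaries, verify that untouched regions still read as zero, then tear it down. The real implementation would make different trade-offs (chunk size tuned to the medium, operating system allocation functions instead of calloc(), and so on), which is precisely the sort of thing the separate test harness lets you change safely.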