What if the scope is narrowed to application-level tests, and in particular to the most commonly used calls? So things like checking that loading a sector from hard disk actually returns 512 bytes and not 511 bytes, or that opening a window with Intuition actually does so, etc.
I don't think automated tests for OS internals will be needed, though they wouldn't hurt. Not many Amiga CPUs do threads, nor does the OS implement privileges.
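
To make the sector example concrete, here is a rough sketch of what such an application-level check could look like. It is plain portable C, not real Amiga code: the `read_sector_fn` signature, the in-memory `fake_disk`, and both reader functions are made up for illustration. On a real system the reader would presumably wrap the device I/O call and report the byte count the device says it transferred.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Hypothetical signature: read one sector into buf and return the
   number of bytes actually read, or -1 on error. */
typedef long (*read_sector_fn)(unsigned long sector, void *buf);

/* Stand-in "disk" of 4 sectors so the sketch runs anywhere. */
static unsigned char fake_disk[4 * SECTOR_SIZE];

/* Well-behaved reader: copies a full sector or fails. */
static long fake_read_sector(unsigned long sector, void *buf)
{
    if (sector >= sizeof fake_disk / SECTOR_SIZE)
        return -1;
    memcpy(buf, fake_disk + sector * SECTOR_SIZE, SECTOR_SIZE);
    return SECTOR_SIZE;
}

/* Deliberately broken reader that reports one byte too few,
   to show that the test catches the 511-byte case. */
static long short_read_sector(unsigned long sector, void *buf)
{
    long n = fake_read_sector(sector, buf);
    return n > 0 ? n - 1 : n;
}

/* Returns 1 if the reader delivers exactly SECTOR_SIZE bytes, else 0. */
static int test_sector_read(read_sector_fn read_sector, unsigned long sector)
{
    unsigned char buf[SECTOR_SIZE];
    return read_sector(sector, buf) == SECTOR_SIZE;
}
```

Calling `test_sector_read(fake_read_sector, 0)` passes, while `test_sector_read(short_read_sector, 0)` fails, which is the sort of regression a test like this is meant to flag. A window-opening test could follow the same pattern: call the function, then check that the result is what the application expects.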