1. The reason is that code is easier to maintain when arguments are passed consistently. Also, C typically uses d0 onwards for numeric arguments and a0 onwards for pointers, and the same is "natural" for asm too.
Also, d0/d1/a0/a1 are scratch registers, so they're "free" for passing arguments (that is, the main app can't rely on them, since these registers are trashed by the OS call / subroutine anyway).
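As a rough illustration of that convention, here is a minimal sketch that opens and closes dos.library through exec. The pointer argument goes in a1 and the numeric one in d0. The LVO offsets are typed in by hand only to keep the sketch self-contained (normally they come from the include files), and the label names and local-label syntax are my own choices, which may need adjusting for your assembler.

```asm
; Sketch only: labels are hypothetical, offsets normally come from the includes.
_LVOOpenLibrary  equ -552
_LVOCloseLibrary equ -414

start:
        move.l  4.w,a6              ; a6 = ExecBase
        lea     dosname(pc),a1      ; pointer argument: library name in a1
        moveq   #0,d0               ; numeric argument: minimum version in d0
        jsr     _LVOOpenLibrary(a6) ; returns library base in d0, 0 on failure
        tst.l   d0
        beq.s   .fail
        move.l  d0,a1               ; a1 was trashed by the call, so reload it
        jsr     _LVOCloseLibrary(a6)
        moveq   #0,d0               ; RETURN_OK
        rts
.fail:
        moveq   #20,d0              ; RETURN_FAIL
        rts

dosname: dc.b   'dos.library',0
        even
```

Note that nothing in d1/a0/a1 is assumed to survive the OpenLibrary call; only the result in d0 is used afterwards.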
2. CLI programs must return the ReturnCode in d0. The result is the success/warn/failure status of the program. 0 means success (RETURN_OK), 5 means warning (RETURN_WARN), 10 means error (RETURN_ERROR), 20 means failure (RETURN_FAIL). dos/dos.i defines these. Workbench ignores the value returned in d0.
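A minimal sketch of how that might look in practice. The constants are repeated here with equ only so the fragment stands on its own (with the includes you would take them from dos/dos.i), and do_work is a hypothetical routine invented for the example.

```asm
; Values as listed above; normally taken from dos/dos.i.
RETURN_OK    equ 0
RETURN_WARN  equ 5
RETURN_ERROR equ 10
RETURN_FAIL  equ 20

start:
        bsr.s   do_work             ; hypothetical: returns d0 = 0 on success
        tst.l   d0
        bne.s   .failed
        moveq   #RETURN_OK,d0       ; the shell sees this as the return code
        rts
.failed:
        moveq   #RETURN_ERROR,d0
        rts

do_work:
        moveq   #0,d0               ; placeholder body, always "succeeds"
        rts
```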
3. Register preservation is not needed, except in your own subroutines (so that you don't trash the registers of the main program). How and which registers you save is up to you. OS calls trash d0/d1/a0/a1, unless otherwise stated in the function's AutoDoc.
At program exit a7 (sp) must be the same value as when the program was entered. For CLI programs d0 must be the ReturnCode as explained above.
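One common way to do this in your own subroutines is movem.l, sketched below. The routine sum_longs and its register usage are made up for the example: it takes a pointer in a0 and a count in d0.w, uses d2/a2 as work registers, and therefore saves and restores exactly those two.

```asm
; Hypothetical subroutine: sums d0.w longwords starting at a0, result in d0.
sum_longs:
        movem.l d2/a2,-(sp)         ; save the non-scratch registers we use
        move.l  a0,a2
        moveq   #0,d2
        bra.s   .next               ; enter at the test so a count of 0 works
.loop:
        add.l   (a2)+,d2
.next:
        dbra    d0,.loop
        move.l  d2,d0               ; return value in d0
        movem.l (sp)+,d2/a2         ; restore; sp is back to its entry value
        rts
```

Because every push is matched by a pop before the rts, the stack pointer leaves the routine unchanged, which is the same rule the whole program has to obey at exit.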
Doesn't the OS take care of this automatically when switching between tasks?
Yes, it does. Task switching saves all registers automagically. What you need to do is save registers in your own subroutines so that you don't trash the registers of the main app.