I discovered that calling Forbid() / Permit() around the function call stops any glitching with any of the versions above.
This suggests that A7 (SP) is not being adjusted properly. Perhaps your src_stack_ptr ends up below a7? That can easily happen with buggy alignment code, for example: sub.l #x,a7 / move.l a7,d0 / and.w #-16,d0 / move.l d0,a1. Here only the copy in d0 is aligned down, so a1 can point below a7. Depending on the initial alignment of the stack, you would get no trashing or partial trashing. If this is the case, the src stack area gets trashed whenever the task is rescheduled, because the task's registers are saved on its stack.
Also, Forbid() would obviously hide the problem: no task scheduling happens, so no registers are pushed to the stack, so nothing is trashed.
The proper alignment would do something like this (this is just one way of implementing it):
move.l sp,d0
sub.l #16,d0 ; at least 16 bytes storage
and.w #-16,d0 ; ...aligned by 16
move.l sp,a2 ; save old sp
move.l d0,sp
move.l d0,a1
....
move.l a2,sp ; restore original sp
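
To make the difference concrete, here is the same address arithmetic modelled in C. It is only a sketch for checking the numbers, not Amiga code: the names frame_t, buggy_align and proper_align are made up for illustration, and addresses are treated as plain 32-bit integers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: where SP ends up and where the buffer pointer (a1) ends up. */
typedef struct {
    uint32_t sp;   /* value of a7 after the sequence */
    uint32_t buf;  /* value of a1 (start of the aligned buffer) */
} frame_t;

/* Models: sub.l #x,a7 / move.l a7,d0 / and.w #-16,d0 / move.l d0,a1
 * Only the copy is aligned down, so buf can end up below sp. */
static frame_t buggy_align(uint32_t sp, uint32_t x)
{
    frame_t f;
    f.sp  = sp - x;
    /* and.w #-16 only touches the low word, which is enough to clear the low 4 bits */
    f.buf = f.sp & ~(uint32_t)15;
    return f;
}

/* Models the sequence above: subtract 16, align down, then move SP itself there. */
static frame_t proper_align(uint32_t sp)
{
    frame_t f;
    f.buf = (sp - 16u) & ~(uint32_t)15;
    f.sp  = f.buf;                  /* move.l d0,sp */
    assert(f.buf >= f.sp);          /* buffer never starts below SP */
    assert(f.buf + 16u <= sp);      /* 16 bytes fit under the old SP */
    return f;
}
```

For example, with sp = $7FFFC and x = 32 the buggy version gives a7 = $7FFDC but a1 = $7FFD0, so the first 12 bytes of the buffer sit below SP and are fair game for a task switch. With sp = $80000 both values come out as $7FFE0 and nothing is trashed, which is why the glitch depends on the initial stack alignment.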