subq.l #1, d1
move16 (a1)+,(a0)+
bgt.b .loop
I don't have the 68k docs to hand at the moment, so I can't check this out.
But does move16 change any condition codes the way a normal move.l (a1)+,(a0)+ would? If it does, the bgt would be testing the result of the move rather than the subq.
Alternatively, you could change the subq/bgt section to:
move16 (a1)+,(a0)+
dbne d1,.loop
I don't know whether this is slower, though.
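For what it's worth, here is a rough sketch of how a dbcc-style loop might look. It rests on a few assumptions that are mine and not taken from the code above: d1 holds the number of 16-byte lines to copy and is between 1 and 65535, a0 and a1 are 16-byte aligned, and the copy_lines label is made up for illustration. I've used dbf (which many assemblers also accept as dbra) instead of dbne, because dbne drops out of the loop as soon as the Z flag is clear, and as far as I remember move16 leaves the condition codes alone, so dbne would be testing whatever flags happened to be set before the loop.

```asm
; Sketch only. Assumptions: 68040/68060, d1.w = number of 16-byte lines (1..65535),
; a0 = destination and a1 = source, both 16-byte aligned.
copy_lines:                     ; hypothetical label, for illustration only
        subq.w  #1,d1           ; dbcc runs count+1 times, so pre-decrement the counter
.loop:
        move16  (a1)+,(a0)+     ; copy one 16-byte line (condition codes should be untouched)
        dbf     d1,.loop        ; decrement d1.w and loop until it reaches -1
```

Unlike the subq.l/bgt version, this only uses the low word of d1, so larger counts would need an outer loop.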
Another thing that pops into my head (yeah, I know, a lot of free space in there at the moment!): are you running a replacement scheduler on the system that shows these problems?
Anyway, I'm sure Piru will be along any moment to tell me I'm wrong and give you the right answer :-P