If you look at the macro vs. function code on, say, the 68060, the overhead of a function call through the LVO is at least 14 cycles for the JSR+JMP+RTS (ignoring cache misses, where inlined code like the macro has an advantage). For short functions like SetWriteMask(), where the code itself is only a couple of cycles, this is significant. Setting up for a function call also costs more than using the macro. The newer PPC processors are likely to have a link stack, which might cut the function overhead in half, but the macro is still significantly faster.
Excuse me, no, it's not. You seem to assume a program that does nothing but call SetWriteMask() in a tight loop, so that the program's speed is dominated by function call vs. macro.
However, that's certainly not the case here. If the program made such calls in a tight loop, the argument would be quite different, and I would agree that this is possibly something to consider in the worst case. However, we're talking about probably 20 cycles vs. probably 5 cycles somewhere in the middle of a large program, where the 20/5 difference matters only in a small percentage of the code that actually runs.
So, in other words, the whole argument is moot. Don't worry about problems you don't have in the first place. The dominant problem here is coding style, and that outweighs the handful of cycles nobody is able to measure or notice.
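To put rough numbers on the 20-vs-5-cycle point, here is a minimal back-of-the-envelope sketch in C. The function name fraction_saved() and every input value are assumptions made up for illustration; none of them are measurements from a real program.

```c
#include <assert.h>

/*
 * Estimate the fraction of total run time saved by replacing a
 * function call with a macro.
 *
 * calls        - how often the call site executes (assumed)
 * cycles_call  - cost per use via the function call (assumed, e.g. 20)
 * cycles_macro - cost per use via the macro (assumed, e.g. 5)
 * total_cycles - cycles spent by the whole program, measured with the
 *                function-call version (assumed)
 */
double fraction_saved(double calls, double cycles_call,
                      double cycles_macro, double total_cycles)
{
    assert(total_cycles > 0.0);
    return calls * (cycles_call - cycles_macro) / total_cycles;
}
```

Under these assumptions, 1000 calls inside a program that burns 50 million cycles gives fraction_saved(1000, 20, 5, 50e6) = 0.0003, i.e. 0.03% of the run time. Only in the degenerate case where the program is nothing but the call (1 million calls, 20 million cycles total) does the figure reach 0.75.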