Remember our OS blitting routine argument? You just stated the reason for writing one's own blit routine: A one size fits all routine isn't always the best solution.
Not exactly. The questions are: what solution are you looking for, what can the OS do for you, is a patch worth doing, and is bypassing the OS worth it? Each decision has advantages and drawbacks.
In case of doubt: avoid a patch, especially if the average savings are negligible. In case of doubt: use the OS for the job, unless you get substantial savings by doing otherwise.
What happens in the average program? If you don't care much, you probably pick memcpy() from the standard library, or CopyMemQuick(). The former may or may not go through the OS; more likely it is inlined by the compiler. If it matters a lot, you probably have your own routine.
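To make the "default first, own routine only if it matters" point concrete, here is a minimal sketch in portable C. The names copy_fn, copy_block and my_copy are hypothetical and only serve the illustration; my_copy is a deliberately naive byte loop standing in for whatever tuned routine you would actually write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook type with the same shape as memcpy(). */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Default choice: the standard library routine. Calls made through
 * this pointer are real calls; a direct memcpy() call is what the
 * compiler may inline instead. */
static copy_fn copy_block = memcpy;

/* Hypothetical hand-written replacement. This naive byte loop is only
 * a placeholder for a routine tuned to one specific case. */
static void *my_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));
    while (n--)
        *d++ = *s++;
    return dst;
}
```

A program that really needs the special routine sets copy_block = my_copy; once at startup, and everything else keeps the default. CopyMemQuick() could not be assigned directly, since it takes the source first and expects longword-aligned pointers and a size that is a multiple of four, so it would need a small wrapper.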
For the blitter, however, not using the OS carries a substantial disadvantage: if you implement a graphics primitive yourself, it might simply not work on an RTG system. Is it worth not using the OS? Typically not, because you "shoot yourself in the foot".
Thus, the two cases, "patch" versus "program" and "copy mem" versus "blitter", are not quite as symmetric as you may want to present them.