It is optimal if it is inlined. Actually, any decent compiler will do that automatically for you if the size of the memory block to be copied is known at compile time, including completely unrolling the copy if the size is small enough, and will probably fall back to a small inlined routine if it is not, avoiding the per-call overhead. At least this is what SAS/C does, and it's quite plausible that this is the best option.
I'm not so sure that "most" compilers will optimize memcpy() for static cases. Most compilers will optimize, to some extent, small copies implied by '=', which are practically all static sizes. The C memcpy() function is generally optimized for small to medium sized copies and doesn't check for small static sizes. I have never seen the SAS/C memcpy() function inlined (and it is poor for the 68040 and awful for the 68060). Vbcc will use assembler inlines by default for memcpy(), and the new unreleased version of vbcc has much improved 68000 and 68020 optimized assembler versions of memcpy(). Maybe GCC is smart enough to check for static cases when using memcpy(). Medium to large sized copies are best handled by exec/CopyMem() or exec/CopyMemQuick() if they are patched with CPU specific optimized versions. The new vbcc versions of memcpy() will likely outperform unpatched exec/CopyMem() and exec/CopyMemQuick() for all sizes on the 68040 and 68060. Using an '=' for copies is likely to be the fastest for tiny to small copy sizes with practically all compilers on all 68k processors, though.
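To make the '=' versus memcpy() comparison concrete, here is a minimal sketch in plain C. The struct Point3 and both function names are made up for illustration; nothing here is taken from SAS/C, vbcc or GCC. Both functions copy a block whose size is known at compile time, but only the assignment is guaranteed to be seen by the compiler as a static-size copy. Whether the memcpy() call gets inlined depends on the compiler and its options.

```c
#include <string.h>

/* Hypothetical record of three longs, used only for illustration. */
struct Point3 {
    long x, y, z;
};

/* Copy via '=': the size is always a compile-time constant, so the
   compiler is free to emit a few move instructions directly. */
void copy_assign(struct Point3 *dst, const struct Point3 *src)
{
    *dst = *src;
}

/* Copy via memcpy() with a constant size: some compilers inline this,
   others emit a real library call with its per-call overhead. */
void copy_memcpy(struct Point3 *dst, const struct Point3 *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

Comparing the assembler output of the two functions on a given compiler should show quickly whether it treats the static memcpy() case specially.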
I'm not clear which programs actually use CopyMemQuick() anyhow - probably the FFS does in case the "Mask" does not fit the target of the memory buffer (i.e. the handling device is "broken" and cannot reach the memory indicated as buffer by the caller). In any event, the impact CopyMem and friends have on the performance of your average AmigaOS installation should be quite irrelevant. It would be interesting to see if someone could just set a breakpoint there and measure how often it is used. I suspect "not so often".
CopyMemQuick() is rarely used by the AmigaOS or any other programs (Scout is one of the few programs that do). CopyMem() is used extensively by the AmigaOS as well as by many programs. The AmigaOS uses many tiny copies (<16 bytes) and it looks like some may be static. CopyMem() may have been used for tiny and small copies because the alignment is unknown and the source originally was for a 68000, where an unaligned access could crash. It's fastest for tiny sizes to inline unaligned copies on the 68020+. I made a Snoopy (like SnoopDOS) script for CopyMem() and CopyMemQuick() to determine how often, with which alignments, with which sizes and by which programs these functions are used. The Snoopy script is available in this archive:
http://aminet.net/util/boot/CopyMem.lha
The script records the uses to memory. Be prepared to break the script a few seconds after starting it, or you will run out of memory in less than a minute on the average Amiga. I wouldn't call that kind of use "not so often".
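For the tiny copies mentioned above, the idea of handling them inline instead of calling CopyMem() could look roughly like this. It is only a sketch under assumptions: the helper name and the threshold constant are hypothetical (the 16 simply mirrors the "<16 bytes" figure), and memcpy() stands in for exec/CopyMem() so the code compiles anywhere. The byte loop is safe for any alignment, including on a 68000; a 68020+ specific version could use unaligned longword moves instead.

```c
#include <stddef.h>
#include <string.h>

/* Assumed cut-off, mirroring the "<16 bytes" tiny copies seen in the OS. */
#define TINY_COPY_LIMIT 16

/* Hypothetical helper: copy tiny blocks with a local byte loop and hand
   anything larger to a general-purpose copy routine. */
void copy_small_or_call(void *dst, const void *src, size_t n)
{
    if (n < TINY_COPY_LIMIT) {
        unsigned char *d = dst;
        const unsigned char *s = src;

        /* Byte-wise copy: no alignment requirement on source or destination. */
        while (n--) {
            *d++ = *s++;
        }
    } else {
        /* Stand-in for exec/CopyMem(); a patched, CPU specific version
           would be the better choice for medium to large blocks. */
        memcpy(dst, src, n);
    }
}
```

In a real program this would be a macro or an inline function so that the tiny case really avoids the call overhead.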