All over the place would be horribly inefficient for memory and cpu usage.
No, not really. Typical software has hotspots where most of the CPU time is spent. Optimize there: call the critical part through a function pointer, and set the function pointers to the optimized hotspot routines in the init functions.
There is no overhead worth mentioning, and it helps. For all practical purposes, it does not make sense to compile everything for all CPUs. Most code does not matter much timing-wise.
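To make the function pointer idea concrete, here is a minimal sketch in plain C. All names (sum_generic, sum_fast, init_hotspots, sum_bytes) are made up for illustration and are not taken from any Kickstart source. The "fast" routine is only an unrolled stand-in for what would really be a version built for the 68020 and up, and the CPU flag is assumed to come from whatever detection the init code already does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hotspot: sum the bytes of a buffer. */
typedef uint32_t (*sum_fn)(const uint8_t *buf, size_t len);

/* Baseline version, safe on every CPU. */
static uint32_t sum_generic(const uint8_t *buf, size_t len)
{
    uint32_t total = 0;
    for (size_t i = 0; i < len; i++)
        total += buf[i];
    return total;
}

/* Stand-in for a CPU-specific version (assumption: in real code this
 * would be compiled or hand-written for the better CPU). Here it is
 * merely unrolled by four so the sketch stays portable. */
static uint32_t sum_fast(const uint8_t *buf, size_t len)
{
    uint32_t total = 0;
    size_t i = 0;
    for (; i + 4 <= len; i += 4)
        total += (uint32_t)buf[i] + buf[i + 1] + buf[i + 2] + buf[i + 3];
    for (; i < len; i++)    /* leftover bytes */
        total += buf[i];
    return total;
}

/* The one pointer every caller goes through. */
static sum_fn sum_impl = NULL;

/* Called once at init time; the flag is assumed to come from the
 * program's own CPU detection. */
void init_hotspots(int cpu_is_020_or_better)
{
    sum_impl = cpu_is_020_or_better ? sum_fast : sum_generic;
}

/* Public entry point: one indirect call, no per-call CPU check. */
uint32_t sum_bytes(const uint8_t *buf, size_t len)
{
    assert(sum_impl != NULL);   /* init_hotspots() must have run */
    return sum_impl(buf, len);
}
```

A program would call init_hotspots() once at startup and use sum_bytes() everywhere afterwards. Only the hotspot exists in two versions; the rest of the code is built once for the lowest common CPU.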
Or Kickstart from an 020+ machine in an A500. I'd be interested in where it was.
ObtainPen() (which is not time-critical at all, so the optimization is useless) and GetColorMap(). This optimized code is compiled in for all machines with an AA chipset, none of which were delivered with a plain 68K. Thus, Kickroms of those machines will not work with a plain 68K.
Actually, the above optimizations are completely pointless (as is often the case). Where it does make sense, the Kickstart provides both a 68K and a 68020 version and switches between them dynamically (graphics/Text() does that).