040 optimized apps run faster... (just a little)
Not at all. In fact, 040 emulation is slower than 020 emulation: the more complex the emulated machine, the slower the emulation. On real hardware, the speed gain of a 040 over a 020 comes mainly from the cache, and in WinUAE the cache is adjusted by the JIT slider, not by the processor type. Another improvement is the FPU, but AFAIK the built-in FPU of any processor has fewer instructions than the external FPU of the 68020+FPU setting. (As mentioned above, FPU emulation might slow down emulation compared to a processor without FPU.)
You need 040 emulation only if your program does not work on a 020. Most programs work fine on a 020.
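For what it's worth, here is roughly what that choice looks like in a WinUAE configuration file. Treat it as a sketch only: I am assuming the usual .uae key names (cpu_model, fpu_model, cachesize), and the cache value is just an example, so compare it with a config saved by your own WinUAE version before relying on it.

```
; assumed example: plain 68020 with an external FPU
cpu_model=68020
fpu_model=68881
; JIT cache size in KB (example value); 0 switches the JIT off
cachesize=8192
```

The point is that the speed comes from the cachesize line, not from picking a bigger number in cpu_model.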
The same is true for the 040 <-> 060 issue. This is the reason why there is no 060 emulation: every 060 program will work on a 040, too. The only differences are in hardware that is not really emulated (caches, pipelines, number of integer units etc.). Under emulation there would be no difference in speed.
Bye,
Thomas