@Kenny,
I guess the phrase we are looking for is 'Your Mileage May Vary'...
MOS's speed advantage is not just down to its 68K emulation - the fact that the OS calls (usually the most time-consuming part) are PPC native helps just as much. No doubt register allocation is a trivial operation on PPC too ;-)
Anyway, back to WinUAE. I've written code and tested it under pretty strict conditions. On my PC at least, 040 optimised code running in 040 emulation mode (all other UAE settings unchanged) is faster than 020/882 code in 040 emulation mode. This is unusual, unless the 040 emulation is written differently in some way.
The 040 optimised code running in 020/882 mode is no different from 020/882 optimised code in 020/882 mode.
Naturally any small set of tests is subject to random fluctuations, but the differences I've noticed on my PC are totally systematic over many repetitions.
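
For anyone who wants to check their own numbers the same way, here is a minimal sketch of one way to tell a systematic difference from random fluctuation. It is not the test code I used: the function name is_systematic, the cutoff of 3 standard errors, and the timings are all made up for illustration.

```python
import statistics

def is_systematic(times_a, times_b, threshold=3.0):
    """Return True when the gap between the two mean timings is larger
    than `threshold` standard errors of that gap.

    The threshold of 3 is an arbitrary assumption, not a fixed rule.
    """
    mean_a = statistics.mean(times_a)
    mean_b = statistics.mean(times_b)
    # Standard error of the difference between the two means.
    se = (statistics.variance(times_a) / len(times_a)
          + statistics.variance(times_b) / len(times_b)) ** 0.5
    if se == 0:
        # No spread at all: any difference in means is systematic.
        return mean_a != mean_b
    return abs(mean_a - mean_b) / se > threshold

# Made-up timings in seconds, for illustration only.
runs_040 = [10.1, 10.0, 10.2, 10.1, 10.0]
runs_020 = [11.0, 11.1, 10.9, 11.0, 11.2]
print(is_systematic(runs_040, runs_020))
```

The more repetitions you feed it, the smaller the standard error gets, so a small but consistent gap will eventually show up while pure noise will not.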