>Of course, I could be talking out my ass

You are.

If the JIT is optimized for anything, it's for AMD and/or Pentium II/III CPUs.
There are a couple of workarounds (rather than optimizations) for the Pentium range of CPUs. They (or at least the PII/PIII core) have a nasty thing called a "RAT stall", which causes long delays in some circumstances, so when a Pentium is detected, some code is generated in a less obvious way that is faster in the face of RAT stalls.
The other special treatment is for the Pentium 4. It is the first (and only) x86 CPU which, for a certain couple of instructions, actually treats a few flags that are supposedly "undefined" after those instructions as, indeed, undefined. Previous CPUs simply left them unchanged, which made the instruction a good way to set just the ZERO flag. The code for P4-type processors is much uglier there.
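To make the "detect the CPU, then pick the workaround" idea concrete, here is a rough sketch in C. It is not the JIT's actual code: the struct, the flag names and both functions are made up for illustration, and mapping "PII/PIII" to Intel family 6 and "P4" to Intel family 15 is my assumption about how such a check could be keyed. Only the family decoding itself follows the documented layout of the CPUID leaf 1 EAX value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical quirk flags; the names are invented for this sketch. */
struct jit_quirks {
    bool avoid_rat_stall;       /* PII/PIII-style core: emit the stall-free sequences */
    bool undefined_flags_real;  /* P4-style core: "undefined" flags really are clobbered */
};

/* Decode the CPU family from the EAX value returned by CPUID leaf 1. */
static unsigned cpu_family(uint32_t cpuid1_eax)
{
    unsigned family = (cpuid1_eax >> 8) & 0xF;
    if (family == 0xF)                        /* extended family only counts when base is 15 */
        family += (cpuid1_eax >> 20) & 0xFF;
    return family;
}

/* Assumption: both workarounds apply to Intel parts only, keyed on family. */
static struct jit_quirks pick_quirks(const char *vendor, uint32_t cpuid1_eax)
{
    struct jit_quirks q = { false, false };
    assert(vendor != NULL);
    if (strcmp(vendor, "GenuineIntel") == 0) {
        unsigned family = cpu_family(cpuid1_eax);
        q.avoid_rat_stall = (family == 6);        /* assumed: PII/PIII core */
        q.undefined_flags_real = (family == 15);  /* assumed: Pentium 4 */
    }
    return q;
}
```

The vendor check matters because some AMD CPUs also report family 15, and they should not get the P4 code path.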
What I suspect is that, probably because of the OS difference, WinUAE manages to set up a 1:1 memory mapping on one machine (letting the JIT code simply access memory directly), whereas on the other it probably fails and has to handle every memory access through table lookups, wasting quite a few cycles each time.
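For anyone wondering why that matters so much, here is a simplified sketch in C of the two access paths. It is only an illustration, not WinUAE's real code: the 64 KiB bank size, the 24-bit address space, the `struct bank` layout and both function names are assumptions picked to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BANK_SHIFT 16     /* assumed: 64 KiB banks */
#define BANK_COUNT 256    /* assumed: 24-bit guest address space */

/* Hypothetical bank descriptor: host memory backing one guest bank. */
struct bank {
    uint8_t *base;
};

/* Direct path: guest memory is one contiguous host block, so the host
   address is just base + guest address. Compiled code can inline this. */
static uint8_t read_byte_direct(const uint8_t *host_base, uint32_t guest_addr)
{
    return host_base[guest_addr];
}

/* Table path: every access first finds the bank, then indexes into it.
   The extra shift, mask and load are paid on each memory access. */
static uint8_t read_byte_banked(const struct bank *banks, uint32_t guest_addr)
{
    const struct bank *b = &banks[(guest_addr >> BANK_SHIFT) & (BANK_COUNT - 1)];
    assert(b->base != NULL);  /* this sketch has no handler for unmapped banks */
    return b->base[guest_addr & ((1u << BANK_SHIFT) - 1)];
}
```

In the direct case the generated code needs a single load with a fixed offset. In the table case it needs the lookup first, and in a real emulator usually a call into a handler as well, which is where the lost cycles would go.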