I hope you mean move.l #$12345678,D0. But look at Psxphill's example. Supposing you do
move.l (A0),D0
move.b (A0),D1
which byte of (A0) ends up in D1, given that (A0) is now cached?
Since it is A0 itself that is byte-swapped, the correct value for the lower byte will be returned.
Let A0 point to an address that contains $12345678. The FPGA swaps it to $78563412. The second line then reads $78 instead of $12!
No, because the BYTE value of $78563412 is $12, just as it would be with a software-only opcode interpreter.
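To make the two positions above concrete, here is a minimal sketch in Python rather than 68K assembly. It assumes the guest long $12345678 sits in memory in 68K (big-endian) byte order and that the host is little-endian, as an x86 would be. The names memory, read_long_be, read_long_host_le and read_byte are hypothetical helpers for illustration, not part of any emulator discussed here.

```python
import struct

# Hypothetical guest memory: the 68K long $12345678 stored big-endian at offset 0.
memory = bytes([0x12, 0x34, 0x56, 0x78])

def read_long_be(mem, addr):
    """What move.l (A0),D0 should load: four bytes, most significant first."""
    return struct.unpack_from(">I", mem, addr)[0]

def read_long_host_le(mem, addr):
    """What a little-endian host gets if it loads the same four bytes natively."""
    return struct.unpack_from("<I", mem, addr)[0]

def read_byte(mem, addr):
    """What move.b (A0),D1 should load: the single byte at that address."""
    return mem[addr]

d0 = read_long_be(memory, 0)        # 68K view of the long
raw = read_long_host_le(memory, 0)  # unswapped little-endian host view
d1 = read_byte(memory, 0)           # byte at the same address

# Assumption for illustration: a cache that keeps the swapped long in host
# byte order. Taking the byte at offset 0 from that copy gives the wrong value.
swapped_copy = struct.pack("<I", d0)
stale_byte = swapped_copy[0]
```

Under these assumptions the byte access comes out right as long as it goes to the original guest bytes, or equivalently takes the low byte of the unswapped host value. It only goes wrong if the byte is taken at the same offset from a copy that has already been swapped.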
Again, I think the FPGA would end up slowing down the process when you have a fast x86. I am not sure where the break-even point would be; that would depend on how fast you could clock the FPGA.
Endianness is really not that big a deal when doing 68K emulation. It's just an extra step.
I guess you guys have to decide if you want 100% compatibility or not. If you do, you absolutely must have cycle-exact emulation. There are quite a few programs that require it. You could also deliberately set emulation thresholds. For example, you could set the speed/emulation type to be Amiga 500 (68000), Amiga 3000 (16MHz or 25MHz 030), Amiga 4000 (25MHz 68040), etc. This way you could run those Euro demos in Amiga 500 mode that won't work on anything else.
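One way the selectable thresholds could look, as a rough sketch only. EmulationProfile, PROFILES and select_profile are hypothetical names. The cycle_exact flags are my assumption about which modes would need exact timing. The Amiga 500 clock of about 7.09 MHz is the PAL figure and is not something stated above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmulationProfile:
    name: str
    cpu: str
    clock_mhz: float
    cycle_exact: bool  # assumption: only the 68000 mode is run cycle-exact

# Profiles taken from the machines listed above; keys are made up for this sketch.
PROFILES = {
    "a500": EmulationProfile("Amiga 500", "68000", 7.09, True),
    "a3000-16": EmulationProfile("Amiga 3000", "68030", 16.0, False),
    "a3000-25": EmulationProfile("Amiga 3000", "68030", 25.0, False),
    "a4000": EmulationProfile("Amiga 4000", "68040", 25.0, False),
}

def select_profile(key):
    """Return the profile for a key, or raise ValueError listing the valid keys."""
    try:
        return PROFILES[key]
    except KeyError:
        raise ValueError(
            f"unknown profile {key!r}; choose from {sorted(PROFILES)}"
        ) from None
```

A picky demo would then be started with select_profile("a500"), while ordinary software could use one of the faster profiles where exact timing matters less.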