So here are some hard facts. FYI, these timings were measured with DMandel, a fractal generator, quite deeply zoomed in. It runs a couple of multiplications and additions (and nothing else) in a very tight loop, probably around 10000 iterations per pixel, for a 1024x768 XGA screen.
68000 integer (fixed point) math requires 1:33 for the full picture.
68020 integer (fixed point) math: This is based on 64-bit integer operations (add, addx, but also 64x64 multiplication), running unpatched - 7:20.
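To make the "add, addx" remark concrete, here is a rough Python model of how 64-bit arithmetic is usually built from 32-bit registers on a 68020. It is only a sketch: the function names and the (hi, lo) pair representation are made up for illustration, and the post does not say how DMandel actually assembles its multiplication, so this shows only the basic 32-bit building blocks.

```python
MASK32 = 0xFFFFFFFF  # a 68k data register holds 32 bits

def add64(a_hi, a_lo, b_hi, b_lo):
    """Add two 64-bit values held as (hi, lo) 32-bit halves (hypothetical helper)."""
    lo = a_lo + b_lo                      # like add.l on the low halves
    carry = lo >> 32                      # carry out, i.e. the X flag
    hi = (a_hi + b_hi + carry) & MASK32   # like addx.l on the high halves
    return hi, lo & MASK32

def mul32x32(a, b):
    """Unsigned 32x32 -> 64-bit product as (hi, lo), like mulu.l into a register pair."""
    p = (a & MASK32) * (b & MASK32)
    return p >> 32, p & MASK32

# The carry ripples from the low half into the high half:
print(add64(0, 0xFFFFFFFF, 0, 1))   # (1, 0)
```

A wider multiplication would combine several such 32x32 partial products with add/addx chains.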
68882 FPU: This uses FPU instructions in a tight loop, based on floating point. Note that this does not require explicit scaling (as in the fixed point case) and hence the loop is tighter - 1:08. Surprisingly, the FPU is faster than the CPU here, at least on the 68060. On the 030, this is not the case.
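To illustrate what "explicit scaling" means, here is a minimal Python sketch of the same Mandelbrot iteration in fixed point and in floating point. This is not DMandel's code: the 28 fractional bits, the escape radius test and all names are assumptions chosen for the example.

```python
FRAC_BITS = 28            # assumed format: real value = integer / 2**28
ONE = 1 << FRAC_BITS

def to_fixed(x):
    """Convert a float to the assumed fixed-point format."""
    return int(round(x * ONE))

def iterate_fixed(cx, cy, max_iter):
    """Fixed point: every product must be shifted back down (the scaling step)."""
    x = y = 0
    four = 4 * ONE
    for n in range(max_iter):
        xx = (x * x) >> FRAC_BITS
        yy = (y * y) >> FRAC_BITS
        if xx + yy > four:
            return n
        y = ((x * y) >> (FRAC_BITS - 1)) + cy   # 2*x*y + cy, doubling folded into the shift
        x = xx - yy + cx
    return max_iter

def iterate_float(cx, cy, max_iter):
    """Floating point: the same recurrence, with no scaling steps at all."""
    x = y = 0.0
    for n in range(max_iter):
        xx = x * x
        yy = y * y
        if xx + yy > 4.0:
            return n
        y = 2.0 * x * y + cy
        x = xx - yy + cx
    return max_iter

# Both variants agree on a point inside the set (c = -1) ...
print(iterate_fixed(to_fixed(-1.0), 0, 50), iterate_float(-1.0, 0.0, 50))
# ... and on a point far outside it (c = 2 + 2i).
print(iterate_fixed(to_fixed(2.0), to_fixed(2.0), 50), iterate_float(2.0, 2.0, 50))
```

The shifts after each multiplication are the extra work the fixed-point loop carries; the floating-point loop leaves that to the hardware.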
FPU, but via the mathieeedoubbas.library. Essentially, this uses the same instructions as in the 68882 case, but has to go through the ieeedoubbas library interface, which causes some "register ping-pong", so the loop is not very streamlined - 3:35.
68020 math as above, 64-bit multiplications, but patched with MuRedox: 3:55. So the patch gives a speedup of a bit less than a factor of two (7:20 vs. 3:55), or, put the other way round, going through an instruction decoding phase approximately doubles the run time.
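For anyone who wants to check the ratios, here is a small Python sketch that converts the timings quoted above to seconds and compares them. The timings are the ones from this post; the dictionary labels and helper names are just made up for the example.

```python
def seconds(mmss):
    """Convert an 'm:ss' timing string to seconds."""
    m, s = mmss.split(":")
    return int(m) * 60 + int(s)

# Timings as quoted in this post (labels are illustrative).
timings = {
    "68000 fixed point": "1:33",
    "68020 64-bit, unpatched": "7:20",
    "68882 FPU": "1:08",
    "mathieeedoubbas.library": "3:35",
    "68020 64-bit, MuRedox": "3:55",
}

def ratio(slow, fast):
    """How many times longer 'slow' takes than 'fast'."""
    return seconds(timings[slow]) / seconds(timings[fast])

# MuRedox speedup: 440 s / 235 s, about 1.87
print(round(ratio("68020 64-bit, unpatched", "68020 64-bit, MuRedox"), 2))
# Cost of the library interface relative to inline FPU code: 215 s / 68 s, about 3.16
print(round(ratio("mathieeedoubbas.library", "68882 FPU"), 2))
```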
Note again, this is for executing these instructions in a tight loop (probably 20 instructions long), which is nothing you would typically find in most other applications.
I forgot to say: This is a 68060 @ 50 MHz.