The superscalar 68060 averages better than 1 instruction per cycle. A good assembly programmer should be able to average about 2 instructions per cycle in some code. This means that the 68060 is able to decode variable-length instructions in parallel.
The 68060 can dispatch two instructions at the same time, but I don't think it decodes them at the same time.
"The superscalar micro-architecture actually consists of two distinct
parts: a four-stage instruction fetch pipeline (IFP) responsible for
accessing the instruction stream and dual four-stage operand execution
pipelines (OEPs) which perform the actual instruction execution. These
pipeline structures operate in an independent manner with a FIFO instruction
buffer providing the decoupling mechanism."
I don't believe it can sustain 2 instructions per cycle for long before the fetch pipeline runs dry, and that is only if you can find worthwhile work to do in instructions that can run in parallel.
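To make the "runs dry" argument concrete, here is a rough toy model in Python. It is only a sketch under stated assumptions, not a cycle-accurate 68060 simulation: the fetch width of 4 bytes per cycle, the 32-byte FIFO size, and the function name simulate_ipc are all assumed for illustration, and it optimistically pretends every adjacent pair of instructions can be issued together.

```python
def simulate_ipc(instr_bytes, fetch_bytes_per_cycle=4, fifo_bytes=32, cycles=1000):
    """Toy decoupled fetch/execute model (assumed parameters, not real 68060 data).

    The fetch side adds a fixed number of bytes to a FIFO each cycle.
    The execute side removes up to two whole instructions per cycle.
    Returns the average number of instructions issued per cycle.
    """
    buffered = fifo_bytes  # start with a full buffer, the best case
    issued = 0
    for _ in range(cycles):
        # Execute side: issue at most two instructions that are fully buffered.
        n = min(2, buffered // instr_bytes)
        buffered -= n * instr_bytes
        issued += n
        # Fetch side: refill, capped at the FIFO capacity.
        buffered = min(fifo_bytes, buffered + fetch_bytes_per_cycle)
    return issued / cycles


if __name__ == "__main__":
    for size in (2, 4, 6):
        print(f"{size}-byte instructions: {simulate_ipc(size):.3f} per cycle")
```

In this model, only a stream of 2-byte instructions keeps both pipes fed indefinitely. With 4-byte instructions the full buffer lets it issue pairs for a few cycles, then it drops back to whatever the fetch side can deliver. Real code also has pairing restrictions and branches, which this sketch ignores.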
The ARM CPU on the FPGA Replay is most likely busy serving the FPGA with disc emulation and doesn't have the code space to do much else. The transfer capacity to the FPGA may also be a serious bottleneck.
Even so, the ARM is an SoC. If you want to use it for emulation then you'd need to be able to configure its memory map. Maybe you could do it with MMU tricks, but it's not really in the spirit of the FPGA Arcade.