The gap will be narrower. As I said, the routing delays become similar to the logic delay. I'll run some tests next week.
America and Europe are getting closer to each other each year by continental drift...

That in itself doesn't mean much. Let us hear about your test results; any progress is good. But right now it doesn't sound like a convincing approach to have a more expensive product with less processing power than your competition and then try to solve this by putting in a more expensive component.
Yup, we have an HD-capable graphics card with a dedicated blitter, and ~060 performance (sans MMU and FPU) already.
That must be a very wide interpretation of "approximately". You have a 28 MHz clock; the 060 has at least 50 MHz. The 060 does simple instructions in one clock cycle and so do you. But the 060 can do a second simple instruction in the same clock cycle while you can't. The 060 has very good branch prediction and fast branches; you have none. The 060 has 32-bit-wide buses; you still need to remove the 16-bit limitation. The 060 has fast caches; you still need to add those.
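To put a rough number on the clock and issue-width part of that, here is a back-of-the-envelope sketch. It only uses the figures above (28 MHz single-issue versus 50 MHz with two simple instructions per cycle) and treats them as best-case peaks. The function name `peak_mips` is made up for illustration, and the model deliberately ignores caches, branch behaviour and bus width, which all affect real-world results.

```python
def peak_mips(clock_mhz: float, issue_width: int) -> float:
    """Best-case millions of simple instructions per second.

    Assumes every cycle issues `issue_width` one-cycle instructions,
    with no stalls from memory, branches or bus width.
    """
    return clock_mhz * issue_width


# Figures taken from the post above; everything else is an assumption.
fpga_core = peak_mips(28, 1)   # 28 MHz, one simple instruction per cycle
mc68060 = peak_mips(50, 2)     # 50 MHz, two simple instructions per cycle
ratio = mc68060 / fpga_core

print(f"FPGA core peak: {fpga_core:.0f} M simple instr/s")
print(f"68060 peak:     {mc68060:.0f} M simple instr/s")
print(f"Peak ratio:     {ratio:.1f}x")
```

This is only an upper bound on simple-instruction throughput for each side, not a benchmark; the 060 will not dual-issue on every cycle, and real code depends heavily on the caches and buses mentioned above.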
I estimate that you are currently in the 030 range of performance. If you manage to do the caches, the wider buses and improve the clock rate, you'll be entering 040ish performance. Still a long way to go as the apollo core is twice as fast as an 060 right now and still improving. Let's hear your adoom fps, Riva playback, mp3 decoding and how they are improving while you are making the core faster. For us the adoom fps is a standard test because it is more interesting than just some sysinfo MIPS.
BTW, does the fact that you didn't say anything about the offer to try to maintain compatibility between the apollo core and the fpgaarcade's implementation of AGA and 68k mean that you are going to consider it?