I am not aware of any current MMU activity for the Phoenix/Apollo core... On the 68K there is a coprocessor ID reserved for the MMU, but in reality the MMU instructions already differ substantially between the 68020/68030 and the 68040/68060. Given the small number of programs that really depend on the hardware interface, creating a new interface is less of a problem here. The FPU is used far more often at the instruction level, so instruction-level compatibility matters much more there.
Partially. For the FPU, certainly. For the MMU this is not sufficient, because the MMU does more than execute instructions and update registers. If you wanted to emulate the MMU at the instruction level, you would also need to emulate its table walk.
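
To make the table-walk point concrete, here is a minimal sketch in C of what an emulation layer would have to do on every translation miss. The 7/7/6-bit index split with 4K pages follows the 68040 layout, but the descriptor format (a single `DESC_VALID` bit plus an aligned address) and all names (`phys_mem`, `read_desc`, `table_walk`) are simplifications made up for illustration, not the real 68040 descriptors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_VALID 0x1u   /* hypothetical "resident" flag, not the real UDT/PDT encoding */

/* Simulated physical memory, viewed as 32-bit words (illustrative only). */
typedef struct {
    const uint32_t *words;
    size_t          nwords;
} phys_mem;

/* Fetch one descriptor; addresses outside the simulated memory read as invalid. */
static uint32_t read_desc(const phys_mem *m, uint32_t byte_addr)
{
    size_t idx = byte_addr / 4u;
    return idx < m->nwords ? m->words[idx] : 0u;
}

/* Three-level walk: root table -> pointer table -> page table.
 * Returns true and stores the physical address, or false on a fault. */
bool table_walk(const phys_mem *m, uint32_t root, uint32_t logical, uint32_t *phys)
{
    assert(m != NULL && phys != NULL);

    uint32_t ri  = (logical >> 25) & 0x7Fu;  /* root index, 7 bits    */
    uint32_t pi  = (logical >> 18) & 0x7Fu;  /* pointer index, 7 bits */
    uint32_t pgi = (logical >> 12) & 0x3Fu;  /* page index, 6 bits    */
    uint32_t off = logical & 0xFFFu;         /* offset in a 4K page   */

    uint32_t d = read_desc(m, root + ri * 4u);
    if (!(d & DESC_VALID)) return false;

    d = read_desc(m, (d & ~0x1FFu) + pi * 4u);   /* 512-byte pointer table */
    if (!(d & DESC_VALID)) return false;

    d = read_desc(m, (d & ~0xFFu) + pgi * 4u);   /* 256-byte page table */
    if (!(d & DESC_VALID)) return false;

    *phys = (d & ~0xFFFu) | off;
    return true;
}
```

Even in this stripped-down form, each miss costs three dependent memory reads, and a real layer would also have to handle used/modified bits, write protection and indirect descriptors, which is why emulating only the instructions is not enough.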
The FPGA Arcade and Mist have small enough memory sizes to get away with a 68040/68060 MMU. If Gunnar is not interested in a 68040/68060 MMU compatibility layer for his Apollo core then there isn't much reason to create a standard unless other FPGA Amiga hardware is introduced with more memory. Maybe a better way to query what hardware is available would be useful though.
Actually, concerning the FPU: additional FPU registers are problematic, again because of the exec scheduler. I believe it would be wiser to make the second set of FPU registers, or a specialized vector FPU, available under an additional coprocessor ID. The standard scalar FPU would preserve the legacy interface and could be saved and restored by exec, so programs and tools for applications that only use the scalar FPU could remain unchanged. If you need more speed, you would engage the "vector FPU" under a new coprocessor ID, and an updated exec scheduler would save and restore its registers. The incompatibility would then only involve programs that actually use the new FPU, not all programs.
See above. I would advise against extending the scalar FPU. If you need more registers or vector instructions, enable an extended vector FPU. This would allow exec to keep using its legacy stack frame when the vector FPU is not used, and incompatibilities could be minimized.
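
A rough sketch of that scheduler behaviour in C, only to illustrate the idea: the legacy scalar frame is always saved, and the extra frame is saved only for tasks that have engaged the second unit. Everything here is hypothetical (`task_ctx`, `fpu_state`, `uses_vector`, the number and width of the vector registers); the real exec saves FPU state with FSAVE/FMOVEM on the task's stack, which this model does not reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SCALAR_REGS  8
#define VECTOR_REGS  8    /* assumption: size of the second register set */
#define FP_REG_BYTES 12   /* FMOVEM stores an extended precision register in 12 bytes */

typedef struct { unsigned char b[FP_REG_BYTES]; } fp_reg;

/* Stand-in for the hardware register files. */
typedef struct {
    fp_reg scalar[SCALAR_REGS];
    fp_reg vector[VECTOR_REGS];
} fpu_state;

/* Per-task save area: the scalar part is the legacy frame. */
typedef struct {
    bool   uses_vector;               /* set once the task engages the vector unit */
    fp_reg scalar_frame[SCALAR_REGS];
    fp_reg vector_frame[VECTOR_REGS];
} task_ctx;

/* Save the outgoing task's FPU state; returns the number of bytes written. */
size_t save_context(task_ctx *t, const fpu_state *hw)
{
    assert(t != NULL && hw != NULL);
    size_t bytes = sizeof t->scalar_frame;
    memcpy(t->scalar_frame, hw->scalar, sizeof t->scalar_frame);
    if (t->uses_vector) {             /* legacy tasks never pay for this part */
        memcpy(t->vector_frame, hw->vector, sizeof t->vector_frame);
        bytes += sizeof t->vector_frame;
    }
    return bytes;
}

/* Restore the incoming task's FPU state, mirroring save_context(). */
void restore_context(const task_ctx *t, fpu_state *hw)
{
    assert(t != NULL && hw != NULL);
    memcpy(hw->scalar, t->scalar_frame, sizeof t->scalar_frame);
    if (t->uses_vector)
        memcpy(hw->vector, t->vector_frame, sizeof t->vector_frame);
}
```

In this model a task that never touches the vector unit keeps the frame layout it always had, which is the compatibility argument made above.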
This is a good idea in theory but there are problems. A new SIMD/vector unit in an FPGA may only support integer operations because of the cost of even single precision floating point. There are several choices:
1) 8 register FPU with integer-only SIMD = slow fp performance
2) 8 register FPU and wait for a SIMD with single-precision fp = slow fp performance now
3) No FPU with SIMD supporting single-precision fp = poor fp compatibility
4) 16 register FPU with integer-only SIMD = average fp performance
5) 16 register FPU and wait for single-precision SIMD = average fp performance now
I thought 8 FPU registers were adequate when Gunnar wanted 16. I encoded it and found that it works out very well (unlike adding integer registers). The registers are orthogonal except for FMOVEM, but the upper 8 FPU registers can and should be scratch registers, IMO. The 68k FPU would be much more efficient with more scratch registers (saving and restoring extended precision FPU registers is very expensive). Making all 8 new FPU registers scratch registers would cut the number of FPU register saves and restores in half or more, and would be very efficient for passing function arguments in FPU registers that do not need to be preserved. It would also let fp instructions be interleaved, which could up to double performance when the result of one instruction can't be used in the next instruction. This could allow reasonable performance of wide operations in a slow FPGA and/or a 2nd FPU superscalar unit.
Keeping the FPU at extended precision makes more sense with 16 registers because of the register argument passing and the reduced register saves and restores. Most compilers waste the 68k FPU extended precision by passing arguments as 64 bits on the stack, which is easy and efficient to improve with 16 FPU registers and a new ABI. Which do you think would cause the least incompatibility?
A) adding 8 FPU registers and patching the exec scheduler
B) reducing FPU precision from 80 bits to 64 bits
I would choose A) above. Most FPU code does not rely on extended precision, but I know that several 68060FPSP algorithms would need fixing (if possible) or new algorithms. The performance advantage of a 64-bit FPU is significantly reduced when it needs extra instructions to retain maximum precision that extended precision does not need. I agree that the extra few bits of precision can be very useful too.
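
As one illustration of the "extra instructions" a 64-bit FPU can need, here is compensated (Kahan) summation in C next to a plain loop. This is a generic textbook technique chosen as an example, not something taken from the 68060FPSP; the function names are mine.

```c
#include <assert.h>
#include <stddef.h>

/* Plain accumulation: one add per element, rounding error grows with n. */
double naive_sum(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum;
}

/* Kahan summation: three extra operations per element carry the
 * low-order bits that a 64-bit add would otherwise drop. */
double kahan_sum(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double sum = 0.0;
    double c   = 0.0;                /* running compensation */
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;         /* apply the correction from the last step */
        double t = sum + y;          /* low bits of y are lost here ...         */
        c = (t - sum) - y;           /* ... and recovered here                  */
        sum = t;
    }
    return sum;
}
```

The compensated loop does roughly four floating point operations where the plain loop does one, which is the kind of overhead that eats into the speed advantage of dropping to 64 bits. Note that aggressive compiler options such as fast-math style reassociation can optimize the compensation away.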
Compatibility is very important given the current state of the 68k Amiga but we need to increase performance and plan for the future also. Adding 8 FPU registers looks like a tremendous opportunity to me even if it adds some incompatibility when the extra FPU registers are used.
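
To put rough numbers on both sides of that trade-off, here is a small back-of-the-envelope sketch in C. It assumes 12 bytes per register as written by FMOVEM plus 12 bytes for the three control registers, ignores the FSAVE state frame, and assumes an ABI in which only fp0-fp1 are scratch today; the function names and the simple spill model are mine.

```c
#include <assert.h>
#include <stddef.h>

enum {
    FP_REG_BYTES  = 12,   /* one extended precision register via FMOVEM */
    FP_CTRL_BYTES = 12    /* FPCR + FPSR + FPIAR, 4 bytes each          */
};

/* Bytes the scheduler must move per task switch for nregs data registers
 * (the FSAVE internal state frame is left out of this estimate). */
size_t switch_frame_bytes(unsigned nregs)
{
    return (size_t)nregs * FP_REG_BYTES + FP_CTRL_BYTES;
}

/* Bytes a function must save and restore if it keeps live_values in
 * registers and only scratch_regs of them are free to clobber. */
size_t callee_save_bytes(unsigned live_values, unsigned scratch_regs)
{
    assert(scratch_regs <= 16);
    unsigned spilled = live_values > scratch_regs ? live_values - scratch_regs : 0;
    return (size_t)spilled * FP_REG_BYTES;
}
```

Under these assumptions the task-switch frame grows from 108 to 204 bytes, while a function with six live fp values drops from 48 bytes of saves (2 scratch registers) to none (10 scratch registers). The first cost is paid per task switch, the second saving per function call.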