Does anyone have any hard data on the performance of software trap-based FPU emulation? I'd be interested to see how much of an impact it actually has, but I can't find any published benchmarks anywhere.
Hard data is going to be difficult to find. How are the floating point operations going to be performed after the exception?
1) full precision floating point using integer instructions
2) reduced precision floating point using integer instructions
3) partially hardware-accelerated floating point (possible with this new PPC board? A Lattice FPGA?)
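To make option 2 concrete, here is a rough C sketch of a single-precision multiply done entirely with integer operations. It is a hypothetical, simplified routine (soft_mul is my own name, not from any emulation library). It assumes normal, non-zero, finite inputs, skips overflow, underflow and NaN handling, and truncates instead of rounding to nearest even. Full precision (option 1) needs all of that extra handling, which is part of why it costs more.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its IEEE 754 single-precision bit pattern and back. */
static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical reduced-precision multiply using only integer operations.
 * Assumptions: both inputs are normal, finite and non-zero, and the result
 * neither overflows nor underflows. Rounds by truncation (toward zero)
 * instead of IEEE round-to-nearest-even. */
static float soft_mul(float a, float b)
{
    uint32_t x = float_bits(a), y = float_bits(b);
    uint32_t sign = (x ^ y) & 0x80000000u;
    /* Add the biased exponents and remove one bias. */
    int32_t exp = (int32_t)((x >> 23) & 0xFFu) + (int32_t)((y >> 23) & 0xFFu) - 127;
    /* 24-bit significands with the implicit leading 1 restored. */
    uint64_t mx = (x & 0x7FFFFFu) | 0x800000u;
    uint64_t my = (y & 0x7FFFFFu) | 0x800000u;
    uint64_t prod = mx * my;            /* in [2^46, 2^48) */
    if (prod & (1ULL << 47)) {          /* significand product in [2, 4) */
        prod >>= 24;
        exp += 1;
    } else {                            /* significand product in [1, 2) */
        prod >>= 23;
    }
    /* Drop the implicit 1 and reassemble sign, exponent and fraction. */
    return bits_float(sign | ((uint32_t)exp << 23) | ((uint32_t)prod & 0x7FFFFFu));
}
```

Even this cut-down version needs a wide integer multiply plus a series of shifts and masks for what an FPU does in one instruction.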
The following link is a paper discussing the floating point options and performance for a processor without an FPU.
http://www.ll.mit.edu/HPEC/agendas/proc08/Day1/11-Day1-PosterDemoA-Spetka-abstract.pdf
The performance difference is going to depend on the number of exceptions generated. Also, some floating point operations are much harder to perform with integer operations than in hardware. Light floating point use may show no difference in performance, while heavy use will likely be a night and day difference. OxyPatcher/CyberPatcher/MuRedox patching of 6888x FPU code on a 68060 can be 50% faster than trap-based emulation with heavy floating point use, and that is even though the most common 6888x instructions are already implemented in the 68060 FPU. These patchers also get partial hardware acceleration because they still use the simpler 68060 FPU. With no hardware floating point acceleration at all and full precision done with integer operations, floating point performance would be devastated.
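A back-of-the-envelope cost model shows why the number of exceptions matters so much. This is a hypothetical C sketch: the struct, its field names and every cycle count you plug in are placeholders, not measurements of any real 68060 or patcher. It assumes a patcher replaces each unimplemented instruction with a direct call into the same emulation code, so the exception overhead goes away but the emulation work stays.

```c
#include <stdint.h>

/* Hypothetical workload description; all cycle counts are placeholders
 * chosen by the caller, not measured 68060 figures. */
typedef struct {
    uint64_t native_ops;    /* FP instructions the FPU executes in hardware */
    uint64_t emulated_ops;  /* FP instructions the FPU does not implement */
    uint64_t native_cycles; /* assumed cost of one hardware FP instruction */
    uint64_t emul_cycles;   /* assumed cost of one integer emulation routine */
    uint64_t trap_cycles;   /* assumed exception entry, decode and return overhead */
    uint64_t call_cycles;   /* assumed direct call/return overhead after patching */
} fp_workload;

/* Trap-based emulation: each missing instruction pays the exception overhead
 * on top of the emulation routine itself. */
static uint64_t cycles_trapped(const fp_workload *w)
{
    return w->native_ops * w->native_cycles
         + w->emulated_ops * (w->trap_cycles + w->emul_cycles);
}

/* Patched code: the missing instruction is replaced by a direct call into the
 * same emulation routine, so only the call overhead remains. */
static uint64_t cycles_patched(const fp_workload *w)
{
    return w->native_ops * w->native_cycles
         + w->emulated_ops * (w->call_cycles + w->emul_cycles);
}
```

With emulated_ops near zero the two totals converge, which matches light use showing no difference; as emulated_ops grows, the per-trap overhead term starts to dominate.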
I do think that in this day and age, processors with only scalar FPU support are a little bit pointless. May as well go SIMD FPU or go home. Like you said, in x64 mode the scalar x87 FPU is effectively legacy; compilers use the SSE SIMD registers for floating point instead. For scalar operations you just use the lowest element of a SIMD register and ignore the high elements, and the scalar SSE instructions compute that as fast as the scalar FPU would.
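For illustration, this is what "use the low element of a SIMD register" looks like with the SSE2 intrinsics from emmintrin.h on x86-64. It is only a sketch: the wrapper names are my own, and the #else branches are a plain C fallback for targets without SSE2. In practice, a compiler targeting x86-64 typically already emits addsd for an ordinary x + y on doubles.

```c
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar double add performed in the low lane of an SSE register.
 * _mm_set_sd loads x into lane 0 and zeroes lane 1; _mm_add_sd adds only
 * lane 0; _mm_cvtsd_f64 reads lane 0 back out. */
static double scalar_add_via_simd(double x, double y)
{
#if defined(__SSE2__)
    __m128d a = _mm_set_sd(x);
    __m128d b = _mm_set_sd(y);
    return _mm_cvtsd_f64(_mm_add_sd(a, b));
#else
    return x + y;   /* fallback for targets without SSE2 */
#endif
}

/* The same registers used as real SIMD: two double adds in one instruction. */
static void add_pairs(const double a[2], const double b[2], double out[2])
{
#if defined(__SSE2__)
    _mm_storeu_pd(out, _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
#else
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
#endif
}
```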
If we are creating new floating point hardware, a combined FPU and SIMD unit (effectively a SIMD unit that also handles scalar operations), as in x86-64, makes a lot of sense. A double precision FPU for C programs alongside a single-precision-only SIMD unit (the PPC with AltiVec approach) also has advantages.
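One reason to keep double precision around for C: unsuffixed constants such as 0.1 are double, and the standard math functions (sqrt, sin, ...) take and return double. A quick C check shows what single precision gives up. float has a 24-bit significand, so 16777217 (2^24 + 1) is the first positive integer it cannot hold exactly, while double is exact up to 2^53. The helper names are hypothetical.

```c
/* Hypothetical helpers: does integer v survive a round trip through the type? */
static int exact_in_float(long long v)  { return (long long)(float)v == v; }
static int exact_in_double(long long v) { return (long long)(double)v == v; }

/* At 2^24, adding 1.0f to a float rounds straight back to the same value,
 * so a single-precision accumulator silently stops counting. */
static float float_increment(float x) { return x + 1.0f; }
```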
I think you can get a ton of mileage out of just a handful of SIMD/FPU instructions: multiply, add, subtract, and enough permute and conversion operations to get your data ready for those math operations.
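As a sketch of how far that small set goes, here is a hypothetical 4-lane single-precision vector type in plain C. vec4 and the v* helpers are made-up names standing in for SIMD instructions, not any real instruction set. A dot product needs only multiply, add and permute, a 3D cross product needs only multiply, subtract and permute, and the conversion op brings integer data in.

```c
#include <stdint.h>

/* Hypothetical 4-lane single-precision "register"; each helper below
 * stands in for one SIMD instruction. */
typedef struct { float v[4]; } vec4;

static vec4 vmul(vec4 a, vec4 b) { vec4 r; for (int i = 0; i < 4; i++) r.v[i] = a.v[i] * b.v[i]; return r; }
static vec4 vadd(vec4 a, vec4 b) { vec4 r; for (int i = 0; i < 4; i++) r.v[i] = a.v[i] + b.v[i]; return r; }
static vec4 vsub(vec4 a, vec4 b) { vec4 r; for (int i = 0; i < 4; i++) r.v[i] = a.v[i] - b.v[i]; return r; }

/* Permute: lane i of the result is lane idx[i] of a (idx values 0..3). */
static vec4 vperm(vec4 a, const int idx[4]) { vec4 r; for (int i = 0; i < 4; i++) r.v[i] = a.v[idx[i]]; return r; }

/* Conversion: four 32-bit integers to four floats. */
static vec4 vcvt_i32(const int32_t in[4]) { vec4 r; for (int i = 0; i < 4; i++) r.v[i] = (float)in[i]; return r; }

/* Dot product: one multiply, then two permute+add steps so every lane
 * ends up holding the full horizontal sum. */
static float vdot(vec4 a, vec4 b)
{
    static const int swap_pairs[4]  = {1, 0, 3, 2};
    static const int swap_halves[4] = {2, 3, 0, 1};
    vec4 p = vmul(a, b);
    p = vadd(p, vperm(p, swap_pairs));   /* p0+p1, p0+p1, p2+p3, p2+p3 */
    p = vadd(p, vperm(p, swap_halves));  /* total in every lane */
    return p.v[0];
}

/* 3D cross product in lanes 0..2 (lane 3 is ignored):
 * a.yzx * b.zxy - a.zxy * b.yzx. */
static vec4 vcross(vec4 a, vec4 b)
{
    static const int yzx[4] = {1, 2, 0, 3};
    static const int zxy[4] = {2, 0, 1, 3};
    return vsub(vmul(vperm(a, yzx), vperm(b, zxy)),
                vmul(vperm(a, zxy), vperm(b, yzx)));
}
```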
Right. Keep hardware floating point support fairly simple. Hardware designers do need to pay attention to what compilers need and actually use, though; they have made major mistakes before by cutting common and valuable instructions.
Anyway, in that sense I think worrying about the current Amiga scalar FPU instruction sets and their compatibility and performance might be slightly overblown. If we really want to crunch through a lot of floating point values, at some point it might be wise to create or adopt a SIMD FPU instruction set that everyone could standardize on going forward.
Both the 68k and PPC would probably keep a separate FPU and SIMD unit for compatibility. This requires more logic, but the SIMD unit only needs to support single precision floating point, which saves some logic. Today, logic savings aren't as important as compatibility or performance.