What's funny about all of this is that on PCs back in the day, adding an FPU DID in fact speed things up dramatically. I've still got the system I used back in the early '90s sitting next to me, an 80486SLC (80386 replacement) with an 80387 co-processor.
I guess most PC software back then autosensed the presence of an FPU and took advantage of it accordingly, whereas any well-designed Amiga application would simply let the math libraries handle the data and redirect it as needed.
Unless the app was designed for a direct hardware hit, I can see where there would be a bottleneck going from app to library before figuring out which processor to hit next.
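To make the app-to-library hop concrete, here's a rough sketch in C of what I mean. This is NOT the real Amiga math library API; every name in it (lib_open, lib_mul, mul_soft, mul_fpu, the has_fpu flag) is made up for illustration, and both "implementations" just use the C multiply operator as a stand-in. The point is only that the library picks a routine once, and after that every call from the app goes through an extra indirect jump.

```c
#include <stdbool.h>

/* Hypothetical library interface: one function pointer per operation. */
typedef double (*mul_fn)(double, double);

/* Counts trips through the software path, just so the choice is observable. */
static int soft_calls = 0;

/* Stand-in for a software-emulated multiply (illustrative only). */
static double mul_soft(double a, double b) { soft_calls++; return a * b; }

/* Stand-in for a routine that would hand the work to an FPU. */
static double mul_fpu(double a, double b) { return a * b; }

/* Defaults to the software path until the library is "opened". */
static mul_fn lib_mul_impl = mul_soft;

/* Called once at startup; has_fpu is an assumed flag, however it was detected. */
void lib_open(bool has_fpu) { lib_mul_impl = has_fpu ? mul_fpu : mul_soft; }

/* What the app calls: every multiply pays for one indirect call. */
double lib_mul(double a, double b) { return lib_mul_impl(a, b); }
```

An app would call lib_open() once and then lib_mul() everywhere, never knowing which processor did the work. A program written to hit the FPU directly skips that hop, which is the trade-off I'm guessing at above.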
Might also explain why computers, no matter how fast the hardware gets, still seem to take the same amount of time to do anything as they did 10 years ago. So many freakin' layers of crap to go through before actually getting to perform a function in hardware. Translate this, translate that, blah blah!
I miss DOS.