I'm not an expert on the matter, but Cell strikes me as one of those things, like Itanium, that's just too damn complicated to bother with. Software-directed caching sounds good in theory, and if you're willing to hand-tune your code for it I'm sure it can run very well indeed. For normal software development, though, you'd need a pretty freaking smart compiler to make the most of it.
Besides, FPU performance is only one small part of overall system performance.