@ Bif
As for the Cell, I still don't know what they were thinking with that one.
The design works pretty smoothly if you don't have many unexpected branches, and the floating-point performance is good (and was even greatly improved in the now-discontinued PowerXCell 8i).
But it's not an ideal design for general-purpose computing. And in its original form it required the use of XDR memory (and any time you see a Rambus-designed idea backed by a limited number of vendors, you should run away at high speed).
Also, while IBM did a pretty good job of documenting the chip, their marketing left something to be desired.
You could contact IBM about it, but they didn't want to sell any without "qualifying" the user's design and intended use.
In other words they expected companies to partner with them.
That might work when you are building millions of game consoles of a relatively static design, but it doesn't work so well in other, more rapidly evolving consumer products.
Anyway, enough talk about dead architectures.
The real competition for ideas like the Parallella is likely to come from GPU computing (where parallelism has already been taken to the extreme).