@da9000
Decomposition. Yes. Although I'd move that into software, probably the compiler. That way it's decoupled from the hardware, and the compiled code can scale to fit whatever the target offers.
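As a rough illustration of decomposition living in software rather than hardware, here is a minimal Python sketch. It is an assumption-laden toy, not what any real compiler emits: the names `decompose` and `sum_squares` are hypothetical, and the thread pool merely stands in for whatever parallel units the target provides. The point is only that the chunk count comes from the target at run time, while the problem code stays the same.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def decompose(n_items, n_workers):
    """Split range(n_items) into at most n_workers contiguous (start, stop) chunks."""
    base, extra = divmod(n_items, n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more workers than items: skip empty chunks
        chunks.append((start, start + size))
        start += size
    return chunks


def sum_squares(data, workers=None):
    """Sum of squares, decomposed to match the number of available workers."""
    # Hypothetical "target query": fall back to 1 if the count is unknown.
    workers = workers or os.cpu_count() or 1
    chunks = decompose(len(data), workers)
    if not chunks:
        return 0
    # The thread pool is a stand-in for the target's parallel units.
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = pool.map(
            lambda c: sum(x * x for x in data[c[0]:c[1]]), chunks
        )
        return sum(partials)
```

Calling `sum_squares(data)` on a 2-core machine and on a 64-core machine runs the same source; only the decomposition changes. In CPython the threads will not actually speed up this arithmetic, so read it as a picture of the structure, not a benchmark.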
Everyone take note: without a major engineering breakthrough, processors are not going to get faster than they are today. It's all about rethinking how we solve problems.
@darule
That sounds like a great machine for continuing the classic legacy (although I'd ditch PCI and use PCIe and probably drop Zorro altogether); however, it's not going to move things forward.
Here's an excellent example of innovation using existing technology:
1. Ageia releases the PhysX SDK and companion hardware for accelerating physics calculations in games and other software.
2. nVidia buys Ageia, ports the PhysX middleware to its existing GPUs, and ends up with a solution that runs PhysX software an order of magnitude faster than the original companion hardware.
Here's my point: you don't need special-purpose hardware. You just need a configuration flexible enough to let your developers make the best use of existing hardware. (Yes, "GPUs" are now general-purpose processors.) That's what makes Cell, Tesla, and other low-cost, high-performance processor subsystems so attractive.