@Hammer
My recently acquired nVidia GTX-260 (overclocked to a 640MHz core speed) turns in many a merry FLOP in the CUDA applications I've played with thus far :-)

(The above refers to FLOPs counted at the source-code level, not actual GPU FLOPS.)
What's more, the CUDA API is quite nicely general-purpose.