The FPGA is the magic pill

It lets you phase-lock signals, manipulate data bit by bit in real time, and so on.
I think the main machines to implement properly are the A500 and A1200, which I suspect are the most widespread. For the rest it will have to be configure your setup and pray (tm).
And the talk about how hard it is to code for all the weird combinations of hardware really just comes down to the number of willing coders. With a baseline to work from, it's way easier. Improving a specific chip to do something is easier than creating a whole system from scratch, because when you start from scratch you have no working starting point. You have to come up with a theory and try it.