If you have an A4k, then you don't want a RetinaZ2; you want the much improved RetinaZ3, which can do CyberGraphX (and possibly Picasso96).
As for the accelerator, any card with its own onboard RAM will be faster than the A3640, because access to the A4k's motherboard system RAM is a bottleneck. An accelerator such as a WarpEngine or CyberStorm with its own RAM will be much faster, even with the same CPU at the same clock speed as your A3640.
Cost can be all over the place, from $100+ to the insane $1,000+ range for accelerators (especially when they also have a PPC CPU, like the Phase5 & DCE cards do).
Good luck
Edit: I see that you also have an A2000. For that I would suggest the PicassoII. It is a good Z2 graphics card which has a pass-through, but no scan doubler.
Sell the A2000 and spend the money fixing up the A4000.