I wasn't going to wade in here but ...
The NatAmi board (currently) uses a Cyclone IV EP4CE40 device. This has 39,600 LEs (each LE is a 4-input LUT plus a flop) and 1.134 Mbit of internal memory.
The FPGAArcade Replay board uses a Spartan-3E 1600 (XC3S1600E) device. This has 14,752 slices, and each slice has 2 LUTs and 2 flops, so that gives 29,504 LUT/flop pairs, roughly comparable to LEs. It has 0.66 Mbit of internal memory.
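A quick back-of-the-envelope version of that comparison, as a Python sketch. It assumes one Cyclone IV LE is roughly equivalent to one Spartan-3E LUT/flop pair, which ignores real architectural differences (carry chains, distributed RAM, routing), so treat the ratios as rough. The slice count used is 14,752 (the XC3S1600E datasheet figure, which is what the 29,504 total implies); the variable names are just illustrative.

```python
# Rough capacity comparison of the two boards' FPGAs.
# Assumption: 1 Cyclone IV LE ~= 1 Spartan-3E LUT/flop pair (a simplification).

CYCLONE_IV_EP4CE40 = {"logic_cells": 39_600, "ram_mbit": 1.134}

SPARTAN3E_SLICES = 14_752
LUTS_PER_SLICE = 2  # each Spartan-3E slice has 2 LUTs and 2 flip-flops
SPARTAN3E_1600 = {
    "logic_cells": SPARTAN3E_SLICES * LUTS_PER_SLICE,  # 29,504 LUT/flop pairs
    "ram_mbit": 0.66,  # block RAM only
}


def ratio(a: dict, b: dict, key: str) -> float:
    """How many times larger device a is than device b for the given resource."""
    return a[key] / b[key]


logic_ratio = ratio(CYCLONE_IV_EP4CE40, SPARTAN3E_1600, "logic_cells")
ram_ratio = ratio(CYCLONE_IV_EP4CE40, SPARTAN3E_1600, "ram_mbit")

print(f"Logic: Cyclone IV has {logic_ratio:.2f}x the Spartan-3E")
print(f"RAM:   Cyclone IV has {ram_ratio:.2f}x the Spartan-3E")
```

On these numbers the Cyclone has about a third more logic and a bit under twice the internal memory: more, but not in a different league.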
I had a quick look at the timing data: the Cyclone is probably slightly faster, but there is not much in it.
The NatAmi team can fit a larger FPGA, true, but their device is already significantly more expensive than the Spartan-3E. I believe low cost and mass production are more important at this point.
Writing in AHDL is rather painful, but there are tools to convert from AHDL to VHDL.
I used to design 3D graphics hardware. I have a lot of respect for the designers working on NatAmi, but with this FPGA they will do well to match the performance of a 10-year-old GPU.
For me, the aim is to get a highly accurate, high-performance 68020-grade processor with high-resolution, high-bit-depth screen modes (with a few hardware tricks thrown in).
I have to get back to testing my boards...
Best.
MikeJ