Serialising an FPGA-to-FPGA internal expansion bus of 8 MHz, 16-bit data and 24-bit address gives a 320 Mbps data rate (8 MHz × 40 bits per cycle). The XC3S500E manages 500 Mbps LVDS, so it doesn't need that many I/O pins.
And there's probably room for optimisations.
And yes, a dual-FPGA design will consume some I/O for the link, but it gains a lot more I/O than it loses.
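As a rough check on the link budget, here is a small Python sketch. It assumes each bus cycle carries one 16-bit data word plus one 24-bit address (40 bits), ignores control signals and framing overhead, and takes the 500 Mbps per LVDS pair quoted above. The function names are made up for illustration.

```python
import math

def bus_rate_mbps(clock_mhz, data_bits, addr_bits):
    """Raw serialised rate in Mbps, assuming one data word and one
    address are transferred on every bus cycle (no control overhead)."""
    return clock_mhz * (data_bits + addr_bits)

def lvds_pairs_needed(rate_mbps, per_pair_mbps=500):
    """LVDS pairs needed in one direction, rounded up to a whole pair.
    The 500 Mbps default is the per-pair figure assumed in this thread."""
    return math.ceil(rate_mbps / per_pair_mbps)

rate = bus_rate_mbps(clock_mhz=8, data_bits=16, addr_bits=24)
print(rate, "Mbps ->", lvds_pairs_needed(rate), "pair(s)")
print(3072, "Mbps ->", lvds_pairs_needed(3072), "pair(s)")
```

Under these assumptions the bus fits in a single pair per direction, and even a link of about 3 Gbps would need only 7 pairs, so the I/O cost of the inter-FPGA link stays small either way.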
You can load all FPGAs from a single EEPROM, but the catch is that changing the core will be a pain for non-developers. That's why I prefer loading from flash memory.
Using MS Windows kills the possibility of an efficient remote console, requires a local hard disk, and brings the headache of viruses etc.