My idea for USB was to exploit the User-I/O header: connect User0 and User1 to D+ and D-, and let, say, User2 switch a pull-up on one of the D lines to indicate speed. That eliminates the need for an external PHY and the I/O pins it would consume.
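To make the speed signalling concrete, here is a rough sketch of how it works as far as I understand the USB 2.0 spec: a device puts a 1.5 kOhm pull-up on D+ for full speed or on D- for low speed, and the other side reads the idle line levels to tell which one it is. The function names are made up for illustration, and this only models the logic levels, not the electrical side.

```python
from enum import Enum


class Speed(Enum):
    LOW = "low-speed (1.5 Mbit/s)"
    FULL = "full-speed (12 Mbit/s)"


def pullup_line(speed):
    """Return the data line that gets the 1.5 kOhm pull-up for this speed.

    Hypothetical helper: in the scheme above, User2 would switch this pull-up.
    """
    return "D+" if speed is Speed.FULL else "D-"


def detect_speed(d_plus, d_minus):
    """Classify idle bus levels (True = high) the way a host would.

    Returns None for SE0 (both low, nothing attached) and for
    SE1 (both high, not a valid idle state).
    """
    if d_plus and not d_minus:
        return Speed.FULL
    if d_minus and not d_plus:
        return Speed.LOW
    return None


if __name__ == "__main__":
    for speed in Speed:
        print(speed.value, "-> pull-up on", pullup_line(speed))
```

In an FPGA this would boil down to a couple of comparisons on the two input pins once the bus has settled after attach.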
USB might also save pins: the SD/MMC slot, keyboard, mouse, Joystick0 and Joystick1 connections could all be replaced.
Maybe replacing the MCU with a CPLD is a viable option?
That CPLD would then be fast enough to handle Ethernet and USB as well as booting the system, etc. And it doesn't need a boot EEPROM.
Another option with a CPLD is FPGA reconfiguration via Ethernet. :-)