Use a few BIG lookup tables if you can afford the memory. I tried several smaller tables first, but ended up using a couple of large ones. I think the emulation (PC) ended up being around 1.25MB in total size because the tables were embedded in the executable (along with the instruction emulation), repeated over and over in most cases, instead of being built in memory. But the speed difference is dramatic. Remember, if you can save just 4 cycles, benchmark programs and anything else sitting in a loop are really going to benefit from it!
I think Mike's table-based microcode lookup is going to be the most efficient (and fastest) method of doing the microcode core emulation.
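To make the big-table idea concrete, here is a rough sketch in C. It is not the code from my emulator or from Mike's core. The flag bits, the table name add_flags, and the helpers build_add_flags and emu_add are all made up for illustration, and the CPU is an imaginary 8-bit one. It trades 64KB of memory for a single indexed load in place of three compares on every emulated ADD. This version builds the table in memory, which is the alternative to embedding it in the executable.

```c
#include <stdint.h>

/* Hypothetical flag bits for an imaginary 8-bit CPU (illustration only). */
enum { FLAG_CARRY = 0x01, FLAG_ZERO = 0x02, FLAG_NEG = 0x04 };

/* One entry per (a, b) operand pair: 256 * 256 = 65536 bytes. */
static uint8_t add_flags[256 * 256];

/* Slow reference computation, used only while filling the table. */
static uint8_t compute_add_flags(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    uint8_t flags = 0;

    if (sum > 0xFF)        flags |= FLAG_CARRY; /* carry out of bit 7 */
    if ((sum & 0xFF) == 0) flags |= FLAG_ZERO;  /* 8-bit result is zero */
    if (sum & 0x80)        flags |= FLAG_NEG;   /* bit 7 of result set */
    return flags;
}

/* Fill the table once, before the emulation loop starts. */
void build_add_flags(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            add_flags[(a << 8) | b] =
                compute_add_flags((uint8_t)a, (uint8_t)b);
}

/* Hot path: one table load replaces all of the flag tests above. */
static inline uint8_t emu_add(uint8_t a, uint8_t b, uint8_t *flags)
{
    *flags = add_flags[((unsigned)a << 8) | b];
    return (uint8_t)(a + b);
}
```

Call build_add_flags() once at startup, then use emu_add() inside the instruction loop. Whether this actually wins depends on the host's cache behaviour, so time it against the straightforward version before committing to it.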