Edit: Also, like I said, *I* can't reasonably do it in an FPGA. I can do a localbus interface and I can do software. I'm just tired of waiting for "it's not that hard" to happen, so I'm following my gut on the quickest path to get there.
I think doing the PCB for an FPGA-only (or SoC FPGA-only) design would be pretty straightforward. I do not expect the softcore to be trivial. Luckily, TG and Yaqube have already been doing wonderful things. This part needs a processor designer, and Motorola surely had quite a team, as do Arm, Intel, AMD, etc. It takes more than a simple student engineer like me, but we do have some good people doing this sort of thing, and at some point it'll open up so more of us can see and play with it and, if time allows, try some more ideas in there. This past fall semester I was surprised at how simple a RISC control unit can be. I really wish our project had been in something useful like VHDL or Verilog instead of the crummy schematic tool LogicWorks...
Look at the instruction set and start breaking each instruction down into easy steps (either hardwired logic or microcode; I don't have a good handle on the microcode approach yet). Look for common pieces: for example, every instruction starts with a fetch step, followed by a decode step. If you know VHDL and are good with digital logic, you should be able to figure it out. I had my aha moment and it's making a lot more sense now, so anybody can...
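To make the "break each instruction into steps and look for common pieces" idea concrete, here is a minimal Python sketch. The instruction names and step names (ADD, LOAD, JMP, read_regs, alu_add, etc.) are hypothetical and simplified; they are not real 68k encodings or timings. It just shows how writing every instruction as a list of steps makes the shared fetch/decode prefix fall out on its own.

```python
# Hypothetical, simplified register-transfer steps for a few made-up
# instructions. These are NOT real 68k encodings or cycle counts.
STEPS = {
    "ADD":  ["fetch", "decode", "read_regs", "alu_add", "write_reg"],
    "LOAD": ["fetch", "decode", "read_regs", "alu_add", "mem_read", "write_reg"],
    "JMP":  ["fetch", "decode", "load_pc"],
}


def common_prefix(sequences):
    """Return the longest run of leading steps shared by every instruction."""
    prefix = []
    # zip() walks all step lists in lockstep and stops at the shortest one.
    for steps_at_i in zip(*sequences):
        if all(s == steps_at_i[0] for s in steps_at_i):
            prefix.append(steps_at_i[0])
        else:
            break
    return prefix


if __name__ == "__main__":
    shared = common_prefix(STEPS.values())
    print("shared by all:", shared)
    # What is left after the shared prefix is the instruction-specific part.
    for name, steps in STEPS.items():
        print(name, "->", steps[len(shared):])
```

The shared prefix is the part you only have to build once; the longest remaining list (LOAD here) hints at how many steps your control unit has to count through.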
It starts with a step counter that counts up to the maximum number of steps from your instruction analysis. The longest instruction (the one with the most steps) determines the counter's maximum value. Shorter instructions can clear the counter back to 0 (the fetch step) early. Alongside that you have the instruction opcode, which is used to address/select registers, select which ALU operation drives the ALU output, etc.
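Here is a rough Python model of that hardwired scheme, as a sketch under stated assumptions: the opcodes and control-signal names (reg_read, alu_sel_add, pc_load, etc.) are made up for illustration. Step 0 is always fetch and step 1 decode; after that the (opcode, step) pair selects which control signals are asserted that cycle, and each instruction clears the counter back to 0 after its own last step.

```python
FETCH, DECODE = 0, 1

# Hypothetical per-opcode steps after fetch/decode. Each entry is the set of
# control signals asserted during one clock cycle.
MICRO = {
    "ADD":  [{"reg_read"}, {"alu_sel_add"}, {"reg_write"}],
    "SUB":  [{"reg_read"}, {"alu_sel_sub"}, {"reg_write"}],
    "LOAD": [{"reg_read"}, {"alu_sel_add"}, {"mem_read"}, {"reg_write"}],
    "JMP":  [{"pc_load"}],
}

# Last step index for each opcode (fetch and decode occupy steps 0 and 1).
LAST_STEP = {op: 1 + len(steps) for op, steps in MICRO.items()}
# The longest instruction sets the counter's maximum value.
MAX_STEP = max(LAST_STEP.values())


def control_signals(opcode, step):
    """Combinational decode: (opcode, step) -> control signals this cycle."""
    if step == FETCH:
        return {"mem_read", "ir_load", "pc_inc"}  # same for every instruction
    if step == DECODE:
        return {"decode"}
    return MICRO[opcode][step - 2]  # opcode picks ALU op, register writes, ...


def run(program):
    """Clock the step counter through a list of opcodes; return a cycle trace."""
    trace = []
    pc, step, ir = 0, FETCH, None
    while pc < len(program) or step != FETCH:
        if step == FETCH:
            ir = program[pc]  # fetch loads the instruction register
            pc += 1
        trace.append((ir, step, control_signals(ir, step)))
        if step == LAST_STEP[ir]:
            step = FETCH      # shorter instructions clear the counter early
        else:
            step += 1
    return trace
```

For example, run(["ADD", "JMP", "LOAD"]) takes 5 + 3 + 6 = 14 cycles, and the counter never goes past MAX_STEP (5 here, so it needs 3 bits). In real hardware the same idea is a small counter plus a decoder/ROM indexed by opcode and step.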
Then things get more complicated as you add pipelining, bypassing (forwarding the result of one instruction straight back to the ALU input so it can be used before it actually gets written to its destination register), muxing NOPs into pipeline stages when they need to stall/bubble for a cycle, superscalar execution (multiple parallel ALUs), caching, branch prediction, etc. And it goes on from there.
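As a sketch of the bypassing and bubble ideas, assuming a classic textbook 5-stage pipeline (IF, ID, EX, MEM, WB) rather than anything 68k-specific, here is a small Python model. The Stage fields and the "LOAD"/"ADD" op names are hypothetical. The bypass mux picks the newest in-flight result for an ALU input, and the hazard check muxes a NOP into EX when an instruction needs a load result that isn't ready yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    """Minimal pipeline-register contents (hypothetical field names)."""
    op: str = "NOP"
    rd: Optional[int] = None      # destination register, None if no write
    rs1: Optional[int] = None     # source registers
    rs2: Optional[int] = None
    result: Optional[int] = None  # ALU result (or loaded data in MEM/WB)


NOP = Stage()  # a bubble: writes nothing, reads nothing


def forward(src_reg, regfile_value, ex_mem, mem_wb):
    """Bypass mux for one ALU input: prefer the newest in-flight result.

    Loads are assumed to be handled by needs_stall(), so a load's data is
    only ever forwarded from MEM/WB, never from EX/MEM.
    """
    if src_reg is not None:
        if ex_mem.rd == src_reg:  # result from 1 instruction back
            return ex_mem.result
        if mem_wb.rd == src_reg:  # result from 2 instructions back
            return mem_wb.result
    return regfile_value          # no hazard: use the register file


def needs_stall(id_ex, if_id):
    """Load-use hazard: the load in EX has no data yet, but ID needs it."""
    return (id_ex.op == "LOAD" and id_ex.rd is not None
            and id_ex.rd in (if_id.rs1, if_id.rs2))


def next_id_ex(id_ex, if_id):
    """What enters EX next cycle: a NOP bubble on a stall, else the decoded
    instruction. (The PC and IF/ID would also hold for that cycle; not modeled.)"""
    return NOP if needs_stall(id_ex, if_id) else if_id
```

This only covers data hazards; superscalar issue, caches and branch prediction each add their own muxes and checks on top of this.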
The fundamental thing isn't hard. Competing with an 060 is most likely hard, but it is possible. It will take a while, though.