I'd like to see a technical explanation of why you can emulate a CPU but not an MMU with an FPGA.
You *can*, but you probably do not *want* to. To give you an example: the current (small) Phoenix core does not implement the bitfield instructions. It could, but it does not, because the complexity of these instructions is too high to make that feasible. A single bitfield instruction accesses between one and five bytes, dynamically, depending on register values. That is nothing the current pipeline can do. On the 68K, this is a microcoded instruction, i.e. it is slow.
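To make the "one to five bytes" point concrete, here is a minimal C sketch. It assumes a non-negative bit offset (the real 68020 register form allows signed offsets), and `bytes_touched` is a hypothetical helper for illustration, not anything from the Phoenix core:

```c
#include <stdio.h>

/* How many memory bytes does a bitfield access of the given bit
 * offset and width (1..32) touch? The answer depends on runtime
 * values, which is exactly what a fixed pipeline handles badly. */
static unsigned bytes_touched(unsigned bitoffset, unsigned width)
{
    unsigned first = bitoffset % 8;        /* bit position within the first byte */
    return (first + width + 7) / 8;        /* rounds up: between 1 and 5 bytes */
}

int main(void)
{
    printf("%u\n", bytes_touched(0, 8));   /* byte-aligned, width 8   -> 1 */
    printf("%u\n", bytes_touched(7, 32));  /* worst case: 7 + 32 bits -> 5 */
    return 0;
}
```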
Pretty much the same goes for the MMU table walk. It is a fairly complex algorithm: it requires accessing an even larger number of bytes in a completely serial fashion, testing several conditions along the way. Yes, you can probably implement that on the FPGA and spend a lot of gates on it, but in reality you probably simply do not want to. This part of MMU handling is also microcoded on real hardware, so you can either add more complexity by putting a microcode interpreter into the FPGA, or have it interpreted by the CPU core that is there anyhow, i.e. do it completely in software.
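For illustration, a heavily simplified software table walk might look like the sketch below. This is *not* the real 68030/68040 descriptor format: the two-level split of the logical address, the descriptor-type bits, and the `read_phys` helper are all assumptions made up for the example. What it does show is the serial, data-dependent chain of memory reads and condition tests the paragraph above describes:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: read a 32-bit descriptor from physical memory. */
extern uint32_t read_phys(uint32_t addr);

#define DT_PAGE  1u   /* simplified descriptor type bits */
#define DT_TABLE 2u

/* Translate a logical address through an assumed two-level table.
 * Returns false on an invalid descriptor. Note that each level is a
 * memory access that depends on the result of the previous one --
 * there is no parallelism for a pipeline to exploit. */
bool translate(uint32_t root, uint32_t laddr, uint32_t *paddr)
{
    uint32_t desc = read_phys(root + ((laddr >> 24) & 0xFFu) * 4);
    if ((desc & 3u) != DT_TABLE)
        return false;                       /* condition test, level A */

    desc = read_phys((desc & ~3u) + ((laddr >> 12) & 0xFFFu) * 4);
    if ((desc & 3u) != DT_PAGE)
        return false;                       /* condition test, level B */

    *paddr = (desc & ~0xFFFu) | (laddr & 0xFFFu);
    return true;
}
```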
One way or another, it makes a lot of sense to split off the complex algorithms, whether they belong to instructions or to CPU internals, to eliminate complexity in the core. If you look at the current small core, a lot of the complex, rarely used instructions are in software: ABCD and friends, MOVEP and friends, the double-sized MULx and DIVx (32×32→64 multiplication and 64÷32 division), and the bitfields. And these are rather harmless compared to *some* of the MMU functions, namely the table walk.
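As an example of what "in software" means here, the decimal-add core of an ABCD trap handler fits in a few lines of C. This is a hedged sketch of the arithmetic only; the real 68K flag semantics (in particular behavior on invalid BCD digits) are ignored:

```c
#include <stdint.h>

/* BCD add with extend, roughly what an ABCD emulation routine would
 * compute. *x is the X/carry flag, read and updated in place. */
static uint8_t abcd_byte(uint8_t a, uint8_t b, unsigned *x)
{
    unsigned lo = (a & 0x0Fu) + (b & 0x0Fu) + *x;
    unsigned hi = (a >> 4) + (b >> 4);
    if (lo > 9) { lo -= 10; hi += 1; }      /* decimal adjust, low nibble  */
    *x = 0;
    if (hi > 9) { hi -= 10; *x = 1; }       /* decimal adjust + carry out  */
    return (uint8_t)((hi << 4) | lo);
}
```

A handler like this runs once per trapped instruction, so its cost only matters for code that actually uses ABCD, while the core itself stays small.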
I personally don't have to decide, but if I had to, I would offload this stuff to the CPU as well and build something simpler underneath.