A dynarec/JIT doesn't have to be all that slow. You don't have to deal with the mess of emulating a fundamentally different architecture; you only have to expand certain unimplemented instructions into sequences of implemented ones. You also don't have to deal with register mapping or with simulating flag behavior (except, again, for the multiply instructions mentioned earlier). And in theory, a sufficiently advanced dynarec can actually improve performance. HP's Dynamo is a dynarec that doesn't translate between architectures at all; it only applies processor-specific optimizations and optimizations that can only be reasoned about at runtime.
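To make the "just expand the unimplemented instructions" idea concrete, here is a minimal toy sketch in Python. Everything in it is hypothetical: the tuple instruction format, the opcode names (`movi`, `mov`, `add`, `shli`, `muli`), the reserved scratch register `t0`, and the helper names are made up for illustration. Native ops pass through the translator untouched, and the one "missing" op, a multiply-by-constant, is expanded into shifts and adds. Translated blocks are cached so each block is only translated once. It deliberately ignores register-width wraparound and flags, which a real multiply expansion would have to handle.

```python
# Hypothetical toy ISA. Each instruction is a tuple (opcode, dest, a, b):
#   movi rd, imm      -> rd = imm
#   mov  rd, rs       -> rd = rs
#   add  rd, ra, rb   -> rd = ra + rb
#   shli rd, rs, imm  -> rd = rs << imm
#   muli rd, rs, imm  -> rd = rs * imm   (NOT implemented by the "host")
NATIVE = {"movi", "mov", "add", "shli"}


def expand_muli(rd, rs, imm, tmp="t0"):
    """Expand rd = rs * imm (imm >= 0) into native shift/add ops.

    Assumes `tmp` is a scratch register reserved for the translator,
    distinct from rd and rs. Copying rs into tmp first keeps this
    correct even when rd == rs.
    """
    if imm < 0:
        raise ValueError("toy expansion only handles non-negative immediates")
    out = [("mov", tmp, rs, None), ("movi", rd, 0, None)]
    while imm:
        if imm & 1:
            out.append(("add", rd, rd, tmp))  # accumulate this power of two
        imm >>= 1
        if imm:
            out.append(("shli", tmp, tmp, 1))  # next power of two of rs
    return out


def translate(block):
    """Same-architecture translation: copy native ops, expand missing ones."""
    out = []
    for op in block:
        if op[0] in NATIVE:
            out.append(op)  # no register mapping or re-encoding needed
        elif op[0] == "muli":
            out.extend(expand_muli(*op[1:]))
        else:
            raise NotImplementedError(op[0])
    return out


_cache = {}


def get_translated(block_id, block):
    """Translate each block once and reuse the result afterwards."""
    if block_id not in _cache:
        _cache[block_id] = translate(block)
    return _cache[block_id]


def run(code, regs):
    """Stand-in for the host CPU: executes only native ops (unbounded ints)."""
    for op, rd, a, b in code:
        if op == "movi":
            regs[rd] = a
        elif op == "mov":
            regs[rd] = regs[a]
        elif op == "add":
            regs[rd] = regs[a] + regs[b]
        elif op == "shli":
            regs[rd] = regs[a] << b
        else:
            raise ValueError(f"host cannot execute {op}")
    return regs


if __name__ == "__main__":
    block = [("movi", "r2", 7, None), ("muli", "r1", "r2", 10), ("add", "r3", "r1", "r2")]
    regs = run(get_translated("b0", block), {})
    print(regs["r1"], regs["r3"])  # 70 77
```

The point of the sketch is how little the translator does: most instructions are copied verbatim, and only the unsupported one costs extra work, once per block thanks to the cache.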