one where most instructions would go to the ColdFire, but when an unsupported instruction was needed, the appropriate register values would be copied to the normal (040 or 060) CPU, the instruction executed there, and the changed registers copied back to the ColdFire...
For this to work you'd need to transfer the whole state of the CPU (and with the FPU and MMU it gets even more complicated). It would be even slower than emulating the unsupported instructions on the ColdFire itself.
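To put a rough number on "the whole state": here is a minimal sketch that only counts the user-visible 68k registers (D0-D7, A0-A7, PC, SR, and the FPU's FP0-FP7, FPCR, FPSR, FPIAR) and assumes a 32-bit transfer path between the two CPUs. The names (INT_STATE_BYTES, handoff_longwords, etc.) are made up for illustration, and MMU and supervisor registers are left out on purpose, so this is a lower bound.

```c
#include <stdint.h>

/* Hypothetical count of the user-visible 68k state that would have to be
   copied to the 040/060 and back for every unsupported instruction.
   MMU and supervisor-only registers are deliberately left out, so the
   figures are a lower bound. */

/* Integer unit: D0-D7, A0-A7 (A7 = stack pointer), PC (32 bits each), SR (16 bits). */
enum { INT_STATE_BYTES = 8 * 4 + 8 * 4 + 4 + 2 };

/* FPU: FP0-FP7 in 96-bit memory format (12 bytes each), plus FPCR, FPSR, FPIAR. */
enum { FPU_STATE_BYTES = 8 * 12 + 3 * 4 };

/* Number of 32-bit bus transfers needed to move `bytes` bytes (rounded up). */
static uint32_t longwords(uint32_t bytes)
{
    return (bytes + 3u) / 4u;
}

/* 32-bit transfers for one hand-off: state out to the other CPU, then back. */
uint32_t handoff_longwords(int with_fpu)
{
    uint32_t bytes = INT_STATE_BYTES + (with_fpu ? FPU_STATE_BYTES : 0);
    return 2u * longwords(bytes);
}
```

Under these assumptions handoff_longwords(0) is 36 and handoff_longwords(1) is 90 transfers per trapped instruction, before any handshaking between the two CPUs and before the 040/060 has executed anything. An emulation trap on the ColdFire itself doesn't pay that fixed cost.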
I'm sure that with some heavy coding such a solution might be possible, but it would be pointless: slower than the ColdFire alone, and more expensive.