It might be a good starting point. [ColdFire] Differences are:
1. No DBcc
2. No bitwise rotation (rol, ror)
3. No bitfield operations
4. Multiply instructions don't set the overflow flag. From the ColdFire manual:
"CCR[V] is always cleared by MULS/U, unlike the 68K family processors."
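To make the multiply difference concrete, here is a minimal Python sketch of a 32x32 -> 32-bit signed multiply that reports only the overflow (V) flag. The function names and the `coldfire` switch are made up for illustration; the 68k side assumes the 68020+ MULS.L behaviour, where V is set when the product does not fit in 32 bits, and the ColdFire side just follows the manual quote above.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def muls_l(a, b, coldfire=False):
    """Model MULS.L with a 32-bit result; return (result, v_flag).

    Hypothetical helper: only the V flag is modelled here.
    """
    full = to_signed32(a) * to_signed32(b)
    result = full & MASK32
    # Overflow: the full product is not representable in 32 signed bits.
    overflow = full != to_signed32(result)
    # Per the quoted manual, ColdFire always clears V.
    v_flag = False if coldfire else overflow
    return result, v_flag


print(muls_l(0x10000, 0x10000))                 # 68k model
print(muls_l(0x10000, 0x10000, coldfire=True))  # ColdFire model
```

In other words, a 68k sequence that does a MULS.L followed by BVS to catch overflow would never take the branch under the ColdFire model, and no trap occurs that a library could hook.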
Items 1-3 on your list are not 68k conflicts, but trapping would make them very slow. Here is a list of 68k and ColdFire conflicts that are not fixable by trapping (i.e. by a ColdFire.library):
1. ColdFire stack is 4 byte aligned (68k 2 byte). MOVE.B/W (SP)+ and -(SP) fail.
2. REMS/REMU encoding is incompatible with DIVSL/DIVUL encoding.
3. ColdFire multiply instructions don't set the flags the way the 68k does.
In addition, practically anything in Supervisor mode will not work.
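The REMS/DIVSL conflict from the list above can be sketched as follows. This is a rough model, not a decoder: it assumes (from my reading of the 68k and ColdFire manuals) that both instructions share opcode word 0x4C40 with the registers in bits 14-12 and 2-0 of the extension word, that the 68k DIVSL.L form writes the quotient to Dq and the remainder to Dr, and that ColdFire REMS.L writes only the remainder to Dw. The function names are hypothetical, and divide-by-zero, overflow and the unsigned forms are ignored.

```python
def sdiv_trunc(a, b):
    """Signed division truncating toward zero; return (quotient, remainder)."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q, a - q * b


def execute_4c40(ext_word, regs, divisor, cpu):
    """Model the signed long divide at opcode 0x4C40 (assumed layout).

    regs is a list of 8 signed data register values; a new list is returned.
    cpu is "68k" or "coldfire".
    """
    hi = (ext_word >> 12) & 7  # Dq on the 68k, Dx on ColdFire
    lo = ext_word & 7          # Dr on the 68k, Dw on ColdFire
    q, r = sdiv_trunc(regs[hi], divisor)
    out = list(regs)
    if hi == lo:
        out[hi] = q            # plain DIVS.L: same meaning on both
    elif cpu == "68k":
        out[lo] = r            # DIVSL.L: remainder in Dr ...
        out[hi] = q            # ... and quotient in Dq
    else:
        out[lo] = r            # REMS.L: remainder only, Dx untouched
    return out


regs = [-7, 99, 0, 0, 0, 0, 0, 0]
print(execute_4c40(0x0801, regs, 2, "68k")[:2])
print(execute_4c40(0x0801, regs, 2, "coldfire")[:2])
```

Since both CPUs execute the encoding without an exception, 68k code that expects the quotient in Dq simply continues with the undivided value, which is why a trap-based library cannot repair it.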
ColdFire also has a few extra instructions (some of which would be quite useful, such as saturate and multiply-accumulate).
MVS, MVZ and BYTEREV should have been in the 68060. SATS is good for DSP/codec type processing, but where are SATU and ABS? The CF MAC processor is powerful but is a bolt-on that doesn't fit with the 68k/ColdFire IMO. It's a poor man's SIMD, as the CF is low end and cheap, cheap, cheap. Freescale will sell you PPC or now ARM (which they sadly license) if you need some real processing power.
Well, there are quite a few Amiga programs, including several of my own, that all follow 100% legal programming practices (according to common sense and the RKMs) and will not run on an 060 with superscalar and/or branch caching enabled. I don't recall all of the reasons behind the issues. I should go look at the mmu.library replacement that we made for EMPLANT and FUSION... I know I commented some things there. I know that self-modifying code is definitely one of the things that causes a problem when one of the cached instructions in the pipeline has been modified (like a branch table). Yes, I consider self-modifying code 100% legal. You are supposed to flush the caches (or turn them off) with self-modifying code, but when you do that you are then running at sub-030 speeds.
Self-modifying code needs to flush the caches (including the branch cache), which negates the advantage of the caches and any speed gains of the self-modifying code. If you don't like caches, stick to the 68000 until you change your mind :/. Some early versions of 68060.library may not have flushed all the caches properly, may not have worked around the superscalar bugs in the 68060 properly, or may have had bugs in the CPU support code used for trapping. The best ones matured and work fine. Fusion works fine on the 68060 here except for an occasional random crash. The last ShapeShifter was more stable though. That was using the last version of Fusion, which I bought in your Fusion/PCx CD bundle. Fusion had some nice features over ShapeShifter, like the file transfer and automatic screen mode changes from within the Mac, but stability is more important. I would still use Fusion if it were more stable and supported more hard drive options, which ShapeShifter is better at.
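Why the flush is needed can be shown with a toy model. The sketch below is a deliberately simplified instruction cache with no coherency at all, so it is not how the 68060 is actually built; the class name `ToyICache` and the string "instructions" are made up for illustration. Writes go straight to memory and the cache keeps serving its old copy until `flush()` is called, which is the role exec.library's CacheClearU() plays on AmigaOS.

```python
class ToyICache:
    """Instruction cache that never notices writes to memory (hypothetical)."""

    def __init__(self, memory):
        self.memory = memory  # backing store: address -> instruction
        self.lines = {}       # cached copies: address -> instruction

    def fetch(self, addr):
        # On a miss, copy the instruction from memory into the cache.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def flush(self):
        # Throw away every cached copy; the next fetch goes to memory.
        self.lines.clear()


memory = {0x100: "bra old_target"}
icache = ToyICache(memory)

first = icache.fetch(0x100)         # fills the cache
memory[0x100] = "bra new_target"    # self-modifying write via the data path
stale = icache.fetch(0x100)         # still the old instruction
icache.flush()
fresh = icache.fetch(0x100)         # now the patched instruction
print(first, "|", stale, "|", fresh)
```

The cost mentioned above is visible in the model too: after `flush()` every fetch is a miss again, so code that patches itself often spends most of its time refilling the cache.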
The Natami FPGA CPU was going to use write-through caching with snooping and automatic flushing of detected dirty cache lines. This is a good option that allows very large caches with excellent compatibility. It would also be possible to auto-flush the branch cache in the address range of the dirty lines detected by snooping. With the faster memory and larger caches of today, this should give cache performance close to that of the 68060 with better tolerance for self-modifying code.
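A rough sketch of that snooping idea, under my own assumptions: this is not the Natami design, and the class name, the 16-byte line size and the dictionary-based branch cache are all invented for illustration. Every write goes through to memory, and the write path also drops the matching instruction cache line plus any branch cache entries in that line's address range.

```python
class SnoopingICache:
    """Write-through memory with a snooped instruction and branch cache."""

    LINE = 16  # assumed cache line size in bytes

    def __init__(self, memory):
        self.memory = memory    # bytearray backing store
        self.lines = {}         # line base address -> cached bytes
        self.branch_cache = {}  # branch address -> predicted target

    def fetch(self, addr):
        base = addr - addr % self.LINE
        if base not in self.lines:
            # Miss: fill the whole line from memory.
            self.lines[base] = bytes(self.memory[base:base + self.LINE])
        return self.lines[base][addr - base]

    def write(self, addr, value):
        self.memory[addr] = value        # write-through to memory
        base = addr - addr % self.LINE
        self.lines.pop(base, None)       # snoop hit: invalidate the line
        # Also drop branch predictions that live inside the written line.
        stale = [b for b in self.branch_cache if base <= b < base + self.LINE]
        for b in stale:
            del self.branch_cache[b]


mem = bytearray(64)
mem[0x12] = 0x60
cache = SnoopingICache(mem)
before = cache.fetch(0x12)
cache.branch_cache[0x12] = 0x30   # prediction inside the line being patched
cache.branch_cache[0x22] = 0x40   # prediction in a different line
cache.write(0x12, 0x4E)           # self-modifying write, no manual flush
after = cache.fetch(0x12)
print(hex(before), hex(after), sorted(cache.branch_cache))
```

Only the touched line is refilled, so the rest of the cache stays warm, which is the compatibility and performance trade-off the paragraph above is describing.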
The 060 really only adds dual instruction pipelining and a 4-way cache. A higher speed (100MHz+) 040 core would probably be better in the long run, especially if it handled floating point without completely stalling the core like the 060 does.
The MC68060UM says:
"The MC68060 allows simultaneous execution of two integer instructions (or an integer and a float instruction) and one branch instruction during each clock."
"The MC68060's FPU operates in parallel with the integer unit. The FPU performs numeric calculations while the integer unit continues integer processing."
The 68060 FPU was a nice improvement over the 68040 FPU. It dropped a few 040 FPU instructions that were very rarely used and added back the FINT and FINTRZ instructions which compilers use commonly. The execution speeds were also improved across the board and more parallel operation is possible. The 040 can do some limited parallel operation also.
The 68060 is a great processor which does a lot of parallel work, but it's not easy to make and it's probably not as easy to make in an FPGA. A faster-clocked, more 68040-like CPU makes sense in an FPGA. Bigger caches, a branch cache and more parallel operation are needed for maximizing performance though.