Topic: ColdFire V4 Amiga accelerator project from the m68k emulation perspective

Offline Piru (Topic starter)

  • \' union select name,pwd--
  • Hero Member
  • *****
  • Join Date: Aug 2002
  • Posts: 6946
    • http://www.iki.fi/sintonen/
Original & revised document is available at http://www.iki.fi/sintonen/coldfire-v4-m68k.txt v1.0.2   28-Jun-2003

 ColdFire V4 Amiga accelerator project from the m68k emulation perspective
                                 by
                   Harry Sintonen


MCF5407's V4 ColdFire core adds Revision B of the instruction set. At first
glance it looks like it adds many improvements especially suited to m68k
emulation. However, closer examination reveals the grim truth: these
additions help only a little, and more likely not at all.


Examining CFPRM.pdf(1), MCF5407UM.pdf(2) and M68000PRM.pdf(3) reveals that
a lot of m68k features are missing.



V4 COLDFIRE CORE REVISION B INSTRUCTION SET IMPROVEMENTS


The improvements in Revision B of the instruction set for m68k
compatibility are:

long offset for Bcc/BRA/BSR
byte/word sizes for CMP, CMPI
word size for CMPA
byte/word sizes for MOVE.[BW] #,d16(Ax)



MISSING ADDRESSING MODES


Address Register Indirect with Index
  Base Displacement  (bd,An,Xn)

Memory Indirect
  postindexed (od,[bd,Ax,Xn])
  preindexed  ([bd,Ax,Xn],od)

Program Counter Indirect with Index
  Base Displacement  (bd,PC,Xn)

Program Counter Memory Indirect
  postindexed (od,[bd,PC,Xn])
  preindexed  ([bd,PC,Xn],od)



MISSING SIZES FOR OPCODES


add .b .w
adda .w
addi .b .w
addq .b .w
addx .b .w
and .b .w
andi .b .w
asl .b .w
asr .b .w
lsr .b .w
lsl .b .w
eor .b .w
eori .b .w
movem .w
neg .b .w
negx .b .w
not .b .w
or .b .w
ori .b .w
sub .b .w
suba .w
subi .b .w
subq .b .w
subx .b .w
[probably missed some]
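
As an illustration of what emulating a missing size involves, here is a
minimal Python sketch of ADD.B carried out on full-width values only. The
name add_b and the flag dictionary are made up for this example; the flag
rules follow the usual m68k ADD definition. It models the semantics and is
not ColdFire code.

```python
def add_b(dst, src):
    """Model of m68k ADD.B src,dst using 32-bit register values.

    Returns (new_dst, flags). Only the low byte of dst changes;
    the upper 24 bits are preserved, as on a real m68k.
    """
    a = dst & 0xFF
    b = src & 0xFF
    total = a + b
    res = total & 0xFF
    flags = {
        "N": (res >> 7) & 1,                            # sign of the byte result
        "Z": int(res == 0),                             # byte result is zero
        "V": int(((a ^ res) & (b ^ res) & 0x80) != 0),  # signed overflow
        "C": (total >> 8) & 1,                          # carry out of bit 7
    }
    flags["X"] = flags["C"]                             # X mirrors C for ADD
    new_dst = (dst & 0xFFFFFF00) | res
    return new_dst, flags
```

Even this simple case needs masking, merging and separate flag
computation, which is why the .b and .w forms are expensive to emulate.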



COMPLETELY MISSING OPCODES


BCD missing (binary coded decimal, virtually unused)
bit field (bf*, rarely used, 020 XPK SQSH)
logical rotate (rol/ror roxl/roxr)
DBcc, (dbra/dbf, dbne, dbeq etc, used a lot)
mulu/muls 32*32->64 (used a lot)
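
To give an idea of the work behind the missing 32*32->64 multiply, here is
a Python sketch that builds the 64-bit unsigned product from four 16-bit
partial products. mul32 is a stand-in for a multiply that keeps only the
low 32 bits of the result (an assumption about what the host offers);
mulu_64 is a hypothetical helper name.

```python
MASK32 = 0xFFFFFFFF

def mul32(a, b):
    """Stand-in for a 32x32 multiply that keeps only the low 32 bits."""
    return (a * b) & MASK32

def mulu_64(a, b):
    """Unsigned 32x32->64 multiply; returns (high32, low32)."""
    a_lo, a_hi = a & 0xFFFF, (a >> 16) & 0xFFFF
    b_lo, b_hi = b & 0xFFFF, (b >> 16) & 0xFFFF
    # Each 16x16 partial product fits in 32 bits.
    ll = mul32(a_lo, b_lo)
    lh = mul32(a_lo, b_hi)
    hl = mul32(a_hi, b_lo)
    hh = mul32(a_hi, b_hi)
    # Sum the middle column (bits 16..31) and keep its carry.
    mid = (ll >> 16) + (lh & 0xFFFF) + (hl & 0xFFFF)
    lo = ((mid & 0xFFFF) << 16) | (ll & 0xFFFF)
    hi = (hh + (lh >> 16) + (hl >> 16) + (mid >> 16)) & MASK32
    return hi, lo
```

A signed muls would need sign corrections on top of this, which the sketch
leaves out.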



CONCLUSION


All the missing addressing modes, opcodes and sizes for opcodes need to be
emulated. Common math operations such as add, and, asl, eor, neg, not, or
and sub in sizes .b and .w will be encountered especially often. rol and
ror are also common and will need to be emulated. DBcc is common in loops
and needs to be emulated as well.
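
For reference, the behaviour that has to be reproduced for two of these
can be sketched in Python as follows. rol_l and dbcc are hypothetical
helper names; the semantics are the standard m68k ones (ROL sets C to the
last bit rotated out, DBcc decrements only the low word of Dn and falls
through when it reaches -1).

```python
MASK32 = 0xFFFFFFFF

def rol_l(value, count):
    """Model of ROL.L built from shifts and OR. Returns (result, carry)."""
    value &= MASK32
    if count == 0:
        return value, 0                  # C is cleared for a zero count
    n = count % 32
    if n:
        value = ((value << n) | (value >> (32 - n))) & MASK32
    return value, value & 1              # last bit rotated out ends up in bit 0

def dbcc(cond_true, dn):
    """Model of DBcc Dn,<label>. Returns (new_dn, branch_taken)."""
    if cond_true:
        return dn, False                 # condition met: fall through
    low = ((dn & 0xFFFF) - 1) & 0xFFFF   # only the low word counts
    new_dn = (dn & 0xFFFF0000) | low
    return new_dn, low != 0xFFFF         # loop until the word reaches -1
```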

My conclusion is that MCF5407 with the V4 ColdFire core will be unbearably
slow when executing average m68k code typically found in the Amiga
environment. I'd estimate that the overall performance drops below MC68060
@ 50MHz performance.

The situation can be improved by replacing slow programs and OS components
with ColdFire-optimized code, or by doing just-in-time recompilation.

Another possibility is to use the CyberGuard/OxyPatcher method: replace the
emulated opcodes with 'jsr (abs).w' calls into the low 64K of memory, which
will contain the emulation routines (or a JMP to the actual code, if 64K is
not enough to hold all the emulation code).
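
A rough Python sketch of the patching step, working on a list of 16-bit
words. The table layout, the stub address and the function name are
assumptions made for this example. 0x4EB8 is the opcode word for
JSR (xxx).W; the pair 0x4C01 0x3402 is meant to be mulu.l d1,d2:d3, but
treat that encoding as an assumption and check it against
M68000PRM.pdf(3). The CPU sign-extends the 16-bit address, so the sketch
only accepts stub addresses below 0x8000.

```python
JSR_ABS_W = 0x4EB8  # opcode word for JSR (xxx).W

def patch_words(code, stubs):
    """Replace known two-word instructions with JSR (abs).W in place.

    code  : list of 16-bit words (modified in place)
    stubs : dict mapping (word0, word1) -> address of emulation routine
    Returns the number of patches applied.

    This naive scan does not decode instruction boundaries, so it could
    mistake data or extension words for opcodes; a real patcher has to
    be far more careful.
    """
    patched = 0
    i = 0
    while i + 1 < len(code):
        addr = stubs.get((code[i], code[i + 1]))
        if addr is None:
            i += 1
            continue
        if not 0 <= addr < 0x8000:
            raise ValueError("stub address not reachable with (abs).w")
        code[i] = JSR_ABS_W
        code[i + 1] = addr
        patched += 1
        i += 2
    return patched
```

Only instructions that are at least four bytes long can be replaced this
way, since the JSR itself takes two words.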

However, with such a method new problems arise:
- ROM code cannot be patched (can be worked around with an MMU ROM mirror)
- Some software checksums its own code (virus checkers, Elbox drivers).
  This problem can perhaps be worked around with separate virtual memory
  spaces for code and data. However, I didn't look closely enough at
  ColdFire to see if this is possible.



REFERENCES


1) CFPRM.pdf     ColdFire Family Programmer's Reference Manual

2) MCF5407UM.pdf MCF5407 ColdFire(r) Integrated Microprocessor User's Manual

3) M68000PRM.pdf Motorola M68000 Family Programmer's Reference Manual


--- edit ---
Updated to 1.0.2
 

Offline Piru (Topic starter)

Sorry for the double thread post, but I couldn't figure out how to rename the first thread.
Moderators: Please delete the dupe without topic.
Suggestion: Make it impossible to post a thread without topic. :-)

--edit--
Ok the bogus thread is gone now. Good.
 

Offline csirac_

  • Full Member
  • ***
  • Join Date: Feb 2002
  • Posts: 154
Quote
My conclusion is that MCF5407 with the V4 ColdFire core will be unbearably
slow when executing average m68k code typically found in the Amiga
environment. I'd estimate that the overall performance drops below MC68060
@ 50MHz performance.


I guess it depends on how the emulation is done. I'm guessing it probably (?) executes the code natively and the emulator takes the form of an interrupt service routine for the "unimplemented instruction" trap; from there, it performs the equivalent sequence of instructions and then returns.

I really think no one can tell how much it will slow things down. Perhaps a lot, if there are emulated instructions inside inner loops... hmm. It's surely not a show stopper, for now :-) That's what the first prototype is: proof of concept. I hope everything goes well.

- Paul
 

Offline JoannaK

  • Hero Member
  • *****
  • Join Date: Dec 2002
  • Posts: 757
The problem with the emulation trap idea is that it causes a LOT of slowdown. Each instruction that doesn't exist raises an exception, and that causes a switch from user mode to supervisor mode... For exception handling, lots of registers must be saved and restored, and there needs to be logic to decode all the combinations in software. This kind of bouncing around will cause a lot of slowdown due to cache thrashing.
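
To put rough numbers on that, here is a back-of-the-envelope model in Python. The function names and every figure in the usage note are made-up assumptions for illustration, not measurements of any real card.

```python
def average_cycles(fraction_trapped, native_cycles, trap_cycles):
    """Average cost per instruction when a fraction of them must trap."""
    if not 0.0 <= fraction_trapped <= 1.0:
        raise ValueError("fraction_trapped must be between 0 and 1")
    return ((1.0 - fraction_trapped) * native_cycles
            + fraction_trapped * trap_cycles)

def slowdown(fraction_trapped, native_cycles, trap_cycles):
    """How many times slower the code runs compared with no traps at all."""
    return average_cycles(fraction_trapped, native_cycles, trap_cycles) / native_cycles
```

With purely hypothetical inputs of 5% trapped instructions, 1 cycle per native instruction and 100 cycles per trap, slowdown(0.05, 1.0, 100.0) comes out at about 5.95, so even a small share of trapped instructions dominates the run time.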

But.. in the end.. the slowdown is almost impossible to estimate from papers alone.. It'll need a first working version of the code before we see how much it affects things.
 

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16879
  • Country: gb
  • Thanked: 5 times
I'd have to agree with the basic notion that existing methods for executing the code would be simply too slow.

Trapping instructions would be a killer - real 68060 cards that shipped with poor 68060.library TRAP based emulation often turned in slower performance than 030s for some code (especially stuff making heavy use of 32-bit muls/divs).

As detailed in the report, patching the unimplemented opcodes with jsr instructions is an option also.

However, there are a lot of opcodes to patch (unlike for 040/060) so the indirection could cause a slowdown too (not as bad as TRAP though).

My guess would be the best way forward would be to do what everybody else does to run 68K code these days and create a JIT emulation.

The JIT for a ColdFire CPU could be a bit less complex than for, e.g., x86 and could still make direct use of most instructions, giving a decent transcription ratio.

I'd imagine that in many cases the transcription would be little more than expanding the unimplemented 68K code inline and keeping track of PC-relative references in the current block as the transcribed code lengthens in relation to the original...

That could be pretty swift :-)
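
A minimal Python sketch of that bookkeeping, assuming a block is given as a list of (mnemonic, size in bytes, branch target index or None). The data layout, the function name and the 10-byte expansion in the usage note are invented for illustration. The one hard fact used is that m68k branch displacements are relative to the branch address plus 2.

```python
def relocate(block, expanded_size):
    """Recompute offsets and branch displacements after inline expansion.

    block         : list of (mnemonic, size, target_index or None)
    expanded_size : dict mnemonic -> size in bytes after expansion;
                    instructions not listed keep their original size
    Returns a list of (mnemonic, new_offset, new_displacement or None).
    """
    # First pass: where does each original instruction start now?
    new_offsets = []
    pos = 0
    for mnemonic, size, _target in block:
        new_offsets.append(pos)
        pos += expanded_size.get(mnemonic, size)
    # Second pass: fix up branches against the new layout.
    out = []
    for index, (mnemonic, _size, target) in enumerate(block):
        disp = None
        if target is not None:
            disp = new_offsets[target] - (new_offsets[index] + 2)
        out.append((mnemonic, new_offsets[index], disp))
    return out
```

For a loop of move.l, rol.w, subq.l and a bne back to the start (2 bytes each), the original bne displacement is -8; if rol.w grows to a hypothetical 10 bytes, the bne moves to offset 14 and its displacement becomes -16. A real translator would also have to widen branches whose displacement no longer fits, which this sketch ignores.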
 

Offline jdiffend

  • Sr. Member
  • ****
  • Join Date: Apr 2002
  • Posts: 302
Does anyone actually know how many times this has already been covered?
FWIW, Motorola's own estimates of full CPU emulation, based on millions of lines of existing code, are much better than what is mentioned here.
BTW, the 4e core (which Motorola has been talking about with developers a lot) will help a little too.
 

Offline KennyR

  • Hero Member
  • *****
  • Join Date: Mar 2002
  • Posts: 8081
    • http://wrongpla.net
We've been through it before but I doubt in so much depth. There is a huge question on whether the coldfire would be usable. Piru is no beginner and he is most often right when it comes to these things.

And I wouldn't trust MC's word on this, btw - they are trying to sell the things, after all.
 

Offline Karlos

Quote

jdiffend wrote:
FWIW, Motorola's own estimates of full cpu emulation based on millions of lines of existing code is much better than what is mentioned here.
BTW, the 4e core (which Motorola has been talking about with developers a lot) will help a little too.


Well, they're targeting the embedded applications market with ColdFire. Much of the development for that will use specially adapted compilers / assemblers etc., generating code for specific applications. So full compatibility with existing 680x0 stuff isn't seen as that much of an issue.
If it were, the (d8, An, Xn) addressing modes would definitely not have been dropped, since a lot of code uses them, especially on 020+.

Running unmodified code designed for a full 680x0 computer system is not quite what Motorola intends for these devices, and it would require a fair bit of work to emulate efficiently.
 

Offline jdiffend

Quote
We've been through it before but I doubt in so much depth

LOL, uh... how about a couple years ago before the last major hack of this site... like right after the V4 came out.
 

Offline jdiffend

Quote
Running unmodified code designed for a full 680x0 computer system is not quite the same thing as Motorola intend for these devices and would require a fair bit of work to emulate efficiently.

The entire purpose of the emulation is for running unmodified code.  They wanted companies that had invested a lot of money in 68K development not to feel like they would have to start over.
 

Offline KennyR

Quote
We've been through it before but I doubt in so much depth

LOL, uh... how about a couple years ago before the last major hack of this site... like right after the V4 came out.


I'm pretty sure the missing CPU instructions weren't listed in such detail, which is what I mean by "depth".

Besides which, you're still treating the Amiga as if it will work as well under Coldfire as an embedded system would, which it probably won't.
 

Offline jdiffend

I know the info was posted because I posted it.  I copied it straight from some of the docs on the emulation and conversion tools.  That was a looooong thread mostly made up of my posts just passing on what I'd found out.  I also pointed out which of these instructions don't even work on the 060.

Any instruction that can be emulated will slow down the system but it will run.  Instructions that can't be emulated are the ones that are a problem.  

The Coldfire can't emulate some of the instructions due to the way the pipeline executes instructions.  By the time the Coldfire discovers the illegal address mode it has already modified  the program counter which makes it impossible to decode the original instruction to emulate it.  The other ones that can't be emulated are legal instructions with different behavior.

The complex address modes would result in an illegal instruction guru.  The incompatible math instructions would result in altered program behavior.  I suggested patching code that doesn't run, with illegal instructions that could be trapped.  That would let most code that couldn't be recompiled run.  The code would still run on an old cpu with an interrupt handler (similar to decigel) that patches the instructions back to their original form.  From then on they would run full speed.

Motorola's estimates for the slowdown from emulating the 68K were between 10% and 20% for user mode code.  It wasn't mentioned, but I'd guess you could expect up to a 50% slowdown when emulating both supervisor and user code with a V4 Coldfire.  That's still considerably faster than an 060.

Now, with the intro of the V4e core you'll see higher clock speeds, better compatibility (supervisor stack pointer and more address modes) and faster speed at the same MHz.  That means that even with a 50% slowdown it would still stomp on an 060.  

Oli doesn't seem to think the exec needs a rewrite to get it to boot.  If you look at my comments in other threads you'll see I don't criticize Oli's efforts but I do point out that it isn't working yet and sometimes that I don't think his emulation idea will work.

I say the exec needs to be rewritten because the supervisor stack is different on the Coldfire, the V4 (and below) has no supervisor stack pointer, the emulation interrupt handler really needs to be part of the exec's interrupt handler and a coldfire compatible exec would be much faster.  

BTW, I also posted that I thought we'd have to wait for the V4e core for it to be practical to make a Coldfire Amiga.
 

Offline jdiffend

Quote
The complex address modes would result in an illegal instruction guru.

Actually, it's been a while since I looked at the guru docs or code so it may be something else.

BTW, one of the problems I saw while working on the ROM disassembly was just what bits do you set to identify a Coldfire CPU?  I know how to detect it but after I do... then what?  Programs need to be able to identify it and there is no standard.
 

Offline Piru (Topic starter)

Quote
BTW, one of the problems I saw while working on the ROM disassembly was just what bits do you set to identify a Coldfire CPU? I know how to detect it but after I do... then what? Programs need to be able to identify it and there is no standard.

Excuse me?
Of course it would set the SysBase AttnFlags to identify the CPU/FPU it is emulating.

No program needs to know that it is really running on a ColdFire and not a real m68k CPU.

identify.library is open source nowadays, it could be easily expanded to report ColdFire CPU. That's how system information programs would know about the CPU.

However, this is all theoretical until the card is actually working.
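
For readers who haven't looked at AttnFlags: the field is a bit mask, and a program typically takes the highest CPU bit that is set. Here is a small Python sketch of that decoding. The bit positions are the ones from exec/execbase.h as far as I recall them (AFB_68010 = 0 up to AFB_68060 = 7); describe_attn_flags is a hypothetical helper and not part of any library.

```python
# AttnFlags bit masks (positions as defined in exec/execbase.h)
AFF_68010 = 1 << 0
AFF_68020 = 1 << 1
AFF_68030 = 1 << 2
AFF_68040 = 1 << 3
AFF_68881 = 1 << 4
AFF_68882 = 1 << 5
AFF_FPU40 = 1 << 6
AFF_68060 = 1 << 7

def describe_attn_flags(flags):
    """Return (cpu, fpu) names for an AttnFlags value."""
    cpu = "68000"
    # Listed in ascending order so the highest set bit wins.
    for bit, name in ((AFF_68010, "68010"), (AFF_68020, "68020"),
                      (AFF_68030, "68030"), (AFF_68040, "68040"),
                      (AFF_68060, "68060")):
        if flags & bit:
            cpu = name
    fpu = "none"
    for bit, name in ((AFF_68881, "68881"), (AFF_68882, "68882"),
                      (AFF_FPU40, "FPU40")):
        if flags & bit:
            fpu = name
    return cpu, fpu
```

An emulation layer would simply set the bits of whichever CPU it pretends to be, for example AFF_68010 | AFF_68020 for a plain 68020 without FPU.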
 

Offline jdiffend

Quote
Excuse me?
Of course it would set the SysBase AttnFlags to identify the CPU/FPU it is emulating.


It's not quite that simple.

Let's see... you build the emulation for 020 since it will emulate the most instructions that way and give you the most compatibility... and then a program thinks it's running on an 020 and decides to use some of the 020 address modes that can't be emulated.

68000?  Some software won't like that and it won't be as fast.

For the best speed, setting bits for 060 would work unless the software tries to detect the CPU.  So you drop the extra compatibility for that reason. But there's no FPU or MMU emulation so that's still a problem.  Ummm... does the emulation code even offer 060 compatibility?  It didn't use to.  We could do our own by removing instructions from the 020.

Or go with the ec040.  It wouldn't require as many mods to the code as for the 060.
A Coldfire aware program still can't tell which optimized c2p routine or math functions would be best to use without detecting the CPU itself which defeats the purpose of the AttnFlags in the first place.

Then enters the V4e or higher core which has FPU and MMU... but they aren't quite the same as 68060 versions.

The point was making it possible to detect a coldfire without every program having to detect it on their own and still having existing software identify it as the cpu it's emulating.