Author Topic: AROS68K and the Freescale Coldfire CPU (Read 25903 times)

vidarh · « **on:** January 16, 2011, 07:31:57 PM »

Quote from: WolfToTheMoon;607070

I'd be happy with 68040 performance.

In that case an FPGA Arcade Replay board seems like it'd be a better bet. Last time Yaqube posted an update he got SysInfo results of 0.54 x A4000 w/ 25MHz 68040 - I'm sure that can be improved on further with more work and/or with a rev.2 of the board with a faster FPGA or other changes such as a faster memory subsystem...

But by all means, go ahead and try Coldfire too; it'd be interesting to see the results with AROS, given that far less of the code would need trapping/emulation than with unmodified AmigaOS.

vidarh · « **Reply #1 on:** January 20, 2011, 03:43:11 PM »

Quote from: Iggy;607885

The problem with CF68Klib, if you check the documentation, is that even in supervisor mode it doesn't trap instructions that run differently on the Coldfire than they do on the 68K.
I'm sure there's a way around this, but while CF68Klib looks promising, it may not be the only answer.

The best bet is probably something like a simple tracing JIT translator where most instructions translate 1-to-1. You could do it mostly "in place": scan block by block, rewrite any offending instructions, and replace Bcc, JSR, JMP etc. with traps if you can't show they point to "safe" (already JIT'ed) code. Then you jump to the code. If/when you trap again, you continue JIT'ing and patch the instruction that brought you there back to its original.

As an example, let's assume this completely bogus sequence:

MOVE.L D0,-(SP)
SOME_BROKEN_INSTRUCTION
RTS

You'd decode all three instructions, then rewrite it like this:

MOVE.L D0,-(SP)
BSR some_free_location (unless SOME_BROKEN_INSTRUCTION is long enough for you to be able to "emulate" it inline, in which case you have it easy)
RTS

some_free_location:
[code to emulate SOME_BROKEN_INSTRUCTION]
RTS

The biggest problem is if SOME_BROKEN_INSTRUCTION is too short to provide enough space to branch elsewhere, in which case you have two choices: resort to forcing a trap, or rewrite the following instructions too. The latter quickly makes things trickier, as you then have to deal with rewriting branches etc. that may point to the later instructions that you move.
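To make the patching step a bit more concrete, here's a rough Python sketch that works on a byte buffer rather than real memory. The function name `patch_with_bsr`, the `FFFFFFFF` placeholder bytes standing in for SOME_BROKEN_INSTRUCTION, and the stub address `0x100` are all made up for illustration. It assumes the buffer starts at address 0, only uses the 4-byte BSR.W form (opcode word `0x6100` followed by a 16-bit displacement relative to the opcode address + 2), and pads any leftover space with NOPs. Check the encodings against the programmer's reference manual before relying on them.

```python
import struct

BSR_W = 0x6100  # BSR opcode word; a 16-bit displacement follows in the next word
NOP = 0x4E71    # used to pad the rest of a longer instruction


def patch_with_bsr(code, offset, length, stub_addr):
    """Overwrite the instruction at code[offset:offset+length] with BSR.W stub_addr.

    Returns False when the patch is not possible, in which case the caller
    has to fall back to a trap or to rewriting the following instructions.
    Assumes the buffer is loaded at address 0, so offsets are addresses.
    """
    if length < 4:
        return False  # no room for the 4-byte BSR.W
    disp = stub_addr - (offset + 2)  # displacement is relative to opcode address + 2
    if not -0x8000 <= disp <= 0x7FFF:
        return False  # stub is out of reach of a 16-bit displacement
    code[offset:offset + 4] = struct.pack(">Hh", BSR_W, disp)
    for pos in range(offset + 4, offset + length, 2):
        code[pos:pos + 2] = struct.pack(">H", NOP)
    return True


# MOVE.L D0,-(SP) ; 4 placeholder bytes for the broken instruction ; RTS
block = bytearray.fromhex("2F00" "FFFFFFFF" "4E75")
patched = patch_with_bsr(block, offset=2, length=4, stub_addr=0x100)
```

With a 4-byte broken instruction the BSR.W fits exactly; pass `length=2` and the function returns `False`, which is the "too short" case where you're stuck with a trap or with moving the following instructions.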

You pay the additional cost of the JIT process, but once critical paths have been JIT'd, it'll run at near optimal native speed. "Near" because you get the extra overhead of potentially having extra branches to account for "patch sites" where there was no space in the original code to plug in the modified instructions, unless you go to the potentially significant extra trouble of rewriting the whole thing.

Note that this is not foolproof. Self-modifying code etc., or code that intentionally jumps into the middle of an instruction, could still cause trouble and is much harder to deal with.

The JIT could be very fast, as for any instructions deemed "safe" it'd just need to recognize them and move on to the next instruction, and recognizing them could be done with a very small, compact decoder since it could discard large groups of instructions as safe with a few simple bit masks.
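As a rough illustration of the bit-mask idea, here's a small Python sketch that checks the first opcode word against a table of (mask, value) pairs. The table is deliberately tiny and only covers DBcc and ROL/ROR, two of the cases that came up in this thread; the bit patterns are my reading of the 68000 encoding and should be checked against the manual. The names `UNSAFE_PATTERNS` and `classify` are made up, and treating everything not in the table as "safe" is a simplification for the sketch, not something a real translator could get away with.

```python
# (mask, value, name): an opcode word matches when (word & mask) == value.
# Illustrative subset only; a real table would be built from the CPU manuals.
UNSAFE_PATTERNS = [
    (0xF0F8, 0x50C8, "DBcc"),
    # Register shifts/rotates are 1110 ccc d ss i tt rrr; tt == 11 is ROL/ROR.
    # One entry per operand size (ss = 00, 01, 10); ss == 11 is the memory form.
    (0xF0D8, 0xE018, "ROL/ROR (register)"),
    (0xF0D8, 0xE058, "ROL/ROR (register)"),
    (0xF0D8, 0xE098, "ROL/ROR (register)"),
    (0xFEC0, 0xE6C0, "ROL/ROR (memory)"),
]


def classify(opword):
    """Return the name of the matching unsafe pattern, or None if none matches."""
    for mask, value, name in UNSAFE_PATTERNS:
        if opword & mask == value:
            return name
    return None


for word in (0x2F00, 0x51C8, 0xE398, 0x4E75):
    print(f"{word:04X}: {classify(word) or 'no match, treated as safe here'}")
```

The point is just that each check is a single AND plus a compare, so scanning past the common "safe" instructions costs very little.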

vidarh · « **Reply #2 on:** January 20, 2011, 09:32:44 PM »

Quote from: Iggy;607914

Having seen what the MorphOS team managed to do with JIT for 68K code running on PPCs I'd have to say you've got a point. Since many of the instructions would not require modification, this should work fairly quickly.

Exactly - for m68k on PPC it's a massive amount of work, and even more work to get it fast. For m68k on ColdFire a lot of it should be no-ops (though reading up on it, there *is* a lot of stuff that's common in Amiga code that will need JIT'ing - such as ROR/ROL, arithmetic on bytes or words, DBcc etc.). Just map the code and figure out if each instruction is "safe". If so, decode enough to know the length of the instruction and skip it; if not, check if it's one that can be replaced in-line and patch it, or worst case patch in a branch and use a small-ish set of functions to generate the appropriate replacements...

If you want to be fancy, you can later deal with relocating jumps etc. and so inline all the modifications (which would even let you, if you wanted to, do "proper" tracing the way modern JavaScript JITs do, and take advantage of the larger cache on the ColdFire to trace longer instruction streams, unroll loops and auto-inline other functions).
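The per-instruction decision could look something like the Python sketch below. Everything here is hypothetical naming (`Action`, `choose_action`, `BRANCH_SIZE`), and it assumes a 4-byte branch as the smallest patch-in; the trap fallback is the "too short to branch" case from my earlier post. It's only meant to show the order of the checks, not how a real translator would be structured.

```python
from enum import Enum


class Action(Enum):
    SKIP = "skip"                # safe instruction: just step over it
    INLINE = "patch inline"      # replacement fits in the original space
    BRANCH = "patch in a branch" # replacement lives elsewhere
    TRAP = "force a trap"        # not even room for a branch


BRANCH_SIZE = 4  # assumed size in bytes of the branch used to reach the replacement


def choose_action(is_safe, orig_len, replacement_len):
    """Pick how to handle one instruction, cheapest option first."""
    if is_safe:
        return Action.SKIP
    if replacement_len <= orig_len:
        return Action.INLINE
    if orig_len >= BRANCH_SIZE:
        return Action.BRANCH
    return Action.TRAP


print(choose_action(is_safe=False, orig_len=2, replacement_len=6))
```

For a 2-byte unsafe instruction whose replacement needs 6 bytes, neither the inline patch nor a 4-byte branch fits, so this falls through to the trap.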
