Author Topic: Is the Coldfire project dead? (Read 9294 times)

lordv · « **Reply #29 on:** November 21, 2004, 12:50:23 PM »

Quote

Actually, a JIT mechanism for 68K on coldfire need not be more complex than HP's Dynamo mechanism. That is effectively a hotspot JIT emulating the very same CPU it is running on, and it gets faster performance due to various runtime optimisations that aren't possible to make at compile time.

Like Dynamo, a JIT 680x0 on coldfire implementation has the benefit that most of the instructions would need no translation - you only need to worry about branch offsets that change as a result of expanding some code "inline", like your unimplemented 680x0 instructions.

Don't know anything 'bout dynamo (maybe some urls?), but with coldfire it's not that easy.

Just imagine - you have all regs filled with data and you must JIT-emulate add.w d1,d2. No stack pushes/pops are allowed by definition, and the flags must also be set correctly. The same goes, to an even greater extent, for mulu.l d1,d2:d3 or something similar.
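To make the flag requirement concrete, here is a rough Python model of what an emulated add.w d1,d2 has to produce. It is only a sketch of the 68k semantics (low word updated, X/N/Z/V/C set), not code for any real emulator; the name add_w and the dict of flags are made up for illustration.

```python
MASK16 = 0xFFFF

def add_w(src, dst):
    """Model 68k `add.w src,dst` on 32-bit register values.

    Returns (new_dst, flags); flags holds the X, N, Z, V, C bits.
    """
    s = src & MASK16
    d = dst & MASK16
    total = s + d
    r = total & MASK16
    carry = total > MASK16
    # Signed overflow: both operands share a sign that the result does not.
    overflow = bool(~(s ^ d) & (s ^ r) & 0x8000)
    flags = {
        "X": carry,
        "N": bool(r & 0x8000),
        "Z": r == 0,
        "V": overflow,
        "C": carry,
    }
    # Only the low word of the destination register changes.
    new_dst = (dst & 0xFFFF0000) | r
    return new_dst, flags
```

Whatever native sequence the JIT emits has to end up with the same register and flag state, and it has to get there without disturbing any other register.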

That's why I keep bringing up a G4/G5 JIT emulator as an accelerator for the Amiga. Unfortunately I don't have the knowledge needed to do such a thing myself.

Karlos · « **Reply #30 on:** November 21, 2004, 02:40:59 PM »

Dynamo is a dynamic recompilation engine for the HP PA8000, running on the HP PA8000.

It blows away the arguments against register allocation and so on. It is a hotspot JIT (i.e. mostly interpretation, which in this case translates into just executing the code as is) that simply optimises away loops, conditional branches etc. at runtime.

See here

The concept is equally applicable to any CPU. If you imagine the coldfire as a 68K, it will simply operate by running the code as is until it encounters a trap triggered by an unimplemented instruction. This will invoke the recompilation stage for that part, which can expand inline into a sequence of instructions that achieves the same end and is then saved out into the JIT cache. You will just have to push any registers you clobber at the start of the expansion and pop them at the end (not necessarily to the system stack, but to a reserved area that is part of the JIT engine). You make sure the last thing you pop is the cc/sr (and modify it as required). It really isn't any different to the dynamo case - they are optimising away expensive loops etc., while the coldfire system would be optimising away the unimplemented instructions. 99% of the time you'd just be executing coldfire compatible 68K code; the remainder would be spent actually doing the transcription where needed.
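A toy Python sketch of that translation stage, just to show the shape of it. Instructions are plain strings, the UNSUPPORTED set and the pseudo-ops in expand() are hypothetical placeholders, and nothing here is Dynamo's actual design.

```python
# Hypothetical set of mnemonics the target CPU lacks; illustration only.
UNSUPPORTED = {"add.w", "mulu.l"}

def expand(insn):
    """Stand-in inline expansion: save scratch state, emulate, restore (cc/sr last)."""
    return ["save_scratch", f"emulate<{insn}>", "restore_scratch", "restore_ccr"]

def translate_block(block):
    """Copy supported instructions verbatim and expand the rest inline.

    Returns (translated, index_map); index_map[i] is the new position of
    original instruction i, which is what branch fix-ups would consult.
    """
    out, index_map = [], []
    for insn in block:
        index_map.append(len(out))
        mnemonic = insn.split()[0]
        if mnemonic in UNSUPPORTED:
            out.extend(expand(insn))
        else:
            out.append(insn)
    return out, index_map
```

For example, translate_block(["move.l d0,d1", "add.w d1,d2", "rts"]) leaves the move and the rts alone, replaces the add.w with four pseudo-ops, and reports that the rts moved from position 2 to position 5, which is exactly the branch-offset bookkeeping mentioned earlier in the thread.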

bloodline · « **Reply #31 on:** November 21, 2004, 03:16:30 PM »

But karlos... the coldfire isn't going to trap the 32bit mul, it's just going to return the wrong result.

Also, if one were to go the JIT route... then I would prefer an XScale rather than a G4, as the ARM uses less power and generates less heat and is much cheaper!

Karlos · « **Reply #32 on:** November 21, 2004, 03:33:19 PM »

But Matt, the beauty of the dynamo approach is that you can trap any instruction you want. You don't need to rely on hardware exceptions - all code blocks are analysed before 'interpretation' (which means execution in this case) ;-)

Simply put, dynamo is the only emulation strategy that actually outperforms the CPU it is emulating *on that CPU*! :-D

It runs PA8000 code on a PA8000 up to 25% faster than the PA8000 does by itself. Read the article to see why ;-)

bloodline · « **Reply #33 on:** November 21, 2004, 03:41:39 PM »

Yeah I know about dynamo :-)

I see what you mean, the JIT can catch the erroneous instructions rather than the CPU... but personally... if we are going the JIT route... I wanna use an XScale :-p

Karlos · « **Reply #34 on:** November 21, 2004, 03:43:24 PM »

Fair enough, but *if* Oli's coldfire hardware takes off, you have to admit it would be a killer way to emulate the 680x0 on it :-D

lordv · « **Reply #35 on:** November 21, 2004, 03:54:48 PM »

@Karlos

Quote

Dynamo is a dynamic recompilation engine for the HP PA8000, running on the HP PA8000.
It blows away the arguments against register allocation and so on. It is a hotspot JIT (i.e. mostly interpretation, which in this case translates into just executing the code as is) that simply optimises away loops, conditional branches etc. at runtime.

OK, now it's clear. It works like any other JITter: first it runs the code interpretively, then it takes some fragments, generates new code fragments for them in an isolated buffer, and runs them from there instead of interpreting the original code. In that manner it could work on coldfire as well (for system-wide code fragments). BUT! If you are doing that anyway, it would be more effective to use a G4/G5, because they're significantly faster, have more registers (so the JIT is easier), and, finally, can run warpup/powerup applications natively.

lordv · « **Reply #36 on:** November 21, 2004, 04:00:21 PM »

@bloodline

Quote

But karlos... the coldfire isn't going to trap the 32bit mul, it's just going to return the wrong result.

Even the standard 040/060-style OXYpatcher method isn't applicable to coldfire, because OXYpatcher replaces every unsupported instruction with jsr -xxx.w, which takes 2 words and fits perfectly into any instruction unsupported on those CPUs. But when the unsupported instruction is add.w d1,d2, which is only one word long, the jsr doesn't fit, so nothing except trapping is possible. The mulu.l instructions that go untrapped are even worse: they prevent coldfire from running 68k code directly with trap-based emulation.
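The size argument can be put in a few lines of Python. This is only a sketch of the reasoning, not how OXYpatcher is actually written; the function name, the word labels and the nop padding for longer instructions are assumptions made for illustration.

```python
JSR_WORDS = 2  # the patch: a jsr opcode word plus a 16-bit address word

def patch_in_place(insn_words):
    """Return the patch as a list of words, or None if the jsr cannot fit.

    insn_words is the length of the instruction being replaced, in 16-bit words.
    """
    if insn_words < JSR_WORDS:
        return None  # e.g. a one-word add.w d1,d2: no room for the jsr
    # Pad any leftover words so the following instructions keep their addresses.
    return ["jsr", "addr16"] + ["nop"] * (insn_words - JSR_WORDS)
```

A two-word instruction gets the jsr exactly, while patch_in_place(1) returns None, which is the add.w d1,d2 case.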

Quote

Also, if one were to go the JIT route... then I would prefer an XScale rather than a G4, as the ARM uses less power and generates less heat and is much cheaper!

Does it have altivec-like features? How much faster is it than existing G4/G5 processors? How would you support 'legacy' powerup/warpup applications with it?

Karlos · « **Reply #37 on:** November 21, 2004, 04:08:15 PM »

@lordv

Agreed, but it isn't quite like any old JIT, in that it has a much smaller transcription overhead than other JITs: the majority of any code is simply passed through the translation stage unchanged - in fact, in many cases blocks that contain no unimplemented instructions do not need to be copied at all, simply referenced.
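Roughly, in Python terms (a sketch only: the BlockCache class, the UNSUPPORTED set and the translate callback are hypothetical names, and blocks are just lists of instruction strings):

```python
# Hypothetical set of mnemonics the target CPU lacks; illustration only.
UNSUPPORTED = {"add.w", "mulu.l"}

def needs_translation(block):
    """True if any instruction in the block has an unsupported mnemonic."""
    return any(insn.split()[0] in UNSUPPORTED for insn in block)

class BlockCache:
    """Maps a block's start address to the code that should be run for it."""

    def __init__(self, translate):
        self._translate = translate  # callback that rewrites a block
        self._entries = {}

    def lookup(self, address, block):
        if address not in self._entries:
            if needs_translation(block):
                # Only blocks with unsupported instructions get rewritten.
                self._entries[address] = self._translate(block)
            else:
                # Clean blocks are referenced as they are, not copied.
                self._entries[address] = block
        return self._entries[address]
```

For a clean block, lookup() hands back the very same object it was given, so the only cost is the one-off scan.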

However that is not the issue here. The topic is about the existing coldfire project and the problems it faces - I'm just trying to suggest solutions to those problems.

A G4/G5 card would be great, but then there is absolutely no reason not to use MOS/OS4 on those systems instead since that will be even faster than a pure 680x0 emulation of OS3.x (by virtue of having the native OS / driver resources).

MskoDestny · « **Reply #38 on:** November 21, 2004, 09:40:25 PM »

From skimming the Coldfire docs and reading the information about CF68KLib (the 68K emulation library for Coldfire that traps the exceptions for the missing instructions), I see no reason to believe that the 32 x 32 -> 64 multiply wouldn't generate an exception. It's not like they replaced it with a 32 x 32 -> 32 multiply since that already existed on the 68k and I don't see any evidence (though perhaps I'm looking in the wrong place) to suggest that they remapped the 32 x 32 -> 64 multiplies onto the 32 x 32 -> 32 multiplies (especially since it would do little but screw up attempts to run 680x0 software on the chip while adding to the complexity of the instruction decoding logic). The CF68KLib documentation doesn't suggest that the 32 x 32 -> 64 multiplies are a problem (though there are other problems, like MULU and MULS not setting the overflow bit).

buzz · « **Reply #39 on:** November 21, 2004, 10:05:49 PM »

http://www.microapl.co.uk/Porting/ColdFire/FAQCF68KLib.html

"A lot of code can be run without any changes, since CF68KLib includes handlers to emulate all missing 680x0, CPU32 and CPU32+ instructions. However, there are a few special cases where an instruction behaves slightly differently under ColdFire and which the emulation library cannot automatically correct. The most important of these are:

MULU and MULS instructions executed on ColdFire do not set the overflow flag. Because these are legal ColdFire instructions, with the same opcodes as the 680x0 equivalents, no exception is generated and the CF68KLib handler will therefore not be called. If the original code depends on multiply setting the overflow flag, it will need to be patched or modified to run correctly.
Certain variants of the divide instructions DIVS.L and DIVU.L behave differently under ColdFire.
MOVE.B <ea>,-(A7) and MOVE.B (A7)+,<ea> change the stack pointer by one byte on ColdFire instead of 2 bytes as on the 680x0.
An instruction such as MOVE.L (A7)+,(A0,D0.W) is not legal in ColdFire because word-length displacements are not supported. Although the ColdFire processor will take an exception for this instruction, it does so only after incrementing the stack pointer, and so the exception stack frame overwrites the data to be restored. As a result, it is impossible for CF68KLib to reproduce the correct behavior. "

lordv · « **Reply #40 on:** November 22, 2004, 10:55:37 AM »

Quote

MULU and MULS instructions executed on ColdFire do not set the overflow flag. Because these are legal ColdFire instructions, with the same opcodes as the 680x0 equivalents, no exception is generated and the CF68KLib handler will therefore not be called. If the original code depends on multiply setting the overflow flag, it will need to be patched or modified to run correctly.
Certain variants of the divide instructions DIVS.L and DIVU.L behave differently under ColdFire.

Not quite right. See here for details.

http://www.microapl.co.uk/Porting/ColdFire/Download/CF68KLib.pdf

"The most significant difference between ColdFire and 680x0 is that some of the multiply/divide instructions introduced with the 68020 do not behave the same and do not cause an exception. The following instructions are affected:
MULS.L <ea>,Dh:Dl (Signed multiply: 32x32 -> 64)
MULU.L <ea>,Dh:Dl (Unsigned multiply: 32x32 -> 64)
DIVS.L <ea>,Dr:Dq (Signed divide: 64/32 -> 32r:32q)
DIVSL.L <ea>,Dr:Dq (Signed divide: 32/32 -> 32r:32q)
DIVU.L <ea>,Dr:Dq (Unsigned divide: 64/32 -> 32r:32q)
DIVUL.L <ea>,Dr:Dq (Unsigned divide: 32/32 -> 32r:32q)"

Then go to freescale.com and see CFPRM.pdf (coldfire programmers reference manual). There are NO 64bit mul/divs.
So "they behave differently" means "they just generate rubbish"!
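To show what gets lost, here is a rough Python sketch (not taken from the CF68KLib docs) of the unsigned 32x32 -> 64 multiply built from 16x16 partial products, next to a truncating 32x32 -> 32 multiply. The function names are made up, and this only models the arithmetic that a software replacement would have to do, not what the coldfire actually leaves in the registers.

```python
M32 = 0xFFFFFFFF

def mulu_l_64(a, b):
    """Unsigned 32x32 -> 64 multiply from 16x16 partial products.

    Returns (dh, dl), the high and low 32-bit halves of the product.
    """
    a_lo, a_hi = a & 0xFFFF, (a >> 16) & 0xFFFF
    b_lo, b_hi = b & 0xFFFF, (b >> 16) & 0xFFFF
    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi
    # Middle column: both cross terms plus the carry out of the low product.
    mid = (ll >> 16) + (lh & 0xFFFF) + (hl & 0xFFFF)
    dl = ((mid & 0xFFFF) << 16) | (ll & 0xFFFF)
    dh = (hh + (lh >> 16) + (hl >> 16) + (mid >> 16)) & M32
    return dh, dl

def mulu_l_32(a, b):
    """Truncating 32x32 -> 32 multiply; returns (result, overflowed)."""
    dh, dl = mulu_l_64(a, b)
    # On the 680x0 the overflow flag reports a non-zero high half.
    return dl, dh != 0
```

The second function also covers the overflow-flag point from the FAQ quote: the 680x0 sets V when the high half is non-zero, so code that tests V after a multiply needs that information from somewhere.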

lordv · « **Reply #41 on:** November 22, 2004, 11:07:13 AM »

@Karlos

Quote

Agreed, but it isn't quite like any old JIT, in that it has a much smaller transcription overhead than other JITs: the majority of any code is simply passed through the translation stage unchanged - in fact, in many cases blocks that contain no unimplemented instructions do not need to be copied at all, simply referenced.

But you can't modify the original code, because it can rely on itself in an unpredictable manner! (I think that's obvious!) So either way the main loop is emulation, plus JITting of the frequently used parts.

Quote

A G4/G5 card would be great, but then there is absolutely no reason not to use MOS/OS4 on those systems instead since that will be even faster than a pure 680x0 emulation of OS3.x (by virtue of having the native OS / driver resources).

There IS a reason not to use MOS/OS4. OS3.1 is freely available to you as an Amiga user, while MOS and OS4 were made specifically for certain PPC workstations (wrongly called new 'amigas'). Neither firm will release them for a G4 Amiga accelerator, because that would reduce their sales.

whabang · « **Reply #42 on:** November 22, 2004, 11:23:06 AM »

:crazy:

This discussion has turned waaay too technical for me.

Karlos · « **Reply #43 on:** November 22, 2004, 11:37:09 AM »

@lordv

To be honest I think we are talking at cross purposes.

All I am saying is that for a coldfire based solution, the dynamo style JIT would be the best way to go for highest performance 680x0 emulation.

I didn't say it was perfect, but it is a lot better than the average JIT in terms of efficiency. Again, the only reason I am suggesting it is that the topic is about the coldfire and not PPC/XScale.

Obviously if someone released a CPU card powered by an x86/PPC/XScale etc. you would have to employ a conventional JIT approach. If, on the other hand, you are using a coldfire, you can use a dynamo style JIT (which is not fully applicable to a CPU with a totally different instruction set) which is proven more effective for the specific case of emulating like on like.

@Whabang

Don't worry, most of it is academic argument - unless someone does release a G4/G5 card for the classic :-D

bloodline · « **Reply #44 on:** November 22, 2004, 12:33:12 PM »

Quote

Karlos wrote:

@Whabang

Don't worry, most of it is academic argument - unless someone does release a G4/G5 card for the classic :-D

Or an XScale card :-D

Imagine a Trapdoor connector with an FPGA to convert the ZII (A1200 trapdoor) bus signals (and generate an interrupt) to one of the hi-speed serial interfaces (Hirose DF12C(3.0)60DS0.5V80 or NSSP?) for a gumstix computer...

Gumstix Board

That would be a cool A1200 Accelerator...
