Author Topic: 68060 and the 68882 (Read 15503 times)

matthey · « **Reply #14 on:** April 21, 2011, 01:28:05 AM »

Quote from: SpeedGeek;632671

But you won't see the performance you would have with a 50 MHz 68030 by using the instruction trap kludge.

Trapping would slow the CPU down to a crawl. The 68882 is a dog compared to the 68060 FPU also. ~1/8 of the speed on average at the same clock rate comes to mind (not counting any trapping overhead). If I remember correctly, Motorola did something to keep the 68881/68882 from being easily used with 68040+ as well. Don't quote me on the last 2 statements though. Here is a chart of some common FPU instructions and timings in cycles for the 68882, 68040 and 68060 in that order...

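| Instruction | 68882 | 68040 | 68060 |
|---|---|---|---|
| FMove FPn,FPn | 21 | 2 | 1 |
| FMove.D `<ea>`,FPn | 40 | 3 | 1 |
| FMove.D FPn,`<ea>` | 44 | 3 | 1 |
| FAdd FPn,FPn | 21 | 3 | 3 |
| FSub FPn,FPn | 21 | 3 | 3 |
| FMul FPn,FPn | 76 | 5 | 3 |
| FDiv FPn,FPn | 108 | 38 | 37 |
| FSqrt FPn,FPn | 110 | 103 | 68 |
| FAdd.D `<ea>`,FPn | 75 | 3 | 3 |
| FSub.D `<ea>`,FPn | 75 | 3 | 3 |
| FMul.D `<ea>`,FPn | 95 | 5 | 3 |
| FDiv.D `<ea>`,FPn | 127 | 38 | 37 |
| FSqrt.D `<ea>`,FPn | 129 | 103 | 68 |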

For trapping, add in 19 cycles for the trap and 17 for the RTE instruction on the 68060. Also consider that integer instructions and branches can operate in parallel with FPU instructions on the 68060 and can't while trapping. The 68060 would probably be faster with an all software floating point library in most cases. You should look at the Natami project if you want a faster 68k CPU and FPU.
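To put rough numbers on the chart and the trap figures above, here is a small illustrative sketch in Python. The cycle counts are copied from the post; the function names are made up for this example, and it assumes both chips run at the same clock. It only adds the fixed trap + RTE overhead, since the post gives no figure for the handler's own decoding work.

```python
# Cycle counts copied from the chart above: (68882, 68040, 68060).
TIMINGS = {
    "FMove FPn,FPn": (21, 2, 1),
    "FAdd FPn,FPn": (21, 3, 3),
    "FSub FPn,FPn": (21, 3, 3),
    "FMul FPn,FPn": (76, 5, 3),
    "FDiv FPn,FPn": (108, 38, 37),
    "FSqrt FPn,FPn": (110, 103, 68),
}

TRAP_CYCLES = 19  # exception entry on the 68060, per the post
RTE_CYCLES = 17   # RTE on the 68060, per the post
TRAP_OVERHEAD = TRAP_CYCLES + RTE_CYCLES


def ratio_68882_to_68060(name):
    """How many times more cycles the 68882 needs than the 68060 FPU."""
    c882, _c040, c060 = TIMINGS[name]
    return c882 / c060


def trapped_cost(name):
    """68882 cycles plus the fixed trap + RTE overhead (handler work excluded)."""
    return TIMINGS[name][0] + TRAP_OVERHEAD


if __name__ == "__main__":
    for name in TIMINGS:
        print(f"{name:15s} ratio {ratio_68882_to_68060(name):6.1f}x, "
              f"trapped cost >= {trapped_cost(name)} cycles")
```

Even before counting any handler code, every trapped instruction pays at least 36 extra cycles, which is more than most native 68060 FPU operations take in total.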

Pentad · « **Reply #15 on:** April 21, 2011, 01:35:52 AM »

Quote from: matthey;632679

Trapping would slow the CPU down to a crawl. The 68882 is a dog compared to the 68060 FPU also....

I would agree with Matthey as well.

The FPU being moved on the same die with the optimizations that were made by Motorola just can't compare to an over-clocked '882.

-P

Iggy · « **Reply #16 on:** April 21, 2011, 01:56:32 AM »

Quote from: Karlos;632668

Trap and emulate is one of those things that the 680x0 programmer manuals will tell you about. All you are doing is implementing your own exception handler and then writing some code to deal with the exception (note that this all happens in supervisor state and you need to know the layout of your 680x0 exception stack frame, which does vary from CPU to CPU).

You can write a handler to do some specific bit of work and then have it return. Normally, you'd write the handler to implement the unimplemented operation and return from the exception. However, you can go a step further and patch instead. Basically what you do here is modify the opcode that resulted in the exception and have it jump to a location of your choosing. If you are not fairly comfortable poking around in 680x0 supervisor mode, this is not trivial to do: you have to be careful about how much space there is to insert your jump, and you have to make sure you flush the instruction cache and so on. However, this is the basic gist of how tools like CyberPatcher and OxyPatcher do their magic.

I'm not sure if it will help you much but I played with some CPU exception handling a few years ago on 680x0 albeit for a different purpose:

http://www.amiga.org/forums/showthread.php?t=25181

In this case, I was using the CPU to trap illegal operations and have it invoke a language level exception mechanism (a C++ throw in this case). It does demonstrate some of the sneaky shenanigans you can get up to though.

A further question. On 68Ks without FPUs, do all floating point operations produce exceptions? Further, would it be possible to program an FPGA to emulate (or improve upon) these trapped illegal opcodes?

Iggy · « **Reply #17 on:** April 21, 2011, 02:12:52 AM »

Quote from: matthey;632679

The 68060 would probably be faster with an all software floating point library in most cases. You should look at the Natami project if you want a faster 68k CPU and FPU.

Interesting idea. Are you suggesting that an EC processor with a software floating point library might be faster than using the built in FPU of a full 68060?

matthey · « **Reply #18 on:** April 21, 2011, 02:34:47 AM »

Quote from: Iggy;632683

A further question. On 68Ks without FPUs, do all floating point operations produce exceptions?

Yes.

Quote from: Iggy;632683

Further, would it be possible to program an FPGA to emulate (or improve upon) these trapped illegal opcodes?

It's not that simple. The CPU is set up to communicate with co-processors. The FPU instructions actually have a 3-bit coprocessor ID specified in them. When set up properly, the instructions are sent to the appropriate co-processor without trapping. The co-processor signals when it's done with the instruction. If I remember correctly, Motorola changed something in the 68040+ so that the old external FPU co-processors didn't work any more. They wanted people using the newer style and faster built in FPU as well as customers buying them. You could probably research how external co-processors were done in the 68020/68030 and 68881/68882 manuals. More than 1 FPU was possible too. Someone at C= supposedly made a card with 16 math coprocessors (8 should be the limit of co-processor IDs). That would probably have more processing power than a 68060 FPU if they could all be used in parallel. Still, some operations like fmove have less overhead when the FPU is integrated into the CPU.
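As an illustration of the 3-bit coprocessor ID mentioned above, here is a minimal Python sketch that splits the first 16-bit word of an F-line instruction into its fields. The layout assumed here is the general coprocessor format from the 68020/68030 manuals (top four bits all ones, then the ID, then a type field and the effective-address bits); the function name and dictionary keys are just for this example.

```python
def decode_fline(opword):
    """Split a 16-bit F-line operation word into its fields."""
    if not 0 <= opword <= 0xFFFF:
        raise ValueError("operation word must fit in 16 bits")
    if (opword >> 12) & 0xF != 0xF:
        raise ValueError("not an F-line opcode")
    return {
        "cp_id": (opword >> 9) & 0x7,  # 3-bit coprocessor ID, 0..7
        "type": (opword >> 6) & 0x7,   # instruction type field
        "ea": opword & 0x3F,           # effective-address mode/register bits
    }


if __name__ == "__main__":
    # 0xF200 is the first word of a general FPU instruction with a
    # data-register-direct EA; the FPU conventionally uses ID 1.
    print(decode_fline(0xF200))
```

With only three bits for the ID, eight coprocessors is the most the encoding can address, which matches the limit noted in the post.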

An FPGA can contain a full FPU running much faster than a 68882. If it's not integrated with the CPU, it's going to have a bottleneck even if the traps can be avoided. The CPU+FPU can be contained in an FPGA without the overhead. Fewer clocks without a longer pipeline than the 68060 are possible. Gunnar (Natami project) claims 1 cycle for a floating point multiply (fmul) should be possible, for example.

Quote from: Iggy;632690

Interesting idea. Are you suggesting that an EC processor with a software floating point library might be faster than using the built in FPU of a full 68060?

No. A 68060 without FPU using a floating point software library would likely be faster than using a 68882. A 68060 without a FPU and software floating point would have to run several times faster than a 68060 with FPU to match the same performance. There is still a trap here as well unless the AmigaOS IEEE math libraries are used.

Iggy · « **Reply #19 on:** April 21, 2011, 03:05:11 AM »

Thanks matthey,
I haven't talked with Gunnar recently and I am aware that the best performance will result when the 68K is integrated into the FPGA.
But I did just exchange a message with another Natami team member, Peter. And he mentioned that the CQFP 68060 processor used on one of the 68K cards is the same processor I've been exploring. While it lacks an FPU and an MMU, it's clocked at a minimum of 75 MHz.
While I'm not concerned about the lack of an MMU, I was trying to find a workaround for the FPU functions. If trapping exceptions and using a software library is plausible, it might provide one solution.

matthey · « **Reply #20 on:** April 21, 2011, 04:42:08 AM »

Quote from: Iggy;632700

Thanks matthey,
While I'm not concerned about the lack of an MMU, I was trying to find a workaround for the FPU functions. If trapping exceptions and using a software library is plausible, it might provide one solution.

Thomas Richter created a program much like OxyPatcher called MuRedox that avoids the trapped instructions by swapping in replacement code on the fly. He posts on the Natami forum frequently. His code would need a fair amount of work to support all FPU instructions though. Optimized 68060 integer code with no traps should be faster than a 68882. If single precision was all that was required, then the integer unit might be as fast as 1/4 the speed of a 68060 with FPU (my guess). Most calculations are done with extended precision though, which can be time consuming for an integer processor, especially multiplication (no 64 bit integer multiplication in 68060), division and square root. A better option is for everyone who wants a fast 68060 to let the Natami team know, so that when there is enough demand and the bugs are fixed, an N68070 with FPU can be burned in a real chip. Think 300-500MHz and faster/MHz than a 68060. Probably won't be ready for another year or two though.
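To show why the missing 64 bit multiply hurts, here is a rough Python sketch of how a 32x32 -> 64 bit unsigned product can be assembled from multiplies whose results each fit in 32 bits. This is only the textbook partial-product method, not code from MuRedox or any math library, and the names are invented for the example. An extended precision mantissa is 64 bits wide, so a full mantissa multiply needs several of these.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul32x32_to_64(a, b):
    """64-bit product of two unsigned 32-bit values from 16-bit halves."""
    if not (0 <= a <= MASK32 and 0 <= b <= MASK32):
        raise ValueError("operands must be unsigned 32-bit values")
    a_hi, a_lo = a >> 16, a & MASK16
    b_hi, b_lo = b >> 16, b & MASK16
    # Four 16x16 multiplies; each partial product fits in 32 bits.
    lo = a_lo * b_lo
    mid1 = a_hi * b_lo
    mid2 = a_lo * b_hi
    hi = a_hi * b_hi
    # Python integers never overflow, so the carries are implicit here.
    # Real 68k code would have to propagate them by hand (add/addx).
    return (hi << 32) + ((mid1 + mid2) << 16) + lo


if __name__ == "__main__":
    print(hex(mul32x32_to_64(0xFFFFFFFF, 0xFFFFFFFF)))
```

Four multiplies plus shifts and carry handling for one 32x32 product gives a feel for why extended precision multiplication is expensive on the integer unit.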

alexh · « **Reply #21 on:** April 21, 2011, 08:24:16 AM »

EC processors don't appear to be any more overclockable than their full counterparts for the same mask variant.

bloodline · « **Reply #22 on:** April 21, 2011, 08:34:40 AM »

Quote from: alexh;632725

EC processors don't appear to be any more overclockable than their full counterparts for the same mask variant.

Though, with missing/inactive hardware they should dissipate less heat, and for any given cooling solution you might be able to get a higher clock? No?

alexh · « **Reply #23 on:** April 21, 2011, 08:55:14 AM »

Quote from: bloodline;632726

Though, with missing/inactive hardware they should dissipate less heat, and for any given cooling solution you might be able to get a higher clock? No?

Maybe. Depends:

a) If an integrated FPU uses clock gating when not in use. This technique was used even back then.

b) Depends if you have a REAL EC/LC chip. i.e. Mask G59Y.

Most EC/LC parts I've seen used in the Amiga are actually full 060's which failed quality control in the MMU/FPU and are marked up with a different designator on the package. These will output the same heat as full 060's.

A lot of EC/LC parts were sold as full 060 parts by wheeler-dealers trying to make a quick buck. Test it for a bit and if it works sell it as a full one. Akin to opening up gfx pipelines on gfx cards.

Iggy · « **Reply #24 on:** April 21, 2011, 01:26:53 PM »

These processors are the same CQFP package 68060 used on one of the Natami 68060 boards. They should be rated at 75 MHz and do appear to be true EC components.
While I have heard rumors of overclocks as high as 133 MHz, I believe 100 MHz is a reachable goal.
The lack of an FPU is a disadvantage, but as '30 accelerators frequently use EC processors, the added speed must offer some advantage.

joekster · « **Reply #25 on:** April 21, 2011, 03:03:11 PM »

I think this thread has gotten way off base. Just because something is possible, doesn't mean it's a good idea. The peripheral 68882 was really just a hack so you could use a math copro without an 020. The only adapters I know of are zorro-2 (microbotics starboard comes to mind) and would have really high latencies. I would guess that it would run 1/4 of the speed in peripheral mode. But, the really big problem is that there is nearly NO software that takes advantage of a peripheral math copro. I think only v1.0 of real3d supports it. Lightwave, vistapro, turbosilver, etc do not support a peripheral math copro.

The only way an 882 could reach 100 MHz would be with liquid nitrogen...

jj · « **Reply #26 on:** April 21, 2011, 03:10:11 PM »

What do you mean? Did the 020 and 030 have an FPU, then?

Why did all the Blizzard 030 accelerators have an option for a 68882? Was this because they used non-full types?

psxphill · « **Reply #27 on:** April 21, 2011, 03:41:25 PM »

Quote from: matthey;632679

Trapping would slow the CPU down to a crawl.

Not if you patch the instruction when it traps, so it won't trap the next time.
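A back-of-the-envelope model of that point, as a Python sketch: with patching, the trap + RTE overhead (19 + 17 cycles, from the figures earlier in the thread) is paid once, while without patching it is paid on every execution. The cost of the replacement routine and of the one-time patch (`emu_cycles`, `patch_cycles`) are hypothetical parameters, and the model ignores the jump into the replacement code and any cache effects.

```python
TRAP_OVERHEAD = 19 + 17  # trap + RTE cycles on the 68060, from the thread


def average_cycles(executions, emu_cycles, patch_cycles=0, patch=True):
    """Average cycles per execution of one FPU instruction site.

    emu_cycles and patch_cycles are hypothetical: the cost of the software
    replacement routine and of the one-time patch (cache flush included).
    """
    if executions < 1:
        raise ValueError("need at least one execution")
    if patch:
        # The first execution traps and patches; later ones run directly.
        total = TRAP_OVERHEAD + patch_cycles + emu_cycles * executions
    else:
        # Every execution traps, emulates and returns with RTE.
        total = (TRAP_OVERHEAD + emu_cycles) * executions
    return total / executions


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, average_cycles(n, 100, patch=False),
              average_cycles(n, 100, patch_cycles=200))
```

For code that runs an instruction only once, patching costs more than plain trapping; the win comes from loops, where the one-time cost is spread over many executions.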

Franko · « **Reply #28 on:** April 21, 2011, 03:46:27 PM »

Quote from: joekster;632796

I think this thread has gotten way off base. Just because something is possible, doesn't mean it's a good idea. The peripheral 68882 was really just a hack so you could use a math copro without an 020. The only adapters I know of are zorro-2 (microbotics starboard comes to mind) and would have really high latencies. I would guess that it would run 1/4 of the speed in peripheral mode. But, the really big problem is that there is nearly NO software that takes advantage of a peripheral math copro. I think only v1.0 of real3d supports it. Lightwave, vistapro, turbosilver, etc do not support a peripheral math copro.

The only way an 882 could reach 100 MHz would be with liquid nitrogen...

There are numerous Amiga progs that benefit greatly from an FPU (especially GFX/Audio & DTP utils) PPaint, Lame, PageStream, Final Writer, APDF, Mystic View, MpegA.library to name but a few...

So to claim that "there is nearly NO software that takes advantage of a peripheral math copro" is absolute nonsense. You either haven't looked hard enough or don't use an FPU, otherwise you would already know this...

psxphill · « *Last Edit: April 21, 2011, 05:31:20 PM by psxphill* »

Quote from: Franko;632809

So to claim that "there is nearly NO software that takes advantage of a peripheral math copro" is absolute nonsense, you either haven't looked hard enough or don't use a FPU otherwise you would already know this...

He's talking about using a 68881/68882 in memory mapped mode. I don't remember any software supporting that on the Amiga, and you'd struggle to actually find the hardware as well. It was only the 68000/68010 that needed it; on the 68020+ it worked as a coprocessor.
