Amiga.org
Amiga computer related discussion => Amiga Hardware Issues and discussion => Topic started by: Iggy on April 20, 2011, 06:14:53 PM
-
Can a 68060 without an FPU use a 68882RC50 as a memory mapped peripheral?
What is the fastest overclock for a 68882?
-
Yes, any 680x0 can use a 68882 as a memory mapped peripheral.
I've been able to clock several '882's to 60mhz. I would guess that the fastest would be in the range of 66-70mhz.
-
Yes, any 680x0 can use a 68882 as a memory mapped peripheral.
I've been able to clock several '882's to 60mhz. I would guess that the fastest would be in the range of 66-70mhz.
How about 75Mhz if heat sinked and fan cooled?
-
>68882 - 32-bit FPU - Frequency: 16 - 50MHz
In my A3000D 030@25MHz, I've modded the 68882@25MHz to 50MHz!
In an (old) article, it was written that Commodore had tested these 68882s at 100MHz (unconfirmed information!).
So I think there really is an operating margin, and joekster is right: a 50MHz part can certainly work up to 66/70MHz!
(http://t.hotimg.com/ts/FKG6dJs) (http://www.hotimg.com/direct/FKG6dJs)
-
Does anyone know how to get a hold of the creator of the Oxyron Patcher?
The last address I have is Achim Koyen, Nübelfeld 49, D-24972 Quern, Germany.
Anyone know a newer address or better yet an e-mail address?
-
>68882 - 32-bit FPU - Frequency: 16 - 50MHz
In my A3000D 030@25MHz, I've modded the 68882@25MHz to 50MHz!
In an (old) article, it was written that Commodore had tested these 68882s at 100MHz (unconfirmed information!).
So I think there really is an operating margin, and joekster is right: a 50MHz part can certainly work up to 66/70MHz!
(http://t.hotimg.com/ts/FKG6dJs) (http://www.hotimg.com/image/FKG6dJs)
Can you send me that article? 68EC060s are more overclockable than standard 68060s. I'd like to experiment with combining an EC with an '882.
-
AMINET: http://aminet.net/package/docs/hard/A3000-50
-
AMINET: http://aminet.net/package/docs/hard/A3000-50
So, potentially, we could run a 75Mhz 68EC060 and a 50Mhz 68882 (both with heat sinks and fans) at 100Mhz?
-
So, potentially, we could run a 75Mhz 68EC060 and a 50Mhz 68882 (both with heat sinks and fans) at 100Mhz?
I'd be surprised if the 68882 lived for very long at 100MHz, as that's a factor of 2 overclock from its highest rating. Remember that the last-mask 68060s run at 100MHz thanks to their improved manufacturing tolerances and also the fact that they are 3.3V parts, which dissipate less heat than 5V logic.
At 100MHz, and with appropriate care taken over the method (e.g., trap and patch rather than trap and emulate), you'd probably be able to write a software floating point library that is faster than a 68882 would be in any case.
-
less work to just buy a fully working 060
-
less work to just buy a fully working 060
Not very easy getting a 68060 to 100Mhz.
-
I'd be surprised if the 68882 lived for very long at 100MHz, as that's a factor of 2 overclock from its highest rating. Remember that the last-mask 68060s run at 100MHz thanks to their improved manufacturing tolerances and also the fact that they are 3.3V parts, which dissipate less heat than 5V logic.
At 100MHz, and with appropriate care taken over the method (e.g., trap and patch rather than trap and emulate), you'd probably be able to write a software floating point library that is faster than a 68882 would be in any case.
Karlos, where can I learn about both trap and patch and trap and emulate? Specifically, I'm interested in the trapping of FPU calls and the potential to improve upon them.
-
Karlos, where can I learn about both trap and patch and trap and emulate? Specifically, I'm interested in the trapping of FPU calls and the potential to improve upon them.
Trap and emulate is one of those things that the 680x0 programmer manuals will tell you about. All you are doing is implementing your own exception handler and then writing some code to deal with the exception (note that this all happens in supervisor state and you need to know the layout of your 680x0 exception stack frame, which does vary from CPU to CPU).
You can write a handler to do some specific bit of work and then have it return. Normally, you'd write the handler to implement the unimplemented operation and return from the exception. However, you can go a step further and patch instead. Basically, what you do here is modify the opcode that resulted in the exception and have it jump to a location of your choosing. If you are not fairly comfortable poking around in 680x0 supervisor mode, this is not trivial to do: you have to be careful how much space there is to insert your jump, and you also have to make sure you flush the instruction cache and so on. However, this is the basic gist of how tools like CyberPatcher and OxyPatcher do their magic.
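To make the difference between the two approaches concrete, here is a minimal toy sketch in Python. It is not 680x0 code: the "program" is just a list of made-up opcode names, the set of unimplemented opcodes and the `jsr_soft_` prefix are purely hypothetical, and all it does is count how often the exception path is taken with and without patching.

```python
# Toy model only: a "program" is a list of opcode strings, not real 680x0 code.
UNIMPLEMENTED = {"fadd", "fmul"}  # hypothetical: pretend the CPU lacks these


def run(program, passes, patch):
    """Execute `program` `passes` times; return how many exceptions were taken."""
    code = list(program)  # private copy, so patching stays local to this run
    exceptions = 0
    for _ in range(passes):
        for pc, op in enumerate(code):
            if op in UNIMPLEMENTED:
                exceptions += 1  # the CPU takes the unimplemented-opcode trap
                # The handler would do the operation in software here.
                if patch:
                    # Trap and patch: rewrite the opcode so it jumps straight
                    # to the software routine next time. A real patcher must
                    # also flush the instruction cache at this point.
                    code[pc] = "jsr_soft_" + op
            # Implemented opcodes and patched jumps run without trapping.
    return exceptions


loop_body = ["move", "fadd", "fmul", "add"]
emulate_traps = run(loop_body, passes=1000, patch=False)
patch_traps = run(loop_body, passes=1000, patch=True)
print("trap and emulate:", emulate_traps, "exceptions")
print("trap and patch:  ", patch_traps, "exceptions")
```

With trap and emulate, every pass through the loop pays for two exceptions; with trap and patch, each offending opcode traps only the first time it is reached.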
I'm not sure if it will help you much but I played with some CPU exception handling a few years ago on 680x0 albeit for a different purpose:
http://www.amiga.org/forums/showthread.php?t=25181
In this case, I was using the CPU to trap illegal operations and have it invoke a language level exception mechanism (a C++ throw in this case). It does demonstrate some of the sneaky shenanigans you can get up to though.
-
Can a 68060 without an FPU use a 68882RC50 as a memory mapped peripheral?
What is the fastest overclock for a 68882?
They can usually run @ 60-75 Mhz. The latest revision and mask MC68882 may even go faster.
But you won't see the performance you would have with a 50 Mhz 68030 by using the instruction trap kludge.
-
But you won't see the performance you would have with a 50 Mhz 68030 by using the instruction trap kludge.
Trapping would slow the CPU down to a crawl. The 68882 is a dog compared to the 68060 FPU also. ~1/8 of the speed on average at the same clock rate comes to mind (not counting any trapping overhead). If I remember correctly, Motorola did something to keep the 68881/68882 from being easily used with 68040+ as well. Don't quote me on the last 2 statements though. Here is a chart of some common FPU instructions and timings in cycles for the 68882, 68040 and 68060 in that order...
Instruction        68882  68040  68060
FMove   FPn,FPn       21      2      1
FMove.D <ea>,FPn      40      3      1
FMove.D FPn,<ea>      44      3      1
FAdd    FPn,FPn       21      3      3
FSub    FPn,FPn       21      3      3
FMul    FPn,FPn       76      5      3
FDiv    FPn,FPn      108     38     37
FSqrt   FPn,FPn      110    103     68
FAdd.D  <ea>,FPn      75      3      3
FSub.D  <ea>,FPn      75      3      3
FMul.D  <ea>,FPn      95      5      3
FDiv.D  <ea>,FPn     127     38     37
FSqrt.D <ea>,FPn     129    103     68
For trapping, add in 19 cycles for the trap and 17 for the RTE instruction on the 68060. Also consider that integer instructions and branches can operate in parallel with FPU instructions on the 68060 and can't while trapping. The 68060 would probably be faster with an all software floating point library in most cases. You should look at the Natami project if you want a faster 68k CPU and FPU.
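As a rough worked example of that arithmetic, the sketch below adds the 19 + 17 cycles of trap and RTE overhead to a few of the 68882 timings from the chart and compares the result with the 68060 FPU column. It assumes both chips run at the same clock and ignores bus latency and the handler's own code, so the real gap would likely be wider.

```python
# (68882, 68040, 68060) cycle counts copied from the chart above
TIMINGS = {
    "FAdd FPn,FPn": (21, 3, 3),
    "FMul FPn,FPn": (76, 5, 3),
    "FDiv FPn,FPn": (108, 38, 37),
    "FSqrt FPn,FPn": (110, 103, 68),
}
TRAP_CYCLES = 19  # exception entry on the 68060, figure from the post
RTE_CYCLES = 17   # return from exception on the 68060, figure from the post


def trapped_68882_cycles(name):
    """68882 cycles plus trap/RTE overhead (same clock, no bus latency assumed)."""
    return TIMINGS[name][0] + TRAP_CYCLES + RTE_CYCLES


def slowdown_vs_060(name):
    """How many times slower the trapped 68882 path is than the 68060 FPU."""
    return trapped_68882_cycles(name) / TIMINGS[name][2]


for name in TIMINGS:
    print(f"{name:15s} {trapped_68882_cycles(name):4d} cycles, "
          f"{slowdown_vs_060(name):.1f}x the 68060 FPU")
```

For a register-to-register FAdd this gives 21 + 19 + 17 = 57 cycles against 3 on the 68060 FPU.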
-
Trapping would slow the CPU down to a crawl. The 68882 is a dog compared to the 68060 FPU also....
I would agree with Matthey as well.
The FPU being moved on the same die with the optimizations that were made by Motorola just can't compare to an over-clocked '882.
-P
-
Trap and emulate is one of those things that the 680x0 programmer manuals will tell you about. All you are doing is implementing your own exception handler and then writing some code to deal with the exception (note that this all happens in supervisor state and you need to know the layout of your 680x0 exception stack frame, which does vary from CPU to CPU).
You can write a handler to do some specific bit of work and then have it return. Normally, you'd write the handler to implement the unimplemented operation and return from the exception. However, you can go a step further and patch instead. Basically, what you do here is modify the opcode that resulted in the exception and have it jump to a location of your choosing. If you are not fairly comfortable poking around in 680x0 supervisor mode, this is not trivial to do: you have to be careful how much space there is to insert your jump, and you also have to make sure you flush the instruction cache and so on. However, this is the basic gist of how tools like CyberPatcher and OxyPatcher do their magic.
I'm not sure if it will help you much but I played with some CPU exception handling a few years ago on 680x0 albeit for a different purpose:
http://www.amiga.org/forums/showthread.php?t=25181
In this case, I was using the CPU to trap illegal operations and have it invoke a language level exception mechanism (a C++ throw in this case). It does demonstrate some of the sneaky shenanigans you can get up to though.
A further question. On 68Ks without FPUs, do all floating point operations produce exceptions? Further, would it be possible to program an FPGA to emulate (or improve upon) these trapped illegal opcodes?
-
The 68060 would probably be faster with an all software floating point library in most cases. You should look at the Natami project if you want a faster 68k CPU and FPU.
Interesting idea. Are you suggesting that an EC processor with a software floating point library might be faster than using the built in FPU of a full 68060?
-
A further question. On 68Ks without FPUs, do all floating point operations produce exceptions?
Yes.
Further, would it be possible to program an FPGA to emulate (or improve upon) these trapped illegal opcodes?
It's not that simple. The CPU is set up to communicate with co-processors. The FPU instructions actually have a 3 bit coprocessor ID specified in them. When set up properly, the instructions are sent to the appropriate co-processor without trapping. The co-processor signals when it's done with the instruction. If I remember correctly, Motorola changed something in the 68040+ so that the old external FPU co-processors didn't work any more. They wanted people using the newer style and faster built in FPU as well as customers buying them. You could probably research how external co-processors were done in the 68020/68030 and 68881/68882 manuals. More than 1 FPU was possible too. Someone at C= supposedly made a 16 math coprocessor card (8 should be the limit of co-processor IDs). That would probably have more processing power than a 68060 FPU if they could all be used in parallel. Still, some operations like fmove have less overhead being integrated to the CPU.
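For anyone wondering where that 3 bit coprocessor ID lives: coprocessor ("F-line") instructions have 1111 in the top four bits of the first opcode word, and the ID sits in bits 11-9. A minimal Python sketch of pulling it out follows; the function name is made up for illustration, and the remark that the FPU conventionally answers to ID 1 (so FPU opcodes start with $F2xx) is from memory rather than from this thread.

```python
def coprocessor_id(opword):
    """Return the 3-bit coprocessor ID of a 16-bit F-line opcode word, else None."""
    if (opword >> 12) & 0xF != 0xF:  # F-line opcodes have 1111 in bits 15-12
        return None
    return (opword >> 9) & 0x7       # bits 11-9 carry the coprocessor ID


print(coprocessor_id(0xF200))  # a general FPU operation word
print(coprocessor_id(0x4E75))  # RTS, not an F-line opcode
```

Three bits give IDs 0 to 7, which is where the limit of 8 coprocessor IDs mentioned above comes from.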
An FPGA can contain a full FPU running much faster than a 68882. If it's not integrated with the CPU, it's going to have a bottleneck even if the traps can be avoided. The CPU+FPU can be contained in an FPGA without the overhead. Fewer clocks, without a longer pipeline than the 68060, are possible. Gunnar (Natami project) claims, for example, that 1 cycle for a floating point multiply (fmul) should be possible.
Interesting idea. Are you suggesting that an EC processor with a software floating point library might be faster than using the built in FPU of a full 68060?
No. A 68060 without FPU using a floating point software library would likely be faster than using a 68882. A 68060 without an FPU and software floating point would have to run several times faster than a 68060 with FPU to match the same performance. There is still a trap here as well unless the AmigaOS IEEE math libraries are used.
-
Thanks matthey,
I haven't talked with Gunnar recently and I am aware that the best performance will result when the 68K is integrated into the FPGA.
But I did just exchange a message with another Natami team member, Peter. And he mentioned that the CQFP 68060 processor used on one of the 68K cards is the same processor I've been exploring. While it lacks an FPU and an MMU, it's clocked at a minimum of 75MHz.
While I'm not concerned about the lack of an MMU, I was trying to find a work around for the FPU functions. If trapping exceptions and using a software library is plausible it might provide one solution.
-
Thanks matthey,
While I'm not concerned about the lack of an MMU, I was trying to find a work around for the FPU functions. If trapping exceptions and using a software library is plausible it might provide one solution.
Thomas Richter created a program much like OxyPatcher called MuRedox that avoided the trapped instructions with replacement code on the fly. He posts on the Natami forum frequently. His code would need a fair amount of work to support all FPU instructions though. Optimized 68060 integer code and no traps should be faster than a 68882. If single precision was all that was required then the integer unit might be as fast as 1/4 the speed of a 68060 with FPU (my guess). Most calculations are done with extended precision though, which can be time consuming for an integer processor, especially multiplication (no 64 bit integer multiplication in 68060), division and square root. A better option is for everyone who wants a fast 68060 to let the Natami team know, so that once there is enough demand and the bugs are fixed, an N68070 with FPU can be burned into a real chip. Think 300-500MHz and faster/MHz than a 68060. Probably won't be ready for another year or two though ;).
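To give a feel for why extended precision multiplication is expensive in integer code, here is a small Python sketch of a schoolbook 64x64 -> 128 bit multiply (the size of an extended precision mantissa product) built only from 16x16 -> 32 bit multiplies. The 16 bit limb size and the function names are choices made for this illustration; a tuned 68060 routine might well split the work differently.

```python
LIMB_BITS = 16
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(value, count=4):
    """Split a 64-bit value into `count` 16-bit limbs, least significant first."""
    return [(value >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]


def mul64x64(a, b):
    """Schoolbook 64x64 -> 128-bit product using only 16x16 -> 32-bit multiplies.

    Returns (product, number_of_small_multiplies).
    """
    al, bl = to_limbs(a), to_limbs(b)
    result = [0] * 8  # eight 16-bit limbs hold the 128-bit product
    multiplies = 0
    for i, x in enumerate(al):
        carry = 0
        for j, y in enumerate(bl):
            # Partial product plus running limb plus carry still fits in 32 bits.
            t = x * y + result[i + j] + carry
            multiplies += 1
            result[i + j] = t & LIMB_MASK
            carry = t >> LIMB_BITS
        result[i + 4] = carry  # top limb of this row, not yet written by earlier rows
    product = sum(limb << (LIMB_BITS * k) for k, limb in enumerate(result))
    return product, multiplies


product, count = mul64x64(0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF)
print(hex(product), "using", count, "small multiplies")
```

One mantissa product costs 16 small multiplies plus all the carry handling, before any normalisation or rounding is done.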
-
EC processors don't appear to be any more overclockable than their full counterparts for the same mask variant.
-
EC processors don't appear to be any more overclockable than their full counterparts for the same mask variant.
Though, with missing/inactive hardware they should dissipate less heat, and for any given cooling solution you might be able to get a higher clock? No?
-
Though, with missing/inactive hardware they should dissipate less heat, and for any given cooling solution you might be able to get a higher clock? No?
Maybe. It depends:
a) Whether an integrated FPU uses clock gating when not in use. This technique was used even back then.
b) Whether you have a REAL EC/LC chip, i.e. mask G59Y.
Most EC/LC parts I've seen used in the Amiga are actually full 060s which failed quality control in the MMU/FPU and are marked up with a different designator on the package. These will output the same heat as full 060s.
A lot of EC/LC parts were sold as full 060 parts by wheeler-dealers trying to make a quick buck: test it for a bit and, if it works, sell it as a full one. Akin to opening up gfx pipelines on gfx cards.
-
These processors are the same CQFP package 68060 used on one of the Natami 68060 boards. They should be rated at 75Mhz and do appear to be true EC components.
While I have heard rumors of over clocks as high as 133Mhz, I believe 100Mhz is a reachable goal.
The lack of an FPU is a disadvantage, but as '30 accelerators frequently use EC processors, the added speed must offer some advantage.
-
I think this thread has gotten way off base. Just because something is possible, doesn't mean it's a good idea. The peripheral 68882 was really just a hack so you could use a math copro without an 020. The only adapters I know of are zorro-2 (microbotics starboard comes to mind) and would have really high latencies. I would guess that it would run 1/4 of the speed in peripheral mode. But, the really big problem is that there is nearly NO software that takes advantage of a peripheral math copro. I think only v1.0 of real3d supports it. Lightwave, vistapro, turbosilver, etc do not support a peripheral math copro.
The only way an 882 could reach 100mhz would be with liquid nitrogen...
-
What do you mean? Did the 020 and 030 have an FPU, then?
Why did all the Blizzard 030 accelerators have an option for a 68882? Was this because they used non-full types?
-
Trapping would slow the CPU down to a crawl.
Not if you patch the instruction when it traps, so it won't trap the next time.
-
I think this thread has gotten way off base. Just because something is possible, doesn't mean it's a good idea. The peripheral 68882 was really just a hack so you could use a math copro without an 020. The only adapters I know of are zorro-2 (microbotics starboard comes to mind) and would have really high latencies. I would guess that it would run 1/4 of the speed in peripheral mode. But, the really big problem is that there is nearly NO software that takes advantage of a peripheral math copro. I think only v1.0 of real3d supports it. Lightwave, vistapro, turbosilver, etc do not support a peripheral math copro.
The only way an 882 could reach 100mhz would be with liquid nitrogen...
There are numerous Amiga progs that benefit greatly from an FPU (especially GFX/Audio & DTP utils) PPaint, Lame, PageStream, Final Writer, APDF, Mystic View, MpegA.library to name but a few... :)
So to claim that "there is nearly NO software that takes advantage of a peripheral math copro" is absolute nonsense, you either haven't looked hard enough or don't use a FPU otherwise you would already know this... ;)
-
So to claim that "there is nearly NO software that takes advantage of a peripheral math copro" is absolute nonsense, you either haven't looked hard enough or don't use a FPU otherwise you would already know this... ;)
He's talking about using a 68881/68882 in memory mapped mode. I don't remember any software supporting that on the Amiga, you'd struggle to actually find the hardware as well. It was only the 68000/68010 that needed it, on the 68020+ it worked as a coprocessor.
-
A further question. On 68Ks without FPUs, do all floating point operations produce exceptions?
Any opcode (floating point or otherwise) not implemented by the CPU will result in an unimplemented/illegal opcode exception.
Further, would it be possible to program an FPGA to emulate (or improve upon) these trapped illegal opcodes?
Well, however you decide to implement your trap handler, the cost of invoking the exception mechanism is what kills you. For a fast 68060, I'd strongly advocate trap-and-patch, which only goes through the exception handler once for each encountered illegal opcode, over any mechanism that requires the exception to happen every time it hits said opcode.
The next decision is how to implement the desired operation. Performing complete emulation of floating point on the 68060 will not be that fast (probably faster than a real 68882 for well-written, superscalar 68060 code), but I am not sure how you'd invoke an external FPGA device to do it, unless it's a memory mapped bit of hardware that you can move data to and read back from. IIRC, on the 68020, the CALLM instruction allowed the invoking of external processors, but that instruction is missing on later parts.
-
Sounds interesting. If I can find a way to develop trap and patch code that relies on a external memory mapped peripheral I may be able to increase the speed of the math operations.
An FPGA is not the only option for this peripheral, I'm also considering what could be done with an e300 class PPC (which would be cheaper than an FPGA).
-
Sounds interesting. If I can find a way to develop trap and patch code that relies on a external memory mapped peripheral I may be able to increase the speed of the math operations.
An FPGA is not the only option for this peripheral, I'm also considering what could be done with an e300 class PPC (which would be cheaper than an FPGA).
I fear that approach is taking you down the path towards cache coherency issues.
Remember, you would either have to make sure that any memory shared between both processors was uncached by both, or take care to flush cache lines, which can be expensive. All of which is taking you back to where PowerUP and WarpOS were, except you'd be getting cache issues per externally-handled instruction.
-
I fear that approach is taking you down the path towards cache coherency issues.
Remember, you would either have to make sure that any memory shared between both processors was uncached by both, or take care to flush cache lines, which can be expensive. All of which is taking you back to where PowerUP and WarpOS were, except you'd be getting cache issues per externally-handled instruction.
I was not thinking of caching at all. Just a very small memory mapped area for exchange.
-
I clocked my 882 on my 3k motherboard to 50 about 15 years ago. Runs perfect to this day... Suspect with a heat sink 75 should be easily achieved.
Good luck!
Matt
-
@iggy
Sorry to drag up a very old thread but I read this with interest. I couldn't fathom why you needed to run an 060 at 100MHz, and even more confusing was what application you had for running the FPU so hard. The only things I could think of were that you are doing raytracing, or you are a scener (but at demo parties the productions need to run on the party machine, usually an 060 at 50 to 75MHz), or you are one of those overclockers who would think nothing of cooling their system with liquid nitrogen just because they can!
-
@iggy
Sorry to drag up a very old thread but I read this with interest. I couldn't fathom why you needed to run an 060 at 100MHz, and even more confusing was what application you had for running the FPU so hard. The only things I could think of were that you are doing raytracing, or you are a scener (but at demo parties the productions need to run on the party machine, usually an 060 at 50 to 75MHz), or you are one of those overclockers who would think nothing of cooling their system with liquid nitrogen just because they can!
I have upgraded plenty of Apollo 1260, Blizzard 1260 and even a few CS-MKII cards to 100+MHz using the rev 6 version of the full 68060. It runs quite cool and has been done by a lot of people. Even more so in a CS-MKII. I have seen those do 120MHz.
-
Thomas Richter created a program much like OxyPatcher called MuRedox that avoided the trapped instructions with replacement code on the fly. He posts on the Natami forum frequently. His code would need a fair amount of work to support all FPU instructions though. Optimized 68060 integer code and no traps should be faster than a 68882.
Sorry for necro-bumping this, but just one correction: MuRedox *does* patch all FPU and integer instructions, quite unlike other programs. The interesting part of it is a "code generator" that creates the necessary stub-code on the fly. That is, if a missing instruction is used for the first time, the program appears a little bit slower as MuRedox first needs to create the necessary code, but it will "learn" quickly.
-
Sorry for necro-bumping this, but just one correction: MuRedox *does* patch all FPU and integer instructions, quite unlike other programs. The interesting part of it is a "code generator" that creates the necessary stub-code on the fly. That is, if a missing instruction is used for the first time, the program appears a little bit slower as MuRedox first needs to create the necessary code, but it will "learn" quickly.
Does it work if you have a 68060 without an MMU or FPU?