Author Topic: Can FPGA 060 run more than 100 Mhz? (Read 12451 times)

NovaCoder · « **Reply #14 on:** April 16, 2014, 01:52:37 AM »

Quote from: matthey;762580

The ECS/AGA gfx become a bottleneck and then it makes sense to add gfx on the accelerator and/or move to a new stand alone board or motherboard replacement. A faster CPU and gfx will allow you to port more modern games and use DOSBox for older games instead of porting them .

When you go down that road (GFX card attached to the accelerator) you might as well replace the entire motherboard (FPGA Arcade style). If you want to keep it authentic (e.g. use AGA) then you can do some very impressive things if you have enough MIPS, even the HD transfer rate will benefit if the card had the same kind of memory timing improvements used by the ACA cards.

Hopefully someone will do it one day, maybe the Vampire600 will come back to life and inspire someone to try the same thing for the A1200 and big box Amiga's

matthey · « **Reply #15 on:** April 16, 2014, 01:57:56 AM »

Quote from: SpeedGeek;762588

250MHz FPGA chips have been available for quite a while now. That's the easy part... the hard part is reverse engineering the 060 and programming the FPGA to correctly emulate an 060 under a fairly large number of different operating conditions.

The internal clock speed of the fpga is not the same as the speed an fpga CPU will run. It will be less even with a deep pipeline.

Quote from: psxphill;762611

You can't be 110% compatible, if you deviate then you're less than 100% compatible.

Unless all exceptions are generated the same as the 68060 (unimplemented instruction etc) then it's not compatible.

Handling more in hardware without an exception shouldn't cause a problem. The way exceptions are handled on a 68k is similar although some 68k processors generate different exceptions and the stack frames used by different 68k processors are different. A new fpga CPU would probably be enough different that some changes and incompatibilities would be necessary in Supervisor mode like the 68060 was over the 68040. Is it better to create a 68060 "copy" that has a limited future or an fpga CPU that can be further developed, improved in performance and possibly adopted for embedded applications as well as retro use?

psxphill · « **Reply #16 on:** April 16, 2014, 08:19:01 AM »

Quote from: matthey;762628

Is it better to create a 68060 "copy" that has a limited future or an fpga CPU that can be further developed, improved in performance and possibly adopted for embedded applications as well as retro use?

It is better to create a 68060 "copy" that has a guaranteed future for retro use because it is the only accurate one.

People using softcores for embedded use don't have any emotional attachment so will go with whatever is the best/cheapest/etc. Starting with a 35 year old ISA is not going to help. Any traction you get today could be lost tomorrow & you're left with something that fits no purpose.

So I will rewrite your loaded question:

Is it better to create a 68060 "copy" that will be used in Amiga/Mac/AtariST fgpa for many years to come or an fpga CPU that will get abandoned when it can't keep up with other designs that aren't limited to a 1978 ISA?

Or do both, but create the 68060 "copy" first because that is where it will get the most use.

Lurch · « **Reply #17 on:** April 16, 2014, 08:46:31 AM »

Quote from: NovaCoder;762564

Come to think of it, only DosBox AGA and NetSurf AGA need more than 130 MIPS!

Netsurf isn't that bad at 100MIPS :-) What makes it appear slower than say ibrowse is the 68k version appears to try and load the entire page first before displaying it.

I use netsurf now and then with RTG quite useable. Of course faster is always better.

NovaCoder · « **Reply #18 on:** April 16, 2014, 12:01:52 PM »

Quote from: Lurch;762647

Netsurf isn't that bad at 100MIPS :-) What makes it appear slower than say ibrowse is the 68k version appears to try and load the entire page first before displaying it.

I use netsurf now and then with RTG quite useable. Of course faster is always better.

I was just struggling to think of something else for 68k that could make use of more that 130 MIPS

OK how about 'The Curse of Monkey Island', that sucker could really use 200+ MIPS

psxphill · « **Reply #19 on:** April 16, 2014, 12:47:54 PM »

Quote from: NovaCoder;762661

I was just struggling to think of something else for 68k that could make use of more that 130 MIPS

First person shooters, PCTask/PCx?

There are probably some wild demos that could benefit as well, which usually only run full speed on WinUAE.

billt · « **Reply #20 on:** April 17, 2014, 08:25:48 PM »

Quote from: SpeedGeek;762588

250MHz FPGA chips have been available for quite a while now. That's the easy part...

Not that simple. When they say an "FPGA runs at 250MHz", what does that actually mean? What is the FPGA configured to do/be for that measurement? Different designs will have different pathways in the FPGA, and thus different results and different max running speeds. You're extremely unlikely to end up 1:1 with whatever that 250MHz claim, or any other clock rate claim is based on in an FPGA. Marketing people at chip compnies feel the need to say things, but IMHO, stating system clock rates for an FPGA chip doesn't make very much sense. If a chip vendor is saying a clock rate based on some reference design they've put into each FPGA, the exact same reference design, then such a clock rate might be a useful general comparison of speed from one FPGA product to another, so you might be able to say that Generation 2 of our product runs at approximately 2x clock rate as our Generation 1 product. But in terms of knowing what clock rate your design (such as tg68 or 68K00 or 68K30) will run at is pretty useless. You learn this when you define a clock period and run synthesis if that speed works or not. (Done by timing analysis of the synthesized and place/routed result in the FPGA tools) If you decide on a target clock speed for your final product based on whatever particular FPGA chip, then you essentially get a yes or no answer from the timing checks. If yes, maybe you can go even faster, and try that. If no, then either work on improving your design and creep closer toward your goal, or realize that you need to change your goal (and datasheet and marketing) or maybe change your FPGA to a faster one. Even if whatever marketing blurb said the chip runs at your target clock rate.

I used to be part of a team that designed FPGA silicon, so that other people could buy those FPGA chips and put their own stuff inside. The architect/team lead once said that the 0.35micron chips could do 350MHz, as long as you only wanted an inverter chain that never left the IO buffers to get into the core of the chip. That wouldn't be very useful. Any design that was inside the core would be slower than 350MHz due to the longer pathways.

A simple implementation of a particular instruction set processor will run at a slower clock rate than a pipelined implementation of the same instruction set thing. A longer pipeline design will run at a faster clock rate than a shorter pipeline. (The "simple design" is essentially a 1-stage pipeline)

But longer pipelines waste more time on a branch (stuff happening in the pipeline that gets thrown out, then refill the pipeline with the branch's stuff) than a shorter pipeline design. So there is a tradeoff between pipeline length, clock speed and branching.

Quote

the hard part is reverse engineering the 060 and programming the FPGA to correctly emulate an 060 under a fairly large number of different operating conditions.

You don't need so much to reverse-engineer the 060 chip. Anything you put into an FPGA will be somewhat different from the 060 silicon anyway. No FPGA place and route tool will give you exactly what's in the 060 silicon, even if it came from the same RTL, which isn't very likely to be the case anyway. You'd have to tool the RTL to the FPGA paradigm, and perhaps to some extent to your particular FPGA and tool (ISE or Quartus, etc.)

What such a design would really be, is an entirely new implementation of the instruction set. That instruction set is published and known publicly. It would be nice to do that as portable-friendly as possible, considering that there may be some differences in inferring memories (FIFOs, RAMs cor cache, etc) from one FPGA architecture to another (spartan3 vs spartan6 vs Cyclone5 etc) or tool to another (ISE vs Quartus), so such things should find their way into instantiable blocks to keep the rest of the RTL code portable. Such things were likely instantiable blocks at Motorola way back when, but as they were targeting a particular fab process, they may have coded to whatever their memory compiler tool (automated memory block generator to give you a chunk of silicon layout, and whatever connections that comes out of that). One would have to think of a good generic "API" of connections to fit a generic memory into, and then make such a defined wrapper around any particular FPGA inferred or instantiated memory block to keep the parent block nice and generic.

Anyway... Someone needs to define an instruction fetch and parse unit, an ALU block, etc. that is compatible with the 68060 instruction set. Or 68020 or 030 instruction set, whatever is really the best choice.

We tend to say 68060, as we perceive that to be the newest and fastest 68k chip. Motorola made certain decisions that all together concluded with the 68060 implementation. Such decisions would consider how often an instruction has actually been used, how complicated it is (and thus how much it affects die area/cost, power, and max clock rate capability)

We might today want to reconsider certain decisions, such as added or removed instructions compared to earlier 68k family parts, and make some adjustments. Maybe add some instructions back in. Maybe redo the 68040 instruction set with more of a 68060 block diagram. Or an 020, whichever particular set of instructions is most beneficial. If AmigaOS or some target application software, or more recent compiler innovations tend to make frequent use of some instruction that was removed in the 68060, and thus spends a lot of time emulating that instruction on a real 68060 accelerator, then maybe we should put that instruction back in. Putting back certain instructions can make for a better performance of the software running an 020-optimized binary compared to an 060-optimized binary running on an exact-as-possible 060 implementation.

This is not really reverse-engineering. It would be a new engineering of the published 68k instruction set. Maybe a new engineering of a new combination of all 68k family instructions not before seen in any particular silicon from Motorola/Freescale... Or even a superset, adding in some entirely new instructions in addition to whatever we take from the Motorola books. (SIMD anyone??) Apply whatever microprocessor design concepts you like. Make an 020 instruction super-scalar. Or make an 060 instruction set not super-scalar. Whatever floats your boat. I'm not sure how fancy the tg68 is, or 68k00 or any other of the several 68k softcores out there. (ao_68000 etc)

This is an opportunity to do something even more modern in concept than any 68k ever was. Depending on the FPGA, that coud come out at a higher or at a lower clock rate than previous Motorola products. Depending on the price of the FPGA, some particular "possible" performance may or may not be worth achieving. (I'm not going to pay $10000 (ten grand) for a particular FPGA to achieve the highest possible speed, but I'll pay a few hundred maybe to get the best we can from that price range)

And now that we have SoC type FPGAs coming out, with hard-wired ASIC style ARM processors inside them, we have a new possibility. The FPGA part for IO/connetivity interoperation with whatever (060 PCB socket) and/or for Minimig circuitry, and then emulate the 68k in software on the ARM. Interpret it (Cyclone emulation) or jit (might need to be created). I'm not sure how either form of software emulation in the ARM, at the hardwired ARM clock rates, would compare to FPGA implemented softcore 68k processor.

wawrzon · « **Reply #21 on:** April 17, 2014, 09:06:03 PM »

Quote from: NovaCoder;762661

I was just struggling to think of something else for 68k that could make use of more that 130 MIPS

cube? the executable ive here gave me about 2-3 fps on 060/50+voodoo3. alas i dont even remember where it comes from (perhaps from alain) because it has a bug in mouse handling it would be good to correct at least for those who want to run it under uae.

ElPolloDiabl · « **Reply #22 on:** April 17, 2014, 10:23:00 PM »

PowerPC first came out with short pipes. Then it got as large as an x86.
Intel has the bonus of very fast cache, which gives it a serious lead over any amd chip.

040: 8k cache 25mhz
060 16k cache 50mhz (100mhz internal)

The access to RAM probably held back the speed, back then?

Iggy · « **Reply #23 on:** April 18, 2014, 12:40:12 AM »

Frankly, I'd take a 68030 with a full 68882.
Make that as fast as possible and who needs an '060 (unless you enjoy compatibility issues).

SpeedGeek · « **Reply #24 on:** April 18, 2014, 02:57:31 PM »

Quote from: matthey;762628

The internal clock speed of the fpga is not the same as the speed an fpga CPU will run. It will be less even with a deep pipeline.

The same thing can be said about the 68K family variants themselves. The effective CLK speed of internal pipelines and other logic functions is one very big variable. So chip manufacturers rate their chips at a given input CLK speed, otherwise possible effective CLK speed ratings would be practically useless. Do you want to buy a CPU or FPGA with a typical CLK speed rating of 50-200 MHz?

Quote from: billt;762754

Not that simple. When they say an "FPGA runs at 250MHz", what does that actually mean? What is the FPGA configured to do/be for that measurement? Different designs will have different pathways in the FPGA, and thus different results and different max running speeds. You're extremely unlikely to end up 1:1 with whatever that 250MHz claim, or any other clock rate claim is based on in an FPGA. Marketing people at chip compnies feel the need to say things, but IMHO, stating system clock rates for an FPGA chip doesn't make very much sense. If a chip vendor is saying a clock rate based on some reference design they've put into each FPGA, the exact same reference design, then such a clock rate might be a useful general comparison of speed from one FPGA product to another, so you might be able to say that Generation 2 of our product runs at approximately 2x clock rate as our Generation 1 product. But in terms of knowing what clock rate your design (such as tg68 or 68K00 or 68K30) will run at is pretty useless. You learn this when you define a clock period and run synthesis if that speed works or not. (Done by timing analysis of the synthesized and place/routed result in the FPGA tools) If you decide on a target clock speed for your final product based on whatever particular FPGA chip, then you essentially get a yes or no answer from the timing checks. If yes, maybe you can go even faster, and try that. If no, then either work on improving your design and creep closer toward your goal, or realize that you need to change your goal (and datasheet and marketing) or maybe change your FPGA to a faster one. Even if whatever marketing blurb said the chip runs at your target clock rate.

I used to be part of a team that designed FPGA silicon, so that other people could buy those FPGA chips and put their own stuff inside. The architect/team lead once said that the 0.35micron chips could do 350MHz, as long as you only wanted an inverter chain that never left the IO buffers to get into the core of the chip. That wouldn't be very useful. Any design that was inside the core would be slower than 350MHz due to the longer pathways.

A simple implementation of a particular instruction set processor will run at a slower clock rate than a pipelined implementation of the same instruction set thing. A longer pipeline design will run at a faster clock rate than a shorter pipeline. (The "simple design" is essentially a 1-stage pipeline)

But longer pipelines waste more time on a branch (stuff happening in the pipeline that gets thrown out, then refill the pipeline with the branch's stuff) than a shorter pipeline design. So there is a tradeoff between pipeline length, clock speed and branching.

You don't need so much to reverse-engineer the 060 chip. Anything you put into an FPGA will be somewhat different from the 060 silicon anyway. No FPGA place and route tool will give you exactly what's in the 060 silicon, even if it came from the same RTL, which isn't very likely to be the case anyway. You'd have to tool the RTL to the FPGA paradigm, and perhaps to some extent to your particular FPGA and tool (ISE or Quartus, etc.)

What such a design would really be, is an entirely new implementation of the instruction set. That instruction set is published and known publicly. It would be nice to do that as portable-friendly as possible, considering that there may be some differences in inferring memories (FIFOs, RAMs cor cache, etc) from one FPGA architecture to another (spartan3 vs spartan6 vs Cyclone5 etc) or tool to another (ISE vs Quartus), so such things should find their way into instantiable blocks to keep the rest of the RTL code portable. Such things were likely instantiable blocks at Motorola way back when, but as they were targeting a particular fab process, they may have coded to whatever their memory compiler tool (automated memory block generator to give you a chunk of silicon layout, and whatever connections that comes out of that). One would have to think of a good generic "API" of connections to fit a generic memory into, and then make such a defined wrapper around any particular FPGA inferred or instantiated memory block to keep the parent block nice and generic.

Anyway... Someone needs to define an instruction fetch and parse unit, an ALU block, etc. that is compatible with the 68060 instruction set. Or 68020 or 030 instruction set, whatever is really the best choice.

We tend to say 68060, as we perceive that to be the newest and fastest 68k chip. Motorola made certain decisions that all together concluded with the 68060 implementation. Such decisions would consider how often an instruction has actually been used, how complicated it is (and thus how much it affects die area/cost, power, and max clock rate capability)

We might today want to reconsider certain decisions, such as added or removed instructions compared to earlier 68k family parts, and make some adjustments. Maybe add some instructions back in. Maybe redo the 68040 instruction set with more of a 68060 block diagram. Or an 020, whichever particular set of instructions is most beneficial. If AmigaOS or some target application software, or more recent compiler innovations tend to make frequent use of some instruction that was removed in the 68060, and thus spends a lot of time emulating that instruction on a real 68060 accelerator, then maybe we should put that instruction back in. Putting back certain instructions can make for a better performance of the software running an 020-optimized binary compared to an 060-optimized binary running on an exact-as-possible 060 implementation.

This is not really reverse-engineering. It would be a new engineering of the published 68k instruction set. Maybe a new engineering of a new combination of all 68k family instructions not before seen in any particular silicon from Motorola/Freescale... Or even a superset, adding in some entirely new instructions in addition to whatever we take from the Motorola books. (SIMD anyone??) Apply whatever microprocessor design concepts you like. Make an 020 instruction super-scalar. Or make an 060 instruction set not super-scalar. Whatever floats your boat. I'm not sure how fancy the tg68 is, or 68k00 or any other of the several 68k softcores out there. (ao_68000 etc)

This is an opportunity to do something even more modern in concept than any 68k ever was. Depending on the FPGA, that coud come out at a higher or at a lower clock rate than previous Motorola products. Depending on the price of the FPGA, some particular "possible" performance may or may not be worth achieving. (I'm not going to pay $10000 (ten grand) for a particular FPGA to achieve the highest possible speed, but I'll pay a few hundred maybe to get the best we can from that price range)

And now that we have SoC type FPGAs coming out, with hard-wired ASIC style ARM processors inside them, we have a new possibility. The FPGA part for IO/connetivity interoperation with whatever (060 PCB socket) and/or for Minimig circuitry, and then emulate the 68k in software on the ARM. Interpret it (Cyclone emulation) or jit (might need to be created). I'm not sure how either form of software emulation in the ARM, at the hardwired ARM clock rates, would compare to FPGA implemented softcore 68k processor.

As explained above you don't get 1:1 with any of the 68K family variants for most internal operations so why would you expect 1:1 with an FPGA?

There is quite bit more to emulating an 060 than implementing the 68K instruction set. What about the MMU, FPU, bus arbitration, cycle termination, interrupts, exception handling, RESET operation, 1/2 CLK bus speed operation, etc?

psxphill · « **Reply #25 on:** April 18, 2014, 03:53:13 PM »

Quote from: billt;762754

We might today want to reconsider certain decisions

I'm sure someday someone will figure out that every time that happens the project dies, then we can finally have a 68060 cpu/fpu/mmu in an fpga.

Once it's done and open source then everyone can discuss whether it's worth making any changes, without using the barrier to entry as a way of keeping control.

billt · « **Reply #26 on:** April 18, 2014, 05:08:55 PM »

Quote from: psxphill;762832

I'm sure someday someone will figure out that every time that happens the project dies, then we can finally have a 68060 cpu/fpu/mmu in an fpga.

Once it's done and open source then everyone can discuss whether it's worth making any changes, without using the barrier to entry as a way of keeping control.

Well, if certain decisions are made, then it's done, and then we reconsider some things, it could lead to a lot of wasted time. It may be better to do some types of considerations before spending so much time on design and RTL coding. Certain things are significant, and to change them would be to throw it all away and start over, so do those things the first time rather than the second time around for the same project. If we want to do a 68030+68882 instruction set, with 4 integer units and 3 FP units, using an 8 stage pipeline, with the 68060 "native bus", then some of that would be horrible to change or add in later. It would be much easier to do the big things from the start. Let's consider tg68 or 68x00 cores as the "first time through" and not waste anyone's time doing that equivalent again, just to trash it and start over for the "real" design with the more complicated things later.

Starting with a single ALU, and then later on adding two more of them, is not a trivial change.

Starting with integer, and adding FPU later, is not trivial. It's not as huge as going from one ALU to 2 ALUs. But it's significant.

So I only mean to say, if what you really want to do is X, then do not start working on Y, and then someday try to smash Y into something sortof-kindof like X. Start out working on X. Then, when done, you have X, not a hackish X-wannabe.

Quote from: SpeedGeek;762825

There is quite bit more to emulating an 060 than implementing the 68K instruction set. What about the MMU, FPU, bus arbitration, cycle termination, interrupts, exception handling, RESET operation, 1/2 CLK bus speed operation, etc?

Yes, you will also have to read the bus spec and implement that if it plugs into a legacy chip socket on some PCB. The ao_68000 softcore is an example of implementing the instruction set, but it does not have a 68k bus of any flavor, it has a Wishbone bus instead, and so would need additional work to fit it into an existing PCB socket for a Motorola chip. That's likely doable though. For the most part, I take it as implied that things will be made to fit onto whatever target socket/bus there is to do. Some of what you mentioned did need to be done in Majsta's Vampire, as the tg68 at the time he started Vampire was not a real 68000 interface either. Close in most ways, but a few places of artistic license diverged from Motorola's socket waveforms. Yet it got a wrapper and now goes onto a conventional 68000 motherboard...

Understand that while we need to work on the target socket or bus of interest, we can turn that into anything else inside the FPGA. The socket might run at 50MHz, but we can have as fast a CPU core inside that the design will run at in the chip. If an 060 took a 100MHz clock to run the CPU at 50MHz, we don't need to run the FPGA softcore at 50MHz as well, just because Motorola did that. We an choose to diverge from that decision that Motorola made.

Regardless of what someone intends to do, there will be no 100% clone of 68060 or anything else. There will only be a new design, which may or may not be compatible with software or socket.

Here's a divergent idea... Make an FPGA board that fits into a Pentium socket, but has a 680x0 core inside... Then put Minimig chipset on a PCI card. Hilarity ensues...

Or, put an FPGA on one of them Apple ZIF CPU boards, with a 680x0 softcore inside. Or an FPGA on a Com-Express module, for whatever motherboard you want...

I ordered a Zed board the other day, and am interested in playing with one of the 68k softcores in there and trying to fit the ARM-side peripherals onto it. Minimig plus those things. Which means AXI bus. I'm also very interested in buying an SOCkit or DE1-SoC board for much the same reason, with much the same requirements.

I guess that I'm not really even thinking of cloning an 060. I would really prefer what the title of this thread implies, something different than what Motorola made. Faster than 100MHz. More than that. Return some instructions that Motorola removed, but which we'd today rather have back in there. Reconsider how many of which units are present. (One needs to figure out how to make only a single ALU design work anyway, so why not start with figuring out how to make more units work at that time???) We already have a handful of the simpler designs. I'm interested in seeing something beyond those, and something beyond Motorola's products, as specced before doing any design work to change later.

apj · « **Reply #27 on:** April 18, 2014, 06:34:44 PM »

Quote from: psxphill;762666

First person shooters, PCTask/PCx?

There are probably some wild demos that could benefit as well, which usually only run full speed on WinUAE.

Cube 2 game I ported recently would be the most demanding game at the moment.

psxphill · « **Reply #28 on:** April 19, 2014, 12:02:54 AM »

Quote from: billt;762841

Let's consider tg68 or 68x00 cores as the "first time through" and not waste anyone's time doing that equivalent again, just to trash it and start over for the "real" design with the more complicated things later.

Lets not. If you can't guarantee that every piece of software that works on 68060 will work on this and vice versa then it's pointless. The actual implementation is irrelevant to that, so you can debate that until the heat death of the universe if you want, just as long as the ISA is exactly the same.

You seem intent on wasting everybody's time, so I'm out.

AmigaClassicRule · « **Reply #29 from previous page:** April 19, 2014, 12:58:36 AM »

Sorry I opened this discussion. If this was youtube and by the amount of hits I got from opening this topic I would have being rich by now :-P.

Author Topic: Can FPGA 060 run more than 100 Mhz? (Read 12451 times)

NovaCoder

Re: Can FPGA 060 run more than 100 Mhz?

matthey

Re: Can FPGA 060 run more than 100 Mhz?

psxphill

Re: Can FPGA 060 run more than 100 Mhz?

Lurch

Re: Can FPGA 060 run more than 100 Mhz?

NovaCoder

Re: Can FPGA 060 run more than 100 Mhz?

psxphill

Re: Can FPGA 060 run more than 100 Mhz?

billt

Re: Can FPGA 060 run more than 100 Mhz?

wawrzon

Re: Can FPGA 060 run more than 100 Mhz?

ElPolloDiabl

Re: Can FPGA 060 run more than 100 Mhz?

Iggy

Re: Can FPGA 060 run more than 100 Mhz?

SpeedGeek

Re: Can FPGA 060 run more than 100 Mhz?

psxphill

Re: Can FPGA 060 run more than 100 Mhz?

billt

Re: Can FPGA 060 run more than 100 Mhz?

apj

Re: Can FPGA 060 run more than 100 Mhz?

psxphill

Re: Can FPGA 060 run more than 100 Mhz?

AmigaClassicRule

Re: Can FPGA 060 run more than 100 Mhz?