Amiga.org
Amiga computer related discussion => General chat about Amiga topics => Topic started by: AmigaClassicRule on April 15, 2014, 03:45:53 AM
-
Hey,
I am just curious...can someone make an Amiga FPGA 060 that can run at 233 MHz? Is it possible? How expensive would it be to make it?
-
I don't think it is about expense. The FPGA is a little bit different, so you can't just drop the 060 design into it. There is some design work involved.
It will get there though. 2 years maybe?
-
I am just curious...can someone make an Amiga FPGA 060 that can run at 233 MHz? Is it possible? How expensive would it be to make it?
I think it's possible. There are some very expensive fpgas costing thousands of dollars that should be able to do it. An Altera Stratix (high end) probably could, and maybe an Arria (mid range), which is approaching affordable (hundreds to thousands of dollars). A CPU in a (low end) Cyclone III-V costing ~$50-$500 would probably not exceed 150-200MHz, but I think it could come close to the performance of a 233MHz 68060. Most fpgas have plenty of memory bandwidth but are limited in clock speed. The trick is using parallel operations to take advantage of the memory bandwidth. I believe it's possible for an fpga CPU to substantially exceed the performance per MHz of a 68060. The memory bandwidth can be better, more parallel pipes/operations may be possible, bigger caches are possible, a link stack helps, and ISA improvements should be good. Some things are slower in an fpga, so they would need to be done differently from the way the 68060 does them in order to have good speed. All of this would require a CPU more advanced than the TG68 and would take some time and know-how to complete.
-
I think it's possible. There are some very expensive fpgas costing thousands of dollars that should be able to do it....
Or to put it another way, has electronic design moved forward in the past 20 years ;)
Personally I'd be very happy to see an FPGA card that offered similar CPU performance to my 80MHz Blizzard (100 MIPS) while remaining 110% compatible (add back the dropped instructions). It would of course be nice to see a card with even more performance, but those extra MIPS wouldn't really get much use.
I think price is more important than performance anyway, it would be awesome to see a new 060 'like' FPGA card offered for less than $200.
The other thing to keep in mind is that because the rest of the Classic hardware is so slow (including the bus), there really isn't much point going above 130 MIPS (100MHz Apollo).
Come to think of it, only DosBox AGA and NetSurf AGA need more than 130 MIPS!
;)
-
What about the new additions like USB? Or that would be done by the ARM processor?
-
I would be fine with a 68020 at 200-300 MIPS with FPU emulation too :-D
-
Hey,
I am just curious...can someone make an Amiga FPGA 060 that can run at 233 MHz? Is it possible? How expensive would it be to make it?
Firstly, it's not about creating an FPGA 68060, it's about creating a compatible implementation of the 68060 ISA (IIRC the 68020 ISA is a better target in terms of instructions to implement) that can achieve an overall higher performance, be that via higher IPC (instructions per clock) or higher clocks, or both.
For example, the FPGA core in the Vampire 600 board has a lower IPC than a vanilla 68000, but it runs a lot faster, so the overall speed is improved.
Anyway, with today's FPGA technology, it should be getting to a point where once someone designs a 68020 compatible core that clocks high enough and has decent IPC, we could be looking at higher overall performance than a classic 68060. It won't be cheap, but OTOH you won't need to worry about specific uncommon revisions of the 68060.
-
Firstly, it's not about creating an FPGA 68060, it's about creating a compatible implementation of the 68060 ISA (IIRC the 68020 ISA is a better target in terms of instructions to implement)
The advantage of going for the 68060 ISA is that Motorola removed some instructions that aren't used much, so you have less work to do. You can also make use of the effort people have already made to get most/all software running on an 060.
The 68060 MMU is simpler than the MMU used with the 68020 too.
-
i rather doubt the future of amiga is fpga. fpga is too slow (not even as fast as an 040 on affordable chips), too hard to program (none of the few 68k softcores is even fully 020 compliant, not to speak of fpu and mmu) and too expensive overall.
if there is any future it's with amithlon-like systems imho. either x86 based, optionally with some fpga expansion for backwards compatibility, or genuine amiga based with an x86 accelerator running 68k emu. those are the only options i see, otherwise we need to stick to what we already have.
not saying fpga projects should be given up, but i wouldn't expect too much.
-
Personally I'd be very happy to see an FPGA card that offered similar CPU performance to my 80MHz Blizzard (100 MIPS) while remaining 110% compatible (add back the dropped instructions). It would of course be nice to see a card with even more performance, but those extra MIPS wouldn't really get much use.
Adding the extra instructions back isn't important if the emulation library is loaded before booting and is upgradeable. The missing 64-bit MUL and DIV instructions should be added back, as they are in modern hardware. It was a mistake to remove them: compilers like GCC were already making good use of 64-bit MUL to turn immediate divisions into multiplies using invert-and-multiply, with large savings. MOVEP was used by some old games (Sierra, for example) but the encoding is bad (it was a kludge implemented for 8-bit peripheral support), so trapping it like the 68060 does makes sense. Most of the other instructions the 68060 dropped from hardware were rarely ever used on the Amiga, and it doesn't make sense to add them back.
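As an illustrative sketch (not from this thread), here is the invert-and-multiply trick in Python: a division by a constant becomes a 64-bit multiply plus a shift, which is why losing the 64-bit MULU.L hurts. The magic constant below is the standard one for dividing by 10.

```python
def div_by_10(n: int) -> int:
    """Unsigned 32-bit n // 10 computed as a 64-bit multiply plus a
    shift, the transformation a compiler like GCC performs when a
    64-bit MUL is available. 0xCCCCCCCD is ceil(2**35 / 10), so
    (n * M) >> 35 == n // 10 for every 32-bit n."""
    assert 0 <= n < 2**32
    M = 0xCCCCCCCD
    product = n * M      # fits in 64 bits, like MULU.L's register pair
    return product >> 35  # keep the high bits: that's the quotient
```

The multiply-and-shift costs a couple of cycles; a hardware divide costs dozens, which is where the "large savings" come from.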
Larger caches would normally be a problem, but a modern fpga CPU would probably use writethrough caching for simplicity instead of copyback caching. It can do bus snooping for old programs that didn't properly flush the caches or used self-modifying code. This is potentially more compatible for caching than a 68040 or 68060.
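A toy model of the idea (the class and method names are invented for illustration, nothing here is real 68k cache hardware): a write-through cache keeps memory current on every CPU write, and a bus snoop invalidates a cached line when another master writes behind the CPU's back.

```python
class WriteThroughCache:
    """Toy direct-mapped write-through cache with bus snooping."""

    def __init__(self, mem, nlines=4):
        self.mem = mem           # backing store, e.g. a dict
        self.nlines = nlines
        self.line = {}           # cache index -> (addr, value)

    def read(self, addr):
        idx = addr % self.nlines
        entry = self.line.get(idx)
        if entry is None or entry[0] != addr:
            self.line[idx] = (addr, self.mem[addr])  # miss: fill from memory
        return self.line[idx][1]

    def write(self, addr, value):
        self.mem[addr] = value                       # write-through: memory stays current
        self.line[addr % self.nlines] = (addr, value)

    def snoop_write(self, addr, value):
        # another bus master (blitter, DMA) wrote memory:
        # invalidate our copy so the next read refetches
        self.mem[addr] = value
        idx = addr % self.nlines
        if self.line.get(idx, (None, None))[0] == addr:
            del self.line[idx]
```

With copyback caching the cache itself can hold the newest data, so snooping (and flushing) gets much more involved; that is the simplicity argument above.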
I think price is more important than performance anyway, it would be awesome to see a new 060 'like' FPGA card offered for less than $200.
That price may be possible (in U.S. or Aussie dollars), but most people would want more than a bare-bones fpga with a faster CPU. At least 64MB of memory is appropriate. Some kind of HD-equivalent I/O avoids a bottleneck, and SD or CF is pretty cheap. USB and/or ethernet is nice also, but it all adds to the price.
The other thing to keep in mind is that because the rest of the Classic hardware is so slow (including the bus), there really isn't much point going above 130 MIPS (100MHz Apollo).
Come to think of it, only DosBox AGA and NetSurf AGA need more than 130 MIPS!
The ECS/AGA gfx become a bottleneck and then it makes sense to add gfx on the accelerator and/or move to a new stand alone board or motherboard replacement. A faster CPU and gfx will allow you to port more modern games and use DOSBox for older games instead of porting them ;).
-
It can do bus snooping for old programs that didn't properly flush the caches or used self-modifying code. This is potentially more compatible for caching than a 68040 or 68060.
It's likely that disabling the cache will be fine for those, the problem of old games has been solved already for the real 68060.
Any software that people write now needs to run on a 68060 as well, so really there is no benefit in deviating from the published specification. If people want to wave their dicks at Motorola at how great a cpu they have designed, then it would be really great if they could do that after creating a 100% compatible 68060.
-
250MHz FPGA chips have been available for quite a while now. That's the easy part... the hard part is reverse engineering the 060 and programming the FPGA to correctly emulate an 060 under a fairly large number of different operating conditions.
-
Personally I'd be very happy to see an FPGA card that offered similar CPU performance to my 80MHz Blizzard (100 MIPS) while remaining 110% compatible (add back the dropped instructions). It would of course be nice to see a card with even more performance, but those extra MIPS wouldn't really get much use.
I think price is more important than performance anyway, it would be awesome to see a new 060 'like' FPGA card offered for less than $200.
The other thing to keep in mind is that because the rest of the Classic hardware is so slow (including the bus), there really isn't much point going above 130 MIPS (100MHz Apollo).
Come to think of it, only DosBox AGA and NetSurf AGA need more than 130 MIPS!
Wholeheartedly agree. Compatibility and price are definitely weighing factors when it comes down to it.
-
while remaining 110% compatible (add back the dropped instructions).
You can't be 110% compatible, if you deviate then you're less than 100% compatible.
Unless all exceptions are generated the same as the 68060 (unimplemented instruction etc) then it's not compatible.
-
The ECS/AGA gfx become a bottleneck and then it makes sense to add gfx on the accelerator and/or move to a new stand alone board or motherboard replacement. A faster CPU and gfx will allow you to port more modern games and use DOSBox for older games instead of porting them ;).
When you go down that road (GFX card attached to the accelerator) you might as well replace the entire motherboard (FPGA Arcade style). If you want to keep it authentic (e.g. use AGA) then you can do some very impressive things if you have enough MIPS, even the HD transfer rate will benefit if the card had the same kind of memory timing improvements used by the ACA cards.
Hopefully someone will do it one day; maybe the Vampire600 will come back to life and inspire someone to try the same thing for the A1200 and big box Amigas :)
-
250MHz FPGA chips have been available for quite a while now. That's the easy part... the hard part is reverse engineering the 060 and programming the FPGA to correctly emulate an 060 under a fairly large number of different operating conditions.
The internal clock speed of the fpga is not the same as the speed an fpga CPU will run at. It will be less, even with a deep pipeline.
You can't be 110% compatible, if you deviate then you're less than 100% compatible.
Unless all exceptions are generated the same as the 68060 (unimplemented instruction etc) then it's not compatible.
Handling more in hardware without an exception shouldn't cause a problem. The way exceptions are handled is similar across the 68k family, although some 68k processors generate different exceptions, and the stack frames used by different 68k processors differ. A new fpga CPU would probably be different enough that some changes and incompatibilities would be necessary in Supervisor mode, like the 68060 over the 68040. Is it better to create a 68060 "copy" that has a limited future, or an fpga CPU that can be further developed, improved in performance and possibly adopted for embedded applications as well as retro use?
-
Is it better to create a 68060 "copy" that has a limited future or an fpga CPU that can be further developed, improved in performance and possibly adopted for embedded applications as well as retro use?
It is better to create a 68060 "copy" that has a guaranteed future for retro use because it is the only accurate one.
People using softcores for embedded use don't have any emotional attachment, so they will go with whatever is best/cheapest/etc. Starting with a 35-year-old ISA is not going to help. Any traction you get today could be lost tomorrow, and you're left with something that fits no purpose.
So I will rewrite your loaded question:
Is it better to create a 68060 "copy" that will be used in Amiga/Mac/AtariST fpga cores for many years to come, or an fpga CPU that will get abandoned when it can't keep up with other designs that aren't limited to a 1978 ISA?
Or do both, but create the 68060 "copy" first because that is where it will get the most use.
-
Come to think of it, only DosBox AGA and NetSurf AGA need more than 130 MIPS!
;)
Netsurf isn't that bad at 100 MIPS :-) What makes it appear slower than, say, ibrowse is that the 68k version appears to try to load the entire page before displaying it.
I use netsurf now and then with RTG; quite usable. Of course faster is always better.
-
Netsurf isn't that bad at 100 MIPS :-) What makes it appear slower than, say, ibrowse is that the 68k version appears to try to load the entire page before displaying it.
I use netsurf now and then with RTG; quite usable. Of course faster is always better.
I was just struggling to think of something else for 68k that could make use of more than 130 MIPS ;)
OK how about 'The Curse of Monkey Island', that sucker could really use 200+ MIPS :)
-
I was just struggling to think of something else for 68k that could make use of more than 130 MIPS ;)
First person shooters, PCTask/PCx?
There are probably some wild demos that could benefit as well, which usually only run full speed on WinUAE.
-
250MHz FPGA chips have been available for quite a while now. That's the easy part...
Not that simple. When they say an "FPGA runs at 250MHz", what does that actually mean? What is the FPGA configured to do/be for that measurement? Different designs will have different pathways in the FPGA, and thus different results and different max running speeds. You're extremely unlikely to end up 1:1 with whatever that 250MHz claim, or any other clock rate claim, is based on. Marketing people at chip companies feel the need to say things, but IMHO, stating system clock rates for an FPGA chip doesn't make very much sense. If a chip vendor quotes a clock rate based on the exact same reference design put into each FPGA, then such a clock rate might be a useful general comparison of speed from one FPGA product to another, so you might be able to say that Generation 2 of our product runs at approximately 2x the clock rate of our Generation 1 product. But for knowing what clock rate your design (such as tg68 or 68K00 or 68K30) will run at, it's pretty useless.
You learn whether a given speed works when you define a clock period and run synthesis, via timing analysis of the synthesized and placed-and-routed result in the FPGA tools. If you decide on a target clock speed for your final product on a particular FPGA chip, then you essentially get a yes or no answer from the timing checks. If yes, maybe you can go even faster, so try that. If no, then either work on improving your design and creep closer toward your goal, or realize that you need to change your goal (and datasheet and marketing), or maybe change your FPGA to a faster one. Even if whatever marketing blurb said the chip runs at your target clock rate.
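To make the yes-or-no answer from the timing checks concrete, here is the arithmetic the tools are doing for the slowest register-to-register path (the delay figures below are invented for illustration, not taken from any datasheet):

```python
def fmax_mhz(t_clk_to_q_ns, t_logic_ns, t_routing_ns, t_setup_ns):
    """Max clock implied by the critical path: one clock period must
    cover the source register's clock-to-Q delay, the logic and
    routing delay between registers, and the destination's setup time."""
    period_ns = t_clk_to_q_ns + t_logic_ns + t_routing_ns + t_setup_ns
    return 1000.0 / period_ns

def meets_timing(target_mhz, t_clk_to_q_ns, t_logic_ns, t_routing_ns, t_setup_ns):
    """The yes/no answer: does the worst path fit in the target period?"""
    return fmax_mhz(t_clk_to_q_ns, t_logic_ns, t_routing_ns, t_setup_ns) >= target_mhz
```

A path with 0.5ns clock-to-Q, 3ns of logic, 1ns of routing and 0.5ns setup closes at 200MHz, so it passes a 150MHz target and fails a 250MHz one; shortening the logic between registers (i.e. pipelining deeper) is how you raise the number.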
I used to be part of a team that designed FPGA silicon, so that other people could buy those FPGA chips and put their own stuff inside. The architect/team lead once said that the 0.35micron chips could do 350MHz, as long as you only wanted an inverter chain that never left the IO buffers to get into the core of the chip. That wouldn't be very useful. Any design that was inside the core would be slower than 350MHz due to the longer pathways.
A simple implementation of a particular instruction set processor will run at a slower clock rate than a pipelined implementation of the same instruction set. A longer pipeline design will run at a faster clock rate than a shorter pipeline. (The "simple design" is essentially a 1-stage pipeline.)
But longer pipelines waste more time on a branch (stuff happening in the pipeline that gets thrown out, then refill the pipeline with the branch's stuff) than a shorter pipeline design. So there is a tradeoff between pipeline length, clock speed and branching.
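The pipeline tradeoff above can be put in rough numbers (the branch statistics and clock rates below are made up for illustration, not measurements of any real core):

```python
def effective_mips(clock_mhz, branch_freq, mispredict_rate, flush_cycles):
    """MIPS for a scalar pipeline that ideally retires one instruction
    per clock but loses flush_cycles refilling after each mispredicted
    branch. CPI = 1 + (branches per instr) * (mispredict rate) * (flush cost)."""
    cpi = 1.0 + branch_freq * mispredict_rate * flush_cycles
    return clock_mhz / cpi

# Shorter pipeline: lower clock, cheap flushes.
short_pipe = effective_mips(100, branch_freq=0.2, mispredict_rate=0.1, flush_cycles=3)
# Deeper pipeline: higher clock, expensive flushes.
deep_pipe = effective_mips(160, branch_freq=0.2, mispredict_rate=0.1, flush_cycles=7)
```

With these invented numbers the deeper pipe still comes out ahead; with a much worse branch predictor the balance flips, which is exactly the tradeoff between pipeline length, clock speed and branching.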
the hard part is reverse engineering the 060 and programming the FPGA to correctly emulate an 060 under a fairly large number of different operating conditions.
You don't so much need to reverse-engineer the 060 chip. Anything you put into an FPGA will be somewhat different from the 060 silicon anyway. No FPGA place-and-route tool will give you exactly what's in the 060 silicon, even if it came from the same RTL, which isn't very likely to be the case anyway. You'd have to retool the RTL to the FPGA paradigm, and perhaps to some extent to your particular FPGA and tool (ISE or Quartus, etc.)
What such a design would really be is an entirely new implementation of the instruction set. That instruction set is published and known publicly. It would be nice to do it as portably as possible, considering that there may be some differences in inferring memories (FIFOs, RAMs for cache, etc) from one FPGA architecture to another (Spartan3 vs Spartan6 vs Cyclone5 etc) or one tool to another (ISE vs Quartus), so such things should find their way into instantiable blocks to keep the rest of the RTL code portable. Such things were likely instantiable blocks at Motorola way back when, but as they were targeting a particular fab process, they may have coded to whatever their memory compiler tool produced (an automated memory block generator that gives you a chunk of silicon layout and whatever connections come with it). One would have to think of a good generic "API" of connections to fit a generic memory into, and then make such a defined wrapper around any particular FPGA inferred or instantiated memory block to keep the parent block nice and generic.
Anyway... Someone needs to define an instruction fetch and parse unit, an ALU block, etc. that is compatible with the 68060 instruction set. Or 68020 or 030 instruction set, whatever is really the best choice.
We tend to say 68060, as we perceive that to be the newest and fastest 68k chip. Motorola made certain decisions that all together concluded with the 68060 implementation. Such decisions would consider how often an instruction has actually been used, how complicated it is (and thus how much it affects die area/cost, power, and max clock rate capability)
We might today want to reconsider certain decisions, such as instructions added or removed compared to earlier 68k family parts, and make some adjustments. Maybe add some instructions back in. Maybe redo the 68040 instruction set with more of a 68060 block diagram. Or an 020, whichever particular set of instructions is most beneficial. If AmigaOS, some target application software, or more recent compiler innovations make frequent use of some instruction that was removed in the 68060, and thus spend a lot of time emulating that instruction on a real 68060 accelerator, then maybe we should put that instruction back in. Putting back certain instructions can make an 020-optimized binary perform better than an 060-optimized binary running on an exact-as-possible 060 implementation.
This is not really reverse-engineering. It would be a new engineering of the published 68k instruction set. Maybe a new engineering of a new combination of all 68k family instructions not before seen in any particular silicon from Motorola/Freescale... Or even a superset, adding in some entirely new instructions in addition to whatever we take from the Motorola books. (SIMD anyone??) Apply whatever microprocessor design concepts you like. Make an 020 instruction super-scalar. Or make an 060 instruction set not super-scalar. Whatever floats your boat. I'm not sure how fancy the tg68 is, or 68k00 or any other of the several 68k softcores out there. (ao_68000 etc)
This is an opportunity to do something even more modern in concept than any 68k ever was. Depending on the FPGA, that could come out at a higher or at a lower clock rate than previous Motorola products. Depending on the price of the FPGA, some particular "possible" performance may or may not be worth achieving. (I'm not going to pay $10000 (ten grand) for a particular FPGA to achieve the highest possible speed, but I'll pay a few hundred maybe to get the best we can from that price range.)
And now that we have SoC-type FPGAs coming out, with hard-wired ASIC-style ARM processors inside them, we have a new possibility: use the FPGA part for IO/connectivity interoperation with whatever is there (060 PCB socket) and/or for Minimig circuitry, and then emulate the 68k in software on the ARM. Interpret it (Cyclone emulation) or JIT it (which might need to be created). I'm not sure how either form of software emulation on the ARM, at the hardwired ARM clock rates, would compare to an FPGA-implemented softcore 68k processor.
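For the "interpret it" option, the inner loop of such an emulator looks roughly like this sketch (the opcodes and their encodings here are invented for illustration; real 68k decode is far more involved):

```python
def interpret(program, steps):
    """Toy fetch-decode-execute loop in the style of an interpreting
    emulator. regs stands in for D0-D7; the three-field pre-decoded
    instruction tuples are a made-up format, not real 68k opcodes."""
    regs = [0] * 8
    pc = 0
    for _ in range(steps):
        op, a, b = program[pc]       # fetch + decode
        if op == "moveq":
            regs[a] = b              # load immediate into register a
        elif op == "add":
            regs[a] = (regs[a] + regs[b]) & 0xFFFFFFFF  # 32-bit wrap
        elif op == "bra":
            pc = b - 1               # branch target, minus the increment below
        pc += 1
    return regs
```

Every emulated instruction costs a dispatch plus the handler body, which is why a JIT (translating hot blocks to native ARM code once) can be dramatically faster than interpreting.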
-
I was just struggling to think of something else for 68k that could make use of more than 130 MIPS ;)
cube? the executable i've got here gave me about 2-3 fps on an 060/50+voodoo3. alas i don't even remember where it came from (perhaps from alain). it has a bug in mouse handling that would be good to correct, at least for those who want to run it under uae.
-
PowerPC first came out with short pipes. Then they got as long as an x86's.
Intel has the bonus of very fast cache, which gives it a serious lead over any AMD chip.
040: 8k cache, 25MHz
060: 16k cache, 50MHz (100MHz internal)
The access to RAM probably held back the speed, back then?
-
Frankly, I'd take a 68030 with a full 68882.
Make that as fast as possible and who needs an '060 (unless you enjoy compatibility issues).
-
The internal clock speed of the fpga is not the same as the speed an fpga CPU will run at. It will be less, even with a deep pipeline.
The same thing can be said about the 68K family variants themselves. The effective CLK speed of internal pipelines and other logic functions is one very big variable. So chip manufacturers rate their chips at a given input CLK speed, otherwise possible effective CLK speed ratings would be practically useless. Do you want to buy a CPU or FPGA with a typical CLK speed rating of 50-200 MHz?
Not that simple. When they say an "FPGA runs at 250MHz", what does that actually mean? What is the FPGA configured to do/be for that measurement? ...
You don't so much need to reverse-engineer the 060 chip. Anything you put into an FPGA will be somewhat different from the 060 silicon anyway. ...
As explained above, you don't get 1:1 with any of the 68K family variants for most internal operations, so why would you expect 1:1 with an FPGA?
There is quite a bit more to emulating an 060 than implementing the 68K instruction set. What about the MMU, FPU, bus arbitration, cycle termination, interrupts, exception handling, RESET operation, 1/2 CLK bus speed operation, etc?
-
We might today want to reconsider certain decisions
I'm sure someday someone will figure out that every time that happens the project dies, then we can finally have a 68060 cpu/fpu/mmu in an fpga.
Once it's done and open source then everyone can discuss whether it's worth making any changes, without using the barrier to entry as a way of keeping control.
-
I'm sure someday someone will figure out that every time that happens the project dies, then we can finally have a 68060 cpu/fpu/mmu in an fpga.
Once it's done and open source then everyone can discuss whether it's worth making any changes, without using the barrier to entry as a way of keeping control.
Well, if certain decisions are made, then it's done, and then we reconsider some things, it could lead to a lot of wasted time. It may be better to do some types of considerations before spending so much time on design and RTL coding. Certain things are significant, and to change them would be to throw it all away and start over, so do those things the first time rather than the second time around for the same project. If we want to do a 68030+68882 instruction set, with 4 integer units and 3 FP units, using an 8 stage pipeline, with the 68060 "native bus", then some of that would be horrible to change or add in later. It would be much easier to do the big things from the start. Let's consider tg68 or 68x00 cores as the "first time through" and not waste anyone's time doing that equivalent again, just to trash it and start over for the "real" design with the more complicated things later.
Starting with a single ALU, and then later on adding two more of them, is not a trivial change.
Starting with integer, and adding FPU later, is not trivial. It's not as huge as going from one ALU to 2 ALUs. But it's significant.
So I only mean to say, if what you really want to do is X, then do not start working on Y, and then someday try to smash Y into something sortof-kindof like X. Start out working on X. Then, when done, you have X, not a hackish X-wannabe.
There is quite a bit more to emulating an 060 than implementing the 68K instruction set. What about the MMU, FPU, bus arbitration, cycle termination, interrupts, exception handling, RESET operation, 1/2 CLK bus speed operation, etc?
Yes, you will also have to read the bus spec and implement that if it plugs into a legacy chip socket on some PCB. The ao_68000 softcore is an example of implementing the instruction set, but it does not have a 68k bus of any flavor, it has a Wishbone bus instead, and so would need additional work to fit it into an existing PCB socket for a Motorola chip. That's likely doable though. For the most part, I take it as implied that things will be made to fit onto whatever target socket/bus there is to do. Some of what you mentioned did need to be done in Majsta's Vampire, as the tg68 at the time he started Vampire was not a real 68000 interface either. Close in most ways, but a few places of artistic license diverged from Motorola's socket waveforms. Yet it got a wrapper and now goes onto a conventional 68000 motherboard...
Understand that while we need to work on the target socket or bus of interest, we can turn that into anything else inside the FPGA. The socket might run at 50MHz, but we can run the CPU core inside as fast as the design will go in the chip. If an 060 took a 100MHz clock to run the CPU at 50MHz, we don't need to run the FPGA softcore at 50MHz as well, just because Motorola did that. We can choose to diverge from that decision that Motorola made.
Regardless of what someone intends to do, there will be no 100% clone of 68060 or anything else. There will only be a new design, which may or may not be compatible with software or socket.
Here's a divergent idea... Make an FPGA board that fits into a Pentium socket, but has a 680x0 core inside... Then put Minimig chipset on a PCI card. Hilarity ensues... :) Or, put an FPGA on one of them Apple ZIF CPU boards, with a 680x0 softcore inside. Or an FPGA on a Com-Express module, for whatever motherboard you want...
I ordered a Zed board the other day, and am interested in playing with one of the 68k softcores in there and trying to fit the ARM-side peripherals onto it. Minimig plus those things. Which means AXI bus. I'm also very interested in buying an SOCkit or DE1-SoC board for much the same reason, with much the same requirements.
I guess that I'm not really even thinking of cloning an 060. I would really prefer what the title of this thread implies, something different than what Motorola made. Faster than 100MHz. More than that. Return some instructions that Motorola removed, but which we'd today rather have back in there. Reconsider how many of which units are present. (One needs to figure out how to make even a single-ALU design work anyway, so why not figure out how to make more units work at the same time???) We already have a handful of the simpler designs. I'm interested in seeing something beyond those, and something beyond Motorola's products, specced that way from the start rather than changed after the design work is done.
-
First person shooters, PCTask/PCx?
There are probably some wild demos that could benefit as well, which usually only run full speed on WinUAE.
Cube 2 game I ported recently would be the most demanding game at the moment.
-
Let's consider tg68 or 68x00 cores as the "first time through" and not waste anyone's time doing that equivalent again, just to trash it and start over for the "real" design with the more complicated things later.
Let's not. If you can't guarantee that every piece of software that works on the 68060 will work on this and vice versa, then it's pointless. The actual implementation is irrelevant to that, so you can debate that until the heat death of the universe if you want, just as long as the ISA is exactly the same.
You seem intent on wasting everybody's time, so I'm out.
-
Sorry I opened this discussion. If this was YouTube, then with the number of hits this topic got I would have been rich by now :-P.
-
Let's not. If you can't guarantee that every piece of software that works on the 68060 will work on this and vice versa, then it's pointless. The actual implementation is irrelevant to that, so you can debate that until the heat death of the universe if you want, just as long as the ISA is exactly the same.
You seem intent on wasting everybody's time, so I'm out.
I think that's a bit rash; it's a very technical topic and deserves some technical discussion and debating of ideas. Just because I think I'm right and you think I'm wrong doesn't mean there aren't good things to be said in the middle somewhere for both of us.
If all you want is exactly 68060 instruction set, then either add missing instructions to tg68 or something, maybe wait for Suska's 68k30 and start from there, and make it fit an 060 socket. If that's good enough for you, then you and like-minded people can work on that as one project.
Anyone interested in a more advanced design can work on that as another project.
-
Further back in the thread, 100% compatibility was covered. Some people would prefer a cut-down model to improve speed. If you have a cut-down model, just call it 95% compatible.
Being a retail product they will aim for 100% compatibility.
-
Why is 100% compatibility so important?
The A1200 could not run some A500 software; this was seen as a small price to pay for a better Amiga, and people expected the incompatible software would be eclipsed by new software that took advantage of the more powerful hardware.
Whilst there are fewer people to (re)write the software now, it is still beneficial to sacrifice some compatibility for a processor that is better in some way or simply available.
At this stage in the Amiga's life we should start learning to program it, as other retro computer users are learning to program their machines; it is the only way new software is going to be written for these machines.
-
Hello,
here are my thoughts about the subject.
First, I think we must have a partially micro-coded CPU instead of a fully hard-wired one.
I have done that with the J68 68000 core and I can achieve a higher clock rate (90 MHz on a Cyclone III, 300 MHz on a Stratix II).
The IPC is quite bad (90 MHz J68 is equivalent to a 30 MHz 68000) but this is mostly due to the instruction decoding not done in parallel and the lack of an address ALU for the EA computation.
Pipelining is good but creates a lot of hazards in the pipeline.
Another approach is to create a barrel processor running at 200 MHz with 4 threads.
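Purely as an illustration of the barrel idea (the thread count and the toy one-instruction "ISA" below are invented, not taken from the J68 or any real core): threads issue in strict rotation, so two instructions from the same thread are always several cycles apart and no hazard-detection or forwarding logic is needed.

```python
# Toy model of a barrel processor: 4 hardware threads issue in strict
# round-robin, so consecutive instructions from the SAME thread are
# always 4 cycles apart.  With a pipeline shorter than the thread
# count, each result is ready before its thread issues again, so the
# hazard interlocks a normal pipeline needs simply disappear.

class BarrelCPU:
    def __init__(self, num_threads=4):
        self.n = num_threads
        self.pc = [0] * num_threads                        # one PC per thread
        self.regs = [[0] * 8 for _ in range(num_threads)]  # per-thread registers
        self.cycle = 0
        self.issue_log = []                                # which thread issued when

    def step(self, programs):
        tid = self.cycle % self.n        # fixed rotation: no arbitration logic
        prog = programs[tid]
        if self.pc[tid] < len(prog):
            op, reg, val = prog[self.pc[tid]]
            if op == "addi":             # toy instruction: reg += immediate
                self.regs[tid][reg] += val
            self.pc[tid] += 1
            self.issue_log.append(tid)
        self.cycle += 1

# Each thread adds (its id + 1) to its own r0, three times.
programs = [[("addi", 0, t + 1)] * 3 for t in range(4)]
cpu = BarrelCPU()
for _ in range(12):
    cpu.step(programs)
```

At 200MHz with 4 threads, each thread effectively sees a 50MHz CPU of its own, which is exactly why the next line's point about an SMP Exec matters.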
Then, we need a SMP Exec...
Regards,
Frederic
-
Hello,
here are my thoughts about the subject.
First, I think we must have a partially micro-coded CPU instead of a fully hard-wired one.
I have done that with the J68 68000 core and I can achieve a higher clock rate (90 MHz on a Cyclone III, 300 MHz on a Stratix II).
The IPC is quite bad (90 MHz J68 is equivalent to a 30 MHz 68000) but this is mostly due to the instruction decoding not done in parallel and the lack of an address ALU for the EA computation.
Pipelining is good but creates a lot of hazards in the pipeline.
Another approach is to create a barrel processor running at 200 MHz with 4 threads.
Then, we need a SMP Exec...
Regards,
Frederic
I've always wondered what a more capable FPGA could do, but the Stratix II is a little too high-end for my wallet. Plus it's got a lot of capabilities that seem wasted on a 68K emulation project (like the DSPs).
If you REALLY wanted to get extreme, how about Stratix III L?
-
I've always wondered what a more capable FPGA could do, but the Stratix II is a little too high-end for my wallet. Plus it's got a lot of capabilities that seem wasted on a 68K emulation project (like the DSPs).
If you REALLY wanted to get extreme, how about Stratix III L?
You can get pretty cheap Stratix I/II NIOS evaluation boards on ebay.
I have 3 of them : one 1S40, one 2S60ES and one 2S60 ROHS. The last one I got only cost me around 40 bucks (regular price 6 years ago was $1000).
With a friend of mine, we did the cloning of the Atari Jaguar using this board.
Today, the Cyclone V should be about as powerful as the Stratix II (ALM architecture with 6-input LUTs).
Price/performance wise, the Lattice ECP3 is not bad either. Plus, it has a DSP block with a dynamic ALU mode that can be very useful in CPU design.
Regards,
Frederic
-
You can get pretty cheap Stratix I/II NIOS evaluation boards on ebay.
I have 3 of them : one 1S40, one 2S60ES and one 2S60 ROHS. The last one I got only cost me around 40 bucks (regular price 6 years ago was $1000).
With a friend of mine, we did the cloning of the Atari Jaguar using this board.
Today, the Cyclone V should be about as powerful as the Stratix II (ALM architecture with 6-input LUTs).
Price/performance wise, the Lattice ECP3 is not bad either. Plus, it has a DSP block with a dynamic ALU mode that can be very useful in CPU design.
Regards,
Frederic
Thanks, I have been focused on the Cyclone series as I was under the impression that it was the best value.
I'll have to check those out as a Stratix II evaluation board could be quite useful.
-
That IPC seems pretty normal for most CPUs. Only the 060 and the early PowerPC line averaged around 1 instruction per clock cycle.
How does cache affect performance?
-
Whilst there are fewer people to (re)write the software now, it is still beneficial to sacrifice some compatibility for a processor that is better in some way or simply available.
If you start with that mindset you just waste your time debating what compatibility you're willing to sacrifice. With 100% compatibility as the goal you have a non-moving target that you will reach quicker, and you'll likely come up with more novel ways of solving it.
You instantly sidestep the second-system effect (http://en.wikipedia.org/wiki/Second-system_effect), because you're not adding any new features.
You also waste less time at the other end having to patch up existing software. Most software has already been patched up to work on the 68060.
It doesn't mean you can't have a novel method for decoding/despatching instructions etc. Just that whatever the end result is performs identically for all software that is out there (not just Amiga but Mac or Atari and anything that might be using illegal instruction traps for their own purposes).
Anything else is wasted effort. You're not going to be producing the next big CPU for embedded markets, that ship sailed years ago.
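The illegal-instruction-trap point is worth unpacking: it's already how the real 68060 copes with instructions Motorola dropped (MOVEP, 64-bit MUL/DIV, etc.), with 68060.library catching the unimplemented-instruction exception and emulating the effect in software so old binaries run unmodified. A toy Python model of that trap-and-emulate idea (the opcodes and handler names here are invented for illustration, not real 68k encodings):

```python
# Toy trap-and-emulate: the "CPU" raises an exception on an opcode it
# doesn't implement natively, and a software handler table emulates the
# effect -- in the spirit of 68060.library handling dropped instructions.

class UnimplementedInstruction(Exception):
    def __init__(self, opcode):
        self.opcode = opcode

def execute(opcode, regs, handlers):
    """Run one opcode; fall back to a software handler on a trap."""
    try:
        if opcode == "add":                 # natively implemented
            regs["d0"] += regs["d1"]
        else:
            raise UnimplementedInstruction(opcode)
    except UnimplementedInstruction as exc:
        if exc.opcode not in handlers:
            raise                           # genuine illegal instruction
        handlers[exc.opcode](regs)          # emulate in software instead

# Hypothetical handler table emulating a dropped "mul" instruction.
handlers = {"mul": lambda regs: regs.update(d0=regs["d0"] * regs["d1"])}

regs = {"d0": 6, "d1": 7}
execute("mul", regs, handlers)              # trapped and emulated
native = {"d0": 1, "d1": 2}
execute("add", native, handlers)            # runs natively, no trap taken
```

Software that deliberately uses illegal instruction traps for its own purposes would land in the `raise` branch here, which is why a clone has to reproduce the trap behaviour exactly, not just the implemented instructions.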
-
If you start with that mindset you just waste your time debating what compatibility you're willing to sacrifice. With 100% compatibility as the goal you have a non-moving target that you will reach quicker, and you'll likely come up with more novel ways of solving it.
You instantly sidestep the second-system effect (http://en.wikipedia.org/wiki/Second-system_effect), because you're not adding any new features.
You also waste less time at the other end having to patch up existing software. Most software has already been patched up to work on the 68060.
It doesn't mean you can't have a novel method for decoding/despatching instructions etc. Just that whatever the end result is performs identically for all software that is out there (not just Amiga but Mac or Atari and anything that might be using illegal instruction traps for their own purposes).
Anything else is wasted effort. You're not going to be producing the next big CPU for embedded markets, that ship sailed years ago.
Compatible with what?
68000, 68020, 68030, 68030+68882, 68040, 68060 or PPC, if you want compatibility with macs they are now using intel processors.
Does faster execution speed break compatibility?
-
Compatible with what?
68000, 68020, 68030, 68030+68882, 68040, 68060 or PPC, if you want compatibility with macs they are now using intel processors.
Does faster execution speed break compatibility?
Cache breaks compatibility but if you go with unified cache with snooping, self modifying code is even possible (to a certain extent : you have to take into account the instruction prefetch and the pipeline depth).
If the Amiga chipset is implemented inside the FPGA, you can even snoop DMAs and keep cache coherency over Chip RAM.
Due to the way Exec detects CPU, you can have a core with 68000 exception frame and '020 user instructions (long branches, bitfields, 64-bit MUL/DIV and extra EAs).
Regards,
Frederic
-
@ Frederic,
I'm having trouble locating Stratix II based development boards (and most commonly available Cyclone IV boards are crap).
Stratix II and Cyclone III chips are still quite available (at decent prices for the larger chips).
I like the performance benefits of what you have suggested, but it seems like Altera wants to push higher end apps into the Stratix III or higher (and I am not that impressed with the Cyclone V value vs performance ratio).
So, where did you find the dev. boards you have mentioned?
-
Compatible with what?
68000, 68020, 68030, 68030+68882, 68040, 68060 or PPC
It's a 060 thread and I have mentioned 68060 repeatedly.
"Most software has already been patched up to work on the 68060."
if you want compatibility with macs they are now using intel processors.
How would you like me to differentiate between the different Mac models in the future, so that in an 060 thread, where all the conversation is about the 060, you won't get confused into thinking we might be talking about a PPC/X86/X64?
Does faster execution speed break compatibility?
It shouldn't, but it would be something that would only turn up during testing.
Cache breaks compatibility but if you go with unified cache with snooping, self modifying code is even possible (to a certain extent : you have to take into account the instruction prefetch and the pipeline depth).
Adding snooping to detect self modifying code will make it incompatible. How are you going to run software that overwrites itself in ram but keeps executing from cache?
Due to the way Exec detects CPU, you can have a core with 68000 exception frame and '020 user instructions (long branches, bitfields, 64-bit MUL/DIV and extra EAs).
If you think that is acceptable to base a design on that knowledge then I hope you're never allowed to influence an 060 in FPGA design.
-
@ Frederic,
I'm having trouble locating Stratix II based development boards (and most commonly available Cyclone IV boards are crap).
Stratix II and Cyclone III chips are still quite available (at decent prices for the larger chips).
I like the performance benefits of what you have suggested, but it seems like Altera wants to push higher end apps into the Stratix III or higher (and I am not that impressed with the Cyclone V value vs performance ratio).
So, where did you find the dev. boards you have mentioned?
On ebay, you have to check regularly.
Right now there is a Stratix III board for a quite decent price (given the performance of the chip):
http://www.ebay.fr/itm/Altera-Stratix-III-EP3SL150F1152-FPGA-Development-Board-/301153136861?pt=LH_DefaultDomain_0&hash=item461e2030dd.
But keep in mind that you need a full version of Quartus II to run synthesis for this chip.
It looks like you are right about the Cyclone V performance : even with the ALM, it is slower than the III.
This document proves it : http://www.altera.com/literature/ds/ds_nios2_perf.pdf
Regards,
Frederic
-
Cache breaks compatibility but if you go with unified cache with snooping, self modifying code is even possible (to a certain extent : you have to take into account the instruction prefetch and the pipeline depth).
Self-modifying code should work with snooping but would be slow. The caches only need to be invalidated if using writethrough caches (writethrough caches are probably not much slower than copyback caches with the fast memory and high memory bandwidth). It should be possible to have self-modifying code and cache compatibility better than the 68040 or 68060 with much larger cache sizes.
If the Amiga chipset is implemented inside the FPGA, you can even snoop DMAs and keep cache coherency over Chip RAM.
Yeah, there are several issues with the Amiga chipset that should be solvable with the chipset and CPU in the same fpga. Maybe even multi-threading/SMP would work with a little trickery. It's easy to duplicate an fpga core.
Due to the way Exec detects CPU, you can have a core with 68000 exception frame and '020 user instructions (long branches, bitfields, 64-bit MUL/DIV and extra EAs).
This is basically the way the Phoenix core in the Vampire will work although some of the 68020 features don't fit in the Cyclone II of the Vampire. There is no 64 bit MUL/DIV (although 32 bit longword versions were added) and it's missing most if not all bitfield instructions and most if not all double indirect addressing modes. The specs may change.
By the way, the IPC of Phoenix will be limited by having only 1 integer pipe but should be close to 1. Simulation has shown that much more is possible with more pipes (each pipe can be stronger than the 68060's). I don't think more than 3 integer pipes would be useful: the instruction fetch becomes large, there are too many memory accesses across 3 instructions, and the CPU clock speed slows down some with each pipe. Even 3 pipes may not be an advantage, although a few tricks may make this possible without OoO execution ;).
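The limit on useful integer pipes comes down to dependencies between adjacent instructions. Here is a toy dual-issue model (Python; this is not the Phoenix/Apollo dispatch logic, and the pairing rule is a simplification of the register-dependency part of rules like the 68060's pOEP/sOEP pairing):

```python
# Toy in-order dual-issue scheduler: a second instruction can issue in
# the same cycle as the first only if it does not read a register the
# first one writes (a read-after-write hazard would force a stall).
# Instructions are modeled as (dest_reg, src_regs) tuples.

def can_pair(first, second):
    dest, _ = first
    _, srcs = second
    return dest not in srcs       # no RAW dependency between the pair

def schedule(instrs):
    """Greedy in-order dual issue; returns the issue cycle of each instruction."""
    cycles, cycle, i = [], 0, 0
    while i < len(instrs):
        cycles.append(cycle)
        if i + 1 < len(instrs) and can_pair(instrs[i], instrs[i + 1]):
            cycles.append(cycle)  # dependency-free: pair issues together
            i += 2
        else:
            i += 1                # dependent: next instruction waits a cycle
        cycle += 1
    return cycles

# d0 = d1 + d2 and d3 = d4 + d5 are independent -> they pair;
# d6 = d0 + d3 reads the earlier results -> it issues alone.
code = [(0, (1, 2)), (3, (4, 5)), (6, (0, 3))]
```

Adding a third pipe to this model only pays off when three adjacent instructions are mutually independent, which real 68k code streams provide less and less often, matching the diminishing-returns argument above.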
-
matt, it's been a few weeks now since you said the apollo core was going to be tested on the vampire. no news from that front?
-
matt, it's been a few weeks now since you said the apollo core was going to be tested on the vampire. no news from that front?
Gunnar said that the Phoenix core is complete (all units) as of April 16. It required significant downsizing from the Apollo which took time. Testing is taking place in simulation. There was a bring-up party (in the Vampire) planned and then delayed due to scheduling problems. I don't know if they are waiting to get together or what. I don't think the Phoenix core is working in the Vampire yet but I believe they are close. Some bugs may need to be fixed before AmigaOS works. All I can say is be patient. These guys are working in their free time. That's part of the problem with Amiga projects.
-
It should be possible to have self-modifying code and cache compatibility better than the 68040 or 68060 with much larger cache sizes.
It can't be better compatibility if it ends up running code from RAM that the 68060 wouldn't; if it behaves differently, it will always be less compatible.
For example on the playstation there is code that wipes itself out of ram and keeps running from the cache. Snooping would make that fail.
-
i thought igor stepped down for the time being, so this is the project on the part of gunnar (and who else? submicron jens?) i take it. no wonder they still do simulation, but do they have vampire hardware at their disposal? who will provide hardware instead of majsta if all goes well? kipper or some other subcontractor?
well im not impatient, i dont have a 600 anymore so practically its all the same to me. but after some years of bold claims concerning the natami core, there should be something to show off in the end ;)
-
Oh dear God, please tell me this isn't Gunnar von Boehm that you are referring to.
-
@iggy
i am. but lets keep calm on that.
-
@iggy
i am. but lets keep calm on that.
OK, I can give Gunnar the benefit of the doubt, but the word "vitriolic" comes to mind when I think about him.
-
@psxphil, the thread may have 060 in its title, but none of the softcores are intended to be a copy of the 060; even the 050 of the NatAmi was going to have additional instructions and registers.
The 060 instruction set is incompatible with almost all processors currently used in Amigas, so software written for the 060 may not run on most Amigas; that is not being compatible.
Are you saying you want to freeze the evolution of the processor at the 060 level? With that kind of attitude we should all still be using 68000s.
The speed at which an FPGA softcore can run is limited, so to get greater performance the softcore must use parallelism and SIMD instructions. This is incompatible with any Amiga processor, but it is worth the sacrifice if we want faster Amigas. I know it is never going to be the fastest, but we should try to get better speed or the platform will stagnate and die.
-
@psxphil, the thread may have 060 in its title, but none of the softcores are intended to be a copy of the 060; even the 050 of the NatAmi was going to have additional instructions and registers.
The 060 instruction set is incompatible with almost all processors currently used in Amigas, so software written for the 060 may not run on most Amigas; that is not being compatible.
Are you saying you want to freeze the evolution of the processor at the 060 level? With that kind of attitude we should all still be using 68000s.
The speed at which an FPGA softcore can run is limited, so to get greater performance the softcore must use parallelism and SIMD instructions. This is incompatible with any Amiga processor, but it is worth the sacrifice if we want faster Amigas. I know it is never going to be the fastest, but we should try to get better speed or the platform will stagnate and die.
Well said sir!
Even Motorola didn't always push for the '060.
The '040 was still being promoted at the end of the 68K cycle.
I have PCI bridge devices that were designed primarily for that chip.
And the highest level of compatibility would be, as I mentioned before, an enhanced '030.
And why stop development?
Frederic's comment about conversion to a barrel processor is actually pretty good.
FPGAs have plenty of room for the extra registers needed for that.
And the software problems that result from other SMP solutions are lessened by this approach.
-
It can't be better compatibility if it ends up running code from RAM that the 68060 wouldn't; if it behaves differently, it will always be less compatible.
For example on the playstation there is code that wipes itself out of ram and keeps running from the cache. Snooping would make that fail.
Snooping with writethrough caches would be more Amiga compatible. Many early Amiga programs didn't flush the caches properly because the caches were small enough that there was no problem (not so with larger caches), or didn't flush the cache at all. This should always work properly. It's also more 68060 compatible with larger caches, because the caches don't have to be manually or properly flushed, and larger caches are possible with no problems that I am aware of. Your Playstation example should not happen with snooping and writethrough caches. Writing over existing code will replace it with the new code. No flushing is needed except for code that is already in the CPU pipeline. No dirty caches or inconsistency between caches and memory is otherwise possible.
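The write-through-plus-snooping scheme fits in a few lines of a toy model: every write goes straight to RAM, and every write address is snooped against the cache so a matching line is invalidated, which is why code that rewrites itself refetches the new bytes on the next miss. Illustrative Python sketch only (the one-word-per-line cache and the addresses are made up; the two opcodes shown are the real 68k NOP and RTS encodings):

```python
# Toy write-through cache with snoop invalidation.  All writes go
# straight to RAM (RAM is always current), and any bus write -- from
# the CPU or from DMA -- invalidates a matching cache line.  Self-
# modifying code then always sees its own new bytes, no manual flush.

class SnoopingWriteThroughCache:
    def __init__(self, ram):
        self.ram = ram
        self.lines = {}                    # addr -> cached value

    def read(self, addr):
        if addr not in self.lines:         # miss: fill line from RAM
            self.lines[addr] = self.ram[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.ram[addr] = value             # write-through: RAM updated first
        self.snoop(addr)                   # then invalidate our stale copy

    def snoop(self, addr):
        """Called for ANY bus write (CPU or DMA) to keep the cache coherent."""
        self.lines.pop(addr, None)

ram = {0x100: 0x4E71}                      # NOP opcode sitting at 0x100
cache = SnoopingWriteThroughCache(ram)
old = cache.read(0x100)                    # instruction is now cached
cache.write(0x100, 0x4E75)                 # program overwrites itself with RTS
new = cache.read(0x100)                    # line was snooped away: refetch
```

Note this coherency is exactly what defeats the Playstation trick mentioned above of wiping code from RAM while executing it from cache: here the wipe invalidates the cached copy too.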
i thought igor stepped down for the time being, so this is the project on the part of gunnar (and who else? submicron jens?) i take it. no wonder they still do simulation, but do they have vampire hardware at their disposal? who will provide hardware instead of majsta if all goes well? kipper or some other subcontractor?
well im not impatient, i dont have a 600 anymore so practically its all the same to me. but after some years of bold claims concerning the natami core, there should be something to show off in the end ;)
Majsta is back to working on the Amiga with the help of the Apollo Team :). He is planning to create a new Amiga accelerator with a much larger Altera fpga for the full Apollo. I'd rather not give any specifics yet as they may change.
The core fpga programmers are Gunnar von Boehn, Jens Künzer and Christophe Hoehne who have all been active recently. Igor Majstorovic (majsta) has been fairly active recently also. I am the most active non-fpga/non-VHDL programmer and have helped with 68k statistics and ISA/encoding development.
-
@matt
sounds good. thats probably why the vampires are on hold. hope they will be able to provide the core for those too, but the decisions must be taken for most efficiency of course.
-
Majsta is back to working on the Amiga with the help of the Apollo Team :). He is planning to create a new Amiga accelerator with a much larger Altera fpga for the full Apollo. I'd rather not give any specifics yet as they may change.
That is really good news. I've been playing around with an Altera development board and I really like their products.
If you hunt for them, there are some spectacular buys available on larger Cyclone III chips.
FPGAs are also the only way I can speed up some 8-bit projects. There are faster 6502 and Z-80 compatible chips, but the fastest 6809 compatibles I can find are the Hitachi 6309 3MHz chips that I can overclock slightly.
With an FPGA 25-40MHz is easily attainable (higher if I opt for something in the Stratix line).
And many FPGAs have built-in DSPs. Anyone following A-eon's new sound card knows how useful that can be.
It makes me wonder if we could build a video decoder similar to the old Creative Labs DXR3 but specific to Amiga (Zorro II/III or video slot based).
We already have genlocks, but it would be nice to be able to play back files and DVDs directly.
-
The 68060 can't clock higher than 100MHz.
What was required to make it clock higher? Was it something minor, like a longer pipeline, or just a heat issue? Or was a major redesign required?
-
Perhaps if the process used to produce the die was smaller/finer and the chip operated on less current, higher operating speeds would be possible.
And 100 MHz is an overclocked speed.
Freescale flatly states there were never '060s rated higher than 75 MHz.
Further, they also insist that that "FE133" chip the Chinese are selling does not have a valid Motorola/Freescale ID number.
-
The FE133 has no MMU and no FPU. Anyway, I'm looking for one unit...
If anybody has a 68060FE133 for sale, please email me!
:)
-
There are rumours the 68060 has been overclocked to around 108-120MHz, but I don't think that top-end speed is possible on the Amiga. You'd have to look at the Atari, which uses SDRAM. That said, I posted a link on another website where someone got EDO memory operating on a 140MHz bus.
There is also a small possibility that 25ns EDO memory may be on its way to the Amiga from the current 28ns.
-
I read somewhere it couldn't scale beyond 100MHz. Can't find the article however.
I came across the coldfire upgrade again. It's obvious with so many vital instructions missing it would never work.
Aim for 300MHz. Perfect for retro. There isn't any modern software available, except FPS games and browsing.
-
The 68060 has already been clocked at 106-108MHz by a small number of Amiga users. (Using Amiga computers)
-
There is one on youtube. A4000 with 060 @ 90MHz. That is fast!
Lots of cooling on his rig.
-
The FE133 has no MMU and no FPU. Anyway, I'm looking for one unit...
If anybody has a 68060FE133 for sale, please email me!
:)
Hey everybody, this guy would like a relabeled EC 75MHz processor, and he is saying please.
Remember, I have an email from the manufacturer stating there is no such thing as an FE133.
-
Remember, I have an email from the manufacturer stating there is no such thing as an FE133.
I'm sure the person who emailed couldn't find any evidence of the FE133.
That is rather irrelevant though.
-
The Natami Team have some FE133 : http://amxproject.com/wp-content/uploads/2011/12/NAe60F_1-300x216.jpg
A fake maybe ?
Don't know, but we can check by reading its internal PCR register...
:)
-
The FE133 may exist, but it may not be made by Freescale/Motorola. How compatible it is I don't know.
I have state-of-the-art PDF documentation here on 680xx & PPC processors, but they are not made by Freescale/Motorola.
When I say state-of-the-art, it's something like this, if I remember correctly, though I'm not sure how compatible it is:
450MHz PPC with built-in 2MB L2 cache (possible drop-in replacement for the 604E: same ball count, signals all in the same place, but any pins unused on the 604E are now used on the new processor).
Can't remember what has changed in the 68k series; I'll have to take another look at the PDF, but all the processors are classed as "Enhanced" and I believe all have the same pin count, though as stated above some of the unused pins on the old processors are now used.
-
I'm sure the person who emailed couldn't find any evidence of the FE133.
That is rather irrelevant though.
And you are delusional. It's simple: the only sources are Chinese, and the Chinese are known to relabel ICs with fake markings.
They are fakes. Relabeled 75 MHz EC chips.
And as 75 MHz EC chips will overclock to the rates mentioned (but nowhere near 133), it ought to be obvious to anyone who isn't trying to kid themselves.