Amiga.org

Amiga computer related discussion => Amiga Hardware Issues and discussion => Topic started by: freqmax on January 03, 2013, 07:40:03 PM

Title: Die space for m68k on FPGA?
Post by: freqmax on January 03, 2013, 07:40:03 PM

I know it has been written in some other thread, but how much of the Spartan 1600 FPGA used in the FPGA Replay is used by the plain 68000 CPU?

Need the number to figure out if another CPU implementation is realistically possible.

Title: Re: Die space for m68k on FPGA?
Post by: ChaosLord on January 03, 2013, 07:59:31 PM

I am interested in how much space 68020+ takes.

How much space 68050/68070 was taking was written many times on Natami forums. It was always a moving target as things got changed. I can't remember what the numbers were.

There are 3 main numbers:
LE: Logic Elements
SRAM: How many SRAM banks it uses.
Multipliers: Many FPGA chips have built-in multipliers you can use so you don't have to waste LE on them.
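
As a rough illustration of how those three numbers get used, here is a minimal sketch of a fit check. The `Resources` type and the `percent`, `utilization` and `fits` helpers are made up for this example. The core figures are the ao68000 numbers quoted later in this thread (about 4750 LE and 45600 bits of RAM on a Cyclone II); its multiplier usage is not given, so 0 is a placeholder. The device capacity is a hypothetical round number, not a datasheet value for the Replay or any other board.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """The three headline numbers of an FPGA utilization report."""
    logic_elements: int
    ram_bits: int
    multipliers: int


def percent(used: int, available: int) -> float:
    """Share of a resource that is used, in percent."""
    if available == 0:
        # Nothing available: fine if nothing is needed, impossible otherwise.
        return 0.0 if used == 0 else float("inf")
    return 100.0 * used / available


def utilization(core: Resources, device: Resources) -> dict:
    """Percentage of each device resource that the core would occupy."""
    return {
        "logic_elements": percent(core.logic_elements, device.logic_elements),
        "ram_bits": percent(core.ram_bits, device.ram_bits),
        "multipliers": percent(core.multipliers, device.multipliers),
    }


def fits(core: Resources, device: Resources) -> bool:
    """True if no resource class is over-subscribed."""
    return all(p <= 100.0 for p in utilization(core, device).values())


# ao68000 figures as quoted in this thread; multipliers=0 is a placeholder.
ao68000 = Resources(logic_elements=4750, ram_bits=45600, multipliers=0)
# Hypothetical device capacity, chosen only to make the arithmetic visible.
example_device = Resources(logic_elements=20000, ram_bits=200000, multipliers=20)

for name, value in utilization(ao68000, example_device).items():
    print(f"{name}: {value:.1f}%")
print("fits:", fits(ao68000, example_device))
```

A real answer still needs the synthesis report for the actual target device, since LE counts do not translate directly between vendors or families.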

Title: Re: Die space for m68k on FPGA?
Post by: freqmax on January 03, 2013, 08:01:44 PM

68020/050/070 are way more complex and therefore use way more transistors.

Title: Re: Die space for m68k on FPGA?
Post by: ChaosLord on January 03, 2013, 08:31:59 PM

020 is just an 000 with 32-bit datapaths, a small L1 cache and a barrel shifter and a few new instructions and a few new addressing modes. It is more complex, sure. But I am still interested to know how many more LE it takes over a plain jane 000

Title: Re: Die space for m68k on FPGA?
Post by: psxphill on January 04, 2013, 12:37:06 AM

Quote from: ChaosLord;721161

020 is just an 000 with 32-bit datapaths, a small L1 cache and a barrel shifter and a few new instructions and a few new addressing modes. It is more complex, sure. But I am still interested to know how many more LE it takes over a plain jane 000

The 020 onwards can read/write unaligned words and dwords. Again not earth shattering, but it's more complex. An EC030 is also not much more complex.

A 68851 & 68882 would be nice to have in addition, but those are more complex. The 68060 MMU&FPU were simpler, so that would be easier. You could also not bother with all the 68020 stuff.

Title: Re: Die space for m68k on FPGA?
Post by: billt on January 04, 2013, 02:01:58 AM

Quote from: freqmax;721152

I know it has been written in some other thread, but how much of the Spartan 1600 FPGA used in the FPGA Replay is used by the plain 68000 CPU?

Need the number to figure out if another CPU implementation is realistically possible.

Sure it's realistically possible. You'll need to figure out which family of FPGA you are interested in, particularly which vendor, and download the tools for that vendor (Xilinx ISE WebPACK or Altera Quartus II, etc.).

Now get the HDL for your baseline, which might be the TG68 VHDL code.

Set up enough of a project to be able to synthesize. As you aren't actually making a product, and only want an FPGA utilization report, you may not need to do everything that a full product project may need to do. I'm not sure whether you'd need to assign pins in the constraints file or anything like that.

Run synthesis until you've weeded out the errors and important looking warnings, and look at the report. I've not looked at Altera tools yet. Xilinx gives you utilization % of LUTs, flops, clocks, etc. You may have to assign TG68 clocks to FPGA clocks, not sure.

If you can't do that, then someone else should be researching if it's practical to do more than plain 68000, such as Yaqube and Mikej, which I think they are already doing. Or the Natami guys, who may or may not still be doing exactly this project. Or Suska guy. I believe it's very doable, and just a couple days ago started thinking about an FPGA on a PGA carrier to replace 68060 which are so hard to find the good ones now. (or any other 680x0, but earlier ones have 5V issues to fit in too)

Title: Re: Die space for m68k on FPGA?
Post by: freqmax on January 04, 2013, 06:19:07 AM

I'm curious if an 80386 + VGA can be implemented on the existing FPGA Replay.

Other CPUs like MIPS, ARM, SPARC etc. could also be of interest, especially to explore software that was designed for platforms that are now quite hard to find.

Quote from: billt

an FPGA on a PGA carrier to replace 68060 which are so hard to find the good ones now.

I like the idea. Power through the socket might be an issue though.

Title: Re: Die space for m68k on FPGA?
Post by: Hattig on January 04, 2013, 10:04:37 AM

Quote from: freqmax;721192

I'm curious if an 80386 + VGA can be implemented on the existing FPGA Replay.

The FPGAArcade implementation will have RTG (i.e., VGA) as well as AGA. The level of 2D acceleration (i.e., hardware acceleration of graphics.library) in the RTG card might not be high, depending on the space available for implementation, and yaqube's time.

It will also have a CPU core that is roughly 68020/030 compatible (I believe a data cache is the main differentiator between these chips) and running at a higher speed. My fuzzy memory recalls something about 10 MIPS, maybe that was something else.

80386 had under 300,000 transistors, and I believe that the 68030 is comparable. Some of that will be cache or registers, hence using the FPGA SRAM rather than logic units.

Title: Re: Die space for m68k on FPGA?
Post by: yakumo9275 on January 04, 2013, 11:33:04 AM

Quote from: freqmax;721192

I'm curious if an 80386 + VGA can be implemented on the existing FPGA Replay.

Other CPUs like MIPS, ARM, SPARC etc. could also be of interest, especially to explore software that was designed for platforms that are now quite hard to find.

go browse http://opencores.org/

There are HEAPS of open source CPU cores and chip emulations in Verilog/VHDL.

Go browse projects, processors. There are ARM cores, SPARC cores, Z80s, 8086s, 68k etc. LOTS of cool stuff there.

Title: Re: Die space for m68k on FPGA?
Post by: freqmax on January 04, 2013, 12:02:23 PM

I know, but how much FPGA resources they use is another matter.

Title: Re: Die space for m68k on FPGA?
Post by: billt on January 04, 2013, 03:34:34 PM

Quote from: freqmax;721192

I'm curious if an 80386 + VGA can be implemented on the existing FPGA Replay.

While I don't know the capacity of the FPGA on the Replay, I'd guess yes. Someone's working on this with a DE1 port, and DE1 does not have a humungous FPGA. Though they do seem to recommend DE2 for more room down the line...

http://zet.aluzina.org

The google has coughed up a couple other things as well, but this Zet one sounds the best.

Title: Re: Die space for m68k on FPGA?
Post by: mikej on January 04, 2013, 08:38:51 PM

Shouldn't be a problem, I had a quick look at their code.
It will be an interesting port, I'll throw the core in and see what the utilization is.
/Mike

Title: Re: Die space for m68k on FPGA?
Post by: matthey on January 05, 2013, 05:59:33 AM

Quote from: freqmax;721192

I'm curious if an 80386 + VGA can be implemented on the existing FPGA Replay.

DosBox with 68k Dynamic Recompilation should be able to achieve 386 emulation speeds on a fast 68060 or fpga 68k CPU. The bonus is that the Amiga can multitask at the same time kind of like the advantage of ShapeShifter over a real 68k Macintosh. An enhanced fpga 68k CPU could support faster emulation of x86 by providing some useful instructions and addressing modes that the x86 has but the 68k does not. When creating the basic 68k dynamic recompilation, I could see that MVS/MVZ (x86 MOVSX/MOVZX), LEA to a data register, immediate shifts >8, base register update addressing mode, small longword->word compressed immediates, a PERM instruction and fast bitfield instructions would greatly speedup and simplify x86 emulation. Many of these ideas were previously suggested in the 68koolFusion ISA coincidentally. A true 386 fpga core should still be a little faster but then a real 386 DOS machine can be obtained for free.

Quote from: billt

an FPGA on a PGA carrier to replace 68060 which are so hard to find the good ones now.

Quote from: freqmax;721192

I like the idea. Power through the socket might be an issue though.

Yea, Interesting idea. An fpga is low power and voltage like the 68060. An fpga core would have to be very similar to a 68060 though.

Title: Re: Die space for m68k on FPGA?
Post by: freqmax on January 05, 2013, 08:21:00 AM

Quote from: matthey;721296

DosBox with 68k Dynamic Recompilation should be able to achieve 386 emulation speeds on a fast 68060 or fpga 68k CPU. The bonus is that the Amiga can multitask at the same time

The problem is that for some of the interesting applications, the reason to run 80386 software on real hardware is very low latency. This is especially true for parallel port bit-banging DOS software. So it's the same issue as with software-emulated Amigas: they can't deal with latency and propagation races properly.

So a very common setup like 80386+VGA+Soundblaster with many lovely parallelports would be just the thing to make use of existing DOS bitbanging software.

@MikeJ, if you succeed in running TurboC in 80286 mode using outportb(0x378,0xF0) etc. and some VGA demos with sound, I think the re-implementation can be called a success. ;)

A more specific setup would be:
CPU: 80386 - least complex x86-processor with the greatest software compatibility
FPU: 80387 - IF space is available
Video: VGA - perhaps CGA/EGA as options, again for compatibility
Sound: Soundblaster 16
HDD: P-ATA < 8GB
Floppy: 2x 1.44MB
Serial: 16550 that can dump contents to flash or use Ethernet
Parallel: 8255 with real world I/O
Ethernet: NE2000

Of course OSD options to enable/disable as desired could be used. And loadable setups associated with a bootimage etc.

The main point is 80386+VGA and access to I/O via (many) parallel ports such that bit-banging DOS stuff can be made to work. So rather more real world I/O than anything else. By using 80386 instead of 8086/286 one gets the benefit of being able to run Unix or Windows. Perhaps one could fix the ten or so 80386 errata (bugs) at the same time.

Title: Re: Die space for m68k on FPGA?
Post by: Fats on January 05, 2013, 02:50:04 PM

Quote from: matthey;721296

DosBox with 68k Dynamic Recompilation should be able to achieve 386 emulation speeds on a fast 68060 or fpga 68k CPU. The bonus is that the Amiga can multitask at the same time kind of like the advantage of ShapeShifter over a real 68k Macintosh.

Why not go for both a m68k and x86 CPU core at the same time on the FPGA :) ?

Title: Re: Die space for m68k on FPGA?
Post by: matthey on January 05, 2013, 03:44:16 PM

Quote from: Fats;721327

Why not go for both a m68k and x86 CPU core at the same time on the FPGA :) ?

A 2nd (and larger in the case of x86) decoder would be needed but it is possible for a fpga CPU core internally to be flexible enough to handle most modern CPU instructions, addressing modes and functionality. The same task/process could only execute the code for 1 CPU at once because of encoding conflicts. Multiprocessing should be possible with care though. It would be like emulating an Amiga with a bridgeboard all in one fpga. Personally, I think a 68k and x86 together would be a waste. Enhance the 68k as I suggested and the 68k would be able to do practically all the common functionality of the 386 but to general purpose registers making emulation easier while being much easier to program. More interesting CPU combinations would be 68k+Z80 for emulating several game consoles and 68k+PPC for emulating PPC Amigas although that would require a big fpga and an efficient PPC core. Also, an enhanced 68k+68000 might be good for max compatibility without rebooting/configuring the fpga.

Title: Re: Die space for m68k on FPGA?
Post by: psxphill on January 05, 2013, 03:45:43 PM

Quote from: freqmax;721299

This is very true for parallelport bitbanging DOS software. So it's the same issue as with software emulated Amigas. They can't deal with latency and propagation races properly.

It's ok as long as you have the hardware. The dos support in 32bit windows or if you're running 64 bit then virtualpc or dosbox are pretty good. What is lost in speed probably helps as the software was designed to run on something 20 times slower.

As soon as you have to use a usb serial port/parallel port (not that I have come across a usb parallel port that copes with anything other than printing) then the latency of usb really kills performance.

I have only used laptops for the last 12 years, but I know people who still use desktops with parallel ports that are able to run really old software. My old laptop had a parallel port and only runs 32 bit windows anyway, so that sometimes gets used. But that's for practical reasons and slowly I've been moving all those over to intelligent usb devices. By moving the software onto a cpu on the usb device you can offload the time critical code but still plug it into pretty much any modern computer.

There are probably some people who would have a use for it, though not as many as want to run Amiga software. The PC doesn't get people as passionate.

Title: Re: Die space for m68k on FPGA?
Post by: xyzzy on January 05, 2013, 04:08:35 PM

Quote from: Fats;721327

Why not go for both a m68k and x86 CPU core at the same time on the FPGA :) ?

Better would be to add specific instructions to the 68k that help with emulation of other processors.

Title: Re: Die space for m68k on FPGA?
Post by: freqmax on January 05, 2013, 06:12:51 PM

The FPGA used in Replay is not likely to have the die space to handle 386 + m68k at the same time. And emulating a 386 on a 68060 on an FPGA is such an inefficient solution that I recommend: think again.
As for PPC, it has been discussed before. The size is just too big to be practical. It's way better to use the ASIC PPC until Moore's law makes it feasible.

Title: Re: Die space for m68k on FPGA?
Post by: psxphill on January 05, 2013, 06:17:33 PM

Quote from: freqmax;721343

As for PPC, it has been discussed before. The size is just too big to be practical. It's way better to use the ASIC PPC until Moore's law makes it feasible.

What makes it too big? The ISA itself shouldn't be. It would be tricky to match performance of a real chip & you'd probably have to leave a lot of complex features out. Most people would only want it for running warpup & powerup based software, it doesn't necessarily need to use the official kernels to do it either.

Title: Re: Die space for m68k on FPGA?
Post by: matthey on January 05, 2013, 07:25:41 PM

Quote from: psxphill;721344

What makes it too big? The ISA itself shouldn't be.

The PowerISA is large for a RISC ISA (although much is rarely used and can be trapped but results in poor performance if used) and the caches need to be significantly larger than the 68k for good performance. The code density of the PowerPC is about 1/2 of an enhanced 68k CPU needing 2x the instruction cache to hold the same amount of code, for example. ARM with Thumb 2 gave up a simple and clean RISC ISA and decoder to return to CISC "compressed" code much like the 68k (although a little simpler for decoding but not as compressed or programmer friendly).
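
To make the cache argument concrete, here is a back-of-the-envelope sketch. The 2x size ratio is the estimate given in the post above, not a measured figure; the 8 KB working set and the function names are arbitrary choices for illustration.

```python
import math


def cache_needed(working_set_bytes: int, size_ratio: float) -> int:
    """Instruction cache bytes needed to hold the same code on an ISA whose
    binaries are `size_ratio` times as large as the reference ISA's."""
    return math.ceil(working_set_bytes * size_ratio)


def resident_fraction(cache_bytes: int, working_set_bytes: int, size_ratio: float) -> float:
    """Fraction of the hot code that fits in a cache of fixed size."""
    return min(1.0, cache_bytes / (working_set_bytes * size_ratio))


HOT_CODE = 8 * 1024   # hypothetical 8 KB inner loop on the denser ISA
RATIO = 2.0           # "about 1/2 the code density" means about 2x the bytes

print(cache_needed(HOT_CODE, RATIO))             # bytes of cache for the same code
print(resident_fraction(8 * 1024, HOT_CODE, 1.0))    # denser ISA, 8 KB cache
print(resident_fraction(8 * 1024, HOT_CODE, RATIO))  # sparser ISA, same cache
```

On an FPGA the cache comes out of the block RAM budget, so doubling it competes directly with everything else that wants SRAM.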

Quote from: psxphill;721344

It would be tricky to match performance of a real chip & you'd probably have to leave a lot of complex features out.

It would be nearly impossible to match the performance of the original Amiga PPC cards or the SAM 440 with a PPC core in an affordable fpga. I think an enhanced 68k CPU core could come close though, if assembler optimized code is used. This is feasible on the 68k because it's almost as easy to program as a high level language where programmers enjoy using it but is almost impossible on RISC. The 68k advantages are discounted way too much while its disadvantages are exaggerated. The 68060 performance proves it and outperformed many early PPC processors. That's why Apple put code in MacOS 8.x that kept the 68060 from working while MacOS 7.x using a 68060 worked great and blew away the PPC macs of the time.

Title: Re: Die space for m68k on FPGA?
Post by: psxphill on January 05, 2013, 08:36:43 PM

Quote from: matthey;721353

It would be nearly impossible to match the performance of the original Amiga PPC cards or the SAM 440 with a PPC core in an affordable fpga.

The slowest phase 5 board was a PowerPC 603e 160, which had 16kb L1 cache & used the 32bit ISA. I'm sure you could get close to that, a sam 440 maybe not. But then a sam 440 doesn't have aga, so nothing is perfect.

It doesn't look to have a particularly more complex instruction set than the 68020 (once you factor in mmu & fpu). All instructions are 32bit, which affects density. But it also simplifies fetching; reduced density is more about RAM usage than speed. It's better for performance if all instructions are the same length.

https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF778525699600719DF2/$file/6xx_pem.pdf

Title: Re: Die space for m68k on FPGA?
Post by: matthey on January 05, 2013, 09:51:40 PM

Quote from: psxphill;721361

The slowest phase 5 board was a PowerPC 603e 160, which had 16kb L1 cache & used the 32bit ISA. I'm sure you could get close to that, a sam 440 maybe not. But then a sam 440 doesn't have aga, so nothing is perfect.

That sounds about right. A 100MHz superscalar enhanced 68k CPU in an fpga would probably be like a 200-300MHz low end PPC in performance on average. Some things like a simple memory copy and optimized 68k code would be approaching a low clocked 440. The PPC has some areas where it's strong though too.

Quote from: psxphill;721361

It doesn't look to have a particularly more complex instruction set than the 68020 (once you factor in mmu & fpu). All instructions are 32bit, which affects density. But it also simplifies fetching; reduced density is more about RAM usage than speed. It's better for performance if all instructions are the same length.

The PPC instruction set is large but fairly easy to decode. Decoding is one of the slow points of the 68020+ but it's way better than the x86. Constant instruction length is good for performance but so are small instructions that are simple to decode or allow parallel decoding. This allows more instructions in the instruction cache which is much faster and allows more instructions to be fetched in the same amount of time. A bonus is less system memory needed and more flexibility for instructions which results in an easier to program CPU.

Title: Re: Die space for m68k on FPGA?
Post by: wawrzon on January 05, 2013, 10:06:59 PM

what does it matter to make assumptions about all that. you could be right you could be wrong. all that matters is what there is.

Title: Re: Die space for m68k on FPGA?
Post by: Hattig on January 05, 2013, 10:52:56 PM

I believe that you can buy FPGAs that have an on-board PowerPC core - that would seem to me to be the best solution to getting a PowerPC processor alongside the system implemented in the FPGA.

Title: Re: Die space for m68k on FPGA?
Post by: mongo on January 05, 2013, 11:14:48 PM

Quote from: Hattig;721371

I believe that you can buy FPGAs that have an on-board PowerPC core - that would seem to me to be the best solution to getting a PowerPC processor alongside the system implemented in the FPGA.

You can, but they're not cheap.

Title: Re: Die space for m68k on FPGA?
Post by: psxphill on January 05, 2013, 11:56:13 PM

Quote from: matthey;721368

Constant instruction length is good for performance but so are small instructions that are simple to decode or allow parallel decoding.

Fixed length 32 bit instructions (like PPC & MIPS) are the sweet spot for performance.

Thumb is mainly for when you only have 16 bit access to ram, if your ram is 32bit then Thumb is slower (although it will use less ram).

Variable length instructions are a pain to parallel decode, because you have to decode the first one to know where the second one is.
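
The dependency described here can be sketched with a toy example. The encoding below is hypothetical and is not the 68k, x86 or Thumb format: each instruction is 1 to 4 words long and the low two bits of its first word hold the length minus one. The function names are invented for the illustration.

```python
from typing import List


def fixed_boundaries(num_words: int, words_per_insn: int = 2) -> List[int]:
    """Fixed-length ISA: every start offset is known up front, so any number
    of decoders can each be handed an instruction without talking to each other."""
    return list(range(0, num_words, words_per_insn))


def toy_length(first_word: int) -> int:
    """Hypothetical encoding: low two bits of the first word = length - 1."""
    return (first_word & 0b11) + 1


def variable_boundaries(words: List[int]) -> List[int]:
    """Variable-length ISA: the start of instruction N+1 is only known after
    the length of instruction N has been worked out, so this loop is serial."""
    starts = []
    pc = 0
    while pc < len(words):
        starts.append(pc)
        pc += toy_length(words[pc])
    return starts


stream = [0x0001, 0xAAAA, 0x0000, 0x0002, 0xBBBB, 0xCCCC, 0x0000]
print(fixed_boundaries(8))          # offsets computed independently
print(variable_boundaries(stream))  # offsets found by walking the stream
```

Note that the word at offset 1 (0xAAAA) would look like the start of a 3-word instruction if it were decoded speculatively; a parallel decoder has to either guess and discard, or store boundary marks found on an earlier pass.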

Title: Re: Die space for m68k on FPGA?
Post by: matthey on January 06, 2013, 01:01:42 AM

Quote from: psxphill;721373

Fixed length 32 bit instructions (like PPC & MIPS) are the sweet spot for performance.

They're great when there are unlimited resources. The current trend is away from 32 bit fixed length instructions for a CPU though. There is a reason for this. Many knowledgeable engineers thought PPC, MIPS and the original ARM would destroy the x86 and 68k in performance but they don't. I have tried to explain why. Thumb 2 is an attempt to take advantage of smaller instructions and improve code density.

Quote from: psxphill;721373

Thumb is mainly for when you only have 16 bit access to ram, if your ram is 32bit then Thumb is slower (although it will use less ram).

Most modern ARM code is Thumb 2. Yes, it is a little slower in theory but works well with limited resources.

Quote from: psxphill;721373

Variable length instructions are a pain to parallel decode, because you have to decode the first one to know where the second one is.

Parallel decoding isn't even always possible with variable length instructions. In the best case, the decoder would look at one number in the code and know the instruction length. The length of any instruction on the 68k can be determined by looking at the first 32 bits. That's pretty good. The variable length instructions are not that much of a problem in reality while they save cache and improve ease of programming.

Title: Re: Die space for m68k on FPGA?
Post by: psxphill on January 06, 2013, 01:57:21 AM

Quote from: matthey;721378

The length of any instruction on the 68k can be determined by looking at the first 32 bits. That's pretty good. The variable length instructions are not that much of a problem in reality while they save cache and improve ease of programming.

The whole point of parallel decodes is that you can decode the first and second at the exact same time. If you have to look at the first to see what the length is, then you've failed. You need fixed length, then you can split the instruction cache so that odd/even instructions can be accessed simultaneously.

Thumb2 sounds slower:

"The best options for armv7-a, thumb-2 and thumb-1 and overall:

The best is -O3 -funroll-loops -marm -march=armv5te -mtune=cortex-a8
The best armv7-a is -O3 -funroll-loops -marm -march=armv7-a -mtune=cortex-a8 at 95.2 % of overall best
The best Thumb-2 is -O3 -funroll-loops -mthumb -march=armv7-a -mtune=cortex-a8 at 88.7% of overall best
The best Thumb-1 is -O2 -mthumb -march=armv5te -mtune=cortex-a8 at 64.4% of overall best"

With PPC we can run powerup/warpup software, implementing arm is boring.

Title: Re: Die space for m68k on FPGA?
Post by: matthey on January 06, 2013, 05:31:22 AM

Quote from: psxphill;721380

The whole point of parallel decodes is that you can decode the first and second at the exact same time. If you have to look at the first to see what the length is, then you've failed. You need fixed length, then you can split the instruction cache so that odd/even instructions can be accessed simultaneously.

The superscalar 68060 averages better than 1 instruction per cycle. A good assembler programmer should be able to average about 2 instructions per cycle in some code. This means that the 68060 is able to decode in parallel with variable length instructions. Short and simple instructions are the key. ARM with Thumb 2 also uses variable length instructions (Thumb 1 used a 16 bit instruction mode only).

Quote from: psxphill;721380

Thumb2 sounds slower:

...
The best Thumb-2 is -O3 -funroll-loops -mthumb -march=armv7-a -mtune=cortex-a8 at 88.7% of overall best

...

I agree that Thumb 2 is a little slower. ARM Holdings claimed 15-25% slower but I am guessing that did not consider that the code is in the cache more often. The figure above is more realistic and good enough that Thumb 2 is used most of the time. Thumb 2 is a real ISA that can stand on its own unlike Thumb 1. Newer ARM processors will likely drop Thumb 1 support and maybe more.

Quote from: psxphill;721380

With PPC we can run powerup/warpup software, implementing arm is boring.

The fpga Arcade has an ARM CPU so there is no need to emulate. PPC would be interesting but fpga PPC CPU performance would be lousy.

Title: Re: Die space for m68k on FPGA?
Post by: freqmax on January 06, 2013, 06:58:20 AM

The ARM CPU on the FPGA Replay is most likely busy serving the FPGA with disc emulation and doesn't have the code space to do much else. The transfer capacity to the FPGA may also be a serious bottleneck.

Title: Re: Die space for m68k on FPGA?
Post by: danbeaver on January 06, 2013, 09:31:08 AM

I thought I read that FPGA was meant as a way of prototyping a complex circuit so you don't waste silicon making prototypes that don't work. Once you have your working FPGA you then lay it into a faster more economical silicon. The reason for keeping it in FPGA is that it can be changed if needed.

Any hint as to the truth in this rumor?

Title: Re: Die space for m68k on FPGA?
Post by: Hattig on January 06, 2013, 10:21:56 AM

Quote from: danbeaver;721409

I thought I read that FPGA was meant as a way of prototyping a complex circuit so you don't waste silicon making prototypes that don't work. Once you have your working FPGA you then lay it into a faster more economical silicon. The reason for keeping it in FPGA is that it can be changed if needed.

Any hint as to the truth in this rumor?

Why do you call it a rumour? It's one of the use cases of an FPGA.

However, making an ASIC is really expensive. For small quantities of logic you may implement using an FPGA even in the end product.

Title: Re: Die space for m68k on FPGA?
Post by: psxphill on January 06, 2013, 01:25:18 PM

Quote from: matthey;721396

The superscalar 68060 averages better than 1 instruction per cycle. A good assembler programmer should be able to average about 2 instructions per cycle in some code. This means that the 68060 is able to decode in parallel with variable length instructions.

68060 can despatch two instructions at the same time, I don't think it decodes them at the same time.

"The superscalar micro-architecture actually consists of two distinct
parts: a four-stage instruction fetch pipeline (IFP) responsible for
accessing the instruction stream and dual four-stage operand execution
pipelines (OEPs) which perform the actual instruction execution. These
pipeline structures operate in an independent manner with a FIFO instruction
buffer providing the decoupling mechanism."

I don't believe it can sustain 2 instructions per cycle for long before the fetch pipeline runs dry & that is if you can even find worthwhile work to do in instructions that can run in parallel.

Quote from: freqmax;721399

The ARM CPU on the FPGA Replay is most likely busy serving the FPGA with disc emulation and doesn't have the code space to do much else. The transfer capacity to the FPGA may also be a serious bottleneck.

Even so, the ARM is a SOC. If you want to use it for emulation then you'd need to be able to configure its memory map. Maybe you could do it with MMU tricks, but it's not really in the spirit of the FPGA arcade.

Title: Re: Die space for m68k on FPGA?
Post by: ChaosLord on January 06, 2013, 02:10:17 PM

Quote from: xyzzy;721331

better would be to add specific instructions to the 68k that help with emulation of other processors.

+99999

Title: Re: Die space for m68k on FPGA?
Post by: freqmax on January 06, 2013, 02:13:41 PM

Which will cause incompatibilities..

Title: Re: Die space for m68k on FPGA?
Post by: psxphill on January 06, 2013, 02:16:50 PM

Quote from: freqmax;721438

Which will cause incompatibilities..

Yeah, any design that allows you to write software that won't run on a real amiga is a very bad thing. Anyone who wants something new would find a PC more suitable; you could even run AROS on it.

Title: Re: Die space for m68k on FPGA?
Post by: ChaosLord on January 06, 2013, 02:19:05 PM

Quote from: psxphill;721432

68060 can despatch two instructions at the same time, I don't think it decodes them at the same time.

The M68060 dispatches, decodes, executes, completes and writes the results of 2 instructions at the same time.

This applies to most of the common simple instructions.

It does not apply to gigantic complicated instructions or rare instructions.

Furthermore, it does 3 instructions at the same time, as long as 1 of the instructions is a correctly predicted branch. Loops are common structures of computer programming. The branch at the bottom of the loop will be correctly predicted the 2nd thru the nth times it is executed.

I don't know if it will correctly predict the LOOP branch the 1st time it is encountered. But if you have a loop from 1 to 1000 then it will be correctly predicted 999 times out of 1000 which is a fairly good rate. :)
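
The hit rate for such a loop can be checked with a toy model. This is a minimal sketch of a one-bit "same as last time" predictor, which is an assumption made for illustration and not a description of the 68060's actual branch cache; `count_correct` and `loop_branch` are names invented here.

```python
from typing import Iterable


def count_correct(outcomes: Iterable[bool], initial_guess: bool) -> int:
    """One-bit predictor: guess that the branch does what it did last time.
    `initial_guess` stands in for whatever happens on the first encounter."""
    prediction = initial_guess
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken  # remember the most recent outcome
    return correct


# Branch at the bottom of a 1..1000 loop: taken 999 times, then falls through.
loop_branch = [True] * 999 + [False]

print(count_correct(loop_branch, initial_guess=True))
print(count_correct(loop_branch, initial_guess=False))
```

In this model the loop exit is always mispredicted, so the first-encounter guess decides whether the total is 999 or 998 correct out of 1000. Either way the miss rate is a fraction of a percent.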

Title: Re: Die space for m68k on FPGA?
Post by: ChaosLord on January 06, 2013, 02:24:32 PM

Add some new instructions.

Quote from: freqmax;721438

Which will cause incompatibilities..

Intel adds new instructions all the time.

Yet I never see you posting on Intel forums "omg! It's incompatible!"

p.s. Never ever ever ever buy a Rosewill keyboard. This stupid thing @#?!@>#$ up every single msg I type. GRRRR. It only worked for 25 days. Since then its been complete crap.

Title: Re: Die space for m68k on FPGA?
Post by: ChaosLord on January 06, 2013, 02:29:43 PM

Quote from: psxphill;721439

Yeah, any design that allows you to write software that won't run on a real amiga is a very bad thing. Anyone who wants something new would find a PC more suitable,

You said "any design that allows you to write software that won't run on a real amiga is a very bad thing."

This means PCs are bad because it's a design "that allows you to write software that won't run on a real amiga"

Then you tell ppl to buy a PC.

You are not making any sense.

Title: Re: Die space for m68k on FPGA?
Post by: ChaosLord on January 06, 2013, 02:35:42 PM

The FPGA used in Replay is not likely to have the die space to handle 386 + m68k at the same time.

386s are dirt cheap. NewEgg was selling 2GHz Celerons with a free springloaded keyboard for $25.00 last week.

As for 386, it has been discussed before. The size is just too big to be practical. It's way better to use the ASIC 386 until Moore's law makes it feasible.

Title: Re: Die space for m68k on FPGA?
Post by: asymetrix on January 06, 2013, 02:37:47 PM

a quick look on opencores : http://opencores.org/project,ao68000

Quote

Features

CISC processor with microcode,
WISHBONE revision B.3 compatible MASTER interface,
Not cycle exact with the MC68000, some instructions take more cycles to complete, some less,
Uses about 4750 LE on Altera Cyclone II and about 45600 bits of RAM for microcode,
Tested against the WinUAE M68000 software emulator. Every 16-bit instruction was tested with random register contents and RAM contents (Processor verification). The result of execution was compared,
Contains a simple prefetch which is capable of holding up to 5 16-bit instruction words,
Documentation generated by Doxygen (http://www.doxygen.org) with doxverilog patch (http://developer.berlios.de/projects/doxverilog/). The specification is automatically extracted from the Doxygen HTML output.

WISHBONE compatibility

Version: WISHBONE specification Revision B.3,
General description: 32-bit WISHBONE Master interface,
WISHBONE signals described in IO Ports,
Supported cycles: Master Read/Write, Master Block Read/Write, Master Read-Modify-Write for TAS instruction, Register Feedback Bus Cycles as described in chapter 4 of the WISHBONE specification,
Use of ERR_I: on memory access – bus error, on interrupt acknowledge: spurious interrupt,
Use of RTY_I: on memory access – repeat access, on interrupt acknowledge: generate auto-vector,
WISHBONE data port size: 32-bit,
Data port granularity: 8-bits,
Data port maximum operand size: 32-bits,
Data transfer ordering: BIG ENDIAN,
Data transfer sequencing: UNDEFINED,
Constraints on CLK_I signal: described in Clocks, maximum frequency: about 82 MHz.

Use

The ao68000 is used as the processor for the OpenCores aoOCS project - Wishbone Amiga OCS SoC (http://opencores.org/project,aoocs)
It can also be used as a processor in a System-on-Chip booting Linux kernel version 2.6.33.1 up to init program lookup (System-on-Chip example with ao68000 running Linux).

Similar projects

Other free soft-core implementations of M68000 microprocessor include:

OpenCores TG68 (http://www.opencores.org/project,tg68) - runs Amiga software, used as part of the Minimig Core,
Suska Atari VHDL WF_68K00_IP Core (http://www.experiment-s.de/en) - runs Atari software,
OpenCores K68 (http://www.opencores.org/project,k68) - no user and supervisor modes distinction, executes most instructions, but not all.
OpenCores ae68 (http://www.opencores.org/project,ae68) - no files uploaded as of 27.03.2010.

Limitations

Microcode not optimized: some instructions take more cycles to execute than the original MC68000,
TRACE not tested,
The core is still large compared to other implementations.

TODO

Optimize the design and microcode,
Count the exact cycle count for every instruction,
Test TRACE,
Write more documentation.

Status

April 2010: Tested with WinUAE software MC68000 emulator,
April 2010: Booted Linux kernel up to init process lookup,
December 2010: Runs as a processor in OpenCores aoOCS project,
January 2011: Core area optimization by over 33% (Thanks to Frederic Requin).
July 2011: Project copied to (https://github.com/alfikpl/ao68000). Further development of ao68000 will continue on github.

Requirements

Icarus Verilog simulator (http://www.icarus.com/eda/verilog/) is required to compile the tb_ao68000 testbench/wrapper,
Access to Altera Quartus II installation directory (directory eda/sim_lib/) is required to compile the tb_ao68000 testbench/wrapper,
GCC (http://gcc.gnu.org) is required to compile the WinUAE MC68000 software emulator,
Java runtime (http://java.sun.com) is required to run the ao68000_tool (ao68000_tool documentation),
Java SDK (http://java.sun.com) is required to compile the ao68000_tool (ao68000_tool documentation),
Altera Quartus II synthesis tool (http://www.altera.com) is required to synthesise the soc_for_linux System-on-Chip (System-on-Chip example with ao68000 running Linux).

Title: Re: Die space for m68k on FPGA?
Post by: Fats on January 06, 2013, 02:38:32 PM

Quote from: xyzzy;721331

Better would be to add specific instructions to the 68k that help with emulation of other processors.

For the use case of running DOSBox in AmigaOS on an FPGA, I personally think it is more efficient to get rid of the whole complex JIT emulation in the first place. Of course just a matter of opinion.

greets,
Staf.

Title: Re: Die space for m68k on FPGA?
Post by: psxphill on January 06, 2013, 04:16:09 PM

Quote from: ChaosLord;721444

You said "any design that allows you to write software that won't run on a real amiga is very bad thing."

This means PCs are bad because its a design "that allows you to write software that won't run on a real amiga"

Then you tell ppl to buy a PC.

You are not making any sense.

I'm glad you admitted that you don't understand my point. It proves beyond any doubt just what I've been dealing with.

I'd explain it, but you either couldn't understand or you are trolling.

Quote from: ChaosLord;721440

The M68060 dispatches, decodes, executes, completes and writes the results of 2 instructions at the same time.

It's pipelined: it can dispatch an instruction in 1 clock cycle and execute an instruction in 1 clock cycle, but they aren't the same instruction. You don't notice that from the point of view of the program until you get a mispredicted branch.

Title: Re: Die space for m68k on FPGA?
Post by: ChaosLord on January 06, 2013, 04:20:11 PM

Quote from: matthey;721296

DosBox with 68k Dynamic Recompilation should be able to achieve 386 emulation speeds on a fast 68060 or fpga 68k CPU. The bonus is that the Amiga can multitask at the same time kind of like the advantage of ShapeShifter over a real 68k Macintosh.

+9999

Quote

An enhanced fpga 68k CPU could support faster emulation of x86 by providing some useful instructions and addressing modes that the x86 has but the 68k does not.

Adding new instructions is easy. Adding new addressing modes.... uhmm... you would just have to say that certain instructions are hardwired for this new addressing mode that you want?

Title: Re: Die space for m68k on FPGA?
Post by: matthey on January 06, 2013, 04:35:19 PM

Quote from: ChaosLord;721440

The M68060 dispatches, decodes, executes, completes and writes the results of 2 instructions at the same time.

This applies to most of the common simple instructions.

It does not apply to gigantic complicated instructions or rare instructions.

You have been studying ;). It's not all done in parallel but more is than not. Code is sequential (in series) by nature so there are some limitations to parallel operation, but the 68060 shows the proper way to do a superscalar CPU.

Quote from: ChaosLord;721440

Furthermore, it does 3 instructions at the same time, as long as 1 of the instructions is a correctly predicted branch. Loops are common structures of computer programming. The branch at the bottom of the loop will be correctly predicted the 2nd thru the nth times it is executed.

I don't know if it will correctly predict the LOOP branch the 1st time it is encountered. But if you have a loop from 1 to 1000 then it will be correctly predicted 999 times out of 1000 which is a fairly good rate. :)

The 68060 has 4 execution units (2xinteger, fpu and branch) that operate at least partially in parallel. I believe you are correct with the 3 instruction per cycle max though. As far as loops go, there can be a slow down on the first loop iteration if the branch is not in the branch cache. There is a very costly misprediction on the last loop iteration as the branch falls through which can't be avoided. It would be possible to avoid sometimes with hardware help but not on tight loops.
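To put rough numbers on the loop case, here is a minimal Python sketch. It assumes a generic 2-bit saturating counter for the loop-closing branch, which is a textbook scheme and not necessarily what the 68060 branch cache actually does; the function name and the initial_state parameter are made up for illustration.

```python
def simulate_loop_branch(iterations, initial_state=1):
    """Count mispredictions of the branch at the bottom of a loop.

    Assumption: a generic 2-bit saturating counter (states 0..3).
    States 0 and 1 predict "not taken", states 2 and 3 predict "taken".
    This is a textbook model, not a description of the real 68060.
    """
    state = initial_state
    mispredicts = 0
    for i in range(iterations):
        # The branch is taken on every pass except the last one,
        # where it falls through and the loop exits.
        taken = i < iterations - 1
        predicted_taken = state >= 2
        if predicted_taken != taken:
            mispredicts += 1
        # Move the counter towards the actual outcome, saturating at 0 and 3.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return mispredicts


if __name__ == "__main__":
    for start in (0, 1, 3):
        misses = simulate_loop_branch(1000, initial_state=start)
        print(f"initial state {start}: {misses} mispredictions in 1000 branches")
```

With a cold counter the first pass can miss, and the final fall-through always misses in this model, which lines up with the "slow first iteration, costly last iteration" description above.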

Quote from: psxphill;721432

68060 can dispatch two instructions at the same time, I don't think it decodes them at the same time.

...

I don't believe it can sustain 2 instructions per cycle for long before the fetch pipeline runs dry & that is if you can even find worthwhile work to do in instructions that can run in parallel.

See TCL's answer above and the 68060 documentation. The 68060 can sustain 2 instructions per cycle but it's impractical in most cases. Complex code is going to use some instructions that do not operate in parallel. The 68060 instruction fetch is 32 bits/cycle which is low but enough for 2 small instructions. Exceeding 32 bits of instructions per cycle usually does not cause a slowdown unless it's common. The 68060 is already a great CPU but it would have been a monster if they had:

made MULS.W, MULU.W and SWAP as 1 cycle pOEP|sOEP (easy)
added MVS, MVZ and small longword immediate to word compression (see 68kF)
added a link stack

I would expect those moves alone would have been good for 1.5+ instructions per cycle in compiled code because they greatly reduce 68060 bottlenecks.

Quote from: psxphill;721468

It's pipelined: it can dispatch an instruction in 1 clock cycle and execute an instruction in 1 clock cycle, but they aren't the same instruction. You don't notice that from the point of view of the program until you get a mispredicted branch.

I thought what TCL described was pipelined processing and correct. I didn't assume that the same 2 instructions were processed in all the pipelined steps at once. I don't understand his English as specifying this information.

Quote from: ChaosLord;721470

Adding new instructions is easy. Adding new addressing modes.... uhmm... you would just have to say that certain instructions are hardwired for this new addressing mode that you want?

No. I want to add the new addressing modes for all 68k effective addresses (EAs). The addressing mode in question, which is used in DosBox, is base register update, which the 68k does not have. I represent it in the 68kF docs as (bd,An,Rn*Scale)! with the exclamation point at the end specifying to update the base register An with the calculated value. The EA is already calculated so there is little additional overhead. ARM also has this addressing mode, as probably do some other processors, which makes emulation easier. The 68k addressing mode would be much more flexible and usable than the x86 addressing mode because the 68k has general purpose registers and more of them.
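A small Python sketch of what base register update would mean in practice, going only by the description above. The register file as a dict, the function name and the update_base flag are all hypothetical; the real semantics are whatever the 68kF document specifies.

```python
MASK32 = 0xFFFFFFFF  # 68k address registers are 32 bits wide


def effective_address(regs, an, rn, scale=1, bd=0, update_base=False):
    """Compute EA = bd + An + Rn*scale for a (bd,An,Rn*Scale) operand.

    regs        -- dict of register name -> 32-bit value (hypothetical model)
    update_base -- models the proposed "!" suffix: write the computed EA
                   back into the base register An (an assumption based on
                   the description in the post, not on the 68kF document)
    """
    ea = (bd + regs[an] + regs[rn] * scale) & MASK32
    if update_base:
        regs[an] = ea  # the EA is already computed, so this is one extra register write
    return ea


if __name__ == "__main__":
    regs = {"a0": 0x1000, "d1": 3}
    # Plain (8,a0,d1*4): a0 is left alone.
    print(hex(effective_address(regs, "a0", "d1", scale=4, bd=8)), hex(regs["a0"]))
    # Proposed (8,a0,d1*4)!: a0 ends up holding the EA.
    print(hex(effective_address(regs, "a0", "d1", scale=4, bd=8, update_base=True)), hex(regs["a0"]))
```

The only difference between the two forms in this model is the final write to An, which is the "little additional overhead" being claimed.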

Title: Re: Die space for m68k on FPGA?
Post by: Mrs Beanbag on January 06, 2013, 05:48:48 PM

How would you encode it in the instruction?

Title: Re: Die space for m68k on FPGA?
Post by: ChaosLord on January 06, 2013, 06:25:43 PM

Quote from: matthey;721474

No. I want to add the new addressing modes for all 68k effective addresses (EAs).

It's been such a long time since I studied the addressing mode bits... I thought they were all used up? How will u encode a new universal addressing mode?

Quote

The addressing mode in question and used in DosBox is base register update which the 68k does not have. I represent it in the 68kF docs as (bd,An,Rn*Scale)! with the exclamation point at the end specifying to update the base register An with the calculated value. The EA is already calculated so there is little additional overhead.

"The EA is already calculated so there is little additional overhead."
hmmm.. maybe...
Due to the pipelined structure... it could be troublesome to code that in. I am sure it can be done... with some trickery.

Quote

ARM also has this addressing mode and probably some other processors making emulation easier. The 68k addressing mode would be much more flexible and usable than the x86 addressing mode because the 68k has general purpose registers and more of them.

You have convinced me that it is a good thing to add in.

But the devil is in the details :D

The first thing Jens is going to say is "it messes up the pipeline structure and adds more complexity to the processing and I have to add another secret internal register to hold and forward the results and..."

Title: Re: Die space for m68k on FPGA?
Post by: Mrs Beanbag on January 06, 2013, 06:38:43 PM

move -(An),Dn
for instance, already updates the address register with the calculated value. So it shouldn't be too much trouble. It's the encoding that worries me, however I have the Motorola reference manual in front of me and it states that IS-I/IS values of 0100 and 1100-1111 are "reserved".

Title: Re: Die space for m68k on FPGA?
Post by: ChaosLord on January 06, 2013, 07:05:00 PM

Quote from: Mrs Beanbag;721500

move -(An),Dn
for instance, already updates the address register with the calculated value. So it shouldn't be too much trouble.

Good point!

They already forward results from the EA unit into an Address register after the instruction is complete.

There may be a limitation of requiring it to be an Address register in order not to stall the pipeline. But no problems.

Quote

It's the encoding that worries me, however I have the Motorola reference manual in front of me and it states that IS-I/IS values of 0100 and 1100-1111 are "reserved".

I seem to have forgotten what IS-I/IS means?

In any case: "Reserved" means "Reserved so we can use them".

Title: Re: Die space for m68k on FPGA?
Post by: matthey on January 06, 2013, 07:27:31 PM

Quote from: Mrs Beanbag;721495

How would you encode it in the instruction?

Preliminary 68koolFusion ISA:

OpenOffice Writer
http://www.heywheel.com/matthey/Amiga/68kF_PRM.odt

PDF
http://www.heywheel.com/matthey/Amiga/68kF_PRM.pdf

html
http://www.heywheel.com/matthey/Amiga/68kF_PRM.html

The addressing modes are at the top.

Quote from: ChaosLord;721498

It's been such a long time since I studied the addressing mode bits... I thought they were all used up? How will u encode a new universal addressing mode?

Nope. There is 1 free bit in the full format extension and then available I/IS encodings that were reserved. All new addressing modes are fully backward compatible. There is room to add addressing modes with scale > *8 or even (bd,An*Rn) which would be powerful but they would cause a slowdown, especially on an fpga. They might be fine in silicon if a longer pipeline was chosen although I don't like very long pipelines.
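For anyone who wants to poke at the bits, here is a rough Python sketch that splits a 68020 full format extension word into its fields and flags the reserved IS-I/IS combinations mentioned earlier (0100 and 1100-1111) as well as the always-zero bit 3. The field layout is the one I recall from the Motorola manual, so check it against the PRM before relying on it; the function and key names are invented for this example.

```python
def decode_full_extension_word(word):
    """Split a 68020 full format extension word into its fields.

    Layout assumed here (verify against the Motorola PRM):
      15     D/A     index register is An (1) or Dn (0)
      14-12  index register number
      11     W/L     index is long (1) or sign-extended word (0)
      10-9   scale   00=*1, 01=*2, 10=*4, 11=*8
      8      1 for full format (0 would be the brief format)
      7      BS      base register suppress
      6      IS      index suppress
      5-4    BD size
      3      0       (the unused bit)
      2-0    I/IS    index/indirect selection
    """
    if not (word >> 8) & 1:
        raise ValueError("bit 8 is clear: this is a brief format extension word")
    is_bit = (word >> 6) & 1
    is_iis = (is_bit << 3) | (word & 0b111)  # 4-bit IS-I/IS value
    return {
        "index_is_address_reg": bool((word >> 15) & 1),
        "index_reg": (word >> 12) & 0b111,
        "index_long": bool((word >> 11) & 1),
        "scale": 1 << ((word >> 9) & 0b11),
        "base_suppress": bool((word >> 7) & 1),
        "index_suppress": bool(is_bit),
        "bd_size": (word >> 4) & 0b11,
        "bit3": (word >> 3) & 1,
        "is_iis": is_iis,
        # Reserved combinations: 0100 and 1100 through 1111.
        "reserved_is_iis": is_iis == 0b0100 or is_iis >= 0b1100,
    }


if __name__ == "__main__":
    for w in (0x0510, 0x0514, 0x0554, 0x0518):
        f = decode_full_extension_word(w)
        print(f"{w:#06x}: scale=*{f['scale']} is_iis={f['is_iis']:04b} "
              f"reserved={f['reserved_is_iis']} bit3={f['bit3']}")
```

The 2-bit scale field is also why scale factors above *8 have nowhere to go without borrowing bits from somewhere else in the word.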

Quote from: ChaosLord;721498

The first thing Jens is going to say is "it messes up the pipeline structure and adds more complexity to the processing and I have to add another secret internal register to hold and forward the results and..."

Only a wee bit more complexity and no secret internal registers (that's needed for LEA EA,Dn which would help x86 emulation also :/ ).

Quote from: Mrs Beanbag;721500

move -(An),Dn
for instance, already updates the address register with the calculated value. So it shouldn't be too much trouble.

Bingo. It does require 2 register writes but that is already needed on the 68k.

Quote from: Mrs Beanbag;721500

It's the encoding that worries me, however I have the Motorola reference manual in front of me and it states that IS-I/IS values of 0100 and 1100-1111 are "reserved".

Reserved is reserved for future ISA changes :).

Title: Re: Die space for m68k on FPGA?
Post by: psxphill on January 06, 2013, 07:35:24 PM

Quote from: ChaosLord;721505

In any case: "Reserved" means "Reserved so we can use them".

No, reserved means you can't use them. Especially for ones that raise invalid instruction exceptions that programs trap. There is software that uses the reserved line-a exceptions for example.

http://forums.sonicretro.org/index.php?showtopic=24409

If you don't care about compatibility then go ahead and add instructions that will make it not work properly, but then why do you want 680x0?

Title: Re: Die space for m68k on FPGA?
Post by: ChaosLord on January 06, 2013, 07:45:48 PM

Quote from: matthey;721508

There is room to add addressing modes with scale > *8

Ah yes! I remember that. I still want my *32 scalefactors! :D
Seriously I do want them.

Title: Re: Die space for m68k on FPGA?
Post by: Mrs Beanbag on January 06, 2013, 10:26:38 PM

I didn't spot bit #3 of extension word... in the docs it is simply '0'!

"Reserved" means Motorola might use them for something later... I guess they might come back and make a 68080 and then we'd be in trouble!

Although personally I think we're getting ahead of ourselves here... make a fully pipelined 680x0 + accelerator first, and worry about extending the ISA later.

Title: Re: Die space for m68k on FPGA?
Post by: matthey on January 07, 2013, 12:12:20 AM

Quote from: psxphill;721510

No, reserved means you can't use them. Especially for ones that raise invalid instruction exceptions that programs trap. There is software that uses the reserved line-a exceptions for example.

http://forums.sonicretro.org/index.php?showtopic=24409

If you don't care about compatibility then go ahead and add instructions that will make it not work properly, but then why do you want 680x0?

There is some 68k software that uses a missing or invalid instruction to trap but they are very few and supporting them means no enhancements are possible. The 68020 ISA broke a fair amount of 68000 code but it was a very good enhancement. The 68kF1 ISA would break way less software than 68000->68020. Even 68kF2 would break less. If 99.999% of code runs then I'm happy. The rest can be patched.

A-line is not currently used by either ISA. The MacOS, Atari ST and others used A-line for system call traps. I don't know of any Amiga software that makes use of A-line except emulators.
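For reference, a tiny Python sketch of how A-line and F-line opcodes are told apart from everything else: it only looks at the top nibble of the first opcode word. Vector numbers 10 and 11 are the standard 68k line 1010 and line 1111 emulator vectors; the function name is made up, and on a 68020 or later an F-line opcode only traps if no coprocessor or FPU handles it, which this sketch ignores.

```python
def classify_opcode(opword):
    """Classify a 16-bit 68k opcode word by its top nibble.

    Returns (kind, exception_vector). Simplification: F-line opcodes are
    reported as trapping even though a 68020+ with a coprocessor/FPU may
    execute them instead.
    """
    top = (opword >> 12) & 0xF
    if top == 0xA:
        return ("line-A", 10)   # line 1010 emulator exception
    if top == 0xF:
        return ("line-F", 11)   # line 1111 emulator exception
    return ("other", None)


if __name__ == "__main__":
    for op in (0xA123, 0xF200, 0x4E75):
        print(f"{op:#06x} -> {classify_opcode(op)}")
```

Software that hooks vector 10 for its own system calls is exactly what would break if an enhanced ISA started assigning real instructions to the $Axxx opcode space.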

Quote from: ChaosLord;721511

Ah yes! I remember that. I still want my *32 scalefactors! :D
Seriously I do want them.

There is room for future enhancements. I want to be sure we don't slow down the EA unit before adding more CPU intensive EA processing. The most power would come from a multiply+add in each EA unit but it is also the most costly in CPU processing although fairly easy to encode. Scale factors greater than *8 are not too CPU intensive in silicon but are challenging to encode. It's doable but may require giving up some options of (bd,An,Rn) like sign extended Rn word sizes and/or not allowing some suppression. The encoding wouldn't be pretty and 68k EA consistency would be ruined or the instruction size would grow. Add to that that the EA unit would be slower in fpga and I'd rather skip it until someone smarter than me tells how and if it should be done.

Quote from: Mrs Beanbag;721525

I didn't spot bit #3 of extension word... in the docs it is simply '0'!

It takes a little studying to figure that out ;). I'm glad that you were able to understand my documentation. The 68000PRM was a little confusing in regards to addressing modes. The powerful addressing modes of the 68k are what gives her so much power.

Quote from: Mrs Beanbag;721525

"Reserved" means Motorola might use them for something later... I guess they might come back and make a 68080 and then we'd be in trouble!

Motorola/Freescale killed the 68k so that it wouldn't compete with PPC and created the low end ColdFire for microcontrollers and simple embedded uses. In the meantime, ARM Holdings created the less powerful, less programmer friendly, less code dense than 68020 variable length instruction ARM with Thumb 2 and it now sells at least 8 billion ARM processors per year and is used in 95% of smartphones, 90% of hard disk drives, 40% of digital televisions and set-top boxes, 15% of microcontrollers and 20% of mobile computers. Oh, Freescale pays license fees to ARM too. Maybe they were a little smarter than C= after all, they aren't bankrupt yet. Tech companies that stop innovating start dying. Freescale, Microsoft, Sony and Amiga Inc. are dying. Apple, Intel, IBM and ARM Holdings keep innovating.

Quote from: Mrs Beanbag;721525

Although personally I think we're getting ahead of ourselves here... make a fully pipelined 680x0 + accelerator first, and worry about extending the ISA later.

Compatible standards need to be in place before there are incompatible products. It also takes time to develop good standards and ISAs. Yes, making a fast 68k CPU is a higher priority but there are already several fpga processors that are quite capable. More than 1 means there is a possibility for incompatible enhancements.

I also try to have fun and innovate with the 68k community's help. There have to be some good ideas that are different than x86 and ARM.

Title: Re: Die space for m68k on FPGA?
Post by: freqmax on January 07, 2013, 04:51:34 AM

The Amiga and 68k platform is dead until someone provides a pathway to ASIC with high volume and low price for consumers. Because m68k on mobile phones would be cool and perhaps way nicer to program.