Author Topic: Die space for m68k on FPGA? (Read 5177 times)

psxphill · « **on:** January 04, 2013, 12:37:06 AM »

Quote from: ChaosLord;721161

020 is just an 000 with 32-bit datapaths, a small L1 cache and a barrel shifter and a few new instructions and a few new addressing modes. It is more complex, sure. But I am still interested to know how many more LE it takes over a plain jane 000

The 020 onwards can read/write unaligned words and dwords. Again not earth shattering, but it's more complex. An EC030 is also not much more complex.

A 68851 & 68882 would be nice to have in addition, but those are more complex. The 68060 MMU&FPU were simpler, so that would be easier. You could also not bother with all the 68020 stuff.

psxphill · « **Reply #1 on:** January 05, 2013, 03:45:43 PM »

Quote from: freqmax;721299

This is very true for parallelport bitbanging DOS software. So it's the same issue as with software emulated Amigas. They can't deal with latency and propagation races properly.

It's OK as long as you have the hardware. The DOS support in 32-bit Windows is pretty good, and if you're running 64-bit then so are VirtualPC and DOSBox. What is lost in speed probably helps, as the software was designed to run on something 20 times slower.

As soon as you have to use a usb serial port/parallel port (not that I have come across a usb parallel port that copes with anything other than printing) then the latency of usb really kills performance.

I have only used laptops for the last 12 years, but I know people who still use desktops with parallel ports that are able to run really old software. My old laptop had a parallel port and only runs 32 bit windows anyway, so that sometimes gets used. But that's for practical reasons and slowly I've been moving all those over to intelligent usb devices. By moving the software onto a cpu on the usb device you can offload the time critical code but still plug it into pretty much any modern computer.

There are probably some people who would have a use for it, but not as many as want to run Amiga software. The PC doesn't get people as passionate.

psxphill · « **Reply #2 on:** January 05, 2013, 06:17:33 PM »

Quote from: freqmax;721343

As for PPC it has been discussed before. The size is just too big to be practical. It's way better to use the ASIC PPC until Moore's law makes it feasible.

What makes it too big? The ISA itself shouldn't be. It would be tricky to match performance of a real chip & you'd probably have to leave a lot of complex features out. Most people would only want it for running warpup & powerup based software, it doesn't necessarily need to use the official kernels to do it either.

psxphill · « **Reply #3 on:** January 05, 2013, 08:36:43 PM »

Quote from: matthey;721353

It would be nearly impossible to match the performance of the original Amiga PPC cards or the SAM 440 with a PPC core in an affordable fpga.

The slowest phase 5 board was a PowerPC 603e 160, which had 16kb L1 cache & used the 32bit ISA. I'm sure you could get close to that, a sam 440 maybe not. But then a sam 440 doesn't have aga, so nothing is perfect.

It doesn't look to have a particularly more complex instruction set than the 68020 (once you factor in MMU & FPU). All instructions are 32 bits, which hurts code density, but it also simplifies fetching. Lower density costs you RAM rather than speed. It's better for performance if all instructions are the same length.

https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/852569B20050FF778525699600719DF2/$file/6xx_pem.pdf

psxphill · « **Reply #4 on:** January 05, 2013, 11:56:13 PM »

Quote from: matthey;721368

Constant instruction length is good for performance but so are small instructions that are simple to decode or allow parallel decoding.

Fixed length 32 bit instructions (like PPC & MIPS) are the sweet spot for performance.

Thumb is mainly for when you only have 16 bit access to ram, if your ram is 32bit then Thumb is slower (although it will use less ram).

Variable length instructions are a pain to parallel decode, because you have to decode the first one to know where the second one is.
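To make the dependency concrete, here is a minimal sketch. The byte encoding is a made-up toy (the first byte of each instruction simply holds its length in bytes), not real 68k or PPC encoding, and the function names are hypothetical. It only shows that fixed-length start offsets can be computed without reading any instruction, while variable-length offsets have to be found one after another.

```python
def fixed_boundaries(code_len, width=4):
    # Every start offset is known up front: no instruction bytes are examined.
    return list(range(0, code_len, width))


def toy_length(code, pos):
    # Toy assumption: the first byte of an instruction is its length in bytes.
    return code[pos]


def variable_boundaries(code, length_of):
    # Each start offset depends on the length of the previous instruction,
    # so the scan is inherently sequential.
    offsets = []
    pos = 0
    while pos < len(code):
        offsets.append(pos)
        pos += length_of(code, pos)
    return offsets


# Four toy instructions of 2, 4, 6 and 2 bytes.
toy_code = bytes([2, 0, 4, 0, 0, 0, 6, 0, 0, 0, 0, 0, 2, 0])
print(variable_boundaries(toy_code, toy_length))
print(fixed_boundaries(16))
```

Real decoders work around the serial scan with tricks such as predecode bits or speculative length decoding at every offset, which costs extra logic.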

psxphill · « **Reply #5 on:** January 06, 2013, 01:57:21 AM »

Quote from: matthey;721378

The length of any instruction on the 68k can be determined by looking at the first 32 bits. That's pretty good. The variable length instructions are not that much of a problem in reality while they save cache and improve ease of programming.

The whole point of parallel decodes is that you can decode the first and second at the exact same time. If you have to look at the first to see what the length is, then you've failed. You need fixed length, then you can split the instruction cache so that odd/even instructions can be accessed simultaneously.
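A rough sketch of the odd/even split, assuming fixed 4-byte instructions and a two-bank store (the names `bank_and_row` and `fetch_pair` are made up for illustration). Because consecutive instructions always land in different banks, any adjacent pair can be read in the same cycle.

```python
WORD = 4  # bytes per fixed-length instruction (assumption)


def bank_and_row(addr):
    # Even-numbered instructions go to bank 0, odd-numbered ones to bank 1.
    index = addr // WORD
    return index % 2, index // 2


def fetch_pair(even_bank, odd_bank, addr):
    # Read the instructions at addr and addr + WORD, one from each bank.
    banks = (even_bank, odd_bank)
    b0, r0 = bank_and_row(addr)
    b1, r1 = bank_and_row(addr + WORD)
    assert b0 != b1  # adjacent instructions never collide on a bank
    return banks[b0][r0], banks[b1][r1]


program = ["i0", "i1", "i2", "i3", "i4", "i5"]
even_bank = program[0::2]  # i0, i2, i4
odd_bank = program[1::2]   # i1, i3, i5

print(fetch_pair(even_bank, odd_bank, 0))
print(fetch_pair(even_bank, odd_bank, 4))
```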

Thumb2 sounds slower:

"The best options for armv7-a, thumb-2 and thumb-1 and overall:

The best is -O3 -funroll-loops -marm -march=armv5te -mtune=cortex-a8
The best armv7-a is -O3 -funroll-loops -marm -march=armv7-a -mtune=cortex-a8 at 95.2 % of overall best
The best Thumb-2 is -O3 -funroll-loops -mthumb -march=armv7-a -mtune=cortex-a8 at 88.7% of overall best
The best Thumb-1 is -O2 -mthumb -march=armv5te -mtune=cortex-a8 at 64.4% of overall best"

With PPC we can run powerup/warpup software, implementing arm is boring.

psxphill · « **Reply #6 on:** January 06, 2013, 01:25:18 PM »

Quote from: matthey;721396

The superscalar 68060 averages better than 1 instruction per cycle. A good assembler programmer should be able to average about 2 instructions per cycle in some code. This means that the 68060 is able to decode in parallel with variable length instructions.

The 68060 can despatch two instructions at the same time, but I don't think it decodes them at the same time.

"The superscalar micro-architecture actually consists of two distinct
parts: a four-stage instruction fetch pipeline (IFP) responsible for
accessing the instruction stream and dual four-stage operand execution
pipelines (OEPs) which perform the actual instruction execution. These
pipeline structures operate in an independent manner with a FIFO instruction
buffer providing the decoupling mechanism."

I don't believe it can sustain 2 instructions per cycle for long before the fetch pipeline runs dry & that is if you can even find worthwhile work to do in instructions that can run in parallel.
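As a toy illustration of the buffer running dry (not a model of the real 68060: the fetch rate, prefill and capacity below are made-up parameters), this counts how many instructions a dual-issue back end can take each cycle from a FIFO that the fetch side refills more slowly than it is drained.

```python
def simulate(cycles, fetch_per_cycle, issue_width=2, prefill=0, capacity=16):
    # occupancy = instructions currently waiting in the FIFO
    occupancy = min(prefill, capacity)
    issued = []
    for _ in range(cycles):
        # Execute side: take up to issue_width instructions this cycle.
        n = min(issue_width, occupancy)
        occupancy -= n
        issued.append(n)
        # Fetch side: refill, limited by the FIFO capacity.
        occupancy = min(capacity, occupancy + fetch_per_cycle)
    return issued


# Fetch supplies 1 instruction per cycle, FIFO starts with 4 buffered.
print(simulate(6, fetch_per_cycle=1, prefill=4))
# Fetch keeps pace with the back end.
print(simulate(4, fetch_per_cycle=2, prefill=2))
```

In the first run the back end dual-issues only while the prefilled entries last, then drops to the fetch rate. This ignores pairing restrictions between the two pipes, which would lower the figure further.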

Quote from: freqmax;721399

The ARM CPU on the FPGA Replay is most likely busy serving the FPGA with disc emulation and doesn't have the code space to do much else. The transfer capacity to the FPGA may also be a serious bottleneck.

Even so, the ARM is a SoC. If you want to use it for emulation then you'd need to be able to configure its memory map. Maybe you could do it with MMU tricks, but it's not really in the spirit of the FPGA arcade.

psxphill · « **Reply #7 on:** January 06, 2013, 02:16:50 PM »

Quote from: freqmax;721438

Which will cause incompatibilities..

Yeah, any design that allows you to write software that won't run on a real Amiga is a very bad thing. Anyone who wants something new would find a PC more suitable; you could even run AROS on it.

psxphill · « **Reply #8 on:** January 06, 2013, 04:16:09 PM »

Quote from: ChaosLord;721444

You said "any design that allows you to write software that won't run on a real Amiga is a very bad thing."

This means PCs are bad because it's a design "that allows you to write software that won't run on a real Amiga"

Then you tell ppl to buy a PC.

You are not making any sense.

I'm glad you admitted that you don't understand my point. It proves beyond any doubt just what I've been dealing with.

I'd explain it, but you either couldn't understand or you are trolling.

Quote from: ChaosLord;721440

The M68060 dispatches, decodes, executes, completes and writes the results of 2 instructions at the same time.

It's pipelined: it can dispatch an instruction in 1 clock cycle and execute an instruction in 1 clock cycle, but they aren't the same instruction. You don't notice that from the point of view of the program until you get a mispredicted branch.
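A minimal sketch of that point, using a generic single-issue pipeline with hypothetical stage names (this is not the 68060's actual stage layout). In any one cycle each stage holds a different instruction, and a branch that resolves late means the younger instructions behind it have to be thrown away.

```python
STAGES = ("dispatch", "decode", "execute", "writeback")  # hypothetical names


def occupancy(cycle, n_instructions):
    # Instruction i enters the first stage in cycle i and advances one stage
    # per cycle, so stage number `depth` holds instruction (cycle - depth).
    return {
        stage: cycle - depth
        for depth, stage in enumerate(STAGES)
        if 0 <= cycle - depth < n_instructions
    }


def flushed_on_mispredict(cycle, n_instructions, resolve_stage="execute"):
    # Instructions younger than the branch are those in the earlier stages.
    depth = STAGES.index(resolve_stage)
    in_flight = occupancy(cycle, n_instructions)
    return sorted(in_flight[s] for s in STAGES[:depth] if s in in_flight)


print(occupancy(3, 10))
print(flushed_on_mispredict(3, 10))
```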

psxphill · « **Reply #9 on:** January 06, 2013, 07:35:24 PM »

Quote from: ChaosLord;721505

In any case: "Reserved" means "Reserved so we can use them".

No, reserved means you can't use them, especially the ones that raise invalid instruction exceptions that programs trap. There is software that uses the reserved line-A exceptions, for example.

http://forums.sonicretro.org/index.php?showtopic=24409

If you don't care about compatibility then go ahead and add instructions that will make it not work properly, but then why do you want 680x0?
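For reference, the line-A case above comes down to the top four bits of the opcode word: on the 68000, opcodes of the form $Axxx take exception vector 10 and $Fxxx take vector 11, and software can hook those vectors. The helper below is only an illustrative sketch (the function name is made up), but it shows why reusing that opcode space for new instructions would stop such software from seeing its traps.

```python
LINE_A_VECTOR = 10  # "line 1010 emulator" exception
LINE_F_VECTOR = 11  # "line 1111 emulator" exception


def classify_opcode(opword):
    # Bits 15..12 of the first opcode word select the instruction "line".
    top = (opword >> 12) & 0xF
    if top == 0xA:
        return "line-A"
    if top == 0xF:
        return "line-F"
    return "other"


print(classify_opcode(0xA000))  # line-A
print(classify_opcode(0x4E75))  # other (RTS)
```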
