Author Topic: Motorola 68060 FPGA replacement module (idea) (Read 242607 times)

Hattig · « **Reply #134 on:** January 11, 2013, 09:59:13 AM »

The fact that he has got a 68k core running on an FPGA connected into a standard 68k socket and being seen as a standard 68k processor by the system is a great achievement.

As he says, the core can be clocked faster, and it's likely there are a few tweaks to get the per-clock speed up as well. First steps first.

Iggy · « **Reply #135 on:** January 11, 2013, 10:09:43 AM »

Quote from: JimDrew;722060

Endian is not an issue. You can swap with the FPGA.

My 68040 core handles everything without needing the endian reversed, but I am sure it would be significantly faster without having to do that.

The only thing I don't do in my code is instruction cycle counting. That could be done, but I never bothered. The FPGA could be used to trigger an event to denote the end of the instruction cycle (where a process loop just waited for this to occur). So, based on the speed of your x86 CPU, you could reliably have cycle-exact timing at a speed limited by your fastest instruction (nop). I know my Mac emulation has no JIT type of stuff, is 100% assembly, and is frighteningly fast on modern PC hardware. I will have to test it on my Sandy Bridge setup to see how fast an 040 Mac it is.

Jim, you're an inspiration. The real trick will be moving this to a system that generates the correct signals at the appropriate pins.
That's how I envision the FPGA being used to connect the X86 to the system.
I even think additional X86 cores could be used to run alternate OSes simultaneously (Windows, Linux, AROS, etc.)

psxphill · « **Reply #136 on:** January 11, 2013, 01:45:37 PM »

Quote from: JimDrew;722060

My 68040 core handles everything without needing the endian reversed, but I am sure it would be significantly faster without having to do that.

The standard way is to treat memory as an array of little endian dwords and then:

byte: xor the address with 3
word: for aligned xor the address with 2, for unaligned either one or two accesses depending on how it's split and then shift
dword: for aligned do nothing, for unaligned you have to split it up into two aligned accesses and then shift

There isn't really an alternative to that, and that's as fast as it gets.

The fpga can't know when to swap as it requires knowledge about what the cpu is trying to do with the data, plus it won't even be asked about data once it's cached. So if you read memory as a dword and then as a byte, it will not work.
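For anyone following along, the byte/word/dword rules above could be sketched roughly like this. It is only an illustration in Python, not code from any real core: the class name `SwappedMemory` and its methods are made up for the example, host RAM is modelled as a `bytearray` holding little endian dwords, and unaligned accesses are built from two aligned dword reads plus a shift.

```python
class SwappedMemory:
    """Hypothetical sketch: 68k (big endian) memory stored as little endian dwords."""

    def __init__(self, size):
        if size % 4:
            raise ValueError("size must be a multiple of 4")
        self.ram = bytearray(size)

    def write32_aligned(self, addr, value):
        # Aligned dword: no address change, the host stores it little endian.
        self.ram[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def _read32_aligned(self, addr):
        return int.from_bytes(self.ram[addr:addr + 4], "little")

    def read8(self, addr):
        # Byte: xor the address with 3.
        return self.ram[addr ^ 3]

    def read16(self, addr):
        if addr & 1 == 0:
            # Aligned word: xor the address with 2.
            host = addr ^ 2
            return int.from_bytes(self.ram[host:host + 2], "little")
        base = addr & ~3
        if addr & 3 == 1:
            # Unaligned but inside one dword: one access, then shift.
            return (self._read32_aligned(base) >> 8) & 0xFFFF
        # Straddles two dwords: two accesses, then shift.
        pair = (self._read32_aligned(base) << 32) | self._read32_aligned(base + 4)
        return (pair >> 24) & 0xFFFF

    def read32(self, addr):
        if addr & 3 == 0:
            # Aligned dword: do nothing special.
            return self._read32_aligned(addr)
        base = addr & ~3
        pair = (self._read32_aligned(base) << 32) | self._read32_aligned(base + 4)
        shift = (addr & 3) * 8
        return (pair >> (32 - shift)) & 0xFFFFFFFF
```

Because the swap is folded into the address rather than the data, a value written as a dword and then read back as a byte or a word still comes out in 68k order, which is the case the FPGA alone cannot handle.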

Mrs Beanbag · « **Reply #137 on:** January 11, 2013, 02:03:52 PM »

supposing one does a movem.w (SP)+,D0-D7 for instance... urgh

psxphill · « **Reply #138 on:** January 11, 2013, 03:12:32 PM »

Quote from: Mrs Beanbag;722087

supposing one does a movem.w (SP)+,D0-D7 for instance... urgh

movem.w is heavy without the endian swap: you need to look at the bitmask to determine which 68k registers need writing out, and then load them from RAM into x86 registers before writing them out.

You could probably have an aligned and an unaligned version (whether it's aligned to a word boundary is not going to change by writing another word).

Doesn't movem.w (SP)+ add 4 to the stack pointer for each write? At least on some processors I'm sure the stack pointer gets 4-byte aligned.

polyp2000 · « **Reply #139 on:** January 11, 2013, 03:48:18 PM »

I noticed his website has been hacked - I hope this doesn't hinder his progress.

http://www.majsta.com/

Hattig · « **Reply #140 on:** January 11, 2013, 04:05:26 PM »

Quote from: polyp2000;722095

I noticed his website has been hacked - I hope this doesn't hinder his progress.

http://www.majsta.com/

I just don't understand why someone would do that in this community.

I hope that it was just a random script poking random URLs that found a flaw in the platform he was using for his website.

freqmax · « **Reply #141 on:** January 11, 2013, 04:07:23 PM »

So fücking unnecessary to mess up his website, perhaps a script kiddie?

Mrs Beanbag · « **Reply #142 on:** January 11, 2013, 04:15:26 PM »

dunno I did just tweet a link to it so maybe it's my fault

some people will just hack anything though just because they can

Mrs Beanbag · « **Reply #143 on:** January 11, 2013, 06:16:28 PM »

the site seems to be back online now.

JimDrew · « **Reply #144 on:** January 12, 2013, 05:19:29 AM »

I know for a fact you can do the byte swap/word swap with the FPGA in real time, but depending on how fast the x86 is, it could slow down the operation of the 680x0 emulation. By the way, the fastest way to swap bytes and words is through a big lookup table. It's faster than trying to convert it by hand using shifts, ANDs, and XORs. If you run everything backwards in memory you don't need to do the byte swaps, but the x86 doesn't cache backwards, and anything requiring DMA (like audio) won't work.

It's also interesting to note that there are certain x86 addressing modes that, although they reduce the size of the code, significantly reduce the throughput of CPU instruction lookup. I learned quite a bit about the Intel and AMD architectures when I did the port of FUSION to the PC. It is always faster to ADD an offset twice and branch than it is to use a *2 in a branch instruction. The *2 stalls the pipeline while calculating the address.

When I created FUSION-PC, I made a histogram of the instructions used during the Mac boot up. I was surprised that fewer than 10% of all of the available instructions were used when the Mac booted. I would bet the Amiga boot up is very similar.
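To make the lookup-table idea concrete, here is a minimal sketch in Python. It only shows the shape of the technique and makes no speed claim: the names `SWAP16`, `swap16`, `swap32` and `wordswap32` are invented for the example, and whether a table beats shifts or a dedicated byte-swap instruction depends on the host CPU and would have to be measured.

```python
# Hypothetical sketch: 64K-entry table mapping each 16-bit value to its byte-swapped form.
SWAP16 = [((v & 0xFF) << 8) | (v >> 8) for v in range(0x10000)]


def swap16(value):
    """Swap the two bytes of a 16-bit value with a single table lookup."""
    return SWAP16[value & 0xFFFF]


def swap32(value):
    """Reverse all four bytes of a 32-bit value using two table lookups."""
    value &= 0xFFFFFFFF
    # Byte-swap each half via the table, then exchange the halves.
    return (SWAP16[value & 0xFFFF] << 16) | SWAP16[value >> 16]


def wordswap32(value):
    """Exchange the two 16-bit halves without touching byte order inside them."""
    value &= 0xFFFFFFFF
    return ((value & 0xFFFF) << 16) | (value >> 16)
```

For example, `swap32(0x11223344)` gives `0x44332211`, the same result as reinterpreting the four bytes in the opposite byte order.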

freqmax · « **Reply #145 on:** January 12, 2013, 08:57:32 AM »

Is it this software?
WP-en: VMware Fusion

Mrs Beanbag · « **Reply #146 on:** January 12, 2013, 03:46:12 PM »

Quote from: psxphill;722094

Doesn't movem.w (SP)+ add 4 to the stack pointer for each write? At least on some processors I'm sure the stack pointer gets 4-byte aligned.

The programmer's reference manual doesn't say so. It says "the address is incremented by the operand length (2 or 4)".
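As a rough illustration of what an emulator has to do for a memory-to-register movem.w with postincrement, here is a sketch in Python. It is not taken from any existing core: `movem_w_postinc`, `sign_extend16` and `make_reader` are hypothetical names, the register file is assumed to be a list of 16 values (D0-D7 followed by A0-A7), and the mask is assumed to use bit 0 for D0 up to bit 15 for A7. Each loaded word is sign-extended to 32 bits and the address advances by 2 per register.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit value to a 32-bit unsigned representation."""
    value &= 0xFFFF
    return value | 0xFFFF0000 if value & 0x8000 else value


def make_reader(data, base):
    """Build a read16 callback over big endian bytes starting at address `base`."""
    def read16(addr):
        offset = addr - base
        return int.from_bytes(data[offset:offset + 2], "big")
    return read16


def movem_w_postinc(mask, addr, read16, regs):
    """Load the registers selected by `mask` from memory, one word each.

    Returns the address after the last word read, which the caller
    would write back to the address register used for (An)+.
    """
    for index in range(16):
        if mask & (1 << index):
            regs[index] = sign_extend16(read16(addr))
            addr += 2  # operand length for movem.w is 2 bytes
    return addr
```

With all eight data registers selected (mask `0x00FF`), the returned address is 16 bytes past the start, i.e. 2 bytes per register rather than 4.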

Fats · « **Reply #147 on:** January 13, 2013, 10:12:53 AM »

Quote from: JimDrew;722177

I know for a fact you can do the byte swap/word swap with the FPGA in real time, but depending on how fast the x86 is, it could slow down the operation of the 680x0 emulation.

How do you know if you need to swap or not? For example, a memory copy function that uses 32-bit transfers but may be copying strings that should not be byte swapped?

greets,
Staf.

psxphill · « **Reply #148 on:** January 13, 2013, 12:01:17 PM »

Quote from: Mrs Beanbag;722213

The programmer's reference manual doesn't say so. It says "the address is incremented by the operand length (2 or 4)".

Ok. It's byte operations that increment a7 by 2; on every other address register they increment it by 1.

"Indirect addressing with postincrement

Assembler syntax:

(An)+

Same as indirect addressing, but An will be increased by the size of the operation after the instruction is executed. The only exception is byte operations on A7 - this register must point to an even address, so it will always increment by at least 2. Example:"
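The example from the quoted text is not reproduced here. As a stand-in, the rule it describes could be sketched like this in Python (the function name `postincrement_step` is made up, and address registers are assumed to be numbered 0 to 7 with 7 being A7/SP):

```python
def postincrement_step(areg, size_bytes):
    """Return how far (An)+ advances An for one operand.

    areg: address register number 0-7 (7 is A7, the stack pointer).
    size_bytes: operand size, 1 (byte), 2 (word) or 4 (long).
    """
    if areg not in range(8):
        raise ValueError("address register must be 0-7")
    if size_bytes not in (1, 2, 4):
        raise ValueError("operand size must be 1, 2 or 4 bytes")
    # Byte operations on A7 step by 2 so the stack pointer stays even.
    if areg == 7 and size_bytes == 1:
        return 2
    return size_bytes
```

So a byte access through (A0)+ steps by 1, the same access through (A7)+ steps by 2, and word or long accesses step by 2 or 4 on any register.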

Quote from: Fats;722286

How do you know if you need to swap or not? For example, a memory copy function that uses 32-bit transfers but may be copying strings that should not be byte swapped?

My only thought would be to access bytes/words & dwords at different address ranges & disable all the caches. You could implement a cache inside the fpga, although it would probably still be slower than the on chip cache because it would be limited to the fsb speed. You'd have to do it to figure out which came out better or worse.

Most of the time the 68k code would be doing aligned accesses, so that should show up in the branch prediction. So a test for whether it's aligned and then the xor is likely to not have much impact at all.

I know Mike Coates spent a long time optimising his 68k core back in the day; it was used in MAME in the late 1990s and early 2000s. Obviously some of the optimisations are likely to be deoptimisations on recent CPUs, but the clock speed outweighs the effort required. Even Musashi (the C core that MAME now uses) would probably be more than sufficient.

Or if someone does an ARM board instead then there is always http://notaz.gp2x.de/cyclone.php

My thought on using x86 (or even better x64) is that using a VM on the card could allow a bridge board style PC emulator to run at the same time as the 68k. Crazy idea I guess, but heh these ideas are all crazy.

Mrs Beanbag · « **Reply #149 on:** January 13, 2013, 02:27:52 PM »

How about this for a crazy idea: an accelerator with an ARM CPU and an FPGA. The FPGA can function as a 68k CPU if set up as such, so it could run like the PPC accelerator boards. BUT you install AROS for ARM ROM chips and use the ARM as the main CPU, and allow the FPGA to be reconfigured by the ARM chip, so then you could develop your 68k core "live" and install updates through software.
