Author Topic: Motorola 68060 FPGA replacement module (idea) (Read 187065 times)

Mrs Beanbag · « *Last Edit: January 21, 2013, 02:01:38 PM by Mrs Beanbag* »

There is this information from the Megadrive:
http://emu-docs.org/CPU%2068k/68kstat.txt

although it might be more instructive to see which are the most common addressing modes for these instructions, too.
For instance, rate of "add Dx,Dy" vs "add (Ax),Dy" and "add Dx,(Ay)".

bloodline · « **Reply #465 on:** January 21, 2013, 02:50:59 PM »

Quote from: Mrs Beanbag;723423

There is this information from the Megadrive:
http://emu-docs.org/CPU%2068k/68kstat.txt

although it might be more instructive to see which are the most common addressing modes for these instructions, too.
For instance, rate of "add Dx,Dy" vs "add (Ax),Dy" and "add Dx,(Ay)".

Brilliant!! I kinda figured the branch, move and compare instructions would be the more popular

matthey · « **Reply #466 on:** January 21, 2013, 03:19:35 PM »

Quote from: psxphill;723410

I used mame (arcade game emulator), typed the hex into memory and then disassembled and executed it. It's not just the disassembler, the emulation consumed the same number of bytes. So that needs looking at, can you post the exe you assembled?

http://www.heywheel.com/matthey/Amiga/test68020

Are you involved with developing or testing mame?

Quote from: psxphill;723410

There is mention in the manual about some instructions being split over two pipelines, it might do that by splitting it into two FIFO entries. With the result of the ea fetch from the primary pipeline getting forwarded to the secondary pipeline so it can get stored.

Right. The OEPs are locked together and each OEP performs 1/2 of the ea for a move <ea>,<ea>. This is the only 68k instruction that allows 2 EAs by the way.

Quote from: psxphill;723410

Have you tried running this encoding on a real 68060?

It's not safe as it writes memory, but I have never had a problem with double memory indirect modes before. Some versions of the GCC and SAS/C compilers will use them. ThoR's 68060.library uses them because, in short functions, they avoid saving a register to the stack and reloading it. They do need to be at least trapped in an FPGA 68020+ CPU or compatibility will not be good.

Quote from: bloodline;723432

Brilliant!! I kinda figured the branch, move and compare instructions would be the more popular

I hope MOVE is a popular instruction as it takes almost 1/4 (actually 3/16 but who's counting) of the 68k encoding space.
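
The 3/16 figure can be checked by counting first opcode words. The sketch below assumes only the published MOVE/MOVEA layout (bits 15-14 are 00, and the size field in bits 13-12 is 01, 11 or 10). It counts raw 16-bit patterns and ignores which effective-address combinations are actually legal, so it is an upper bound on the encoding space rather than a count of valid instructions. The name `is_move_pattern` is made up for this illustration.

```python
from fractions import Fraction

def is_move_pattern(opword: int) -> bool:
    """True if a first opcode word falls in the MOVE/MOVEA lines.

    Illustrative helper (not from the thread): checks only the top
    four bits, not whether the source/destination EAs are legal.
    """
    top_two = (opword >> 14) & 0b11   # bits 15-14 must be 00
    size = (opword >> 12) & 0b11      # bits 13-12: 01=byte, 11=word, 10=long
    return top_two == 0 and size != 0

# Count every possible 16-bit first word that lands in a MOVE line.
move_words = sum(is_move_pattern(w) for w in range(0x10000))
share = Fraction(move_words, 0x10000)
print(move_words, share)   # 12288 3/16
```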

Mrs Beanbag · « **Reply #467 on:** January 21, 2013, 03:22:59 PM »

So keeping the pipeline relatively short is probably a more effective strategy than making sure all instructions are single cycle. We can afford a few 2-cycle instructions if we can shorten the pipeline by at least one stage, I reckon.

Also I have been thinking of a way to make the instruction translation do branch predication in the case a conditional branch skips only a few instructions.

Mrs Beanbag · « **Reply #468 on:** January 21, 2013, 03:58:23 PM »

Actually something just occurred to me. If the most common instruction is "tst", it should be possible to know whether a branch will be taken or not some time in advance. Because "tst" only looks at a single register, the contents of that register must have been determined some time before. So you could look ahead in the instruction queue for a "tst/bcc", and inform the branch predictor well in advance. "tst" instruction then takes effectively NO cycles.

psxphill · « **Reply #469 on:** January 21, 2013, 03:59:26 PM »

Quote from: matthey;723433

http://www.heywheel.com/matthey/Amiga/test68020

Are you involved with developing or testing mame?

Developing mainly, although I've not had much to do with the 680x0 side.

Quote from: matthey;723433

Right. The OEPs are locked together and each OEP performs 1/2 of the ea for a move <ea>,<ea>.

The OEPs are always locked together; the manual hints at how move <ea>,<ea> works:

"pOEP-until-last Many of the non-standard instructions represent a combination of
multiple “standard” operations. As an example, consider the memory-to-memory MOVE instruction. This instruction is decomposed into two standard operations: first, a standard read cycle followed by a standard write cycle. This class allows a standard single-cycle instruction to be dispatched from the sOEP during the last cycle of its pOEP execution."

It seems to say that two entries are written to the FIFO and the second entry in the FIFO sits waiting until the primary is about to finish before despatching to the secondary. Although I'd have thought it would despatch earlier so it could calculate the EA.

Quote from: Mrs Beanbag;723435

Actually something just occurred to me. If the most common instruction is "tst", it should be possible to know whether a branch will be taken or not some time in advance. Because "tst" only looks at a single register, the contents of that register must have been determined some time before. So you could look ahead in the instruction queue for a "tst/bcc", and inform the branch predictor well in advance. "tst" instruction then takes effectively NO cycles.

Apart from the cycles it takes to look ahead in the instruction stream every time you hit a tst instruction, and it will get complex to even follow the code as you would have to follow branches as well. Basically to avoid the cycles when a branch happens, you'll end up going through the same overhead as running the code after every tst instruction (tst isn't the only instruction that affects branches).

It also won't help a branch directly after a branch because it will already have started progressing through the pipeline.

Mrs Beanbag · « **Reply #470 on:** January 21, 2013, 04:50:21 PM »

Quote from: matthey;723433

Right. The OEPs are locked together and each OEP performs 1/2 of the ea for a move <ea>,<ea>. This is the only 68k instruction that allows 2 EAs by the way.

Not strictly true. Can also do "cmp (Ax)+,(Ay)+"

addx, subx, abcd and sbcd can use predecrement for both operands.

All of these are two cycle instructions.

Quote from: psxphill;723436

Apart from the cycles it takes to look ahead in the instruction stream every time you hit a tst instruction, and it will get complex to even follow the code as you would have to follow branches as well. Basically to avoid the cycles when a branch happens, you'll end up going through the same overhead as running the code after every tst instruction (tst isn't the only instruction that affects branches).

Instructions are read into a buffer ahead of time, so can detect a tst/bcc when it is first read in. I wouldn't bother following branches, to be able to predict only the next branch would still help. Yes it would only work if the branch follows a tst, but if the profiles from the Megadrive are anything to go by, that is the most common case. Basic RISC principle, "make the common case fast"!

matthey · « **Reply #471 on:** January 21, 2013, 05:33:03 PM »

Quote from: Mrs Beanbag;723434

Also I have been thinking of a way to make the instruction translation do branch predication in the case a conditional branch skips only a few instructions.

Be careful with the predication on the 68k. It might be possible to get it to work as 1 conditional instruction sometimes. It doesn't work well with multiple instructions, multicycle instructions or addressing modes that update the base register like (An)+ and -(An). The data to be predicated ends up having to be examined for suitability. IMO, this would only be worthwhile with very common code. Imagine handling this:

Code:

   beq skip
   movem.l d0-d7/a0-a6,-(sp)
skip:
   move.l d0,-(sp)

The N68k fpga CPU is supposedly conditional 3 op internally making predication easier. There were enough problems on the 68k that we decided adding SBcc and SELcc were easier. Even this takes some logic but the 68k already has Scc which is handled much the same way.

Quote from: Mrs Beanbag;723435

Actually something just occurred to me. If the most common instruction is "tst", it should be possible to know whether a branch will be taken or not some time in advance. Because "tst" only looks at a single register, the contents of that register must have been determined some time before. So you could look ahead in the instruction queue for a "tst/bcc", and inform the branch predictor well in advance. "tst" instruction then takes effectively NO cycles.

The 68000 (16 bit) code in a console is going to be very different from 68060 optimized code for a dynamic OS today. I very much doubt TST is going to be number 1 any more. I expect MOVE to be #1. MOVE sets the condition codes so a TST should not be needed too often with optimized code. Folding a TST, CMP, or SUB/SUBQ with a branch is something the 68060 does to help achieve 0 cycle branch prediction although I don't know which specifically it does. TST has a higher likelihood of testing a register that has not been modified for a time than MOVE which sets the cc. Many processors do try to determine the branch rather than predict it. The PPC is especially good at this. It also provides several cc's that can be selectively set and branched on later. Most PPC processors have a fairly short pipeline too, so branching on a condition set 3 or 4 instructions ago, or testing and immediately branching on a register that hasn't changed recently, may be enough to determine the branch without prediction. It probably helps, especially if the compilers can generate good code, but it obviously hasn't helped PPC destroy x86 like was predicted 20 years ago.

Quote from: Mrs Beanbag;723441

Not strictly true. Can also do "cmp (Ax)+,(Ay)+"

addx, subx, abcd and sbcd can use predecrement for both operands.

All of these are two cycle instructions.

Yes, they are more complex on the 68060 but no they don't use 2 EAs. They are special cases that do not calculate even 1 EA. The plus of (An)+ is added after the EA is used and is not part of the calculation.

ChaosLord · « **Reply #472 on:** January 21, 2013, 06:17:16 PM »

I think the thing with TST being the #1 instruction in SEGA games is either:

A: All those games were compiled with either SAS/C or GCC, which generate silly wasted TST instructions all the time.

B: The Sega Genesis uses PIO (Polled IO) for some things so it has to constantly TST a certain memory location all the time in a loop.

C: All of the above.

psxphill · « **Reply #473 on:** January 21, 2013, 07:38:03 PM »

Quote from: Mrs Beanbag;723441

Instructions are read into a buffer ahead of time, so can detect a tst/bcc when it is first read in. I wouldn't bother following branches, to be able to predict only the next branch would still help. Yes it would only work if the branch follows a tst, but if the profiles from the Megadrive are anything to go by, that is the most common case. Basic RISC principle, "make the common case fast"!

The basic RISC principle is keep instructions simple so that you can use the spare space for large register sets and caches.

It wouldn't help at all when the branch follows the test, because you're going to have to flush all the following instructions from the pipeline. If you're going to remove the pipeline completely or a significant number of stages then you'll have a huge number of instructions taking multiple cycles and the overhead of incorrectly predicted branches is going to be so insignificant that it won't be worth doing.

Mrs Beanbag · « **Reply #474 on:** January 21, 2013, 07:53:28 PM »

Quote from: psxphill;723460

It wouldn't help at all when the branch follows the test, because you're going to have to flush all the following instructions from the pipeline. If you're going to remove the pipeline completely or a significant number of stages then you'll have a huge number of instructions taking multiple cycles and the overhead of incorrectly predicted branches is going to be so insignificant that it won't be worth doing.

The following instructions wouldn't be in the pipeline yet, at the point you make the prediction, that's the whole point, to avoid having to flush the pipeline when you get to the branch.

I honestly don't know what you mean here. When you say "when the branch follows the test", when would the branch ever not follow the test? There wouldn't be much point doing a test and then not having a conditional branch after it.

I wonder if you understood my idea properly, so I'll try explaining it again. The instruction stream is read into a FIFO (which I believe is a fairly normal thing to do) and as soon as a test followed by a branch is read in, it can do the test immediately (which is a very simple operation) and predict the branch based on that. So as long as the register doesn't change by the time the branch instruction comes out of the other end of the FIFO the branch will have been predicted correctly.
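
The idea as described here can be sketched in software. This is a toy model in Python, not a description of how the 68060 or any existing core works. It assumes the FIFO holds one opcode word per instruction (extension words already stripped), handles only `tst` on a data register followed directly by `beq` or `bne`, and takes a hypothetical `pending_writes` set naming registers that an instruction still in the pipeline is about to overwrite. The function name `early_predict` is likewise made up for illustration.

```python
TST_MASK, TST_BASE = 0xFF00, 0x4A00   # TST <ea>: 0100 1010 ss mmm rrr
BCC_MASK, BCC_BASE = 0xF000, 0x6000   # Bcc:      0110 cccc dddddddd
COND_NE, COND_EQ = 0x6, 0x7           # condition fields for bne / beq

def early_predict(fifo, regs, pending_writes=frozenset()):
    """Look for 'tst Dn' directly followed by beq/bne in a prefetch FIFO.

    fifo: list of first opcode words, oldest first (assumed pre-aligned).
    regs: current values of d0-d7.
    Returns (index_of_branch, predicted_taken), or None when no early
    prediction can be made.
    """
    for i in range(len(fifo) - 1):
        tst, bcc = fifo[i], fifo[i + 1]
        if (tst & TST_MASK) != TST_BASE or (bcc & BCC_MASK) != BCC_BASE:
            continue
        size = (tst >> 6) & 0b11      # 00=byte, 01=word, 10=long
        mode = (tst >> 3) & 0b111     # 000 = data register direct
        reg = tst & 0b111
        cond = (bcc >> 8) & 0xF
        if size == 0b11 or mode != 0 or cond not in (COND_NE, COND_EQ):
            continue                  # outside the cases this sketch handles
        if reg in pending_writes:
            return None               # value still in flight, cannot test early
        bits = (8, 16, 32)[size]
        is_zero = (regs[reg] & ((1 << bits) - 1)) == 0
        return i + 1, (is_zero if cond == COND_EQ else not is_zero)
    return None

# move.l d0,d1 / tst.l d0 / beq.s +6 / addq.l #1,d0
fifo = [0x2200, 0x4A80, 0x6706, 0x5280]
print(early_predict(fifo, [0] * 8))   # (2, True)
```

The prediction only holds if nothing writes the tested register between the scan and the branch leaving the FIFO, which is why the sketch gives up when the register appears in `pending_writes`.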

billt · « **Reply #475 on:** January 21, 2013, 08:30:08 PM »

Quote from: Mrs Beanbag;723463

The following instructions wouldn't be in the pipeline yet, at the point you make the prediction, that's the whole point, to avoid having to flush the pipeline when you get to the branch.

What exactly happens when you have to flush? I imagine it being a mux at the opcode register at each stage of the pipeline, and if that stage gets flushed then flip the mux to bring in a NOP rather than the opcode from the previous stage on the next clock edge. Then as this now-a-NOP propagates down the pipeline, whatever other things are on other stage control regs such as register file addresses, ALU input selects, bypass opportunities, etc. just get ignored. I don't think the flush really needs to be particularly time consuming. Yes, those NOPs need to propagate out, but that's really more an observation of new instructions propagating in, and that's going to happen either way.

Or are there better ways of doing this?
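
The mux-to-NOP scheme described above can be modelled in a few lines. This is a behavioural sketch in Python rather than HDL, with a made-up four-stage pipeline and made-up names (`clock`, `flush_upto`); it is not taken from the 68060 or from any FPGA core mentioned in the thread.

```python
NOP = "nop"

def clock(stages, fetched, flush_upto=0):
    """Advance a pipeline by one clock edge.

    stages[0] is the youngest stage (fetch), stages[-1] the oldest.
    The `flush_upto` youngest stages have their opcode mux switched to
    NOP, so a bubble moves forward instead of the wrong-path opcode.
    """
    squashed = [NOP if i < flush_upto else op for i, op in enumerate(stages)]
    return [fetched] + squashed[:-1]

# A mispredicted bcc sits in stage 2; the two younger stages hold
# wrong-path instructions and get squashed on this edge.
pipe = ["wrong2", "wrong1", "bcc", "older"]
pipe = clock(pipe, "target0", flush_upto=2)
print(pipe)   # ['target0', 'nop', 'nop', 'bcc']
pipe = clock(pipe, "target1")
print(pipe)   # ['target1', 'target0', 'nop', 'nop']
```

In this model the flush itself takes no extra cycles. The cost is the two bubbles, which is the same time the target instructions need to travel down to the stage where the branch resolved.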

Quote

I wonder if you understood my idea properly, so I'll try explaining it again. The instruction stream is read into a FIFO (which I believe is a fairly normal thing to do) and as soon as a test followed by a branch is read in, it can do the test immediately (which is a very simple operation) and predict the branch based on that. So as long as the register doesn't change by the time the branch instruction comes out of the other end of the FIFO the branch will have been predicted correctly.

The test may not always be able to be done immediately. Might it not depend on the writeback of an instruction ahead of it but still in the pipeline and not yet finished? You may not yet have the right thing there to test just yet. Such as decrementing a loop counter might be right ahead of the test for 0...

Mrs Beanbag · « **Reply #476 on:** January 21, 2013, 08:48:14 PM »

Quote from: billt;723471

The test may not always be able to be done immediately. Might it not depend on the writeback of an instruction ahead of it but still in the pipeline and not yet finished? You may not yet have the right thing there to test just yet. Such as decrementing a loop counter might be right ahead of the test for 0...

Yes it would be a prediction, the prediction isn't always necessarily right, but as long as it's right more than 50% of the time it will help.

In the case of a loop, even if the decrement is right before the branch, the prediction will be right up until the very last iteration.

It would also be possible, in many cases, for a coder or compiler to optimise for it by re-ordering the instructions.
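
The loop case can be checked with a small simulation. It assumes a countdown loop of the form `subq.l #1,d0` / `tst.l d0` / `bne loop`, and that the early test always sees the value of d0 from before the decrement, which is the worst case raised above. The function name is hypothetical and the model ignores everything else in the pipeline.

```python
def stale_test_predictions(n):
    """Return (predicted_taken, actually_taken) for each pass of a
    countdown loop starting at d0 = n (n >= 1), when the early test
    reads d0 before the decrement has been written back."""
    results = []
    d0 = n
    while True:
        stale = d0                # value seen when tst/bne entered the FIFO
        d0 -= 1                   # subq.l #1,d0 completes afterwards
        predicted = stale != 0    # early guess for bne
        actual = d0 != 0          # what bne really does
        results.append((predicted, actual))
        if not actual:
            break
    return results

outcome = stale_test_predictions(5)
misses = sum(p != a for p, a in outcome)
print(len(outcome), misses)   # 5 1
```

Under these assumptions the only wrong guess is on the final pass, when the stale value is 1 and the real value is 0.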

matthey · « **Reply #477 on:** January 21, 2013, 10:10:56 PM »

Quote from: Mrs Beanbag;723472

Yes it would be a prediction, the prediction isn't always necessarily right, but as long as it's right more than 50% of the time it will help.

Not necessarily. The default BTFN (backward taken, forward not taken) logic is ~65% correct and doesn't slow down loops with mispredictions. The 68060 2-bit saturating counter prediction is good for ~90% prediction accuracy. The x86 can have branch prediction up to 95% accurate and can even predict patterns, but the logic needed is large and the prediction is a little slower which is bad for tight loops (some have other optimizations for tight loops).
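
To make the two schemes concrete, here is a minimal Python sketch of static BTFN and a single 2-bit saturating counter, run over two synthetic traces. The traces and function names are invented for illustration, and the resulting figures say nothing about real 68060 or x86 accuracy; they only show where the two schemes agree and where they differ.

```python
def btfn_accuracy(trace):
    """Static rule: predict taken if and only if the branch is backward.
    trace is a list of (is_backward, taken) pairs."""
    hits = sum(is_backward == taken for is_backward, taken in trace)
    return hits / len(trace)

def two_bit_accuracy(trace, counter=2):
    """One 2-bit saturating counter: 0-1 predict not taken, 2-3 taken.
    The starting state (weakly taken) is an arbitrary choice."""
    hits = 0
    for _, taken in trace:
        hits += (counter >= 2) == taken
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return hits / len(trace)

# Backward loop branch: taken 9 times, then falls through, 10 loops.
loop_trace = ([(True, True)] * 9 + [(True, False)]) * 10
# Forward branch that is usually taken (e.g. skipping a rare error path).
forward_trace = ([(False, True)] * 9 + [(False, False)]) * 10

for name, trace in (("loop", loop_trace), ("forward", forward_trace)):
    print(name, btfn_accuracy(trace), two_bit_accuracy(trace))
```

On the loop trace both schemes miss only the exit of each loop. On the mostly-taken forward branch the static rule is wrong nine times out of ten, while the counter adapts.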

psxphill · « **Reply #478 on:** January 21, 2013, 11:21:14 PM »

Quote from: Mrs Beanbag;723463

and as soon as a test followed by a branch is read in, it can do the test immediately (which is a very simple operation) and predict the branch based on that. So as long as the register doesn't change by the time the branch instruction comes out of the other end of the FIFO the branch will have been predicted correctly.

What you're suggesting will break I/O, which is the major use of TST. You can only perform the read once & you can't do the read until all the registers are correct, or you could be reading from anywhere. You can't change the order of memory accesses, you'll have to wait until any instructions that access memory have been run.

You also can't run the EA Fetch for an instruction after a branch in the pipeline, until you've resolved whether it's going to branch or not. I am assuming the 68060 pipeline length enforces that, I haven't checked it out too carefully.
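
The I/O problem can be illustrated with a toy read-to-clear status register. The `StatusPort` class below is entirely hypothetical (it does not model any real Amiga or Mega Drive register); it only assumes that a read has a side effect and that an earlier write in program order changes what the read should see.

```python
class StatusPort:
    """Hypothetical device register: a command write raises a ready
    flag, and reading the status clears it."""
    def __init__(self):
        self.ready = False

    def write_command(self):
        self.ready = True

    def read(self):
        value, self.ready = self.ready, False   # reading clears the flag
        return value

def in_order(port):
    port.write_command()      # earlier instruction in program order
    return port.read()        # tst on the status then sees the flag

def reordered(port):
    early = port.read()       # tst hoisted ahead of the write
    port.write_command()
    return early              # the branch decides on a stale value

def read_twice(port):
    port.write_command()
    port.read()               # speculative early read consumes the flag
    return port.read()        # architectural read finds it already cleared

print(in_order(StatusPort()), reordered(StatusPort()), read_twice(StatusPort()))
# True False False
```

Either way of doing the read early gives the program a different answer from in-order execution, which is why an early test would have to be restricted to register operands.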

Mrs Beanbag · « **Reply #479 on:** January 22, 2013, 06:28:52 PM »

Quote from: psxphill;723488

What you're suggesting will break I/O, which is the major use of TST. You can only perform the read once & you can't do the read until all the registers are correct, or you could be reading from anywhere.

Good point. I was only thinking of tests on registers.
