
Topic: Motorola 68060 FPGA replacement module (idea)

Offline matthey

  • Hero Member
  • *****
  • Join Date: Aug 2007
  • Posts: 1294
Re: Motorola 68060 FPGA replacement module (idea)
« Reply #404 on: January 18, 2013, 06:32:32 PM »
Quote from: psxphill;723054
Motorola removed some of the instructions added to the 020 and some of the FPU instructions to save space, that could be used for making it run quicker.
 
By only supporting the 060 instructions then you've saved space in the FPGA and the time taken to implement them.

For the most part, the 68060 chose good instructions to remove from hardware. One big exception is the integer 32x32=64 multiply. Compilers were already using it to turn a divide by a constant into a multiply, which saves a huge number of cycles.
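
To make the divide-by-constant trick concrete, here is a minimal sketch in Python (not 68k code, and not taken from any particular compiler). It assumes unsigned 32-bit inputs and the well-known reciprocal constant for dividing by 10; the function name is made up for illustration. The full 64-bit product is exactly what the 32x32=64 multiply provides.

```python
# Illustrative only: divide an unsigned 32-bit value by 10 with a multiply.
# MAGIC_DIV10 is ceil(2**35 / 10); the name is hypothetical.
MAGIC_DIV10 = 0xCCCCCCCD


def div10_by_multiply(n):
    """Return n // 10 for 0 <= n <= 0xFFFFFFFF using multiply + shift."""
    assert 0 <= n <= 0xFFFFFFFF
    product = n * MAGIC_DIV10   # needs the full 64-bit product
    return product >> 35        # i.e. high longword shifted right by 3


if __name__ == "__main__":
    for n in (0, 9, 10, 99, 12345, 0xFFFFFFFF):
        print(n, div10_by_multiply(n), n // 10)
```

Without the 64-bit result the high longword of the product is lost, so the trick needs either the removed instruction or a software replacement for it.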

The .library should go in flash so it's available very early for bootable games.

Quote from: psxphill;723068
page 3-1
 
http://cache.freescale.com/files/32bit/doc/ref_manual/MC68060UM.pdf
 
The first 4 stages are for fetching and assigning the instruction to an integer unit. The next 4 stages are the dual integer unit, then the last two stages are completing the instructions.
 
It's quite a simple design.

I wouldn't say it's simple, although it may be when compared to some modern processor designs (e.g. x86). There is an instruction buffer in between the pipeline stages that is very costly (muxes) on an fpga. There is also a translation from 16 bit variable length CISC to a fixed length 16 bit RISC in there. I don't think Motorola released the encoding format of their internal fixed length RISC, making it difficult to duplicate. There are 6 bytes of data with each 16 bit fixed length RISC word and I don't know if, for example, a MOVEA.W #,A0 immediate is extended when decoding or in the OEP. I believe the instruction becomes pOEP only if there is >6 bytes of data from extension words, but what if there is more than 12 bytes of extension word data (up to 18 bytes is possible)? If you think this is all simple, I volunteer you to do the VHDL programming of the replacement 68060 :P.

Quote from: psxphill;723068
It doesn't evenly distribute instructions between integer pipelines, it only uses the second integer pipeline when the first is running an instruction that can be run at the same time. Whether it can will depend on the instruction as not all can even be run on the second pipeline and the registers involved. If the instruction in the primary pipeline changes a register used in the next instruction then the next instruction also has to be put on the primary pipeline.

Also, in some cases the OEPs are locked together to process an instruction together.

Quote from: psxphill;723068
I don't know if the pipelines will get starved if you're continuously using both integer pipelines for instructions that only take 1 clock cycle to execute. It's not something that you can achieve in real world examples, however as a 32bit value can contain two instructions then it might be possible. There isn't much explained as to how this works though. They do say it's "capable of sustained execution rates of < 1 machine cycle per instruction of the M68000 instruction set". But if it could sustain 2 instructions per machine cycle then I would have thought they would have claimed that.

Long instructions (lots of extension words) are more of a problem than 1 cycle instructions for fetch starvation. The 68060 doesn't have much of a fetch bottleneck with most 68020 code because the instructions are short (the 020/030 has a serious fetch bottleneck). A 68060 fetch bottleneck can be seen in artificial tests. Gunnar made a mini bench test program (on the Natami forum) that used longword immediates continuously, and it did show a substantial slowdown (1/4-1/3 slowdown as I recall). The 68060 needs longword data to be efficient, but fetching it very often can slow it down. Most longword immediates fit in 16 bits, and extending data is low overhead even in fpga (ARM uses a shift, which is high overhead in fpga). This is how MOVEA.W #,An and ADDA.W #,An work already. The same could be done for data registers also, as we found, which would be even more common. Also, adding MVS and MVZ would have helped.
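
A rough sketch of the extension idea, in Python for illustration only. The helper names are hypothetical; `sign_extend16` mirrors what MOVEA.W #,An does with its 16 bit immediate, and `zero_extend16` is the zero-extending counterpart (the MVZ.W style of operation). `fits_in_signed16` checks whether a longword immediate could be carried as a single extension word.

```python
# Illustrative helpers: how a 16 bit immediate becomes a 32 bit value.

def sign_extend16(word):
    """Sign-extend the low 16 bits to 32 bits (MOVEA.W #,An behaviour)."""
    word &= 0xFFFF
    return (word | 0xFFFF0000) if (word & 0x8000) else word


def zero_extend16(word):
    """Zero-extend the low 16 bits to 32 bits."""
    return word & 0xFFFF


def fits_in_signed16(value32):
    """True if the longword could be stored as one sign-extended word."""
    value32 &= 0xFFFFFFFF
    return sign_extend16(value32) == value32


if __name__ == "__main__":
    for v in (0x00000010, 0xFFFFFFF0, 0x00008000, 0x00012345):
        print(hex(v), fits_in_signed16(v))
```

Every immediate that passes the check saves one extension word of fetch bandwidth, which is the saving being argued for here.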

Quote from: psxphill;723068
The branch executing in zero cycles doesn't seem to be very well documented. I can't tell whether they are over-exaggerating what it does or not. My original thought was that the branch is in the primary pipeline and the secondary pipeline has the target or next instruction (depending on what is predicted). This doesn't actually cause it to execute in 0 cycles when looking at the pipeline as a whole, but when looking at the branch on its own it does have a 0 cycle overhead.
 
What is odd is that they claim different for predicted correctly taken and predicted correctly not taken

Different timing for predicted correctly taken and predicted correctly not taken is normal with a pipelined processor. Branches predicted backward with the branch target in the branch cache are effectively 0 cycles for loops which is awesome as loop unrolling is mostly not needed improving code density. Branches that fall through eat a cycle in the pOEP but a sOEP instruction can execute simultaneously if available (also awesome). Note that the branch unit is a separate unit that can do processing in parallel and that the branch target must be in the branch cache to get the 0 cycle branch taken. That means there is usually some additional overhead the first time executing code. I believe the 68060 does some kind of instruction folding/fusing of the branch with CMP/TST/SUBQ in order to make the 0 cycle branches happen. Very few modern processors have effectively free branches. Jens and Gunnar (Natami) didn't even have all the magic figured out. Joe Circello and the 68060 team had this all figured out back in the 90s and the Motorola marketing guys killed it for PPC. Pencil pusher power!

Quote from: psxphill;723068
So it would imply that the branch doesn't hit the execute stage of the pipeline, but then the document goes on to say it does.

I think the branch instruction does go through the pOEP. The branch unit looks at it very early, makes a prediction and starts speculative execution. The pOEP still has to verify that the prediction is correct at execution time or flush the pipe and continue executing the other branch path.

Quote from: Mrs Beanbag;723058
The only 68020 features I ever used are longword multiplies and divides, and scale factors on indexed addressing modes.

No EXTB.L or TST.W/L An? No misaligned reads or writes? The misaligned reads and writes are a huge saver when not sure of the alignment. Compilers often can't guess the alignment so they bloat up the code and slow down the CPU to align the data before reading or writing.

The 68020+ has some other niceties but they are more advanced.

Quote from: ChaosLord;723082
And Branches >128 bytes :angel:

I think you mean Bcc.L and BSR.L. Branches up to 16 bit were supported on the 68000. The longword branches are big savers but only on fairly large programs. Not too many assembler programmers create programs >65k.

Quote from: freqmax;723088
I presume 16-bit branching is the same as that if a certain flag is set then one can conditionally jump 65536 memory positions?

It's signed so plus or minus ~32k.
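
As a quick check of the arithmetic, a small Python sketch (the function name is just for illustration) that computes the reach of a signed displacement of a given width:

```python
# Range of a signed two's complement branch displacement.

def displacement_range(bits):
    """Return (lowest, highest) displacement for a signed field of `bits` bits."""
    half = 1 << (bits - 1)
    return -half, half - 1


if __name__ == "__main__":
    for bits in (8, 16, 32):
        low, high = displacement_range(bits)
        print(f"{bits:2d} bit displacement: {low} .. {high}")
```

So the byte form covers roughly plus or minus 128 bytes, the word form roughly plus or minus 32k, and the 68020+ longword form the whole address space.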

Quote from: freqmax;723088
I have some memory that x86 is limited to 128 position limit on branching? or perhaps it's 6502 ;)
How about ARM?

x86 branches are so screwed up with the early segmentation crap that you really have to define which x86 ISA and then don't ask me. The ARM 32 bit ISA is better but still has some limitations as I recall. I believe it only allows 24 bit addressing, too. It's quite old but the 68k was one of the first to have full 32 bit position independent code done right. An assembly programmer doesn't have to worry about the size with a modern optimizing assembler like vasm. It will automatically generate the most efficient encoding (for more than branching as 68020+ allows) including forward and backward branch optimization. The 68020+ enhancements removed a lot of limitations and can be used or optimized transparently which is great. They should have left out the double memory indirect modes though.
« Last Edit: January 18, 2013, 06:47:33 PM by matthey »
 

Offline psxphill

Re: Motorola 68060 FPGA replacement module (idea)
« Reply #405 on: January 18, 2013, 06:49:42 PM »
Quote from: matthey;723091
There is also a translation from 16 bit variable length CISC to a fixed length 16 bit RISC in there. I don't think Motorola released the encoding format of their internal fixed length RISC making it difficult to duplicate.

Is there any evidence to show they remap the opcodes at all? They might just store each opcode +operands within the fixed width fifo.
 
Maybe the early decode just figures out how long each instruction is and whether the next instruction is valid to go in the secondary pipeline.
 

Offline Mrs Beanbag

  • Sr. Member
  • ****
  • Join Date: Sep 2011
  • Posts: 455
Re: Motorola 68060 FPGA replacement module (idea)
« Reply #406 on: January 18, 2013, 06:51:08 PM »
Quote from: matthey;723091
No EXTB.L or TST.W/L An? No misaligned reads or writes?
Ah, you got me. I do use EXTB.L, on occasion. Although I could easily do without.

I can't honestly say if I use TST.L An or not, off the top of my head. Pretty sure I never do TST.W An though, can't think of much use for that.

I'm actually pretty careful not to do misaligned access, it just seems wrong, somehow. Just because you can, doesn't mean you should!
Signature intentionally left blank
 

Offline freqmax (Topic starter)

  • Hero Member
  • *****
  • Join Date: Mar 2006
  • Posts: 2179
Re: Motorola 68060 FPGA replacement module (idea)
« Reply #407 on: January 18, 2013, 07:29:46 PM »
@matthey, Nice insight! ;)

Do you think it's feasible to create something that can get near a 50 MHz 68060 in FPGA?

I was thinking on Intel 80386 ISA in protected mode (kernel and user) regarding branching. As for 8086 and segments.. yuck ;)

Quote from: psxphill;723092
Is there any evidence to show they remap the opcodes at all? They might just store each opcode +operands within the fixed width fifo.


Perhaps another reverse engineering approach is to figure out from other parts what you need to make your duplicate work.
A functional 68060 duplicate won't have to be designed the same way. It just has to interact with software in the way the original programmer intended.

Quote from: psxphill;723092
Maybe the early decode just figures out how long each instruction is and whether the next instruction is valid to go in the secondary pipeline.


So there is a kind of selection process such that instructions that don't depend on sequent instructions could be done in parallel while the rest is single pipeline?


Btw, Is there any ISA that is neater and more straightforward than m68k? ;)
 

Offline matthey

Re: Motorola 68060 FPGA replacement module (idea)
« Reply #408 on: January 18, 2013, 07:32:49 PM »
Quote from: psxphill;723092
Is there any evidence to show they remap the opcodes at all? They might just store each opcode +operands within the fixed width fifo.


No, but logically something has to happen to 32 bit instructions to fit in 16 bits. It's possible (even logical) that the 2nd word of a 32 bit instruction becomes part of the 6 bytes of "data" per OEP. I don't know if that is possible for all 32 bit instructions though.

Quote from: Mrs Beanbag;723093

I can't honestly say if I use TST.L An or not, off the top of my head. Pretty sure I never do TST.W An though, can't think of much use for that.


You never do:

   movea.l myptr,a0
   tst.l a0
   beq .nullptr

or

   jsr (-$xxx,a6)
   movea.l d0,a0
   tst.l a0
   beq .nullptr

Of course the latter is better sometimes:

   jsr (-$xxx,a6)
   tst.l d0
   movea.l d0,a0
   beq .nullptr

A pipelined 68k processor could reduce the branch overhead on this one, but be careful that a0 is not an input to an EA calculation right after the branch, in which case the first option is better.

Not many have used TST.W An, but be careful: it actually operates on a word and not on a longword as many OPA.W instructions do. Vasm and PhxAss did an optimization to TST.W which was wrong for many years until it was recently found and fixed. Most of the time it would not cause a problem, but it could lead to very rare random crashes.
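
To show why the word and longword tests are not interchangeable on a pointer, here is a small Python model of just the Z flag. The function names and the example address are made up, and the exact optimization the assemblers performed is not spelled out above, so this only shows the underlying difference.

```python
# Model of the Z flag only: TST.L looks at all 32 bits, TST.W at the low 16.

def z_flag_tst_l(reg):
    """Z flag after TST.L on the register value."""
    return (reg & 0xFFFFFFFF) == 0


def z_flag_tst_w(reg):
    """Z flag after TST.W on the register value (low word only)."""
    return (reg & 0xFFFF) == 0


if __name__ == "__main__":
    ptr = 0x00200000  # hypothetical non-null pointer whose low word is zero
    print("TST.L says null:", z_flag_tst_l(ptr))
    print("TST.W says null:", z_flag_tst_w(ptr))
```

A null check done with the word form would wrongly treat any pointer on a 64k boundary as null, which fits the "very rare random crashes" description.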

Quote from: Mrs Beanbag;723093

I'm actually pretty careful not to do misaligned access, it just seems wrong, somehow. Just because you can, doesn't mean you should!


Good. Treat it like credit. Don't use it when you don't need it and don't abuse it when you do need it.
 

Offline Mrs Beanbag

Re: Motorola 68060 FPGA replacement module (idea)
« Reply #409 on: January 18, 2013, 08:04:45 PM »
I do such things like:

move.l myptr(PC),D0
beq .nullptr
move.l D0,A0

in the 2nd example could always use tst.l D0 anyway.

flags are set for free when moving to an address register. Also note the first line, I always write relocatable code.

In other news, I've been thinking about a RISC instruction set for internal use in a 68k core for some time. I think we can identify a few obvious simplifications:
1. treat An and Dn identically (use extra instructions if different behaviour is required)
2. only MOVE can use as either source or destination operand (load/store architecture)
3. all other instructions register-register, or "quick" short-constant source operands
4. spare "temporary" registers for internal use.
we could map 68k instructions to short sequences of internal instructions, and design those instructions to give the shortest sequences.
Signature intentionally left blank
 

Offline psxphill

Re: Motorola 68060 FPGA replacement module (idea)
« Reply #410 on: January 18, 2013, 08:15:51 PM »
Quote from: freqmax;723099
So there is a kind of selection process such that instructions that don't depend on sequent instructions could be done in parallel while the rest is single pipeline?

Yeah, you can't execute in parallel if the first instruction modifies a register that the second uses: for example
 
MOVEQ #0,D0
TST.W D0
 
The secondary pipeline can't execute all instructions either. Floating point instructions can only be dispatched from the primary pipeline for instance.
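
A toy model of the two rules just described, in Python. This is only a sketch under simplifying assumptions: the function and parameter names are invented, register sets are supplied by hand, and it ignores any result forwarding or the other pairing rules in the manual.

```python
# Toy dual-issue check: no register dependency, and no FPU op in the
# secondary pipeline. Not the real 68060 pairing logic.

def can_dual_issue(first_writes, second_reads, second_is_fpu=False):
    """Return True if the second instruction may run beside the first."""
    if second_is_fpu:
        return False  # FPU instructions dispatch from the primary pipeline only
    return set(first_writes).isdisjoint(second_reads)


if __name__ == "__main__":
    # MOVEQ #0,D0 followed by TST.W D0: the second reads what the first writes.
    print(can_dual_issue({"d0"}, {"d0"}))
    # MOVEQ #0,D0 followed by TST.W D1: independent.
    print(can_dual_issue({"d0"}, {"d1"}))
```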
 
From the description it would seem that it checks in the DS stage whether it can be executed in parallel, which implies there is one fifo for both execution units. I'd have thought that would make it trickier than a fifo for each pipeline, but the documentation is what you'd have to go on for a pure clone.
 
The manual is largely vague on the FIFO:
 
"The instruction is pre-decoded for pipeline control information"
 
"The MC68060 variable-length instruction system is internally decoded into a fixed-length representation and channeled into an instruction buffer."


 
There are 96 bytes for the FIFO. Someone claims it's 16 entries of 6 bytes each, but the longest instruction is 10 bytes and there is no way you're going to squeeze a MOVE $10000,$20000 instruction into 6 bytes. It's more likely to be 6 entries of 16 bytes or 4 entries of 24 bytes. I can't find anything that suggests that instructions are split into multiple "micro ops", like Intel does.
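
The arithmetic behind those guesses, as a small Python sketch. The constant and function names are made up; the only inputs are the figures quoted above (96 bytes total, a 2 byte opcode word plus two 4 byte absolute addresses for the MOVE example).

```python
# Candidate FIFO layouts for a 96 byte buffer, plus the size of
# MOVE $10000,$20000 (opcode word + two absolute long addresses).

FIFO_BYTES = 96
OPCODE_BYTES = 2
ABS_LONG_BYTES = 4


def entries(entry_bytes):
    """Return (entry count, leftover bytes) for a given entry size."""
    return divmod(FIFO_BYTES, entry_bytes)


MOVE_ABS_ABS_BYTES = OPCODE_BYTES + 2 * ABS_LONG_BYTES

if __name__ == "__main__":
    print("MOVE abs.l,abs.l length:", MOVE_ABS_ABS_BYTES)
    for size in (6, 16, 24):
        count, leftover = entries(size)
        fits = MOVE_ABS_ABS_BYTES <= size
        print(f"{size:2d} byte entries: {count} entries, whole instruction fits: {fits}")
```

All three sizes divide 96 evenly, so the total alone cannot decide between them; only the 6 byte layout is ruled out if each entry has to hold a complete instruction.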
 
The 68060 cannot execute out of order and doesn't do anything complex like register renaming that Intel did on the Pentium pro. It really is the simplest design for dual issue that you can possibly do.
 
There is no reason why you have to 100% duplicate the functionality exactly. However if there is documentation available then it might make sense to do it the same as they probably spent a while designing it, so it's probably good.
« Last Edit: January 18, 2013, 08:41:23 PM by psxphill »
 

Offline matthey

Re: Motorola 68060 FPGA replacement module (idea)
« Reply #411 on: January 18, 2013, 08:28:09 PM »
Quote from: freqmax;723099

Do you think it's feasible to create something that can get near a 50 MHz 68060 in FPGA?


In an affordable fpga? Yes, but I think some different techniques would be better in an fpga than those used on the 68060. It is probably easier to make a non-superscalar (more 68040 like) CPU that is clocked higher at first. It should be possible to achieve 100MHz+ in a sufficiently pipelined 68020+ processor. A link stack, more code fusing/folding and new instructions could make up for some of the disadvantages of the fpga processor vs the 68060. You will probably not get to 68060@100MHz performance until fpga's get cheaper. A CPU, FPU and MMU in fpga will probably push the logic capacity of affordable fpga's also.

Quote from: freqmax;723099

So there is a kind of selection process such that instructions that don't depend on sequent instructions could be done in parallel while the rest is single pipeline?


The selection process is described in the MC68060UM Section 10 "Instruction Execution Timing".

Quote from: freqmax;723099

Btw, Is there any ISA that is neater and more straightforward than m68k? ;)


Yes. There are simpler ISAs but most are less powerful. Motorola/Freescale have liked the simple clean ISAs favoring RISC since the 68k. The 88k was the 68k RISC replacement before it was abandoned for PPC. It's a simple and clean classic RISC but a little weak compared to the 68k. The 96k DSP is an interesting RISC/CISC hybrid borrowing much from the 68k that is quite powerful and fairly clean but more difficult to use. The ColdFire was an attempt to simplify the 68k but in doing so made it inconsistent (more difficult to program but still relatively easy) and less powerful even though some late enhancements brought some of the power back. The MCORE is a 16 bit fixed length (very simple) RISC that was meant to compete with ARM. It competed in power efficiency but it has to be one of the weakest modern 32 bit processors I have ever seen. It looks straightforward to program but very tedious. Note that the PPC is not a Motorola/Freescale design. It is not very simple for a RISC (but fairly consistent), not easy to program and is very powerful.
 

Offline freqmax (Topic starter)

Re: Motorola 68060 FPGA replacement module (idea)
« Reply #412 on: January 18, 2013, 08:29:51 PM »
I suspect there is no documentation except the usual datasheet ;)

Perhaps someone could interview some of the original engineers?

What's a "Link stack" ..?

Have you looked at the Actel FPGAs? They were way faster than any competitor last time I checked. Of course they are slightly more expensive.

As for ISA, my thinking was whether the ISAs of ARM, Transmeta, PDP-11, MIPS, Sparc, DEC Alpha, PA-RISC, etc. are easier to deal with, without sacrificing performance.
« Last Edit: January 18, 2013, 08:41:02 PM by freqmax »
 

Offline psxphill

Re: Motorola 68060 FPGA replacement module (idea)
« Reply #413 on: January 18, 2013, 09:00:57 PM »
Quote from: freqmax;723104
What's a "Link stack" ..?

It might be what the later coldfire has, which is an on chip stack which allows the target of an rts instruction to be predicted.
 
As well as writing to the a7 stack, the jsr stores the program counter in the on chip stack. When the rts instruction is fetched it assumes the next instruction is the value off the on chip stack. If when it executes the value is different then it flushes the pipeline.
 
At the moment the rts basically blocks the pipeline until it executes, which is why it's such a slow instruction.
 
It's only a four entry stack though and as I don't think it can support re-fetching lower return values, then it's probably not that great apart from simple subroutines being called from a loop.
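
A minimal Python model of the mechanism as described in this post. The class and method names are invented, and the overflow behaviour (oldest entry dropped, no refill) is an assumption made for the sketch, not something taken from the ColdFire documentation.

```python
# Toy return-address predictor: JSR pushes, RTS pops and compares.

class LinkStack:
    def __init__(self, depth=4):
        self.depth = depth
        self.entries = []

    def on_jsr(self, return_address):
        """Record the return address alongside the normal a7 stack write."""
        if len(self.entries) == self.depth:
            self.entries.pop(0)  # assumed: the oldest entry is simply lost
        self.entries.append(return_address)

    def on_rts(self, actual_return_address):
        """Return True if the prediction matched (no pipeline flush)."""
        predicted = self.entries.pop() if self.entries else None
        return predicted == actual_return_address


if __name__ == "__main__":
    stack = LinkStack()
    calls = [0x100, 0x200, 0x300, 0x400, 0x500]  # five nested calls
    for address in calls:
        stack.on_jsr(address)
    print([stack.on_rts(address) for address in reversed(calls)])
```

With five nested calls and four entries, the four innermost returns are predicted and the outermost one is not, which matches the point that it mainly helps shallow subroutines called from a loop.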
 
An 060 MMU in an FPGA isn't going to take a huge amount of space. An FPU on the other hand probably will.
 
In the manual it says it transfers 16 bits for the opcode and 2 x 32 bit operands from the fifo, which is 10 bytes and sounds pretty much like the 68000 instruction set converted to fixed length. So it might be 8 instructions of 12 bytes each, with 2 bytes used for "pipeline control". For whatever this means in the manual: "The instruction is pre-decoded for pipeline control information"
« Last Edit: January 18, 2013, 09:13:12 PM by psxphill »
 

Offline matthey

Re: Motorola 68060 FPGA replacement module (idea)
« Reply #414 on: January 18, 2013, 09:24:22 PM »
Quote from: Mrs Beanbag;723101
I do such things like:

move.l myptr(PC),D0
beq .nullptr
move.l D0,A0

in the 2nd example could always use tst.l D0 anyway.


68000 style ;). Needs a scratch data register but it's usually available. TST.L An is not needed here but there are some places it's useful.

Quote from: Mrs Beanbag;723101

flags are set for free when moving to an address register. Also note the first line, I always write relocatable code.


Mind swap on the address register :). I let the assembler do the PC relative and the MOVE.L ,An -> MOVEA.L ,An even though I didn't for clarity in my examples.

Quote from: Mrs Beanbag;723101

In other news, I've been thinking about a RISC instruction set for internal use in a 68k core for some time. I think we can identify a few obvious simplifications:
1. treat An and Dn identically (use extra instructions if different behaviour is required)


That's nice for simplification but not good for code density. Are you looking at a fixed 16 bit or 32 bit RISC encoding?

Quote from: Mrs Beanbag;723101

2. only MOVE can use as either source or destination operand (load/store architecture)


Ok, but now you have to divide up CISC instructions into multiple RISC instructions. Your instruction stream just grew big time.

Quote from: Mrs Beanbag;723101

3. all other instructions register-register, or "quick" short-constant source operands

4. spare "temporary" registers for internal use.
we could map 68k instructions to short sequences of internal instructions, and design those instructions to give the shortest sequences.


http://en.wikipedia.org/wiki/Microcode

I have heard a rumor that as much as 1/3 of the 68060 is microcode. It's generally slower though. The 68060 bit field instructions are a good example. They can be done in 1-3 cycles (data in cache) on an fpga but they take 2x-3x that long on the 68060.


Quote from: psxphill;723102
Yeah, you can't execute in parallel if the first instruction modifies a register that the second uses: for example
 
MOVEQ #0,D0
TST.W D0


Actually, this may work in parallel. Some very simple instructions are retired early and the longword (only) result made available early. This is not specifically stated but the result is made available early from these types of instructions for change/use stalls and are probably also available early for the other OEP although it's not specifically stated. These early retirement instructions include:

   lea
   move.l #,Rn
   moveq
   clr.l Dn
 

Offline Mrs Beanbag

Re: Motorola 68060 FPGA replacement module (idea)
« Reply #415 on: January 18, 2013, 09:40:31 PM »
Quote from: matthey;723118
Mind swap on the address register :).
Oops I meant data register.

Quote
That's nice for simplification but not good for code density. Are you looking at a fixed 16 bit or 32 bit RISC encoding?
Code density doesn't matter here as it would only be used internally, with external 68k code translated into internal code in some kind of buffer. Fixed length, but the number of bits could be anything; it's not actually stored in the RAM so it doesn't even need to be 16 or 32.

Quote
I have heard a rumor that as much as 1/3 of the 68060 is microcode. It's generally slower though. The 68060 bit field instructions are a good example. They can be done in 1-3 cycles (data in cache) on an fpga but they take 2x-3x that long on the 68060.
I would rather optimise for 68000 instructions and provide the rest just for compatibility. How common are the bitfield instructions in real code? I never use them.

Of course see what fits on an FPGA first and maybe we can add more bits in later.
Signature intentionally left blank
 

Offline freqmax (Topic starter)

Re: Motorola 68060 FPGA replacement module (idea)
« Reply #416 on: January 18, 2013, 10:12:34 PM »
Actually a default model could be to provide just a few instructions and have the rest as trapped instructions. That means one has something workable fast. Then one could make the architecture correct. And then add the full instruction set.

If one starts with the instructions and then tries to impose the correct architecture.. well it could be messy ;)
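
The staged approach can be pictured with a toy interpreter in Python. Everything here is hypothetical (the instruction names, the register dictionary, the `run` function); it only shows the split between a small "hardwired" set and everything else going through a slower trap handler.

```python
# Toy trap-and-emulate dispatch: a few ops are "in hardware", the rest trap.

def moveq(regs, dst, imm):
    regs[dst] = imm & 0xFFFFFFFF


def add(regs, dst, src):
    regs[dst] = (regs[dst] + regs[src]) & 0xFFFFFFFF


def mulu_emulated(regs, dst, src):
    # Stand-in for a software routine reached through an exception vector.
    regs[dst] = (regs[dst] * regs[src]) & 0xFFFFFFFF


def run(program, hardwired, emulated):
    """Execute (op, args) pairs; return (registers, number of traps taken)."""
    regs, traps = {}, 0
    for op, args in program:
        if op in hardwired:
            hardwired[op](regs, *args)
        else:
            traps += 1  # unimplemented instruction: take the slow path
            emulated[op](regs, *args)
    return regs, traps


if __name__ == "__main__":
    program = [("moveq", ("d0", 6)), ("moveq", ("d1", 7)), ("mulu", ("d0", "d1"))]
    print(run(program, {"moveq": moveq, "add": add}, {"mulu": mulu_emulated}))
```

Moving an instruction from the emulated table to the hardwired one changes the trap count but not the result, which is what makes the incremental bring-up workable.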
 

Offline billt

  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 910
    • http://www.billtoner.net
Re: Motorola 68060 FPGA replacement module (idea)
« Reply #417 on: January 18, 2013, 10:19:41 PM »
Quote from: freqmax;723130
Actually a default model could be to provide just a few instructions and have the rest as trapped instructions. That means one has something workable fast. Then one could make the architecture correct. And then add the full instruction set.

If one starts with the instructions and then tries to impose the correct architecture.. well it could be messy ;)


That's a big part of why "they" moved away from hardwired control units in favor of microcoded control units. My own education thus far was about hardwired style, which is very dependent on the instruction set. I was hoping to take the advanced followup course now, but it wasn't on the schedule. I'm trying to go through the Coursera one now, which is pretty advanced. Not sure if they explain microcoding or if that assumes you already know it. Going to try and find some time to read up on it more regardless.
Bill T
All Glory to the Hypnotoad!
 

Offline matthey

Re: Motorola 68060 FPGA replacement module (idea)
« Reply #418 on: January 18, 2013, 10:21:28 PM »
Quote from: freqmax;723104

What's a "Link stack" ..?


psxphill got it although the link stack can be different sizes. It should make RTS 2 cycles instead of 7 cycles on the 68060.

Quote from: freqmax;723104

Have you looked at the Actel FPGAs? They were way faster than any competitor last time I checked. Of course they are slightly more expensive.


No. I have only heard. I haven't played around with any fpga's although I have looked at some VHDL code for a 68k CPU or 2 :).

Quote from: freqmax;723104

As for ISA, my thinking was whether the ISAs of ARM, Transmeta, PDP-11, MIPS, Sparc, DEC Alpha, PA-RISC, etc. are easier to deal with, without sacrificing performance.


I don't think that any RISC processors are going to be as easy as the 68k. ARM probably comes the closest and MIPS is also logical and usable in assembler from my limited exposure. They both look way easier than PPC despite PPC having as many instructions as many CISC processors. PPC is as bad about using acronyms as the U.S. military.

The PDP-11 should have been very easy to program, possibly easier than the 68k. The performance would be limited by the encodings but it would be interesting to see someone try to implement a modern version in fpga. The instructions are powerful but would probably require a lot of microcode above a RISC core. It's too bad that students will probably not be able to see how easy to program a processor can be. Even the 68k is all but dead.

Quote from: Mrs Beanbag;723123

I would rather optimise for 68000 instructions and provide the rest just for compatibility. How common are the bitfield instructions in real code? I never use them.


It varies. Most old code doesn't use them much but GCC started using them heavily from about GCC 3.x on, even when the timing for them was slower. It's often faster not to use them on the 68060 because it can do a shift and an AND in the same cycle. They are often worthwhile on the 68020-68040 and are good for code density and fairly intuitive. They have 32 bit results which is good for 32 bit register forwarding and make efficient use of registers. They are very useful for processing streams of data in memory (with caches) which the register memory architecture of the 68k can do well. The only drawback is a little bit more complexity than the average instruction. If they were fast, they would be used a lot more. Implementing them would help the performance of GCC where trapping them would slow these newer GCC compiled programs to a crawl. You get faster, smaller programs with and slower, bigger programs without. It's a not so tough choice for me.
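
For comparison, a Python sketch of an unsigned bit field extract from a 32 bit register next to the shift-and-AND sequence that replaces it. The function names are made up. It assumes the 68k convention that offset 0 is the most significant bit and that a register field wraps around; the shift-and-mask version only covers fields that do not wrap.

```python
# Unsigned bit field extract (BFEXTU-style, register operand) versus the
# equivalent shift + AND for a field that stays inside the register.

def bfextu(value, offset, width):
    """Extract `width` bits starting `offset` bits below the MSB, with wrap."""
    assert 0 <= offset <= 31 and 1 <= width <= 32
    value &= 0xFFFFFFFF
    rotated = ((value << offset) | (value >> (32 - offset))) & 0xFFFFFFFF
    return rotated >> (32 - width)


def shift_and_mask(value, offset, width):
    """Same field via LSR + AND; valid only when offset + width <= 32."""
    assert offset + width <= 32
    return ((value & 0xFFFFFFFF) >> (32 - offset - width)) & ((1 << width) - 1)


if __name__ == "__main__":
    word = 0x12345678
    print(hex(bfextu(word, 8, 8)), hex(shift_and_mask(word, 8, 8)))
```

The single bit field instruction replaces two simple ones, which is the code density argument; whether it is also faster depends on the implementation, as noted for the 68060.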
 

Offline psxphill

Re: Motorola 68060 FPGA replacement module (idea)
« Reply #419 on: January 18, 2013, 11:26:40 PM »
Quote from: matthey;723118
Actually, this may work in parallel. Some very simple instructions are retired early and the longword (only) result made available early. This is not specifically stated but the result is made available early from these types of instructions for change/use stalls and are probably also available early for the other OEP although it's not specifically stated. These early retirement instructions include:
 
lea
move.l #,Rn
moveq
clr.l Dn

The list is unclear, I'm assuming the first 4 are the primary and the last 2 are secondary.
 
"Certain instructions have been optimized to ensure no change/use stall occurs on subsequent instructions. The destination register of the following instructions is available for subsequent instructions:
lea
mov.l &imm,Rn
movq
clr.l Dn
any op (An)+
any op –(An)
as a base register for address calculation with no stall, or as an index register for address calculation with no stall, if Xi.l*{1,4}. If the index register used is Xi.l*2, Xi.l*8, or Xi.w, then the previously described 3 cycle stall occurs."

It doesn't have to retire it early, the second pipeline could look in the primary pipeline. MIPS has a similar handling for lwl/lwr opcodes, it pulls the register value from the pipeline and stops the register being updated at all. The register doesn't actually get updated until you stop executing lwl/lwr opcodes.
 
This one is also vague:
 
"The MC68060 provides another change/use optimization for a commonly encountered
construct—when an address register is loaded from memory and then used in an operand
address calculation, the OEP experiences a one cycle stall.
 
mov.l,An
"
 
I guess they both enter the pipelines at the same time, the primary goes through ea fetch and then on the next clock the secondary goes through ea fetch. I'm assuming that the ea on the second register is literally ea and not adjusted by an immediate or register. It can't advance the pipelines, it must change the state it's in. The primary pipeline might be translated into a move.l immediate once the value is available, to make the short circuiting common.
 
Quote from: freqmax;723130
If one starts with the instructions and then tries to impose the correct architecture.. well it could be messy ;)

With any processor emulation, it's always worth starting small and bringing it up an instruction at a time until you know you're on the right track.
 
« Last Edit: January 19, 2013, 12:15:06 AM by psxphill »