
Author Topic: Full 68060 implementation?  (Read 8411 times)


Offline matthey

  • Hero Member
  • *****
  • Join Date: Aug 2007
  • Posts: 1294
Re: Full 68060 implementation?
« Reply #14 on: December 13, 2012, 06:25:27 PM »
Quote from: billt;718885

Though I myself would not try to do an exact 68060 clone. The 060 has some instructions removed that were present in 68040 for example. If I were to make an FPGA 680x0, I'd put them all back in, and avoid trapping to software emulation.


It's not necessary to put all the trapped instructions back in. It made sense to get rid of CAS2, CHK2, CMP2 and MOVEP. Getting rid of the integer 64 bit result MULx was a mistake as it's commonly used by compilers to do an invert and multiply by a constant instead of a divide by a constant. It wouldn't have been so bad if they had at least defined and allowed a MULx where Sz=1 (64 bit result) and Dl (result register low) = Dh (result register high), giving the upper 32 bits of the result (like PPC MULH). This is worthwhile to define even with 64 bit results as it saves trashing a register when the lower 32 bits are not needed. See MULS and MULU here:

http://www.heywheel.com/matthey/Amiga/68kF_PRM.pdf
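
To illustrate the invert and multiply trick mentioned above, here is a minimal Python sketch. It assumes unsigned 32 bit operands and the commonly used magic constant 0xCCCCCCCD for division by 10. `mulu_high` is a hypothetical helper (not from the 68kF document) that models an instruction returning only the upper 32 bits of the 64 bit product, which is what a MULH-style MULU.L would provide.

```python
MASK32 = 0xFFFFFFFF

def mulu_high(a, b):
    """Upper 32 bits of an unsigned 32x32 -> 64 bit multiply (hypothetical MULH-style helper)."""
    return ((a & MASK32) * (b & MASK32)) >> 32

def div10(x):
    """Unsigned 32 bit divide by 10 without a divide instruction.

    0xCCCCCCCD is ceil(2**35 / 10), so x // 10 == (x * 0xCCCCCCCD) >> 35.
    Only the high longword of the product is needed, followed by a shift by 3.
    """
    return mulu_high(x, 0xCCCCCCCD) >> 3

if __name__ == "__main__":
    # Compare against Python's own integer division for a few values.
    for x in (0, 9, 10, 99, 1234567, 0xFFFFFFFF):
        print(x, div10(x), x // 10)
```

The low longword of the product is never used here, which is why a form that returns only the high half would save a register.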

The integer 64 bit result DIVx is used much less commonly but it is much slower to do in software using the shift method. MOVEP is used in older Amiga software (mostly games) but it is poorly encoded, has limited usefulness and most patched games have already had it removed. CAS2 and CHK2 are uncommon supervisor instructions. CMP2 is user mode but has limited usefulness as designed and it's very rare. WHDLoad mentions only 1 known game and 1 demo as I recall. It would be better to install the 68060.library from flash or kickstart so that traps are never a problem. Some instructions and addressing modes are better trapped, and simplifying gives gains elsewhere. Notice that I removed (for trapping) the double indirect addressing modes that used the outer displacement in the 68kF ISA pdf above as there was little advantage (only useful when no free registers) and it simplifies the decoder. If all remaining full extension word format addressing modes could be 1 cycle faster then it would be worth it.
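
For readers wondering what "the shift method" looks like, here is a rough Python sketch of a restoring shift and subtract division of a 64 bit dividend by a 32 bit divisor. It is not the 68060.library code; the function name and the use of exceptions are assumptions made for illustration (the real instruction signals overflow through the V flag).

```python
def divu_64_by_32(dividend, divisor):
    """Shift-and-subtract division: 64 bit dividend / 32 bit divisor.

    Returns (quotient, remainder), both fitting in 32 bits.
    Raising exceptions for the error cases is a choice of this sketch.
    """
    if divisor == 0:
        raise ZeroDivisionError("divide by zero")
    if (dividend >> 32) >= divisor:
        raise OverflowError("quotient does not fit in 32 bits")
    rem = dividend >> 32             # high longword starts as the partial remainder
    low = dividend & 0xFFFFFFFF      # low longword is shifted in bit by bit
    quo = 0
    for bit in range(31, -1, -1):    # one iteration per quotient bit
        rem = (rem << 1) | ((low >> bit) & 1)
        quo <<= 1
        if rem >= divisor:           # subtract when the divisor fits
            rem -= divisor
            quo |= 1
    return quo, rem
```

The loop runs 32 times with a shift, a compare and a conditional subtract each time, which gives a feel for why the software version is so much slower than a hardware divide.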

The SWAP instruction in the 68060 should have worked in both integer units; leaving it out was an oversight. Its result is a longword for forwarding and the instruction is common. The bitfield instructions should have been made faster, which is possible, and they have 32 bit results for forwarding. Of course bigger caches, a link stack and instruction combining like the ColdFire has would be done at minimum if modernizing the 68060.

Quote from: billt;718885

I just did a very small microprocessor design for a Computer Architecture class that finished last night. Very small as in a total of 6 instructions, including load, store, and two kinds of branching, leaving only two ALU operations, add and subtract of BCD numbers. 8bit instruction and 16bit memory address bus, and four registers. But it was nice to learn how this stuff works and the fundamentals of how to approach designing a processor. I actually wrote some assembly language code for what I'd want such a terribly simple thing to do before doing the logic design. You break down the instructions into stages (not pipeline stages, but individual steps taken to complete a single instruction at a time), and then you can extract your hardware logic design based on that. I found it very interesting, and I was surprised at how complicated it was NOT, even for my uselessly simple thing. Sure, a more complete and useful design like a 680x0 will be bigger and more complex than this, but it's not absurdly complicated as I would have imagined previously. That said, it would still be a great deal of work. Could be fun for someone with the time.


Sounds like fun :). It's not rocket science and it's very logical, but I imagine it gets complex and time consuming when doing more advanced design and coding.

Quote from: billt;718885

TG and Yaqube have taken some steps in improving the TG68 sort of in that direction. I thought I'd seen something about the Suska guy taking his 68000 core up to 68020 or 030 but I didn't see anything available last I checked. I think the aoocs guy has a 68000 core as well, not sure what his plans for the future of that are. There's also the closed-source Natami CPU, but I'm not sure what's happening with that anymore. But they are things you can look at for inspiration.


68020+ support should be the minimum for the Amiga IMO. It makes programming much easier than the 68000 while offering significantly better speed and code density. The Natami CPU (N68050) is not finished and only partially supported the 68020 last I understood, but it is fairly advanced as far as cache and pipeline design. I don't think Jens is working on it much, if at all, anymore. He has talked about making it open source though. Gunnar is working on a soft CPU based on it but he is not very reliable. He claims to be experimenting with a 200MHz softcore although he increased the pipeline length significantly in order to do it. This increases branch penalties and can cause other stalls much like a highly clocked DSP, GPU or x86 CPU. Even if I could believe him, it's experimental at best and Gunnar has a history of not completing much.
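
The trade-off between clock speed and pipeline length can be sketched with a very simple model. Everything here is an assumption for illustration: the function name, the base CPI of 1 and all of the numbers are made up, not measurements of any real core.

```python
def effective_mips(clock_mhz, branch_fraction, mispredict_rate,
                   penalty_cycles, base_cpi=1.0):
    """Rough throughput estimate in millions of instructions per second.

    Average cycles per instruction = base CPI plus the extra cycles lost
    to mispredicted branches. A longer pipeline means a larger penalty.
    """
    cpi = base_cpi + branch_fraction * mispredict_rate * penalty_cycles
    return clock_mhz / cpi

if __name__ == "__main__":
    # Illustrative values only: 20% branches, 15% of them mispredicted.
    short_pipe = effective_mips(100, 0.20, 0.15, penalty_cycles=3)
    long_pipe = effective_mips(200, 0.20, 0.15, penalty_cycles=12)
    print(short_pipe, long_pipe)
```

In a model like this the higher clock still wins, but by less than the clock ratio suggests, and other stalls from the longer pipeline are not counted at all.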
 

Offline ChaosLord

  • Hero Member
  • *****
  • Join Date: Nov 2003
  • Posts: 2608
    • http://totalchaoseng.dbv.pl/news.php
Re: Full 68060 implementation?
« Reply #15 on: December 13, 2012, 06:27:20 PM »
I hereby appoint billt to dezine us a superfast 680x0 cpu.  We can call it the 68070.

You are hired. :D
Wanna try a wonderful strategy game with lots of handdrawn anims,
Magic Spells and Monsters, Incredible playability and lastability,
English speech, etc. Total Chaos AGA
 

Offline ChaosLord

  • Hero Member
  • *****
  • Join Date: Nov 2003
  • Posts: 2608
    • http://totalchaoseng.dbv.pl/news.php
Re: Full 68060 implementation?
« Reply #16 on: December 13, 2012, 06:51:43 PM »
Quote from: matthey;718899
Getting rid of the integer 64 bit result MULx was a mistake as it's commonly used by compilers to do an invert and multiply by a constant instead of a divide by a constant. It wouldn't have been so bad if they had at least defined and allowed a MULx where Sz=1 (64 bit result) and Dl (result register low) = Dh (result register high), giving the upper 32 bits of the result (like PPC MULH). This is worthwhile to define even with 64 bit results as it saves trashing a register when the lower 32 bits are not needed. See MULS and MULU here:

http://www.heywheel.com/matthey/Amiga/68kF_PRM.pdf


Philosophically I agree with u.

But from an engineering standpoint I simply don't know how to implement it.
If it can't be done as fast as the other multiplies then u hafta stop the whole pipeline from moving while u spend 2-4 cycles to complete the instruction.

Ok now that I think about it there MUST be a mechanism for stopping the pipeline because all those weirdo addressing modes in multiword instructions take multiple cycles to complete.

So ok, I agree it was a total mistake to take that out.

Maybe the problem was that its bitpattern overlapped the bitpattern of a normal multiply?  I donno.  But if that is what messed them up they could have invented a NEW completely different bitpattern for large multiplies.



Quote

It would be better to install the 68060.library from flash or kickstart so that traps are never a problem.

+2


Quote

 Some instructions and addressing modes are better trapped and simplifying gives gains elsewhere. Notice that I removed (for trapping) the double indirect addressing modes that used the outer displacement in the 68kF ISA pdf above as there was little advantage (only useful when no free registers) and simplifies the decoder. If all remaining full extension word format addressing modes could be 1 cycle faster then it would be worth it.

I used to tell Gunnar all the time that it's ok to trap those weirdo hypercomplicated addressing modes.  But he said they worked out an optimized way for the address unit to handle it without needing to trap.

Quote

The SWAP instruction in the 68060 should have worked in both integer units which was an oversight.

I wonder if they had a reason?



Quote

68020+ support should be the minimum for the Amiga IMO. It makes programming much easier than the 68000 while offering significantly better speed and code density.

+1

Quote

 The Natami CPU (N68050) is not finished and was only partially supporting the 68020 last I understood but it is fairly advanced as far as cache and pipeline design. I don't think Jens is working on it much if any anymore.

:(((((


Quote

 He has talked about making it open source though.

Dear God I hope so.

They cooked up really kewl trix that can make a really fast 680x0 cpu!

Making a 680x0 softcore is all about how many clever tricks you can come up with and combine them all together to make something fast.

Without clever optimizations u won't get 68060 speeds using affordable FPGA chips.




Quote

 Gunnar is working on a soft CPU based on it but he is not very reliable. He claims to be experimenting with a 200MHz softcore although he increased the pipeline length significantly in order to do it. This increases branch penalties and can cause other stalls much like a highly clocked DSP, GPU or x86 CPU. Even if I could believe him, it's experimental at best and Gunnar has a history of not completing much.

If someone would give him some medication to make him calm down and just write code without attacking everybody all the time then he could write good code and finish everything.  He could start by trying Lorazepam.  If that doesn't work for him then he could try something else.

p.s.  If someone would clone me then my other me would very happily spend 20 hours a day for 2 years to cook up an awesome 68070.  With Jens, Matt, Phil and some other asm guys helping I know we could do it.   Sadly I am only 1 person and I just can't devote that kind of time to this project. :(

Once I start trying to solve a puzzle I get obsessed and can't stop.  So I must stay away from puzzles that I know are large and complex since I have other responsibilities in my life that I must tend to. :(
 

Offline alexh

  • Hero Member
  • *****
  • Join Date: Apr 2005
  • Posts: 3644
    • http://thalion.atari.org
Re: Full 68060 implementation?
« Reply #17 on: December 13, 2012, 07:08:03 PM »
IMO, today you'd never get an FPGA which is big enough and fast enough and cheap enough to rival a real MC68060RC50 chip.

You'd either have to do an FPGA-Arcade philosophy (System on a chip, full system recreations) where in the long term you could charge much more.

Or wait 5-10 years for technology to catch up.

That doesn't mean that as a community you couldn't start now: converting the specifications into HDL for logic synthesis, working on the cache, MMU, FPU, branch prediction and superscalar architecture, and using the free tools to target some of today's FPGAs to see what frequencies you can reach and what area you would need in the future.

I mean you can already get fully open source HDL for 68000, 8088, 8051, 6502, 65816, Z80, ARM, MIPS and countless other CPUs.

But I thought that is what the NatAmi team SAID they were doing with their 68050 design?? I take it they never released anything? Not even design specifications?
« Last Edit: December 13, 2012, 07:12:20 PM by alexh »
 

Offline ChaosLord

  • Hero Member
  • *****
  • Join Date: Nov 2003
  • Posts: 2608
    • http://totalchaoseng.dbv.pl/news.php
Re: Full 68060 implementation?
« Reply #18 on: December 13, 2012, 07:17:46 PM »
Quote from: mikej;718876
Yes, but they are older revisions and cannot be clocked up - and have bugs.


What are these bugs that everyone is always speaking of?

Is there a list somewhere?

Are they serious?  Or mainly just technical?  Or ?
 

Offline ChaosLord

  • Hero Member
  • *****
  • Join Date: Nov 2003
  • Posts: 2608
    • http://totalchaoseng.dbv.pl/news.php
Re: Full 68060 implementation?
« Reply #19 on: December 13, 2012, 07:26:52 PM »
Quote from: alexh;718906

But I thought that is what the NatAmi team SAID they were doing with their 68050 design?? I take it they never released anything? Not even design specifications?


They were working on it.  They did a lot of stuff using simulation software and ran lots of tests and made progress.

But it is true that all the code is topsecret and never got released.

After Gunnar blew up, everything just stopped AFAIK.

If there is a way for Jens to test his 68050 on an FPGAreplay then I think we could motivate him to resume work on it.

And if we could get him to team up with someone who is emotionally stable and helpful then that would speed things up too.

Is there a way that Jens can test the 050 on the FPGAreplay?

I hope so.  But the FPGA chip is so small that I don't know for a fact it would fit. :(
 
In any event some of the tricks they were doing won't work on the FPGAreplay so it WILL be slower than it was planned for Natami.  FPGAreplay does not have all the SRAM blocks and stuff that Natami FPGA has.
 

Offline Blinx123

  • Sr. Member
  • ****
  • Join Date: Jul 2006
  • Posts: 383
Re: Full 68060 implementation?
« Reply #20 on: December 13, 2012, 07:29:36 PM »
Is there any particular reason for people still calling the Natami CPU a 68050?
Last I read, it was called a 68070.
Sam: "You crack me up little buddy"
Max: "I love you Sam"
 

Offline billt

  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 910
    • http://www.billtoner.net
Re: Full 68060 implementation?
« Reply #21 on: December 13, 2012, 08:04:41 PM »
Quote from: ChaosLord;718900
I hereby appoint billt to dezine us a superfast 680x0 cpu.  We can call it the 68070.

You are hired. :D


Thank you for your faith in my ability, but there's more to learn before I'd take it up. There's a followup course that I'd like to do, but it's not on the schedule this spring like I had expected, so I don't know when that might happen. I'd also like to take the followup class to the FPGA/VHDL class I took last spring. And I lack time for something like this right now anyway. I took the class because I'm interested in tinkering with TG68 and things like that, and wanted to understand this stuff. But I always struggle to find any time for any projects, and I have some other project ideas that are a lot higher on my priority list.
Bill T
All Glory to the Hypnotoad!
 

Offline matthey

  • Hero Member
  • *****
  • Join Date: Aug 2007
  • Posts: 1294
Re: Full 68060 implementation?
« Reply #22 on: December 13, 2012, 08:13:47 PM »
Quote from: ChaosLord;718904

But from an engineering standpoint I simply don't know how to implement it.
If it can't be done as fast as the other multiplies then u hafta stop the whole pipeline from moving while u spend 2-4 cycles to complete the instruction.

Ok now that I think about it there MUST be a mechanism for stopping the pipeline because all those weirdo addressing modes in multiword instructions take multiple cycles to complete.


Motorola said they removed the 64 bit integer MULx and DIVx to simplify the pipeline while Gunnar said they were no problem to implement. I think they do add complexity to the pipeline which can slow the CPU and this was seen as not being worthwhile for the benefit. The engineers likely did not realize the benefit or did evaluations before compilers started using the invert and multiply for divide trick and before 64 bit operations and CPUs became more common. I am not a hardware design guy that could dig out the truth so this is my best guess.

Quote from: ChaosLord;718904

Maybe the problem was that its bitpattern overlapped the bitpattern of a normal multiply?  I donno.  But if that is what messed them up they could have invented a NEW completely different bitpattern for large multiplies.


I don't think it would have been a problem to use this existing undefined bit pattern but a completely new encoding of MULH would have been acceptable also.
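
To make the bit pattern discussion concrete, here is a small Python sketch that decodes the extension word of the long MULS/MULU forms and flags the Sz=1, Dl=Dh case. The field positions follow the 68020+ encoding as I understand it from the programmer's reference manual (Dl in bits 14-12, signed in bit 11, Sz in bit 10, Dh in bits 2-0); the function names are made up for this example and treating Dl=Dh as a "high only" form is the proposal discussed above, not existing 68k behavior.

```python
def decode_mul_long_ext(ext):
    """Split the MULS.L / MULU.L extension word into its fields."""
    return {
        "dl": (ext >> 12) & 7,          # bits 14-12: low result register Dl
        "signed": bool(ext & 0x0800),   # bit 11: 1 = MULS, 0 = MULU
        "quad": bool(ext & 0x0400),     # bit 10: Sz, 1 = 64 bit result in Dh:Dl
        "dh": ext & 7,                  # bits 2-0: high result register Dh
    }

def is_high_only_form(ext):
    """True for the proposed encoding: 64 bit result requested with Dl == Dh."""
    fields = decode_mul_long_ext(ext)
    return bool(fields["quad"] and fields["dl"] == fields["dh"])

if __name__ == "__main__":
    print(decode_mul_long_ext(0x0401))   # MULU.L <ea>,D1:D0
    print(is_high_only_form(0x2C02))     # MULS.L with Dh = Dl = D2
```

Since the decoder already has to extract both register fields, spotting Dl == Dh is a single 3 bit compare.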

Quote from: ChaosLord;718904

I used to tell Gunnar all the time that it's ok to trap those weirdo hypercomplicated addressing modes.  But he said they worked out an optimized way for the address unit to handle it without needing to trap.


Maybe he had an idea at one point but he did not have the double indirect modes implemented in the Apollo. Then again, he changes his experimental designs quite often so maybe it would work one day and not the next.

Quote from: Matt Hey

The SWAP instruction in the 68060 should have worked in both integer units which was an oversight.


Quote from: ChaosLord;718904

I wonder if they had a reason?


There were several instructions that were left out of the 2nd integer unit (sOEP), probably to save space on less commonly used instructions, but this one is common and very simple. It wouldn't have been as bad if the immediate shift worked with counts greater than 8, but as it is, shift is 2x as fast as swap in the 68060. A modern 68060 would probably have a more balanced sOEP as CPU size is not as important now. Some instructions would still be pOEP only as they are uncommon or don't make sense to execute in parallel.
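
As a side note on the shift count limit: SWAP on a data register is the same as rotating the longword by 16, but the immediate count of the 68k shift and rotate instructions only covers 1 to 8, so replacing SWAP needs two immediate rotates (or a count held in a register). A small Python model of this, with helper names made up for the example:

```python
MASK32 = 0xFFFFFFFF

def swap(x):
    """Model of SWAP Dn: exchange the upper and lower 16 bit halves."""
    x &= MASK32
    return ((x << 16) | (x >> 16)) & MASK32

def rol_imm(x, count):
    """Model of ROL.L #count,Dn; the immediate count is limited to 1..8."""
    if not 1 <= count <= 8:
        raise ValueError("immediate count must be 1..8")
    x &= MASK32
    return ((x << count) | (x >> (32 - count))) & MASK32

if __name__ == "__main__":
    value = 0x12345678
    print(hex(swap(value)))                          # 0x56781234
    print(hex(rol_imm(rol_imm(value, 8), 8)))        # same result via two rotates
```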

Quote from: ChaosLord;718904

Without clever optimizations u won't get 68060 speeds using affordable FPGA chips.


True.

Quote from: ChaosLord;718907
What are these bugs that everyone is always speaking of?

Is there a list somewhere?

Are they serious?  Or mainly just technical?  Or ?


http://cache.freescale.com/files/32bit/doc/errata/MC68060DE.pdf

The most serious bugs can be worked around but need the 68060.library.
 

Offline matthey

  • Hero Member
  • *****
  • Join Date: Aug 2007
  • Posts: 1294
Re: Full 68060 implementation?
« Reply #23 on: December 13, 2012, 08:28:25 PM »
Quote from: Blinx123;718910
Is there any particular reason for people still calling the Natami CPU a 68050?
Last I read, it was called a 68070.


Jens coded the N68050, which was far enough along that he was trying to adapt it to work in the Natami fpga up to several months ago. The N68070 was Gunnar's design (based on the N68050 and possibly with Jens's help) of a superscalar (2 integer units) CPU which became the Apollo CPU when he split from the Natami.
 

Offline ChaosLord

  • Hero Member
  • *****
  • Join Date: Nov 2003
  • Posts: 2608
    • http://totalchaoseng.dbv.pl/news.php
Re: Full 68060 implementation?
« Reply #24 on: December 13, 2012, 08:30:18 PM »
Quote from: matthey;718929

Maybe he had an idea at one point but he did not have the double indirect modes implemented in the Apollo.

I wish u wouldn't call it Apollo.  Wayyyyy too confusing.

We should agree to call it 070.  Or N68070 to be precise.

Quote

 Then again, he changes his experimental designs quite often so maybe it would work one day and not the next.

Good point.





Quote

http://cache.freescale.com/files/32bit/doc/errata/MC68060DE.pdf

The most serious bugs can be worked around but need the 68060.library.

My brain just melted.

My work around is never use those first 2 mask sets.

Is that the problem with the $20.00 060s?   They are the buggy prototype versions?
 

Offline ChaosLord

  • Hero Member
  • *****
  • Join Date: Nov 2003
  • Posts: 2608
    • http://totalchaoseng.dbv.pl/news.php
Re: Full 68060 implementation?
« Reply #25 on: December 13, 2012, 08:36:10 PM »
Quote from: matthey;718931
The N68070 was Gunnar's design (based on the N68050 and possibly with Jens's help) of a superscalar (2 integer units) CPU which became the Apollo CPU when he split from the Natami.

He/they actually started adding a 2nd integer unit??

Last I remember the superscalar stuff was "something to do in the future".

In fact, where I left off at, the L1 cache was not working yet or had been broken or something.

I really wish someone would finish it.
 

Offline matthey

  • Hero Member
  • *****
  • Join Date: Aug 2007
  • Posts: 1294
Re: Full 68060 implementation?
« Reply #26 on: December 13, 2012, 09:12:52 PM »
Quote from: ChaosLord;718932
I wish u wouldn't call it Apollo.  Wayyyyy too confusing.

We should agree to call it 070.  Or N68070 to be precise.


It's confusing but accurate. Gunnar made lots of changes to the N68070 design after he left so it's not really the same anymore. Then again, I can't really separate what's hype and what's real when we are talking about Gunnar so maybe it would be best not to refer to them at all :/.

Quote from: ChaosLord;718932

My brain just melted.

My work around is never use those first 2 mask sets.

Is that the problem with the $20.00 060s?   They are the buggy prototype versions?


They are not prototype masks. They just didn't have the bugs fixed yet. The 1f43g mask has all the bugs listed. I had one of these marked 50MHz and the system was reliable although it didn't even overclock to 60MHz which was bad luck. The 1g65v mask fixes 1 bug and the one I had (marked 50MHz) overclocked to 60MHz reliably but not 66MHz which I hear a few will do. There are buggy masks that have higher clock ratings like 60MHz. The 0e41J mask has all known bugs fixed (all on the errata) and can generally be clocked between 90MHz and 105MHz. It had a die shrink that allows for this. Most if not all are still marked 50MHz and there are fakes with the mask changed. All other masks are not full 68060s with MMU and FPU which I do not recommend. The 68060 may not be reliable in an Amiga without an MMU.


Quote from: ChaosLord;718934
He/they actually started adding a 2nd integer unit??

Last I remember the superscalar stuff was "something to do in the future".


That's what Gunnar claimed anyway. The 2nd integer unit was actually what he called a cheater or helper integer unit at first. It could not do calculations, only immediates and register direct which is still good. That's when the Apollo fit inside of a normal sized fpga. He has since made it bigger with 2 units targeting larger fpgas. He considers the Natami dead and no longer a potential target.
 

Offline freqmax (Topic starter)

  • Hero Member
  • *****
  • Join Date: Mar 2006
  • Posts: 2179
Re: Full 68060 implementation?
« Reply #27 on: December 14, 2012, 01:20:29 AM »
Is there any software that uses 68060 (or 030, 040) specific "features" and becomes incompatible if the processor isn't cycle exact with identical behaviour?
 

Offline matthey

  • Hero Member
  • *****
  • Join Date: Aug 2007
  • Posts: 1294
Re: Full 68060 implementation?
« Reply #28 on: December 14, 2012, 02:34:27 AM »
Quote from: freqmax;718967
Is there any software that uses 68060 (or 030, 040) specific "features" and becomes incompatible if the processor isn't cycle exact with identical behaviour?

Programmers relied less and less on CPU timing with the later 68k. On the 68000, it was possible to map out the exact state of the CPU and what it is doing from cycle to cycle depending on the code. The pipeline was very short, there was practically no cache and the memory speed from Amiga to Amiga was pretty close. Relying on CPU timing was generally safe with the exception of the processor clock speed difference between NTSC and PAL, which caused a few games to fail. The later addition of true fast memory was enough of a timing change to kill a fair number of games.

The 68020 and 68030 were a little more difficult to count cycles on (overlapping cycles) with a little longer pipeline, small caches and a difference in memory timing. Programmers started to learn it wasn't such a good idea to make timing assumptions and they had the benefit of testing on a much wider variety of CPUs with widely different timing.

The 68040 and 68060 didn't break much with timing differences. Incompatibilities are usually due to other reasons like a difference in cache size (self modifying code) or not having their CPU libraries installed (the libraries allow them to have similar behavior to previous CPUs). The timing was widely different but that wasn't so much of a problem by this time.

Timing on the 68060 is almost impossible to predict. The execution of integer code can vary from one execution to the next depending on which integer unit is currently used, how much of the instruction stream has been prefetched, what's in the caches (now including the branch cache), instruction folding, etc. The Motorola engineers did not release all the information needed to make a cycle exact 68060 from the documentation. Testing could reveal more info but it's really rather pointless.

In theory, there isn't any software that relies on the timing of the 68060. In reality, there are probably a few demos and games that would fail if the timing varies very much. They would probably fail first from other CPU enhancements like faster clock speed, faster memory and faster and bigger caches, or from custom chip enhancements like a faster blitter and CIA timing changes, but it's not enough of a problem that we hear Amigans with 68060@100MHz complaining (the old bugs can be patched too).
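
To put rough numbers on why cycle counted delay loops break, here is a small Python sketch. The PAL and NTSC 68000 clock rates are the usual Amiga figures; the 10 cycles per taken DBRA is the value usually quoted for the 68000 and should be treated as an assumption, and the 50MHz, 1 cycle per iteration case is a purely made-up stand-in for a much faster CPU.

```python
PAL_HZ = 7_093_790     # 68000 clock in a PAL Amiga
NTSC_HZ = 7_159_090    # 68000 clock in an NTSC Amiga

def loop_time_ms(iterations, cycles_per_iteration, clock_hz):
    """Wall clock time of a busy-wait loop in milliseconds."""
    return iterations * cycles_per_iteration / clock_hz * 1000.0

if __name__ == "__main__":
    n = 50_000  # hypothetical loop count chosen by a game programmer
    print("PAL 68000 :", loop_time_ms(n, 10, PAL_HZ))
    print("NTSC 68000:", loop_time_ms(n, 10, NTSC_HZ))
    # Illustrative fast CPU: assumed 1 cycle per iteration at 50MHz.
    print("fast CPU  :", loop_time_ms(n, 1, 50_000_000))
```

The PAL versus NTSC difference is under 1%, while the same loop on a much faster CPU finishes in a small fraction of the time, which is the kind of change that breaks code relying on such delays.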
« Last Edit: December 14, 2012, 02:38:52 AM by matthey »
 

Offline freqmax (Topic starter)

  • Hero Member
  • *****
  • Join Date: Mar 2006
  • Posts: 2179
    • Show only replies by freqmax
Re: Full 68060 implementation?
« Reply #29 on: December 14, 2012, 03:03:15 AM »
Are there any CPU specific behaviors, other than what the assembler instructions specify, that any software is dependent on?

(Like instruction XX flipping register bit Y when in Z mode etc..)