
Author Topic: Full 68060 implementation?  (Read 8460 times)


Offline ChaosLord

  • Hero Member
  • *****
  • Join Date: Nov 2003
  • Posts: 2608
    • http://totalchaoseng.dbv.pl/news.php
Re: Full 68060 implementation?
« on: December 13, 2012, 01:24:34 AM »
Sure, if you want to spend the years to write one.  FPGA size is not the problem.
Wanna try a wonderful strategy game with lots of hand-drawn anims,
Magic Spells and Monsters, Incredible playability and lastability,
English speech, etc. Total Chaos AGA
 

Offline ChaosLord

Re: Full 68060 implementation?
« Reply #1 on: December 13, 2012, 03:23:12 AM »
Quote from: matthey;718839
A 100% logic equivalent is not possible in an FPGA, but a fully functionally equivalent soft CPU is. Note that muxes are much slower and bigger in an FPGA than in silicon, which messes up the timing as other logic elements are forced to wait. A fast FPGA can more than make up for this deficiency and others, but an optimal design will be different and optimized for an FPGA, likely even for a particular FPGA.

+1



Quote

 Even the older RC revisions have an MMU, an FPU and fewer bugs than any 68k FPGA CPU yet.
:roflmao: good point!
 

Offline ChaosLord

Re: Full 68060 implementation?
« Reply #2 on: December 13, 2012, 06:27:20 PM »
I hereby appoint billt to design us a superfast 680x0 CPU.  We can call it the 68070.

You are hired. :D
 

Offline ChaosLord

Re: Full 68060 implementation?
« Reply #3 on: December 13, 2012, 06:51:43 PM »
Quote from: matthey;718899
Getting rid of the MULx with a 64-bit integer result was a mistake, as compilers commonly use it to replace a divide by a constant with a multiply by the constant's reciprocal. It wouldn't have been so bad if they had at least defined and allowed a MULx where Sz=1 (64-bit result) and Dl (result register low) = Dh (result register high), giving the upper 32 bits of the result (like PPC MULH). This is worthwhile to define even with 64-bit results, as it saves trashing a register when the lower 32 bits are not needed. See MULS and MULU here:

http://www.heywheel.com/matthey/Amiga/68kF_PRM.pdf
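
For anyone wondering what "invert and multiply" means in practice, here is a minimal sketch in Python (not 68k asm) of an unsigned divide by 10 done with only the upper 32 bits of a 64-bit product. The helper names `mulhu32` and `div10` are made up for illustration. The constant 0xCCCCCCCD is ceil(2^35 / 10), the usual magic number for this divisor; other divisors need their own constant and shift.

```python
MASK32 = 0xFFFFFFFF

def mulhu32(a, b):
    """Upper 32 bits of the unsigned 32x32 -> 64 bit product.

    This models what a 'multiply high' style instruction would return
    (hypothetical helper, not a real 68k mnemonic).
    """
    return ((a & MASK32) * (b & MASK32)) >> 32

def div10(n):
    """Unsigned 32-bit n // 10 without a divide instruction.

    0xCCCCCCCD == ceil(2**35 / 10), so the high half of the product
    shifted right by 3 more bits gives the quotient.
    """
    return mulhu32(n, 0xCCCCCCCD) >> 3

if __name__ == "__main__":
    for n in (0, 9, 10, 99, 12345, 0xFFFFFFFF):
        print(n, div10(n), n // 10)
```

Only the high half of the product is ever used, which is why a form that returns just the upper 32 bits saves a register.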


Philosophically I agree with you.

But from an engineering standpoint I simply don't know how to implement it.
If it can't be done as fast as the other multiplies then you have to stall the whole pipeline while you spend 2-4 cycles completing the instruction.

OK, now that I think about it, there MUST be a mechanism for stalling the pipeline, because all those weirdo addressing modes in multiword instructions take multiple cycles to complete.

So OK, I agree it was a total mistake to take that out.

Maybe the problem was that its bit pattern overlapped the bit pattern of a normal multiply?  I dunno.  But if that is what messed them up, they could have invented a NEW, completely different bit pattern for large multiplies.



Quote

It would be better to install the 68060.library from flash or kickstart so that traps are never a problem.

+2


Quote

 Some instructions and addressing modes are better trapped and simplifying gives gains elsewhere. Notice that I removed (for trapping) the double indirect addressing modes that used the outer displacement in the 68kF ISA pdf above as there was little advantage (only useful when no free registers) and simplifies the decoder. If all remaining full extension word format addressing modes could be 1 cycle faster then it would be worth it.

I used to tell Gunnar all the time that it's OK to trap those weirdo hypercomplicated addressing modes.  But he said they had worked out an optimized way for the address unit to handle them without needing to trap.

Quote

The SWAP instruction in the 68060 should have worked in both integer units which was an oversight.

I wonder if they had a reason?



Quote

68020+ support should be the minimum for the Amiga IMO. It makes programming much easier than the 68000 while offering significantly better speed and code density.

+1

Quote

 The Natami CPU (N68050) is not finished and, last I understood, only partially supported the 68020, but it is fairly advanced as far as cache and pipeline design go. I don't think Jens is working on it much, if at all, anymore.

:(((((


Quote

 He has talked about making it open source though.

Dear God I hope so.

They cooked up really kewl trix that can make a really fast 680x0 cpu!

Making a 680x0 softcore is all about how many clever tricks you can come up with and combine them all together to make something fast.

Without clever optimizations you won't get 68060 speeds using affordable FPGA chips.




Quote

 Gunnar is working on a soft CPU based on it but he is not very reliable. He claims to be experimenting with a 200MHz softcore although he increased the pipeline length significantly in order to do it. This increases branch penalties and can cause other stalls much like a highly clocked DSP, GPU or x86 CPU. Even if I could believe him, it's experimental at best and Gunnar has a history of not completing much.
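
A rough way to see the trade-off between clock speed and pipeline length is a back-of-the-envelope model. The sketch below is Python and assumes a single-issue core where the only stalls are mispredicted branches. The function name and every number in it are hypothetical; none are measurements of any real core.

```python
def effective_mips(clock_mhz, branch_fraction, mispredict_rate, branch_penalty_cycles):
    """Instructions per microsecond under a simple stall model.

    Assumed model: CPI = 1 + (fraction of instructions that are branches)
                           * (fraction of those mispredicted)
                           * (cycles lost per mispredict).
    A longer pipeline normally means a larger branch_penalty_cycles.
    """
    cpi = 1.0 + branch_fraction * mispredict_rate * branch_penalty_cycles
    return clock_mhz / cpi

if __name__ == "__main__":
    # Hypothetical comparison: short pipeline at 100 MHz vs long pipeline at 200 MHz.
    print(effective_mips(100, 0.2, 0.5, 3))
    print(effective_mips(200, 0.2, 0.5, 10))
```

With these made-up inputs the 200 MHz design is still ahead, but by far less than 2x, which is the point being made about long pipelines.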

If someone would give him some medication to make him calm down and just write code without attacking everybody all the time then he could write good code and finish everything.  He could start by trying Lorazepam.  If that doesn't work for him then he could try something else.

P.S.  If someone would clone me then my other me would very happily spend 20 hours a day for 2 years to cook up an awesome 68070.  With Jens, Matt, Phil and some other asm guys helping I know we could do it.   Sadly I am only 1 person and I just can't devote that kind of time to this project. :(

Once I start trying to solve a puzzle I get obsessed and can't stop.  So I must stay away from puzzles that I know are large and complex since I have other responsibilities in my life that I must tend to. :(
 

Offline ChaosLord

Re: Full 68060 implementation?
« Reply #4 on: December 13, 2012, 07:17:46 PM »
Quote from: mikej;718876
Yes, but they are older revisions and cannot be clocked up - and have bugs.


What are these bugs that everyone is always speaking of?

Is there a list somewhere?

Are they serious?  Or mainly just technical?  Or ?
 

Offline ChaosLord

Re: Full 68060 implementation?
« Reply #5 on: December 13, 2012, 07:26:52 PM »
Quote from: alexh;718906

But I thought that is what the NatAmi team SAID they were doing with their 68050 design?? I take it they never released anything? Not even design specifications?


They were working on it.  They did a lot of stuff using simulation software and ran lots of tests and made progress.

But it is true that all the code is top secret and never got released.

After Gunnar blew up, everything just stopped AFAIK.

If there is a way for Jens to test his 68050 on an FPGAreplay then I think we could motivate him to resume work on it.

And if we could get him to team up with someone who is emotionally stable and helpful then that would speed things up too.

Is there a way that Jens can test the 050 on the FPGAreplay?

I hope so.  But the FPGA chip is so small that I don't know for a fact it would fit. :(
 
In any event some of the tricks they were doing won't work on the FPGAreplay so it WILL be slower than it was planned for Natami.  FPGAreplay does not have all the SRAM blocks and stuff that Natami FPGA has.
 

Offline ChaosLord

Re: Full 68060 implementation?
« Reply #6 on: December 13, 2012, 08:30:18 PM »
Quote from: matthey;718929

Maybe he had an idea at one point but he did not have the double indirect modes implemented in the Apollo.

I wish you wouldn't call it Apollo.  Wayyyyy too confusing.

We should agree to call it 070.  Or N68070 to be precise.

Quote

 Then again, he changes his experimental designs quite often so maybe it would work one day and not the next.

Good point.





Quote

http://cache.freescale.com/files/32bit/doc/errata/MC68060DE.pdf

The most serious bugs can be worked around but need the 68060.library.

My brain just melted.

My workaround is to never use those first 2 mask sets.

Is that the problem with the $20.00 060s?   They are the buggy prototype versions?
 

Offline ChaosLord

Re: Full 68060 implementation?
« Reply #7 on: December 13, 2012, 08:36:10 PM »
Quote from: matthey;718931
The N68070 was Gunnar's design (based on the N68050 and possibly with Jens's help) of a superscalar (2 integer units) CPU, which became the Apollo CPU when he split from the Natami.

He/they actually started adding a 2nd integer unit??

Last I remember the superscalar stuff was "something to do in the future".

In fact, where I left off, the L1 cache was not working yet or had been broken or something.

I really wish someone would finish it.
 

Offline ChaosLord

Re: Full 68060 implementation?
« Reply #8 on: December 14, 2012, 03:15:18 AM »
Quote

Timing on the 68060 is almost impossible to predict. The execution time of integer code can vary from one run to the next depending on which integer unit is currently used, how much of the instruction stream has been prefetched, what's in the caches (now including the branch cache), instruction folding, etc. The Motorola engineers did not release all the information needed to make a cycle-exact 68060 from the documentation.


Several years ago I coded a special fx.  It does some gfx calculations then does a full screen rotation.  Obviously it burns a lot of cycles so I timed it often to see how slow it was.

Each time I do the fx, I either get Time A or I get a totally different and much slower Time B.  I never get anything in between.  It's very very confusing to me.

Sometimes the routine goes at the speed I want and other times much slower.  It makes no sense.  It's like it has 2 gears it can run in.

I just assumed it was either:
A: When I run the exe, the code happens to load in such a way that the code for that fx is somehow compatible with the pipeline, and other times it is somehow misaligned so that it runs much slower.

B: When I run the exe, the memory for the fx gets allocated in a manner that somehow makes a dramatic difference to the speed of the routine.  Maybe a certain memory alignment works better/worse with the cache.

I don't specifically know how either of these is possible.  It's just my best guess.

What I should have done was picked up the phone and said "Hey Matt why is my code acting wonky?" :biglaugh:

All my timing tests, data and documentation of this anomaly were in my history file which was lost in a hard drive glitch / brownout. :(
 

Offline ChaosLord

Re: Full 68060 implementation?
« Reply #9 on: December 14, 2012, 03:17:08 AM »
Quote from: freqmax;718973
Are there any CPU-specific behaviors, other than what the assembler instructions specify, that any software depends on?


I guess you mean like undocumented opcodes?

I haven't heard of undocumented opcodes doing anything since the C64 days.
 

Offline ChaosLord

Re: Full 68060 implementation?
« Reply #10 on: December 15, 2012, 09:37:45 AM »
Quote from: matthey;719021
The 68060 does a fantastic job of handling misaligned data, especially on reads. According to the documentation, a cycle is lost here and there, but I have found no measurable speed difference from aligning code (I-cache), for example. This is in contrast to the 68020/68030, where aligning branch targets to a longword can result in a ~5% speedup.

At one point I was thinking of going back through all my asm code and massaging it so that all my popular branch targets were longword aligned...  But then I never did it.  What is the command for doing that in Devpac?

I haven't written any real asm in 4 years.  I'm forgetting everything.

Maybe I didn't bother to do the massage because there is no benefit on 040 or 060?

Does 040 get any benefit from longword aligned branch targets?

And by branch targets, does that include bne as well as bsr/jsr and JMP?
 

Offline ChaosLord

Re: Full 68060 implementation?
« Reply #11 on: December 15, 2012, 01:38:20 PM »
Quote from: matthey;719199
Most assemblers will accept:

CNOP 0,4  ;longword align
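
To make the effect of the directive concrete, here is a small Python sketch of how much padding `CNOP offset,alignment` inserts at a given address. It assumes the common Amiga assembler behaviour as I understand it: pad with NOPs until the address is `offset` bytes past a multiple of `alignment`. The function name `cnop_padding` is made up for illustration, and your assembler's manual is the authority on the exact semantics.

```python
def cnop_padding(addr, offset, alignment):
    """Bytes of padding needed so that (addr + padding) % alignment == offset.

    Assumed semantics of 'CNOP offset,alignment'; 68k code addresses are
    always even, so the padding is a whole number of 2-byte NOPs.
    """
    return (offset - addr) % alignment

if __name__ == "__main__":
    for addr in (0x1000, 0x1002, 0x1006):
        pad = cnop_padding(addr, 0, 4)
        print(hex(addr), "->", hex(addr + pad), "NOPs:", pad // 2)
```

So `CNOP 0,4` costs at most one NOP, while `CNOP 0,16` can cost up to seven.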

Thanks!  I just remembered CNOP 0,4 right before I clicked your msg.  It just popped into my head.  Sometimes I have to tilt my head a little to get the fluids over to the dry part of my brain. :)



Quote

The 040 handles misalignment well, like the 060. It's probable that a cycle is saved from time to time by aligning code, but aligning code can result in less code in a cache line, which can cost a cycle from time to time. It may still be effective to align the start of commonly used code to a longword (maybe even to a cache line with CNOP 0,16, but that starts to become wasteful of memory if the code is not extremely common), which is easy enough and doesn't waste cache. Even on the 040/060, where there is no penalty to read any part of a cache line, it still takes 2x as long to load 2 cache lines as 1. The 060, at least, does such a good job of code alignment and code caching that I was unable to time a significant difference by aligning code. Many modern processors can't do this.
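
The "2 cache lines vs 1" point can be checked with a little arithmetic. This Python sketch counts how many 16-byte cache lines a routine spans for a given start address. The 16-byte line size matches the CNOP 0,16 mentioned above. The function name and the example addresses are hypothetical.

```python
CACHE_LINE = 16  # bytes per cache line on the 040/060

def lines_touched(start, size, line=CACHE_LINE):
    """Number of cache lines covered by `size` bytes beginning at `start`."""
    if size <= 0:
        return 0
    first = start // line
    last = (start + size - 1) // line
    return last - first + 1

if __name__ == "__main__":
    # The same 16-byte loop: cache-line aligned vs. starting 2 bytes in.
    print(lines_touched(0x2000, 16))
    print(lines_touched(0x2002, 16))
```

A short hot loop that straddles a line boundary needs one more line fill than the same loop aligned, which is the only case where aligning to 16 is likely to pay for the padding.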

Now you have reminded me why I didn't do it.  Too complicated.  I might make things worse.  It's easier to just tell people to buy an 060 card. :D
 

Offline ChaosLord

Re: Full 68060 implementation?
« Reply #12 on: December 15, 2012, 01:58:26 PM »
Calling it 68050 was never cool.

It had new instructions (addressing modes) so if I wrote code for it things would get really messed up!

I could say "This game requires 68050+" but that would be a lie because it would not work on 060.

So I would need to say "This game requires 68050 or 68070+ but NOT 68060"
It's just dumb.

If you add new instructions you should just call it a 68070.  Then all programs written for that instruction set can say "Requires 68070+"

It saves thousands of hours of confusion from 10,000s of Amiga users.

I tried to explain this years ago but as usual, Gunnar would not listen.
 

Offline ChaosLord

Re: Full 68060 implementation?
« Reply #13 on: December 15, 2012, 04:59:24 PM »
Quote from: freqmax;719214
Can one skip out-of-order execution, super scalar, etc.. and still run 68060 code?

What is the minimum feature set that has to be implemented? (albeit slow..)


You can run normal 68060 code on a 68020,
or on a 68020 + 68881 FPU.

And there is no out-of-order execution in the 68060.
 

Offline ChaosLord

Re: Full 68060 implementation?
« Reply #14 on: December 15, 2012, 05:44:37 PM »
Quote from: matthey;719229
There can be multiple CPU lines by multiple manufacturers. You could say it requires N68050+ (The Motorola line was M68k).

That would be very confusing to many "regular Joes".
I don't want to spend time answering emails explaining the cpu naming system.


Quote

 Another option would be to specify the ISA like "68kF1+". That is usually what happens with ARM where there are *many* manufacturers and processor names.

That is better than calling it N68050.

Or we could just call it N68070 and be done with it :)

Although the FPGAReplay guys will have theirs out first so it will be F68070 or R68070 :D