
Topic: Full 68060 implementation?

Offline freqmax (Topic starter)

  • Hero Member
  • *****
  • Join Date: Mar 2006
  • Posts: 2179
Re: Full 68060 implementation?
« Reply #29 from previous page: December 14, 2012, 03:03:15 AM »
Are there any CPU-specific behaviors, beyond what the assembler instructions specify, that any software depends on?

(Like instruction XX flipping register bit Y when in Z mode etc..)
 

Offline ChaosLord

  • Hero Member
  • *****
  • Join Date: Nov 2003
  • Posts: 2608
    • http://totalchaoseng.dbv.pl/news.php
Re: Full 68060 implementation?
« Reply #30 on: December 14, 2012, 03:15:18 AM »
Quote

The 68060's timing is almost impossible to predict. The execution time of integer code can vary from one run to the next depending on which integer unit is currently used, how much instruction memory has been prefetched, what's in the caches (including the branch cache), instruction folding, etc. The Motorola engineers did not release all the information needed to make a cycle-exact 68060 from the documentation.


Several years ago I coded a special fx.  It does some gfx calculations then does a full screen rotation.  Obviously it burns a lot of cycles so I timed it often to see how slow it was.

Each time I do the fx, I either get time A or I get a totally different and much slower time B.  I never get anything in between.  It's very, very confusing to me.

Sometimes the routine goes at the speed I want and other times much slower.  It makes no sense.  It's like it has 2 gears it can run in.

I just assumed it was either:
A: When I run the exe, the code happens to load in such a way that the code for that fx is somehow compatible with the pipeline, and other times it is somehow misaligned so that it runs much slower.

B: When I run the exe, the memory for the fx gets allocated in a manner that somehow makes a dramatic difference to the speed of the routine.  Maybe a certain memory alignment works better/worse with the cache.

I don't specifically know how either of these is possible.  It's just my best guess.

What I should have done was pick up the phone and say "Hey Matt why is my code acting wonky?" :biglaugh:

All my timing tests, data and documentation of this anomaly were in my history file which was lost in a hard drive glitch / brownout. :(
Wanna try a wonderfull strategy game with lots of handdrawn anims,
Magic Spells and Monsters, Incredible playability and lastability,
English speech, etc. Total Chaos AGA
 

Offline ChaosLord

Re: Full 68060 implementation?
« Reply #31 on: December 14, 2012, 03:17:08 AM »
Quote from: freqmax;718973
Are there any CPU-specific behaviors, beyond what the assembler instructions specify, that any software depends on?


I guess u mean like undocumented opcodes?

I haven't heard of undocumented opcodes doing anything since the C64 days.
 

Offline freqmax (Topic starter)

Re: Full 68060 implementation?
« Reply #32 on: December 14, 2012, 03:32:23 AM »
It goes further than opcodes, as the I/O connections and modes may screw around with execution behaviour, or even bus response.
 

Offline Iggy

  • Hero Member
  • *****
  • Join Date: Aug 2009
  • Posts: 5348
Re: Full 68060 implementation?
« Reply #33 on: December 14, 2012, 05:02:32 AM »
Quote from: ChaosLord;718975
I guess u mean like undocumented opcodes?

I haven't heard of undocumented opcodes doing anything since the C64 days.


The Hitachi 6309, now that was the king of hidden opcodes.
An extra 16-bit accumulator.
A 32-bit accumulator.
I better stop there because the list takes up about a page.
"Not making any hard and fast rules means that the moderators can use their good judgment in moderation, and we think the results speak for themselves." - Amiga.org, terms of service

"You, got to stem the evil tide, and keep it on the the inside" - Rogers Waters

"God was never on your side" - Lemmy

Amiga! "Our appeal has become more selective"
 

Offline matthey

  • Hero Member
  • *****
  • Join Date: Aug 2007
  • Posts: 1294
Re: Full 68060 implementation?
« Reply #34 on: December 14, 2012, 05:50:51 AM »
Quote from: freqmax;718973
Are there any CPU-specific behaviors, beyond what the assembler instructions specify, that any software depends on?

(Like instruction XX flipping register bit Y when in Z mode etc..)


The behavior of existing instructions and addressing modes is already defined by earlier 68k processors; if an implementation doesn't match, it's a bug and software will likely crash. I do know of some software that avoids bugs that might be in the 68060, but it runs on 68060s without the bug as well as on other 68k processors and does not depend on the behavior of the 68060. I have also seen documentation of the 68060 changed because it was incorrect (but not a bug). This would be more likely to affect early hardware developed for the 68060, but could affect drivers (software) that rely on a particular timing of such early hardware. I don't foresee many incompatibility problems (and those can be fixed in FPGA) when running 68060 code on a non-cycle-exact advanced 68k FPGA CPU.

Quote from: ChaosLord;718974
Several years ago I coded a special fx.  It does some gfx calculations then does a full screen rotation.  Obviously it burns a lot of cycles so I timed it often to see how slow it was.

Each time I do the fx, I either get time A or I get a totally different and much slower time B.  I never get anything in between.  It's very, very confusing to me.

Sometimes the routine goes at the speed I want and other times much slower.  It makes no sense.  It's like it has 2 gears it can run in.


My guess would be that some cache (possibly the branch cache) gets flushed. It could show up like that if the routine is inadvertently synced to a task switch. You could try turning off the different caches individually, or disabling multitasking, to see if that brings the timings closer together. I often find minor differences in speed myself. It's like the 68060 is alive but chaos is ordered once the complexity is understood ;).
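
A rough sketch of how the "two gears" could be checked against each test configuration (caches on or off, multitasking on or off): log many timings per configuration and split them at the largest gap. This is only an illustration in Python; the function name `split_two_modes` and the tick values are made up, not measurements from this thread.

```python
def split_two_modes(samples):
    """Split timing samples into a fast and a slow group.

    The cut is placed at the largest gap between neighbouring values
    after sorting, which suits data that clusters around two times.
    """
    ordered = sorted(samples)
    if len(ordered) < 2:
        raise ValueError("need at least two samples")
    # Pair each gap with the index of the sample just before it.
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    _, cut = max(gaps)
    return ordered[:cut + 1], ordered[cut + 1:]


# Hypothetical timer ticks for repeated runs of the same routine.
ticks = [2010, 2030, 3180, 2020, 3200, 3190]
fast, slow = split_two_modes(ticks)
print("fast runs:", fast)
print("slow runs:", slow)
```

If the slow group disappears with one particular cache turned off or with multitasking disabled, that points at the corresponding cause; if both groups survive every configuration, load address or allocation alignment becomes the more likely suspect.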
 

Offline freqmax (Topic starter)

Re: Full 68060 implementation?
« Reply #35 on: December 14, 2012, 12:50:00 PM »
Have a look at alignment issues for those timing issues, especially in combination with pre-fetch (L-cache).
 

Offline matthey

Re: Full 68060 implementation?
« Reply #36 on: December 14, 2012, 01:48:37 PM »
Quote from: freqmax;719017
Have a look at alignment issues for those timing issues, especially in combination with pre-fetch (L-cache).


The 68060 does a fantastic job of handling misaligned data, especially on reads. According to the documentation, a cycle is lost here and there, but I have found no measurable speed difference from aligning code (I-cache), for example. This is in contrast to the 68020/68030, where aligning branch targets to a longword can result in a ~5% speedup.
 

Offline ChaosLord

Re: Full 68060 implementation?
« Reply #37 on: December 15, 2012, 09:37:45 AM »
Quote from: matthey;719021
The 68060 does a fantastic job of handling misaligned data, especially on reads. According to the documentation, a cycle is lost here and there, but I have found no measurable speed difference from aligning code (I-cache), for example. This is in contrast to the 68020/68030, where aligning branch targets to a longword can result in a ~5% speedup.

At one point I was thinking of going back thru all my asm code and massaging it so that all my popular branch targets were longword aligned...  But then I never did it.  What is the command for doing that in Devpac?

I haven't written any real asm in 4 years.  I'm forgetting everything.

Maybe I didn't bother to do the massage because there is no benefit on 040 or 060?

Does 040 get any benefit from longword aligned branch targets?

And by branch targets, does that include bne as well as bsr/jsr and JMP?
 

Offline matthey

Re: Full 68060 implementation?
« Reply #38 on: December 15, 2012, 12:56:06 PM »
Quote from: ChaosLord;719184
At one point I was thinking of going back thru all my asm code and massaging it so that all my popular branch targets were longword aligned...  But then I never did it.  What is the command for doing that in Devpac?


Most assemblers will accept:

CNOP 0,4  ;longword align
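
To see how much padding that directive inserts, here is a small Python sketch of the arithmetic. It assumes the usual meaning of CNOP offset,alignment (advance the location counter until it sits offset bytes past a multiple of alignment); the helper name `cnop_padding` is made up for illustration.

```python
def cnop_padding(pc, offset, alignment):
    """Bytes of padding needed so that (pc + padding) % alignment == offset."""
    if alignment <= 0 or not 0 <= offset < alignment:
        raise ValueError("need alignment > 0 and 0 <= offset < alignment")
    return (offset - pc) % alignment


# 68k code is always word aligned, so only even addresses are shown.
for pc in (0x1000, 0x1002, 0x1006):
    print(hex(pc), "CNOP 0,4 pads", cnop_padding(pc, 0, 4), "bytes,",
          "CNOP 0,16 pads", cnop_padding(pc, 0, 16), "bytes")
```

For word-aligned code, CNOP 0,4 costs at most 2 bytes per use, while CNOP 0,16 can cost up to 14.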

Quote from: ChaosLord;719184

Maybe I didn't bother to do the massage because there is no benefit on 040 or 060?

Does 040 get any benefit from longword aligned branch targets?


The 040 handles misalignment well, like the 060. A cycle is probably saved from time to time by aligning code, but aligning code can also result in less code per cache line, which can cost a cycle from time to time. It may still be effective to align the start of commonly used code to a longword (maybe even to a cache line with CNOP 0,16, but that starts to waste memory unless the code is extremely common), which is easy enough and doesn't waste cache. Even on the 040/060, where there is no penalty for reading any part of a cache line, it still takes 2x as long to load 2 cache lines as 1. The 060, at least, does such a good job of code alignment and code caching that I was unable to time a significant difference by aligning code. Many modern processors can't do this.
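
The cache line trade-off can be put into numbers with a short Python sketch. It assumes the 16-byte cache line size of the 040/060; the function name `cache_lines_touched` and the addresses are hypothetical.

```python
def cache_lines_touched(start, length, line_size=16):
    """Number of cache lines that must be loaded to hold a block of code."""
    if length <= 0:
        return 0
    first = start // line_size
    last = (start + length - 1) // line_size
    return last - first + 1


# A 16-byte inner loop fits in one line only if it starts on a line boundary.
print(cache_lines_touched(0x1000, 16))  # starts on a line boundary
print(cache_lines_touched(0x100A, 16))  # starts 10 bytes into a line
```

The second case needs two line loads for the same 16 bytes of code, which is the "2x as long" cost described above. Padding the start forward to 0x1010 avoids that but leaves 6 bytes of the previous line unused.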

Quote from: ChaosLord;719184

And by branch targets, does that include bne as well as bsr/jsr and JMP?


Yes, I believe this includes Bcc branch targets. All instruction fetches on the 020 are longword and the 020 is delayed by fetches over a word. I would avoid using NOP instructions for alignment in any code that is executed.
 

Offline AJCopland

Re: Full 68060 implementation?
« Reply #39 on: December 15, 2012, 12:59:21 PM »
Quote from: Blinx123;718910
Is there any particular reason for people still calling the Natami CPU a 68050?
Last I read, it was called a 68070.

Yeah, whilst working on it everyone was keen for it to be fully super-scalar, OoO, dual-issue etc so it would be architecturally like a 68060 but more advanced. In the end that was deemed a little impractical for a first attempt so instead it would be more like a 68040, but a bit more advanced hence 68050, the "N" was to separate it from the Motorola series.

EDIT: Oh yes and as was pointed out by Matthey, the "N68070" idea eventually became the Apollo thing that Gunnar is off doing now! Very confusing. So it's probably best discussed as a timeline!
1. the "N68070" was the original target,
2. became the more achievable "N68050", ran code in a simulator,
3. "N68070" would be the limited-OoO, dual-issue future version of the "N68050",
4. Gunnar left and started the "Apollo" project which is basically the "N68070".

Erm... the only one I know existed is the N68050 which Jens & Gunnar had running in a simulator whilst I was still involved with Natami. Dunno what happened after that.
« Last Edit: December 15, 2012, 01:07:02 PM by AJCopland »
Be Positive towards the Amiga community!
 

Offline ChaosLord

Re: Full 68060 implementation?
« Reply #40 on: December 15, 2012, 01:38:20 PM »
Quote from: matthey;719199
Most assemblers will accept:

CNOP 0,4  ;longword align

Thanx!  I just remembered CNOP 0,4 right before I clicked ur msg.  It just popped into my head.  Sometimes I hafta tilt my head a little to get the fluids over to the dry part of my brain. :)



Quote

The 040 handles misalignment well, like the 060. A cycle is probably saved from time to time by aligning code, but aligning code can also result in less code per cache line, which can cost a cycle from time to time. It may still be effective to align the start of commonly used code to a longword (maybe even to a cache line with CNOP 0,16, but that starts to waste memory unless the code is extremely common), which is easy enough and doesn't waste cache. Even on the 040/060, where there is no penalty for reading any part of a cache line, it still takes 2x as long to load 2 cache lines as 1. The 060, at least, does such a good job of code alignment and code caching that I was unable to time a significant difference by aligning code. Many modern processors can't do this.

Now u have reminded me why I didn't do it.  Too complicated.  I might make things worse.  It's easier to just tell ppl to buy an 060 card. :D
 

Offline ChaosLord

Re: Full 68060 implementation?
« Reply #41 on: December 15, 2012, 01:58:26 PM »
Calling it 68050 was never cool.

It had new instructions (addressing modes) so if I wrote code for it things would get really messed up!

I could say "This game requires 68050+" but that would be a lie because it would not work on 060.

So I would need to say "This game requires 68050 or 68070+ but NOT 68060"
It's just dumb.

If you add new instructions you should just call it a 68070.  Then all programs written for that instruction set can say "Requires 68070+"

It saves thousands of hours of confusion from 10,000s of Amiga users.

I tried to explain this years ago but as usual, Gunnar would not listen.
 

Offline psxphill

Re: Full 68060 implementation?
« Reply #42 on: December 15, 2012, 02:35:05 PM »
Quote from: ChaosLord;719204
Calling it 68050 was never cool.
 
It had new instructions (addressing modes) so if I wrote code for it things would get really messed up!

Adding new instructions or addressing modes is not cool. We need something that implements an 060, fpu & mmu in an fpga. I don't care if it's super scalar or supports out of order execution.
 

Offline freqmax (Topic starter)

Re: Full 68060 implementation?
« Reply #43 on: December 15, 2012, 04:02:31 PM »
Can one skip out-of-order execution, superscalar, etc. and still run 68060 code?

What is the minimum feature set that has to be implemented? (albeit slow...)
 

Offline bloodline

  • Master Sock Abuser
  • Hero Member
  • *****
  • Join Date: Mar 2002
  • Posts: 12113
    • http://www.troubled-mind.com
Re: Full 68060 implementation?
« Reply #44 on: December 15, 2012, 04:14:13 PM »
May I just remind everyone that the 68070 was a licenced clone of the 68k by Philips, with a few extra bits of hardware on the silicon. :)