Author Topic: Do you approve of PPC (in some form) as the future of Amiga? (Read 29486 times)

matthey · « **on:** October 14, 2010, 05:11:47 PM »

Quote from: Krashan;584685

That's true, but then you are sentenced to stay below 100 MHz forever. Some may accept it, some not.

Wrong! A medium-sized, medium-priced FPGA should allow an enhanced 68k processor to reach around 150MHz. That is what the Natami team expects, and it is what processors of similar size and complexity already run at in FPGA. That should compare nicely to a single core PPC at more than 2x the clock speed, possibly approaching 3x. FPGAs are getting cheaper faster than PPC processors are getting faster. The 68k is easier to program and much more tolerant of poorly optimized code like Hyperion is famous for. It's also much easier to debug than PPC or x86.

Disclosure: I love 68k, like PPC and hate x86. I only own classic Amigas but I am supportive of all Amiga users.

matthey · « **Reply #1 on:** October 14, 2010, 09:59:02 PM »

Quote from: Heiroglyph;584709

Wow, a whole 150Mhz?

You have to start somewhere. Creating a CPU in an FPGA is not trivial. This will ramp up quite quickly if there is a market. FPGAs will become faster, and if the CPU could be burned into silicon then a very powerful (for CISC) 500MHz-1GHz part would be possible. If the N68k could become a popular processor again, then more is possible. There is nothing besides money that keeps the 68k from x86 performance. The 68k has several advantages over x86 as far as making a chip. It's simpler, has smaller code (which translates to less cache needed) and has less baggage to support because it has a good design to start with.

Quote

Assuming 060 performance, not 020/030 as most fpgas currently are:
x86 is clearly fastest per dollar now and will be for the foreseeable future.

I agree. No contest. Games push the high end desktop market and the x86 duopoly is able to crush any and all competition. The mobile multimedia, laptop/netbook and embedded markets are huge and growing. This is increasingly ARM territory, but an enhanced 68k has advantages over ARM. It is easier to program and has smaller code. ARM's advantage was simplicity but it lost most of that with enhancements.

The first Natami FPGA CPU, the N68050, should be more powerful than the 68060 at the same clock speed. The superscalar N68070 should be able to average several complex instructions per clock. The N68k will have fairly modern enhancements like larger caches and DDR2 memory. The 68060 performed very well for its day. It was probably the best processor of the day, but it was never clocked up and was killed for marketing reasons. Motorola had decided to go PPC and was telling everyone that it was the future. The 68060 outperformed the PPC processors for the first couple of years. It did more with less.

Quote

I'm not sure why anyone would want to hobble Amiga with slow CPUs, they used the 68000 because it was the fastest for the money at the time.

The 68000 was a good fit. It was powerful but also flexible. I don't think the AmigaOS would be as efficient or as good if x86 had been used. The Amiga was never about having the fastest CPU. It was about having several processors working in parallel. It was ahead of its time because that is where processors are headed today, integrating units like SIMD and GPU into the CPU (faster & saves power). That is also what Natami is trying to do.

Quote

Before Amiga went under they were already looking for alternative CPUs, so I'm not sure why there is such a strong attachment to a basic component that was near end of life already.

That was because Motorola decided to kill the 68k in favor of the PPC. Motorola wasn't going to produce any new 68k processors.

matthey · « **Reply #2 on:** October 15, 2010, 03:59:28 AM »

Quote from: minator;584759

Assuming they can actually get it to 150MHz, I expect it'll have trouble matching a PPC at the same clock speed never mind exceeding it.

If they don't get 150MHz this year, they probably will next year as the speed of FPGAs increases and the price drops. There are FPGAs that are fast enough now but they cost a lot of money. The 68k did often outperform the early PPC processors. Even the 68040 outperformed the early PPC processors on the early Macs. Amiga users with a 68060 and Shapeshifter or Fusion had the fastest Mac for about a year. Apple made the later Mac OS incompatible with the 68060 because of it. Mac OS 7 worked great with the 68060 and then Mac OS 8 didn't for some odd reason. An FPGA N68k isn't going to wow people or steal x86 market share, but it will probably be fast enough to impress Amiga users still using the classic and fast enough for general computer needs. That's enough for me. I'm not opposed to PPC Amigas. I would like to see 68k for the low end and laptops and PPC for the high end and desktops. The attitude of Hyperion has turned me off though. I prefer the openness of Natami and AROS (but don't care for the x86 focus of AROS).

Here's what an IBM engineer has to say about the 68060 and PPC...

"With 2 instructions per clock and excellent multiplication and branch performance, the 68060 performs very good. Depending on the workload the 68060 can even outperform similar clocked 60x/G2/G3 PowerPC CPU."

"Actually the 68060 is faster in multiplication than many PowerPC.
The PPC G2 (603) and G3 CPU need 5 cycles for a multiplication.
Which means a 100 Mhz 68060 achieves the same multiplication performance as an 250 Mhz G3."

http://www.natami.net/knowledge.php?b=4&note=2418
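The arithmetic behind that last quoted line can be checked with a small C sketch. The quote only gives the 5-cycle figure for the G2/G3; the 2-cycle figure used for the 68060 below is an assumption inferred from the quoted 100 MHz vs 250 MHz equivalence, and the function name `mults_per_second` is made up for illustration. It models peak throughput only and ignores pipelining and memory stalls.

```c
#include <assert.h>
#include <stdint.h>

/* Peak multiplies per second if one multiply completes every
   cycles_per_mul clock cycles (no pipelining, no memory stalls). */
static uint64_t mults_per_second(uint64_t clock_hz, unsigned cycles_per_mul)
{
    assert(cycles_per_mul > 0);
    return clock_hz / cycles_per_mul;
}
```

Calling it with 250 MHz and 5 cycles, and with 100 MHz and the assumed 2 cycles, gives the same peak rate, which is the equivalence the quote describes.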

Let's look at some 68k code to see what is so great about the 68k. Let's take a simple 68k memory copy with size (longwords) in d0...

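.loop:
move.l (a0)+,(a1)+ ; copy one longword, post-increment both pointers
subq.l #1,d0 ; count down; carry is set only on the borrow from 0
bcc.b .loop ; loop while no borrow, so d0+1 longwords are copied as written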

Let's say we don't know the alignment of the data either. This copies 1 longword/cycle with data aligned and is 6 bytes. If data is unaligned this is still pretty good. Now write that on PPC with anywhere near the performance. Don't let the old outdated 68060 with its tiny little cache and only 4 bytes of instruction fetch/cycle DESTROY you. I'll even give you a few hints. You better align the data first or the performance is really bad. You will need twice as many instructions to duplicate what's above. You will need to use an unrolled loop (wasting more code) and preload the cache. If you do all that optimally, you are still likely slower than the 68060. No wonder PPC needs all those GHz.
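For anyone who doesn't read 68k, here is a rough C model of what the three instruction loop above does. It is only a sketch: the function name `copy_loop_model` is made up, and it mirrors the instructions one for one rather than trying to be fast. It relies on subq.l setting the carry flag only when the subtraction borrows, which means the loop body runs d0+1 times.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C model of:
       .loop: move.l (a0)+,(a1)+
              subq.l #1,d0
              bcc.b  .loop
   Returns the number of longwords copied. */
static size_t copy_loop_model(const uint32_t *a0, uint32_t *a1, uint32_t d0)
{
    assert(a0 != NULL && a1 != NULL);
    size_t copied = 0;
    int carry;
    do {
        *a1++ = *a0++;      /* move.l (a0)+,(a1)+ */
        carry = (d0 == 0);  /* subq.l borrows (sets carry) only when d0 was 0 */
        d0 -= 1;            /* subq.l #1,d0 */
        copied++;
    } while (!carry);       /* bcc.b .loop: branch while carry is clear */
    return copied;
}
```

So to copy exactly N longwords with the loop as posted, d0 would have to hold N-1 on entry.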

matthey · « **Reply #3 on:** October 15, 2010, 03:34:57 PM »

Quote from: Karlos;584834

This example isn't really good at anything other than demonstrating the 68K is forgiving of lazy programmers.

It's not just lazy programmers. Good programmers mess up simple loops too. The above code is excellent on the 68060 for aligned data and for small unaligned copies. This will also be the case for the N68050+. This code can be inlined because it's so small, saving even more. Most processors require the unrolled loop and are very particular about alignment. Get it wrong and performance is a fraction of what it could be. The 68060 and N68050+ are just as fast if you use the unrolled loop and in some cases will be a little bit faster if you align the data. Use a movem.l copy loop and there is very little slowdown too. Code optimized for any 68k processor will run very well on the 68060 and N68k. Is forgiving, easy to program, and small code bad?
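To show what "the unrolled loop" means here, this is a minimal C sketch of a longword copy unrolled four times. The name `copy_unrolled4` and the unroll factor of four are arbitrary choices for illustration, not taken from any of the routines discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `count` longwords, four per loop iteration,
   then finish the 0-3 leftover longwords one at a time. */
static void copy_unrolled4(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    size_t blocks = count / 4;
    size_t rest = count % 4;
    while (blocks--) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst += 4;
        src += 4;
    }
    while (rest--) {
        *dst++ = *src++;
    }
}
```

The loop overhead is paid once per four longwords instead of once per longword, at the cost of several times the code size of the simple loop.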

Quote

The alignment is never an unknown property;

Check out exec.library/CopyMem(). It will copy memory of any alignment. This function is used way more than exec.library/CopyMemQuick() which does longword aligned copies. Never say never.

Quote

all you need to do is test the least significant bit(s) of the source and destination operands. You can also test the count and build a nice duff's device loop in assembler and only handle the trailing bytes before and after. On the 060, you might even be able to use move16, under the right circumstances; even when source and destination is not 16-byte aligned, you can often read (or write) via a temporary cache line in an appropriately aligned bit of stack.

A Duff's device loop is slower on the 68060. Modern processors don't like the unpredictable jmp instruction this generates, and the 68060 doesn't need an unrolled loop. This is what the old SAS/C copy routine uses though. Does it perform poorly on the 68060? Not too bad. Are you a lazy programmer because you chose a slower copy routine that is ten times bigger? Nope. You're covered because the 68k is forgiving.
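For reference, this is roughly what a Duff's device copy looks like in C. It is a generic sketch with a made-up name (`copy_duff`), not the SAS/C routine. The switch jumps into the middle of the unrolled loop body, and that computed jump is the unpredictable branch in question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Duff's device: a 4x unrolled copy where the switch enters the loop
   body part-way through to handle the count % 4 leftover longwords. */
static void copy_duff(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    if (count == 0)
        return;
    size_t passes = (count + 3) / 4;   /* loop passes, including the partial one */
    switch (count % 4) {
    case 0: do { *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}
```

The case labels fall through on purpose, so a compiler may warn about them.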

You are right about testing the least significant bits to align the data except you only need to align the destination. Reading unaligned data is not as bad as writing unaligned data. Move16 is barely worth it on the Amiga at all. You have to copy several thousand bytes to make it worthwhile. Copies this large are rare on 68k Amigas. Here is a patch I wrote for CopyMem() and CopyMemQuick() on the 68060 and 68040...

http://aminet.net/util/boot/CopyMem.readme (readme)
http://aminet.net/util/boot/CopyMem.lha

It has assembler sources, timings, a Snoopy script that will show you the sizes and alignments of CopyMem() and CopyMemQuick() copies, and more info. The 68060 optimized code is only 35% faster than the AmigaOS 3.9 functions, which use a poorly implemented movem.l loop. They should have used an unrolled loop for the best performance on 68020-68060, considering most copies are small. More lazy programmers?
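The "only align the destination" idea from above can be sketched in C like this. It is not the code from the Aminet patch, just an illustration with a made-up name (`copy_align_dst`). The source is read through memcpy because portable C has no direct unaligned load, which the 68020+ does in hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes, aligning only the destination to a longword boundary. */
static void copy_align_dst(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: single bytes until the destination is longword aligned. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }
    /* Body: one longword at a time. d is aligned here; s may not be. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, s, 4);   /* possibly unaligned read */
        memcpy(d, &w, 4);   /* write to a longword aligned address */
        d += 4;
        s += 4;
        n -= 4;
    }
    /* Tail: the remaining 0-3 bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

Only the writes are guaranteed aligned, which matches the point that unaligned reads cost less than unaligned writes.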
