Amiga.org

Amiga computer related discussion => Amiga Hardware Issues and discussion => Topic started by: KennyR on December 31, 2003, 03:55:33 PM

Title: Some interesting Altivec figures
Post by: KennyR on December 31, 2003, 03:55:33 PM
Dunno if people are aware of this already, but... Wanna see how powerful Altivec is?

RC5 isn't a general benchmark, let's get that straight first off. It's not even a particularly useful one. It takes no account of system speed, and only uses raw number-crunching abilities of the CPU and its internal cache and registers.

But anyway, my Pegasos G3/600 can do around 2 million keys per second. My Athlon 1.3GHz PC can do around 4 million. Pentiums are considerably weaker, and maybe more modern Athlons are too.

But from what I've heard, Pegasos-2 G4/1GHz using the Altivec core can manage 10 million keys per second. Presumably an AmigaONE/G4 can do the same. G4 Macs certainly can (http://volker.preil.bei.t-online.de/bench/bench-rc5-72.htm).

Note that this is not an open invitation for another boring x86 vs PPC thread (since this info has little practicality in everyday use), and I just thought people would be interested in seeing the power of Altivec, when used for what it was really designed for - raw maths.
Title: Re: Some interesting Altivec figures
Post by: KennyR on December 31, 2003, 04:11:32 PM
More precise benchmarks (http://www.iki.fi/sintonen/dnetc-peg2-vec.txt).
Title: Re: Some interesting Altivec figures
Post by: AntonioX on December 31, 2003, 07:08:10 PM
Maybe I have to start using it again for Team Amiga, as I have a PowerMac dual 867  :-D
Title: Re: Some interesting Altivec figures
Post by: bloodline on December 31, 2003, 07:42:59 PM
Quote

KennyR wrote:
But anyway, my Pegasos G3/600 can do around 2 million keys per second. My Athlon 1.3GHz PC can do around 4 million. Pentiums are considerably weaker, and maybe more modern Athlons are too.



Well you show me some Athlon64 benchmarks and cheer me up :-D

Anyway, happy new year, and what the hell happened to your boings and Rank?!?! :-o
Title: Re: Some interesting Altivec figures
Post by: KennyR on December 31, 2003, 07:57:14 PM
Problem with more modern x86 is that they removed an instruction used by rc5, to streamline the core, or so I'm told - I don't know much about it. So Athlon64 wouldn't necessarily mean faster rc5 cracking, although it would beat the proverbial crap out of G4 every other way.

Isn't there an Athlon64 listed in that 2nd url I gave, too? Is it the same one you mean? If so it's about 6 million keys at 1.6GHz.

Quote
Anyway, happy new year, and what the hell happened to your boings and Rank?!?! :-o


I transcended. Happy new year, mortal. :-D
Title: Re: Some interesting Altivec figures
Post by: Lo on December 31, 2003, 08:14:00 PM
Quote
I transcended. Happy new year, mortal.


Ah,  hence the avatar.  Did you have to do in the DemiGod Santa Claus?   :-P
Title: Re: Some interesting Altivec figures
Post by: Aragorn on December 31, 2003, 08:53:20 PM
http://n0cgi.distributed.net/speed/
OGR
my AMD Athlon XP Barton 2600+
14,937,961 nodes/sec

PowerPC 744x/745x G4 1000
10,680,517 nodes/sec

Title: Re: Some interesting Altivec figures
Post by: KennyR on December 31, 2003, 08:57:35 PM
@Aragorn

That's OGR nodes. We're talking RC5-72 keys.

Edit: though it doesn't matter, sorry. I didn't see you were comparing different CPUs fairly.
Title: Re: Some interesting Altivec figures
Post by: bloodline on December 31, 2003, 09:12:31 PM
Quote

Aragorn wrote:
http://n0cgi.distributed.net/speed/
OGR
my AMD Athlon XP Barton 2600+
14,937,961 nodes/sec

PowerPC 744x/745x G4 1000
10,680,517 nodes/sec



That looks about right, and what one would expect if we were to compare Clock Speed of the AthlonXP (rather than AMD PR ratings) and the G4.

That seems to support my argument that clock for clock the modern PPC and Athlons are virtually identical in performance :-)
Title: Re: Some interesting Altivec figures
Post by: AmFreak on December 31, 2003, 10:48:22 PM
Quote

bloodline wrote:
Quote

Aragorn wrote:
http://n0cgi.distributed.net/speed/
OGR
my AMD Athlon XP Barton 2600+
14,937,961 nodes/sec

PowerPC 744x/745x G4 1000
10,680,517 nodes/sec



That looks about right, and what one would expect if we were to compare Clock Speed of the AthlonXP (rather than AMD PR ratings) and the G4.

That seems to support my argument that clock for clock the modern PPC and Athlons are virtually identical in performance :-)



That's because OGR doesn't use Altivec ...
Title: Re: Some interesting Altivec figures
Post by: Piru on December 31, 2003, 11:47:48 PM
Quote
PowerPC 744x/745x G4 1000
10,680,517 nodes/sec

Wow. What a poor result. Dunno if that's from old OGR core or something, but I get:

13,016,835 nodes/sec.

PowerPC 7447 G4 1000. That's 21.9% more than the result on the page. MacOS overhead perhaps? :-)
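
For anyone who wants to check that percentage, here is a minimal sketch in Python. The function name `percent_more` is made up for illustration; the two figures are the ones quoted in this post.

```python
def percent_more(new, old):
    """Return how much larger `new` is than `old`, as a percentage of `old`."""
    return (new - old) / old * 100.0

page_result = 10_680_517  # nodes/sec listed on the speed page for the G4 1000
own_result = 13_016_835   # nodes/sec reported above for the 7447 G4 1000

print(f"{percent_more(own_result, page_result):.1f}% more")
```

The same helper works for any pair of results, as long as both are in the same unit (OGR nodes/sec or RC5-72 keys/sec, not a mix).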
Title: Re: Some interesting Altivec figures
Post by: Hammer on January 01, 2004, 02:02:21 AM
Quote
RC5 isn't a general benchmark, let's get that straight first off. It's not even a particularly useful one. It takes no account of system speed, and only uses raw number-crunching abilities of the CPU and its internal cache and registers.

RC5 is not the only benchmark to test for raw number-crunching abilities of the CPU and its internal cache and registers.

Why not OpenSSL benchmarks (it should fit within full size L2 cache)?

Quote

But anyway, my Pegasos G3/600 can do around 2 million keys per second. My Athlon 1.3GHz PC can do around 4 million. Pentiums are considerably weaker,

Why not try it with Intel "Pentium M @1.3Ghz"?...

Quote

and maybe more modern Athlons are too.

In general terms, the Thunderbird core (Model 4) is considered weaker than the Barton core (Model 10).

My old AMD K7 Athlon XP (Palomino core) @1.5GHz/FSB266/NT5.1 yields
 RC5-72: [5,640,341 keys/sec]

I may try it on the other Athlon XP 2.2Ghz/400FSB/nForce2 400 Ultra and/or HP’s spec Athlon XP 2.33Ghz later…
Title: Re: Some interesting Altivec figures
Post by: Dr_Bombcrater on January 01, 2004, 03:52:46 AM
RC5 depends heavily on specific bitwise rotation instructions, which are rarely used by anything else. The figures on that page don't tell you anything other than how fast those chips can run the RC5 client.
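
To make the rotation point concrete, here is a rough Python sketch of the RC5 round structure for 32-bit words, where the rotate count comes from the data itself. The subkey table is made up for illustration (it is not the real RC5 key schedule), and the real client cores are tuned per CPU and look nothing like this.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (only the low 5 bits of n count)."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32

def rotr32(x, n):
    """Rotate a 32-bit value right by n bits (only the low 5 bits of n count)."""
    n &= 31
    return ((x >> n) | (x << (32 - n))) & MASK32

def rc5_encrypt_block(a, b, s):
    """Encrypt one two-word block with an already expanded subkey table s."""
    rounds = len(s) // 2 - 1
    a = (a + s[0]) & MASK32
    b = (b + s[1]) & MASK32
    for i in range(1, rounds + 1):
        # Two data-dependent rotates per round: this is the hot instruction.
        a = (rotl32(a ^ b, b) + s[2 * i]) & MASK32
        b = (rotl32(b ^ a, a) + s[2 * i + 1]) & MASK32
    return a, b

def rc5_decrypt_block(a, b, s):
    """Undo rc5_encrypt_block by running the rounds backwards."""
    rounds = len(s) // 2 - 1
    for i in range(rounds, 0, -1):
        b = rotr32((b - s[2 * i + 1]) & MASK32, a) ^ a
        a = rotr32((a - s[2 * i]) & MASK32, b) ^ b
    return (a - s[0]) & MASK32, (b - s[1]) & MASK32

# Hypothetical subkeys for a 12-round cipher (26 words), NOT a real key schedule.
subkeys = [(0x9E3779B9 * (i + 1)) & MASK32 for i in range(26)]

cipher = rc5_encrypt_block(0x01234567, 0x89ABCDEF, subkeys)
print([hex(w) for w in cipher])
print(rc5_decrypt_block(*cipher, subkeys) == (0x01234567, 0x89ABCDEF))
```

With 12 rounds that is 24 variable rotates per block tested, so a chip with a fast rotate does well here regardless of how it performs elsewhere.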

As I recall the old AMD K5 was a superstar at RC5 despite being an otherwise lacklustre processor simply because the instructions that RC5 executes repeatedly worked very quickly on the K5 core.

On the vast majority of tasks a modern x86 processor will crush a G4, if only because PPC systems tend to have poor infrastructure surrounding the CPU (PC133 memory, slow busses, etc).

I think we need to accept that and move on. CPU speed is no longer a critical factor in how useful a machine is anyway.
Title: Re: Some interesting Altivec figures
Post by: Hammer on January 01, 2004, 03:54:43 AM
Quote
More precise benchmarks.

Some minor issues;

How does one get an AMD Athlon XP with "Palomino" core at 3200MHz?

My "Palomino" core max'ed out at 1.7Ghz with 1.8 core volts.

Some more minor issues;

AMD Athlon XP (Barton)@ 2500MHz doesn't exist.
AMD Athlon MP (Core??)@ 2600MHz doesn't exist.
AMD Athlon XP (Barton)@ 2400MHz doesn't exist. The fastest Athlon XP with a Barton core is HP's Athlon XP 3200+ @2.33Ghz.
AMD Athlon XP (Thoroughbred) 2600MHz doesn't exist.

AMD Athlon's so-called ratings are just model numbers.

I use dnetc v2.9003-481-GTR-03030111 for Win32.

PS; My AMD K7 Athlon XP 2600+ @ 2.08Ghz with Thoroughbred-B (256KB L2) core yields ~7,900,000 (with other server applications in operation).

Please note that there are several “Athlon XP 2600+” types on the market, i.e.
1. Thoroughbred-A/B @ 2.16GHz 266FSB.
2. Thoroughbred-B @ 2.08GHz 333FSB.
3. I’m not aware of a Barton core with a “2600+” model number. Who knows what AMD can think of next?
4. Barton @ 1.8GHz has a model 2500+.
Title: Re: Some interesting Altivec figures
Post by: Kronos on January 01, 2004, 04:07:09 AM
@bloodline

?????
The number for the AthlonXP is about 40% higher than for the G4, and I have a hard time believing that a 2600+ would actually run at only 1.4GHz, the same speed that my (1st gen) 1600XP runs at  :-o
Title: Re: Some interesting Altivec figures
Post by: Hammer on January 01, 2004, 04:09:51 AM
Quote
On the vast majority of tasks a modern x86 processor will crush a G4, if only because PPC systems tend to have poor infrastructure surrounding the CPU (PC133 memory, slow busses, etc).

One could compare G5 vs K8 vs K7 vs PIV EE vs PIV-C 3.2Ghz.

Note that an AMD Athlon XP 2600+ can still be installed on an MSI-6330 V5 (i.e. VIA KT133A/PC133 based).
Title: Re: Some interesting Altivec figures
Post by: Aragorn on January 01, 2004, 04:48:20 AM
3. I’m not aware of a Barton core with a “2600+” model number. Who knows what AMD can think of next?

My Athlon XP Barton is a 2600+
it runs at 1.92GHz
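
With the actual clock known, the clock-for-clock question can be checked directly. A small Python sketch, assuming the OGR figures posted earlier in the thread were measured at these clocks (1.92GHz is taken as 1920 MHz, and the helper name `nodes_per_mhz` is made up for illustration):

```python
# (label, clock in MHz, OGR nodes/sec) using only figures quoted in this thread.
results = [
    ("Athlon XP Barton 2600+", 1920, 14_937_961),
    ("PowerPC 744x/745x G4 (speed page)", 1000, 10_680_517),
    ("PowerPC 7447 G4 (Piru)", 1000, 13_016_835),
]

def nodes_per_mhz(nodes_per_sec, mhz):
    """Normalise a throughput figure by clock speed."""
    return nodes_per_sec / mhz

for label, mhz, nodes in results:
    print(f"{label:36s} {nodes_per_mhz(nodes, mhz):10.0f} nodes/sec per MHz")
```

On those assumptions the Barton comes out at roughly 7,800 nodes/sec per MHz against roughly 10,700 for the G4 entry on the speed page, so the two are not identical per clock for OGR.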
Title: Re: Some interesting Altivec figures
Post by: Piru on January 01, 2004, 07:18:14 AM
I've updated the KKS 7450 core to latest version. The RC5-72 result is now 10,678,428 keys/sec. The old core did 10,002,868 keys/sec.
Title: Re: Some interesting Altivec figures
Post by: KennyR on January 01, 2004, 11:15:37 AM
Quote
Hammer wrote:
RC5 is not the only benchmark to test for raw number-crunching abilities of the CPU and its internal cache and registers.

Why not OpenSSL benchmarks (it should fit within full size L2 cache)?


This isn't really an exercise to prove any magical superiority of PPC over x86, simply to show how powerful Altivec was in its element. RC5 just happened to be around. As Dr_Bombcrater said, it just happens to depend on how good the CPU core is for certain instructions. That's why x86 CPUs are relatively weak at RC5 - they had many normally obscure instructions like those used for rc5 removed for better overall speed.

Oh, and isn't L2 cache external? Seems to me you can get a lot more number-crunching speed by not using external cache at all, and I think the RC5 core does fit in L1.

Anyway...

Since you know a lot about CPUs, can you answer this - why was an equivalent of Altivec not implemented in x86 cores? Was it a marketing issue (with Altivec speed boost being 'invisible' to consumers, and higher clock speed being very visible)? Was it not possible to implement? Or was it just useless?
Title: Re: Some interesting Altivec figures
Post by: Aragorn on January 01, 2004, 04:31:54 PM
"Oh, and isn't L2 cache external? Seems to me you can get a lot more number-crunching speed by not using external cache at all, and I think the RC5 core does fit in L1."

Depends on the processor. The AMD K6-2 has external level 2, and the old Pentium had external level 2 as well. Those Slot Pentium 3s have sort-of external level 2. The PPC 604 and 603 have external level 2 if they have any at all. Don't know how it is with new PPC chips.
K6-3, Athlon (XP, 64), Duron, Opteron and P4 all have level 2 in the processor. In the case of the K6-3, since it used the normal Super Socket 7 like the K6-2, it used the cache on the motherboard as level 3
 :-o
Title: Re: Some interesting Altivec figures
Post by: bloodline on January 01, 2004, 04:49:21 PM
Quote

Since you know a lot about CPUs, can you answer this - why was an equivalent of Altivec not implemented in x86 cores? Was it a marketing issue (with Altivec speed boost being 'invisible' to consumers, and higher clock speed being very visible)? Was it not possible to implement? Or was it just useless?
 


It is implemented in x86 CPUs, it's called MMX, MMX2, 3DNow!, SSE, and SSE2... :-)
Title: Re: Some interesting Altivec figures
Post by: KennyR on January 01, 2004, 04:52:17 PM
Quote
Bloodline wrote:
It is implemented in x86 CPUs, it's called MMX, MMX2, 3DNow!, SSE, and SSE2...


Boo, they were marketing gimmicks that only slowed the CPU down by adding more instructions to increase the instruction decode time per cycle. Their effect was negligible. Altivec's obviously isn't.
Title: Re: Some interesting Altivec figures
Post by: bloodline on January 01, 2004, 05:01:56 PM
Quote

KennyR wrote:
Quote
Bloodline wrote:
It is implemented in x86 CPUs, it's called MMX, MMX2, 3DNow!, SSE, and SSE2...


Boo, they were marketing gimmicks that only slowed the CPU down by adding more instructions to increase the instruction decode time per cycle. Their effect was negligible. Altivec's obviously isn't.


The Altivec is just an FPU that is designed to perform vector math very fast... that is what those "Marketing Gimmicks" are too...

The original Intel version of MMX sucked as it used the x87 registers, but AMD and all later variants of these units add their own registers.
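
A rough picture of what "vector math" means here, as a Python sketch. It assumes a 128-bit register treated as four 32-bit integer lanes, and the function names `vec_add` and `vec_rotl` are made up for illustration, not real Altivec or SSE intrinsics.

```python
MASK32 = 0xFFFFFFFF

def vec_add(a, b):
    """Lane-wise 32-bit add: one 'instruction', four results."""
    return tuple((x + y) & MASK32 for x, y in zip(a, b))

def vec_rotl(a, counts):
    """Lane-wise rotate left; each lane uses its own rotate count."""
    out = []
    for x, n in zip(a, counts):
        n &= 31
        out.append(((x << n) | (x >> (32 - n))) & MASK32)
    return tuple(out)

# Four independent work items processed side by side (values are arbitrary).
lanes = (0x00000001, 0x80000000, 0x12345678, 0xFFFFFFFF)

print([hex(v) for v in vec_add(lanes, (1, 1, 1, 1))])
print([hex(v) for v in vec_rotl(lanes, (1, 1, 4, 8))])
```

A scalar core does the same work one value at a time, which is why a key-cracking loop that can test several keys in parallel benefits so much from a vector unit.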
Title: Re: Some interesting Altivec figures
Post by: Hammer on January 02, 2004, 12:32:29 AM
Quote

KennyR wrote:
Quote
Bloodline wrote:
It is implemented in x86 CPUs, it's called MMX, MMX2, 3DNow!, SSE, and SSE2...


Boo, they were marketing gimmicks that only slowed the CPU down by adding more instructions to increase the instruction decode time per cycle. Their effect was negligible. Altivec's obviously isn't.

Actually, it does make a difference IF it is done right for a given x86 processor, e.g. to remain competitive with the AMD Athlon XP, Intel’s Pentium IV has to rely on SSE2 code more than x87 code.

Note that the PowerPC 970 has to decode or “crack” its PowerPC instructions for the relatively new out-of-order post-RISC core.

Pentium MMX’s design has almost zero relation to the modern RISC86 cores.
Title: Re: Some interesting Altivec figures
Post by: Hammer on January 02, 2004, 12:49:54 AM
Quote
This isn't really an exercise to prove any magical superiority of PPC over x86, simply to show how powerful Altivec was in its element

One could also use PPC optimised CineBench (Beta)(MacOS) for such things.

Quote
they had many normally obscure instructions like those used for rc5 removed for better overall speed.

They could be saving on the transistor count...

Quote
Oh, and isn't L2 cache external?

Ever since the Celeron 300A, most modern x86 cores have had integrated L2 cache. These include:
- Pentium III
- Pentium M
- Pentium IV
- K8 Opteron/AthlonFX/Athlon64
- K7 Athlon Thunderbird (not Athlon Classic)
- K7 Athlon XP
- K7 Duron
- K6-III

One could include Intel’s Pentium Pro since it has full speed L2 cache but with two dies.  
Title: Re: Some interesting Altivec figures
Post by: Hammer on January 02, 2004, 12:55:17 AM
Quote

Aragorn wrote:
3. I’m not aware of a Barton core with a “2600+” model number. Who knows what AMD can think of next?

My Athlon XP Barton is a 2600+
it runs at 1.92GHz

Sounds logical enough...
Title: Re: Some interesting Altivec figures
Post by: Blitter on January 02, 2004, 01:01:09 AM
It sure seems logical.  Heck my Athlon 64 FX-51 3200+ runs at a cool 2.0Ghz.

Yes, it's definitely as fast as a Barton core Athlon running at 3.2GHz like the name would describe, but that's just in 32-bit operations. Move it to 64-bit operations and there's nothing Intel has to offer that will touch it. The Itanium is a kludge... just look at the coding reviews on it. Hell, M$ of all ppl won't even embrace it and told Intel to change or suck it, basically.
Title: Re: Some interesting Altivec figures
Post by: Hammer on January 02, 2004, 01:22:06 AM
Quote

Blitter wrote:
It sure seems logical.  Heck my Athlon 64 FX-51 3200+ runs at a cool 2.0Ghz.

Note that "Athlon FX-51" (Sledge Hammer core) runs at 2.2Ghz, while "Athlon 64 3200+"/"3000+"(Claw Hammer core) runs at 2.0Ghz.
Title: Re: Some interesting Altivec figures
Post by: Blitter on January 02, 2004, 01:25:05 AM
Quote
Note that "Athlon FX-51" (Sledge Hammer core) runs at 2.2Ghz, while "Athlon 64 3200+"/"3000+"(Claw Hammer core) runs at 2.0Ghz.


You are correct, my mental fudge.
Title: Re: Some interesting Altivec figures
Post by: Hammer on January 02, 2004, 01:30:31 AM
Quote
Hell M$ of all ppl won't even embrace it and told Intel to change or suck it, basically.

MS Windows Anvil (AMD64 edition) is not quite ready (currently at Beta stage) for RTM status.

MS Windows Anvil is quite different to MS Windows XP Itanium Edition since Anvil is geared towards legacy and high performance gaming. MS Windows XP Itanium Edition is just geared towards PC workstations (e.g.  Itanium Deerfield base systems) type activities.
Title: Re: Some interesting Altivec figures
Post by: Blitter on January 02, 2004, 01:37:48 AM
Quote
MS Windows Anvil (AMD64 edition) is not quite ready (currently at Beta stage) for RTM status.

MS Windows Anvil is quite different to MS Windows XP Itanium Edition since Anvil is geared towards legacy and high performance gaming. MS Windows XP Itanium Edition is just geared towards PC workstations (e.g. Itanium Deerfield base systems) type activities.


I guess that was my point. The Itanium, which Intel was actually going to market as a desktop CPU... is far from that. Not only did it NOT meet expectations in the workstation/server market, but it also floundered as a desktop CPU. Hence the reason why I mention it as being a "Kludge" of a CPU. Okay, I may have misquoted myself there, but I still stand by the architecture of the Itanium being ####e!

But that's just my opinion and I'm entitled to it. :-P

Happy New Year!
Title: Re: Some interesting Altivec figures
Post by: Hammer on January 02, 2004, 01:50:26 AM
Quote
Yes, it's definitely as fast as a Barton core Athlon running at 3.2GHz like the name would describe, but that's just in 32-bit operations

Careful with generalisations, e.g. in bandwidth-biased apps, games and SSE2-type activities the Athlon FX-51 rivals the P4 EE, while the Athlon 64 3200+ rivals the P4-C 3.2GHz.

The fastest Athlon XP 3200+ @ 2.33Ghz variant, can still win some non-gaming benchmarks against Athlon FX51 @2.2Ghz.  
Title: Re: Some interesting Altivec figures
Post by: Hammer on January 02, 2004, 02:05:20 AM
Quote
but I still stand by the architecture of the Itanium being ####e!

I didn’t say Itanium was a “cost effective” workstation PC btw. In relation to Itanium and for "bang for buck" cases; even Apple’s PowerMac G5 has a chance**

PS; Tweaking for legacy and high performance gaming may require more development time btw...