Amiga.org
Amiga computer related discussion => Amiga Hardware Issues and discussion => Topic started by: KennyR on December 31, 2003, 03:55:33 PM
-
Dunno if people are aware of this already, but... Wanna see how powerful Altivec is?
RC5 isn't a general benchmark, let's get that straight first off. It's not even a particularly useful one. It takes no account of system speed, and only uses raw number-crunching abilities of the CPU and its internal cache and registers.
But anyway, my Pegasos G3/600 can do around 2 million keys per second. My Athlon 1.3GHz PC can do around 4 million. Pentiums are considerably weaker, and maybe more modern Athlons are too.
But from what I've heard, Pegasos-2 G4/1GHz using the Altivec core can manage 10 million keys per second. Presumably an AmigaONE/G4 can do the same. G4 Macs certainly can (http://volker.preil.bei.t-online.de/bench/bench-rc5-72.htm).
Note that this is not an open invitation for another boring x86 vs PPC thread (since this info has little practicality in everyday use), and I just thought people would be interested in seeing the power of Altivec, when used for what it was really designed for - raw maths.
-
More precise benchmarks (http://www.iki.fi/sintonen/dnetc-peg2-vec.txt).
-
Maybe I'll have to start using it again for Team Amiga, as I have a PowerMac dual 867 :-D
-
KennyR wrote:
But anyway, my Pegasos G3/600 can do around 2 million keys per second. My Athlon 1.3GHz PC can do around 4 million. Pentiums are considerably weaker, and maybe more modern Athlons are too.
Well you show me some Athlon64 benchmarks and cheer me up :-D
Anyway, happy new year, and what the hell happened to your boings and Rank?!?! :-o
-
Problem with more modern x86 is that they removed an instruction used by rc5, to streamline the core, or so I'm told - I don't know much about it. So Athlon64 wouldn't necessarily mean faster rc5 cracking, although it would beat the proverbial crap out of G4 every other way.
Isn't there an Athlon64 listed in that 2nd url I gave, too? Is it the same one you mean? If so it's about 6 million keys at 1.6GHz.
Anyway, happy new year, and what the hell happened to your boings and Rank?!?! :-o
I transcended. Happy new year, mortal. :-D
-
I transcended. Happy new year, mortal.
Ah, hence the avatar. Did you have to do in the DemiGod Santa Claus? :-P
-
http://n0cgi.distributed.net/speed/
OGR
my AMD Athlon XP Barton 2600+
14,937,961 nodes/sec
PowerPC 744x/745x G4 1000
10,680,517 nodes/sec
-
@Aragorn
That's OGR nodes. We're talking RC5-72 keys.
Edit: though it doesn't matter, sorry. I didn't see you were comparing different CPUs fairly.
-
Aragorn wrote:
http://n0cgi.distributed.net/speed/
OGR
my AMD Athlon XP Barton 2600+
14,937,961 nodes/sec
PowerPC 744x/745x G4 1000
10,680,517 nodes/sec
That looks about right, and what one would expect if we were to compare Clock Speed of the AthlonXP (rather than AMD PR ratings) and the G4.
That seems to support my argument that clock for clock the modern PPC and Athlons are virtually identical in performance :-)
-
bloodline wrote:
Aragorn wrote:
http://n0cgi.distributed.net/speed/
OGR
my AMD Athlon XP Barton 2600+
14,937,961 nodes/sec
PowerPC 744x/745x G4 1000
10,680,517 nodes/sec
That looks about right, and what one would expect if we were to compare Clock Speed of the AthlonXP (rather than AMD PR ratings) and the G4.
That seems to support my argument that clock for clock the modern PPC and Athlons are virtually identical in performance :-)
That's because OGR doesn't use Altivec...
-
PowerPC 744x/745x G4 1000
10,680,517 nodes/sec
Wow. What a poor result. Dunno if that's from old OGR core or something, but I get:
13,016,835 nodes/sec.
PowerPC 7447 G4 1000. That's 21.9% more than the result on the page. MacOS overhead perhaps? :-)
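For anyone checking the arithmetic, the 21.9% figure can be reproduced from the two node rates quoted in this post. A minimal Python sketch; `percent_faster` is just an illustrative helper name, not anything from dnetc:

```python
def percent_faster(new_rate, old_rate):
    """Return how much faster new_rate is than old_rate, in percent."""
    return (new_rate / old_rate - 1.0) * 100.0


# OGR node rates quoted in the thread, in nodes/sec.
page_result = 10_680_517   # PowerPC 744x/745x G4 1000, from the speed page
own_result = 13_016_835    # PowerPC 7447 G4 1000, measured by the poster

print(f"{percent_faster(own_result, page_result):.1f}% faster")
```

This only compares the two published rates; it says nothing about why they differ (core version, OS overhead, and so on).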
-
RC5 isn't a general benchmark, let's get that straight first off. It's not even a particularly useful one. It takes no account of system speed, and only uses raw number-crunching abilities of the CPU and its internal cache and registers.
RC5 is not the only benchmark to test for raw number-crunching abilities of the CPU and its internal cache and registers.
Why not OpenSSL benchmarks (it should fit within full size L2 cache)?
But anyway, my Pegasos G3/600 can do around 2 million keys per second. My Athlon 1.3GHz PC can do around 4 million. Pentiums are considerably weaker,
Why not try it with Intel "Pentium M @1.3Ghz"?...
and maybe more modern Athlons are too.
In general terms, the Thunderbird core (Model 4) is considered weaker than Barton core (Model 10).
My old AMD K7 Athlon XP (Palomino Core) @1.5Ghz/FSB266/NT5.1, yields
RC5-72: [5,640,341 keys/sec]
I may try it on the other Athlon XP 2.2Ghz/400FSB/nForce2 400 Ultra and/or HP’s spec Athlon XP 2.33Ghz later…
-
RC5 depends heavily on specific bitwise rotation instructions, which are rarely used by anything else. The figures on that page don't tell you anything other than how fast those chips can run the RC5 client.
As I recall the old AMD K5 was a superstar at RC5 despite being an otherwise lacklustre processor simply because the instructions that RC5 executes repeatedly worked very quickly on the K5 core.
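To show what "bitwise rotation" means here, a rough Python sketch of a 32-bit rotate and of one RC5 round built on it. The round follows the published RC5 definition, but the key words are passed in as plain arguments (the real key schedule is not shown), and none of this is taken from the dnetc cores:

```python
MASK32 = 0xFFFFFFFF  # RC5-32 works on 32-bit words


def rotl32(x, n):
    """Rotate the 32-bit word x left by n bits (only the low 5 bits of n count)."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32


def rc5_round(a, b, s_even, s_odd):
    """One RC5 encryption round on the word pair (a, b).

    s_even and s_odd stand in for two expanded-key words; real RC5 takes
    them from a key schedule that is not shown here.
    """
    a = (rotl32(a ^ b, b) + s_even) & MASK32   # rotate amount depends on the data
    b = (rotl32(b ^ a, a) + s_odd) & MASK32
    return a, b
```

The point is that the rotate amount changes with the data on every round, so a CPU with a slow variable rotate pays for it on every step of every key tested.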
On the vast majority of tasks a modern x86 processor will crush a G4, if only because PPC systems tend to have poor infrastructure surrounding the CPU (PC133 memory, slow busses, etc).
I think we need to accept that and move on. CPU speed is no longer a critical factor in how useful a machine is anyway.
-
More precise benchmarks.
Some minor issues;
How does one get an AMD Athlon XP with "Palomino" core at 3200MHz?
My "Palomino" core max'ed out at 1.7Ghz with 1.8 core volts.
Some minor issues;
AMD Athlon XP (Barton)@ 2500MHz doesn't exist.
AMD Athlon MP (Core??)@ 2600MHz doesn't exist.
AMD Athlon XP (Barton)@ 2400MHz doesn't exist. The fastest Athlon XP with a Barton core is HP's Athlon XP 3200+ @2.33Ghz.
AMD Athlon XP (Thoroughbred) 2600MHz doesn't exist.
AMD Athlon's so-called ratings are just model numbers.
I use dnetc v2.9003-481-GTR-03030111 for Win32.
PS; My AMD K7 Athlon XP 2600+ @ 2.08Ghz with Thoroughbred-B (256KB L2) core yields ~7,900,000 (with other server applications in operation).
Please note that there are several “Athlon XP 2600+” types in the market i.e.
1. Thoroughbred-A/B @ 2.16 Ghz 266FSB.
2. Thoroughbred-B @ 2.08Ghz 333FSB.
3. I’m not aware of a Barton core with a “2600+” model number. Who knows what AMD can think of next?
4. Barton @ 1.8Ghz has a model 2500+.
-
@bloodline
?????
The number for the AthlonXP is about 40% higher than for the G4, and I have a hard time believing that a 2600+ would actually run at only 1.4GHz, the same speed that my (1st gen) 1600XP runs at :-o
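The reasoning can be spelled out: if the two chips really did the same work per clock, the node-rate ratio would equal the clock ratio. A rough Python sketch using the OGR figures quoted above; the helper name is made up for illustration, and it assumes OGR throughput scales linearly with clock speed:

```python
# OGR rates quoted earlier in the thread, in nodes/sec.
athlon_xp_2600 = 14_937_961   # AMD Athlon XP Barton 2600+
g4_1000 = 10_680_517          # PowerPC 744x/745x G4 at 1000 MHz
g4_clock_mhz = 1000


def implied_clock_mhz(rate, ref_rate, ref_clock_mhz):
    """Clock that `rate` would imply if work per clock matched the reference chip."""
    return rate / ref_rate * ref_clock_mhz


ratio = athlon_xp_2600 / g4_1000
print(f"Athlon XP result is {ratio - 1:.0%} higher")
print(f"implied clock: {implied_clock_mhz(athlon_xp_2600, g4_1000, g4_clock_mhz):.0f} MHz")
```

So "identical clock for clock" only holds if the 2600+ really runs at about 1.4GHz.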
-
On the vast majority of tasks a modern x86 processor will crush a G4, if only because PPC systems tend to have poor infrastructure surrounding the CPU (PC133 memory, slow busses, etc).
One could compare G5 vs K8 vs K7 vs PIV EE vs PIV-C 3.2Ghz.
Note that an AMD Athlon XP 2600+ can still be installed on MSI-6330 V5 (i.e. VIA KT133A/PC133 based).
-
3. I’m not aware of a Barton core with a “2600+” model number. Who knows what AMD can think of next?
My Athlon XP Barton is a 2600+
it runs at 1.92GHz
-
I've updated the KKS 7450 core to latest version. The RC5-72 result is now 10,678,428 keys/sec. The old core did 10,002,868 keys/sec.
-
Hammer wrote:
RC5 is not the only benchmark to test for raw number-crunching abilities of the CPU and its internal cache and registers.
Why not OpenSSL benchmarks (it should fit within full size L2 cache)?
This isn't really an exercise to prove any magical superiority of PPC over x86, simply to show how powerful Altivec was in its element. RC5 just happened to be around. As Dr_Bombcrater said, it just happens to depend on how good the CPU core is for certain instructions. That's why x86 CPUs are relatively weak at RC5 - they had many normally obscure instructions like those used for rc5 removed for better overall speed.
Oh, and isn't L2 cache external? Seems to me you can get a lot more number-crunching speed by not using external cache at all, and I think the RC5 core does fit in L1.
Anyway...
Since you know a lot about CPUs, can you answer this - why was an equivalent of Altivec not implemented in x86 cores? Was it a marketing issue (with Altivec speed boost being 'invisible' to consumers, and higher clock speed being very visible)? Was it not possible to implement? Or was it just useless?
-
"Oh, and isn't L2 cache external? Seems to me you can get a lot more number-crunching speed by not using external cache at all, and I think the RC5 core does fit in L1."
Depends on the processor. The AMD K6-2 has external level 2, and the old Pentium had external too. Those slot Pentium IIIs have a sort of external level 2. And the PPC 604 and 603 have external level 2, if they have any at all.
Don't know how it is with new PPC chips.
K6-III, Athlon (XP, 64), Duron, Opteron and P4 all have level 2 in the processor. In the case of the K6-III, since it used a normal Super Socket 7 like the K6-2, it used the cache on the motherboard as level 3.
:-o
-
Since you know a lot about CPUs, can you answer this - why was an equivalent of Altivec not implemented in x86 cores? Was it a marketing issue (with Altivec speed boost being 'invisible' to consumers, and higher clock speed being very visible)? Was it not possible to implement? Or was it just useless?
It is implemented in x86 CPUs, it's called MMX, MMX2, 3DNow!, SSE, and SSE2... :-)
-
Bloodline wrote:
It is implemented in x86 CPUs, it's called MMX, MMX2, 3DNow!, SSE, and SSE2...
Boo, they were marketing gimmicks that only slowed the CPU down by adding more instructions to increase the instruction decode time per cycle. Their effect was negligible. Altivec's obviously isn't.
-
KennyR wrote:
Bloodline wrote:
It is implemented in x86 CPUs, it's called MMX, MMX2, 3DNow!, SSE, and SSE2...
Boo, they were marketing gimmicks that only slowed the CPU down by adding more instructions to increase the instruction decode time per cycle. Their effect was negligible. Altivec's obviously isn't.
The Altivec is just an FPU that is designed to perform vector math very fast... that is what those "Marketing Gimmicks" are too...
The original Intel version of MMX sucked as it used the x87 registers, but AMD and all later variants of these units add their own registers.
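To make "vector math" concrete: a vector unit applies one operation to several values at once. Here's a toy Python model, assuming 128-bit registers split into four 32-bit lanes; the function names are invented for illustration, and this is plain Python, not real Altivec or SSE code:

```python
MASK32 = 0xFFFFFFFF
LANES = 4  # assumed: a 128-bit vector register holding four 32-bit words


def rotl32(x, n):
    """Rotate one 32-bit word left by n bits."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32


def vec_rotl(v, amounts):
    """Lane-wise rotate: each 32-bit lane is rotated by its own amount."""
    assert len(v) == LANES and len(amounts) == LANES
    return [rotl32(x, n) for x, n in zip(v, amounts)]


def vec_add(v, w):
    """Lane-wise 32-bit add with wraparound."""
    return [(x + y) & MASK32 for x, y in zip(v, w)]
```

In this model each list stands for one register, so a key cracker could keep four candidate keys in flight per call. Real hardware would do the whole list in a single instruction instead of a loop, provided it offers that operation at all.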
-
KennyR wrote:
Bloodline wrote:
It is implemented in x86 CPUs, it's called MMX, MMX2, 3DNow!, SSE, and SSE2...
Boo, they were marketing gimmicks that only slowed the CPU down by adding more instructions to increase the instruction decode time per cycle. Their effect was negligible. Altivec's obviously isn't.
Actually, it does make a difference IF it's done right for a given x86 processor, i.e. to remain competitive with the AMD Athlon XP, Intel’s Pentium IV has to rely on SSE2 code more than x87 code.
Note that PowerPC 970 has to decode or “crush” its PowerPC instructions for the relatively new out-of-order post-RISC core.
Pentium MMX’s design has almost zero relation to the modern RISC86 cores.
-
This isn't really an exercise to prove any magical superiority of PPC over x86, simply to show how powerful Altivec was in its element
One could also use PPC optimised CineBench (Beta)(MacOS) for such things.
they had many normally obscure instructions like those used for rc5 removed for better overall speed.
They could be saving on the transistor count...
Oh, and isn't L2 cache external?
Ever since the Celeron 300A, most modern x86 cores have had integrated L2 cache. These are:
- Pentium III
- Pentium M
- Pentium IV
- K8 Opteron/AthlonFX/Athlon64
- K7 Athlon Thunderbird (not Athlon Classic)
- K7 Athlon XP
- K7 Duron
- K6-III
One could include Intel’s Pentium Pro since it has full speed L2 cache but with two dies.
-
Aragorn wrote:
3. I’m not aware of a Barton core with a “2600+” model number. Who knows what AMD can think of next?
My Athlon XP Barton is a 2600+
it runs at 1.92GHz
Sounds logical enough...
-
It sure seems logical. Heck my Athlon 64 FX-51 3200+ runs at a cool 2.0Ghz.
Yes it's definitely as fast as a Barton core Athlon running at 3.2Ghz like the name would describe, but that's just in 32bit operations. Move it to 64bit operations and there's nothing Intel has to offer that will touch it. The Itanium is a kludge... just look at the coding reviews on it. Hell M$ of all ppl won't even embrace it and told Intel to change or suck it, basically.
-
Blitter wrote:
It sure seems logical. Heck my Athlon 64 FX-51 3200+ runs at a cool 2.0Ghz.
Note that "Athlon FX-51" (Sledge Hammer core) runs at 2.2Ghz, while "Athlon 64 3200+"/"3000+"(Claw Hammer core) runs at 2.0Ghz.
-
Note that "Athlon FX-51" (Sledge Hammer core) runs at 2.2Ghz, while "Athlon 64 3200+"/"3000+"(Claw Hammer core) runs at 2.0Ghz.
You are correct, my mental fudge.
-
Hell M$ of all ppl won't even embrace it and told Intel to change or suck it, basically.
MS Windows Anvil (AMD64 edition) is not quite ready (currently at Beta stage) for RTM status.
MS Windows Anvil is quite different from MS Windows XP Itanium Edition, since Anvil is geared towards legacy and high performance gaming. MS Windows XP Itanium Edition is just geared towards PC workstation type activities (e.g. Itanium Deerfield based systems).
-
MS Windows Anvil (AMD64 edition) is not quite ready (currently at Beta stage) for RTM status.
MS Windows Anvil is quite different from MS Windows XP Itanium Edition, since Anvil is geared towards legacy and high performance gaming. MS Windows XP Itanium Edition is just geared towards PC workstation type activities (e.g. Itanium Deerfield based systems).
I guess that was my point. The Itanium, which Intel was actually going to market as a desktop CPU... is far from that. Not only did it NOT meet expectations in the workstation/server market, but it also floundered as a desktop CPU. Hence the reason why I mention it as being a "Kludge" of a CPU. Okay, I may have mis-quoted myself there, but I still stand by the architecture of the Itanium being ####e!
But that's just my opinion and I'm entitled to it. :-P
Happy New Year!
-
Yes it's definitely as fast as a Barton core Athlon running at 3.2Ghz like the name would describe, but that's just in 32bit operations
Careful with generalisations, e.g. in bandwidth-biased apps, games and SSE2 type activities the Athlon FX-51 rivals the P4 EE, while the Athlon 64 3200+ rivals the P4-C 3.2Ghz.
The fastest Athlon XP 3200+ variant @ 2.33Ghz can still win some non-gaming benchmarks against the Athlon FX-51 @ 2.2Ghz.
-
but I still stand by the architecture of the Itanium being ####e!
I didn’t say Itanium was a “cost effective” workstation PC btw. In relation to Itanium and for "bang for buck" cases; even Apple’s PowerMac G5 has a chance**
PS; Tweaking for legacy and high performance gaming may require more development time btw...