Author Topic: 2.6GHz WinUAE = 10 x 50MHz 68030  (Read 3207 times)

Offline Ilwrath

Re: 2.6GHz WinUAE = 10 x 50MHz 68030
« on: May 03, 2004, 04:28:25 AM »
Quote
The Benchmark program Sysinfo tells me that
my PC is 100 x as fast as my 50MHz 68030
and 250 x as fast as my 50MHz 68882 FPU
(PC vs A1200),

sadly the truth of the speed up is a mere 10 x,

Building Ghostscript 8.13 on my A1200 takes some 16 hours,

but on WinUAE it takes 1 hour 38 minutes,

ie its only 10 x as fast,


To me, a 10x improvement over a 50MHz 68030 sounds like a very impressive mark!  Especially for a processor as crappy as a Celeron!

I mean, that's an emulated 500MHz 68030.  From a budget low-cache 2600MHz chip.  That's only about 5 Celeron cycles per emulated 68030 cycle.  Honestly, I doubt the emulation (even with JIT) is that tight.
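For anyone who wants to check my arithmetic, here's a quick sketch using only the numbers quoted above.  The variable names are my own, and treating "speedup times 50MHz" as an "effective 68030 clock" is a rough way of looking at it, not a real measurement.

```python
# Figures quoted in this thread.
amiga_build_minutes = 16 * 60        # "some 16 hours" on the A1200
winuae_build_minutes = 1 * 60 + 38   # 1 hour 38 minutes under WinUAE

# Real-world speedup measured by the Ghostscript build.
speedup = amiga_build_minutes / winuae_build_minutes

real_030_mhz = 50     # the real 68030 in the A1200
host_mhz = 2600       # the 2.6GHz Celeron

# Rough "effective 68030 clock": the real clock scaled by the speedup.
effective_030_mhz = real_030_mhz * speedup

# Host cycles available per emulated 68030 cycle.
host_cycles_per_emulated_cycle = host_mhz / effective_030_mhz

print(f"build speedup:           {speedup:.1f}x")
print(f"effective 68030 clock:   {effective_030_mhz:.0f}MHz")
print(f"host cycles per 030 cycle: {host_cycles_per_emulated_cycle:.1f}")
```

The build ratio comes out just under 10x, which is where the "roughly 5 host cycles per emulated cycle" figure comes from.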

As you found, file I/O is a huge factor in compile times.  I imagine your Amiga HD is probably slower than the hard drive your emulator is running from.  That would be slowing down the Amiga's compile time...

And, of course, the real moral is, don't believe artificial benchmarks run on emulated systems.  Aren't there 1000 threads about this on here?
 

Offline Ilwrath

Re: 2.6GHz WinUAE = 10 x 50MHz 68030
« Reply #1 on: May 03, 2004, 10:43:44 PM »
Quote
But Sysinfo says the emulated machine is 108 x as fast,


Yes, but Sysinfo doesn't realize the machine is emulated.  What happens with the WinUAE JIT emulation is that the 68000-series (020/030, whatever) instructions are translated on the fly into x86 instructions.  During this translation there is an optimization phase to improve performance.  This stage is "optimizing" the benchmark, and making it invalid.  Most benchmarks work by looping through a section of code multiple times.  The emulation realizes what is happening, and keeps the translated code available, thus artificially raising speeds.  Or, sometimes, the JIT even optimizes the entire loop out of existence, and only returns a final result.  (Those are usually the benchmarks that crash, because the emulator returns such an outlandishly high performance result.)
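Here's a toy sketch of the "keeps the translated code available" part.  This is not how WinUAE's JIT is actually written; the ToyJit class, the two-instruction "guest code" format, and the address 0x1000 are all made up for illustration.  The only point is that a tight benchmark loop pays the translation cost once and then runs from the cache every time after that.

```python
class ToyJit:
    """Toy model of a translation cache (hypothetical, not WinUAE code)."""

    def __init__(self):
        self.cache = {}         # guest block address -> translated host function
        self.translations = 0   # how many times we paid the translation cost

    def translate(self, guest_block):
        """Stand-in for turning guest instructions into host code."""
        self.translations += 1

        def host_code(x):
            for op, operand in guest_block:
                if op == "add":
                    x += operand
                elif op == "mul":
                    x *= operand
            return x

        return host_code

    def run(self, addr, guest_block, x):
        # Translate only the first time this block address is seen.
        if addr not in self.cache:
            self.cache[addr] = self.translate(guest_block)
        return self.cache[addr](x)


jit = ToyJit()
loop_body = [("add", 3), ("mul", 2)]   # made-up "benchmark loop" body

value = 1
for _ in range(1000):
    # Same block address every iteration, like a benchmark's inner loop.
    value = jit.run(0x1000, loop_body, value) % 1000

print("loop iterations: 1000")
print("translations performed:", jit.translations)
```

A benchmark that spends all its time in one small loop looks great under this scheme.  A real program like a compiler keeps wandering into code that hasn't been translated yet, so it doesn't get the same free ride.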

So, you see, it isn't out of FUD or malice that people say emulation benchmarks don't work.  It's that it's really hard to measure emulated hardware.  The emulation is often smarter than the tests.  ;-)

Quote
can you put my CPU in perspective in terms of cache size + speed


The 2.6/400 is a pretty fast chip, but it has some issues...  3.4/800 I think is the fastest P4 on the market.  (We won't go into Opteron and Itanium stuff here -- I'm trying to keep this fairly short, and not mis-state anything -- if I do make a mistake, someone please feel free to correct me.)  

Caching -- The Celeron has 128K.  This is rather small for a modern processor.  The P4 has 512K, and the P4 Extreme has a full 1MB.  Anyhow, cache is kind of the working area, of sorts, for the processor.  There are lots of complex formulas and associative theories on how processors determine what gets cached and what gets flushed, etc.  To be honest, I don't really understand that part too well.  It's magic to me.  But, obviously, a smaller cache can't hold as much data as a larger cache.  More about this in a bit...

Ok.... Front Side Bus.  The 400MHz is your FSB speed.  That's the link between the processor and its RAM and bus area.  The 400MHz is fast, but the full P4 is able to double that up, to an effective 800MHz.  (These numbers aren't quite real, there is some "Double Data Rate" trickery, as well as interleaving, etc.)  But, anyhow, the basic idea is that under some conditions, Celeron RAM access can be quite a bit slower than the full P4.

Processor clock...  2.6GHz is a mighty fast processor clock.  That's the rate that the processor can work at with data it already has in its cache.  This is the strength of the Celeron.  It has a fast clock for cheap.

So a Celeron is quite fast for highly optimized code that fits in its cache.  Now, the problem is when things don't fit in the cache.  This happens more often on the Celeron, because it simply has less cache.  And it's a double-whammy, because now you have RAM access involved, which is over the slower 400MHz (effective) bus.
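To put the double-whammy in rough numbers, here's a sketch using the usual "average access time = hit time + miss rate x miss penalty" rule of thumb.  Every number below (hit cost, miss rates, miss penalties) is invented purely for illustration.  I haven't measured either chip, so only the shape of the result means anything.

```python
def average_access_cycles(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average cost of one memory access, in CPU cycles.

    hit_cycles:          cost when the data is already in cache
    miss_rate:           fraction of accesses that miss the cache (0.0 to 1.0)
    miss_penalty_cycles: extra cycles spent going out to RAM on a miss
    """
    return hit_cycles + miss_rate * miss_penalty_cycles


# Hypothetical figures, NOT measurements of real chips:
# smaller cache -> more misses; slower bus -> each miss costs more.
small_cache_slow_bus = average_access_cycles(
    hit_cycles=2, miss_rate=0.10, miss_penalty_cycles=300)
large_cache_fast_bus = average_access_cycles(
    hit_cycles=2, miss_rate=0.04, miss_penalty_cycles=200)

print(f"small cache, slow bus: {small_cache_slow_bus:.1f} cycles per access")
print(f"large cache, fast bus: {large_cache_fast_bus:.1f} cycles per access")
print(f"ratio: {small_cache_slow_bus / large_cache_fast_bus:.1f}x")
```

The two penalties multiply rather than add, which is why the gap can end up bigger than either the cache size or the bus speed would suggest on its own.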

So more random/less optimized code can really bog a Celeron down a lot.  I would think that because of this, emulations would not run well on a Celeron.  I haven't actually put this to the test.  If you'd like, PM me, and maybe we can set up a P4 UAE test vs. a Celeron UAE.  Might be interesting results.  It may be a chance to prove my theory wrong.  

Anyhow, this concludes this lesson on processors and emulations, as I understand them.  ;-)