Amiga.org
Topic started by: Karlos on August 31, 2004, 12:50:43 AM
-
Hi folks,
I need to gather some performance information on the memory bandwidth available to classic systems using PowerPC cards.
I have written a small test app, available here (http://www.nyteshade.com/karlos/util/memspeedwos.lzx).
It requires:
AmigaOS 3.x (3.0 untested) / WarpOS 4+
Graphics card capable of supporting a 16-bit workbench
About 1.5MB fast RAM and 256KB chip RAM
It will open a 640x480 window on your 16/24/32-bit workbench (which it uses to obtain a valid VRAM offscreen surface for testing) - hence the RTG requirements.
If you can spare 5 minutes, please download it and run it. It will produce a listing like this one (from my system) that you should redirect to a file:
Memory/Bus bandwidth estimation (WarpOS/PPC) (c) Karl Churchill 2004
System Info
CPU : 603e [PVR:0x00070201]
CPU : 240.000 MHz
FSB : 60.000 MHz
Estimating VRAM access bandwidth
Read VRAM (64-bit) : 5357.07 KB/s
Read VRAM (32-bit) : 5167.71 KB/s
Read VRAM (16-bit) : 2568.80 KB/s
Read VRAM (8-bit) : 1297.22 KB/s
Write VRAM (64-bit) : 15212.21 KB/s
Write VRAM (32-bit) : 14341.32 KB/s
Write VRAM (16-bit) : 7136.84 KB/s
Write VRAM (8-bit) : 3573.41 KB/s
Estimating fast RAM access bandwidth [normal]
Read FAST (64-bit) : 70531.72 KB/s
Read FAST (32-bit) : 65485.35 KB/s
Read FAST (16-bit) : 59806.60 KB/s
Read FAST (8-bit) : 50414.33 KB/s
Write FAST (64-bit) : 40211.58 KB/s
Write FAST (32-bit) : 40062.42 KB/s
Write FAST (16-bit) : 40174.10 KB/s
Write FAST (8-bit) : 40063.94 KB/s
Estimating fast RAM access bandwidth [writethrough]
Read FAST (64-bit) : 70779.57 KB/s
Read FAST (32-bit) : 65424.45 KB/s
Read FAST (16-bit) : 59734.12 KB/s
Read FAST (8-bit) : 50471.28 KB/s
Write FAST (64-bit) : 44196.65 KB/s
Write FAST (32-bit) : 27602.69 KB/s
Write FAST (16-bit) : 13950.28 KB/s
Write FAST (8-bit) : 7001.60 KB/s
Estimating chip RAM access bandwidth
Read CHIP (64-bit) : 4280.86 KB/s
Read CHIP (32-bit) : 4251.75 KB/s
Read CHIP (16-bit) : 4238.68 KB/s
Read CHIP (8-bit) : 4254.50 KB/s
Write CHIP (64-bit) : 2680.55 KB/s
Write CHIP (32-bit) : 2695.44 KB/s
Write CHIP (16-bit) : 2699.28 KB/s
Write CHIP (8-bit) : 2680.32 KB/s
Estimating fast RAM [normal] to fast RAM [normal] bandwidth
FAST -> FAST (64-bit) : 39994.81 KB/s
FAST -> FAST (32-bit) : 40019.66 KB/s
FAST -> FAST (16-bit) : 39952.29 KB/s
FAST -> FAST (8-bit) : 31528.36 KB/s
Estimating fast RAM [normal] to fast RAM [writethrough] bandwidth
FAST -> FAST (64-bit) : 27710.04 KB/s
FAST -> FAST (32-bit) : 20138.16 KB/s
FAST -> FAST (16-bit) : 11738.51 KB/s
FAST -> FAST (8-bit) : 6379.13 KB/s
Estimating fast RAM [normal] to VRAM copy bandwidth
FAST -> VRAM (64-bit) : 12474.13 KB/s
FAST -> VRAM (32-bit) : 11891.99 KB/s
FAST -> VRAM (16-bit) : 6467.78 KB/s
FAST -> VRAM (8-bit) : 3399.76 KB/s
Estimating VRAM to RAM [normal] copy bandwidth
VRAM -> FAST (64-bit) : 4739.20 KB/s
VRAM -> FAST (32-bit) : 4593.28 KB/s
VRAM -> FAST (16-bit) : 2426.28 KB/s
VRAM -> FAST (8-bit) : 1257.86 KB/s
Estimating VRAM to RAM [writethrough] copy bandwidth
VRAM -> RAM (64-bit) : 4851.96 KB/s
VRAM -> RAM (32-bit) : 4461.07 KB/s
VRAM -> RAM (16-bit) : 2251.81 KB/s
VRAM -> RAM (8-bit) : 1119.06 KB/s
Estimating FAST [normal] to CHIP copy bandwidth
FAST -> CHIP (64-bit) : 2550.21 KB/s
FAST -> CHIP (32-bit) : 2548.12 KB/s
FAST -> CHIP (16-bit) : 2546.74 KB/s
FAST -> CHIP (8-bit) : 2530.97 KB/s
Estimating CHIP to RAM [normal] copy bandwidth
CHIP -> FAST (64-bit) : 3883.21 KB/s
CHIP -> FAST (32-bit) : 3901.75 KB/s
CHIP -> FAST (16-bit) : 3870.20 KB/s
CHIP -> FAST (8-bit) : 3760.80 KB/s
Estimating CHIP to RAM [writethrough] copy bandwidth
CHIP -> FAST (64-bit) : 3911.78 KB/s
CHIP -> FAST (32-bit) : 3734.43 KB/s
CHIP -> FAST (16-bit) : 3285.09 KB/s
CHIP -> FAST (8-bit) : 2632.86 KB/s
Usual disclaimer : program is a strain test and may contain bugs, run it at your own risk.
That said, I ran it dozens of times on my machine during development with no ill effects ;-)
Please email any results to karlchurcill@gmail.com
(yes, I misspelled my own surname when registering)
Thanks in advance,
K
-
Did you get enough info from me, or do you want my slightly unfair results? :-D
-
Oops,
Forgot to add, please give a little info on your system:
Model Amiga (A1200, A4K, A3K etc)
Model Accelerator & 680x0 CPU
Memory capacity and speed (eg 60ns, 70ns etc)
Graphics card & interface used (local bus, mediator etc)
AmigaOS / WarpOS / CGX / P96 Versions
Number of fillings, name of dentist :lol:
Thanks :-)
-
redrumloa wrote:
Did you get enough info from me, or do you want my slightly unfair results? :-D
:-D
All information is useful.
BTW I emailed you something (nicknamed "relativity") recently but don't know if you got it :-/
I'd **really** like to see the results of that on your system if you can find a moment to run it :-D
-
Hi!
Results sent!
Cheers
Dragster
-
I got the following error when I attempted to run the program:
-------
10.System39:> Memory/Bus bandwidth estimation (WarpOS/PPC) (c) Karl Churchill 2004
Error creating context: 8
Couldn't create a context
-------
Brian
-
I got the same thing when I tried running it under P96. It worked properly with CGX.
-
Just out of curiosity I ran the test on my Peg2 7447 1GHz + 1 GB DDR + Radeon 9200SE. Works fine with MorphOS WOS emulation, and the results are, to put it mildly, quite crushing... :-)
-
Out of curiosity I tried to run it on my A1 with the OS4 pre-release, but no luck.
-
BrianJHoskins wrote:
I got the following error when I attempted to run the program:
-------
10.System39:> Memory/Bus bandwidth estimation (WarpOS/PPC) (c) Karl Churchill 2004
Error creating context: 8
Couldn't create a context
-------
Brian
Ahh. Error 8 implies it couldn't open a library - I forgot to check the CGX version required and blithely used 42 (this has caught me out before too!). It only needs some basic v3 functionality (locking bitmaps, querying them etc).
I've recompiled and uploaded. This one should work under P96.
-
Piru wrote:
Just out of curiosity I ran the test on my Peg2 7447 1GHz + 1 GB DDR + Radeon 9200SE. Works fine with MorphOS WOS emulation, and the results are, to put it mildly, quite crushing... :-)
Interesting :-)
I couldn't help noticing your normal and writethrough timings were the same. Which made me think...
Depending on how big your L2 cache is, I don't expect these results to be particularly accurate :-D
The program was aimed at classics and as such it allocates:
1 bitmap 640x480 (hardware rounded) at 16 bit (or higher depending on WB depth). That's 600K VRAM (more for 24/32 bit of course).
1 fast ram area of 512K, normal cache settings
1 fast ram area of 512K, writethrough cache settings
1 chip ram area of 256K (not cacheable on classic)
IIRC, the G4 has sufficient L2 cache (512K) to completely invalidate this test.
I can recompile a version that opens a much larger window and allocates larger fast ram areas (better still, make it user definable!), which should give more control over these things.
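The buffer sizes above can be checked with a little arithmetic. The following is a minimal sketch, not code from the test program: the function names are hypothetical, and it assumes tightly packed rows (no hardware rounding of the bitmap width), which a real graphics card may not guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for a width x height surface, assuming tightly packed rows.
   Real hardware may pad each row, so treat this as a lower bound. */
static size_t surface_bytes(size_t width, size_t height, size_t bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0);
    return width * height * (bits_per_pixel / 8);
}

/* A buffer that fits entirely in the cache will mostly measure cache
   speed rather than main memory speed. */
static int fits_in_cache(size_t buffer_bytes, size_t cache_bytes)
{
    return buffer_bytes <= cache_bytes;
}
```

For a 640x480 window, surface_bytes(640, 480, 16) gives 614400 bytes (600K) and surface_bytes(640, 480, 32) gives 1228800 bytes. A 512K fast RAM buffer satisfies fits_in_cache() against a 512K L2 cache, which is why the classic-sized buffers say little about main memory on a G4.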
@poweramiga2002
There's no WOS emulation in OS4 at the moment. I plan to make an OS4 native version anyway :-)
-
@Karlos
Yeah, 7447 has 512KB L2 cache, so the fastmem buffer seems to fit it completely. Ah, regarding writethrough, you use the special WOS memory allocation flag for that? If so, I think wosemu just ignores the flag for now...
VRAM is noncacheable always, so buffersize doesn't really matter (other than timing accuracy).
-
@Piru
I use an accumulation based benchmark, where I time a variable number of iterations until a total time (in this case, 2 seconds) is exceeded (the actual time is recorded, not assumed to be 2 seconds of course):
The pseudocode is
iterations = 0;
totaltime = 0;
while (totaltime < sampletime)
{
    lock hardware;
    disable task switching & interrupts;
    reset timer;
    perform operation;
    get elapsed time;
    enable task switching & interrupts;
    unlock hardware;
    totaltime = totaltime + elapsed time;
    iterations = iterations + 1;
}
The result is then (iterations * datasize) / totaltime
I find that this gives the most reliable overall results compared to fixed iteration/variable time (which obviously gets less accurate for faster systems).
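For readers who want to try the accumulation approach on a hosted system, here is a minimal portable C sketch. It is not the WarpOS code: it uses POSIX clock_gettime() in place of GetSysTimePPC(), does not lock hardware or disable task switching and interrupts, and assumes 1 KB = 1024 bytes. The names bench_op, now_seconds, measure_kb_per_sec and copy_op are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical operation signature: processes `size` bytes per call. */
typedef void (*bench_op)(void *dst, const void *src, size_t size);

/* Monotonic time in seconds, via POSIX clock_gettime(). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Repeat `op` until the summed per-iteration time reaches `sample_time`
   seconds, then return (iterations * size) / total time, in KB/s. */
static double measure_kb_per_sec(bench_op op, void *dst, const void *src,
                                 size_t size, double sample_time)
{
    double total_time = 0.0;
    unsigned long iterations = 0;

    assert(size > 0 && sample_time > 0.0);

    while (total_time < sample_time) {
        double start = now_seconds();
        op(dst, src, size);
        total_time += now_seconds() - start; /* time only the operation */
        iterations++;
    }
    return ((double)iterations * (double)size / 1024.0) / total_time;
}

/* Example operation: a plain memcpy() copy. */
static void copy_op(void *dst, const void *src, size_t size)
{
    memcpy(dst, src, size);
}
```

A call such as measure_kb_per_sec(copy_op, dst, src, size, 2.0) mirrors the 2 second sample time described above. Because the actual accumulated time is used as the divisor, overshooting the sample time on the last iteration does not skew the result.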
The actual tests (read/write/copy) are written in PPC asm using a Duff's Device (my favourite) style unrolled loop. They should be about as fast as possible.
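As an illustration of the unrolling technique only, here is the classic Duff's Device written in C for a 32-bit word copy. The real tests are in PPC assembly and are not shown in this thread, so the function name and the 8-way unroll factor are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `count` 32-bit words using an 8-way unrolled loop. The switch
   jumps into the middle of the loop body to handle the remainder
   (count % 8) on the first pass; every later pass copies 8 words. */
static void copy_words_duff(uint32_t *dst, const uint32_t *src, size_t count)
{
    size_t passes;

    if (count == 0)
        return;
    assert(dst != NULL && src != NULL);

    passes = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}
```

The point of the unrolling is to reduce the loop branch overhead per word moved, so that the measurement is dominated by the memory accesses themselves.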
On classic, the tests are also run with 680x0 task switching and interrupts disabled (just for each iteration, not the complete time). The time recorded (using GetSysTimePPC()) is just for the operation itself, hence context switch timings are hopefully irrelevant.
The writethrough memory is obtained using MEMF_WRITETHROUGH for AllocVecPPC().
I'll knock up a version that should more fairly strain a Pegasos.
-
:bump:
I know there are still more PPC users out there :-D
-
I have just downloaded it and tried it out but the "error 8" problem with p96 seems to be still there.
-
Strange :-/
I'll have another look tonight. Probably I need to lower the cybergraphics.library version again. Or better still, make it prefer picasso96 directly if it finds it. I plan to add some more tests to it (scattered memory read/write for one), and change the test to allocate as large as possible memory buffers. This will allow the Peg users who tested it to get a fairer estimate of their memory speed (and not just their L2 caches ;-) ) as well as paving the way for an OS4 native version, which would have the same L2 cache issues on an A1.
Anyway, I'd like to say thanks to everybody so far for their time.
-
Don't shout, don't shout, I'm coming.
Cannot open intuition.library ver.40
My hardware: PPC603@210. ;-)
-
Ah, b*llocks. I must have uploaded the original version twice instead of the recompiled library version fix :lol:
There is no other reason for it wanting to open v40 intuition, the last build *definitely* requests v39.
I'll re-upload it tonight (complete with some sort of large buffer allocation for those with plenty of RAM and/or Peg I/II).
-
Hi again,
I've uploaded the (hopefully) P96 friendly version (uses CGX3 only). It might also work on OS3.0 (I lowered the graphics/intuition requirements to v39).
Didn't yet write a large memory version so anybody wanting to measure their L2 performance on a G4 peg can still do so :-D
I will fix that when I get a moment.
-
How's the OS4 version going?
-
poweramiga2002 wrote:
How's the OS4 version going?
When it's done :-D
Seriously, the code needs updating a little before that. Benchmarking for A1 (and Peg) has to account for the large caches they have.
I was primarily interested only in the classic performance, but I will have to ask an A1 owner I know if I can write it on his since my own OS4 installation is a tad ropey at present :-(
-
I've uploaded the (hopefully) P96 friendly version (uses CGX3 only). It might also work on OS3.0 (I lowered the graphics/intuition requirements to v39).
Ok now it works on my system with P96, have just sent you the results.
-
@all
Still no OS4 native version :-( but I did update the code in preparation for a port.
It now allocates up to 32MB buffers for testing (to mitigate the effects of L2 caches) and can be given width/height parameters on the command line to open a larger window (default is still 640x480) to get a larger VRAM surface (or indeed smaller for people using 4MB cards).
usage: test width height
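The usage line can be read as "no arguments gives the 640x480 default, two arguments give width and height". A minimal sketch of that parsing in C follows; the function name and the rejection rules (non-numeric or non-positive values, wrong argument count) are assumptions, not taken from the actual program.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse "test width height". With no arguments the 640x480 default is
   kept. Returns 1 on success, 0 if the arguments are not usable. */
static int parse_window_size(int argc, char **argv, long *width, long *height)
{
    char *end;

    assert(width != NULL && height != NULL);
    *width = 640;
    *height = 480;

    if (argc == 1)
        return 1;               /* defaults */
    if (argc != 3)
        return 0;               /* need both width and height */

    *width = strtol(argv[1], &end, 10);
    if (*end != '\0' || *width <= 0)
        return 0;
    *height = strtol(argv[2], &end, 10);
    if (*end != '\0' || *height <= 0)
        return 0;
    return 1;
}
```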
Would those Pegasos users who ran the original consider running the updated version here (http://www.nyteshade.com/karlos/util/memspeedwos.lzx) ?
Thanks :-)
The previous version reported totally unrealistic memory speeds on systems with 512K L2 cache since the test buffer was only 512K ;-)
-
Results sent :)
-
itix wrote:
Results sent :)
Cheers :-)
-
I just emailed you my results as well from my Pegasos II G4 system.
EDIT: I guess your mail didn't like it, so here are the results below:
Memory/Bus bandwidth estimation (WarpOS/PPC) (c) Karl Churchill 2004
System Info
CPU : 7447/7457 (G4) [PVR:0x80020101]
CPU : 1000.000 MHz
FSB : 133.333 MHz
Fast [cache normal] allocated : 33554432 bytes at 0x214AA960
VRAM [noncacheable] allocated : 1228800 bytes at 0xE8985C00
System running MorphOS WarpUP emulation
Estimating VRAM access bandwidth
Read VRAM (64-bit) : 28914.40 KB/s
Read VRAM (32-bit) : 16266.58 KB/s
Read VRAM (16-bit) : 3615.66 KB/s
Read VRAM (8-bit) : 1414.92 KB/s
Write VRAM (64-bit) : 220626.04 KB/s
Write VRAM (32-bit) : 131799.60 KB/s
Write VRAM (16-bit) : 34544.66 KB/s
Write VRAM (8-bit) : 16834.74 KB/s
Estimating fast RAM access bandwidth [normal]
Read FAST (64-bit) : 223107.84 KB/s
Read FAST (32-bit) : 223039.96 KB/s
Read FAST (16-bit) : 222790.42 KB/s
Read FAST (8-bit) : 200725.95 KB/s
Write FAST (64-bit) : 446959.69 KB/s
Write FAST (32-bit) : 446791.03 KB/s
Write FAST (16-bit) : 442847.01 KB/s
Write FAST (8-bit) : 225898.79 KB/s
Estimating fast RAM [normal] to fast RAM [normal] bandwidth
FAST -> FAST (64-bit) : 177491.57 KB/s
FAST -> FAST (32-bit) : 168491.36 KB/s
FAST -> FAST (16-bit) : 153469.40 KB/s
FAST -> FAST (8-bit) : 121654.90 KB/s
Estimating fast RAM [normal] to VRAM copy bandwidth
FAST -> VRAM (64-bit) : 159743.28 KB/s
FAST -> VRAM (32-bit) : 107766.68 KB/s
FAST -> VRAM (16-bit) : 34014.99 KB/s
FAST -> VRAM (8-bit) : 16397.64 KB/s
Estimating VRAM to RAM [normal] copy bandwidth
VRAM -> FAST (64-bit) : 28514.57 KB/s
VRAM -> FAST (32-bit) : 16066.22 KB/s
VRAM -> FAST (16-bit) : 3614.28 KB/s
VRAM -> FAST (8-bit) : 1414.71 KB/s
-
Maybe you spelled my name right when you emailed me. Alas I didn't spell it right when I registered it :lol:
The G4 figures look far more realistic now :-)
-
Same system as Acill's except I have a Voodoo3 3500 AGP as opposed to his Radeon 7000 (?) AGP:
Memory/Bus bandwidth estimation (WarpOS/PPC) (c) Karl Churchill 2004
System Info
CPU : 7447/7457 (G4) [PVR:0x80020101]
CPU : 1000.000 MHz
FSB : 133.333 MHz
Fast [cache normal] allocated : 33554432 bytes at 0x21437720
VRAM [noncacheable] allocated : 614400 bytes at 0xEC5AA970
System running MorphOS WarpUP emulation
Estimating VRAM access bandwidth
Read VRAM (64-bit) : 17497.09 KB/s
Read VRAM (32-bit) : 13054.57 KB/s
Read VRAM (16-bit) : 7258.11 KB/s
Read VRAM (8-bit) : 3831.75 KB/s
Write VRAM (64-bit) : 224737.84 KB/s
Write VRAM (32-bit) : 140810.40 KB/s
Write VRAM (16-bit) : 39660.33 KB/s
Write VRAM (8-bit) : 19243.93 KB/s
Estimating fast RAM access bandwidth [normal]
Read FAST (64-bit) : 222906.23 KB/s
Read FAST (32-bit) : 222879.27 KB/s
Read FAST (16-bit) : 222625.05 KB/s
Read FAST (8-bit) : 200591.57 KB/s
Write FAST (64-bit) : 447184.60 KB/s
Write FAST (32-bit) : 447251.85 KB/s
Write FAST (16-bit) : 443442.91 KB/s
Write FAST (8-bit) : 226074.08 KB/s
Estimating fast RAM [normal] to fast RAM [normal] bandwidth
FAST -> FAST (64-bit) : 177374.39 KB/s
FAST -> FAST (32-bit) : 168457.04 KB/s
FAST -> FAST (16-bit) : 153520.09 KB/s
FAST -> FAST (8-bit) : 121715.30 KB/s
Estimating fast RAM [normal] to VRAM copy bandwidth
FAST -> VRAM (64-bit) : 172023.31 KB/s
FAST -> VRAM (32-bit) : 119516.86 KB/s
FAST -> VRAM (16-bit) : 39370.00 KB/s
FAST -> VRAM (8-bit) : 19291.89 KB/s
Estimating VRAM to RAM [normal] copy bandwidth
VRAM -> FAST (64-bit) : 17421.77 KB/s
VRAM -> FAST (32-bit) : 13017.88 KB/s
VRAM -> FAST (16-bit) : 7222.08 KB/s
VRAM -> FAST (8-bit) : 3826.98 KB/s
-
Hi,
Does anybody know if the April fix has any effect on the memory write speed for Peg1 ?
The only peg1 results I have seen so far are for a non-april version (G3 600MHz / 100MHz FSB) and I was extremely surprised that the write times were much slower than read times. Normally you'd expect similar (or better) write performance.
In fact, the write performance, 80MB/s to normal copyback memory, was only fractionally higher than a CSPPC (604e 266MHz / 66MHz FSB) at 72MB/s, which would actually put the CSPPC in front MHz for MHz (for FSB speeds) :-?
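The "MHz for MHz" comparison is simply write bandwidth divided by FSB clock. A small sketch of that arithmetic in C, with a hypothetical function name and using only the figures quoted above:

```c
#include <assert.h>

/* Bandwidth normalised by front-side bus clock: (MB/s) per MHz of FSB. */
static double bandwidth_per_fsb_mhz(double mb_per_sec, double fsb_mhz)
{
    assert(fsb_mhz > 0.0);
    return mb_per_sec / fsb_mhz;
}
```

With the numbers from the post, the Peg1 comes out at 80 / 100 = 0.80 MB/s per MHz of FSB and the CSPPC at 72 / 66, roughly 1.09 MB/s per MHz, which is the sense in which the CSPPC is ahead.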