Author Topic: Sam440ep Memory Test  (Read 3586 times)

Offline Hammer

  • Hero Member
  • *****
  • Join Date: Mar 2002
  • Posts: 1996
  • Country: 00
Re: Sam440ep Memory Test
« on: March 11, 2007, 09:44:14 AM »
@DamageX  
Try using the same stream_d benchmark.

In reference to "sam_memory_test.pdf"
http://www.sam440.com/eng/images/sam_memory_test.pdf

/usr/checkbench/stream # ./stream_d

This system uses 8 bytes per DOUBLE PRECISION word.
...
Function Rate (MB/s) RMS time Min time Max time
Copy: 285.7041 0.1121 0.1120 0.1121
Scale: 270.4713 0.1184 0.1183 0.1185
Add: 256.4075 0.1873 0.1872 0.1875
Triad: 250.5925 0.1916 0.1915 0.1918

----------------------------------------
Using a real AMD64 processor, not some Sempron junk.

AMD Opteron 248 @ 2200 MHz with the PathScale EKO compiler.

Compiler: PathScale EKO Compiler Suite, Release 1.1
           Model Name: ASUS SK8N Motherboard, AMD Opteron (TM) Model 248
                  CPU: AMD Opteron 248
              CPU MHz: 2200
                  FPU: Integrated
       CPU(s) enabled: 1 core, 1 chip, 1 core/chip
      Secondary Cache: 1024KB (I+D) on chip
               Memory: 4x512MB, DDR400, PC3200, Corsair, CL2
     Operating System: SuSE Linux 9.0 (AMD64) 2.4.21-102-default

> pathcc -Ofast -lm stream_d.c second_wall.c -o 1.1ccOfast
> ./1.1ccOfast
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 7885 microseconds.
    (= 7885 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function Rate (MB/s) RMS time Min time Max time
Copy: 4613.5614 0.0070 0.0069 0.0072
Scale: 4520.3330 0.0071 0.0071 0.0071
Add: 4558.4067 0.0106 0.0105 0.0107
Triad: 4554.9003 0.0105 0.0105 0.0106
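
For anyone checking these figures by hand: the Rate column follows from the array size and the best (Min) time. Below is a minimal sketch in C, assuming the usual STREAM accounting where Copy and Scale move two arrays per element, Add and Triad move three, and 1 MB is counted as 10^6 bytes for the rate and 2^20 bytes for the "Total memory required" line. The function names are my own for illustration and are not taken from stream_d.c.

```c
#include <stddef.h>

/* The four STREAM kernels. */
enum stream_kernel { STREAM_COPY, STREAM_SCALE, STREAM_ADD, STREAM_TRIAD };

/* Bytes moved by one pass of a kernel over arrays of n doubles.
   Assumption: Copy/Scale touch 2 arrays, Add/Triad touch 3. */
static double stream_bytes(enum stream_kernel k, size_t n)
{
    int arrays = (k == STREAM_COPY || k == STREAM_SCALE) ? 2 : 3;
    return (double)arrays * (double)sizeof(double) * (double)n;
}

/* Rate in MB/s (1 MB = 10^6 bytes) from the best time in seconds. */
static double stream_rate_mbs(enum stream_kernel k, size_t n, double best_seconds)
{
    return 1.0e-6 * stream_bytes(k, n) / best_seconds;
}

/* Total memory for the three arrays, in MB of 2^20 bytes. */
static double stream_total_mb(size_t n)
{
    return 3.0 * (double)sizeof(double) * (double)n / 1048576.0;
}
```

With the array size of 2000000 from the log, stream_total_mb(2000000) comes out near the 45.8 MB shown above, and stream_rate_mbs(STREAM_COPY, 2000000, 0.0069) lands within about 1% of the reported Copy rate. The small difference is expected because the Min time is only printed to four decimal places.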


--------------------
Using a Sun Fire V40z with 4 x Opteron 2.6 GHz processors.

Sun Studio 11 compiler (MB/s).
Sun Studio: -fast -xarch=amd64a -xvector=simd -xprefetch -xprefetch_level=3
(setenv PARALLEL 1)
Copy: 4658
Scale: 4614
Add: 4628
Triad: 4627

Sun Studio 11 (MB/s) with the automatic parallelization switch on 4 processors.

For Automatic parallelization:
cc -fast -xarch=amd64a -xvector=simd -xprefetch -xprefetch_level=3 -xautopar stream_d.c second.c
(setenv PARALLEL 4)

Copy: 18120
Scale: 18108
Add: 17758
Triad: 17626

-----------------------------------------------
Same machine with the GCC 4.1 compiler (MB/s).

GCC 4.1: -O3 -funroll-all-loops -ffast-math -fpeephole -m64 -mtune=k8 -fprefetch-loop-arrays

Copy: 2766
Scale: 2745
Add: 2970
Triad: 2969

AMD64 goes well with Sun's Studio 11 and PathScale's EKO compilers.

GCC 4.1 isn't able to exploit this type of scalability, since it doesn't support automatic parallelization or OpenMP.
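
To show what is meant by OpenMP here, this is a rough sketch of a Triad-style loop with an OpenMP directive. It is my own illustration, not the code from stream_d.c, and the function name triad_omp is made up. It assumes a compiler with OpenMP support enabled; without it the pragma is simply ignored and the loop runs on one thread.

```c
#include <stddef.h>

/* Triad-style kernel: a[i] = b[i] + scalar * c[i].
   With OpenMP enabled, the iterations are split across threads;
   otherwise the pragma is ignored and the loop runs serially. */
static void triad_omp(double *a, const double *b, const double *c,
                      double scalar, long n)
{
    long i;
#pragma omp parallel for
    for (i = 0; i < n; i++) {
        a[i] = b[i] + scalar * c[i];
    }
}
```

Each iteration writes a different a[i] and only reads b and c, so the loop needs no locking. On a multi-socket Opteron box each thread can also pull from its own memory controller, which is likely where most of the scaling in the 4-processor numbers comes from.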

Note that there are two main types of Sempron: K7-based (supports only a single MCH via the FSB) and K8-based (available with a single MCH via Socket 754 or dual MCH via Socket AM2/Socket S1).

Sun's Studio 11 is available as a free download.
Amiga 1200 PiStorm32-Emu68-RPI 4B 4GB.
Ryzen 9 7900X, DDR5-6000 64 GB, RTX 4080 16 GB PC.