Author Topic: What about the X1000 video bus performance?  (Read 6435 times)

Offline takemehomegrandma (Topic starter)

  • Hero Member
  • *****
  • Join Date: Oct 2002
  • Posts: 2990
What about the X1000 video bus performance?
« on: October 11, 2013, 04:31:56 PM »
OK, I was reading this thread, and the first post contains some "RageMem" benchmarks comparing various OS4 machines. In the same thread, there is this post from Vox that adds X1000 numbers to the picture.

I have merged those two posts below, cleaned them up, and sorted each group fastest first. I also marked the X1000 in red to separate it from the G4s.

Code: [Select]
--- CPU ---

MAX MIPS: 4194 // a1-xe @ 1.4 Ghz
MAX MIPS: 3797 // peg2 @ 1266 Mhz
[COLOR=Red]MAX MIPS: 3084  // X1000 @ 1.8 Ghz[/COLOR]


It's not news (but still surprising to many who expected more from reading marketing materials about the PA6T) that the X1000 performs worse than aggressively clocked G4s. Many benchmarks have confirmed this. The published specs for the PA6T (8800 MIPS, dual-core, wasn't it?) simply don't seem correct. Ah, well. It was a dead-end CPU before it was seriously commercialized anyway, so I guess it doesn't matter anymore.
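
As a rough sanity check on the table above, dividing each MAX MIPS figure by the clock speed gives a per-clock comparison. This is only a sketch: the numbers are copied from the RageMem output, and the helper name `mips_per_mhz` is made up for illustration.

```python
# RageMem "MAX MIPS" figures and clock speeds (MHz), copied from the table above.
results = {
    "a1-xe": (4194, 1400),
    "peg2": (3797, 1266),
    "X1000": (3084, 1800),
}

def mips_per_mhz(mips, mhz):
    # Hypothetical helper: benchmark MIPS normalised by clock frequency.
    return mips / mhz

for name, (mips, mhz) in results.items():
    print(f"{name:6s} {mips_per_mhz(mips, mhz):.2f} MIPS/MHz")
```

On these numbers both G4 boards land at about 3.0 MIPS per MHz, while the X1000 lands at about 1.7.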

Code: [Select]
--- L1 cache ---

[COLOR=red]READ64: 13677 MB/Sec  // X1000 @ 1.8 Ghz[/COLOR]
READ64: 10660 MB/Sec  // a1-xe @ 1.4 Ghz
READ64:  9650 MB/Sec  // peg2 @ 1266 Mhz

[COLOR=red]WRITE32: 6850 MB/Sec  // X1000 @ 1.8 Ghz[/COLOR]
WRITE32: 4569 MB/Sec  // a1-xe @ 1.4 Ghz
WRITE32: 4136 MB/Sec  // peg2 @ 1266 Mhz


I won't comment on the values themselves, but the results pretty much go hand in hand with the clock frequency of the CPUs. As expected: L1 caches run at the same speed as the CPU, so the more "GHz", the higher the L1 transfer values.
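
The "hand in hand with clock frequency" observation can be checked by dividing MB/s by MHz, which gives bytes moved per clock cycle. A minimal sketch, assuming RageMem's "MB" means 10**6 bytes (the output does not say); the helper name is hypothetical.

```python
# L1 cache results (MB/s) and CPU clocks (MHz), copied from the RageMem table.
clocks_mhz = {"X1000": 1800, "a1-xe": 1400, "peg2": 1266}
read64 = {"X1000": 13677, "a1-xe": 10660, "peg2": 9650}
write32 = {"X1000": 6850, "a1-xe": 4569, "peg2": 4136}

def bytes_per_cycle(mb_per_sec, mhz):
    # (10**6 bytes/s) / (10**6 cycles/s) = bytes per cycle
    return mb_per_sec / mhz

for name, mhz in clocks_mhz.items():
    r = bytes_per_cycle(read64[name], mhz)
    w = bytes_per_cycle(write32[name], mhz)
    print(f"{name:6s} READ64 {r:.2f} B/cycle, WRITE32 {w:.2f} B/cycle")
```

All three READ64 results come out at roughly 7.6 bytes per cycle, so the read figures do track the clock closely. For WRITE32 the G4s come out at about 3.3 and the X1000 at about 3.8.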

Code: [Select]
--- RAM ---

[COLOR=red]READ32: 2860 MB/Sec  // X1000 @ 1.8 Ghz[/COLOR]
READ32:  233 MB/Sec  // a1-xe @ 1.4 Ghz
READ32:  146 MB/Sec  // peg2 @ 1266 Mhz

[COLOR=red]WRITE64: 3388 MB/Sec  // X1000 @ 1.8 Ghz[/COLOR]
WRITE64:  645 MB/Sec  // a1-xe @ 1.4 Ghz
WRITE64:  387 MB/Sec  // peg2 @ 1266 Mhz

WRITE: 733 MB/Sec (Tricky)  // peg2 @ 1266 Mhz
WRITE: 663 MB/Sec (Tricky)  // a1-xe @ 1.4 Ghz
[COLOR=red]WRITE: 352 MB/Sec (Tricky)  // X1000 @ 1.8 Ghz[/COLOR]


I think it would be interesting to know how many memory modules Amigakit puts in the X1000 (are both memory controllers being used?). And how fast does the memory run?

BTW, what is the "Tricky" test? The X1000 obviously doesn't like doing tricky stuff...

Anyway, here comes the point of the post:

Code: [Select]
[b]--- VIDEO BUS ---

WRITE: 221 MB/Sec  // peg2 @ 1266 Mhz with unknown video card
WRITE: 169 MB/Sec  // a1-xe @ 1.4 Ghz with Radeon 9000 Pro
[COLOR=red]WRITE: 161 MB/Sec  // X1000 @ 1.8 Ghz with Radeon HD 6870 1GB[/COLOR][/b]


First, there is nothing unexpected in the G4 machine tests.

The maximum theoretical bandwidth for AGP 1x is 266 MB/s, but that's a theoretical maximum and not what you get in practice. The Pegasos 2 has "AGP 1x", and the 221 MB/s in this test is similar to other benchmarks and is about what you can realistically expect from a Pegasos 2.

The A1-XE is based on the notoriously flawed Articia-S Northbridge from MAI. It was marketed and sold as an AGP 2x chip (meaning a theoretical max transfer speed of 533 MB/s). However, some benchmarks (here is one) established a long time ago that in real life you would only reach about "AGP 0.5x" (for a G3) to "AGP 0.7x" (for a G4) on a MAI Teron board (sold as "AmigaOne" by Eyetech). This benchmark confirms this. Terrible performance from an AGP 2x computer, but again, very much expected when it comes to Articia-S. We already knew this.
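
To put the "AGP 0.5x" to "AGP 0.7x" shorthand next to the RageMem numbers, each measured rate can be expressed as a multiple of the 266 MB/s AGP 1x figure quoted above. A small sketch; the helper name `agp_multiple` is made up, and the X1000 row is included only for scale since that machine uses PCIe, not AGP.

```python
AGP_1X_MB_S = 266  # theoretical AGP 1x bandwidth quoted in the post

# Video bus WRITE results (MB/s), copied from the RageMem table.
video_write = {"peg2": 221, "a1-xe": 169, "X1000": 161}

def agp_multiple(mb_per_sec, base=AGP_1X_MB_S):
    # Express a measured rate as a multiple of theoretical AGP 1x.
    return mb_per_sec / base

for name, rate in video_write.items():
    print(f"{name:6s} {rate} MB/s = AGP {agp_multiple(rate):.2f}x")
```

The A1-XE result works out to roughly 0.64x, which sits inside the 0.5x to 0.7x range, and the Pegasos 2 reaches about 0.83x of its theoretical 1x.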

But the X1000?! Whoa!! What's the matter with that?!

Either the benchmark is terribly flawed in some way (it worked for the G4 systems though), or something is really borked in either the X1000 or PA6T HW, or in OS4 or the driver implementations. That number is so terrible that it simply can't possibly be true!! It should be close to twenty-five times faster!
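
One possible reading of the "twenty-five times" figure, sketched under an assumption the post does not state: that the card sits on a PCIe 1.x x16 link, at the standard 250 MB/s per lane per direction for that generation. The function name is hypothetical.

```python
PCIE1_MB_PER_LANE = 250  # PCIe 1.x: 2.5 GT/s with 8b/10b coding, per direction

def link_bandwidth(lanes, per_lane=PCIE1_MB_PER_LANE):
    # Theoretical one-way bandwidth of a PCIe link in MB/s.
    return lanes * per_lane

measured = 161  # X1000 video bus WRITE result from the RageMem table (MB/s)
ratio = link_bandwidth(16) / measured
print(f"x16 link: {link_bandwidth(16)} MB/s, {ratio:.1f}x the measured rate")
```

Under that assumption the theoretical link rate is 4000 MB/s, a bit under 25 times the measured 161 MB/s.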

So where is the flaw?

The test? The HW? The OS/Drivers?

:confused:
MorphOS is Amiga done right! :)
 

Offline Georg

  • Jr. Member
  • **
  • Join Date: Feb 2002
  • Posts: 90
Re: What about the X1000 video bus performance?
« Reply #1 on: October 11, 2013, 05:35:37 PM »
Normal CPU transfer over PCIe is slow. Nowhere near PCIe bandwidth. Can be slower than AGP machines. Saw it on x86 PC with AROS. Also on Linux/X11 (x11perf) with unaccelerated drivers.
 

Offline smf

Re: What about the X1000 video bus performance?
« Reply #2 on: October 11, 2013, 08:15:34 PM »
I hope that I won't have to spend too much money on the "new" car I'm looking for, because I really want to get myself an X1000 now :) From what I have seen it looks quite superior to my old Peg2. The Peg2 cost me a small fortune when I bought it many, many years ago, but it has served me well, and hopefully the X1000 will serve me well for many years too.
 

Offline takemehomegrandma (Topic starter)

Re: What about the X1000 video bus performance?
« Reply #3 on: October 11, 2013, 08:28:32 PM »
Quote from: Georg;749837
Normal CPU transfer over PCIe is slow. Nowhere near PCIe bandwidth. Can be slower than AGP machines. Saw it on x86 PC with AROS. Also on Linux/X11 (x11perf) with unaccelerated drivers.

Wait a minute, what are you saying here?

That there is no DMA in the drivers? Or that the test is explicitly using CPU for video bus transfers?

:confused:
« Last Edit: October 11, 2013, 09:00:46 PM by takemehomegrandma »
 

Offline Hans_

Re: What about the X1000 video bus performance?
« Reply #4 on: October 11, 2013, 09:52:53 PM »
@takemehomegrandma

Quote
Code: [Select]
--- RAM ---

[COLOR=red]READ32: 2860 MB/Sec  // X1000 @ 1.8 Ghz[/COLOR]
READ32:  233 MB/Sec  // a1-xe @ 1.4 Ghz
READ32:  146 MB/Sec  // peg2 @ 1266 Mhz

[COLOR=red]WRITE64: 3388 MB/Sec  // X1000 @ 1.8 Ghz[/COLOR]
WRITE64:  645 MB/Sec  // a1-xe @ 1.4 Ghz
WRITE64:  387 MB/Sec  // peg2 @ 1266 Mhz

WRITE: 733 MB/Sec (Tricky)  // peg2 @ 1266 Mhz
WRITE: 663 MB/Sec (Tricky)  // a1-xe @ 1.4 Ghz
[COLOR=red]WRITE: 352 MB/Sec (Tricky)  // X1000 @ 1.8 Ghz[/COLOR]


I think it would be interesting to know how many memory modules Amigakit puts in the X1000 (are both memory controllers being used?). And how fast does the memory run?

BTW, what is the "Tricky" test? The X1000 obviously doesn't like doing tricky stuff...

I'm guessing that he's using the CPU's cache prefetching/pre-allocation instructions to try to speed things up. There are two problems:
- The dcba instruction to allocate cache space is illegal on the G5 and PA6T. Any use of that will slow things right down (NOTE: he may or may not be using this, I don't know).
- The dcbz instruction is used to zero memory a cache line at a time to avoid unnecessary fetches from RAM. However, it operates on 32 bytes, whereas the G5 and PA6T have longer cache lines. Hence, using this instruction on those CPUs still causes unnecessary RAM fetches for the rest of the cache line, and that gives a performance hit. The instruction to use is dcbzl.
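
The dcbz point can be illustrated with simple arithmetic: if one block-zero covers 32 bytes, the share of a cache line it covers shrinks as the line gets longer. This is a sketch only. The 64- and 128-byte line sizes below are illustrative assumptions standing in for "longer cache lines", and the function name is made up.

```python
def zeroed_fraction(dcbz_bytes, line_bytes):
    # Fraction of one cache line covered by a single block-zero of dcbz_bytes.
    return min(dcbz_bytes, line_bytes) / line_bytes

DCBZ_BYTES = 32  # size the post says dcbz operates on

# 32 matches the dcbz size; 64 and 128 are assumed examples of longer lines.
for line_bytes in (32, 64, 128):
    covered = zeroed_fraction(DCBZ_BYTES, line_bytes)
    print(f"{line_bytes:3d}-byte line: {covered:.0%} zeroed, "
          f"{1 - covered:.0%} still fetched from RAM")
```

With a 32-byte line nothing needs fetching, which is why the trick pays off on a G4. On a longer line, part of the line still has to come from RAM.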

Quote
Anyway, here comes the point of the post:

Code: [Select]
[b]--- VIDEO BUS ---

WRITE: 221 MB/Sec  // peg2 @ 1266 Mhz with unknown video card
WRITE: 169 MB/Sec  // a1-xe @ 1.4 Ghz with Radeon 9000 Pro
[COLOR=red]WRITE: 161 MB/Sec  // X1000 @ 1.8 Ghz with Radeon HD 6870 1GB[/COLOR][/b]


First, there is nothing unexpected in the G4 machine tests.

The maximum theoretical bandwidth for AGP 1x is 266 MB/s, but that's a theoretical maximum and not what you get in practice. The Pegasos 2 has "AGP 1x", and the 221 MB/s in this test is similar to other benchmarks and is about what you can realistically expect from a Pegasos 2.

The A1-XE is based on the notoriously flawed Articia-S Northbridge from MAI. It was marketed and sold as an AGP 2x chip (meaning a theoretical max transfer speed of 533 MB/s). However, some benchmarks (here is one) established a long time ago that in real life you would only reach about "AGP 0.5x" (for a G3) to "AGP 0.7x" (for a G4) on a MAI Teron board (sold as "AmigaOne" by Eyetech). This benchmark confirms this. Terrible performance from an AGP 2x computer, but again, very much expected when it comes to Articia-S. We already knew this.

But the X1000?! Whoa!! What's the matter with that?!

Either the benchmark is terribly flawed in some way (it worked for the G4 systems though), or something is really borked in either the X1000 or PA6T HW, or in OS4 or the driver implementations. That number is so terrible that it simply can't possibly be true!! It should be close to twenty-five times faster!

So where is the flaw?

The test? The HW? The OS/Drivers?

:confused:


The test is "flawed." Ragemem uses its own custom CPU copy routine to copy to VRAM, and it's not very good. There is a huge penalty for transferring data in small blocks on the PCIe bus. DMA is the only way to get fast transfer rates.
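
A toy model of why small blocks hurt on a packet-based bus: every write carries a fixed per-packet cost, so small payloads spend most of the link on overhead. This is a sketch under stated assumptions. The 24-byte overhead is an illustrative round number, not a figure from this thread, and the function name is made up.

```python
ASSUMED_OVERHEAD_BYTES = 24  # illustrative per-packet cost (header, framing, CRC); an assumption

def payload_efficiency(payload_bytes, overhead_bytes=ASSUMED_OVERHEAD_BYTES):
    # Share of the bytes on the wire that are actual payload.
    return payload_bytes / (payload_bytes + overhead_bytes)

for size in (4, 8, 16, 64, 128):
    print(f"{size:3d}-byte writes: {payload_efficiency(size):.0%} of link traffic is payload")
```

The model ignores other costs of uncached CPU writes, so it only shows the direction of the effect: larger bursts (for example via wider vector stores or DMA) waste less of the link.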

Having said that, you can still get better transfer rates without DMA by using AltiVec. Have a look at the MemCopy results in this GfxBench2D result.

Hans


P.S. Before you ask, no, the driver doesn't have DMA yet. It's on the to-do list.
http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more. Home of the RadeonHD driver for Amiga OS 4.x project.
 

Offline itix

  • Hero Member
  • *****
  • Join Date: Oct 2002
  • Posts: 2380
Re: What about the X1000 video bus performance?
« Reply #5 on: October 11, 2013, 09:56:38 PM »
Quote from: takemehomegrandma;749845
Wait a minute, what are you saying here?

That there is no DMA in the drivers? Or that the test is explicitly using CPU for video bus transfers?

:confused:


Those tests use the CPU to manipulate video memory. I don't know whether the CGX implementation in OS4 uses DMA or not, but measuring that is not the purpose of those benchmarks.
My Amigas: A500, Mac Mini and PowerBook
 

Offline takemehomegrandma (Topic starter)

Re: What about the X1000 video bus performance?
« Reply #6 on: October 11, 2013, 10:38:21 PM »
@Hans and itix

Hans: "The test is 'flawed.' Ragemem uses its own custom CPU copy routine to copy to VRAM, and it's not very good."

Itix: "Those tests are using CPU to manipulate video memory."

OK, thank you both for your clarifications! I learned something here!

:)
 

Offline vox

  • Hero Member
  • *****
  • Join Date: Feb 2011
  • Posts: 862
    • http://anticusa.wordpress.com
Re: What about the X1000 video bus performance?
« Reply #7 on: October 12, 2013, 06:02:08 AM »
If you are quoting these results:
http://amigaworld.net/modules/newbb/viewtopic.php?mode=viewtopic&topic_id=32815&forum=14&start=80&viewmode=flat&order=0#718165

Yes, they are mine.

I can run tests under Unity, and some preliminary results are also disappointing. It's a Sempron-level core, but the FPU, memory transfers and AltiVec (where used) are nice. It's not a killer. The X2000 will do better.

As for AmigaOS results, I can run more benchmarks for you as the OS graphics drivers, kernel etc. mature.

Here are Linux Phoenix tests
http://openbenchmarking.org/result/1310106-AR-AMIGAONE027

Hard Info vs AthlonXP
http://forum.opensource-srbija.org/topic/2019-ubuntu-1204-lts-sta-me-ocekuje-u-odnosu-na-mint/page-3#entry35367

I do plan to try installing a lighter Linux and do better.

It confirms Piru's tests. Thus the X2000 is for more power-hungry users, and PA Semis will soon be next to unicorns. But I can live with it. AmigaOS feels better than an ASUS i7 dual-core x64 laptop.

On the AmigaOS side, no matter the benchmarks, it's faaast :-)

Oh yes, everyone does tricky on tricky, re-read tests.
http://amigaworld.net/modules/newbb/viewtopic.php?topic_id=32815&forum=14

Some real tests under Linux can follow, if you can provide me with the same samples and a good stopwatch app for 32-bit PPC Linux.
« Last Edit: October 12, 2013, 06:05:11 AM by vox »
Future Acube and MOS supporter, fi di good, nothing fi di unprofessionals. Learn it harder way! http://www.youtube.com/user/rasvoja and https://www.facebook.com/rasvoja
 

Offline vox

Re: What about the X1000 video bus performance?
« Reply #8 on: October 12, 2013, 06:11:41 AM »
The CPU would do better if it had more cache, scaled to 8 cores with a special AmigaOS 5 and a 64-bit LinuxPPC made just for it (FPU, AltiVec), and maybe went above 2 GHz, up to 3 GHz.

But Apple killed it, and that is the only sample.

As the name says, it does it on 7 W.

An AthlonXP of the same frequency performs way better but takes 10x the electricity.

It would have been a great chip for a SAM440 board if made multithreaded, for its time, or if scaled down to mobiles and tablets.

It's used mainly in devices:
http://en.wikipedia.org/wiki/PWRficient#Notable_users

And only in the X1000 as the main CPU, in the big big AMIGA MONOLITH :-)

However, its features and multi-core design are quite impressive for its time. Compare it to the G4 or G5.

CPU: PA6T
Memory system: CONEXIUM
I/O: ENVOI
 
Xena will not help there, so it's up to AmigaOS and Linux to make the best use of it.
 

Offline vox

Re: What about the X1000 video bus performance?
« Reply #9 on: October 12, 2013, 06:15:08 AM »
Quote from: takemehomegrandma;749859
@Hans and itix

Hans: "The test is 'flawed.' Ragemem uses its own custom CPU copy routine to copy to VRAM, and it's not very good."

Itix: "Those tests are using CPU to manipulate video memory."

OK, thank you both for your clarifications! I learned something here!

:)

Remember Hans's 2D benchmark here: the X1000 does best, surely coupled with the mighty cards it has space for.

It's 2D only, but you can compare the X1000 and other systems with the same cards.

http://hdrlab.org.nz/benchmark/gfxbench2d/OS/AmigaOS

And yes, the CPU is used for video transfers, and yes, Ragemem is NOT a GFX benchmark, even if it provides some insight.