Topic: What about the X1000 video bus performance?

Hans_

Re: What about the X1000 video bus performance?
on: October 11, 2013, 09:52:53 PM
@takemehomegrandma

Quote
--- RAM ---

READ32: 2860 MB/Sec  // X1000 @ 1.8 GHz
READ32:  233 MB/Sec  // a1-xe @ 1.4 GHz
READ32:  146 MB/Sec  // peg2 @ 1266 MHz

WRITE64: 3388 MB/Sec  // X1000 @ 1.8 GHz
WRITE64:  645 MB/Sec  // a1-xe @ 1.4 GHz
WRITE64:  387 MB/Sec  // peg2 @ 1266 MHz

WRITE: 733 MB/Sec (Tricky)  // peg2 @ 1266 MHz
WRITE: 663 MB/Sec (Tricky)  // a1-xe @ 1.4 GHz
WRITE: 352 MB/Sec (Tricky)  // X1000 @ 1.8 GHz


I think it would be interesting to know how many memory modules Amigakit puts in the X1000 (are both memory controllers being used?), and how fast the memory runs.

BTW, what is the "Tricky" test? The X1000 obviously doesn't like doing tricky stuff...

I'm guessing that he's using the CPU's cache prefetching/pre-allocation instructions to try to speed things up. There are two problems:
- The dcba instruction, which allocates cache space, is illegal on the G5 and PA6T. Any use of it will slow things right down (NOTE: he may or may not be using this; I don't know).
- The dcbz instruction is used to zero memory one cache line at a time, avoiding unnecessary fetches from RAM. However, it operates on 32 bytes, whereas the G5 and PA6T have longer cache lines. Hence, using this instruction on those CPUs still causes unnecessary RAM fetches for the rest of each cache line, and that costs performance. The instruction to use is dcbzl.
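The cache-line argument can be illustrated with a rough model (not a hardware simulator). It assumes the simplified behaviour described above: a zeroing instruction that covers a whole cache line lets the CPU allocate that line without reading RAM, while one that covers only part of a line makes the CPU fetch the line first. The line sizes (32 bytes for the G4, 64 for the PA6T, 128 for the G5) are assumed values for illustration, and `bytes_fetched` is a hypothetical helper.

```python
# Rough model of the dcbz argument; an illustration, not a simulator.
# Assumption: zeroing a whole cache line needs no RAM read, while zeroing only
# part of a line makes the CPU fetch that line from RAM first.


def bytes_fetched(buf_bytes: int, line_bytes: int, zero_bytes: int) -> int:
    """Bytes read from RAM when clearing buf_bytes with an instruction that
    zeroes zero_bytes at a time on a CPU with line_bytes cache lines."""
    lines_touched = -(-buf_bytes // line_bytes)  # ceiling division
    if zero_bytes >= line_bytes:
        return 0  # each line is allocated and zeroed in cache
    return lines_touched * line_bytes  # each line is fetched once


buf = 1 << 20  # clear a 1 MiB buffer
# (label, assumed cache line size, bytes zeroed per instruction)
cases = [
    ("G4, 32-byte dcbz", 32, 32),
    ("PA6T, 32-byte dcbz", 64, 32),
    ("G5, 32-byte dcbz", 128, 32),
    ("G5, 128-byte dcbzl", 128, 128),
]
for name, line, zero in cases:
    print(f"{name:20s} {bytes_fetched(buf, line, zero):8d} bytes read from RAM")
```

Under these assumptions, clearing 1 MiB with a 32-byte dcbz on 64- or 128-byte lines reads the whole 1 MiB from RAM first, so a pure write turns into read-plus-write traffic; with a line-sized zeroing instruction, those reads disappear.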

Quote
Anyway, here comes the point of the post:

--- VIDEO BUS ---

WRITE: 221 MB/Sec  // peg2 @ 1266 MHz with unknown video card
WRITE: 169 MB/Sec  // a1-xe @ 1.4 GHz with Radeon 9000 Pro
WRITE: 161 MB/Sec  // X1000 @ 1.8 GHz with Radeon HD 6870 1GB


First, there is nothing unexpected in the G4 machine tests.

The theoretical maximum bandwidth for AGP 1x is 266 MB/s, but that is a ceiling, not what you get in practice. The Pegasos 2 has "AGP 1x", and the 221 MB/s in this test is in line with other benchmarks and is about what you can realistically expect from a Pegasos 2.

The A1-XE is based on the notoriously flawed Articia-S northbridge from MAI. It was marketed and sold as an AGP 2x chip (meaning a theoretical maximum transfer rate of 533 MB/s). However, some benchmarks (here is one) established long ago that in real life you would only reach about "AGP 0.5x" (with a G3) to "AGP 0.7x" (with a G4) on a MAI Teron board (sold as "AmigaOne" by Eyetech). This benchmark confirms that. Terrible performance for an AGP 2x computer, but again, entirely expected when it comes to the Articia-S. We already knew this.

But the X1000?! Whoa!! What's the matter with that?!

Either the benchmark is terribly flawed somehow (it worked for the G4 systems, though), or something is really borked in the X1000 or PA6T HW, or in OS4 or the driver implementations. That number is so terrible that it simply can't possibly be true!! It should be close to twenty-five times faster!

So where is the flaw?

The test? The HW? The OS/Drivers?

:confused:
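The bus ceilings cited in the quote can be checked with some back-of-the-envelope arithmetic. This is a sketch under stated assumptions, not data from the benchmark: AGP is taken to run at about 66.67 MHz with a 32-bit data path, and the X1000's graphics slot is treated as a first-generation PCIe x16 link (2.5 GT/s per lane, 8b/10b encoding), which is one plausible source for the "twenty-five times" estimate. The helper names are hypothetical.

```python
# Back-of-the-envelope check of the quoted video bus figures.
# Assumptions (not from the benchmark): AGP runs at ~66.67 MHz with a 32-bit
# data path; the X1000 slot is treated as a PCIe 1.x x16 link (2.5 GT/s per
# lane, 8b/10b coding). MB means 10**6 bytes.


def agp_peak_mb_s(multiplier: int) -> float:
    """Theoretical AGP peak for AGP 1x, 2x, ... in MB/s."""
    clock_hz = 200e6 / 3  # ~66.67 MHz
    width_bytes = 4       # 32-bit bus
    return clock_hz * width_bytes * multiplier / 1e6


def pcie1_peak_mb_s(lanes: int) -> float:
    """Per-direction PCIe 1.x peak in MB/s after 8b/10b encoding."""
    transfers_per_s = 2.5e9
    return transfers_per_s * 8 / 10 / 8 * lanes / 1e6


# (machine, measured Ragemem WRITE in MB/s, theoretical peak in MB/s)
results = [
    ("peg2 (AGP 1x)", 221.0, agp_peak_mb_s(1)),
    ("a1-xe (sold as AGP 2x)", 169.0, agp_peak_mb_s(2)),
    ("X1000 (assumed PCIe 1.x x16)", 161.0, pcie1_peak_mb_s(16)),
]

for name, measured, peak in results:
    print(f"{name:30s} {measured:5.0f} of {peak:6.1f} MB/s "
          f"({measured / peak:4.0%}; peak is {peak / measured:4.1f}x measured)")

# The a1-xe result expressed as an effective "AGP Nx" figure
print(f"a1-xe effective rate: AGP {169.0 / agp_peak_mb_s(1):.2f}x")
```

Under these assumptions the Pegasos 2 reaches roughly 83% of its ceiling, the A1-XE works out to an effective "AGP 0.63x" (inside the 0.5x to 0.7x range mentioned above), and a 4000 MB/s ceiling is about 25 times the X1000's 161 MB/s.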


The test is "flawed." Ragemem uses its own custom CPU copy routine to copy to VRAM, and it's not very good. There is a huge penalty for transferring data in small blocks on the PCIe bus. DMA is the only way to get fast transfer rates.

Having said that, you can still get better transfer rates without DMA by using AltiVec. Have a look at the MemCopy results in this GfxBench2D result.
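Why small blocks hurt can be illustrated with a rough per-packet overhead model. Everything in it is an assumption rather than a measurement of the X1000: each CPU store is assumed to go out as its own posted memory-write packet (no write combining), each packet is assumed to carry about 20 bytes of overhead (start/end framing, sequence number, 3-DW header, LCRC), and the link is assumed to be PCIe 1.x x16 at 4000 MB/s per direction. A 4-byte payload stands in for a 32-bit integer copy loop, 16 bytes for an AltiVec vector store, and 128 bytes for a DMA burst; `effective_mb_s` is a hypothetical helper.

```python
# Rough PCIe write-efficiency model (assumptions, not X1000 measurements):
# - every store becomes its own posted memory-write packet (no write combining)
# - per-packet overhead: 1 B start + 2 B sequence + 12 B header + 4 B LCRC + 1 B end
# - link peak of 4000 MB/s (PCIe 1.x x16, after 8b/10b); ACK/flow-control DLLPs ignored
OVERHEAD_BYTES = 1 + 2 + 12 + 4 + 1
LINK_MB_S = 4000.0


def effective_mb_s(payload_bytes: int) -> float:
    """Payload throughput when each packet carries payload_bytes of data."""
    return LINK_MB_S * payload_bytes / (payload_bytes + OVERHEAD_BYTES)


for label, size in [("32-bit CPU store", 4),
                    ("AltiVec 16-byte store", 16),
                    ("128-byte DMA burst", 128)]:
    print(f"{label:22s} {effective_mb_s(size):7.0f} MB/s")
```

Even in this idealised model, payload size dominates throughput. The measured 161 MB/s is well below even the 4-byte figure, so other limits (for example, how many uncached stores the CPU can keep in flight) presumably dominate in practice.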

Hans


P.S. Before you ask, no, the driver doesn't have DMA yet. It's on the to-do list.
http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more. Home of the RadeonHD driver for Amiga OS 4.x project.