x56h34 wrote:
He will most likely blame it on the poorly written benchmarking software. ;-) :-P
Exactly. As we all know, it is fundamentally flawed of me to use the CPU to read longwords or move16s in a hand-written, icache-friendly asm loop to benchmark this, when used inside a locked environment, accumulatively timed and pre-calibrated using something as terribly inaccurate as the EClock over a several-second duration...
The card itself will do it all 10x faster over the exact same bus using Direct Magic Access.
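For anyone wondering how a long, EClock-timed run turns into a bandwidth figure, here is a minimal sketch of the arithmetic only. It assumes you have already accumulated the total bytes moved and the elapsed EClock ticks. The function name is hypothetical. The PAL/NTSC rates are the usual EClock frequencies (one tenth of the 7.09379/7.15909 MHz system clock). On a real machine you would use the rate that timer.device's ReadEClock() returns rather than a hard-coded constant.

```python
# Usual EClock rates in Hz (CPU clock / 10). On real hardware, prefer the
# rate reported by ReadEClock() instead of these constants.
ECLOCK_PAL_HZ = 709_379
ECLOCK_NTSC_HZ = 715_909


def throughput_mb_per_s(bytes_moved: int, eclock_ticks: int,
                        eclock_hz: int = ECLOCK_PAL_HZ) -> float:
    """Hypothetical helper: convert accumulated bytes and EClock ticks to MB/s.

    MB here means 2**20 bytes. A longer run (more ticks) shrinks the
    relative error of the one-tick timer resolution.
    """
    if eclock_ticks <= 0:
        raise ValueError("eclock_ticks must be positive")
    seconds = eclock_ticks / eclock_hz
    return bytes_moved / seconds / (1024 * 1024)


if __name__ == "__main__":
    # 30 MB moved in exactly 3 seconds of PAL EClock ticks -> 10.0 MB/s
    print(throughput_mb_per_s(30 * 1024 * 1024, 3 * ECLOCK_PAL_HZ))
```

Over a several-second run the tick count is in the millions, so a one-tick timing error is well under a millionth of the result. That is why timing over a long duration makes the EClock plenty accurate for this.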