@patrick
Well, it's a bit involved. It was designed to measure the performance of my direct-to-video-RAM pixel translations (a backend part of a system I have been designing).
However, if you run it without specifying a source pixel format, it just performs what is basically a copy operation. The code for this is loop-unrolled asm, moving 16 x 32 bits per iteration of the main loop. It so happens that this gives a pretty good estimate of the bus speed, which has proven to be the limiting factor on all systems so far, even directly connected cards like the BVision.
Later, when looking at 3D issues, I found I needed a quick test for VRAM access bandwidth (limited by the speed of the bus, basically), and it was handy to use my existing program.
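For illustration only, here is a rough C sketch of that kind of unrolled copy loop and a simple throughput measurement around it. The real code is asm and is not shown here; the names copy_unrolled16 and measure_copy_rate are hypothetical, and timing with the standard clock() call is an assumption made just to keep the sketch portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: copy `words` 32-bit words, 16 per loop iteration.
   `words` must be a multiple of 16. The destination is volatile so the
   compiler performs every store instead of eliding or merging them. */
static void copy_unrolled16(volatile uint32_t *dst, const uint32_t *src,
                            size_t words)
{
    assert(words % 16 == 0);
    for (size_t i = 0; i < words; i += 16) {
        dst[i + 0]  = src[i + 0];   dst[i + 1]  = src[i + 1];
        dst[i + 2]  = src[i + 2];   dst[i + 3]  = src[i + 3];
        dst[i + 4]  = src[i + 4];   dst[i + 5]  = src[i + 5];
        dst[i + 6]  = src[i + 6];   dst[i + 7]  = src[i + 7];
        dst[i + 8]  = src[i + 8];   dst[i + 9]  = src[i + 9];
        dst[i + 10] = src[i + 10];  dst[i + 11] = src[i + 11];
        dst[i + 12] = src[i + 12];  dst[i + 13] = src[i + 13];
        dst[i + 14] = src[i + 14];  dst[i + 15] = src[i + 15];
    }
}

/* Hypothetical helper: run `passes` full copies and return the observed
   rate in bytes per second, or 0.0 if the timer did not advance. */
static double measure_copy_rate(volatile uint32_t *dst, const uint32_t *src,
                                size_t words, int passes)
{
    clock_t start = clock();
    for (int p = 0; p < passes; p++)
        copy_unrolled16(dst, src, words);
    clock_t end = clock();

    double secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;
    return (double)words * sizeof(uint32_t) * (double)passes / secs;
}
```

Pointing dst at a mapped VRAM region and src at a buffer in ordinary RAM, with enough passes for the timer to register, would give a figure comparable in spirit to what the program reports, though a compiler-generated loop will not match hand-written asm exactly.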
As for line transfers and the like, I am not sure how likely that is. VRAM especially is usually mapped as non-cacheable due to its inherently volatile nature.
The PPC can indeed address anywhere the 680x0 can. That was part of the design aim of the first PPC cards.
If, however, the 680x0 can't write at full speed because of the bus logic limitations, it's fair to assume the PPC won't be much faster. Hence my thoughts about better bus logic (northbridge or whatever) that might be available for PPC-only systems.