Author Topic: p96 is unbelievably Slow! (Read 13608 times)

Karlos · « **on:** December 18, 2010, 08:24:22 PM »

In my experience, the only performance-conscious way to work with RTG is to pre-convert all your graphic assets to whatever hardware format the display is using. Never, ever assume colourspace conversion will happen in hardware.
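As a rough illustration of pre-converting assets, here is a minimal C sketch that converts a 24-bit RGB image to 16-bit RGB565 once, at load time, so that later drawing is a plain copy. It assumes the display happens to use RGB565 in host byte order. The function names are hypothetical, and a real program would query the actual screen format rather than hard-code one.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack one 8-bit-per-channel pixel into 16-bit RGB565 (5 bits red,
   6 bits green, 5 bits blue), keeping the most significant bits. */
static uint16_t rgb888_to_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert a whole asset once, up front, so no per-frame colourspace
   conversion is needed. Assumes the target format is RGB565 in host
   byte order; a little-endian ("PC") mode would also need a byteswap. */
void convert_asset_rgb565(const uint8_t *rgb, uint16_t *out, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++) {
        out[i] = rgb888_to_rgb565(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
    }
}
```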

The only other way I've found (which in my case was ideal for supported hardware on OS3.x) is to make your own 2D drawing layer on top of, say, Warp3D. The advantage there is that you get things like alpha-channel transparency.

Karlos · « **Reply #1 on:** December 18, 2010, 08:53:17 PM »

@wawrzon

Not sure I can help you. I never use SDL in my coding projects, I'm sorry to say. I'm one of those slightly weird people who get more fun out of writing the middleware myself.

Karlos · « **Reply #2 on:** December 18, 2010, 10:32:10 PM »

@wawrzon

Endianness is not such a problem for PPC machines, which have byte-reversing load/store instructions. I'm not sure whether it is used in MOS or OS4, but I believe that on at least some PPC processors you can even designate areas of the address space as big- or little-endian using the MMU.
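For cards that expect little-endian pixels, a minimal, portable C sketch of the swap is below. The helper names are hypothetical. Whether a compiler turns the shifts into PPC byte-reversed loads/stores (lwbrx/stwbrx) is compiler-dependent, so treat that as an optimisation you might get rather than one you can rely on.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit word, e.g. to turn a big-endian
   ARGB pixel into the little-endian layout some cards expect. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Copy 'count' 32-bit pixels, swapping each one on the way. */
void copy_swapped32(uint32_t *dst, const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        dst[i] = bswap32(src[i]);
    }
}
```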

Karlos · « **Reply #3 on:** December 19, 2010, 03:10:33 AM »

If I recall correctly, on my 68040/BVision I can get up to 17MB/s copying to VRAM (using a loop-unrolled move16 transfer), while a regular move.l-based copy manages around 15MB/s.

On my 040/Mediator/Voodoo setup, the speed was around 9-11MB/s at most, and that's with a slightly faster 68040.
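To put those copy rates in perspective, here is a small C helper that turns a bandwidth figure into a ceiling on full-screen refreshes per second. The 640x480, 16-bit screen and MB = 2^20 bytes are assumptions for illustration, not figures from the tests above.

```c
/* Upper bound on full-screen refreshes per second if every frame is
   pushed over the bus at the given copy rate (MB taken as 2^20 bytes). */
static double max_fps(double mb_per_s, unsigned width, unsigned height,
                      unsigned bytes_per_pixel)
{
    double frame_bytes = (double)width * height * bytes_per_pixel;
    return mb_per_s * 1024.0 * 1024.0 / frame_bytes;
}
```

For that assumed screen, max_fps(17.0, 640, 480, 2) comes out at about 29 frames/s, while 9 to 11MB/s gives roughly 15 to 19 frames/s, and that is before any actual rendering work.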

Karlos · « **Reply #4 on:** December 19, 2010, 03:27:48 AM »

Don't take the Voodoo figures too seriously; they are off the top of my head. I need to find them (or retest, but that machine is currently in need of attention). The BVision figures are good, though.

I experimented a lot with move16 for both copying and other operations, like byteswap copying. Here, you allocate a cache-line-aligned block on the stack, read data from the source, swapping as you go, then use move16 to copy the block out to VRAM. If you allocate enough cache-aligned space (say 64 bytes), you can unroll your transfer loop 4x, which was about ideal. With some carefully optimised routines you could even handle misaligned data, since the misalignment is dealt with while reading from the source rather than while transferring to the bitmap.
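The staged byteswap copy described above can be sketched in portable C as follows. This is a hedged illustration, not the original routine: copy_line16 is a hypothetical stand-in for a single move16 line transfer (memcpy is used so the sketch compiles anywhere), and the 16-bit pixel format, the 64-byte staging block and the requirement that the length be a multiple of 64 are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 16                        /* one 68040 cache line = one move16 */
#define STAGE_BYTES (LINE_BYTES * 4)         /* 64 bytes: transfer loop unrolled 4x */

/* Hypothetical stand-in for move16 (aN)+,(aM)+ on a real 68040. */
static void copy_line16(uint8_t *dst, const uint8_t *src)
{
    memcpy(dst, src, LINE_BYTES);
}

/* Copy 'bytes' bytes of 16-bit pixels from src to dst, swapping the two
   bytes of every pixel. src may have any alignment; dst is assumed to be
   16-byte aligned and 'bytes' a multiple of 64 (a real routine would
   handle the ragged head and tail separately). */
void swap_copy_to_vram(uint8_t *dst, const uint8_t *src, size_t bytes)
{
    /* Over-allocate on the stack and round up to a 16-byte boundary. */
    uint8_t raw[STAGE_BYTES + LINE_BYTES];
    uint8_t *stage = (uint8_t *)(((uintptr_t)raw + (LINE_BYTES - 1)) &
                                 ~(uintptr_t)(LINE_BYTES - 1));

    for (size_t done = 0; done < bytes; done += STAGE_BYTES) {
        const uint8_t *s = src + done;

        /* Read side: byte loads, swapping each pixel into the aligned block. */
        for (size_t i = 0; i < STAGE_BYTES; i += 2) {
            stage[i] = s[i + 1];
            stage[i + 1] = s[i];
        }

        /* Write side: four line transfers, unrolled. */
        copy_line16(dst + done, stage);
        copy_line16(dst + done + 16, stage + 16);
        copy_line16(dst + done + 32, stage + 32);
        copy_line16(dst + done + 48, stage + 48);
    }
}
```

Because all the swapping happens on the read side with byte loads, src can sit at any address; only the staging block and dst need 16-byte alignment, which is what move16 itself requires.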

I'm not sure why move16 was faster to BVision VRAM, and the gain didn't show up on every system tested. However, on the BVision it was never slower. On some other cards, like the CVision64, IIRC it actually was slower.

All very hardware-dependent.

Karlos · « **Reply #5 on:** December 20, 2010, 11:45:08 PM »

Quote from: wawrzon;600228

I've found the reason why p4/p96 on one of my machines was so slow. I removed the CyberStorm RAM module and forgot it, lying disassembled on my desk right in front of my eyes. How dumb is that!!!! :facepalm: The machine was running from Z3 and mobo memory all the time!!

We've all had days like that...
