
Author Topic: p96 is unbelievably Slow!  (Read 6326 times)


Offline wawrzon (Topic starter)

p96 is unbelievably Slow!
« on: December 18, 2010, 08:09:47 PM »
for quite a while i've been porting various sdl software to 68k. i have held off releasing it, though, because of a lot of flaws that turned out to be present in amiga system components such as warp3d. now i have taken a closer look at p96 performance, since p96 is the default and only gfx solution for my main a4k/060-50/mediator/voodoo3 setup.
 
today i ran sdl benchmarks on both of my secondary systems (a4k/060-50/cv64/cgx) and (a4k/060-50/p4/p96) to find out how to avoid the bad performance of sdl apps in little endian modes. the reason is that sdl opens these modes when switching to fullscreen.

the conclusion, which i have also posted on the german a1k forum, is groundbreaking!! http://www.a1k.org/forum/showthread.php?p=391077#post391077
 
1. the little endian modes are a speciality of p96 (i have them available on the picassoIV under p96, and they are the primary choice with the elbox/mediator/radeon solution). on my only cgx system, with the cv64, all modes seem to be big endian by default. i don't understand what little endian modes are for when they are so slow on amiga. perhaps newer gfx cards are not capable of big endian anymore (i have a radeon 9250, not tested yet); the voodoo can do them, though.
 
2. i've tested the sdl bench and an sdl port of mine, kuklomenos, on both secondary systems. i've compiled kuklomenos to work with both swsurface and hwsurface.
 
on p4/p96 both versions perform effectively under 0fps!!!!!!
on cv64/cgx i get around 5 fps with swsurface and a moderate 12 fps with hwsurface.
 
on my voodoo setup the little endian modes are also very slow, although not as slow; big endian is a little faster, but still no match for cgx, it seems.

i also remember a similar experience with amiblitz-compiled stuff way back.

i've been aware that p96 is slower than cgx, but this is really an ultimate showstopper.
we are talking about a real-world factor of 10 or so.

now: what to do??
 

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16867
  • Country: gb
  • Thanked: 4 times
Re: p96 is unbelievably Slow!
« Reply #1 on: December 18, 2010, 08:24:22 PM »
In my experience, the only performance conscious way to work with RTG is to pre-convert all your graphic assets to whatever hardware format the display is using. Never, ever assume colourspace conversion will happen in hardware.
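The pre-conversion idea can be sketched in plain C. This is only an illustration under stated assumptions: the display is taken to be a 16-bit RGB565 hicolor mode, the "PC" variant is assumed to store the low byte first, and the function names `pack_rgb565` and `preconvert_rgb888` are made up here and belong to no Amiga or SDL API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit R, G, B into one RGB565 value (5 bits red, 6 green, 5 blue). */
uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/*
 * Convert a packed RGB888 asset to 16-bit pixels once, at load time.
 * Output is written byte by byte, so the memory layout does not depend
 * on the endianness of the CPU running this code.
 * little_endian_mode != 0: low byte first (assumed "16bitPC" layout),
 * otherwise high byte first (assumed "16bit" layout).
 */
void preconvert_rgb888(const uint8_t *src, uint8_t *dst,
                       size_t pixels, int little_endian_mode)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < pixels; i++) {
        uint16_t p = pack_rgb565(src[3 * i], src[3 * i + 1], src[3 * i + 2]);
        uint8_t hi = (uint8_t)(p >> 8);
        uint8_t lo = (uint8_t)(p & 0xFF);
        if (little_endian_mode) {
            dst[2 * i] = lo;
            dst[2 * i + 1] = hi;
        } else {
            dst[2 * i] = hi;
            dst[2 * i + 1] = lo;
        }
    }
}
```

The point of doing this at load time is that the per-frame blit then becomes a plain memory copy, with no format or byte-order conversion in the inner loop.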

The only other way I've found (which in my case was ideal for supported hardware on OS3.x) is to make your own 2D drawing layer on top of say Warp3D. The advantages there are the ability to use alpha channel transparency and so on.
int p; // A
 

Offline Thematic

  • Jr. Member
  • **
  • Join Date: Aug 2003
  • Posts: 69
Re: p96 is unbelievably Slow!
« Reply #2 on: December 18, 2010, 08:29:11 PM »
Quote from: wawrzon;599784

on p4/p96 both versions perform effectively under 0fps!!!!!!

You mean "under 1 fps".

Quote from: wawrzon;599784

i've been aware that p96 is slower than cgx, but this is really an ultimate showstopper.
we are talking about a real-world factor of 10 or so.


Funny, it's been five years since I used an A1200-with-Voodoo system on a daily basis, but I do not remember it being terribly slow. I knew the bus speed wasn't the absolute best possible. Perhaps I used "heavy" software that would have slow refresh in any case. SDL on that system wasn't fast and I've been meaning to test how the new ScummVM AGA compares to RTG version when using a 320x resolution screen or window.

I have 68060 also, but Blizzard is slightly slower.
So you have the strings in your palm. Do you know what they are for?
 

Offline wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #3 on: December 18, 2010, 08:42:30 PM »
@karlos: you mean using the 3d hardware to output 2d gfx? yes, that works faster even on p96, although in some cases other w3d driver-specific problems are involved. for simple things it works, though.

but this is a hack, and it takes a lot of effort to adapt every single program, if that is possible at all.
my point is to make quick sdl application ports usable on 68k in general.

in p96 it boils down to:

1. do not use, or even disable, the little endian modes on your system. this isn't possible at this time, while the opposite (disabling the faster big endian modes) is possible and enabled by default. i don't understand this logic.

2. p96 is in general *very* slow in comparison to cgx. it is several times slower on faster/newer hardware. how much slower it is came as quite a shock to me! it means that if we had an optimized solution, the overall usability of p96-based amiga systems such as mediator boxes would be much higher. i am starting to wonder whether winuae (and eventually a future aros68k on top of it) is going to suffer from the same flaws. i hope not, but as far as i know the winuae gfx implementation is based on p96.

people, can we do anything about it?
 

Offline wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #4 on: December 18, 2010, 08:49:35 PM »
Quote from: Thematic;599788
You mean "under 1 fps".



Funny, it's been five years since I used an A1200-with-Voodoo system on a daily basis, but I do not remember it being terribly slow. I knew the bus speed wasn't the absolute best possible. Perhaps I used "heavy" software that would have slow refresh in any case. SDL on that system wasn't fast and I've been meaning to test how the new ScummVM AGA compares to RTG version when using a 320x resolution screen or window.

I have 68060 also, but Blizzard is slightly slower.


i mean the fps result is a fraction of a frame per second.

i've been using the voodoo system for years now, blaming its bad performance on zorro3 bus limitations alone. as we know, it does 7 MB/s. the cv64 can do about double that, but this does not explain a difference of this kind.

generally, warp3d performs about equally on both systems, i think. the problems are the 2d blitting and drawing functions, and also byte order swapping as soon as little endian is involved. but that one in particular is comparatively easy to solve: just use your 16bit modes instead of the 16bitPC ones.

and another thing: you likely won't notice the performance flaw when just moving windows around on workbench. it makes a difference in applications that really depend on fast screen redraw, like games, in particular sdl 2d games.
 

Offline Karlos

Re: p96 is unbelievably Slow!
« Reply #5 on: December 18, 2010, 08:53:17 PM »
@wawrzon

Not sure I can help you. I never use SDL in my coding projects I'm sorry to say. I'm one of those slightly weird people that gets more fun out of writing the middleware myself.
 

Offline wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #6 on: December 18, 2010, 09:08:00 PM »
hm, the only thing i can think of is that someone takes the p96 binaries, disassembles them, then fixes and optimizes them, like matthey did with warp3d for the voodoo. one could maybe use some clean-room approach to catch up with cgx. but this would be up to some dedicated person.

note, i'm not asking for help with my sdl ports; i am discussing a general issue whose fix could be an improvement for many 68k/p96 users, or even os4 users if these flaws apply there as well.
 

Offline itix

  • Hero Member
  • *****
  • Join Date: Oct 2002
  • Posts: 2380
Re: p96 is unbelievably Slow!
« Reply #7 on: December 18, 2010, 09:31:18 PM »
Quote
today i ran sdl benchmarks on both of my secondary systems (a4k/060-50/cv64/cgx) and (a4k/060-50/p4/p96) to find out how to avoid the bad performance of sdl apps in little endian modes.

The problem is that SDL doesn't support LE modes natively on BE machines (speaking of hicolor modes). You can only render into a BE framebuffer in fast RAM, and SDL has to convert the framebuffer to LE when copying the buffer to VMEM.

I had this problem on MorphOS when I was trying to get some games to run at a decent speed. It is sort of a design flaw in SDL.
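The extra work described above amounts to a byte swap on every 16-bit pixel during the copy to video memory. A minimal C sketch of such a loop follows; it is not SDL's actual code, the names `swap16` and `copy_swapped` are hypothetical, and pitches are given in pixels rather than bytes to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of one 16-bit pixel. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * Copy a width x height block of 16-bit pixels from src to dst,
 * swapping the byte order of every pixel on the way.
 * src_pitch and dst_pitch are row strides counted in pixels.
 */
void copy_swapped(const uint16_t *src, size_t src_pitch,
                  uint16_t *dst, size_t dst_pitch,
                  size_t width, size_t height)
{
    assert(src != NULL && dst != NULL);
    for (size_t y = 0; y < height; y++) {
        const uint16_t *s = src + y * src_pitch;
        uint16_t *d = dst + y * dst_pitch;
        for (size_t x = 0; x < width; x++) {
            d[x] = swap16(s[x]);
        }
    }
}
```

Compared with a straight copy, every pixel costs an extra shift-and-or (or a rotate) on the CPU, on top of the write over the bus to the graphics card.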
My Amigas: A500, Mac Mini and PowerBook
 

Offline wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #8 on: December 18, 2010, 10:27:31 PM »
@itix: this is understandable, and i do not expect this problem to be removed, as it is a general sdl issue. it can be worked around by not using any le modes, or by disabling them from the start. btw, morphos uses little endian for gfx? i wouldn't expect that, given that cgx provides quite good performance on classic, so it must be working in big endian.

i am not sure if i'm right here, but i recall cgx had some sort of fast support for alpha blits. how was that possible? maybe this is the issue?
 

Offline Karlos

Re: p96 is unbelievably Slow!
« Reply #9 on: December 18, 2010, 10:32:10 PM »
@wawrzon

Endianness is not such a problem for PPC machines, which can do byteswapping for load/store operations. Not sure if it is used in MOS or OS4, but I believe you can even designate areas of the address space as big or little endian using the MMU, at least on some PPC processors.
 

Offline itix

Re: p96 is unbelievably Slow!
« Reply #10 on: December 18, 2010, 11:36:29 PM »
Quote from: wawrzon;599808
@itix: this is understandable, and i do not expect this problem to be removed, as it is a general sdl issue. it can be worked around by not using any le modes, or by disabling them from the start.


Disabling LE 16bit modes is not an option because then some games will fail.

Quote

btw, morphos uses little endian for gfx? i wouldn't expect that, given that cgx provides quite good performance on classic, so it must be working in big endian.


AFAIK some gfx cards only support LE, so it was chosen as the default in CGX 5. I think it is because CGX 5 had to support all Radeon variants, unlike those phase5 gfx cards tailored for Amiga.

Quote

i am not sure if i'm right here, but i recall cgx had some sort of fast support for alpha blits. how was that possible? maybe this is the issue?


In CGX 5 there is WritePixelArrayAlpha() and BltBitMap[RastPort]Alpha(). Depending on the underlying hardware they can be GPU and AltiVec accelerated. But alpha blits are not usually a problem in SDL. MorphOS 1 didn't have accelerated alpha blits, yet nobody noticed it when alpha-blit-accelerated SDL became available for MorphOS 2 users.
 

Offline wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #11 on: December 19, 2010, 12:02:45 AM »
@itix: thanks for the info. thats what i was afraid of.
 

Offline wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #12 on: December 19, 2010, 01:15:18 AM »
ok, now i have compiled a simple test that actually does not confirm my previous conclusions. the executable will be here for a few days in case anybody wants to try it for themselves:

http://www.daten-transport.de/?id=SY75DmCyVKkE

the comparison actually indicates that 16bit and 16bitPC are about equally fast, but the cv64 is two times faster than the picasso4. this is what one would expect hardware-wise, but then the test is pretty simple, so perhaps the task is to find the particular functions that are so slow on p96.
-----------------------------

a4k060-50/cv64/cgx/16bit

Pitch = 640
Hardware surfaces avail  = 1
Window manager avail     = 1
Blitter hardware         = 1
Colorkey blit hardware   = 0
Alpha blit hardware      = 0
Software->Hardware accel = 1
Video memory             = 8000

                          320x240  320x240  640x480  640x480
                          software hardware software hardware
Slow points (frames/sec): 0.265024 0.265217 0.0335766 0.0335769
Fast points (frames/sec):  11.6954  11.6927  2.95159  2.95166
   Rect fill (rects/sec):  536.196  535.775  124.385  124.362
 32x32 blits (blits/sec):  2727.03  2714.38  2385.56  2393.92

a4k060-50/p4/p96/16bit

Pitch = 640
Hardware surfaces avail  = 1
Window manager avail     = 1
Blitter hardware         = 1
Colorkey blit hardware   = 0
Alpha blit hardware      = 0
Software->Hardware accel = 1
Video memory             = 8000

                          320x240  320x240  640x480  640x480
                          software hardware software hardware
Slow points (frames/sec): 0.161336 0.161564 0.0208532 0.0208488
Fast points (frames/sec):  8.86488  8.86949  2.27465  2.27327
   Rect fill (rects/sec):  246.911  246.658  66.2494  66.2622
 32x32 blits (blits/sec):   1033.3  1035.39  964.673  965.127

a4k060-50/p4/p96/16bitPC

Pitch = 640
Hardware surfaces avail  = 1
Window manager avail     = 1
Blitter hardware         = 1
Colorkey blit hardware   = 0
Alpha blit hardware      = 0
Software->Hardware accel = 1
Video memory             = 8000

                          320x240  320x240  640x480  640x480
                          software hardware software hardware
Slow points (frames/sec): 0.164305 0.164376 0.0209344 0.0209436
Fast points (frames/sec):  8.93449  8.93293  2.27618  2.27695
   Rect fill (rects/sec):  280.222  280.183   67.207   67.163
 32x32 blits (blits/sec):  1486.21  1486.75  1204.71  1198.71


and winuae on my 2ghz notebook just for lolz:

Pitch = 640
Hardware surfaces avail  = 1
Window manager avail     = 1
Blitter hardware         = 1
Colorkey blit hardware   = 0
Alpha blit hardware      = 0
Software->Hardware accel = 1
Video memory             = 8000

                          320x240  320x240  640x480  640x480
                          software hardware software hardware
Slow points (frames/sec):  10.7239  10.4439  1.34612  1.34476
Fast points (frames/sec):  230.423  200.156  55.2319  54.9946
   Rect fill (rects/sec):  10893.6  10722.5  5050.55  4970.87
 32x32 blits (blits/sec):  15283.6    14681  14733.8  15058.8
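
To put numbers on the comparison, the ratios can be computed from the 320x240 software-surface column of the three 68k tables above. A small C sketch, in which the struct, the array name `rows` and the `ratio` helper are made up for illustration; only the figures come from the tables:

```c
#include <assert.h>

/* One benchmark row, 320x240 software-surface column only. */
struct bench_row {
    const char *name;
    double cv64_cgx;   /* a4k060-50/cv64/cgx/16bit  */
    double p4_p96;     /* a4k060-50/p4/p96/16bit    */
    double p4_p96_pc;  /* a4k060-50/p4/p96/16bitPC  */
};

/* Figures copied from the result tables in this post. */
const struct bench_row rows[] = {
    { "Slow points (frames/sec)", 0.265024, 0.161336, 0.164305 },
    { "Fast points (frames/sec)", 11.6954,  8.86488,  8.93449  },
    { "Rect fill (rects/sec)",    536.196,  246.911,  280.222  },
    { "32x32 blits (blits/sec)",  2727.03,  1033.3,   1486.21  },
};

/* How many times faster a is than b. */
double ratio(double a, double b)
{
    assert(b > 0.0);
    return a / b;
}
```

For example, `ratio(rows[3].cv64_cgx, rows[3].p4_p96)` gives the blit advantage of cv64/cgx over p4/p96. With these figures the cv64/cgx lead ranges from roughly 1.3x (fast points) to roughly 2.6x (32x32 blits), and on the p4 the 16bitPC blits come out about 1.4x faster than the 16bit ones.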
 

Offline stevieu

  • Jr. Member
  • **
  • Join Date: Sep 2007
  • Posts: 84
    • http://myspace.com/stevieu83
Re: p96 is unbelievably Slow!
« Reply #13 on: December 19, 2010, 03:03:11 AM »
Who knows :)

Maybe it was psychological, but I found OS3.9 running CyberGraphX much more responsive with a BVision than I did a Mediator with a Voodoo 3 under P96.. maybe that was due to bus speed? Either way, it was never as nippy for me.

Steve

Offline Karlos

Re: p96 is unbelievably Slow!
« Reply #14 on: December 19, 2010, 03:10:33 AM »
If I recall correctly, on my 68040/BVision, I can get up to 17MB/s copy to VRAM (using a loop unrolled move16 transfer); a regular move.l based copy manages around 15 or so.

On my 040/Mediator/Voodoo setup, the speed was around 9-11MB/s maximum and that's a slightly faster 68040.
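
For a rough feel of what such copy rates mean for full-screen redraws, the upper bound on frame rate is simply bandwidth divided by frame size. A minimal C sketch, assuming 1 MB means 10^6 bytes and that the whole frame is copied to VRAM once per redraw; `max_fps` is a hypothetical helper, not part of any library:

```c
#include <assert.h>

/*
 * Upper bound on full-frame copies per second for a given
 * CPU-to-VRAM copy rate. Ignores all drawing and conversion work.
 */
double max_fps(double bytes_per_sec, unsigned width, unsigned height,
               unsigned bytes_per_pixel)
{
    double frame_bytes = (double)width * height * bytes_per_pixel;
    assert(frame_bytes > 0.0);
    return bytes_per_sec / frame_bytes;
}
```

A 640x480 screen in 16 bit is 614400 bytes per frame, so `max_fps(9e6, 640, 480, 2)` comes to about 14.6 and `max_fps(17e6, 640, 480, 2)` to about 27.7, before any rendering or byte swapping is counted.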