
Author Topic: p96 is unbelievably Slow!  (Read 11535 times)


wawrzon (Topic starter)

p96 is unbelievably Slow!
« on: December 18, 2010, 08:09:47 PM »
for quite a while now i have been porting various sdl stuff to 68k. i have held off releasing it though, because of a lot of flaws that turned out to be present in amiga system solutions, warp3d for instance. now i have taken a closer look at p96 performance, since p96 is the default and only gfx solution for my main a4k/060-50/mediator/voodoo3 setup.
 
today i ran sdl benchmarks on both of my secondary systems (a4k/060-50/cv64/cgx and a4k/060-50/p4/p96) to find out how to avoid the bad performance of sdl apps in little endian modes. the reason is that sdl opens these modes when toggling to fullscreen.

the conclusion, which i have also posted on the german a1k forum, is groundbreaking!! http://www.a1k.org/forum/showthread.php?p=391077#post391077
 
1. the little endian modes are a speciality of p96 (i have them available on the picassoIV under p96, and they are the primary choice with the elbox/mediator/radeon solution). on my only cgx system, with the cv64, all modes seem to be big endian by default. i dont understand what little endian modes are for when they are so slow on amiga. perhaps newer gfx cards are not capable of big endian anymore (i have a radeon 9250, not tested yet); the voodoo can do them though.
 
2. i have tested sdl bench and an sdl port of mine, kuklomenos, on both secondary systems. i have compiled kuklomenos to work with both swsurface and hwsurface.
 
on p4/p96 both versions perform effectively under 0fps!!!!!!
on cv64/cgx i get around 5fps with swsurface and a moderate 12fps with hwsurface.
 
also on my voodoo setup the little endian modes are very slow, although not as slow, with big endian being a little faster. still no match for cgx, it seems.

in this respect i also remember a similar experience with amiblitz compiled stuff way back.

i have been aware that p96 is slower than cgx, but this is really the ultimate showstopper here.
we are talking about a real world factor of 10 or so.

now: what to do??
 

wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #1 on: December 18, 2010, 08:42:30 PM »
@karlos: you mean using the 3d hardware to output 2d gfx? yes, it works faster even on p96, though in some cases other w3d driver specific problems get involved. for simple things it works though.

but this is a hack, and it takes a lot of effort to adapt every single program, if that is possible at all.
my point is to make quick sdl application ports usable on 68k in general.

in p96 it boils down to:

1. do not use, or even disable, the little endian modes on your system. this isnt possible at this time, while the opposite (disabling the faster big endian modes) is possible and enabled by default. i dont understand this logic.

2. p96 is in general *very* slow in comparison to cgx. it is several times slower on faster/newer hardware. how much slower it is comes as quite a shock to me! this means that if we had an optimized solution, the overall usability of p96 based amiga systems like mediator boxes would be much higher. i start to wonder whether winuae (and eventually a future aros68k on top of that) is going to suffer from such flaws too. i hope not, but as far as i know the winuae gfx implementation is based on p96.

people, can we do anything about it?
 

wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #2 on: December 18, 2010, 08:49:35 PM »
Quote from: Thematic;599788
You mean "under 1 fps".



Funny, it's been five years since I used an A1200-with-Voodoo system on a daily basis, but I do not remember it being terribly slow. I knew the bus speed wasn't the absolute best possible. Perhaps I used "heavy" software that would have slow refresh in any case. SDL on that system wasn't fast and I've been meaning to test how the new ScummVM AGA compares to RTG version when using a 320x resolution screen or window.

I have 68060 also, but Blizzard is slightly slower.


i mean the fps result is a fraction of a frame per second.

i have been using the voodoo system for years now, blaming its bad performance on zorro3 bus limitations alone. as we know it does 7 MB/s. the cv64 can do about double that, but this is no explanation for this kind of difference.
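
to put numbers on the bus argument, here is a minimal sketch in C. it assumes that every frame is copied over the bus in full, that pixels are 16 bit, and that 1 MB means 10^6 bytes. the function name max_fps and the 320x240 example resolution are made up for illustration.

```c
#include <assert.h>

/* Upper bound on full-screen updates per second when each frame has to
 * cross the bus once. All parameters are illustrative assumptions. */
double max_fps(double bus_bytes_per_sec, unsigned width, unsigned height,
               unsigned bytes_per_pixel)
{
    double frame_bytes = (double)width * (double)height * (double)bytes_per_pixel;

    assert(frame_bytes > 0.0);
    return bus_bytes_per_sec / frame_bytes;
}
```

under these assumptions, max_fps(7e6, 320, 240, 2) gives roughly 45 updates per second and doubling the bandwidth gives roughly 91, so the bus alone accounts for a factor of about two at most.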

generally warp3d performs about equally on both systems, i think. the problems are the 2d blitting and drawing functions, and also byte order swapping as soon as little endian is involved. but the latter in particular is comparatively easy to solve: just use your 16bit instead of the 16bitPC modes.
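
for reference, this is what byte order swapping of a row of 16 bit pixels amounts to. it is a plain C sketch, not the actual code path of p96 or sdl, and the names swap16 and swap_pixels16 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of one 16-bit pixel, e.g. big endian 16bit
 * to little endian 16bitPC or back. */
static inline uint16_t swap16(uint16_t px)
{
    return (uint16_t)((px << 8) | (px >> 8));
}

/* Swap every pixel of a row in place. */
void swap_pixels16(uint16_t *pixels, size_t count)
{
    assert(pixels != NULL || count == 0);
    for (size_t i = 0; i < count; ++i)
        pixels[i] = swap16(pixels[i]);
}
```

a pixel value of 0x1234 becomes 0x3412, and swapping twice restores the original. doing this in software for every pixel of every frame is extra cpu work that a native big endian mode does not need.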

and another thing: you likely wont notice the performance flaw just moving windows around workbench. it makes a difference in applications that really depend on fast screen redraw, like games, in particular sdl 2d games.
 

wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #3 on: December 18, 2010, 09:08:00 PM »
hm, the only thing i can think of is that someone takes the p96 binaries, disassembles them, and fixes and optimizes them, like matthey did with warp3d for voodoo. one could maybe use some clean room approach to catch up with cgx. but this would be up to some dedicated person.

note, im not asking for help with my sdl ports. i am discussing a general issue whose fix might be an improvement for many 68k/p96 users, or even os4 users if these flaws apply there as well.
 

wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #4 on: December 18, 2010, 10:27:31 PM »
@itix: this is understandable, and i do not expect this problem to be removed, as it is a general sdl issue. it can be worked around by not using any le modes, or by disabling them from the start. btw, morphos uses little endian for gfx? i wouldnt expect that, given that cgx provides quite good performance on classic, so it must be working in big endian.

i am not sure if im right here, but i recall cgx had some sort of fast support for alpha blits. how was that possible? maybe this is the issue?
 

wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #5 on: December 19, 2010, 12:02:45 AM »
@itix: thanks for the info. thats what i was afraid of.
 

wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #6 on: December 19, 2010, 01:15:18 AM »
ok, now i have compiled a simple test that actually does not confirm my previous conclusions. the executable will be here for a few days in case anybody would like to try it for himself:

http://www.daten-transport.de/?id=SY75DmCyVKkE

the comparison actually indicates that 16bit and 16bitPC are about equally fast, but the cv64 is about two times faster than the picasso4. this is what is to be expected hardware-wise, but then the test is pretty simple, so perhaps the question is to find the particular functions that are so slow on p96.
-----------------------------

a4k060-50/cv64/cgx/16bit

Pitch = 640
Hardware surfaces avail  = 1
Window manager avail     = 1
Blitter hardware         = 1
Colorkey blit hardware   = 0
Alpha blit hardware      = 0
Software->Hardware accel = 1
Video memory             = 8000

                            320x240   320x240   640x480   640x480
                           software  hardware  software  hardware
Slow points (frames/sec):  0.265024  0.265217 0.0335766 0.0335769
Fast points (frames/sec):   11.6954   11.6927   2.95159   2.95166
   Rect fill (rects/sec):   536.196   535.775   124.385   124.362
 32x32 blits (blits/sec):   2727.03   2714.38   2385.56   2393.92

a4k060-50/p4/p96/16bit

Pitch = 640
Hardware surfaces avail  = 1
Window manager avail     = 1
Blitter hardware         = 1
Colorkey blit hardware   = 0
Alpha blit hardware      = 0
Software->Hardware accel = 1
Video memory             = 8000

                            320x240   320x240   640x480   640x480
                           software  hardware  software  hardware
Slow points (frames/sec):  0.161336  0.161564 0.0208532 0.0208488
Fast points (frames/sec):   8.86488   8.86949   2.27465   2.27327
   Rect fill (rects/sec):   246.911   246.658   66.2494   66.2622
 32x32 blits (blits/sec):    1033.3   1035.39   964.673   965.127

a4k060-50/p4/p96/16bitPC

Pitch = 640
Hardware surfaces avail  = 1
Window manager avail     = 1
Blitter hardware         = 1
Colorkey blit hardware   = 0
Alpha blit hardware      = 0
Software->Hardware accel = 1
Video memory             = 8000

                            320x240   320x240   640x480   640x480
                           software  hardware  software  hardware
Slow points (frames/sec):  0.164305  0.164376 0.0209344 0.0209436
Fast points (frames/sec):   8.93449   8.93293   2.27618   2.27695
   Rect fill (rects/sec):   280.222   280.183    67.207    67.163
 32x32 blits (blits/sec):   1486.21   1486.75   1204.71   1198.71


and winuae on my 2ghz notebook just for lolz:

Pitch = 640
Hardware surfaces avail  = 1
Window manager avail     = 1
Blitter hardware         = 1
Colorkey blit hardware   = 0
Alpha blit hardware      = 0
Software->Hardware accel = 1
Video memory             = 8000

                          320x240  320x240  640x480  640x480
                          software hardware software hardware
Slow points (frames/sec):  10.7239  10.4439  1.34612  1.34476
Fast points (frames/sec):  230.423  200.156  55.2319  54.9946
   Rect fill (rects/sec):  10893.6  10722.5  5050.55  4970.87
 32x32 blits (blits/sec):  15283.6    14681  14733.8  15058.8
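
the relation between the cv64/cgx and p4/p96 16bit results can be expressed as ratios. here is a small C sketch that uses the 320x240 software column of those two tables; the array and function names are made up for illustration.

```c
#include <assert.h>

enum { NUM_TESTS = 4 };

/* 320x240 software column, in the order: slow points, fast points,
 * rect fill, 32x32 blits. Figures copied from the tables above. */
static const double cv64_cgx_16bit[NUM_TESTS] = {0.265024, 11.6954, 536.196, 2727.03};
static const double p4_p96_16bit[NUM_TESTS]   = {0.161336, 8.86488, 246.911, 1033.3};

/* out[i] = candidate[i] / baseline[i]; values above 1 mean the candidate is faster. */
void compute_ratios(const double *candidate, const double *baseline,
                    double *out, int n)
{
    for (int i = 0; i < n; ++i) {
        assert(baseline[i] > 0.0);
        out[i] = candidate[i] / baseline[i];
    }
}
```

for these figures the ratios range from about 1.3 (fast points) to about 2.6 (32x32 blits), so "two times" is a rough average rather than a constant factor.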
 

wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #7 on: December 19, 2010, 03:17:58 AM »
im evaluating it.
here is my current kuklomenos port, which has been the reason for all this uproar:
http://www.daten-transport.de/?id=XfGNZPE2AaEn
it is compiled for hwsurface, which means the graphics go into the vram. source included.
 

wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #8 on: December 19, 2010, 03:20:49 AM »
@karlos: the figures are slightly higher than i would even expect for voodoo; its 7 MB/s for me with a4k/060. bvision sounds likely though. but thats thanks to your optimized code, i take it.
 

wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #9 on: December 19, 2010, 04:25:40 AM »
another quick attempt at an sdl test application.
http://www.daten-transport.de/?id=EXKChmTTKWvf
this should run without ixemul as well. i have compiled it for 020 without any optimizations or weird options. it works very well on my cgx/cv64 setup. it crawls on pIV/p96. and it runs on uae. what should i say?
 

wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #10 on: December 19, 2010, 04:28:43 AM »
@matthey: not bad, even if the cv still looks better. but i am starting to suspect the test doesnt reflect sdl reality very well. try the one above.
« Last Edit: December 19, 2010, 04:49:33 AM by wawrzon »
 

wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #11 on: December 19, 2010, 05:03:28 AM »
now, i updated out of folly without a backup, and there goes the damn sfs again!!! grr!! access outside the partition and the like. i have not used the damn machines for more than a year and forgot they are still infected with this superfu**ed_up_filesystem!

gulliver, just to be sure: have you tested the update with muforce on?
« Last Edit: December 19, 2010, 05:53:05 AM by wawrzon »
 

wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #12 on: December 19, 2010, 05:49:50 AM »
ok, i can report that gullivers new libraries give me a significant speedup with pig and sdlbench (details after the sun rises), but alas not with kuklomenos. i have the impression they are quite buggy though and need a serious cleanup. i will probably have to back up and replace my filesystems on the p4 machine here before i go on with it, which i will not do before the end of the year. but i might check the stuff more carefully on the voodoo system later today.

may i ask, who made these fixes?

@matthey: strange, i get much higher results with these libs on the p4 than you do on voodoo. is that to be trusted? sorcery?
« Last Edit: December 19, 2010, 06:20:16 AM by wawrzon »
 

wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #13 on: December 20, 2010, 12:03:22 AM »
okay, i have updated my voodoo3 setup as well, by hand this time. there was not much to do, most libs were up to date. also my picture.datatype is newer than the one contained in the archive. i dont recall where it is from. maybe it is some wos version, i will have to take a look at the version number. all is working well, no hits, not much difference from before, except a slight speedup in "pig" in little endian mode. in kuklomenos im getting 10fps in big endian and 7 in little.

on the p4 things are quite different. there is quite a boost, but not in kuklomenos. i wonder why it is so slow. i suspect p96 is quite slow at drawing lines. if i understood you correctly, matt, cgx or p96 is also used to draw lines in 3d. is this correct? and this too is especially slow in w3d.

apart from that, i now also get a hit on startup with the picassoIV system. but that might be due to the corrupted filesystem. i have to get that in order first. here is a log, though i dont think it will indicate anything without the sources.
--------------------------------------
30-Sep-08  22:29:33
LONG READ from 00000020                        PC: 07373FE8
USP : 070B0FAC SR: 0010  (U0)(-)(-)  TCB: 070B0728
Data: 00000000 00000060 00000044 00000084 073579EC 07357ABA 07357ADE 00000000
----> 073579EC - "Work:Libs/Picasso96/rtg.library"  Hunk 0000 Offset 0000016C
----> 07357ABA - "Work:Libs/Picasso96/rtg.library"  Hunk 0000 Offset 0000023A
----> 07357ADE - "Work:Libs/Picasso96/rtg.library"  Hunk 0000 Offset 0000025E
Addr: 00000000 FFFFFFFF 0738D354 0738D354 0735766E 0738D354 000046CC 07002340
Stck: 070008D4 0738A18C 00F81CBA 00000400 00000000 00000000 00000000 00000000
Stck: 00000000 07357ADE 0735766E 073586A6 00000000 073581BB 070B06B0 00F81100
Stck: 07357884 0738A18C 00FD56BE 070B0728 070008D4 00FD560E 00F8A06E 00000800
Stck: 72616D6C 69620000 00000000 070B0710 070B0772 00000000 00000001 070B1030
Stck: 00000014 00000002 003E26A2 003E3AFA 003F5571 00000000 00000000 00000038
Stck: 00000001 FFFFFFFF 070B11CC 01C50F87 00000001 00000001 00F8F5DE 00F8F6C2
Stck: 00F8F6B6 070B104C 00000000 000000C4 FEDCBA98 00000000 00000000 070113D8
Stck: 0000A4A4 10100000 00010005 00000000 00000000 0000A4A4 00101008 4DEF0000
----> 07373FE8 - "Work:Libs/Picasso96/rtg.library"  Hunk 0002 Offset 00017D58
----> 00F81CBA - "ROM - exec 45.20 (6.1.2002)"  Hunk 0000 Offset 00001C0C
----> 07357ADE - "Work:Libs/Picasso96/rtg.library"  Hunk 0000 Offset 0000025E
----> 073586A6 - "Work:Libs/Picasso96/rtg.library"  Hunk 0000 Offset 00000E26
----> 073581BB - "Work:Libs/Picasso96/rtg.library"  Hunk 0000 Offset 0000093B
----> 00F81100 - "ROM - exec 45.20 (6.1.2002)"  Hunk 0000 Offset 00001052
----> 07357884 - "Work:Libs/Picasso96/rtg.library"  Hunk 0000 Offset 00000004
----> 00FD56BE - "ROM - ramlib 40.2 (5.3.93)"  Hunk 0000 Offset 000003E6
----> 00FD560E - "ROM - ramlib 40.2 (5.3.93)"  Hunk 0000 Offset 00000336
----> 00F8A06E - "ROM - dos 40.3 (1.4.93)"  Hunk 0000 Offset 000005B2
----> 00F8F5DE - "ROM - dos 40.3 (1.4.93)"  Hunk 0000 Offset 00005B22
----> 00F8F6C2 - "ROM - dos 40.3 (1.4.93)"  Hunk 0000 Offset 00005C06
----> 00F8F6B6 - "ROM - dos 40.3 (1.4.93)"  Hunk 0000 Offset 00005BFA
PC-8: FF0C60BE 2F0E6068 700043FA 00424EAE FDD82C40 70FF91C8 72604EAE FFB82040
PC *: 20280020 41FAFFD2 208041FA FF7C2080 41FAFF60 208041FA FF0A2080 41FAF7B8
07373fc4 :  6dff fffe ff0c             blt.l $7363ed2 ;extended opcode
07373fca :  60be                       bra.s $7373f8a
07373fcc :  2f0e                       move.l a6,-(a7)
07373fce :  6068                       bra.s $7374038
07373fd0 :  7000                       moveq.l #$0,d0
07373fd2 :  43fa 0042                  lea.l $7374016(pc),a1
07373fd6 :  4eae fdd8                  jsr -$228(a6)
07373fda :  2c40                       movea.l d0,a6
07373fdc :  70ff                       moveq.l #-$1,d0
07373fde :  91c8                       suba.l a0,a0
07373fe0 :  7260                       moveq.l #$60,d1
07373fe2 :  4eae ffb8                  jsr -$48(a6)
07373fe6 :  2040                       movea.l d0,a0
07373fe8 : *2028 0020                  move.l $20(a0),d0
07373fec :  41fa ffd2                  lea.l $7373fc0(pc),a0
07373ff0 :  2080                       move.l d0,(a0)
07373ff2 :  41fa ff7c                  lea.l $7373f70(pc),a0
07373ff6 :  2080                       move.l d0,(a0)
07373ff8 :  41fa ff60                  lea.l $7373f5a(pc),a0
07373ffc :  2080                       move.l d0,(a0)
07373ffe :  41fa ff0a                  lea.l $7373f0a(pc),a0
07374002 :  2080                       move.l d0,(a0)
07374004 :  41fa f7b8                  lea.l $73737be(pc),a0
Name: "ramlib"
 

wawrzon (Topic starter)

Re: p96 is unbelievably Slow!
« Reply #14 on: December 20, 2010, 04:42:45 PM »
Quote from: matthey;600052
If you look at the speed results at http://www.amigaspeed.de.vu/, you will see that the Voodoo 3 is almost 10 times faster at 2D line drawing than the Picasso 4 with CGFX 4. It also looks like P96 is a little faster than CGFX for line drawing with the Voodoo 3+ at least. I would expect the Picasso 4 driver to be good with P96 as well. The Picasso 4 is slow compared to Voodoo 3+ except where the gfx bus speed matters (bitmaps).

im using the picasso4 with a p96 system and the cv64 with cgx (4?) respectively, so this should be the optimal setup. though since i have to get rid of sfs anyway, im going to install mirror partitions with the alternative gfx system on each of the machines for the sake of testing.

Quote

No. I don't think so. Sorry if I misled you. The Avenger libraries do call the appropriate CGFX or P96 Warp3D libraries which call the appropriate CGFX or P96 functions but these shouldn't be used for 3D lines. 3D lines need the Z value and are affected by the Z buffer. They are actually drawn as triangles as the Avenger has no support for 3D lines (or points). The Avenger does have support for 2D line drawing that is very fast in comparison. It would be very inefficient to use W3D_DrawLine() or similar to draw 2D lines.

i stand corrected. i still have the impression though that line drawing in w3d is generally not as fast as it should be. see 4dtris and snake3d. btw, have you corrected that issue in your w3d libs where the color texturing loses saturation (foobillard, brass parts)? i can look up which function is actually used there if you dont know what im talking about.

Quote


I doubt it is the filesystem. It looks like a NULL pointer that is not tested. Let me translate it to something you might be able to read...

...
OpenLibrary (libName="expansion.library", version=0)
configDev = FindConfigDev (oldConfigDev=0, manufacturer=-1, product=$60)
tmp = configDev->cd_BoardAddr
globalvar1 = tmp
globalvar2 = tmp
...

See, configDev was never tested for NULL before it was used (neither was the OpenLibrary return). cd_BoardAddr offset is $20 from 0 which explains the read from address $20. I would say that this code is a patch that was added in by an assembler programmer at a later date than when it was compiled. The biggest hint is that the patch does some self modifying code and then calls exec/CacheClearU(). This is not normally done in C. I would suggest reverting back to the original rtg.library 40.3994 (08/22/04) 217988 bytes. Actually, this is the version that I have been using all along without problems. It does not contain this hackish patch.
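
The missing check can be sketched in portable C. This is an illustration under assumptions, not the rtg.library code: FakeConfigDev is a hypothetical stand-in that mimics the 68k layout of struct ConfigDev with byte arrays (assuming a 14-byte struct Node and a 16-byte struct ExpansionRom), and read_board_addr is a made-up helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 32-bit AmigaOS layout of struct ConfigDev.
 * Byte arrays keep the offsets identical on any host compiler. */
struct FakeConfigDev {
    uint8_t  node[14];    /* struct Node (assumed 14 bytes on 68k) */
    uint8_t  flags;       /* cd_Flags */
    uint8_t  pad;         /* cd_Pad */
    uint8_t  rom[16];     /* struct ExpansionRom (assumed 16 bytes) */
    uint32_t board_addr;  /* cd_BoardAddr, expected at offset $20 */
};

/* Returns 1 and stores the board address if the lookup result is valid,
 * or 0 if the lookup failed and returned NULL. */
int read_board_addr(const struct FakeConfigDev *dev, uint32_t *out)
{
    assert(out != NULL);
    if (dev == NULL)
        return 0;         /* the test that the patched code leaves out */
    *out = dev->board_addr;
    return 1;
}
```

With this layout the field sits at offset $20, so reading it through a NULL pointer touches address $20, which matches the "LONG READ from 00000020" in the log. The same kind of test would be needed on the OpenLibrary return value.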


sounds like you are right. since i did it by hand on the voodoo system i didnt install gullivers 40.3994, because i already had a lib with this version number. in this case gulliver should also revert his package to this version of rtg.library.
« Last Edit: December 20, 2010, 04:46:01 PM by wawrzon »