Amiga.org

Operating System Specific Discussions => Amiga OS => Amiga OS -- Development => Topic started by: wawrzon on December 18, 2010, 08:09:47 PM

Title: p96 is unbelievably Slow!
Post by: wawrzon on December 18, 2010, 08:09:47 PM
for quite a while now ive been porting various sdl stuff to 68k. i have held off on releasing it though, because of a lot of flaws that turned out to be present in amiga system solutions, warp3d for instance. now i took a closer look at p96 performance, which is the default and only gfx solution for my main a4k/060-50/mediator/voodoo3 setup.

today i ran sdl benchmarks on both of my secondary systems (a4k/060-50/cv64/cgx and a4k/060-50/p4/p96) to find out how to avoid bad performance of sdl apps in little endian modes. the reason was that sdl opens these modes when toggling to fullscreen.

the conclusion, which ive also posted on the german a1k forum, is groundbreaking!! http://www.a1k.org/forum/showthread.php?p=391077#post391077

1. the little endian modes are a speciality of p96 (i have them available on picassoIV under p96, and they are the primary choice with the elbox/mediator/radeon solution). on my only cgx system, with cv64, all modes seem to be big endian by default. i dont understand what little endian modes are for when they are so slow on amiga. perhaps newer gfx cards are not capable of big endian anymore (i have a radeon 9250, not tested yet); voodoo can do them though.

2. ive tested sdlbench and an sdl port of mine, kuklomenos, on both secondary systems. ive compiled kuklomenos to work with both sw- and hwsurface.
 
on p4/p96 both versions perform effectively under  0fps!!!!!!
on cv64/cgx i get around 5fps with swsurface and a moderate 12fps with hwsurface.

also on my voodoo setup the little endian modes are very slow, although not as slow as on p4. big endian is a little faster, but still no match for cgx, it seems.

in this respect i also remember a similar experience with amiblitz compiled stuff way back.

ive been aware p96 is slower than cgx but this is really an ultimate showstopper here.
we are talking about a real world factor of 10 or so.

now: what to do??
Title: Re: p96 is unbelievably Slow!
Post by: Karlos on December 18, 2010, 08:24:22 PM
In my experience, the only performance conscious way to work with RTG is to pre-convert all your graphic assets to whatever hardware format the display is using. Never, ever assume colourspace conversion will happen in hardware.

The only other way I've found (which in my case was ideal for supported hardware on OS3.x) is to make your own 2D drawing layer on top of say Warp3D. The advantages there are the ability to use alpha channel transparency and so on.
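
The pre-conversion idea above can be sketched in plain C. This is a minimal illustration, assuming a 24-bit RGB source asset and a 16-bit RGB565 display format; the function names are made up for the example and are not part of any RTG API. In SDL 1.2 the same job is normally done by calling SDL_DisplayFormat() once after loading a surface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack one 8-bit-per-channel colour into RGB565 (5 bits red, 6 green, 5 blue). */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert a whole RGB888 asset (3 bytes per pixel: r, g, b) to RGB565.
 * Meant to run once at load time, so the per-frame blit is a plain copy
 * with no colourspace conversion left to do. */
static void preconvert_rgb888_to_rgb565(const uint8_t *src, uint16_t *dst, size_t pixels)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < pixels; i++) {
        dst[i] = pack_rgb565(src[3 * i], src[3 * i + 1], src[3 * i + 2]);
    }
}
```

The cost of the conversion is paid once per asset instead of once per blit, which is the whole point on a machine where the CPU does the conversion.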
Title: Re: p96 is unbelievably Slow!
Post by: Thematic on December 18, 2010, 08:29:11 PM
Quote from: wawrzon;599784

on p4/p96 both versions perform effectively under  0fps!!!!!!

You mean "under 1 fps".

Quote from: wawrzon;599784

ive been aware p96 is slower than cgx but this is really an ultimate showstopper here.
we are talking about a real world factor of 10 or so.


Funny, it's been five years since I used an A1200-with-Voodoo system on a daily basis, but I do not remember it being terribly slow. I knew the bus speed wasn't the absolute best possible. Perhaps I used "heavy" software that would have slow refresh in any case. SDL on that system wasn't fast and I've been meaning to test how the new ScummVM AGA compares to the RTG version when using a 320x resolution screen or window.

I have 68060 also, but Blizzard is slightly slower.
Title: Re: p96 is unbelievably Slow!
Post by: wawrzon on December 18, 2010, 08:42:30 PM
@karlos: you mean using 3d hardware to output 2d gfx? yes, that works faster even on p96, though in some cases other w3d driver specific problems are involved. for simple things it works.

but this is a hack, and adapting every single program takes a lot of effort, if it is possible at all.
my point is to make quick sdl application ports usable on 68k in general.

for p96 it boils down to:

1. do not use, or even disable, little endian modes on your system. this isnt possible at this time, while the opposite (disabling the faster big endian modes) is possible and enabled by default. i dont understand this logic.

2. p96 is in general *very* slow in comparison to cgx. it is several times slower on faster/newer hardware. just how much slower is quite a shock to me! this means that if we had an optimized solution, the overall usability of p96 based amiga systems like mediator boxes would be much higher. i start to wonder whether winuae (and eventually a future aros68k on top of it) is going to suffer from such flaws too. hope not, but as far as i know the winuae gfx implementation is based on p96.

people, can we do anything about it?
Title: Re: p96 is unbelievably Slow!
Post by: wawrzon on December 18, 2010, 08:49:35 PM
Quote from: Thematic;599788
You mean "under 1 fps".



Funny, it's been five years since I used a A1200-with-Voodoo system on a daily basis, but I do not remember it being terribly slow. I knew the bus speed wasn't the absolute best possible. Perhaps I used "heavy" software that would have slow refresh in any case. SDL on that system wasn't fast and I've been meaning to test how the new ScummVM AGA compares to RTG version when using a 320x resolution screen or window.

I have 68060 also, but Blizzard is slightly slower.


i mean the fps result is a fraction of a frame per second.

ive been using the voodoo system for years now, blaming its bad performance on zorro3 bus limitations alone. as we know it does 7mbs. cv64 can do about double that, but this does not explain a difference of this kind.

generally warp3d performs about equally on both systems, i think. the problems are the 2d blitting and drawing functions, and also byte order swapping as soon as little endian is involved. but that one in particular is comparatively easy to solve: just use your 16bit instead of 16bitPC modes.

and another thing: you likely wont notice the performance flaw just moving windows around workbench. it makes a difference in applications that really depend on fast screen redraw, like games, in particular sdl 2d games.
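
To put the bus figure in perspective, the following sketch computes the upper bound on full-screen updates per second when every frame has to cross the bus once. It assumes that "7mbs" means roughly 7,000,000 bytes per second and that the screen is 16 bits per pixel; both are assumptions for the example, and the function name is hypothetical.

```c
#include <assert.h>

/* Upper bound on full-screen updates per second when each frame is copied
 * over the bus exactly once. bytes_per_sec is the sustained copy rate to
 * video memory (an assumed figure, not a measurement). */
static double max_fullscreen_fps(double bytes_per_sec, int width, int height,
                                 int bytes_per_pixel)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    return bytes_per_sec / ((double)width * height * bytes_per_pixel);
}
```

With 7,000,000 bytes per second this gives a ceiling of about 11 frames per second at 640x480 and about 45 at 320x240, before any drawing or conversion cost. Results far below that ceiling point at something other than the bus.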
Title: Re: p96 is unbelievably Slow!
Post by: Karlos on December 18, 2010, 08:53:17 PM
@wawrzon

Not sure I can help you. I never use SDL in my coding projects I'm sorry to say. I'm one of those slightly weird people that gets more fun out of writing the middleware myself.
Title: Re: p96 is unbelievably Slow!
Post by: wawrzon on December 18, 2010, 09:08:00 PM
hm, the only thing i can think of is that someone takes the p96 binaries, disassembles, fixes and optimizes them, like matthey did with warp3d for voodoo. one could maybe use some clean room approach to catch up with cgx. but this would be up to some dedicated person.

note, im not asking for help with my sdl ports. i am discussing a general issue whose fix might be an improvement for many 68k/p96 users, or even os4 users if these flaws apply there as well.
Title: Re: p96 is unbelievably Slow!
Post by: itix on December 18, 2010, 09:31:18 PM
Quote
today i ran sdl benchmarks on both of my secondary systems (a4k/060-50/cv64/cgx and a4k/060-50/p4/p96) to find out how to avoid bad performance of sdl apps in little endian modes.

The problem is that SDL doesn't support LE modes natively on BE machines (speaking of hicolor modes). You can only render into a BE framebuffer in fast RAM, and SDL has to convert the framebuffer to LE when copying the buffer to VMEM.

I had this problem on MorphOS when I was trying to get some games to run at a decent speed. It is sort of a design flaw in SDL.
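
The conversion step described above amounts to a per-pixel byte swap during the copy to video memory. A minimal sketch in C follows, assuming 16-bit pixels and pitches given in bytes; it is not SDL's actual code, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit pixel. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Copy a frame from a system-RAM buffer to a destination buffer (standing
 * in for video memory), swapping the byte order of every pixel.
 * Pitches are in bytes and must be even. */
static void copy_swapped_16(const uint16_t *src, size_t src_pitch,
                            uint16_t *dst, size_t dst_pitch,
                            int width, int height)
{
    assert(src_pitch % 2 == 0 && dst_pitch % 2 == 0);
    for (int y = 0; y < height; y++) {
        const uint16_t *s = src + (size_t)y * (src_pitch / 2);
        uint16_t *d = dst + (size_t)y * (dst_pitch / 2);
        for (int x = 0; x < width; x++) {
            d[x] = swap16(s[x]);
        }
    }
}
```

Every pixel of every frame passes through the CPU twice (read, swap, write), which is why a mode that matches the renderer's byte order avoids the extra work entirely.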
Title: Re: p96 is unbelievably Slow!
Post by: wawrzon on December 18, 2010, 10:27:31 PM
@itix: this is understandable, and i do not expect this problem to be removed, as it is a general sdl issue. it can be worked around by not using any le modes, or by disabling them from the start. btw, morphos uses little endian for gfx? i wouldnt expect that, given that cgx provides quite good performance on classic, so it must be working in big endian.

i am not sure if im right here but i recall cgx had some sort of fast support for alpha blits. how was that possible? maybe this is the issue?
Title: Re: p96 is unbelievably Slow!
Post by: Karlos on December 18, 2010, 10:32:10 PM
@wawrzon

Endianness is not such a problem for PPC machines, which can do byteswapping for load/store operations. Not sure if it is used in MOS or OS4 but I believe you can even designate areas of the address space as big or little endian using the MMU, at least on some PPC processors.
Title: Re: p96 is unbelievably Slow!
Post by: itix on December 18, 2010, 11:36:29 PM
Quote from: wawrzon;599808
@itix: this is understandable, and i do not expect this problem to be removed, as it is a general sdl issue. it can be worked around by not using any le modes, or by disabling them from the start.


Disabling LE 16bit modes is not an option because then some games will fail.

Quote

btw, morphos uses little endian for gfx? i wouldnt expect that, given that cgx provides quite good performance on classic, so it must be working in big endian.


AFAIK some gfx cards only support LE so it was chosen as default in CGX 5. I think it is because CGX 5 had to support all Radeon variants unlike those phase5 gfx cards tailored for Amiga.

Quote

i am not sure if im right here but i recall cgx had some sort of fast support for alpha blits. how was that possible? maybe this is the issue?


In CGX 5 there are WritePixelArrayAlpha() and BltBitMap[RastPort]Alpha(). Depending on the underlying hardware they can be GPU and AltiVec accelerated. But alpha blits are not usually a problem in SDL. MorphOS 1 didn't have accelerated alpha blits, yet nobody noticed when alpha blit accelerated SDL became available for MorphOS 2 users.
Title: Re: p96 is unbelievably Slow!
Post by: wawrzon on December 19, 2010, 12:02:45 AM
@itix: thanks for the info. thats what i was afraid of.
Title: Re: p96 is unbelievably Slow!
Post by: wawrzon on December 19, 2010, 01:15:18 AM
ok, now i have compiled a simple test that actually does not confirm my previous conclusions. the executable will be here for a few days in case anybody likes to try for himself:

http://www.daten-transport.de/?id=SY75DmCyVKkE

the comparison actually indicates that 16bit and 16bitPC are about equally fast, but cv64 is two times faster than picasso4. this is what one would expect hardware-wise, but then the test is pretty simple, so perhaps the question is to find the particular functions that are so slow on p96.
-----------------------------

a4k060-50/cv64/cgx/16bit

Pitch = 640
Hardware surfaces avail  = 1
Window manager avail     = 1
Blitter hardware         = 1
Colorkey blit hardware   = 0
Alpha blit hardware      = 0
Software->Hardware accel = 1
Video memory             = 8000

                          320x240  320x240  640x480  640x480
                          software hardware software hardware
Slow points (frames/sec): 0.265024 0.265217 0.0335766 0.0335769
Fast points (frames/sec):  11.6954  11.6927  2.95159  2.95166
   Rect fill (rects/sec):  536.196  535.775  124.385  124.362
 32x32 blits (blits/sec):  2727.03  2714.38  2385.56  2393.92

a4k060-50/p4/p96/16bit

Pitch = 640
Hardware surfaces avail  = 1
Window manager avail     = 1
Blitter hardware         = 1
Colorkey blit hardware   = 0
Alpha blit hardware      = 0
Software->Hardware accel = 1
Video memory             = 8000

                          320x240  320x240  640x480  640x480
                          software hardware software hardware
Slow points (frames/sec): 0.161336 0.161564 0.0208532 0.0208488
Fast points (frames/sec):  8.86488  8.86949  2.27465  2.27327
   Rect fill (rects/sec):  246.911  246.658  66.2494  66.2622
 32x32 blits (blits/sec):   1033.3  1035.39  964.673  965.127

a4k060-50/p4/p96/16bitPC

Pitch = 640
Hardware surfaces avail  = 1
Window manager avail     = 1
Blitter hardware         = 1
Colorkey blit hardware   = 0
Alpha blit hardware      = 0
Software->Hardware accel = 1
Video memory             = 8000

                          320x240  320x240  640x480  640x480
                          software hardware software hardware
Slow points (frames/sec): 0.164305 0.164376 0.0209344 0.0209436
Fast points (frames/sec):  8.93449  8.93293  2.27618  2.27695
   Rect fill (rects/sec):  280.222  280.183   67.207   67.163
 32x32 blits (blits/sec):  1486.21  1486.75  1204.71  1198.71


and winuae on my 2ghz notebook just for lolz:

Pitch = 640
Hardware surfaces avail  = 1
Window manager avail     = 1
Blitter hardware         = 1
Colorkey blit hardware   = 0
Alpha blit hardware      = 0
Software->Hardware accel = 1
Video memory             = 8000

                          320x240  320x240  640x480  640x480
                          software hardware software hardware
Slow points (frames/sec):  10.7239  10.4439  1.34612  1.34476
Fast points (frames/sec):  230.423  200.156  55.2319  54.9946
   Rect fill (rects/sec):  10893.6  10722.5  5050.55  4970.87
 32x32 blits (blits/sec):  15283.6    14681  14733.8  15058.8
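
The "two times faster" reading of the cv64/cgx and p4/p96 16bit runs above can be checked row by row. The sketch below copies the 320x240 software column from those two result blocks and divides one by the other; the struct and function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* One sdlbench row, 320x240 software column, copied from the posted results. */
struct bench_row {
    const char *name;
    double cv64_cgx;   /* a4k060-50/cv64/cgx/16bit */
    double p4_p96;     /* a4k060-50/p4/p96/16bit */
};

static const struct bench_row rows[] = {
    { "Slow points", 0.265024, 0.161336 },
    { "Fast points", 11.6954,  8.86488  },
    { "Rect fill",   536.196,  246.911  },
    { "32x32 blits", 2727.03,  1033.3   },
};

#define ROW_COUNT (sizeof rows / sizeof rows[0])

/* How many times faster the cv64/cgx system is for a given row. */
static double cv64_over_p4(size_t row)
{
    assert(row < ROW_COUNT);
    return rows[row].cv64_cgx / rows[row].p4_p96;
}
```

For these four rows the ratio lies between roughly 1.3 and 2.6, so "two times faster" fits the rect fills and blits better than the point plotting tests.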
Title: Re: p96 is unbelievably Slow!
Post by: stevieu on December 19, 2010, 03:03:11 AM
Who knows :)

Maybe it was psychological, but I found OS3.9 running CyberGraphX much more responsive with a BVision than I did a Mediator with a Voodoo 3 under P96.. maybe that was due to bus speed? Either way, it was never as nippy for me.

Steve
Title: Re: p96 is unbelievably Slow!
Post by: Karlos on December 19, 2010, 03:10:33 AM
If I recall correctly, on my 68040/BVision, I can get up to 17MB/s copy to VRAM (using a loop unrolled move16 transfer), using a regular move.l based copy is around 15 or so.

On my 040/Mediator/Voodoo setup, the speed was around 9-11MB/s maximum and that's a slightly faster 68040.
Title: Re: p96 is unbelievably Slow!
Post by: wawrzon on December 19, 2010, 03:17:58 AM
im evaluating it.
here is my current kuklomenos port that has been the reason for all this uproar:
http://www.daten-transport.de/?id=XfGNZPE2AaEn
it is compiled for hwsurface, which means the graphics go into the vram. source included.
Title: Re: p96 is unbelievably Slow!
Post by: wawrzon on December 19, 2010, 03:20:49 AM
@karlos: the figures are slightly higher than i would even expect for voodoo, its 7mbs for me with a4k/060. bvision sounds likely though. but thats thanks to your optimized code i take it.
Title: Re: p96 is unbelievably Slow!
Post by: Karlos on December 19, 2010, 03:27:48 AM
Don't take the Voodoo figures too seriously, they are off the top of my head. I need to find them (or retest, but that machine is currently in need of attention). The BVision figures are good though.

I experimented a lot with move16 for both copying and other operations, like byteswap copying. Here, you allocate a cache-aligned block on the stack, read data from the source, swapping as you go, then use move16 to copy the block out to the VRAM. If you allocate enough cache-aligned space (say 64 bytes) you can unroll your transfer loop 4x, which was about ideal (with some carefully optimised routines you could handle misaligned data, since you do that when reading from the source rather than when transferring to the bitmap).

Not sure why move16 was faster on BVision VRAM and also it wasn't on every system tested. However, it was never slower. On some other cards, IIRC, like the CVision64, it was slower though.

All very hardware-dependent.
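
The staging technique described in this post can be outlined in portable C. This is only a sketch of the idea, not the poster's code: memcpy stands in for the four move16 transfers (move16 moves one aligned 16-byte line at a time and has no C equivalent), and the function name and size constant are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STAGE_BYTES 64  /* four 16-byte lines, matching the 4x unrolled case */

/* Byteswap-copy 16-bit pixels to dst via a small aligned staging block.
 * dst is assumed 16-byte aligned and bytes a multiple of STAGE_BYTES.
 * The source may be misaligned, since it is only read byte by byte. */
static void staged_swap_copy(const void *src, void *dst, size_t bytes)
{
    _Alignas(16) uint8_t stage[STAGE_BYTES];
    const uint8_t *s = (const uint8_t *)src;
    uint8_t *d = (uint8_t *)dst;

    assert(bytes % STAGE_BYTES == 0);
    assert(((uintptr_t)d & 15u) == 0);

    for (size_t off = 0; off < bytes; off += STAGE_BYTES) {
        /* Read from the source, swapping each byte pair as we go. */
        for (size_t i = 0; i < STAGE_BYTES; i += 2) {
            stage[i]     = s[off + i + 1];
            stage[i + 1] = s[off + i];
        }
        /* Stand-in for four move16 line transfers out to video memory. */
        memcpy(d + off, stage, STAGE_BYTES);
    }
}
```

Keeping all the awkward work (swapping, misalignment) on the source side means the writes to video memory are always whole, aligned lines, which is the precondition for using move16 at all.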
Title: Re: p96 is unbelievably Slow!
Post by: matthey on December 19, 2010, 04:24:24 AM
3kT060-75/Voodoo4/P96/16bitPC

                          320x240  320x240  640x480  640x480
                          software hardware software hardware
Slow points (frames/sec): 0.236337 0.236337 0.029795 0.0297915
Fast points (frames/sec):  14.1695  14.1695  3.56382  3.56298
   Rect fill (rects/sec):  489.601  489.601  117.307  117.196
 32x32 blits (blits/sec):  2318.05  2318.05  2195.07  2214.05

***

3kT060-75/Voodoo4/P96/16bit

                          320x240  320x240  640x480  640x480
                          software hardware software hardware
Slow points (frames/sec): 0.236686 0.236798 0.0299196 0.0299196
Fast points (frames/sec):  14.0659  14.0659  3.54566  3.54492
   Rect fill (rects/sec):   500.55   500.55  120.177  120.234
 32x32 blits (blits/sec):  2340.57  2340.57  2234.59  2214.05
Title: Re: p96 is unbelievably Slow!
Post by: wawrzon on December 19, 2010, 04:25:40 AM
another quick attempt on an sdl test application.
http://www.daten-transport.de/?id=EXKChmTTKWvf
this should run without ixemul as well. ive compiled it for 020 without any optimizations or weird options. works very well on my cgx/cv64 setup. it crawls on pIV/p96. and it runs on uae. what can i say?
Title: Re: p96 is unbelievably Slow!
Post by: wawrzon on December 19, 2010, 04:28:43 AM
@matthey: not bad even if cv still looks better. but i start to suspect the test doesnt reflect sdl reality very well. try the above one.
Title: Re: p96 is unbelievably Slow!
Post by: Gulliver on December 19, 2010, 04:38:16 AM
@wawrzon

Have you tested this P96 version? http://lilliput.amiga-projects.net/Picasso96.htm

Give it a try, maybe it performs a bit better. Anyway, if you do, just let me know about your findings.
Title: Re: p96 is unbelievably Slow!
Post by: wawrzon on December 19, 2010, 05:03:28 AM
now, i foolishly updated without a backup and there goes the damn sfs again!!! grr!! access outside the partition and the like. i have not used the damn machines for more than a year and forgot they are still infected with this superfu**ed_up_filesystem!

gulliver, just to be sure: you have tested the update with muforce on?
Title: Re: p96 is unbelievably Slow!
Post by: matthey on December 19, 2010, 05:39:39 AM
The pig game "feels" like it runs at full speed here. I get...

Average rendering frame rate: 7.4 fps
Average logic frame rate: 20 fps

@Gulliver
I am using those updates. Wawa needs to recheck his installations and maybe reinstall it because of that not so smart filesystem.

There is a site that has P96 speed tests. Compare the Voodoo 3 + G-Rex + CGFX to the Voodoo 4 + Mediator + P96. The G-Rex setup should be faster but it's not. This looks to me like the Mediator is faster because P96 is faster. The Voodoo 4 is not significantly faster than the Voodoo 3 and in many cases on the Amiga it is slower.

http://www.amigaspeed.de.vu/

P.S. I added 16 bit "BE" test for sdlbench with my 16 bit "LE" PC test above.
Title: Re: p96 is unbelievably Slow!
Post by: wawrzon on December 19, 2010, 05:49:50 AM
ok i can report that gullivers new libraries give me a significant speedup with pig and sdlbench (details after the sun rises) but not with kuklomenos, alas. i have the impression they are quite buggy though and need a serious cleanup. i will probably have to back up and replace my filesystems on the p4 machine here before i go on with it, which i will not do before the end of the year. but i might check the stuff more carefully on the voodoo system later today.

may i ask, who made these fixes?

@matthey: strange i get much higher results with these libs on p4 than you on voodoo, is that to be trusted? sorcery?
Title: Re: p96 is unbelievably Slow!
Post by: Gulliver on December 19, 2010, 11:29:32 AM
@wawrzon

I haven't tested those libraries with muforce. So I was expecting some feedback from a dev's point of view ;)

All these updates were obtained from post 2.1b Picasso96 updates, Prometheus drivers, Amithlon/AmigaOSXL updates, WinUAE, user submissions, etc. They were collected through the years and put together.

For various comments regarding this subject see http://eab.abime.net/showthread.php?t=50410&highlight=picasso96

@matthey

So with these new libraries in your system, now that you have tested them for a while, how do you think they perform?
Title: Re: p96 is unbelievably Slow!
Post by: matthey on December 19, 2010, 02:16:39 PM
Quote from: Gulliver;599878

@matthey
So with these new libraries in your system, now that you have tested them for a while, how do you think they perform?


I think the core P96 libraries in the update are good. I have never received a MuForce/MuGuardianAngel hit, they fix a few bugs, and they are a little faster. I can't speak for the gfx card specific drivers as I have little if any experience with them. You did good :). Thanks.
Title: Re: p96 is unbelievably Slow!
Post by: wawrzon on December 20, 2010, 12:03:22 AM
okay ive updated my voodoo3 setup as well, by hand this time. there was not much to do, most libs were up to date. also my picture.datatype is newer than the one contained in the archive. i dont recall where it is from. maybe it is some wos version, will have to take a look at the version number. all is working well, no hits, not much difference from before, except a slight speedup in "pig" in little endian mode. in kuklomenos im getting 10fps in big endian and 7 in little.

on p4 things are quite different. there is quite a boost, but not in kuklomenos. i wonder why it is so slow. i suspect p96 is quite slow at drawing lines. if i understood you correctly, matt, cgx or p96 is also used to draw lines in 3d. is this correct? and this too is particularly slow in w3d.

apart from that i now also get a hit on startup with the picassoIV system. but that might be due to the corrupted filesystem. i have to sort that out first. here is a log, i dont think it will indicate anything without the sources though.
--------------------------------------
30-Sep-08  22:29:33
LONG READ from 00000020                        PC: 07373FE8
USP : 070B0FAC SR: 0010  (U0)(-)(-)  TCB: 070B0728
Data: 00000000 00000060 00000044 00000084 073579EC 07357ABA 07357ADE 00000000
----> 073579EC - "Work:Libs/Picasso96/rtg.library"  Hunk 0000 Offset 0000016C
----> 07357ABA - "Work:Libs/Picasso96/rtg.library"  Hunk 0000 Offset 0000023A
----> 07357ADE - "Work:Libs/Picasso96/rtg.library"  Hunk 0000 Offset 0000025E
Addr: 00000000 FFFFFFFF 0738D354 0738D354 0735766E 0738D354 000046CC 07002340
Stck: 070008D4 0738A18C 00F81CBA 00000400 00000000 00000000 00000000 00000000
Stck: 00000000 07357ADE 0735766E 073586A6 00000000 073581BB 070B06B0 00F81100
Stck: 07357884 0738A18C 00FD56BE 070B0728 070008D4 00FD560E 00F8A06E 00000800
Stck: 72616D6C 69620000 00000000 070B0710 070B0772 00000000 00000001 070B1030
Stck: 00000014 00000002 003E26A2 003E3AFA 003F5571 00000000 00000000 00000038
Stck: 00000001 FFFFFFFF 070B11CC 01C50F87 00000001 00000001 00F8F5DE 00F8F6C2
Stck: 00F8F6B6 070B104C 00000000 000000C4 FEDCBA98 00000000 00000000 070113D8
Stck: 0000A4A4 10100000 00010005 00000000 00000000 0000A4A4 00101008 4DEF0000
----> 07373FE8 - "Work:Libs/Picasso96/rtg.library"  Hunk 0002 Offset 00017D58
----> 00F81CBA - "ROM - exec 45.20 (6.1.2002)"  Hunk 0000 Offset 00001C0C
----> 07357ADE - "Work:Libs/Picasso96/rtg.library"  Hunk 0000 Offset 0000025E
----> 073586A6 - "Work:Libs/Picasso96/rtg.library"  Hunk 0000 Offset 00000E26
----> 073581BB - "Work:Libs/Picasso96/rtg.library"  Hunk 0000 Offset 0000093B
----> 00F81100 - "ROM - exec 45.20 (6.1.2002)"  Hunk 0000 Offset 00001052
----> 07357884 - "Work:Libs/Picasso96/rtg.library"  Hunk 0000 Offset 00000004
----> 00FD56BE - "ROM - ramlib 40.2 (5.3.93)"  Hunk 0000 Offset 000003E6
----> 00FD560E - "ROM - ramlib 40.2 (5.3.93)"  Hunk 0000 Offset 00000336
----> 00F8A06E - "ROM - dos 40.3 (1.4.93)"  Hunk 0000 Offset 000005B2
----> 00F8F5DE - "ROM - dos 40.3 (1.4.93)"  Hunk 0000 Offset 00005B22
----> 00F8F6C2 - "ROM - dos 40.3 (1.4.93)"  Hunk 0000 Offset 00005C06
----> 00F8F6B6 - "ROM - dos 40.3 (1.4.93)"  Hunk 0000 Offset 00005BFA
PC-8: FF0C60BE 2F0E6068 700043FA 00424EAE FDD82C40 70FF91C8 72604EAE FFB82040
PC *: 20280020 41FAFFD2 208041FA FF7C2080 41FAFF60 208041FA FF0A2080 41FAF7B8
07373fc4 :  6dff fffe ff0c             blt.l $7363ed2 ;extended opcode
07373fca :  60be                       bra.s $7373f8a
07373fcc :  2f0e                       move.l a6,-(a7)
07373fce :  6068                       bra.s $7374038
07373fd0 :  7000                       moveq.l #$0,d0
07373fd2 :  43fa 0042                  lea.l $7374016(pc),a1
07373fd6 :  4eae fdd8                  jsr -$228(a6)
07373fda :  2c40                       movea.l d0,a6
07373fdc :  70ff                       moveq.l #-$1,d0
07373fde :  91c8                       suba.l a0,a0
07373fe0 :  7260                       moveq.l #$60,d1
07373fe2 :  4eae ffb8                  jsr -$48(a6)
07373fe6 :  2040                       movea.l d0,a0
07373fe8 : *2028 0020                  move.l $20(a0),d0
07373fec :  41fa ffd2                  lea.l $7373fc0(pc),a0
07373ff0 :  2080                       move.l d0,(a0)
07373ff2 :  41fa ff7c                  lea.l $7373f70(pc),a0
07373ff6 :  2080                       move.l d0,(a0)
07373ff8 :  41fa ff60                  lea.l $7373f5a(pc),a0
07373ffc :  2080                       move.l d0,(a0)
07373ffe :  41fa ff0a                  lea.l $7373f0a(pc),a0
07374002 :  2080                       move.l d0,(a0)
07374004 :  41fa f7b8                  lea.l $73737be(pc),a0
Name: "ramlib"
Title: Re: p96 is unbelievably Slow!
Post by: matthey on December 20, 2010, 03:58:17 AM
Quote from: wawrzon;600029
on p4 things are quite different. there is quite a boost, but not in kuklomenos. i wonder why it is so slow. i suspect p96 is quite slow at drawing lines.

If you look at the speed results at http://www.amigaspeed.de.vu/, you will see that the Voodoo 3 is almost 10 times faster at 2D line drawing than the Picasso 4 with CGFX 4. It also looks like P96 is a little faster than CGFX for line drawing with the Voodoo 3+ at least. I would expect the Picasso 4 driver to be good with P96 as well. The Picasso 4 is slow compared to Voodoo 3+ except where the gfx bus speed matters (bitmaps).

Quote
if i understood you correctly, matt, cgx or p96 is also used to draw lines in 3d. is this correct? and this too is particularly slow in w3d.

No. I don't think so. Sorry if I misled you. The Avenger libraries do call the appropriate CGFX or P96 Warp3D libraries, which call the appropriate CGFX or P96 functions, but these shouldn't be used for 3D lines. 3D lines need the Z value and are affected by the Z buffer. They are actually drawn as triangles as the Avenger has no support for 3D lines (or points). The Avenger does have support for 2D line drawing that is very fast in comparison. It would be very inefficient to use W3D_DrawLine() or similar to draw 2D lines.

Quote
apart from that i now also get a hit on startup with the picassoIV system. but that might be due to the corrupted filesystem. i have to sort that out first. here is a log, i dont think it will indicate anything without the sources though.

I doubt it is the filesystem. It looks like a NULL pointer that is not tested. Let me translate it to something you might be able to read...

...
OpenLibrary (libName="expansion.library", version=0)
configDev = FindConfigDev (oldConfigDev=0, manufacturer=-1, product=$60)
tmp = configDev->cd_BoardAddr
globalvar1 = tmp
globalvar2 = tmp
...

See, configDev was never tested for NULL before it was used (neither was the OpenLibrary return). cd_BoardAddr offset is $20 from 0 which explains the read from address $20. I would say that this code is a patch that was added in by an assembler programmer at a later date than when it was compiled. The biggest hint is that the patch does some self modifying code and then calls exec/CacheClearU(). This is not normally done in C. I would suggest reverting back to the original rtg.library 40.3994 (08/22/04) 217988 bytes. Actually, this is the version that I have been using all along without problems. It does not contain this hackish patch.
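
For comparison, a checked version of the same lookup might look like the sketch below. To keep it compilable anywhere it uses hypothetical stand-in types and finder functions instead of the real AmigaOS headers; on the Amiga the library base would come from OpenLibrary() and the lookup from FindConfigDev() in expansion.library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the AmigaOS type and call, so the sketch
 * compiles without the expansion.library headers. */
struct MockConfigDev { void *cd_BoardAddr; };
typedef struct MockConfigDev *(*FindFn)(struct MockConfigDev *old,
                                        long manufacturer, long product);

/* Returns the board address, or NULL when the library or the board is
 * missing. The point is the two checks before the dereference. */
static void *find_board_addr(void *expansion_base, FindFn find, long product)
{
    struct MockConfigDev *cd;

    if (expansion_base == NULL)    /* the library did not open */
        return NULL;
    cd = find(NULL, -1, product);  /* -1: any manufacturer */
    if (cd == NULL)                /* no such board: never read cd->... */
        return NULL;
    return cd->cd_BoardAddr;
}

/* Two fake finders for illustration: one system without the board and
 * one where product $60 is present. */
static char demo_mem[1];
static struct MockConfigDev demo_board = { demo_mem };

static struct MockConfigDev *find_none(struct MockConfigDev *old, long m, long p)
{
    (void)old; (void)m; (void)p;
    return NULL;
}

static struct MockConfigDev *find_one(struct MockConfigDev *old, long m, long p)
{
    (void)old; (void)m;
    return p == 0x60 ? &demo_board : NULL;
}
```

With find_none the function returns NULL instead of reading from address $20, which is the hit shown in the log on a machine where the board with product $60 is not found.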
Title: Re: p96 is unbelievably Slow!
Post by: ChaosLord on December 20, 2010, 05:50:20 AM
MattHey FTW!
Title: Re: p96 is unbelievably Slow!
Post by: wawrzon on December 20, 2010, 04:42:45 PM
Quote from: matthey;600052
If you look at the speed results at http://www.amigaspeed.de.vu/, you will see that the Voodoo 3 is almost 10 times faster at 2D line drawing than the Picasso 4 with CGFX 4. It also looks like P96 is a little faster than CGFX for line drawing with the Voodoo 3+ at least. I would expect the Picasso 4 driver to be good with P96 as well. The Picasso 4 is slow compared to Voodoo 3+ except where the gfx bus speed matters (bitmaps).

im using picasso4 with the p96 system and cv64 with cgx (4?) respectively, so this should be the optimal setup. though since i have to get rid of sfs, im going to install mirror partitions with the alternative gfx system on each of the machines for the sake of testing.

Quote

No. I don't think so. Sorry if I mislead you. The Avenger libraries do call the appropriate CGFX or P96 Warp3D libraries which call the appropriate CGFX or P96 functions but these shouldn't be used for 3D lines. 3D lines need the Z value and are affected by the Z buffer. They are actually drawn as triangles as the Avenger has no support for 3D lines (or points). The Avenger does have support for 2D line drawing that is very fast in comparison. It would be very inefficient to use W3D_DrawLine() or similar to draw 2D lines.

i stand corrected. i still have the impression though that line drawing in w3d is generally not as fast as it should be. see 4dtris and snake3d. btw have you corrected that issue in your w3d libs where the color texturing loses saturation (foobillard, brass parts)? i can look at what function is actually used there if you dont know what im talking about.

Quote


I doubt it is the filesystem. It looks like a NULL pointer that is not tested. Let me translate it to something you might be able to read...

...
OpenLibrary (libName="expansion.library", version=0)
configDev = FindConfigDev (oldConfigDev=0, manufacturer=-1, product=$60)
tmp = configDev->cd_BoardAddr
globalvar1 = tmp
globalvar2 = tmp
...

See, configDev was never tested for NULL before it was used (neither was the OpenLibrary return). cd_BoardAddr offset is $20 from 0 which explains the read from address $20. I would say that this code is a patch that was added in by an assembler programmer at a later date than when it was compiled. The biggest hint is that the patch does some self modifying code and then calls exec/CacheClearU(). This is not normally done in C. I would suggest reverting back to the original rtg.library 40.3994 (08/22/04) 217988 bytes. Actually, this is the version that I have been using all along without problems. It does not contain this hackish patch.


sounds like you are right. since i did it by hand on the voodoo system i didnt install gullivers 40.3994, because i already had a lib with this version number. in this case gulliver should also revert his package to this version of rtg.library.
Title: Re: p96 is unbelievably Slow!
Post by: wawrzon on December 20, 2010, 08:06:11 PM
Quote from: matthey;600052

I would suggest reverting back to the original rtg.library 40.3994 (08/22/04) 217988 bytes. Actually, this is the version that I have been using all along without problems. It does not contain this hackish patch.


i can now confirm that the hits due to gullivers rtg.library have vanished after reverting to the amithlon 40.3994 (08/22/04) 217988 bytes version.
@gulliver: either roll back your boing bag to this version or (better yet) have whoever did the patching fix the new library. it seems it was working well apart from this one initial read hit. a read hit is not critical but may be annoying.
Title: Re: p96 is unbelievably Slow!
Post by: wawrzon on December 20, 2010, 11:11:05 PM
ive found the reason why p4/p96 on one of my machines was so slow. i removed the cyberstorm ram module and forgot it lying disassembled on my desk just in front of my eyes. how dumb is that!!!! :facepalm: the machine was running from z3 and mobo memory all the time!!
Title: Re: p96 is unbelievably Slow!
Post by: adz on December 20, 2010, 11:27:58 PM
Quote from: wawrzon;600228
ive found the reason why p4/p96 on one of my machines was so slow. i removed the cyberstorm ram module and forgot it lying disassembled on my desk just in front of my eyes. how dumb is that!!!! :facepalm: the machine was running from z3 and mobo memory all the time!!


(http://www.force10.co.uk/pics/doublefacepalm.jpg)
Title: Re: p96 is unbelievably Slow!
Post by: Karlos on December 20, 2010, 11:45:08 PM
Quote from: wawrzon;600228
ive found the reason why p4/p96 on one of my machines was so slow. i removed the cyberstorm ram module and forgot it lying disassembled on my desk just in front of my eyes. how dumb is that!!!! :facepalm: the machine was running from z3 and mobo memory all the time!!


(http://extropia.co.uk/img/noram.png)

We've all had days like that...
Title: Re: p96 is unbelievably Slow!
Post by: wawrzon on December 21, 2010, 01:08:57 AM
now, is that for real??:
Mode = 320x240, software
Pitch = 640
Hardware surfaces avail = 1
Window manager avail = 1
Blitter hardware = 1
Colorkey blit hardware = 0
Alpha blit hardware = 0
Software->Hardware accel = 1
Video memory = 8000

Slow points test
Fast points test
Rect fill test
32x32 Blitter test
Mode = 320x240, hardware
Slow points test
Fast points test
Rect fill test
32x32 Blitter test
Mode = 640x480, software
Slow points test
Fast points test
Rect fill test
32x32 Blitter test
Mode = 640x480, hardware
Slow points test
Fast points test
Rect fill test
32x32 Blitter test
                          320x240  320x240  640x480  640x480
                          software hardware software hardware
Slow points (frames/sec):    1.386  1.38961 0.181905 0.181934
Fast points (frames/sec):  14.6027  14.6127  3.67341  3.67415
   Rect fill (rects/sec):  2377.25  2380.01  790.581  790.886
 32x32 blits (blits/sec):  5403.69  5375.33  5354.25  5340.29