
Topic: Well, it's almost as fast as fullscreen AB3D2 on my machine ;) (Read 4330 times)


Karlos (topic starter)

  • Sockologist
  • Global Moderator
  • Hero Member
  • Join Date: Nov 2002
  • Posts: 16867
  • Country: gb
  • Thanked: 4 times
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #14 on: September 02, 2012, 02:47:55 PM »
I am hoping that with some experimentation and a limitless supply of irreplaceable old BVision cards*, I might be able to get DMA working at least.

At the bare minimum, I would like to be able to use DMA command / vertex transport, as it would significantly increase the parallelism between the CPU and the Permedia. The CPU would also be able to spend far less time polling the chip to see whether it is busy and whether it has space in its limited FIFO for more data. Having stripped away many layers of code from the innermost loops, that polling is where it now spends most of its time.

* I wish...
« Last Edit: September 02, 2012, 02:56:39 PM by Karlos »
int p; // A
 

matthey

  • Hero Member
  • Join Date: Aug 2007
  • Posts: 1294
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #15 on: September 02, 2012, 03:14:36 PM »
@Karlos
It seems that the Amiga PCI buses were set up to do the bare minimum to work rather than being compliant. I had similar problems trying to get swizzling (endian translation) of 3D registers to work on the Avenger/Napalm with the Mediator. Either I'm reading the docs incorrectly or the Mediator PCI bus mapping is "funny". That's pretty simple compared to getting DMA working ;). At least these old cards are easier to program and better documented than the new ones. Have you seen the ATI docs? Ugh. I did have a look at the Permedia 2 docs. It's pretty clean. Too bad ease of use is not a consideration for GPUs (or CPUs). It's possible for programmers to get more out of what is simpler and easier to use.
 

delshay

  • Hero Member
  • Join Date: Mar 2004
  • Posts: 1009
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #16 on: September 02, 2012, 03:16:07 PM »
GLQuakeWOS is super smooth here at 800x600; it will be interesting to see how it performs against 640x480. I don't expect the gap to be large on my set-up, which is configured for a high-speed bus with no overclocking (overclocking is banned for this year).

The focus here has switched to BVision performance. It's getting quicker, which is the good news, and I have learned some new things about the BVision, which I have now applied to both my cards. One card has a few more changes than the other, but none of this is visible when looking at the card.
« Last Edit: September 02, 2012, 03:20:27 PM by delshay »
-------------
power is nothing without control
 

delshay

  • Hero Member
  • Join Date: Mar 2004
  • Posts: 1009
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #17 on: September 02, 2012, 03:40:03 PM »
If the BVision or Blizzard gets damaged during testing, it's not a problem here; I just repair it. But I'd like to point out that it's very, very difficult to damage my card, as I have fixed everything that I know can go wrong. So I can throw insane frequencies at any card and it will either work or just not work.

Both my cards (Blizzard/BVision) have a massive advantage over any other classic card. This is why some BVision driver software settings that do not work on every user's set-up work here: some problems here have been fixed.

Some of you may already know this: the Permedia 2 here is clocked @125MHz+.
« Last Edit: September 02, 2012, 04:49:03 PM by delshay »
 

rvo_nl

  • Lifetime Member
  • Hero Member
  • Join Date: Oct 2006
  • Posts: 860
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #18 on: September 02, 2012, 04:46:22 PM »
Hello delshay,
 
Even though you seem to 'spam' every PPC/BVision-related thread with your overclocking 'progress reports', I'm still curious about what you are doing there.
 
Enough now with all the secrecy (we are adults, right?), care to post a few pics of your mods so far?
Amiga 1200 (1d4) Kickstart 3.1 (40.68), Elbox Power/Winner tower (450w psu), BlizzardPPC 603e+ @240mhz & 060 @50mhz, 256MB, Bvision, IDE-fix Express, IndivisionAGA, 120GB IDE, cd, dvd, Cocolino, Micronik Keycase, PCMCIA Ethernet, Ratte monitor switcher, Prelude1200, triple boot WB3.1 / OS3.9 / OS4.1, Win95 / MacOS8.1
 

matthey

  • Hero Member
  • Join Date: Aug 2007
  • Posts: 1294
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #19 on: September 02, 2012, 04:52:13 PM »
Quote from: delshay;706244

Both my cards (Blizzard/BVision) have a massive advantage over any other classic card. This is why some BVision driver software settings that do not work on every user's set-up work here: some problems here have been fixed.


Does overclocking help 3D performance that much? I overclocked a Voodoo 3 by quite a bit and it helped 2D performance but I couldn't see any change in 3D performance. It probably helped some but very little.

Is the Permedia 3 pin-compatible and register-compatible with the Permedia 2? How much gfx memory does it allow? There was someone who upgraded a Cybervision 64/3D Virge to a Virge DX. Supposedly the DX was used on some later Cybervision 64/3Ds anyway. Upgrading the amount of gfx memory would be the best hack but also the most difficult.
 

delshay

  • Hero Member
  • Join Date: Mar 2004
  • Posts: 1009
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #20 on: September 02, 2012, 05:12:47 PM »
The biggest performance gain comes from the PCI bus. As soon as the PCI bus hits 38.5MHz+, you will find all PPC games speed up; it's very, very clear. In Heretic II (800x600), for one, Corvus looks like he's jogging rather than walking, and in Wipeout you can't tell the difference between 800x600 and 1024x768 speed-wise. The most impressive thing here is that when you destroy a ship with a laser in Wipeout, there's not even a hint of the slowdown I had before the update.
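For a rough sense of scale (this is an editorial sketch, not a measurement of this set-up): a 32-bit PCI bus has a theoretical peak of one 4-byte data phase per clock. That ceiling therefore scales linearly with the bus clock. The function names below are illustrative only, and 33.0MHz is used as the nominal baseline.

```c
/* Theoretical peak of a 32-bit PCI bus: one 4-byte data phase per clock
 * during a burst. Address phases, wait states and arbitration keep real
 * throughput below this. */
double pci32_peak_mb_per_s(double clock_mhz)
{
    return clock_mhz * 4.0; /* MHz x bytes per clock = MB/s */
}

/* Relative change between two values, in percent. */
double percent_change(double from, double to)
{
    return (to - from) / from * 100.0;
}
```

With these numbers, 33MHz gives a 132MB/s ceiling and 38.5MHz about 154MB/s, roughly 17% more headroom. Real throughput is lower than this. How much of the extra headroom reaches the frame rate depends on how bus-bound each game is.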

Part of the set-up lies in what this card is doing, but it's not available on any other card, as it needs specific EDO memory.

 http://www.amiga.org/gallery/index.php?n=3692
« Last Edit: September 02, 2012, 05:27:35 PM by delshay »
 

Karlos (topic starter)

  • Sockologist
  • Global Moderator
  • Hero Member
  • Join Date: Nov 2002
  • Posts: 16867
  • Country: gb
  • Thanked: 4 times
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #21 on: September 02, 2012, 08:51:28 PM »
Quote from: matthey;706240
@Karlos
It seems that the Amiga PCI buses were set up to do the bare minimum to work rather than being compliant. I had similar problems trying to get swizzling (endian translation) of 3D registers to work on the Avenger/Napalm with the Mediator. Either I'm reading the docs incorrectly or the Mediator PCI bus mapping is "funny". That's pretty simple compared to getting DMA working ;).

I did try it in the past, but didn't manage to get the mapping correct. The Permedia would read some address range unrelated to the one I was trying to give it and would lock up very quickly in a manner that's nigh on impossible to debug. You just have no way of seeing what location it sees.

Perhaps the pci.library will help this time round.

Quote
At least these old cards are easier to program and better documented than the new ones. Have you seen the ATI docs? Ugh.


They aren't so bad, but they are quite big :) I've had to refer to them a number of times trying to bring the R100/R200 drivers up to date.

Quote
I did have a look at the Permedia 2 docs. It's pretty clean. Too bad ease of use is not a consideration for GPUs (or CPUs). It's possible for programmers to get more out of what is simpler and easier to use.


The Permedia documentation is OK, but misses out various things. For instance, the address calculations used by patched (and subpatched) buffers are just not anywhere in the programmer manuals.
 

matthey

  • Hero Member
  • Join Date: Aug 2007
  • Posts: 1294
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #22 on: September 03, 2012, 07:09:10 PM »
Quote from: Karlos;706308
I did try it in the past, but didn't manage to get the mapping correct. The Permedia would read some address range unrelated to the one I was trying to give it and would lock up very quickly in a manner that's nigh on impossible to debug. You just have no way of seeing what location it sees.

Did you byte swap (BE->LE) the data used for the DMA buffer? Does writing to the raw GPU FIFO work instead of writing the 3D registers? I ask because the raw FIFO data format should be closer to the data used in the DMA buffer. It's too bad they don't allow expanding the FIFO buffer like the Avenger/Napalm. Just writing to the Avenger/Napalm registers provides pretty good performance and is very easy to use (minus the fact that the byte-swapped BE->LE 3D register space doesn't seem to exist).

Quote from: Karlos;706308
They aren't so bad, but they are quite big :) I've had to refer to them a number of times trying to bring the R100/R200 drivers up to date.

Yeah, there are way too many variations of the ATI boards with minor differences. It seems to me like the docs were written by designers who didn't want to write docs and assumed everyone already knew as much as they did. The GPU is quite a bit different from the older ones. I did find the GPU instruction set very interesting, but it's not easy to use. It's more like a SIMD/DSP processor with minimal conditional/branch support, which makes sense for what it is, I guess. There are friendlier and more flexible ways to avoid branch hazards.
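SIMD/DSP-style GPU code typically avoids branches by computing both outcomes and picking one with an all-ones/all-zeros mask instead of a conditional jump. Here is a small scalar C sketch of that technique. The names are hypothetical and nothing here comes from the ATI documentation.

```c
#include <stdint.h>

/* Branchy version: the processor must predict or stall on each comparison. */
int32_t clamp_branchy(int32_t x, int32_t lo, int32_t hi)
{
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

/* Mask-and-select: mask is all ones when cond is true, all zeros otherwise,
 * so exactly one of a or b survives the AND/OR without any jump. */
static int32_t select_i32(int cond, int32_t a, int32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}

/* Same result as clamp_branchy, expressed as two selects. */
int32_t clamp_branchless(int32_t x, int32_t lo, int32_t hi)
{
    int32_t t = select_i32(x < lo, lo, x);
    return select_i32(t > hi, hi, t);
}
```

In scalar C, a compiler may well turn either version into the same conditional-move code. The mask form just spells the technique out, and it maps directly onto a per-lane SIMD compare-and-select.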

Quote from: Karlos;706308
The Permedia documentation is OK, but misses out various things. For instance, the address calculations used by patched (and subpatched) buffers are just not anywhere in the programmer manuals.

That's always the case. The Avenger/Napalm docs are probably the best for any halfway modern GFX board, and there are still places that need more clarification.
« Last Edit: September 03, 2012, 07:11:13 PM by matthey »
 

Karlos (topic starter)

  • Sockologist
  • Global Moderator
  • Hero Member
  • Join Date: Nov 2002
  • Posts: 16867
  • Country: gb
  • Thanked: 4 times
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #23 on: September 03, 2012, 08:52:52 PM »
Quote from: matthey;706434
Did you byte swap (BE->LE) the data used for the DMA buffer?


I tried all the obvious byte and word swap combinations for both the buffer contents and the address and nothing seemed to work. I wondered if it just isn't initialised properly and maybe the driver is relying on PIO.
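For anyone wanting to repeat that experiment, here is a minimal sketch of the "obvious" combinations for a 32-bit word:

- no swap
- bytes swapped within each 16-bit half
- the two halves swapped
- a full BE<->LE reversal

The enum and function names are hypothetical. Which variant, if any, the Mediator or Permedia expects is exactly the open question here.

```c
#include <stdint.h>
#include <stddef.h>

/* The obvious swap combinations for a 32-bit word. */
enum SwapVariant { SWAP_NONE, SWAP_BYTES_IN_HALVES, SWAP_HALVES, SWAP_ALL, SWAP_VARIANT_COUNT };

uint32_t swap_variant(uint32_t v, enum SwapVariant variant)
{
    switch (variant) {
    case SWAP_BYTES_IN_HALVES: /* 0xAABBCCDD -> 0xBBAADDCC */
        return ((v & 0x00FF00FFu) << 8) | ((v >> 8) & 0x00FF00FFu);
    case SWAP_HALVES:          /* 0xAABBCCDD -> 0xCCDDAABB */
        return (v << 16) | (v >> 16);
    case SWAP_ALL:             /* 0xAABBCCDD -> 0xDDCCBBAA (full BE <-> LE) */
        return swap_variant(swap_variant(v, SWAP_BYTES_IN_HALVES), SWAP_HALVES);
    default:                   /* SWAP_NONE */
        return v;
    }
}

/* Apply one variant to every word of a buffer in place, e.g. a DMA buffer
 * built by a big-endian 68k/PPC host for a little-endian PCI device. */
void swap_buffer(uint32_t *buf, size_t nwords, enum SwapVariant variant)
{
    for (size_t i = 0; i < nwords; i++)
        buf[i] = swap_variant(buf[i], variant);
}
```

Looping over the four variants for the buffer contents and, independently, the four for the DMA address gives 16 combinations to try.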

Quote
Does writing to the raw GPU FIFO work instead of writing the 3D registers?


It does. That's what the current driver is doing. The DMA format is actually a bit different, in that it consists of register address tag / value pairs, which at first glance doubles the size of the data you are writing. But it also has a number of sequential and indexed address modes that allow you to compact the data in the buffer somewhat.
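To show why the addressing modes matter, here is a sketch comparing plain tag/value pairs with a single header covering a run of consecutive registers. The header layout is invented purely for illustration: count in the upper 16 bits, a mode bit, and the tag in the low bits. The real Permedia 2 tag format has to come from the programmer's manual.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical header layout, for illustration only:
 * bits 31..16 = count - 1, bit 15 = "sequential" mode, bits 8..0 = first tag. */
#define HDR_MODE_SEQUENTIAL (1u << 15)

static uint32_t make_seq_header(uint32_t first_tag, uint32_t count)
{
    return ((count - 1u) << 16) | HDR_MODE_SEQUENTIAL | (first_tag & 0x1FFu);
}

/* Plain form: one tag word and one value word per register (2n words). */
size_t emit_pairs(uint32_t *out, uint32_t first_tag, const uint32_t *vals, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i] = first_tag + (uint32_t)i; /* assumes consecutive registers have consecutive tags */
        out[2 * i + 1] = vals[i];
    }
    return 2 * n;
}

/* Compact form: one header for n >= 1 consecutive registers, then the n values. */
size_t emit_sequential(uint32_t *out, uint32_t first_tag, const uint32_t *vals, size_t n)
{
    out[0] = make_seq_header(first_tag, (uint32_t)n);
    for (size_t i = 0; i < n; i++)
        out[1 + i] = vals[i];
    return 1 + n;
}
```

For six consecutive registers, pairs take 12 words while the sequential form takes 7. That is the kind of compaction described above.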

Quote
It's too bad they don't allow expanding the FIFO buffer like the Avenger/Napalm.


Well, the buffer is up to 64KB and is address latched, so you can fill one buffer, set it off, and then in parallel start building the next; if you fill it before the Permedia completes the previous one, you can reset the DMA address without affecting the ongoing operation. What I had in mind was something like a ring-buffer arrangement (for simplicity). Most of the time, you'd only be writing a few hundred bytes of vertex/command data, so a single full-size buffer with a pair of pointers could be effective.
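Here is a minimal sketch of the single 64KB buffer with a pair of pointers. It assumes the CPU owns the write index and the chip's progress is reflected in the read index. The type and function names are hypothetical. Programming the chip's DMA address/count registers and making the buffer visible to the PCI bus are deliberately left out.

```c
#include <stdint.h>
#include <stddef.h>

/* One 64KB buffer, counted in 32-bit words. */
#define RING_WORDS (64u * 1024u / 4u)

/* Hypothetical ring buffer layout for a DMA command stream. */
typedef struct {
    uint32_t data[RING_WORDS];
    size_t write; /* next slot the CPU will fill */
    size_t read;  /* next slot the chip will consume */
} Ring;

/* Free slots; one slot is kept empty so write == read means "empty". */
size_t ring_free(const Ring *r)
{
    return (r->read + RING_WORDS - r->write - 1) % RING_WORDS;
}

/* Append n words, wrapping at the end; returns 0 (and writes nothing)
 * if there is not enough room yet. */
int ring_push(Ring *r, const uint32_t *words, size_t n)
{
    if (n > ring_free(r))
        return 0;
    for (size_t i = 0; i < n; i++) {
        r->data[r->write] = words[i];
        r->write = (r->write + 1) % RING_WORDS;
    }
    return 1;
}
```

One slot is always left empty so that equal indices unambiguously mean "empty". If a real DMA engine cannot wrap mid-transfer, a span that crosses the end of the buffer would need to be kicked off as two transfers.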

Quote
Yeah, there are way too many variations of the ATI boards with minor differences. It seems to me like the docs were written by designers who didn't want to write docs and assumed everyone already knew as much as they did. The GPU is quite a bit different from the older ones. I did find the GPU instruction set very interesting, but it's not easy to use. It's more like a SIMD/DSP processor with minimal conditional/branch support, which makes sense for what it is, I guess. There are friendlier and more flexible ways to avoid branch hazards.


You'd like CUDA / Stream / OpenCL if that's your bang :)
 

delshay

  • Hero Member
  • Join Date: Mar 2004
  • Posts: 1009
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #24 on: September 04, 2012, 01:07:35 AM »
Quote from: matthey;706255

Is the Permedia 3 pin-compatible and register-compatible with the Permedia 2? How much gfx memory does it allow? There was someone who upgraded a Cybervision 64/3D Virge to a Virge DX. Supposedly the DX was used on some later Cybervision 64/3Ds anyway. Upgrading the amount of gfx memory would be the best hack but also the most difficult.

The Permedia 3 has a higher BGA count, but I would like to take a look at the PDF if anyone has it.

The Permedia 2's maximum memory is 8MB.

Access to the Blizzard flash ROM, or another update, may help. (Maintainer)
« Last Edit: September 05, 2012, 10:49:11 AM by delshay »
 

delshay

  • Hero Member
  • Join Date: Mar 2004
  • Posts: 1009
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #25 on: September 05, 2012, 11:08:58 AM »
Quote from: matthey;706255
Does overclocking help 3D performance that much? I overclocked a Voodoo 3 by quite a bit and it helped 2D performance but I couldn't see any change in 3D performance. It probably helped some but very little.

I sometimes get myself confused and type gibberish in forums, but this link may give you some answers. I bet someone here is going to disagree with some of the posts in the link below.

 http://forums.extremeoverclocking.com/t132793.html
« Last Edit: September 05, 2012, 01:23:13 PM by delshay »
 

Iggy

  • Hero Member
  • Join Date: Aug 2009
  • Posts: 5348
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #26 on: September 05, 2012, 06:47:34 PM »
I don't know about older devices like the Permedia2.
 
But a Voodoo3 wouldn't really benefit from a higher-clocked PCI bus, as it can already run in a 33 or a 66 MHz slot.
"Not making any hard and fast rules means that the moderators can use their good judgment in moderation, and we think the results speak for themselves." - Amiga.org, terms of service

"You, got to stem the evil tide, and keep it on the the inside" - Rogers Waters

"God was never on your side" - Lemmy

Amiga! "Our appeal has become more selective"
 

matthey

  • Hero Member
  • Join Date: Aug 2007
  • Posts: 1294
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #27 on: September 05, 2012, 09:05:20 PM »
@Karlos
Maybe the DMA is disabled but still appears to work. The PCI CFGCommand register (offset 0x04 from the configuration base) has several configuration bits that might make a difference to DMA. It's readable as well as writable, so you could read it, change 1 bit and write it back. Turning on bits 0 to 2 might make a difference to DMA? This is supposed to be set up correctly by the gfx card boot ROM, but that doesn't happen on an Amiga. Initialization is up to the 2D driver writer. I'm still not sure that DMA is going to be the Holy Grail of 3D performance. It sounds like you are already getting more out of the card than higher-spec x86 machines do :). Take a look at the Quake III Arena hardware and test results here:

http://www.ultimatehardware.net/006.htm

Surely their driver would have had DMA enabled, yet it still shows that lack of performance?
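The read-modify-write described above can be sketched as follows. The bit meanings come from the PCI specification's Command register:

- bit 0 enables I/O space decoding
- bit 1 enables memory space decoding
- bit 2 enables bus mastering, which a device needs in order to initiate DMA

The config-space accessors are hypothetical stand-ins working on a simulated 256-byte space. On the Amiga they would be whatever the PCI bridge software (e.g. pci.library) actually provides.

```c
#include <stdint.h>

/* Standard PCI Command register (configuration space offset 0x04). */
#define PCI_COMMAND_OFFSET   0x04u
#define PCI_CMD_IO_SPACE     (1u << 0) /* respond to I/O space accesses */
#define PCI_CMD_MEMORY_SPACE (1u << 1) /* respond to memory space accesses */
#define PCI_CMD_BUS_MASTER   (1u << 2) /* allow the device to initiate DMA */

/* Hypothetical stand-ins for the host's config-space accessors, working on a
 * simulated 256-byte config space (PCI config space is little-endian). */
static uint8_t sim_config[256];

static uint16_t cfg_read16(unsigned off)
{
    return (uint16_t)(sim_config[off] | (sim_config[off + 1] << 8));
}

static void cfg_write16(unsigned off, uint16_t value)
{
    sim_config[off] = (uint8_t)(value & 0xFFu);
    sim_config[off + 1] = (uint8_t)(value >> 8);
}

/* Read-modify-write: set the requested bits, keep all others, and return
 * the value read back afterwards. */
uint16_t pci_enable_bits(uint16_t bits)
{
    uint16_t cmd = cfg_read16(PCI_COMMAND_OFFSET);
    cmd = (uint16_t)(cmd | bits);
    cfg_write16(PCI_COMMAND_OFFSET, cmd);
    return cfg_read16(PCI_COMMAND_OFFSET);
}
```

Suppose a device has only memory decoding enabled (0x0002). Calling pci_enable_bits(PCI_CMD_BUS_MASTER) on it leaves the register at 0x0006, with the other bits untouched.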

Also, I flashed my Cyberstorm MK3 and now I have PCI configuration info in my "ESC" settings. It supposedly fixes some PCI bugs too. I think delshay suggested flashing your BPPC if I read his English correctly :-/.

@delshay
No, I don't have any docs on the Permedia 3. You can see the test results above for how it compares to the Permedia 2. It still didn't perform well. It might be more worthwhile to reverse engineer a G-Rex PCI if you have enough electrical experience. Then you could use something a bit more modern with more gfx memory.

I'm not a big fan of overclocking I/O buses like PCI, Zorro or SCSI. It can cause timing errors once the bus is no longer within the standard's specification. I have done conservative overclocking of the CPU, GPU and memory buses where I observed a large enough benefit and where there was better-than-average cooling. I overclocked my CSMK3 to 75MHz (rev 6 68060), and the memory with it, but left the SCSI bus and motherboard speed alone by using a different oscillator. It's very stable and fast.

@Iggy
I overclocked the Voodoo 3 GPU with memory (SGRAM), and possibly other gfx card chips (giving a higher RAMDAC speed), by about 25%, but that shouldn't affect the PCI speed. The 2D performance improved modestly, while any difference in my 3D tests would not have been statistically significant. I don't overclock my Voodoo 4 at all. I have read that it is not tolerant of overclocking (risk of burning out the GPU) and there is not much benefit. The Voodoo 3 with SGRAM is a little faster than the Voodoo 4 at most lower-resolution (bandwidth-bound) operations, by the way. The Voodoo 3 with SGRAM probably has a faster memory bus, gfx bus, or both.
 

Karlos (topic starter)

  • Sockologist
  • Global Moderator
  • Hero Member
  • Join Date: Nov 2002
  • Posts: 16867
  • Country: gb
  • Thanked: 4 times
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #28 on: September 05, 2012, 09:37:01 PM »
Quote from: matthey;706699
@Karlos
Maybe the DMA is disabled but still appears to work. The PCI CFGCommand register (offset 0x04 from the configuration base) has several configuration bits that might make a difference to DMA. It's readable as well as writable, so you could read it, change 1 bit and write it back. Turning on bits 0 to 2 might make a difference to DMA? This is supposed to be set up correctly by the gfx card boot ROM, but that doesn't happen on an Amiga.


It's definitely something on my "to revisit" list.

Quote
Initialization is up to the 2D driver writer. I'm still not sure that DMA is going to be the Holy Grail of 3D performance. It sounds like you are already getting more out of the card than higher-spec x86 machines do :). Take a look at the Quake III Arena hardware and test results here:

http://www.ultimatehardware.net/006.htm

Surely their driver would have had DMA enabled, yet it still shows that lack of performance?


DMA should definitely improve performance over the current PIO version provided the driver is able to optimize the structure of the DMA buffer properly (i.e., making best use of the available addressing modes to maximise the amount of data per address tag). The reason being that right now, the driver spends a lot of time polling the FIFO for space, despite having optimized the FIFO code to minimise the number of reads necessary. That time could be spent processing the data the chip will use next in a DMA / multi buffer strategy.
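Here is a toy model of that polling cost, with a simulated FIFO standing in for the Permedia. All names are hypothetical, and this is not the actual driver code. It compares polling the free-space status before every word with caching the count returned by the last poll, which is one way to minimise the number of reads.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical simulated chip: a command FIFO plus a "free space" status
 * register. On real hardware every fifo_space() call would be an uncached
 * read across the PCI bus, which is the cost being discussed. */
typedef struct {
    unsigned capacity;     /* FIFO depth in 32-bit words */
    unsigned used;         /* words currently queued */
    unsigned drain_rate;   /* words the chip consumes per status read (simulated);
                              0 would spin forever once full, like a hung chip */
    unsigned status_reads; /* how many times the CPU polled the chip */
    uint32_t last_word;    /* last word written, for checking */
} SimFifo;

/* Poll the chip: count the read and let the simulated chip make progress. */
static unsigned fifo_space(SimFifo *f)
{
    f->status_reads++;
    f->used = (f->used > f->drain_rate) ? f->used - f->drain_rate : 0;
    return f->capacity - f->used;
}

static void fifo_write(SimFifo *f, uint32_t word)
{
    f->used++;
    f->last_word = word;
}

/* Naive PIO: poll for free space before every single word. */
void pio_write_naive(SimFifo *f, const uint32_t *words, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        while (fifo_space(f) == 0)
            ; /* busy-wait: no useful work happens here */
        fifo_write(f, words[i]);
    }
}

/* Cached PIO: remember how much space the last poll reported and only go
 * back to the chip once that allowance has been used up. */
void pio_write_cached(SimFifo *f, const uint32_t *words, size_t n)
{
    unsigned space = 0;
    for (size_t i = 0; i < n; i++) {
        while (space == 0)
            space = fifo_space(f);
        fifo_write(f, words[i]);
        space--;
    }
}
```

Take a 16-word FIFO that drains 2 words per poll. Writing 100 words takes 100 status reads the naive way and 43 with the cached count. The remaining reads are the overhead a DMA/multi-buffer scheme would aim to remove.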
 

delshay

  • Hero Member
  • Join Date: Mar 2004
  • Posts: 1009
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #29 on: September 25, 2012, 08:33:09 PM »
The Diamond Fire 1000 has two advantages over the BVision: one is AGP, the other is that it has 125MHz Ultra Low Latency SGRAM.

The BVision is fitted with 100MHz Fujitsu or Micron SGRAM as standard. I believe there is one other type of SGRAM fitted to the card as well.

The Permedia 2 works better with faster SGRAM. When using the Permedia 2, the difference between 100MHz and 125MHz SGRAM is big, and the difference between 125MHz and 143MHz SGRAM is smaller.
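The relative size of those steps lines up with simple arithmetic: going from 100 to 125MHz is a 25% clock increase, while 125 to 143MHz is only about 14%. Below is a small sketch of the peak-bandwidth calculation. It assumes single-data-rate SGRAM and a 64-bit memory interface; the bus width is an assumption, and the percentages don't depend on it. It says nothing about latency or timing parameters, which the post also mentions.

```c
/* Peak bandwidth of single-data-rate SGRAM: one transfer per clock, bus_bits
 * wide. The 64-bit width in the example is an assumption; the relative gains
 * do not depend on it. Latency and timing parameters are not modelled. */
double sgram_peak_mb_per_s(double clock_mhz, unsigned bus_bits)
{
    return clock_mhz * (bus_bits / 8.0);
}

/* Clock increase between two speed grades, in percent. */
double clock_gain_percent(double from_mhz, double to_mhz)
{
    return (to_mhz - from_mhz) / from_mhz * 100.0;
}
```

At an assumed 64 bits, 100MHz gives an 800MB/s peak and 125MHz gives 1000MB/s.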

How well the Permedia 2 overclocks depends on the SGRAM fitted and other factors which I will not discuss.

It will be interesting to see what the timing parameters are and whether it is taking advantage of the Low Latency SGRAM.
« Last Edit: September 25, 2012, 09:43:52 PM by delshay »