Amiga.org

Amiga computer related discussion => Amiga Gaming => Topic started by: Karlos on August 31, 2012, 09:08:07 AM

Title: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: Karlos on August 31, 2012, 09:08:07 AM
Just for the lulz...

[youtube]UcuNR3yyIo4[/youtube]

You'd have to be certifiable to try and play it though.
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: paul1981 on August 31, 2012, 12:50:39 PM
Was that 640x480?
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: Karlos on August 31, 2012, 12:53:34 PM
Quote from: paul1981;706012
Was that 640x480?

Yes. I don't have any lower resolution modes defined at the moment.
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: rvo_nl on August 31, 2012, 04:01:26 PM
Hah, great video. I was hoping for better, but I guess this is already quite an achievement. I remember when I got my first PC this was one of the first games I played on it, and I remember thinking my Amiga, which was standing next to it and was equipped with ppc+bvision, would never be able to run that.

Wrong, wrong.
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: Karlos on August 31, 2012, 08:21:13 PM
Quote from: rvo_nl;706024
Hah, great video. I was hoping for better,


A CSPPC + CVisionPPC would be better ;)

Quote
but I guess this is already quite an achievement. I remember when I got my first PC this was one of the first games I played on it, and I remember thinking my Amiga, which was standing next to it and was equipped with ppc+bvision, would never be able to run that.

Wrong, wrong.


It runs, but it's not exactly what I'd call playable. I just thought I'd share it to show that improvements to the driver have been made. Recent iterations were suffering from a bad texture corruption bug which I've finally more or less solved.
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: Damion on August 31, 2012, 08:50:46 PM
Wow! Never thought I'd see that!
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: rvo_nl on August 31, 2012, 10:26:29 PM
Quote from: Karlos;706054
I just thought I'd share it to show that improvements to the driver have been made. Recent iterations were suffering from a bad texture corruption bug which I've finally more or less solved.

Well done then :) Are there ways to improve performance by lowering settings or screen size?

Can't wait to get my system running again, I've got to try this.
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: Karlos on August 31, 2012, 10:43:18 PM
Quote from: rvo_nl;706063
Well done then :) Are there ways to improve performance by lowering settings or screen size?


Well, it might help, but I'm not sure that fillrate is the limit at 640x480. The main problems are:

1) Lack of texture memory - even at the lowest texture detail possible, there's not enough VRAM to hold them all, so paging them in and out is practically unavoidable.

2) The driver runs in PIO mode. Theoretically, DMA is possible, but I've never gotten it working. DMA based drawing would allow better overlap between the CPU and Permedia2. DMA could also theoretically help with texture transfer from main memory.

3) The driver only handles rendering as the final stage of a software 3D pipeline. Even as a PIO mode driver, it spends some time waiting to be able to send the chip commands. Instead, it could be doing some software 3D transformation calculations for the next polygon. But that would require massive updates to Warp3D (to include 3D transformation into the system) and whatever flavour of GL is implemented on it in order to use it.
 
Quote
Can't wait to get my system running again, I've got to try this.


256MB is an absolute must. Just loading the game alone used close to 170MB...
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: rvo_nl on August 31, 2012, 11:16:27 PM
Wow, that means it would never work on CVisionPPC? :| I'm glad I have the full 256. Let's just hope I can quickly find out what's wrong with my machine.
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: Karlos on September 02, 2012, 10:21:27 AM
Quote from: rvo_nl;706066
Wow, that means it would never work on CVisionPPC? :| I'm glad I have the full 256. Let's just hope I can quickly find out what's wrong with my machine.


Actually, Darren Eveland was able to run it on his CSPPC previously. I suspect the game actually grabs as much RAM as it is able and perhaps 128MB is enough.

There's a video of an older (pre-release) version of the driver on his system running Quake 3 with the same (or at least similar) basic settings and it was certainly a bit faster.

[youtube]3GNtOM3ZLQM[/youtube]

You'll see some nasty graphics corruption at the end. This was a long-standing bug that has only been kicked into touch recently.
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: delshay on September 02, 2012, 11:46:50 AM
I think a modified Blizzard/Bvision will outclass a Cyberstorm/Cybervision in GFX ONLY, according to the benchmarks I have seen posted on other websites. Cyberstorm has CPU power over Blizzard, but according to benchmark figures Blizzard has a faster PCI bus.

This is reflected here in that I never needed to use a screenmode below 800x600, and a small number of PPC games showed no difference at 1024x768, but you need a very high PCI bus speed to play at those screenmodes.

Cyberstorm 400MHz will not hold this crown for long on classic Amiga; I just need to finish the last Bvision project (Master Bvision).
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: Karlos on September 02, 2012, 02:35:03 PM
Quote from: delshay;706220
This is reflected here in that I never needed to use a screenmode below 800x600, and a small number of PPC games showed no difference at 1024x768, but you need a very high PCI bus speed to play at those screenmodes.

Only with games where you don't have texture paging. And the larger the screenmode, the less room you have for textures and the more likely paging becomes.

Even with some recent optimisations in this area, Quake 3 on full texture detail will bring your Blizzard/BVision combination to its knees, I guarantee it.
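
To put rough numbers on the point about larger screenmodes leaving less room for textures, here is a minimal sketch in C. It assumes 8MB of VRAM, 16-bit colour, double buffering and a depth buffer of the same size, with no alignment padding; none of those figures come from the driver itself, and the function name is made up for illustration.

```c
#include <assert.h>

/* Bytes of VRAM left for textures once the display buffers are allocated.
 * Assumptions (hypothetical, not taken from the driver): front buffer,
 * back buffer and depth buffer all use the same bytes-per-pixel, and
 * nothing is padded or aligned. */
long texture_bytes_free(long vram_bytes, int width, int height,
                        int bytes_per_pixel)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    long one_buffer = (long)width * height * bytes_per_pixel;
    long display = 3 * one_buffer;   /* front + back + depth */
    return vram_bytes - display;
}
```

Under those assumptions, 640x480 leaves 6,545,408 bytes (about 6.2MB) for textures, 800x600 leaves 5,508,608 bytes, and 1024x768 leaves only 3,670,016 bytes (3.5MB), so paging starts much sooner at the higher modes.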
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: matthey on September 02, 2012, 02:35:11 PM
Quote from: delshay;706220
I think a modified Blizzard/Bvision will outclass a Cyberstorm/Cybervision in GFX ONLY, according to the benchmarks I have seen posted on other websites. Cyberstorm has CPU power over Blizzard, but according to benchmark figures Blizzard has a faster PCI bus.

This is reflected here in that I never needed to use a screenmode below 800x600, and a small number of PPC games showed no difference at 1024x768, but you need a very high PCI bus speed to play at those screenmodes.

The only reason PCI bus speed is important is because there is not enough texture memory on the Permedia 2 as Karlos pointed out. As the resolution increases, more and more textures have to be retransferred over the gfx bus. A fast 68060 with a slow gfx bus Mediator and Voodoo 3+ with plenty of texture memory can compete with a faster PPC with faster gfx bus and Permedia 2 with low texture memory for some 3D. It's a matter of how much work doesn't have to be redone. My 68060@75MHz with Voodoo 4 640x480x16 is more playable with GLQuake than Karlos's demo. Don't be surprised if that same rev6 68060@100MHz with Voodoo3+ in a Natami with non-handicapped PCI gfx bus outperforms Classic PowerPC boards for 3D requiring a lot of texture memory.

Edit: Karlos beat me to it. Texture memory rules, at least when there isn't enough ;).
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: Karlos on September 02, 2012, 02:42:30 PM
Incidentally, if you take a Permedia2, put it on an AGP bus, with a proper DMA enabled driver (where command buffers and textures are downloaded by the chip from system memory, interrupts can be used to signal stuff we have to poll for), and attach it to a CPU where the rest of the time spent is irrelevant, it can turn out good performance:

[youtube]5EA9xtM22tA[/youtube]

The above is on a FireGL card in a 1.4GHz P3 machine, where the chip is at least able to run at its full potential, unhampered by all the problems it has in a BVision machine.
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: Karlos on September 02, 2012, 02:47:55 PM
I am hoping that with some experimentation and a limitless supply of irreplaceable old BVision cards*, I might be able to get DMA working at least.

At the bare minimum, I would like to be able to use DMA command / vertex transport as it would increase the parallelism between the CPU and the Permedia significantly. The CPU would also be able to spend far less time polling the chip to see if it is busy and whether or not it has space in its limited FIFO for more data. Which, having stripped away many layers of code from the innermost loops, is where it now spends most of its time.

* I wish...
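
The polling pattern described above can be sketched roughly as follows. This is a software model only: `fake_fifo` is a hypothetical stand-in for the chip that drains one entry each time its free space is read, and none of the names correspond to real Permedia2 registers or to the actual driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the chip's input FIFO. Real hardware drains
 * on its own clock; here one entry drains per poll, purely to model the
 * shape of the loop. */
struct fake_fifo {
    unsigned capacity;       /* total entries the FIFO can hold */
    unsigned used;           /* entries currently queued */
    unsigned long polls;     /* how often the CPU had to ask for space */
    unsigned long accepted;  /* words the "chip" has taken so far */
    uint32_t checksum;       /* sum of accepted words, for checking */
};

/* On real hardware this would be a slow read across the bus. */
unsigned fifo_space(struct fake_fifo *f)
{
    f->polls++;
    if (f->used > 0)
        f->used--;           /* the chip consumed one entry meanwhile */
    return f->capacity - f->used;
}

/* PIO transfer: ask how much room there is, write that many words,
 * then ask again until everything has been sent. */
void pio_write(struct fake_fifo *f, const uint32_t *words, unsigned count)
{
    assert(f->capacity > 0);
    unsigned i = 0;
    while (i < count) {
        unsigned space = fifo_space(f);
        while (space > 0 && i < count) {
            f->checksum += words[i++];
            f->used++;
            f->accepted++;
            space--;
        }
    }
}
```

With a 4-entry FIFO and 16 words to send, this model needs 13 polls: once the FIFO is full, nearly every word costs a read of its own, and that is the time a DMA path would hand back to the CPU.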
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: matthey on September 02, 2012, 03:14:36 PM
@Karlos
It seems that the Amiga PCI buses were set up to do the bare minimum to work rather than being compliant. I had similar problems trying to get swizzling (endian translation) of 3D registers to work on the Avenger/Napalm with the Mediator. Either I'm reading the docs incorrectly or the Mediator PCI bus mapping is "funny". That's pretty simple compared to getting DMA working ;). At least these old cards are easier to program and better documented than the new ones. Have you seen the ATI docs? Ugh. I did have a look at the Permedia 2 docs. It's pretty clean. Too bad ease of use is not a consideration for GPUs (or CPUs). It's possible for programmers to get more out of what is simpler and easier to use.
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: delshay on September 02, 2012, 03:16:07 PM
GLQUAKEWOS is super smooth here at 800x600; it will be interesting to see how it performs against 640x480. I don't expect the gap to be large on my set-up, which is set up for a high-speed bus with no overclocking, which is banned for this year.

Focus here has switched to Bvision performance. It's getting quicker, that's the good news, and I learned some new things about the Bvision, which I have now applied to both my cards. One card has a few more changes than the other, but none of this is visible when looking at the card.
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: delshay on September 02, 2012, 03:40:03 PM
If the Bvision or Blizzard gets damaged during testing, that's not a problem here; I just repair it. But I'd like to point out it's very, very difficult to damage my cards, as I have fixed everything that I know can go wrong. So I can throw insane frequencies at any card and it will either work or just not work.

Both my cards (Blizzard/Bvision) have a massive advantage over any other classic card. This is why some Bvision driver software settings which do not work on all users' set-ups work here: some problems here have been fixed.

Some of you already know this, with the Permedia 2 clocked @125MHz+.
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: rvo_nl on September 02, 2012, 04:46:22 PM
Hello delshay,
 
Even though you seem to 'spam' every ppc/bvision-related thread with your overclocking 'progress reports', I'm still curious about what you are doing there.
 
Enough now with all the secrecy (we are adults, right?), care to post a few pics of your mods so far?
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: matthey on September 02, 2012, 04:52:13 PM
Quote from: delshay;706244

Both my cards (Blizzard/Bvision) have a massive advantage over any other classic card. This is why Bvision driver software settings which do not work on all users' set-ups work here: some problems here have been fixed.


Does overclocking help 3D performance that much? I overclocked a Voodoo 3 by quite a bit and it helped 2D performance but I couldn't see any change in 3D performance. It probably helped some but very little.

Is the Permedia 3 pin compatible and register compatible to the Permedia 2? How much gfx memory does it allow? There was someone who upgraded a Cybervision 64/3D Virge to a Virge DX. Supposedly the DX was used on some later Cybervision 64/3Ds anyway. Upgrading the amount of gfx memory would be the best hack but also the most difficult.
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: delshay on September 02, 2012, 05:12:47 PM
The biggest performance gain comes from the PCI bus. As soon as the PCI bus hits 38.5MHz+ you will find all PPC games speed up; it's very, very clear. In Heretic II (800x600), for one, Corvus looks like he's jogging rather than walking, and in Wipeout you can't tell the difference between 800x600 and 1024x768 speed-wise. The most impressive thing here is that when you destroy a ship with a laser in Wipeout there's not even a hint of the slowdown which I had before the update.

Part of the set-up lies in what this card is doing, but it's not available on any other card as it needs specific EDO memory.

 http://www.amiga.org/gallery/index.php?n=3692
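
For scale, the theoretical peak of a PCI-style bus is simply clock rate times bus width. The sketch below assumes a 32-bit bus moving one word per clock, which is an assumption and not a measured property of the Blizzard/BVision bus; real throughput will be lower because of wait states and non-burst accesses.

```c
#include <assert.h>

/* Theoretical peak of a PCI-style bus: one transfer of bus_bytes per
 * clock. This is an upper bound only; arbitration, wait states and
 * single-word PIO accesses all reduce what is actually achieved. */
unsigned long pci_peak_bytes_per_sec(unsigned long clock_khz,
                                     unsigned bus_bytes)
{
    assert(bus_bytes > 0);
    return clock_khz * 1000ul * bus_bytes;
}
```

With a 4-byte bus that gives 132,000,000 bytes/s at 33MHz and 154,000,000 bytes/s at 38.5MHz, roughly 17% more headroom for texture uploads.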
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: Karlos on September 02, 2012, 08:51:28 PM
Quote from: matthey;706240
@Karlos
It seems that the Amiga PCI buses were set up to do the bare minimum to work rather than being compliant. I had similar problems trying to get swizzling (endian translation) of 3D registers to work on the Avenger/Napalm with the Mediator. Either I'm reading the docs incorrectly or the Mediator PCI bus mapping is "funny". That's pretty simple compared to getting DMA working ;).

I did try it in the past, but didn't manage to get the mapping correct. The Permedia would read some address range unrelated to the one I was trying to give it and would lock up very quickly in a manner that's nigh on impossible to debug. You just have no way of seeing what location it sees.

Perhaps the pci.library will help this time round.

Quote
At least these old cards are easier to program and better documented than the new ones. Have you seen the ATI docs? Ugh.


They aren't so bad, but they are quite big :) I've had to refer to them a number of times trying to bring the R100/R200 drivers up to date.

Quote
I did have a look at the Permedia 2 docs. It's pretty clean. Too bad ease of use is not a consideration for GPUs (or CPUs). It's possible for programmers to get more out of what is simpler and easier to use.


The Permedia documentation is OK, but misses out various things. For instance, the address calculations used by patched (and subpatched) buffers are just not anywhere in the programmer manuals.
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: matthey on September 03, 2012, 07:09:10 PM
Quote from: Karlos;706308
I did try it in the past, but didn't manage to get the mapping correct. The Permedia would read some address range unrelated to the one I was trying to give it and would lock up very quickly in a manner that's nigh on impossible to debug. You just have no way of seeing what location it sees.

Did you byte swap (BE->LE) the data used for the DMA buffer? Does writing to the raw GPU FIFO work instead of writing the 3D registers? The raw FIFO data format should be more similar to the data used in the DMA buffer, which is why I ask. It's too bad they don't allow expanding the FIFO buffer like the Avenger/Napalm. Just writing to the Avenger/Napalm registers provides pretty good performance and is very easy to use (minus the fact that the byte swapped BE->LE 3D register space doesn't seem to exist).

Quote from: Karlos;706308
They aren't so bad, but they are quite big :) I've had to refer to them a number of times trying to bring the R100/R200 drivers up to date.

Yea, there are way too many variations of the ATI boards with minor differences. It seems to me like the docs were written by the designers who didn't want to write the docs and assume everyone already knows as much as they do. The GPU is quite a bit different than the older ones. I did find the GPU instruction set very interesting but it's not easy to use. It's more like a SIMD/DSP processor with minimal conditional/branch support which makes sense for what it is I guess. There are more friendly and flexible ways to avoid branch hazards.

Quote from: Karlos;706308
The Permedia documentation is OK, but misses out various things. For instance, the address calculations used by patched (and subpatched) buffers are just not anywhere in the programmer manuals.

That's always the case. The Avenger/Napalm docs are probably the best for any halfway modern GFX board and there are still places that need more clarification.
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: Karlos on September 03, 2012, 08:52:52 PM
Quote from: matthey;706434
Did you Byte Swap (BE->LE) the data used for the DMA buffer?


I tried all the obvious byte and word swap combinations for both the buffer contents and the address and nothing seemed to work. I wondered if it just isn't initialised properly and maybe the driver is relying on PIO.
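
For anyone following along, the "obvious" swap combinations on a 32-bit value look roughly like this in C. It is only a sketch of the generic transformations; the function names are made up here and nothing in it is specific to the Permedia2 or to the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full byte reversal: 0xAABBCCDD -> 0xDDCCBBAA (BE <-> LE). */
uint32_t swap_bytes32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Swap the two 16-bit halves: 0xAABBCCDD -> 0xCCDDAABB. */
uint32_t swap_words32(uint32_t v)
{
    return (v >> 16) | (v << 16);
}

/* Swap the bytes inside each 16-bit half: 0xAABBCCDD -> 0xBBAADDCC. */
uint32_t swap_bytes_in_words32(uint32_t v)
{
    return ((v >> 8) & 0x00FF00FFu) | ((v << 8) & 0xFF00FF00u);
}

typedef uint32_t (*swap_fn)(uint32_t);

/* Copy a buffer while applying one of the swaps above to every word,
 * so each variant can be tried on the same source data. */
void copy_swapped(uint32_t *dst, const uint32_t *src, size_t n, swap_fn fn)
{
    assert(dst && src && fn);
    for (size_t i = 0; i < n; i++)
        dst[i] = fn(src[i]);
}
```

The same three functions can be applied to the buffer address as well as its contents, which gives the full set of combinations to test.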

Quote
Does writing to the raw GPU FIFO work instead of writing the 3D registers?


It does. That's what the current driver is doing. The DMA format is actually a bit different in that it consists of register address tag / value pairs, which at first glance double the size of the data you are writing. But it also has a number of sequential and index address modes that allow you to compact data in the buffer somewhat.
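
To illustrate why a sequential mode compacts the buffer, here is a sketch in C comparing the two layouts. The header encoding is invented purely for illustration (count in the top 16 bits, first register in the low 16) and is not the real Permedia2 tag format, which should be taken from the programmer's manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One tag word in front of every value: n values cost 2n words.
 * The "tag" here is just a register index, a hypothetical encoding. */
size_t emit_tagged(uint32_t *out, uint32_t first_reg,
                   const uint32_t *vals, size_t n)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        out[w++] = first_reg + (uint32_t)i;
        out[w++] = vals[i];
    }
    return w;
}

/* One header for a run of consecutive registers: n values cost n + 1
 * words. Header layout is made up for this sketch, NOT the real one. */
size_t emit_sequential(uint32_t *out, uint32_t first_reg,
                       const uint32_t *vals, size_t n)
{
    assert(n > 0 && n <= 0xFFFFu && first_reg <= 0xFFFFu);
    size_t w = 0;
    out[w++] = ((uint32_t)n << 16) | first_reg;
    for (size_t i = 0; i < n; i++)
        out[w++] = vals[i];
    return w;
}
```

For eight consecutive registers the tagged form takes 16 words and the sequential form 9, so the saving grows with the length of each run.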

Quote
It's too bad they don't allow expanding the FIFO buffer like the Avenger/Napalm.


Well, the buffer is up to 64KB and is address latched, so you can fill one buffer, set it off and then in parallel start building the next and if you fill it before the Permedia completes the previous, you can reset the DMA address without affecting the ongoing operation. What I had in mind was something like a ringbuffer type arrangement (for simplicity). Most of the time, you'd only be writing a few hundred bytes of vertex/command data, so a single fullsize buffer with a pair of pointers could be effective.
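
A ring buffer with a pair of pointers, as described above, might look something like this. It is a plain software sketch: `ring_consume` stands in for the chip finishing a chunk, and whether the hardware can really be driven this way is exactly the part that remains unproven.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_WORDS (64u * 1024u / 4u)   /* 64KB of 32-bit words */

struct cmd_ring {
    uint32_t buf[RING_WORDS];
    size_t head;   /* next word the CPU will write */
    size_t tail;   /* next word the chip will read (modelled in software) */
};

/* One slot is always left empty so that head == tail means "empty". */
size_t ring_free(const struct cmd_ring *r)
{
    return (r->tail + RING_WORDS - r->head - 1) % RING_WORDS;
}

size_t ring_used(const struct cmd_ring *r)
{
    return (r->head + RING_WORDS - r->tail) % RING_WORDS;
}

/* Returns 1 if the words were queued, 0 if there was not enough room. */
int ring_push(struct cmd_ring *r, const uint32_t *words, size_t n)
{
    if (n > ring_free(r))
        return 0;
    for (size_t i = 0; i < n; i++) {
        r->buf[r->head] = words[i];
        r->head = (r->head + 1) % RING_WORDS;
    }
    return 1;
}

/* Stand-in for the chip completing n words of the buffer. */
void ring_consume(struct cmd_ring *r, size_t n)
{
    assert(n <= ring_used(r));
    r->tail = (r->tail + n) % RING_WORDS;
}
```

Since most submissions are only a few hundred bytes, the CPU would rarely find the ring full, and it only needs to check the tail when it does.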

Quote
Yea, there is way to many variations of the ATI boards with minor differences. It seems to me like the docs were written by the designers who didn't want to write the docs and assume everyone already knows as much as they do. The GPU is quite a bit different than the older ones. I did find the GPU instruction set very interesting but it's not easy to use. It's more like a SIMD/DSP processor with minimal conditional/branch support which makes sense for what it is I guess. There are more friendly and flexible ways to avoid branch hazards.


You'd like CUDA / Stream / OpenCL if that's your bang :)
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: delshay on September 04, 2012, 01:07:35 AM
Quote from: matthey;706255

Is the Permedia 3 pin compatible and register compatible to the Permedia 2? How much gfx memory does it allow? There was someone who upgraded a Cybervision 64/3D Virge to a Virge DX. Supposedly the DX was used on some later Cybervision 64/3Ds anyway. Upgrading the amount of gfx memory would be the best hack but also the most difficult.

Permedia 3 has a higher BGA count, but I would like to take a look at the PDF if anyone has it.

Permedia 2 maximum memory is 8MB.

Maybe access to the Blizzard flashrom or another update may help. ( Maintainer )
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: delshay on September 05, 2012, 11:08:58 AM
Quote from: matthey;706255
Does overclocking help 3D performance that much? I overclocked a Voodoo 3 by quite a bit and it helped 2D performance but I couldn't see any change in 3D performance. It probably helped some but very little.

I sometimes get myself confused and type gibberish in forums, but this link may give you some answers. I bet someone here is going to disagree with some of the postings in the link below.

 http://forums.extremeoverclocking.com/t132793.html
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: Iggy on September 05, 2012, 06:47:34 PM
I don't know about older devices like the Permedia2.
 
But a Voodoo3 wouldn't really benefit from a higher clocked PCI bus as it can already run in a 33 or a 66 MHz slot.
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: matthey on September 05, 2012, 09:05:20 PM
@Karlos
Maybe the DMA is disabled but still appears to work. The PCI CFGCommand register (offset 0x04 from configuration base) has several configuration bits that might make a difference to DMA. It's readable as well as writable so you could read it, change 1 bit and write it. Turning on bits 0 to 2 might make a difference to DMA? This is supposed to be set up correctly by the gfx card boot ROM but that doesn't happen on an Amiga. Initialization is up to the 2D driver writer. I'm still not sure that DMA is going to be the Holy Grail of 3D performance. It sounds like you are already getting more out of the card than higher spec x86 :). Take a look at the Quake III Arena hardware and test results here:

http://www.ultimatehardware.net/006.htm

Surely they would have DMA enabled for their driver but with that lack of performance?

Also, I flashed my Cyberstorm MK3 and now I have PCI configuration info in my "ESC" settings. It supposedly fixes some PCI bugs too. I think delshay suggested flashing your BPPC if I read his English correctly :-/.
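
The read, change a bit, write back sequence on the command register could be sketched like this. In the standard PCI command register, bit 0 enables I/O space, bit 1 enables memory space and bit 2 enables bus mastering (needed for the card to do DMA). The accessor functions are hypothetical and config space is modelled as a plain byte array so the sketch runs anywhere; a real driver would go through whatever the board or pci.library provides.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_COMMAND_OFFSET  0x04u
#define PCI_CMD_IO_SPACE    (1u << 0)
#define PCI_CMD_MEM_SPACE   (1u << 1)
#define PCI_CMD_BUS_MASTER  (1u << 2)

/* Hypothetical accessors over a byte-array model of config space.
 * PCI config space is little-endian, hence the explicit byte order. */
uint16_t cfg_read16(const uint8_t *cfg, unsigned off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

void cfg_write16(uint8_t *cfg, unsigned off, uint16_t v)
{
    cfg[off] = (uint8_t)(v & 0xFFu);
    cfg[off + 1] = (uint8_t)(v >> 8);
}

/* Read-modify-write: set the requested bits, leave the rest alone,
 * then read back so the caller can see what actually stuck. */
uint16_t pci_enable_bits(uint8_t *cfg, uint16_t bits)
{
    assert(cfg != 0);
    uint16_t cmd = cfg_read16(cfg, PCI_COMMAND_OFFSET);
    cmd |= bits;
    cfg_write16(cfg, PCI_COMMAND_OFFSET, cmd);
    return cfg_read16(cfg, PCI_COMMAND_OFFSET);
}
```

Reading the register back matters on real hardware: if the bus master bit does not stick, the bridge or card is refusing it and DMA will not work regardless of the buffer format.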

@delshay
No, I don't have any docs on the Permedia 3. You can see the test results above for how it compares to the Permedia 2. It still didn't perform well. It might be more worthwhile to reverse engineer a G-Rex PCI if you have enough electrical experience. Then you could use something a bit more modern with more gfx memory.

I'm not a big fan of overclocking I/O buses like PCI, Zorro or SCSI. It can cause timing errors when not in spec of the standard any more. I have done conservative overclocking of the CPU, GPU, and memory buses where I observed a large enough benefit and where there was better than average cooling. I overclocked my CSMK3 to 75MHz (rev 6 68060) and the memory with it but left the SCSI bus and motherboard speed alone by using a different oscillator. It's very stable and fast.

@Iggy
I overclocked the Voodoo 3 GPU with memory (SGRAM) and possibly other gfx card chips (gives a higher RAMDAC) by about 25%, but that shouldn't affect the PCI speed. The 2D performance improved modestly while any difference in my 3D tests would not have been statistically significant. I do not overclock my Voodoo 4 at all. I have read that it is not tolerant of overclocking (risk of burning out GPU) and there is not much benefit. The Voodoo 3 with SGRAM is a little faster than the Voodoo 4 at most lower resolution (bandwidth) operations by the way. The Voodoo 3 with SGRAM is probably faster with memory or gfx bus or both.
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: Karlos on September 05, 2012, 09:37:01 PM
Quote from: matthey;706699
@Karlos
Maybe the DMA is disabled but still appears to work. The PCI CFGCommand register (offset 0x04 from configuration base) has several configuration bits that might make a difference to DMA. It's readable as well as writable so you could read it, change 1 bit and write it. Turning on bits 0 to 2 might make a difference to DMA? This is supposed to be set up correctly by the gfx card boot ROM but that doesn't happen on an Amiga.


It's definitely something on my "to revisit" list.

Quote
Initialization is up to the 2D driver writer. I'm still not sure that DMA is going to be the Holy Grail of 3D performance. It sounds like you are already getting more out of the card than higher spec x86 :). Take a look at the Quake III Arena hardware and test results here:

http://www.ultimatehardware.net/006.htm

Surely they would have DMA enabled for their driver but with that lack of performance?


DMA should definitely improve performance over the current PIO version provided the driver is able to optimize the structure of the DMA buffer properly (i.e., making best use of the available addressing modes to maximise the amount of data per address tag). The reason being that right now, the driver spends a lot of time polling the FIFO for space, despite having optimized the FIFO code to minimise the number of reads necessary. That time could be spent processing the data the chip will use next in a DMA / multi buffer strategy.
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: delshay on September 25, 2012, 08:33:09 PM
The Diamond Fire 1000 has two advantages over the Bvision: one is AGP, the other is that it has 125MHz Ultra Low Latency SGRAM.

Bvision is fitted with 100MHz Fujitsu or Micron SGRAM as standard. I do believe there is one other type of SGRAM fitted to the card.

Permedia 2 works better with faster SGRAM. The difference between 100MHz and 125MHz SGRAM is big, and the difference between 125MHz and 143MHz SGRAM is smaller when using Permedia 2.

How well the Permedia 2 overclocks depends on the SGRAM fitted and other factors which I will not discuss.

It will be interesting to see what the timing parameters are and whether it is taking advantage of Low Latency SGRAM.
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: MiAmigo on September 25, 2012, 09:06:06 PM
To quote from QIII: IMPRESSIVE!

I KNEW it could be done!
Title: Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
Post by: delshay on September 26, 2012, 07:14:36 AM
I do believe the Diamond Fire GL 1000 Pro has 125MHz low latency SGRAM; please correct me if I'm wrong.