Topic: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)

Karlos (Topic starter)

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16867
  • Country: gb
  • Thanked: 4 times
Just for the lulz...

[youtube]UcuNR3yyIo4[/youtube]

You'd have to be certifiable to try and play it though.

Karlos
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #1 on: August 31, 2012, 12:53:34 PM »
Quote from: paul1981;706012
Was that 640x480?

Yes. I don't have any lower resolution modes defined at the moment.

Karlos
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #2 on: August 31, 2012, 08:21:13 PM »
Quote from: rvo_nl;706024
Hah, great video. I was hoping for better,


A CSPPC + CVisionPPC would be better ;)

Quote
but I guess this is already quite an achievement. I remember when I got my first PC, this was one of the first games I played on it, and I remember thinking that my Amiga, which was standing next to it and was equipped with PPC+BVision, would never be able to run that.

Wrong, wrong.


It runs, but it's not exactly what I'd call playable. I just thought I'd share it to show that improvements to the driver have been made. Recent iterations were suffering from a bad texture corruption bug which I've finally more or less solved.

Karlos
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #3 on: August 31, 2012, 10:43:18 PM »
Quote from: rvo_nl;706063
well done then :) Are there ways to improve performance by lowering settings or screen size?


Well, it might help, but I'm not sure that fillrate is the limit at 640x480. The main problems are:

1) Lack of texture memory - even at the lowest texture detail possible, there's not enough VRAM to hold them all, so paging them in and out is practically unavoidable.

2) The driver runs in PIO mode. Theoretically, DMA is possible, but I've never gotten it working. DMA based drawing would allow better overlap between the CPU and Permedia2. DMA could also theoretically help with texture transfer from main memory.

3) The driver only handles rendering as the final stage of a software 3D pipeline. Even as a PIO mode driver, it spends some time waiting to be able to send the chip commands. Instead, it could be doing some software 3D transformation calculations for the next polygon. But that would require massive updates to Warp3D (to include 3D transformation into the system) and whatever flavour of GL is implemented on it in order to use it.
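To illustrate points 2 and 3, here is a minimal sketch of a PIO-style submission loop that polls for FIFO space before every word. It runs against a simulated chip: `sim_chip`, the 32-word FIFO depth and `read_fifo_space()` are hypothetical stand-ins for memory-mapped registers, not the real Permedia2 register interface or the driver's actual code. Every word costs at least one status read, and any read that finds the FIFO full is CPU time spent doing nothing useful.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulated chip: a small input FIFO that the "GPU" drains.
 * A real driver would read a memory-mapped FIFO-space register instead. */
#define SIM_FIFO_DEPTH 32u

typedef struct {
    unsigned queued;      /* words currently sitting in the FIFO */
    unsigned drain_rate;  /* words the chip retires per status read (must be > 0) */
    unsigned polls;       /* number of status reads issued by the CPU */
    unsigned written;     /* total words accepted */
} sim_chip;

/* Each status read also models the chip making some progress. */
static unsigned read_fifo_space(sim_chip *chip)
{
    chip->polls++;
    chip->queued = chip->queued > chip->drain_rate
                 ? chip->queued - chip->drain_rate : 0;
    return SIM_FIFO_DEPTH - chip->queued;
}

static void write_fifo_word(sim_chip *chip, uint32_t word)
{
    (void)word;           /* a real driver would store this to MMIO */
    assert(chip->queued < SIM_FIFO_DEPTH);
    chip->queued++;
    chip->written++;
}

/* Naive PIO submission: poll for space before every single word. */
void pio_submit_naive(sim_chip *chip, const uint32_t *words, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        while (read_fifo_space(chip) == 0)
            ;             /* busy-wait: the CPU does no useful work here */
        write_fifo_word(chip, words[i]);
    }
}
```

With the simulated chip retiring one word per status read, submitting 100 words costs 100 polls, one per word.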
 
Quote
Can't wait to get my system running again, I've got to try this.


256MB is an absolute must. Just loading the game alone used close to 170MB...

Karlos
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #4 on: September 02, 2012, 10:21:27 AM »
Quote from: rvo_nl;706066
Wow, that means it would never work on a CVisionPPC? :| I'm glad I have the full 256. Let's just hope I can quickly find out what's wrong with my machine.


Actually, Darren Eveland was able to run it on his CSPPC previously. I suspect the game actually grabs as much RAM as it is able and perhaps 128MB is enough.

There's a video of an older (pre-release) version of the driver on his system running Quake 3 with the same (or at least similar) basic settings, and it was certainly a bit faster.

[youtube]3GNtOM3ZLQM[/youtube]

You'll see some nasty graphics corruption at the end. This was a long-standing bug that has only been kicked into touch recently.

Karlos
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #5 on: September 02, 2012, 02:35:03 PM »
Quote from: delshay;706220
This is reflected here in that I never needed to use a screenmode below 800x600, and a small number of PPC games showed no difference at 1024x768, but you need a very high PCI bus speed to play at those screenmodes.

Only with games where you don't have texture paging. And the larger the screenmode, the less room you have for textures and the more likely paging becomes.

Even with some recent optimisations in this area, Quake 3 on full texture detail will bring your Blizzard/BVision combination to its knees, I guarantee it.
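The screenmode/texture trade-off is just arithmetic. The sketch below assumes an 8MB card with a 16-bit front buffer, a 16-bit back buffer and a 16-bit Z buffer, all at the display resolution; those are illustrative assumptions rather than figures taken from the driver, and a real allocator also loses some space to alignment and other overheads.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes left for textures after the display buffers are allocated.
 * Assumes: double-buffered colour plus one Z buffer, all at the same
 * bytes-per-pixel. Alignment and other driver overheads are ignored. */
uint32_t texture_budget(uint32_t vram_bytes, uint32_t width, uint32_t height,
                        uint32_t bytes_per_pixel)
{
    const uint32_t buffers = 3;  /* front + back + Z */
    assert(bytes_per_pixel > 0);
    uint64_t needed = (uint64_t)width * height * bytes_per_pixel * buffers;
    return needed >= vram_bytes ? 0 : (uint32_t)(vram_bytes - needed);
}
```

Under those assumptions, 640x480 leaves 6,545,408 bytes (about 6.2MB) for textures, while 1024x768 leaves 3,670,016 bytes (3.5MB), so the same texture set is much more likely to need paging at the higher resolution.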

Karlos
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #6 on: September 02, 2012, 02:42:30 PM »
Incidentally, if you take a Permedia2, put it on an AGP bus with a proper DMA-enabled driver (where the chip fetches command buffers and textures from system memory itself, and interrupts signal the events we currently have to poll for), and attach it to a CPU fast enough that the time it spends on everything else is irrelevant, it can turn out good performance:

[youtube]5EA9xtM22tA[/youtube]

The above is on a FireGL card in a 1.4GHz P3 machine, where the chip is at least able to run at its full potential, unhampered by all the problems it has in a BVision machine.

Karlos
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #7 on: September 02, 2012, 02:47:55 PM »
I am hoping that with some experimentation and a limitless supply of irreplaceable old BVision cards*, I might be able to get DMA working at least.

At the bare minimum, I would like to be able to use DMA command / vertex transport, as it would significantly increase the parallelism between the CPU and the Permedia. The CPU would also spend far less time polling the chip to see if it is busy and whether it has space in its limited FIFO for more data. Having stripped away many layers of code from the innermost loops, that polling is now where the driver spends most of its time.

* I wish...
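A common way to reduce the polling overhead described above is to read the free FIFO space once and then write as many words as will fit before polling again. This is a minimal sketch of that idea against a simulated FIFO; `sim_fifo`, its 32-word depth and the drain model are hypothetical, and the sketch is not the driver's actual FIFO code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 32u   /* hypothetical FIFO size in 32-bit words */

typedef struct {
    unsigned queued;      /* words waiting in the simulated FIFO */
    unsigned drain_rate;  /* words the chip retires per status read (must be > 0) */
    unsigned polls;       /* status reads issued so far */
    unsigned written;     /* words accepted so far */
} sim_fifo;

/* One status read; also models the chip retiring some queued words. */
static unsigned poll_space(sim_fifo *f)
{
    f->polls++;
    f->queued = f->queued > f->drain_rate ? f->queued - f->drain_rate : 0;
    return FIFO_DEPTH - f->queued;
}

/* Batched PIO: one status read, then a burst of up to `space` writes. */
void pio_submit_batched(sim_fifo *f, const uint32_t *words, size_t count)
{
    size_t i = 0;
    while (i < count) {
        unsigned space = poll_space(f);
        while (space > 0 && i < count) {
            (void)words[i];   /* a real driver would store this to MMIO */
            assert(f->queued < FIFO_DEPTH);
            f->queued++;
            f->written++;
            space--;
            i++;
        }
    }
}
```

In this simulation, with the chip retiring 8 words per status read, 100 words go out in 10 polls, where polling before every word would take 100. The CPU still stalls whenever the FIFO is full, which is what DMA command transport is meant to avoid.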
« Last Edit: September 02, 2012, 02:56:39 PM by Karlos »

Karlos
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #8 on: September 02, 2012, 08:51:28 PM »
Quote from: matthey;706240
@Karlos
It seems that the Amiga PCI buses were set up to do the bare minimum to work rather than being compliant. I had similar problems trying to get swizzling (endian translation) of 3D registers to work on the Avenger/Napalm with the Mediator. Either I'm reading the docs incorrectly or the Mediator PCI bus mapping is "funny". That's pretty simple compared to getting DMA working ;).

I did try it in the past, but didn't manage to get the mapping correct. The Permedia would read some address range unrelated to the one I was trying to give it and would lock up very quickly in a manner that's nigh on impossible to debug. You just have no way of seeing what location it sees.

Perhaps the pci.library will help this time round.

Quote
At least these old cards are easier to program and better documented than the new ones. Have you seen the ATI docs? Ugh.


They aren't so bad, but they are quite big :) I've had to refer to them a number of times trying to bring the R100/R200 drivers up to date.

Quote
I did have a look at the Permedia 2 docs. It's pretty clean. Too bad ease of use is not a consideration for GPUs (or CPUs). It's possible for programmers to get more out of what is simpler and easier to use.


The Permedia documentation is OK, but misses out various things. For instance, the address calculations used by patched (and subpatched) buffers are just not anywhere in the programmer manuals.

Karlos
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #9 on: September 03, 2012, 08:52:52 PM »
Quote from: matthey;706434
Did you Byte Swap (BE->LE) the data used for the DMA buffer?


I tried all the obvious byte and word swap combinations for both the buffer contents and the address and nothing seemed to work. I wondered if it just isn't initialised properly and maybe the driver is relying on PIO.
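For reference, the usual swap combinations for a 32-bit value are a full byte swap, a swap of the two 16-bit halves, and a byte swap within each half. The sketch below shows all three; which one, if any, a particular bridge needs depends on how it maps PCI accesses, so nothing here reflects what the BVision or Mediator actually requires.

```c
#include <assert.h>
#include <stdint.h>

/* 0xAABBCCDD -> 0xDDCCBBAA: full big-endian <-> little-endian swap */
uint32_t swap_bytes32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* 0xAABBCCDD -> 0xCCDDAABB: swap the two 16-bit halves only */
uint32_t swap_halves32(uint32_t v)
{
    return (v >> 16) | (v << 16);
}

/* 0xAABBCCDD -> 0xBBAADDCC: swap the bytes within each 16-bit half */
uint32_t swap_bytes_in_halves32(uint32_t v)
{
    return ((v >> 8) & 0x00FF00FFu) | ((v << 8) & 0xFF00FF00u);
}
```

Note that the full byte swap is the same as applying the other two in sequence.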

Quote
Does writing to the raw GPU FIFO work instead of writing the 3D registers?


It does. That's what the current driver is doing. The DMA format is actually a bit different in that it consists of register address tag / value pairs, which at first glance doubles the size of the data you are writing. But it also has a number of sequential and indexed address modes that allow you to compact data in the buffer somewhat.
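To make the size argument concrete, the sketch below counts 32-bit words for writing N consecutive registers, first as plain tag/value pairs and then as a single header word followed by the N values, in the spirit of the sequential mode mentioned above. The header layout (tag in bits 0-13, mode in bits 14-15, count in bits 16-31) is a hypothetical encoding for illustration only; the real format is defined in the Permedia2 documentation and may well differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header layout: bits 0-13 = register tag,
 * bits 14-15 = mode, bits 16-31 = count of data words that follow. */
#define MODE_INCREMENT 1u

static uint32_t make_header(uint32_t tag, uint32_t mode, uint32_t count)
{
    return (tag & 0x3FFFu) | ((mode & 0x3u) << 14) | ((count & 0xFFFFu) << 16);
}

/* Plain format: one tag word per value. Returns words written. */
size_t emit_pairs(uint32_t *buf, uint32_t first_tag,
                  const uint32_t *values, size_t n)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        buf[w++] = first_tag + (uint32_t)i;  /* register address tag */
        buf[w++] = values[i];                /* register value */
    }
    return w;
}

/* Compact format: one header, then values for consecutive registers. */
size_t emit_increment(uint32_t *buf, uint32_t first_tag,
                      const uint32_t *values, size_t n)
{
    assert(n > 0 && n <= 0xFFFFu);
    buf[0] = make_header(first_tag, MODE_INCREMENT, (uint32_t)n);
    for (size_t i = 0; i < n; i++)
        buf[1 + i] = values[i];
    return 1 + n;
}
```

For six consecutive registers that is 12 words as pairs versus 7 words in the compact form.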

Quote
It's too bad they don't allow expanding the FIFO buffer like the Avenger/Napalm.


Well, the buffer is up to 64KB and is address latched, so you can fill one buffer, set it off and then in parallel start building the next and if you fill it before the Permedia completes the previous, you can reset the DMA address without affecting the ongoing operation. What I had in mind was something like a ringbuffer type arrangement (for simplicity). Most of the time, you'd only be writing a few hundred bytes of vertex/command data, so a single fullsize buffer with a pair of pointers could be effective.
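A minimal sketch of the ringbuffer-with-two-pointers arrangement: the CPU advances a write index as it appends command words, and the consumer (the Permedia, in a real DMA setup) advances a read index as it retires them. `cmd_ring`, `ring_push` and `ring_consume` are hypothetical names, the consumer is simulated by a plain function call, and a real driver would also have to split a transfer at the end of the buffer, since the chip reads linearly from the address and count it is given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_WORDS 16384u  /* 64KB of 32-bit words; must be a power of two */

typedef struct {
    uint32_t data[RING_WORDS];
    size_t   head;   /* next slot the CPU writes (producer) */
    size_t   tail;   /* next slot the consumer reads */
} cmd_ring;

static size_t ring_used(const cmd_ring *r)
{
    return (r->head - r->tail) & (RING_WORDS - 1);
}

/* One slot is kept empty so that head == tail always means "empty". */
static size_t ring_free(const cmd_ring *r)
{
    return RING_WORDS - 1 - ring_used(r);
}

/* Append words if they all fit; returns false instead of busy-waiting. */
bool ring_push(cmd_ring *r, const uint32_t *words, size_t n)
{
    if (n > ring_free(r))
        return false;
    for (size_t i = 0; i < n; i++) {
        r->data[r->head] = words[i];
        r->head = (r->head + 1) & (RING_WORDS - 1);
    }
    return true;
}

/* Simulated consumer: retire up to n words, returning how many it took. */
size_t ring_consume(cmd_ring *r, size_t n)
{
    size_t avail = ring_used(r);
    size_t take = n < avail ? n : avail;
    r->tail = (r->tail + take) & (RING_WORDS - 1);
    return take;
}
```

Because `ring_push` returns false instead of spinning, the caller is free to do other work, such as preparing the next batch of vertices, and retry later.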

Quote
Yea, there are way too many variations of the ATI boards with minor differences. It seems to me like the docs were written by designers who didn't want to write docs and assumed everyone already knows as much as they do. The GPU is quite a bit different than the older ones. I did find the GPU instruction set very interesting, but it's not easy to use. It's more like a SIMD/DSP processor with minimal conditional/branch support, which makes sense for what it is, I guess. There are more friendly and flexible ways to avoid branch hazards.


You'd like CUDA / Stream / OpenCL if that's your bang :)

Karlos
Re: Well, it's almost as fast as fullscreen AB3D2 on my machine ;)
« Reply #10 on: September 05, 2012, 09:37:01 PM »
Quote from: matthey;706699
@Karlos
Maybe the DMA is disabled but still appears to work. The PCI CFGCommand register (offset 0x04 from configuration base) has several configuration bits that might make a difference to DMA. It's readable as well as writable so you could read it, change 1 bit and write it. Turning on bits 0 to 2 might make a difference to DMA? This is supposed to be setup correctly by the gfx card boot ROM but that doesn't happen on an Amiga.


It's definitely something on my "to revisit" list.
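The read-modify-write suggested in the quote looks roughly like this. In the standard PCI configuration header, the 16-bit Command register sits at offset 0x04, and bits 0, 1 and 2 enable I/O space, memory space and bus mastering respectively; bus mastering is the bit a device needs before it can initiate DMA. The config-space accessors below operate on a simulated 256-byte header and are hypothetical stand-ins; on the Amiga side the real accessors would come from the PCI library in use (pci.library is mentioned earlier in the thread), and their names and signatures are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_COMMAND            0x04u
#define PCI_COMMAND_IO         0x0001u  /* bit 0: respond to I/O space */
#define PCI_COMMAND_MEMORY     0x0002u  /* bit 1: respond to memory space */
#define PCI_COMMAND_BUS_MASTER 0x0004u  /* bit 2: allow the device to master the bus (DMA) */

/* Hypothetical simulated configuration header (PCI config space is little-endian). */
typedef struct { uint8_t bytes[256]; } sim_config;

static uint16_t cfg_read16(const sim_config *c, unsigned off)
{
    assert(off + 1 < sizeof c->bytes);
    return (uint16_t)(c->bytes[off] | (c->bytes[off + 1] << 8));
}

static void cfg_write16(sim_config *c, unsigned off, uint16_t v)
{
    assert(off + 1 < sizeof c->bytes);
    c->bytes[off]     = (uint8_t)(v & 0xFFu);
    c->bytes[off + 1] = (uint8_t)(v >> 8);
}

/* Set the requested bits without disturbing any others; returns the value read back. */
uint16_t pci_enable_bits(sim_config *c, uint16_t bits)
{
    uint16_t cmd = cfg_read16(c, PCI_COMMAND);
    cmd |= bits;
    cfg_write16(c, PCI_COMMAND, cmd);
    return cfg_read16(c, PCI_COMMAND);
}
```

Reading the register back after the write is a cheap way to confirm that the bits actually stuck.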

Quote
Initialization is up to the 2D driver writer. I'm still not sure that DMA is going to be the Holy Grail of 3D performance. It sounds like you are already getting more out of the card than higher spec x86 :). Take a look at the Quake III Arena hardware and test results here:

http://www.ultimatehardware.net/006.htm

Surely they would have DMA enabled for their driver but with that lack of performance?


DMA should definitely improve performance over the current PIO version provided the driver is able to optimize the structure of the DMA buffer properly (i.e., making best use of the available addressing modes to maximise the amount of data per address tag). The reason being that right now, the driver spends a lot of time polling the FIFO for space, despite having optimized the FIFO code to minimise the number of reads necessary. That time could be spent processing the data the chip will use next in a DMA / multi buffer strategy.