
Author Topic: Off topic: Warp3D stuff (branched from AmiQuake2 AGA thread)  (Read 1429 times)


Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16879
  • Country: gb
  • Thanked: 5 times
Re: AmiQuake 2 - new 68k Quake 2 Port
« on: February 06, 2013, 10:09:49 PM »
Quote from: Crumb;725494
Is there any "legal" way to put new commands into the ring buffer of the gpu (aka GPU FIFO)? Or just examining the GPU registers to see if the read pointer and write pointer are equal and assume it's ready to add the new commands there?


I'm not sure what you are asking exactly. The R100/200 3D drivers communicate with the radeon's ring buffer through the same system resource that the 2D driver does. Just to be clear, my musings above are in relation to the Permedia2 specifically.

Purely hypothetically, if I were to retrofit a 3D T&L pipeline to Warp3D I'd probably do so by exposing an interface that's not a million miles from the model already defined for the R100/200. It would have some methods to set transformation matrices to be applied to geometry and texture coordinates, plus lighting and material properties and clipping planes. In the corresponding drivers for R100/200 these would be mapped onto its hardware implementation of these things for maximum efficiency.
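To make the idea concrete, here is a minimal sketch of what such an interface could look like as a per-driver function table. Every name in it (HypTnLDriver, HypMatrix, hyp_init_software_driver and so on) is invented for illustration; none of it exists in Warp3D. A driver for a T&L-capable chip would point the same entries at code that programs the hardware, while a driver for a rasterizer-only chip would just store the state for software use, as below.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 4x4 matrix type; not part of any real Warp3D header. */
typedef struct { float m[16]; } HypMatrix;

/* Hypothetical per-driver T&L entry points plus the state they set. */
typedef struct HypTnLDriver {
    void (*set_modelview)(struct HypTnLDriver *self, const HypMatrix *m);
    void (*set_projection)(struct HypTnLDriver *self, const HypMatrix *m);
    int hardware_tnl;        /* 1 if the chip transforms vertices itself */
    HypMatrix modelview;     /* kept by software drivers for later use */
    HypMatrix projection;
} HypTnLDriver;

/* Software fallback: just remember the matrix for per-vertex maths later. */
static void sw_set_modelview(HypTnLDriver *self, const HypMatrix *m)
{
    self->modelview = *m;
}

static void sw_set_projection(HypTnLDriver *self, const HypMatrix *m)
{
    self->projection = *m;
}

/* Set up a driver that does all transformation on the CPU. */
static void hyp_init_software_driver(HypTnLDriver *d)
{
    assert(d != NULL);
    memset(d, 0, sizeof *d);
    d->set_modelview = sw_set_modelview;
    d->set_projection = sw_set_projection;
    d->hardware_tnl = 0;
}
```

The caller only ever goes through the function pointers, so the same application code would work whether the matrices end up in chip registers or in driver memory.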

However, as it's still a fairly low-level model, it would lend itself to drivers like the Permedia also. In the Permedia, data registers and command operations are loaded through a FIFO that can be written to directly or through DMA from a buffer. The latter is what I'd like to use, but it's proven elusive for me to get working on the BVision (I'm sure it ought to be possible, even if it's from a location in VRAM rather than host-mapped memory, which is what the original intention was). In the current driver, the FIFO is loaded with data and then the driver has to wait for it to drain. I don't query the remaining FIFO space every time; instead I get a count and only read it again after writing as many entries as the last count value. That removes some overhead, but in the end, if you are rendering blended, perspective-textured, fogged, z-buffered polygons you will inevitably reach a position where the CPU is able to load the FIFO faster than the Permedia is emptying it, and you end up polling the count register until there's enough room for your next operation.
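The cached-count trick described above can be sketched roughly as follows. This is not the actual driver code: the chip is replaced by a small simulation (SimFifo, sim_read_space and sim_write are stand-ins made up for this example), where a real driver would read a memory-mapped free-space register and write to the FIFO port. The drain rate of four entries per register read is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated chip state; a real driver would use memory-mapped registers. */
typedef struct {
    uint32_t capacity;     /* total FIFO entries */
    uint32_t used;         /* entries the chip has not consumed yet */
    uint32_t space_reads;  /* how many times the space register was read */
    uint32_t writes;       /* total entries written */
} SimFifo;

/* Simulated register read. The "chip" drains up to 4 entries per read. */
static uint32_t sim_read_space(SimFifo *f)
{
    uint32_t drained = f->used < 4u ? f->used : 4u;
    f->space_reads++;
    f->used -= drained;
    return f->capacity - f->used;
}

/* Simulated FIFO write; overflowing would mean the caching logic is wrong. */
static void sim_write(SimFifo *f, uint32_t value)
{
    (void)value;
    assert(f->used < f->capacity);
    f->used++;
    f->writes++;
}

typedef struct {
    SimFifo *fifo;
    uint32_t cached_space;  /* entries known to be free since the last read */
} FifoWriter;

/* Write n entries, re-reading the space register only when the cached
   count is exhausted. The chip only ever frees entries, so the cached
   value is always a safe lower bound on the real free space. */
static void fifo_put(FifoWriter *w, const uint32_t *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        while (w->cached_space == 0) {
            w->cached_space = sim_read_space(w->fifo);  /* poll until room */
        }
        sim_write(w->fifo, data[i]);
        w->cached_space--;
    }
}
```

The inner while loop is the polling stall: once the writer outruns the chip, every pass through it is CPU time spent waiting.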

We could use that time more productively if we were reading untransformed vertex data from memory, performing the transformation calculations in software and then writing the result to the FIFO because there'd be an improvement in the parallelism. It would take slightly longer per vertex to take a user-coordinate-space triangle fan and write it to the chip, but at the same time, you'd be doing those calculations at a point where you previously were just polling a register having already spent a similar amount of CPU time beforehand to do the transformation elsewhere.
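As a rough sketch of that interleaving, the loop below transforms each vertex immediately before handing it on, instead of transforming the whole fan up front. The names (Vec4, Mat4, VertexLog, draw_fan_interleaved) are made up for this example, and the VertexLog sink merely stands in for the FIFO write path, which in a real driver is where the stall would occur.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z, w; } Vec4;
typedef struct { float m[4][4]; } Mat4;  /* row-major; vertices are column vectors */

/* Multiply a 4x4 matrix by a column vector. */
static Vec4 mat4_transform(const Mat4 *a, Vec4 v)
{
    Vec4 r;
    r.x = a->m[0][0] * v.x + a->m[0][1] * v.y + a->m[0][2] * v.z + a->m[0][3] * v.w;
    r.y = a->m[1][0] * v.x + a->m[1][1] * v.y + a->m[1][2] * v.z + a->m[1][3] * v.w;
    r.z = a->m[2][0] * v.x + a->m[2][1] * v.y + a->m[2][2] * v.z + a->m[2][3] * v.w;
    r.w = a->m[3][0] * v.x + a->m[3][1] * v.y + a->m[3][2] * v.z + a->m[3][3] * v.w;
    return r;
}

/* Stand-in for the chip's input FIFO: it just records what it is given. */
#define LOG_MAX 16
typedef struct { Vec4 out[LOG_MAX]; size_t count; } VertexLog;

static void log_emit(VertexLog *log, Vec4 v)
{
    assert(log->count < LOG_MAX);
    log->out[log->count++] = v;
}

/* Transform and emit one vertex at a time, so the maths for vertex i+1
   can run while the chip is still working through vertex i. */
static void draw_fan_interleaved(const Mat4 *mvp, const Vec4 *verts,
                                 size_t n, VertexLog *sink)
{
    for (size_t i = 0; i < n; i++) {
        Vec4 t = mat4_transform(mvp, verts[i]);
        log_emit(sink, t);
    }
}
```

The total CPU work is the same as transforming everything first; the difference is only in when it happens relative to the chip draining its FIFO.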

Quote
I guess p96/cgx waits until GPU FIFO is ready, otherwise crashes may happen. I guess that the difference between CGX and P96 in this respect is that CGX driver will probably offer a "legal" way to add new pointers to command buffers and with P96 you can't "interleave" commands from both systems.


As I said, both the 3D and 2D drivers use the same resource on R200/100. However, Warp3D takes ownership of the hardware system, so at least when a hardware lock is in place, nobody else can be submitting packets to it. Outside of that situation, anybody can write to the resource. That's what the command processor / ring buffer was intended for.

Quote
And to continue the hijacking... any plans for a BlizzardPPC OS4 scsi driver?


Well I started, but I haven't really made progress due to a lack of available free time. I had to prioritise and the Warp3D I worked on affects more users overall. It's definitely something I'd like to complete if I get a chance.
 

Offline Karlos

Re: AmiQuake 2 - new 68k Quake 2 Port
« Reply #1 on: February 07, 2013, 08:51:17 PM »
Quote from: Crumb;725649
Oh, sorry, I mean: in a hypothetical case where one wanted to create a Warp3D/Wazp3D driver for m68k, would there be an "official" way to add stuff to the ring buffer?


Not really, at least as far as I know. The Radeon ring buffer is a hardware feature that's used to submit command packets to the graphics hardware. The architecture defines a relatively sensible packet structure (no direct register poking) for commands and even their data (up to a certain size). Software writes packets through one pointer, updating it as it goes, and the chip eats them, updating its own pointer. It's one of several mechanisms to get data to the chip. In OS4.1, communication is mediated through the RadeonCP.resource and the 2D, compositor and Warp3D drivers all use it.
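The two-pointer scheme can be illustrated with a generic ring buffer like the one below. This is only a sketch of the general technique, not the Radeon's actual register layout or packet format, and it has nothing to do with how RadeonCP.resource is implemented; the names (Ring, ring_submit, chip_consume) and the tiny ring size are assumptions for the example. One slot is always left empty so that equal pointers unambiguously mean "empty".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u                 /* must be a power of two for the masking */
#define RING_MASK (RING_SIZE - 1u)

typedef struct {
    uint32_t slots[RING_SIZE];
    uint32_t wptr;   /* next slot software will write */
    uint32_t rptr;   /* next slot the chip will read */
} Ring;

/* Entries written by software but not yet consumed by the chip. */
static uint32_t ring_pending(const Ring *r)
{
    return (r->wptr - r->rptr) & RING_MASK;
}

/* Entries software may still write without overtaking the read pointer. */
static uint32_t ring_free(const Ring *r)
{
    return RING_MASK - ring_pending(r);
}

/* Software side: copy a packet in and advance the write pointer.
   Returns false, writing nothing, if the packet does not fit. */
static bool ring_submit(Ring *r, const uint32_t *packet, uint32_t n)
{
    if (n > ring_free(r)) {
        return false;
    }
    for (uint32_t i = 0; i < n; i++) {
        r->slots[r->wptr] = packet[i];
        r->wptr = (r->wptr + 1u) & RING_MASK;
    }
    return true;
}

/* Chip side (simulated): consume n entries and advance the read pointer. */
static void chip_consume(Ring *r, uint32_t n)
{
    assert(n <= ring_pending(r));
    r->rptr = (r->rptr + n) & RING_MASK;
}
```

When the two pointers are equal the ring is empty and the chip is idle; a submit that fails means the writer has to wait for the read pointer to move on.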

Quote
Since m68k warp3D also runs on CGX drivers I guess that the system resource is "new" or just p96 related.


As far as I know, the resource is part of OS4.1.

Quote
I guess Wazp3D does something like that with its Warp3D wrapper for windows OpenGL.


I think it just implements a basic state tracker for the W3D_Context structure and trampolines into OpenGL with all of OpenGL's T&L features disabled or set in a manner that won't modify the incoming vertex data. Remember, Warp3D is just a rasterizer and expects that all the transformation, lighting and clipping was done elsewhere. That suited the cards it originally was written for but is also its key drawback on anything more advanced than the Voodoo cards: since the API provides no T&L functionality, there's nothing for any existing hardware T&L to accelerate. In short, the R100 and above are very underutilized.

It is my conjecture that a primitive T&L pipeline could be added that is easy to map onto the R100/R200's hardware while simultaneously being relatively easy to interleave (such that it optimally overlaps with the hardware rasterizing) into the other drivers as software operations, but even if this were done, without a rewrite of MiniGL to take advantage of it, there'd be little benefit.

Quote
Perhaps you could start reserving the buffer memory in BlizzardVision memory so it accesses gfx ram, once you get that working, use a buffer in fastram, addresses may be different though


Believe me, I've tried all sorts of things to get the Permedia2 on the BVision to load its own input FIFO via DMA from a buffer. Conceptually, it's a piece of cake, but in reality it's not so straightforward.

Quote
is that resource a flag in a register of R100/R200 or a real AmigaOS resource?


It's an OS resource.