Author Topic: Off topic: Warp3D stuff (branched from AmiQuake2 AGA thread) (Read 1426 times)

Crumb · « **on:** February 06, 2013, 12:45:15 AM »

Quote from: Karlos;725478

One idea I have is that if Warp3D were to include a primitive T&L pipeline, GL could be improved on this old hardware. A lot of time in the driver is wasted waiting for the GPU FIFO when instead it could be performing transformation and clipping.

Is there any "legal" way to put new commands into the ring buffer of the gpu (aka GPU FIFO)? Or just examining the GPU registers to see if the read pointer and write pointer are equal and assume it's ready to add the new commands there? I guess p96/cgx waits until GPU FIFO is ready, otherwise crashes may happen. I guess that the difference between CGX and P96 in this respect is that CGX driver will probably offer a "legal" way to add new pointers to command buffers and with P96 you can't "interleave" commands from both systems.

So with p96 and a Radeon:
1.- seek base address of the gfx card
2.- reserve and lock memory for textures and command buffers
3.- check the addresses of write ptr and read ptr, if both are equal GPU FIFO is empty.
4.- write command in the FIFO to execute the chunk of commands pointed in the command buffer
5.- pray :-)
I ask because it would be nice to have a 68k Warp3D (or Wazp3D probably) driver for Radeon 9x00 :-)

And to continue the hijacking... any plans for a BlizzardPPC OS4 scsi driver? I guess you could add some "legal" way to add commands to move lots of data using the scsi chip in parallel to the cpu :-) And since G-Rex is supported in NetBSD, are you thinking about OS4 support too?

Karlos · « **Reply #1 on:** February 06, 2013, 10:09:49 PM »

Quote from: Crumb;725494

Is there any "legal" way to put new commands into the ring buffer of the gpu (aka GPU FIFO)? Or just examining the GPU registers to see if the read pointer and write pointer are equal and assume it's ready to add the new commands there?

I'm not sure what you are asking exactly. The R100/200 3D drivers communicate with the radeon's ring buffer through the same system resource that the 2D driver does. Just to be clear, my musings above are in relation to the Permedia2 specifically.

Purely hypothetically, if I were to retrofit a 3D T&L pipeline to Warp3D I'd probably do so by exposing an interface that's not a million miles from the model already defined for the R100/200. Some methods to set transformation matrices to be applied to geometry, texture coordinates, lighting and material properties and clipping planes. In the corresponding drivers for R100/200 these would be mapped onto its hardware implementation of these things for maximum efficiency.

However, as it's still a fairly low-level model, it would lend itself to drivers like the Permedia also. In the Permedia, data registers and command operations are loaded through a FIFO that can be written to directly or through DMA from a buffer. The latter is what I'd like to use but it's proven elusive for me to get working in the BVision (I'm sure it ought to be possible, even if it's from a location in VRAM rather than host mapped memory which is what the original intention was). In the current driver, the FIFO is loaded with data and then the driver has to wait for it to drain. I don't query the remaining FIFO space every time, instead I get a count and only read it again after writing as many entries as the last count value. That removes some overhead, but in the end, if you are rendering blended, perspective textured, fogged z-buffered polygons you will inevitably reach a position where the CPU is able to load the FIFO faster than the Permedia is emptying it and you end up polling the count register until there's enough room for your next operation.
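The "get a count, then only read it again after writing that many entries" idea can be sketched roughly as below. This is only an illustration, not the actual driver code: `SimFifo`, `sim_read_free`, `sim_write`, `sim_drain`, `FifoWriter`, `fifo_put` and the depth `FIFO_CAPACITY` are all hypothetical names and values, and the chip is simulated in plain memory rather than accessed through real Permedia2 registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_CAPACITY 16u  /* hypothetical depth, not the real chip's value */

/* Simulated chip side. A real driver would use memory-mapped registers. */
typedef struct {
    uint32_t free_entries;  /* what the "free space" register would report */
    uint32_t reads;         /* how many times that register was polled */
    uint32_t written;       /* entries accepted so far */
} SimFifo;

static uint32_t sim_read_free(SimFifo *f) {
    f->reads++;
    return f->free_entries;
}

static void sim_write(SimFifo *f, uint32_t value) {
    (void)value;                  /* the simulation ignores the payload */
    assert(f->free_entries > 0);  /* writing into a full FIFO is a bug */
    f->free_entries--;
    f->written++;
}

/* Simulation only: pretend the chip consumed n entries. */
static void sim_drain(SimFifo *f, uint32_t n) {
    f->free_entries += n;
    if (f->free_entries > FIFO_CAPACITY) f->free_entries = FIFO_CAPACITY;
}

/* Driver side: remember how many entries may still be written blind. */
typedef struct {
    SimFifo *fifo;
    uint32_t budget;
} FifoWriter;

static void fifo_put(FifoWriter *w, const uint32_t *data, size_t n) {
    for (size_t i = 0; i < n; i++) {
        while (w->budget == 0) {
            /* Poll the register only once the cached count is used up. */
            w->budget = sim_read_free(w->fifo);
            /* Simulation only: let the "chip" make progress when full. */
            if (w->budget == 0) sim_drain(w->fifo, 8);
        }
        sim_write(w->fifo, data[i]);
        w->budget--;
    }
}
```

With an empty 16-entry FIFO, writing 16 entries costs a single register read in this sketch. The polling loop at the top of `fifo_put` is exactly the dead time described above once the chip falls behind.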

We could use that time more productively if we were reading untransformed vertex data from memory, performing the transformation calculations in software and then writing the result to the FIFO because there'd be an improvement in the parallelism. It would take slightly longer per vertex to take a user-coordinate-space triangle fan and write it to the chip, but at the same time, you'd be doing those calculations at a point where you previously were just polling a register having already spent a similar amount of CPU time beforehand to do the transformation elsewhere.
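As a rough sketch of what "transform in software, then write the result" might look like per vertex: the code below assumes a row-major 4x4 matrix and takes the FIFO write as a callback, so the transform for the next vertex naturally happens between writes. All names here (`Vec3`, `Vec4`, `transform_vertex`, `transform_and_emit`, `EmitFn`, `Collector`, `collect_vertex`) are made up for illustration and are not part of Warp3D. Lighting and clipping are left out.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;
typedef struct { float x, y, z, w; } Vec4;

/* Multiply a row-major 4x4 matrix by the column vector (x, y, z, 1). */
static Vec4 transform_vertex(const float m[16], Vec3 v) {
    Vec4 r;
    r.x = m[0]  * v.x + m[1]  * v.y + m[2]  * v.z + m[3];
    r.y = m[4]  * v.x + m[5]  * v.y + m[6]  * v.z + m[7];
    r.z = m[8]  * v.x + m[9]  * v.y + m[10] * v.z + m[11];
    r.w = m[12] * v.x + m[13] * v.y + m[14] * v.z + m[15];
    return r;
}

/* Hypothetical hook: in a driver this would write the vertex to the FIFO. */
typedef void (*EmitFn)(void *ctx, Vec4 v);

/* Transform each vertex and hand it over immediately, so the CPU work for
   vertex i+1 overlaps with the chip consuming vertex i. */
static void transform_and_emit(const float m[16], const Vec3 *verts,
                               size_t count, EmitFn emit, void *ctx) {
    assert(m != NULL && emit != NULL);
    for (size_t i = 0; i < count; i++) {
        emit(ctx, transform_vertex(m, verts[i]));
    }
}

/* Stand-in for the FIFO: just records what was emitted. */
typedef struct {
    Vec4   out[8];
    size_t count;
} Collector;

static void collect_vertex(void *ctx, Vec4 v) {
    Collector *c = (Collector *)ctx;
    assert(c->count < 8);
    c->out[c->count++] = v;
}
```

Passing `collect_vertex` and a `Collector` as the callback and context lets you check the transformed output without any hardware. A real driver would replace the callback with its FIFO write.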

Quote

I guess p96/cgx waits until GPU FIFO is ready, otherwise crashes may happen. I guess that the difference between CGX and P96 in this respect is that CGX driver will probably offer a "legal" way to add new pointers to command buffers and with P96 you can't "interleave" commands from both systems.

As I said, both the 3D and 2D drivers use the same resource on R200/100. However, Warp3D takes ownership of the hardware system so at least when a hardware lock is in place, nobody else can be submitting packets to it. Outside of that situation, anybody can write to the resource. That's what the command processor / ring buffer was intended for.

Quote

And to continue the hijacking... any plans for a BlizzardPPC OS4 scsi driver?

Well I started, but I haven't really made progress due to a lack of available free time. I had to prioritise and the Warp3D I worked on affects more users overall. It's definitely something I'd like to complete if I get a chance.

Crumb · « **Reply #2 on:** February 07, 2013, 09:38:39 AM »

Quote from: Karlos;725609

I'm not sure what you are asking exactly.

Oh, sorry, I mean: in a hypothetical case where one wanted to create a Warp3D/Wazp3D driver for m68k, would there be an "official" way to add stuff to the ring buffer?

Quote

The R100/200 3D drivers communicate with the radeon's ring buffer through the same system resource that the 2D driver does. Just to be clear, my musings above are in relation to the Permedia2 specifically.

Since m68k Warp3D also runs on CGX drivers, I guess that the system resource is "new" or just p96 related.

Quote

Purely hypothetically, if I were to retrofit a 3D T&L pipeline to Warp3D I'd probably do so by exposing an interface that's not a million miles from the model already defined for the R100/200. Some methods to set transformation matrices to be applied to geometry, texture coordinates, lighting and material properties and clipping planes. In the corresponding drivers for R100/200 these would be mapped onto its hardware implementation of these things for maximum efficiency.

I guess Wazp3D does something like that with its Warp3D wrapper for Windows OpenGL.

Quote

However, as it's still a fairly low-level model, it would lend itself to drivers like the Permedia also. In the Permedia, data registers and command operations are loaded through a FIFO that can be written to directly or through DMA from a buffer.

Perhaps you could start by reserving the buffer memory in BlizzardVision memory so it accesses gfx ram. Once you get that working, use a buffer in fastram; addresses may be different though.

Quote

As I said, both the 3D and 2D drivers use the same resource on R200/100. However, Warp3D takes ownership of the hardware system so at least when a hardware lock is in place, nobody else can be submitting packets to it. Outside of that situation, anybody can write to the resource. That's what the command processor / ring buffer was intended for.

Is that resource a flag in a register of R100/R200 or a real AmigaOS resource?

Karlos · « **Reply #3 on:** February 07, 2013, 08:51:17 PM »

Quote from: Crumb;725649

Oh, sorry, I mean: in a hypothetical case where one wanted to create a Warp3D/Wazp3D driver for m68k, would there be an "official" way to add stuff to the ring buffer?

Not really, at least as far as I know. The radeon ringbuffer is a hardware feature that's used to submit command packets to the graphics hardware. The architecture defines a relatively sensible packet structure (no direct register poking) for commands and even their data (up to a certain size). Software writes packets through one pointer, updating it as it goes and the chip eats them, updating its own pointer. It's one of several mechanisms to get data to the chip. In OS4.1, communication is mediated through the RadeonCP.resource and the 2D, compositor and Warp3D drivers all use it.
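For anyone unfamiliar with the two-pointer scheme, here is a generic sketch of it in C. It is not the Radeon packet format and not the RadeonCP.resource interface: `Ring`, `RING_SIZE`, `ring_empty`, `ring_free`, `ring_submit` and `ring_consume` are all hypothetical, and the "chip" side is simulated by an ordinary function.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8u               /* hypothetical size, must be a power of two */
#define RING_MASK (RING_SIZE - 1u)

typedef struct {
    uint32_t slots[RING_SIZE];
    uint32_t wptr;  /* advanced by software as it writes packets */
    uint32_t rptr;  /* advanced by the chip as it consumes them */
} Ring;

/* Equal pointers mean the consumer has caught up with the producer. */
static int ring_empty(const Ring *r) {
    return r->wptr == r->rptr;
}

/* One slot is kept unused so that "full" and "empty" stay distinguishable. */
static uint32_t ring_free(const Ring *r) {
    return RING_MASK - ((r->wptr - r->rptr) & RING_MASK);
}

/* Software side: copy a whole packet in, or refuse if it does not fit. */
static int ring_submit(Ring *r, const uint32_t *packet, uint32_t n) {
    assert(packet != 0);
    if (n > ring_free(r)) return 0;
    for (uint32_t i = 0; i < n; i++) {
        r->slots[r->wptr] = packet[i];
        r->wptr = (r->wptr + 1u) & RING_MASK;
    }
    return 1;
}

/* Simulated chip side: eat up to max entries, return how many were taken. */
static uint32_t ring_consume(Ring *r, uint32_t *out, uint32_t max) {
    uint32_t taken = 0;
    while (taken < max && !ring_empty(r)) {
        out[taken++] = r->slots[r->rptr];
        r->rptr = (r->rptr + 1u) & RING_MASK;
    }
    return taken;
}
```

Note that the sketch has no locking at all. With more than one writer, something has to arbitrate who may advance the write pointer, and that is the role an OS-level resource plays.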

Quote

Since m68k Warp3D also runs on CGX drivers, I guess that the system resource is "new" or just p96 related.

As far as I know, the resource is part of OS4.1.

Quote

I guess Wazp3D does something like that with its Warp3D wrapper for Windows OpenGL.

I think it just implements a basic state tracker for the W3D_Context structure and trampolines into OpenGL with all of OpenGL's T&L features disabled or set in a manner that won't modify the incoming vertex data. Remember, Warp3D is just a rasterizer and expects that all the transformation, lighting and clipping was done elsewhere. That suited the cards it originally was written for but is also its key drawback on anything more advanced than the Voodoo cards - since the API provides no T&L functionality, there's nothing for any existing hardware T&L to accelerate. In short, the R100 and above are very underutilized.

It is my conjecture that a primitive T&L pipeline could be added that is easy to map onto the R100/R200's hardware while simultaneously being relatively easy to interleave (such that it optimally overlaps with the hardware rasterizing) into the other drivers as software operations, but even if this were done, without a rewrite of MiniGL to take advantage of it, there'd be little benefit.

Quote

Perhaps you could start by reserving the buffer memory in BlizzardVision memory so it accesses gfx ram. Once you get that working, use a buffer in fastram; addresses may be different though.

Believe me, I've tried all sorts of things to get the Permedia2 on the BVision to load its own input FIFO via DMA from a buffer. Conceptually, it's a piece of cake, but in reality it's not so straightforward.

Quote

is that resource a flag in a register of R100/R200 or a real AmigaOS resource?

It's an OS resource.

delshay · « **Reply #4 on:** February 08, 2013, 09:39:41 PM »

A DMA driver for the BVision would be nice, as I have got the Blizzard card PCI slot to work at up to 41.5 MHz*.

* It can operate higher but is limited by the PPC CPU/memory bus. Expect a new world record in DRAM bandwidth and a PCI benchmark never seen before on a Blizzard card: PCI at 41.5 MHz+.
