Agreed. However, I'm specifically talking about tests that were carried out at 640x480 in 16-bit RGB.
The thing is that we never even reach the fill rate of either chipset on 680x0. The setup code (yes, after you've transformed your triangle list) is still expensive - there are a lot of steps required between reading a vertex array and writing the chip-specific data. Then there is the physical bus speed, which ultimately puts a wall on how fast the chip-specific data can be fed to the GPU. On a system where your CPU power is very high, this last factor is decisive.
Imagine, for the sake of argument, each full vertex definition requires 32 bytes of data (coordinates, colour, texture coordinates, fog value etc.).
There are 3 unique vertices per triangle. The maximum write bandwidth of the BVision on a 25MHz 68040 is about 10MB/s (it was 18MB/s on the 060, which crucified the Mediator4000).
This gives us 10*1024*1024/32 = 327680 vertices/sec, which is a limit of about 109227 triangles/sec. The Permedia2's delta unit can actually handle 800,000 triangles/sec as long as you can feed it quickly enough. Yes, the fill rate is low for the Permedia2, only 43M pixels/sec for fully shaded/textured/depth buffered (double that for non depth buffered), but in realistic applications you don't get close to this limit.
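For anyone who wants to plug in their own numbers, here is a rough sketch of the arithmetic above. The function name and parameters are just made up for illustration; the 32 bytes/vertex and 3 unique vertices/triangle are the same for-the-sake-of-argument assumptions as in the text, and 1M is taken as 1024*1024 bytes.

```python
MIB = 1024 * 1024  # 1M taken as 1024*1024 bytes, as in the calculation above


def bus_limited_rates(write_mib_per_sec, bytes_per_vertex=32, vertices_per_triangle=3):
    """Return (vertices/sec, triangles/sec) the bus could sustain.

    Hypothetical helper: it ignores setup-code cost and assumes every
    triangle needs its own unique vertices (no sharing between triangles).
    """
    vertices = write_mib_per_sec * MIB / bytes_per_vertex
    triangles = vertices / vertices_per_triangle
    return vertices, triangles


# BVision on a 25MHz 68040: about 10M/s write bandwidth
vertices, triangles = bus_limited_rates(10)
print(f"{vertices:.0f} vertices/sec, {triangles:.0f} triangles/sec")

# Compare with the 800,000 triangles/sec the delta unit can take
DELTA_UNIT_TRIANGLES_PER_SEC = 800_000
print(f"delta unit utilisation: {triangles / DELTA_UNIT_TRIANGLES_PER_SEC:.1%}")
```

With the 10M/s figure this works out to roughly 14% of what the delta unit can accept, and that is before counting any of the setup cost.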
You can see, however, the P2 is already underutilised. You will only run into the fill rate limitation for large screen sizes, which due to the 8M RAM limit are impractical for 3D anyway.
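To put a rough number on where the fill rate would start to matter, you can divide the fill rate by the bus-limited triangle rate. This is only a back-of-envelope sketch: the helper name is made up, "43M" is read as 43 million pixels/sec, and it assumes the card is being fed triangles at the full bus-limited rate with no other overheads.

```python
def break_even_pixels_per_triangle(fill_rate_pixels_per_sec, triangles_per_sec):
    """Average triangle size (in pixels) at which fill rate becomes the limit.

    Hypothetical helper: below this size the bus runs out first,
    above it the chip's fill rate runs out first.
    """
    return fill_rate_pixels_per_sec / triangles_per_sec


P2_FILL_RATE = 43_000_000    # shaded/textured/depth buffered, read as 43 million
BUS_TRIANGLE_RATE = 109_227  # bus-limited triangles/sec from the 10M/s figure

area = break_even_pixels_per_triangle(P2_FILL_RATE, BUS_TRIANGLE_RATE)
print(f"break-even: about {area:.0f} pixels per triangle on average")
```

So under these assumptions every triangle would have to average several hundred pixels before the P2's fill rate, rather than the bus, became the bottleneck.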
The Voodoo3000, of course, can handle way more than the Permedia2. However, it too is limited by the physical bus speed (especially on 680x0/Mediator). Typical 68040 VRAM write speed is 7MB/s on my loaned 28MHz 68040/Mediator1200/Voodoo3000 system. That's already 30% lower than what a slightly slower CPU manages on the BPPC.
Both cards are underutilised, but the Voodoo is underutilised more than the Permedia2, especially considering that the Voodoo hardware can do multitexturing in a single pass, which is not yet supported at driver level (to the best of my knowledge).