I can't say I've ever sat there twiddling my thumbs waiting for the memory/system to react. It usually works efficiently and silently, unnoticed in the background, though I don't ever do a CPU RAM check to see what is being used where and when, and how fast it is being used, etc., but it seems OK for my needs. I also have never noticed anyone on the Mediator forum ever state they have been waiting for the RAM/CPU to catch up with a task, seeing as you are making out that it is so slow, as it just isn't a noticeable problem IMHO.
Though you seem to comprehend what I've said, you seem to have misunderstood the point I was making. The following applies to the Mediator 1200. I can't speak for the 4000 model as I don't own one.
Firstly, I'm not making anything out to be slow; it is slow. I've measured the performance of reading and writing video memory. In fact, when these expansions first appeared, I got lots of people to run benchmarks for me so that I could compare the memory access speed (from the CPU) of the Mediator versus things like the BVision. The video RAM on my Mediator 1200 / Voodoo 3000 system is significantly slower to read and write from the CPU than that on my BVision, for example, even though the 040 in my Mediator system is faster than the one in my BVision system.
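For anyone wondering what such a measurement boils down to, here is a minimal sketch in Python. It is not the benchmark those results came from, and `copy_throughput` is a hypothetical helper name. It simply times a bulk copy and divides bytes by seconds. Ordinary buffers stand in for the video memory here; a real test would point the copy at the card's mapped memory instead.

```python
import time


def copy_throughput(src, dst, repeats=10):
    """Return bytes per second for copying src into dst (best of `repeats` runs).

    Hypothetical helper: src and dst are plain in-memory buffers standing in
    for the memory region you actually want to measure.
    """
    size = len(src)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # bulk copy of the whole buffer
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading on very coarse timers.
    return size / max(best, 1e-9)


if __name__ == "__main__":
    source = bytes(1 << 20)           # 1 MiB of zeros
    target = bytearray(1 << 20)       # destination buffer of the same size
    rate = copy_throughput(source, target)
    print(f"{rate / 1e6:.1f} MB/s")
```

Taking the best of several runs rather than the average keeps one-off stalls from dragging the figure down.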
So, to reiterate, your point was that the DMA methods supported in 3.9 should also be supported in 4.x, or it's all a bit pointless. The truth of the matter is that the DMA methods supported in 3.9 are of limited usefulness even in 3.9. The limiting factor in a Mediator configuration is not the speed at which different PCI cards can talk to each other, and it never was. The limiting factor is how fast any expansion on the Mediator can be read from or written to by the CPU.
In any normal PCI machine, your attached devices don't talk to each other that much (the only obvious exception here would be something like a TV tuner card, where having the ability to DMA write the decoded video into your graphics card's overlay memory would be perfectly sensible), but instead will perform DMA transfers to and from main memory. However, that's just not possible on the Mediator since it's not part of your accelerator card and has no access to the memory on it. So what Elbox did was to use a chunk of RAM on the video card as a DMA buffer for data that would be going between PCI cards and the rest of the system. In theory it's not a bad idea, but as I said, the benefit of buffering depends on the specific operation you're doing. Holding on to incoming audio that will eventually be streamed to disk is a case that makes sense. Holding on to incoming network packets is not.
In the end, whether you are reading buffered data from the video card's memory or reading data directly from any of the PCI devices on the bus, there's little speed difference. So no matter how fast your cards can DMA into your video card's memory, it doesn't help that much, TV tuner card example excepted.
Elbox's claim about zero-CPU-intervention DMA transfers is a bit of a red herring when looking at the full picture. Sure, your network or sound card (both of which should have some local buffer memory of their own) can DMA data to your video card without the CPU getting involved, but once that stage of the transfer is done, the CPU still has to read that data from the video RAM in order to do anything with it. I'm not convinced that's much faster than reading the data directly from the source device on the PCI bus. For some devices, it might even be slower.
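To put rough numbers on that argument, here is a back-of-envelope sketch in Python. The function names and every rate in it are made up purely for illustration and are not measured figures. The one assumption that matters is that the CPU's read rate across the Mediator is far lower than the card-to-card DMA rate, and that it is about the same whether the CPU reads video RAM or the source device.

```python
def direct_latency(size_bytes, cpu_read_rate):
    """Seconds for the CPU to read the data straight from the source device."""
    return size_bytes / cpu_read_rate


def buffered_latency(size_bytes, dma_rate, cpu_read_rate):
    """Seconds for device -> video RAM (DMA), then video RAM -> CPU (read)."""
    dma_stage = size_bytes / dma_rate        # no CPU involvement in this stage
    cpu_stage = size_bytes / cpu_read_rate   # the CPU still has to fetch it all
    return dma_stage + cpu_stage


if __name__ == "__main__":
    size = 1_000_000        # 1 MB of data (hypothetical)
    cpu_rate = 4e6          # hypothetical CPU read rate across the bridge, bytes/s
    for dma_rate in (50e6, 100e6):   # hypothetical card-to-card DMA rates
        total = buffered_latency(size, dma_rate, cpu_rate)
        print(f"DMA at {dma_rate / 1e6:.0f} MB/s: {total:.2f} s buffered")
    print(f"Direct CPU read: {direct_latency(size, cpu_rate):.2f} s")
```

With these invented figures, doubling the DMA rate moves the buffered total from about 0.27 s to about 0.26 s, while the direct read takes 0.25 s. The CPU read stage dominates either way, which is the whole point. The model leaves out the one case where buffering does pay off: the data can sit in video RAM until the CPU is ready for it, as with the audio example.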