The gain from using an 060 over an 040 should give hints as to where the real bottlenecks are. If the whole system were working ideally (true DMA, busmastering and whatnot), the CPU would be about the last thing to have any effect on transfer speeds here.
However, you ultimately end up copying data to/from the network card, over the bus to system memory using the CPU. To make it worse, the typical stack implementation tends to cause data to be re-copied as it passes between layers.
Hence the speed of these things is strongly dependent on the CPU. Even if you had a super fast CPU, you wouldn't be able to reach the full potential of the 100M card on this hardware, because you'd hit bus saturation on the Mediator first.
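
To put rough shape on that argument, here is a back-of-envelope sketch in Python. It is only a toy model: the function name, the copy rates, the number of copies and the bus limit are all made-up placeholders, not measurements from any real 040/060 or Mediator setup. The idea is simply that throughput is capped by whichever is smallest: the CPU copy rate divided by the number of times each byte gets copied, the bus limit, or the link speed.

```python
def effective_throughput_mbit(copy_rate_mb_s, copies, bus_limit_mb_s, link_mbit=100.0):
    """Estimate achievable throughput in Mbit/s for a CPU-copied (non-DMA) path.

    copy_rate_mb_s : how fast the CPU can move data, in MB/s (hypothetical)
    copies         : how many times each byte is copied (card -> RAM, plus
                     any re-copies between stack layers)
    bus_limit_mb_s : ceiling imposed by the bus, in MB/s (hypothetical)
    link_mbit      : nominal link speed in Mbit/s
    """
    cpu_bound_mb_s = copy_rate_mb_s / copies   # every extra copy divides the CPU budget
    link_mb_s = link_mbit / 8.0                # Mbit/s -> MB/s
    return min(cpu_bound_mb_s, bus_limit_mb_s, link_mb_s) * 8.0


# Placeholder numbers, purely for illustration:
slow_cpu = effective_throughput_mbit(copy_rate_mb_s=6, copies=3, bus_limit_mb_s=8)
fast_cpu = effective_throughput_mbit(copy_rate_mb_s=12, copies=3, bus_limit_mb_s=8)
huge_cpu = effective_throughput_mbit(copy_rate_mb_s=1000, copies=3, bus_limit_mb_s=8)

print(slow_cpu, fast_cpu, huge_cpu)  # 16.0 32.0 64.0
```

With these invented figures, doubling the copy rate doubles the throughput (the CPU-bound case), while the absurdly fast CPU still tops out at the bus limit, well short of 100 Mbit/s.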